## Rolling dice to find motifs

We will now turn to randomized algorithms that flip coins and roll dice in order to search for motifs. Making random algorithmic decisions may sound like a disastrous idea; just imagine a chess game in which every move is decided by rolling a die. However, the 18th-century French mathematician and naturalist Comte de Buffon first proved that randomized algorithms are useful by randomly dropping needles onto parallel strips of wood and using the results of this experiment to accurately approximate the constant π.

Randomized algorithms may seem counterintuitive because they give up the control that traditional algorithms have over every step. Some randomized algorithms are Las Vegas algorithms, which deliver solutions that are guaranteed to be exact even though they rely on random decisions. Yet most randomized algorithms, including the motif-finding algorithms that we will consider in this chapter, are Monte Carlo algorithms. These algorithms are not guaranteed to return exact solutions, but they quickly find approximate solutions. Because of their speed, they can be run many times, allowing us to choose the best approximation from thousands of runs.

We previously defined Profile(Motifs) as the profile matrix constructed from a collection of k-mers Motifs in Dna. Now, given a collection of strings Dna and an arbitrary 4 × k matrix Profile, we define Motifs(Profile, Dna) as the collection of k-mers formed by the Profile-most probable k-mer in each string from Dna.
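A minimal Python sketch of Motifs(Profile, Dna) may help make the definition concrete. It assumes the profile is a list of four rows in the order A, C, G, T (the same row order the notebook code uses) and breaks ties in favor of the leftmost k-mer. The function names and the toy profile are illustrative, not taken from the original.

```python
from typing import List

NUCLEOTIDES = "ACGT"  # assumed row order of the 4 x k profile: A, C, G, T

def kmer_probability(kmer: str, profile: List[List[float]]) -> float:
    """Pr(kmer | Profile): product of the profile entries, position by position."""
    prob = 1.0
    for j, nucleotide in enumerate(kmer):
        prob *= profile[NUCLEOTIDES.index(nucleotide)][j]
    return prob

def most_probable_kmer(text: str, k: int, profile: List[List[float]]) -> str:
    """Profile-most probable k-mer in text; the leftmost k-mer wins ties."""
    best, best_prob = text[:k], -1.0
    for i in range(len(text) - k + 1):
        p = kmer_probability(text[i:i + k], profile)
        if p > best_prob:
            best, best_prob = text[i:i + k], p
    return best

def motifs_from_profile(profile: List[List[float]], dna: List[str]) -> List[str]:
    """Motifs(Profile, Dna): one Profile-most probable k-mer per string in Dna."""
    k = len(profile[0])  # the profile has k columns
    return [most_probable_kmer(text, k, profile) for text in dna]

# Hypothetical 4 x 3 profile whose consensus is AGT (each column sums to 1)
toy_profile = [
    [0.8, 0.0, 0.0],  # A
    [0.1, 0.1, 0.0],  # C
    [0.1, 0.8, 0.1],  # G
    [0.0, 0.1, 0.9],  # T
]
dna = ["CCAGTC", "AGGTTT"]
print(motifs_from_profile(toy_profile, dna))  # ['AGT', 'GGT']
```

In the second string, GGT (probability 0.072) narrowly beats AGG (0.064), which shows that the most probable k-mer need not share the consensus's first letter.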

![Matrix](data/motifs_profile.png)

In general, we can begin from a collection of randomly chosen k-mers Motifs in Dna, construct Profile(Motifs), and use this profile to generate a new collection of k-mers: Motifs(Profile(Motifs), Dna).
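The sketch below shows a single step of this idea: choose one k-mer uniformly at random from each string, build Profile(Motifs) from raw frequencies, and form Motifs(Profile(Motifs), Dna). All names (`random_motifs`, `profile`, `score`, `most_probable_kmer`) and the toy `dna` strings are hypothetical. Here `score` counts the symbols that differ from each column's most frequent nucleotide, so lower is better. A single step is not guaranteed to lower the score; the text only hopes that it will.

```python
import random
from typing import List

NUCLEOTIDES = "ACGT"  # profile rows in the order A, C, G, T

def profile(motifs: List[str]) -> List[List[float]]:
    """Profile(Motifs): frequency of each nucleotide in each column (no pseudocounts)."""
    k, t = len(motifs[0]), len(motifs)
    return [[sum(m[j] == n for m in motifs) / t for j in range(k)] for n in NUCLEOTIDES]

def score(motifs: List[str]) -> int:
    """Number of symbols that differ from their column's most frequent nucleotide."""
    t = len(motifs)
    return sum(t - max(col.count(n) for n in NUCLEOTIDES) for col in zip(*motifs))

def most_probable_kmer(text: str, k: int, prof: List[List[float]]) -> str:
    """Profile-most probable k-mer in text (leftmost wins ties)."""
    best, best_prob = text[:k], -1.0
    for i in range(len(text) - k + 1):
        kmer = text[i:i + k]
        p = 1.0
        for j, nucleotide in enumerate(kmer):
            p *= prof[NUCLEOTIDES.index(nucleotide)][j]
        if p > best_prob:
            best, best_prob = kmer, p
    return best

def random_motifs(dna: List[str], k: int, rng: random.Random) -> List[str]:
    """Pick one k-mer uniformly at random from each string in Dna."""
    motifs = []
    for text in dna:
        i = rng.randrange(len(text) - k + 1)
        motifs.append(text[i:i + k])
    return motifs

dna = ["TTACGTTT", "GACGTGGG", "CCCCACGT", "ACGTAAAA"]  # hypothetical toy data
k = 4
rng = random.Random(0)  # seeded for reproducibility

motifs = random_motifs(dna, k, rng)
prof = profile(motifs)
new_motifs = [most_probable_kmer(text, k, prof) for text in dna]
print(motifs, score(motifs))
print(new_motifs, score(new_motifs))
```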

Why would we do this? Because we hope that Motifs(Profile(Motifs), Dna) has a better (that is, lower) score than the original collection of k-mers Motifs. We can then form the profile matrix of these k-mers, Profile(Motifs(Profile(Motifs), Dna)), and use it to form the most probable k-mers, Motifs(Profile(Motifs(Profile(Motifs), Dna)), Dna). We can keep iterating in this way for as long as the score of the constructed motifs keeps improving, which is exactly what RandomizedMotifSearch does. To implement this algorithm, you will need to randomly select the initial collection of k-mers that form the motif matrix Motifs, one k-mer from each string in Dna.

![Randomized](data/randomized.png)
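Here is a minimal Python sketch of one run of RandomizedMotifSearch that follows the loop described above: start from random k-mers, then repeatedly compute Profile(Motifs) and Motifs(Profile, Dna) while the score strictly improves. It assumes plain frequency profiles without pseudocounts (matching the `Profile` function in the notebook code) and uses hypothetical helper names and toy data. The score is a non-negative integer that must strictly decrease for the loop to continue, so the loop always terminates.

```python
import random
from typing import List

NUCLEOTIDES = "ACGT"  # profile rows in the order A, C, G, T

def profile(motifs: List[str]) -> List[List[float]]:
    """Column-wise nucleotide frequencies (no pseudocounts)."""
    k, t = len(motifs[0]), len(motifs)
    return [[sum(m[j] == n for m in motifs) / t for j in range(k)] for n in NUCLEOTIDES]

def score(motifs: List[str]) -> int:
    """Mismatches against each column's most frequent nucleotide (lower is better)."""
    t = len(motifs)
    return sum(t - max(col.count(n) for n in NUCLEOTIDES) for col in zip(*motifs))

def most_probable_kmer(text: str, k: int, prof: List[List[float]]) -> str:
    """Profile-most probable k-mer in text (leftmost wins ties)."""
    best, best_prob = text[:k], -1.0
    for i in range(len(text) - k + 1):
        kmer = text[i:i + k]
        p = 1.0
        for j, nucleotide in enumerate(kmer):
            p *= prof[NUCLEOTIDES.index(nucleotide)][j]
        if p > best_prob:
            best, best_prob = kmer, p
    return best

def randomized_motif_search(dna: List[str], k: int, rng: random.Random) -> List[str]:
    """One run: random start, then iterate while Score(Motifs) strictly decreases."""
    motifs = []
    for text in dna:
        i = rng.randrange(len(text) - k + 1)
        motifs.append(text[i:i + k])
    best = motifs
    while True:
        prof = profile(motifs)
        motifs = [most_probable_kmer(text, k, prof) for text in dna]
        if score(motifs) < score(best):
            best = motifs
        else:
            return best

dna = ["TTACGTTT", "GACGTGGG", "CCCCACGT", "ACGTAAAA"]  # hypothetical toy data
k = 4
# Monte Carlo: run many times with different seeds and keep the lowest score
runs = [randomized_motif_search(dna, k, random.Random(seed)) for seed in range(100)]
best = min(runs, key=score)
print(best, score(best))
```

A single run can stop at a poor local optimum, which is why the usage keeps the best of many independently seeded runs rather than trusting any one of them.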


```python
import numpy as np

# Row index of each nucleotide in a 4 x k profile matrix (rows A, C, G, T)
NUC_INDEX = {'A': 0, 'C': 1, 'G': 2, 'T': 3}

# Return the Profile-most probable k-mer in dna
# (ties are broken in favor of the leftmost k-mer)
def ProfileMostProbable(dna, k, profileMatrixRows):
    # Start below zero so that a k-mer is always returned, even when every
    # k-mer has probability 0 (possible when the profile contains zeros)
    hProb = -1.0
    patternHighProb = dna[0:k]

    for i in range(len(dna) - k + 1):
        pattern = dna[i:i + k]
        # Pr(pattern | Profile): product of the profile entries, position by position
        tempProb = 1.0
        for s, nucleotide in enumerate(pattern):
            tempProb *= float(profileMatrixRows[NUC_INDEX[nucleotide]][s])

        if tempProb > hProb:
            hProb = tempProb
            patternHighProb = pattern

    return patternHighProb

# Build the profile matrix: frequency of each nucleotide (rows A, C, G, T)
# at each of the k positions of the motifs
def Profile(motifs, k):
    matrix = np.zeros(shape=(4, k))

    for motif in motifs:
        for i, nucleotide in enumerate(motif):
            matrix[NUC_INDEX[nucleotide]][i] += 1

    return matrix / float(len(motifs))

# Return Score(Motifs): in each column, count the motifs that disagree with
# the most frequent nucleotide, then sum over all columns (lower is better)
def Score(motifs):
    matrix = np.zeros(shape=(4, len(motifs[0])))

    for motif in motifs:
        for i, nucleotide in enumerate(motif):
            matrix[NUC_INDEX[nucleotide]][i] += 1

    maxi = np.max(matrix, axis=0)
    return int(np.sum(len(motifs) - maxi))
```
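The `Profile` function above uses raw frequencies, so a nucleotide that never appears in some column gets probability 0 there, and every k-mer containing it at that position gets probability 0 in `ProfileMostProbable`. A common remedy is Laplace's rule of succession: add a pseudocount (typically 1) to every count before normalizing. The sketch below is a hypothetical drop-in variant, `ProfileWithPseudocounts`, with the same argument order as `Profile`; whether to use it in RandomizedMotifSearch is a design choice, not something the code above does.

```python
import numpy as np

# Row index of each nucleotide in the 4 x k profile (rows A, C, G, T)
NUC_INDEX = {'A': 0, 'C': 1, 'G': 2, 'T': 3}

def ProfileWithPseudocounts(motifs, k, pseudocount=1.0):
    """Profile matrix whose counts all start at `pseudocount` (Laplace's rule when 1)."""
    counts = np.full(shape=(4, k), fill_value=float(pseudocount))
    for motif in motifs:
        for i, nucleotide in enumerate(motif):
            counts[NUC_INDEX[nucleotide], i] += 1
    # Each column now sums to len(motifs) + 4 * pseudocount
    return counts / (len(motifs) + 4 * pseudocount)

smoothed = ProfileWithPseudocounts(["ACG", "ACT", "AGT"], 3)
print(smoothed)
```

With three motifs and a pseudocount of 1, every column is divided by 7: for example, A in the first column becomes 4/7 instead of 1, and C, G, T there become 1/7 instead of 0, so no k-mer is ruled out entirely.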