Given a profile matrix Profile, we can evaluate the probability of every k-mer in a string Text and find a Profile-most probable k-mer in Text, i.e., a k-mer that was most likely to have been generated by Profile among all k-mers in Text. The probability of a k-mer is the product, over its k positions, of the Profile entry for the nucleotide found at that position. For example, ACGGGGATTACC is the Profile-most probable 12-mer in GGTACGGGGATTACCT. Indeed, every other 12-mer in this string has probability 0. In general, if there are multiple Profile-most probable k-mers in Text, then we select the first such k-mer occurring in Text.

# Profile-most Probable k-mer Problem: Find a Profile-most probable k-mer in a string.

Input: A string Text, an integer k, and a 4 × k matrix Profile.
Output: A Profile-most probable k-mer in Text.
Code Challenge: Solve the Profile-most Probable k-mer Problem.

## Sample Input:
ACCTGTTTATTGCCTAAGTTCCGAACAAACCCAATATAGCCCGAGGGCCT
5
0.2 0.2 0.3 0.2 0.3
0.4 0.3 0.1 0.5 0.1
0.3 0.3 0.5 0.2 0.4
0.1 0.2 0.1 0.1 0.2
## Sample Output:
CCGAG
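
As a quick check on the sample, the sketch below multiplies the profile entries picked out by each base of a 5-mer. The helper name `kmer_probability` and the dictionary layout of the profile are assumptions made for this illustration; the rows are taken to be ordered A, C, G, T, as in the notebook code.

```python
import math

# Sample profile from above; rows are assumed to be ordered A, C, G, T.
PROFILE = {
    "A": [0.2, 0.2, 0.3, 0.2, 0.3],
    "C": [0.4, 0.3, 0.1, 0.5, 0.1],
    "G": [0.3, 0.3, 0.5, 0.2, 0.4],
    "T": [0.1, 0.2, 0.1, 0.1, 0.2],
}

def kmer_probability(kmer, profile):
    """Multiply the profile entries selected by each base of the k-mer.

    Hypothetical helper for illustration only.
    """
    return math.prod(profile[base][j] for j, base in enumerate(kmer))

# CCGAG: 0.4 * 0.3 * 0.5 * 0.2 * 0.4, about 0.0048
print(kmer_probability("CCGAG", PROFILE))
# ACCTG (the first 5-mer in the sample text): 0.2 * 0.3 * 0.1 * 0.1 * 0.4, about 0.00024
print(kmer_probability("ACCTG", PROFILE))
```

The sample output CCGAG is about twenty times more probable under this profile than the first 5-mer in the text, which illustrates why scanning every window is needed.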

In [1]:
def score(text, k, matrix):
    """Return the probability that the profile `matrix` generates the k-mer `text`."""
    row = {"A": 0, "C": 1, "G": 2, "T": 3}  # profile rows are ordered A, C, G, T
    prob = 1.0
    for i, char in enumerate(text[:k]):
        # Columns are independent, so their probabilities multiply.
        prob *= matrix[row[char]][i]
    return prob

def ProfileMostProbable(text, k, matrix):
    best = -1.0  # below any probability, so the first k-mer wins even if all are 0
    probable = ""
    for i in range(len(text) - k + 1):  # + 1 so the last k-mer is included
        substr = text[i:i+k]
        p = score(substr, k, matrix)
        if p > best:  # strict comparison keeps the first k-mer on ties
            best = p
            probable = substr
    return probable

In [2]:
text = "ACCTGTTTATTGCCTAAGTTCCGAACAAACCCAATATAGCCCGAGGGCCT"
k = 5
matrix = [[0.2, 0.2, 0.3, 0.2, 0.3],
[0.4, 0.3, 0.1, 0.5, 0.1],
[0.3, 0.3, 0.5, 0.2, 0.4],
[0.1, 0.2, 0.1, 0.1, 0.2]]

print(ProfileMostProbable(text, k, matrix))


CCGAG


In [3]:
text = "TGGGGGATTAACTCGAACTGTTTTTTCTGGGCAGTTCCTAGAGCGCCAAGACCGTGATCCTGTCCAATGTGAGTAAGATATGCGGTGTAGAAACTTGGTTCGTGCGTTCGATGGTCAAGGCATGTAAATCGGAGTCGCCGACGGAACTAATAACGTAGTCCAAATCGAACGTCTCATGAGCATAGATGCATAACATCGACGGATTGCACCTCCAGGGGGAAGGTTCGGAAGCACATTGCGGTGAGAGTCTCACCAAAGAAGCTTCGGTACTGTTATCAGAGATCGAGCAGTCGACTGTAAGGGTGCGCAGTAGATCGGGCTCCATGTTTCGACCCAATAATTCCGTTGTTAGTTCAGCGTTATGGATGGCACGATCGTTTGGAAGCTAGACAAAGAACGATCTGACCCCACTGTCCTATAGATCTTGTGGCAAGCCATTTCCTAGGAAGCCAACTATATCTTTAGAGCAGAGACTTGAGCGTAAGTTTTTTCGGCTAACCTGTATTGTCAGCGCTATGCCTGCTAATGACAGAAGTTCGGGTGGGGACGGACATCTCGGCGGCATAGGGAATCCATCATACCGGTACACCCAGCCTCTCGGCTATTCGCACTTGTAGCACTCATTATGGGAGAAGGCCAAATGGGCCTTATCACTAACGCTGCCGCACGGAAGCACTCCAGTCATCTCAGGTGGTGCGTGTGCGATTTGAGTGATGCACGGCTACGACTGGGTTGACTTGTGTTACATATGAGATTAAACCAATGGTCATGAATGCGCGAGGTAGTGATCACGCGAAACATAACCGGTTCGGCACCCGAAGTTGCCTGATCGAGAACCACGGGACAGTGCAGGCCATCGGCCCAATCGTCAAGCAAATCTTAAGACAAATTAATAATACAAGGAAAACCAAGGGGATGCTCACAGTTCAACGTATGGAAGCTAAATTCGGTGGCGCAAGGGAATAGTCACTTTGCGAAGGCGACGCGAGAATAGGCGCTT"
k = 15
matrix = [[0.364, 0.152, 0.273, 0.212, 0.167, 0.182, 0.197, 0.258, 0.288, 0.197, 0.288, 0.318, 0.212, 0.212, 0.167],
[0.227, 0.242, 0.227, 0.303, 0.258, 0.333, 0.227, 0.303, 0.197, 0.273, 0.333, 0.212, 0.348, 0.136, 0.258],
[0.242, 0.318, 0.303, 0.212, 0.333, 0.242, 0.258, 0.227, 0.288, 0.273, 0.076, 0.182, 0.242, 0.364, 0.273],
[0.167, 0.288, 0.197, 0.273, 0.242, 0.242, 0.318, 0.212, 0.227, 0.258, 0.303, 0.288, 0.197, 0.288, 0.303]]

print(ProfileMostProbable(text, k, matrix))

ATGCGCGAGGTAGTG


# Code Challenge: Implement GreedyMotifSearch.

Input: Integers k and t, followed by a space-separated collection of strings Dna.
Output: A collection of strings BestMotifs resulting from applying GreedyMotifSearch(Dna, k, t). If at any step you find more than one Profile-most probable k-mer in a given string, use the one occurring first.

## Sample Input:
3 5
GGCGTTCAGGCA AAGAATCAGTCA CAAGGAGTTCGC CACGTCAATCAC CAATAATATTCG
## Sample Output:
CAG CAG CAA CAA CAA

```
GreedyMotifSearch(Dna, k, t)
    BestMotifs ← motif matrix formed by first k-mers in each string from Dna
    for each k-mer Motif in the first string from Dna
        Motif_1 ← Motif
        for i = 2 to t
            form Profile from motifs Motif_1, …, Motif_{i-1}
            Motif_i ← Profile-most probable k-mer in the i-th string in Dna
        Motifs ← (Motif_1, …, Motif_t)
        if Score(Motifs) < Score(BestMotifs)
            BestMotifs ← Motifs
    return BestMotifs
```
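
The pseudocode above can be sketched in Python roughly as follows. This is a minimal sketch, not a reference solution: the function names (`build_profile`, `kmer_probability`, `most_probable_kmer`, `motif_score`, `greedy_motif_search`) are hypothetical, the profile is built from raw frequencies without pseudocounts, and Score(Motifs) is assumed to be the number of positions that differ from the most common base in each column.

```python
from collections import Counter

def build_profile(motifs):
    """Column-wise base frequencies of the motifs (no pseudocounts, by assumption)."""
    t, k = len(motifs), len(motifs[0])
    return {b: [sum(m[j] == b for m in motifs) / t for j in range(k)] for b in "ACGT"}

def kmer_probability(kmer, profile):
    """Product of the profile entries selected by each base of the k-mer."""
    p = 1.0
    for j, base in enumerate(kmer):
        p *= profile[base][j]
    return p

def most_probable_kmer(text, k, profile):
    """First k-mer of maximal probability; the first k-mer wins if all are 0."""
    best, best_p = text[:k], -1.0
    for i in range(len(text) - k + 1):
        p = kmer_probability(text[i:i + k], profile)
        if p > best_p:  # strict comparison keeps the earliest k-mer on ties
            best, best_p = text[i:i + k], p
    return best

def motif_score(motifs):
    """Count entries that disagree with the most common base of their column."""
    return sum(len(col) - Counter(col).most_common(1)[0][1] for col in zip(*motifs))

def greedy_motif_search(dna, k, t):
    best_motifs = [s[:k] for s in dna]  # first k-mer of every string
    for i in range(len(dna[0]) - k + 1):
        motifs = [dna[0][i:i + k]]  # Motif_1
        for j in range(1, t):
            # Profile from Motif_1 .. Motif_{j}, applied to the next string
            profile = build_profile(motifs)
            motifs.append(most_probable_kmer(dna[j], k, profile))
        if motif_score(motifs) < motif_score(best_motifs):
            best_motifs = motifs
    return best_motifs

dna = "GGCGTTCAGGCA AAGAATCAGTCA CAAGGAGTTCGC CACGTCAATCAC CAATAATATTCG".split()
print(" ".join(greedy_motif_search(dna, 3, 5)))  # CAG CAG CAA CAA CAA
```

Starting `best_p` below zero is what makes the "use the one occurring first" rule hold when every k-mer has probability 0, which happens often here because the early profiles are built from only one or two motifs.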