Code Challenge: Implement MotifEnumeration (reproduced below).
     Input: Integers k and d, followed by a collection of strings Dna.
     Output: All (k, d)-motifs in Dna.


    MotifEnumeration(Dna, k, d)
        Patterns ← an empty set
        for each k-mer Pattern in Dna
            for each k-mer Pattern' differing from Pattern by at most d mismatches
                if Pattern' appears in each string from Dna with at most d mismatches
                    add Pattern' to Patterns
        remove duplicates from Patterns
        return Patterns

Sample Input:

    3 1
    ATTTGGC
    TGCCTTA
    CGGTATC
    GAAAATT
Sample Output:

    ATA ATT GTT TTT
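
One way to sanity-check a MotifEnumeration implementation on small inputs is to compare it against a brute-force search over all 4^k possible k-mers. The following is a minimal sketch of that slower, different technique, not the neighborhood-based method in the pseudocode above; the names `hamming`, `occurs_with_mismatches`, and `brute_force_motifs` are made up for illustration.

```python
from itertools import product


def hamming(p, q):
    # Number of positions at which two equal-length strings differ.
    return sum(a != b for a, b in zip(p, q))


def occurs_with_mismatches(pattern, text, d):
    # True if some k-mer of text is within d mismatches of pattern.
    k = len(pattern)
    return any(hamming(pattern, text[i:i + k]) <= d
               for i in range(len(text) - k + 1))


def brute_force_motifs(dna, k, d):
    # Try every one of the 4^k possible k-mers (feasible only for small k).
    candidates = ("".join(letters) for letters in product("ACGT", repeat=k))
    return sorted(c for c in candidates
                  if all(occurs_with_mismatches(c, text, d) for text in dna))


sample = ["ATTTGGC", "TGCCTTA", "CGGTATC", "GAAAATT"]
print(" ".join(brute_force_motifs(sample, 3, 1)))
```

On the sample input this should reproduce the sample output, which makes it a convenient reference when debugging the faster version.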

In [None]:
# Write your MotifEnumeration() function here along with any subroutines you need.
# This function should return a list of strings.
def MotifEnumeration(dna, k, d):
    Patterns = set()
    # A (k, d)-motif must occur in every string, including dna[0], so it is
    # enough to take candidates from the d-neighborhoods of the k-mers of dna[0].
    for i in range(len(dna[0]) - k + 1):
        for pattern2 in Neighbors(dna[0][i:i + k], d):
            if checkSubstring(dna, pattern2, d):
                Patterns.add(pattern2)
    # Sort so the result is deterministic and matches the sample output order.
    return sorted(Patterns)


def checkSubstring(dna, pattern2, d):
    # dna[0] is skipped: every candidate is already a d-neighbor of one of its k-mers.
    for i in range(1, len(dna)):
        if not checkdnaString(dna[i], pattern2, d):
            return False
    return True


def checkdnaString(dnastring, pattern2, d):
    for j in range(len(dnastring) - len(pattern2) + 1):
        if HammingDistance(pattern2, dnastring[j:j + len(pattern2)]) <= d:
            return True
    return False


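def Neighbors(Pattern, d):
    # Return the set of all strings within Hamming distance d of Pattern.
    if d == 0:
        return {Pattern}
    if len(Pattern) == 1:
        return {"A", "C", "G", "T"}
    Neighborhood = set()
    SuffixNeighbors = Neighbors(Pattern[1:], d)
    for Text in SuffixNeighbors:
        if HammingDistance(Pattern[1:], Text) < d:
            # The suffix leaves a mismatch to spare, so any first symbol is allowed.
            for x in ["A", "C", "G", "T"]:
                Neighborhood.add(x + Text)
        else:
            # The suffix already uses all d mismatches, so the first symbol must match.
            Neighborhood.add(Pattern[0] + Text)
    return Neighborhood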


def HammingDistance(p, q):
    result = 0
    for i in range(len(p)):
        if p[i] != q[i]:
            result += 1
    return result

Code Challenge: Implement MedianString.
     Input: An integer k, followed by a collection of strings Dna.
     Output: A k-mer Pattern that minimizes d(Pattern, Dna) among all k-mers Pattern. (If there are multiple such strings Pattern,
     then you may return any one.)
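
Here d(Pattern, Dna) is the sum, over the strings in Dna, of the smallest Hamming distance between Pattern and any k-mer of that string. The sketch below prints the per-string terms for one candidate so the sum can be checked by hand; `min_distance` and `distance_to_collection` are illustrative names, not part of the assignment.

```python
def min_distance(pattern, text):
    # Smallest Hamming distance between pattern and any k-mer of text.
    k = len(pattern)
    return min(sum(a != b for a, b in zip(pattern, text[i:i + k]))
               for i in range(len(text) - k + 1))


def distance_to_collection(pattern, dna):
    # d(pattern, dna): add up the per-string minimum distances.
    return sum(min_distance(pattern, text) for text in dna)


sample = [
    "AAATTGACGCAT",
    "GACGACCACGTT",
    "CGTCAGCGCCTG",
    "GCTGAGCACCGG",
    "AGTACGGGACAG",
]
for text in sample:
    print(text, min_distance("ACG", text))
print("d(ACG, Dna) =", distance_to_collection("ACG", sample))
```

For ACG the per-string terms are 0, 0, 1, 1, 0, so d(ACG, Dna) = 2.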

Sample Input:

    3
    AAATTGACGCAT
    GACGACCACGTT
    CGTCAGCGCCTG
    GCTGAGCACCGG
    AGTACGGGACAG
Sample Output:

    ACG

In [21]:
# Write your MedianString() function here, along with any subroutines that you need.
# You should return your answer as a string.
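def MedianString(dna, k):
    # Exhaustively score all 4^k candidate k-mers and keep the first one
    # with the smallest d(pattern, dna).
    distance = float("inf")
    median = ""
    for pattern in generate_kmer(k):
        temp = DistanceBetweenPatternAndStrings(pattern, dna)
        if distance > temp:
            distance = temp
            median = pattern
    return median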


def generate_kmer(k):
    result = ["A", "C", "G", "T"]
    for i in range(1, k):
        result2 = []
        for kmer in result:
            result2.append(kmer + "A")
            result2.append(kmer + "C")
            result2.append(kmer + "G")
            result2.append(kmer + "T")
        result = result2
    return result


def DistanceBetweenPatternAndStrings(pattern, dna):
    k = len(pattern)
    distance = 0
    for Text in dna:
        # d(pattern, Text): the smallest Hamming distance to any k-mer of Text.
        hammingDistance = float("inf")
        for i in range(len(Text) - k + 1):
            current = HammingDistance(pattern, Text[i:i + k])
            if hammingDistance > current:
                hammingDistance = current
        distance = distance + hammingDistance
    return distance


def HammingDistance(p, q):
    result = 0
    for i in range(len(p)):
        if p[i] != q[i]:
            result += 1
    return result

Profile-most Probable k-mer Problem: Find a Profile-most probable k-mer in a string.
     Input: A string Text, an integer k, and a 4 × k matrix Profile.
     Output: A Profile-most probable k-mer in Text.

Code Challenge: Solve the Profile-most Probable k-mer Problem.

Sample Input:

    ACCTGTTTATTGCCTAAGTTCCGAACAAACCCAATATAGCCCGAGGGCCT
    5
    0.2 0.2 0.3 0.2 0.3
    0.4 0.3 0.1 0.5 0.1
    0.3 0.3 0.5 0.2 0.4
    0.1 0.2 0.1 0.1 0.2
Sample Output:

    CCGAG
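
The probability of a k-mer under a profile is the product of the matrix entries selected by its letters, column by column. The sketch below evaluates that product for the sample output; `kmer_probability` is an illustrative helper, and the profile is assumed to be stored as a dictionary keyed by 'A', 'C', 'G', and 'T' with one list of column probabilities per nucleotide.

```python
def kmer_probability(kmer, profile):
    # Multiply the profile entry for each letter at its own column.
    prob = 1.0
    for j, nucleotide in enumerate(kmer):
        prob *= profile[nucleotide][j]
    return prob


# Rows of the sample profile, in the order A, C, G, T.
profile = {
    "A": [0.2, 0.2, 0.3, 0.2, 0.3],
    "C": [0.4, 0.3, 0.1, 0.5, 0.1],
    "G": [0.3, 0.3, 0.5, 0.2, 0.4],
    "T": [0.1, 0.2, 0.1, 0.1, 0.2],
}
print(kmer_probability("CCGAG", profile))
```

For CCGAG the product is 0.4 · 0.3 · 0.5 · 0.2 · 0.4 = 0.0048, up to floating-point rounding.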

In [None]:
# Write your ProfileMostProbableKmer() function here along with any subroutines you need.
# The profile matrix assumes that the first row corresponds to A, the second corresponds to C,
# the third corresponds to G, and the fourth corresponds to T.
# You should represent the profile matrix as a dictionary whose keys are 'A', 'C', 'G', and 'T' and whose values are lists of floats
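def ProfileMostProbableKmer(text, k, profile):
    # Start below any possible probability so a k-mer is returned even if all score 0.
    max_prob = -1.0
    best_result = ""
    for i in range(len(text) - k + 1):
        # Pr(k-mer | Profile) is the product of the per-position probabilities.
        prob = 1.0
        for j in range(k):
            prob *= profile[text[i + j]][j]
        # Strict comparison keeps the first k-mer in text when several tie.
        if prob > max_prob:
            best_result = text[i:i + k]
            max_prob = prob
    return best_result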