# Code Challenge: Implement MotifEnumeration (reproduced below).

Input: Integers k and d, followed by a space-separated collection of strings Dna.

Output: All (k, d)-motifs in Dna, that is, all k-mers that appear in every string of Dna with at most d mismatches.
```
MotifEnumeration(Dna, k, d)
    Patterns ← an empty set
    for each k-mer Pattern in Dna
        for each k-mer Pattern' differing from Pattern by at most d mismatches
            if Pattern' appears in each string from Dna with at most d mismatches
                add Pattern' to Patterns
    remove duplicates from Patterns
    return Patterns
```
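
The pseudocode can be sketched in Python roughly as follows. This is a minimal sketch, not the notebook's implementation further down: it assumes Dna is already a list of strings, keeps results in a set so the "remove duplicates" step happens implicitly, and generates neighbors with `itertools` (choose up to d positions, then try every nucleotide there) instead of the recursive `Neighbors` used later. The names `hamming`, `neighbors`, `appears_with_mismatches`, and `motif_enumeration` are hypothetical.

```python
from itertools import combinations, product

NUCLEOTIDES = "ACGT"


def hamming(a: str, b: str) -> int:
    # Number of mismatched positions between two equal-length strings
    return sum(x != y for x, y in zip(a, b))


def neighbors(pattern: str, d: int) -> set[str]:
    # Every k-mer within Hamming distance d of pattern: pick min(d, k) positions
    # and try all nucleotides there (keeping the original letter is allowed,
    # so strings with fewer than d mismatches are produced too)
    result = {pattern}
    for positions in combinations(range(len(pattern)), min(d, len(pattern))):
        for letters in product(NUCLEOTIDES, repeat=len(positions)):
            candidate = list(pattern)
            for pos, letter in zip(positions, letters):
                candidate[pos] = letter
            result.add("".join(candidate))
    return result


def appears_with_mismatches(pattern: str, text: str, d: int) -> bool:
    # True if some k-mer of text is within Hamming distance d of pattern
    k = len(pattern)
    return any(hamming(pattern, text[i:i + k]) <= d
               for i in range(len(text) - k + 1))


def motif_enumeration(dna: list[str], k: int, d: int) -> set[str]:
    patterns = set()  # a set never holds duplicates
    for text in dna:
        for i in range(len(text) - k + 1):
            for candidate in neighbors(text[i:i + k], d):
                if all(appears_with_mismatches(candidate, s, d) for s in dna):
                    patterns.add(candidate)
    return patterns


print(" ".join(sorted(motif_enumeration(
    ["ATTTGGC", "TGCCTTA", "CGGTATC", "GAAAATT"], 3, 1))))
```

Sorting the set before printing gives a deterministic order; the challenge itself only asks for the collection of motifs.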

## Sample Input:
```
3 1
ATTTGGC TGCCTTA CGGTATC GAAAATT
```
## Sample Output:
```
ATA ATT GTT TTT
```
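
To see why these four 3-mers qualify, one can look up, for each candidate, its closest 3-mer in every string of the sample. The sketch below is illustrative only; `best_match` is a hypothetical helper, and "AAA" is added as a non-motif for contrast (its closest 3-mer in ATTTGGC is ATT, at distance 2 > d).

```python
dna = ["ATTTGGC", "TGCCTTA", "CGGTATC", "GAAAATT"]
k, d = 3, 1


def hamming(a, b):
    # Number of mismatched positions between two equal-length strings
    return sum(x != y for x, y in zip(a, b))


def best_match(pattern, text):
    # (distance, k-mer) for the k-mer of text closest to pattern;
    # ties are broken by the alphabetically smaller k-mer
    k = len(pattern)
    return min((hamming(pattern, text[i:i + k]), text[i:i + k])
               for i in range(len(text) - k + 1))


for motif in ["ATA", "ATT", "GTT", "TTT", "AAA"]:
    matches = [best_match(motif, text) for text in dna]
    # A (k, d)-motif needs a match within d mismatches in every string
    print(motif, matches, all(dist <= d for dist, _ in matches))
```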

```python
# Helper functions from week 2

def HammingDistance(string_1, string_2):
    # Count the positions at which two equal-length strings differ
    hamming_distance = 0
    for i in range(len(string_1)):
        if string_1[i] != string_2[i]:
            hamming_distance += 1
    return hamming_distance

def Suffix(p):
    # Drop the first symbol of p
    return p[1:]

def Neighbors(pattern, d):
    # Return the d-neighborhood of pattern: every k-mer within
    # Hamming distance d of pattern, including pattern itself
    if d == 0:
        return [pattern]
    if len(pattern) == 1:
        return ["A", "C", "G", "T"]
    neighborhood = []
    suffixNeighbors = Neighbors(Suffix(pattern), d)
    for text in suffixNeighbors:
        if HammingDistance(Suffix(pattern), text) < d:
            # Mismatch budget left: any nucleotide may go in front
            for nucleotide in ["A", "C", "G", "T"]:
                neighborhood.append(nucleotide + text)
        else:
            # Budget used up: keep the original first symbol
            neighborhood.append(pattern[0] + text)
    return neighborhood
```
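
The size of a d-neighborhood explains why `Neighbors` gets expensive as d grows: choosing i mismatch positions out of k and one of 3 alternative nucleotides at each gives $\sum_{i=0}^{d} \binom{k}{i} 3^i$ k-mers, which is the number of strings `Neighbors(pattern, d)` should return. The sketch below checks the formula against a brute-force count over all $4^k$ k-mers; `neighborhood_size` and `brute_force_count` are hypothetical names.

```python
from itertools import product
from math import comb


def neighborhood_size(k, d):
    # Choose i mismatch positions, then 3 alternative nucleotides at each one
    return sum(comb(k, i) * 3 ** i for i in range(d + 1))


def brute_force_count(pattern, d):
    # Count every one of the 4^k k-mers within Hamming distance d of pattern
    return sum(
        sum(a != b for a, b in zip(pattern, candidate)) <= d
        for candidate in product("ACGT", repeat=len(pattern))
    )


for d in range(3):
    print(d, neighborhood_size(5, d), brute_force_count("ATTTG", d))
```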

```python
def split_substrings(Dna, k):
    # Collect every k-mer from every string in the space-separated Dna collection
    final = []
    for entry in Dna.split():
        # A string of length n contains n - k + 1 k-mers
        for i in range(len(entry) - k + 1):
            final.append(entry[i:i+k])
    return final

# Like Hamming distance, but for strings of unequal length:
# the smallest Hamming distance between kmer and any k-mer of dna
def Difference(kmer, dna):
    k = len(kmer)
    lowest = k
    for i in range(len(dna) - k + 1):
        distance = HammingDistance(kmer, dna[i:i+k])
        if distance < lowest:
            lowest = distance
    return lowest


def MotifEnumeration(Dna, k, d):
    patterns = []
    dna_strings = Dna.split()
    dna_substrings = split_substrings(Dna, k)
    for pattern in dna_substrings:
        # Find each k-mer Pattern' differing from Pattern by at most d mismatches
        differing_patterns = Neighbors(pattern, d)
        for kmer in differing_patterns:
            # Keep kmer only if it appears in every string with at most d mismatches
            if all(Difference(kmer, dna) <= d for dna in dna_strings):
                patterns.append(kmer)
    # Remove duplicates from patterns, preserving the order of discovery
    patterns = list(dict.fromkeys(patterns))
    return " ".join(patterns)
```

```python
print(MotifEnumeration("ATTTGGC TGCCTTA CGGTATC GAAAATT", 3, 1))
```

ATA ATT GTT TTT


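```python
# Raw string so the backslash in the Windows path is not read as an escape sequence
test_file = r"MotifEnumeration Test Files\dataset_30302_8 (4).txt"

with open(test_file, "r") as file:
    # First line: k and d; second line: the space-separated Dna strings
    k, d = map(int, file.readline().split())
    dna = file.readline().strip()
    print(MotifEnumeration(dna, k, d))
```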

AATCA AATCC AATCG AATCT
