## Course 1 - Finding Hidden Messages in DNA  

#### The course deals with the following topics:
  1.1 Finding and counting patterns in a DNA string, allowing for a limited number of mismatches and for reverse complements  
  1.2 Finding a shared motif in a set of DNA strings, allowing for mismatches  
  	- Greedy search, Monte Carlo search, and Gibbs sampling  
  1.3 Final project: finding the transcription factor binding motif for MTB genes upregulated in hypoxia

### Frequent Words Problem
Solve the Frequent Words Problem.
  * Input: A string Text and an integer k.
  * Output: All most frequent k-mers in Text.

In [102]:
def FrequentWords(Text, k):
    PatternsInText = []   # distinct k-mers, in order of first appearance
    NumPatterns = []      # NumPatterns[j] counts occurrences of PatternsInText[j]
    for i in range(len(Text)-k+1):
        kmer = Text[i:i+k]
        if kmer in PatternsInText:
            NumPatterns[PatternsInText.index(kmer)] += 1
        else:
            PatternsInText.append(kmer)
            NumPatterns.append(1)
    if not NumPatterns:   # Text is shorter than k, so there are no k-mers
        return []
    # Collect every k-mer whose count equals the maximum (computed once, after the scan)
    MaxCount = max(NumPatterns)
    return [PatternsInText[j] for j in range(len(NumPatterns)) if NumPatterns[j] == MaxCount]

In [103]:
fnm = 'Input_FrequentWords.txt'
with open('Course1_Data/'+fnm,'r',encoding='utf-8') as file:
    Text = file.readline().strip()
    k = int(file.readline().strip())
FrequentKmer = FrequentWords(Text, k)

print(f'Text = {Text}\n')
print(f'Pattern = {k}\n')
print('Words = '+' '.join(FrequentKmer))

Text = TTGCGCCTAAATGATCAATGATCTTGCGCCTAAATGATCGACTGTAGTTGCGCCTACGGAGATTTGCGCCTAGACTGTAGCGGAGATGACTGTAGACTCATTGAGACTGTAGACTCATTGATTGCGCCTACGGAGATGACTGTAGCGGAGATTTGCGCCTATTGCGCCTAAATGATCAATGATCGACTGTAGAATGATCCGGAGATCGGAGATCGGAGATGACTGTAGCGGAGATACTCATTGAAATGATCACTCATTGATTGCGCCTACGGAGATGACTGTAGCGGAGATACTCATTGACGGAGATTTGCGCCTAACTCATTGACGGAGATGACTGTAGGACTGTAGGACTGTAGTTGCGCCTAAATGATCGACTGTAGTTGCGCCTAACTCATTGAGACTGTAGTTGCGCCTAACTCATTGAGACTGTAGGACTGTAGACTCATTGAACTCATTGAAATGATCAATGATCCGGAGATAATGATCTTGCGCCTAAATGATCAATGATCCGGAGATCGGAGATAATGATCCGGAGATCGGAGATAATGATCGACTGTAGTTGCGCCTACGGAGATAATGATCACTCATTGACGGAGATAATGATCCGGAGATTTGCGCCTAGACTGTAGACTCATTGAGACTGTAGGACTGTAGTTGCGCCTAGACTGTAGTTGCGCCTATTGCGCCTAACTCATTGAACTCATTGATTGCGCCTACGGAGATGACTGTAGCGGAGATCGGAGATCGGAGATCGGAGATTTGCGCCTATTGCGCCTATTGCGCCTACGGAGATCGGAGATTTGCGCCTATTGCGCCTAGACTGTAGGACTGTAGACTCATTGAACTCATTGATTGCGCCTACGGAGAT

Pattern = 14

Words = CGGAGATCGGAGAT
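
The same tally can also be kept in a dictionary, which avoids searching a list for every k-mer. The following is a minimal sketch rather than the implementation used above; the name `frequent_words_counter` is hypothetical, and it relies only on `collections.Counter` from the standard library.

```python
from collections import Counter

def frequent_words_counter(text, k):
    """Return all most frequent k-mers in text, sorted alphabetically."""
    # Count every k-mer seen by a sliding window of width k
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    if not counts:  # text is shorter than k, so there are no k-mers
        return []
    best = max(counts.values())
    return sorted(kmer for kmer, n in counts.items() if n == best)

print(frequent_words_counter("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4))
```

In the short example string, `CATG` and `GCAT` each occur three times, so both are returned.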


### Pattern Matching Problem

  * Input: Two strings, Pattern and Genome.
  * Output: A collection of space-separated integers specifying all starting positions where Pattern appears as a substring of Genome.


In [104]:
def PatternMatching(Pattern, Genome):
    positions = []
    for j in range(len(Genome)-len(Pattern)+1):
        if Genome[j:j+len(Pattern)] == Pattern:
            positions.append(j)
    return positions

In [105]:
fnm = 'Input_PatternMatching.txt'
with open('Course1_Data/'+fnm,'r',encoding='utf-8') as file:
    Pattern = file.readline().strip()
    Genome = file.readline().strip()
positions = PatternMatching(Pattern, Genome)

print(f'Pattern = {Pattern}\n')
print(f'Genome = {Genome}\n')
print('Positions = '+' '.join([str(x) for x in positions]))

Pattern = AGACTCCAG

Genome = ACCAGACTCCTACAGACTCCGCATAGACTCCGGAGACTCCTAGACTCCAGACTCCAGACTCCGGGAGACTCCCTAGAGACTCCGCAAGACTCCAGACTCCAGAGACTCCAAAGACTCCAGACTCCGAGACTCCAGACTCCTAGACTCCCGAAGACTCCGAGACTCCCGAGACTCCGGATGTGAGACTCCAGACTCCTAAGACTCCAGACTCCATAAGACTCCAGACTCCGAGACTCCCAAGACTCCAGAGCTGCAGACTCCTATAGACTCCAGAGCCAGACTCCTAGACTCCAGACTCCTTGAGAGACTCCCCGTTCGAGACTCCAGACTCCAATAGAAACAGACTCCGTGAGCAGACTCCAAGTACAGACTCCCCGGCAGACTCCAGACTCCAGACTCCTAGACTCCAGACTCCAGACTCCTAGACTCCAAGACTCCTGTCAGACTCCCGCGGGAGACTCCGTAAGTCTCAGTCTTAGACTCCAAGACTCCAAGTTTATACATAGACTCCCAGACTCCAGACTCCCAGACTCCAGTTAGACTCCTAGACTCCTAAGACTCCTAGACTCCGATGCAGACTCCATATAGACTCCGCAGACTCCAGGGGGTCAGTTTCTTGTAGACTCCTAGGGAAGACTCCAGACTCCTTGAGACTCCTAAGACTCCGCGAGACTCCCAGACTCCAGACTCCAGACTCCAGACTCCACCGAGACTCCAGACTCCAGACTCCAGACTCCCTCACAGACTCCAGACTCCCCTAGACTCCAGAGACTCCATAGACTCCCAAGACTCCAGACTCCGCAGACTCCTGTAGACTCCGAGACTCCAGAGACTCCCGAGACTCCGTCAGACTCCAGACTCCAGACTCCTAGACTCCAGACTCCAGACTCCAGACTCCCAAGACTCCCATTAGACTCCTGAGACTCCCTAGACTCCGAGACTCCAGACTCCTGTCAGACTCCTCGGCAAG
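
Occurrences of a pattern may overlap, so the search has to resume one character after each hit rather than after the end of the match. A minimal sketch of that idea using the built-in `str.find`; the function name `pattern_positions` is hypothetical and not part of the notebook.

```python
def pattern_positions(pattern, genome):
    """Return every 0-based start index of pattern in genome, overlaps included."""
    positions = []
    start = genome.find(pattern)
    while start != -1:
        positions.append(start)
        # Resume one character later so that overlapping matches are found too
        start = genome.find(pattern, start + 1)
    return positions

print(pattern_positions("ATAT", "GATATATGCATATACTT"))
```

In this small example the matches at positions 1 and 3 overlap, and both are reported.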

### Clump Finding Problem: Find patterns forming clumps in a string.

  * Input: A string Genome, and integers k, L, and t.
  * Output: All distinct k-mers forming (L, t)-clumps in Genome.
  
A k-mer forms an (L, t)-clump if it appears at least t times within some window of length L in the genome.


To solve this problem, we first record, for every possible k-mer, all positions at which it appears in the genome.  
Then, for each k-mer, we test whether some window of length L contains at least t of its occurrences.  
To do this efficiently, we encode each k-mer as a base-10 integer (reading it as a base-4 number), use that integer as a list index, and convert the integers back into k-mers at the end. 
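
As a worked illustration of the encoding and of the window test, here is a small self-contained sketch. The names `kmer_to_index`, `index_to_kmer`, and `has_clump` are hypothetical; they only mirror the idea described above, with A, C, G, T assumed to map to the digits 0 to 3.

```python
SYMBOL_TO_DIGIT = {"A": 0, "C": 1, "G": 2, "T": 3}
DIGIT_TO_SYMBOL = "ACGT"

def kmer_to_index(kmer):
    """Read the k-mer as a base-4 number, most significant symbol first."""
    index = 0
    for symbol in kmer:
        index = 4 * index + SYMBOL_TO_DIGIT[symbol]
    return index

def index_to_kmer(index, k):
    """Invert kmer_to_index for a k-mer of known length k."""
    symbols = []
    for _ in range(k):
        index, digit = divmod(index, 4)  # peel off the least significant digit
        symbols.append(DIGIT_TO_SYMBOL[digit])
    return "".join(reversed(symbols))

def has_clump(positions, k, L, t):
    """positions: sorted start indices of a single k-mer."""
    # t occurrences fit in one window of length L when the first and the
    # t-th start positions are at most L - k apart
    return any(positions[i + t - 1] - positions[i] <= L - k
               for i in range(len(positions) - t + 1))

print(kmer_to_index("ACGT"))               # 0*64 + 1*16 + 2*4 + 3
print(index_to_kmer(27, 4))
print(has_clump([1, 5, 20], k=3, L=10, t=2))
```

Here `ACGT` encodes to 27, and the occurrences at positions 1 and 5 both fit inside a window of length 10, so the last call reports a clump.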

In [106]:
Alphabets = {'A':0,'C':1,'G':2,'T':3}

def PatternToNumber(Pattern):
    # Read Pattern as a base-4 number, most significant symbol first
    Number = 0
    for symbol in Pattern:
        Number = 4*Number + Alphabets[symbol]
    return Number

def NumberToPattern(Base10, k):
    Alphabets = {0:'A',1:'C',2:'G',3:'T'}
    Base4 = []
    Pattern = []
    for i in range(k):
        Base4.insert(0,Base10%4)
        Base10 = Base10//4
        Pattern.insert(0,Alphabets[Base4[0]])
    return ''.join(Pattern)

def ClumpFind(Genome,k,L,t):
    
    def FindClump(positions,k,L,t):
        for i in range(max(0,len(positions)-t+1)):
            if positions[i+t-1]-positions[i] < L-k+1:
                return True
        return False
    
    PatternsPos = [[] for i in range(4**k)]
    kMers = []
    for i in range(len(Genome)-k+1):
        PatternsPos[PatternToNumber(Genome[i:i+k])].append(i)
    for i in range(len(PatternsPos)):
        if FindClump(PatternsPos[i],k,L,t):
            kMers.append(i)
    kMers = [NumberToPattern(i,k) for i in kMers]
    return kMers

In [107]:
fnm = 'Input_ClumpFinding.txt'
with open('Course1_Data/'+fnm,'r',encoding='utf-8') as f:
    Genome = f.readline().strip()
    k, L, t = [int(x) for x in f.readline().strip().split()]

kMers = ClumpFind(Genome,k,L,t)
print(f'Genome = {Genome}\n')
print(f'k = {k}, L = {L}, t = {t}')
print('Clumped Kmers = ' + ' '.join(kMers))

Genome = GCGGTTATGCACCGTTCAAATTAGCAAACCACTAAGCGACGTAGTCTGGATTGATTTCTCCCTACCAGTGACCCAAGACGCGTTAGTGAGTTAAGTTCATATCCAGTACCTGCCGCCCTCTGTACTTGGGCGTCCGATTCGCATGCTTACTCAGGTGGAGGACACGATAATCTGATTAAACTGAGCTAAACCAGGTGGAACCAGAAACCAGGTGGGGAGTCTCGCTTCAAGCCGTTCTTGCGATCAAACCAGGTGGTCCATTATGAAACCAGGTGGCTAAACCAGGTGGTCCAGATCCTCGAATGATGTCGGTGCACATCAAAACCAGGTGGGGTGGTGGAACGTAAAACCAGGTGGCATAAACCAGGTGGGCCGGTTCGTAAACCAGGTGAAACCAGGTGGGGTGGAAACCAGGTGGGTTACAAATTACGTTGAGATGGCCCAAACCAGGTGGTGGGCTTCACCCATGTCAACAAACCACCCTATGGAACTAAACCAGGTGGAACCAGGTGGTGAAGGCTTATCCTCAGGAAAAACCAGGTGGAGGTGGTGAAATAAAACCAGGTGGACCAGGTGGATAACCCTCGCCTCGCTTCTCAACCGAGACCTGGATAAACCAGGTGGGGTGGTCCACCGATTTTTGAGACACTAGAAACCAGGTGGGCGGGGAAACCAGGTGGCAAACCAGGTGGGGTGGACGGAAACCAGGTGGATATGTCATAAAACCAAACCAGGTGGTGCACCCCCATGGTGTGTCTTATCCGTGCGTATAAACCAGGTGGTCGCACGGCTTCCACTTGCTGAGAATAGGCCCGCAGGGTCAGTGCCATGCCCTCCGTCACTCGATATGTGTTGTAAGAGTGGTTACCCCTTCATTGAAGTCGCCCACAGCCCCACCTGCATTGCTAGACTATCACCCTACAGTAGGCCTTTTCGCCTTCTTCAAGCAGCAATCTCTTATCCGCGGATGGGCGCGGCGAGCGTGGCGTCC

### Frequent Words with Mismatches and Reverse Complements Problem: 
Find the most frequent k-mers (with mismatches and reverse complements) in a string.

  * Input: A DNA string Text as well as integers k and d.
  * Output: All k-mers Pattern maximizing the sum Count<sub>d</sub>(Text, Pattern)+ Count<sub>d</sub>(Text, Pattern<sub>rc</sub>) over all possible k-mers.

I solve this problem by first collecting every pattern within Hamming distance d of some k-mer substring of the text. Then, for each candidate, I compute Count<sub>d</sub>(Text, Pattern) + Count<sub>d</sub>(Text, Pattern<sub>rc</sub>) and keep the patterns that attain the maximum. 
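
The two building blocks of this approach, the set of patterns within distance d of a k-mer and the reverse complement, can be sketched on their own as follows. This is a minimal illustration under the assumption of the alphabet A, C, G, T; the names `neighbors` and `reverse_complement` are hypothetical.

```python
from itertools import combinations, product

def neighbors(pattern, d):
    """Return the set of strings within Hamming distance d of pattern."""
    result = set()
    # Choose which positions may change, then try every nucleotide there.
    # Putting the original nucleotide back covers distances smaller than d.
    for positions in combinations(range(len(pattern)), min(d, len(pattern))):
        for letters in product("ACGT", repeat=len(positions)):
            candidate = list(pattern)
            for pos, letter in zip(positions, letters):
                candidate[pos] = letter
            result.add("".join(candidate))
    return result

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(pattern):
    """Complement every nucleotide, then reverse the string."""
    return pattern.translate(COMPLEMENT)[::-1]

print(len(neighbors("ACG", 1)))           # the pattern itself plus 3 positions * 3 substitutions
print(reverse_complement("AAAACCCGGT"))
```

For a 3-mer and d = 1 the neighborhood has 1 + 3 × 3 = 10 members, which is a quick sanity check on the enumeration.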

In [108]:
def FrequentWithMismatch(Text, k, d):
    
    # Create a list of all patterns within Hamming distance d of the given pattern
    def SimilarPatterns(pattern, d):
        import itertools
        similars = []
        C = list(itertools.combinations(range(len(pattern)),d))
        for comb in C:
            similars = similars + ModifyPattern(list(pattern), list(comb), [])
        return list(set(similars))

    # Use recursion to substitute every nucleotide at each of the given indices
    def ModifyPattern(pattern, indices, patternslist):
        if len(indices)==0:
            # No positions left to change: record the finished pattern
            patternslist.append(pattern.copy())
        else:
            alphabets = ['A','C','G','T']
            for s in alphabets:
                pattern[indices[0]] = s   
                ind = indices.copy()
                del ind[0]
                ModifyPattern(pattern,ind,patternslist)
        # Always return the joined strings, so that d = 0 (no indices) also works
        return [''.join(s) for s in patternslist]
    
    # Obtain the reverse complement of the pattern
    def ReversePattern(pattern):
        Reverse_dict = {'A':'T','T':'A','G':'C','C':'G'}
        revpattern = [Reverse_dict[i] for i in list(pattern)]
        revpattern.reverse()
        return ''.join(revpattern)
    
    # Count the number of approximate matches of the pattern in the Text
    def ApproxMatching(Pattern, Text, d):
        
        def HammingDistance(seq1, seq2):
            x = 0
            for i in range(len(seq1)):
                if seq1[i] != seq2[i]:
                    x = x+1
            return x
        
        matches = []
        k = len(Pattern)
        for i in range(len(Text)-len(Pattern)+1):
            if HammingDistance(Text[i:i+k], Pattern) < (d + 1):
                matches.append(i)
        return matches

    AllPatterns = []
    matches = []
    # Get all patterns within Hamming distance d of any k-mer substring in the text
    for i in range(len(Text)-k+1):
        AllPatterns = AllPatterns + SimilarPatterns(Text[i:i+k], d)
    AllPatterns = list(set(AllPatterns))
    AllPatterns = [''.join(pattern) for pattern in AllPatterns]
    for pattern in AllPatterns:
        matches.append(len(ApproxMatching(pattern, Text, d))+len(ApproxMatching(ReversePattern(pattern),Text,d)))
    MaxInd = []
    MaxVal = max(matches)
    for i in range(len(matches)):
        if matches[i] == MaxVal:
            MaxInd.append(i)
    return [AllPatterns[i] for i in MaxInd]

In [109]:
fnm = 'Input_FrequentWordsMismatch.txt'
with open('Course1_Data/'+fnm,'r',encoding='utf-8') as f:
    Genome = f.readline().strip()
    k, d = [int(x) for x in f.readline().strip().split()]

kMers = FrequentWithMismatch(Genome,k,d)

print(f'Genome = {Genome}\n')
print(f'k = {k}, d = {d}')
print('kMers with most frequent approximate matching = ' + ' '.join(kMers))

Genome = TAACATCATTTAATAATTTTTTAATAAAGTTTTAAAGAGCATTTTTTAATAAAGTAATAATAATTAACATTAACATAAAACATAGCATCATTTAGTAAAAAATAATAAAGAAAGCATAGTTTTTAAAGAGAAAAAGTAATAATTAGCATAGCATCATAGTTCATAGTAATTTAATAATAATAACATAACATTAACATCATTAAAATTTTAGTAATAAAACAT

k = 5, d = 2
kMers with most frequent approximate matching = ATTAT ATAAT


### Motif Search Problem

Given a set of t DNA strings, we want to find the most likely motif shared by all of them, allowing for mismatches. The objective to minimize is the sum of the Hamming distances between the motif chosen in each string and the consensus motif. The consensus motif is determined by the profile matrix constructed from the chosen motifs. The challenge is that the choice of motifs itself changes the profile, and therefore the probability assigned to each candidate k-mer. Thus, we need an iterative, EM-style algorithm to find the solution.

We approach this problem in two different ways: Greedy search and Randomized search (Monte Carlo). For the randomized search, we also test the Gibbs sampling method.
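
To make the scoring concrete, here is a small self-contained sketch that builds a profile, derives the consensus, and sums the Hamming distances for a toy motif set. The function names are hypothetical, and the profile assumes one pseudocount per nucleotide in every column.

```python
def profile_with_pseudocounts(motifs):
    """Column-wise nucleotide frequencies, with one pseudocount per nucleotide."""
    t, k = len(motifs), len(motifs[0])
    profile = {symbol: [1 / (t + 4)] * k for symbol in "ACGT"}
    for motif in motifs:
        for j, symbol in enumerate(motif):
            profile[symbol][j] += 1 / (t + 4)
    return profile

def consensus(motifs):
    """Pick the most frequent nucleotide in each column."""
    profile = profile_with_pseudocounts(motifs)
    k = len(motifs[0])
    return "".join(max("ACGT", key=lambda s: profile[s][j]) for j in range(k))

def score(motifs):
    """Sum of Hamming distances between each motif and the consensus."""
    c = consensus(motifs)
    return sum(a != b for motif in motifs for a, b in zip(motif, c))

motifs = ["ACGT", "ACGA", "ACTT"]
print(consensus(motifs))
print(score(motifs))
```

For these three motifs the consensus is `ACGT`; the second and third motifs each differ from it in one position, so the score is 2. A lower score means the chosen motifs agree more closely.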

#### Shared utilities

In [110]:
def Score(motifs):
    consensus = Consensus(motifs)
    s = 0
    for pattern in motifs:
        s += Hamming(pattern,consensus)
    return s

def MostProbable(k, Pattern, Profile):
    # Return the k-mer in Pattern with the highest probability under Profile
    # (rows ordered A, C, G, T); ties go to the leftmost k-mer
    alphabets = ['A','C','G','T']
    rows = dict(zip(alphabets,Profile))
    probs = []
    for i in range(len(Pattern)-k+1):
        p = 1
        for j in range(k):
            p = p*rows[Pattern[i+j]][j]
        probs.append(p)
    idx = probs.index(max(probs))
    return Pattern[idx:idx+k]

def Profile(motifs):
    N = len(motifs)
    k = len(motifs[0])
    Usage = [[1/(N+4)]*k for j in range(4)]
    alphabets = {'A':0,'C':1,'G':2,'T':3}
    for pattern in motifs:
        for i in range(k):
            row = alphabets[pattern[i]]
            Usage[row][i] += (1/(N+4))
    return Usage # 2D list

def Consensus(motifs):
    import numpy as np
    k = len(motifs[0])
    alphabets = {'A':0,'C':1,'G':2,'T':3}
    alphabets = {v:k for k,v in alphabets.items()}
    profile = np.array(Profile(motifs))
    consensus = []
    for i in range(k):
        consensus.append(alphabets[np.argmax(profile[:,i])])
    return ''.join(consensus)

def Hamming(kmer1, kmer2):
    d = sum([i!=j for i,j in zip(list(kmer1),list(kmer2))])
    return d

#### Greedy Search

In [111]:
def GreedyMotifSearch(DNA, k, t):
    
    BestMotifs = [pattern[0:k] for pattern in DNA]
    N = len(DNA[0])
    for i in range(N-k+1):
        motifs = []
        motifs.append(DNA[0][i:i+k])
        for j in range(1,t):
            profile = Profile(motifs)
            motifj = MostProbable(k,DNA[j],profile)
            motifs.append(motifj)
        if Score(BestMotifs) > Score(motifs):
            BestMotifs = motifs
    return BestMotifs

In [112]:
filename = 'Input_MotifSearch.txt'
Genome = []
with open('Course1_Data/'+filename,'r',encoding='utf-8') as file:
   k,t = tuple([int(i) for i in file.readline().strip().split(' ')])
   for i in range(t):
       Genome.append(file.readline().strip())
BestMotifs = GreedyMotifSearch(Genome,k,t)
print(f'k = {k}, t = {t}')
print('Genomes = \n' + '\n'.join(Genome))
print('\nBest motifs = ' + ' '.join(BestMotifs))

k = 12, t = 25
Genomes = 
CTATGGGGTATTTTTGTGCGGGTATTAAAGTCGGATAACACCCACCATTACTTGCGTAACCATGGCTGCTAGAGATCTTGTAGCCACGTTTGACCTGTGTGAACTGTACCACGCTCGCCCCAATCGGCTTCCGCCACGCTGTTAAGATCCTACCGG
CTGCAGCATAAGGACAACGACTTATCATTTCACCGCCGCCGCGCTACACAATAGGCTCACAGAGCTGACTGCCCTGTCGATTGTCCACAGGCAGGCCCCAATGGGCAAATAGCGAATAGTACCAACCTACGATGTGTTCCGCGGAAGTTACATACA
TTGTTTGCCTCCGCTTCTTCGCCCAGTTAGTATTGCTTATAATCCCCGGATGCCATTTCGCAGTGACGATGACGATGGGCTACCCTTGGCCGATTCCTTTCGATGAGCCCTGATGAATCTGCGCGGAATCGGAGGGGGTTAATGGCACGCTAAACT
GGTTCATTCGTTGTGTTGAAAAAGGGCACGTCGCCTCAATGGGCTCACGCATAGCTGTAGAGATTCGCGGTACCTGCCTTATAGAGCACCCACCTAAGGTGAGGCCCTATTTCGTAGATTACCTCTACATCAGTACTTATTAACGGGGGAAATACT
TTAAGGCCGATAAGTATTTCCCCCGTTAGGTATTCTTACTCTCCTACAATTATCCCTGATGTTACGGTGATTCGTGCAACAACGTGGCAACAGTGTCGAGATGGGAGGCCATGGGCTGCCAGCGCTCATTTGAGTGTTTTCTTGTTCTGTGTCAAT
CCGTTCACGTTCATAACAAGCAGGAAAACCTTTGGAGGCTTGGAGTCGGGTTGTTGAATGGTAGCTTAGTATCAATTGGCTGTCTTCTCACTTGCCGGTTTAGTAGACTCAGCTCCGTCTTAGCTCTTTTACGCATCCGAGCCTACCTCTGCGTCT
TGCAGTATCGCATGCGCGCACTTCACGATGAA

#### Monte Carlo Randomized Search

In [113]:
def MCMotifSearch(DNA, k, t, N_iter=1000):
    
    def RandomizedMotifSearch(DNA, k, t):
        from random import randrange
        N = len(DNA[0])
        BestMotifs = []
        for pattern in DNA:
            i = randrange(N-k+1)
            BestMotifs.append(pattern[i:i+k])
        while True:
            motifs = []
            profile = Profile(BestMotifs)
            for pattern in DNA:
                motifs.append(MostProbable(k,pattern,profile))
            if Score(BestMotifs) > Score(motifs):
                BestMotifs = motifs
            else:
                return Score(BestMotifs), BestMotifs
    
    BestScore = k*t+1
    BestMotifs = None
    for i in range(N_iter):
        score, motifs = RandomizedMotifSearch(DNA, k, t)
        if BestScore > score:
            BestScore = score
            BestMotifs = motifs
    return BestMotifs

In [114]:
filename = 'Input_MotifSearch.txt'
Genome = []
with open('Course1_Data/'+filename,'r',encoding='utf-8') as file:
   k,t = tuple([int(i) for i in file.readline().strip().split(' ')])
   for i in range(t):
       Genome.append(file.readline().strip())
    
N_iter = 1000
BestMotifs = MCMotifSearch(Genome,k,t,N_iter)
print(f'k = {k}, t = {t}')
print('Genomes = \n' + '\n'.join(Genome))
print('\nBest motifs = ' + ' '.join(BestMotifs))

k = 12, t = 25
Genomes = 
CTATGGGGTATTTTTGTGCGGGTATTAAAGTCGGATAACACCCACCATTACTTGCGTAACCATGGCTGCTAGAGATCTTGTAGCCACGTTTGACCTGTGTGAACTGTACCACGCTCGCCCCAATCGGCTTCCGCCACGCTGTTAAGATCCTACCGG
CTGCAGCATAAGGACAACGACTTATCATTTCACCGCCGCCGCGCTACACAATAGGCTCACAGAGCTGACTGCCCTGTCGATTGTCCACAGGCAGGCCCCAATGGGCAAATAGCGAATAGTACCAACCTACGATGTGTTCCGCGGAAGTTACATACA
TTGTTTGCCTCCGCTTCTTCGCCCAGTTAGTATTGCTTATAATCCCCGGATGCCATTTCGCAGTGACGATGACGATGGGCTACCCTTGGCCGATTCCTTTCGATGAGCCCTGATGAATCTGCGCGGAATCGGAGGGGGTTAATGGCACGCTAAACT
GGTTCATTCGTTGTGTTGAAAAAGGGCACGTCGCCTCAATGGGCTCACGCATAGCTGTAGAGATTCGCGGTACCTGCCTTATAGAGCACCCACCTAAGGTGAGGCCCTATTTCGTAGATTACCTCTACATCAGTACTTATTAACGGGGGAAATACT
TTAAGGCCGATAAGTATTTCCCCCGTTAGGTATTCTTACTCTCCTACAATTATCCCTGATGTTACGGTGATTCGTGCAACAACGTGGCAACAGTGTCGAGATGGGAGGCCATGGGCTGCCAGCGCTCATTTGAGTGTTTTCTTGTTCTGTGTCAAT
CCGTTCACGTTCATAACAAGCAGGAAAACCTTTGGAGGCTTGGAGTCGGGTTGTTGAATGGTAGCTTAGTATCAATTGGCTGTCTTCTCACTTGCCGGTTTAGTAGACTCAGCTCCGTCTTAGCTCTTTTACGCATCCGAGCCTACCTCTGCGTCT
TGCAGTATCGCATGCGCGCACTTCACGATGAA

#### Monte Carlo search with Gibbs Sampling

In [115]:
def MCMotifGibbs(DNA, k, t, N=1000, M=10):
    
    def GibbsSampler(DNA, k, t, N):
        from random import randrange
        SeqLen = len(DNA[0])   # kept separate from N, the number of sampling steps
        BestMotifs = []
        for pattern in DNA:
            i = randrange(SeqLen-k+1)
            BestMotifs.append(pattern[i:i+k])
        for j in range(N):
            i = randrange(t)
            motifs = BestMotifs.copy()
            motifs.pop(i)
            profile = Profile(motifs)
            motifi = MostProbable(k, DNA[i], profile)
            motifs.insert(i,motifi)
            if Score(BestMotifs) > Score(motifs):
                BestMotifs = motifs
        return Score(BestMotifs), BestMotifs
    
    BestScore = k*t+1
    BestMotifs = None
    for i in range(M):
        score, motifs = GibbsSampler(DNA, k, t, N)
        if BestScore > score:
            BestScore = score
            BestMotifs = motifs
    return BestMotifs
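
The sampler above replaces the selected motif with the most probable k-mer under the profile. In the usual formulation of Gibbs sampling for motif finding, the replacement is instead drawn at random, with each k-mer weighted by its profile probability. A minimal sketch of that sampling step follows; the name `profile_random_kmer` is hypothetical, and the profile is assumed to be four rows (A, C, G, T) of k column probabilities, the same layout that `Profile` returns.

```python
import random

def profile_random_kmer(text, k, profile, rng=random):
    """Draw one k-mer from text, weighted by its probability under profile."""
    row = {"A": 0, "C": 1, "G": 2, "T": 3}
    kmers = [text[i:i + k] for i in range(len(text) - k + 1)]
    weights = []
    for kmer in kmers:
        p = 1.0
        for j, symbol in enumerate(kmer):
            p *= profile[row[symbol]][j]
        weights.append(p)
    # random.choices normalizes the weights itself; they must not all be zero,
    # which pseudocounts in the profile guarantee
    return rng.choices(kmers, weights=weights, k=1)[0]

# Toy profile that only ever allows "A" followed by "C"
toy_profile = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
print(profile_random_kmer("AAAC", 2, toy_profile))
```

With the toy profile only `AC` has nonzero weight, so it is always drawn. With a pseudocount profile every k-mer keeps some chance of being selected, which is what lets the sampler leave a poor local optimum.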

In [116]:
filename = 'Input_MotifSearch.txt'
Genome = []
with open('Course1_Data/'+filename,'r',encoding='utf-8') as file:
   k,t = tuple([int(i) for i in file.readline().strip().split(' ')])
   for i in range(t):
       Genome.append(file.readline().strip())
    
N_repeat = 1000
BestMotifs = MCMotifGibbs(Genome,k,t,N=N_repeat)
print(f'k = {k}, t = {t}')
print('Genomes = \n' + '\n'.join(Genome))
print('\nBest motifs = ' + ' '.join(BestMotifs))

k = 12, t = 25
Genomes = 
CTATGGGGTATTTTTGTGCGGGTATTAAAGTCGGATAACACCCACCATTACTTGCGTAACCATGGCTGCTAGAGATCTTGTAGCCACGTTTGACCTGTGTGAACTGTACCACGCTCGCCCCAATCGGCTTCCGCCACGCTGTTAAGATCCTACCGG
CTGCAGCATAAGGACAACGACTTATCATTTCACCGCCGCCGCGCTACACAATAGGCTCACAGAGCTGACTGCCCTGTCGATTGTCCACAGGCAGGCCCCAATGGGCAAATAGCGAATAGTACCAACCTACGATGTGTTCCGCGGAAGTTACATACA
TTGTTTGCCTCCGCTTCTTCGCCCAGTTAGTATTGCTTATAATCCCCGGATGCCATTTCGCAGTGACGATGACGATGGGCTACCCTTGGCCGATTCCTTTCGATGAGCCCTGATGAATCTGCGCGGAATCGGAGGGGGTTAATGGCACGCTAAACT
GGTTCATTCGTTGTGTTGAAAAAGGGCACGTCGCCTCAATGGGCTCACGCATAGCTGTAGAGATTCGCGGTACCTGCCTTATAGAGCACCCACCTAAGGTGAGGCCCTATTTCGTAGATTACCTCTACATCAGTACTTATTAACGGGGGAAATACT
TTAAGGCCGATAAGTATTTCCCCCGTTAGGTATTCTTACTCTCCTACAATTATCCCTGATGTTACGGTGATTCGTGCAACAACGTGGCAACAGTGTCGAGATGGGAGGCCATGGGCTGCCAGCGCTCATTTGAGTGTTTTCTTGTTCTGTGTCAAT
CCGTTCACGTTCATAACAAGCAGGAAAACCTTTGGAGGCTTGGAGTCGGGTTGTTGAATGGTAGCTTAGTATCAATTGGCTGTCTTCTCACTTGCCGGTTTAGTAGACTCAGCTCCGTCTTAGCTCTTTTACGCATCCGAGCCTACCTCTGCGTCT
TGCAGTATCGCATGCGCGCACTTCACGATGAA