# Neighbors(Pattern,d)

```
Neighbors(Pattern, d)
        if d = 0
            return {Pattern}
        if |Pattern| = 1 
            return {A, C, G, T}
        Neighborhood ← an empty set
        SuffixNeighbors ← Neighbors(Suffix(Pattern), d)
        for each string Text from SuffixNeighbors
            if HammingDistance(Suffix(Pattern), Text) < d
                for each nucleotide x
                    add x • Text to Neighborhood
            else
                add FirstSymbol(Pattern) • Text to Neighborhood
        return Neighborhood
```

In [2]:
samp="""ACG
1"""
lines=samp.strip().split()
Pattern,d=lines[0],int(lines[1])
print(Pattern,d)

ACG 1


In [3]:
Pattern[1:]

'CG'

In [4]:
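def HammingDistance(st1:str, st2:str):
    # Hamming distance is only defined for strings of equal length.
    if len(st1) != len(st2):
        raise ValueError("Strings must be of equal length.")
    # Count the positions at which the two strings differ.
    return sum(a!=b for a,b in zip(st1,st2))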

In [5]:
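def Neighbors(Pattern:str,d:int):
    # With no mismatches allowed, the only neighbor is Pattern itself.
    # (set(Pattern) would split the string into single characters.)
    if d==0:
        return {Pattern}
    if len(Pattern)==1:
        return {'A','C','G','T'}
    Neighborhood=set()
    SuffixNeighbors=Neighbors(Pattern[1:],d)
    for Text in SuffixNeighbors:
        if HammingDistance(Text, Pattern[1:])<d:
            # A mismatch is still available, so any first symbol is allowed.
            for n in ['A','C','G','T']:
                Neighborhood.add(n+Text)
        else:
            # All d mismatches are used up, so the first symbol must match.
            Neighborhood.add(Pattern[0]+Text)
    return Neighborhood
Neighbors("ACG", 1)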

{'AAG', 'ACA', 'ACC', 'ACG', 'ACT', 'AGG', 'ATG', 'CCG', 'GCG', 'TCG'}

In [6]:
output="""CCG TCG GCG AAG ATG AGG ACA ACC ACT ACG"""
set(output.split())==Neighbors("ACG", 1)

True
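
As an extra sanity check, the size of the d-neighborhood of a k-mer over {A, C, G, T} should be the sum over i = 0..d of C(k, i) * 3^i: choose i positions to change, with 3 alternative nucleotides at each. The sketch below compares that count against a brute-force enumeration. The names `hamming`, `neighbors_bruteforce` and `expected_size` are hypothetical helpers introduced here, not part of the cells above.

```python
from itertools import product
from math import comb

def hamming(a: str, b: str) -> int:
    # Number of positions where the two equal-length strings differ.
    return sum(x != y for x, y in zip(a, b))

def neighbors_bruteforce(pattern: str, d: int) -> set:
    # Enumerate all 4^k strings and keep those within distance d (slow, for checking only).
    candidates = ("".join(c) for c in product("ACGT", repeat=len(pattern)))
    return {c for c in candidates if hamming(pattern, c) <= d}

def expected_size(k: int, d: int) -> int:
    # Closed-form neighborhood size: sum of C(k, i) * 3^i for i = 0..min(k, d).
    return sum(comb(k, i) * 3**i for i in range(min(k, d) + 1))

print(len(neighbors_bruteforce("ACG", 1)), expected_size(3, 1))  # 10 10
```

If the recursive `Neighbors` is correct, `len(Neighbors(Pattern, d))` should equal `expected_size(len(Pattern), d)` for any input.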

In [7]:
with open("dataset_30282_4.txt","r") as f:
    data=f.read()
data

'AAACATTCC\n2\n'

In [8]:
" ".join(list(Neighbors("AAACATTCC",2)))

'AAAAATCCC AATCATTGC AAAGATCCC AAATACTCC CAACATCCC ACACATCCC AAACAGTAC AAACATACC ACAGATTCC CAATATTCC AAACCTACC ATACATTTC AATCATTAC AAACTTGCC GAACAATCC AAACCGTCC ATACATTCG TAACATGCC AGACATTCT AAAGATGCC AGACACTCC AAACAAACC AAATATTGC AAACTTTCC AAGCATTCT TGACATTCC AAAAAGTCC TCACATTCC AAACGTACC GAACCTTCC AAACATCCG ACACATGCC CAACATTCC AAACGTTCT AAACATTAT AAACATGGC AAACCTGCC AAATATGCC AAACATTGC AAGTATTCC ATACATTCC AAACATTTA ACCCATTCC AAACAACCC AAACCTTCA AAACAATCG ACACATTCC AACCATCCC AAACGTTCG AAACACTGC AAATATTCA AGACATTTC GAACATTGC ATACAGTCC AAACAGCCC AAACATTGT AAACACTCT CCACATTCC AAAAATTGC CAACATTAC AAGCAGTCC AAAGACTCC TAACATTCC AAACATCTC AATCACTCC AAAAATTAC TAACTTTCC AAAGATTCT AAACAATAC AAACATTCC AATCATGCC AAACGTCCC TAACATCCC AGACTTTCC AAACATTAG AGACATTCC ACACACTCC AATCATTCG TAACATACC AAAAATACC AAAGATTTC CAACATTGC AAACATCCT ATGCATTCC AAATATTAC AAACCTCCC AAATATTTC AAACACACC AAGCATTCA CAACTTTCC GAACGTTCC GAACATGCC CATCATTCC AAAAAATCC AAACACTCA AAAGATACC ATACATTGC ACGCATTCC TAACATTTC AAACCTTCT

# FrequentWordsWithMismatches(Text, k, d)

```
FrequentWordsWithMismatches(Text, k, d)
    Patterns ← an array of strings of length 0
    freqMap ← empty map
    n ← |Text|
    for i ← 0 to n - k
        Pattern ← Text(i, k)
        neighborhood ← Neighbors(Pattern, d)
        for j ← 0 to |neighborhood| - 1
            neighbor ← neighborhood[j]
            if freqMap[neighbor] doesn't exist
                freqMap[neighbor] ← 1
            else
                freqMap[neighbor] ← freqMap[neighbor] + 1
    m ← MaxMap(freqMap)
    for every key Pattern in freqMap
        if freqMap[Pattern] = m
            append Pattern to Patterns
    return Patterns
```

In [9]:
def FrequentWordsWithMismatches(Text, k, d):
    freqMap=dict()
    # Text has len(Text)-k+1 windows of length k; the +1 includes the last one.
    for i in range(len(Text)-k+1):
        Neighborhood=Neighbors(Text[i:i+k],d)
        for Neighbor in Neighborhood:
            freqMap[Neighbor]=freqMap.get(Neighbor,0)+1
    m=max(freqMap.values())
    # Keep every pattern whose count reaches the maximum.
    return {pattern for pattern,count in freqMap.items() if count==m}
FrequentWordsWithMismatches("AACGGGGCT",3,2)

{'AGT', 'GAG'}
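
The same counting idea can be sketched with `collections.Counter`. This is a self-contained variant, not the notebook's own function: `neighbors` is an iterative stand-in for the recursive `Neighbors` (an assumption made so the block runs on its own), and the snake_case names are hypothetical.

```python
from collections import Counter

def neighbors(pattern: str, d: int) -> set:
    # Iterative stand-in: apply up to d single-symbol substitutions.
    result = {pattern}
    for _ in range(d):
        result |= {s[:i] + x + s[i + 1:]
                   for s in result for i in range(len(s)) for x in "ACGT"}
    return result

def frequent_words_with_mismatches(text: str, k: int, d: int) -> set:
    counts = Counter()
    # Visit all len(text) - k + 1 windows, including the last one.
    for i in range(len(text) - k + 1):
        counts.update(neighbors(text[i:i + k], d))
    top = max(counts.values())
    return {p for p, c in counts.items() if c == top}

print(sorted(frequent_words_with_mismatches("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4, 1)))
# ['ATGC', 'ATGT', 'GATG']
```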

In [22]:
def rc(data):
    line=data.strip()
    complement=dict(A="T",T="A",C="G",G="C")
    return "".join(reversed([complement[b] for b in line]))
rc("AACTG")

'CAGTT'
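
A possible alternative for the reverse complement uses `str.translate` with a translation table instead of a dictionary lookup per base. The name `reverse_complement` is hypothetical; the behaviour is meant to match `rc` above for uppercase A/C/G/T input.

```python
# Translation table mapping each nucleotide to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    # Complement every base, then reverse the string with a slice.
    return seq.strip().translate(_COMPLEMENT)[::-1]

print(reverse_complement("AACTG"))  # CAGTT
```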

In [54]:
samp="""ACGTTGCATGTCGCATGATGCATGAGAGCT
4 1"""
Text,k,d=samp.split()
k,d=map(int, (k,d))
Text,k,d

('ACGTTGCATGTCGCATGATGCATGAGAGCT', 4, 1)

In [19]:
freqMap=dict()
# range must run to len(Text)-k+1 so the last k-mer of Text is counted too.
for i in range(len(Text)-k+1):
    Neighborhood=Neighbors(Text[i:i+k],d)
    for Neighbor in Neighborhood:
        freqMap[Neighbor]=freqMap.get(Neighbor,0)+1


In [52]:
rev=[rc(o) for o in freqMap.keys()]
revfreqMap={o:freqMap.get(o,0) for o in rev}
sums=[sum(list(o)) for o in zip(freqMap.values(),revfreqMap.values())]
allMap=dict(zip(freqMap.keys(),sums))|dict(zip(revfreqMap.keys(),sums))
m=max(allMap.values())
set([k for k,v in allMap.items() if v==m])

{'ACAT', 'ATGT'}

In [62]:
def revMisFreq(Text, k, d):
    freqMap=dict()
    # range must run to len(Text)-k+1 so the last k-mer of Text is counted too.
    for i in range(len(Text)-k+1):
        Neighborhood=Neighbors(Text[i:i+k],d)
        for Neighbor in Neighborhood:
            freqMap[Neighbor]=freqMap.get(Neighbor,0)+1
    # Count of the reverse complement of every pattern seen (0 if it never occurs).
    rev=[rc(o) for o in freqMap.keys()]
    revfreqMap={o:freqMap.get(o,0) for o in rev}
    # Combined score: count of the pattern plus count of its reverse complement.
    sums=[sum(list(o)) for o in zip(freqMap.values(),revfreqMap.values())]
    allMap=dict(zip(freqMap.keys(),sums))|dict(zip(revfreqMap.keys(),sums))
    m=max(allMap.values())
    return {pattern for pattern,count in allMap.items() if count==m}
revMisFreq(Text,k,d)

{'AAAAA', 'TTTTT'}
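
The pairing of each pattern with its reverse complement can also be written without the parallel `zip` bookkeeping. The block below is a minimal self-contained sketch under stated assumptions: `neighbors` and `reverse_complement` are hypothetical stand-ins for `Neighbors` and `rc`, and the score of a pattern is taken to be its own mismatch count plus that of its reverse complement.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]

def neighbors(pattern: str, d: int) -> set:
    # Iterative stand-in: apply up to d single-symbol substitutions.
    result = {pattern}
    for _ in range(d):
        result |= {s[:i] + x + s[i + 1:]
                   for s in result for i in range(len(s)) for x in "ACGT"}
    return result

def frequent_words_mismatches_rc(text: str, k: int, d: int) -> set:
    counts = Counter()
    for i in range(len(text) - k + 1):
        counts.update(neighbors(text[i:i + k], d))
    combined = {}
    for pattern, count in counts.items():
        partner = reverse_complement(pattern)
        total = count + counts.get(partner, 0)
        # Both strands share the same combined score.
        combined[pattern] = total
        combined[partner] = total
    top = max(combined.values())
    return {p for p, c in combined.items() if c == top}

print(sorted(frequent_words_mismatches_rc("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4, 1)))
# ['ACAT', 'ATGT']
```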

In [60]:
with open("dataset_30278_10.txt","r") as f:
    data=f.read()
Text,k,d=data.strip().split()
k,d=map(int,[k,d])
revMisFreq(Text,k,d)

AAAAA TTTTT 