# Neighbors(Pattern,d)

```
Neighbors(Pattern, d)
        if d = 0
            return {Pattern}
        if |Pattern| = 1 
            return {A, C, G, T}
        Neighborhood ← an empty set
        SuffixNeighbors ← Neighbors(Suffix(Pattern), d)
        for each string Text from SuffixNeighbors
            if HammingDistance(Suffix(Pattern), Text) < d
                for each nucleotide x
                    add x • Text to Neighborhood
            else
                add FirstSymbol(Pattern) • Text to Neighborhood
        return Neighborhood
```

In [3]:
samp="""ACG
1"""
lines=samp.strip().split()
Pattern,d=lines[0],int(lines[1])
print(Pattern,d)

ACG 1


In [4]:
Pattern[1:]

'CG'

In [5]:
def HammingDistance(st1:str, st2:str):
    if len(st1) != len(st2):
        raise Exception("Strings must be of equal length.")
    dist=[a for a,b in zip(st1,st2) if a!=b]
    return len(dist)

In [8]:
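def Neighbors(Pattern:str,d:int):
    # With d = 0 the only neighbor is Pattern itself, as a one-element set.
    if d==0:
        return {Pattern}
    if len(Pattern)==1:
        return {'A','C','G','T'}
    Neighborhood=set()
    SuffixNeighbors=Neighbors(Pattern[1:],d)
    for Text in SuffixNeighbors:
        if HammingDistance(Text, Pattern[1:])<d:
            # Mismatch budget left over: any first nucleotide is allowed.
            for n in ['A','C','G','T']:
                Neighborhood.add(n+Text)
        else:
            # Budget used up by the suffix: the first symbol must match Pattern.
            Neighborhood.add(Pattern[0]+Text)
    return Neighborhood
Neighbors("ACG", 1)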

{'AAG', 'ACA', 'ACC', 'ACG', 'ACT', 'AGG', 'ATG', 'CCG', 'GCG', 'TCG'}

In [12]:
output="""CCG TCG GCG AAG ATG AGG ACA ACC ACT ACG"""
set(output.split())==Neighbors("ACG", 1)

True
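
As an independent cross-check of the recursive `Neighbors`, the d-neighborhood can also be built by brute force: enumerate every string of the same length over `ACGT` and keep those within Hamming distance `d`. The sketch below is self-contained; the names `hamming_distance` and `brute_force_neighbors` are introduced here for illustration and are not part of the notebook. It is only practical for short patterns, since it visits all 4^k candidates.

```python
from itertools import product


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def brute_force_neighbors(pattern: str, d: int) -> set:
    """All strings over ACGT within Hamming distance d of pattern."""
    return {
        "".join(candidate)
        for candidate in product("ACGT", repeat=len(pattern))
        if hamming_distance(pattern, "".join(candidate)) <= d
    }


acg_neighbors = brute_force_neighbors("ACG", 1)
print(sorted(acg_neighbors))
```

Comparing `brute_force_neighbors(p, d) == Neighbors(p, d)` for a few short patterns, including `d = 0`, is a cheap way to catch base-case mistakes in the recursion.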

In [13]:
with open("dataset_30282_4.txt","r") as f:
    data=f.read()
data

'AAACATTCC\n2\n'

In [16]:
" ".join(list(Neighbors("AAACATTCC",2)))

'CAACATTAC AAAGATTAC AAACATTGT CAATATTCC AAAAGTTCC ATAAATTCC AAACATGCT AAACATTAT AAAGAGTCC AAACCCTCC AAACTTCCC AGACATTCC TAACAGTCC AAAGTTTCC ATACATCCC TAACACTCC AAGCATTCT AAAAATTGC AAACATTCC AAATATTCA AATCAGTCC AAAAATCCC AATCTTTCC AAACACGCC AAACATTCG AAACCTTTC AACGATTCC AAATACTCC AAGCAGTCC AAACATACC AAGGATTCC TAACATACC GAACATTCA AAACATAAC CAACATGCC AAACATGTC GAACATTCG GAACATTCC ACACATTTC GATCATTCC ACGCATTCC AATCATTGC AAACATTGC CAACATTCT ACTCATTCC AGACACTCC AAACTTTCC AAACCTACC AAACCTTCA TAACATTGC ATACACTCC AAGCATTCG AACCGTTCC ACAGATTCC AAACTTTTC AACCATTCA GAAGATTCC CAGCATTCC AATCGTTCC AACCATTCG ATCCATTCC AAAGATTCA AAACATGCG AACCATACC AAGCATCCC ATGCATTCC AAACCTTCT AAACACTCT AGACAGTCC ATACATGCC AAACAGTTC ATTCATTCC AAACTTTGC AAACATTGA AAACGTTAC AAATATTCC AAACACTTC AAGCGTTCC AAACTGTCC ACACATTCT AAACAAGCC AACCATTGC ACACATCCC GAACATACC AAAGCTTCC ATACATTGC AATCATACC AGACCTTCC AATTATTCC TAACATCCC CAACATTCG AAACACTGC TACCATTCC AAAGATTCC ACACATGCC AAACATTCA TAAGATTCC AAAGGTTCC AAACATCAC AAAAATACC
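
A quick sanity check for results like this one is the expected neighborhood size. For a pattern of length n over a 4-letter alphabet, choosing i mismatch positions and one of 3 other nucleotides at each gives C(n, i) · 3^i strings at distance exactly i, so the d-neighborhood has the sum of those terms for i = 0..d. The helper name `neighborhood_size` below is an assumption made for this sketch.

```python
from math import comb


def neighborhood_size(n: int, d: int, alphabet_size: int = 4) -> int:
    """Count strings within Hamming distance d of a fixed length-n string."""
    # At most n positions can mismatch, so cap d at n.
    return sum(
        comb(n, i) * (alphabet_size - 1) ** i
        for i in range(min(d, n) + 1)
    )


print(neighborhood_size(3, 1))  # 10
print(neighborhood_size(9, 2))  # 352
```

The value can be compared with `len(Neighbors(Pattern, d))` for the same pattern length and `d`.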

# FrequentWordsWithMismatches(Text, k, d)

```
FrequentWordsWithMismatches(Text, k, d)
    Patterns ← an array of strings of length 0
    freqMap ← empty map
    n ← |Text|
    for i ← 0 to n - k
        Pattern ← Text(i, k)
        neighborhood ← Neighbors(Pattern, d)
        for j ← 0 to |neighborhood| - 1
            neighbor ← neighborhood[j]
            if freqMap[neighbor] doesn't exist
                freqMap[neighbor] ← 1
            else
                freqMap[neighbor] ← freqMap[neighbor] + 1
    m ← MaxMap(freqMap)
    for every key Pattern in freqMap
        if freqMap[Pattern] = m
            append Pattern to Patterns
    return Patterns
```

In [22]:
def FrequentWordsWithMismatches(Text, k, d):
    freqMap=dict()
    # i runs from 0 to n-k inclusive, so the last k-mer of Text is counted too
    for i in range(len(Text)-k+1):
        Neighborhood=Neighbors(Text[i:i+k],d)
        for Neighbor in Neighborhood:
            freqMap[Neighbor]=freqMap.get(Neighbor,0)+1
    m=max(freqMap.values())
    return {pattern for pattern,count in freqMap.items() if count==m}
FrequentWordsWithMismatches("AACGGGGCT",3,2)

{'AGT', 'GAG'}

In [36]:
"flfjsla"[::-1]<"flfjsla"

True

In [48]:
def FreqMismRev(Text, k, d):
    freqMap=dict()
    # i runs from 0 to n-k inclusive, so the last k-mer of Text is counted too
    for i in range(len(Text)-k+1):
        Neighborhood=Neighbors(Text[i:i+k],d)
        for Neighbor in Neighborhood:
            freqMap[Neighbor]=freqMap.get(Neighbor,0)+1
    # x[::-1] is the plain reversal of x; it does not complement the nucleotides
    pairs=[{x,x[::-1]}for x in freqMap.keys()]
    unique_pairs=[]
    for pair in pairs:
        if pair not in unique_pairs:
            unique_pairs.append(pair)
    print(unique_pairs)
    pair_values=[]
    for i,pair in enumerate(unique_pairs):
        pair_values.append(0)
        for member in pair:
            pair_values[i]+=freqMap.get(member,0)
    print(pair_values)
    print(len(pair_values)==len(unique_pairs))
    #m=max(freqMap.values())
    #return set([k for k,v in freqMap.items() if v==m])
FreqMismRev("AACGGGGCT",3,2)

[{'GCC', 'CCG'}, {'AAC', 'CAA'}, {'CAT', 'TAC'}, {'ATA'}, {'TGA', 'AGT'}, {'CGT', 'TGC'}, {'TCC', 'CCT'}, {'AGA'}, {'AAG', 'GAA'}, {'TAG', 'GAT'}, {'CTC'}, {'TTC', 'CTT'}, {'ACT', 'TCA'}, {'GAG'}, {'AGG', 'GGA'}, {'ACA'}, {'AAA'}, {'GGC', 'CGG'}, {'ATG', 'GTA'}, {'AAT', 'TAA'}, {'ACC', 'CCA'}, {'CAC'}, {'AGC', 'CGA'}, {'TAT'}, {'TTA', 'ATT'}, {'CCC'}, {'GTC', 'CTG'}, {'CAG', 'GAC'}, {'CGC'}, {'ATC', 'CTA'}, {'GCA', 'ACG'}, {'TCT'}, {'GGT', 'TGG'}, {'GTG'}, {'GCG'}, {'TCG', 'GCT'}, {'GTT', 'TTG'}, {'GGG'}, {'TGT'}, {'TTT'}]
[11, 5, 5, 2, 11, 10, 7, 6, 10, 10, 3, 4, 5, 7, 11, 3, 2, 11, 9, 4, 7, 3, 10, 2, 3, 5, 9, 10, 5, 4, 11, 2, 10, 6, 6, 10, 8, 6, 5, 1]
True
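
The cell above pairs each k-mer with its plain reversal. If the goal is instead the usual "frequent words with mismatches and reverse complements" task, each k-mer would be scored together with its reverse complement (reverse the string and swap A↔T, C↔G). The following is a minimal, self-contained sketch under that assumption. It scores every possible k-mer by brute force rather than via `Neighbors`, so it is only suitable for small `k`; the names `reverse_complement`, `approx_count` and `frequent_words_mismatches_rc` are hypothetical and not part of the notebook.

```python
from itertools import product

# Translation table mapping each nucleotide to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(pattern: str) -> str:
    """Complement every nucleotide, then reverse the string."""
    return pattern.translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def approx_count(text: str, pattern: str, d: int) -> int:
    """Number of windows of text within Hamming distance d of pattern."""
    k = len(pattern)
    return sum(
        hamming(text[i:i + k], pattern) <= d
        for i in range(len(text) - k + 1)  # includes the last k-mer
    )


def frequent_words_mismatches_rc(text: str, k: int, d: int) -> set:
    """k-mers maximizing Count_d(text, p) + Count_d(text, rc(p))."""
    scores = {}
    for candidate in map("".join, product("ACGT", repeat=k)):
        scores[candidate] = (
            approx_count(text, candidate, d)
            + approx_count(text, reverse_complement(candidate), d)
        )
    best = max(scores.values())
    return {p for p, s in scores.items() if s == best}


print(reverse_complement("AACG"))                       # CGTT
print(sorted(frequent_words_mismatches_rc("AAAA", 2, 0)))  # ['AA', 'TT']
```

Because a k-mer and its reverse complement always receive the same score, the returned set is closed under `reverse_complement`, which makes a convenient property to assert when testing.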


In [45]:
L=[]
L[0]=1
L

IndexError: list assignment index out of range

In [29]:
with open("dataset_30278_9.txt","r") as f:
    data=f.read()
Text,k,d=data.strip().split()
k,d=map(int,[k,d])
print(Text)
print(k,d)

GGAGGACACCGGGACGGAGGACGGAGGAGGACGGAACCGAGGTACCGGGAAGGTGGAACCGAGGTACCGGGAGGACGGAGGAGGACGGAACCGGGACACCGGGACGGACACCGGGACAGGTAGGTGGAACCGAGGTACCGGGAAGGTGGACACCGAGGTGGAGGAACCGAGGTGGAGGAGGAGGAGGACACCGAGGTGGAAGGTGGAGGAACCGAGGTAGGTACCGGGACGGACGGACGGAAGGTGGACGGACGGAAGGTGGAAGGTACCGGGAGGACGGAGGAACCGGGACACCGACCGACCGACCGGGAGGAGGACGGAGGACGGAACCG
6 3


In [30]:
" ".join(FrequentWordsWithMismatches(Text,k,d))

'GGGGGG'