## Problem 27: Construct the De Bruijn Graph of a String


#### De Bruijn Graph from a String Problem
Construct the de Bruijn graph of a string.

> Given: An integer k and a string Text.

> Return: DeBruijn<sub>k</sub>(Text), in the form of an adjacency list.

<br>

In [35]:
# ----- INPUTS -----
text = "AAGATTCTCTAC"
k = 4

```
AAGATTCTCTAC

      AAGA   AGAT   GATT   ATTC   TTCT
   AAG -> AGA -> GAT -> ATT -> TTC -> TCT
    
if k = 4
then edges: k-mers,     length k   = 4
and  nodes: (k-1)-mers, length k-1 = 3

we need to save nodes and edges, then glue identical nodes together
    
1. traverse the input, create a list of all k-mers (the edges)
2. traverse the k-mers, create a list of their prefixes and suffixes (the nodes)
3. traverse all k-mers, build the adjacency list: connect each k-mer's prefix to its suffix
4. doesn't that automatically merge identical nodes? (yes, if the adjacency list is a dict keyed by node label)
```
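The steps in the notes above can be sketched as a single function. This is a minimal sketch, not the notebook's own implementation: the name `de_bruijn_from_text` and the use of `collections.defaultdict` are assumptions made for illustration. Because the adjacency list is a dict keyed by node label, identical (k-1)-mers share one entry, which is the "gluing" step.

```python
from collections import defaultdict


def de_bruijn_from_text(text, k):
    """Return {node: [successors]} for the de Bruijn graph of text.

    Hypothetical helper for illustration; nodes are (k-1)-mers, edges are k-mers.
    """
    adj = defaultdict(list)
    for i in range(len(text) - k + 1):
        kmer = text[i:i + k]        # step 1: every k-mer is an edge
        prefix = kmer[:-1]          # step 2: its first k-1 characters
        suffix = kmer[1:]           #         and its last k-1 characters
        adj[prefix].append(suffix)  # step 3: edge prefix -> suffix
    return dict(adj)


graph = de_bruijn_from_text("AAGATTCTCTAC", 4)
print(graph["TCT"])  # ['CTC', 'CTA']
```

`TCT` occurs twice in the text, yet it appears once as a key with two outgoing edges, so no separate merge pass is needed.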

In [36]:
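nodes = []  # (k-1)-mers, in order of appearance
edges = []  # k-mers, in order of appearance
n = len(text)
kn = k-1    # node length

# every (k-1)-mer of text is a node label
for i in range(0, n-kn+1):
    kmer = text[i:i+kn]
    nodes.append(kmer)
    
# every k-mer of text is an edge
for i in range(0, n-k+1):
    kmer = text[i:i+k]
    edges.append(kmer)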
    
print(nodes, len(nodes))
print(edges, len(edges))

['AAG', 'AGA', 'GAT', 'ATT', 'TTC', 'TCT', 'CTC', 'TCT', 'CTA', 'TAC'] 10
['AAGA', 'AGAT', 'GATT', 'ATTC', 'TTCT', 'TCTC', 'CTCT', 'TCTA', 'CTAC'] 9
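The two lists printed above line up position by position: the i-th k-mer starts with the i-th node and ends with the (i+1)-th node. A short check of that relationship, assuming the same `text` and `k` as in the inputs cell (the name `pairs_ok` is introduced here for illustration only):

```python
text = "AAGATTCTCTAC"
k = 4

# Same lists as in the cell above, rebuilt so this snippet runs on its own.
nodes = [text[i:i + k - 1] for i in range(len(text) - k + 2)]
edges = [text[i:i + k] for i in range(len(text) - k + 1)]

# edges[i] should have nodes[i] as its prefix and nodes[i + 1] as its suffix.
pairs_ok = all(
    edge[:-1] == nodes[i] and edge[1:] == nodes[i + 1]
    for i, edge in enumerate(edges)
)
print(len(nodes), len(edges), pairs_ok)  # 10 9 True
```

This is why an adjacency list can be built from consecutive pairs of `nodes` alone, without consulting `edges`.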


In [37]:
# Consecutive nodes nodes[i] -> nodes[i+1] are the prefix and suffix of edges[i].
# If u is already a key in adj, append v to its list; otherwise start a new list.

adj = {}
for i in range(0, len(nodes)-1):
    u = nodes[i]
    v = nodes[i+1]
    print(u,v)
    
    if u in adj:
        adj[u].append(v)
    else:
        adj[u] = [v]
        
adj        

AAG AGA
AGA GAT
GAT ATT
ATT TTC
TTC TCT
TCT CTC
CTC TCT
TCT CTA
CTA TAC


{'AAG': ['AGA'],
 'AGA': ['GAT'],
 'GAT': ['ATT'],
 'ATT': ['TTC'],
 'TTC': ['TCT'],
 'TCT': ['CTC', 'CTA'],
 'CTC': ['TCT'],
 'CTA': ['TAC']}

In [38]:
def display_adjacency_list(adj_list):
    # print one line per node: node -> comma-separated, sorted successors
    for u in sorted(adj_list):
        edge_list = ','.join(sorted(adj_list[u]))
        print(f"{u} -> {edge_list}")

In [39]:
# ----- OUTPUTS -----
display_adjacency_list(adj)

AAG -> AGA
AGA -> GAT
ATT -> TTC
CTA -> TAC
CTC -> TCT
GAT -> ATT
TCT -> CTA,CTC
TTC -> TCT
