# Overlap Graphs

## A Brief Introduction to Graph Theory

Networks arise everywhere in the practical world, especially in biology. They are prevalent in popular applications such as modeling the spread of disease, but their uses extend far beyond popular science. Our first question is how to model a network computationally without needing to render a picture of it.

First, some terminology: graph is the technical term for a network. A graph is made up of hubs called nodes (or vertices), pairs of which are connected via segments or curves called edges. If an edge connects nodes v and w, then it is denoted by {v,w} (or equivalently {w,v}).

- An edge {v,w} is incident to nodes v and w; we say that v and w are adjacent to each other.
- The degree of v is the number of edges incident to it.
- A walk is an ordered collection of edges for which the ending node of one edge is the starting node of the next (e.g., {v1,v2}, {v2,v3}, {v3,v4}, etc.).
- A path is a walk in which every node appears in at most two edges.
- Path length is the number of edges in the path.
- A cycle is a path whose final node is equal to its first node (so that every node is incident to exactly two edges in the cycle).
- The distance between two vertices is the length of the shortest path connecting them.
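
To make the terminology concrete, here is a minimal sketch of an undirected graph stored as a dictionary of neighbor sets. The edge list and the helper names (`build_adjacency`, `degree`, `distance`) are made up for illustration, and the degree calculation assumes a graph with no loops or repeated edges.

```python
from collections import deque

# Hypothetical undirected edges {v, w}, written as pairs
edges = [("v1", "v2"), ("v2", "v3"), ("v3", "v4"), ("v4", "v1"), ("v3", "v5")]


def build_adjacency(edge_list):
    """Map each node to the set of nodes adjacent to it."""
    adjacency = {}
    for v, w in edge_list:
        adjacency.setdefault(v, set()).add(w)
        adjacency.setdefault(w, set()).add(v)
    return adjacency


def degree(adjacency, v):
    """Number of edges incident to v (assumes no loops or repeated edges)."""
    return len(adjacency[v])


def distance(adjacency, start, goal):
    """Length of a shortest path from start to goal, or None if none exists."""
    steps = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return steps[node]
        for neighbor in adjacency[node]:
            if neighbor not in steps:
                steps[neighbor] = steps[node] + 1
                queue.append(neighbor)
    return None


adjacency = build_adjacency(edges)
print(degree(adjacency, "v3"))          # 3
print(distance(adjacency, "v1", "v5"))  # 3
```

The edges {v1,v2}, {v2,v3}, {v3,v4}, {v4,v1} form a cycle, and the breadth-first search visits nodes in order of increasing distance, so the first time it reaches the goal it has found a shortest path.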

Graph theory is the abstract mathematical study of graphs and their properties.

## Problem

A graph whose nodes have all been labeled can be represented by an adjacency list, in which each row of the list contains the two node labels corresponding to a unique edge.

A directed graph (or digraph) is a graph containing directed edges, each of which has an orientation. That is, a directed edge is represented by an arrow instead of a line segment; the starting and ending nodes of an edge form its tail and head, respectively. The directed edge with tail v and head w is represented by (v,w) (but not by (w,v)). A directed loop is a directed edge of the form (v,v).

For a collection of strings and a positive integer k, the overlap graph for the strings is a directed graph $O_k$ in which each string is represented by a node, and string s is connected to string t with a directed edge when the length-k suffix of s matches the length-k prefix of t, as long as s≠t. We demand s≠t to prevent directed loops in the overlap graph (although directed cycles may be present).
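
The definition can be sketched directly in code. The function name `overlap_graph` and the toy strings below are assumptions for illustration only. Strings shorter than k are skipped, which is a choice made here rather than something the problem statement specifies.

```python
def overlap_graph(strings, k):
    """Return the directed edges (s, t) of the overlap graph for {label: string}."""
    edges = []
    for s_label, s in strings.items():
        for t_label, t in strings.items():
            if s_label == t_label:
                continue  # s != t, so no directed loops
            if len(s) >= k and len(t) >= k and s[-k:] == t[:k]:
                edges.append((s_label, t_label))
    return edges


# Toy data: "c" would overlap itself, but loops are excluded
toy = {"a": "ACGT", "b": "GTAC", "c": "TTTT"}
print(overlap_graph(toy, 2))  # [('a', 'b'), ('b', 'a')]
```

The result contains the directed cycle a → b → a but no loop at c, matching the remark that cycles may be present while loops are not.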

### Given: A collection of DNA strings in FASTA format having total length at most 10 kbp.
### Return: The adjacency list corresponding to $O_3$. You may return edges in any order.

## Sample Dataset

```
>Rosalind_0498
AAATAAA
>Rosalind_2391
AAATTTT
>Rosalind_2323
TTTTCCC
>Rosalind_0442
AAATCCC
>Rosalind_5013
GGGTGGG
```

## Sample Output

```
Rosalind_0498 Rosalind_2391
Rosalind_0498 Rosalind_0442
Rosalind_2391 Rosalind_2323
```

In [3]:
# Test data set
faText = """>Rosalind_0498
AAATAAA
>Rosalind_2391
AAATTTT
>Rosalind_2323
TTTTCCC
>Rosalind_0442
AAATCCC
>Rosalind_5013
GGGTGGG"""

In [24]:
# Function to read in fa files


# Function to make dictionary of ends
def endsDict(faTxt, k):
    """Map each FASTA label to [prefix, suffix]: the first and last k bases of its sequence."""
    faEndsDict = {}
    key = None
    reads = ""

    # loop through the FASTA text, joining wrapped sequence lines per record
    for ln in faTxt.split('\n'):
        ln = ln.strip()
        if ln.startswith('>'):
            # a new header closes the previous record, so store its ends
            if key is not None:
                faEndsDict[key] = [reads[:k], reads[-k:]]
            # strip > and begin a new record
            key = ln[1:].strip()
            reads = ""
        elif ln:
            # sequence line: append it to the current read
            reads = reads + ln

    # last read
    if key is not None:
        faEndsDict[key] = [reads[:k], reads[-k:]]

    # return dictionary
    return faEndsDict

# Function to create output
def rosalindOut(faTxt, k):
    # Call endsDict to get {label: [prefix, suffix]}
    faInOut = endsDict(faTxt, k)

    # For each node, print an edge to every other node whose prefix matches its suffix
    for key in faInOut:
        suffix = faInOut[key][1]
        for node in faInOut:
            # Can't loop to self
            if node == key:
                continue
            if faInOut[node][0] == suffix:
                print('{} {}'.format(key, node))

In [7]:
# test
print(endsDict(faText, 3))

{'Rosalind_0498': ['AAA', 'AAA'], 'Rosalind_2391': ['AAA', 'TTT'], 'Rosalind_2323': ['TTT', 'CCC'], 'Rosalind_0442': ['AAA', 'CCC'], 'Rosalind_5013': ['GGG', 'GGG']}


In [25]:
rosalindOut(faText, 3)

Rosalind_0498 Rosalind_2391
Rosalind_0498 Rosalind_0442
Rosalind_2391 Rosalind_2323
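
The nested loop above compares every pair of records. As a possible alternative, the sketch below groups labels by their length-k prefix first, so each suffix needs only one dictionary lookup. The names `parse_fasta` and `overlap_edges_indexed` are hypothetical and not part of the notebook, and skipping sequences shorter than k is an assumption made here.

```python
from collections import defaultdict


def parse_fasta(text):
    """Return {label: sequence} from FASTA text, joining wrapped lines."""
    records = {}
    label = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            label = line[1:].strip()
            records[label] = ""
        elif line and label is not None:
            records[label] += line
    return records


def overlap_edges_indexed(records, k):
    """Return overlap-graph edges using a prefix index instead of all pairs."""
    by_prefix = defaultdict(list)
    for label, seq in records.items():
        if len(seq) >= k:
            by_prefix[seq[:k]].append(label)

    edges = []
    for label, seq in records.items():
        if len(seq) < k:
            continue
        for other in by_prefix.get(seq[-k:], []):
            if other != label:  # no directed loops
                edges.append((label, other))
    return edges


sample = """>Rosalind_0498
AAATAAA
>Rosalind_2391
AAATTTT
>Rosalind_2323
TTTTCCC
>Rosalind_0442
AAATCCC
>Rosalind_5013
GGGTGGG"""

for s, t in overlap_edges_indexed(parse_fasta(sample), 3):
    print(s, t)
```

On the sample dataset this prints the same three edges as the sample output. For inputs of at most 10 kbp the pairwise loop is fast enough, so the index mainly matters for larger collections.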


In [30]:
faReal = """>Rosalind_1121
CTACACCTAAGGCTCTCAGCGACGTCCTGTAGGCTCCATTGGCAGCCTCTAGTAAACGAT
GGGTTATCACTACCCGCTCAGTCTTTATCAAGACCCA
>Rosalind_0426
TTGGTAACCAGGCGCTCTAGCCTATCATGCTTAAGGGATACACATTTTGTAAGACACAGA
CTGTTTGCAAAAGATTAGTTAGGGGAGTGC
>Rosalind_0640
GGAGGAGTTTCCTTTGTGGCAACGGGATCTCGCGGACGCTCTCACAGTTGGTAAAAGCGT
GAATTTCCCAAACGCTTTGGACACGCTAAACTAA
>Rosalind_7263
TGATTTGCGACAACGCTCTCATAATCATTTTTTCGCCCTACTAGAAGAAATACTGGCCTC
TTTAAATGAGCGGGCGGATTTG
>Rosalind_4849
CCTCGTATTGGCAGGGTAGGAATTGGCATCTCAGTAATGGATAGCGTTCTTCATACTAGC
TTGGAGTTAATGTAGGTGAA
>Rosalind_7470
CTGAGGCTATTGATTCTACGTTAGAGAGCACCAATGTCATCCGATCTGGTAATTTACTCC
CCTGGGCAATCCAACGCCCGTATTT
>Rosalind_3453
TAAACTTATTCACTTCACTTATCTGCGAGGACAACCGCATCGTCTTAGTACCACTACTAA
ACGTACCTTAATAGAATAATGCGCACTGA
>Rosalind_4100
AAACACGTTGCGCTCATGGGACATTCATAGGCTCTCTACCGGTGAGGGGTACTAAGGTTA
GGAGATCAGTTCACTCGGTGGAGCTTTTCGT
>Rosalind_5903
CATAACAAACATCTATTTCGGGAAAATAAGATAATTGGCATGACTTGGAAGAGGCCCTTT
AGGGTCGGGCGGAGGTCAAACGGACTGTGGACAG
>Rosalind_1532
CACCCAACGCGACCTGGCTGTCCCGTGCCGATACCGACAACCGGCACACAGCTACTCACT
TTACAGGCCTTAGGTCGCCTAAGAGATAAATTGAGCTTGC
>Rosalind_4405
TGTTAATTGTAGAGTCTGAAAGGGAACAGTTTGAAGGTTCTAAAGGTGAACCTGTGAAGG
GATCCTAGTTGTCACGCCGC
>Rosalind_3628
ATTGCATCGTCGCCTGCGCAGGCAAATCACCCTCCGGTTGTGGAGCAAGTGGGCGATAAC
GGGATGCTATTAGTCCGTCACACCA
>Rosalind_6978
GGGCATTCTGAAACCATCTGGGACGTCAGGACATTACGGGGAGACCCCGCTCGGGCTGCC
GCCTGATGCCAAACACGTGCTCTGAAAC
>Rosalind_8936
GTCGCTGGCACTGAGATTCGTCGGTAGGCCTGCAGAGCCCACCATCTCTGAGGACAGGTA
ACAAGTCTGCGAATTTGACCCAGGTGTAATACTAGTAAA
>Rosalind_1092
CCTTAAGTCGTGGCGAGGGTAAGCAAACCAATATTGCCCGACCATTTAGTTAATAAGCCG
TTAACCTCTTCGAAAGCAAACCGAAA
>Rosalind_9770
GGATTCATGTGTATCCTCTCCGGTCTACCAGTGATTCATATACTTAGTAACTAATACTAA
CCTCCAAAAGTTAGACGATTGCT
>Rosalind_6064
CGTACCCTAGAAGCCACGCTTTTATGCGGCCATGACTCACCCACTCACACGCATTCCCTG
AGATATCTTTCTTGGGGTGTCTGTGTGGCAAAAG
>Rosalind_1265
CTGCACGGGCAGGATACGAGGTGGGGCTACGTTGGAAAATCTCTCTCCCATCAGCTCGCC
CTATATACACACAGTACGTAAGGTACTCACTAGCG
>Rosalind_9821
GGGTCGAAAAATCTAGTCAACCGTAACTTTATCAGCGGTTACACCTTCCTTCTTGAGGGT
TAGCCCGATGAAACTTTAAACAG
>Rosalind_2528
GATGTGGATGCTTTCCCGATGGCATTTGATATCTGTGGGGGCGAGTTCGATTACTCTTAG
GACCAAAACGATAATATCGCAGATGACAAT
>Rosalind_4424
CGACCTTGACACGCAGTATGCCTATAATTTTAATATGTGAGGATCATAGTGCCAATGGCG
TTCAGTGTGCCGCGATGTGCTAGTGTTGAAAG
>Rosalind_4878
TGAGCCTCCGGACCAGCGAACGCCACTTATTTTATTTCACAGAATAATCACGGCAACGAA
ATGATATGACGCATCTCTCCGTCAGGCAAAGCTACT
>Rosalind_0607
ATTGTTCGGCGGCCTGCACTCGGAAAGCGCCCATCGCGTTAGCTTTTATGGCCGACATTC
TCGAGCATCCGTTGGCGTATTTTTGTCGAGCCCGCGTTA
>Rosalind_7793
TGACATGATATAAAGAGGCGCACGATAATCCAGTTGATTGGTGCCGGGATCGGGGACTGC
AGCTTACTTGGTTCGTCCATACGCC
>Rosalind_5892
TACTACCCGGTTTTCGCTACATGTATTCATCGCGTCTTCCAGGGAGACAAGAGTTTTGTC
GACGGGAAGACTTTCGTTGCTTGTG
>Rosalind_8260
ACCAAATCCGACGACTGGGCGGACTGGGGCAACACCAGACGCGAGTGTGGCTGATGCGGT
GTCTAATGGAATACGGGCGG
>Rosalind_2728
TGTGCGCCTAGAAGCTTACAGTCTCGCAGTATATGCAGCCCTCCAGGTGGGCGTAGTACA
GCGTGATTTAATAGCTTGACTCACCCTTAC
>Rosalind_7214
TTGCGATCAACAAACCAGACGATTAAGAAACCTCAACGAACTTGTGTCTGGACTCAACTC
GCCTAGATTAAGGACCACATATCGTACCACCGA
>Rosalind_1203
GGCCCAAACGGTTAGACTCCACATATAAGGTGTTTCTGAGGGTTACGTAAATATGACTTA
TTGGGCGCAAGGCAGGTAACACCTTCCATGG
>Rosalind_7638
GGCAGGGCATAGTACAGCTCTGTGAATCGGGTACGGCGCAGGATCATCGATGCAGGTGAT
CCCAGAAAGCCATTGATCGCGTAGAT
>Rosalind_7179
GGTCGGACTCCGTCTGGACACTCGCTCGGTTCTCTCCTACTTTCTGTACCGGACGACCGA
TGATTGAACAAGCTAGGAGACTA
>Rosalind_2758
TTCCTTGATCGACCGTTTCTAAACGGGATTGTCTTTGAAGCAAACCAACTGCGGCATCTA
ACAGCAGTGGGCGTTGCATGAAGGTTGAGGGTATAAGC
>Rosalind_7448
GAGTATCCTAGCCGACTTCCTCGATCGAACACATACAAAGCATTCTTCTCAAAGAGACTT
AAGGTGGGTATTGGGGCACCGTCACAAATAAATTGTCTA
>Rosalind_4256
TACCCGTCAAGTGTGCAAGAGCAAACGAAGATCCTGCACCGATCCACTCTGGCTATTTTT
TAATCAGGGAGTACCCTTCTGGTTGCTATTTTGG
>Rosalind_6972
ACTGAAAAGCCGTCTGTTACCGACAAGCCAATGCTCATCCTCACCCTATGATACCGGAGG
TGATACAGCTATTTACAGCGTAGTTAGTTGTGTTG
>Rosalind_2152
ATCTGCTACGAAAAATAGTCAGGAGATCTGCCTTGCCTCATTGAACAGTACTCCCTTAGC
CCCTAGGTCAGTAAGCGCGG
>Rosalind_3048
CTGGCCACGTGGCCATGGGCCAGACAGGAAGACCAGGTTATAGCGTTGCCTATGTACGTA
GCCGTCCCGTTAAAGCCCGT
>Rosalind_4827
GAACGATGCCCTTTTGATCGGCGTTATTTCGTGAAATTCCCGTTAAAACCTCAATGGCCT
GTGAACTCACAACAAATCAC
>Rosalind_6621
AGAAAACAAAACCGTGTGTACAAAACACCAGAAGATGAATCGGCAGGTAGGTTGTGTTTC
TGTACCGGATGAAGTCTCTATCCACA
>Rosalind_1876
GAAGCGCTACGCGCAGACGACAGTTTAATAACGCCGAAGGTTCAGTACTTGCCATCACCG
ATGTCGCTGCATTTGGACCGCTAA
>Rosalind_8724
CGAAAGTATCAAACCGTTTCCCATACGGGCAAGGTTCTGTAGGATTGCTAAGTCACACTT
CGTTACCTGGGCCGGGCTCGTAGTG
>Rosalind_7145
GATCCACCTACCGTTAGCTACCTTAGCTGAGCGGGGCTAGTATAGGATTTAGATGAAATT
TCCTGGTCCAGTCGAAACTTGGAAATATA
>Rosalind_0600
TTCTTGGCTTTATAACTCGGCGAAATTGGGAGCACCATTCAATAACCCCCCAGATCGTGT
GATGATAGGTCACAGTATACTGA
>Rosalind_1859
CAATCGATTCTTCCATGATATTACTACCATAGGGTGTTCGAGGAAAACAGTGATAACAGG
ATTTCAACAGATCCTCTGTGGGTTTTTTTGGATTTAA
>Rosalind_8347
CTACCTGTATTCTGATGTCATTATGGCACTGCCCTGGGAGTATGAGGCCGTCTATAAGGG
TCCAGTTACCGAGATAGCTAAATAATGTGTACCGC
>Rosalind_6552
CACACGAGAAGGGGACTCTTGAATCCTCGTCCCTTGGAACCGCTATAATATGTGGGTTGC
AAACAGTTGGCGAGACCCACCAGAC
>Rosalind_0157
GCTAGCCTTCTCATACTAGCCGCTCTCGCCAACTTGCTATCCGCATTCGGTCGGGAAGCG
GATGAATGTAGGTGCGAGCCGTACGCCACCGG
>Rosalind_3721
CTGTACCCTCACAGTGCGTGTTCGGCCGTCAGGACTGGTGAAAGATGTTCCAATGCTGTT
CATAGTACGCTTTGCTCGTTTTTAAGCA
>Rosalind_6961
CCAAAGCAAGGGAAGTCGCGCTCCTACGTACCCGACATGCAGGAGTCATTGCGTCCCATT
CACTCCAGCTGCGTAGAGAGGTCACCGCGTTTGTGTTCTG
>Rosalind_8472
CGCGCTACGCACCAAATACCTGGCGATCGTTAATGCGCTCAGCTGTACCACGTATCAAAT
CTCGGGCCTACAATCATTGCATGTTGAT
>Rosalind_6065
AGCATTGCGCAATATGCAGACTGACGGATCTATGCCACGCTACGTTCTTTAGTGCACCAT
AAAGTCCCTGAACGGGGCTATAAGGGAGAT
>Rosalind_0878
TACCCGAGAAAACTGCTAAGTAACAATTAGAACGGTCTGATCGTAGTGCACCCTATTCTA
TTTTAGGCAGTTTAGACTCATGGGCTGAGG
>Rosalind_0482
ATAAAAAATTATGCGAAGCAACCCCTGACCTTGTCTCGCCGCCGTTTAACGCCTAACTGG
CCCCCTAGCCGTTCCTGCTAACGGTATGCAGATTG
>Rosalind_1241
CGGCACATCCTTAGTGCGCTGTAACTGTAATACTATAAGTTCTTAGACGAAACGAGTTTA
TCTAACGAGACAGCGACCACAAGCTAGGGGCGTGATA
>Rosalind_0015
AGCTTGGCCGACGGGAGAACTCCAGCGCAGTACTTATCCTTGATGATAGATGCACTCCTC
ACTGTAGTACTTACCGTCGA
>Rosalind_3191
GGCGAACCACTAGAGATTCGTGCACGGCCTGAGAAATCGTCCAGTCGCTCCCCATGGCAA
CGGCACGGGTGTAAACCATTGTAAA
>Rosalind_4732
GCTCCCCTCTGATTGGGATCTCGACCCTTGGACAAATAATTTCTATGGGGAGAAAGTTTT
GACGTGCTCACACACATACAGAATGCTA
>Rosalind_6267
ATACATGTGCCGAAGGACGTTTCGGGATGCGCGAGCGCTCTCACGGCTCGGAGCTCTGGA
ACATAACTGAAAGAGGATCC
>Rosalind_1885
TATAGTATTTGAAAGCCCAGGGAGACTGGTGATACCTACAAGGACTATTACAAGGCGGGT
CCCGATTGACAGCGGAAGACGTGTTGTGCAAAGTCTCCGT
>Rosalind_0955
CAGTATGGGATGCATAGTGACGATTCTCGTAGCGAGAACTAGGCACTTCCCATAAAAGCC
TCGCTATAAAAAACTGAAAAGACTTCAATGAAACCGGCTT
>Rosalind_2000
TCTCAACGTCTGGAGTACAGCCAGTCTTCCAGAACCTCCCAGATAAACCTATCCTCCAGA
GTGAAACCTCCTGGATGTAACATCCCAATC
>Rosalind_2381
AGCAAAATATAACGGACGTAATGCCGACTTTAAGGGCCTGACATTATCAGCACTACAAAA
TCGTGGTTGAGAGATAAAGCAATTGGATTCAT
>Rosalind_8919
TTCTACCGGGATTGATGGCTGCCTGGTACAGACTCTCATGAGCTATTCCGGAGTGCTTCG
CGCTAACGTACCGCAACGGTACCTTGCC
>Rosalind_2894
ATTGGCGGCTTATCGGTGACTATACCGTTCTGAAGTAAACACGGTACTGCCCGGTGGGAC
CAAGTGACAGTCGGGTACGAAAGCA
>Rosalind_0587
TTAGTTCTTGATCTGTGCCCCACTGGTCGCGCCCCCGTAATCTGGAAATCACTGGGAAGT
GCATGCACCAGTCCTCCGGCCTTTTTGA
>Rosalind_9649
TGGTTATCGGTACACGTGACTTAGATGGATTACATTTAGTTATTTTGATTTTTACAGATA
ATGCTTGCGGTTTCGACTAAGCCCGCTTTATGGTTT
>Rosalind_8442
GGGAGAGACCAAGATCGTATGCCAGGAGGATCAACGGTCGGGGACGGCACTGGGCCCGAT
GTGTGATAAGGACGTACCCCCAGCCA
>Rosalind_5632
TGTGCCGGTATATTCTGAGACCTAACCCTTATTGCAACCAAGGGCAGCTTGGTGCGCAGT
ATGTAGGCGAATTGCTTCGACCCCGTGAGTGCCCACTCA
>Rosalind_2606
CTTTGCCTCCATGCACCAGGTTACGCTATGCGTTGTGTCGCGGCGACTCCTGGGAACAGT
CGGAACCGACGTGCATCCTGAGCGA
>Rosalind_2785
CTCATGAGGCGTATCGATGCAACCGGGCTGTTGGTTTAGTGTTCAATAACGTCCCTTGTA
GTACGGCTGTCGTGTTCTCGGATTCTGGGC
>Rosalind_3031
ATTACCCTCTGCTGAGGCCTCATCCTAAACTTGAGCTACGGCTGGCCAAAGACGCCTTTG
AACGTTCAAGGTACGCACCGCAAGCATACGTAATTC
>Rosalind_3659
GACTCAAATGGTACGCGCCGCTCATCGGTTCTGGACCCGGGAGAAACCATCTTTGATCCT
AGATCATAATCGAAAGGCCGCATGATGCCCC
>Rosalind_7792
CAGTGTGATTGGAAGGCGGGCCTGCCCATGGAGCATCTGGAAACTATTGAGTAGGAGTGA
TGTTGCGTGGTTCGCCTCAGGCGACTACAAACCCATACG
>Rosalind_4911
TGTGATGCGTCAGTTTGATTTCCGTTCTATACCGTGGACGAGTTCTTTCTGAGAAATTTA
TTCTATGGACACGGGTATTTGTTGGT
>Rosalind_8039
TTTTGACCCAAGTCCGAGGAATGAACGCAAACAGTTGACAAGGCTTCTCCCGGATCCATT
CTCTAATATGTCGGAGTGCAGGCTGT
>Rosalind_8396
GGAATAGAACATAGGGAGCTGAACAATGCGCTACATCTTCACTACTCCACCCCATAGACT
GGGAATATTCGTGATTGCACCA
>Rosalind_6726
TGAGACAGGTCGTGCGAGACCTATGATTTTCGCCCCACATGCCATATCATTCGTTCCTCA
CTCCGGTCTATCTCACATGTCGT
>Rosalind_7923
AGTTTTCCTTCAGTCAGAACCTGGCTGGAAGGAATCTACAGCTCCTATACGGAGACTGAA
AAGGTTGATTCCTCATAACTAA
>Rosalind_1844
TGTATTCTCGCCTGGAGTCCGCTCGTGCCGCTGGCGTCCACATTGATTGAGAGTGGAGGG
TCTCAGGCGCACGAGTATCCAGGAACATCAC
>Rosalind_8559
CGAGGTTGACATAAGCCCTGTTCACATAATGCGCACGGGAGGATTTGTAACAGTGCCGTG
TCCCTAGCTTGAATAGGTACGAACCAG
>Rosalind_9401
GGTTCGCTTAGGGTAGTTGAGAATGATTGGGAACCAGAGAAGCGTCCACGTACTCTACTC
ACCTCGAGGGCGTACGGCTGGCCTATAGC
>Rosalind_0497
ACGACCTTTGAGTGGCTTGCATCTGGGAAATTGTATTACGAGTGCAACAATCGACACCGA
TATCAGCATGCCTATATTTTAGCTTCGGAG
>Rosalind_5614
AGAGTGGCGGTTGATTGATTAAATCGTGGTCCTAATCAAAACCATGTCCCTTAAAACACG
GGCACGGGACAGGCTTGAAAACCC
>Rosalind_2805
CGATGTTTACGTCACACCAACAGGGCTCATATGGGCGCTCGATTGGCTCTTCAGCGTAAA
CCAAACGTGATCACCCATAGCTTTGGAAAAGCCT
>Rosalind_7369
CATACAAATGCGGATTTTTCCCCCAAGACCAGTCATCACGTTGCCATCCGAATGACCTCC
TGCGTAAATCGTTGTGTCACGCCCTTGTGGCATGAG
>Rosalind_7894
ATTTTACGTGAAACTAAGGGTGGGGATGCGCTGTCGTCCTTTTTTGGCGTTATGAGCTTT
GGTTGGCGACCCGAGGTGCAAATA
>Rosalind_3462
GCTAAATTTTACCTCGAATGTGGAGGTACCTCCATGCCATGTATCCTGTCTGCCACGACT
TTATGGGGTAGGCCCGTGTTAC
>Rosalind_2975
TATAGACATTTTGGTAGCTTGTGGAATATTAAAGCGCGTGGGCGGGAGCGTGCGAACGCG
TCTCTATTCCACAGCCTCCGG
>Rosalind_9613
AGAGATAACGGCACGCGTACAAGGACTCTTAGGAAGACATCTACTTTTGTAGCTGCGATC
ACTATGACTAAGCGCCTACAGTTTACTTAA
>Rosalind_0695
GGAGGTATTCATGATCCTTAATGCGCGGCCCTTACTAAAGCGGATGGAACATGGAGCATT
AGCGTTGGATGGTTAGTAGATTGGCGCCCGCTTGCGT
>Rosalind_2833
GACTAGGTCAAGAGCTGTCTGAGAATCCCCTCCACCGCTGGATTTTCTGAGCCACTCCTG
AGACGGGTGGAGCTTCATGGATCCTTCTCGGTCTGTGCTT
>Rosalind_7384
CCCCAATAACAGTACCCGCATTACAGCACAGACCGCCTACTTGTGGTAGGCAGGACCCGT
ACGACCGGTTAGCATACCGAGGCGTGGCAG
>Rosalind_9717
AAGATTATGGACCCTTCAAGGTATAGCCGGCGGGTCTACATCTGTCTACGGGCCCCTGTA
ACCGCAATTCGAGGTAAATAG
>Rosalind_4192
TAAGTTTCGGTGCCGTACAGAATTGGATTGTCTGCCGAACCAACTTACGTGTTGACCTCT
ACCTTCCGAAATATCGTCCTACTGTTGCGTGATTTC
>Rosalind_6262
CCTCTACTGAGAAAAATTCGCCGATAGCTCGTGTCGGAAACAATTCGCACTGTAATTAGA
CCGGCTCCACGACGCAGCTCGGAGTGCAACACCCGTAG
>Rosalind_4434
ATGTACCGGTTTTCTTGCACCCTGTGTACAGAGCCAGTGGATATGATCCCCGTTACGGTC
TAGCTCGTGCAGCTAAATTAAGGAA
>Rosalind_0861
ACTGTCGCGCTCTGTTGGTGCCCCACGGGCCGGTCATTAACATACGAAGTCAAAACAAAT
GTACCATTGTTACACGGGACTTCAATCACACAAAGTT
>Rosalind_6973
GACCCAGACAATTTAGAGGACGATGAGGCAACCATAACCACGAGTGCAGGGTATAATTCA
CGAGTATTTATAAGTGGTGCGTTATCACGAC
>Rosalind_6307
TATGCGCCAATGTGGTGCACAAAAGTCAACCACCTTCTCTCTGTGGCGGCCCTATGCTCA
TAGTTCACAGACCCCTATAGGACAGGATGGTAT
>Rosalind_7153
CGCCTACCAAACTGCATCTGCTCGGCATGTTGTCATCCTATACTGAGTAGGAACACTACG
AACATCACATTGATTGTACTCAG"""

In [31]:
rosalindOut(faReal, 3)

Rosalind_1121 Rosalind_6961
Rosalind_0640 Rosalind_3453
Rosalind_0640 Rosalind_4192
Rosalind_7263 Rosalind_0426
Rosalind_7263 Rosalind_7214
Rosalind_4849 Rosalind_4827
Rosalind_4849 Rosalind_1876
Rosalind_7470 Rosalind_8039
Rosalind_3453 Rosalind_7263
Rosalind_3453 Rosalind_4878
Rosalind_3453 Rosalind_7793
Rosalind_3453 Rosalind_6726
Rosalind_4100 Rosalind_6064
Rosalind_5903 Rosalind_0955
Rosalind_5903 Rosalind_7792
Rosalind_4405 Rosalind_8472
Rosalind_4405 Rosalind_7153
Rosalind_3628 Rosalind_6961
Rosalind_8936 Rosalind_4100
Rosalind_1092 Rosalind_4100
Rosalind_9770 Rosalind_0157
Rosalind_9770 Rosalind_4732
Rosalind_9770 Rosalind_3462
Rosalind_6064 Rosalind_9717
Rosalind_9821 Rosalind_0955
Rosalind_9821 Rosalind_7792
Rosalind_4424 Rosalind_9717
Rosalind_4878 Rosalind_6972
Rosalind_4878 Rosalind_0861
Rosalind_0607 Rosalind_0587
Rosalind_8260 Rosalind_1241
Rosalind_2728 Rosalind_5892
Rosalind_2728 Rosalind_4256
Rosalind_2728 Rosalind_0878
Rosalind_7214 Rosalind_4424
Rosalind_7214 Rosali

In [None]:
# Enter path to FASTA file
pfa = input("Enter path to FASTA file: ")
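
The notebook stops before the file is actually read. One way the remaining step could look is sketched below, assuming the file is plain-text FASTA. `read_fasta` and `edges_from_file` are hypothetical helpers rather than functions defined earlier, and the demonstration writes a small temporary file instead of prompting for a path.

```python
import tempfile
from pathlib import Path


def read_fasta(path):
    """Return {label: sequence} for the FASTA file at path."""
    records = {}
    label = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                label = line[1:].strip()
                records[label] = ""
            elif line and label is not None:
                records[label] += line
    return records


def edges_from_file(path, k=3):
    """Return the overlap-graph edges for the records in a FASTA file."""
    records = read_fasta(path)
    return [
        (s, t)
        for s, seq_s in records.items()
        for t, seq_t in records.items()
        if s != t and len(seq_s) >= k and len(seq_t) >= k and seq_s[-k:] == seq_t[:k]
    ]


# Demonstration on a throwaway file with made-up records
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo.fasta"
    demo_path.write_text(">x\nAAAC\nGG\n>y\nCGGT\n>z\nGGTA\n")
    for s, t in edges_from_file(demo_path, 3):
        print(s, t)
```

In the notebook, the path collected by `input` could be passed as the first argument in place of `demo_path`.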