<a href="https://colab.research.google.com/github/nitrozyna/Rosalind/blob/master/12_grph.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Problem description:
[Overlap Graphs](http://rosalind.info/problems/grph/)

A graph whose nodes have all been labeled can be represented by an **adjacency list**, in which each row of the list contains the two node labels corresponding to a unique edge.
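
As a small illustration, an adjacency list can be held in Python as a list of `(tail, head)` tuples and printed one row per edge. The helper `format_adjacency_list` and the example labels are hypothetical. Repeated pairs are dropped so that each row stays a unique edge.

```python
# Hypothetical example edges; ("a", "b") appears twice on purpose.
edges = [("a", "b"), ("b", "c"), ("a", "b"), ("a", "c")]

def format_adjacency_list(edges):
    """Return one 'label label' row per unique edge, in first-seen order."""
    unique = dict.fromkeys(edges)  # dicts keep insertion order and drop repeated keys
    return "\n".join(f"{u} {v}" for u, v in unique)

print(format_adjacency_list(edges))
# a b
# b c
# a c
```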

A **directed graph** (or digraph) is a graph containing **directed edges**, each of which has an orientation. That is, a directed edge is represented by an arrow instead of a line segment; the starting and ending nodes of an edge form its **tail** and **head**, respectively. The directed edge with tail v and head w is represented by (v,w) (but not by (w,v)). A **directed loop** is a directed edge of the form (v,v).
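
Because a directed edge has an orientation, an ordered tuple is a natural stand-in for (v,w): swapping its elements gives a different edge. A minimal sketch, with hypothetical node names and a hypothetical `is_directed_loop` helper:

```python
# Directed edges as ordered (tail, head) tuples; node names are hypothetical.
edges = [("v", "w"), ("w", "x"), ("x", "x")]

def is_directed_loop(edge):
    """A directed loop has the same node as tail and head, i.e. (v, v)."""
    tail, head = edge
    return tail == head

print(("v", "w") == ("w", "v"))                   # False: orientation matters
print([e for e in edges if is_directed_loop(e)])  # [('x', 'x')]
```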

For a collection of strings and a positive integer k, the **overlap graph** for the strings is a directed graph $O_k$ in which each string is represented by a node, and string s is connected to string t with a directed edge when the length-k **suffix** of s matches the length-k **prefix** of t, as long as s≠t. We demand s≠t to prevent directed loops in the overlap graph (although directed cycles may be present).

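
The definition translates almost word for word into a brute-force check of every ordered pair of strings. This is a sketch, not the notebook's own function: `overlap_graph` is a hypothetical name, nodes are assumed to be identified by their labels (so s≠t is tested on names), and strings shorter than k are assumed to have no edges. The example data is made up and contains a directed cycle (x → z → x), which the definition allows.

```python
def overlap_graph(named_seqs, k):
    """Return (s, t) name pairs where the length-k suffix of s equals the
    length-k prefix of t, skipping s == t so there are no directed loops."""
    edges = []
    for s_name, s in named_seqs.items():
        for t_name, t in named_seqs.items():
            if s_name == t_name:
                continue  # no directed loops
            if len(s) >= k and len(t) >= k and s[-k:] == t[:k]:
                edges.append((s_name, t_name))
    return edges

# Hypothetical reads: x -> y, x -> z and z -> x (a directed cycle).
print(overlap_graph({"x": "AAATTT", "y": "TTTGGG", "z": "TTTAAA"}, 3))
# [('x', 'y'), ('x', 'z'), ('z', 'x')]
```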
---

### Given: A collection of DNA strings in FASTA format having total length at most 10 kbp.

### Return: The adjacency list corresponding to $O_3$. You may return edges in any order.

Sample Dataset

>Rosalind_0498
AAATAAA
>Rosalind_2391
AAATTTT
>Rosalind_2323
TTTTCCC
>Rosalind_0442
AAATCCC
>Rosalind_5013
GGGTGGG

Sample Output

Rosalind_0498 Rosalind_2391
Rosalind_0498 Rosalind_0442
Rosalind_2391 Rosalind_2323
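
A quick way to check an implementation against the Sample Dataset is to compare edge sets rather than printed lines, since edges may be returned in any order. The sketch below uses a hypothetical `overlap_pairs` helper with a hypothetical `allow_loops` switch. It also shows why the s≠t rule matters here: Rosalind_0498 and Rosalind_5013 each end with the same 3-mer they start with, so without the rule both would get a directed loop.

```python
# Sample Dataset from above, as a name -> DNA string dict.
sample = {
    "Rosalind_0498": "AAATAAA",
    "Rosalind_2391": "AAATTTT",
    "Rosalind_2323": "TTTTCCC",
    "Rosalind_0442": "AAATCCC",
    "Rosalind_5013": "GGGTGGG",
}
k = 3

def overlap_pairs(seqs, k, allow_loops=False):
    """Set of (tail, head) name pairs with seqs[tail][-k:] == seqs[head][:k]."""
    return {
        (s, t)
        for s in seqs
        for t in seqs
        if (allow_loops or s != t) and seqs[s][-k:] == seqs[t][:k]
    }

# Sample Output from above, as a set because edge order does not matter.
expected = {
    ("Rosalind_0498", "Rosalind_2391"),
    ("Rosalind_0498", "Rosalind_0442"),
    ("Rosalind_2391", "Rosalind_2323"),
}
print(overlap_pairs(sample, k) == expected)  # True
# Without the s != t check, two directed loops appear:
print(sorted(overlap_pairs(sample, k, allow_loops=True) - expected))
# [('Rosalind_0498', 'Rosalind_0498'), ('Rosalind_5013', 'Rosalind_5013')]
```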



In [0]:
#@title Importing some modules to make a connection between Colab and Drive to download the current dataset
!pip install PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)


In [0]:
#@title Loading test dataset
fileID = "1QAXg4Gv3NLAjd9fktPol1Lm9awSRD_Qg" #@param {type:"string"}
downloaded = drive.CreateFile({'id':fileID})
downloaded.GetContentFile('rosalind_grph.txt')  # replace the file name with your file

In [0]:
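#@title Manipulating the file to give a dictionary of read names and DNA strings
with open('rosalind_grph.txt','r') as f:
    reads = {}           # read name -> DNA string
    current_name = None  # name of the record being filled
    for line in f:
        line = line.strip()
        if not line:
            continue     # skip blank lines
        if line.startswith(">"):
            # Header line: start a new, empty record
            current_name = line[1:]
            reads[current_name] = ""
        else:
            # Sequence line: a record's DNA may span several lines
            reads[current_name] += line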
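
The parsing cell above assumes the file starts with a `>` header. A slightly more defensive variant, sketched here as a generator over any iterable of lines (an open file or an `io.StringIO`), joins multi-line sequences, skips blank lines and raises a clear error on sequence data before the first header. `read_fasta` is a hypothetical helper, not a library function.

```python
import io

def read_fasta(lines):
    """Yield (name, sequence) pairs from FASTA-formatted lines.

    Sequences may span several lines; blank lines are ignored.
    Raises ValueError if sequence data appears before the first '>' header.
    """
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)  # finish the previous record
            name, chunks = line[1:], []
        elif name is None:
            raise ValueError("sequence data before the first FASTA header")
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)  # finish the last record

demo = io.StringIO(">seq_a\nAAAT\nTTT\n>seq_b\nTTTG\n")
print(dict(read_fasta(demo)))  # {'seq_a': 'AAATTTT', 'seq_b': 'TTTG'}
```

In the notebook it could stand in for the parsing loop as `reads = dict(read_fasta(f))` inside the `with open(...)` block. As with any dict, a repeated header name would keep only its last sequence.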

In [0]:
#@title Making a dictionary of suffix-prefix pairs
def overlapGraph(reads, k=3):
    """Map each DNA string to the strings whose length-k prefix matches its length-k suffix."""
    suffix_dict = {read: [] for read in reads}
    for read in reads:
        for key in suffix_dict:
            # Skip read == key so that no directed loops (s = t) are created
            if read != key and read[:k] == key[-k:]:
                suffix_dict[key].append(read)
    return suffix_dict
sp_dict = overlapGraph(list(reads.values()))
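
The function above compares every read with every other read, which is fine for a 10 kbp dataset. A common alternative, swapped in here as an illustration rather than taken from the original notebook, is to index reads by their length-k prefix in a dictionary and then look up each read's length-k suffix, so only matching pairs are visited. `overlap_edges_indexed` is a hypothetical name. It works on the name → DNA dict directly, which also avoids mapping sequences back to names afterwards.

```python
from collections import defaultdict

def overlap_edges_indexed(named_seqs, k=3):
    """Return (tail, head) name pairs of the overlap graph O_k using a prefix index."""
    # Bucket read names by their length-k prefix.
    by_prefix = defaultdict(list)
    for name, seq in named_seqs.items():
        if len(seq) >= k:
            by_prefix[seq[:k]].append(name)
    # Each read's length-k suffix selects the heads it points to.
    edges = []
    for tail, seq in named_seqs.items():
        if len(seq) < k:
            continue
        for head in by_prefix.get(seq[-k:], []):  # .get does not add keys to the defaultdict
            if head != tail:  # no directed loops
                edges.append((tail, head))
    return edges

# Hypothetical reads: r1 -> r2 (GTT) and r3 -> r1 (ACG).
print(overlap_edges_indexed({"r1": "ACGTT", "r2": "GTTAC", "r3": "TTACG"}))
# [('r1', 'r2'), ('r3', 'r1')]
```

The work is roughly proportional to the number of reads plus the number of edges reported, instead of the number of read pairs.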

In [25]:
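#@title Pretty printing all the read pairs
# Reverse lookup from DNA string to read name (the first name wins if a sequence repeats)
name_of = {}
for name, dna in reads.items():
    name_of.setdefault(dna, name)
# Each printed row is one edge: tail (suffix side) then head (prefix side)
for tail_dna, head_dnas in sp_dict.items():
    for head_dna in head_dnas:
        print(name_of[tail_dna], name_of[head_dna])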

Rosalind_7516 Rosalind_1005
Rosalind_7516 Rosalind_5466
Rosalind_3433 Rosalind_1619
Rosalind_6325 Rosalind_8840
Rosalind_6325 Rosalind_8550
Rosalind_3942 Rosalind_1619
Rosalind_8007 Rosalind_5933
Rosalind_8007 Rosalind_8910
Rosalind_8007 Rosalind_6014
Rosalind_8007 Rosalind_5835
Rosalind_8007 Rosalind_9494
Rosalind_3560 Rosalind_6885
Rosalind_3560 Rosalind_1557
Rosalind_3560 Rosalind_6880
Rosalind_5933 Rosalind_4959
Rosalind_5933 Rosalind_3306
Rosalind_5933 Rosalind_9094
Rosalind_5933 Rosalind_7549
Rosalind_2697 Rosalind_5933
Rosalind_2697 Rosalind_8910
Rosalind_2697 Rosalind_6014
Rosalind_2697 Rosalind_5835
Rosalind_2697 Rosalind_9494
Rosalind_5822 Rosalind_6499
Rosalind_5822 Rosalind_6263
Rosalind_5822 Rosalind_6400
Rosalind_3967 Rosalind_8922
Rosalind_4959 Rosalind_9511
Rosalind_4959 Rosalind_3351
Rosalind_9511 Rosalind_8840
Rosalind_9511 Rosalind_8550
Rosalind_2021 Rosalind_8760
Rosalind_2021 Rosalind_6256
Rosalind_1557 Rosalind_6325
Rosalind_1557 Rosalind_2010
Rosalind_1557 Rosali