## Extraction of Pathways (Reactome)

REACTOME is an open-source, open access, manually curated and peer-reviewed pathway database. Our goal is to provide intuitive bioinformatics tools for the visualization, interpretation and analysis of pathway knowledge to support basic and clinical research, genome analysis, modeling, systems biology and education. Founded in 2003, the Reactome project is led by Lincoln Stein of OICR, Peter D’Eustachio of NYULMC, Henning Hermjakob of EMBL-EBI, and Guanming Wu of OHSU. [Source](https://reactome.org/)

This notebook reads the immune system pathways from Reactome and extracts each pathway's ID, name, and species.

In [6]:
import json
import pandas as pd

#### Pathway Parsing

Each pathway has three components: a Reactome stable identifier (RID), a name, and an associated species. We keep only the human (Homo sapiens) pathways that belong to the immune system.

In [7]:
immune_pathway_rids = []
with open("adaptive_immune_system_pathways.txt", 'r') as f2:
    for line in f2:
        # Strip the trailing newline; blank lines are kept as empty strings
        immune_pathway_rids.append(line.rstrip('\n'))

In [8]:
immune_pathway_rids

['R-HSA-1280218',
 'R-HSA-202403',
 'R-HSA-202427',
 'R-HSA-202430',
 'R-HSA-202433',
 'R-HSA-202424',
 'R-HSA-388841',
 'R-HSA-389356',
 'R-HSA-389513',
 'R-HSA-389948',
 'R-HSA-983705',
 'R-HSA-983695',
 'R-HSA-1168372',
 'R-HSA-5690714',
 'R-HSA-983169',
 'R-HSA-983168',
 'R-HSA-983170',
 'R-HSA-1236975',
 'R-HSA-2132295',
 'R-HSA-198933',
 'R-HSA-392517',
 'R-HSA-8851680',
 '',
 'R-HSA-168249',
 'R-HSA-168898',
 'R-HSA-1679131',
 'R-HSA-168142',
 'R-HSA-975871',
 'R-HSA-168164',
 'R-HSA-168176',
 'R-HSA-975871',
 'R-HSA-168181',
 'R-HSA-975155',
 'R-HSA-168138',
 'R-HSA-166016',
 'R-HSA-166020',
 'R-HSA-181438',
 'R-HSA-168179',
 'R-HSA-168188',
 'R-HSA-5686938',
 'R-HSA-166658',
 'R-HSA-166663',
 'R-HSA-166786',
 'R-HSA-173736',
 'R-HSA-174577',
 'R-HSA-166665',
 'R-HSA-977606',
 'R-HSA-168638',
 'R-HSA-622312',
 'R-HSA-844456',
 'R-HSA-844455',
 'R-HSA-844623',
 'R-HSA-844615',
 'R-HSA-879415']
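
The printed list contains one empty string (from a blank line in the input file) and one repeated ID (`R-HSA-975871`). Neither affects a simple membership check, but a cleaned list can be easier to reuse. A minimal sketch, assuming first-seen order should be preserved; `clean_rids` is a hypothetical helper, not part of the original notebook, and `sample` is an abbreviated excerpt of the list above:

```python
def clean_rids(rids):
    """Drop blank entries and repeated IDs, keeping first-seen order."""
    # dict.fromkeys keeps insertion order (Python 3.7+) and removes duplicates
    return list(dict.fromkeys(r.strip() for r in rids if r.strip()))


# Abbreviated excerpt of the list above, including the blank entry and the repeat
sample = [
    "R-HSA-8851680",
    "",
    "R-HSA-168249",
    "R-HSA-975871",
    "R-HSA-168164",
    "R-HSA-975871",
]
cleaned = clean_rids(sample)
print(cleaned)
```

Applied to the full list, this kind of cleanup would leave only distinct, non-empty IDs.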

In [9]:
HUMAN_IMMUNE = []
immune_rid_set = set(immune_pathway_rids)  # set gives fast membership tests
with open("ReactomePathways.txt", 'r') as f1:
    for line in f1:
        # Columns are tab-separated: RID, name, species
        sl = line.rstrip("\n").split("\t")
        RID = sl[0]
        name = sl[1]
        species = sl[2]

        # Keep only human pathways whose RID is in the immune system list
        if species == "Homo sapiens" and RID in immune_rid_set:
            HUMAN_IMMUNE.append({"RID": RID, "name": name, "species": species})

In [10]:
len(HUMAN_IMMUNE)

52
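
The same filter can also be sketched with the standard-library `csv` module. This is an illustrative alternative, not the notebook's method: `filter_pathways` is a hypothetical helper, the in-memory sample stands in for `ReactomePathways.txt` (assumed to have three tab-separated columns), and the non-human row is included only to show the species check.

```python
import csv
import io


def filter_pathways(handle, wanted_rids, species="Homo sapiens"):
    """Return pathway dicts whose ID is in wanted_rids and whose species matches."""
    wanted = set(wanted_rids)  # set lookup avoids rescanning the list for every row
    selected = []
    # QUOTE_NONE treats quote characters in pathway names as ordinary text
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        rid, name, row_species = row[0], row[1], row[2]
        if row_species == species and rid in wanted:
            selected.append({"RID": rid, "name": name, "species": row_species})
    return selected


# Assumed layout: ID, name, species (tab-separated, no header row)
sample_file = io.StringIO(
    "R-HSA-174577\tActivation of C3 and C5\tHomo sapiens\n"
    "R-BTA-174577\tActivation of C3 and C5\tBos taurus\n"
    "R-HSA-1280218\tAdaptive Immune System\tHomo sapiens\n"
)
result = filter_pathways(sample_file, ["R-HSA-174577", "R-HSA-1280218", ""])
print(len(result))
```

Skipping rows with fewer than three fields also guards against a trailing blank line in the input.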

In [11]:
df = pd.DataFrame(HUMAN_IMMUNE)

In [12]:
df.head()

|   | RID | name | species |
|---|---|---|---|
| 0 | R-HSA-174577 | Activation of C3 and C5 | Homo sapiens |
| 1 | R-HSA-1280218 | Adaptive Immune System | Homo sapiens |
| 2 | R-HSA-879415 | Advanced glycosylation endproduct receptor sig... | Homo sapiens |
| 3 | R-HSA-173736 | Alternative complement activation | Homo sapiens |
| 4 | R-HSA-983170 | Antigen Presentation: Folding, assembly and pe... | Homo sapiens |


#### Create pathway dictionary
- A dictionary is created for each pathway.
- The dictionaries are collected in a list with the format:

        [{"rid": XXXX,
          "name": XXXX,
          "species": XXXX}]

- The list of dictionaries (that is, the list of pathways) is written to a JSON file.
- Only the human immune system pathways collected above are written.

In [13]:
immunePathways = []
for r,n,s in zip(df['RID'],df['name'], df['species']):
    immunePathways.append({"rid":r, "name":n.lower(), "species":s.lower()})

In [14]:
immunePathways[0]

{'name': 'activation of c3 and c5',
 'rid': 'R-HSA-174577',
 'species': 'homo sapiens'}

In [15]:
with open("immune_pathway_dict.json", 'w') as f:
    json.dump(immunePathways, f)
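
To confirm that the file holds the structure described above, it can be read back with `json.load`. A minimal sketch that uses a temporary file and a single record copied from the output above, so it does not touch the real `immune_pathway_dict.json`; the `by_rid` lookup is an illustrative addition:

```python
import json
import os
import tempfile

# One record in the same shape as the entries of immunePathways
records = [
    {"rid": "R-HSA-174577", "name": "activation of c3 and c5", "species": "homo sapiens"}
]

# Write to a temporary path so the real output file is left alone
path = os.path.join(tempfile.mkdtemp(), "immune_pathway_dict.json")
with open(path, "w") as f:
    json.dump(records, f)

with open(path) as f:
    loaded = json.load(f)

# Index the pathways by ID for quick lookup
by_rid = {p["rid"]: p for p in loaded}
print(by_rid["R-HSA-174577"]["name"])
```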

#### Create CSV file

In [16]:
df = df.set_index('RID')
df.head()

| RID | name | species |
|---|---|---|
| R-HSA-174577 | Activation of C3 and C5 | Homo sapiens |
| R-HSA-1280218 | Adaptive Immune System | Homo sapiens |
| R-HSA-879415 | Advanced glycosylation endproduct receptor sig... | Homo sapiens |
| R-HSA-173736 | Alternative complement activation | Homo sapiens |
| R-HSA-983170 | Antigen Presentation: Folding, assembly and pe... | Homo sapiens |


In [17]:
df.to_csv("immune_system_pathways.csv")
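
One detail worth knowing when writing the CSV: `DataFrame.set_index` returns a new DataFrame by default, so its result has to be assigned for `RID` to become the row label in the file. A minimal sketch with two rows copied from the table above, writing to an in-memory buffer instead of `immune_system_pathways.csv`; the names `sample_df`, `indexed`, and `restored` are illustrative:

```python
import io

import pandas as pd

# Two rows copied from the table above
sample_df = pd.DataFrame(
    [
        {"RID": "R-HSA-174577", "name": "Activation of C3 and C5", "species": "Homo sapiens"},
        {"RID": "R-HSA-1280218", "name": "Adaptive Immune System", "species": "Homo sapiens"},
    ]
)

indexed = sample_df.set_index("RID")  # sample_df itself is unchanged

buffer = io.StringIO()
indexed.to_csv(buffer)  # RID is written as the first column
buffer.seek(0)

# Reading back with index_col restores RID as the row label
restored = pd.read_csv(buffer, index_col="RID")
print(restored.loc["R-HSA-1280218", "name"])
```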