This notebook generates a list of reactions that have GPRs (gene-protein-reaction rules) in the draft reconstruction but are not present in the current version of the model. The idea is that if there is strong genomic evidence, i.e. an annotation already in KEGG, that fulfills the ascribed reaction in the draft reconstruction, the reaction should also be included in the curated reconstruction.

In [4]:
#this is to parse through the draft reconstruction to get all of the original gene associations proposed by ModelSEED
import cobra
import bs4
from bs4 import BeautifulSoup

In [5]:
with open("../data/previous_data_from_curation/PSY_02.sbml") as fp:
    soup = BeautifulSoup(fp)

In [70]:
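draft_gprs = {}
for reaction_tag in soup.listofreactions:
    rxn_id = None  # reset per reaction so a stale ID is never reused
    for line in str(reaction_tag).splitlines():
        if "reaction id" in line:
            rxn_id = line[16:24]  # fixed-width slice holding the ID, e.g. rxn02201
        if "GENE_ASSOCIATION" in line and rxn_id is not None:
            # drop the fixed-width prefix and closing tag around the GPR string;
            # store only when this reaction actually carries a gene association
            draft_gprs[rxn_id] = line[25:-9]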

rxn02201
(PSPTO_4496 or (PSPTO_1041 or PSPTO_1748))
rxn00351
PSPTO_5035
rxn00836
PSPTO_1131
rxn00390
PSPTO_3295
rxn08180
PSPTO_0494
rxn00062
(PSPTO_1349 or PSPTO_1455)
rxn00423
(PSPTO_5178 or PSPTO_1421)
rxn00364
PSPTO_1749
rxn05561
(PSPTO_1051 or PSPTO_4063)
rxn03505
PSPTO_3668
rxn03408
PSPTO_4408
rxn02177
(PSPTO_1354 or PSPTO_1356 or PSPTO_1859 or PSPTO_2039)
rxn05250
(PSPTO_4685 or PSPTO_2991 or PSPTO_4097 or PSPTO_4098)
rxn00646
PSPTO_0325
rxn05440
PSPTO_3721
rxn00935
PSPTO_1136
rxn01673
PSPTO_1430
rxn00247
PSPTO_0239
rxn05625
PSPTO_2304
rxn02342
(PSPTO_5005 or PSPTO_3860 or PSPTO_3862)
rxn00642
PSPTO_1005
rxn00973
(Unknown and Unknown and PSPTO_3752 and PSPTO_3752 and PSPTO_2016)
rxn02000
(PSPTO_1077 or PSPTO_5397)
rxn03239
((PSPTO_3517 and Unknown) or PSPTO_3517)
rxn05063
PSPTO_1930
rxn05457
(PSPTO_3833 or PSPTO_5081)
rxn04271
(PSPTO_4103 or PSPTO_2176)
rxn13783
(PSPTO_0656 and PSPTO_3658 and PSPTO_3826 and PSPTO_3514 and PSPTO_1549 and PSPTO_4814 and PSPTO_1812 and PSPTO_5516 an
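The fixed-width slices in the parsing cell depend on the exact layout of each line in the SBML file. A minimal sketch of a less position-dependent alternative is shown here, assuming each reaction's notes contain a line of the form `<html:p>GENE_ASSOCIATION:...</html:p>`. The sample text, the `rxn99999` entry, and the helper name `extract_gprs` are all made up for illustration and are not part of the original notebook.

```python
import re

# Hypothetical SBML fragment; the real file layout is an assumption.
SAMPLE = '''<listOfReactions>
<reaction id="rxn00351" name="example" reversible="false">
<notes>
<html:p>GENE_ASSOCIATION:PSPTO_5035</html:p>
</notes>
</reaction>
<reaction id="rxn99999" name="no gene" reversible="false">
<notes>
<html:p>PROTEIN_CLASS:</html:p>
</notes>
</reaction>
</listOfReactions>'''

# One match per <reaction ...>...</reaction> block: (id, remainder of block)
REACTION_BLOCK = re.compile(r'<reaction\s+id="([^"]+)"(.*?)</reaction>', re.DOTALL)
# The GPR is whatever sits between the label and the next tag
GPR = re.compile(r'GENE_ASSOCIATION:\s*(.*?)\s*<')


def extract_gprs(sbml_text):
    """Map reaction ID -> GPR string, skipping reactions without a GPR."""
    gprs = {}
    for rxn_id, body in REACTION_BLOCK.findall(sbml_text):
        match = GPR.search(body)
        if match and match.group(1):
            gprs[rxn_id] = match.group(1)
    return gprs


sample_gprs = extract_gprs(SAMPLE)
```

Because each reaction block is searched on its own, a reaction without a gene association cannot inherit the GPR of the reaction before it.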

In [85]:
#generate a dictionary of reactions that are in the draft but not in the current state of the model; this will be used for comparison and further curation
pst_8 = cobra.io.read_sbml_model('../results/reconstructions/pstv8.xml')
missing_from_v8 = {}
for draft_id, gene in draft_gprs.items():
    rxn = draft_id + "_c"  # reaction IDs in v8 carry the compartment suffix
    try:
        pst_8.reactions.get_by_id(rxn)
    except KeyError:
        # get_by_id raises KeyError when the reaction is absent from the model
        missing_from_v8[rxn] = gene
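The comparison above is effectively a set difference on suffixed reaction IDs. A small cobra-free sketch of the same logic follows. The function name `find_missing` and the contents of `in_model` are assumptions for illustration. With a loaded model, the ID list could be built with something like `[r.id for r in pst_8.reactions]`.

```python
def find_missing(draft_gprs, model_reaction_ids, suffix="_c"):
    """Return {suffixed reaction ID: GPR} for draft reactions absent from the model."""
    model_ids = set(model_reaction_ids)  # set lookup keeps the check O(1) per reaction
    missing = {}
    for rxn_id, gpr in draft_gprs.items():
        model_id = rxn_id + suffix
        if model_id not in model_ids:
            missing[model_id] = gpr
    return missing


# Toy inputs: the draft entries come from the output above,
# the model contents are invented for this example.
draft = {"rxn00351": "PSPTO_5035", "rxn00836": "PSPTO_1131"}
in_model = ["rxn00351_c"]
example_missing = find_missing(draft, in_model)
```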
        

In [88]:
with open('../data/missing_reactions_draft_v_v8.csv', 'w') as f:
    for rxn_id, gpr in missing_from_v8.items():
        f.write(f"{rxn_id},{gpr}\n")

The final output is a list of reactions that do not appear in v8 of iPto yet have GPRs in the draft reconstruction. Each was manually checked by looking at the GPR and seeing whether the ascribed gene fulfills the function of the reaction. If yes, it was given a 1; if no, or if there was otherwise no substantiated annotation, it was given a 0. In the following cells, all 1s are integrated back into the model, while all 0s are left out.

In [106]:
import pandas as pd
missing_rxns = pd.read_excel("../data/missing_reactions_draft_v_v8.xlsx")
gprs_to_add_to_v8 = {}
for row in missing_rxns.itertuples():
    if row.evidence == 1:
        gprs_to_add_to_v8[row.id] = row.gpr
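The same evidence filter can be sketched without pandas, assuming the curated table is kept as a CSV with a header row of `id`, `gpr`, and `evidence` (the column names used in the cell above). The rows below are placeholders, not real curation results, and `select_supported` is a hypothetical helper name.

```python
import csv
import io

# Placeholder curated table; IDs and genes are invented for illustration.
CURATED = """id,gpr,evidence
rxnA_c,GENE_1,1
rxnB_c,(GENE_2 or GENE_3),0
rxnC_c,GENE_4,1
"""


def select_supported(handle):
    """Keep only reactions whose manual evidence score is 1."""
    reader = csv.DictReader(handle)
    return {
        row["id"]: row["gpr"]
        for row in reader
        if row["evidence"].strip() == "1"
    }


supported = select_supported(io.StringIO(CURATED))
```

Reading the scores as strings and comparing against `"1"` avoids surprises if a spreadsheet export adds stray whitespace around the value.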

In [109]:
from medusa.test import load_universal_modelseed
universal = load_universal_modelseed()
mod_copy = pst_8.copy()


In [112]:
for key, value in gprs_to_add_to_v8.items():
    try:
        # copy the reaction so the universal model itself is left untouched
        rxn_to_add = universal.reactions.get_by_id(key).copy()
    except KeyError:
        print(key + " did not appear in the universal model. I did not add " + key + " to the model.")
        continue
    rxn_to_add.gene_reaction_rule = value
    print("I will now add " + key + " to the model.")
    mod_copy.add_reactions([rxn_to_add])

I will now add rxn06600_c to the model.
I will now add rxn00512_c to the model.
I will now add rxn05343_c to the model.
I will now add rxn05344_c to the model.
I will now add rxn05347_c to the model.
I will now add rxn12634_c to the model.
I will now add rxn12636_c to the model.
I will now add rxn12639_c to the model.
I will now add rxn12642_c to the model.
I will now add rxn12847_c to the model.
I will now add rxn12633_c to the model.
I will now add rxn12645_c to the model.
I will now add rxn12646_c to the model.
I will now add rxn12846_c to the model.
I will now add rxn01225_c to the model.
I will now add rxn05549_c to the model.
I will now add rxn00205_c to the model.
I will now add rxn00302_c to the model.
I will now add rxn00641_c to the model.
I will now add rxn12637_c to the model.
I will now add rxn12640_c to the model.
I will now add rxn12643_c to the model.
I will now add rxn12845_c to the model.
I will now add rxn00758_c to the model.
I will now add rxn01073_c to the model.


In [116]:
print(f"{len(mod_copy.genes)} genes are in the updated model, while {len(pst_8.genes)} are in the older version of the model.")

1549 genes are in the updated model, while 1208 are in the older version of the model.
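When comparing gene counts, it can help to check which locus tags a given GPR string actually contributes. The sketch below assumes locus tags follow the `PSPTO_` plus digits pattern seen in the output above. It also flags any other token, such as the `Unknown` placeholder that appears in some draft GPRs and may end up counted as a gene. Both helper names are hypothetical.

```python
import re

LOCUS_TAG = re.compile(r"PSPTO_\d+")  # assumed locus-tag pattern


def locus_tags(gpr):
    """Unique locus tags mentioned in a GPR string."""
    return set(LOCUS_TAG.findall(gpr))


def placeholder_tokens(gpr):
    """Tokens that are neither boolean operators nor locus tags."""
    tokens = re.findall(r"[^\s()]+", gpr)
    return {
        t for t in tokens
        if t not in ("and", "or") and not LOCUS_TAG.fullmatch(t)
    }


example_gpr = "(Unknown and Unknown and PSPTO_3752 and PSPTO_3752 and PSPTO_2016)"
tags = locus_tags(example_gpr)
placeholders = placeholder_tokens(example_gpr)
```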


In [138]:
import csv

#this is the chunk of code that should be used for exporting the model as a csv file
filevar = 'PSTv9'
filepath = "../results/reconstructions/"
filename = filepath + filevar + ".csv"
with open(filename, 'w', newline='') as csvfile:
    # keep the file handle and the writer under separate names
    writer = csv.writer(csvfile, delimiter=',')
    for rxn in mod_copy.reactions:
        writer.writerow([rxn.id, rxn.name, rxn.reaction, rxn.gene_reaction_rule, rxn.lower_bound, rxn.upper_bound])