In [1]:
import pandas as pd
import cobra
from pickle import load

In [2]:
# load the dataframe of important features
imp_frame = pd.read_csv('../results/ensemble_learning_important_reactions.csv',sep=',')

# load the current version of the reconstruction that was used to generate the ensemble
model = cobra.io.read_sbml_model('../results/v4_with_all_annotations.xml')

# read the ensemble so we can pull the gapfilled reactions to add to the draft reconstruction
with open("../results/psy_ensemble_500_SEED_biomass.pickle",'rb') as infile:
    ensemble = load(infile)

In [3]:
imp_frame.head(20)

Unnamed: 0.1,Unnamed: 0,importance,fraction active in 0,fraction active in 1
0,rxn02276_c_upper_bound,0.027794,0.570213,0.0
1,rxn09072_c_lower_bound,0.023573,0.353191,0.973585
2,rxn00993_c_lower_bound,0.02304,0.570213,0.0
3,rxn02276_c_lower_bound,0.021274,0.570213,0.0
4,rxn00993_c_upper_bound,0.020903,0.570213,0.0
5,rxn09072_c_upper_bound,0.018565,0.353191,0.973585
6,rxn21635_c_upper_bound,0.015912,0.395745,0.022642
7,rxn10878_c_lower_bound,0.015222,0.131915,0.645283
8,rxn01418_c_upper_bound,0.015014,0.442553,0.003774
9,rxn01827_c_upper_bound,0.014955,0.425532,0.0
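
Each reaction appears in the table more than once because its upper and lower bounds are separate features. The sketch below shows one way to collapse those features into a ranked list of unique reaction IDs. It uses only the feature names shown above; `features`, `reaction_id`, and `curation_order` are illustrative names, not part of the original analysis.

```python
# Feature names copied from the first unnamed column of the table above.
features = [
    'rxn02276_c_upper_bound', 'rxn09072_c_lower_bound', 'rxn00993_c_lower_bound',
    'rxn02276_c_lower_bound', 'rxn00993_c_upper_bound', 'rxn09072_c_upper_bound',
    'rxn21635_c_upper_bound', 'rxn10878_c_lower_bound', 'rxn01418_c_upper_bound',
    'rxn01827_c_upper_bound',
]

def reaction_id(feature):
    """Strip the trailing bound suffix to recover the reaction ID."""
    for suffix in ('_upper_bound', '_lower_bound'):
        if feature.endswith(suffix):
            return feature[:-len(suffix)]
    return feature

# dict.fromkeys keeps first-seen order, so the importance ranking is preserved.
curation_order = list(dict.fromkeys(reaction_id(f) for f in features))
print(curation_order)
```

The resulting order (rxn02276, rxn09072, rxn00993, rxn21635, rxn10878, rxn01418, rxn01827) is the order in which the reactions are examined below.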


First, we'll curate rxn02276. This is 4-maleylacetoacetate cis-trans-isomerase, which converts 4-maleylacetoacetate into fumarylacetoacetate. The reaction is annotated in BiGG, KEGG, and MetaCyc. A search on UniProt shows that P. syringae pv. tomato DC3000 has a predicted gene encoding the enzyme for this function, maiA/maleylacetoacetate isomerase (https://www.uniprot.org/uniprot/Q87Z76). Let's add this function to the draft model.

In [4]:
rxn1 = ensemble.base_model.reactions.get_by_id('rxn02276_c').copy()
rxn1.gene_reaction_rule = ' PSPTO_3554 '
rxn1.notes = {'ensemble_curation_step':1}
model.add_reactions([rxn1])

The next most important reaction is rxn09072, pyridoxine 5'-phosphate synthase. UniProt lists a homology-inferred entry for this enzyme in P. syringae pv. tomato DC3000: https://www.uniprot.org/uniprot/Q87XG4. Based on this, we'll add the function to the draft model.

In [5]:
rxn2 = ensemble.base_model.reactions.get_by_id('rxn09072_c').copy()
rxn2.gene_reaction_rule = ' PSPTO_4214 '
rxn2.notes = {'ensemble_curation_step':1}
model.add_reactions([rxn2])

Working down the list, let's investigate rxn00993, 4-fumarylacetoacetate fumarylhydrolase. This gene/enzyme is predicted in UniProt (https://www.uniprot.org/uniprot/Q87UK4). Let's add it.

In [6]:
rxn3 = ensemble.base_model.reactions.get_by_id('rxn00993_c').copy()
rxn3.gene_reaction_rule = ' PSPTO_5292 '
rxn3.notes = {'ensemble_curation_step':1}
model.add_reactions([rxn3])

The next target is rxn21635, pyridoxal 5'-phosphate synthase (glutamine hydrolyzing). This enzyme has two subunits, encoded by pdxS and pdxT in bacteria. Pseudomonas aeruginosa has these subunits annotated on UniProt via homology modeling (https://www.uniprot.org/uniprot/A0A2X4FLN3 and https://www.uniprot.org/uniprot/A0A2X4F5U0), but BLASTing these sequences against the P. syringae pv. tomato DC3000 genome yields no hits. A literature search yields no studies on this reaction in P. syringae, so we will assume that the reaction is NOT present.

Next, rxn10878, pyruvate oxidase. This enzyme is predicted in UniProt for P. syringae pv. tomato, with no entry specific to the DC3000 strain (https://www.uniprot.org/uniprot/A0A2R3FA08). BLASTing the sequence against the DC3000 genome yields a 99% identity hit annotated as a pyruvate dehydrogenase (https://www.uniprot.org/uniprot/Q882W2), so it seems that PSPTO_2510 (RefSeq NP_792322.1) may carry out the pyruvate oxidase function promiscuously. This function is interesting because DC3000 lacks the lactate dehydrogenase needed to convert pyruvate to lactate (https://www.pnas.org/content/100/18/10181), so an alternative pathway for the breakdown of pyruvate may exist instead. Additionally, infection of Arabidopsis with P. syringae pv. tomato DC3000 leads to increased pyruvate levels in infected leaves (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC59291/). Based on the predicted function and the contextual support for a role for alternative pyruvate metabolism, we'll add the reaction to the draft model.

In [7]:
rxn4 = ensemble.base_model.reactions.get_by_id('rxn10878_c').copy()
rxn4.gene_reaction_rule = ' PSPTO_2510 '
rxn4.notes = {'ensemble_curation_step':1}
model.add_reactions([rxn4])

Next, rxn01418, 2-oxoadipate dehydrogenase complex. This reaction is catalyzed by the 2-oxoglutarate dehydrogenase complex, a large, highly promiscuous complex (https://biocyc.org/META/NEW-IMAGE?type=REACTION-IN-PATHWAY&object=2-KETO-ADIPATE-DEHYDROG-RXN). Three subunits, E1, E2, and E3, make up the complex. P. syringae pv. tomato DC3000 (KEGG organism code pst) has annotations for all three (https://www.genome.jp/dbget-bin/www_bget?pst:PSPTO_2199 E1, https://www.genome.jp/dbget-bin/www_bget?pst:PSPTO_2200 E2, https://www.genome.jp/dbget-bin/www_bget?pst:PSPTO_2201 E3). Because all three subunits are required, we join the genes with `and` in the gene-reaction rule and add the reaction.

In [8]:
rxn5 = ensemble.base_model.reactions.get_by_id('rxn01418_c').copy()
rxn5.gene_reaction_rule = ' (PSPTO_2199 and PSPTO_2200 and PSPTO_2201) '
rxn5.notes = {'ensemble_curation_step':1}
model.add_reactions([rxn5])
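
The cells above repeat the same four steps for every accepted reaction: copy it from the ensemble's base model, set the gene-reaction rule, tag the curation step, and add it to the draft. A small helper could wrap that pattern, as sketched below. `add_curated_reaction` is a hypothetical name, not part of cobra or of the original notebook, and the function assumes `model` and `ensemble` are the objects loaded in cell 2.

```python
def add_curated_reaction(model, ensemble, rxn_id, gene_rule, step=1):
    """Copy a gapfilled reaction from the ensemble's base model into the draft model.

    Hypothetical helper: `model` is assumed to be the cobra model and
    `ensemble` the ensemble object loaded earlier in this notebook.
    """
    # Return the existing reaction if the cell is re-run, instead of adding it twice.
    if model.reactions.has_id(rxn_id):
        return model.reactions.get_by_id(rxn_id)

    # Work on a copy so the ensemble's base model is left untouched.
    rxn = ensemble.base_model.reactions.get_by_id(rxn_id).copy()
    rxn.gene_reaction_rule = gene_rule.strip()
    # Record which curation round introduced the reaction.
    rxn.notes = {'ensemble_curation_step': step}
    model.add_reactions([rxn])
    return rxn
```

With this helper, the first curation cell would reduce to `add_curated_reaction(model, ensemble, 'rxn02276_c', 'PSPTO_3554')`.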

In [9]:
cobra.io.write_sbml_model(model,'../results/model_post_ensemble_curation_round1.xml')
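
Because every curated reaction was tagged with `ensemble_curation_step` in its notes, the tag can be used to list what this round added. The sketch below is one way to do that. `reactions_from_curation_step` is a hypothetical helper, and it compares the tag as a string on the assumption that note values may be read back as text after an SBML round trip.

```python
def reactions_from_curation_step(model, step=1):
    """Return IDs of reactions whose notes record the given curation step.

    Hypothetical helper: `model` is assumed to expose an iterable
    `reactions` attribute whose items have `id` and a dict-like `notes`.
    """
    return [
        rxn.id
        for rxn in model.reactions
        # Compare as strings so both 1 and '1' match (assumption: notes
        # may come back as text when the model is reloaded from SBML).
        if str(rxn.notes.get('ensemble_curation_step')) == str(step)
    ]
```

Calling `reactions_from_curation_step(model)` on the draft model from this notebook should list the five reactions added above.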

In [10]:
#http://modelseed.org/biochem/reactions/rxn01827 - Phil

In [11]:
#http://modelseed.org/biochem/reactions/rxn01423 - Phil