In [4]:
import pandas as pd
import cobra
from pickle import load

In [5]:
# load the dataframe of important features
imp_frame = pd.read_csv('../results/ensemble_learning_important_reactions_round2.csv',sep=',')

# load the current version of the reconstruction that was used to generate the ensemble
model = cobra.io.read_sbml_model('../results/reconstructions/model_post_ensemble_curation_round1.xml')

# read the ensemble so we can pull the gapfilled reactions to add to the draft reconstruction
with open("../results/ensembles/psy_ensemble_500_SEED_biomass_round2.pickle",'rb') as infile:
    ensemble = load(infile)

In [6]:
imp_frame.head(20)

Unnamed: 0.1,Unnamed: 0,importance,fraction active in 0,fraction active in 1
0,rxn03005_c_upper_bound,0.111398,0.0,1.0
1,rxn03005_c_lower_bound,0.107421,0.0,1.0
2,rxn00688_c_lower_bound,0.07173,0.083333,1.0
3,rxn01211_c_lower_bound,0.068848,0.0,0.909091
4,rxn01211_c_upper_bound,0.065563,0.0,0.909091
5,rxn00835_c_lower_bound,0.062994,0.916667,0.0
6,rxn00688_c_upper_bound,0.062036,0.083333,1.0
7,rxn00157_c_lower_bound,0.056756,0.916667,0.0
8,rxn00835_c_upper_bound,0.054024,0.916667,0.0
9,rxn00157_c_upper_bound,0.038861,0.916667,0.0
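
Each reaction appears twice in the table above, once per bound feature, so it can help to collapse the features to one row per reaction before deciding what to curate. The following is a minimal sketch rather than part of the original analysis: it hard-codes the ten rows shown above instead of reading `imp_frame`, the helper name `reaction_id` is hypothetical, and summing the two importances is only one reasonable way to combine them.

```python
from collections import defaultdict

# (feature name, importance) pairs copied from the imp_frame.head(20) output above
features = [
    ("rxn03005_c_upper_bound", 0.111398),
    ("rxn03005_c_lower_bound", 0.107421),
    ("rxn00688_c_lower_bound", 0.07173),
    ("rxn01211_c_lower_bound", 0.068848),
    ("rxn01211_c_upper_bound", 0.065563),
    ("rxn00835_c_lower_bound", 0.062994),
    ("rxn00688_c_upper_bound", 0.062036),
    ("rxn00157_c_lower_bound", 0.056756),
    ("rxn00835_c_upper_bound", 0.054024),
    ("rxn00157_c_upper_bound", 0.038861),
]


def reaction_id(feature_name):
    """Strip the bound suffix so both features map to the same reaction id."""
    for suffix in ("_upper_bound", "_lower_bound"):
        if feature_name.endswith(suffix):
            return feature_name[: -len(suffix)]
    return feature_name


# Sum the importance of the upper- and lower-bound features per reaction
totals = defaultdict(float)
for name, importance in features:
    totals[reaction_id(name)] += importance

# Rank reactions by combined importance, highest first
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
for rxn, total in ranked:
    print(f"{rxn}\t{total:.6f}")
```

With these rows, rxn03005_c ranks first, which matches the order in which the reactions are curated below.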


First, we'll curate a few reactions from the previous round whose curation we finished after starting the second round of ensemble generation. The first is rxn01423, L-2-aminoadipate:2-oxoglutarate aminotransferase. There is evidence that this reaction exists in Pseudomonas putida (doi: 10.1128/JB.187.21.7500–7510.2005). Two candidates match in the P. syringae pv. tomato DC3000 genome: PSPTO_4775 (https://www.uniprot.org/uniprot/Q87W08) and PSPTO_5504 (https://www.uniprot.org/uniprot/Q87U11). Both are GntR transcriptional regulators that also have aminotransferase activity (MocR family, which transfers amino groups to keto-acid acceptors), and there is significant sequence similarity to a Pseudomonas fluorescens 2-aminoadipate aminotransferase (UniProt accession A0A0K1QLZ8). BLAST of the two putative P. syringae pv. tomato DC3000 genes suggests the aminotransferase domain is conserved, but the proteins differ in length by ~100 amino acids. We hypothesize that a duplication event led to insertion of the aminotransferase in a transcription factor gene, so we will add both genes in an OR relationship for this reaction.

In [7]:
rxn1 = ensemble.base_model.reactions.get_by_id('rxn01423_c').copy()
rxn1.gene_reaction_rule  = 'PSPTO_4775 or PSPTO_5504'
rxn1.notes = {'ensemble_curation_step':2}
model.add_reactions([rxn1])
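
To make the OR relationship concrete: under an OR rule, either gene alone is enough for the reaction to remain available, whereas an AND rule would require both. The sketch below is an illustration only and not how the notebook or cobra evaluates rules; `gpr_is_active` is a hypothetical helper that parses the rule string with Python's `ast` module, and the AND variant of the rule is included purely for contrast.

```python
import ast


def gpr_is_active(rule, present_genes):
    """Return True if the boolean gene rule is satisfied by the genes present."""

    def evaluate(node):
        if isinstance(node, ast.Expression):
            return evaluate(node.body)
        if isinstance(node, ast.BoolOp):
            results = [evaluate(value) for value in node.values]
            # 'or' needs any operand satisfied, 'and' needs all of them
            return any(results) if isinstance(node.op, ast.Or) else all(results)
        if isinstance(node, ast.Name):
            return node.id in present_genes
        raise ValueError(f"unsupported element in rule: {ast.dump(node)}")

    return evaluate(ast.parse(rule, mode="eval"))


or_rule = "PSPTO_4775 or PSPTO_5504"    # rule assigned to rxn01423_c above
and_rule = "PSPTO_4775 and PSPTO_5504"  # hypothetical alternative, for contrast

print(gpr_is_active(or_rule, {"PSPTO_4775"}))   # True: one paralog suffices
print(gpr_is_active(or_rule, set()))            # False: neither gene present
print(gpr_is_active(and_rule, {"PSPTO_4775"}))  # False: AND needs both genes
```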

The next reaction is rxn03005, which is part of the one carbon pool by folate. It is named N2-Formyl-N1-(5-phospho-D-ribosyl)glycinamide formyltransferase. This reaction is already annotated with EC number 2.1.2.2 and the genes PSPTO_1468 and PSPTO_1699.

In [8]:
rxn1 = ensemble.base_model.reactions.get_by_id('rxn03005_c').copy()
rxn1.gene_reaction_rule  = 'PSPTO_1468 or PSPTO_1699'
rxn1.notes = {'ensemble_curation_step':2}
model.add_reactions([rxn1])

The next reaction is rxn00688. It is also part of the one carbon pool by folate and is formally known as 10-formyltetrahydrofolate:NADPH oxidoreductase (EC 1.5.1.6). There are flavin reductases in the PST genome that match the potential function of this reaction; however, it seems that this reaction would instead fall under the genes similar to those annotated in PAO1: PSPTO_4866, PSPTO_1468, PSPTO_1699, or PSPTO_0178.

In [9]:
rxn1 = ensemble.base_model.reactions.get_by_id('rxn00688_c').copy()
rxn1.gene_reaction_rule  = 'PSPTO_1468 or PSPTO_1699 or PSPTO_4866 or PSPTO_0178'
rxn1.notes = {'ensemble_curation_step':2}
model.add_reactions([rxn1])

The next reaction is rxn01211, catalyzed by the enzyme 5,10-methenyltetrahydrofolate 5-hydrolase (decyclizing). The annotated gene is PSPTO_2453.

In [10]:
rxn1 = ensemble.base_model.reactions.get_by_id('rxn01211_c').copy()
rxn1.gene_reaction_rule  = 'PSPTO_2453'
rxn1.notes = {'ensemble_curation_step':2}
model.add_reactions([rxn1])

The next reaction is rxn00835, ATP:inosine 5'-phosphotransferase. While the reaction as stated does not occur directly, there is evidence that the same conversion is achieved through a three-step pathway, summarized by the genes PSPTO_2022 and PSPTO_1131.

In [11]:
rxn1 = ensemble.base_model.reactions.get_by_id('rxn00835_c').copy()
rxn1.gene_reaction_rule  = 'PSPTO_2022 and PSPTO_1131'
rxn1.notes = {'ensemble_curation_step':2}
model.add_reactions([rxn1])

rxn00157: Acetyl-CoA:formate C-acetyltransferase. Based on genomic evidence, this reaction does not exist in PST.

rxn05316: Previous evidence suggests that PST is able to use inosine as a sole carbon source, validating uptake (Rico and Preston 2008).

In [12]:
rxn1 = ensemble.base_model.reactions.get_by_id('rxn05316_c').copy()
rxn1.gene_reaction_rule  = 'Unknown'
rxn1.notes = {'ensemble_curation_step':2}
model.add_reactions([rxn1])

rxn13150: 5-formyltetrahydrofolate cyclo-ligase. The associated activities are EC 3.5.4.9 (PSPTO_2453 or PSPTO_3733), EC 1.5.1.5 (same genes), and EC 1.5.1.20 (PSPTO_5069).

In [17]:
rxn1 = ensemble.base_model.reactions.get_by_id('rxn13150_c').copy()
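# PSPTO_2453 and PSPTO_3733 are alternatives for the same EC activities, so they are joined with 'or'
rxn1.gene_reaction_rule = '(PSPTO_2453 or PSPTO_3733) and PSPTO_5069'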
rxn1.notes = {'ensemble_curation_step':2}
model.add_reactions([rxn1])

Ignoring reaction 'rxn13150_c' since it already exists.
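
The annotation for rxn13150 lists candidate genes per EC number, and a rule string can be assembled from such a mapping. The sketch below assumes that alternative genes for one activity are joined with `or` and that distinct activities are joined with `and`; that reading of the note is an assumption and may not be the rule the curators intended. `build_gpr` is a hypothetical helper, not a cobra function.

```python
def build_gpr(genes_by_step):
    """OR together the genes within each step, then AND the distinct steps."""
    clauses = []
    for genes in genes_by_step.values():
        unique = sorted(set(genes))
        clause = " or ".join(unique)
        if len(unique) > 1:
            clause = f"({clause})"
        # Skip empty steps and steps that repeat an earlier gene set
        if clause and clause not in clauses:
            clauses.append(clause)
    return " and ".join(clauses)


# EC number -> candidate genes, as listed in the rxn13150 note above
ec_to_genes = {
    "3.5.4.9": ["PSPTO_2453", "PSPTO_3733"],
    "1.5.1.5": ["PSPTO_2453", "PSPTO_3733"],
    "1.5.1.20": ["PSPTO_5069"],
}

rule = build_gpr(ec_to_genes)
print(rule)  # (PSPTO_2453 or PSPTO_3733) and PSPTO_5069
```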


rxn10663: Fatty acid oxidation. This is a biosynthetic reaction in that a whole pathway is combined into one reaction, so it cannot be annotated.

rxn10336: Cardiolipin synthase. As above, this is a combined reaction; however, it can be annotated with PSPTO_3503 and PSPTO_5530.

In [16]:
rxn1 = ensemble.base_model.reactions.get_by_id('rxn10036_c').copy()
rxn1.gene_reaction_rule  = 'PSPTO_3503 and PSPTO_5530'
rxn1.notes = {'ensemble_curation_step':2}
model.add_reactions([rxn1])

Ignoring reaction 'rxn10036_c' since it already exists.
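
The message above means `add_reactions` skipped the copy, so the gene rule and note set on `rxn1` did not reach the reaction already present in `model`. One way to handle this, sketched below under stated assumptions, is to update the existing reaction when the id is already taken. `add_or_update_gpr` is a hypothetical helper; it relies only on `model.reactions.has_id`, `model.reactions.get_by_id`, and `model.add_reactions`, and it assumes `notes` is a plain dict on the reaction.

```python
def add_or_update_gpr(model, reaction, rule, step):
    """Add `reaction` to `model`, or annotate the existing one if its id is taken.

    Returns "added" or "updated" so the caller can log what happened.
    """
    if model.reactions.has_id(reaction.id):
        # The id already exists: annotate the model's own reaction instead
        target = model.reactions.get_by_id(reaction.id)
        status = "updated"
    else:
        model.add_reactions([reaction])
        target = model.reactions.get_by_id(reaction.id)
        status = "added"
    target.gene_reaction_rule = rule
    # Unlike the cells above, this keeps any notes already on the reaction
    target.notes["ensemble_curation_step"] = step
    return status
```

In this notebook it could be called as `add_or_update_gpr(model, rxn1, 'PSPTO_3503 and PSPTO_5530', 2)` in place of the last three lines of the cell above.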


In [18]:
cobra.io.write_sbml_model(model,'../results/reconstructions/model_post_ensemble_curation_round2.xml')

Error encountered trying to <set model history>.
LibSBML error code -5: The object passed as an argument to the method is not of a type that is valid for the operation or kind of object involved. For example, handing an invalidly-constructed ASTNode to a method expecting an ASTNode will result in this error.


rxn09997: There is little annotation for this reaction, so there is not enough to go on to make a call.

rxn19443: Glycolaldehyde synthase, ribulose 1-p mutase. There is not much else to go on to make a call.