# 4.5 Gene essentiality & Biolog correction

During the first round of gene essentiality prediction, several errors were found in the model. This notebook corrects them and also removes the sucrose catabolism reactions, which EcN lacks:
- _proA/B_
- _argI/F_
- Sucrose catabolism

```python
import cobra
from cobra.io import load_json_model
import pandas as pd
```

```python
# Load the EcN model from notebook 4.4
EcN_ID = 'CP022686.1'
EcN_model = load_json_model('../data/models/%s_cur_4.4.json' % EcN_ID)

# Load the tables with information on the origin of genes & reactions
rxn_origin_df = pd.read_csv('../tables/rxn_origin.csv', index_col='reaction')
gene_origin_df = pd.read_csv('../tables/gene_origin.csv', index_col='gene')
```

### _proA/B_

- proB: glutamate 5-kinase. https://ecocyc.org/gene?orgid=ECOLI&id=EG10768
- proA: glutamate-5-semialdehyde dehydrogenase. https://ecocyc.org/gene?orgid=ECOLI&id=EG10767

_proA_ and _proB_ encode glutamate-5-semialdehyde dehydrogenase and glutamate 5-kinase, respectively. Without these enzymes, L-glutamate 5-semialdehyde cannot be formed. The EcN model, however, contains an additional reaction (NACODA, N-acetylornithine deacetylase) that forms this compound and acetate from N-acetyl-L-glutamate 5-semialdehyde. This bypass lets the model produce L-glutamate 5-semialdehyde without _proA/B_, so knocking out either gene does not block proline synthesis in silico. The reaction cannot be found in EcoCyc, KEGG or the literature, and is therefore removed. The gene associated with NACODA is also linked to another reaction and stays in the model.
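The effect of the bypass can be illustrated with a toy network-expansion check, which is a simple boolean reachability test and not a flux balance calculation. The reaction IDs (GLU5K, G5SD, ACGS, ACGK, AGPR) and metabolite IDs follow BiGG-style naming, but the network itself is a simplified assumption: cofactors are dropped and each reaction is written in the biosynthetic direction only.

```python
# Toy network expansion: which metabolites can be reached from a seed set?
# Hypothetical, simplified subset of the proline/arginine pathways;
# cofactors (ATP, NADPH, H2O, ...) are omitted on purpose.
TOY_REACTIONS = {
    "GLU5K": ({"glu__L"}, {"glu5p"}),          # proB
    "G5SD": ({"glu5p"}, {"glu5sa"}),           # proA
    "ACGS": ({"glu__L"}, {"acglu"}),
    "ACGK": ({"acglu"}, {"acg5p"}),
    "AGPR": ({"acg5p"}, {"acg5sa"}),
    "NACODA": ({"acg5sa"}, {"glu5sa", "ac"}),  # the reaction removed below
}


def producible(reactions, seeds, blocked=()):
    """Return all metabolites reachable from `seeds`, skipping `blocked` reactions."""
    reached = set(seeds)
    changed = True
    while changed:
        changed = False
        for rxn_id, (substrates, products) in reactions.items():
            if rxn_id in blocked:
                continue
            # Fire a reaction once all substrates are available
            if substrates <= reached and not products <= reached:
                reached |= products
                changed = True
    return reached


proAB_ko = {"GLU5K", "G5SD"}  # simulate a proA/proB knockout
with_bypass = producible(TOY_REACTIONS, {"glu__L"}, blocked=proAB_ko)
without_bypass = producible(TOY_REACTIONS, {"glu__L"}, blocked=proAB_ko | {"NACODA"})
print("glu5sa" in with_bypass)     # True: NACODA bypasses proA/B
print("glu5sa" in without_bypass)  # False: proA/B become required
```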

```python
# Remove NACODA (N-acetylornithine deacetylase)
EcN_model.remove_reactions(['NACODA'])
```

```python
# Record the removal in the reaction overview
rxn_origin_df.loc['NACODA', 'added'] = 'removed'
rxn_origin_df.loc['NACODA', 'notes'] = 'No evidence for existence in literature, EcoCyc & KEGG. Removed in 4.5'
```

### _argI/F_

- argI/argF: ornithine carbamoyltransferase. https://ecocyc.org/gene?orgid=ECOLI&id=EG10069

EcoCyc: _"E. coli K-12 contains two structural genes, argF and argI, encoding ornithine carbamoyltransferase, both of which catalyze the sixth step of arginine biosynthesis."_

EcN also contains two genes, both annotated in the genome as _argF_ (CIW80_16625 and CIW80_16605). The two K-12 genes (b0273 and b4254) both have the highest homology to CIW80_16625 and were therefore both mapped to this gene in the EcN model. As a result, the GPR of the 'OCBT' reaction contained this single gene twice, so knocking it out removed all ornithine carbamoyltransferase activity. The second EcN gene, CIW80_16605, is therefore added to the GPR as _argI_.
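Why the duplicated gene made the knockout lethal can be checked by evaluating the GPR as a boolean expression. `gpr_active` is a hypothetical standalone helper (cobra has its own GPR handling), and the old rule `( CIW80_16625 or CIW80_16625 )` is an assumption: the text only states that OCBT depended on this one gene twice.

```python
import ast


def gpr_active(rule, knocked_out=frozenset()):
    """Evaluate a GPR string such as '( g1 or g2 ) and g3'.

    A gene counts as present unless it is in `knocked_out`.
    Only `and`, `or`, parentheses and gene identifiers are accepted.
    """
    tree = ast.parse(rule.strip(), mode="eval")

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BoolOp):
            values = [walk(v) for v in node.values]
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.Name):
            return node.id not in knocked_out
        raise ValueError(f"Unsupported GPR element: {ast.dump(node)}")

    return walk(tree)


# Assumed old rule: both K-12 homologs mapped to the same EcN gene
old_rule = "( CIW80_16625 or CIW80_16625 )"
new_rule = "( CIW80_16625 or CIW80_16605 )"
print(gpr_active(old_rule, {"CIW80_16625"}))  # False: OCBT is lost
print(gpr_active(new_rule, {"CIW80_16625"}))  # True: argI still supports OCBT
```

Because gene IDs are parsed as Python identifiers, this sketch only handles IDs like `CIW80_16625` or `b4254`; IDs containing dots or dashes would need a proper tokenizer.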

```python
# Add the second EcN gene (argI) to the OCBT GPR
EcN_model.reactions.OCBT.gene_reaction_rule = '( CIW80_16625 or CIW80_16605 )'
EcN_model.genes.CIW80_16605.name = 'argI'

# Record the change in the reaction and gene overviews
rxn_origin_df.loc['OCBT', 'notes'] = 'CIW80_16605 was added as an additional gene in the GPR in notebook 4.5'

gene_origin_df.loc['b4254', 'EcN_gene'] = 'CIW80_16605'
gene_origin_df.loc['b4254', 'origin'] = 'iML1515'
gene_origin_df.loc['b4254', 'added'] = 'manual'
gene_origin_df.loc['b4254', 'notebook'] = '4.5'
gene_origin_df.loc['b4254', 'notes'] = 'Added as argI in notebook 4.5'
```
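The five `.loc` assignments for b4254 follow a single pattern, which could be wrapped in a small helper. `log_change` is a hypothetical name, and the toy table only stands in for `gene_origin_df`; a scalar `.loc` assignment adds the row or column when it does not exist yet.

```python
import pandas as pd


def log_change(df, label, **fields):
    """Hypothetical helper: write several bookkeeping columns for one row.

    Uses one scalar .loc assignment per column, so a missing row or column
    is created (setting with enlargement), as in the cell above.
    """
    for column, value in fields.items():
        df.loc[label, column] = value
    return df


# Toy stand-in for gene_origin_df (the real table is ../tables/gene_origin.csv)
genes = pd.DataFrame({"EcN_gene": ["CIW80_16625"]}, index=pd.Index(["b0273"], name="gene"))
log_change(genes, "b4254", EcN_gene="CIW80_16605", origin="iML1515",
           added="manual", notebook="4.5", notes="Added as argI in notebook 4.5")
print(genes.loc["b4254", "EcN_gene"])  # CIW80_16605
```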

### Sucrose catabolism
EcN does not have the genes for sucrose catabolism (https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0053957).

The following reactions associated with sucrose catabolism are therefore removed:
- SUCtpp
- SUCptspp
- SUCR
- FFSD

```python
# Remove the sucrose catabolism reactions and record each removal.
# remove_reactions keeps orphaned genes and metabolites (remove_orphans=False by default).
for rxn in ['SUCtpp', 'SUCptspp', 'SUCR', 'FFSD']:
    EcN_model.remove_reactions([rxn])

    # Edit in reaction overview
    rxn_origin_df.loc[rxn, 'added'] = 'removed'
    rxn_origin_df.loc[rxn, 'notes'] = 'EcN does not have sucrose catabolism. Removed in 4.5'
```

### Save the adapted model

```python
# Save the curated model as version 4.5
cobra.io.json.save_json_model(EcN_model, '../data/models/%s_cur_4.5.json' % EcN_ID, pretty=False)
```

```python
# Save the updated overview tables
rxn_origin_df.to_csv('../tables/rxn_origin.csv')
gene_origin_df.to_csv('../tables/gene_origin.csv')
```
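Since both tables are written together with their index, the next notebook can read them back with the same index column. A quick round-trip check, sketched on a toy in-memory table rather than the real `rxn_origin.csv`, could look like this:

```python
import io

import pandas as pd

# Toy stand-in for rxn_origin_df, reusing notes written in this notebook
rxn_df = pd.DataFrame(
    {
        "added": ["removed", "removed"],
        "notes": [
            "No evidence for existence in literature, EcoCyc & KEGG. Removed in 4.5",
            "EcN does not have sucrose catabolism. Removed in 4.5",
        ],
    },
    index=pd.Index(["NACODA", "SUCR"], name="reaction"),
)

# Write to an in-memory CSV; the index name becomes the first column header
buffer = io.StringIO()
rxn_df.to_csv(buffer)
buffer.seek(0)

# Read it back with 'reaction' as the index, as at the top of the notebook
reloaded = pd.read_csv(buffer, index_col="reaction")
pd.testing.assert_frame_equal(reloaded, rxn_df)

# List all reactions flagged as removed
removed = reloaded.index[reloaded["added"] == "removed"].tolist()
print(removed)  # ['NACODA', 'SUCR']
```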