# Using the pipeline together with COBRApy functions to study auxotrophy

In this tutorial, the pipeline suggests which reactions to knock in so that an *E. coli* strain auxotrophic for tryptophan (Trp) can grow on methane and produce arginine.

*E. coli* is known to be unable to grow on methane, so reactions must be added to give the model this capability. The approach is therefore to:

1. make the model iML1515 auxotrophic for Trp
2. add Trp back to the medium to restore growth
3. under these conditions, run the analysis for growth on methane and production of arginine


## 1) Make the model auxotrophic for Trp

This requires some preliminary research:
- Find out which gene(s) are usually knocked out to make an *E. coli* strain auxotrophic for Trp.
- Then identify the reaction(s) catalyzed by the enzyme(s) encoded by those gene(s). The [BiGG database](http://bigg.ucsd.edu/) helps here: searching for a gene returns the reactions associated with it.
For instance, trpC is the gene commonly knocked out to make *E. coli* auxotrophic for Trp, so trpC was entered in the BiGG search bar; the first lines of the results are shown below:

<img src="./images/aux1.png" alt="IGPS reaction search" title="Bigg" width="500" height="100" />

The model used here (iML1515) does not appear in the results; however, the reaction associated with trpC in iML1515 can be found by following the gene's link. Here, the first result was clicked, which shows the following gene-reaction association:

<img src="./images/aux2.png" alt="IGPS reaction search" title="Bigg" width="500" height="100" />

Clicking one of the two reactions then shows, on the right, the list of models in which it is found (see the red arrow in the picture below).

<img src="./images/aux3.png" alt="IGPS reaction search" title="IGPS page & models" width="500" height="100" />

- Once the reaction(s) corresponding to the gene knockout are identified, remove them from the model; the resulting model should no longer grow on the standard medium.
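
A gene knockout only disables a reaction if the reaction's gene-protein-reaction (GPR) rule can no longer be satisfied; a reaction with an isozyme (an `or` in its GPR) would survive the knockout. The sketch below evaluates COBRApy-style GPR strings (gene IDs joined by `and`/`or`, with parentheses) against a set of knocked-out genes. It is a standalone illustration, not part of the pipeline, and it assumes gene IDs are valid Python identifiers (true for b-numbers such as b1262); the reaction `HYPOTHETICAL_RXN` and gene `b9999` are hypothetical.

```python
import ast


def gpr_active(gpr: str, knocked_out: set) -> bool:
    """Return True if the GPR rule can still be satisfied after the knockouts."""
    if not gpr.strip():
        return True  # no GPR: the reaction does not depend on any gene
    tree = ast.parse(gpr, mode="eval")

    def evaluate(node) -> bool:
        if isinstance(node, ast.Expression):
            return evaluate(node.body)
        if isinstance(node, ast.BoolOp):
            values = [evaluate(v) for v in node.values]
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.Name):
            # a gene is available unless it has been knocked out
            return node.id not in knocked_out
        raise ValueError(f"unsupported GPR element: {ast.dump(node)}")

    return evaluate(tree)


# GPR of IGPS in iML1515 (b1262), plus a hypothetical reaction with an isozyme
gprs = {"IGPS": "b1262", "HYPOTHETICAL_RXN": "b1262 or b9999"}
blocked = [rxn for rxn, rule in gprs.items() if not gpr_active(rule, {"b1262"})]
print(blocked)  # ['IGPS']
```

Inside COBRApy, `model.genes.get_by_id("b1262").reactions` lists the reactions linked to a gene, which can be a quicker starting point than browsing BiGG.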



In [1]:
from pipeline_package import import_models, input_parser, analysis
import cobra

In [2]:
data_repo = "../inputs"
model_aux = import_models.get_reference_model(data_repo, '../inputs/ecoli_ch4_arg.csv')
universal = import_models.get_universal_main(data_repo, '../inputs/ecoli_ch4_arg.csv')

In [3]:
trpC = model_aux.reactions.IGPS

In [4]:
trpC

| Property | Value |
|---|---|
| Reaction identifier | IGPS |
| Name | Indole-3-glycerol-phosphate synthase |
| Memory address | 0x07f818fed2b50 |
| Stoichiometry | 2cpr5p_c + h_c --> 3ig3p_c + co2_c + h2o_c <br> 1-(2-Carboxyphenylamino)-1-deoxy-D-ribulose 5-phosphate + H+ --> C'-(3-Indolyl)-glycerol 3-phosphate + CO2 CO2 + H2O H2O |
| GPR | b1262 |
| Lower bound | 0.0 |
| Upper bound | 1000.0 |


In [5]:
growth_wt = model_aux.optimize()

In [6]:
growth_wt.objective_value

0.8769972144269748

In [7]:
growth_wt.fluxes['IGPS']

0.04985115265967252

In [8]:
for rxn in model_aux.reactions:
    # print exchange reactions with an uptake flux of at least 0.5 (negative = uptake)
    if rxn.flux <= -0.5 and "EX_" in rxn.id:
        print(rxn.id, rxn.reaction, rxn.flux)

EX_pi_e pi_e <=>  -0.8459567750194775
EX_glc__D_e glc__D_e <=>  -10.0
EX_nh4_e nh4_e <=>  -9.471495371048015
EX_o2_e o2_e <=>  -22.131763238945897


In [9]:
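for rxn in model_aux.reactions:
    # print exchange reactions with a secretion flux of at least 0.5 (positive = secretion)
    if rxn.flux >= 0.5 and "EX_" in rxn.id:
        print(rxn.id, rxn.reaction, rxn.flux)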

EX_co2_e co2_e <=>  24.003293272976023
EX_h_e h_e <=>  8.058200328043572
EX_h2o_e h2o_e <=>  47.1623648086943
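
The two loops above can be folded into one helper that splits exchange fluxes into uptake (negative) and secretion (positive). The sketch below works on any mapping from reaction ID to flux, such as `growth_wt.fluxes.to_dict()`. It uses `startswith("EX_")` instead of `"EX_" in id`, which is stricter and assumes all exchange IDs follow the BiGG prefix convention. The fluxes are copied from the outputs above, except `EX_fe2_e`, a hypothetical small flux added to show the threshold at work.

```python
def split_exchanges(fluxes: dict, threshold: float = 0.5):
    """Split exchange fluxes into uptake (<= -threshold) and secretion (>= threshold)."""
    uptake, secretion = {}, {}
    for rxn_id, flux in fluxes.items():
        if not rxn_id.startswith("EX_"):
            continue  # skip internal and biomass reactions
        if flux <= -threshold:
            uptake[rxn_id] = flux
        elif flux >= threshold:
            secretion[rxn_id] = flux
    return uptake, secretion


# Fluxes copied from the wild-type solution above; EX_fe2_e is hypothetical
fluxes = {
    "EX_glc__D_e": -10.0,
    "EX_o2_e": -22.131763238945897,
    "EX_co2_e": 24.003293272976023,
    "EX_h2o_e": 47.1623648086943,
    "EX_fe2_e": -0.01,
    "IGPS": 0.04985115265967252,
}
uptake, secretion = split_exchanges(fluxes)
print(sorted(uptake))     # ['EX_glc__D_e', 'EX_o2_e']
print(sorted(secretion))  # ['EX_co2_e', 'EX_h2o_e']
```

COBRApy's `model.summary()` prints a similar uptake/secretion overview directly from the model.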


In [10]:
model_aux.remove_reactions([trpC])

In [11]:
growth_ko = model_aux.optimize()

In [12]:
growth_ko.objective_value

-1.7407950579484187e-33

###### Considerations
- Removing the reaction associated with trpC abolishes growth on the wild-type carbon source (glucose); the objective value above is zero within numerical tolerance.
- Adding the amino acid (Trp in this case) to the medium should restore growth.
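
The knockout objective value (-1.74e-33) is solver round-off rather than a meaningful negative growth rate, so growth calls are best made against a tolerance. A minimal sketch using the objective values reported above; the threshold of 1e-6 is an assumption and should be chosen to suit the solver in use.

```python
GROWTH_TOL = 1e-6  # assumed threshold below which the model is considered not to grow

objective_values = {
    "wild type (glucose)": 0.8769972144269748,
    "trpC knockout": -1.7407950579484187e-33,
}

# classify each condition; tiny negative values count as "no growth"
phenotypes = {
    label: "grows" if value > GROWTH_TOL else "no growth"
    for label, value in objective_values.items()
}
for label, phenotype in phenotypes.items():
    print(f"{label:20s} {phenotype}")
```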

## 2) Adding Trp to the medium to restore growth

In [13]:
trpgex = model_aux.reactions.EX_trp__L_e

In [14]:
trpgex.bounds

(0.0, 1000.0)

In [15]:
trpgex.lower_bound = -0.05  # negative of the (rounded) IGPS flux in the wild-type model, ~0.0499

In [16]:
growth_ms = model_aux.optimize()

In [17]:
growth_ms.objective_value

0.8796157838256256
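
The restored growth rate can be checked with a back-of-the-envelope calculation. Assuming (our assumption, not something the notebook states) that every IGPS turnover in the wild type yields one Trp that ends up in biomass, the Trp demand per unit of growth is the ratio of the IGPS flux to the growth rate; capping Trp uptake at 0.05 then caps growth at 0.05 divided by that ratio.

```python
# Values copied from the notebook outputs above
growth_wt = 0.8769972144269748       # wild-type growth on glucose (In [6])
igps_flux_wt = 0.04985115265967252   # IGPS flux in the wild type (In [7])
trp_uptake_max = 0.05                # -lower_bound set on EX_trp__L_e (In [15])

# Assumption: each IGPS turnover yields one Trp and all Trp goes to biomass,
# so the Trp demand per unit of growth is this ratio
trp_per_growth = igps_flux_wt / growth_wt

# If Trp uptake is the only limitation, growth is capped at uptake / demand
predicted_growth = trp_uptake_max / trp_per_growth

print(f"Trp demand per unit growth: {trp_per_growth:.6f}")
print(f"Trp-limited growth rate:    {predicted_growth:.6f}")
```

The predicted value agrees with the optimized growth rate of 0.8796, which suggests that growth in step 2 is limited by the Trp supply; since 0.05 is slightly above the wild-type IGPS flux, growth ends up slightly above the wild-type value.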

## 3) Using the pipeline functions to find which reactions should be added

*E. coli* cannot grow on methane, and neither can the Trp-auxotrophic strain. Gap-filling is therefore needed to identify candidate reactions to add.
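
To give an idea of what gap-filling does, the sketch below searches a small universal set for the smallest combinations of reactions that make a target reachable from the available substrates. It swaps in a much simpler technique than the pipeline uses: topological reachability (network expansion), which ignores stoichiometry, mass balance, and flux bounds, whereas COBRApy's `cobra.flux_analysis.gapfill` solves a mixed-integer LP. The toy network is hypothetical; only the reaction IDs MMO2, R01142, and ALCD1 are borrowed from this tutorial's gap-filling results.

```python
from itertools import combinations


def reachable(seeds, reactions):
    """Network expansion: metabolites producible from `seeds` by firing reactions
    whose substrates are all available (stoichiometry is ignored)."""
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if set(substrates) <= available and not set(products) <= available:
                available |= set(products)
                changed = True
    return available


def minimal_additions(model_rxns, universal_rxns, seeds, target, max_size=3):
    """Return all smallest sets of universal reactions that make `target` reachable."""
    for size in range(max_size + 1):
        hits = []
        for combo in combinations(sorted(universal_rxns), size):
            candidate = dict(model_rxns)
            candidate.update({rid: universal_rxns[rid] for rid in combo})
            if target in reachable(seeds, candidate):
                hits.append(combo)
        if hits:
            return hits  # stop at the smallest size that works
    return []


# Hypothetical toy network: the "model" can assimilate formaldehyde but not methane
model_rxns = {"FALD_ASSIM": (["fald_c"], ["biomass_c"])}
universal_rxns = {
    "MMO2": (["ch4_c", "o2_c"], ["meoh_c"]),
    "R01142": (["ch4_c", "o2_c"], ["meoh_c"]),
    "ALCD1": (["meoh_c"], ["fald_c"]),
    "GLC_TO_FALD": (["glc__D_c"], ["fald_c"]),  # needs glucose, which is absent
}
solutions = minimal_additions(model_rxns, universal_rxns, {"ch4_c", "o2_c"}, "biomass_c")
print(solutions)  # [('ALCD1', 'MMO2'), ('ALCD1', 'R01142')]
```

Like the real results, the toy search returns alternative solutions of equal size; a flux-based method would additionally rank them by feasibility and flux.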

In [18]:
input_parser.parser('../inputs/ecoli_ch4_arg.csv', universal, model_aux)

For ch4 there isn't any uptake trasnsporter in the reference model
The trasporter has been added to the 
                                reference model from the input file

For arg__L there is already a transport reaction allowing the uptake from the periplasmic space:  
Reaction ID:  ARGAGMt7pp 
Reaction equation:  agm_c + arg__L_p <=> agm_p + arg__L_c

The following reactions will be added to the universal model: ['R01142', 'R01143', 'MMO1', 'MMO2']
unknown metabolite 'ch4_c' created
unknown metabolite 'focytcc_c' created
unknown metabolite 'ficytcc_c' created


| Property | Value |
|---|---|
| Name | iML1515 |
| Memory address | 0x07f8190864af0 |
| Number of metabolites | 1877 |
| Number of reactions | 2712 |
| Number of groups | 0 |
| Objective expression | 1.0*BIOMASS_Ec_iML1515_core_75p37M - 1.0*BIOMASS_Ec_iML1515_core_75p37M_reverse_35685 |
| Compartments | cytosol, extracellular space, periplasm |

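
The parser's first check, whether a transport reaction already moves the substrate or target into the cytosol, can be sketched as a scan for reactions that consume a metabolite in one compartment and produce it in another. This is an illustration, not the pipeline's actual implementation, and it assumes BiGG-style compartment suffixes (`_c`, `_p`, `_e`). ARGAGMt7pp is written from the parser output above; `CYTOSOLIC_RXN` is hypothetical.

```python
def find_transporters(reactions: dict, met: str, outer: str = "p", inner: str = "c") -> list:
    """IDs of reactions that consume `met` on one side of a membrane and produce it on the other."""
    hits = []
    for rxn_id, stoich in reactions.items():
        outer_coef = stoich.get(f"{met}_{outer}", 0)
        inner_coef = stoich.get(f"{met}_{inner}", 0)
        if outer_coef * inner_coef < 0:  # opposite signs: the metabolite crosses the membrane
            hits.append(rxn_id)
    return hits


reactions = {
    # agm_c + arg__L_p <=> agm_p + arg__L_c (printed by the parser above)
    "ARGAGMt7pp": {"agm_c": -1, "arg__L_p": -1, "agm_p": 1, "arg__L_c": 1},
    # hypothetical cytosolic reaction, included to show it is not picked up
    "CYTOSOLIC_RXN": {"arg__L_c": -1, "h_c": -1, "agm_c": 1, "co2_c": 1},
}
print(find_transporters(reactions, "arg__L"))  # ['ARGAGMt7pp']
print(find_transporters(reactions, "ch4"))     # []
```

In COBRApy, the candidate reactions for a metabolite are available as `model.metabolites.get_by_id("arg__L_p").reactions`, and each reaction's stoichiometry as `reaction.metabolites` (keyed by Metabolite objects rather than IDs).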

In [19]:
consumption = analysis.analysis_gf_sol('../inputs/ecoli_ch4_arg.csv', model_aux, universal)

Old biomass (objective) bounds =  (0.0, 1000.0)
EX_pi_e pi_e <=>  -0.8484826627728355 

EX_glc__D_e glc__D_e <=>  -10.0 

EX_nh4_e nh4_e <=>  -9.399775697974992 

EX_o2_e o2_e <=>  -22.59369491406178 

New biomass (objective) bounds =  (0.0, 0.879615783825625)

ch4 is in the medium 

Exchange ch4:  ch4_e -->  Old bounds:  (0.0, 1000.0)
Exchange ch4:  ch4_e <--  New bounds:  (-1000, 0)

arg__L is in the medium 

Exchange arg__L:  arg__L_e -->  Old bounds:  (0.0, 1000.0)
Exchange arg__L:  arg__L_e -->  New bounds:  (0, 1000)
Starting reaction search with GapFilling . . .


In [23]:
consumption

{1: ({'model': 'Model1'},
  (['MMO2', 'ALCD1'],
   0.8769972144269785,
   {'MMO2': 88.16275218658868, 'ALCD1': 88.1627539405831},
   {'EX_ch4_e': -88.16275218658868},
   {'EX_arg__L_e': 0.0})),
 2: ({'model': 'Model2'},
  (['R01142', 'ALCD1'],
   0.8769972144269785,
   {'R01142': 110.201555639305, 'ALCD1': 110.20155739329942},
   {'EX_ch4_e': -110.201555639305},
   {'EX_arg__L_e': 0.0})),
 3: ({'model': 'Model3'},
  (['R01143', 'ALCD1'],
   0.8769972144269785,
   {'R01143': 132.2439718235817, 'ALCD1': 132.24397357757613},
   {'EX_ch4_e': -132.2439718235817},
   {'EX_arg__L_e': 0.0})),
 4: ({'model': 'Model4'},
  (['R01142', 'POX2', 'PRDX'],
   0.07906382561252327,
   {'R01142': 500.5788290740754,
    'POX2': 6.828205063204772,
    'PRDX': 500.578829232203},
   {'EX_ch4_e': -500.5788290740754},
   {'EX_arg__L_e': 0.0}))}
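
The layout of each entry is not documented in this notebook; judging from the printed values, the inner tuple appears to hold the added reactions, the biomass flux, the fluxes through the added reactions, the methane exchange flux, and the arginine exchange flux. Assuming that layout, the sketch below ranks the candidate reaction sets by biomass yield per unit of methane taken up. The data are copied from the output above, with the added-reaction flux dictionaries replaced by `{}` for brevity.

```python
consumption = {
    1: ({"model": "Model1"}, (["MMO2", "ALCD1"], 0.8769972144269785, {},
                              {"EX_ch4_e": -88.16275218658868}, {"EX_arg__L_e": 0.0})),
    2: ({"model": "Model2"}, (["R01142", "ALCD1"], 0.8769972144269785, {},
                              {"EX_ch4_e": -110.201555639305}, {"EX_arg__L_e": 0.0})),
    3: ({"model": "Model3"}, (["R01143", "ALCD1"], 0.8769972144269785, {},
                              {"EX_ch4_e": -132.2439718235817}, {"EX_arg__L_e": 0.0})),
    4: ({"model": "Model4"}, (["R01142", "POX2", "PRDX"], 0.07906382561252327, {},
                              {"EX_ch4_e": -500.5788290740754}, {"EX_arg__L_e": 0.0})),
}


def biomass_yield(entry) -> float:
    """Biomass flux per unit of methane uptake for one gap-filling solution."""
    _, (_added, biomass, _fluxes, substrate, _target) = entry
    return biomass / abs(substrate["EX_ch4_e"])


ranking = sorted(consumption, key=lambda k: biomass_yield(consumption[k]), reverse=True)
for k in ranking:
    added = consumption[k][1][0]
    print(k, "+".join(added), f"{biomass_yield(consumption[k]):.5f}")
```

Under this metric the MMO2 + ALCD1 set needs the least methane per unit of biomass; the ranking says nothing about enzyme availability or thermodynamics.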

In [24]:
production = analysis.dict_prod_sol('../inputs/ecoli_ch4_arg.csv', consumption, model_aux, universal)


Bounds of biomass during optimization of consumption =  (0.0, 1000.0)

Bounds of biomass during optimization of production =  (0.04384986072134892, 1000.0)

The metabs to produce are:  ['arg__L']

---1---
The model can already satisfy the objective

---2---
The model can already satisfy the objective

---3---
The model can already satisfy the objective

---4---
The model can already satisfy the objective
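
The biomass lower bound printed for the production step appears to be 5% of the growth rate reached by the gap-filled models. The quick check below, with values copied from the outputs, supports this reading, although the pipeline code itself is not shown here.

```python
growth_gapfilled = 0.8769972144269785        # biomass of Models 1-3 in `consumption`
biomass_lb_production = 0.04384986072134892  # lower bound printed by dict_prod_sol

# ratio of the production-step biomass bound to the gap-filled growth rate
fraction = biomass_lb_production / growth_gapfilled
print(round(fraction, 6))  # 0.05
```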


In [25]:
production

{'1': {'1': 9.082342543721138},
 '2': {'2': 8.844247835304447},
 '3': {'3': 8.685517676574253},
 '4': {'4': 8.010915181295902}}

In [27]:
final = analysis.cons_prod_dict('../inputs/ecoli_ch4_arg.csv', model_aux, universal, consumption, production)

unknown metabolite 'qh2_c' created
Substrate's exchange reaction bounds :  (-6.6000000000000005, -5.4)
Target's exchange reaction bounds :  (2.0, 1000.0)
Carbon source:  []
Target:  EX_arg__L_e: arg__L_e --> 
FBA objective value:  2.0 
Substrate consumption flux:  -5.4 
Target production flux:  2.0 
Biomass:  0.0438498607213489 

pFBA is infeasible, control if the coefficients of the 
        reaction equation are correct (or use a different boudary 
        reaction of the target as model objective):  None (infeasible)
The thermodynamic analysis cannot proceed because of infeasible pFBA
The thermodynimac analysis has been unsuccesful
unknown metabolite 'qh2_c' created
Substrate's exchange reaction bounds :  (-6.6000000000000005, -5.4)
Target's exchange reaction bounds :  (2.0, 1000.0)
Carbon source:  []
Target:  EX_arg__L_e: arg__L_e --> 
FBA objective value:  2.0 
Substrate consumption flux:  -5.4 
Target production flux:  2.0 
Biomass:  0.0438498607213489 

pFBA is infeasible, contr

In [28]:
final

{'consumption_1': ({'model': 'Model1'},
  (['MMO2', 'ALCD1'],
   0.8769972144269785,
   {'MMO2': 88.16275218658868, 'ALCD1': 88.1627539405831},
   {'EX_ch4_e': -88.16275218658868},
   {'EX_arg__L_e': 0.0})),
 'production_1': {'arg__L': {'EX_arg__L_e flux': 9.082342543721138,
   'thermodynamic': {'mdf': None, 'pathway_length': None}}},
 'consumption_2': ({'model': 'Model2'},
  (['R01142', 'ALCD1'],
   0.8769972144269785,
   {'R01142': 110.201555639305, 'ALCD1': 110.20155739329942},
   {'EX_ch4_e': -110.201555639305},
   {'EX_arg__L_e': 0.0})),
 'production_2': {'arg__L': {'EX_arg__L_e flux': 8.844247835304447,
   'thermodynamic': {'mdf': None, 'pathway_length': None}}},
 'consumption_3': ({'model': 'Model3'},
  (['R01143', 'ALCD1'],
   0.8769972144269785,
   {'R01143': 132.2439718235817, 'ALCD1': 132.24397357757613},
   {'EX_ch4_e': -132.2439718235817},
   {'EX_arg__L_e': 0.0})),
 'production_3': {'arg__L': {'EX_arg__L_e flux': 8.685517676574253,
   'thermodynamic': {'mdf': None, 'pathw
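
The `final` dictionary is truncated in the output above. Assuming it pairs each `consumption_<n>` entry with a `production_<n>` entry as in the visible part, the sketch below flattens it into one row per candidate. Only the first two candidates are copied here, and the added-reaction fluxes are replaced by `{}`.

```python
final = {
    "consumption_1": ({"model": "Model1"}, (["MMO2", "ALCD1"], 0.8769972144269785, {},
                                            {"EX_ch4_e": -88.16275218658868}, {"EX_arg__L_e": 0.0})),
    "production_1": {"arg__L": {"EX_arg__L_e flux": 9.082342543721138,
                                "thermodynamic": {"mdf": None, "pathway_length": None}}},
    "consumption_2": ({"model": "Model2"}, (["R01142", "ALCD1"], 0.8769972144269785, {},
                                            {"EX_ch4_e": -110.201555639305}, {"EX_arg__L_e": 0.0})),
    "production_2": {"arg__L": {"EX_arg__L_e flux": 8.844247835304447,
                                "thermodynamic": {"mdf": None, "pathway_length": None}}},
}


def summarize(final: dict, target: str = "arg__L") -> list:
    """One (candidate, added reactions, biomass, target flux, MDF) row per consumption entry."""
    rows = []
    for key, value in final.items():
        if not key.startswith("consumption_"):
            continue
        idx = key.split("_", 1)[1]
        added, biomass = value[1][0], value[1][1]
        prod = final.get(f"production_{idx}", {}).get(target, {})
        rows.append((idx, "+".join(added), biomass,
                     prod.get(f"EX_{target}_e flux"),
                     prod.get("thermodynamic", {}).get("mdf")))
    return rows


rows = summarize(final)
for row in rows:
    print(row)
```

The `None` MDF values reflect the failed thermodynamic analysis reported in the log above.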

## Concluding considerations
This approach combines manual research, used to identify the reactions whose removal generates the auxotrophic strain, with the pipeline functions, which then test whether the auxotrophic model can grow on a given substrate and produce a target.


<h4 style="color: red;"> In principle, both reaction additions and reaction removals for growth-coupled production should be identified; however, the pipeline module that uses OptKnock has very long running times, so the following analysis is incomplete. </h4>
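
OptKnock does not enumerate knockout sets one by one (it formulates a bilevel problem that is solved as a mixed-integer LP), but the size of the combinatorial search space gives a feel for why it is slow. Taking the 2712 reactions of the model reported by `input_parser` as an (unrealistically large) candidate pool:

```python
from math import comb

n_reactions = 2712  # reactions in the modified iML1515 reported above

# number of distinct knockout sets of size k drawn from the candidate pool
for k in (1, 2, 3):
    print(k, f"{comb(n_reactions, k):,}")
```

In practice, candidate pools are usually reduced (for example, by excluding exchange, transport, and essential reactions), which shrinks the problem considerably.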

In [6]:
from pipeline_package import call_Optknock

ModuleNotFoundError: No module named 'pipeline'

In [2]:
ko_results = call_Optknock.full_knock_out_analysis('../inputs/ecoli_ch4_arg.csv', consumption, final, model_aux, universal)

NameError: name 'call_Optknock' is not defined

In [None]:
ko_results