# Analysis of reaction additions that allow growth on uncommon substrates and production of a target compound

Once the models have been loaded and prepared (*see Tutorial 1*), the analysis can be performed. The pipeline first checks whether the model can grow on the user-indicated carbon source. If it cannot, the pipeline searches the universal model for reactions that could be added to the model of the chassis organism to confer that ability.

This first search for reactions that allow growth on an uncommon substrate is done with the function *analysis_gf_sol*. This function sets the appropriate constraints in the chassis model and then calls the COBRApy[1] function for [gapfilling analysis](https://cobrapy.readthedocs.io/en/latest/gapfilling.html). In particular, *analysis_gf_sol* sets the upper bound of the biomass reaction to the normal growth rate of the wild-type chassis. Thus, any reaction addition identified by gapfilling leads to a growth rate that is at most equal to that of the wild type. The objective of the optimization is the biomass reaction.
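The idea behind gapfilling can be illustrated with a toy sketch that uses only the Python standard library. It is not the COBRApy algorithm, which solves an optimization problem on fluxes. It only searches for the smallest sets of candidate reactions that make a target metabolite reachable from the medium. All reaction and metabolite names below are hypothetical and chosen for illustration.

```python
from itertools import combinations

# Toy reactions: name -> (substrates, products). All names are hypothetical.
chassis = {
    "R_growth": ({"formaldehyde"}, {"biomass"}),
}
universal = {
    "R_oxidation": ({"methane"}, {"methanol"}),
    "R_dehydrogenase": ({"methanol"}, {"formaldehyde"}),
    "R_unrelated": ({"glucose"}, {"pyruvate"}),
}


def reachable(reactions, medium):
    """Return every metabolite that can be produced starting from the medium."""
    known = set(medium)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            # A reaction can fire once all of its substrates are available.
            if substrates <= known and not products <= known:
                known |= products
                changed = True
    return known


def toy_gapfill(chassis, universal, medium, target):
    """Return the smallest sets of universal reactions that make the target reachable."""
    if target in reachable(chassis, medium):
        return [[]]  # the chassis already reaches the target, nothing to add
    names = sorted(universal)
    for size in range(1, len(names) + 1):
        hits = []
        for combo in combinations(names, size):
            trial = dict(chassis)
            trial.update({name: universal[name] for name in combo})
            if target in reachable(trial, medium):
                hits.append(list(combo))
        if hits:
            return hits  # stop at the smallest size that works
    return []


solutions = toy_gapfill(chassis, universal, {"methane"}, "biomass")
print(solutions)
```

In this toy network no single reaction is sufficient, so the search returns the pair of reactions that connects the substrate to biomass. The real analysis additionally accounts for stoichiometry, flux bounds and the growth-rate constraint described above.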

The function *dict_prod_sol* evaluates whether adding the reactions suggested by the gapfilling analysis also makes the chassis model able to produce the target compound indicated by the user. If the model cannot yet simulate production, *dict_prod_sol* calls the COBRApy[1] function for [gapfilling analysis](https://cobrapy.readthedocs.io/en/latest/gapfilling.html) again.

Checking the model's ability to produce the target compound requires a different model objective and additional constraints. The objective of the optimization is set to the exchange reaction of the target. The biomass reaction has to be constrained with a nonzero lower bound, so that the optimization of target formation still allows growth; otherwise the simulation maximizes production at a growth rate of 0 1/h, which is not realistic. Additionally, the uptake of the substrate is constrained by a minimum value, which the user can indicate if the amount of substrate in the medium can be estimated.
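The biomass constraint can be made concrete with a short calculation. This is a minimal sketch, not the pipeline's code. The wild-type growth rate is the value printed by *analysis_gf_sol* in this tutorial. The 5% fraction is an assumption inferred from the biomass bounds that *dict_prod_sol* prints further down.

```python
import math

# Wild-type growth rate reported by analysis_gf_sol in this tutorial (1/h).
wild_type_growth = 0.8769972144269667

# Assumption: minimum growth required during production, expressed as a
# fraction of the wild-type rate. The value 0.05 is inferred from the bounds
# printed in this tutorial, not taken from the pipeline source.
min_growth_fraction = 0.05

# Lower bound keeps growth above zero, upper bound caps it at the wild type.
biomass_bounds = (min_growth_fraction * wild_type_growth, wild_type_growth)
print(biomass_bounds)

# Compare with the lower bound printed by dict_prod_sol later on.
print(math.isclose(biomass_bounds[0], 0.04384986072134833, rel_tol=1e-9))
```

With a lower bound above zero, the optimizer can no longer trade all growth for product formation.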

The functions downstream of *analysis_gf_sol* build upon its output: *dict_prod_sol* takes the output of *analysis_gf_sol* as an argument, while *cons_prod_dict* uses the outputs of both functions to generate a final object that gathers all the information collected so far and also calls the thermodynamic analysis. This final output is used, among others, by *scores_evaluations* to score the alternative reaction additions and the fluxes of substrate uptake and product formation.


In this tutorial a model of *Escherichia coli* is used to generate strains growing on methane and producing itaconate.

#### Import statements

In [1]:
from pipeline_package import import_models, input_parser, analysis


#### Load models

In [2]:
data_repo = "../inputs"
model3 = import_models.get_reference_model(data_repo, '../inputs/ecoli_tutorial3.csv')
universal3 = import_models.get_universal_main(data_repo, '../inputs/ecoli_tutorial3.csv')

#### Prepare models

In [3]:
input_parser.parser('../inputs/ecoli_tutorial3.csv', universal3, model3)

For ch4 there isn't any uptake trasnsporter in the reference model
The trasporter has been added to the 
                                reference model from the input file
For nh4 there isn't any uptake trasnsporter in the reference model

For nh4 there is a transport reaction in the universal model for the uptake from the extracellular space:  
Reaction ID:  NH4t 
Reaction equation:  nh4_e <=> nh4_c

For nh4 there is a transport reaction 
                                    in the universal model for the uptake 
                                    from the periplasm:  
Reaction ID:  NH4t4pp 
Reaction equation:  k_c + nh4_p --> k_p + nh4_c 


For nh4 there is a transport reaction 
                                    in the universal model for the uptake 
                                    from the periplasm:  
Reaction ID:  NH4tpp 
Reaction equation:  nh4_p <=> nh4_c 


For nh4 there is a transport reaction 
                                    in the universal model for the uptake 
 

| | |
|---|---|
| Name | iML1515 |
| Memory address | 0x07f0bbf8a1cd0 |
| Number of metabolites | 1877 |
| Number of reactions | 2716 |
| Number of groups | 0 |
| Objective expression | `1.0*BIOMASS_Ec_iML1515_core_75p37M - 1.0*BIOMASS_Ec_iML1515_core_75p37M_reverse_35685` |
| Compartments | cytosol, extracellular space, periplasm |


The printed output above is easier to understand by looking at the example input file *ecoli_tutorial3.csv*, which can be found in the inputs directory.
 
* The *E. coli* model iML1515 has no methane transporter, so the pipeline adds the reaction included in the input file.
* Itaconate is indicated as the target.
* The ALCD1 reaction (methanol dehydrogenase) is already in the model, so the pipeline recognizes it and does not add it to the universal model.
* There are no methane-oxidizing reactions in the BiGG database, so the pipeline adds them to the universal model.

![input_main](./images/main_itacon.png)


## Running the analysis for making the model able to grow on the indicated substrate

In this case the objective is an *E. coli* strain able to grow on methane. The chassis cannot naturally grow on methane, so the pipeline is expected to use gapfilling to search for additional reactions. The search can take a few hours, depending on the computer, the number of iterations and the type of reactions needed. For this analysis, running the pipeline in a Jupyter notebook on a Windows machine with an AMD64 processor takes approximately 4 hours.

In [4]:
consumption3 = analysis.analysis_gf_sol('../inputs/ecoli_tutorial3.csv', model3, universal3)

Old biomass (objective) bounds =  (0.0, 1000.0)
EX_pi_e pi_e <=>  -0.8459567750196153 

EX_glc__D_e glc__D_e <=>  -10.0 

EX_nh4_e nh4_e <=>  -9.471495371048153 

EX_o2_e o2_e <=>  -22.131763238945883 

New biomass (objective) bounds =  (0.0, 0.8769972144269667)

ch4 is in the medium 

Exchange ch4:  ch4_e -->  Old bounds:  (0.0, 1000.0)
Exchange ch4:  ch4_e <--  New bounds:  (-1000, 0)

nh4 is in the medium 

Exchange nh4:  nh4_e <=>  Old bounds:  (-1000.0, 1000.0)
Exchange nh4:  nh4_e <--  New bounds:  (-1000, 0)

itacon is not in the medium 

Ther reaction EX_itacon_e has been added to the reference model, hence itacon_e is now in the medium
Exchange itacon:  itacon_e <--  Old bounds:  (-1000, 0)
Exchange itacon:  itacon_e -->  New bounds:  (0, 1000)
Starting reaction search with GapFilling . . .

---Model 1---

Reaction ALCD1, solution of round 1 has been added to the model

Reaction MMO2, solution of round 1 has been added to the model

Growth rate:  0.8769972144269667

The flux t

In [5]:
consumption3

{1: ({'model': 'Model1'},
  (['ALCD1', 'MMO2'],
   0.8769972144269667,
   {'ALCD1': 88.1612462654369, 'MMO2': 88.16124451144248},
   {'EX_ch4_e': -88.16124451144248, 'EX_nh4_e': -9.471495371047917},
   {'EX_itacon_e': 0.0})),
 2: ({'model': 'Model2'},
  (['ALCD1', 'R01142'],
   0.8769972144269667,
   {'ALCD1': 110.20140538044765, 'R01142': 110.20140362645323},
   {'EX_ch4_e': -110.20140362645323, 'EX_nh4_e': -9.471495371048045},
   {'EX_itacon_e': 0.0}))}
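
The nested structure of `consumption3` can be unpacked with plain Python. The sketch below copies the dictionary printed above and flattens it into one row per solution. The meaning of each tuple position (added reactions, growth rate, fluxes of the added reactions, uptake fluxes, target flux) is inferred from the printed values and is an assumption, not documented behaviour of the pipeline.

```python
# Literal copy of the `consumption3` output shown above.
consumption3 = {
    1: ({'model': 'Model1'},
        (['ALCD1', 'MMO2'],
         0.8769972144269667,
         {'ALCD1': 88.1612462654369, 'MMO2': 88.16124451144248},
         {'EX_ch4_e': -88.16124451144248, 'EX_nh4_e': -9.471495371047917},
         {'EX_itacon_e': 0.0})),
    2: ({'model': 'Model2'},
        (['ALCD1', 'R01142'],
         0.8769972144269667,
         {'ALCD1': 110.20140538044765, 'R01142': 110.20140362645323},
         {'EX_ch4_e': -110.20140362645323, 'EX_nh4_e': -9.471495371048045},
         {'EX_itacon_e': 0.0})),
}


def summarize(consumption):
    """Flatten the nested output into one dictionary per gapfilling solution."""
    rows = []
    for _number, (info, result) in sorted(consumption.items()):
        # Assumed order of the tuple fields, inferred from the printed output.
        added, growth, _added_fluxes, uptake_fluxes, target_fluxes = result
        rows.append({
            "model": info["model"],
            "added_reactions": added,
            "growth_rate": growth,
            # Exchange fluxes are negative for uptake, so flip the sign.
            "ch4_uptake": -uptake_fluxes["EX_ch4_e"],
            "itacon_production": target_fluxes["EX_itacon_e"],
        })
    return rows


rows = summarize(consumption3)
for row in rows:
    print(row)
```

Both solutions reach the wild-type growth rate, but they differ in the methane uptake they require, and neither produces itaconate at this stage.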

## Explanation of the printed output

The output reports information on the setup of the analysis introduced above (e.g. constraints) and on the results of the gapfilling analysis. From top to bottom you can find the following information:

- The upper bound of the biomass reaction, set to the growth rate of the wild type (about 0.88 1/h)
- The fluxes of the compounds that the wild-type strain takes up from the medium
- The old and new bounds of the exchange reactions of the compounds indicated by the user
- The results of the analysis, which start after the message indicating that the GapFilling algorithm is running
- The gapfilling analysis can be run for several iterations (each one looking for reactions that, if added, satisfy the model's objective, growth in this case). The result of each iteration starts with a number.

This function also generates an intermediate .csv file that can be found in the pipeline/outputs folder. It is advisable to rename the file before running *analysis_gf_sol* again; otherwise it gets overwritten.
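
One way to protect the intermediate file is to rename it with a timestamp before the next run. The helper below is a hypothetical convenience function, not part of the pipeline. The name of the intermediate file is not stated in this tutorial, so the path in the example call is a placeholder.

```python
from datetime import datetime
from pathlib import Path


def archive_output(csv_path):
    """Rename a file by appending a timestamp to its name; return the new path."""
    source = Path(csv_path)
    if not source.exists():
        return None  # nothing to protect
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    target = source.with_name(f"{source.stem}_{stamp}{source.suffix}")
    source.rename(target)
    return target


# Example call with a placeholder path (the real file name is an assumption):
# archive_output("../outputs/intermediate_results.csv")
```

Calling the helper right after *analysis_gf_sol* finishes keeps each run's results under a distinct name.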

In [6]:
production3 = analysis.dict_prod_sol('../inputs/ecoli_tutorial3.csv', consumption3, model3, universal3)


Bounds of biomass during optimization of consumption =  (0.0, 0.8769972144269667)

Bounds of biomass during optimization of production =  (0.04384986072134833, 0.8769972144269667)

The metabs to produce are:  ['itacon']

---1---

---2---


In [7]:
production3

{'1': {'1': {'itacon': {'Run 1': ['ITAtr', 'ACDCX'],
    'Run 2': ['ITAtr', 'ACDCX']}}},
 '2': {'2': {'itacon': {'Run 1': ['ITAtr', 'ACDCX'],
    'Run 2': ['ITAtr', 'ACDCX']}}}}

### Explanation of the printed output

The wild-type model cannot produce itaconate, so the gapfilling analysis is run for each solution in the "consumption" dictionary (i.e. for the different strains growing on methane). The same number of iterations is used for each strain. This output is not meant to be read on its own; it is mainly needed by the following function, which integrates it with the output of *analysis_gf_sol*.
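
When several runs return the same reactions, as in the output above, it can help to collapse them into the distinct reaction sets per strain. The sketch below works on a literal copy of the printed `production3` dictionary. The helper `unique_solutions` is hypothetical and not part of the pipeline.

```python
# Literal copy of the `production3` output shown above.
production3 = {
    '1': {'1': {'itacon': {'Run 1': ['ITAtr', 'ACDCX'],
                           'Run 2': ['ITAtr', 'ACDCX']}}},
    '2': {'2': {'itacon': {'Run 1': ['ITAtr', 'ACDCX'],
                           'Run 2': ['ITAtr', 'ACDCX']}}},
}


def unique_solutions(production):
    """Map (consumption key, target) to the distinct reaction sets found across runs."""
    unique = {}
    for outer_key, inner in production.items():
        for targets in inner.values():
            for target, runs in targets.items():
                seen = []
                for reactions in runs.values():
                    ordered = sorted(reactions)  # ignore the order of reactions
                    if ordered not in seen:
                        seen.append(ordered)
                unique[(outer_key, target)] = seen
    return unique


unique = unique_solutions(production3)
print(unique)
```

Here every strain ends up with a single distinct reaction set, which confirms that the two runs returned identical solutions.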

<h1 style="color: red;"> TODO: </h1>
<h4 style="color: blue;"> Check why the reactions are repeated. The expected result would be ACDCX for each iteration of both variants growing on methane. </h4>

In [8]:
final3 = analysis.cons_prod_dict('../inputs/ecoli_tutorial3.csv', model3, universal3, consumption3, production3)


 solutions for consumption =  ALCD1

 solutions for consumption =  MMO2
unknown metabolite 'qh2_c' created
Substrate's exchange reaction bounds :  (-6.6000000000000005, -5.4)
Target's exchange reaction bounds :  (2.0, 1000)
Carbon source:  []
Target:  EX_itacon_e: itacon_e --> 
FBA objective value:  2.0 
Substrate consumption flux:  -6.6000000000000005 
Target production flux:  2.0 
Biomass:  0.0438498607213483 

pFBA is infeasible, control if the coefficients of the 
        reaction equation are correct (or use a different boudary 
        reaction of the target as model objective):  None (infeasible).
The thermodynamic analysis cannot proceed because of infeasible pFBA
The thermodynimac analysis has been unsuccesful

 solutions for consumption =  ALCD1

 solutions for consumption =  R01142
unknown metabolite 'qh2_c' created
Substrate's exchange reaction bounds :  (-6.6000000000000005, -5.4)
Target's exchange reaction bounds :  (2.0, 1000)
Carbon source:  []
Target:  EX_itacon_e: it

### Explanation of the printed output

Most of the printed output relates to the thermodynamic analysis. This output is the most comprehensive one and is structured as nested Python dictionaries. Every solution of *analysis_gf_sol* is stored under a consumption key followed by the number of the gapfilling iteration round (with the consumption fluxes of the substrate). The corresponding production key reports the results of the *dict_prod_sol* analysis (i.e. the production fluxes of the target) and the result of the thermodynamic analysis.

<h1 style="color: red;"> TODO: </h1>

<h4 style="color: blue;"> Debug final loops lines 907 989 </h4>

scores_output3 = scores_evaluations('../inputs/ecoli_tutorial3.csv', consumption3, final3)

**References**
1. A. Ebrahim, J. A. Lerman, B. O. Palsson, and D. R. Hyduke, “COBRApy: COnstraints-Based Reconstruction and Analysis for Python,” BMC Syst. Biol., vol. 7, no. 1, p. 74, Aug. 2013, doi: 10.1186/1752-0509-7-74.