# Project Overview
Flux Balance Analysis (FBA) is a mathematical approach for simulating metabolism in genome-scale reconstructions of metabolic networks. These network reconstructions contain all of the known metabolic reactions in an organism and the genes that encode various enzymes. Using these networks, the following predictions can be made:  
1. Whether an organism grows under different conditions, 
2. The rate at which the organism grows, and 
3. The rate at which a metabolite is produced  
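
For reference, FBA is usually posed as a linear program. The sketch below uses the standard textbook formulation; the symbols $S$, $v$, $c$, $l$, and $u$ are notation introduced here for illustration and do not appear elsewhere in this tutorial.

```latex
\begin{aligned}
\max_{v} \quad & c^{\top} v \\
\text{subject to} \quad & S\,v = 0 \\
& l \le v \le u
\end{aligned}
```

Here $S$ is the stoichiometric matrix (metabolites by reactions), $v$ is the vector of reaction fluxes, $l$ and $u$ are the lower and upper flux bounds, and $c$ selects the objective, typically the biomass reaction. With biomass as the objective, a positive optimum corresponds to prediction 1 and its value to prediction 2; choosing an exchange reaction as the objective addresses prediction 3.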
  
Given a specific medium and organism, we aim to create a dynamic database containing gap-filled models and the results of FBA under various conditions, while:
* Providing an interactive graphical interface for visualizing gap-filled metabolic networks
* Helping troubleshoot why an organism does or does not grow under specific environmental conditions
* Providing graphic visualizations of the simulated growth curves (biomass vs. time) and growth in time and space  
![Workflow](Images/pipeline.png)  

This tutorial focuses on *Pseudomonas simiae WCS417* growing on an RCH2_defined_no_Carbon medium (defined by Adam Deutschbauer in [Supplementary TableS18_Medias](https://www.nature.com/articles/s41586-018-0124-0#Sec36)).  

# Setting up the Environment
You can create an Anaconda environment using:

    conda create --name [name]
    conda activate [name]
    
The following dependency installations assume you are using Anaconda. However, if you do not want to create an environment, you can change each command to `pip install`.

# Creating a Gap-filled Model
**As a USER, I would like to upload my own model and media so that I can create a gap-filled model**   
Running FBA requires a gap-filled model to predict the uptake and secretion capabilities of an organism. Supplying a media file ensures that the model can reproduce growth on an experimentally verified medium: gap-filling identifies a minimal set of biochemical reactions that must be added for the model to produce biomass on the specified medium.  
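
The idea of a "minimal set of reactions" can be illustrated with a toy example. This is a sketch only: it tests whether a biomass metabolite is reachable from the medium by simple network expansion and brute-forces the smallest set of candidate reactions. It is not how CarveMe gap-fills a model, and every metabolite and reaction name below is made up.

```python
from itertools import combinations


def reachable(seed, reactions):
    """Return every metabolite producible from `seed` using `reactions`.

    Each reaction is a (substrates, products) pair of sets; a reaction can
    fire once all of its substrates are available (network expansion).
    """
    available = set(seed)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def toy_gapfill(seed, model_rxns, candidates, target):
    """Smallest set of candidate reaction names that makes `target` reachable."""
    for size in range(len(candidates) + 1):
        for names in combinations(sorted(candidates), size):
            added = [candidates[name] for name in names]
            if target in reachable(seed, model_rxns + added):
                return set(names)
    return None  # no combination of candidates rescues growth


# Hypothetical toy network: all names are invented for illustration.
medium = {"sugar"}
draft_model = [({"sugar"}, {"a"}), ({"b"}, {"biomass"})]
universe = {
    "R_gap": ({"a"}, {"b"}),       # closes the gap a -> b in one step
    "R_detour1": ({"a"}, {"c"}),   # two-step alternative a -> c -> b
    "R_detour2": ({"c"}, {"b"}),
}

print(toy_gapfill(medium, draft_model, universe, "biomass"))
```

The draft model cannot make `biomass` from `sugar` on its own. Adding `R_gap` alone is enough, so the function returns `{'R_gap'}` rather than the two-reaction detour.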

[CarveMe](https://carveme.readthedocs.io/en/latest/usage.html) will be used to create such a model for a given medium. The user will supply the accession number or choose an organism from the database.  
  
*CarveMe Dependencies*: diamond, cplex (requires license)

In [None]:
pip install carveme

In [None]:
conda install -c bioconda diamond

[Cplex](https://www.ibm.com/products/ilog-cplex-optimization-studio) can be downloaded from IBM and requires a license. Academic licenses are free.  
Once downloaded, install it by following IBM's instructions for your platform.  

## Media File
The user can choose from a set of pre-defined media (LB, M9, or RCH2) in the database. In the future, we would like to let users upload their own media file.
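
If user uploads are added, the media file will need to be parsed and checked before use. Below is a minimal sketch, assuming the file is a tab-separated table with `medium`, `description`, `compound`, and `name` columns. That layout is an assumption, and the rows are invented for illustration; they do not describe LB, M9, or RCH2.

```python
import csv
import io

# Made-up rows for illustration; the column names are an assumption about
# the layout of the media table.
SAMPLE_TSV = (
    "medium\tdescription\tcompound\tname\n"
    "TOY\tToy medium\tglc__D\tD-Glucose\n"
    "TOY\tToy medium\to2\tOxygen\n"
    "OTHER\tAnother toy medium\tfru\tD-Fructose\n"
)


def compounds_by_medium(handle):
    """Group compound IDs by medium ID from a tab-separated media table."""
    media = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        media.setdefault(row["medium"], []).append(row["compound"])
    return media


media = compounds_by_medium(io.StringIO(SAMPLE_TSV))
print(sorted(media))   # media IDs the user could choose from
print(media["TOY"])    # compounds in the selected medium
```

Grouping by medium ID makes it easy to populate a selection menu and to reject an upload that defines no compounds for the requested medium.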

## Starting from Genome
Given an NCBI RefSeq accession number supplied by the user, `CarveMe` will first build a genome-scale metabolic model and then a gap-filled model in COBRA format.

In [None]:
# Build the draft (not gap-filled) model and keep it in the database
# so it can be gap-filled again later on a different medium
!carve --refseq GCF_000698265.1 --cobra -o GCF_000698265.1.xml

# Create a gap-filled model
!gapfill GCF_000698265.1.xml --cobra -g RCH2 -i RCH2 --mediadb media_db.tsv 

This will create 4 files: 
1. GCF_000698265.1.faa.gz - protein FASTA file	
2. GCF_000698265.1.faa.tsv - gene matching by homology search using diamond
3. GCF_000698265.1.xml - metabolic model 
4. GCF_000698265.1_gapfill.xml - gap-filled metabolic model

We will store GCF_000698265.1.xml and GCF_000698265.1_gapfill.xml in our database.

### Retrieving Organism Name
While the RefSeq accession is a unique identifier, it is not easy to read. We will extract the assembly metadata using `Biopython`. This gives us the organism name, which makes it easier to select already-built models.

In [None]:
conda install biopython 

In [99]:
from Bio import Entrez

# NCBI asks for a contact address with every E-utilities request;
# replace this placeholder with your own email
Entrez.email = 'your.name@example.com'

refseq = 'GCF_000698265.1'

# Look up the assembly record for the RefSeq accession
handle = Entrez.esearch(db="assembly", term=refseq)
record = Entrez.read(handle)
handle.close()

# Get Assembly Summary
esummary_handle = Entrez.esummary(db="assembly", id=record['IdList'], report="full")
esummary_record = Entrez.read(esummary_handle)
esummary_handle.close()

# Organism and strain
summary = esummary_record['DocumentSummarySet']['DocumentSummary'][0]
organism = summary['SpeciesName']
try:
    strain = summary['Biosource']['InfraspeciesList'][0]['Sub_value']
except (KeyError, IndexError):
    # Not every assembly records a strain
    strain = ''
print(organism, strain)

Pseudomonas simiae WCS417


The species and strain will be stored in our database with the RefSeq accession as the unique identifier. The strain can be NULL, but the organism name cannot.
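
These constraints map directly onto a table definition. The sketch below uses an in-memory SQLite database; the table and column names are hypothetical, and the tutorial does not specify which database engine the project uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database for the example
conn.execute(
    """
    CREATE TABLE organism (
        refseq   TEXT PRIMARY KEY,   -- unique identifier
        organism TEXT NOT NULL,      -- species name is required
        strain   TEXT                -- may be NULL
    )
    """
)
conn.execute(
    "INSERT INTO organism VALUES (?, ?, ?)",
    ("GCF_000698265.1", "Pseudomonas simiae", "WCS417"),
)

# A row without an organism name violates the NOT NULL constraint.
# 'GCF_X' is a hypothetical accession used only to trigger the error.
try:
    conn.execute("INSERT INTO organism VALUES (?, ?, ?)", ("GCF_X", None, None))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute("SELECT refseq, organism, strain FROM organism").fetchall()
print(rows, rejected)
```

Enforcing the rule in the schema means a model can never be listed without a readable organism name, whichever code path inserts the row.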

# Running FBA
**As a USER, I would like to select my own set of conditions so that I can simulate an organism's growth under different conditions**  
Now that we have the gap-filled model, [dfba](https://pypi.org/project/dfba/) will be used to simulate growth and visualize growth curves, with [COBRApy](https://opencobra.github.io/cobrapy/) as the backend. This will allow us to answer the following questions:
1. How does a microorganism grow under different sources/conditions?
2. What is the flow of a given metabolite through the metabolic network?
3. How do the growth rates of a specific organism compare across different conditions and/or media?
4. What are the metabolites and their fluxes in a given pathway?
5. What are the upper and lower bounds of a particular reaction?  

In [None]:
conda install -c conda-forge dfba

In [None]:
pip install cobra

In [None]:
pip install plotly

## Growth Under Different Conditions


In [1]:
import re
import cobra

from dfba import DfbaModel, ExchangeFlux, KineticVariable

In [None]:
# Load the gap-filled model
model = cobra.io.read_sbml_model('CarveMe/GCF_000698265.1_gapfill.xml')

In [75]:
model.solver = 'glpk'
# Create a DfbaModel with the cobrapy model
dfba_model = DfbaModel(model)

In [76]:
# Get biomass reaction and add it to the dfba model
regex = re.compile('biomass', flags=re.IGNORECASE)
biomass = model.reactions.query(regex, attribute='name')[0]

# Set kinetic variable for biomass
X = KineticVariable('Biomass')
dfba_model.add_kinetic_variables(X)

# Instantiate exchange flux for biomass
mu = ExchangeFlux(biomass.id)
dfba_model.add_exchange_fluxes(mu)
                                
# Rhs expression for biomass
dfba_model.add_rhs_expression('Biomass', mu * X)
dfba_model.add_initial_conditions({'Biomass': 0.01})
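
The right-hand side `mu * X` encodes the growth equation dX/dt = mu * X. As a sketch of what that equation means, the snippet below integrates it with forward Euler steps for a fixed, hypothetical growth rate. In the actual simulation the growth rate is not constant: it comes from solving the FBA problem as the concentrations change, and the `dfba` package handles the integration itself.

```python
import math


def simulate_biomass(x0, mu, t_end, dt):
    """Forward-Euler integration of dX/dt = mu * X with a constant rate mu."""
    steps = round(t_end / dt)
    x = x0
    for _ in range(steps):
        x += dt * mu * x
    return x


x0 = 0.01   # initial biomass, as in the cell above
mu = 0.5    # hypothetical constant growth rate (1/h), chosen for illustration
numeric = simulate_biomass(x0, mu, t_end=10.0, dt=0.001)
exact = x0 * math.exp(mu * 10.0)   # closed-form solution X0 * exp(mu * t)
print(numeric, exact)
```

With a small step size the numerical result stays close to the closed-form exponential, which is the curve to expect while no nutrient is limiting.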

On the website, the user will be able to select one or more exchange metabolites in the model and change their concentrations via a drop-down menu. Below is an example that provides 15.5 mmol/L of sucrose under aerobic conditions.

In [77]:
# Example list of exchanges and their concentrations defined by the user
exchanges = {'EX_sucr_e': 15.5, 'EX_o2_e': 20}

In [78]:
# For each user-defined value, get the exchange reaction and set its lower flux bound
for ex, concen in exchanges.items():
    # Get the exchange reaction by its reaction ID
    rxn = model.reactions.get_by_id(ex)
    ex_var = KineticVariable(rxn.name)
    dfba_model.add_kinetic_variables(ex_var)
    ex_flux = ExchangeFlux(ex)
    dfba_model.add_exchange_fluxes(ex_flux)
    
    # Time derivative of the kinetic variable, scaled by the summed
    # formula weight of the metabolites in the exchange reaction
    mass = sum(metabolite.formula_weight for metabolite in rxn.metabolites)
    dfba_model.add_rhs_expression(rxn.name, ex_flux * mass * X / 1000.0)
    
    # Lower bound from the Michaelis-Menten equation
    # vmax = 18.5, km = 0.000015 (default values taken from COMETS)
    dfba_model.add_exchange_flux_lb(ex, 18.5 * ex_var / (0.000015 + ex_var), ex_var)
    dfba_model.add_initial_conditions({rxn.name: concen})
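
The bound passed to `add_exchange_flux_lb` follows the Michaelis-Menten form vmax * S / (km + S). The standalone sketch below evaluates that expression with the same default values to show how the uptake limit saturates; `uptake_bound` is a hypothetical helper name, not part of `dfba`.

```python
def uptake_bound(conc, vmax=18.5, km=0.000015):
    """Michaelis-Menten limit on uptake flux for a substrate concentration."""
    if conc <= 0:
        return 0.0
    return vmax * conc / (km + conc)


# Zero substrate, substrate equal to km, and the sucrose value used above
for conc in (0.0, 0.000015, 15.5):
    print(conc, uptake_bound(conc))
```

At a concentration equal to `km` the limit is half of `vmax`, and at 15.5 it is essentially `vmax`. With such a small `km`, uptake is limited only once the substrate is almost exhausted.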

In [79]:
# Simulate the model over t = [0.0, 25.0] hours, storing output for plotting
# every 0.1 h, and also record the listed exchange fluxes
concentrations, trajectories = dfba_model.simulate(
    0.0, 25.0, 0.1, list(exchanges.keys()))

# Graphical Outputs

In [None]:
from dfba.plot.plotly import plot_concentrations, plot_trajectories
import plotly.io as pio

pio.templates.default = "plotly_white"

In [163]:
fig = plot_concentrations(concentrations)
fig.show()

In [135]:
fig = plot_trajectories(trajectories)
fig.show()

In [None]:
import cobra
import cometspy as c

# Load the gap-filled model
model = cobra.io.read_sbml_model('CarveMe/GCF_000698265.1_gapfill.xml')

In [131]:
# Create COMETS model
comets_model = c.model(model)
# set the model's initial biomass
comets_model.initial_pop = [0, 0, 5e-6]
# remove the bounds from glucose import (will be set dynamically by COMETS)
comets_model.change_bounds('EX_glc__D_e', -1000, 1000)

# Create 1x1 layout of model
test_tube = c.layout(comets_model)
test_tube.set_specific_metabolite('glc__D_e', 0.011)
test_tube.set_specific_metabolite('cl_e', 0.00467+0.00134+0.017+0.0004+0.00068+0.00001)
test_tube.set_specific_metabolite('k_e', 0.0043)
# test_tube.set_specific_metabolite('na1_e', 0.0043+(7*0.012)+0.0029+(6*0.0004)+(7*0.00034)+(2*0.00068)+(7*0.00036)+(5*0.00004))

test_tube.media

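|  | diff_c | g_refresh | g_static | g_static_val | init_amount | metabolite |
|---|---|---|---|---|---|---|
| 0 | 0.000005 | 0 | 0 | 0 | 0 | 14glucan_e |
| 1 | 0.000005 | 0 | 0 | 0 | 0 | 15dap_e |
| 2 | 0.000005 | 0 | 0 | 0 | 0 | 2ohph_c |
| 3 | 0.000005 | 0 | 0 | 0 | 0 | 2pglyc_e |
| 4 | 0.000005 | 0 | 0 | 0 | 0 | 34dhbz_e |
| ... | ... | ... | ... | ... | ... | ... |
| 198 | 0.000005 | 0 | 0 | 0 | 0 | val__L_e |
| 199 | 0.000005 | 0 | 0 | 0 | 0 | xan_e |
| 200 | 0.000005 | 0 | 0 | 0 | 0 | xtsn_e |
| 201 | 0.000005 | 0 | 0 | 0 | 0 | xyl__D_e |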


In [159]:
model.optimize()

|  | fluxes | reduced_costs |
|---|---|---|
| 12DGR161tipp | 0.000000 | 0.000000e+00 |
| 12DGR181tipp | 0.000000 | 0.000000e+00 |
| 14GLUCANabcpp | 0.000000 | -3.179023e-02 |
| 14GLUCANtexi | 0.000000 | 0.000000e+00 |
| 1PPDCRc | 0.000000 | -1.092876e-16 |
| ... | ... | ... |
| sink_hemeO_c | 0.000000 | 0.000000e+00 |
| sink_lipopb_c | 0.000000 | 0.000000e+00 |
| sink_sheme_c | 0.000000 | 0.000000e+00 |
| Growth | 79.475573 | 3.479611e-17 |


In [19]:
medium = model.medium
medium['EX_glc__D_e'] = 0.0
medium['EX_fru_e'] = 0.0
model.medium = medium
model.slim_optimize()

79.47557251570046

In [14]:

cobra.io.write_sbml_model(model, 'escher.xml')

In [127]:
model.metabolites.query('na', attribute='id')

[<Metabolite nad_c at 0x7f8c9847ea50>,
 <Metabolite nadh_c at 0x7f8c98482650>,
 <Metabolite nadp_c at 0x7f8c98485b10>,
 <Metabolite nadph_c at 0x7f8c98485fd0>,
 <Metabolite na1_c at 0x7f8c984afe50>,
 <Metabolite na1_e at 0x7f8c984afc50>,
 <Metabolite na1_p at 0x7f8c984dced0>,
 <Metabolite acmana_c at 0x7f8c9c03f9d0>,
 <Metabolite acmanap_c at 0x7f8c9c04cdd0>,
 <Metabolite dhna_c at 0x7f8c9c13dd50>,
 <Metabolite dnad_c at 0x7f8c9c158210>,
 <Metabolite glutrna_c at 0x7f8c9c1b4e90>,
 <Metabolite trnaglu_c at 0x7f8c9c1b4ad0>,
 <Metabolite mana_c at 0x7f8c9c204cd0>,
 <Metabolite nac_c at 0x7f8c9c21ded0>,
 <Metabolite rnam_c at 0x7f8c9c223e50>,
 <Metabolite rnam_e at 0x7f8c9c226dd0>]

In [89]:
medium = model.medium
medium['EX_glc__D_e'] = 0.0
medium['EX_fru_e'] = 10.0
model.medium = medium
model.slim_optimize()

79.47557251570036

In [132]:
# Create a parameters object with default values 
params = c.params()
# Change the value of a parameter, for example number of simulation cycles
params.set_param('maxCycles', 1000)
params.set_param('BiomassLogRate', 1000)
params.set_param('timeStep', 0.01)
params.set_param('spaceWidth', 1.0)

# Set some parameters to True, in order to save the output
params.set_param('writeTotalBiomassLog', True)
params.set_param('writeBiomassLog', True)
params.set_param('writeFluxLog', True)
params.set_param('writeMediaLog', True)
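
The number of cycles and the time step together set how much time the simulation covers. This is a small sketch of that arithmetic, assuming `timeStep` is expressed in hours; `cycles_to_hours` is a hypothetical helper, not part of `cometspy`.

```python
max_cycles = 1000    # 'maxCycles' from the cell above
time_step = 0.01     # 'timeStep' from the cell above, assumed to be in hours


def cycles_to_hours(cycle, step=time_step):
    """Convert a COMETS cycle index to elapsed simulated time."""
    return cycle * step


total_hours = cycles_to_hours(max_cycles)
print(total_hours)
```

The same conversion can be applied to the `cycle` column of the biomass log to plot growth against time instead of cycle number.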

In [133]:
simulation = c.comets(test_tube, params)
simulation.run()


Running COMETS simulation ...
Done!


In [114]:
simulation.total_biomass

|  | cycle | GCF_000698265_1 |
|---|---|---|
| 0 | 0.0 | 0.000005 |
| 1 | 1.0 | 0.000005 |
| 2 | 2.0 | 0.000005 |
| 3 | 3.0 | 0.000005 |
| 4 | 4.0 | 0.000005 |
| ... | ... | ... |
| 9996 | 9996.0 | 0.000005 |
| 9997 | 9997.0 | 0.000005 |
| 9998 | 9998.0 | 0.000005 |
| 9999 | 9999.0 | 0.000005 |


In [134]:
import plotly.graph_objects as go

fig = go.Figure()
fig.add_trace(go.Scatter(x=simulation.total_biomass['cycle'], 
                         y=simulation.total_biomass['GCF_000698265_1'], 
                         mode="lines", name='Biomass'))

fig.update_layout(xaxis_title='Cycle',
                  yaxis_title='Biomass (g)')

In [164]:
from escher import Builder

# Builder.model expects a cobra Model (or None); assigning the DfbaModel
# raises a TraitError, so pass the underlying cobra model instead
builder = Builder()
builder.model = model
# Run FBA with the model and add the flux data to the map
solution = builder.model.optimize()
builder.reaction_data = solution.fluxes

# Add some data for metabolites
builder.metabolite_data = solution.shadow_prices
builder.reaction_scale = [
    { 'type': 'min', 'color': '#000000', 'size': 12 },
    { 'type': 'median', 'color': '#ffffff', 'size': 20 },
    { 'type': 'max', 'color': '#ff0000', 'size': 25 }
]
builder.reaction_scale_preset = 'GaBuRd'
# Make all the arrows three times as thick
builder.reaction_scale = [
    {k: v * 3 if k == 'size' else v for k, v in x.items()}
    for x in builder.reaction_scale
]
builder