### Usage of the biomass package version 0.1

The biomass package generates the metabolite coefficients of the biomass objective function of a metabolic model. Its main advantage is that it weights the coefficients using experimental omics datasets that are relatively easy to generate.

Standard input format for omics data:
- A two-column CSV file with columns labelled "gene_id" and "abundances"
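
A file can be checked against this format before it is passed to the package. The snippet below is a minimal sketch that uses only the Python standard library; the helper name `read_abundances` and the gene identifiers are hypothetical and are not part of the biomass package.

```python
import csv
import io

REQUIRED_COLUMNS = ("gene_id", "abundances")


def read_abundances(handle):
    """Return {gene_id: abundance} from a CSV file object in the standard format."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [column for column in REQUIRED_COLUMNS if column not in header]
    if missing:
        raise ValueError("missing column(s): " + ", ".join(missing))
    # Abundances are converted to float so that non-numeric values fail early.
    return {row["gene_id"]: float(row["abundances"]) for row in reader}


# Hypothetical two-gene example; identifiers and values are made up.
example = io.StringIO("gene_id,abundances\ngene_0001,152.0\ngene_0002,12.5\n")
abundances = read_abundances(example)
print(abundances)
```

For a file on disk, the same function can be called with `open(path, newline="")` in place of the in-memory example.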

The datasets supported in version 0.1 are the following:
- Genomic: genome sequence as a fasta file
- Transcriptomic: processed RNA-seq data in the standard input format
- Proteomic: processed proteomic data in the standard input format

To be included in version 0.2:
- Lipidomic
- Metabolomic

The functional elements of version 0.1 are:
- DNA
- RNA
- Protein
- Update

The main method of the DNA, RNA and Protein classes is get_coefficients(). It returns a dictionary mapping metabolites to coefficients that can be used to update the biomass objective function.
The main method of the Update class is update_biomass(). It replaces the existing coefficients of the given metabolites with the calculated ones.
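
The replacement step can be illustrated with plain dictionaries. This is only a sketch of the idea, not the package's code: the function `replace_coefficients` is hypothetical, and it assumes the usual convention that metabolites consumed by the biomass reaction carry negative stoichiometric coefficients while the calculated coefficients are positive magnitudes.

```python
def replace_coefficients(biomass, new_coefficients):
    """Return a copy of `biomass` with the given reactant coefficients replaced.

    biomass: {metabolite id: signed coefficient}, negative for consumed metabolites.
    new_coefficients: {metabolite id: positive magnitude}.
    """
    updated = dict(biomass)  # leave the input untouched
    for metabolite, value in new_coefficients.items():
        if metabolite not in updated:
            raise KeyError(metabolite + " is not in the biomass reaction")
        # Assumption: updated metabolites are reactants, hence the negative sign.
        updated[metabolite] = -abs(value)
    return updated


# Toy biomass reaction with unit coefficients, as in a model without weights.
biomass = {"datp_c": -1.0, "dgtp_c": -1.0, "atp_c": -1.0}
updated = replace_coefficients(biomass, {"dgtp_c": 0.0224, "datp_c": 0.0627})
print(updated)
```

Metabolites that are not listed in the new dictionary, such as `atp_c` here, keep their previous coefficient.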

In [1]:
#Define the cell weight and the proportion of each main component of the cell
#These parameters have default values
cell_weight = 0.3 
protein_ratio = 0.6
dna_ratio = 0.1
rna_ratio = 0.2

In [2]:
#Import a metabolic model
#Two file extensions are supported: .json and .xml
import cobra
model = cobra.io.load_json_model('/home/jean-christophe/Documents/Maitrise_UCSD/Modelling/Mesoplasma_florum/Functionnal_testing/finalMeso1.json')
#This model has a biomass objective function without coefficients
model.reactions.BIOMASS.reaction

'10fthf_c + 12dgr180 + 5fthf_c + alatrna_c + amet_c + argtrna_c + ascb6p_c + asntrna_c + asptrna_c + atp_c + ca2_c + cl_c + clpn_c + coa_c + cobalt2_c + crm_mf_c + ctp_c + cystrna_c + datp_c + dctp_c + dgtp_c + dttp_c + fad_c + fe3_c + fmettrna_c + glntrna_c + glutrna_c + glytrna_c + gtp_c + histrna_c + iletrna_c + k_c + leutrna_c + lipopb_c + lpail345p_c + lystrna_c + mal__L_c + mettrna_c + mg2_c + mlthf_c + mn2_c + mobd_c + na1_c + ni2_c + orn_c + pa180_c + pail345p_c + pail_c + pan4p_c + pc_c + pg_c + phetrna_c + protrna_c + ptrc_c + ribflv_c + s_c + sertrna_c + sphmyln_mf_c + spmd_c + thf_c + thmpp_c + thrtrna_c + trptrna_c + tyrtrna_c + uacgam_c + utp_c + valtrna_c + zn2_c --> glutrna(gln)_c + trnaala_c + trnaarg_c + trnaasn_c + trnaasp_c + trnacys_c + trnaglu_c + trnagly_c + trnahis_c + trnaile_c + trnaleu_c + trnalys_c + 2.0 trnamet_c + trnaphe_c + trnapro_c + trnaser_c + trnathr_c + trnatrp_c + trnatyr_c + trnaval_c'

In [3]:
#The path to the model
path_to_model = '/home/jean-christophe/Documents/Maitrise_UCSD/Modelling/Mesoplasma_florum/Functionnal_testing/finalMeso1.json'

In [4]:
#Import genome sequence
path_to_fasta = '/home/jean-christophe/Documents/Maitrise_UCSD/GenBank_files/Mflorum_DNA.fasta'
#Import Genbank annotation file
path_to_genbank = '/home/jean-christophe/Documents/Maitrise_UCSD/GenBank_files/Mflorum.gbff'

In [5]:
import pandas as pd

In [6]:
#Import transcriptomic data
transcriptomic = pd.read_csv('transcriptomic.csv')

In [7]:
#Import proteomic data
proteomic = pd.read_csv('proteomics.csv')

In [8]:
#Update DNA coefficients
from biomass import DNA
dna_update = DNA()
dna_biomass_coefficients = dna_update.get_coefficients(path_to_fasta,path_to_model)
print(dna_biomass_coefficients)
dna_update.update_biomass_coefficients(dna_biomass_coefficients,model)

{<Metabolite dgtp_c at 0x7f675e289810>: 0.022407104330214745, <Metabolite datp_c at 0x7f675e26c290>: 0.0626995792763276, <Metabolite dctp_c at 0x7f675e2897d0>: 0.024606894127236167, <Metabolite dttp_c at 0x7f675e289790>: 0.06417773683777436}
Found dgtp_c in biomass reaction
The actual model None solves in <Solution 0.53 at 0x7f675e98aad0> and its biomass contains 87 metabolites
The actual model None solves in <Solution 0.53 at 0x7f675ee7a790> and its biomass contains 88 metabolites
10fthf_c + 12dgr180 + 5fthf_c + alatrna_c + amet_c + argtrna_c + ascb6p_c + asntrna_c + asptrna_c + atp_c + ca2_c + cl_c + clpn_c + coa_c + cobalt2_c + crm_mf_c + ctp_c + cystrna_c + datp_c + dctp_c + 0.0224071043302 dgtp_c + dttp_c + fad_c + fe3_c + fmettrna_c + glntrna_c + glutrna_c + glytrna_c + gtp_c + histrna_c + iletrna_c + k_c + leutrna_c + lipopb_c + lpail345p_c + lystrna_c + mal__L_c + mettrna_c + mg2_c + mlthf_c + mn2_c + mobd_c + na1_c + ni2_c + orn_c + pa180_c + pail345p_c + pail_c + pan4p_c + pc
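
One common way to derive dNTP coefficients from a genome sequence is sketched below. It is an illustration under stated assumptions, not the package's implementation, and it is not expected to reproduce the values printed above: bases are counted on the given strand only, the residue molar masses are approximate, coefficients are expressed in mmol per gram of dry weight, and the function name `dna_coefficients` is hypothetical.

```python
from collections import Counter

# Approximate molar masses (g/mol) of nucleotide residues within a DNA polymer.
RESIDUE_MASS = {"A": 313.2, "C": 289.2, "G": 329.2, "T": 304.2}
DNTP_ID = {"A": "datp_c", "C": "dctp_c", "G": "dgtp_c", "T": "dttp_c"}


def dna_coefficients(sequence, dna_ratio):
    """Return {dNTP id: mmol per gDW} from base composition and DNA mass fraction."""
    counts = Counter(base for base in sequence.upper() if base in RESIDUE_MASS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    fractions = {base: counts[base] / total for base in RESIDUE_MASS}
    # Mean residue mass weighted by the observed base composition.
    mean_mass = sum(fractions[b] * RESIDUE_MASS[b] for b in RESIDUE_MASS)
    # g DNA per gDW divided by g/mol gives mol per gDW; times 1000 gives mmol.
    total_mmol = dna_ratio / mean_mass * 1000.0
    return {DNTP_ID[b]: fractions[b] * total_mmol for b in RESIDUE_MASS}


# Toy sequence for illustration; a real genome would be read from the fasta file.
coefficients = dna_coefficients("ATGAAATTTGCA", dna_ratio=0.1)
print(coefficients)
```

By construction, multiplying each coefficient by its residue mass and summing recovers the DNA mass fraction, which is a convenient sanity check.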

In [9]:
#Update RNA coefficients
from biomass import RNA
rna_update = RNA()
rna_biomass_coefficients = rna_update.get_coefficients(path_to_genbank,path_to_model,transcriptomic,
                             CELL_WEIGHT=cell_weight,TOTAL_RNA_RATIO=rna_ratio)
#The fractions of ribosomal, transfer and messenger RNA can also be defined
rna_update.update_biomass_coefficients(rna_biomass_coefficients,model)

Found utp_c in biomass reaction
The actual model None solves in <Solution 0.59 at 0x7f675e00fa10> and its biomass contains 87 metabolites
The actual model None solves in <Solution 0.58 at 0x7f675e98aad0> and its biomass contains 88 metabolites
10fthf_c + 12dgr180 + 5fthf_c + alatrna_c + amet_c + argtrna_c + ascb6p_c + asntrna_c + asptrna_c + atp_c + ca2_c + cl_c + clpn_c + coa_c + cobalt2_c + crm_mf_c + ctp_c + cystrna_c + 0.0626995792763 datp_c + 0.0246068941272 dctp_c + 0.0224071043302 dgtp_c + 0.0641777368378 dttp_c + fad_c + fe3_c + fmettrna_c + glntrna_c + glutrna_c + glytrna_c + gtp_c + histrna_c + iletrna_c + k_c + leutrna_c + lipopb_c + lpail345p_c + lystrna_c + mal__L_c + mettrna_c + mg2_c + mlthf_c + mn2_c + mobd_c + na1_c + ni2_c + orn_c + pa180_c + pail345p_c + pail_c + pan4p_c + pc_c + pg_c + phetrna_c + protrna_c + ptrc_c + ribflv_c + s_c + sertrna_c + sphmyln_mf_c + spmd_c + thf_c + thmpp_c + thrtrna_c + trptrna_c + tyrtrna_c + uacgam_c + 0.121001274816 utp_c + valtrna_c

In [13]:
#Update Protein coefficients
from biomass import Protein
protein_update = Protein()

protein_coefficients = protein_update.get_coefficients(path_to_genbank,path_to_model,proteomic)
print(protein_coefficients)

{'A': 0.5159110997956983, 'C': 0.024764164647747312, 'E': 0.3236038498960037, 'D': 0.24063727925750836, 'G': 0.514084868214509, 'F': 0.13775204382857814, 'I': 0.38866814279916834, 'H': 0.05279661803935423, 'K': 0.42289554665706036, 'M': 0.10039334382981344, 'L': 0.3701278253832807, 'N': 0.26425478270623226, 'Q': 0.1276979910952906, 'P': 0.14989571771940108, 'S': 0.32422059315282087, 'R': 0.12698594044292985, 'T': 0.2918719442117945, 'W': 0.019700095170848194, 'V': 0.3621775122609601, 'Y': 0.09542866936675634}
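
The idea of weighting amino acid composition by protein abundance can be sketched as follows. This is an assumption about the general approach rather than the package's code: the function name, gene identifiers and sequences are hypothetical, and the result is a set of relative fractions, not coefficients scaled by cell weight or protein ratio as in the output above.

```python
from collections import Counter


def weighted_amino_acid_fractions(sequences, abundances):
    """Return {amino acid: fraction} weighted by the abundance of each protein.

    sequences: {gene_id: protein sequence in one-letter code}
    abundances: {gene_id: measured abundance}
    """
    totals = Counter()
    for gene_id, abundance in abundances.items():
        sequence = sequences.get(gene_id)
        if sequence is None:
            continue  # skip measurements without a matching sequence
        for residue, count in Counter(sequence).items():
            totals[residue] += count * abundance
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no abundance could be matched to a sequence")
    return {residue: value / grand_total for residue, value in sorted(totals.items())}


# Toy data: g1 is three times more abundant than g2.
sequences = {"g1": "MKA", "g2": "MAA"}
abundances = {"g1": 3.0, "g2": 1.0}
fractions = weighted_amino_acid_fractions(sequences, abundances)
print(fractions)
```

With these toy values, lysine appears only in the more abundant protein, so its share is larger than it would be from sequence counts alone.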


In [None]:
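#Update the model with the protein coefficients, mirroring the DNA and RNA calls above
protein_update.update_biomass_coefficients(protein_coefficients,model)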