# ```HumanGEMlib``` usage

The root directory contains ```usage.ipynb``` and ```cleaneddata.tsv```, a sample file with GWAS data. The class ```FBA()``` is the entry point to all ```humangemlib``` functionality.

In [1]:
# IMPORTS
from humangemlib import fba
import pandas as pd  # used below to load the sample GWAS data
import numpy as np   # optional

# HUMANGEMLIB SETUP

fb = fba.FBA() # flux balance analysis for genetic knockouts using HumanGEM and CobraPy

INFO:cobra.core.model:The current solver interface glpk doesn't support setting the optimality tolerance.


Sample GWAS data is provided in the root directory. The input file must contain the list of genes to be knocked out; in ```cleaneddata.tsv``` this list is in the column ```mappedGenes```.

In [2]:
df = pd.read_csv('./cleaneddata.tsv', sep='\t')  # sample GWAS data
geneList = list(df.mappedGenes)

The genes from the GWAS data are separated into four groups: names containing dashes, names containing 'LINC', names containing commas, and names with none of the three. Only the last group, genes whose names are purely alphanumeric (termed pure genes), is currently used for knockouts. The genes are separated below.

If ```gene_type_separation()``` is called without arguments, it uses the default data in the library file ```cleaneddata.tsv```, which is GWAS data for type 2 diabetes. Here the two calls are equivalent, because the ```geneList``` passed manually to ```gene_type_separation(geneList)``` comes from the same file that the library uses by default. The explicit call illustrates how to pass your own GWAS data to this function to obtain metabolic fluxes for knockouts from that data.

In [3]:
# fb.gene_type_separation()  # no arguments: uses the sample GWAS data by default
fb.gene_type_separation(geneList=geneList)  # equivalent here, since geneList comes from the same sample file
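
The separation by name type described above can be sketched in plain Python. This is only an illustration of the idea, not the library's implementation: the function name ```separate_gene_types```, the order in which the checks are applied, and the extra ```other``` group are assumptions made for this example.

```python
def separate_gene_types(gene_names):
    """Illustrative sketch: split gene names by the symbols they contain.

    Assumed precedence (not taken from humangemlib): comma, dash, 'LINC', pure.
    """
    groups = {"comma": [], "dash": [], "linc": [], "pure": [], "other": []}
    for name in gene_names:
        # Skip missing values such as NaN, which pandas uses for empty cells.
        if not isinstance(name, str) or not name.strip():
            continue
        name = name.strip()
        if "," in name:
            groups["comma"].append(name)
        elif "-" in name:
            groups["dash"].append(name)
        elif "LINC" in name:
            groups["linc"].append(name)
        elif name.isalnum():
            groups["pure"].append(name)  # purely alphanumeric: usable for knockouts
        else:
            groups["other"].append(name)  # hypothetical catch-all for other symbols
    return groups


example = ["GCK", "LINC01234", "HLA-DQA1", "TCF7L2, VTI1A", "APOE", float("nan")]
groups = separate_gene_types(example)
print(groups["pure"])
```

Only the ```pure``` group would be passed on to the knockout step under this scheme.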

#### Flux solutions and flux differentials with respect to the control condition
The standard deviation of fluxes across independent solutions of the flux balance problem was 0 in over 99% of cases and at most 10^-13 in the rest, so independent runs are not needed for this special case. It is therefore preferable to use ```FBA.knockout_fluxes()``` rather than ```FBA.mean_solutions(n)```, which performs the flux balance operation ```n``` times and returns the average fluxes and their standard deviations. Since deviations this small can be neglected, a single flux balance operation is assumed to give representative fluxes rather than an edge-case solution obtained by chance. The full solution space has not been explored.

In [4]:
solutions = pd.DataFrame(fb.knockout_fluxes())  # flux balance solutions, one row per knockout
diff = pd.DataFrame(fb.flux_differentials(solutions))  # control flux minus knockout flux, by reaction
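
The stability check described above (mean and standard deviation of each flux over repeated solves) can be sketched as follows. The helper ```summarize_runs```, its ```tol``` parameter, and the toy flux values are hypothetical and only illustrate the calculation; they are not part of ```humangemlib``` and are not results from the model.

```python
from statistics import mean, pstdev


def summarize_runs(runs, tol=1e-9):
    """Illustrative sketch: per-reaction mean and standard deviation over runs.

    runs: list of dicts mapping reaction ID -> flux, one dict per independent solve.
    Returns (summary, max_std, stable), where stable is True if every standard
    deviation is at or below tol.
    """
    summary = {}
    for rxn in runs[0]:
        values = [run[rxn] for run in runs]
        summary[rxn] = (mean(values), pstdev(values))
    max_std = max(std for _, std in summary.values())
    return summary, max_std, max_std <= tol


# Toy values, made up for illustration only.
toy_runs = [
    {"R1": 1.0, "R2": -2.0},
    {"R1": 1.0, "R2": -2.0 + 1e-13},
]
summary, max_std, stable = summarize_runs(toy_runs)
print(max_std, stable)
```

When ```stable``` is true, a single solve gives the same fluxes as the average for practical purposes, which is the argument for calling ```FBA.knockout_fluxes()``` once.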

#### Fluxes through 12995 reactions by knockout gene and value of objective function

In [5]:
solutions

Unnamed: 0,knockout_gene,objective_value,MAR03905,MAR03907,MAR04097,MAR04099,MAR04108,MAR04133,MAR04281,MAR04388,...,MAR20163,MAR20164,MAR20165,MAR20166,MAR20167,MAR20168,MAR20169,MAR20170,MAR20171,MAR20172
0,,124.868148,0.0,0.0,0.0,0.0,0.0,0.0,743.557931,0.0,...,-151.832818,0.0,0.0,0.0,-0.259607,0.0,0.0,0.0,0.0,1.285417
1,AQP10,124.868148,0.0,0.0,0.0,0.0,0.0,0.0,743.557931,0.0,...,-151.832818,0.0,0.0,0.0,-0.259607,0.0,0.0,0.0,0.0,1.285417
2,USP44,124.868148,0.0,0.0,0.0,0.0,0.0,0.0,743.557931,0.0,...,-151.832818,0.0,0.0,0.0,-0.259607,0.0,0.0,0.0,0.0,1.285417
3,SLC1A2,124.868148,0.0,0.0,0.0,0.0,0.0,0.0,743.557931,0.0,...,-151.832818,0.0,0.0,0.0,-0.259607,0.0,0.0,0.0,0.0,1.285417
4,UBE2Z,124.868148,0.0,0.0,0.0,0.0,0.0,0.0,743.557931,0.0,...,-151.832818,0.0,0.0,0.0,-0.259607,0.0,0.0,0.0,0.0,1.285417
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
109,ST6GAL1,124.868148,0.0,0.0,0.0,0.0,0.0,0.0,848.348504,0.0,...,-160.610624,0.0,0.0,0.0,-0.259607,0.0,0.0,0.0,0.0,0.969803
110,GCK,124.868148,0.0,0.0,0.0,0.0,0.0,0.0,848.348504,0.0,...,-160.610624,0.0,0.0,0.0,-0.259607,0.0,0.0,0.0,0.0,0.969803
111,UBE2E2,124.868148,0.0,0.0,0.0,0.0,0.0,0.0,848.348504,0.0,...,-160.610624,0.0,0.0,0.0,-0.259607,0.0,0.0,0.0,0.0,0.969803
112,XYLT1,124.868148,0.0,0.0,0.0,0.0,0.0,0.0,848.348504,0.0,...,-160.610624,0.0,0.0,0.0,-0.259607,0.0,0.0,0.0,0.0,0.969803


#### Difference matrix with respect to control condition with no knockouts
Each column holds the differential flux for one reaction. Each row represents one genetic knockout.

```Control flux - Knockout flux = Differential flux```

In [6]:
diff

Unnamed: 0,knockout_gene,objective_value,MAR03905,MAR03907,MAR04097,MAR04099,MAR04108,MAR04133,MAR04281,MAR04388,...,MAR20163,MAR20164,MAR20165,MAR20166,MAR20167,MAR20168,MAR20169,MAR20170,MAR20171,MAR20172
0,AQP10,-1.421085e-14,0.0,0.0,0.0,0.0,0.0,0.0,2.501110e-12,0.0,...,-1.136868e-13,0.0,0.0,0.0,5.551115e-17,0.0,0.0,0.0,0.0,-4.440892e-16
1,USP44,-1.421085e-14,0.0,0.0,0.0,0.0,0.0,0.0,2.501110e-12,0.0,...,-1.136868e-13,0.0,0.0,0.0,5.551115e-17,0.0,0.0,0.0,0.0,-4.440892e-16
2,SLC1A2,-1.421085e-14,0.0,0.0,0.0,0.0,0.0,0.0,2.501110e-12,0.0,...,-1.136868e-13,0.0,0.0,0.0,5.551115e-17,0.0,0.0,0.0,0.0,-4.440892e-16
3,UBE2Z,-1.421085e-14,0.0,0.0,0.0,0.0,0.0,0.0,2.501110e-12,0.0,...,-1.136868e-13,0.0,0.0,0.0,5.551115e-17,0.0,0.0,0.0,0.0,-4.440892e-16
4,UBE3C,-1.421085e-14,0.0,0.0,0.0,0.0,0.0,0.0,2.501110e-12,0.0,...,-1.136868e-13,0.0,0.0,0.0,5.551115e-17,0.0,0.0,0.0,0.0,-4.440892e-16
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
108,ST6GAL1,0.000000e+00,0.0,0.0,0.0,0.0,0.0,0.0,-1.047906e+02,0.0,...,8.777806e+00,0.0,0.0,0.0,-5.551115e-17,0.0,0.0,0.0,0.0,3.156138e-01
109,GCK,0.000000e+00,0.0,0.0,0.0,0.0,0.0,0.0,-1.047906e+02,0.0,...,8.777806e+00,0.0,0.0,0.0,-5.551115e-17,0.0,0.0,0.0,0.0,3.156138e-01
110,UBE2E2,0.000000e+00,0.0,0.0,0.0,0.0,0.0,0.0,-1.047906e+02,0.0,...,8.777806e+00,0.0,0.0,0.0,-5.551115e-17,0.0,0.0,0.0,0.0,3.156138e-01
111,XYLT1,0.000000e+00,0.0,0.0,0.0,0.0,0.0,0.0,-1.047906e+02,0.0,...,8.777806e+00,0.0,0.0,0.0,-5.551115e-17,0.0,0.0,0.0,0.0,3.156138e-01
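
The relation ```Control flux - Knockout flux = Differential flux``` can be reproduced by hand for a few entries. The sketch below uses a hypothetical helper, ```differential_fluxes```, and plain dictionaries in place of the DataFrames; the flux values are the rounded ones displayed in the solutions table above.

```python
def differential_fluxes(control, knockouts):
    """Illustrative sketch: control flux minus knockout flux, per gene and reaction.

    control:   dict mapping reaction ID -> flux with no knockout
    knockouts: dict mapping gene -> dict of reaction ID -> flux under that knockout
    """
    return {
        gene: {rxn: control[rxn] - fluxes[rxn] for rxn in control}
        for gene, fluxes in knockouts.items()
    }


# Rounded values as displayed in the solutions table.
control = {"MAR04281": 743.557931, "MAR20172": 1.285417}
knockouts = {
    "AQP10": {"MAR04281": 743.557931, "MAR20172": 1.285417},
    "GCK": {"MAR04281": 848.348504, "MAR20172": 0.969803},
}
d = differential_fluxes(control, knockouts)
print(round(d["GCK"]["MAR04281"], 6), round(d["GCK"]["MAR20172"], 6))
```

For ```GCK``` this gives about -104.790573 and 0.315614, matching the corresponding entries of the difference matrix. The tiny nonzero values in the matrix for knockouts such as ```AQP10``` (on the order of 10^-12 or smaller) are floating-point noise that the rounded inputs used here do not reproduce.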


In [7]:
# Flux variability analysis (FVA) of the control model; left commented out because it takes too long to run
# fva_control=cobra.flux_analysis.flux_variability_analysis(fba.model, loopless=True)

114

#### Unique fluxes show subtypes
These are the unique flux conditions across the genetic knockouts. As the column ```knockout_genes``` shows, many genetic knockouts share the same flux distribution, or metabolic state. We term these subtypes, as they highlight the metabolic state of the human body under different genetic knockouts.

In [9]:
unique_fluxes = fb.unique_fluxes(solutions)
pd.DataFrame(unique_fluxes)

Unnamed: 0,num,knockout_genes,MAR03905,MAR03907,MAR04097,MAR04099,MAR04108,MAR04133,MAR04281,MAR04388,...,MAR20163,MAR20164,MAR20165,MAR20166,MAR20167,MAR20168,MAR20169,MAR20170,MAR20171,MAR20172
0,9,"[None, AQP10, USP44, SLC1A2, UBE2Z, UBE3C, PAR...",0.0,0.0,0.0,0.0,0.0,0.0,743.557931,0.0,...,-151.832818,0.0,0.0,0.0,-0.2596071,0.0,0.0,0.0,0.0,1.285417
1,1,[POLR1D],0.0,0.0,0.0,0.0,0.0,0.0,783.64819,0.0,...,-121.931457,0.0,0.0,0.0,3.4369e-16,0.0,0.0,0.0,0.0,1.096036
2,11,"[DMGDH, COPB1, APOE, MTAP, HERC2, MARCHF3, HPS...",0.0,0.0,0.0,0.0,0.0,0.0,713.010997,0.0,...,-158.360494,0.0,0.0,0.0,-0.2596071,0.0,0.0,0.0,0.0,1.521969
3,13,"[PEPD, PLCB3, ABO, MSRA, ATP8B2, PDE3B, PTPRQ,...",0.0,0.0,0.0,0.0,0.0,0.0,712.997198,0.0,...,-159.840865,0.0,0.0,0.0,-0.2596071,0.0,0.0,0.0,0.0,1.520545
4,1,[FARSA],0.0,0.0,0.0,0.0,0.0,0.0,853.488421,0.0,...,-175.437125,0.0,0.0,0.0,-2.0231520000000002e-17,0.0,0.0,0.0,0.0,1.18641
5,1,[NOS1],0.0,0.0,0.0,0.0,0.0,0.0,827.838935,0.0,...,-162.004465,0.0,0.0,0.0,-0.2596071,0.0,0.0,0.0,0.0,1.176982
6,20,"[HSD17B12, ATP2A3, DGKB, ATP2A1, USP36, APIP, ...",0.0,0.0,0.0,0.0,0.0,0.0,827.606024,0.0,...,-162.183665,0.0,0.0,0.0,-0.2596071,0.0,0.0,0.0,0.0,1.473086
7,1,[NUP160],0.0,0.0,0.0,0.0,0.0,0.0,777.20124,0.0,...,-162.665473,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.188908
8,1,[PLA2G6],0.0,0.0,0.0,0.0,0.0,0.0,880.545826,0.0,...,-196.65727,0.0,0.0,0.0,-0.2596071,0.0,0.0,0.0,0.0,0.1242998
9,1,[NUP133],0.0,0.0,0.0,0.0,0.0,0.0,794.644149,0.0,...,-125.9132,0.0,0.0,0.0,1.111924e-16,0.0,0.0,0.0,0.0,-1.176985e-16
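
The grouping of knockouts into subtypes can be sketched as collecting genes whose flux vectors are identical after rounding. How ```fb.unique_fluxes()``` compares fluxes internally is not shown in this notebook, so the helper name ```group_identical_fluxes``` and the rounding to ```decimals``` places are assumptions made for this example. The flux values are the rounded ones displayed in the tables above, restricted to two reactions.

```python
def group_identical_fluxes(solutions, decimals=6):
    """Illustrative sketch: group knockouts that share the same flux vector.

    solutions: dict mapping gene (None for the control) -> dict of reaction ID -> flux
    Fluxes are rounded to `decimals` places before comparison so that solver
    noise on the order of 1e-13 does not split a group.
    """
    groups = {}
    for gene, fluxes in solutions.items():
        # Sort reaction IDs so every gene yields its key in the same order.
        key = tuple(round(fluxes[rxn], decimals) for rxn in sorted(fluxes))
        groups.setdefault(key, []).append(gene)
    return [{"num": len(genes), "knockout_genes": genes} for genes in groups.values()]


solutions_example = {
    None: {"MAR04281": 743.557931, "MAR20172": 1.285417},  # control, no knockout
    "AQP10": {"MAR04281": 743.557931, "MAR20172": 1.285417},
    "GCK": {"MAR04281": 848.348504, "MAR20172": 0.969803},
    "XYLT1": {"MAR04281": 848.348504, "MAR20172": 0.969803},
    "POLR1D": {"MAR04281": 783.64819, "MAR20172": 1.096036},
}
for subtype in group_identical_fluxes(solutions_example):
    print(subtype["num"], subtype["knockout_genes"])
```

Rounding is a simplification: two fluxes that differ by less than the tolerance can still fall on opposite sides of a rounding boundary, so a real implementation may prefer a tolerance-based comparison.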
