# SERIALIZER QUICK WALKTHROUGH

##### IFB Virtual Machine Configuration

* `rootDir` points to the Jupyter home directory
* `projectDir` points to the Git project, which holds the *tsv* file
* `libDir` points to the Git project's Python library, which contains the files `go.py`, `stat_utils.py` and `uniprot.py`
* `dataDir` points to the shared data folder with the UniProt and GO files

In [1]:
import sys

rootDir = "/Users/guillaumelaunay/work/communications/lectures/UCBL/M2_bioinfo/VDB/TP/VDB_jupyter"
libDir = rootDir + "/lib"
dataDir = "/Users/guillaumelaunay/work/communications/lectures/UCBL/M2_bioinfo/VDB/TP/data"
projectDir = rootDir

sys.path.append(libDir)

In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
import pandas, uniprot, go
import numpy as np
from stat_utils import computeORA


In [5]:
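# Load the proteomics table (tab-separated)
df = pandas.read_csv(projectDir + "/TCL_wt1.tsv", delimiter="\t")
# Drop rows where the spreadsheet export left a '#VALEUR!' error marker
dfClean = df[(df['Log2 Corrected Abundance Ratio'] != '#VALEUR!') & (df['LOG10 Adj.P-val'] != '#VALEUR!')]
dfClean = dfClean.copy()
# Replace the column with its float version so the dtype really becomes float64
dfClean['Log2 Corrected Abundance Ratio'] = dfClean['Log2 Corrected Abundance Ratio'].astype(float)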

In [6]:
uniprotCollection = uniprot.UniprotCollection(dataDir + "/dataset")
K12 = uniprot.UniprotCollection(dataDir + "/K12_proteome/")
xpProtList = dfClean['Accession'].tolist()
print("Loading ontology")
go.setOntology(dataDir + "/go.owl")
goTreeObj = go.createGoTree(ns="biological process", proteinList=xpProtList, uniprotCollection=uniprotCollection)
goTreeObj.dimensions

Loading ontology
Loaded
Extracting biological process ontology
Applying true path collapsing
2194 GO terms, 3482 leaves, 1474 proteins


(2194, 3482, 1474)

In [7]:
goTreeObjExp = go.createGoTree(ns="biological process", proteinList=xpProtList, uniprotCollection=uniprotCollection)
goTreeObjRef = go.createGoTree(ns="biological process", proteinList=K12.list, uniprotCollection=K12)

Extracting biological process ontology
Applying true path collapsing
2194 GO terms, 3482 leaves, 1474 proteins
Extracting biological process ontology
Applying true path collapsing
2728 GO terms, 6650 leaves, 3128 proteins


In [8]:
saList = dfClean[ dfClean['Log2 Corrected Abundance Ratio'] > 0.0 ]['Accession'].tolist()
oraScores = computeORA(goTreeObjExp, saList, goTreeObjRef)


Evaluated 298 / 2194 Pathways, based on 44 proteins


In [9]:
goTreeObjExp.getByName('amino acid transmembrane transport')

{'ID': 'GO:0003333', 'name': 'amino acid transmembrane transport', 'eTag': ['P23173'], 'leafCount': 0, 'features': {'Fisher': 0.3619566723145079, 'Hpg': 0.36593352068646445}, 'oNode': obo.GO_0003333, 'isDAGelem': True, 'children': ['proline import across plasma membrane', 'amino acid import across plasma membrane', 'L-alpha-amino acid transmembrane transport', 'L-arginine transmembrane transport', 'amino acid export across plasma membrane', 'cysteine transmembrane transport', 'regulation of amino acid transmembrane transport']}
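
The `features` entry above carries two scores, `Fisher` and `Hpg`, presumably a Fisher exact test and a hypergeometric test. As a rough illustration of how an over-representation p-value can be obtained, the sketch below computes a hypergeometric upper tail with the standard library only. The function name `hypergeom_upper_tail` and the toy counts are assumptions made for this example; this is not the code of `computeORA` in `stat_utils.py`.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """Return P(X >= k) for a hypergeometric draw.

    N: proteins in the background set
    K: background proteins annotated with the GO term
    n: proteins in the list of interest
    k: proteins of that list annotated with the GO term
    """
    upper = min(K, n)  # the overlap cannot exceed either set size
    total = comb(N, n)  # number of ways to draw n proteins out of N
    # Sum the probability of every overlap at least as large as the observed one
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1)
    )
    return favourable / total


# Toy counts (hypothetical): 20 background proteins, 5 annotated,
# 4 proteins in the list of interest, 2 of them annotated.
p_value = hypergeom_upper_tail(2, 20, 5, 4)
print(round(p_value, 4))  # 1205 / 4845, about 0.2487
```

A small p-value would suggest that the term is over-represented in the list compared with the background. Real analyses usually also correct for the number of terms tested.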

In [12]:
import pickle
pickableTree = goTreeObjExp.makePickable()
with open("testGoTree.pkl", 'wb') as fp:
    pickle.dump(pickableTree, fp, protocol=pickle.HIGHEST_PROTOCOL)
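
Reading the file back is the mirror operation of the dump above. The sketch below shows a full write/read round trip on a toy dictionary modelled on the node record printed earlier. The file name `toyNode.pkl` and the variable names are assumptions for illustration; whether the tree reloaded from `testGoTree.pkl` needs an extra step to undo `makePickable()` is not stated in this notebook.

```python
import os
import pickle
import tempfile

# Toy node (hypothetical stand-in) holding only plain, picklable Python values
node = {
    "ID": "GO:0003333",
    "name": "amino acid transmembrane transport",
    "eTag": ["P23173"],
    "leafCount": 0,
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toyNode.pkl")
    # Serialize: binary write mode, highest protocol as in the cell above
    with open(path, "wb") as fp:
        pickle.dump(node, fp, protocol=pickle.HIGHEST_PROTOCOL)
    # Deserialize: binary read mode
    with open(path, "rb") as fp:
        clone = pickle.load(fp)

print(clone == node)  # the clone has the same content
print(clone is node)  # but it is a distinct object
```

Only unpickle files from a trusted source, since `pickle.load` can execute arbitrary code while rebuilding objects.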

In [35]:
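# goTreeObjClone is presumably the tree reloaded from testGoTree.pkl; the cell that creates it is not shown
goTreeObjClone.dimensions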

(2194, 3482, 1474)

In [13]:
pickableTree

<go.AnnotationTree at 0x11d14fdd8>