<img src="./pictures/logomegalomo.png" alt="Drawing" style="width: 300px;"/>

<img src="./pictures/FullPipeline.png" alt="Drawing" style="width: 1000px;"/>

# Using the pipeline with a serialized dataset

This notebook uses two specific terms.

The original dataset, which contains the known information, is called the **"Template"**.

>Example : a template Id is an Id from IntAct

The information that we want to retrieve is called the **"Query"**.

>Example : a query Id is an Id from our organism (here : *Streptococcus pneumoniae*)

#### In this notebook, whenever you see "/PATH/TO/YOUR/...", replace it with the full path to the required file or directory.

## Packages for the pipeline

* First, import the general packages used by the pipeline:

In [3]:
import sys
import networkx as nx
import numpy as np
import matplotlib.pyplot as plt
import copy
import json
%load_ext autoreload

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


* Then, the pipeline itself :

**Replace the path below with the path to your OmegaLoMo lib directory**

**You also have to set your own paths in the cells that follow**

In [4]:
sys.path.append('/PATH/TO/omegaLoMo/lib/')
import core as ca
import createTopo as cT
import graph
import smallWork as sW

## Creating Topology

<img src="./pictures/Preprocessing.png" alt="Drawing" style="width: 200px;"/>

<img src="./pictures/NewDicInfo.png" alt="Drawing" style="width: 800px;"/>

* *database* and *filterIds* only need to be initialized here. Because we use the serialized version, their paths are not required.


* *newDic.json* is the reduced topology from IntAct.


In [5]:
database = ""
filterIds = ""

entireTopo = cT.Topology(database, filterIds)
newDic = entireTopo.deSerialize('/PATH/TO/THE/FILE/newDic.json')
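
Before handing the file to `deSerialize`, it can be useful to confirm that it parses as JSON at all. The sketch below assumes the serialized topology is a plain JSON object; `summarize_json` is a hypothetical helper (not part of omegaLoMo) and the toy content does not reflect the real layout of *newDic.json*.

```python
import json
import os
import tempfile


def summarize_json(path):
    """Load a JSON file and return (top-level type name, number of entries)."""
    with open(path) as handle:
        data = json.load(handle)
    return type(data).__name__, len(data)


# Toy file standing in for newDic.json; the real layout may differ.
toy = {"P12345": {"Q67890": {}}, "Q67890": {"P12345": {}}}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump(toy, tmp)

summary = summarize_json(tmp.name)
os.remove(tmp.name)
print(summary)
```

Pointing `summarize_json` at your own file gives a quick sanity check on its size before running the rest of the pipeline.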

## Processing data

<img src="./pictures/Analysis.png" alt="Drawing" style="width: 200px;"/>

<img src="./pictures/Analysis_flow.png" alt="Drawing" style="width: 800px;"/>

<u>Processing data involves switching from the "Template's space" to the "Query's space".</u>

* *indexR6* is a list of UniProt IDs corresponding to the proteome of your organism

* *bean* is a serialized version of a small dataset, corresponding to a topology

* *path* is the path to a directory containing BLAST output files (in XML format)

In [6]:
indexR6 = '/PATH/TO/THE/FILE/indexR6'

In [7]:
omegaSet = ca.HomegaSet(bean='/PATH/TO/THE/FILE/FullOmegaSet_coverage.json',
                        queryIdList=indexR6)

In [None]:
omegaMatrix = ca.OmegaMatrix(topo = newDic, omegaSet = omegaSet)
omegaMatrix.reduceAndVectorInject()
queryTopo = omegaMatrix.project()
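
The switch from template space to query space can be illustrated on a toy example. This is only a sketch of the general idea, not omegaLoMo's implementation: it assumes a mapping from each query Id to the template Ids it is homologous to, and predicts a query-query interaction whenever two of their template homologs interact. The function name and all identifiers are hypothetical.

```python
from itertools import combinations


def project_interactions(template_edges, homologs):
    """Predict query-query pairs whose template homologs interact.

    template_edges: iterable of (templateA, templateB) pairs.
    homologs: dict mapping a query id to the template ids it matches.
    """
    known = {frozenset(edge) for edge in template_edges}
    predicted = set()
    # Only pairs of distinct queries are considered in this sketch.
    for q1, q2 in combinations(sorted(homologs), 2):
        if any(frozenset((t1, t2)) in known
               for t1 in homologs[q1] for t2 in homologs[q2]):
            predicted.add((q1, q2))
    return predicted


template_edges = [("T1", "T2"), ("T2", "T3")]
homologs = {"QA": ["T1"], "QB": ["T2"], "QC": ["T4"]}
predicted = project_interactions(template_edges, homologs)
print(sorted(predicted))
```

Here `QA` and `QB` are linked because their homologs `T1` and `T2` interact in the template network, while `QC` has no interacting homolog and stays unconnected.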

## Selection of proteins of interest  


<img src="./pictures/target.png" alt="Drawing" style="width: 200px;"/>
  

* *divR6Known* is a list downloaded from UniProt with the <u>"cell division"</u> GO term selected : **[GO:0051301]**

<div class="alert alert-info">
    You can check out the list <a href="http://www.uniprot.org/uniprot/?query=taxonomy:%22Streptococcus%20pneumoniae%20(strain%20ATCC%20BAA-255%20/%20R6)%20[171101]%22+go:51301">here</a>.
</div>

In [3]:
divisome = sW.Divisome()
onlyId = divisome.getDivisomeID('/PATH/TO/THE/FILE/divKnownR6.txt')

Here, you can check that the parsing worked.

The next cell should print your list of UniProt identifiers.

In [4]:
print(onlyId)

['Q8CWP9', 'Q8DQM0', 'P64073', 'Q8DQE5', 'P64167', 'Q8DR70', 'Q8DR57', 'Q8DNE8', 'Q8DNI9', 'Q9EUQ7', 'Q8DNS0', 'Q8DR29', 'Q7ZAK7', 'Q8DR55', 'Q8DPV4', 'Q8DP40', 'Q8DR69', 'Q8CWQ5', 'Q8DQM2', 'Q8DQM1', 'P65467', 'P59676', 'Q8DPW6', 'Q8DQH3', 'Q8DNV6', 'Q8DQH4', 'Q8DNV8', 'Q8DQE8', 'Q8CZ65', 'Q8DPK2', 'Q8DNV9']
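
Beyond reading the list by eye, the parsed identifiers can be checked against the accession pattern documented in the UniProt help pages. The sketch below is independent of omegaLoMo; `invalid_accessions` is a hypothetical helper and the sample list mixes a few Ids from the output above with a deliberately wrong entry.

```python
import re

# Accession pattern published in the UniProt help pages.
UNIPROT_ACCESSION = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)


def invalid_accessions(ids):
    """Return the entries that do not look like UniProt accessions."""
    return [i for i in ids if not UNIPROT_ACCESSION.match(i)]


sample = ["Q8CWP9", "P64073", "Q8DR55", "not_an_id"]
bad = invalid_accessions(sample)
print(bad)
```

An empty result from `invalid_accessions(onlyId)` suggests that the file was parsed as expected.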


## Graphs Functions

<img src="./pictures/Results.png" alt="Drawing" style="width: 200px;"/>

<img src="./pictures/Resultats_example.png" alt="Drawing" style="width: 800px;"/>

<u>Results of the previous analysis can be observed above.</u>

The example above shows the **full graph** *(left)* of all the predicted interactions between queries.

On the right, you can see a **"nearest neighbor graph"** *(top)* of a selected protein and some statistics *(bottom)* of this graph.

In [None]:
interactome = graph.Interactome(queryTopo)
test = interactome.drawGraph()
nx.draw_networkx(test, with_labels = True)
plt.show()

In [None]:
premierVoisin = interactome.createNeiGraph('Q8DR55', test)
drawPremierVoisin = interactome.drawNeiGraph(premierVoisin)
filtre = interactome.filterGraph(drawPremierVoisin, evalue = 2.2272e-10, coverage = 90)
stats = interactome.drawCurveParam(premierVoisin)
plt.show()
nx.draw_networkx(drawPremierVoisin, with_labels = True)
plt.show()
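
The notion of a nearest neighbor graph filtered on e-value and coverage can be reproduced with plain `networkx` calls. This is a sketch under assumptions, not the code behind `createNeiGraph` or `filterGraph`: it assumes each edge carries `evalue` and `coverage` attributes, that the e-value threshold is a maximum and the coverage threshold a minimum, and the toy edges and their values are made up.

```python
import networkx as nx

# Toy interaction graph; node names (except the centre) and values are made up.
g = nx.Graph()
g.add_edge("Q8DR55", "A", evalue=1e-30, coverage=95)
g.add_edge("Q8DR55", "B", evalue=1e-5, coverage=99)
g.add_edge("Q8DR55", "C", evalue=1e-40, coverage=60)
g.add_edge("A", "D", evalue=1e-50, coverage=98)

# First neighbors of the selected protein (the centre node is included).
neighborhood = nx.ego_graph(g, "Q8DR55", radius=1)

# Keep edges that pass both thresholds (assumed semantics).
kept = [
    (u, v)
    for u, v, attrs in neighborhood.edges(data=True)
    if attrs["evalue"] <= 2.2272e-10 and attrs["coverage"] >= 90
]
filtered = neighborhood.edge_subgraph(kept)
print(sorted(filtered.nodes()))
```

Only the edge to `A` passes both thresholds: `B` fails on e-value, `C` fails on coverage, and `D` is not a first neighbor of the selected protein.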

# Using your own DataSet
## *WORK IN PROGRESS* | *WORK IN PROGRESS* | *WORK IN PROGRESS*

<div class="alert alert-danger">
This might take a while for a large database.
</div>

In [None]:
database = "/PATH/TO/YOUR/DATABASE"
filterIds = "/PATH/TO/FILTER/FILE"

topo = cT.Topology(database, filterIds)
newDic = topo.filter_With(filterIds)

In [None]:
indexR6 = '/PATH/TO/INDEX/R6'

Use the following cell if you start from multiple BLAST output files (XML format).

<div class="alert alert-warning">
    This step requires more time because it parses every BLAST output file.
</div>

In [None]:
omegaSet = ca.HomegaSet(path='/PATH/TO/ALL/BLAST/DIR/',
                        queryIdList=indexR6)
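
To get a feel for what parsing BLAST output involves, the sketch below reads one result with the standard library, using the element names of the BLAST XML format. It is not the parser used by `HomegaSet`: `parse_hits` is a hypothetical helper, the XML is hand-written, and the coverage formula (aligned query span divided by query length) is an assumption that may differ from omegaLoMo's definition.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written document using the element names of BLAST XML output.
BLAST_XML = """<BlastOutput><BlastOutput_iterations><Iteration>
<Iteration_query-def>queryA</Iteration_query-def>
<Iteration_query-len>200</Iteration_query-len>
<Iteration_hits><Hit><Hit_def>templateX</Hit_def><Hit_hsps><Hsp>
<Hsp_evalue>1e-40</Hsp_evalue>
<Hsp_query-from>11</Hsp_query-from>
<Hsp_query-to>190</Hsp_query-to>
</Hsp></Hit_hsps></Hit></Iteration_hits>
</Iteration></BlastOutput_iterations></BlastOutput>"""


def parse_hits(xml_text):
    """Return (query, hit, evalue, query coverage in %) for each HSP."""
    rows = []
    root = ET.fromstring(xml_text)
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        query_len = int(iteration.findtext("Iteration_query-len"))
        for hit in iteration.iter("Hit"):
            for hsp in hit.iter("Hsp"):
                start = int(hsp.findtext("Hsp_query-from"))
                end = int(hsp.findtext("Hsp_query-to"))
                # Assumed definition: aligned query span over query length.
                coverage = 100.0 * (end - start + 1) / query_len
                rows.append((query, hit.findtext("Hit_def"),
                             float(hsp.findtext("Hsp_evalue")), coverage))
    return rows


rows = parse_hits(BLAST_XML)
print(rows)
```

For files on disk, `ET.parse(filename).getroot()` can replace `ET.fromstring`; looping over a whole directory of results this way is what makes this step slow.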

In [None]:
divisome = sW.Divisome()
onlyId = divisome.getDivisomeID('/PATH/TO/FILE/DIV')