Upstream signaling network reconstruction
===============================

The following code use [pyPath library](https://github.com/saezlab/pypath) to reconstruct signaling network from a list of biological entities (csv format) and querying [OmnipathDB](http://omnipathdb.org/).

## 1. Load function

Import modules:

In [8]:
#!/usr/bin/env python
import csv
import os
import time
import pypath
from pypath import curl
from pypath import data_formats

In [9]:
def _upstream_signaling(pa, max_depth, to_be_explore, already_explored=[], current_depth=0, network_sif = []):
    """
    Param:
    pa: pypath env, 
    max_depth: maximum level of reconstruction, 
    to_be_explore: list of entities, 
    already_explored=[], 
    current_depth=0, 
    network_sif = []
    """
    
    # Stopping criteria 1
    if current_depth >= max_depth:
        print("Exploring alted due to maximum depth")
        return network_sif
    else:
        print('Exploration depth ' + str(current_depth))
    # Stopping criteria 2
    if len(to_be_explore) == 0:
        print("Exploring done")
        return(network_sif)
    # Start exploring
    new_to_be_explored = []
    for gene in to_be_explore:
        # get entity that get affected by MYC using vertex object (inhibition, stimulation or other)
        regulators_list = list(pa.gs_affects(gene))
        already_explored.append(gene)
        # get direction and sign of interation
        for reg in range(len(regulators_list)):
            # direction and sign
            edge = pa.get_edge(regulators_list[reg]["name"], gene)
            dirs = edge['dirs']
            sign_check = dirs.get_sign(dirs.reverse) # reverse: source ===> target
            # A pair of boolean values means if the interaction is stimulation and if it is inhibition, respectively [True, False] 
            if sign_check[0] == True and sign_check[1] == False:
                sign = 'stimulation'
            elif sign_check[0] == False and sign_check[1] == True: 
                sign = 'inhibition'
            elif sign_check[0] == True and sign_check[1] == True:
                sign = 'stimulation_and_inhibition'
            else:
                sign = 'unknown'
            if regulators_list[reg]["label"] not in already_explored:
                new_to_be_explored.append(regulators_list[reg]["label"])
            # ID, name, sign and provenance
            network_sif.append({"source_id":regulators_list[reg]["name"], "source_name":regulators_list[reg]["label"], \
                "provenance":list(regulators_list[reg]["sources"]), "target_name":gene, "sign":sign})
    print("Depth explored " + str(current_depth))
    current_depth += 1
    _upstream_signaling(pa, max_depth, new_to_be_explored, already_explored, current_depth, network_sif)
    return network_sif


def _print_to_csv(network, output_path):
    """
    Param: network, path of output file
    """
    f = open(output_path + "-temp", "w+")
    # set headers
    #f.write("source_id,source_name,target_name,sign,provenance\n")
    for e in network:
        f.write(e['source_id'] + ",")
        f.write(e['source_name'] + ",")
        f.write(e['target_name'] + ",")
        f.write(e['sign'] + ",")
        f.write(str(' '.join(e['provenance'])) + "\n")
    f.close()
    # remove duplicate
    os.system("sort " + output_path + "-temp | uniq -c > " + output_path )
    os.system("rm " + output_path + "-temp")


## 2. Configuration

In [10]:
MAX_DEPTH = 8
INPUT_GENES = []
inputfile_path = 'input-910.csv'
outfile_path = 'md08-pypath_omnipathDB.csv'

## 3. Read input file

In [11]:
with open(inputfile_path, 'rt') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',', quotechar='"')
    for row in spamreader:
        INPUT_GENES.append(row[0])

## 4. Main

In [None]:
# Init pypath
pa = pypath.PyPath()

# Load databases
with curl.cache_off():
   pa.load_resources(data_formats.pathway)



	=== d i s c l a i m e r ===

	All data coming with this module
	either as redistributed copy or downloaded using the
	programmatic interfaces included in the present module
	are available under public domain, are free to use at
	least for academic research or education purposes.
	Please be aware of the licences of all the datasets
	you use in your analysis, and please give appropriate
	credits for the original sources when you publish your
	results. To find out more about data sources please
	look at `pypath.descriptions` and
	`pypath.data_formats.urls`.

	> New session started,
	session ID: 'im7ng'
	logfile: './log/im7ng.log'
	pypath version: 0.7.120


        Downloading `` from www.uniprot.org -- 0.00B downloaded: : 1.36Mit [01:03, 56.1kit/s]


	:: Ready. Resulted `plain text` of type unicode string.                                                                                              
	:: Local file at `/home/mlefebvre/.pypath/cache/582ab5c0b5fffa42fb3e8d1757901ea1-`.
 > TRIP
	:: Loading 'uniprot-sec' to 'uniprot-pri' mapping table


        Downloading `` from www.uniprot.org -- 0.00B downloaded: : 0.00it [00:00, ?it/s]


	:: Ready. Resulted `plain text` of type unicode string.                                                                                              
	:: Local file at `/home/mlefebvre/.pypath/cache/15ff42ede8eb6a22b432a886075c2203-`.


        Downloading `sec_ac.txt` from ftp.uniprot.org -- 31.91MB downloaded: 100%|██████████| 31.9M/31.9M [00:13<00:00, 1.87Mit/s]


	:: Ready. Resulted `plain text` of type file object.                                                                                                 
	:: Local file at `/home/mlefebvre/.pypath/cache/49314fe217bf0f2a5544a2c4314b4adf-sec_ac.txt`.


        Reading from file -- finished: : 0.00it [00:00, ?it/s]


	:: Loading 'genesymbol' to 'trembl' mapping table


        Processing nodes -- finished: 100%|██████████| 370/370 [00:00<00:00, 78.8kit/s]
        Processing edges -- finished: 100%|██████████| 370/370 [00:00<00:00, 90.2kit/s]
        Processing attributes -- finished: 100%|██████████| 370/370 [00:00<00:00, 6.19kit/s]


 > SPIKE
	:: Loading data from cache previously downloaded from www.cs.tau.ac.il
	:: Extracting zip data                                                                                                                               
	:: Error in `pypath.dataio.spike_interactions()`. Skipping to next resource.
	:: ('File is not a zip file',)
  File "/home/mlefebvre/anaconda2/envs/pybravo/lib/python3.6/site-packages/pypath/main.py", line 2284, in read_data_file
    infile = inputFunc(**settings.inputArgs)
  File "/home/mlefebvre/anaconda2/envs/pybravo/lib/python3.6/site-packages/pypath/dataio.py", line 7639, in spike_interactions
    url, silent=False, large=True, files_needed=['LatestSpikeDB.xml'])
  File "/home/mlefebvre/anaconda2/envs/pybravo/lib/python3.6/site-packages/pypath/curl.py", line 770, in __init__
    self.process_file()
  File "/home/mlefebvre/anaconda2/envs/pybravo/lib/python3.6/site-packages/pypath/curl.py", line 1221, in process_file
    self.extract_file()
  File "/home

        Reading file -- finished: 100%|██████████| 16.3M/16.3M [00:00<00:00, 21.8Mit/s]


	:: Loading 'genesymbol' to 'swissprot' mapping table
	:: Loading 'genesymbol-syn' to 'swissprot' mapping table
	:: Loading 'genesymbol' to 'uniprot' mapping table


        Processing nodes -- finished: 100%|██████████| 6.94k/6.94k [00:00<00:00, 355kit/s]
        Processing edges -- finished: 100%|██████████| 6.94k/6.94k [00:00<00:00, 150kit/s]
        Processing attributes -- finished: 100%|██████████| 6.94k/6.94k [00:01<00:00, 3.77kit/s]


 > Guide2Pharma
	:: Loading data from cache previously downloaded from www.guidetopharmacology.org
	:: Ready. Resulted `plain text` of type file object.                                                                                                 
	:: Local file at `/home/mlefebvre/.pypath/cache/61ddc6eb0ff8ef877c52f5d1a81b9db2-interactions.csv`.


        Processing nodes -- finished: : 0.00it [00:00, ?it/s]
        Processing edges -- finished: : 0.00it [00:00, ?it/s]
        Processing attributes -- finished: : 0.00it [00:00, ?it/s]


 > CA1
	:: Loading data from cache previously downloaded from science.sciencemag.org
	:: Ready. Resulted `zip extracted data` of type dict of unicode strings.                                                                             
	:: Local file at `/home/mlefebvre/.pypath/cache/a221dcf8846ad398634f9997a1011b9e-Maayan_SOM_External_Files.zip`.


        Processing nodes -- finished: 100%|██████████| 1.48k/1.48k [00:00<00:00, 156kit/s]
        Processing edges -- finished: 100%|██████████| 1.48k/1.48k [00:00<00:00, 108kit/s]
        Processing attributes -- finished: 100%|██████████| 1.48k/1.48k [00:00<00:00, 4.11kit/s]


 > ARN


        Processing nodes -- finished: 100%|██████████| 95.0/95.0 [00:00<00:00, 20.4kit/s]
        Processing edges -- finished: 100%|██████████| 95.0/95.0 [00:00<00:00, 16.4kit/s]
        Processing attributes -- finished: 100%|██████████| 95.0/95.0 [00:00<00:00, 1.47kit/s]


 > NRF2ome


        Processing nodes -- finished: 100%|██████████| 109/109 [00:00<00:00, 32.1kit/s]
        Processing edges -- finished: 100%|██████████| 109/109 [00:00<00:00, 25.7kit/s]
        Processing attributes -- finished: 100%|██████████| 109/109 [00:00<00:00, 1.63kit/s]


 > Macrophage
	:: Loading data from cache previously downloaded from static-content.springer.com
	:: Ready. Resulted `plain text` of type file object.                                                                                                 
	:: Local file at `/home/mlefebvre/.pypath/cache/e7b73bcce14b977a3e384518d40f3637-12918_2010_452_MOESM2_ESM.XLS`.
	:: Loading 'genesymbol-syn' to 'uniprot' mapping table


        Processing nodes -- finished: 100%|██████████| 4.86k/4.86k [00:00<00:00, 346kit/s]
        Processing edges -- finished: 100%|██████████| 4.86k/4.86k [00:00<00:00, 141kit/s]
        Processing attributes -- finished: 100%|██████████| 4.86k/4.86k [00:00<00:00, 7.90kit/s]


 > DeathDomain
	:: Loading data from cache previously downloaded from www.deathdomain.org
	:: Ready. Resulted `plain text` of type unicode string.                                                                                              
	:: Local file at `/home/mlefebvre/.pypath/cache/13a21b2a1cae61e0b6ee14f0c6230507-show`.
	:: Loading data from cache previously downloaded from www.deathdomain.org
	:: Ready. Resulted `plain text` of type unicode string.                                                                                              
	:: Local file at `/home/mlefebvre/.pypath/cache/a8a54754a427fd1fd6c4a03cdf4f002f-show`.
	:: Loading data from cache previously downloaded from www.deathdomain.org
	:: Ready. Resulted `plain text` of type unicode string.                                                                                              
	:: Local file at `/home/mlefebvre/.pypath/cache/a0fb2adccfb56fea33fc5666e3242485-show`.
	:: Loading data from cache previously d

        Processing nodes -- finished: 100%|██████████| 236/236 [00:00<00:00, 64.8kit/s]
        Processing edges -- finished: 100%|██████████| 236/236 [00:00<00:00, 67.6kit/s]
        Processing attributes -- finished: 100%|██████████| 236/236 [00:00<00:00, 4.29kit/s]


 > PDZBase
	:: Loading data from cache previously downloaded from abc.med.cornell.edu
	:: Ready. Resulted `plain text` of type unicode string.                                                                                              
	:: Local file at `/home/mlefebvre/.pypath/cache/d16213901cd27de19ef825068fd8faa6-allinteractions`.


        Processing nodes -- finished: 100%|██████████| 125/125 [00:00<00:00, 18.8kit/s]
        Processing edges -- finished: 100%|██████████| 125/125 [00:00<00:00, 22.2kit/s]
        Processing attributes -- finished: 100%|██████████| 125/125 [00:00<00:00, 1.56kit/s]


 > Signor
	:: Loading data from cache previously downloaded from signor.uniroma2.it
	:: Ready. Resulted `plain text` of type file object.                                                                                                 
	:: Local file at `/home/mlefebvre/.pypath/cache/a357fe979f74a823bf4a42150a6dcf33-download_entity.php`.


In [6]:
start_time = time.time()
network = _upstream_signaling(pa, MAX_DEPTH, INPUT_GENES)
elapsed_time = round((time.time() - start_time), 2)
print("--- Upstream signaling network in %s seconds ---" % elapsed_time)
_print_to_csv(network, outfile_path)

Exploration depth 0


        Setting directions -- finished: 100%|██████████| 12.7k/12.7k [00:14<00:00, 861it/s]


Depth explored0
Exploration depth 1
Depth explored1
Exploration depth 2
Depth explored2
Exploration depth 3
Depth explored3
Exploration depth 4
Depth explored4
Exploration depth 5
Depth explored5
Exploration depth 6
Depth explored6
Exploration depth 7
Depth explored7
Exploring alted due to maximum depth
--- Upstream signaling network in 37.33 seconds ---


NameError: name 'os' is not defined