# The pypath tutorial collection

Before April 2019 we had a few tutorials for `pypath` on the OmniPath webpage (http://omnipathdb.org/). However, `pypath` has been developed extensively over the past years, and a number of important points in the interface changed recently (although we tried to keep compatibility as much as possible). This new, comprehensive tutorial replaced the previous ones in April 2019 and was updated in August 2019.

## Table of contents
* [1: Quick start – How do I build OmniPath data with *pypath*?](#quick-start)
* [2: Quick start – I just want a network quickly and play around with *pypath*](#quick-start-2)
* [3: Quick start – How do I build networks from any data with *pypath*?](#quick-start-3)
    * [3a: Defining input formats](#input-formats)
    * [3b: Creating PyPath object and loading the 2 test files](#toy-example)
* [4: Plotting the network with *igraph*](#plot)
* [5: Building networks](#building-networks)
    * [5a: Which network datasets are pre-defined in pypath?](#network-resources)
* [6: How to access the network](#access-network)
* [7: Directions and signs](#directions)
* [8: Accessing nodes in the network](#nodes)
* [9: Querying relationships with or without causality](#causality)
* [10: Accessing edges by identifiers](#edge-lookup)
* [11: Literature references](#references)
* [12: Translating identifiers](#mapping)
* [13: Enzyme-substrate interactions](#enz-sub)
* [14: Annotations](#annotations)
* [15: Inter-cellular signaling roles](#intercell)
* [16: Gene Ontology](#gene-ontology)
* [17: Protein complexes](#complexes)
* [18: Saving datasets as pickles](#pickle)
* [19: Network in *pandas.DataFrame*](#network-pandas)
* [20: Log messages and sessions](#log-session)
* [21: BEL export](#bel)
* [22: CellPhoneDB export](#cellphonedb)

## 1: Introduction

OmniPath consists of 5 main database segments: network *(interactions)*, enzyme-substrate interactions *(ptms)*, protein complexes *(complexes)*, molecular entity annotations *(annotations)* and intercellular communication roles *(intercell)*. You can access all of these through the web service at http://omnipathdb.org/ and the <a href="https://bioconductor.org/packages/release/bioc/html/OmnipathR.html">R/Bioconductor package *OmnipathR*</a>; the network and some of the annotations are also available through the <a href="http://apps.cytoscape.org/apps/omnipath">Cytoscape app</a>. However, only *pypath* is able to build these databases directly from the original sources, with various options for customization, and to provide a rich and versatile API for each database, with the flexibility of Python. From version `0.9`, *pypath* no longer depends on *igraph* and *cairo*, which were the main installation difficulty for many Macintosh users. From version `0.10`, *pypath* has been reorganized to have a clear module structure, and it has a brand new API for the *network* database. Having reached these milestones, we are confident to recommend it as the preferred way to access *OmniPath* whenever you need greater freedom than the web service can offer.

## 2: Quick start

We provide a high level interface in the module *pypath.omnipath.app*. This is the easiest way to build, manage and access the OmniPath databases, hence this is what we present in the *Quick start* section. In further sections we show the lower level modules in more detail.

### 2.1: The OmniPath app

*pypath.omnipath* is an application which contains a database manager at *omnipath.data*. This manager is empty by default. It builds and loads the databases on demand. 

In [2]:
from pypath import omnipath

omnipath.data

<pypath.omnipath.app.Database at 0x66c858934130>

### 2.2: Networks

OmniPath offers multiple built-in network datasets: the OmniPath PPI network, the stricter literature curated PPI network, the special ligand-receptor PPI network and various other PPI datasets, the transcriptional regulation network from DoRothEA and other resources, the miRNA post-transcriptional regulation network, and the transcriptional regulation network for miRNAs.

#### 2.2.1: Strictly literature curated network

In [3]:
from pypath import omnipath
cu = omnipath.data.get_db('curated')
cu

<Network: 7126 nodes, 35112 interactions>

#### 2.2.2: The OmniPath network with extra activity flow, enzyme-substrate and ligand-receptor interactions

In [11]:
from pypath import omnipath
op = omnipath.data.get_db('omnipath')
op

<Network: 10936 nodes, 109169 interactions>

#### 2.2.3: Transcriptional regulation network from DoRothEA and other resources

Note: according to the default settings, DoRothEA confidence levels A-D and all original resources will be loaded.

In [10]:
from pypath import omnipath
tft = omnipath.data.get_db('tf_target')
tft

<Network: 15121 nodes, 75004 interactions>

#### 2.2.4: Literature curated miRNA post-transcriptional regulation network

In [7]:
from pypath import omnipath
mi = omnipath.data.get_db('mirna_mrna')
mi

<Network: 3353 nodes, 8278 interactions>

#### 2.2.5: Transcriptional regulation of miRNA

In [6]:
from pypath import omnipath
tmi = omnipath.data.get_db('tf_mirna')
tmi

<Network: 1036 nodes, 4979 interactions>

#### 2.2.6: lncRNA-mRNA interactions

In [5]:
from pypath import omnipath
lnc = omnipath.data.get_db('lncrna_mrna')
lnc

<Network: 247 nodes, 229 interactions>

### 2.3: Enzyme-substrate relationships

In [15]:
from pypath import omnipath
es = omnipath.data.get_db('enz_sub')
es

<Enzyme-substrate database: 34327 relationships>

### 2.4: Protein complexes

In [19]:
from pypath import omnipath
co = omnipath.data.get_db('complex')
co

<Complex database: 21949 complexes>

### 2.5: Annotations

In [4]:
from pypath import omnipath
an = omnipath.data.get_db('annotations')
an

<Annotation database: 4941947 records about 45448 entities from 43 resources>

### 2.6: Inter-cellular communication roles

In [5]:
from pypath import omnipath
ic = omnipath.data.get_db('intercell')
ic

<Intercell annotations: 212296 records about 40068 entities>

### 2.7: Cache management and customization

`pypath.omnipath.app` saves the databases as pickle dumps, by default under the `~/.pypath/pickles/` directory, and after the first build it loads them from there. The very first build of each database might take quite a long time (up to 90 min or more in the case of the OmniPath network or annotations) because of the large number of downloads. Subsequent builds will be much faster because `pypath` stores all the downloaded data in a local cache and downloads it again only upon request from the user. Loading the databases from pickle dumps takes only seconds. However, if you want to build with different settings, make sure to set a different cache file name.
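The advice about using a different file name for different settings can be pictured with a small, self-contained sketch. It does not use `pypath` at all: `pickle_path`, `get_or_build` and `toy_builder` are hypothetical names invented for this illustration, and deriving the file name from a hash of the settings is just one possible convention, not necessarily what `pypath` does.

```python
import hashlib
import json
import os
import pickle
import tempfile


def pickle_path(cache_dir, name, settings):
    """Hypothetical helper: one pickle file name per (name, settings)."""
    # Serialize the settings deterministically, then hash them
    digest = hashlib.sha256(
        json.dumps(settings, sort_keys=True).encode('utf-8')
    ).hexdigest()[:8]
    return os.path.join(cache_dir, '%s_%s.pickle' % (name, digest))


def get_or_build(cache_dir, name, settings, builder):
    """Load from the pickle if it exists, otherwise build and save."""
    path = pickle_path(cache_dir, name, settings)
    if os.path.exists(path):
        with open(path, 'rb') as fp:
            return pickle.load(fp)
    data = builder(**settings)
    with open(path, 'wb') as fp:
        pickle.dump(data, fp)
    return data


calls = []


def toy_builder(min_refs=0):
    # Stand-in for a slow database build; records each call
    calls.append(min_refs)
    return {'min_refs': min_refs}


cache_dir = tempfile.mkdtemp()
first = get_or_build(cache_dir, 'toy', {'min_refs': 1}, toy_builder)
again = get_or_build(cache_dir, 'toy', {'min_refs': 1}, toy_builder)
other = get_or_build(cache_dir, 'toy', {'min_refs': 2}, toy_builder)
```

In this sketch the second call with the same settings is served from the pickle, while the call with different settings triggers a new build and writes a second file.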

## 3: Building networks <a class="anchor" id="building-networks"></a>

For this you will need the `Network` class from the `pypath.core.network` module, which takes care of building and querying the network. You also need the `pypath.resources.network` module, where you find a number of predefined input settings organized in larger categories (e.g. activity flow, enzyme-substrate, transcriptional regulation, etc). These input settings tell `pypath` how to download and process the data.

In [6]:
from pypath.core import network
from pypath.resources import network as netres

For example the `netres.pathway` is a collection of databases which fit into the activity flow concept, i.e. one protein either stimulates or inhibits the other. It is a dictionary with names as keys and the input settings as values:

In [7]:
netres.pathway

{'trip': <NetworkResource: TRIP (post_translational, activity_flow)>,
 'spike': <NetworkResource: SPIKE (post_translational, activity_flow)>,
 'signalink3': <NetworkResource: SignaLink3 (post_translational, activity_flow)>,
 'guide2pharma': <NetworkResource: Guide2Pharma (post_translational, activity_flow)>,
 'ca1': <NetworkResource: CA1 (post_translational, activity_flow)>,
 'arn': <NetworkResource: ARN (post_translational, activity_flow)>,
 'nrf2': <NetworkResource: NRF2ome (post_translational, activity_flow)>,
 'macrophage': <NetworkResource: Macrophage (post_translational, activity_flow)>,
 'death': <NetworkResource: DeathDomain (post_translational, activity_flow)>,
 'pdz': <NetworkResource: PDZBase (post_translational, activity_flow)>,
 'signor': <NetworkResource: SIGNOR (post_translational, activity_flow)>,
 'adhesome': <NetworkResource: Adhesome (post_translational, activity_flow)>,
 'hpmr': <NetworkResource: HPMR (post_translational, activity_flow)>,
 'cellphonedb': <NetworkRes

You can pass such a dictionary to the `load` method of the `network.Network` object. It will then download the data from the original sources, translate the identifiers and merge the networks. Pypath stores all downloaded data in a cache, by default `~/.pypath/cache` in your user's home directory. For this reason, loading a resource for the first time might take long, but next time it will be faster as the data will be fetched from the cache. First create a `pypath.core.network.Network` object, then build the network:

In [8]:
n = network.Network()
n.load(netres.pathway)

In [9]:
n

<Network: 5697 nodes, 24571 interactions>

You can add more resource sets in a similar way:

In [10]:
n.load(netres.ptm)

In [11]:
n

<Network: 7126 nodes, 35112 interactions>

To load one single resource simply pass the `NetworkResource` directly:

In [12]:
n.load(netres.interaction['matrixdb'])

In [13]:
n

<Network: 7155 nodes, 35323 interactions>
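The way consecutive `load` calls grow the node and interaction counts above can be pictured with a toy model of the merging step. This is only a sketch, not how `pypath` stores its data: the resources, the interactions and the `merge_resources` function are made up for illustration, and the toy treats every interaction as undirected.

```python
from collections import defaultdict

# Toy resources: name -> list of (partner_a, partner_b) pairs.
# The interactions are made up for illustration.
toy_resources = {
    'res1': [('A', 'B'), ('B', 'C')],
    'res2': [('B', 'A'), ('C', 'D')],
}


def merge_resources(resources, network=None):
    """Add interactions to a dict: unordered pair -> set of resource names."""
    network = defaultdict(set) if network is None else network
    for name, pairs in resources.items():
        for a, b in pairs:
            # frozenset makes (A, B) and (B, A) the same interaction
            network[frozenset((a, b))].add(name)
    return network


net = merge_resources(toy_resources)
# A single extra resource can be added to the same network later
net = merge_resources({'res3': [('D', 'E')]}, network=net)

# Collect the nodes from all interaction keys
nodes = set().union(*net)
print('%u nodes, %u interactions' % (len(nodes), len(net)))
```

An interaction reported by two resources is stored once and carries both resource names, which is why the interaction count grows more slowly than the sum of the individual resources.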

### 3.1: Which network datasets are pre-defined in pypath? <a class="anchor" id="network-resources"></a>

You can find all the pre-defined datasets in the `pypath.resources.network` module. As we already mentioned above, the `pathway` dataset contains the literature curated activity flow resources. This was the original focus of pypath and OmniPath; since then we have added a great variety of other kinds of resource definitions. Here we give an overview of these.

* `pypath.resources.network.pathway`: activity flow networks with literature references
* `pypath.resources.network.activity_flow`: synonym for `pathway`
* `pypath.resources.network.pathway_noref`: activity flow networks without literature references
* `pypath.resources.network.pathway_all`: all activity flow data
* `pypath.resources.network.ptm`: enzyme-substrate interaction networks with literature references
* `pypath.resources.network.enzyme_substrate`: synonym for `ptm`
* `pypath.resources.network.ptm_noref`: enzyme-substrate networks without literature references
* `pypath.resources.network.ptm_all`: all enzyme-substrate data
* `pypath.resources.network.interaction`: undirected interactions from both literature curated and high-throughput collections (e.g. IntAct, BioGRID)
* `pypath.resources.network.interaction_misc`: undirected, high-scale interaction networks without the constraint of having any literature reference (e.g. the unbiased human interactome screen from the Vidal lab)
* `pypath.resources.network.transcription_onebyone`: transcriptional regulation databases (TF-target interactions) with all databases downloaded directly and processed by `pypath`
* `pypath.resources.network.transcription`: transcriptional regulation only from the DoRothEA data
* `pypath.resources.network.mirna_target`: miRNA-mRNA interactions from literature curated resources
* `pypath.resources.network.tf_mirna`: transcriptional regulation of miRNA from literature curated resources
* `pypath.resources.network.lncrna_protein`: lncRNA-protein interactions from literature curated datasets
* `pypath.resources.network.ligand_receptor`: ligand-receptor interactions from both literature curated and other kinds of resources
* `pypath.resources.network.pathwaycommons`: the PathwayCommons database
* `pypath.resources.network.reaction`: process description databases; not guaranteed to work at this moment
* `pypath.resources.network.reaction_misc`: alternative definitions to load process description databases; not guaranteed to work at this moment
* `pypath.resources.network.small_molecule_protein`: signaling interactions between small molecules and proteins

To see the list of the resources in a dataset, you can check the dict keys or the `name` attribute of each element:

In [14]:
netres.pathway.keys()

dict_keys(['trip', 'spike', 'signalink3', 'guide2pharma', 'ca1', 'arn', 'nrf2', 'macrophage', 'death', 'pdz', 'signor', 'adhesome', 'hpmr', 'cellphonedb', 'ramilowski2015', 'lrdb', 'baccin2019'])

In [15]:
[resource.name for resource in netres.pathway.values()]

['TRIP',
 'SPIKE',
 'SignaLink3',
 'Guide2Pharma',
 'CA1',
 'ARN',
 'NRF2ome',
 'Macrophage',
 'DeathDomain',
 'PDZBase',
 'SIGNOR',
 'Adhesome',
 'HPMR',
 'CellPhoneDB',
 'Ramilowski2015',
 'LRdb',
 'Baccin2019']

## 6: How to access the network <a class="anchor" id="access-network"></a>

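Once you have built a network you can use it for various purposes and write your own scripts for further processing or analysis. In this and the following sections `pa` is a legacy `PyPath` object (see the section about the legacy *igraph*-based network object). There the network is represented by an `igraph` object ([igraph.org](http://igraph.org/)):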

In [22]:
pa.graph


Number of edges and nodes:

In [None]:
pa.ecount, pa.vcount

You can access the edge and vertex sequences in the `es` and `vs` attributes; you can iterate over these or index them by integers. The edge and vertex attributes are accessible by string keys. E.g. get the sources of edge 81:

In [None]:
pa.graph.es[81]['sources']

## 7: Directions and signs <a class="anchor" id="directions"></a>

By default the `igraph` object is undirected but it carries all direction information in Python objects assigned to each edge. Pypath can convert it to a directed `igraph` object, but you still need the `Direction` objects to have the signs, as `igraph` has no signed network representation. Certain methods need the directed `igraph` object and they will automatically create it, but you can create it manually:

In [None]:
pa.get_directed()

You find the directed network in the `pa.dgraph` attribute:

In [None]:
pa.dgraph

Now let's take a look at the `pypath.main.Direction` objects, which contain details about directions and signs. First, as an example, select a random edge:

In [None]:
edge = pa.graph.es[3241]

The `Direction` object is in the `dirs` edge attribute:

In [None]:
d = edge['dirs']

It has a method to print its content in a human-readable way:

In [None]:
print(pa.graph.es[3241]['dirs'])

From this we see that the databases phosphoELM and Signor agree that protein `P17252` has an effect on `Q15139`, and Signor in addition tells us this effect is stimulatory. In your scripts you can query the `Direction` objects in a number of ways. Each `Direction` object calls the two possible directions either straight or reverse:

In [None]:
d.straight

In [None]:
d.reverse

It can tell you if one of these directions is supported by any of the network resources:

In [None]:
d.get_dir(d.straight)

Or it can return those resources:

In [None]:
d.get_dir(d.straight, sources = True)

The opposite direction is not supported by any resource:

In [None]:
d.get_dir(d.reverse, sources = True)

The signs can be queried in a similar way. The returned pair of boolean values tells whether the interaction in this direction is stimulatory or inhibitory, respectively.

In [None]:
d.get_sign(d.straight)

Or you can ask whether it is inhibition:

In [None]:
d.is_inhibition(d.straight)

Or if the interaction is directed at all:

In [None]:
d.is_directed()

Sometimes resources don't agree: for example, one says an interaction is inhibition while according to others it is stimulation; or one says A affects B and another resource says the other way around. Here we preserve all this potentially contradictory information in the `Direction` object, and in the end you decide what to do with it depending on your purpose. If you want to get rid of ambiguity, there is a method to get a consensus direction and sign, which returns the attributes most resources agree on:

In [23]:
d.consensus_edges()

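The idea of a consensus across disagreeing resources can be sketched as a simple majority vote. The code below is an illustration only and is not the implementation behind `consensus_edges`: the evidence records, the resource names and the `consensus` function are made up, and ties are resolved arbitrarily by whichever candidate was seen first.

```python
from collections import Counter

# Toy evidence for one interaction between A and B:
# (direction, sign, resource); resource names are made up.
evidence = [
    (('A', 'B'), 'positive', 'res1'),
    (('A', 'B'), 'positive', 'res2'),
    (('A', 'B'), 'negative', 'res3'),
    (('B', 'A'), None, 'res4'),
]


def consensus(evidence):
    """Return the (direction, sign) supported by the most resources."""
    # Each resource casts one vote for the direction it reports
    dir_votes = Counter(direction for direction, _, _ in evidence)
    direction = dir_votes.most_common(1)[0][0]
    # Only signed evidence in the winning direction votes for the sign
    sign_votes = Counter(
        sign
        for d, sign, _ in evidence
        if d == direction and sign is not None
    )
    sign = sign_votes.most_common(1)[0][0] if sign_votes else None
    return direction, sign


print(consensus(evidence))
```

Here three of the four toy resources support the A to B direction, and two of those three report a positive sign, so the vote yields a stimulatory effect of A on B.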

## 8: Accessing nodes in the network <a class="anchor" id="nodes"></a>

In `igraph` the vertices are numbered, but this numbering can change at certain operations. Instead, we can use the vertex attributes. In `PyPath`, for proteins the `name` attribute is the UniProt ID by default and the `label` is the Gene Symbol.

In [None]:
pa.graph.vs['name'][:5]

In [None]:
pa.graph.vs['label'][:5]

The `PyPath` object offers a number of helper methods to access the nodes by their names. For example, `uniprot` or `up` returns the `igraph.Vertex` for a UniProt ID:

In [None]:
type(pa.up('P00533'))

Similarly `genesymbol` or `gs` for Gene Symbols:

In [None]:
type(pa.gs('ESR1'))

Each of these has a "plural" version:

In [None]:
len(list(pa.gss(['MTOR', 'ATG16L2', 'ULK1'])))

And a generic method where you can mix UniProts and Gene Symbols:

In [None]:
len(list(pa.proteins(['MTOR', 'P00533'])))

## 9: Querying relationships with or without causality <a class="anchor" id="causality"></a>

Above you could see how to query the directions and names of individual edges and nodes. Building on top of these, other methods give a way to query causality, i.e. which proteins are affected by another one, and which others are its regulators. The example below returns the nodes PIK3CA is stimulated by; the `gs` prefix tells that we query by Gene Symbol:

In [None]:
pa.gs_stimulated_by('PIK3CA')

It returns a so-called `_NamedVertexSeq` object, from which you can get a series of `igraph.Vertex` objects, Gene Symbols or UniProt IDs:

In [None]:
list(pa.gs_stimulated_by('PIK3CA').gs())[:5]

In [None]:
list(pa.gs_stimulated_by('PIK3CA').up())[:5]

Note, the names of these methods are a bit counterintuitive; for example, `gs_stimulates` returns the genes stimulated by PIK3CA:

In [None]:
list(pa.gs_stimulates('PIK3CA').gs())[:5]

In [None]:
'PIK3CA' in set(pa.affected_by('AKT1').gs())

There are many similar methods: `inhibited_by` returns negative regulators; `affected_by` does not consider +/- signs; without the `gs_` and `up_` prefixes you can provide either of these identifiers; `neighbors` does not consider the direction. At the end, `.gs()` converts the result to Gene Symbols, `.up()` to UniProt IDs and `.ids()` to vertex IDs; by default it yields `igraph.Vertex` objects:

In [None]:
list(pa.neighbors('AKT1').ids())[:5]
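To keep the naming of these query methods apart, it may help to see them on a toy signed, directed edge list. The sketch below does not use `pypath`: the edges are made up, and the functions only mirror the naming described above as plain set comprehensions.

```python
# Toy signed, directed edges: (source, target, sign).
# The interactions below are made up for illustration.
edges = [
    ('A', 'B', +1),
    ('C', 'B', -1),
    ('B', 'D', +1),
    ('B', 'E', 0),   # directed, but the sign is unknown
]


def stimulates(node):
    """Targets that `node` stimulates."""
    return {t for s, t, sign in edges if s == node and sign > 0}


def stimulated_by(node):
    """Positive regulators of `node`."""
    return {s for s, t, sign in edges if t == node and sign > 0}


def inhibited_by(node):
    """Negative regulators of `node`."""
    return {s for s, t, sign in edges if t == node and sign < 0}


def affected_by(node):
    """All regulators of `node`, regardless of sign."""
    return {s for s, t, _ in edges if t == node}


def neighbors(node):
    """All partners of `node`, regardless of direction and sign."""
    return (
        {t for s, t, _ in edges if s == node} |
        {s for s, t, _ in edges if t == node}
    )


print(sorted(neighbors('B')))
```

In this toy network `stimulated_by('B')` gives the upstream node A, while `stimulates('B')` gives the downstream node D.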

Finally, the `neighborhood` methods return the indirect neighborhood within a custom number of steps (however, the size of the neighborhood increases rapidly with the number of steps):

In [None]:
print(list(pa.neighborhood('ATG3', 1).gs()))

In [None]:
print(list(pa.neighborhood('ATG3', 2).gs()))

In [None]:
len(list(pa.neighborhood('ATG3', 3).gs()))

In [None]:
len(list(pa.neighborhood('ATG3', 4).gs()))
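Why the neighborhood grows with each step can be seen in a minimal breadth-first search over a toy undirected graph. This is a sketch under stated assumptions: the adjacency dict is made up, the `neighborhood` function here is a stand-alone illustration rather than the `pypath` method, and it excludes the start node from the result, which is a choice made for this example.

```python
from collections import deque

# Toy undirected graph as an adjacency dict (made up for illustration)
adjacency = {
    'A': {'B'},
    'B': {'A', 'C', 'D'},
    'C': {'B', 'E'},
    'D': {'B', 'F'},
    'E': {'C'},
    'F': {'D'},
}


def neighborhood(adjacency, start, steps):
    """Nodes reachable from `start` in at most `steps` steps."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == steps:
            # Do not expand beyond the requested number of steps
            continue
        for nb in adjacency.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    # The start node itself is left out of the result
    return seen - {start}


sizes = [len(neighborhood(adjacency, 'A', n)) for n in (1, 2, 3)]
print(sizes)
```

Even in this six-node graph the neighborhood of A grows from one node to five within three steps; in a real network with highly connected nodes the growth is much steeper.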

## 10: Accessing edges by identifiers <a class="anchor" id="edge-lookup"></a>

Just like nodes, edges can also be accessed by identifiers like Gene Symbols. `get_edge` returns an `igraph.Edge` if the edge exists, otherwise `None`.

In [None]:
type(pa.get_edge('EGF', 'EGFR'))

In [None]:
type(pa.get_edge('EGF', 'P00533'))

In [None]:
type(pa.get_edge('EGF', 'AKT1'))

In [None]:
print(pa.get_edge('EGF', 'EGFR')['dirs'])

## 11: Literature references <a class="anchor" id="references"></a>

Select a random edge and in the `references` attribute you find a list of references:

In [24]:
edge = pa.get_edge('MAP1LC3B', 'SQSTM1')
edge['references']


Each reference has a PubMed ID:

In [None]:
edge['references'][0].pmid

In [None]:
edge['references'][0].open()

These 3 references come from 3 different databases, but there must be 2 overlaps between them:

In [None]:
edge['refs_by_source']

## 12: Translating identifiers <a class="anchor" id="mapping"></a>

The `pypath.mapping` module is for ID translation; most of the time you can simply call the `map_name` function:

In [None]:
from pypath import mapping
mapping.map_name('P00533', 'uniprot', 'genesymbol')

In [None]:
mapping.map_name('8408', 'entrez', 'uniprot')

A number of mapping tables are predefined and loaded automatically. However, it does not translate in 2 steps if no direct translation table is available. For example, you can translate *Entrez* to *Gene Symbol* this way:

In [None]:
mapping.map_names(
    mapping.map_name('8408', 'entrez', 'uniprot'),
    'uniprot',
    'genesymbol',
)

By default the `map_name` function returns a `set` because it accounts for ambiguous mapping. However, most often the ID translation is unambiguous and you want to retrieve only one ID. The `map_name0` function returns a string; in case of ambiguity it returns a random element from the resulting set:

In [None]:
mapping.map_name0('GABARAPL3', 'genesymbol', 'uniprot')
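The set-based, two-step translation described above can be illustrated without `pypath`, using plain dictionaries as translation tables. In this sketch the tables hold a single entry (Entrez Gene ID 1956 and UniProt P00533 both refer to human EGFR), and `translate`, `translate_many` and `translate_one` are hypothetical helpers that only mimic the behaviour described for `map_name`, `map_names` and `map_name0`.

```python
# Toy translation tables; each ID maps to a set of target IDs.
# Entrez 1956 and UniProt P00533 both refer to human EGFR.
entrez_to_uniprot = {'1956': {'P00533'}}
uniprot_to_genesymbol = {'P00533': {'EGFR'}}


def translate(name, table):
    """Translate one ID; returns a set, empty if the ID is unknown."""
    return set(table.get(name, ()))


def translate_many(names, table):
    """Translate a collection of IDs and pool the results in one set."""
    return set().union(*(translate(n, table) for n in names))


def translate_one(name, table):
    """Return a single ID (or None); arbitrary choice if ambiguous."""
    result = translate(name, table)
    return next(iter(result)) if result else None


# Step 1: Entrez -> UniProt, step 2: UniProt -> Gene Symbol
uniprots = translate('1956', entrez_to_uniprot)
symbols = translate_many(uniprots, uniprot_to_genesymbol)
print(symbols)
```

Returning sets at every step keeps ambiguous mappings visible until the very end, where a single ID can still be picked if needed.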

## 13: Enzyme-substrate interactions <a class="anchor" id="enz-sub"></a>

The `pypath.ptm` module builds a database of enzyme-substrate interactions.

In [None]:
from pypath import ptm
ptm_db = ptm.get_db()

Here you got a dictionary with pairs of UniProt IDs as keys and a list of special objects representing enzyme-substrate interactions as values:

In [None]:
print(ptm_db.enz_sub[('Q13177', 'P01236')][0])

Alternatively the enzyme-substrate interactions can be assigned to network edges:

In [None]:
pa.load_ptms2()

In [None]:
print(pa.graph.es['ptm'][444][0])

## 14: Annotations <a class="anchor" id="annotations"></a>

This module provides various annotations about the function and location of the proteins.

In [None]:
from pypath import annot
a = annot.get_db()

OmniPath contains annotations from 27 resources. These provide various information about the characteristics of the proteins, e.g. their localization or function. The `AnnotationTable` object loads all annotations by default, optionally you can limit this to certain resources. For example, if you only want to load the pathway membership annotations from SIGNOR, SignaLink, NetPath and KEGG, you can provide the names of the appropriate classes:

In [None]:
pathways = annot.AnnotationTable(
    protein_sources = (
        'SignalinkPathways',
        'KeggPathways',
        'NetpathPathways',
        'SignorPathways',
    )
)

The `AnnotationTable` object provides methods to query all resources together, or build a boolean array out of them. To see all annotations of one protein:

In [None]:
pathways.all_annotations('P00533')

In [None]:
pathways.create_dataframe = True
pathways.make_dataframe()

In [None]:
pathways.df[:10]

The `AnnotationTable` object contains the resource specific annotation objects:

In [None]:
a.annots

For each of these you can query the names of the fields, their possible values and the set of proteins annotated with any combination of the values:

In [25]:
matrisome = a.annots['Matrisome']


In [None]:
matrisome.get_names()

In [None]:
matrisome.get_values('subclass')

In [None]:
matrisome.get_subset(subclass = 'Collagens')
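The three kinds of queries above (field names, possible values, and subsets by field values) can be pictured on a toy annotation table. The sketch below is not the `pypath` annotation class: the record type, the entity IDs and the values are placeholders, and the three functions are simplified stand-ins that only borrow the method names from the text.

```python
from collections import namedtuple

# Hypothetical record type; the field names are chosen for this example
Record = namedtuple('Record', ['mainclass', 'subclass'])

# entity -> set of records; IDs and values are placeholders
annotations = {
    'X1': {Record('Core', 'Collagens')},
    'X2': {Record('Core', 'Glycoproteins')},
    'X3': {
        Record('Associated', 'Secreted factors'),
        Record('Core', 'Collagens'),
    },
}


def get_names():
    """Names of the fields in the records."""
    return Record._fields


def get_values(field):
    """Sorted list of the values a field takes across all records."""
    return sorted({
        getattr(record, field)
        for records in annotations.values()
        for record in records
    })


def get_subset(**criteria):
    """Entities having at least one record matching all criteria."""
    return {
        entity
        for entity, records in annotations.items()
        if any(
            all(getattr(r, key) == val for key, val in criteria.items())
            for r in records
        )
    }


print(sorted(get_subset(subclass='Collagens')))
```

Because an entity can carry several records, a subset query matches it as soon as one of its records satisfies all the given criteria.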

## 15: Inter-cellular signaling roles <a class="anchor" id="intercell"></a>

`pypath` does not combine the annotations in the `annot` module: exactly what goes in comes out. For example, the WNT pathway from Signor and SignaLink won't be merged automatically. However, with the `pypath.annot.CustomAnnotation` class anyone can do it. For inter-cellular communication categories the `pypath.intercell` module combines the data from all the relevant resources and creates categories based on a combination of evidence.

In [None]:
from pypath import intercell

In [None]:
i = intercell.get_db() # this takes quite some time
                    # unless you load annotations from a pickle cache

In [None]:
i

In [None]:
i.make_df()

In [None]:
i.df[:10]

In [None]:
i.class_names

In [None]:
i.children['receptor']

In [None]:
i.counts()

In [None]:
i.classes_by_entity('P00533')

In [None]:
i.class_labels

In [None]:
list(i.classes['adhesion'])[:10]

## 16: Gene Ontology <a class="anchor" id="gene-ontology"></a>

`pypath.go` is an almost standalone module for management of the Gene Ontology tree and annotations. The main objects here are `GeneOntology` and `GOAnnotation`. The former represents the ontology tree, i.e. terms and their relationships, the latter their assignment to gene products. Both provide many versatile methods for querying.

In [None]:
from pypath import go
goa = go.GOAnnotation()

In [None]:
goa.ontology # the GeneOntology object

In [None]:
goa # the GOAnnotation object

Among many others, the most versatile method is `select` which is able to select the annotated gene products by various expressions built from GO terms or IDs. It understands `AND`, `OR`, `NOT` and parentheses.

In [None]:
query = """(cell surface OR
        external side of plasma membrane OR
        extracellular region) AND
        (regulation of transmembrane transporter activity OR
        channel regulator activity)"""
result = goa.select(query)
print(list(result)[:7])

In [None]:
goa.ontology.get_all_descendants('GO:0005576')
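A boolean expression over GO terms, like the query passed to `select` above, corresponds to set operations on the gene products annotated with each term. The sketch below shows that correspondence with plain Python sets; it involves no parsing and no `pypath`, and the gene product IDs are placeholders.

```python
# Toy annotation: GO term name -> set of gene products (placeholders)
annotated = {
    'cell surface': {'G1', 'G2'},
    'extracellular region': {'G2', 'G3'},
    'channel regulator activity': {'G2', 'G4'},
}
all_genes = set().union(*annotated.values())

# (cell surface OR extracellular region) AND channel regulator activity
location = annotated['cell surface'] | annotated['extracellular region']
result = location & annotated['channel regulator activity']

# NOT corresponds to the complement within all annotated gene products
not_surface = all_genes - annotated['cell surface']

print(sorted(result), sorted(not_surface))
```

`OR` maps to set union, `AND` to intersection and `NOT` to a complement, while parentheses decide the order in which the sets are combined.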

## 17: Protein complexes <a class="anchor" id="complexes"></a>

The `pypath.complex` module builds a non-redundant list of complexes from 10 original resources. Complexes are unique considering their set of components, and optionally carry stoichiometry information.

In [None]:
from pypath import complex
complexdb = complex.get_db()
complexdb.update_index()

In [None]:
complexdb

To retrieve all complexes containing a specific protein, here MTOR:

In [None]:
complexdb.proteins['P42345']

Note some of the complexes have human readable names, these are preferred at printing if available from any of the databases. Otherwise the complexes are labelled by `COMPLEX:list-of-components`.

Take a closer look at one complex object. The hash of the complex is equivalent to the string representation below, where the UniProt IDs are unique and alphabetically sorted. Hence you can look up complexes using strings as keys, even though the dict keys are in fact `pypath.intera.Complex` objects:

In [26]:
cplex = complexdb.complexes['COMPLEX:P42345-Q13451']


In [None]:
cplex.components # stoichiometry

In [None]:
cplex.sources # resources
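How a string can find an object used as a dict key is worth a short illustration. The class below is a hypothetical stand-in, not `pypath.intera.Complex`: it only shows that an object whose hash and equality are based on its string representation can be looked up by that string.

```python
class ToyComplex:
    """Hypothetical stand-in for a complex: components with stoichiometry."""

    def __init__(self, components):
        # components: dict of UniProt ID -> copy number
        self.components = dict(components)

    def __str__(self):
        # Sorted IDs give one canonical name per set of components
        return 'COMPLEX:%s' % '-'.join(sorted(self.components))

    def __hash__(self):
        # Same hash as the string, so the string finds the object in a dict
        return hash(str(self))

    def __eq__(self, other):
        return str(self) == str(other)


# The order of the components at creation does not matter
cplex = ToyComplex({'Q13451': 1, 'P42345': 1})
complexes = {cplex: cplex}

found = complexes['COMPLEX:P42345-Q13451']
print(found.components)
```

Since the key depends only on the sorted component IDs, two complexes with the same components collapse into one entry in this toy, whatever their stoichiometry.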

## 18: Saving datasets as pickles <a class="anchor" id="pickle"></a>

The large datasets above are compiled from many resources. Even if these are already available in the cache, the data processing often takes longer than convenient, e.g. a few minutes. Most of the data integration objects in `pypath` provide methods to save and load their contents as pickle dumps.

In [None]:
# for `pypath.main.PyPath` objects:
pa.save_network('mynetwork.pickle') # save
pa.init_network(pfile = 'mynetwork.pickle') # load
# for `pypath.annot.AnnotationTable` objects:
a.save_to_pickle('myannots.pickle')
a = annot.AnnotationTable(pickle_file = 'myannots.pickle')
# for `pypath.complex.ComplexAggregator` objects:
complexdb.save_to_pickle('mycomplexes.pickle')
complexdb = complex.ComplexAggregator(pickle_file = 'mycomplexes.pickle')

## 19: Network in *pandas.DataFrame* <a class="anchor" id="network-pandas"></a>

The original implementation of the network in `pypath` is based on `igraph`. Work is ongoing to provide a new and more flexible network builder which will result in a `pandas.DataFrame` and make `pypath` independent of `igraph`. As a temporary solution you can easily convert the network to a `pandas.DataFrame` using the `pypath.network` module.

In [None]:
from pypath import main
from pypath import data_formats
from pypath import network

In [None]:
pa = main.PyPath()
pa.init_network(data_formats.pathway_all)

In [None]:
net = network.Network.from_igraph(pa)

In [None]:
net.records[:10]

## 20: Log messages and sessions <a class="anchor" id="log-session"></a>

Now `pypath` has an improved logger. All modules send messages to a log file which is named, by default, after the session ID (a 5 character random string). The default path to the log file is `./pypath_log/pypath-xxxxx.log` where `xxxxx` is the session ID. When you import `pypath`, the welcome message tells you the session ID and the log file location.

In [None]:
import pypath

By default this is also the only message `pypath` prints directly to the console; otherwise it sends messages only to the log. Here is how you can access the session ID and the logger:

In [None]:
pypath.session_mod.session

In [None]:
pypath.session_mod.session.log.fname

In [None]:
pypath.session_mod.session.label

From your scripts and apps you can also easily send messages to the logfile:

In [None]:
pypath.session_mod.session.log.msg('Greetings from the pypath tutorial notebook! :)')

In [None]:
with open(pypath.session_mod.session.log.fname, 'r') as fp:
    messages = fp.read().split('\n')

In [None]:
print('\n'.join(messages))

If you create a class inheriting from `pypath.session_mod.Logger` it will be automatically connected to the session logger:

In [None]:
class ChildOfLogger(pypath.session_mod.Logger):
    
    def __init__(self):
        
        pypath.session_mod.Logger.__init__(self, name = 'child')
    
    def say_something(self):
        
        self._log('Have a nice day! :D')


col = ChildOfLogger()
col.say_something()

with open(pypath.session_mod.session.log.fname, 'r') as fp:
    messages = fp.read().split('\n')

print('\n'.join(messages))

Note, the log messages are flushed every 2 seconds by default, but their timestamps always refer to the exact time the message was sent. A second stamp shows the name of the sending submodule or class.
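The same pattern (one session log file named after a random label, with classes that log under their own name) can be sketched with Python's standard `logging` module. This is an analogy rather than the `pypath` logger: `ToyLogger` and `Child` are hypothetical classes, the file goes to a temporary directory, and the standard `FileHandler` writes each message immediately instead of flushing periodically.

```python
import logging
import os
import random
import string
import tempfile

# Session label: 5 random characters
chars = string.ascii_lowercase + string.digits
label = ''.join(random.choices(chars, k=5))
log_path = os.path.join(tempfile.mkdtemp(), 'toy-%s.log' % label)

# One handler on the session logger; timestamp and sender name in each line
handler = logging.FileHandler(log_path)
handler.setFormatter(
    logging.Formatter('[%(asctime)s] [%(name)s] %(message)s')
)
session_logger = logging.getLogger('toy_session_%s' % label)
session_logger.setLevel(logging.INFO)
session_logger.addHandler(handler)
session_logger.propagate = False


class ToyLogger:
    """Hypothetical base class connecting its children to the session log."""

    def __init__(self, name):
        # Records of child loggers are handled by the session logger
        self._logger = session_logger.getChild(name)

    def _log(self, msg):
        self._logger.info(msg)


class Child(ToyLogger):

    def __init__(self):
        ToyLogger.__init__(self, name='child')

    def say_something(self):
        self._log('Have a nice day!')


Child().say_something()
handler.flush()

with open(log_path, 'r') as fp:
    lines = fp.read().splitlines()

print('\n'.join(lines))
```

Each line in the toy log carries a timestamp and the name of the sending logger, similar in spirit to the two stamps described above.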

Finally see a log from a real `pypath` session:

In [None]:
from pypath import main
from pypath import data_formats
pa = main.PyPath()
pa.init_network(data_formats.pathway)

In [None]:
with open(pypath.session_mod.session.log.fname, 'r') as fp:
    messages = fp.read().split('\n')

print('\n'.join(messages[-20:]))

## 21: BEL export <a class="anchor" id="bel"></a>

Biological Expression Language (BEL, https://bel-commons.scai.fraunhofer.de/) is a versatile description language to capture relationships between various biological entities, spanning a wide range of levels of biological organization. `pypath` has a dedicated module to convert the network and the enzyme-substrate interactions to BEL format:

In [None]:
from pypath import main
from pypath import data_formats
from pypath import bel

In [None]:
pa = main.PyPath()
pa.init_network(data_formats.pathway)

You can provide one or more resources to the `Bel` class. Supported resources currently are `pypath.main.PyPath` and `pypath.ptm.PtmAggregator`.

In [None]:
b = bel.Bel(resource = pa)

From the resources we compile a `BELGraph` object which provides a Python interface for various operations and you can also export the data in BEL format:

In [27]:
b.main()


In [None]:
b.bel_graph

In [None]:
b.bel_graph.summarize()

In [None]:
b.export_relationships('omnipath_pathways.bel')

In [None]:
with open('omnipath_pathways.bel', 'r') as fp:
    bel_str = fp.read()

In [None]:
print(bel_str[:333])

## 22: CellPhoneDB export <a class="anchor" id="cellphonedb"></a>

CellPhoneDB is a statistical method and a database for inferring inter-cellular communication pathways between specific cell types from single-cell data. OmniPath/pypath uses CellPhoneDB as a resource for interaction, protein complex and annotation data. Apart from this, pypath is able to export its data in the appropriate format to provide input for the CellPhoneDB Python module. For this you can use the `pypath.cellphonedb` module:

In [None]:
from pypath import cellphonedb
from pypath import settings

settings.setup(network_expand_complexes = False)

Here you can provide parameters for the network or provide an already built network. Also you can provide the datasets as pickles to make them load really fast. Otherwise this step will take quite long.

In [None]:
c = cellphonedb.CellPhoneDB()

You can access each of the CellPhoneDB input files as a `pandas.DataFrame`, and they have also been exported to csv files. For example, `interaction_input.csv` contains interactions from all the resources used for building the network (here Signor, SignaLink, etc.):

In [None]:
c.interaction_dataframe[:10]

The proteins and complexes are annotated (transmembrane, peripheral, secreted, etc.) using data from the `pypath.intercell` module (identical to the http://omnipathdb.org/intercell query of the web service):

In [None]:
c.protein_dataframe[:10]

## 21: The legacy *igraph*-based network object <a class="anchor" id="legacy"></a>

Before version `0.9`, `pypath` built networks using an object based on `igraph.Graph`. As this limited development and API design, we replaced it. However, it is still available in the `pypath.legacy` module.

In [20]:
from pypath.legacy import main

In [None]:
pa = main.PyPath()
#pa.load_omnipath() # This is commented out because it takes > 1h 
                    # to run it for the first time due to the vast
                    # amount of data download.
                    # Once you populated the cache it still takes
                    # approx. 30 min to build the entire OmniPath
                    # as the process consists of quite some data
                    # processing. If you dump it in a pickle, you
                    # can load the network in < 1 min

### 21.1: Quick start – I just want a network quickly and play around with *pypath* <a class="anchor" id="legacy-quick-start"></a>

You can find the predefined input formats in the `pypath.resources.network` module. For example, to load a single resource from there, let's say Signor:

In [21]:
from pypath.legacy import main
from pypath.resources import network as netres
pa = main.PyPath()
pa.load_resources({'signor': netres.pathway['signor']})

Or to load all *activity flow* resources with *literature references*:

In [None]:
from pypath.legacy import main
from pypath.resources import network as netres

In [None]:
pa = main.PyPath()
pa.init_network(netres.pathway)

Or to load all *activity flow* resources, including the ones without literature references:

In [None]:
pa = main.PyPath()
pa.init_network(netres.pathway_all)

### 21.2: Quick start – How do I build networks from any data with *pypath*? <a class="anchor" id="legacy-quick-start-2"></a>

Here we show how to build a network from your own files. The advantage of building a network with pypath is that you don't need to worry about merging redundant elements or about differing formats and identifiers. Let's say you have two files with network data:

**network1.csv**

    entrezA,entrezB,effect
    1950,1956,inhibition
    5290,207,stimulation
    207,2932,inhibition
    1956,5290,stimulation

**network2.sif**

    EGF + EGFR
    EGFR + PIK3CA
    EGFR + SOS1
    PIK3CA + RAC1
    RAC1 + MAP3K1
    SOS1 + HRAS
    HRAS + MAP3K1
    PIK3CA + AKT1
    AKT1 - GSK3B
    
*Note: you need to create these files in order to load them.*
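
One way to create the two files is to write them from Python. The snippet below only uses the standard library and reproduces the contents listed above. It writes into the current working directory, because the input definitions refer to `network1.csv` and `network2.sif` by relative path.

```python
network1 = """\
entrezA,entrezB,effect
1950,1956,inhibition
5290,207,stimulation
207,2932,inhibition
1956,5290,stimulation
"""

network2 = """\
EGF + EGFR
EGFR + PIK3CA
EGFR + SOS1
PIK3CA + RAC1
RAC1 + MAP3K1
SOS1 + HRAS
HRAS + MAP3K1
PIK3CA + AKT1
AKT1 - GSK3B
"""

files = {'network1.csv': network1, 'network2.sif': network2}

# Write both files into the current working directory
for filename, content in files.items():
    with open(filename, 'w') as fp:
        fp.write(content)

# Quick check: count the lines written to each file
line_counts = {}
for filename in files:
    with open(filename) as fp:
        line_counts[filename] = len(fp.read().splitlines())

print(line_counts)
```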

### 3a: Defining input formats <a class="anchor" id="input-formats"></a>

In [None]:
import pypath
import pypath.input_formats as input_formats

input1 = input_formats.ReadSettings(
    name = 'egf1',
    input = 'network1.csv',
    header = True,
    separator = ',',
    id_col_a = 0,
    id_col_b = 1,
    id_type_a = 'entrez',
    id_type_b = 'entrez',
    sign = (2, 'stimulation', 'inhibition'),
    ncbi_tax_id = 9606,
)

input2 = input_formats.ReadSettings(
    name = 'egf2',
    input = 'network2.sif',
    separator = ' ',
    id_col_a = 0,
    id_col_b = 2,
    id_type_a = 'genesymbol',
    id_type_b = 'genesymbol',
    sign = (1, '+', '-'),
    ncbi_tax_id = 9606,
)
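
To make the meaning of these parameters more concrete, here is a small stand-alone sketch of how a reader might apply them to a single line. It is not `pypath`'s parser: `parse_line` is a hypothetical helper, and it assumes that `sign` is a tuple of (column index, value meaning stimulation, value meaning inhibition), which is how the two definitions above are written.

```python
def parse_line(line, separator, id_col_a, id_col_b, sign):
    """Split one line and pick the two identifiers and the effect sign."""
    fields = line.strip().split(separator)
    sign_col, positive, negative = sign
    value = fields[sign_col]
    if value == positive:
        effect = 1
    elif value == negative:
        effect = -1
    else:
        effect = 0  # unknown or missing sign
    return fields[id_col_a], fields[id_col_b], effect


# A line from network1.csv, read with the settings of `input1`
edge1 = parse_line(
    '207,2932,inhibition',
    separator = ',', id_col_a = 0, id_col_b = 1,
    sign = (2, 'stimulation', 'inhibition'),
)

# A line from network2.sif, read with the settings of `input2`
edge2 = parse_line(
    'AKT1 - GSK3B',
    separator = ' ', id_col_a = 0, id_col_b = 2,
    sign = (1, '+', '-'),
)

print(edge1)
print(edge2)
```

Both lines describe the same inhibition of GSK3B by AKT1, once with Entrez Gene IDs and once with gene symbols, which is why the identifier types have to be declared for each file.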

### 3b: Creating PyPath object and loading the 2 test files <a class="anchor" id="toy-example"></a>

In [None]:
inputs = {
    'egf1': input1,
    'egf2': input2
}

pa = main.PyPath()
pa.reload()
pa.init_network(lst = inputs)
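
Conceptually, the merge amounts to translating all identifiers to a common type and collapsing interactions that refer to the same pair of molecules. The stand-alone sketch below imitates this for the two toy files. It only illustrates the idea and is not how `pypath` is implemented: the `ENTREZ_TO_GENESYMBOL` table is hard-coded for these five genes (`pypath` has its own ID translation), and the `merged` dictionary is a made-up structure.

```python
from collections import defaultdict

# Hard-coded translation table for the genes in network1.csv
ENTREZ_TO_GENESYMBOL = {
    '1950': 'EGF', '1956': 'EGFR', '5290': 'PIK3CA',
    '207': 'AKT1', '2932': 'GSK3B',
}

# (source, target, effect) records as they appear in the two files
egf1 = [
    ('1950', '1956', 'inhibition'),
    ('5290', '207', 'stimulation'),
    ('207', '2932', 'inhibition'),
    ('1956', '5290', 'stimulation'),
]
egf2 = [
    ('EGF', 'EGFR', '+'), ('EGFR', 'PIK3CA', '+'), ('EGFR', 'SOS1', '+'),
    ('PIK3CA', 'RAC1', '+'), ('RAC1', 'MAP3K1', '+'), ('SOS1', 'HRAS', '+'),
    ('HRAS', 'MAP3K1', '+'), ('PIK3CA', 'AKT1', '+'), ('AKT1', 'GSK3B', '-'),
]

# For each directed pair, collect the resources that support it
merged = defaultdict(set)

for a, b, _effect in egf1:
    pair = (ENTREZ_TO_GENESYMBOL[a], ENTREZ_TO_GENESYMBOL[b])
    merged[pair].add('egf1')

for a, b, _effect in egf2:
    merged[(a, b)].add('egf2')

nodes = {name for pair in merged for name in pair}
shared = sorted(pair for pair, sources in merged.items() if len(sources) == 2)

print(len(nodes), 'nodes,', len(merged), 'interactions')
print('In both files:', shared)
```

In this sketch the 13 input records collapse into 9 distinct interactions among 9 proteins, and 4 of the interactions are supported by both files.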

## 4: Plotting the network with *igraph* <a class="anchor" id="plot"></a>

Here we use the network created above because it has a reasonable size, unlike the networks we would get from most of the network databases. *igraph* has excellent plotting abilities built on top of the *cairo* library.

In [None]:
import igraph
plot = igraph.plot(pa.graph, target = 'egf_network.png',
            edge_width = 0.3, edge_color = '#777777',
            vertex_color = '#97BE73', vertex_frame_width = 0,
            vertex_size = 70.0, vertex_label_size = 15,
            vertex_label_color = '#FFFFFF',
            # due to a bug in either igraph or IPython, 
            # vertex labels are not visible on inline plots:
            inline = False, margin = 120)
from IPython.display import Image
Image(filename='egf_network.png')