**This is an updated tutorial on visualizing hierarchical protein network modules, using a script that interfaces the DDOT Python package (v1.0.1) with the HiView web browser (v2.6)**

**Author: Fan Zheng**

Please check that the DDOT package is installed and that all its dependencies are satisfied. To complete this tutorial, you only need the upload script `tohiview.py` and a few input files for that script. We will walk through the creation of hierarchical models and their visualization in HiView.   

In [20]:
username = 'fzheng' # replace with your username

In [16]:
import getpass
passwd = getpass.getpass("Passwd here: ")

Passwd here:  ········


The available options of the upload script are listed below. Many options are available, but only `--ont`, `--hier_name`, and `--ndex_account` are required.

`--ont` should be a 3-column file as defined in DDOT, whose columns represent the parent, the child, and the type of the relationship (see details here).  
`--hier_name` is just a string used to label the files.   
`--ndex_account` takes 3 strings: the server name (http://test.ndexbio.org), a username, and a password.

Note that so far we require using the NDEx test server, as this pipeline can potentially create a large number of networks in one's NDEx account.
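
As a rough illustration of how the required options fit together, the sketch below assembles the argument list in Python without running anything. The helper name `build_upload_command` and the placeholder credentials are hypothetical; the script path and flags are the ones used in this tutorial.

```python
import shlex

def build_upload_command(ont, hier_name, server, username, password,
                         script="../../ddot/tohiview.py", score=None):
    """Assemble the tohiview.py argument list (hypothetical helper; nothing is executed)."""
    cmd = ["python", script,
           "--ont", ont,
           "--hier_name", hier_name,
           # --ndex_account takes three values: server, username, password
           "--ndex_account", server, username, password]
    if score is not None:
        cmd += ["--score", score]  # optional integrated edge score file
    return cmd

cmd = build_upload_command("./data/decoy.txt", "decoy",
                           "http://test.ndexbio.org", "my_user", "my_password")
# Show the command as it would be typed in a shell
print(" ".join(shlex.quote(part) for part in cmd))
```

Passing such a list to `subprocess.run(cmd, check=True)` would avoid shell quoting issues with unusual passwords, which is one reason to build the command as a list rather than a string.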

In [33]:
%%bash 

python ../../ddot/tohiview.py -h

usage: tohiview.py [-h] --ont ONT --hier_name HIER_NAME
                   [--ndex_account NDEX_ACCOUNT NDEX_ACCOUNT NDEX_ACCOUNT]
                   [--score SCORE] [--subnet_size SUBNET_SIZE SUBNET_SIZE]
                   [--node_attr NODE_ATTR] [--evinet_links EVINET_LINKS]
                   [--evinet_size EVINET_SIZE] [--gene_attr GENE_ATTR]
                   [--term_2_uuid TERM_2_UUID]
                   [--visible_cols [VISIBLE_COLS [VISIBLE_COLS ...]]]
                   [--max_num_edges MAX_NUM_EDGES] [--col_color COL_COLOR]
                   [--col_label COL_LABEL] [--rename RENAME] [--skip_main]

optional arguments:
  -h, --help            show this help message and exit
  --ont ONT             ontology file, 3 col table
  --hier_name HIER_NAME
                        name of the hierarchy
  --ndex_account NDEX_ACCOUNT NDEX_ACCOUNT NDEX_ACCOUNT
  --score SCORE         integrated edge score
  --subnet_size SUBNET_SIZE SUBNET_SIZE
                        minimum and maximum

# 1. A simple hierarchy

We will first create and upload a decoy hierarchy.

In [69]:
import pandas as pd

d = './data'  # directory holding the tutorial input files
df = pd.read_csv(d + '/test1.ont', sep='\t', header=None)
df

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | ROOT | Coarse-1 | default |
| 1 | ROOT | Coarse-2 | default |
| 2 | Coarse-1 | Fine-1 | default |
| 3 | Coarse-1 | Fine-2 | default |
| 4 | Coarse-1 | Fine-3 | default |
| 5 | Coarse-2 | Fine-3 | default |
| 6 | Coarse-2 | Fine-4 | default |
| 7 | Fine-1 | geneA | gene |
| 8 | Fine-1 | geneB | gene |
| 9 | Coarse-1 | geneC | gene |


Note that this hierarchy is a DAG (directed acyclic graph). The node "Fine-3" has two parents: "Coarse-1" and "Coarse-2". In HiView, a circle "Fine-3" will be found nested under the circles of both "Coarse-1" and "Coarse-2".
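
To make the multi-parent structure concrete, here is a minimal sketch that finds every term with more than one parent. It re-types the rows of the table above instead of reading the file, and the variable names are illustrative only.

```python
from collections import defaultdict

# (parent, child, type) rows copied from the table above
rows = [
    ("ROOT", "Coarse-1", "default"),
    ("ROOT", "Coarse-2", "default"),
    ("Coarse-1", "Fine-1", "default"),
    ("Coarse-1", "Fine-2", "default"),
    ("Coarse-1", "Fine-3", "default"),
    ("Coarse-2", "Fine-3", "default"),
    ("Coarse-2", "Fine-4", "default"),
    ("Fine-1", "geneA", "gene"),
    ("Fine-1", "geneB", "gene"),
    ("Coarse-1", "geneC", "gene"),
]

# Collect the parents of each term, ignoring term-to-gene rows
parents = defaultdict(set)
for parent, child, rel_type in rows:
    if rel_type == "default":
        parents[child].add(parent)

multi_parent = {c: sorted(p) for c, p in parents.items() if len(p) > 1}
print(multi_parent)  # {'Fine-3': ['Coarse-1', 'Coarse-2']}
```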

**Warning**: underscore "_" is not allowed in the node names.  
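
A quick pre-upload check for this restriction could look like the sketch below. The helper `names_with_underscore` is hypothetical, and it checks both the parent and child columns on the assumption that the restriction applies to every name in the file.

```python
def names_with_underscore(rows):
    """Return the sorted node names that contain '_' in (parent, child, type) rows."""
    names = set()
    for parent, child, _rel_type in rows:
        names.update((parent, child))
    return sorted(name for name in names if "_" in name)

# Made-up rows with one offending name, for illustration only
rows = [
    ("ROOT", "Coarse-1", "default"),
    ("Coarse-1", "Fine_1", "default"),
    ("Fine_1", "geneA", "gene"),
]
bad = names_with_underscore(rows)
print(bad)  # ['Fine_1']
```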

In [39]:
%%bash -s "$username" "$passwd"

python ../../ddot/tohiview.py --ont ./data/decoy.txt --hier_name decoy --ndex_account http://test.ndexbio.org $1 $2

Creating NdexGraph

Uploading to NDEx
http://hiview.ucsd.edu/edb47885-d6ba-11ea-8772-0ac135e8bacf?type=www&server=http://www.ndexbio.org


Paste the above link into a browser to launch HiView.

# 2. Adding integrated networks to communities

HiView is a powerful platform for displaying multiscale communities in a network. It is often of interest to visualize the edges in the source network that support a community. Precisely, for a source network $G = (V, E)$, a subnetwork of a community $s$ is defined as $G_s = (V_s, E_s)$, where $V_s \subseteq V$, $E_s \subseteq E$, and $\forall e = (u,v) \in E_s$, $u,v \in V_s$.  


This is achieved by the `--score` argument. It takes a tab-separated file with three columns: `geneA`, `geneB`, and `score`. We recommend keeping the score values within (0,1). 
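
The definition of a community subnetwork can be sketched in a few lines. The example below keeps every scored edge whose two endpoints both belong to the community, that is, the largest edge set allowed by the definition; whether the upload script selects edges in exactly this way is an assumption. The function name and the community membership are hypothetical, while the three edges are the first rows of the score file shown later in this section.

```python
def community_subnetwork(edges, members):
    """Keep the (geneA, geneB, score) edges with both endpoints in the community."""
    members = set(members)
    return [(u, v, s) for u, v, s in edges if u in members and v in members]

# (geneA, geneB, score) triples in the --score file layout
edges = [
    ("CDC73", "CTR9", 0.798),
    ("CDC73", "DNMT3A", 0.521),
    ("CDC73", "HIST1H3A", 0.47),
]
members = {"CDC73", "CTR9", "HIST1H3A"}  # hypothetical community membership

sub = community_subnetwork(edges, members)
print(sub)  # the CDC73-DNMT3A edge is dropped because DNMT3A is not a member
```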

In this example, we use some gene-gene association data, and a sub-hierarchy inferred by the CliXO algorithm. Let's see their format:

In [42]:
df_ont = pd.read_csv(d + '/test2.ont', sep='\t', header=None)
df_ont.head(3)

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 22133 | 21875 | default |
| 1 | 22435 | 22133 | default |
| 2 | 22451 | 21851 | default |


In [43]:
df_ont.tail(3)

| | 0 | 1 | 2 |
|---|---|---|---|
| 37 | 23161 | SUPT5H | gene |
| 38 | 23248 | CSNK2A2 | gene |
| 39 | 23248 | HIST1H3A | gene |


In [44]:
df_score = pd.read_csv(d + '/test2_score.txt', sep='\t', header=None)
df_score.head(3)

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | CDC73 | CTR9 | 0.798 |
| 1 | CDC73 | DNMT3A | 0.521 |
| 2 | CDC73 | HIST1H3A | 0.47 |


In [58]:
%%bash -s "$username" "$passwd"

python ../../ddot/tohiview.py --ont ./data/test2.ont --hier_name hiview_tutorial_test2 --ndex_account http://test.ndexbio.org $1 $2 --score ./data/test2_score.txt

Traceback (most recent call last):
  File "../../ddot/tohiview.py", line 223, in <module>
    terms_small = {t:term_rename[t] for t,s in zip(ont.terms, ont.term_sizes) if (s >= args.subnet_size[0]) and (s<args.evinet_size)}
  File "../../ddot/tohiview.py", line 223, in <dictcomp>
    terms_small = {t:term_rename[t] for t,s in zip(ont.terms, ont.term_sizes) if (s >= args.subnet_size[0]) and (s<args.evinet_size)}
KeyError: 'S21851'


CalledProcessError: Command 'b'\npython ../../ddot/tohiview.py --ont ./data/test2.ont --hier_name hiview_tutorial_test2.3 --ndex_account http://www.ndexbio.org $1 $2 --score ./data/test2_score.txt\n'' returned non-zero exit status 1.

## "Score" of a community

This is a concept specific to certain community detection algorithms, e.g. CliXO, which takes a weighted graph as input and iterates community detection at different thresholds. Thus, each community in CliXO is associated with a "score".

By default, edges in a subnetwork have a uniform color in HiView. However, if communities are associated with scores, the edges will be shown with a discrete color map (which often visually highlights the community structure), determined by the score of the community itself and the scores of its child communities. This can be achieved by adding a fourth column to the file passed to the `--ont` argument, as in the following example:

In [54]:
df_ont = pd.read_csv(d + '/test2_ww.ont', sep='\t', header=None)
df_ont.head(3)

| | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | S22133 | S21875 | default | 0.72 |
| 1 | S22435 | S22133 | default | 0.65 |
| 2 | S22451 | S21851 | default | 0.58 |


The values in column "3" (e.g. 0.72, 0.65) indicate the "score" of the community in column "0". The score of a parent community is required to be smaller than the scores of its children. In this example, "S22435" is the parent of "S22133", and thus 0.65 < 0.72. 
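
The parent-smaller-than-child requirement can be checked before uploading. The following sketch is only an illustration: `score_violations` is a hypothetical helper, the rows are the three shown above, and parent-child pairs whose child score is not known from these rows are skipped.

```python
def score_violations(rows):
    """Return (parent, child) pairs where the parent score is not below the child score.

    rows are (parent, child, type, parent_score) tuples, as in the 4-column file.
    """
    score = {parent: s for parent, _child, _rel_type, s in rows}
    return [
        (parent, child)
        for parent, child, rel_type, _s in rows
        if rel_type == "default" and child in score and not score[parent] < score[child]
    ]

rows = [
    ("S22133", "S21875", "default", 0.72),
    ("S22435", "S22133", "default", 0.65),
    ("S22451", "S21851", "default", 0.58),
]
violations = score_violations(rows)
print(violations)  # [] because 0.65 < 0.72 for S22435 -> S22133
```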

In [None]:
%%bash -s "$username" "$passwd"

python ../../ddot/tohiview.py --ont ./data/test2.ont --hier_name hiview_tutorial_test2 --ndex_account http://www.ndexbio.org $1 $2 --score ./data/test2_score.txt

After upload, we can see the change of edge colors in the data view.

# 3. Adding multiple evidence networks to systems

In addition to a single master network, it is also possible to overlay more networks supporting a community and visualize them in HiView. For example, if the master network is the result of integrating multiple datasets, it is often of interest to visualize the interactions in these datasets (jointly or separately).

This can be achieved by passing a file to the `--evinet_links` argument. It is a two-column file giving the name of each dataset and the path to the file containing its interactions:

In [60]:
%%bash

cat ./data/net_links.txt

Physical	./data/test3_ppisample.txt
Co_protein_expr	./data/test3_coxsample.txt
CCMI	./data/test3_binarysample.txt
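
For reference, a file in this layout can be read into a name-to-path mapping as sketched below. The function `read_evinet_links` is hypothetical and assumes the two columns are separated by a tab; the example parses the three lines above from an in-memory string rather than from disk.

```python
import io

def read_evinet_links(handle):
    """Parse 'dataset name <TAB> path' lines into a dict, skipping blank lines."""
    links = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        name, path = line.split("\t")
        links[name] = path
    return links

text = (
    "Physical\t./data/test3_ppisample.txt\n"
    "Co_protein_expr\t./data/test3_coxsample.txt\n"
    "CCMI\t./data/test3_binarysample.txt\n"
)
links = read_evinet_links(io.StringIO(text))
print(links["Physical"])  # ./data/test3_ppisample.txt
```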


A source file is a 3-column tab-separated file, which can contain either binary interactions or weighted interactions: 

In [63]:
%%bash

cat ./data/test3_ppisample.txt |head -5

CTR9	LEO1	5.05
SSRP1	SUPT16H	4.42
LEO1	PAF1	5.35
CTR9	PAF1	5.11
CSNK2A1	CSNK2B	5.13


In [64]:
%%bash

cat ./data/test3_binarysample.txt |head -5

MTDH	SUPT16H	True
SUPT16H	TSPYL5	True
MTDH	SSRP1	True
SSRP1	TSPYL5	True
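
Both kinds of source file share the same three-column layout, so a single reader can handle them. The sketch below is an assumption about how one might parse a row, not a description of what `tohiview.py` does internally; `parse_interaction` is a hypothetical name.

```python
def parse_interaction(line):
    """Split a tab-separated row into (geneA, geneB, value).

    The value becomes a bool for 'True'/'False' and a float otherwise.
    """
    gene_a, gene_b, value = line.rstrip("\n").split("\t")
    if value in ("True", "False"):
        return gene_a, gene_b, value == "True"
    return gene_a, gene_b, float(value)

weighted = parse_interaction("CTR9\tLEO1\t5.05")
binary = parse_interaction("MTDH\tSUPT16H\tTrue")
print(weighted)
print(binary)
```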


Now we do the upload:

In [None]:
%%bash -s "$username" "$passwd"

python ../../ddot/tohiview.py --ont ./data/test2.ont --hier_name hiview_tutorial_test2 --ndex_account http://www.ndexbio.org $1 $2 --score ./data/test2_score.txt --evinet_links ./data/net_links.txt

## Large networks

Large networks are often the bottleneck for upload speed and for HiView visualization (we are working on improving that). To reduce overhead, subnetwork uploading can be disabled for large communities while remaining enabled for smaller ones. 

This is achieved by the `--subnet_size` argument, which takes two integers specifying the lower and upper bounds of the community sizes for which upload of the integrated subnetworks is enabled.

Similarly, `--evinet_size` takes one integer; for communities larger than this threshold, upload of evidence networks is disabled.

We require `subnet_size[0] < evinet_size <= subnet_size[1]`.
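
The interplay of the two size options can be sketched as follows. The evidence-network condition mirrors the `terms_small` expression visible in the traceback in Section 2 (`s >= subnet_size[0]` and `s < evinet_size`); treating both `--subnet_size` bounds as inclusive is an assumption, and the function names and numeric values are hypothetical.

```python
def check_sizes(subnet_size, evinet_size):
    """Raise ValueError unless subnet_size[0] < evinet_size <= subnet_size[1]."""
    lower, upper = subnet_size
    if not (lower < evinet_size <= upper):
        raise ValueError(
            f"need {lower} < evinet_size <= {upper}, got evinet_size={evinet_size}"
        )

def upload_plan(size, subnet_size, evinet_size):
    """Decide which networks would be uploaded for a community of the given size."""
    lower, upper = subnet_size
    return {
        # Assumption: both bounds of --subnet_size are inclusive
        "subnetwork": lower <= size <= upper,
        # Mirrors the terms_small condition in the traceback
        "evidence": lower <= size < evinet_size,
    }

subnet_size, evinet_size = (4, 500), 100  # hypothetical settings
check_sizes(subnet_size, evinet_size)
for size in (3, 50, 200, 1000):
    print(size, upload_plan(size, subnet_size, evinet_size))
```

With these settings, a community of size 50 would get both its integrated subnetwork and its evidence networks, a community of size 200 only the integrated subnetwork, and communities of size 3 or 1000 neither.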

# 4. Reuse uploaded subnetworks

After uploading a hierarchy with subnetworks, you will notice a file whose name starts with `term_2_uuid` written to the working directory. This file describes the mapping between community names and community subnetworks. 

This file can later be used as the input of the `--term_2_uuid` argument, so that subnetworks can be shared across different hierarchical models.

# 5. Control the information displayed in HiView

# 6. Delete a HiView session from NDEx account

After an upload has finished, the script creates a folder (network set) in the NDEx account containing one network for the hierarchical model (used in the model view), as well as many subnetworks (used in the data view). With the `Delete Network Set` button, the set and all networks in it can be deleted.

**Warning**: if subnetworks are shared across models (see Section 4 above), deleting them may also affect other models, which is usually unwanted.