# Building single-sample regulatory networks using LIONESS and netZooPy
### Author: 
Qi (Alex) Song*.

*Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA. (qi.song@channing.harvard.edu)

## 1. Introduction
In this tutorial, we will briefly walk through the steps to perform an analysis with the Lioness algorithm using the netZooPy package. Lioness is an algorithm for estimating sample-specific gene regulatory networks in a population. LIONESS infers individual sample networks by applying linear interpolation to the predictions made by existing aggregate network inference approaches [1]. In this tutorial, we will use Panda as our base network inference approach to build sample-specific networks.
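
The linear interpolation can be illustrated with a minimal sketch. It assumes Pearson correlation (via `numpy.corrcoef`) as a stand-in for the aggregate network method instead of Panda, and it uses random toy data. The function name `lioness_sketch` is hypothetical and is not part of netZooPy. For each sample q out of N, the network estimated without q is compared with the network estimated from all samples, following the equation in [1]: e(q) = N * (e(all) - e(all without q)) + e(all without q).

```python
import numpy as np

def lioness_sketch(expr, aggregate=np.corrcoef):
    """Illustrative only: expr is a genes x samples array.

    Returns one gene x gene network per sample, using the LIONESS
    linear interpolation around a generic aggregate method.
    """
    n_samples = expr.shape[1]
    net_all = aggregate(expr)  # aggregate network from all samples
    networks = []
    for q in range(n_samples):
        # aggregate network estimated with sample q left out
        net_without_q = aggregate(np.delete(expr, q, axis=1))
        # linear interpolation that isolates the contribution of sample q
        networks.append(n_samples * (net_all - net_without_q) + net_without_q)
    return networks

# Toy data (assumption): 5 genes measured in 20 samples
rng = np.random.default_rng(0)
toy_expr = rng.normal(size=(5, 20))
sample_nets = lioness_sketch(toy_expr)
print(len(sample_nets), sample_nets[0].shape)
```

Because every sample requires a fresh run of the aggregate method on the remaining samples, the cost grows with the number of samples.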

## 2. Installation of netZooPy
netZooPy comes with full support for the Lioness algorithm and can be installed with the `pip` command. For more details, please refer to the installation guide on the netZooPy documentation site [here](https://netzoopy.readthedocs.io/en/latest/install/index.html).
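
As a quick sanity check after installation, the following sketch tests whether the package can be found in the current Python environment. It only uses the standard library; the variable names are illustrative.

```python
import importlib.util

# find_spec looks the package up without importing it and returns None if it is missing
spec = importlib.util.find_spec("netZooPy")
installed = spec is not None

if installed:
    print("netZooPy is available in this environment")
else:
    print("netZooPy was not found; see the installation guide")
```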

First, let's import `os`, which can be used to change the working directory if needed.

In [None]:
import os

## 3. Load required modules
We will need the `Panda` and `Lioness` Python classes from the netZooPy package, together with `AnalyzeLioness` for visualization. We will also need the `read_csv()` function from the `pandas` package to inspect the input data sets.

In [None]:
from netZooPy.panda import Panda
from netZooPy.lioness import Lioness
from netZooPy.lioness.analyze_lioness import AnalyzeLioness
import pandas as pd

## 4. Load input data

Now let's look at the three data sets to get a sense of what the inputs look like.

In [None]:
exp_data = pd.read_csv('/opt/data/ToyExpressionData.txt',header=None, index_col = 0, sep = "\t")
motif_data = pd.read_csv('/opt/data/ToyMotifData.txt',header=None, sep = "\t")
ppi_data = pd.read_csv('/opt/data/ToyPPIData.txt',header=None, sep = "\t")

Expression data is a matrix where rows are genes and columns are samples. There are 1000 genes and 50 samples in this expression dataset.

In [None]:
exp_data
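
To make the expected layout concrete, here is a minimal sketch that parses a tiny tab-separated expression table with the same `read_csv()` arguments as above. The gene names and values are made up for illustration and are not taken from the toy data set.

```python
import io
import pandas as pd

# Hypothetical toy data: 3 genes (rows) by 4 samples (columns), tab-separated, no header
toy_text = (
    "geneA\t1.2\t0.4\t3.1\t2.2\n"
    "geneB\t0.0\t1.5\t0.7\t0.9\n"
    "geneC\t2.3\t2.1\t1.8\t2.0\n"
)
toy_exp = pd.read_csv(io.StringIO(toy_text), header=None, index_col=0, sep="\t")

# Rows are genes (taken from the first column), remaining columns are samples
n_genes, n_samples = toy_exp.shape
print(n_genes, n_samples)
```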

Motif data should be formatted as a three-column list, where the first column contains the TF IDs, the second column the target gene IDs, and the third column the interaction scores.

In [None]:
motif_data

There are 87 unique TFs and 913 unique target genes in this motif dataset.

In [None]:
motif_data[0].unique().shape[0]

In [None]:
motif_data[1].unique().shape[0]
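
The same counts can be obtained with `Series.nunique()`. The sketch below runs on a small made-up motif table (hypothetical TF and gene names) so that the result is easy to check by eye.

```python
import io
import pandas as pd

# Hypothetical motif edge list: TF, target gene, interaction score
toy_text = (
    "TF1\tgeneA\t1.0\n"
    "TF1\tgeneB\t1.0\n"
    "TF2\tgeneA\t1.0\n"
    "TF2\tgeneC\t1.0\n"
)
toy_motif = pd.read_csv(io.StringIO(toy_text), header=None, sep="\t")

n_tfs = toy_motif[0].nunique()      # distinct TFs in the first column
n_targets = toy_motif[1].nunique()  # distinct target genes in the second column
print(n_tfs, n_targets)
```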

PPI (protein-protein interaction) data should be formatted as a three-column list, where the first two columns contain protein IDs and the third column contains a score for each interaction.

In [None]:
pd.concat([ppi_data[0],ppi_data[1]]).unique().size

This PPI dataset has 238 interactions among 87 TFs.
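
Both numbers can be read off an edge list directly: the number of rows gives the interactions, and the union of the first two columns gives the proteins. A minimal sketch on a made-up PPI table (hypothetical protein names):

```python
import io
import pandas as pd

# Hypothetical PPI edge list: protein 1, protein 2, interaction score
toy_text = (
    "TF1\tTF2\t1.0\n"
    "TF2\tTF3\t1.0\n"
    "TF1\tTF3\t1.0\n"
)
toy_ppi = pd.read_csv(io.StringIO(toy_text), header=None, sep="\t")

n_interactions = len(toy_ppi)  # one row per interaction
n_proteins = pd.concat([toy_ppi[0], toy_ppi[1]]).nunique()  # proteins in either column
print(n_interactions, n_proteins)
```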

## 5. Run Panda
Before running Lioness, we first need to generate a `Panda` object, which will be used later to run `Lioness`. Note that the argument `keep_expression_matrix` should be set to `True`, because each Lioness iteration calls Panda to build a network, and Panda needs the expression matrix as input.

In [None]:
panda_obj = Panda('/opt/data/ToyExpressionData.txt',
                  '/opt/data/ToyMotifData.txt',
                  '/opt/data/ToyPPIData.txt',
                  remove_missing=False, 
                  keep_expression_matrix=True, save_memory=False, modeProcess='legacy')

## 6. Run Lioness to estimate sample-specific networks
We will first pass the `Panda` object as input to the `Lioness` object. `Lioness` will then run the Panda algorithm in each of its iterations to estimate a sample-specific network for each sample.

In [None]:
lioness_obj = Lioness(panda_obj, save_dir='../data')

## 7. Run Lioness with co-expression matrix
Lioness can also work with a co-expression matrix. To compute Lioness networks from a co-expression matrix, we can set the motif data to `None`:

In [None]:
motif = None

# Make sure to keep the expression matrix for the next step
panda_obj = Panda('/opt/data/ToyExpressionData.txt',
                  None,
                  '/opt/data/ToyPPIData.txt',
                  save_tmp=True,
                  remove_missing=False,
                  keep_expression_matrix=True, modeProcess='legacy')
lioness_obj = Lioness(panda_obj, save_dir='../data')
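
For intuition about what a co-expression matrix is, the sketch below computes gene-by-gene Pearson correlations across samples on random toy data. This is an illustration under that assumption and is not netZooPy's internal code; the gene names are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy expression matrix (assumption): 4 genes (rows) by 10 samples (columns)
rng = np.random.default_rng(0)
genes = ["geneA", "geneB", "geneC", "geneD"]
toy_exp = pd.DataFrame(rng.normal(size=(4, 10)), index=genes)

# DataFrame.corr() correlates columns, so transpose to correlate genes across samples
coexpr = toy_exp.T.corr()
print(coexpr.shape)
```

The result is a symmetric gene-by-gene matrix with ones on the diagonal.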

## 8. Visualize Lioness results
`AnalyzeLioness()` can be used to visualize a Lioness network. You may select only the `top` genes to be visualized in the graph. In the current version of Lioness, only the network of the first sample is visualized by the `.top_network_plot()` function.

In [None]:
analyze_lioness_obj = AnalyzeLioness(lioness_obj)
analyze_lioness_obj.top_network_plot(top = 10, file = "../data/lioness_top_10.png")

## 9. Save Lioness results
We can save Lioness results by using the `save_lioness_results()` method of the `Lioness` object. The edge weights of the Lioness predictions will be saved to the output file. We can get the TF and target IDs from the `.export_panda_results` property of the `Panda` object. Each of its rows corresponds to a row in the Lioness output file.

In [None]:
panda_obj.export_panda_results

In [None]:
lioness_obj.save_lioness_results(file = '../data/lioness.txt')
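
Since the rows of the two tables line up, the edge labels and the edge weights can be placed side by side. The sketch below shows the idea with hypothetical stand-ins for both tables; the column names (`tf`, `gene`, `sample1`, ...) and the assumption that the weights have one column per sample are illustrative and should be checked against the actual output.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the TF and target IDs: one row per TF-gene edge
edges = pd.DataFrame({
    "tf": ["TF1", "TF1", "TF2", "TF2"],
    "gene": ["geneA", "geneB", "geneA", "geneB"],
})

# Hypothetical stand-in for the saved edge weights: one row per edge, one column per sample
weights = pd.DataFrame(
    np.arange(12, dtype=float).reshape(4, 3),
    columns=["sample1", "sample2", "sample3"],
)

# Rows are matched by position, so both tables must be in the same edge order
labeled = pd.concat([edges, weights], axis=1)
print(labeled.shape)
```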

## References
[1] Kuijjer ML, Tung MG, Yuan GC, Quackenbush J, Glass K: Estimating Sample-Specific Regulatory Networks. iScience 2019.