usecase/f1000-cyauto-workflow.Rmd

---
title: "KEGGscape REST API workflow combining with the py2cytoscape package"
author: "Kozo Nishida and Keiichiro Ono"
date: "June 24, 2018"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## Prerequisites and tested environment

You will need:

- Cytoscape 3.6.1 or greater, which can be downloaded from http://www.cytoscape.org/download.php. Simply follow the installation instructions on screen.
- KEGGscape 0.8.0 or greater, which can be instelled from Cytoscape App Manager.

![](https://github.com/idekerlab/KEGGscape/blob/master/usecase/figures/keggscape-in-appmanager.PNG)

In the following workflow, it is necessary for Cytoscape to be running on the PC that executes R Markdown.
First make sure that Cytoscape which installed KEGGscape is up.

### Tested environment

#### Windows
- Windows 10
- Cytoscape 3.6.1 (w/Java 8 u172)
- KEGGscape 0.8.2
- R 3.5.0
- RStudio 1.1.453
- R Markdown 1.10
- Bioconductor 3.7
- Python 3.6.5 (Installed via miniconda)
- py2cytoscape 0.7.0
- pandas 0.23.1

#### Mac
- MacOS 10.13.4
- Cytoscape 3.6.1 (w/embedded Java 8 u162)
- KEGGscape 0.8.2
- R 3.5.0
- RStudio 1.1.453
- R Markdown 1.10
- Bioconductor 3.7
- Python 3.6.5 (Installed via miniconda)
- py2cytoscape 0.7.0
- pandas 0.23.1

## Note about this RMarkdown
This R Markdown document uses **Python3**, and you need to install it.
(We recommend to install Python3 using [Miniconda](https://conda.io/miniconda.html).)
And add the path to Python to your PATH environment variable.

## Overview

The following image represents KEGGscape REST API workflow combining with the py2cytoscape package. The left panel shows
our workflow of the gene expression analysis to the KEGGscape visualization. The right panel represents how
omics data and pathway information are actually merged in the workflow.

![](figures/fig1.png)

## Step 0 (Installing dependencies)
First of all we install the R and Python packages for this workflow.

### Installing R packages

```{r}
source("https://bioconductor.org/biocLite.R")

biocLite("ecoliLeucine", ask = FALSE)
biocLite("affy", ask = FALSE)
biocLite("limma", ask = FALSE)
```

### Installing Python packages

There are two ways to do this:

##### From terminal
```
python -m pip install --upgrade py2cytoscape requests pandas
```

##### From Python code
```{python}
import os
os.system('python -m pip install --upgrade py2cytoscape requests pandas')
```

#### Note on Windows

Windows users need to download and install the python-igraph whl from [Christoph’s site](http://www.lfd.uci.edu/~gohlke/pythonlibs/#python-igraph).
Please install python-igraph before you install py2cytoscape, otherwise pip will try to **build** python-igraph (and will fail). 

```
python -m pip install ".\python_igraph-0.7.1.post6-cp36-cp36m-win_amd64.whl"
python -m pip install py2cytoscape
```

## Step 1 (Microarray preprocessing)
We use bioconductor **ecoliLeucine and affy** packages for the dataset preprocessing of this workflow.

```{r}
library(ecoliLeucine)
library(affy)

data(ecoliLeucine)
eset <- rma(ecoliLeucine)
pData(eset)
```

## Step 2 (Statistical analysis)

We use bioconductor **limma** package to calculate the statistical quantities like fold change ratio between two E. coli strains and the adjusted p-values for probe set information.

```{r}
library(limma)
strain <- c("lrp-","lrp-","lrp-","lrp-","lrp+","lrp+","lrp+","lrp+")
design <- model.matrix(~factor(strain))
colnames(design) <- c("lrp-","lrp+vs-")
design

fit <- lmFit(eset, design)
fit <- eBayes(fit)
options(digits=2)
result <- topTable(fit, coef = 2, number = 40, adjust.method = "BH")
result
write.csv(result, file = "result.csv")
```

## Step 3 (Integration of statistical quantities with pathways)

### Importing all Escherichia coli K-12 MG1655 KEGG pathways with KEGGscape REST API

This code chunk automates the batch pathway import.
This code chunk import the all *E. coli* pathways in KEGG. 
Before running this code, please make sure that Cytoscape (with KEGGscape installed) is running on your local machine.

```{python}
import requests
import pandas as pd
from io import StringIO

r = requests.get('http://rest.kegg.jp/list/pathway/eco')
ecopathways = pd.read_table(StringIO(r.text), header=None)
for pathid in ecopathways[0]:
    requests.get('http://localhost:1234/keggscape/v1/' + pathid[5:])
```

The following image is the schematic illustration.

![](figures/fig2.png)

KEGGscape REST API uses KEGG API.
Academic users are permitted to use the KEGG API for individual downloads. KEGG FTP subscription and
license agreement are, however, required for non-academic usage (http://www.kegg.jp/kegg/legal.
html).


### Importing the limma result table with py2cytoscape

Here we integrate the limma statistical quantities with the E. coli (K-12 MG1655) KEGG pathway.
We use Python pandas package (https://pandas.pydata.org) to integrate each quantities and probe sets into enzyme nodes of KEGG pathway. If there exists multiple probe set for the one KEGG enzyme node, then average of the statistical quantities would be embedded into the node.
If none of the statistics are integrated in the enzyme node in the pathway, the pathway has been removed from cytoscape.


```{python}
import pandas as pd
# Removing intergenic probe sets
limma = pd.read_csv("./result.csv", index_col=0)
new_index = limma.index.str.extract('(b[0-9]{4})')
limma = limma.set_index(new_index[0])
limma = limma[limma.index.notna()]

# Merging with Cytoscape node table
from py2cytoscape.data.cyrest_client import CyRestClient
cy = CyRestClient()
all_suid = cy.network.get_all()

def concat(x):
    if isinstance(x, (list,)):
        return " ".join(x)

for i in all_suid:
    net = cy.network.create(i)
    table = net.get_node_table()
    keggids = table.set_index('SUID')['KEGG_ID']
    genes = keggids.apply(concat).str.extractall('(b[0-9]{4})')
    table4cy = pd.merge(genes, limma, left_on=0, right_index=True)
    if(table4cy.shape[0] != 0):
        meantable4cy = table4cy.groupby('SUID').mean()
        net.update_node_table(df=meantable4cy, network_key_col='SUID')
    else:
        cy.network.delete(net)
```

## Step 4 (Visualization with Cytoscape)

This code chunk automates Cytoscape VizMap GUI operation.
Here we highlight the KEGG enzyme node with integrated limma statistics in red.
Finally we save all the network visualization result in svg images.

```{python}
from py2cytoscape.data.cyrest_client import CyRestClient
from py2cytoscape.data.style import StyleUtil
import pandas as pd
# Removing intergenic probe sets
limma = pd.read_csv("./result.csv", index_col=0)
new_index = limma.index.str.extract('(b[0-9]{4})')
limma = limma.set_index(new_index[0])
limma = limma[limma.index.notna()]

cy = CyRestClient()

# Vizual mapping
my_kegg_style = cy.style.create('KEGG Style')
new_defaults = {
    # Node defaults
    'NODE_FILL_COLOR': '#ffffff',
    'NODE_BORDER_WIDTH' : '1.0',
}
my_kegg_style.update_defaults(new_defaults)
color_gradient = StyleUtil.create_2_color_gradient(min=limma['adj.P.Val'].min(), max=limma['adj.P.Val'].max(), colors=('red', 'white'))
my_kegg_style.create_continuous_mapping(column='adj.P.Val', vp='NODE_FILL_COLOR', col_type='Double', points=color_gradient)

# Saving all networks as images
all_suid = cy.network.get_all()
for i in all_suid:
    net = cy.network.create(i)
    path = net.get_network_table()['KEGG_PATHWAY_ID']
    pathid = path.values[0][5:]
    file = open(pathid + ".svg", 'wb')
    file.write(net.get_svg())
    file.close()
```