# Comparing LIONESS Regulatory Networks using limma
Author: Camila Lopes-Ramos<sup>1</sup>

<sup>1</sup> Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA.

# 1. Introduction
LIONESS (Linear Interpolation to Obtain Network Estimates for Single Samples)<sup>1</sup> is a method for creating sample-specific networks. When applied to a PANDA<sup>2</sup> regulatory network, the result is a set of gene regulatory networks, one for each sample in the gene expression dataset. More information on LIONESS can be found in the published paper: https://doi.org/10.1016/j.isci.2019.03.021  
  
In this vignette, we will compare LIONESS regulatory networks from 207 females and 238 males with colon cancer using RNA-Seq data from TCGA<sup>3</sup>. We will compare the edge weights between females and males using a linear regression model, as implemented in the limma package, while adjusting for the covariates age, race, and disease stage. We will also compare gene in-degree (defined as the sum of a gene's incoming edge weights from all TFs in the network). Finally, we will perform gene set enrichment analysis to find the pathways enriched for genes differentially targeted by sex in colon cancer.
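
As a rough illustration of the idea behind LIONESS, the sketch below applies the linear interpolation equation from the LIONESS paper, $e^{(q)} = N\,(e^{(\alpha)} - e^{(\alpha-q)}) + e^{(\alpha-q)}$, to simulated data. It uses Pearson correlation as the aggregate network purely for simplicity (an assumption; the networks in this vignette were built with PANDA), and the names `expr` and `lioness_cor` are hypothetical.

```r
# Toy expression matrix: 5 genes (rows) x 20 samples (columns), simulated
set.seed(1)
expr <- matrix(rnorm(5 * 20), nrow = 5,
               dimnames = list(paste0("g", 1:5), paste0("s", 1:20)))
N <- ncol(expr)

# Aggregate network from all samples (Pearson correlation stands in for PANDA)
net_all <- cor(t(expr))

# LIONESS estimate for sample q:
#   N * (network with all samples - network without q) + network without q
lioness_cor <- function(q) {
  net_minus_q <- cor(t(expr[, -q]))
  N * (net_all - net_minus_q) + net_minus_q
}

net_s1 <- lioness_cor(1)   # 5 x 5 network specific to sample "s1"
round(net_s1[1:3, 1:3], 2)
```

Repeating the call for every sample yields one network per sample, which is the kind of input analyzed in the rest of this vignette.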

## 1.1. Install and load packages

This case study can be run on the server or locally by setting the following parameter (`1` to run on the server, `0` to run locally).

In [None]:
runserver=1

This parameter determines the path to the input files and data for this analysis.

In [None]:
if(runserver==1){
    ppath = '/opt/data/'
}else if(runserver==0){
    ppath = ''
}

First, we need to install the required packages. They are already installed on the server.

In [None]:
if(runserver==0){
    if (!requireNamespace("BiocManager", quietly = TRUE))   
        install.packages("BiocManager",repos = "http://cran.us.r-project.org")  
    BiocManager::install("fgsea")
    BiocManager::install("limma")
    BiocManager::install("Biobase")
    install.packages("ggplot2")
    install.packages("igraph")
}

Then, we need to load these packages.

In [None]:
library(limma)
library(Biobase)
library(ggplot2)
library(igraph)

We load the `fgsea` package to conduct gene set enrichment analysis.

In [None]:
library(fgsea)

An example snippet using toy data is presented here:

In [None]:
data(examplePathways)
data(exampleRanks)
fgseaRes <- fgsea(pathways = examplePathways, 
                  stats    = exampleRanks,
                  minSize  = 15,
                  maxSize  = 500)

# 2. Load the data

For the purposes of demonstrating the workflow, we will load only a subset of the LIONESS networks. Our subset contains the edge weights for 50,000 edges (rows) across 445 samples (columns).
Let's take a look at the networks:

In [None]:
lioness <- read.delim(paste0(ppath,"lioness_coloncancer_subset.txt"),stringsAsFactors = F, check.names = F)
head(lioness[,1:5])

We can clean this data frame by combining each TF and its target gene into an edge name of the form "TF_gene" and using it as the row name.

In [None]:
# Build "TF_gene" edge names from the first two columns (TF and target gene)
rownames(lioness) <- paste(lioness[, 1], lioness[, 2], sep = "_")

Then, remove TF and gene columns, to only keep sample columns.

In [None]:
lioness <- lioness[,-(1:2)]
head(lioness[,1:5])

Next, we load the gene in-degree (the sum of all incoming edge weights for each gene), which serves as a per-gene summary statistic of the network for downstream analyses. The loaded file provides the object `obj1` that the following cells access.

In [None]:
load(paste0(ppath,"inDegree_allEdges_coloncancer.rdata"))
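
For reference, the in-degree defined above can be obtained from an edge table by summing, for each target gene, the edge weights over all TFs. A minimal sketch on a made-up table follows (the column names `tf` and `gene` and all values are hypothetical; judging by its file name, the loaded object was derived from all edges rather than from the 50,000-edge subset):

```r
# Toy edge table: 2 TFs x 2 target genes, with edge weights for 2 samples
edges <- data.frame(
  tf   = c("TF1", "TF2", "TF1", "TF2"),
  gene = c("A",   "A",   "B",   "B"),
  s1   = c(1.0,  2.0, 3.0, 4.0),
  s2   = c(0.5, -0.5, 1.0, 1.0)
)

# In-degree: sum of incoming edge weights per target gene, per sample
indegree_toy <- rowsum(as.matrix(edges[, c("s1", "s2")]), group = edges$gene)
indegree_toy
```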

We can also find the clinical information for each sample from TCGA.

In [None]:
pData(obj1)[1:5,30:35]

# 3. Compare the edge weights
In this part, we will analyze sex differences by comparing the edge weights between males and females using a linear regression model (limma package), adjusting for the covariates disease stage, age, and race.

## 3.1. Run limma

First, we need to define the groups by sex.

In [None]:
gender <- factor(as.character(pData(obj1)$gender),levels=c("MALE","FEMALE"))

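Then, we define the covariates, starting with cancer stage. Missing values are recoded as a separate "NA" level so that no sample is dropped from the model.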

In [None]:
stage <- (as.character(pData(obj1)$uicc_stage))
stage[which(is.na(stage))] <- "NA"    
stage <- as.factor(stage)

Then race, handled in the same way.

In [None]:
race <- as.character(pData(obj1)$race)
race[which(is.na(race))] <- "NA"
race <- as.factor(race)

Finally, we define age, replacing missing values with the mean age.

In [None]:
age <- as.numeric(pData(obj1)$age_at_initial_pathologic_diagnosis)
age[which(is.na(age))] <- mean(age,na.rm=TRUE)
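
The reason for recoding missing values is that `model.matrix` drops rows containing `NA` under R's default `na.action`, which would leave the design matrix with fewer rows than there are samples. A small sketch of the two strategies used above, on made-up vectors (`stage_demo` and `age_na_demo` are hypothetical names):

```r
# Categorical covariate: turn missing values into their own factor level
stage_demo <- c("stage i", NA, "stage ii", "stage i")
stage_demo[is.na(stage_demo)] <- "NA"   # the string "NA", no longer a missing value
stage_demo <- as.factor(stage_demo)
table(stage_demo)

# Numeric covariate: impute missing values with the mean of the observed ones
age_na_demo <- c(60, NA, 70)
age_na_demo[is.na(age_na_demo)] <- mean(age_na_demo, na.rm = TRUE)
age_na_demo   # the NA is replaced by 65
```

Mean imputation keeps every sample in the model at the cost of slightly understating the variability of age; whether that trade-off is acceptable depends on how many values are missing.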

With these variables, we can build the design matrix for the linear model.

In [None]:
design = model.matrix(~ stage + race + age + gender)
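
To see how the factor levels determine the columns of the design matrix, here is a minimal sketch with made-up values (`sex_demo` and `age_demo` are hypothetical names). The first factor level acts as the reference group and gets no column of its own:

```r
# "MALE" is listed first, so it becomes the reference level
sex_demo <- factor(c("MALE", "FEMALE", "FEMALE", "MALE"),
                   levels = c("MALE", "FEMALE"))
age_demo <- c(60, 55, 70, 65)

design_demo <- model.matrix(~ age_demo + sex_demo)
colnames(design_demo)                # "(Intercept)" "age_demo" "sex_demoFEMALE"
design_demo[, "sex_demoFEMALE"]      # 1 for females, 0 for males
```

The column name `sex_demoFEMALE` is the variable name followed by the non-reference level, which is how the coefficient name `genderFEMALE` arises in the analysis.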

Finally, we fit the linear model and extract the statistics for the sex coefficient.

In [None]:
fitGood = lmFit(as.matrix(lioness),design)
fitGood = eBayes(fitGood)
tb = topTable(fitGood,coef="genderFEMALE",number=Inf)
head(tb)

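Because `MALE` is the first factor level, males are the reference group, so the `genderFEMALE` coefficient is the covariate-adjusted difference in edge weight between females and males (positive values indicate higher weights in females). limma reports this coefficient in the `logFC` column, together with p-values and multiple-testing-corrected p-values.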
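
The sign convention can be checked on simulated data. In this sketch (all names and values are hypothetical), one edge is given a higher weight in females, and its coefficient for the non-reference level `FEMALE` comes out positive:

```r
library(limma)

set.seed(42)
n_per_group <- 10
sex_sim <- factor(rep(c("MALE", "FEMALE"), each = n_per_group),
                  levels = c("MALE", "FEMALE"))   # MALE is the reference level

# 100 simulated edges x 20 samples, no true differences...
w <- matrix(rnorm(100 * 2 * n_per_group), nrow = 100,
            dimnames = list(paste0("edge", 1:100), NULL))
# ...except edge1, which is shifted upward by 2 in females
w["edge1", sex_sim == "FEMALE"] <- w["edge1", sex_sim == "FEMALE"] + 2

design_sim <- model.matrix(~ sex_sim)
fit_sim <- eBayes(lmFit(w, design_sim))
tb_sim <- topTable(fit_sim, coef = "sex_simFEMALE", number = Inf)

# Positive logFC: higher in females than in the male reference group
tb_sim["edge1", c("logFC", "t", "adj.P.Val")]
```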

## 3.2. Visualize the top edges with differential weights by sex
We select the top 50 edges with differential edge weights by sex and convert them into an igraph graph object for visualization. We color edges red if they have higher weights in females (positive coefficient), and blue if they have higher weights in males (negative coefficient).

In [None]:
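# Split the "TF_gene" row names into a two-column matrix (TF, gene)
toptable_edges <- t(matrix(unlist(strsplit(row.names(tb), "_")),2))
# Keep the 50 top-ranked edges together with their sex coefficient (logFC)
z <- cbind(toptable_edges[1:50,], tb$logFC[1:50])
# graph_from_data_frame() replaces graph.data.frame(), which is deprecated in recent igraph versions
g <- graph_from_data_frame(z, directed=FALSE)
E(g)$weight <- as.numeric(z[,3])
E(g)$color[E(g)$weight<0] <- "blue" # higher in males
E(g)$color[E(g)$weight>0] <- "red"  # higher in females
E(g)$weight <- 1 # reset to a constant so the signed values are not used as layout weights
par(mar=c(0,0,0,0))
# Edge width is proportional to the absolute coefficient
plot(g, vertex.label.cex=0.7, vertex.size=10,  vertex.label.font=3, edge.width=5*(abs(as.numeric(z[,3]))))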
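
Note that splitting the edge names on every underscore assumes that neither TF nor gene symbols contain an underscore. If that cannot be guaranteed, one possible alternative is to split on the first underscore only. The sketch below uses made-up edge names and still assumes that TF names contain no underscore:

```r
edge_names <- c("TF1_GENEA", "TF2_GENE_B")   # second gene symbol contains "_"

tf_part   <- sub("_.*$", "", edge_names)     # everything before the first "_"
gene_part <- sub("^[^_]*_", "", edge_names)  # everything after the first "_"

cbind(tf = tf_part, gene = gene_part)
```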

# 4. Compare the gene in-degree
In this section, we will compare the gene in-degree, or targeting score<sup>4</sup>, between males and females using a linear regression model (limma package), adjusting for the covariates disease stage, age, and race, which we collected from the TCGA metadata.

## 4.1. Run limma
First, we can explore the values of the targeting scores that we loaded previously for each gene.

In [None]:
indegree <- assayData(obj1)[["quantile"]]
head(indegree[,1:3])

Then, to compute the linear model, we use the same design matrix as before.

In [None]:
fitGood = lmFit(indegree,design)
fitGood = eBayes(fitGood)
tb_degree = topTable(fitGood,coef="genderFEMALE",number=Inf)
head(tb_degree)

Finally, we extract the t-statistic of each gene as a named vector, which serves as the ranking metric for the follow-up gene set enrichment analysis.

In [None]:
indegree_rank <- setNames(object=tb_degree[,"t"], rownames(tb_degree))
head(indegree_rank)
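
The structure of this ranking vector is simply one numeric value per gene, named by gene. A toy version with invented values (`tb_toy` and `rank_toy` are hypothetical names):

```r
# Toy limma-style table with gene symbols as row names
tb_toy <- data.frame(logFC = c(0.2, -0.5, 0.1),
                     t     = c(2.1, -3.4, 0.4),
                     row.names = c("GENE1", "GENE2", "GENE3"))

rank_toy <- setNames(tb_toy[, "t"], rownames(tb_toy))

# Sorting is only for inspection here: most positive t first
sort(rank_toy, decreasing = TRUE)
```

Using the t-statistic rather than the raw coefficient takes the variability of each gene into account, so genes with large but noisy differences are not ranked at the extremes.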

# 5. Gene Set Enrichment Analysis
We will use the `fgsea` package to perform gene set enrichment analysis. We need to provide a ranked gene list (for example, the t-statistic of the gene in-degree difference between males and females) and a list of gene sets (or signatures) in `gmt` format to test for enrichment. The gene sets can be downloaded from MSigDB: http://software.broadinstitute.org/gsea/msigdb. The same gene annotation should be used in the ranked gene list and in the gene sets.
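
Since `gmtPathways` returns a named list of character vectors, a quick way to check that the annotations match is to compute, for each gene set, the fraction of its genes that appear in the ranked list. The sketch below uses invented ranks and gene sets (`ranks_toy` and `pathways_toy` are hypothetical); a set annotated with Ensembl IDs shows no overlap with a symbol-based ranking:

```r
# Ranked list named by gene symbol (values invented)
ranks_toy <- c(TP53 = 2.3, MYC = -1.1, EGFR = 0.7, KRAS = -0.2)

# Gene sets in the same format as the output of gmtPathways()
pathways_toy <- list(
  SET_A = c("TP53", "MYC", "BRCA1"),                 # gene symbols
  SET_B = c("ENSG00000141510", "ENSG00000136997")    # Ensembl IDs: mismatch
)

# Fraction of each gene set found among the ranked genes
overlap <- sapply(pathways_toy, function(genes) mean(genes %in% names(ranks_toy)))
overlap
```

An overlap close to zero across most gene sets usually signals an annotation mismatch rather than a biological result.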

## 5.1. Run fgsea

Gene set enrichment analysis identifies the pathways that are enriched for genes differentially targeted between females and males.

In [None]:
pathways <- gmtPathways(paste0(ppath,"c2.cp.kegg.v7.0.symbols.gmt"))
fgseaRes <- fgsea(pathways, indegree_rank, minSize=15, maxSize=500, nperm=1000)
head(fgseaRes)

We can subset to pathways with FDR < 0.05.

In [None]:
sig <- fgseaRes[fgseaRes$padj < 0.05,]
sig <- sig[order(sig$padj),] # most significant pathways first

Then, select the top 10 pathways enriched in females. These are the ones with a positive normalized enrichment score (NES), because the t-statistics used for ranking come from the `genderFEMALE` coefficient (females relative to the male reference group).

In [None]:
head(sig$pathway[sig$NES > 0], 10) # head() avoids NA padding when fewer than 10 pathways qualify

We can also identify the top 10 pathways enriched in males.

In [None]:
head(sig$pathway[sig$NES < 0], 10) # head() avoids NA padding when fewer than 10 pathways qualify
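
The selection logic can be traced on a small invented result table (`res_toy` is hypothetical and contains only the columns needed here). Ordering by adjusted p-value before subsetting makes "top" mean "most significant":

```r
res_toy <- data.frame(pathway = c("P1", "P2", "P3", "P4"),
                      padj    = c(0.04, 0.001, 0.20, 0.01),
                      NES     = c(1.8, -2.1, 1.2, 1.5))

sig_toy <- res_toy[res_toy$padj < 0.05, ]      # P3 is filtered out
sig_toy <- sig_toy[order(sig_toy$padj), ]      # P2, P4, P1

top_female <- head(sig_toy$pathway[sig_toy$NES > 0], 10)   # "P4" "P1"
top_male   <- head(sig_toy$pathway[sig_toy$NES < 0], 10)   # "P2"
top_female
top_male
```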

## 5.2. Bubble plot of differentially targeted pathways by sex
To visualize these results in a more meaningful way, we can draw a bubble plot. First, we need to set some general settings.

In [None]:
dat <- data.frame(fgseaRes)
# Settings
fdrcut <- 0.05 # FDR cut-off to use as output for significant signatures
dencol_neg <- "blue" # bubble plot color for negative ES
dencol_pos <- "red" # bubble plot color for positive ES
signnamelength <- 4 # set to remove prefix from signature names (2 for "GO", 4 for "KEGG", 8 for "REACTOME")
asp <- 3 # aspect ratio of bubble plot
charcut <- 100 # cut signature names in the bubble plot to this number of characters

Then, make signature names more readable.

In [None]:
a <- as.character(dat$pathway) # pathway names, to be turned into more readable labels
a <- substr(a, signnamelength+2, nchar(a)) # remove the collection prefix (e.g. "KEGG_")
a <- tolower(a) # convert to lower case (you may want to comment this out, it really depends on what signatures you are looking at, c6 signatures contain gene names, and converting those to lower case may be confusing)
# cut signature names that have more characters than charcut, and add "..."
a <- ifelse(nchar(a)>charcut, paste(substr(a, 1, charcut), "...", sep=" "), a)
a <- gsub("_", " ", a)
dat$NAME <- a
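
To make the effect of these steps concrete, the sketch below wraps them in a small helper and applies it to two KEGG-style names. The function `clean_name` and its argument names are hypothetical and only mirror the settings `signnamelength` and `charcut` defined earlier:

```r
clean_name <- function(x, prefix_len = 4, max_chars = 100) {
  x <- substr(x, prefix_len + 2, nchar(x))   # drop the prefix and the "_" after it
  x <- tolower(x)
  # truncate long names and mark the cut with "..."
  x <- ifelse(nchar(x) > max_chars, paste(substr(x, 1, max_chars), "..."), x)
  gsub("_", " ", x)
}

clean_name(c("KEGG_CELL_CYCLE", "KEGG_P53_SIGNALING_PATHWAY"))
# "cell cycle" "p53 signaling pathway"

clean_name("KEGG_CELL_CYCLE", max_chars = 4)
# "cell ..."
```

For a collection with a different prefix, `prefix_len` would need to match the prefix length, for example 8 for "REACTOME".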

Then determine which signatures to plot (based on the FDR cut-off).

In [None]:
dat2 <- dat[which(dat[,"padj"]<fdrcut),] # which() also drops rows with a missing padj
dat2 <- dat2[order(dat2[,"padj"]),] 
dat2$signature <- factor(dat2$NAME, rev(as.character(dat2$NAME)))

Then determine the labels to color.

In [None]:
sign_neg <- which(dat2[,"NES"]<0)
sign_pos <- which(dat2[,"NES"]>0)

And the colors assigned to them.

In [None]:
signcol <- rep(NA, length(dat2$signature))
signcol[sign_neg] <- dencol_neg # text color of negative signatures
signcol[sign_pos] <- dencol_pos # text color of positive signatures
signcol <- rev(signcol) # reverse the vector of colors, because ggplot draws the factor levels from the bottom up
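
The reversal is needed because the `signature` factor was created with reversed levels, so that the most significant pathway ends up at the top of the plot, while the axis label colors are matched to the levels in order. A minimal sketch with invented values (`toy`, `col_rows`, and `col_axis` are hypothetical names) shows that reversing the colors keeps each label paired with its own sign:

```r
toy <- data.frame(NAME = c("a", "b", "c"),      # already ordered by padj
                  NES  = c(1.5, -2.0, -0.8))
toy$signature <- factor(toy$NAME, rev(toy$NAME))
levels(toy$signature)                            # "c" "b" "a" (bottom to top)

col_rows <- ifelse(toy$NES < 0, "blue", "red")   # colors in row order: a, b, c
col_axis <- rev(col_rows)                        # colors in level order: c, b, a

col_by_level <- setNames(col_axis, levels(toy$signature))
col_by_level                                     # c = blue, b = blue, a = red
```

Be aware that passing a vector of colors to `element_text()` works in practice but triggers a ggplot2 warning that vectorized input is not officially supported.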

Finally, we can draw the bubble plot.

In [None]:
g<-ggplot(dat2, aes(x=padj,y=signature,size=size))
g+geom_point(aes(fill=NES), shape=21, colour="white")+
  theme_bw()+ # white background, needs to be placed before the "signcol" line
  xlim(0,fdrcut)+
  scale_size_area(max_size=10,guide="none")+
  scale_fill_gradient2(low=dencol_neg, high=dencol_pos)+
  theme(axis.text.y = element_text(colour=signcol))+
  theme(aspect.ratio=asp, axis.title.y=element_blank()) # test aspect.ratio

Bubble plot of gene sets on the y-axis and adjusted p-value (padj) on the x-axis. Bubble size indicates the number of genes in each gene set, and bubble color indicates the normalized enrichment score (NES). Blue indicates negative NES (enrichment of genes more highly targeted in males), and red indicates positive NES (enrichment of genes more highly targeted in females).

# References 
1- Kuijjer, Marieke Lydia, et al. "Estimating sample-specific regulatory networks." Iscience 14 (2019): 226-240.

2- Glass, Kimberly, et al. "Passing messages between biological networks to refine predicted interactions." PloS one 8.5 (2013): e64832.

3- Tomczak, Katarzyna, Patrycja Czerwińska, and Maciej Wiznerowicz. "The Cancer Genome Atlas (TCGA): an immeasurable source of knowledge." Contemporary oncology 19.1A (2015): A68.

4- Weighill, Deborah, et al. "Gene targeting in disease networks." Frontiers in Genetics 12 (2021): 501.