# Gene regulatory network analysis in Glioblastoma 
Author: Camila Lopes-Ramos<sup>1</sup>

<sup>1</sup> Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA.

## Introduction
This tutorial accompanies our recent study of gene regulatory networks in glioblastoma<sup>1</sup>. All networks generated in this analysis are available in the [GRAND database](https://grand.networkmedicine.org/cancers/).

## Load packages

We start by loading the required packages.

In [None]:
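library(limma)    # for differential methylation analysis
library(Biobase)  # for ExpressionSet accessors: pData(), fData(), exprs()
library(reshape2) # for data processing (melt)
library(ggplot2)  # for plotting
library(tidyr)    # for data processing (gather)
library(ggpubr)   # for boxplots, scatter plots and figure panels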

## 1. Differential methylation analysis
We use limma to compare methylation between the short- and long-term survival groups. The analysis uses quantile-normalized methylation beta values, is restricted to probes annotated to PD1 pathway genes, and adjusts for gender, adjuvant therapy, and age.
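
Quantile normalization forces every sample (column) to share the same distribution of values: the mean of the sorted columns. The base-R sketch below illustrates the idea behind limma's `normalizeQuantiles()` on a made-up matrix. The helper `quantile_normalize()` is hypothetical and simplified: it breaks ties by order of appearance and assumes no missing values, whereas `normalizeQuantiles()` also handles missing values and treats ties more carefully.

```r
# Minimal quantile normalization sketch (hypothetical helper, not limma's implementation).
# Assumes a numeric matrix with probes in rows, samples in columns and no missing values.
quantile_normalize <- function(mat) {
  # Rank of each value within its sample; ties broken by order of appearance
  ranks <- apply(mat, 2, rank, ties.method = "first")
  # Reference distribution: mean across samples of the sorted values at each rank
  reference <- rowMeans(apply(mat, 2, sort))
  # Replace each value by the reference value at its rank
  out <- apply(ranks, 2, function(r) reference[r])
  dimnames(out) <- dimnames(mat)
  out
}

# Toy beta-like matrix: 4 probes x 3 samples (values made up)
m <- matrix(c(0.50, 0.20, 0.30, 0.40,
              0.40, 0.10, 0.45, 0.20,
              0.30, 0.40, 0.60, 0.80),
            nrow = 4, dimnames = list(paste0("cg", 1:4), c("s1", "s2", "s3")))
qn <- quantile_normalize(m)
# After normalization every sample contains the same set of values
apply(qn, 2, sort)
```

The ordering of probes within each sample is preserved; only the values are replaced.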

In [None]:
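# Differential methylation analysis with limma, adjusting for gender, adjuvant therapy and age.
# obj: ExpressionSet of methylation beta values (probes x samples) with a `survival` column
#      ("long"/"short") in pData(obj) and a `pd1_genes` column in fData(obj)
# quantileNorm: if TRUE, quantile-normalize the beta values before fitting
methyl_limma <- function(obj, quantileNorm = TRUE) {
  # Keep only samples with a survival label
  obj <- obj[, !is.na(pData(obj)$survival)]
  # Collapse the PD1 pathway gene annotation of each probe into one string
  gene <- unlist(lapply(fData(obj)$pd1_genes, function(x) paste(x, collapse = ", ")))
  # Covariates; missing gender/therapy values become an explicit "NA" level
  surv <- factor(pData(obj)$survival, levels = c("long", "short"))
  gender <- as.character(pData(obj)$gender.y)
  gender[is.na(gender)] <- "NA"
  gender <- as.factor(gender)
  adj <- as.character(pData(obj)$neoadjuvanttherapy)
  adj[is.na(adj)] <- "NA"
  adj <- as.factor(adj)
  # Missing ages are imputed with the mean age
  age <- as.numeric(pData(obj)$yearstobirth)
  age[is.na(age)] <- mean(age, na.rm = TRUE)
  # Linear model; "survshort" is the short- vs long-term survival coefficient
  design <- model.matrix(~ gender + adj + age + surv)
  mat_lmFit <- exprs(obj)
  if (quantileNorm) mat_lmFit <- normalizeQuantiles(mat_lmFit)
  fitGood <- lmFit(mat_lmFit, design)
  fitGood <- eBayes(fitGood)
  # Results for all probes, sorted by p-value
  deg <- topTable(fitGood, coef = "survshort", number = Inf, genelist = gene)
  # Mean beta value per survival group (tapply on the vector; by() would pass a data frame to mean())
  Mean <- t(apply(mat_lmFit, 1, function(x) tapply(x, surv, mean, na.rm = TRUE)))
  Mean <- Mean[rownames(deg), ]
  data.frame("Gene.ID" = deg[, "ID"],
             "BetaMean.Short-term" = Mean[, "short"],
             "BetaMean.Long-term" = Mean[, "long"],
             deg[, c("P.Value", "adj.P.Val")],
             check.names = FALSE)
}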
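
The `coef="survshort"` argument selects the survival effect because `surv` has `long` as its reference level, so the `survshort` column of the design matrix encodes short- versus long-term survival, adjusted for the other covariates. The sketch below illustrates this on simulated data using base R only. It is a simplified stand-in, not the analysis above: it fits ordinary least squares per probe with `lm()` and skips limma's empirical Bayes moderation (`eBayes()`), so its p-values would not match `topTable()`. All covariate and beta values are made up.

```r
set.seed(1)
n <- 12
# Covariates coded as in methyl_limma(): missing gender is an explicit "NA" level,
# missing age is replaced by the mean age (all values simulated)
surv <- factor(rep(c("long", "short"), each = n / 2), levels = c("long", "short"))
gender <- factor(rep(c("female", "male", "NA"), times = n / 3), levels = c("female", "male", "NA"))
adj <- factor(rep(c("NO", "YES"), times = n / 2), levels = c("NO", "YES"))
age <- c(45, 60, NA, 52, 70, 38, 66, NA, 59, 48, 73, 55)
age[is.na(age)] <- mean(age, na.rm = TRUE)

design <- model.matrix(~ gender + adj + age + surv)
colnames(design)

# Simulated beta values for 3 probes; only cg1 is shifted in the short-term group
beta_vals <- matrix(runif(3 * n, 0.1, 0.6), nrow = 3, dimnames = list(paste0("cg", 1:3), NULL))
beta_vals["cg1", surv == "short"] <- beta_vals["cg1", surv == "short"] + 0.3

# Per-probe OLS fit (no empirical Bayes); keep the survshort estimate and its p-value
res <- t(apply(beta_vals, 1, function(y) {
  coefs <- summary(lm(y ~ design - 1))$coefficients
  coefs["designsurvshort", c("Estimate", "Pr(>|t|)")]
}))
res <- cbind(res, adj.P.Val = p.adjust(res[, "Pr(>|t|)"], method = "BH"))
res
```

The `survshort` estimate is the adjusted difference in mean beta value (short minus long), which is the quantity `topTable()` reports as `logFC` for that coefficient.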

Run the differential methylation analysis for the 450K platform:

In [None]:
load("/opt/data/netZooR/gbm/methyl_obj450_pd1.RData")
Platform.450K <- methyl_limma(obj450_pd1,quantileNorm=TRUE)
head(Platform.450K)

Run the differential methylation analysis for the 27K platform:

In [None]:
load("/opt/data/netZooR/gbm/methyl_obj27_pd1.RData")
Platform.27K <- methyl_limma(obj27_pd1,quantileNorm=TRUE)
head(Platform.27K)

## 2. Estimation of cell compositions
This section contains the code for Supplemental Figure S1<sup>1</sup>.
We used xCell to estimate the cell-type composition of each tumor sample in each of the three datasets. xCell scores approximate cell-type fractions and adjust for overlap between closely related cell types. We applied the xCell pipeline (https://xcell.ucsf.edu) to each dataset and used the default threshold of 0.2 to filter out cell types not present in the datasets.
We compared xCell scores between the short- and long-term survival groups with t-tests and adjusted the p-values for multiple testing using the false discovery rate (FDR).
Figure S1 shows boxplots comparing xCell scores between the short- and long-term survival groups in the discovery and validation datasets. We included only cell types with detectable abundance (xCell score > 0.2 in at least one sample), together with the total immune and microenvironment scores.
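
The per-dataset cells in this section repeat the same three steps: filter cell types by the 0.2 threshold, run a t-test per cell type, and adjust the p-values. As a sketch, these steps can be wrapped in one function. `compare_xcell_groups()` is a hypothetical helper, not part of xCell or of the original analysis. It assumes one numeric column per cell type plus a `survival` column with the values `"long"` and `"short"`, uses R's default Welch t-test, and requests Benjamini-Hochberg (FDR) adjustment explicitly, because `p.adjust()` defaults to Holm's method.

```r
# Hypothetical helper: filter xCell cell types and compare survival groups
compare_xcell_groups <- function(xcell, threshold = 0.2) {
  score_cols <- setdiff(colnames(xcell), "survival")
  # Keep cell types scoring above the threshold in at least one sample
  keep <- score_cols[vapply(xcell[score_cols],
                            function(x) any(x > threshold, na.rm = TRUE), logical(1))]
  long <- xcell$survival == "long"
  short <- xcell$survival == "short"
  # Welch two-sample t-test per retained cell type
  pvals <- vapply(xcell[keep], function(x) t.test(x[long], x[short])$p.value, numeric(1))
  data.frame(celltype = keep, p.value = pvals,
             fdr = p.adjust(pvals, method = "BH"),
             row.names = NULL, stringsAsFactors = FALSE)
}

# Made-up example: 10 samples, three cell types
toy <- data.frame(
  "CD8+ naive T-cells" = c(0.42, 0.38, 0.45, 0.40, 0.36, 0.05, 0.08, 0.02, 0.06, 0.04),
  "B-cells" = c(0.25, 0.15, 0.30, 0.22, 0.18, 0.21, 0.27, 0.16, 0.24, 0.19),
  "Basophils" = c(0.01, 0.03, 0.00, 0.02, 0.05, 0.04, 0.02, 0.01, 0.03, 0.00),  # never > 0.2
  survival = rep(c("long", "short"), each = 5),
  check.names = FALSE
)
res <- compare_xcell_groups(toy)
res
```

Here `Basophils` is dropped by the threshold, and only the CD8+ column differs between groups.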
### 2.1. Data processing
#### Supplemental Figure S1.A - Discovery dataset 1

In [None]:
# Load xCell scores (samples in rows, one column per cell type, plus a "survival" column)
dat1_xcell <- read.delim("/opt/data/netZooR/gbm/xCell_score_discoverySet1.txt", stringsAsFactors = FALSE, check.names = FALSE)
# Keep only cell types with an xCell score greater than 0.2 in at least one sample
score_cols <- setdiff(colnames(dat1_xcell), "survival")
keep <- score_cols[sapply(dat1_xcell[score_cols], function(x) any(x > 0.2, na.rm = TRUE))]
dat1_xcellsub <- dat1_xcell[, c(keep, "survival")]
# Welch t-test per cell type, then FDR (Benjamini-Hochberg) adjustment
a1 <- which(dat1_xcellsub$survival == "long")
a2 <- which(dat1_xcellsub$survival == "short")
pvals <- sapply(dat1_xcellsub[keep], function(x) t.test(x[a1], x[a2])$p.value)
adjp <- p.adjust(pvals, method = "BH")
# Cell types with significantly different abundance between long- and short-term survival
# CD8+ naive T-cells FDR=0.01491527
adjp[adjp < 0.05]
# Boxplot comparing xCell scores between long- and short-term survival,
# with cell types ordered by decreasing median score
dat1_xcellsub_plot <- melt(dat1_xcellsub, id.vars = "survival")
colnames(dat1_xcellsub_plot)[2] <- "celltype"
ggplot(dat1_xcellsub_plot, aes(x = reorder(celltype, -value, FUN = median), y = value, fill = survival)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = .3)) +
  scale_fill_manual(values = c("dodgerblue", "firebrick1")) +
  labs(title = "Discovery dataset 1", x = "Cell Type", y = "xCell Value")
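
The `reorder(celltype, -value, FUN = median)` call is what orders the x axis: it sets the factor levels by the median of `-value`, which is the same as ordering cell types by decreasing median score. A quick check with made-up values:

```r
# Toy long-format data: one row per (sample, cell type) score; values made up
long_df <- data.frame(
  celltype = rep(c("B-cells", "Macrophages", "NK cells"), each = 4),
  value = c(0.10, 0.20, 0.30, 0.40,   # median 0.25
            0.50, 0.60, 0.70, 0.80,   # median 0.65
            0.05, 0.15, 0.25, 0.35)   # median 0.20
)
# Levels are sorted by the median of -value, i.e. by decreasing median of value
ordered_types <- reorder(factor(long_df$celltype), -long_df$value, FUN = median)
levels(ordered_types)
```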

#### Supplemental Figure S1.B - Discovery dataset 2

In [None]:
# Load xCell scores (samples in rows, one column per cell type, plus a "survival" column)
dat2_xcell <- read.delim("/opt/data/netZooR/gbm/xCell_score_discoverySet2.txt", stringsAsFactors = FALSE, check.names = FALSE)
# Keep only cell types with an xCell score greater than 0.2 in at least one sample
score_cols <- setdiff(colnames(dat2_xcell), "survival")
keep <- score_cols[sapply(dat2_xcell[score_cols], function(x) any(x > 0.2, na.rm = TRUE))]
dat2_xcellsub <- dat2_xcell[, c(keep, "survival")]
# Welch t-test per cell type, then FDR (Benjamini-Hochberg) adjustment
a1 <- which(dat2_xcellsub$survival == "long")
a2 <- which(dat2_xcellsub$survival == "short")
pvals <- sapply(dat2_xcellsub[keep], function(x) t.test(x[a1], x[a2])$p.value)
adjp <- p.adjust(pvals, method = "BH")
# Cell types with significantly different abundance between long- and short-term survival
# CD8+ naive T-cells FDR=0.07766112
# CD4+ memory T-cells FDR=0.09203959
adjp[adjp < 0.1]
# Boxplot comparing xCell scores between long- and short-term survival,
# with cell types ordered by decreasing median score
dat2_xcellsub_plot <- melt(dat2_xcellsub, id.vars = "survival")
colnames(dat2_xcellsub_plot)[2] <- "celltype"
ggplot(dat2_xcellsub_plot, aes(x = reorder(celltype, -value, FUN = median), y = value, fill = survival)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = .3)) +
  scale_fill_manual(values = c("dodgerblue", "firebrick1")) +
  labs(title = "Discovery dataset 2", x = "Cell Type", y = "xCell Value")

#### Supplemental Figure S1.C - Validation dataset

In [None]:
# Load xCell scores (samples in rows, one column per cell type, plus a "survival" column)
dat3_xcell <- read.delim("/opt/data/netZooR/gbm/xCell_score_validationSet.txt", stringsAsFactors = FALSE, check.names = FALSE)
# Keep only cell types with an xCell score greater than 0.2 in at least one sample
score_cols <- setdiff(colnames(dat3_xcell), "survival")
keep <- score_cols[sapply(dat3_xcell[score_cols], function(x) any(x > 0.2, na.rm = TRUE))]
dat3_xcellsub <- dat3_xcell[, c(keep, "survival")]
# Welch t-test per cell type, then FDR (Benjamini-Hochberg) adjustment
a1 <- which(dat3_xcellsub$survival == "long")
a2 <- which(dat3_xcellsub$survival == "short")
pvals <- sapply(dat3_xcellsub[keep], function(x) t.test(x[a1], x[a2])$p.value)
adjp <- p.adjust(pvals, method = "BH")
# Cell types with significantly different abundance between long- and short-term survival
# No cell types found with FDR<0.1
adjp[adjp < 0.1]
# Boxplot comparing xCell scores between long- and short-term survival,
# with cell types ordered by decreasing median score
dat3_xcellsub_plot <- melt(dat3_xcellsub, id.vars = "survival")
colnames(dat3_xcellsub_plot)[2] <- "celltype"
ggplot(dat3_xcellsub_plot, aes(x = reorder(celltype, -value, FUN = median), y = value, fill = survival)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = .3)) +
  scale_fill_manual(values = c("dodgerblue", "firebrick1")) +
  labs(title = "Validation dataset", x = "Cell Type", y = "xCell Value")

### 2.2. Comparing short- and long-term survival groups

This section contains the code for Supplemental Figure S2<sup>1</sup>.

In [None]:
# Load xCell scores data
dat1_xcell <- read.delim("/opt/data/netZooR/gbm/xCell_score_discoverySet1.txt",stringsAsFactors = F, check.names = F)
dat2_xcell <- read.delim("/opt/data/netZooR/gbm/xCell_score_discoverySet2.txt",stringsAsFactors = F, check.names = F)
dat3_xcell <- read.delim("/opt/data/netZooR/gbm/xCell_score_validationSet.txt",stringsAsFactors = F, check.names = F)

In [None]:
# Load the indegrees of PD1 pathway genes (genes in rows, samples in columns)
# The PD1 pathway targeting score of a sample is the average indegree of the genes in the PD1 pathway.
dat1_pd1_deg <- read.delim("/opt/data/netZooR/gbm/indegree_pd1genes_discoverySet1.txt", stringsAsFactors = F)
dat2_pd1_deg <- read.delim("/opt/data/netZooR/gbm/indegree_pd1genes_discoverySet2.txt", stringsAsFactors = F)
dat3_pd1_deg <- read.delim("/opt/data/netZooR/gbm/indegree_pd1genes_validationSet.txt", stringsAsFactors = F)
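
To make the PD1 pathway targeting score concrete, here is a minimal sketch on simulated data. With genes in rows and samples in columns, the score of each sample is simply the column mean of the indegree table, which is what `scatterplot_indegree_cellTypeScore()` computes after transposing and merging. The toy matrix, sample IDs and immune scores are hypothetical. The sketch also computes the per-group Pearson correlation, which is the default method of ggpubr's `stat_cor()`.

```r
set.seed(42)
samples <- paste0("S", 1:8)
# Hypothetical indegree table: PD1 pathway genes in rows, samples in columns
indeg <- matrix(rnorm(5 * 8, mean = 10), nrow = 5,
                dimnames = list(paste0("gene", 1:5), samples))
# Hypothetical xCell output: samples in rows
xcell <- data.frame(ImmuneScore = runif(8, 0, 0.3),
                    survival = rep(c("long", "short"), times = 4),
                    row.names = samples)
# PD1 pathway targeting score: average indegree of the pathway genes in each sample
pd1_score <- colMeans(indeg)
# Align the scores to the xCell samples by sample ID
combined <- data.frame(xcell, pd1_mean = pd1_score[rownames(xcell)])
# Pearson correlation between targeting score and immune score within each survival group
cors <- sapply(split(combined, combined$survival),
               function(d) cor(d$pd1_mean, d$ImmuneScore, method = "pearson"))
cors
```

Matching by sample ID, rather than by position, guards against the two tables listing samples in different orders.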

A) Boxplots comparing the distributions of immune scores from xCell between short- and long-term survival groups in the discovery and validation datasets.

In [None]:
# Panel A: boxplot of one xCell score column by survival group
boxplot_one_cellType <- function(xcell_exp, cellType, datasetName) {
  # Reshape to long format: one row per sample for the selected score
  xcell <- xcell_exp %>% gather(key = cell_type, value = xcell_value, all_of(cellType))
  xcell$xcell_value <- as.numeric(xcell$xcell_value)
  xcell$cell_type <- as.factor(xcell$cell_type)
  # Relabel the survival groups for the plot
  xcell$survival <- sub("long", "long-term", xcell$survival)
  xcell$survival <- sub("short", "short-term", xcell$survival)
  xcell$survival <- factor(xcell$survival, levels = c("long-term", "short-term"))
  ggboxplot(xcell, x = "survival", y = "xcell_value", fill = "survival",
            palette = c("dodgerblue", "firebrick1")) +
    labs(title = datasetName, x = "Survival", y = "xCell Immune Score")
}

immune_score_box1<-boxplot_one_cellType(dat1_xcell, "ImmuneScore", "Discovery dataset 1")
immune_score_box2<-boxplot_one_cellType(dat2_xcell, "ImmuneScore", "Discovery dataset 2")
immune_score_box3<-boxplot_one_cellType(dat3_xcell, "ImmuneScore", "Validation dataset")

B) Scatter plots of xCell immune scores and PD1 pathway targeting scores in the discovery and validation datasets. Regression lines with 95% confidence intervals are shown for the long-term (blue) and short-term (red) groups.

In [None]:
# Panel B: PD1 pathway targeting score vs. an xCell score, colored by survival group
scatterplot_indegree_cellTypeScore <- function(xcell_exp, pd1_deg, cellType, datasetName) {
  celltype_score <- xcell_exp[c(cellType, "survival")]
  # Match samples: xCell scores have samples in rows, indegrees have samples in columns
  celltype_info <- merge(celltype_score, as.data.frame(t(pd1_deg)), by = "row.names")
  # Columns 1-3 are Row.names, the xCell score and survival; the rest are PD1 pathway genes
  celltype_info$pd1_mean <- rowMeans(celltype_info[4:ncol(celltype_info)])
  ggscatter(celltype_info, x = "pd1_mean", y = cellType, color = "survival",
            add = "reg.line", conf.int = TRUE, palette = c("dodgerblue", "firebrick1")) +
    stat_cor(aes(color = survival)) +
    labs(title = datasetName, x = "PD1 pathway targeting score", y = "xCell Immune Score")
}

immune_score_scatter1<- scatterplot_indegree_cellTypeScore(dat1_xcell, dat1_pd1_deg, "ImmuneScore", "Discovery dataset 1")
immune_score_scatter2<- scatterplot_indegree_cellTypeScore(dat2_xcell, dat2_pd1_deg, "ImmuneScore", "Discovery dataset 2")
immune_score_scatter3<- scatterplot_indegree_cellTypeScore(dat3_xcell, dat3_pd1_deg, "ImmuneScore", "Validation dataset")
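
The regression lines and shaded bands in panel B come from `ggscatter(..., add = "reg.line", conf.int = TRUE)`, which fits a separate least-squares line for each survival group. To show what the bands represent, here is a base-R sketch on simulated data that fits one `lm()` per group and computes the 95% confidence interval for the fitted mean with `predict(..., interval = "confidence")`. The data are made up, and the sketch illustrates the concept rather than reproducing ggpubr's internals. Unlike the plot, which draws each line only over its own group's range, it evaluates both fits on one common grid.

```r
set.seed(3)
# Simulated targeting scores and immune scores for two survival groups
toy <- data.frame(
  pd1_mean = c(rnorm(15, 10, 1), rnorm(15, 11, 1)),
  survival = rep(c("long", "short"), each = 15)
)
toy$ImmuneScore <- 0.05 * toy$pd1_mean + rnorm(30, 0, 0.05)

# Evaluate each group's fit on a common grid of targeting scores
grid <- data.frame(pd1_mean = seq(min(toy$pd1_mean), max(toy$pd1_mean), length.out = 5))
bands <- lapply(split(toy, toy$survival), function(d) {
  fit <- lm(ImmuneScore ~ pd1_mean, data = d)
  # Columns: fit (regression line), lwr and upr (95% confidence band for the mean)
  predict(fit, newdata = grid, interval = "confidence", level = 0.95)
})
bands$long
```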

In [None]:
# Plot supplemental figure 2
ggarrange(immune_score_box1, immune_score_box2, immune_score_box3, immune_score_scatter1, immune_score_scatter2, immune_score_scatter3, ncol = 3, nrow = 2)

## References

1- Lopes-Ramos, Camila Miranda, et al. "Regulation of PD1 signaling is associated with prognosis in glioblastoma multiforme." bioRxiv (2021).