<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Microbial-ecological-network-inference" data-toc-modified-id="Microbial-ecological-network-inference-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Microbial ecological network inference</a></span><ul class="toc-item"><li><span><a href="#STEP-1:-R-packages-installation" data-toc-modified-id="STEP-1:-R-packages-installation-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>STEP 1: R packages installation</a></span></li><li><span><a href="#STEP-2:-get-and-load-data-!" data-toc-modified-id="STEP-2:-get-and-load-data-!-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>STEP 2: get and load data !</a></span></li><li><span><a href="#STEP-3:-data-filtering" data-toc-modified-id="STEP-3:-data-filtering-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>STEP 3: data filtering</a></span></li><li><span><a href="#STEP-4:-SPIEC-EASI-run" data-toc-modified-id="STEP-4:-SPIEC-EASI-run-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>STEP 4: SPIEC-EASI run</a></span></li><li><span><a href="#STEP-5:-Graphs-visualizations" data-toc-modified-id="STEP-5:-Graphs-visualizations-1.5"><span class="toc-item-num">1.5&nbsp;&nbsp;</span>STEP 5: Graphs visualizations</a></span></li><li><span><a href="#STEP-6:-Exercice" data-toc-modified-id="STEP-6:-Exercice-1.6"><span class="toc-item-num">1.6&nbsp;&nbsp;</span>STEP 6: Exercice</a></span></li></ul></li></ul></div>

Authors: Samuel Chaffron and Nils Giordano

With credits to Marko Budinich, Erwan Delage, Zachary Kurtz, and Karoline Faust

# Microbial ecological network inference

SPIEC-EASI (SParse InversE Covariance Estimation for Ecological Association
Inference) exploits the fact that, under certain assumptions (all relevant
variables are measured and the data follow a multivariate normal
distribution), the non-zero entries of the inverse covariance matrix define a
network without indirect edges. SPIEC-EASI estimates a sparse inverse
covariance matrix from sequencing data. Network inference based on the inverse
covariance matrix is known in the literature as a Gaussian graphical model.
The inverse covariance matrix is also called the precision matrix, and its
rescaled off-diagonal entries give the partial correlations. SPIEC-EASI is
implemented in R. For further details, please see the associated publication:
http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004226
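
As a minimal illustration of why the inverse covariance matrix removes indirect edges, the sketch below simulates a chain A -> B -> C in base R. The variable names, the sample size, and the noise model are assumptions made for this toy example; none of it is part of SPIEC-EASI itself.

```r
# Toy chain A -> B -> C: A and C are associated only through B
set.seed(1)
n  <- 5000
va <- rnorm(n)
vb <- va + rnorm(n)
vc <- vb + rnorm(n)
x  <- cbind(A = va, B = vb, C = vc)

# Plain correlations: every pair looks associated
cor.mat <- cor(x)

# Precision matrix = inverse of the covariance matrix
prec <- solve(cov(x))

# Partial correlations: rescale the precision matrix and flip the sign
pcor <- -cov2cor(prec)
diag(pcor) <- 1

print(round(cor.mat, 2))
print(round(pcor, 2))
```

A and C are clearly correlated, yet their partial correlation is close to zero because their association is fully explained by B. Only the direct edges A-B and B-C remain.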

## STEP 1: R packages installation

In [None]:
# Load the packages used in this tutorial.
# Uncomment the install lines below for any package that is missing.
library(devtools)
library(huge)
# SPIEC-EASI (https://github.com/zdk123/SpiecEasi)
#install_github("zdk123/SpiecEasi")
library(SpiecEasi)
# phyloseq (Bioconductor; BiocManager replaces the retired biocLite script)
library(gtools)
#install.packages("BiocManager")
#BiocManager::install("phyloseq")
library(phyloseq)
# igraph
library(Matrix)
library(igraph)

## STEP 2: get and load data !

The data come from the Tara Oceans companion website:
http://ocean-microbiome.embl.de/companion.html

We use the Tara miTAGs 16S data. Both the OTU and taxonomy files have already
been formatted; you will find them in the SPIEC-EASI folder:

- OTU table: `miTAG.taxonomic.profiles.release.OTUtable.tsv`
- TAX table: `miTAG.taxonomic.profiles.release.TAXtable.tsv`

In [None]:
DIR = 'Data/'

# Load abundance matrix
otumat = read.csv(paste(DIR,"miTAG.taxonomic.profiles.release.OTUtable.tsv",sep=""),
                  sep="\t", row.names=1, check.names=FALSE)
taxmat = read.csv(paste(DIR,"miTAG.taxonomic.profiles.release.TAXtable.tsv",sep=""),
                  sep="\t", row.names=1, check.names=FALSE)
otus = otu_table(otumat, taxa_are_rows = TRUE)
taxa = tax_table(as.matrix(taxmat))
mat = phyloseq(otus, taxa)

# OTUs as columns
totus = t(otus)

## STEP 3: data filtering

It is common practice to pre-filter your OTU data by a relative abundance
cut-off and/or by prevalence. Study the two functions below.

In [None]:
# function to perform pre-filtering on OTUs with low abundance relative to total abundance
# OTUs whose abundance does not exceed 0.01% of total abundance are removed from the table
low.count.removal = function(
  data, # OTU count data frame of size n (sample) x p (OTU)
  percent=0.01 # cutoff, in % of total abundance
){
  keep.otu = which(colSums(data)*100/(sum(colSums(data))) > percent)
  data.filter = data[,keep.otu]
  return(list(data.filter = data.filter, keep.otu = keep.otu))
}

# function to perform pre-filtering on OTUs with low presence across stations
# OTUs that appear in no more than 5% of stations are removed from the table
min.stations.removal = function(
  data, # OTU count data frame of size n (sample) x p (OTU)
  percent=0.05 # cutoff, as a fraction of stations (0.05 = 5%)
){
  keep.otu = which(colSums(data != 0) > percent * dim(data)[1])
  data.filter = data[,keep.otu]
  return(list(data.filter = data.filter, keep.otu = keep.otu))
}


# Initialize thresholds
thresholdAbundance = 0.1  # in % of total abundance
thresholdPresence = 0.6   # as a fraction of stations (0.6 = 60%)

# Remove OTUs with relative abundance lower than 0.1% of total abundance
cat("NB stations :",dim(totus)[1],"\n")
cat("NB otus before abundance filter :",dim(totus)[2],"\n")
nbOtusInitial = dim(totus)[2]

ftr = low.count.removal(totus, thresholdAbundance)
totus = totus[,ftr$keep.otu]
nbOtusAfterAbFilter = dim(totus)[2]
cat("NB otus after abundance filter :",nbOtusAfterAbFilter,"\n")

# Filter OTUs by presence in a minimum fraction of stations
ftr = min.stations.removal(totus,thresholdPresence)
totus = totus[,ftr$keep.otu]
nbOtusAfterPrFilter = dim(totus)[2]
cat("NB otus after presence filter :",nbOtusAfterPrFilter,"\n")

# update phyloseq object after filtering
otus = otu_table(totus, taxa_are_rows = FALSE)
taxa = tax_table(as.matrix(taxa[colnames(otus),]))
physeqo = phyloseq(t(otus), taxa)
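
To make the two filters concrete, here is a small sketch on a made-up count table (4 stations, 3 OTUs). The table, the OTU names, and the cutoffs of 1% and 0.5 are hypothetical and chosen only so that each filter removes one OTU. The logic mirrors the two functions above without calling them.

```r
# Hypothetical count table: rows are stations, columns are OTUs
toy <- matrix(c(50, 0, 1,
                40, 0, 0,
                60, 5, 0,
                45, 0, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("station", 1:4),
                              c("otuA", "otuB", "otuC")))

# Relative abundance of each OTU, in % of all counts in the table
rel.abund <- colSums(toy) * 100 / sum(toy)

# Prevalence: fraction of stations in which each OTU is present
prevalence <- colSums(toy != 0) / nrow(toy)

# Keep OTUs above 1% relative abundance AND present in more than half the stations
keep <- rel.abund > 1 & prevalence > 0.5
toy.filtered <- toy[, keep, drop = FALSE]

print(round(rel.abund, 2))
print(prevalence)
print(colnames(toy.filtered))
```

Here `otuC` fails the abundance cutoff and `otuB` passes it but fails the prevalence cutoff, so only `otuA` remains. The `drop = FALSE` argument keeps the result a matrix even when a single column survives.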

## STEP 4: SPIEC-EASI run

In [None]:
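# SPIEC-EASI parameters (adjust lambda.min.ratio and stars.thresh to your dataset)
nc = 10                  # number of cores used for the StARS subsampling
lambda.min.ratio = 1e-2  # smallest lambda, as a fraction of the largest one
nlambda = 20             # number of lambda values on the path
rep.num = 20             # number of StARS subsamples
stars.thresh = 0.05      # StARS variability threshold, or 0.1 (for details see function huge.select from R package huge)

# SPIEC-EASI run
# Note: recent SpiecEasi releases may expect these options in pulsar.params
# (with the threshold named thresh) instead of icov.select.params
physeqo.mb = spiec.easi(physeqo, method='mb', lambda.min.ratio=lambda.min.ratio, nlambda=nlambda, icov.select.params=list(stars.thresh = stars.thresh, rep.num=rep.num, ncores=nc))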
print(physeqo.mb)
# The example above does not cover all possible options and parameters. For
# example, other generative network models are available. Also, the
# lambda.min.ratio (the scaling factor that determines the minimum
# sparsity/lambda parameter) shown here might not be right for your dataset.
# Additionally, increasing the rep.num argument (the number of StARS
# subsamples) may result in better performance.

# Build iGraph object
# Extract the regression coefficients from the SPIEC-EASI
# output, which for method mb is achieved with command getOptBeta. The
# regression coefficient matrix is not symmetric and can be symmetrised with
# command symBeta
adj   = physeqo.mb
adj.g = adj2igraph(symBeta(getOptBeta(adj), mode='maxabs'), vertex.attr=list(name=taxa_names(physeqo)))
hist(E(adj.g)$weight)
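
The 'maxabs' symmetrisation can be pictured with a small base R sketch. It illustrates the idea only: for each pair of OTUs, keep the coefficient with the larger absolute value, sign included. It is not the SpiecEasi implementation of symBeta, and the coefficient values are invented.

```r
# Hypothetical asymmetric regression coefficients for 3 OTUs:
# beta[i, j] and beta[j, i] come from two different regressions
beta <- matrix(c(0.00,  0.30,  0.00,
                 0.10,  0.00, -0.20,
                 0.00, -0.50,  0.00),
               nrow = 3, byrow = TRUE)

# For each pair (i, j), keep whichever of beta[i, j] and beta[j, i]
# has the larger absolute value, preserving its sign
beta.t   <- t(beta)
beta.sym <- ifelse(abs(beta) >= abs(beta.t), beta, beta.t)

print(beta.sym)
print(isSymmetric(beta.sym))
```

The result is symmetric, so each pair of OTUs carries a single signed weight, which is what an undirected weighted graph needs.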

## STEP 5: Graphs visualizations

In [None]:
## Using iGraph
## set size of vertex proportional to clr-mean
vsize <- rowMeans(clr(otus, 1))+3
## set layout
am.coord <- layout_with_graphopt(adj.g)
# edge widths must be non-negative, so use the absolute value of the weights
plot(adj.g, layout=am.coord, vertex.size=vsize, vertex.label=NA, vertex.color="aquamarine2", edge.color="black", edge.width=abs(E(adj.g)$weight), main="Tara euphotic network")
# degree stats
dd.mb <- degree_distribution(adj.g)
plot(0:(length(dd.mb)-1), dd.mb, ylim=c(0,.35), type='b',
     ylab="Frequency", xlab="Degree", main="Degree Distributions")

## Using phyloseq
pdf(paste(DIR,"SPIEC-EASI.networks.pdf",sep=""), paper = "a4r", width=29, height=21)
plot_network(adj.g, taxa, type='taxa', color="Genus", label=NULL)
plot_network(adj.g, taxa, type='taxa', color="Class", label=NULL)
plot_network(adj.g, taxa, type='taxa', color="Phylum", label=NULL)
dev.off()

# Export graph
write_graph(adj.g, paste(DIR,"Tara.SUR.DCM.genus.merged16S.graphml",sep=""), format="graphml")

## STEP 6: Exercice

Extract the number of positive and negative associations inferred by SPIEC-EASI.

They can be obtained from the regression coefficients, which are stored as edge weights in the igraph object (`adj.g`).
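
As a hint, the counting logic can be sketched in base R on a small invented symmetric weight matrix. The matrix `w` below is hypothetical and only stands in for the symmetrised coefficients; adapting the idea to the real network is left as the exercise.

```r
# Hypothetical symmetric weight matrix for 3 OTUs
w <- matrix(c(0.0,  0.3,  0.0,
              0.3,  0.0, -0.5,
              0.0, -0.5,  0.0),
            nrow = 3, byrow = TRUE)

# Each association appears twice in a symmetric matrix,
# so count the upper triangle only
w.upper <- w[upper.tri(w)]
n.pos <- sum(w.upper > 0)
n.neg <- sum(w.upper < 0)

cat("positive:", n.pos, " negative:", n.neg, "\n")
```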

In [None]:
# YOUR CODE HERE