# Identifying Gene Co-Expression Modules in the Developing Human Brain

**Course**: MED263, "Bioinformatics Applications to Human Disease"  
**Preparer**: Kevin Chau (kkchau@ucsd.edu) (https://github.com/kkchau/MED263)

In this practical, we will use R, along with web-based tools, to identify and characterize gene co-expression modules from human brain developmental transcriptome data. We will use publicly available gene expression data from the BrainSpan Atlas, construct the co-expression networks with Weighted Gene Co-Expression Network Analysis (WGCNA) in R, and characterize the resulting modules with Enrichr.

## Set-Up
All of the following packages should already be installed if you are running this notebook from the corresponding Docker container.
### Bioconductor packages


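```r
# source("https://bioconductor.org/biocLite.R")  # Bioconductor < 3.8 (biocLite is defunct afterwards)
# install.packages("BiocManager")                # Bioconductor >= 3.8
```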

We will use WGCNA for the network construction itself, and organize the data into `SummarizedExperiment` objects, which keep the expression matrix together with its sample and gene metadata. Installing these packages may take some time; watch the terminal for output and progress.
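
To make the `SummarizedExperiment` layout concrete, here is a minimal sketch on a toy matrix. The gene names, sample IDs, ages, and RPKM values are invented purely for illustration; only the constructor and accessors (`SummarizedExperiment()`, `assay()`, `colData()`, `S4Vectors::DataFrame()`) come from the Bioconductor packages.

```r
library(SummarizedExperiment)

# Toy data, invented for illustration: 3 genes x 4 samples of RPKM values.
# matrix() fills column-wise, so each line below is one sample's column.
rpkm <- matrix(c(1.2, 0.0, 5.4,
                 2.3, 0.1, 4.9,
                 3.1, 0.0, 6.2,
                 2.8, 0.3, 5.0),
               nrow = 3,
               dimnames = list(c("geneA", "geneB", "geneC"),
                               c("s1", "s2", "s3", "s4")))

# Sample (column) and gene (row) annotations; row names match the matrix
sampleInfo <- S4Vectors::DataFrame(age = c("8 pcw", "12 pcw", "4 mos", "2 yrs"),
                                   row.names = colnames(rpkm))
geneInfo <- S4Vectors::DataFrame(symbol = c("A", "B", "C"),
                                 row.names = rownames(rpkm))

se <- SummarizedExperiment(assays = list(rpkm = rpkm),
                           colData = sampleInfo,
                           rowData = geneInfo)

dim(se)                     # genes x samples
assay(se, "rpkm")[, 1:2]    # the expression values themselves
colData(se)$age             # per-sample metadata
prenatal <- se[, se$age %in% c("8 pcw", "12 pcw")]  # subsetting keeps metadata aligned
```

Because the metadata travels with the matrix, subsetting samples (for example by developmental period) never leaves the annotations out of sync with the expression columns.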

```r
# "splines" ships with base R, so it is not installed from CRAN here
# install.packages(c("matrixStats", "Hmisc", "foreach", "doParallel", "fastcluster", "dynamicTreeCut", "survival"))
```

```r
# biocLite(c("WGCNA", "SummarizedExperiment"))              # Bioconductor < 3.8
# BiocManager::install(c("WGCNA", "SummarizedExperiment"))  # Bioconductor >= 3.8
```

### Libraries

```r
library(WGCNA)
library(SummarizedExperiment)
```

```
*
*  Package WGCNA 1.62 loaded.
*
*    Important note: It appears that your system supports multi-threading,
*    but it is not enabled within WGCNA in R.
*    To allow multi-threading within WGCNA with all available cores, use
*
*          allowWGCNAThreads()
*
*    within R. Use disableWGCNAThreads() to disable threading if necessary.
*    Alternatively, set the following environment variable on your system:
*
*          ALLOW_WGCNA_THREADS=<number_of_processors>
*
*    for example
*
*          ALLOW_WGCNA_THREADS=2
*
*    To set the environment variable in linux bash shell, type
*
*           export ALLOW_WGCNA_THREADS=2
*
*     before running R. Other operating systems or shells will
*     have a similar command to achieve the same aim.
*
```
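
WGCNA can run some computations on multiple threads but does not enable this by default. A minimal sketch of turning it on from within R follows; leaving one core free is our own arbitrary choice, not a WGCNA recommendation, and calling `allowWGCNAThreads()` with no argument uses all available cores instead. Loading WGCNA also masks `stats::cor` with its own `cor()`, so the `stats::` prefix is the way to reach the base version explicitly.

```r
library(WGCNA)

# Our choice (not a WGCNA default): leave one core free for the rest of the
# system. detectCores() may return NA on some platforms, so fall back to 1.
nCores <- parallel::detectCores()
nThreads <- if (is.na(nCores)) 1L else max(1L, nCores - 1L)
allowWGCNAThreads(nThreads = nThreads)

# WGCNA masks stats::cor; use the prefix when you want the base implementation
r <- stats::cor(c(1, 2, 3, 4), c(2, 4, 6, 8))  # perfectly linear, so r is 1
```

Threading can be switched off again with `disableWGCNAThreads()`.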

### Data Packaging
To begin the data analysis, we will first download and extract gene expression data from the BrainSpan Atlas.

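```r
# Download the BrainSpan expression archive
url <- "http://www.brainspan.org/api/v2/well_known_file_download/267666525"
utils::download.file(url, destfile = "brainSpan.zip", mode = "wb")  # binary mode avoids corruption on Windows
# Extract into ./brainSpan, then delete the archive
utils::unzip("brainSpan.zip", exdir = "brainSpan")
file.remove("brainSpan.zip")
```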

The downloaded files consist of an RPKM expression matrix, sample (column) metadata, and gene (row) metadata, along with a readme file.
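
Before packaging these tables, it is worth checking that they line up: one expression column per sample and one expression row per gene. The sketch below assumes the archive extracts to `expression_matrix.csv` (no header row, first column a row index), `columns_metadata.csv`, and `rows_metadata.csv`. These names and that layout are assumptions, so confirm them with `list.files("brainSpan")` and the readme. `readBrainSpan` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper. ASSUMED file names and layout; verify against the
# readme and list.files("brainSpan") before relying on them.
readBrainSpan <- function(dir = "brainSpan") {
  # Assumed: no header row, first column is a row index used as row names
  expr <- read.csv(file.path(dir, "expression_matrix.csv"),
                   header = FALSE, row.names = 1)
  samples <- read.csv(file.path(dir, "columns_metadata.csv"))
  genes   <- read.csv(file.path(dir, "rows_metadata.csv"))
  # One expression column per sample, one expression row per gene
  stopifnot(ncol(expr) == nrow(samples), nrow(expr) == nrow(genes))
  list(expr = as.matrix(expr), samples = samples, genes = genes)
}

# Only runs once the download above has completed
if (dir.exists("brainSpan")) {
  print(list.files("brainSpan"))
  bs <- readBrainSpan("brainSpan")
  print(dim(bs$expr))
}
```

Failing early on a shape mismatch is cheaper than discovering later that sample annotations were attached to the wrong expression columns.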