# Systems Immunogenetics Project

## Expression Array DE and Pathway Analysis Workflow

### McWeeney Lab, Oregon Health & Science University

**Authors: Gabrielle Choonoo (choonoo@ohsu.edu) and Michael Mooney (mooneymi@ohsu.edu)**

## Introduction

This notebook is a step-by-step workflow for differential expression (DE) and Gene Ontology (GO) pathway analysis of pre-processed expression data from the Bat Array, including plots for DE genes and GO pathways.

Required files and packages:

* R packages (from CRAN and Bioconductor):
  + gdata, plyr, oligo, pd.mogene.2.1.st, Heatplus, reshape2, ReportingTools, hwriter, ggplot2, limma, clusterProfiler, XML, mogene21sttranscriptcluster.db, GOstats, genefilter
* Pre-processing notebook (SIG_Array_QA_QC_Workflow.ipynb): [Download here](https://raw.githubusercontent.com/mooneymi/systems_immunogenetics/master/SIG_Array_QA_QC_Workflow.ipynb)
* This notebook (DE analysis.ipynb): [Download here](https://raw.githubusercontent.com/gchoonoo/DE_analysis/master/DE%20analysis.ipynb)
* The R script (DE_pathway_functions.R): [Download here](https://raw.githubusercontent.com/gchoonoo/DE_analysis/master/DE_pathway_functions.R)

**Note:** This notebook can also be downloaded as an R script (only the code blocks shown below are included): [Download R script here](https://raw.githubusercontent.com/gchoonoo/DE_analysis/master/DE_pathway_analysis.R)

**All code is available at https://github.com/gchoonoo/DE_analysis**



# Step 1: Load Necessary R Functions and Libraries

In [25]:
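# Prepend the local R library to the search path (adjust this path for your machine)
.libPaths(c('/Users/choonoo/miniconda2/lib/R/library', .libPaths()))
# Print the library paths now in use
.libPaths()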

In [46]:
## Load libraries and functions for DE and pathway analysis

#source("http://bioconductor.org/biocLite.R")

source('DE_pathway_functions.R')

Loading required package: ReportingTools
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, : there is no package called ‘ReportingTools’
Loading required package: clusterProfiler
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, : there is no package called ‘clusterProfiler’
Loading required package: XML
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, : there is no package called ‘XML’
Loading required package: GOstats
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, : there is no package called ‘GOstats’
Loading required package: genefilter
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, : there is no package called ‘genefilter’
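
The messages above show that several packages needed by `DE_pathway_functions.R` were not installed in this session. A minimal sketch for checking this before sourcing the script is given below. The helper name `find_missing_packages` is hypothetical and not part of the script, and the install call is left commented out because it assumes the BiocManager package is available.

```r
# Hypothetical helper (not part of DE_pathway_functions.R): return the
# packages in `pkgs` that cannot be loaded from the current library paths.
find_missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# The packages named in the messages above
required <- c("ReportingTools", "clusterProfiler", "XML", "GOstats", "genefilter")
missing_pkgs <- find_missing_packages(required)
print(missing_pkgs)

# Assumption: BiocManager is installed. It can install both CRAN and
# Bioconductor packages.
# if (length(missing_pkgs) > 0) BiocManager::install(missing_pkgs)
```

Running the check before `source('DE_pathway_functions.R')` makes a missing dependency visible up front instead of surfacing later as an undefined function or class.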

# Step 2: Read in raw expression data

In [15]:
# Path to the .RData file that contains the raw expression data
file = "~/de_test_v2.RData"

load(file)
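
`load()` creates objects in the workspace without reporting their names, and the next step relies on an object called `raw.exprs` being present. The sketch below shows one way to inspect what an `.RData` file contains by loading it into a separate environment. It uses a temporary toy file so that it runs anywhere; `rdata_path` and `toy.exprs` are illustrative names, not objects from this project.

```r
# Save a toy object so the example is self-contained; in the workflow,
# `rdata_path` would instead point at the real .RData file.
rdata_path <- tempfile(fileext = ".RData")
toy.exprs <- matrix(1:6, nrow = 2)
save(toy.exprs, file = rdata_path)

# Load into a separate environment so nothing in the workspace is overwritten
loaded <- new.env()
object_names <- load(rdata_path, envir = loaded)
print(object_names)

# Stop early if an expected object is absent
stopifnot("toy.exprs" %in% object_names)
```

`load()` returns the names of the objects it restored, so the same check can be written against `"raw.exprs"` for the real file.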

# Step 3: DE Analysis

## Step 3A: Compute normalized expression

In [16]:
# Compute normalization of expression and save to file
norm.exprs = normalize_expression(raw.exprs=raw.exprs)

Background correcting
Normalizing
Calculating Expression
[1] "Saving normalized expression to file..."
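
The progress messages above name the usual stages of RMA-style preprocessing: background correction, normalization, and summarization into expression values. How `normalize_expression` implements these stages is not shown in this notebook. As an illustration of the normalization stage only, here is a toy base-R sketch of quantile normalization on simulated data; the function name `quantile_normalize` is hypothetical.

```r
# Hypothetical helper: force every column (sample) of `x` to share the same
# distribution, namely the mean of the sorted columns.
quantile_normalize <- function(x) {
  ranks <- apply(x, 2, rank, ties.method = "first")  # rank of each value within its sample
  target <- rowMeans(apply(x, 2, sort))              # mean of the k-th smallest values
  normalized <- apply(ranks, 2, function(r) target[r])
  dimnames(normalized) <- dimnames(x)
  normalized
}

# Simulated intensities: 5 probes in 4 samples (not project data)
set.seed(1)
toy <- matrix(rexp(20), nrow = 5,
              dimnames = list(paste0("probe", 1:5), paste0("s", 1:4)))
toy_norm <- quantile_normalize(toy)

# After normalization, every sample has the same sorted values
apply(toy_norm, 2, sort)
```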


## Step 3B: Feature Filter

In [45]:
norm.exprs.filter = filter_features(norm.exprs=norm.exprs)

'select()' returned 1:many mapping between keys and columns


ERROR: Error in findLargest(featureNames(sub.eset), rowIQRs(sub.eset), "mogene21sttranscriptcluster.db"): could not find function ".isOrgSchema"
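
The error message shows that the filter calls `findLargest()` with per-probe IQRs, which suggests it keeps, for each gene, the probe with the largest interquartile range. The base-R sketch below illustrates that idea on toy data without the annotation package. The function `keep_largest_iqr` and the probe-to-symbol vector are hypothetical, and the real `filter_features` may apply additional steps.

```r
# Hypothetical helper: for each gene symbol, keep the probe (row) whose
# expression has the largest IQR; probes without a symbol are dropped.
keep_largest_iqr <- function(expr, symbols) {
  iqr <- apply(expr, 1, IQR)
  idx <- which(!is.na(symbols))
  best <- vapply(split(idx, symbols[idx]),
                 function(i) i[which.max(iqr[i])], integer(1))
  expr[sort(unname(best)), , drop = FALSE]
}

# Toy data: 4 probes in 4 samples; p1 and p2 map to the same gene
toy_expr <- rbind(p1 = c(1, 2, 3, 4),
                  p2 = c(1, 5, 9, 13),
                  p3 = c(2, 2, 2, 2),
                  p4 = c(0, 10, 20, 30))
toy_symbols <- c("GeneA", "GeneA", "GeneB", NA)

filtered <- keep_largest_iqr(toy_expr, toy_symbols)
rownames(filtered)
```

In this toy case `p2` is kept over `p1` because it varies more across samples, and `p4` is dropped because it has no symbol.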


## Step 3C: Sample Filter

In [42]:
norm.exprs.filter.category = sample_filter(norm.exprs.filter=norm.exprs.filter)

## Step 3D: DE Analysis and save results

In [43]:
# Compute DE analysis and save table of results
de_table = de_analysis_table(norm.exprs.filter.category=norm.exprs.filter.category, category='Category')

head(de_table)

'select()' returned 1:many mapping between keys and columns


ERROR: Error: anyDuplicated(annot.temp$SYMBOL) == 0 is not TRUE


| | ProbeId | Symbol | Resistant.Sensitive.logFC | Resistant.Sensitive.Signif | p.value |
|---|---|---|---|---|---|
| 17362028 | 17362028 | Gm14964 | 1.235306 | 1 | 1.051038e-08 |
| 17524590 | 17524590 | Zglp1 | -0.04195866 | 0 | 0.5282334 |
| 17493037 | 17493037 | Vmn2r65 | 0.09694176 | 0 | 0.1235148 |
| 17234709 | 17234709 | Gm10024 | -0.01864596 | 0 | 0.7850663 |
| 17248721 | 17248721 | F630206G17Rik | -0.01459861 | 0 | 0.8145857 |
| 17432514 | 17432514 | Oog3 | -0.0713851 | 0 | 0.6132962 |
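
The table reports a log fold change, a significance flag, and a p-value for each probe in the Resistant versus Sensitive contrast. The sketch below shows how columns of this kind can be derived, using simulated data and a per-gene Welch t-test from base R. This is a stand-in chosen so the example has no package dependencies: limma appears in the package list, but what `de_analysis_table` does internally is not shown here, and every name and number below is illustrative.

```r
# Simulated log2 expression: 50 genes, 5 Resistant and 5 Sensitive samples
set.seed(42)
n_genes <- 50
groups <- factor(rep(c("Resistant", "Sensitive"), each = 5))
is_res <- groups == "Resistant"
expr <- matrix(rnorm(n_genes * 10, mean = 8, sd = 0.5), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)), NULL))
expr[1, is_res] <- expr[1, is_res] + 4  # spike in one truly DE gene

# Log fold change = difference of group means on the log2 scale
logFC <- rowMeans(expr[, is_res]) - rowMeans(expr[, !is_res])

# Per-gene Welch t-test, then Benjamini-Hochberg adjustment
p_value <- apply(expr, 1, function(y) t.test(y[is_res], y[!is_res])$p.value)
adj_p <- p.adjust(p_value, method = "BH")

toy_table <- data.frame(ProbeId = rownames(expr),
                        logFC = logFC,
                        Signif = as.integer(adj_p < 0.05),
                        p.value = p_value)
head(toy_table[order(toy_table$p.value), ])
```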


# Step 4: Pathway Analysis

In [38]:
# GOstats enrichment analysis of DE genes
path_results = pathway_analysis(norm.exprs.filter.category=norm.exprs.filter.category, de_table=de_table, pvalue_cutoff=0.05)

head(path_results)

'select()' returned 1:many mapping between keys and columns


ERROR: Error in getClass(Class, where = topenv(parent.frame())): “GOHyperGParams” is not a defined class


| | GOBPID | Pvalue | OddsRatio | ExpCount | Count | Size | Term |
|---|---|---|---|---|---|---|---|
| 1 | GO:0008152 | 4.291801e-95 | 1.819908 | 3533.805 | 4249 | 9373 | metabolic process |
| 2 | GO:0044237 | 9.608379e-95 | 1.834199 | 3023.698 | 3725 | 8020 | cellular metabolic process |
| 3 | GO:0044238 | 1.378796e-84 | 1.767449 | 3130.771 | 3796 | 8304 | primary metabolic process |
| 4 | GO:0071704 | 3.396728e-84 | 1.760046 | 3272.531 | 3940 | 8680 | organic substance metabolic process |
| 5 | GO:0071840 | 5.613201e-57 | 1.738299 | 1618.922 | 2072 | 4294 | cellular component organization or biogenesis |
| 6 | GO:0044260 | 4.15651e-55 | 1.633057 | 2234.596 | 2729 | 5927 | cellular macromolecule metabolic process |
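
The `GOHyperGParams` class named in the error belongs to the hypergeometric testing interface in GOstats. In the table, the ratio of `ExpCount` to `Size` is the same in every row (about 0.377), which is what one would expect if `ExpCount` is `Size` multiplied by the fraction of the gene universe that was selected. The sketch below works through that arithmetic and the matching over-representation p-value in base R. The counts are hypothetical toy numbers, not values from this analysis, and the sketch does not reproduce any options `pathway_analysis` may set.

```r
# Hypothetical toy counts (not taken from the analysis above)
universe_size <- 20000  # genes in the background set
n_selected    <- 500    # DE genes tested for enrichment
term_size     <- 400    # universe genes annotated to the GO term ("Size")
overlap       <- 25     # selected genes annotated to the term ("Count")

# Expected overlap if the selected genes were a random draw ("ExpCount")
exp_count <- term_size * n_selected / universe_size

# P(overlap >= observed) under the hypergeometric distribution
p_value <- phyper(overlap - 1, term_size, universe_size - term_size,
                  n_selected, lower.tail = FALSE)

# The same test expressed as a one-sided Fisher exact test on the 2x2 table
tab <- matrix(c(overlap, term_size - overlap,
                n_selected - overlap,
                universe_size - term_size - n_selected + overlap), nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value
```

Subtracting 1 from `overlap` in the `phyper()` call makes the upper tail include the observed count itself.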


## Step 4A: Pathway Visualization

In [47]:
pathway_plots(path_results = path_results, parameter = "OddsRatio", pvalue = "raw")

ERROR: Error in `[.data.frame`(res_table_v3, , parameter): object 'parameter' not found
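
Because the call above failed, the plot that `pathway_plots` produces is not shown in this notebook. As a rough sketch of one way to visualize the `OddsRatio` column, the base-R bar chart below uses the six rows printed in Step 4. It is an illustration with base graphics only and is not a reproduction of what `pathway_plots` draws.

```r
# Terms and odds ratios copied from the pathway results table in Step 4
top_terms <- data.frame(
  Term = c("metabolic process",
           "cellular metabolic process",
           "primary metabolic process",
           "organic substance metabolic process",
           "cellular component organization or biogenesis",
           "cellular macromolecule metabolic process"),
  OddsRatio = c(1.819908, 1.834199, 1.767449, 1.760046, 1.738299, 1.633057)
)

# Sort so the largest odds ratio is drawn at the top of a horizontal bar chart
top_terms <- top_terms[order(top_terms$OddsRatio), ]

# Widen the left margin so the long term names fit, then restore the settings
old_par <- par(mar = c(4, 22, 2, 1))
bar_mid <- barplot(top_terms$OddsRatio, names.arg = top_terms$Term,
                   horiz = TRUE, las = 1, xlab = "OddsRatio",
                   main = "Top GO terms")
par(old_par)
```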
