# 3.2 Calling differentially expressed peaks

### IMPORTANT: Please make sure that you are using the bash kernel to run this notebook. ###
### IMPORTANT: Run the command below to git pull and make sure you are running the latest code!! ###
#### (Do this at the beginning of every session) ###

In [6]:
cd /srv/scratch/training_camp/tc2017/`whoami`/src/training_camp
git stash 
git pull 

Saved working directory and index state WIP on master: 1c67c83 pca more readable
HEAD is now at 1c67c83 pca more readable
Already up-to-date.


In [7]:
### Set up variables storing the location of our data
### The proper way to load these variables is to source ~/.bashrc, but this is very slow in IPython
export SUNETID="$(whoami)"
export WORK_DIR="/srv/scratch/training_camp/tc2017/${SUNETID}"
export DATA_DIR="${WORK_DIR}/data"
export FASTQ_DIR="${DATA_DIR}/fastq/"
export SRC_DIR="${WORK_DIR}/src/training_camp/src/"

export ANALYSIS_DIR="${WORK_DIR}/analysis/"
export TRIMMED_DIR="$ANALYSIS_DIR/trimmed"
export ALIGNMENT_DIR="$ANALYSIS_DIR/aligned/"
export TAGALIGN_DIR="$ANALYSIS_DIR/tagAlign/"
export PEAKS_DIR="$ANALYSIS_DIR/peaks/"
export SIGNAL_DIR="${ANALYSIS_DIR}signal/"
export FOLDCHANGE_DIR="${SIGNAL_DIR}foldChange/"
export COUNTS_DIR="${SIGNAL_DIR}counts/"

export YEAST_DIR="/srv/scratch/training_camp/saccer3/seq"
export YEAST_INDEX="/srv/scratch/training_camp/saccer3/bowtie2_index/saccer3"
export YEAST_CHR="/srv/scratch/training_camp/saccer3/sacCer3.chrom.sizes"

export TMP="${WORK_DIR}/tmp"
export TEMP=$TMP 
export TMPDIR=$TMP

export RLIBS=$RLIBS:"/usr/local/lib/R/site-library"
export MASTER_DATA="/srv/scratch/training_camp/data/tc2017"
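
Every later step depends on these variables, so it can help to confirm that the main directories exist before continuing. The following is a minimal sketch, not part of the training camp code: `check_dirs` is a hypothetical helper name, and the choice of directories to check is only a suggestion.

```bash
# check_dirs: hypothetical helper, not part of the training camp scripts.
# Prints one line per argument saying whether it is an existing directory
# and returns non-zero if any argument is not.
check_dirs() {
    local d rc=0
    for d in "$@"; do
        if [ -d "$d" ]; then
            echo "OK       $d"
        else
            echo "MISSING  $d"
            rc=1
        fi
    done
    return "$rc"
}

# ${VAR:-} expands to an empty string when VAR is unset, which is reported as MISSING.
check_dirs "${WORK_DIR:-}" "${DATA_DIR:-}" "${ANALYSIS_DIR:-}" "${MASTER_DATA:-}" \
    || echo "At least one directory is missing; re-run the export cell above."
```

If a directory is reported as missing, the most likely cause is that the export cell was not run in the current kernel session.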



In this tutorial, we will focus on calling differential peaks: 
![Analysis pipeline](part4.png)

## Missing R packages 

If a script in this section fails with an error saying that the gplots package is not installed, you can install the package locally by running the **3.5 Install R packages** notebook.

## Running DESeq

We run DESeq with 5 comparisons (which we call "contrasts"): 
* Media 
    * SCD vs SCE
* Salt 
    * 1 vs 0, where 1 = salt used and 0 = no salt used
* Strain 
    * WT vs cln3 
    * WT vs whi5
    * WT vs whi5cln3
   

In [10]:
# Create a directory to store the DESeq output
DESEQ_DIR="${ANALYSIS_DIR}deseq/"
[[ ! -d $DESEQ_DIR ]] && mkdir -p "$DESEQ_DIR"

Rscript $SRC_DIR/runDESeqTrainingCamp.r $MASTER_DATA/counts.filtered.tab $MASTER_DATA/batches.deseq2.txt $DESEQ_DIR

Loading required package: S4Vectors
Loading required package: methods
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, cbind, colnames, do.call,
    duplicated, eval, evalq, Filter, Find, get, grep, grepl, intersect,
    is.unsorted, lapply, lengths, Map, mapply, match, mget, order,
    paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind,
    Reduce, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit


Attaching package: ‘S4Vectors’

This code will generate 5 pairs of files: 

* Media_SCD_vs_SCE.txt  
* Media_SCD_vs_SCE.txt.sigPeakNames  


* Salt_1_vs_0.txt  
* Salt_1_vs_0.txt.sigPeakNames  


* Strain_WT_vs_cln3.txt  
* Strain_WT_vs_cln3.txt.sigPeakNames


* Strain_WT_vs_whi5.txt  
* Strain_WT_vs_whi5.txt.sigPeakNames  


* Strain_WT_vs_whi5cln3.txt
* Strain_WT_vs_whi5cln3.txt.sigPeakNames
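
Before looking at the results, you may want to confirm that all ten files listed above were actually written. The sketch below loops over the five contrast names and checks each pair; `check_deseq_outputs` is a hypothetical helper name, not something provided by the training camp code.

```bash
# check_deseq_outputs: hypothetical helper, not part of the training camp scripts.
# For each of the five contrasts, report whether the two expected files
# exist and are non-empty in the directory given as the first argument.
check_deseq_outputs() {
    local dir="${1%/}"    # drop a trailing slash so paths print cleanly
    local contrast f
    for contrast in Media_SCD_vs_SCE Salt_1_vs_0 Strain_WT_vs_cln3 \
                    Strain_WT_vs_whi5 Strain_WT_vs_whi5cln3; do
        for f in "$dir/$contrast.txt" "$dir/$contrast.txt.sigPeakNames"; do
            if [ -s "$f" ]; then
                echo "found    $f ($(wc -l < "$f") lines)"
            else
                echo "missing  $f"
            fi
        done
    done
}

# Falls back to the current directory if DESEQ_DIR is not set.
check_deseq_outputs "${DESEQ_DIR:-.}"
```

A `missing` line usually means the `Rscript` command stopped early, so scroll back through its output for an error message.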


The first file in each pair is the raw DESeq output for all peaks. We will not have time to discuss everything in this file, but feel free to read the DESeq manual and see if you can understand it. The second file, whose name ends in "sigPeakNames", contains the peaks that are differentially open in ATAC-seq. The p-value cutoff that we use for differential openness is 0.05. You can examine the contents of these files with the following commands: 

In [14]:
head -n20 $DESEQ_DIR/Media_SCD_vs_SCE.txt
#head -n20 $DESEQ_DIR/Salt_1_vs_0.txt
#head -n20 $DESEQ_DIR/Strain_WT_vs_cln3.txt
#head -n20 $DESEQ_DIR/Strain_WT_vs_whi5.txt
#head -n20 $DESEQ_DIR/Strain_WT_vs_whi5cln3.txt


baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
chrI_0_156422	26084.7015972314	0.181049014703123	1.96988531736345	0.0919084035538903	0.926770814654233	0.999384634705468
chrI_156464_156851	44.9589928934705	0.501374031244778	1.33779989187538	0.374775057383159	0.707827765824978	0.999384634705468
chrI_157271_157456	19.9526780442076	0.645956988548164	1.31637284527805	0.490709749039012	0.623631750219979	0.999384634705468
chrI_157831_157984	9.07390983920557	1.06394540998462	1.22506164250447	0.868483162863159	0.385129885681319	0.971104364520846
chrI_158496_158893	72.6942074836576	-0.00515400304259715	1.38361186260075	-0.00372503530933109	0.997027858711757	0.999384634705468
chrI_159448_159806	42.7643467784961	0.126253276130358	1.38680152762531	0.0910391816098931	0.927461457885821	0.999384634705468
chrI_159903_160189	47.6945001293688	0.316446635561479	1.50658461983367	0.210042390845868	0.833634587815822	0.999384634705468
chrI_166115_166931	289.860262308754	0.96948968596705	1.5402926956197
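
The header line in the output above names six columns, while each data row has seven tab-separated fields: the first field is the peak ID (chromosome, start, end) and has no header of its own. This is what R typically produces when a table is written with row names, although we have not checked how `runDESeqTrainingCamp.r` writes its output. One way to make the file easier to read is to prepend a column name for the peak ID. In this sketch, `add_peak_header` and the column name `peak` are hypothetical names chosen for the example.

```bash
# add_peak_header: hypothetical helper, not part of the training camp scripts.
# Prepends "peak" to the header line when the header has one field fewer than
# the first data row, so column names line up with the data.
# Reads the file given as the first argument and writes to standard output.
add_peak_header() {
    awk -F'\t' '
        NR == 1 { header = $0; nheader = NF; next }
        NR == 2 { if (NF == nheader + 1) header = "peak\t" header; print header }
        { print }
        END { if (NR == 1) print header }   # file with a header line only
    ' "$1"
}

f="${DESEQ_DIR:-.}/Media_SCD_vs_SCE.txt"
if [ -f "$f" ]; then
    add_peak_header "$f" | head -n 5
fi
```

The original file is left unchanged; redirect the output to a new file if you want to keep the relabelled version.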

In [15]:
head -n20 $DESEQ_DIR/Media_SCD_vs_SCE.txt.sigPeakNames
#head -n20 $DESEQ_DIR/Salt_1_vs_0.txt.sigPeakNames
#head -n20 $DESEQ_DIR/Strain_WT_vs_cln3.txt.sigPeakNames
#head -n20 $DESEQ_DIR/Strain_WT_vs_whi5.txt.sigPeakNames
#head -n20 $DESEQ_DIR/Strain_WT_vs_whi5cln3.txt.sigPeakNames

baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
chrIV_1515148_1515306	30.4412111751273	-0.861991587977391	0.237574129652288	-3.62830578076408	0.000285287247621631	0.0128213813094144
chrIV_1516434_1516627	48.115544930109	-0.589717383725913	0.183285492204032	-3.21747988143788	0.0012932209635873	0.0402130137725003
chrIX_421454_422802	427.476706622039	0.332056957095462	0.0702447178446448	4.72714486276177	2.27698824298992e-06	0.000371718330668104
chrIX_426990_427998	512.57370579957	-0.86059967095833	0.233060189095366	-3.69260693685519	0.000221966956247123	0.0111495709561055
chrIX_430190_430883	131.705950740097	0.791634838593522	0.208613903967666	3.79473670516333	0.000147800217714995	0.0108701997299132
chrIX_433561_433834	47.7242189773192	0.805226806200644	0.249011292947996	3.23369593670922	0.00122199480819069	0.0398981304874262
chrIX_435365_435534	28.922827342493	0.927461104254123	0.250504409606109	3.70237436423757	0.000213591137332358	0.0111495709561055
chrV_561740_561981	42.9559928
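
The 0.05 cutoff can also be applied by hand to the full DESeq output, which is a convenient way to experiment with stricter thresholds. The sketch below is not how `runDESeqTrainingCamp.r` selects peaks. It assumes the cutoff is applied to the adjusted p-value (`padj`, the seventh tab-separated field once the peak ID is counted), and `sig_peaks` is a hypothetical helper name.

```bash
# sig_peaks: hypothetical helper, not part of the training camp scripts.
# Usage: sig_peaks FILE [CUTOFF]
# Prints peak ID, log2FoldChange and padj for rows whose padj is below CUTOFF
# (default 0.05). Assumes the seven-field layout shown in the output above.
sig_peaks() {
    awk -F'\t' -v cutoff="${2:-0.05}" '
        NR == 1 { next }                    # skip the header line
        $7 == "NA" || $7 == "" { next }     # skip rows without an adjusted p-value
        $7 + 0 < cutoff + 0 { print $1 "\t" $3 "\t" $7 }
    ' "$1"
}

f="${DESEQ_DIR:-.}/Media_SCD_vs_SCE.txt"
if [ -f "$f" ]; then
    sig_peaks "$f" | wc -l          # number of peaks passing the default cutoff
    sig_peaks "$f" 0.01 | head      # the same filter with a stricter threshold
fi
```

If the count does not match the number of rows in the corresponding `sigPeakNames` file, the script is probably using a different column or additional filters.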