# 3.5 GO Term Enrichment for Differentially Accessible Chromatin Regions #

### IMPORTANT: Please make sure that you are using the bash kernel to run this notebook. ###


In [None]:
### Set up variables storing the location of our data
### The proper way to load your variables is to source ~/.bashrc, but this is very slow in IPython
export SUNETID="$(whoami)"
export WORK_DIR="/scratch/${SUNETID}"
export DATA_DIR="${WORK_DIR}/data"
[[ ! -d ${DATA_DIR} ]] && mkdir -p "${DATA_DIR}"
export SRC_DIR="${WORK_DIR}/src"
[[ ! -d ${WORK_DIR}/src ]] && mkdir -p "${WORK_DIR}/src"
export METADATA_DIR="/metadata"
export AGGREGATE_DATA_DIR="/data"
export AGGREGATE_ANALYSIS_DIR="/outputs"
export YEAST_DIR="/saccer3"
export TMP="${WORK_DIR}/tmp"
export TEMP=$TMP
export TMPDIR=$TMP
[[ ! -d ${TMP} ]] && mkdir -p "${TMP}"



In this tutorial, we will focus on GO term enrichment analysis: 
![Analysis pipeline](images/part5.png)

In the previous tutorial, we identified differential peaks between pairs of strains and media. These were stored in $WORK_DIR as the following files:

* Timepoint_0h_vs_4h.txt  
* Timepoint_0h_vs_4h.differential.positive.txt  
* Timepoint_0h_vs_4h.differential.negative.txt  


* Strain_WT_vs_MSN1.txt  
* Strain_WT_vs_MSN1.differential.positive.txt
* Strain_WT_vs_MSN1.differential.negative.txt


* Strain_WT_vs_MSN2.txt  
* Strain_WT_vs_MSN2.differential.positive.txt
* Strain_WT_vs_MSN2.differential.negative.txt


* Strain_WT_vs_MSN4.txt  
* Strain_WT_vs_MSN4.differential.positive.txt
* Strain_WT_vs_MSN4.differential.negative.txt


* Strain_WT_vs_HOG1.txt  
* Strain_WT_vs_HOG1.differential.positive.txt
* Strain_WT_vs_HOG1.differential.negative.txt


* Strain_WT_vs_SKN7.txt  
* Strain_WT_vs_SKN7.differential.positive.txt
* Strain_WT_vs_SKN7.differential.negative.txt


* Strain_WT_vs_SKO1.txt  
* Strain_WT_vs_SKO1.differential.positive.txt
* Strain_WT_vs_SKO1.differential.negative.txt


* Strain_WT_vs_SMP1.txt  
* Strain_WT_vs_SMP1.differential.positive.txt
* Strain_WT_vs_SMP1.differential.negative.txt


* Strain_WT_vs_YAP6.txt  
* Strain_WT_vs_YAP6.differential.positive.txt
* Strain_WT_vs_YAP6.differential.negative.txt


* Strain_WT_vs_YAP7.txt  
* Strain_WT_vs_YAP7.differential.positive.txt
* Strain_WT_vs_YAP7.differential.negative.txt

We will perform GO term enrichment for the Timepoint comparison. As an exercise, you can repeat the analysis for the Strain comparisons.

In [None]:
cd $WORK_DIR
head  Timepoint_0h_vs_4h.differential.positive.txt  

We will map the differentially accessible peaks to their nearest genes, as we did in tutorial 3.2, and search for GO term enrichment. The genes nearest to differential peaks form the foreground set. The genes nearest to any peak in the full peak set form the background set.



In [None]:
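#foreground mapping: keep the peak coordinates (columns 1-3) and find the nearest TSS; -D a appends the signed distance to that TSS as the last column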
cut -f1,2,3  Timepoint_0h_vs_4h.differential.positive.txt   | bedtools closest -D a -a stdin -b $YEAST_DIR/yeast_tss_coords.bed >  Timepoint_0h_vs_4h.differential.positive.togene.txt  
cut -f1,2,3  Timepoint_0h_vs_4h.differential.negative.txt   | bedtools closest -D a -a stdin -b $YEAST_DIR/yeast_tss_coords.bed >  Timepoint_0h_vs_4h.differential.negative.togene.txt  

#background mapping (exclude the header line)
tail -n +2 Timepoint_0h_vs_4h.txt| cut -f1,2,3 | bedtools closest -D a -a stdin -b $YEAST_DIR/yeast_tss_coords.bed > Timepoint_0h_vs_4h.togene.txt


In [None]:
head Timepoint_0h_vs_4h.differential.positive.togene.txt

In [None]:
head Timepoint_0h_vs_4h.togene.txt
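
The last column of the mapped files holds the signed distance from each peak to its nearest TSS, so it can be worth checking how many peaks sit far from any gene before trusting the gene assignment. The following is a minimal sketch under stated assumptions: the helper name `distance_summary`, the 1000 bp default cutoff, and the toy rows are all hypothetical, and the input is assumed to have three peak columns followed by the TSS columns, as produced above.

```bash
# Hypothetical helper: count rows whose absolute distance (last column)
# to the nearest TSS is within or beyond a cutoff given in base pairs.
# $1 = bedtools closest output, $2 = cutoff (assumed default: 1000).
distance_summary() {
    awk -F '\t' -v cutoff="${2:-1000}" '
        $4 == "." { unmapped++; next }      # placeholder row: no TSS reported
        { d = ($NF < 0) ? -$NF : $NF }      # absolute distance to the nearest TSS
        d <= cutoff { near++; next }
        { far++ }
        END { printf "near=%d far=%d unmapped=%d\n", near, far, unmapped }
    ' "$1"
}

# Toy demonstration on made-up rows (not real data).
dist_dir="$(mktemp -d)"
printf 'chrI\t100\t200\tchrI\t150\t151\tYAL001C\t0\n'     >  "${dist_dir}/toy.togene.txt"
printf 'chrI\t300\t400\tchrI\t150\t151\tYAL001C\t-150\n'  >> "${dist_dir}/toy.togene.txt"
printf 'chrII\t10\t20\tchrII\t500\t501\tYBL002W\t480\n'   >> "${dist_dir}/toy.togene.txt"
printf 'chrM\t10\t20\t.\t-1\t-1\t.\t-1\n'                 >> "${dist_dir}/toy.togene.txt"
distance_summary "${dist_dir}/toy.togene.txt" 200   # prints: near=2 far=1 unmapped=1
```

On the real data the call would look like `distance_summary Timepoint_0h_vs_4h.togene.txt`, with the cutoff chosen to suit the question being asked.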

In [None]:
#As before, we want a list of genes to use in GO Term enrichment, so we extract column 7, which contains the gene names
cut -f7 Timepoint_0h_vs_4h.differential.positive.togene.txt > Timepoint.foreground.positive.txt
cut -f7 Timepoint_0h_vs_4h.differential.negative.togene.txt > Timepoint.foreground.negative.txt

cut -f7 Timepoint_0h_vs_4h.togene.txt > Timepoint.background.txt 
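
Because several peaks can share the same nearest gene, the lists above may contain repeated gene names, and `bedtools closest` can write `.` in the name column when it finds no feature on the same chromosome. The following is a minimal sketch of a clean-up step, assuming the GO tool expects one unique gene name per line; the helper name `unique_genes` and the toy input are hypothetical.

```bash
# Hypothetical helper: print the unique, non-placeholder gene names
# found in one column (assumed default: 7) of a bedtools closest output file.
unique_genes() {
    cut -f"${2:-7}" "$1" | grep -v '^\.$' | sort -u
}

# Toy demonstration on made-up rows (not real data).
demo_dir="$(mktemp -d)"
printf 'chrI\t100\t200\tchrI\t150\t151\tYAL001C\t0\n'     >  "${demo_dir}/toy.togene.txt"
printf 'chrI\t300\t400\tchrI\t150\t151\tYAL001C\t-150\n'  >> "${demo_dir}/toy.togene.txt"
printf 'chrII\t10\t20\tchrII\t500\t501\tYBL002W\t480\n'   >> "${demo_dir}/toy.togene.txt"
printf 'chrM\t10\t20\t.\t-1\t-1\t.\t-1\n'                 >> "${demo_dir}/toy.togene.txt"
unique_genes "${demo_dir}/toy.togene.txt" > "${demo_dir}/toy.genes.txt"
cat "${demo_dir}/toy.genes.txt"   # prints YAL001C and YBL002W, one per line
```

Applied to the real files, this would be `unique_genes Timepoint_0h_vs_4h.differential.positive.togene.txt > Timepoint.foreground.positive.txt`, and likewise for the negative and background lists.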

In [None]:
head Timepoint.background.txt

In [None]:
wc -l Timepoint.foreground.positive.txt
wc -l Timepoint.foreground.negative.txt

In [None]:
wc -l Timepoint.background.txt
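
The `wc -l` counts above count one line per peak, not per unique gene. Since the differential peaks are a subset of all peaks, every foreground gene should also appear in the background list. The following is a minimal sketch of that sanity check; the helper name `missing_from_background` and the toy gene lists are hypothetical.

```bash
# Hypothetical helper: list genes present in a foreground list ($1)
# but absent from the background list ($2). Ideally it prints nothing.
missing_from_background() {
    fg_sorted="$(mktemp)"
    bg_sorted="$(mktemp)"
    sort -u "$1" > "${fg_sorted}"
    sort -u "$2" > "${bg_sorted}"
    # comm -23 keeps lines found only in the first file (needs sorted input).
    comm -23 "${fg_sorted}" "${bg_sorted}"
    rm -f "${fg_sorted}" "${bg_sorted}"
}

# Toy demonstration with made-up gene lists.
check_dir="$(mktemp -d)"
printf 'YAL001C\nYBL002W\nYAL001C\n' > "${check_dir}/fg.txt"
printf 'YBL002W\nYCL003W\n'          > "${check_dir}/bg.txt"
missing_from_background "${check_dir}/fg.txt" "${check_dir}/bg.txt"   # prints YAL001C
sort -u "${check_dir}/fg.txt" | wc -l                                  # counts 2 unique genes
```

On the real data, `missing_from_background Timepoint.foreground.positive.txt Timepoint.background.txt` should print nothing if the foreground and background were built from the same peak set.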

Now use the Saccharomyces Genome Database (SGD) GO Term Finder tool (http://www.yeastgenome.org/cgi-bin/GO/goTermFinder.pl) to check for GO term enrichment. Upload a differential gene list (positive or negative) as the foreground set and the full gene list as the background set.
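
For the Strain exercise, the mapping and extraction steps can be wrapped in a function and run once per comparison. The following is a sketch only, not part of the original tutorial: the function name `make_gene_lists` and the output file names are hypothetical, the added `sort -u` step is an assumption that unique gene names are wanted, and the input files are assumed to be coordinate-sorted and laid out like the Timepoint files (header line only in the full peak table).

```bash
# Hypothetical wrapper around the mapping and extraction steps used above.
# $1 is a comparison prefix such as Strain_WT_vs_MSN2; the output file
# names are illustrative choices, not names used elsewhere in the course.
make_gene_lists() {
    prefix="$1"
    tss="${YEAST_DIR}/yeast_tss_coords.bed"
    for direction in positive negative; do
        cut -f1,2,3 "${prefix}.differential.${direction}.txt" \
            | bedtools closest -D a -a stdin -b "${tss}" \
            | cut -f7 | sort -u > "${prefix}.foreground.${direction}.txt"
    done
    # The full peak table has a header line, so skip it for the background.
    tail -n +2 "${prefix}.txt" | cut -f1,2,3 \
        | bedtools closest -D a -a stdin -b "${tss}" \
        | cut -f7 | sort -u > "${prefix}.background.txt"
}

# Example call, run from $WORK_DIR (requires bedtools and the files
# produced in the previous tutorial):
# for tf in MSN1 MSN2 MSN4 HOG1 SKN7 SKO1 SMP1 YAP6 YAP7; do
#     make_gene_lists "Strain_WT_vs_${tf}"
# done
```

Each comparison then yields two foreground lists and one background list that can be uploaded to the GO Term Finder in the same way as the Timepoint lists.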