# 3.5 GO Term Enrichment for Differentially Accessible Chromatin Regions. #

### IMPORTANT: Please make sure that you are using the bash kernel to run this notebook. ###


In [None]:
### Set up variables storing the location of our data
### These variables are normally loaded by sourcing ~/.bashrc, but that is very slow inside a Jupyter notebook, so we set them directly here
export SUNETID="$(whoami)"
export WORK_DIR="/srv/scratch/training_camp/work/${SUNETID}"
export DATA_DIR="${WORK_DIR}/data"
export SRC_DIR="${WORK_DIR}/src"
export METADATA_DIR="/srv/scratch/training_camp/metadata"
export AGGREGATE_DATA_DIR="/srv/scratch/training_camp/data"
export AGGREGATE_ANALYSIS_DIR="/srv/scratch/training_camp/aggregate_analysis"
export YEAST_DIR="/srv/scratch/training_camp/saccer3"
export TMP="${WORK_DIR}/tmp"
export TEMP="$TMP"
export TMPDIR="$TMP"
# Create the working subdirectories; mkdir -p also creates missing parents and does nothing if they already exist
mkdir -p "${DATA_DIR}" "${SRC_DIR}" "${TMP}"



In this tutorial, we focus on Gene Ontology (GO) term enrichment analysis:
![Analysis pipeline](images/part5.png)

In the previous tutorial, we identified differential peaks between pairs of strains and between media conditions. The results were saved in $WORK_DIR as the following files:

* Media_YPD_vs_YPGE.txt and Media_YPD_vs_YPGE.differential.txt
* Strain_WT_vs_asf1.txt and Strain_WT_vs_asf1.differential.txt
* Strain_WT_vs_rtt109.txt and Strain_WT_vs_rtt109.differential.txt
* Strain_asf1_vs_rtt109.txt and Strain_asf1_vs_rtt109.differential.txt
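Before starting, it can help to confirm that these files exist and to see how many peaks each comparison produced. The sketch below uses a hypothetical helper, `summarize_comparisons` (our name, not part of the course code). It assumes, as the `tail -n +2` in the background mapping step suggests, that only the full `.txt` file starts with a header line.

```shell
# Hypothetical helper: for each comparison prefix, print tested and differential peak counts.
# Assumption: <prefix>.txt has one header line, <prefix>.differential.txt has none.
summarize_comparisons() {
  local dir="$1"
  shift
  local prefix all_file diff_file n_all n_diff
  for prefix in "$@"; do
    all_file="${dir}/${prefix}.txt"
    diff_file="${dir}/${prefix}.differential.txt"
    if [ ! -f "$all_file" ] || [ ! -f "$diff_file" ]; then
      printf '%s\tmissing\n' "$prefix"
      continue
    fi
    # $(( ... )) strips the padding some wc implementations add
    n_all=$(( $(tail -n +2 "$all_file" | wc -l) ))
    n_diff=$(( $(wc -l < "$diff_file") ))
    printf '%s\t%s\t%s\n' "$prefix" "$n_all" "$n_diff"
  done
}
```

For example, `summarize_comparisons "$WORK_DIR" Media_YPD_vs_YPGE Strain_WT_vs_asf1 Strain_WT_vs_rtt109 Strain_asf1_vs_rtt109` prints one tab-separated line per comparison: its name, the number of tested peaks, and the number of differential peaks.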



We'll perform GO term enrichment for the Media comparison (YPD vs. YPGE). As an exercise, you can repeat the analysis for the strain comparisons.
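For the strain exercise, the mapping and gene-list steps used for Media in the cells that follow can be wrapped in a single function. This is a minimal sketch, not the course's own code: `make_gene_lists` and its label argument are hypothetical names, it requires bedtools on your PATH and the $YEAST_DIR variable from the setup cell, and, like the Media commands, it assumes the peak and TSS files are already position-sorted, as bedtools closest expects.

```shell
# Hypothetical helper: build foreground/background gene lists for one comparison.
# $1 = comparison prefix (e.g. Strain_WT_vs_asf1), $2 = short label for the output files.
# Run from $WORK_DIR; requires bedtools and YEAST_DIR as set in the setup cell.
make_gene_lists() {
  local prefix="$1"
  local label="$2"
  local tss="${YEAST_DIR}/yeast_tss_coords.bed"
  # Foreground: genes nearest to the differential peaks (column 7 holds the gene name)
  cut -f1,2,3 "${prefix}.differential.txt" \
    | bedtools closest -D a -a stdin -b "$tss" \
    | cut -f7 | sort -u > "${label}.foreground.txt"
  # Background: genes nearest to all tested peaks (tail skips the header line)
  tail -n +2 "${prefix}.txt" | cut -f1,2,3 \
    | bedtools closest -D a -a stdin -b "$tss" \
    | cut -f7 | sort -u > "${label}.background.txt"
}
```

For example, `make_gene_lists Strain_WT_vs_asf1 WT_vs_asf1` would write WT_vs_asf1.foreground.txt and WT_vs_asf1.background.txt, ready to upload to SGD in the same way as the Media lists.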

In [None]:
cd $WORK_DIR
head Media_YPD_vs_YPGE.txt

In [None]:
head Media_YPD_vs_YPGE.differential.txt

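We will map the differentially accessible peaks to their nearest genes, as we did in tutorial 3.2, and test for GO term enrichment. Genes nearest to the differential peaks form the foreground set; genes nearest to all tested peaks (everything in Media_YPD_vs_YPGE.txt) form the background set. Using this background rather than the whole genome accounts for the fact that only genes near accessible chromatin could have been picked up in the first place.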
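To make "nearest gene" concrete, here is a toy awk version of the idea on made-up coordinates. It is only an illustration, not a replacement for bedtools closest: it measures the distance from the peak midpoint to the TSS start, whereas bedtools closest measures the gap between interval edges, and it keeps only the first gene on a tie, whereas bedtools reports all tied features by default. The function name `nearest_tss` and the four-column TSS layout (chrom, start, end, gene) are assumptions.

```shell
# Toy illustration (hypothetical helper): for each peak, report the TSS on the same
# chromosome whose start is closest to the peak midpoint, plus that distance.
# $1 = peaks file (chrom, start, end); $2 = TSS file (chrom, start, end, gene).
nearest_tss() {
  awk 'BEGIN { OFS = "\t" }
       FILENAME == ARGV[1] { n++; tchr[n] = $1; tpos[n] = $2; tgene[n] = $4; next }
       {
         mid = int(($2 + $3) / 2); best = "."; bestd = -1
         for (i = 1; i <= n; i++) {
           if (tchr[i] != $1) continue          # only compare within a chromosome
           d = tpos[i] - mid; if (d < 0) d = -d
           if (bestd < 0 || d < bestd) { bestd = d; best = tgene[i] }
         }
         print $1, $2, $3, best, bestd
       }' "$2" "$1"
}
```

With TSSs for GENE_A at chrI:100 and GENE_B at chrI:500, a peak at chrI:380-420 (midpoint 400) is assigned to GENE_B, 100 bp away, even though both genes are on the same chromosome.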



In [None]:
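# Foreground mapping: genes nearest to the differential peaks
# cut keeps only chrom/start/end so that, as in the background mapping, the gene name ends up in column 7
cut -f1,2,3 Media_YPD_vs_YPGE.differential.txt | bedtools closest -D a -a stdin -b "$YEAST_DIR/yeast_tss_coords.bed" > Media_YPD_vs_YPGE.differential.togene.txt
# Background mapping: genes nearest to all tested peaks (tail -n +2 skips the header line)
tail -n +2 Media_YPD_vs_YPGE.txt | cut -f1,2,3 | bedtools closest -D a -a stdin -b "$YEAST_DIR/yeast_tss_coords.bed" > Media_YPD_vs_YPGE.togene.txt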


In [None]:
head Media_YPD_vs_YPGE.differential.togene.txt

In [None]:
head Media_YPD_vs_YPGE.togene.txt

In [None]:
# As before, we want gene lists for GO term enrichment, so we extract column 7, which contains the gene names
# sort -u keeps each gene once, since several peaks can share the same nearest gene
cut -f7 Media_YPD_vs_YPGE.differential.togene.txt | sort -u > Media.foreground.txt
cut -f7 Media_YPD_vs_YPGE.togene.txt | sort -u > Media.background.txt
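Why column 7? With a three-column peak file as -a, bedtools closest writes the peak's three columns first, then the matching TSS record, then the distance added by -D a. Assuming the TSS file's fourth column holds the gene name, that name lands in column 3 + 4 = 7. The two lines below are made-up closest output (the distances are illustrative only), used to show the column arithmetic and the effect of sort -u when two peaks share a nearest gene.

```shell
# Two made-up `bedtools closest -D a` output lines:
# peak in columns 1-3, TSS record in columns 4-7, distance in column 8.
# Both peaks have the same nearest gene, so sort -u collapses them to one entry.
genes=$(printf 'chrI\t380\t420\tchrI\t500\t501\tGENE_B\t80\nchrI\t600\t650\tchrI\t500\t501\tGENE_B\t-100\n' \
  | cut -f7 | sort -u)
echo "$genes"   # GENE_B
```

If your TSS file is a six-column BED instead, the gene name is still column 7; only the distance moves to column 10.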

In [None]:
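# Add symbolic links to the gene lists in the folder where this notebook is stored, so you can reach them from Jupyter
# -f replaces existing links, so this cell can be re-run safely
ln -sf "$WORK_DIR"/*foreground* ~/training_camp/workflow_notebooks/
ln -sf "$WORK_DIR"/*background* ~/training_camp/workflow_notebooks/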


In [None]:
wc -l Media.foreground.txt

In [None]:
wc -l Media.background.txt
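Because the differential peaks are a subset of all tested peaks, every foreground gene should also appear in the background list; a gene that does not usually points to a mismatch between the two input files. A small check, sketched with a hypothetical helper name:

```shell
# Hypothetical helper: print foreground genes (file $1) that are absent from the
# background list (file $2). Empty output means the foreground is a subset, as expected.
genes_missing_from_background() {
  # First pass (ARGV[1] = background) records genes; second pass prints unseen foreground genes
  awk 'FILENAME == ARGV[1] { bg[$0] = 1; next } !($0 in bg)' "$2" "$1" | sort -u
}
```

For the Media comparison you would run `genes_missing_from_background Media.foreground.txt Media.background.txt` and expect no output.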

Now use the Saccharomyces Genome Database (SGD) GO Term Finder (http://www.yeastgenome.org/cgi-bin/GO/goTermFinder.pl) to check for GO term enrichment. Upload Media.foreground.txt (genes near differential peaks) as the foreground set and Media.background.txt (genes near all tested peaks) as the background set.
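As we understand it, GO Term Finder scores each term with a p-value based on the hypergeometric distribution and then corrects for testing many terms. The sketch below computes only the uncorrected one-sided p-value for a single term, P(X >= k) = sum over i from k to min(n, K) of C(K, i) * C(N - K, n - i) / C(N, n). `hypergeom_pvalue` is a hypothetical helper and the example counts are made up, so treat it as a way to build intuition, not as a substitute for the tool's output.

```shell
# Hypothetical helper: one-sided hypergeometric p-value P(X >= k) for one GO term.
#   N = background genes,  K = background genes annotated to the term,
#   n = foreground genes,  k = foreground genes annotated to the term.
# Uses log-factorials so large gene counts do not overflow.
hypergeom_pvalue() {
  awk -v N="$1" -v K="$2" -v n="$3" -v k="$4" '
    function lfact(x,   i, s) { s = 0; for (i = 2; i <= x; i++) s += log(i); return s }
    function lchoose(a, b) { return lfact(a) - lfact(b) - lfact(a - b) }
    BEGIN {
      N += 0; K += 0; n += 0; k += 0      # force numeric values
      hi = (n < K) ? n : K
      p = 0
      for (i = k; i <= hi; i++) {
        if (n - i > N - K) continue       # not enough unannotated genes for this draw
        p += exp(lchoose(K, i) + lchoose(N - K, n - i) - lchoose(N, n))
      }
      printf "%.4g\n", p
    }'
}
```

For example, `hypergeom_pvalue 10 4 3 2` prints 0.3333: with 10 background genes, 4 of them annotated to the term, and 3 foreground genes, the chance of drawing at least 2 annotated genes is (6*6 + 4*1)/120 = 1/3.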