# 3.5 GO Term Enrichment for Differentially Accessible Chromatin Regions #

### IMPORTANT: Please make sure that you are using the bash kernel to run this notebook. ###


In [None]:
### Set up variables storing the location of our data
### The proper way to load these variables is to source your ~/.bashrc file, but this is very slow in IPython
export SUNETID="$(whoami)"
export WORK_DIR="/scratch/${SUNETID}"
export DATA_DIR="${WORK_DIR}/data"
[[ ! -d ${WORK_DIR}/data ]] && mkdir -p "${WORK_DIR}/data"
export SRC_DIR="${WORK_DIR}/src"
[[ ! -d ${WORK_DIR}/src ]] && mkdir -p "${WORK_DIR}/src"
export METADATA_DIR="/metadata"
export AGGREGATE_DATA_DIR="/data"
export AGGREGATE_ANALYSIS_DIR="/outputs"
export YEAST_DIR="/saccer3"
export TMP="${WORK_DIR}/tmp"
export TEMP=$TMP
export TMPDIR=$TMP
[[ ! -d ${TMP} ]] && mkdir -p "${TMP}"



In this tutorial, we will focus on GO term enrichment analysis: 
![Analysis pipeline](images/part5.png)

In the previous tutorial, we identified differential peaks between pairs of strains and media. The results were stored in `$WORK_DIR` as the following files:

|N Diff Peaks        |Comparison Filename                          |
|--------------------|---------------------------------------------|
| 33                 | WT_0min_vs_45min.negative.txt               |
| 13                 | HOG1_0min_vs_45min.negative.txt             |
| 11                 | MSN1_0min_vs_45min.negative.txt             |
| 11                 | MSN4_0min_vs_45min.negative.txt             |
| 10                 | MSN1_0min_vs_45min.positive.txt              |
| 10                 | MSN2_0min_vs_45min.negative.txt             |
| 5                  | MSN2_0min_vs_45min.positive.txt              |
| 5                  | MSN4_0min_vs_45min.positive.txt              |
| 4                  | HOG1_0min_vs_45min.positive.txt              |
| 4                  | Strain_WT_vs_SKN7.differential.positive.txt |
| 3                  | Strain_WT_vs_MSN2.differential.positive.txt |
| 3                  | WT_0min_vs_45min.positive.txt                |
| 2                  | HOT1_0min_vs_45min.positive.txt              |
| 2                  | HOT1_0min_vs_45min.negative.txt             |
| 1                  | Strain_WT_vs_HOT1.differential.positive.txt |
| 1                  | Strain_WT_vs_MSN1.differential.positive.txt |
| 1                  | Strain_WT_vs_SKN7.differential.negative.txt |
| 1                  | Strain_WT_vs_YAP6.differential.negative.txt |
| 1                  | Strain_WT_vs_YAP6.differential.positive.txt |
| 1                  | YAP1_0min_vs_45min.negative.txt             |
| 0                  | SKN7_0min_vs_45min.positive.txt              |
| 0                  | SKN7_0min_vs_45min.negative.txt             |
| 0                  | Strain_WT_vs_HOG1.differential.negative.txt |
| 0                  | Strain_WT_vs_HOG1.differential.positive.txt |
| 0                  | Strain_WT_vs_HOT1.differential.negative.txt |
| 0                  | Strain_WT_vs_MSN1.differential.negative.txt |
| 0                  | Strain_WT_vs_MSN2.differential.negative.txt |
| 0                  | Strain_WT_vs_MSN4.differential.negative.txt |
| 0                  | Strain_WT_vs_MSN4.differential.positive.txt |
| 0                  | Strain_WT_vs_YAP1.differential.negative.txt |
| 0                  | Strain_WT_vs_YAP1.differential.positive.txt |
| 0                  | Strain_WT_vs_YAP7.differential.negative.txt |
| 0                  | Strain_WT_vs_YAP7.differential.positive.txt |
| 0                  | YAP1_0min_vs_45min.positive.txt              |
| 0                  | YAP6_0min_vs_45min.positive.txt              |
| 0                  | YAP6_0min_vs_45min.negative.txt             |
| 0                  | YAP7_0min_vs_45min.positive.txt              |
| 0                  | YAP7_0min_vs_45min.negative.txt             |
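
The counts in the table above can be regenerated with a short loop. The sketch below is one way to do it, assuming each comparison file holds one peak per line with no header line. `count_peaks` is a hypothetical helper name, not part of the tutorial.

```shell
# Hypothetical helper: print "<count><TAB><filename>" for every comparison
# file in a directory, sorted by descending peak count.
# Assumption: one peak per line, no header line.
# Note: any other file ending in .positive.txt or .negative.txt in the
# directory is counted as well.
count_peaks() (
    dir="$1"
    for f in "$dir"/*.positive.txt "$dir"/*.negative.txt; do
        [ -f "$f" ] || continue   # skip glob patterns that matched nothing
        printf '%s\t%s\n' "$(wc -l < "$f" | tr -d ' ')" "$(basename "$f")"
    done | sort -k1,1nr -k2,2
)

# Example: count_peaks "$WORK_DIR"
```

The function body runs in a subshell (the parentheses), so its loop variables do not leak into the notebook session.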


Each output file lists the chromosomal coordinates of ATAC-seq peaks that are differentially accessible in that comparison. The p-value cutoff we use for differential accessibility is 0.01.

We will map the differential peaks to their nearest genes. Most comparisons yield few differential peaks, so we will attempt GO term enrichment only on the comparison with the most differential peaks: `WT_0min_vs_45min`.

In [None]:
cd $WORK_DIR
cat  WT_0min_vs_45min.negative.txt 


In [None]:
cat WT_0min_vs_45min.positive.txt 

We will map the differentially accessible peaks to their nearest genes, as we did in tutorial 3.3, and search for GO term enrichment. The genes closest to differential peaks will be the foreground set. The full set of genes closest to any peak will be the background set.
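
To make the foreground/background idea concrete, a common way to score enrichment for a single GO term is the hypergeometric tail probability. This is a general sketch, and whether the enrichment tool used later in this tutorial computes exactly this quantity is an assumption. The symbols are introduced here for illustration: $N$ is the number of background genes, $B$ the number of background genes annotated with the term, $n$ the number of foreground genes, and $b$ the number of foreground genes annotated with the term.

$$
P(X \ge b) \;=\; \sum_{i=b}^{\min(n,\,B)} \frac{\binom{B}{i}\,\binom{N-B}{\,n-i\,}}{\binom{N}{n}}
$$

A small value means the term appears among the foreground genes more often than expected if the foreground were a random draw of $n$ genes from the background.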



In [None]:
#foreground mapping
bedtools closest -D a -a WT_0min_vs_45min.positive.txt -b $YEAST_DIR/yeast_tss_coords.bed >  WT_0min_vs_45min.differential.positive.togene.txt  
bedtools closest -D a -a WT_0min_vs_45min.negative.txt -b $YEAST_DIR/yeast_tss_coords.bed >  WT_0min_vs_45min.differential.negative.togene.txt  



In [None]:
#background mapping (exclude the header line)
sed '1d' all_merged.peaks.bed | cut -f1,2,3 | bedtools closest -D a -a stdin -b $YEAST_DIR/yeast_tss_coords.bed > all_peaks.togene.txt


In [None]:
cat  WT_0min_vs_45min.differential.positive.togene.txt 

In [None]:
cat  WT_0min_vs_45min.differential.negative.togene.txt 

In [None]:
#As before, we want a list of genes to use in GO Term enrichment, so we extract column 7, which contains the gene names
cut -f7  WT_0min_vs_45min.differential.positive.togene.txt  > foreground.positive.txt
cut -f7  WT_0min_vs_45min.differential.negative.togene.txt  > foreground.negative.txt

cut -f7 all_peaks.togene.txt > background.txt 

In [None]:
head background.txt

In [None]:
wc -l foreground.positive.txt
wc -l foreground.negative.txt

In [None]:
wc -l background.txt

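We use GOrilla (http://cbl-gorilla.cs.technion.ac.il/) to check for GO term enrichment. Choose the mode that takes two unranked lists of genes, then upload a differential gene list (`foreground.positive.txt` or `foreground.negative.txt`) as the target (foreground) set and the full gene list (`background.txt`) as the background set.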
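
Before uploading, it may help to tidy the gene lists, since several peaks can map to the same gene. The sketch below is one possible way to do that. It assumes that `bedtools closest` writes `.` in the gene column when it finds no feature on the same chromosome, and that every foreground gene is expected to appear in the background. `clean_genes` and `missing_from_background` are hypothetical helper names.

```shell
# Hypothetical helper: sorted, de-duplicated gene list with "." placeholders
# and blank lines removed. Reads a one-gene-per-line file, writes to stdout.
clean_genes() {
    awk 'NF && $0 != "."' "$1" | LC_ALL=C sort -u
}

# Hypothetical helper: print foreground genes that are absent from the
# background. Both inputs must already be sorted (e.g. by clean_genes).
missing_from_background() {
    LC_ALL=C comm -23 "$1" "$2"
}

# Example usage with the files produced above:
# clean_genes foreground.positive.txt > foreground.positive.unique.txt
# clean_genes background.txt > background.unique.txt
# missing_from_background foreground.positive.unique.txt background.unique.txt
```

An empty result from `missing_from_background` indicates that the foreground is a subset of the background. Both helpers fix the locale to `C` so that `sort` and `comm` agree on ordering.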