# 3.6 Finding TF motifs # 

### IMPORTANT: Please make sure that you are using the bash kernel to run this notebook. ###


In [None]:
### Set up variables storing the location of our data
### The proper way to load these variables is to source ~/.bashrc, but this is very slow in IPython
export SUNETID="$(whoami)"
export WORK_DIR="/scratch/${SUNETID}"
export DATA_DIR="${WORK_DIR}/data"
[[ ! -d ${WORK_DIR}/data ]] && mkdir -p "${WORK_DIR}/data"
export SRC_DIR="${WORK_DIR}/src"
[[ ! -d ${WORK_DIR}/src ]] && mkdir -p "${WORK_DIR}/src"
export METADATA_DIR="/metadata"
export AGGREGATE_DATA_DIR="/data"
export AGGREGATE_ANALYSIS_DIR="/outputs"
export YEAST_DIR="/saccer3"
export TMP="${WORK_DIR}/tmp"
export TEMP=$TMP
export TMPDIR=$TMP
[[ ! -d ${TMP} ]] && mkdir -p "${TMP}"



In this tutorial, we will focus on identifying motifs in the ATAC-seq peaks: 
![Analysis pipeline](images/part6.png)

In [None]:
cd $WORK_DIR

We will look for TF motifs in the differentially open chromatin regions we have identified. Pick one of the following files to check for motif enrichment:

* Timepoint_0min_vs_45min.differential.positive.txt  
* Timepoint_0min_vs_45min.differential.negative.txt  
* Strain_WT_vs_MSN1.differential.positive.txt
* Strain_WT_vs_MSN1.differential.negative.txt
* Strain_WT_vs_MSN2.differential.positive.txt
* Strain_WT_vs_MSN2.differential.negative.txt
* Strain_WT_vs_MSN4.differential.positive.txt
* Strain_WT_vs_MSN4.differential.negative.txt
* Strain_WT_vs_HOG1.differential.positive.txt
* Strain_WT_vs_HOG1.differential.negative.txt
* Strain_WT_vs_SKN7.differential.positive.txt
* Strain_WT_vs_SKN7.differential.negative.txt
* Strain_WT_vs_SKO1.differential.positive.txt
* Strain_WT_vs_SKO1.differential.negative.txt
* Strain_WT_vs_SMP1.differential.positive.txt
* Strain_WT_vs_SMP1.differential.negative.txt
* Strain_WT_vs_YAP6.differential.positive.txt
* Strain_WT_vs_YAP6.differential.negative.txt
* Strain_WT_vs_YAP7.differential.positive.txt
* Strain_WT_vs_YAP7.differential.negative.txt
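
Before running a motif search, it can help to confirm that the file you picked exists and is not empty. The snippet below is a minimal sketch: `check_peak_file` is a hypothetical helper (not part of HOMER), and it assumes the differential files sit directly in `$WORK_DIR`, which may not match your setup.

```bash
# Hypothetical helper: confirm that a chosen peak file exists and is non-empty.
check_peak_file() {
    local peak_file="$1"
    if [ ! -s "${peak_file}" ]; then
        echo "missing or empty: ${peak_file}" >&2
        return 1
    fi
    # Count lines (one region per line, possibly plus a header line)
    echo "$(awk 'END {print NR}' "${peak_file}") lines in ${peak_file}"
}

# Assumption: the differential files live in $WORK_DIR; adjust PEAK_DIR if not.
PEAK_DIR="${WORK_DIR:-.}"
PEAK_FILE="${PEAK_DIR}/Strain_WT_vs_MSN2.differential.positive.txt"
check_peak_file "${PEAK_FILE}" || echo "Pick a different file or fix PEAK_DIR"
```

Swap the file name in `PEAK_FILE` for any other entry in the list above.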





We will use HOMER (http://homer.ucsd.edu/homer/) to search for enriched motifs. First, we load the HOMER module:

In [None]:
module load homer 

In [None]:
module list
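
If you want a quick check that the module actually put HOMER on your `PATH`, something like the following should work. This is only a sketch; `homer_status` is a hypothetical helper name, and the hint it prints assumes the module is called `homer` as above.

```bash
# Hypothetical helper: report whether findMotifsGenome.pl is reachable on PATH.
homer_status() {
    if command -v findMotifsGenome.pl >/dev/null 2>&1; then
        echo "HOMER found at: $(command -v findMotifsGenome.pl)"
    else
        echo "findMotifsGenome.pl not on PATH; try 'module load homer' again"
    fi
}

homer_status
```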

The specific HOMER command we will use is `findMotifsGenome.pl`. Let's see the inputs and outputs needed by this command:

In [None]:
findMotifsGenome.pl --help

The **pos** file is our list of differential peaks. 

**genome** is the FASTA file containing the yeast genome. 

**output dir** is the output directory where HOMER outputs will be stored. 

We leave all other values at their defaults. 
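
HOMER pulls each region's sequence out of the FASTA by chromosome name, so the names in the **pos** file need to match the FASTA headers. The sketch below lists any chromosome names that appear in a peak file but not in the genome. `missing_chroms` is a hypothetical helper, and the chromosome column (default 1, as in BED-style files) is an assumption, since the layout of the differential files is not shown here. A header row in the peak file will show up as a "missing" name.

```bash
# Hypothetical helper: print chromosome names used in a peak file
# that have no matching sequence header in a FASTA file.
# $1 = peak file, $2 = genome FASTA, $3 = 1-based chromosome column (default 1)
missing_chroms() {
    local peak_file="$1" genome_fasta="$2" chrom_col="${3:-1}"
    awk -F'\t' -v c="${chrom_col}" '
        # First file (FASTA): remember each header name up to the first whitespace
        FNR == NR {
            if (/^>/) { name = substr($0, 2); sub(/[ \t].*/, "", name); seen[name] = 1 }
            next
        }
        # Second file (peaks): report each unknown chromosome name once
        !/^#/ && NF >= c && !($c in seen) && !reported[$c]++ { print $c }
    ' "${genome_fasta}" "${peak_file}"
}

# Example call, guarded so it only runs when both files are present.
# Assumption: the peak file is in $WORK_DIR.
PEAK_FILE="${WORK_DIR:-.}/Strain_WT_vs_MSN2.differential.positive.txt"
GENOME="${YEAST_DIR:-/saccer3}/sacCer3.fa"
if [ -f "${PEAK_FILE}" ] && [ -f "${GENOME}" ]; then
    missing_chroms "${PEAK_FILE}" "${GENOME}"
fi
```

No output means every chromosome name in the peak file was found in the FASTA. If the chromosome name is in a different column, pass that column number as the third argument.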


In [None]:
findMotifsGenome.pl "$WORK_DIR/Timepoint_0min_vs_45min.differential.positive.txt" "$YEAST_DIR/sacCer3.fa" homer_output_positive
findMotifsGenome.pl "$WORK_DIR/Timepoint_0min_vs_45min.differential.negative.txt" "$YEAST_DIR/sacCer3.fa" homer_output_negative



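We can examine the contents of the `homer_output_positive` and `homer_output_negative` folders in the browser. In each folder, `homerResults.html` reports the de novo motifs and `knownResults.html` reports enrichment of known motifs.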
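
For a quick look without leaving the notebook, HOMER also writes a tab-delimited `knownResults.txt` in each output folder. The sketch below prints the top few known motifs; `top_known_motifs` is a hypothetical helper, and the choice of columns 1, 3, and 5 (motif name, p-value, and Benjamini q-value) assumes the usual `knownResults.txt` column order, so check the header line it prints.

```bash
# Hypothetical helper: show the header and top N rows of knownResults.txt,
# keeping columns 1, 3, and 5 (assumed: motif name, p-value, q-value).
# $1 = HOMER output directory, $2 = number of motifs to show (default 10)
top_known_motifs() {
    local results_dir="$1" n="${2:-10}"
    local known="${results_dir}/knownResults.txt"
    if [ ! -f "${known}" ]; then
        echo "no knownResults.txt in ${results_dir}" >&2
        return 1
    fi
    # n + 1 lines: one header line plus the top n motifs
    head -n "$((n + 1))" "${known}" | cut -f 1,3,5
}

# Only report on output folders that already contain results
for d in homer_output_positive homer_output_negative; do
    if [ -f "${d}/knownResults.txt" ]; then
        echo "== ${d} =="
        top_known_motifs "${d}" 5
    fi
done
```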