### IMPORTANT: Please make sure that you are using the bash kernel to run this notebook.###


In [None]:
### Set up variables storing the location of our data
### The proper way to load your variables is to source ~/.bashrc, but this is very slow in IPython
export SUNETID="$(whoami)"
export WORK_DIR="/scratch/${SUNETID}"
export DATA_DIR="${WORK_DIR}/data"
mkdir -p "${DATA_DIR}"  # -p also creates ${WORK_DIR} if needed and is a no-op when the directory exists
export SRC_DIR="${WORK_DIR}/src"
[[ ! -d ${WORK_DIR}/src ]] && mkdir -p "${WORK_DIR}/src"
export METADATA_DIR="/metadata"
export AGGREGATE_DATA_DIR="/data"
export AGGREGATE_ANALYSIS_DIR="/outputs"
export YEAST_DIR="/saccer3"
export TMP="${WORK_DIR}/tmp"
export TEMP=$TMP
export TMPDIR=$TMP
[[ ! -d ${TMP} ]] && mkdir -p "${TMP}"



In [None]:
mkdir -p "$WORK_DIR/Tobias"  # -p avoids an error when the directory already exists
cd "$WORK_DIR/Tobias"

We will be using the [TOBIAS](https://github.com/loosolab/TOBIAS) tool created by Mette Bentsen to correct for Tn5 bias and generate footprints. In the DESeq2 analysis, the largest number of differential peaks was observed in the SKN7 strain between 0 minutes and 45 minutes. We will therefore perform differential footprinting on the experiments "**0min_SKN7**" and "**45min_SKN7**" for the REB1 and XCPE1 transcription factors. 



First, let's merge the filtered BAM files for the two replicates of each experiment. We can do this with the samtools merge command. 

We also concatenate the IDR optimal peak sets of the two conditions into one combined peak file. 

In [None]:
#the loop below appends to the merged peak file, so remove any copy left over from a previous run
rm -f 0min_45min_SKN7.optimal_peak.narrowPeak.bed

for experiment in 45min_SKN7 0min_SKN7
do
    rep1_bam="$AGGREGATE_ANALYSIS_DIR/croo_pilot/$experiment/align/rep1/${experiment}_1_R1.merged.nodup.bam"
    rep2_bam="$AGGREGATE_ANALYSIS_DIR/croo_pilot/$experiment/align/rep2/${experiment}_2_R1.merged.nodup.bam"
    idr_peaks="$AGGREGATE_ANALYSIS_DIR/croo_pilot/$experiment/peak/idr_reproducibility/optimal_peak.narrowPeak.gz"

    #merge the BAM files of the two replicates and index the result
    samtools merge -f "$experiment.merged.bam" "$rep1_bam" "$rep2_bam"
    samtools index "$experiment.merged.bam"
    #append the IDR optimal peaks of this condition to the combined peak file
    zcat "$idr_peaks" >> 0min_45min_SKN7.optimal_peak.narrowPeak.bed
done 
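
Because the peak sets of both conditions are simply concatenated, the combined file can list the same region twice or contain overlapping peaks. Whether that matters for the later steps is a judgment call. If you prefer non-overlapping regions, the following is a minimal sketch that collapses overlapping or touching intervals using only sort and awk. The function name `merge_bed_intervals` is made up for this example, and it assumes the first three columns are chrom, start and end; all other narrowPeak columns are dropped.

```bash
# Collapse overlapping or touching intervals of a BED-like file
# (columns: chrom, start, end) and print a sorted three-column BED.
merge_bed_intervals() {
    local bed_file="$1"
    # Sort by chromosome name, then numerically by start coordinate.
    LC_ALL=C sort -k1,1 -k2,2n "$bed_file" |
    awk 'BEGIN { OFS = "\t" }
         # New chromosome, or a gap after the current interval: emit and restart.
         $1 != chrom || $2 + 0 > end {
             if (chrom != "") print chrom, start, end
             chrom = $1; start = $2 + 0; end = $3 + 0
             next
         }
         # Overlapping or touching peak: extend the current interval if needed.
         $3 + 0 > end { end = $3 + 0 }
         END { if (chrom != "") print chrom, start, end }'
}
```

For example, `merge_bed_intervals 0min_45min_SKN7.optimal_peak.narrowPeak.bed > merged_peaks.bed` would write the collapsed regions to `merged_peaks.bed` (a hypothetical file name that the rest of this notebook does not use).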

## Step 1:  ATACorrect: Bias correction of ATAC-seq reads in open chromatin
ATACorrect corrects the cutsite-signal from ATAC-seq with regard to the underlying
sequence preference of Tn5 transposase.

Usage:
TOBIAS ATACorrect --bam <reads.bam> --genome <genome.fa> --peaks <peaks.bed>

Output files:
- <outdir>/<prefix>_uncorrected.bw
- <outdir>/<prefix>_bias.bw
- <outdir>/<prefix>_expected.bw
- <outdir>/<prefix>_corrected.bw
- <outdir>/<prefix>_atacorrect.pdf

In [None]:
#This can take 5 - 10 minutes to run 
for experiment in 45min_SKN7 0min_SKN7
do
    TOBIAS ATACorrect --bam $experiment.merged.bam --genome $YEAST_DIR/sacCer3.fa --peaks  0min_45min_SKN7.optimal_peak.narrowPeak.bed 
done

![bias_forward](images/bias_forward.png)
![bias_reverse](images/bias_reverse.png)
![bias_corrected_forward](images/bias_corrected_forward.png)
![bias_corrected_reverse](images/bias_corrected_reverse.png)


In [None]:
#verify that files got generated in current directory 
ls -lah
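
Instead of reading through the listing by eye, the check can be made explicit. The sketch below tests for the five ATACorrect output files listed earlier. The function name `check_atacorrect_outputs` is invented for this example, and it assumes the file prefix is the BAM file name without its .bam extension (for example `0min_SKN7.merged`), which matches the file names used in the next step.

```bash
# Report which of the expected ATACorrect output files are missing or empty.
# Arguments: output directory, then the file prefix.
# Returns 0 if all five files are present and non-empty, 1 otherwise.
check_atacorrect_outputs() {
    local outdir="$1" prefix="$2" missing=0 suffix
    for suffix in uncorrected.bw bias.bw expected.bw corrected.bw atacorrect.pdf
    do
        if [[ ! -s "${outdir}/${prefix}_${suffix}" ]]; then
            echo "missing or empty: ${outdir}/${prefix}_${suffix}"
            missing=1
        fi
    done
    return "$missing"
}
```

For the two experiments in this notebook it could be called as `check_atacorrect_outputs . 45min_SKN7.merged` and `check_atacorrect_outputs . 0min_SKN7.merged`; no output means all files were found.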

We can examine the observed Tn5 bias before and after correction: 
http://1.gentc.net:/scratch/annashch/Tobias/0min_HOG1.merged_atacorrect.pdf
(replace 'annashch' with your username, and replace '0min_HOG1' with your sample name) 

## Step 2: ScoreBigwig: Calculate footprint scores from corrected cutsites


TOBIAS [ScoreBigwig](https://github.com/loosolab/TOBIAS/wiki/ScoreBigwig) is used to calculate a continuous footprinting score across regions. 


- **--signal**: Signal bigwig containing cutsites per basepair, e.g. {prefix}_corrected.bw from ATACorrect.
- **--regions**: Limits the computation of footprints to the regions in the given .bed file (in most cases the peaks of accessible chromatin).
- **--score**: The score to apply to the input --signal. The default is "footprint", which calculates the footprint score. Other options include "sum" and "mean", which compute other types of scores on the input --signal.

In [None]:
for experiment in 45min_SKN7 0min_SKN7
do
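    #FootprintScores is the older name of the ScoreBigwig subcommand; if your TOBIAS version
    #does not recognize it, try "TOBIAS ScoreBigwig" with the same arguments
    TOBIAS FootprintScores --signal "$experiment.merged_corrected.bw" --regions 0min_45min_SKN7.optimal_peak.narrowPeak.bed --output "${experiment}_footprints.bw" --cores 8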
done

## Step 3: BINDetect: Estimation of differentially bound motifs based on scores, sequence and motifs

In [None]:
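#Both runs below write to the same --outdir. Each motif gets its own subdirectory there,
#but top-level summary files written by the first run may be replaced by the second.
#Motif REB1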
TOBIAS BINDetect --motifs /data/motif_pfm/REB1.pfm \
                 --signals 0min_SKN7_footprints.bw 45min_SKN7_footprints.bw \
                 --genome $YEAST_DIR/sacCer3.fa \
                 --peaks 0min_45min_SKN7.optimal_peak.narrowPeak.bed \
                 --outdir BINDetect_output \
                 --cond_names 0min_SKN7 45min_SKN7 \
                 --cores 1
#Motif XCPE1                 
TOBIAS BINDetect --motifs /data/motif_pfm/XCPE1.pfm \
                 --signals 0min_SKN7_footprints.bw 45min_SKN7_footprints.bw \
                 --genome $YEAST_DIR/sacCer3.fa \
                 --peaks 0min_45min_SKN7.optimal_peak.narrowPeak.bed \
                 --outdir BINDetect_output \
                 --cond_names 0min_SKN7 45min_SKN7 \
                 --cores 1                 
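
The two BINDetect calls above differ only in the motif file, so they can also be written as a loop. The following is a sketch rather than a replacement for the cell above: the function name `bindetect_each_motif` and the `DRY_RUN` switch are invented for this example, and by default it only prints the commands so they can be inspected before anything is run.

```bash
# Print (dry run) or execute one BINDetect call per motif file given as argument.
# DRY_RUN is a hypothetical switch for this sketch: 1 (default) prints the
# commands, 0 runs them.
bindetect_each_motif() {
    local motif_file
    for motif_file in "$@"
    do
        # Same arguments as in the cell above; only --motifs changes.
        local cmd=(TOBIAS BINDetect
                   --motifs "$motif_file"
                   --signals 0min_SKN7_footprints.bw 45min_SKN7_footprints.bw
                   --genome "${YEAST_DIR:-/saccer3}/sacCer3.fa"
                   --peaks 0min_45min_SKN7.optimal_peak.narrowPeak.bed
                   --outdir BINDetect_output
                   --cond_names 0min_SKN7 45min_SKN7
                   --cores 1)
        if [[ "${DRY_RUN:-1}" == "1" ]]; then
            printf '%s\n' "${cmd[*]}"
        else
            "${cmd[@]}"
        fi
    done
}
```

Calling `bindetect_each_motif /data/motif_pfm/REB1.pfm /data/motif_pfm/XCPE1.pfm` prints both commands; prefixing the call with `DRY_RUN=0` would execute them.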


We observe differential binding for XCPE1 but not for REB1, corroborating the HOMER results: 

![REB1_diff_bind](images/REB1_diff_bind.png)
![XCPE1_diff_bind](images/XCPE1_diff_bind.png)
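
One way to put rough numbers next to these plots is to count the sites that BINDetect labelled as bound in each condition. The sketch below assumes the `BINDetect_output/<motif>/beds/<motif>_<condition>_bound.bed` layout that the PlotHeatmap commands in this notebook rely on; the function names `count_bed_sites` and `summarize_bound_sites` are invented for this example. Raw counts are only a sanity check and do not replace the BINDetect statistics.

```bash
# Count the sites (non-empty lines) in a BED file; prints 0 for a missing file.
count_bed_sites() {
    local bed_file="$1"
    if [[ -f "$bed_file" ]]; then
        # grep -c exits with status 1 when it counts zero lines, hence "|| true".
        grep -c . "$bed_file" || true
    else
        echo 0
    fi
}

# Print one tab-separated line (motif, condition, bound-site count) per condition.
# Arguments: BINDetect output directory, motif directory name, then the conditions.
summarize_bound_sites() {
    local outdir="$1" motif="$2" condition
    shift 2
    for condition in "$@"
    do
        printf '%s\t%s\t%s\n' "$motif" "$condition" \
            "$(count_bed_sites "${outdir}/${motif}/beds/${motif}_${condition}_bound.bed")"
    done
}
```

For the motifs in this notebook the calls would be `summarize_bound_sites BINDetect_output REB1_MA0363.1 0min_SKN7 45min_SKN7` and `summarize_bound_sites BINDetect_output XCPE1_POL011.1 0min_SKN7 45min_SKN7`.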



In [None]:
#Visualize heatmap for REB1
TOBIAS PlotHeatmap --TFBS BINDetect_output/REB1_MA0363.1/beds/REB1_MA0363.1_0min_SKN7_bound.bed BINDetect_output/REB1_MA0363.1/beds/REB1_MA0363.1_0min_SKN7_unbound.bed \
                   --TFBS BINDetect_output/REB1_MA0363.1/beds/REB1_MA0363.1_45min_SKN7_bound.bed BINDetect_output/REB1_MA0363.1/beds/REB1_MA0363.1_45min_SKN7_unbound.bed \
                   --signals 0min_SKN7_footprints.bw 45min_SKN7_footprints.bw \
                   --output REB1_heatmap.png --signal_labels 0min_SKN7 45min_SKN7 \
                   --share_colorbar \
                   --sort_by -1 \
                   --flank 50
                   
#Visualize heatmap for XCPE1
TOBIAS PlotHeatmap --TFBS BINDetect_output/XCPE1_POL011.1/beds/XCPE1_POL011.1_0min_SKN7_bound.bed BINDetect_output/XCPE1_POL011.1/beds/XCPE1_POL011.1_0min_SKN7_unbound.bed \
                   --TFBS BINDetect_output/XCPE1_POL011.1/beds/XCPE1_POL011.1_45min_SKN7_bound.bed BINDetect_output/XCPE1_POL011.1/beds/XCPE1_POL011.1_45min_SKN7_unbound.bed \
                   --signals 0min_SKN7_footprints.bw 45min_SKN7_footprints.bw \
                   --output XCPE1_heatmap.png --signal_labels 0min_SKN7 45min_SKN7 \
                   --share_colorbar \
                   --sort_by -1 \
                   --flank 50

This is the resulting footprint heatmap for XCPE1: 
![XCPE1_heatmap](images/XCPE1_heatmap.png)


This is the resulting footprint heatmap for REB1: 
![REB1_heatmap](images/REB1_heatmap.png)

