### IMPORTANT: Please make sure that you are using the bash kernel to run this notebook.###


In [34]:
### Set up variables storing the location of our data
### The standard way to load these variables is `source ~/.bashrc`, but this is very slow in IPython
export SUNETID="$(whoami)"
export WORK_DIR="/scratch/${SUNETID}"
export DATA_DIR="${WORK_DIR}/data"
export SRC_DIR="${WORK_DIR}/src"
export METADATA_DIR="/metadata"
export AGGREGATE_DATA_DIR="/data"
export AGGREGATE_ANALYSIS_DIR="/outputs"
export YEAST_DIR="/saccer3"
export TMP="${WORK_DIR}/tmp"
export TEMP="${TMP}"
export TMPDIR="${TMP}"
# mkdir -p creates any missing directories and exits with status 0 if they already exist
mkdir -p "${DATA_DIR}" "${SRC_DIR}" "${TMP}"




In [None]:
mkdir -p "$WORK_DIR/Tobias"
cd "$WORK_DIR/Tobias"

We will use the [TOBIAS](https://github.com/loosolab/TOBIAS) tool, created by Mette Bentsen, to correct for Tn5 bias and generate footprints. The HOMER analysis suggests that the de novo motif resembling POL011.1_XCPE1/Jaspar(0.681) differs in the SKN7 strain between 0 minutes and 45 minutes. We will use the experiment names "**0min_SKN7**" and "**45min_SKN7**" to perform differential footprinting of POL011.1 in these two samples.




First, let's merge the filtered BAM files for the two replicates of each condition. We can do this with the `samtools merge` command.

We also concatenate the IDR optimal peak sets of the two conditions into a single peak file.

In [56]:
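# The peak file is built by appending, so remove any copy left from a previous run (-f: no error if absent)
rm -f 0min_45min_SKN7.optimal_peak.narrowPeak.bed

for experiment in 45min_SKN7 0min_SKN7
do
    # ${experiment} is braced so the following underscore is not read as part of the variable name
    rep1_bam="$AGGREGATE_ANALYSIS_DIR/croo/${experiment}/align/rep1/${experiment}_1_R1.merged.nodup.bam"
    rep2_bam="$AGGREGATE_ANALYSIS_DIR/croo/${experiment}/align/rep2/${experiment}_2_R1.merged.nodup.bam"
    idr_peaks="$AGGREGATE_ANALYSIS_DIR/croo/${experiment}/peak/idr_reproducibility/optimal_peak.narrowPeak.gz"

    # Merge the BAM replicates (-f overwrites an existing output file)
    samtools merge -f "${experiment}.merged.bam" "$rep1_bam" "$rep2_bam"
    # Index the merged BAM (creates the .bai file); TOBIAS reads the BAM by region, which needs an index
    samtools index "${experiment}.merged.bam"
    # Append this condition's IDR optimal peaks to the combined peak file
    zcat "$idr_peaks" >> 0min_45min_SKN7.optimal_peak.narrowPeak.bed
done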
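The loop simply concatenates the two IDR peak sets, so a peak called in both conditions appears twice in `0min_45min_SKN7.optimal_peak.narrowPeak.bed`. If you want a non-redundant peak set, one common option is `bedtools merge` on a sorted file. The sketch below does the same interval merging with only `sort` and `awk`, assuming tab-separated BED input; the function name `merge_peaks` and the output file name in the usage comment are hypothetical, and this step is not part of the original workflow.

```shell
# Hypothetical helper: collapse overlapping or book-ended intervals in a BED file.
# Only the first three columns (chrom, start, end) are written to the output.
merge_peaks() {
    local in_bed="$1"
    # Sort by chromosome, then numerically by start coordinate
    LC_ALL=C sort -k1,1 -k2,2n "$in_bed" |
    awk 'BEGIN { OFS = "\t" }
         {
             if ($1 != chrom || $2 > end) {
                 # New chromosome or a gap: emit the previous merged interval, start a new one
                 if (chrom != "") print chrom, start, end
                 chrom = $1; start = $2; end = $3
             } else if ($3 > end) {
                 # Overlapping (or touching) interval: extend the current merged interval
                 end = $3
             }
         }
         END { if (chrom != "") print chrom, start, end }'
}

# Example usage (hypothetical output name):
# merge_peaks 0min_45min_SKN7.optimal_peak.narrowPeak.bed > 0min_45min_SKN7.merged_peaks.bed
```

Intervals that merely touch (one ends where the next starts) are merged as well, which matches the default behavior of `bedtools merge`.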

## Step 1: ATACorrect: Bias correction of ATAC-seq reads in open chromatin
ATACorrect corrects the cutsite-signal from ATAC-seq with regard to the underlying
sequence preference of Tn5 transposase.

Usage:
TOBIAS ATACorrect --bam <reads.bam> --genome <genome.fa> --peaks <peaks.bed>

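Output files (`<prefix>` defaults to the BAM file name without `.bam`, e.g. `0min_SKN7.merged`; with no `--outdir`, as in the cell below, the files are written to the working directory):
- <outdir>/<prefix>_uncorrected.bw
- <outdir>/<prefix>_bias.bw
- <outdir>/<prefix>_expected.bw
- <outdir>/<prefix>_corrected.bw
- <outdir>/<prefix>_atacorrect.pdf
- <outdir>/<prefix>_AtacBias.pickle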
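Conceptually, the expected signal redistributes the reads observed in a window according to the Tn5 sequence bias at each position, and the corrected signal is the observed signal minus the expected signal. The toy `awk` sketch below illustrates that idea on one hand-made window of per-base counts and bias values. The data and the function name `correct_window` are hypothetical, and this is a simplified illustration of the principle, not TOBIAS's implementation, which works genome-wide with its own bias model and window settings.

```shell
# Toy data (hypothetical): one window of 5 positions.
# Column 1: observed Tn5 cut counts; column 2: relative Tn5 sequence bias.
toy_window='10 2.0
4 1.0
0 0.5
2 0.5
4 1.0'

# expected_i  = total_counts * bias_i / total_bias   (counts redistributed by bias)
# corrected_i = observed_i - expected_i
# Output columns: observed, expected, corrected
correct_window() {
    awk '{ obs[NR] = $1; bias[NR] = $2; total_obs += $1; total_bias += $2 }
         END {
             for (i = 1; i <= NR; i++) {
                 expected = total_obs * bias[i] / total_bias
                 printf "%d\t%.2f\t%.2f\n", obs[i], expected, obs[i] - expected
             }
         }'
}

printf '%s\n' "$toy_window" | correct_window
```

In this toy window, position 1 has the most raw cuts (10), but most of them are explained by its high bias (expected 8), leaving a corrected value of 2. Position 3 has no cuts although the bias predicts 2, so its corrected value is -2: a depletion that sequence bias alone does not explain.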

In [None]:
# This can take 5-10 minutes to run
for experiment in 45min_SKN7 0min_SKN7
do
    TOBIAS ATACorrect --bam "${experiment}.merged.bam" --genome "$YEAST_DIR/sacCer3.fa" --peaks 0min_45min_SKN7.optimal_peak.narrowPeak.bed
done
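Before moving on, it can help to confirm that ATACorrect wrote every expected file for each sample. The helper below is a hypothetical convenience function, not part of TOBIAS; it assumes the output prefix is the BAM file name without `.bam` (e.g. `0min_SKN7.merged`, as in the directory listing below) and that the files were written to the current directory.

```shell
# Hypothetical helper: report any missing or empty ATACorrect output for a given prefix.
# Returns a non-zero status if at least one file is missing.
check_atacorrect_outputs() {
    local prefix="$1" missing=0 suffix
    for suffix in uncorrected.bw bias.bw expected.bw corrected.bw atacorrect.pdf; do
        if [[ ! -s "${prefix}_${suffix}" ]]; then
            echo "missing or empty: ${prefix}_${suffix}" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example usage:
# for experiment in 45min_SKN7 0min_SKN7; do
#     check_atacorrect_outputs "${experiment}.merged" && echo "${experiment}: all outputs present"
# done
```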

In [50]:
ls -lah

total 74M
drwxr-xr-x 2 annashch users 6.0K Aug 28 18:09 .
drwxrwxrwx 8 annashch users  14K Aug 28 17:34 ..
-rw-r--r-- 1 annashch users  44M Aug 28 18:03 0min_SKN7.merged.bam
-rw-r--r-- 1 annashch users  12K Aug 28 18:03 0min_SKN7.merged.bam.bai
-rw-r--r-- 1 annashch users 2.2M Aug 28 18:04 0min_SKN7.merged_AtacBias.pickle
-rw-r--r-- 1 annashch users  34K Aug 28 18:05 0min_SKN7.merged_atacorrect.pdf
-rw-r--r-- 1 annashch users 2.9M Aug 28 18:05 0min_SKN7.merged_bias.bw
-rw-r--r-- 1 annashch users 1.9M Aug 28 18:05 0min_SKN7.merged_corrected.bw
-rw-r--r-- 1 annashch users 1.8M Aug 28 18:05 0min_SKN7.merged_expected.bw
-rw-r--r-- 1 annashch users 513K Aug 28 18:05 0min_SKN7.merged_uncorrected.bw
-rw-r--r-- 1 annashch users 102K Aug 28 18:03 0min_SKN7.optimal_peak.narrowPeak.bed
-rw-r--r-- 1 annashch users 1.7M Aug 28 18:05 0min_SKN7_footprints.bw
-rw-r--r-- 1 annashch users  17M Aug 28 18:09 45min_SKN7.merged.bam
-rw-r--r-- 1 annashch users  11K Aug 28 18:09 45min_SKN7.merged.bam.bai
-rw-

We can examine the observed Tn5 bias before and after correction in the ATACorrect PDF report, for example:
http://1.gentc.net:/scratch/annashch/Tobias/0min_HOG1.merged_atacorrect.pdf
(replace 'annashch' with your username, and replace '0min_HOG1' with your sample name, e.g. '0min_SKN7')
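To avoid editing the link by hand, you could print the report links from the variables defined at the top of the notebook. This sketch assumes the same server address and path layout as the example link above (that `$WORK_DIR/Tobias` is served under `/scratch/<username>/Tobias`); the function name `atacorrect_report_urls` is hypothetical.

```shell
# Hypothetical helper: print one ATACorrect report link per sample,
# assuming the server address and path layout of the example link above
atacorrect_report_urls() {
    local user="$1" sample
    shift
    for sample in "$@"; do
        echo "http://1.gentc.net:/scratch/${user}/Tobias/${sample}.merged_atacorrect.pdf"
    done
}

atacorrect_report_urls "$SUNETID" 0min_SKN7 45min_SKN7
```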

## Step 2: ScoreBigwig: Calculate footprint scores from corrected cutsites


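TOBIAS [ScoreBigwig](https://github.com/loosolab/TOBIAS/wiki/ScoreBigwig) calculates a continuous footprinting score across regions. The cell below calls it by its older name, `FootprintScores`, which this TOBIAS version still accepts. Its main options are: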


- `--signal`: signal bigwig containing cutsites per base pair, e.g. `{prefix}_corrected.bw` from ATACorrect.
- `--regions`: limits the computation of footprints to the regions in this .bed file (in most cases, the peaks of accessible chromatin).
- `--score`: the score to apply to the input `--signal`. The default, "footprint", calculates a footprint score; other options include "sum" and "mean", which calculate other types of scores on the input `--signal`.
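The idea behind the default "footprint" score is that a bound transcription factor protects its site from Tn5, so the corrected signal dips at the center of the site and is higher in the flanks. The toy function below scores a single window as mean(flanks) minus mean(center). The fixed window layout and the function name `toy_footprint_score` are assumptions for illustration; TOBIAS computes its score at every base pair with its own footprint and flank widths, so treat this as a sketch of the principle, not a reimplementation.

```shell
# Hypothetical toy scorer: reads one value per line on stdin (corrected cut signal across a window).
# The first and last `flank` values are the flanks; everything in between is the center.
# Score = mean(flanks) - mean(center); a positive score suggests a protected (footprinted) center.
toy_footprint_score() {
    local flank="$1"
    awk -v flank="$flank" '
        { v[NR] = $1 }
        END {
            if (NR <= 2 * flank) { print "window too short" > "/dev/stderr"; exit 1 }
            for (i = 1; i <= NR; i++) {
                if (i <= flank || i > NR - flank) { fsum += v[i]; fn++ }
                else                              { csum += v[i]; cn++ }
            }
            printf "%.2f\n", fsum / fn - csum / cn
        }'
}

# A dip in the middle gives a positive score; a flat profile gives 0
printf '%s\n' 3 4 0 1 0 4 3 | toy_footprint_score 2
printf '%s\n' 2 2 2 2 2 | toy_footprint_score 2
```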

In [None]:
for experiment in 45min_SKN7 0min_SKN7
do
    TOBIAS FootprintScores --signal "${experiment}.merged_corrected.bw" --regions 0min_45min_SKN7.optimal_peak.narrowPeak.bed --output "${experiment}_footprints.bw" --cores 8
done

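## Step 3: BINDetect: Estimation of differentially bound motifs based on scores, sequence and motifs

BINDetect finds motif sites within the peaks and compares the footprint scores at these sites between the two conditions to estimate differential binding.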

In [62]:
ls

0min_45min_SKN7.optimal_peak.narrowPeak.bed  45min_SKN7.merged.bam
0min_SKN7.merged.bam			     45min_SKN7.merged.bam.bai
0min_SKN7.merged.bam.bai		     45min_SKN7.merged_AtacBias.pickle
0min_SKN7.merged_AtacBias.pickle	     45min_SKN7.merged_atacorrect.pdf
0min_SKN7.merged_atacorrect.pdf		     45min_SKN7.merged_bias.bw
0min_SKN7.merged_bias.bw		     45min_SKN7.merged_corrected.bw
0min_SKN7.merged_corrected.bw		     45min_SKN7.merged_expected.bw
0min_SKN7.merged_expected.bw		     45min_SKN7.merged_uncorrected.bw
0min_SKN7.merged_uncorrected.bw		     45min_SKN7_footprints.bw
0min_SKN7_footprints.bw


In [None]:
TOBIAS BINDetect --motifs "$AGGREGATE_DATA_DIR/motif_pfm/XCPE1.pfm" \
                 --signals 0min_SKN7_footprints.bw 45min_SKN7_footprints.bw \
                 --genome "$YEAST_DIR/sacCer3.fa" \
                 --peaks 0min_45min_SKN7.optimal_peak.narrowPeak.bed \
                 --outdir BINDetect_output \
                 --cond_names 0min_SKN7 45min_SKN7 \
                 --cores 1

In [76]:
# Visualize the footprint scores at the XCPE1 sites as a heatmap
TOBIAS PlotHeatmap --TFBS BINDetect_output/XCPE1_POL011.1/beds/XCPE1_POL011.1_all.bed \
                   --signals 0min_SKN7_footprints.bw 45min_SKN7_footprints.bw \
                   --output XCPE1_heatmap.png \
                   --sort_by -2

# TOBIAS 0.11.6 PlotHeatmap (run started 2020-08-28 18:45:12.318945)
# Working directory: /scratch/annashch/Tobias
# Command line call: TOBIAS PlotHeatmap --TFBS BINDetect_output/XCPE1_POL011.1/beds/XCPE1_POL011.1_all.bed --signals 0min_SKN7_footprints.bw 45min_SKN7_footprints.bw --output XCPE1_heatmap.png --sort_by -2

# ----- Input parameters -----
# TFBS:	[['BINDetect_output/XCPE1_POL011.1/beds/XCPE1_POL011.1_all.bed']]
# signals:	['0min_SKN7_footprints.bw', '45min_SKN7_footprints.bw']
# output:	XCPE1_heatmap.png
# plot_boundaries:	False
# share_colorbar:	False
# flank:	75
# title:	TOBIAS heatmap
# TFBS_labels:	None
# signal_labels:	None
# show_columns:	[]
# sort_by:	-2
# verbosity:	3


# ----- Output files -----
# XCPE1_heatmap.png


2020-08-28 18:45:12 (63817) [INFO]	Using bedfiles: [['BINDetect_output/XCPE1_POL011.1/beds/XCPE1_POL011.1_all.bed']] across all bigwigs

2020-08-28 18:45:12 (63817) [INFO]	Reading bedfiles
2020-08-28 18:45:12 (63817) [INFO]	- Read 103 sites from BINDet