# 2.2 Creating count coverage tracks #

### IMPORTANT: Please make sure that you are using the bash kernel to run this notebook. ###
### IMPORTANT: Run the command below to git pull and make sure you are running the latest code!! ###
#### (Do this at the beginning of every session) ###

In [None]:
cd /srv/scratch/training_camp/tc2017/`whoami`/src/training_camp
git stash 
git pull 

In [None]:
### Set up variables storing the location of our data
### The proper way to load your variables is to source your ~/.bashrc file, but this is very slow in IPython
export SUNETID="$(whoami)"
export WORK_DIR="/srv/scratch/training_camp/tc2017/${SUNETID}"
export DATA_DIR="${WORK_DIR}/data"
export FASTQ_DIR="${DATA_DIR}/fastq/"
export SRC_DIR="${WORK_DIR}/src/training_camp/src/"

export ANALYSIS_DIR="${WORK_DIR}/analysis/"
export TRIMMED_DIR="$ANALYSIS_DIR/trimmed"
export ALIGNMENT_DIR="$ANALYSIS_DIR/aligned/"
export TAGALIGN_DIR="$ANALYSIS_DIR/tagAlign/"
export PEAKS_DIR="${ANALYSIS_DIR}peaks/"

export YEAST_DIR="/srv/scratch/training_camp/saccer3/seq"
export YEAST_INDEX="/srv/scratch/training_camp/saccer3/bowtie2_index/saccer3"
export YEAST_CHR="/srv/scratch/training_camp/saccer3/sacCer3.chrom.sizes"

export TMP="${WORK_DIR}/tmp"
export TEMP=$TMP 
export TMPDIR=$TMP

Before running the scripts here, make sure your environment variables for the temp folder are set to something other than the default of /tmp, or you may get an out-of-space error:

In [None]:
echo $TMP 
echo $TEMP
echo $TMPDIR 
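
A quick check along the following lines can catch an unset or default temp location before any jobs are submitted. This is only a minimal sketch: the helper name `check_tmp_var` is hypothetical and is not part of the course scripts.

```shell
# Hypothetical helper (not part of the course scripts): report whether a
# temp-directory variable is unset, empty, or still pointing at /tmp.
check_tmp_var() {
    # Look up the value of the variable whose name is given in $1.
    eval "_value=\${$1:-}"
    if [ -z "$_value" ]; then
        echo "WARNING: $1 is not set"
        return 1
    fi
    case "${_value%/}" in
        /tmp) echo "WARNING: $1 still points at /tmp"; return 1 ;;
    esac
    echo "OK: $1=$_value"
}

for var in TMP TEMP TMPDIR; do
    check_tmp_var "$var" || true    # keep going so all three are reported
done
```

If any line starts with WARNING, re-run the variable setup cell at the top of this notebook before continuing.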

This tutorial focuses on generating signal tracks that give coverage at each base pair of the genome:
![Pipeline 2](part2.png)


We will compute the per-base coverage for each sample, defined here as the number of read starts at each base in the genome. A read start is the 5’ end of a read, determined in a strand-specific manner, and we simply count the read starts from both strands at each base. This gives us the frequency of cuts at each base.
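
To make the counting concrete, here is a toy illustration in awk. It assumes tagAlign-style BED6 input (chrom, start, end, name, score, strand, with 0-based half-open coordinates) and made-up reads; the function name `count_read_starts` is hypothetical. It mirrors the idea only and is not the implementation used by create_countCoverageTracks.sh or genomeCoverageBed.

```shell
# Toy tagAlign (BED6) records: chrom, start, end, name, score, strand.
# Coordinates are 0-based and half-open, as in BED. The reads are made up.
reads=$(mktemp)
printf 'chrI\t100\t136\tN\t1000\t+\n'  > "$reads"
printf 'chrI\t100\t136\tN\t1000\t+\n' >> "$reads"
printf 'chrI\t65\t101\tN\t1000\t-\n'  >> "$reads"
printf 'chrI\t200\t236\tN\t1000\t-\n' >> "$reads"

# The 5' end of a + strand read is its start; for a - strand read it is end-1.
# Count read starts from both strands at each position and print
# bedGraph-style lines: chrom, start, end, count.
count_read_starts() {
    awk 'BEGIN { OFS = "\t" }
         { pos = ($6 == "+") ? $2 : $3 - 1; n[$1 OFS pos]++ }
         END { for (k in n) { split(k, f, OFS); print f[1], f[2], f[2] + 1, n[k] } }' "$1" |
    sort -k1,1 -k2,2n
}

count_read_starts "$reads"
rm -f "$reads"
```

Position 100 on chrI gets a count of 3 (two + strand reads starting there plus one - strand read whose 5’ end falls on the same base), and position 235 gets a count of 1.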

Note that this coverage is unnormalized: you can’t compare per-base values across samples, because a sample with more reads overall (greater sequencing depth) will tend to have higher coverage values simply because of that depth. The normalized signal tracks that we will generate with the peak caller MACS2 are more comparable.
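
A small worked example shows how depth alone can change the raw counts. The sketch below scales counts to counts per million reads (CPM), a simple normalization chosen here purely for illustration; it is not what MACS2 computes, and the read totals are made-up numbers. The function name `cpm` is hypothetical.

```shell
# Scale a raw count to counts per million reads (CPM).
# Usage: cpm RAW_COUNT TOTAL_READS
cpm() {
    awk -v c="$1" -v n="$2" 'BEGIN { printf "%.2f\n", c * 1000000 / n }'
}

# Made-up example: the same base has 30 read starts in sample A (2 million reads)
# and 60 read starts in sample B (4 million reads).
echo "sample A: $(cpm 30 2000000) CPM"
echo "sample B: $(cpm 60 4000000) CPM"
```

Both samples come out at 15.00 CPM, even though the raw count in sample B is twice that of sample A.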

Look at the script **$SRC_DIR/create_countCoverageTracks.sh**. It will use the genomeCoverageBed utility to create the count coverage files. You can see the usage instructions for genomeCoverageBed by typing genomeCoverageBed -h. 

In [None]:
genomeCoverageBed -h

Additional documentation on this and other bed utilities can be found at:

BEDTools software: https://code.google.com/p/bedtools/

BEDTools manual: http://bedtools.readthedocs.org/en/latest/

We will perform the required operations in batch mode using **$SRC_DIR/batch_countCoverage.sh**, which will submit a series of jobs to the queue (each job takes several minutes to run).


In [None]:
$SRC_DIR/batch_countCoverage.sh
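
Because the jobs are submitted to a queue, it is worth confirming that the output files exist before moving on. The sketch below assumes the outputs land in `$TAGALIGN_DIR` with the `.count.bedgraph.gz` and `.count.bigWig` suffixes used in the move step later in this notebook; the helper name `count_outputs` is hypothetical.

```shell
# Hypothetical helper: count the files in a directory that end with a given suffix.
# Usage: count_outputs DIR SUFFIX
count_outputs() {
    find "$1" -maxdepth 1 -type f -name "*$2" 2>/dev/null | wc -l | tr -d ' '
}

# Assumes TAGALIGN_DIR is set as at the top of this notebook;
# falls back to the current directory otherwise.
out_dir="${TAGALIGN_DIR:-.}"
echo "bedGraph files: $(count_outputs "$out_dir" .count.bedgraph.gz)"
echo "bigWig files:   $(count_outputs "$out_dir" .count.bigWig)"
```

If either number is lower than the number of samples you submitted, some jobs are probably still running or did not finish.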

Let's create a new "signal" directory to store the counts and fold change bigWig files. 

In [None]:
#create a directory to store the signal data (mkdir -p does nothing if it already exists)
SIGNAL_DIR="${ANALYSIS_DIR}signal/"
mkdir -p "$SIGNAL_DIR"

#create a directory to store the fold change data 
FOLDCHANGE_DIR="${SIGNAL_DIR}foldChange/"
mkdir -p "$FOLDCHANGE_DIR"

#create a directory to store the counts data 
COUNTS_DIR="${SIGNAL_DIR}counts/"
mkdir -p "$COUNTS_DIR"

In [None]:
cd $TAGALIGN_DIR
mv *.count.bedgraph.gz *.count.bigWig $COUNTS_DIR

Convert the fold change files from bedGraph to bigWig format and store the results in $FOLDCHANGE_DIR.

In [None]:
cd $PEAKS_DIR
for fold_change_file in *FE.bdg
do
    #sort the bedgraph file by chromosome, then by start position
    bedtools sort -i "$fold_change_file" > "$fold_change_file.sorted"

    #sometimes the MACS2 fold change calculation returns positions outside of the chromosome. We run bedClip to remove
    #any entries that fall outside the bounds specified in the YEAST_CHR chrom sizes file
    bedClip "$fold_change_file.sorted" "$YEAST_CHR" "$fold_change_file.clipped"

    #convert the clipped bedGraph to bigWig, replacing the .bdg extension with .bigWig
    fold_change_bigwig_file="$FOLDCHANGE_DIR$(basename "$fold_change_file" .bdg).bigWig"
    bedGraphToBigWig "$fold_change_file.clipped" "$YEAST_CHR" "$fold_change_bigwig_file"
done
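
To make the clipping step concrete, the sketch below mimics in awk what bedClip's default behaviour amounts to: dropping any interval that starts before 0 or ends past the chromosome length. It is an illustration on toy data, not a replacement for bedClip. The function name `drop_off_chrom`, the chromosome size, and the intervals are all made up, and skipping chromosomes that are absent from the sizes file is a choice of this sketch.

```shell
# Hypothetical helper: keep only bedGraph intervals that lie within the chromosome.
# Usage: drop_off_chrom BEDGRAPH CHROM_SIZES
drop_off_chrom() {
    awk 'NR == FNR { size[$1] = $2 + 0; next }               # first file: chrom sizes
         ($1 in size) && $2 + 0 >= 0 && $3 + 0 <= size[$1]   # second file: print in-bounds lines
        ' "$2" "$1"
}

# Toy inputs: a 1000 bp chromosome and three intervals, the last of which
# runs past the end of the chromosome.
sizes=$(mktemp)
bdg=$(mktemp)
printf 'chrI\t1000\n' > "$sizes"
printf 'chrI\t0\t100\t1.5\n'      > "$bdg"
printf 'chrI\t900\t1000\t2.0\n'  >> "$bdg"
printf 'chrI\t950\t1050\t3.2\n'  >> "$bdg"

drop_off_chrom "$bdg" "$sizes"
rm -f "$sizes" "$bdg"
```

Only the first two intervals are printed; the third extends 50 bp past the end of the toy chromosome and is dropped.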