## Sort and index bam files for downstream analyses and look at the alignment with IGV

### Samtools

Check out the [samtools](http://www.htslib.org/doc/samtools.html) documentation. BAM files are binary, so you cannot read them directly in a text viewer. Samtools lets us view the contents of BAM files and manipulate them in various ways.

Check that you have samtools installed. 
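
One way to check is sketched below. The helper name `check_tool` is made up for this example; `command -v` and `samtools --version` are standard.

```shell
#!/bin/bash
# Report whether a command is available on PATH.
# check_tool is a hypothetical helper name, not part of samtools.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found on PATH" >&2
        return 1
    fi
}

# If samtools is available, print the first line of its version banner.
if check_tool samtools; then
    samtools --version | head -n 1
fi
```

If samtools is not found, activate the conda environment you installed it into (or install it) before continuing.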

Basic usage:

`samtools <command> [options]`

Try out some of the samtools commands:

`samtools view interesting_file.bam` (or `interesting_file.sam`)

`samtools flagstat interesting_file.bam`

If you see an error about a missing shared library such as `libbz2.so.1.0`, install bzip2 with conda:

`conda install --override-channels -c conda-forge bzip2`

Let's use samtools to **sort and index our bam files**. Look at the documentation to figure out how you would sort a bam file and save it to a new file with the extension `.sorted.bam`:


`samtools sort -@ 8 -o interesting_file.sorted.bam interesting_file.bam`

We also need a bai index of the sorted bam file. Again, look at the documentation to determine what that command would look like:

`samtools index interesting_file.sorted.bam`
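
To confirm that the sort did what you expected, you can look at the BAM header: `samtools view -H` prints only the header, and after a coordinate sort its `@HD` line should carry `SO:coordinate`. A minimal sketch follows. The function name `is_coordinate_sorted` is made up for this example.

```shell
#!/bin/bash
# Succeed if the BAM header declares coordinate sort order.
# is_coordinate_sorted is a hypothetical helper name.
is_coordinate_sorted() {
    # -H prints the header only; the @HD line holds the SO: (sort order) tag
    samtools view -H "$1" | grep '^@HD' | grep -q 'SO:coordinate'
}
```

For example: `is_coordinate_sorted interesting_file.sorted.bam && echo "coordinate sorted"`. Once the index exists, `samtools idxstats interesting_file.sorted.bam` prints per-chromosome read counts, which is a quick way to confirm that the index is readable.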

Now that you have figured out what the commands should be, write a submission script containing both commands for the file you made. The sort uses 8 threads (`-@ 8`), so we need to submit it as a job. You can include two commands in the same script: put one below the other, and the second will run after the first has finished.


Write a script to convert your SAM files to BAM, then sort and index them:

`#!/bin/bash`<br>
`#PBS -q hotel`<br>
`#PBS -N sam_index_sort`<br>
`#PBS -l nodes=1:ppn=8`<br>
`#PBS -l walltime=2:00:00`<br>
`#PBS -o sam_index_sort.out`<br>
`#PBS -e sam_index_sort.err`<br>

`# work in the directory that holds the STAR output`<br>
`cd ~/scratch/star_alignment`<br>
<br>
`for x in DMSO_1_ATCACGAligned.out DMSO_2_CGATGTAligned.out DTP_1_CAGATCAligned.out DTP_2_CCGTCCAligned.out DTP_3_GTGAAAAligned.out`<br>
`do`<br>
`# convert SAM to BAM`<br>
`samtools view -S -b $x.sam > $x.bam`<br>
`# sort by coordinate using 8 threads`<br>
`samtools sort -@ 8 -o $x.sorted.bam $x.bam`<br>
`# build the .bai index for the sorted BAM`<br>
`samtools index $x.sorted.bam`<br>
`done`
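
After the job finishes, it is worth checking that every sample has both a sorted BAM and its index. Here is a minimal sketch. The function name `check_sorted_outputs` is made up for this example, and it assumes the output names used in the script above.

```shell
#!/bin/bash
# Report samples that lack a sorted BAM or its .bai index.
# check_sorted_outputs is a hypothetical helper name.
# Arguments: the output directory, then one or more sample prefixes.
check_sorted_outputs() {
    local dir="$1"
    shift
    local missing=0
    local x
    for x in "$@"; do
        # -s is true only if the file exists and is not empty
        if [ ! -s "$dir/$x.sorted.bam" ]; then
            echo "missing: $x.sorted.bam"
            missing=$((missing + 1))
        fi
        if [ ! -s "$dir/$x.sorted.bam.bai" ]; then
            echo "missing: $x.sorted.bam.bai"
            missing=$((missing + 1))
        fi
    done
    # Succeed only when nothing is missing
    [ "$missing" -eq 0 ]
}
```

For example: `check_sorted_outputs ~/scratch/star_alignment DMSO_1_ATCACGAligned.out DMSO_2_CGATGTAligned.out`. If anything is reported missing, look at `sam_index_sort.err` for the error message.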




### IGV

Check out the [IGV](http://software.broadinstitute.org/software/igv/) website. Go to the Downloads page and follow the instructions for your operating system.


To view alignments, IGV needs access to the bam files, either from an external server (not TSCC) or from your own computer. You could download the bam files and their bai indexes to your desktop and load them from there, but since the files are big, Emily from the Yeo lab has uploaded them to an external server for you to view. There are five samples in total: two DMSO and three DTP.

**After you complete the download, open IGV (as described above).**

Select your genome with Genomes - Load Genome From Server, then choose hg19.

Load the bam files with File - Load from URL. IGV needs the matching `.bai` index for each bam file, so both URLs are listed for each sample.<br>
The URLs are:<br>
<br>
DMSO-1:<br>
https://mstp-bootcamp.s3.us-east-2.amazonaws.com/DMSO_1_ATCACGAligned.out.sorted.bam<br>
https://mstp-bootcamp.s3.us-east-2.amazonaws.com/DMSO_1_ATCACGAligned.out.sorted.bam.bai<br>
<br>
DMSO-2:<br>
https://mstp-bootcamp.s3.us-east-2.amazonaws.com/DMSO_2_CGATGTAligned.out.sorted.bam<br>
https://mstp-bootcamp.s3.us-east-2.amazonaws.com/DMSO_2_CGATGTAligned.out.sorted.bam.bai<br>
<br>
DTP-1:<br>
https://mstp-bootcamp.s3.us-east-2.amazonaws.com/DTP_1_CAGATCAligned.out.sorted.bam<br>
https://mstp-bootcamp.s3.us-east-2.amazonaws.com/DTP_1_CAGATCAligned.out.sorted.bam.bai<br>
<br>
DTP-2:<br>
https://mstp-bootcamp.s3.us-east-2.amazonaws.com/DTP_2_CCGTCCAligned.out.sorted.bam<br>
https://mstp-bootcamp.s3.us-east-2.amazonaws.com/DTP_2_CCGTCCAligned.out.sorted.bam.bai<br>
<br>
DTP-3:<br>
https://mstp-bootcamp.s3.us-east-2.amazonaws.com/DTP_3_GTGAAAAligned.out.sorted.bam<br>
https://mstp-bootcamp.s3.us-east-2.amazonaws.com/DTP_3_GTGAAAAligned.out.sorted.bam.bai<br>

Once you have loaded all the files, play around by viewing different genes or chromosome locations. Can you see genes that clearly have fewer reads in the knockdown vs control datasets?

We will come back to these later as a first check that our most highly differentially expressed genes show visible differences in read coverage.

When you are ready to quit IGV, you can save the session with File - Save session. Next time you open IGV you can open your saved session without having to reload the BAM files.