# Visualize BAM file alignment at the Leptin gene locus

Our results are in the `analyzed` directory:

In [None]:
!ls /home/gea_user/rna-seq-project/kallisto/analyzed

**Notice**: In this notebook we will use the [Python](https://www.python.org/) package [genomeview](https://genomeview.readthedocs.io/en/latest/index.html) to visualize our BAM alignments. Shell commands begin with `!`; Python commands do not.

Let's remind ourselves which sample each of these directories (named by SRR run accession) holds:

|SRA Sample|SRR Run Accession|Sample Name|
|-----------|-------------|-----------|
|SRS1794112|SRR5017139|Regular Diet Control 1|
|SRS1794102|SRR5017129|Regular Diet Control 2|
|SRS1794109|SRR5017136|Regular Diet Control 3|
|SRS1794105|SRR5017132|High-Fat Diet Tumor 1|
|SRS1794101|SRR5017128|High-Fat Diet Tumor 2|
|SRS1794111|SRR5017138|High-Fat Diet Tumor 3|
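Because each output directory is named after its SRR run accession, the `datasets` dictionary used later in this notebook could be built from this table instead of typed out path by path. Below is a minimal sketch, assuming every run follows the `<SRR>_trimmed.fastq.gz_quant/pseudoalignments.bam` layout used in the `datasets` cell; `ANALYZED_DIR`, `SAMPLES`, and `build_datasets` are hypothetical names, and the labels follow the ones used in that cell.

```python
import os

# Directory holding one kallisto output folder per run (see the `ls` above)
ANALYZED_DIR = "/home/gea_user/rna-seq-project/kallisto/analyzed"

# Hypothetical mapping: SRR run accession -> sample label, transcribed from the table
SAMPLES = {
    "SRR5017139": "Control Diet 1",
    "SRR5017129": "Control Diet 2",
    "SRR5017136": "Control Diet 3",
    "SRR5017132": "High-fat Diet Tumor 1",
    "SRR5017128": "High-fat Diet Tumor 2",
    "SRR5017138": "High-fat Diet Tumor 3",
}


def build_datasets(samples, base_dir=ANALYZED_DIR):
    """Return {sample label: BAM path}, assuming each run's BAM lives at
    <base_dir>/<SRR>_trimmed.fastq.gz_quant/pseudoalignments.bam."""
    return {
        label: os.path.join(base_dir, f"{srr}_trimmed.fastq.gz_quant", "pseudoalignments.bam")
        for srr, label in samples.items()
    }


datasets = build_datasets(SAMPLES)
```

Python dictionaries keep insertion order, so the tracks would appear in the same order as the entries in `SAMPLES`.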

**Optional**: Visualizing these alignments requires a copy of the mouse genome. The command below downloads the GRCm38 primary assembly from [Ensembl](http://useast.ensembl.org/Mus_musculus/Info/Index). This genome has already been provided for you, but you could change the FTP link to download a genome of your choice for a different dataset.

In [None]:
# Download the mouse genome with wget: uncomment the line below
#!wget ftp://ftp.ensembl.org/pub/release-97/fasta/mus_musculus/dna/Mus_musculus.GRCm38.dna.primary_assembly.fa.gz

# If you download the genome yourself, also decompress it by uncommenting the line below
#!gzip -d Mus_musculus.GRCm38.dna.primary_assembly.fa.gz

We make a directory for the genome (`mkdir -p`) and move (`mv`) the provided genome into it. The provided copy is already decompressed; if you downloaded the genome yourself, decompress it first with `gzip -d` as in the commented line above.

In [None]:
!mkdir -p /home/gea_user/rna-seq-project/genomes

In [None]:
!mv /home/gea_user/data/pre-imported/genomes/Mus_musculus.GRCm38.dna.primary_assembly.fa /home/gea_user/rna-seq-project/genomes/
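If you download the genome yourself, the directory creation and decompression steps can also be done from Python. Below is a minimal sketch using only the standard library; `ensure_decompressed` is a hypothetical helper, not part of genomeview. It streams the data in chunks so the multi-gigabyte genome is never held in memory at once, and it skips the work if the decompressed file already exists.

```python
import gzip
import shutil
from pathlib import Path


def ensure_decompressed(gz_path, out_dir):
    """Decompress `gz_path` into `out_dir` (created if needed) and return the
    path of the decompressed file. Does nothing if that file already exists."""
    gz_path = Path(gz_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    # Drop the trailing ".gz" to name the output, as `gzip -d` does
    out_path = out_dir / gz_path.with_suffix("").name
    if not out_path.exists():
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)  # copies in fixed-size chunks
    return out_path
```

For example, `ensure_decompressed("Mus_musculus.GRCm38.dna.primary_assembly.fa.gz", "/home/gea_user/rna-seq-project/genomes")` would write the `.fa` file into the genomes directory. Unlike `gzip -d`, it leaves the original `.gz` file in place.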

We will now import the `genomeview` Python package.

In [None]:
import genomeview

We specify the datasets ([BAM](https://en.wikipedia.org/wiki/SAM_(file_format)) files), the location of our reference genome, and the region to visualize: the chromosome plus the start and end positions on it.

In [None]:
# Sample label -> BAM file of kallisto pseudoalignments for that run
datasets = {
    "Control Diet 1": "/home/gea_user/rna-seq-project/kallisto/analyzed/SRR5017139_trimmed.fastq.gz_quant/pseudoalignments.bam",
    "Control Diet 2": "/home/gea_user/rna-seq-project/kallisto/analyzed/SRR5017129_trimmed.fastq.gz_quant/pseudoalignments.bam",
    "Control Diet 3": "/home/gea_user/rna-seq-project/kallisto/analyzed/SRR5017136_trimmed.fastq.gz_quant/pseudoalignments.bam",
    "High-fat Diet Tumor 1": "/home/gea_user/rna-seq-project/kallisto/analyzed/SRR5017132_trimmed.fastq.gz_quant/pseudoalignments.bam",
    "High-fat Diet Tumor 2": "/home/gea_user/rna-seq-project/kallisto/analyzed/SRR5017128_trimmed.fastq.gz_quant/pseudoalignments.bam",
    "High-fat Diet Tumor 3": "/home/gea_user/rna-seq-project/kallisto/analyzed/SRR5017138_trimmed.fastq.gz_quant/pseudoalignments.bam",
}
# Reference genome FASTA
reference = "/home/gea_user/rna-seq-project/genomes/Mus_musculus.GRCm38.dna.primary_assembly.fa"
# Region to display: the leptin gene locus on chromosome 6
chrom = "chr6"
start = 29060195
end = 29073877
# Build the genomeview visualization for all datasets over this region
visualization = genomeview.visualize_data(datasets, chrom, start, end, reference)
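The chromosome name in `chrom` has to match the naming used in the BAM files and the reference FASTA. Ensembl FASTA files name chromosomes without a `chr` prefix (`6` rather than `chr6`), while UCSC-style files include it, and a mismatch can lead to an error or an empty plot. Below is a minimal sketch for checking this, assuming the FASTA has a samtools-style `.fai` index next to it; `fasta_names_from_fai` and `match_chrom` are hypothetical helpers.

```python
def fasta_names_from_fai(fai_path):
    """Return the sequence names listed in a samtools-style .fai index
    (the first tab-separated column of each line)."""
    with open(fai_path) as fh:
        return [line.split("\t", 1)[0] for line in fh if line.strip()]


def match_chrom(name, available):
    """Return the entry of `available` naming the same chromosome as `name`,
    allowing for a missing or extra "chr" prefix (e.g. "chr6" vs "6")."""
    available = set(available)
    if name in available:
        return name
    alt = name[3:] if name.startswith("chr") else "chr" + name
    if alt in available:
        return alt
    raise KeyError(f"{name!r} not found (also tried {alt!r})")
```

For example, `match_chrom(chrom, fasta_names_from_fai(reference + ".fai"))` would return the name the reference actually uses. The contig names in a BAM header can be listed with `pysam.AlignmentFile(path).references`, assuming pysam is installed.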

Finally, we can visualize the three BAM files for each of the control and high-fat diet groups. The blue tick marks represent individual RNA-Seq reads that map to this region of the genome (the [leptin gene](https://www.ncbi.nlm.nih.gov/gene/16846)). In the tumor samples from the high-fat diet mice, expression of this gene is clearly much higher. The replicates do not have exactly the same number of mapped reads, but the difference between the groups is clear. Further statistical analysis would allow us to quantify the significance of this difference.
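To put rough numbers on what the plot shows, one could count the reads overlapping the region in each BAM, scale by library size, and compare group means. Below is a minimal sketch, assuming pysam is installed and each BAM has a `.bai` index; the helper names are hypothetical. This is a descriptive fold change, not a significance test; for that, a differential expression tool such as sleuth, which is designed to work with kallisto output, would be the usual next step.

```python
import math


def count_region_reads(bam_path, chrom, start, end):
    """Return (reads overlapping chrom:start-end, total mapped reads) for an
    indexed BAM. Requires pysam and a .bai index next to the BAM file."""
    import pysam  # imported here so the helpers below can be used without pysam

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        # pysam interprets start/end as 0-based, half-open coordinates
        return bam.count(chrom, start, end), bam.mapped


def counts_per_million(region_count, total_mapped):
    """Scale a raw region count by library size (reads per million mapped)."""
    return region_count * 1e6 / total_mapped


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 of mean(treated) / mean(control); the pseudocount avoids log(0)."""
    mean_treated = sum(treated) / len(treated)
    mean_control = sum(control) / len(control)
    return math.log2((mean_treated + pseudocount) / (mean_control + pseudocount))
```

For example, `{label: counts_per_million(*count_region_reads(path, chrom, start, end)) for label, path in datasets.items()}` gives one value per sample; the three control values and the three tumor values can then be passed to `log2_fold_change`.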

In [None]:
visualization