## Visualizing alignments with IGV


### IGV
IGV, or Integrative Genomics Viewer, is a "high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations." It can be downloaded [here](http://software.broadinstitute.org/software/igv/download). I am using version `2.3.97`.



### Objective: 
Visualize the alignment of Alaskan and Korean Pacific cod data to the Atlantic cod genome. I have created two fasta files that contain the consensus sequences for all loci retained after all filtering steps (MAF, missing data, HWE). Each population has its own file; there are `2945` sequences in the Alaskan fasta file (batch 2), and `6637` in the Korean fasta file (batch 6). I want to see how these files align to the Atlantic cod genome relative to each other.
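
The sequence counts quoted above can be double-checked by counting header lines in each fasta file. This is a minimal sketch, assuming Python 3 and plain (uncompressed) fasta files; the function name `count_fasta_records` and the in-memory example are made up for illustration.

```python
import io

def count_fasta_records(handle):
    """Count sequences in a FASTA stream by counting header lines (those starting with '>')."""
    return sum(1 for line in handle if line.startswith(">"))

# Tiny in-memory example so the sketch runs without the real files.
example = io.StringIO(">locus_1\nACGT\n>locus_2\nTTGA\nCCAT\n")
n_records = count_fasta_records(example)
print(n_records)

# With the real data (path as used later in this notebook, relative to the repo root):
# with open("fasta_inputs/KOR_batch6_FinalFiltered.fa") as handle:
#     print(count_fasta_records(handle))
```

If the number printed for a real file differs from the count in the text, the fasta file is probably not the final filtered version.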




### Prep: 

**(1) Download and unzip IGV** - for now, I have the IGV executable in `Downloads` because I'm short on space on my drive. 

**(2) Load in Atlantic cod genome** - I had previously downloaded the Atlantic cod genome from Ensembl (see [this]() notebook). I used `Genomes >> Load Genome from File` on the top toolbar in the IGV GUI to load in the Atlantic cod genome's fasta file. 

![IGV_img]()

*Note that in order to actually see the sequences, you have to select a chromosome or scaffold from the top dropdown menu, and then zoom way in from kilo-basepairs to ~150-200 basepairs.*

![IGV_zoomed_img]()

<br>

<br>

### Step 1: BLAST Alaskan and Korean cod to Atlantic cod genome

While I have done this already in a [prior notebook](), I need a `.gff` file to load into IGV. 


#### ATTEMPT #1
I'm going to use a [shell script]() from [Alvar Almstedt's github page](https://github.com/alvaralmstedt/Tutorials/wiki/How-to-convert-your-BLAST-results-into-a-gff-file.) which runs BLAST and then automatically converts the output to GFF format. 



In [1]:
cd ../scripts

/mnt/hgfs/Pacific cod/DataAnalysis/PCod-Compare-repo/scripts


In [4]:
!head -n 25 blast_to_gff_wrapper.sh

#!/bin/bash

HELP="""
Wrapper for the blast_to_gff.py script. Use this to ensure that your gff file makes sense.
By: Alvar Almstedt (alvar.almstedt@gmail.com)

Usage: blast_to_gff_wrapper.sh -q <query file> -d <database file> -p <blast program>

Options:
	-h	:	Help. What you are reading now.
	-q	:	Query. Put the path to your query fasta here.
	-d	:	Database. Put the path to your blast database here.
	-o	:	Output. Put the name or path and name to your output location here.
	-p	:	Program. Currently only confirmed to work with tblastn 
			but others should work too.
	-t	:	Threads. Number of threads/processors you want the blast analysis to 
			run on. (Default: 1)
	-l	:	Long. Puts additional information like stop, start
			and name of query, frame, bitscore; in the notes field
			of the gff file. 
			This will increase the result file size significantly.
    -k  :   Keep. This will keep the intermediate blast output. Otherwise
            it wil

In [5]:
!head -n 20 blast_to_gff.py

#!/usr/bin/python

from sys import argv
import csv

"""
Converts minimal (3 field or more) tab separated BED/blast result files into minimal (9 field)
tab separated GFF files.
Usage: blast_to_gff.py <BED-infile> <GFF-outfile>
By: Alvar Almstedt
"""

class Table(object):

    def __init__(self, input_file_name, output_file_name):
        self.input_file_name = input_file_name
        self.output_file_name = output_file_name

# This method reads the input blast result. For best results, input the blast flags in the same order as in the
# "fieldnames" list beneath.


In [None]:
./blast_to_gff_wrapper_AAlmstedt.sh -q ../fasta_inputs/KOR_batch6_FinalFiltered.fa \
-d ../ACod_reference \
-p tblastn \
-o ../blast_outputs/KORb6_BLASTto_Acod_forIGV.tblastn \
-l -k

`Warning: [tblastn] lcl|Query_563 2245: Warning: Could not calculate ungapped Karlin-Altschul parameters due to an invalid query sequence or its translation. Please verify the query sequence(s) and/or filtering options 
Warning: [tblastn] lcl|Query_564 2248: Warning: Could not calculate ungapped Karlin-Altschul parameters due to an invalid query sequence or its translation. Please verify the query sequence(s) and/or filtering options 
^C`
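
One possible reading of these warnings (an interpretation, not something confirmed in this run) is that `tblastn` expects protein queries, while the consensus sequences in the fasta file are nucleotide sequences. The sketch below is a rough heuristic for checking which kind of query a fasta file holds before choosing a BLAST program. The function names and the `0.9` threshold are hypothetical choices, and Python 3 is assumed.

```python
import io

NUCLEOTIDE_CODES = set("ACGTUN")

def nucleotide_fraction(handle):
    """Return the fraction of sequence characters that are A, C, G, T, U or N."""
    total = 0
    nucleotide = 0
    for line in handle:
        if line.startswith(">"):
            continue  # skip header lines
        seq = line.strip().upper()
        total += len(seq)
        nucleotide += sum(1 for base in seq if base in NUCLEOTIDE_CODES)
    return nucleotide / total if total else 0.0

def suggest_blast_program(handle, threshold=0.9):
    """Heuristic for a nucleotide database: mostly-ACGTN queries suggest blastn,
    anything else is treated as protein and suggests tblastn."""
    return "blastn" if nucleotide_fraction(handle) >= threshold else "tblastn"

# Made-up sequences for illustration only.
dna_choice = suggest_blast_program(io.StringIO(">locus_1\nACGTTGCA\n"))
protein_choice = suggest_blast_program(io.StringIO(">prot_1\nMKVLAAGIVG\n"))
print(dna_choice, protein_choice)
```

The heuristic can be fooled by short or unusual sequences, so it is only a sanity check.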


#### ATTEMPT #2


I'm going to use the function `mgkit.io.blast.parse_uniprot_blast()`. To do so, I first had to install MGKit on my VM; see the directions [here](http://pythonhosted.org/mgkit/install.html#install-ubuntu).

In [None]:
# install mgkit
sudo apt-get install velvet bowtie2 python-pip python \
  virtualenv python-dev zlib1g-dev libblas-dev \
  liblapack-dev gfortran libfreetype6-dev libpng-dev \
  fontconfig pkg-config

In [None]:
pip install mgkit

The MGKit script needs BLAST's tabular output (the `-outfmt 6` option), so I'm going to re-run BLAST with that option. 

In [11]:
pwd

u'/mnt/hgfs/Pacific cod/DataAnalysis/PCod-Compare-repo/scripts'

In [12]:
cd ../

/mnt/hgfs/Pacific cod/DataAnalysis/PCod-Compare-repo


In [13]:
!blastn -query fasta_inputs/KOR_batch6_FinalFiltered.fa \
-db ACod_reference/Gadus_morhua \
-out blast_outputs/KORb6_BLASTto_Acod_outfmt6.fa \
-outfmt 6
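
With `-outfmt 6` and no extra format specifiers, BLAST+ writes one tab-separated line per hit with twelve default columns. The sketch below reads such a table and keeps the highest-bitscore hit per query. It assumes Python 3 and the default column order; the helper names and the example rows are made up for illustration and are not real results.

```python
import csv
import io

# Default column order for BLAST+ `-outfmt 6`.
BLAST_TAB_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
INT_COLUMNS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_COLUMNS = ("pident", "evalue", "bitscore")

def read_blast_tab(handle):
    """Yield one dict per hit from a default `-outfmt 6` table."""
    reader = csv.DictReader(handle, fieldnames=BLAST_TAB_COLUMNS, delimiter="\t")
    for row in reader:
        for key in INT_COLUMNS:
            row[key] = int(row[key])
        for key in FLOAT_COLUMNS:
            row[key] = float(row[key])
        yield row

def best_hit_per_query(hits):
    """Keep the highest-bitscore hit for each query sequence."""
    best = {}
    for hit in hits:
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best

# Made-up rows for illustration only.
example = io.StringIO(
    "locus_1\tscaffold_A\t98.5\t140\t2\t0\t1\t140\t5000\t5139\t1e-60\t240\n"
    "locus_1\tscaffold_B\t90.0\t100\t10\t0\t1\t100\t900\t801\t1e-30\t120\n"
    "locus_2\tscaffold_A\t95.0\t142\t7\t0\t1\t142\t7300\t7159\t1e-55\t210\n"
)
best = best_hit_per_query(read_blast_tab(example))
for query, hit in sorted(best.items()):
    print(query, hit["sseqid"], hit["sstart"], hit["send"], hit["bitscore"])
```

Note that `sstart` can be larger than `send`; BLAST reports hits on the reverse strand of the subject that way.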

In [20]:
import mgkit as mg

In [29]:
cd scripts

/mnt/hgfs/Pacific cod/DataAnalysis/PCod-Compare-repo/scripts


In [30]:
!head blast2gff.py

"""
Blast output conversion in GFF requires a BLAST+ tabular format which can be
obtained by using the `--outfmt 6` option with the default columns, as
specified in :func:`mgkit.io.blast.parse_blast_tab`. The script can get data
from the standard in and ouputs GFF lines on the standard output by default.

Uniprot
*******

The Function :func:`mgkit.io.blast.parse_uniprot_blast` is used, which filters


In [31]:
!python blast2gff.py blastdb [-v | --quiet] [--cite] [--manual] [--version] [KORb6_BLASTto_Acod_outfmt6.fa] [KORb6_BLASTto_Acod.gff]

/bin/sh: 1: --quiet]: not found
Traceback (most recent call last):
  File "blast2gff.py", line 55, in <module>
    from .. import logger
ValueError: Attempted relative import in non-package
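
Two things went wrong in the cell above. The square brackets and `|` were copied literally from the usage text, so the shell treated `--quiet]` as a separate command. Running `blast2gff.py` directly as a file also means its package-relative import (`from .. import logger`) cannot resolve. As a fallback that avoids both problems, the sketch below converts a default `-outfmt 6` table to GFF3 in plain Python 3. It is not MGKit's implementation: the function names, the `source` and `feature_type` values, and the attribute keys are my own choices, and it assumes query names contain no `;`, `=`, `&` or `,` characters, which GFF3 would require escaping.

```python
import io

def blast_row_to_gff(fields, source="blastn", feature_type="match"):
    """Turn one default `-outfmt 6` row (12 fields) into a 9-column GFF3 line."""
    qseqid, sseqid, pident = fields[0], fields[1], fields[2]
    sstart, send = int(fields[8]), int(fields[9])
    evalue, bitscore = fields[10], fields[11]
    # BLAST reports minus-strand hits with sstart > send; GFF needs start <= end.
    start, end = min(sstart, send), max(sstart, send)
    strand = "+" if sstart <= send else "-"
    attributes = "Name={0};evalue={1};pident={2}".format(qseqid, evalue, pident)
    return "\t".join([sseqid, source, feature_type, str(start), str(end),
                      bitscore, strand, ".", attributes])

def blast_tab_to_gff(in_handle, out_handle):
    """Write a GFF3 header plus one feature per BLAST hit; return the hit count."""
    out_handle.write("##gff-version 3\n")
    n_hits = 0
    for line in in_handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        out_handle.write(blast_row_to_gff(line.split("\t")) + "\n")
        n_hits += 1
    return n_hits

# Made-up rows for illustration only; not real results.
example = io.StringIO(
    "locus_1\tscaffold_A\t98.5\t140\t2\t0\t1\t140\t5000\t5139\t1e-60\t240\n"
    "locus_2\tscaffold_A\t95.0\t142\t7\t0\t1\t142\t7300\t7159\t1e-55\t210\n"
)
out = io.StringIO()
n_written = blast_tab_to_gff(example, out)
gff_text = out.getvalue()
print(gff_text)

# With the real files (names as used in this notebook):
# with open("blast_outputs/KORb6_BLASTto_Acod_outfmt6.fa") as src, \
#         open("blast_outputs/KORb6_BLASTto_Acod.gff", "w") as dst:
#     blast_tab_to_gff(src, dst)
```

The first GFF column is taken from the BLAST subject ID, so for the features to show up in IGV those IDs need to match the sequence names in the loaded Atlantic cod fasta file.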
