# Analyzing & Visualizing NextGen Reads with Magic-BLAST

This Jupyter Notebook contains the background and instructions for the hands-on exercises of this workshop:

* [Introduction](#Introduction)
* [Objective 1 - Identify variation in SARS-CoV-2 samples using Magic-BLAST + Sequence Viewer](#Objective-1)
* [Objective 2 - Assess genome completeness and content from SRA read data](#Objective-2)
* [Objective 3 - Align RNAseq data to reference and visualize transcript coverage and diversity](#Objective-3)

# Introduction

**Placeholder Text from Webpage**: Next generation sequencing (next-gen) has become the standard method for obtaining genomic and transcriptomic sequences; however, analyzing the large number of short reads can be complicated and requires specialized software. NCBI has developed a variation of the BLAST algorithm, Magic-BLAST, to align next-gen reads to a reference nucleotide sequence. You can import the read alignment results from Magic-BLAST into one of NCBI’s genome browsers to visualize them alongside your choice of hundreds of other NCBI annotation tracks. Thus, without having to go through a sequence assembly step, you can quickly assess the sequence set for genetic variations and map potential gene annotations.

This workshop is designed for researchers who already work with next-gen data, perform DNAseq or RNAseq experiments, and use command-line tools for bioinformatic analysis.

## **Case Studies**



# Objective 1 - Identify variation in SARS-CoV-2 samples using Magic-BLAST + Sequence Viewer <a class="anchor" id="Objective-1"></a>

## **Objective Goals**

1. Fill 
2. In 
3. Here

# Objective 2 - Assess genome completeness and content from SRA read data <a class="anchor" id="Objective-2"></a>

## **Objective Goals**

1. Fill 
2. In 
3. Here

# Objective 3 - Align RNAseq data to reference and visualize transcript coverage and diversity <a class="anchor" id="Objective-3"></a>

## **Objective Goals**

1. Identify an RNAseq SRA dataset of interest and use Magic-BLAST to align it to a reference.
2. Use the Genome Data Viewer to compare RNAseq alignment to NCBI annotation tracks.
3. Visualize web-BLAST results in GDV to learn about our gene of interest in other species. 

### **Goal 1a: Identify an RNAseq SRA dataset of interest on GEO**

### **Goal 1b: Use Magic-BLAST to align chosen SRAs to reference sequence**

Above, we identified two RNAseq SRA data sets we would like to map to the region of the human genome containing the LMNA gene: SRR7062973 (our control sample), and SRR7062975 (our HGPS sample). 

**Step 1: Download the region of the current human genome assembly containing the LMNA gene**

According to the Gene page for LMNA, the gene occupies base pairs 156082573 to 156140081 on human chromosome 1. Because we also want to learn about the neighboring genes, we will extract a larger region that includes flanking sequence on both sides of LMNA. The following `efetch` command writes this region to a FASTA-formatted file.

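Here is a brief explanation of the parts of this command that you can modify:<br>
`-db`: the Entrez database to fetch from (here `nuccore`, the Nucleotide database)<br>
`-id`: the accession of the record to fetch (here `NC_000001`, human chromosome 1)<br>
`-format`: the output format (here `fasta`)<br>
`-seq_start`: the first base of the region to extract (1-based)<br>
`-seq_stop`: the last base of the region to extract (inclusive)<br>
`> NC_000001_LMNA_Region.fasta`: redirects the output into a file with this name<br>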

In [2]:
efetch -db nuccore -id NC_000001 -format fasta -seq_start 156070000 -seq_stop 156150000 > NC_000001_LMNA_Region.fasta
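Before building a database, it can help to confirm that the download contains what we asked for. The sketch below is one possible check, not part of the original workshop: `fasta_summary` and `to_local` are hypothetical helper names introduced here. It counts the records and bases in the FASTA file and converts the chromosome coordinates of LMNA into positions within the extracted region, on the assumption that downstream alignments are reported relative to the extracted sequence rather than the full chromosome.

```shell
# Hypothetical helper: print "<record count><TAB><total bases>" for a FASTA file.
fasta_summary() {
    awk '/^>/ { n++; next }
         { gsub(/[[:space:]]/, ""); bases += length($0) }
         END { printf "%d\t%d\n", n, bases }' "$1"
}

# Coordinates used in the efetch command above (both ends are inclusive).
region_start=156070000
region_stop=156150000
expected=$((region_stop - region_start + 1))
echo "expected bases: $expected"

# Hypothetical helper: convert a chromosome coordinate to a 1-based
# position within the extracted region.
to_local() {
    echo $(( $1 - region_start + 1 ))
}

# LMNA boundaries as quoted from the Gene page in the text above.
echo "LMNA spans local positions $(to_local 156082573) to $(to_local 156140081)"

# Summarize the downloaded file only if it is present.
if [ -f NC_000001_LMNA_Region.fasta ]; then
    fasta_summary NC_000001_LMNA_Region.fasta
fi
```

If the download succeeded, the summary should report one record whose base count matches the expected value printed first.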

**Step 2: Create LMNA region BLAST database**

Now that we have the region downloaded in FASTA format, we can use the BLAST utility `makeblastdb` to create a BLAST-formatted database from the file, using the following command: 

In [3]:
makeblastdb -in NC_000001_LMNA_Region.fasta -out Human_Chr1_Region -parse_seqids -dbtype nucl



Building a new DB, current time: 11/07/2022 16:12:19
New DB name:   /home/jupyter-sally.chang/Human_Chr1_Region
New DB title:  NC_000001_region.fa
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 3000000000B
Adding sequences from FASTA; added 1 sequences in 0.101484 seconds.
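With the database built, a natural next step is to confirm that it is readable and then align the two runs against it. The following is a minimal sketch under stated assumptions rather than the workshop's own command: the helper names `blast_db_present` and `sam_name`, the output file names, and the choice of four threads are all illustrative. It also assumes `blastdbcmd` and `magicblast` are on your `PATH` and that you have network access, because `-sra` streams reads directly from SRA.

```shell
db=Human_Chr1_Region

# Hypothetical helper: true if a nucleotide BLAST database with this
# base name exists (single-volume or multi-volume).
blast_db_present() {
    [ -f "$1.nsq" ] || [ -f "$1.00.nsq" ]
}

# Hypothetical helper: name of the SAM file written for a given SRA run.
sam_name() {
    printf '%s_vs_%s.sam\n' "$1" "$db"
}

if command -v blastdbcmd >/dev/null 2>&1 && blast_db_present "$db"; then
    # Print database metadata, then each sequence accession and its length.
    blastdbcmd -db "$db" -info
    blastdbcmd -db "$db" -entry all -outfmt "%a %l"
else
    echo "blastdbcmd or the $db database was not found; run makeblastdb first"
fi

# Align the control (SRR7062973) and HGPS (SRR7062975) runs.
for run in SRR7062973 SRR7062975; do
    out=$(sam_name "$run")
    if command -v magicblast >/dev/null 2>&1 && blast_db_present "$db"; then
        # -no_unaligned omits reads that do not align to the region, an
        # optional choice here because the reference is a small part of
        # the genome. -num_threads 4 is an arbitrary value.
        magicblast -sra "$run" -db "$db" -num_threads 4 -no_unaligned -out "$out"
    else
        echo "skipping $run; would write $out"
    fi
done
```

Magic-BLAST writes SAM by default, so each run produces one SAM file in the current directory that later steps can convert for visualization.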




### **Goal 1c: Prepare Magic-BLAST results for visualization**