# Use Microsoft Azure Genomics Data Lake with IGV-Jupyter extension

Jupyter Notebook is a useful tool for data scientists working on genomics data analysis. This notebook demonstrates how to use the Integrative Genomics Viewer Jupyter extension (igv-jupyter) with files from the Microsoft Azure Genomics Data Lake.

**This notebook covers:**

1. Downloading the data from Azure Genomics Data Lake
2. Cloning the igv-jupyter extension repo
3. Installing the igv-jupyter extension and running sample submissions

**Dependencies:**

This notebook requires the following libraries:

- Azure Storage SDK: `pip install azure-storage-blob==2.1.0`. See [this page](https://github.com/Azure/azure-storage-python/wiki) for frequently encountered problems with this SDK.

- IGV: Integrative Genomics Viewer Jupyter Extension (*the sample code is adapted from the igv-jupyter sample notebooks: https://github.com/igvteam/igv-jupyter, https://pypi.org/project/igv-jupyter/*)

- Technical note: [Explore Azure Genomics Data Lake with Azure Storage Explorer](https://github.com/microsoft/genomicsnotebook/blob/main/docs/Genomics_Data_Lake_Azure_Storage_Explorer.pdf)

- Requirements:

    `python >= 3.6.4`

    `jupyterlab >= 3.0`

**Important: Run this notebook on JupyterLab 3.0 or higher. JupyterLab can be installed with the `pip install jupyterlab==3.0` command.**
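The requirements above can be checked from inside the notebook before running anything else. The following is a minimal sketch, not part of the original notebook: the helper names `parse_version`, `meets_minimum`, and `installed_version` are hypothetical, and it assumes Python 3.8 or newer, because it relies on the standard-library `importlib.metadata` module.

```python
import re
import sys
from importlib import metadata


def parse_version(text):
    """Return the leading numeric part of a version string as a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no numeric version found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(found, minimum):
    """Compare two version tuples, padding the shorter one with zeros."""
    width = max(len(found), len(minimum))
    padded_found = tuple(found) + (0,) * (width - len(found))
    padded_minimum = tuple(minimum) + (0,) * (width - len(minimum))
    return padded_found >= padded_minimum


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return parse_version(metadata.version(package))
    except metadata.PackageNotFoundError:
        return None


# Minimum versions taken from the requirements list above.
python_ok = meets_minimum(sys.version_info[:3], (3, 6, 4))
lab_version = installed_version("jupyterlab")
lab_ok = lab_version is not None and meets_minimum(lab_version, (3, 0))
print("python ok:", python_ok, "| jupyterlab ok:", lab_ok)
```

If `lab_ok` is `False`, JupyterLab is either missing from the current environment or older than 3.0.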


## 1. Download sample VCF data from Broad Institute's GATK Test Data on Azure Genomics Data Lake

Several public genomics datasets have been uploaded as Azure Open Datasets [here](https://azure.microsoft.com/services/open-datasets/catalog/). We create a blob service client linked to one of these open datasets. Then, users can open the downloaded files in the IGV browser from the Jupyter environment. We recommend Azure Machine Learning Studio as the JupyterLab environment.

**1.a. Install the Azure Blob Storage SDK**

In [None]:
%pip install azure-storage-blob==2.1.0

**1.b. Download the sample VCF and .tbi files from Microsoft Genomics Data Lake**

In [None]:
from azure.storage.blob import BlockBlobService

# Connect to the public GATK test data storage account with its SAS token
blob_service_client = BlockBlobService(account_name='datasetgatktestdata', sas_token='sv=2020-04-08&si=prod&sr=c&sig=fzLts1Q2vKjuvR7g50vE4HteEHBxTcJbNvf%2FZCeDMO4%3D')
# Download the compressed VCF file into the current working directory
blob_service_client.get_blob_to_path('dataset/1kgp/downsampled_vcf_hg38', '1kgp-50-exomes.vcf.gz', './1kgp-50-exomes.vcf.gz')
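For reference, the same blob can be addressed with a plain HTTPS URL, which helps when checking the path in a browser or another HTTP client. The sketch below is an illustration only: `blob_url` is a hypothetical helper, and it assumes the default public Azure endpoint suffix `blob.core.windows.net`. The token in the usage example is a made-up placeholder, not the real one from the cell above.

```python
from urllib.parse import quote


def blob_url(account_name, container_path, blob_name, sas_token):
    """Build an HTTPS URL for a blob in an Azure Storage account.

    The SAS token is appended unchanged because it is already URL-encoded.
    """
    path = "/".join(part.strip("/") for part in (container_path, blob_name))
    return (
        f"https://{account_name}.blob.core.windows.net/"
        f"{quote(path)}?{sas_token}"
    )


# Placeholder token for illustration; substitute the SAS token used above.
example = blob_url(
    "datasetgatktestdata",
    "dataset/1kgp/downsampled_vcf_hg38",
    "1kgp-50-exomes.vcf.gz",
    "sv=PLACEHOLDER&sig=PLACEHOLDER",
)
print(example)
```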

In [None]:
from azure.storage.blob import BlockBlobService

# Connect to the public GATK test data storage account with its SAS token
blob_service_client = BlockBlobService(account_name='datasetgatktestdata', sas_token='sv=2020-04-08&si=prod&sr=c&sig=fzLts1Q2vKjuvR7g50vE4HteEHBxTcJbNvf%2FZCeDMO4%3D')
# Download the tabix index (.tbi) that belongs to the VCF file
blob_service_client.get_blob_to_path('dataset/1kgp/downsampled_vcf_hg38', '1kgp-50-exomes.vcf.gz.tbi', './1kgp-50-exomes.vcf.gz.tbi')
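Before loading the files into IGV, it can be worth confirming that both downloads completed. The following is a minimal sketch under stated assumptions: `check_vcf_pair` is a hypothetical helper, it only checks that both files exist and are non-empty, and it checks that the compressed VCF begins with the `##fileformat=VCF` header line that VCF files start with. It does not validate the index contents.

```python
import gzip
import os


def check_vcf_pair(vcf_path, index_path=None):
    """Return a list of problems found with a compressed VCF and its index."""
    index_path = index_path or vcf_path + ".tbi"
    problems = []
    for path in (vcf_path, index_path):
        if not os.path.isfile(path):
            problems.append(f"missing file: {path}")
        elif os.path.getsize(path) == 0:
            problems.append(f"empty file: {path}")
    if problems:
        return problems
    try:
        # bgzip output is readable as ordinary gzip, so the gzip module suffices.
        with gzip.open(vcf_path, "rb") as handle:
            first_line = handle.readline()
    except OSError:
        return [f"not a gzip-compressed file: {vcf_path}"]
    if not first_line.startswith(b"##fileformat=VCF"):
        problems.append(f"unexpected first line in {vcf_path}")
    return problems


# An empty list means both files look usable.
print(check_vcf_pair("./1kgp-50-exomes.vcf.gz"))
```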

## 2. igv-jupyter extension: sample submissions

In [None]:
%pip install igv-jupyter

In [None]:
import igv_notebook

igv_notebook.init()

b = igv_notebook.Browser(
    {
        "genome": "hg38",
        "locus": "chr22",
        "tracks": [
            {
                "url": "1kgp-50-exomes.vcf.gz",
                "indexURL": "1kgp-50-exomes.vcf.gz.tbi",
                "name": "Color by table, SVTYPE",
                "visibilityWindow": -1,
                "colorBy": "SVTYPE",
                "colorTable": {
                    "DEL": "#ff2101",
                    "INS": "#001888",
                    "DUP": "#028401",
                    "INV": "#008688",
                    "CNV": "#8931ff",
                    "BND": "#891100",
                    "*": "#002eff"
                }
            }]
    })
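When several VCF files need to be displayed, the configuration dictionary above can be assembled programmatically. The sketch below is illustrative only: `vcf_track` and `browser_config` are hypothetical helpers, the track name is made up, and the keys are the ones already used in the cell above. It assumes the index sits next to the VCF with a `.tbi` suffix unless `index_url` is given.

```python
def vcf_track(url, name, index_url=None, **options):
    """Build a VCF track dictionary using the keys from the cell above."""
    track = {
        "url": url,
        "indexURL": index_url or url + ".tbi",
        "name": name,
    }
    # Extra keyword arguments (for example visibilityWindow) are passed through.
    track.update(options)
    return track


def browser_config(genome, locus, tracks):
    """Combine a genome, a locus, and a list of tracks into one configuration."""
    return {"genome": genome, "locus": locus, "tracks": list(tracks)}


config = browser_config(
    "hg38",
    "chr22",
    [vcf_track("1kgp-50-exomes.vcf.gz", "1kgp 50 exomes", visibilityWindow=-1)],
)
print(config)
```

The resulting `config` dictionary has the same shape as the literal passed to `igv_notebook.Browser` above, so it could be passed to that constructor in the same way.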

# References

1. IGV-Jupyter: https://github.com/igvteam/igv-jupyter
2. IGV-Jupyter project: https://pypi.org/project/igv-jupyter/
3. 1000 Genomes Project: https://www.internationalgenome.org/

## Notices

THIS NOTEBOOK CONTAINS ONLY SAMPLE CODE. MICROSOFT DOES NOT CLAIM ANY OWNERSHIP OF THIS CODE OR THESE LIBRARIES. MICROSOFT PROVIDES THIS NOTEBOOK, THE SAMPLE USE OF igv-jupyter LIBRARIES, AND ANY DATA OR MATERIAL IN THIS NOTEBOOK ON AN “AS IS” BASIS. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, GUARANTEES OR CONDITIONS WITH RESPECT TO YOUR USE OF THIS NOTEBOOK. TO THE EXTENT PERMITTED UNDER YOUR LOCAL LAW, MICROSOFT DISCLAIMS ALL LIABILITY FOR ANY DAMAGES OR LOSSES, INCLUDING DIRECT, CONSEQUENTIAL, SPECIAL, INDIRECT, INCIDENTAL OR PUNITIVE, RESULTING FROM YOUR USE OF THIS NOTEBOOK.