# NVIDIA Clara Parabricks on Microsoft Azure 

NVIDIA introduced the Clara Parabricks software suite for analyzing next-generation sequencing (NGS) DNA and RNA data quickly and at low cost. On a single server with 8 GPUs, Clara Parabricks can analyze a 30x whole-genome sequencing (WGS) sample in under 25 minutes, compared with about 30 hours for traditional CPU-based methods. Its output matches that of commonly used tools, which makes the accuracy of the results simple to verify.
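Taking the quoted figures at face value, the implied speedup for a 30x WGS sample works out as follows. This is plain arithmetic on the numbers above, not an independent benchmark:

$$
\text{speedup} \approx \frac{30\ \text{h} \times 60\ \text{min/h}}{25\ \text{min}} = \frac{1800\ \text{min}}{25\ \text{min}} = 72
$$

Because 25 minutes is an upper bound on the GPU run time, roughly 72x is a lower bound on the speedup implied by these two figures, which is consistent with the "order of magnitude" claim below.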

Clara Parabricks provides at least an order-of-magnitude reduction in compute time while generating identical outputs and lowering analysis costs. It is available free of charge on NVIDIA GPU Cloud (NGC) and can be deployed on Azure GPU-based virtual machines (VMs).

Clara Parabricks is optimized for several Microsoft Azure GPU instance types and covers essential bioinformatics needs out of the box. The accelerated tools take FASTQ files as input and cover alignment through variant calling and expression analysis, with QC tools for both kinds of output. Together they support end-to-end germline, somatic, and RNA-Seq workflows, giving most projects the flexibility they need. The tools can also be used individually, as drop-in replacements for steps in existing workflows.

Learn more on the [NVIDIA Clara Parabricks page](https://www.nvidia.com/en-us/clara/genomics).

## Prerequisites for running Clara Parabricks 4.0 on Microsoft Azure

- An Azure subscription with enough Compute-VM (vCPU) quota to create GPU-based VMs (preferably NCas_T4_v3 or ND96asr_v4 from the ND A100 v4 series)
- An NVIDIA driver, version 465.32.* or later
- Any Linux operating system that supports nvidia-docker2 and Docker version 20.10 or later
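Before pulling any images, it can help to confirm the driver and Docker versions on the VM. The sketch below assumes GNU coreutils (`sort -V`) and takes the minimum versions (465.32 and 20.10) from the list above; the function names `version_ge`, `check_driver`, and `check_docker` are hypothetical helpers, not part of Parabricks.

```shell
#!/usr/bin/env bash
# Sketch: check host prerequisites before pulling Clara Parabricks.
# Assumptions: GNU `sort -V` is available; minimum versions come from the
# prerequisite list (driver 465.32, Docker 20.10). Helper names are hypothetical.

# Succeed if dotted version $1 is greater than or equal to version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Compare the driver version reported by nvidia-smi (first GPU) to the minimum.
check_driver() {
  local min="465.32" found
  found=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
  if version_ge "$found" "$min"; then
    echo "OK: NVIDIA driver $found (>= $min)"
  else
    echo "FAIL: NVIDIA driver $found is older than $min" >&2
    return 1
  fi
}

# Compare the Docker server version to the minimum.
check_docker() {
  local min="20.10" found
  found=$(docker version --format '{{.Server.Version}}')
  if version_ge "$found" "$min"; then
    echo "OK: Docker $found (>= $min)"
  else
    echo "FAIL: Docker $found is older than $min" >&2
    return 1
  fi
}
```

After sourcing the block, run `check_driver && check_docker` on the VM. Comparing with `sort -V` keeps the script free of extra dependencies and handles three-part versions such as `470.57.02`.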

To verify that **nvidia-docker2** is installed and working, run this command:

```shell
docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
```

When the image finishes downloading, the command runs `nvidia-smi` inside the container and prints the same GPU table you get when running `nvidia-smi` directly on the host. Next, pull the Clara Parabricks Docker image from NGC (check https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara/containers/clara-parabricks for the latest version):

```shell
docker pull nvcr.io/nvidia/clara/clara-parabricks:4.0.0-1
```
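Because the image tag may change when you check NGC for a newer release, one option is to pin it in a single variable and reuse it in every later `docker run`. This is a sketch; `PB_VERSION`, `PB_IMAGE`, and `pull_parabricks` are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch: pin the Parabricks image tag in one place.
# PB_VERSION, PB_IMAGE and pull_parabricks are hypothetical names.
PB_VERSION="4.0.0-1"   # update after checking the NGC catalog page
PB_IMAGE="nvcr.io/nvidia/clara/clara-parabricks:${PB_VERSION}"

# Pull the image only if it is not already present locally.
pull_parabricks() {
  if docker image inspect "$PB_IMAGE" >/dev/null 2>&1; then
    echo "Image $PB_IMAGE already present"
  else
    docker pull "$PB_IMAGE"
  fi
}
```

With this in place, `pull_parabricks` replaces the pull command, and later commands can use `"$PB_IMAGE"` instead of repeating the full image reference.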

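## Sample run: the `fq2bam` pipeline with Clara Parabricks

The command below is a template: replace `/host/data` and `/host/results` with directories on your VM, and `fastq1.gz`/`fastq2.gz` with your paired-end FASTQ files.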

```shell
docker run \
    --gpus all \
    --rm \
    --volume /host/data:/input_data \
    --volume /host/results:/outputdir \
    --workdir /outputdir \
    nvcr.io/nvidia/clara/clara-parabricks:4.0.0-1 \
    pbrun fq2bam \
    --ref /input_data/Homo_sapiens_assembly38.fasta \
    --in-fq /input_data/fastq1.gz /input_data/fastq2.gz \
    --out-bam /outputdir/fq2bam_output.bam
```
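Every path passed to `pbrun` is resolved inside the container, so it must sit under one of the `--volume HOST:CONTAINER` mount points. A hypothetical helper, `to_container_path`, can translate a host path into the path `pbrun` will see and fail loudly when the file is outside the mounted directory. It is a sketch for catching path mistakes before a long run, not part of Parabricks.

```shell
#!/usr/bin/env bash
# Sketch: translate a host path into the path pbrun sees inside the container.
# `to_container_path` is a hypothetical helper, not part of Parabricks.
# Usage: to_container_path HOST_PATH HOST_MOUNT CONTAINER_MOUNT
to_container_path() {
  local path="$1" host_mount="${2%/}" container_mount="${3%/}"
  case "$path" in
    "$host_mount"/*)
      # Swap the host mount prefix for the container mount point.
      printf '%s\n' "${container_mount}${path#"$host_mount"}"
      ;;
    *)
      echo "error: $path is not under mounted directory $host_mount" >&2
      return 1
      ;;
  esac
}
```

For example, `to_container_path /host/data/fastq1.gz /host/data /input_data` prints `/input_data/fastq1.gz`, which is the form `--in-fq` expects.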

### Download the reference files

```shell
wget -O parabricks_sample.tar.gz https://datasettoaexample.blob.core.windows.net/publicsample/parabricks_sample.tar.gz
tar xzvf parabricks_sample.tar.gz
```
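A quick existence check after extraction avoids discovering a missing reference only after the GPU job starts. This sketch assumes the archive layout implied by the `--ref` and `--knownSites` paths in the germline example (`parabricks_sample/Ref/...`); `check_files` and `required_files` are hypothetical names, and the list should be adjusted if the archive differs.

```shell
#!/usr/bin/env bash
# Sketch: confirm the reference files used later exist and are non-empty.
# The paths mirror the --ref and --knownSites arguments of the germline run;
# check_files and required_files are hypothetical names.
REF_DIR="parabricks_sample/Ref"
required_files=(
  "$REF_DIR/Homo_sapiens_assembly38.fasta"
  "$REF_DIR/Homo_sapiens_assembly38.known_indels.vcf.gz"
)

# Report each missing or empty file; return non-zero if any were found.
check_files() {
  local missing=0 f
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run it as `check_files "${required_files[@]}" && echo "reference files found"` from the directory where the archive was extracted.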

### Download sample paired-end FASTQ data

```shell
wget https://datasettoaexample.blob.core.windows.net/publicsample/HG001.novaseq.pcr-free.30x.R1.fastq.gz
wget https://datasettoaexample.blob.core.windows.net/publicsample/HG001.novaseq.pcr-free.30x.R2.fastq.gz
```
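Large downloads can be truncated silently, so it may be worth testing the gzip streams before launching a long run. `gzip -t` decompresses each file without writing output and reports errors; on 30x FASTQ files this reads the whole file, so expect it to take a while. The helper name `verify_gz` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: verify that downloaded FASTQ files are complete gzip streams.
# `verify_gz` is a hypothetical helper; `gzip -t` tests integrity only.
verify_gz() {
  local status=0 f
  for f in "$@"; do
    if gzip -t "$f" 2>/dev/null; then
      echo "OK: $f"
    else
      echo "CORRUPT or missing: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example: `verify_gz HG001.novaseq.pcr-free.30x.R1.fastq.gz HG001.novaseq.pcr-free.30x.R2.fastq.gz`.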

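### `germline` pipeline submission to Clara Parabricks

The `germline` pipeline runs `fq2bam` (alignment, sorting, duplicate marking, and base quality score recalibration) followed by HaplotypeCaller variant calling, here on the paired-end HG001 FASTQ files downloaded above.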

```shell
sudo time -v docker run --gpus all -v "$(pwd)":/parabricks nvcr.io/nvidia/clara/clara-parabricks:4.0.0-1 pbrun germline \
    --ref /parabricks/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
    --in-fq /parabricks/HG001.novaseq.pcr-free.30x.R1.fastq.gz /parabricks/HG001.novaseq.pcr-free.30x.R2.fastq.gz \
    --knownSites /parabricks/parabricks_sample/Ref/Homo_sapiens_assembly38.known_indels.vcf.gz \
    --out-bam /parabricks/output.bam \
    --out-variants /parabricks/output.vcf \
    --out-recal-file /parabricks/report.txt \
    --run-partition --no-alt-contigs 2>&1 | tee germline_30x_4gpu.txt
```
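Once the run finishes, two quick numbers are useful: how many variant records were called and how long the run took. GNU `time -v` writes an "Elapsed (wall clock) time" line to stderr, which the command above captures in its log via `tee`. The helpers `count_variants` and `wall_clock_time` below are hypothetical, and `count_variants` assumes a plain-text VCF (use `zgrep` for a `.vcf.gz`).

```shell
#!/usr/bin/env bash
# Sketch: summarize a finished germline run. Both helpers are hypothetical.

# Count variant records (non-header lines) in an uncompressed VCF.
count_variants() {
  grep -vc '^#' "$1"
}

# Extract the wall-clock time that GNU `time -v` wrote into a captured log.
wall_clock_time() {
  sed -n 's/.*Elapsed (wall clock) time (h:mm:ss or m:ss): //p' "$1"
}
```

For example, run `count_variants output.vcf` in the host directory mounted at `/parabricks`, and `wall_clock_time` on the log file written by `tee`.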

### Notices

Third-party software notices for NVIDIA Clara Parabricks are listed in the [NVIDIA Clara Parabricks software notices](https://docs.nvidia.com/clara/parabricks/v3.5/text/software_notices.html).

THIS NOTEBOOK PROVIDES SAMPLE CODE FOR EDUCATIONAL PURPOSES ONLY. MICROSOFT DOES NOT CLAIM ANY OWNERSHIP OF THIS CODE OR THESE LIBRARIES. MICROSOFT PROVIDES THIS NOTEBOOK, THE SAMPLE USE OF NVIDIA Clara™ Parabricks® CODE, AND ANY DATA OR OTHER MATERIAL IN THIS NOTEBOOK ON AN “AS IS” BASIS. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, GUARANTEES OR CONDITIONS WITH RESPECT TO YOUR USE OF THIS NOTEBOOK. TO THE EXTENT PERMITTED UNDER YOUR LOCAL LAW, MICROSOFT DISCLAIMS ALL LIABILITY FOR ANY DAMAGES OR LOSSES, INCLUDING DIRECT, CONSEQUENTIAL, SPECIAL, INDIRECT, INCIDENTAL OR PUNITIVE, RESULTING FROM YOUR USE OF THIS NOTEBOOK.

### Support

For questions about this notebook, please email genomics@microsoft.com.

For other questions about NVIDIA Clara Parabricks, visit the [NVIDIA Clara Parabricks developer forum](https://forums.developer.nvidia.com/c/healthcare/parabricks/290).