## Install Software
Optionally, create or activate a virtual environment first. The following command installs cwltool into whichever environment is currently active.

In [None]:
!pip install cwltool
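To confirm the install landed in the environment the notebook is actually using, a quick check along these lines may help. This is a minimal sketch: the helper name `cwltool_version` is made up for illustration, and it only assumes that the `cwltool` executable ends up on `PATH` and accepts `--version`.

```python
import shutil
import subprocess


def cwltool_version():
    """Return the output of `cwltool --version`, or None if cwltool is not on PATH."""
    exe = shutil.which("cwltool")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    # Some tools print their version to stderr, so fall back to it.
    return (result.stdout or result.stderr).strip()


print(cwltool_version() or "cwltool not found; check which environment pip installed into")
```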

## Download Input Files

In [None]:
!wget --no-clobber https://storage.googleapis.com/analysis-workflows-example-data/somatic_inputs/hla_and_brca_genes.fa \
https://storage.googleapis.com/analysis-workflows-example-data/somatic_inputs/hla_and_brca_genes.fa.amb \
https://storage.googleapis.com/analysis-workflows-example-data/somatic_inputs/hla_and_brca_genes.fa.ann \
https://storage.googleapis.com/analysis-workflows-example-data/somatic_inputs/hla_and_brca_genes.fa.bwt \
https://storage.googleapis.com/analysis-workflows-example-data/somatic_inputs/hla_and_brca_genes.fa.fai \
https://storage.googleapis.com/analysis-workflows-example-data/somatic_inputs/hla_and_brca_genes.fa.pac \
https://storage.googleapis.com/analysis-workflows-example-data/somatic_inputs/hla_and_brca_genes.fa.sa \
https://storage.googleapis.com/analysis-workflows-example-data/somatic_inputs/hla_and_brca_genes.dict \
https://storage.googleapis.com/analysis-workflows-example-data/somatic_inputs/hla_and_brca_genes_bait.interval_list \
https://storage.googleapis.com/analysis-workflows-example-data/somatic_inputs/hla_and_brca_genes_target.interval_list \
https://storage.googleapis.com/analysis-workflows-example-data/somatic_inputs/hla_and_brca_genes_known_indels.vcf.gz \
https://storage.googleapis.com/analysis-workflows-example-data/somatic_inputs/hla_and_brca_genes_known_indels.vcf.gz.tbi \
https://storage.googleapis.com/analysis-workflows-example-data/somatic_inputs/hla_and_brca_genes_mills.vcf.gz \
https://storage.googleapis.com/analysis-workflows-example-data/somatic_inputs/hla_and_brca_genes_mills.vcf.gz.tbi \
https://storage.googleapis.com/analysis-workflows-example-data/somatic_inputs/hla_and_brca_genes_dbsnp.vcf.gz \
https://storage.googleapis.com/analysis-workflows-example-data/somatic_inputs/hla_and_brca_genes_dbsnp.vcf.gz.tbi \
https://storage.googleapis.com/analysis-workflows-example-data/somatic_inputs/hla_and_brca_genes_omni.vcf.gz \
https://storage.googleapis.com/analysis-workflows-example-data/somatic_inputs/hla_and_brca_genes_omni.vcf.gz.tbi \
https://storage.googleapis.com/analysis-workflows-example-data/unaligned_subset_bams/normal/2895499331.bam \
https://storage.googleapis.com/analysis-workflows-example-data/unaligned_subset_bams/normal/2895499399.bam \
https://storage.googleapis.com/analysis-workflows-example-data/unaligned_subset_bams/tumor/2895499223.bam \
https://storage.googleapis.com/analysis-workflows-example-data/unaligned_subset_bams/tumor/2895499237.bam
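Because `--no-clobber` skips any file that already exists, an interrupted download can go unnoticed on a rerun. The sketch below lists which of the 22 files from the command above are absent or zero-length in the working directory. The names `EXPECTED_FILES` and `missing_files` are illustrative, and the check does not detect partially downloaded files that are non-empty.

```python
from pathlib import Path

PREFIX = "hla_and_brca_genes"

# File names taken from the wget command above (22 in total).
EXPECTED_FILES = (
    [f"{PREFIX}.fa{ext}" for ext in ("", ".amb", ".ann", ".bwt", ".fai", ".pac", ".sa")]
    + [f"{PREFIX}.dict"]
    + [f"{PREFIX}_{kind}.interval_list" for kind in ("bait", "target")]
    + [
        f"{PREFIX}_{name}.vcf.gz{ext}"
        for name in ("known_indels", "mills", "dbsnp", "omni")
        for ext in ("", ".tbi")
    ]
    + [f"{run}.bam" for run in ("2895499331", "2895499399", "2895499223", "2895499237")]
)


def missing_files(directory="."):
    """Return the expected file names that are absent or empty in `directory`."""
    root = Path(directory)
    return [
        name
        for name in EXPECTED_FILES
        if not (root / name).is_file() or (root / name).stat().st_size == 0
    ]


missing = missing_files()
print("All input files present." if not missing else f"Missing or empty: {missing}")
```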


## CWL Workflow

Clone the analysis-workflows repository, which contains the CWL workflow definitions used below.

In [None]:
!git clone https://github.com/genome/analysis-workflows.git
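Before investing time in a full run, it can be worth asking cwltool to parse the workflow definition without executing it. The sketch below uses cwltool's `--validate` option on the alignment pipeline that is run later in this notebook; the helper name `validate_command` is illustrative, and the call is skipped if the clone or the install is not in place.

```python
import shutil
import subprocess
from pathlib import Path

# Path of the pipeline inside the cloned repository, as used later in this notebook.
WORKFLOW = Path("analysis-workflows/definitions/pipelines/alignment_exome.cwl")


def validate_command(workflow=WORKFLOW):
    """Build the argument list for a validate-only cwltool invocation."""
    return ["cwltool", "--validate", str(workflow)]


if WORKFLOW.exists() and shutil.which("cwltool"):
    result = subprocess.run(validate_command(), capture_output=True, text=True)
    print("Workflow is valid." if result.returncode == 0 else result.stderr)
else:
    print("Clone the repository and install cwltool before validating.")
```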

## YAML Inputs

YAML files declare all of the input files, variables, and parameters for the CWL workflow we intend to run.

For each sample (tumor and normal), create a YAML file using the text editor of your choice, e.g. vim.

<details>
<summary>Click for normal_inputs.yaml</summary>

```yaml
---

bait_intervals:
  class: File
  path: hla_and_brca_genes_bait.interval_list

target_intervals:
  class: File
  path: hla_and_brca_genes_target.interval_list

sequence:
  - sequence:
      bam:
        class: File
        path: 2895499331.bam
    readgroup: "@RG\tID:2895499331\tPU:H7HY2CCXX.3\tSM:H_NJ-HCC1395-HCC1395_BL\tLB:H_NJ-HCC1395-HCC1395_BL-lg21-lib1\tPL:Illumina\tCN:WUGSC"
  - sequence:
      bam:
        class: File
        path: 2895499399.bam
    readgroup: "@RG\tID:2895499399\tPU:H7HY2CCXX.4\tSM:H_NJ-HCC1395-HCC1395_BL\tLB:H_NJ-HCC1395-HCC1395_BL-lg21-lib1\tPL:Illumina\tCN:WUGSC"

bqsr_known_sites:
- class: File
  path: hla_and_brca_genes_known_indels.vcf.gz
- class: File
  path: hla_and_brca_genes_mills.vcf.gz
- class: File
  path: hla_and_brca_genes_dbsnp.vcf.gz

omni_vcf:
  class: File
  path: hla_and_brca_genes_omni.vcf.gz

picard_metric_accumulation_level: LIBRARY

reference:
  class: File
  path: hla_and_brca_genes.fa

bqsr_intervals:
- chr6
- chr17

per_base_intervals:
- file:
    class: File
    path: hla_and_brca_genes_target.interval_list
  label: clinvar

per_target_intervals:
- file:
    class: File
    path: hla_and_brca_genes_target.interval_list
  label: acmg_genes

summary_intervals: []
```
 
</details>   

<details>
<summary>Click for tumor_inputs.yaml</summary>

```yaml
---

bait_intervals:
  class: File
  path: hla_and_brca_genes_bait.interval_list

target_intervals:
  class: File
  path: hla_and_brca_genes_target.interval_list

sequence:
  - sequence:
      bam:
        class: File
        path: 2895499223.bam
    readgroup: "@RG\tID:2895499223\tPU:H7HY2CCXX.3\tSM:H_NJ-HCC1395-HCC1395\tLB:H_NJ-HCC1395-HCC1395-lg24-lib1\tPL:Illumina\tCN:WUGSC"
  - sequence:
      bam:
        class: File
        path: 2895499237.bam
    readgroup: "@RG\tID:2895499237\tPU:H7HY2CCXX.4\tSM:H_NJ-HCC1395-HCC1395\tLB:H_NJ-HCC1395-HCC1395-lg24-lib1\tPL:Illumina\tCN:WUGSC"

bqsr_known_sites:
- class: File
  path: hla_and_brca_genes_known_indels.vcf.gz
- class: File
  path: hla_and_brca_genes_mills.vcf.gz
- class: File
  path: hla_and_brca_genes_dbsnp.vcf.gz

omni_vcf:
  class: File
  path: hla_and_brca_genes_omni.vcf.gz

picard_metric_accumulation_level: LIBRARY

reference:
  class: File
  path: hla_and_brca_genes.fa

bqsr_intervals:
- chr6
- chr17

per_base_intervals:
- file:
    class: File
    path: hla_and_brca_genes_target.interval_list
  label: clinvar

per_target_intervals:
- file:
    class: File
    path: hla_and_brca_genes_target.interval_list
  label: acmg_genes

summary_intervals: []
```
 
</details>   
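
A mistyped `path:` entry only surfaces once cwltool starts, so a quick pre-flight check can save a failed run. The following is a rough sketch, not a YAML parser: it scans for `path:` lines with a regular expression and resolves each value relative to the YAML file's own directory, which is assumed to be how the relative paths above are interpreted. The helper name `unresolved_paths` is made up for illustration. It does not check index files such as `.tbi` or `.fai`, which are not listed in the YAML and are presumably picked up alongside the listed files.

```python
import re
from pathlib import Path

# Matches lines such as "  path: some_file.bam" (optionally starting a list item).
PATH_LINE = re.compile(r"^\s*(?:-\s*)?path:\s*(\S+)\s*$")


def unresolved_paths(yaml_file):
    """Return the distinct `path:` values in `yaml_file` that do not exist on disk."""
    yaml_file = Path(yaml_file)
    missing = {}  # dict used as an ordered set
    for line in yaml_file.read_text().splitlines():
        match = PATH_LINE.match(line)
        if match:
            candidate = match.group(1).strip("'\"")
            if not (yaml_file.parent / candidate).exists():
                missing[candidate] = None
    return list(missing)


for name in ("normal_inputs.yaml", "tumor_inputs.yaml"):
    if Path(name).exists():
        print(name, "->", unresolved_paths(name) or "all paths resolve")
    else:
        print(name, "has not been created yet")
```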

# Execute Alignment Workflow

## Normal Alignment

In [None]:
!cwltool --outdir normal analysis-workflows/definitions/pipelines/alignment_exome.cwl normal_inputs.yaml

## Tumor Alignment

In [None]:
!cwltool --outdir tumor analysis-workflows/definitions/pipelines/alignment_exome.cwl tumor_inputs.yaml

Load tumor/final.bam and normal/final.bam in IGV and navigate to a position of interest, for example chr17:7,675,089.
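
Before opening IGV, it may be useful to confirm that both runs actually produced their aligned BAM. A small sketch, assuming the output directories `normal` and `tumor` passed to `--outdir` above and the `final.bam` file name mentioned in the text; `alignment_outputs` is a hypothetical helper name.

```python
from pathlib import Path


def alignment_outputs(root=".", outdirs=("normal", "tumor"), name="final.bam"):
    """Map each output directory to whether it contains the aligned BAM."""
    return {d: (Path(root) / d / name).is_file() for d in outdirs}


for outdir, present in alignment_outputs().items():
    print(f"{outdir}/final.bam:", "found" if present else "not found")
```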