# Sensitivity/Specificity analysis demo

First, we set the variables used throughout this demo: the project ID, the S3 profile, and the path on the VM where data files from the S3 bucket are saved:

In [None]:
from cdispyutils.hmac4 import get_auth
import json
import requests
project = 'bpa-ThermoFisher_P0001_T1'
profile = 'bloodpac'
path    = 'files/'

### 1. Querying data from GraphQL

The following examples query the API directly to retrieve metadata. First, we load the credentials and build the authentication object:

In [None]:
with open('/home/ubuntu/.secrets','r') as f:
    secrets = json.load(f)
auth = get_auth(secrets['access_key'], secrets['secret_key'], 'submission')

**A) Query general API to get projects included in BloodPAC:**

In [None]:
print(requests.get('http://kubenode.internal.io:30004/v0/submission/bpa/', auth=auth).text)

**B) Example GraphQL query:** List submitted aligned read files (BAMs) for a specific project:

In [None]:
data = {'query': """{sample (project_id: "bpa-ThermoFisher_P0001_T1") {   
                                 _aliquots_count 
                                 aliquots {  
                                     analytes{
                                         _read_groups_count 
                                         read_groups {
                                            _submitted_aligned_reads_files_count
                                            submitted_aligned_reads_files {
                                                file_name
                                            }
                                         } 
                                     }
                                }
                            }
                    } """}

print(requests.post('http://kubenode.internal.io:30006/v0/submission/graphql/', auth=auth, json=data).text)
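
The response mirrors the nesting of the query, so the file names sit several levels deep. The sketch below shows one way to flatten them into a list. It assumes the response follows the usual GraphQL envelope with a top-level `data` key; `collect_file_names` and the `example_response` content are hypothetical and are not output from the BloodPAC API.

```python
def collect_file_names(response):
    """Walk sample -> aliquots -> analytes -> read_groups -> files
    and return every file_name found (hypothetical helper)."""
    names = []
    # 'data' can be missing or null when the server reports errors.
    samples = (response.get('data') or {}).get('sample') or []
    for sample in samples:
        for aliquot in sample.get('aliquots') or []:
            for analyte in aliquot.get('analytes') or []:
                for read_group in analyte.get('read_groups') or []:
                    files = read_group.get('submitted_aligned_reads_files') or []
                    for entry in files:
                        names.append(entry['file_name'])
    return names


# Placeholder response with the same shape as the query above.
example_response = {
    'data': {
        'sample': [{
            '_aliquots_count': 1,
            'aliquots': [{
                'analytes': [{
                    '_read_groups_count': 1,
                    'read_groups': [{
                        '_submitted_aligned_reads_files_count': 2,
                        'submitted_aligned_reads_files': [
                            {'file_name': 'example_1.bam'},
                            {'file_name': 'example_2.bam'},
                        ],
                    }],
                }],
            }],
        }],
    },
}

print(collect_file_names(example_response))
```

With a live request, the parsed dictionary can be obtained by calling `.json()` on the `requests` response instead of reading `.text`.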

### 2. Exploring metadata through a high level Python library

Import the high-level Python library, which provides functions for exploring metadata and running data analyses, and register our credentials file with it.

In [None]:
import bpa_analysis_functions_v2 as bp
bp.add_keys('/home/ubuntu/.secrets')

We start with functions for exploring the project's metadata:

**A) List samples for this project:**

In [None]:
samples = bp.list_samples(project)
bp.arrayTable(samples)

**B) List files in one project by type:** 

In [None]:
vcf_files = bp.list_files_by_type(project,'VCF')
bp.SummaryTable(vcf_files)

**C) Get summary metrics for one field in a specific node:**

Get the expected mutations from the `contrived_expectation` metadata and group them by gene:

In [None]:
summary = bp.query_summary_field("contrived_expectation", "expected_mutation_gene")
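
To illustrate what a per-field summary can look like, the sketch below tallies how often each value of one field occurs in a list of records. This is not the implementation of `bp.query_summary_field`, whose internals are not shown here; `summarize_field` and the placeholder records are hypothetical.

```python
from collections import Counter


def summarize_field(records, field):
    """Count occurrences of each non-missing value of `field`
    (hypothetical helper, independent of the bp library)."""
    return Counter(
        record[field] for record in records
        if record.get(field) is not None
    )


# Placeholder records; the gene labels are made up for illustration.
records = [
    {'expected_mutation_gene': 'GENE_A'},
    {'expected_mutation_gene': 'GENE_B'},
    {'expected_mutation_gene': 'GENE_A'},
    {'expected_mutation_gene': None},
]

counts = summarize_field(records, 'expected_mutation_gene')
for gene, n in sorted(counts.items()):
    print(gene, n)
```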

### 3. Data handling and analysis

Next, we use additional functions to analyze one BloodPAC project that contains the expected mutations for contrived samples and the resulting VCF files.

**A) Transfer files from S3 bucket to the VPC virtual machine:**

In [None]:
bp.get_files_from_bucket(project, profile, path, '*.vcf')

**B) Calculate quality metrics (sensitivity/specificity):**

In [None]:
sample = samples[1]
vcf_file = vcf_files[sample][8]
metrics = bp.calculate_metrics_vcf(project, path, vcf_file)
metrics
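
For reference, sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP). The sketch below applies these standard definitions to sets of expected and called variants. How `bp.calculate_metrics_vcf` defines its negatives is not shown in this demo, so the `assessed` set of evaluated sites is an assumption, and the function names and site labels are hypothetical.

```python
def confusion_counts(expected, called, assessed):
    """Return (tp, fp, tn, fn) for sets of variant identifiers.
    `assessed` is the assumed set of all evaluated sites."""
    tp = len(expected & called)             # expected and detected
    fn = len(expected - called)             # expected but missed
    fp = len(called - expected)             # detected but not expected
    tn = len(assessed - expected - called)  # neither expected nor detected
    return tp, fp, tn, fn


def sensitivity_specificity(tp, fp, tn, fn):
    """Standard definitions; returns NaN when a denominator is zero."""
    sensitivity = tp / float(tp + fn) if (tp + fn) else float('nan')
    specificity = tn / float(tn + fp) if (tn + fp) else float('nan')
    return sensitivity, specificity


# Illustrative placeholders only, not results from this project.
assessed = {'site%d' % i for i in range(1, 11)}
expected = {'site1', 'site2', 'site3', 'site4'}
called = {'site1', 'site2', 'site3', 'site9'}

tp, fp, tn, fn = confusion_counts(expected, called, assessed)
sens, spec = sensitivity_specificity(tp, fp, tn, fn)
print(tp, fp, tn, fn)
print(sens, spec)
```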

Calculate sensitivity and specificity values for all VCFs in the project:

In [None]:
table_metrics, data_metrics = bp.calculate_metrics_all_vcf(project, path, vcf_files, samples[0])
table_metrics

Remove potential germline mutations using a baseline VCF:

In [None]:
baseline_vcf = 'TFS.HZ.0.0.perc_20ng_TSVC_IonCodeTag_0101.vcf'
table_filter_metrics, data_filter_metrics = bp.calculate_metrics_all_vcf(project, path, vcf_files, samples[0], baseline_vcf)
table_filter_metrics
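
The idea behind baseline filtering can be sketched as a set difference: any variant that also appears in the baseline sample is treated as a potential germline call and dropped. This is a minimal illustration, not the logic inside `bp.calculate_metrics_all_vcf`. It assumes variants are matched on chromosome, position, reference allele and alternate allele, read from the standard VCF columns; `variant_keys`, `remove_baseline` and the example lines are hypothetical.

```python
def variant_keys(vcf_lines):
    """Return a set of (chrom, pos, ref, alt) keys from VCF text lines."""
    keys = set()
    for line in vcf_lines:
        # Skip blank lines and header lines (## metadata and #CHROM).
        if not line.strip() or line.startswith('#'):
            continue
        fields = line.rstrip('\n').split('\t')
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        # A multi-allelic record lists its ALT alleles comma-separated.
        for alt in alts.split(','):
            keys.add((chrom, pos, ref, alt))
    return keys


def remove_baseline(sample_keys, baseline_keys):
    """Keep only the variants that are absent from the baseline."""
    return sample_keys - baseline_keys


# Made-up records for illustration only.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
sample_lines = [
    "##fileformat=VCFv4.1\n",
    header,
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\n",
    "chr2\t200\t.\tC\tT,G\t.\tPASS\t.\n",
]
baseline_lines = [
    header,
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\n",
]

filtered = remove_baseline(variant_keys(sample_lines),
                           variant_keys(baseline_lines))
print(sorted(filtered))
```

Matching on all four fields keeps a sample variant when the baseline carries a different allele at the same position; a stricter filter could match on position alone.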

Finally, we visualize some of the results as bar plots:

In [None]:
bp.plot_metrics(data_metrics, data_filter_metrics)