# Sensitivity/Specificity analysis demo

First, we configure the variables used throughout this demo: the project ID, the S3 profile, and the VM path where data files from the S3 bucket are saved:

In [None]:
from cdispyutils.hmac4 import get_auth
import json
import requests
project = 'bpa-ThermoFisher_P0001_T1'
profile = 'bloodpac'
path    = 'files/'

The following examples query the API directly to retrieve some metadata:

In [None]:
with open('/home/ubuntu/.secrets','r') as f:
    secrets = json.load(f)
auth = get_auth(secrets['access_key'], secrets['secret_key'], 'submission')

In [None]:
# List the available programs, then the projects under the 'bpa' program
print(requests.get('http://kubenode.internal.io:30004/v0/submission', auth=auth).text)
print(requests.get('http://kubenode.internal.io:30004/v0/submission/bpa/', auth=auth).text)

In [None]:
data = {'query': """query Test {sample (project_id: "bpa-ThermoFisher_P0001_T1", submitter_id: "BPA-THERMOFISHER_P0001-S1") {   
                                 _aliquots_count 
                                 aliquots { 
                                     aliquot_concentration  
                                     _read_groups_count 
                                     read_groups {
                                        _submitted_somatic_mutations_count 
                                        submitted_somatic_mutations {
                                            file_name
                                        }
                                     } 
                                 } 
                                }
                            } """}
print(requests.post('http://kubenode.internal.io:30004/v0/submission/graphql/', auth=auth, json=data).text)
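
The query above hardcodes the project and sample IDs. As a minimal sketch, the same kind of payload could be built from variables so that it reuses the `project` value defined earlier. `build_sample_query` is a hypothetical helper, not part of the API or of any library used in this demo, and it requests only a subset of the fields shown above. It also assumes that JSON string quoting is acceptable for these simple ID values.

```python
import json

def build_sample_query(project_id, submitter_id):
    """Return a GraphQL payload (dict) for one sample; hypothetical helper."""
    # json.dumps adds the surrounding double quotes and escapes the value
    # (assumption: JSON quoting is sufficient for plain ID strings).
    query = """query Test {
  sample (project_id: %s, submitter_id: %s) {
    _aliquots_count
    aliquots {
      aliquot_concentration
      _read_groups_count
    }
  }
}""" % (json.dumps(project_id), json.dumps(submitter_id))
    return {'query': query}

payload = build_sample_query('bpa-ThermoFisher_P0001_T1',
                             'BPA-THERMOFISHER_P0001-S1')
```

The resulting dictionary can be passed as `json=payload` to the same `requests.post` call used above.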

Import the Python library that provides functions for exploring metadata and running data analyses, and register our credentials file with it:

In [None]:
import bpa_analysis_functions_v2 as bp
bp.add_keys('/home/ubuntu/.secrets')

First, we show some functions to explore information in the project's metadata. For instance, list samples for this project:

In [None]:
samples = bp.list_samples(project)
samples

List and count the file types contained in this project:

In [None]:
vcf_files = bp.list_files_by_type(project,'VCF', samples[1])
vcf_files

In [None]:
counts = bp.count_file_types(project, samples[1])
counts

Get expected mutations from the sample-expectation metadata and summarize them by gene:

In [None]:
summary = bp.query_summary_field("sample_expectation", "expected_mutation_gene")
summary

Next, we use additional functions to analyze data from one specific BloodPAC project, which contains expected mutations from contrived samples and the resulting VCF files.

Transfer the VCF files from the S3 bucket to our VM directory:

In [None]:
bp.get_files_from_bucket(project, profile, path, '*.vcf')
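
Each transferred VCF file stores variant calls as tab-separated records, with header lines that start with `#`. As a minimal sketch of how such a file could be reduced to a set of variant keys for comparison against expected mutations, the hypothetical helper below reads the CHROM, POS, REF and ALT columns. It is not part of `bpa_analysis_functions_v2`, and the records are made up for illustration.

```python
def read_vcf_variants(lines):
    """Return a set of (chrom, pos, ref, alt) tuples from VCF text lines."""
    variants = set()
    for line in lines:
        line = line.rstrip('\n')
        # Skip blank lines and header lines (meta-information and column header).
        if not line or line.startswith('#'):
            continue
        fields = line.split('\t')
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        # ALT may list several comma-separated alleles; record each separately.
        for alt in alts.split(','):
            variants.add((chrom, pos, ref, alt))
    return variants

# Made-up records for illustration only, not data from the project.
example = [
    '##fileformat=VCFv4.1\n',
    '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n',
    'chr7\t100\t.\tA\tT\t50\tPASS\t.\n',
    'chr12\t200\t.\tG\tA,C\t60\tPASS\t.\n',
]
calls = read_vcf_variants(example)
```

Because the function only iterates over lines, an open file object for one of the downloaded files could be passed in place of the `example` list.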

Calculate sensitivity and specificity values for one example VCF:

In [None]:
sample = samples[1]
vcf_file = vcf_files[sample][2]
metrics = bp.calculate_metrics_vcf(project, path, vcf_file, sample)
metrics
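
For reference, sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP), where a true positive is an expected mutation that was called and a true negative is a candidate variant that was neither expected nor called. The sketch below computes both from plain sets. It is an illustration under stated assumptions, not the implementation behind `bp.calculate_metrics_vcf`: the function name, the `candidates` universe used to count true negatives, and the example values are all hypothetical.

```python
def sensitivity_specificity(expected, called, candidates):
    """Compute (sensitivity, specificity) from sets of variant identifiers.

    candidates is the assumed universe of variants that could be reported;
    it is needed because true negatives cannot be counted without it.
    """
    expected = set(expected)
    called = set(called)
    # Make sure the universe covers everything expected or called.
    candidates = set(candidates) | expected | called

    tp = len(expected & called)               # expected and called
    fn = len(expected - called)               # expected but missed
    fp = len(called - expected)               # called but not expected
    tn = len(candidates - expected - called)  # neither expected nor called

    # float() keeps the division exact under both Python 2 and Python 3.
    sensitivity = float(tp) / (tp + fn) if (tp + fn) else float('nan')
    specificity = float(tn) / (tn + fp) if (tn + fp) else float('nan')
    return sensitivity, specificity

# Made-up identifiers for illustration only.
candidates = ['v%d' % i for i in range(10)]
expected = ['v0', 'v1', 'v2', 'v3']
called = ['v0', 'v1', 'v2', 'v9']
sens, spec = sensitivity_specificity(expected, called, candidates)
```

In this example three of the four expected variants are called, so sensitivity is 0.75, and one of the six unexpected candidates is called, so specificity is 5/6.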

Calculate sensitivity and specificity values for all VCFs in the project:

In [None]:
table_metrics, data_metrics = bp.calculate_metrics_all_vcf(project, path, vcf_files)
table_metrics

In [None]:
baseline_vcf = 'TFS.HZ.0.0.perc_20ng_TSVC_IonCodeTag_0101.vcf'
table_filter_metrics, data_filter_metrics = bp.calculate_metrics_all_vcf(project, path, vcf_files, baseline_vcf)
table_filter_metrics

Finally, we show how the results can be visualized in simple plots:

In [None]:
bp.plot_metrics(data_metrics, data_filter_metrics)