# ClinVar exploratory analysis notebook - ASD demo 

### <font color=blue> Instructions for use:</font>  
1. Copy and rename this Jupyter notebook.
2. In the new notebook: update the USER-supplied variables in the first code cell.
3. In the new notebook: run ALL cells.

##### USER supplied variables

In [None]:
## REQUIRED: path to clinvar_workflow package
CV_WKFLW_PKG_PATH = '..'

################## FileIO ##########################################################
## REQUIRED: input variant FILE - relative or absolute path
VAR_FILE = 'demo_input_variant_files/demo_variants_ASD_hg19.txt'

## REQUIRED: output file DIRECTORY - relative or absolute path
OUT_DIR = 'demo_output'

## REQUIRED: prefix for the outputs (default = '')
OUT_PREFIX = 'demo_ASD'

################## update values based on YOUR current input file ##################
## REQUIRED: Genome build: hg19 or hg38 (default = hg19)
BUILD = 'hg19'

## REQUIRED: 4 variant columns - the column order *must* remain the same, but the column names can change
COLS_VAR = ['CHR', 'POS', 'REF', 'ALT']

## optional: list of additional input columns to include in output DF
COLS_INPUT = ['VAR_FUNCTION', 'VAR_CATEGORY', 'STUDY_TYPE', 'VAR_VALIDATION']
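
Before running the workflow, it can help to confirm that the input file header contains every column named in `COLS_VAR` and `COLS_INPUT`. The sketch below is not part of `clinvar_workflow`: `missing_columns` is a hypothetical helper, the sample content is made up for illustration, and the file is assumed to be tab-delimited with a header row.

```python
import csv
import io


def missing_columns(handle, required, delimiter="\t"):
    """Return the required column names that are absent from the header row."""
    # Read only the first row; an empty file yields an empty header.
    header = next(csv.reader(handle, delimiter=delimiter), [])
    return [col for col in required if col not in header]


# In-memory stand-in for the variant file (hypothetical content).
sample = io.StringIO("CHR\tPOS\tREF\tALT\tSTUDY_TYPE\n1\t24664177\tC\tT\texome\n")

# Variant columns plus one optional input column that the sample lacks.
required = ["CHR", "POS", "REF", "ALT", "VAR_FUNCTION"]
missing = missing_columns(sample, required)
print(missing)
```

For a real file, pass `open(VAR_FILE, newline="")` as the handle and `COLS_VAR + COLS_INPUT` as the required list; an empty result means all columns are present.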


##### Imports

In [2]:
from __future__ import print_function
import pandas as pd

import os, sys
sys.path.insert(0, os.path.abspath(CV_WKFLW_PKG_PATH))

## Custom ClinVar query workflow module
from clinvar_workflow.workflows import exploratory_analysis_workflow as cv
from clinvar_workflow.vizualization import viz_jupyter as nb

## Jupyter & ipywidgets
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
from ipywidgets import HBox, VBox, AppLayout, HTML, Tab

Invoking __init__.py for clinvar_workflow
Invoking __init__.py for clinvar_workflow.helpers
Invoking __init__.py for clinvar_workflow.workflows
Invoking __init__.py for clinvar_workflow.query_clinvar
Invoking __init__.py for clinvar_workflow.vizualization


In [3]:
%%html
<style>.css_widgets {font-size:150%}</style>


## Run ClinVar Exploratory Analysis

In [4]:
results = cv.run_clinvar_exploratory_analysis(var_file=VAR_FILE, 
                                              out_dir=OUT_DIR, 
                                              out_prefix=OUT_PREFIX, 
                                              build=BUILD, 
                                              cols_var=COLS_VAR, 
                                              cols_input=COLS_INPUT)

results.keys()


Step 1: verify & process user inputs
	.. User specified variant input file exists
	.. Input variant file contains the specified variant columns
	.. User specified output directory exists & is writable


Step 2: run MyVariant ClinVar query
	.. run ClinVar query
querying 1-1000...done.
querying 1001-1065...done.



pandas.io.json.json_normalize is deprecated, use pandas.json_normalize instead



	.. ClinVar query data wrangling


Step 3: process MyVariant ClinVar query results
	.. generating variant ClinVar summary (1 row per variant)
	.. adding input columns to variant summary DF --> update UNREPORTED variants
	.. adding aggregation stats, Boolean indicator columns & FLAG columns to CV variant summary DF
	.. adding variant summary DF columns to full ClinVar DF


Step 4: run Exploratory analysis
	.. converting DF for analyses & visualization
	.. generate dataset summary
	.. identify pathogenic variants
	.. add data viz & pathogenic variant DataFrames to result dictionary
	.. starting Clinical Significance exploratory analyses
		.. performing Variant Clinical Significance analysis
		.. performing Variant-Condition (RCV) Clinical Significance analysis
		.. performing Gene-based analysis
		.. performing Condition-based analysis
		.. assembling Clinical Significance analysis results


Step 5: write output files
	.. Writing ClinVar Variant summary
	.. Writing ClinVar Variant full d

dict_keys(['cv_var_summary_df', 'cv_full_df', 'input_df', 'viz_df', 'data_summary_df', 'patho_var_df', 'patho_var_detail_df', 'clinsig_var', 'clinsig_rcv', 'clinsig_var_gene', 'clinsig_var_cond'])
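
The returned dictionary mixes several result objects, so a quick overview of what each key holds can be useful before drilling in. The following is a minimal sketch, not part of `clinvar_workflow`: `summarize_results` is a hypothetical helper, and `FakeFrame` only stands in for a DataFrame-like object with a `.shape` attribute. The actual value types behind each key are not documented here, so the helper makes no assumption beyond checking for `.shape`.

```python
def summarize_results(results):
    """Map each key to (type name, shape or None) for a quick overview."""
    summary = {}
    for key, value in results.items():
        # DataFrame-like objects expose .shape; anything else reports None.
        summary[key] = (type(value).__name__, getattr(value, "shape", None))
    return summary


class FakeFrame:
    """Stand-in for a DataFrame: only the .shape attribute is modelled."""
    shape = (3, 2)


# Hypothetical results dictionary reusing two key names from the output above.
demo = {"patho_var_df": FakeFrame(), "clinsig_var": {"table": None}}
summary = summarize_results(demo)
for key, (type_name, shape) in summary.items():
    print(key, type_name, shape)
```

In the notebook, calling `summarize_results(results)` after the analysis cell would list each key with its type and, where available, its dimensions.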

## Summary of current ClinVar query 

In [5]:
results['data_summary_df']

## Identify ClinVar pathogenic variants

In [6]:
results['patho_var_df']

Unnamed: 0,CHR,POS,REF,ALT,hgvs_id.hg19,rsid,gene.symbol,clinical_significance,clinical_significance.rcv.set,conditions.name,conditions.name.set,conditions.synonyms,patho_cond.set,patho_cond.nuniq,patho_cond.%,_.patho_ALL_cond,_.patho_ANY_cond,preferred_name,hg19.start,hg19.end,hg38.start,hg38.end
0,1,24664177,C,T,chr1:g.24664177C>T,rs886037767,GRHL3,Pathogenic,{Pathogenic},nonsyndromic cleft palate,{nonsyndromic cleft palate},,{nonsyndromic cleft palate},1.0,1.000,True,True,NM_198173.3(GRHL3):c.738C>T (p.Gly246=),24664177,24664177,24337687,24337687
1,1,24664177,C,T,chr1:g.24664177C>T,rs886037767,GRHL3,Pathogenic,{Pathogenic},nonsyndromic cleft palate,{nonsyndromic cleft palate},,{nonsyndromic cleft palate},1.0,1.000,True,True,NM_198173.3(GRHL3):c.738C>T (p.Gly246=),24664177,24664177,24337687,24337687
2,1,24664177,C,T,chr1:g.24664177C>T,rs886037767,GRHL3,Pathogenic,{Pathogenic},nonsyndromic cleft palate,{nonsyndromic cleft palate},,{nonsyndromic cleft palate},1.0,1.000,True,True,NM_198173.3(GRHL3):c.738C>T (p.Gly246=),24664177,24664177,24337687,24337687
3,1,24664177,C,T,chr1:g.24664177C>T,rs886037767,GRHL3,Pathogenic,{Pathogenic},nonsyndromic cleft palate,{nonsyndromic cleft palate},,{nonsyndromic cleft palate},1.0,1.000,True,True,NM_198173.3(GRHL3):c.738C>T (p.Gly246=),24664177,24664177,24337687,24337687
4,1,24664177,C,T,chr1:g.24664177C>T,rs886037767,GRHL3,Pathogenic,{Pathogenic},nonsyndromic cleft palate,{nonsyndromic cleft palate},,{nonsyndromic cleft palate},1.0,1.000,True,True,NM_198173.3(GRHL3):c.738C>T (p.Gly246=),24664177,24664177,24337687,24337687
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
324,X,153296471,G,A,chrX:g.153296471G>A,rs61750240,MECP2,Pathogenic,{Pathogenic},"[Autism, susceptibility to, X-linked 3 (AUTSX3...","{not provided, Mental retardation, X-linked, s...","['Austism susceptibility, X-linked'], ['Autism...","{not provided, Mental retardation, X-linked, s...",8.0,1.000,True,True,NM_001110792.2(MECP2):c.844C>T (p.Arg282Ter),153296471,153296471,154031020,154031020
325,X,153296516,G,A,chrX:g.153296516G>A,rs61749721,MECP2,Conflicting interpretations of pathogenicity,"{Pathogenic, Pathogenic/Likely pathogenic}","[Autism, susceptibility to, X-linked 3 (AUTSX3...","{not provided, Mental retardation, X-linked, s...","['AllHighlyPenetrant'], ['Austism susceptibili...","{not provided, Mental retardation, X-linked, s...",8.0,1.000,True,True,NM_001110792.2(MECP2):c.799C>T (p.Arg267Ter),153296516,153296516,154031065,154031065
326,X,153296516,G,A,chrX:g.153296516G>A,rs61749721,MECP2,Conflicting interpretations of pathogenicity,"{Pathogenic, Pathogenic/Likely pathogenic}","[Autism, susceptibility to, X-linked 3 (AUTSX3...","{not provided, Mental retardation, X-linked, s...","['AllHighlyPenetrant'], ['Austism susceptibili...","{not provided, Mental retardation, X-linked, s...",8.0,1.000,True,True,NM_001110792.2(MECP2):c.799C>T (p.Arg267Ter),153296516,153296516,154031065,154031065
327,X,153296605,G,C,chrX:g.153296605G>C,rs61749715,MECP2,Pathogenic,"{Pathogenic, Pathogenic/Likely pathogenic}","[Absent speech, Developmental regression, Irre...","{not provided, Irregular respiration, Developm...","['Absent speech development', 'Lack of languag...","{not provided, Irregular respiration, Developm...",7.0,0.875,True,True,NM_001110792.2(MECP2):c.710C>G (p.Pro237Arg),153296605,153296605,154031154,154031154
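
The table above repeats some variants across rows (for example, `chr1:g.24664177C>T` appears in rows 0 to 4), so the row count is not the number of distinct pathogenic variants. The sketch below counts distinct variants by the four variant columns. `distinct_variants` is a hypothetical helper, and the rows are a small subset copied from the displayed output. With pandas, `results['patho_var_df'].drop_duplicates(subset=COLS_VAR)` should give the equivalent de-duplicated frame.

```python
def distinct_variants(rows, key_cols=("CHR", "POS", "REF", "ALT")):
    """Return the unique variant keys in first-seen order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[col] for col in key_cols)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# A few rows taken from the pathogenic-variant table shown above.
rows = [
    {"CHR": "1", "POS": 24664177, "REF": "C", "ALT": "T"},
    {"CHR": "1", "POS": 24664177, "REF": "C", "ALT": "T"},
    {"CHR": "X", "POS": 153296471, "REF": "G", "ALT": "A"},
    {"CHR": "X", "POS": 153296516, "REF": "G", "ALT": "A"},
    {"CHR": "X", "POS": 153296516, "REF": "G", "ALT": "A"},
]

unique = distinct_variants(rows)
print(len(rows), "rows,", len(unique), "distinct variants")
```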


## Display ClinVar Clinical Significance exploratory analysis & visualization 

<div class="alert alert-info">


**_ClinVar Variant vs RCV:_**   
- The ClinVar accession (RCV) is based on a variant-condition(s) combination, not the variant alone   
- Some variants have more than one RCV because the variant has been reported for multiple distinct disorders 

For more information about ClinVar RCV, see ClinVar FAQ:  
https://www.ncbi.nlm.nih.gov/clinvar/docs/faq/#accs  
https://www.ncbi.nlm.nih.gov/clinvar/docs/faq/#var_rcv  


</div>



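**Note - *RCV* clinical significance:**   
   - the number of variants is **_NOT DISTINCT_**: a variant is counted under each of its RCV clinical significance classifications   
   - some variants have more than one unique RCV clinical significance (clinsig) classification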
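
To make the note above concrete, here is a small worked example of why per-classification counts can exceed the number of distinct variants. It is an illustrative sketch, not the workflow's own aggregation code. The two variants and their RCV classification sets are taken from the `clinical_significance.rcv.set` column of the pathogenic-variant table.

```python
from collections import Counter

# rsid -> set of RCV clinical significance classifications (from the table).
rcv_clinsig = {
    "rs61750240": {"Pathogenic"},
    "rs61749715": {"Pathogenic", "Pathogenic/Likely pathogenic"},
}

# Count each variant once per classification it carries.
counts = Counter(label for labels in rcv_clinsig.values() for label in labels)

total_counted = sum(counts.values())  # sum over classifications
n_distinct = len(rcv_clinsig)         # number of distinct variants

for label, n in sorted(counts.items()):
    print(label, n)
print("sum of counts:", total_counted, "distinct variants:", n_distinct)
```

Here two distinct variants produce a summed count of three, because `rs61749715` contributes to both classifications.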


In [7]:
results, display = nb.display_clinsig_exploratory_analysis(results)
display_results = nb.display_css(display, 'css_widgets')

display_results

		.. generating Variant Clinical Significance Plotly Table
		.. generating Variant-Condition (RCV) Clinical Significance Plotly Table
		.. generating Gene Variant Clinical Significance Plotly Table
		.. generating Condition Variant Clinical Significance Plotly Table
		.. generating containers to display widgets
		.. assembling results to display


VBox(children=(HBox(children=(Accordion(children=(HBox(children=(FigureWidget({
    'data': [{'hole': 0.45,
  …