# Assess the CIViC interface for Curation Effort and Gaps in Curation

#### Usage

This script evaluates curation effort in CIViC. It reads a list of genes that are implicated in biomarker panels and shows whether each gene has curation in CIViC. It also evaluates whether there are genes with extensive curation in CIViC that are not implicated in the gene panels. Finally, it reports on variants that still require Sequence Ontology ID (SOID) curation, so that these variants can become eligible for CIViC Capture Design.

#### Input Files:

1) threshold = minimum actionability score a variant must have to be included in the results; default is 20 points.

2) panel_genes = a tab-delimited input file listing each gene and the total number of associated panels
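
The threshold is sent to the CIViC API as the `minimum_score` query parameter. The following is a minimal sketch of how such a URL can be assembled with the standard library. The helper name `qualifying_variants_url` and the `BASE` constant are hypothetical and are not part of the notebook.

```python
from urllib.parse import urlencode

# Base of the panel endpoints used later in this notebook
BASE = 'https://civic.genome.wustl.edu/api/panels'

def qualifying_variants_url(panel, threshold=20):
    """Build the qualifying-variants URL for a panel ('captureseq' or 'nanostring')."""
    query = urlencode({'minimum_score': threshold})  # e.g. 'minimum_score=20'
    return '{}/{}/qualifying_variants?{}'.format(BASE, panel, query)

print(qualifying_variants_url('captureseq', 20))
# https://civic.genome.wustl.edu/api/panels/captureseq/qualifying_variants?minimum_score=20
```

Encoding the parameter explicitly keeps the threshold inside the query string whether it is given as an integer or as a string.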

#### Output Files:

There are no output files! All results are printed in the notebook.


In [4]:
##SET INPUT VALUES

threshold = str(20)

In [2]:
#!/usr/bin/env python3
import json
import numpy as np
import requests
import sys

In [9]:
##Pull in Data from JSON
variants_capture = requests.get('https://civic.genome.wustl.edu/api/panels/captureseq/qualifying_variants', params={'minimum_score': threshold}).json()['records'] #Call eligible CaptureSeq variants
variants_nanostring = requests.get('https://civic.genome.wustl.edu/api/panels/nanostring/qualifying_variants', params={'minimum_score': threshold}).json()['records'] #Call eligible NanoString variants
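
The cells that follow compare against a `variant_list` of CIViC gene names, but the cell that builds it is not shown here. The sketch below is one way such a list could be derived from the two record lists. It assumes each record carries its gene symbol under an `entrez_name` key, as the `/api/variants` records used later do. That key, the helper name `gene_names`, and the example records are assumptions and have not been checked against the panel endpoints.

```python
def gene_names(records, key='entrez_name'):
    """Collect unique gene names from variant records, preserving first-seen order."""
    names = []
    for record in records:
        name = record.get(key)  # assumed field name; adjust to the real response
        if name is not None and name not in names:
            names.append(name)
    return names

# Hypothetical records standing in for the API responses
example_capture = [{'entrez_name': 'BRAF'}, {'entrez_name': 'KRAS'}]
example_nanostring = [{'entrez_name': 'KRAS'}, {'entrez_name': 'ALK'}]

example_variant_list = gene_names(example_capture + example_nanostring)
print(example_variant_list)  # ['BRAF', 'KRAS', 'ALK']
```

With real data, the equivalent call would be `variant_list = gene_names(variants_capture + variants_nanostring)`.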

In [11]:
##See how many genes are in panels but not eligible for CIViC
panel_genes_list = [] #create empty list for panel genes
with open('../../smMIPs_panel/input_files/panel_genes.txt', 'r') as panel_genes: #open panel_genes (closed automatically afterwards)
    for line in panel_genes: #iterate through panel genes
        line = line.strip('\n') #strip the new line
        line = line.split('\t') #split by tabs
        gene = line[0] #pull gene
        panel_genes_list.append(gene) #append to gene list
not_in_CIViC = [] #create empty list for genes that are not extensively curated
for item in panel_genes_list: #for item in panel list
    if item not in variant_list: #if the item is not in the civic list
        not_in_CIViC.append(item) #append it to the not in civic list
print('Number of genes in at Least 10 Panels is:', len(panel_genes_list)) #print the length of the panel Genes
print('Number of Genes Missing from CIViC is:', len(not_in_CIViC)) #print number of genes not in civic
print('List of Genes missing from CIViC ', not_in_CIViC)

Number of genes in at Least 10 Panels is: 161
Number of Genes Missing from CIViC is: 51
List of Genes missing from CIViC  ['APC', 'JAK3', 'MPL', 'SRC', 'HNF1A', 'MYC', 'CBL', 'BRIP1', 'MUTYH', 'PDGFRB', 'KMT2A', 'GATA1', 'MAP2K2', 'ETV6', 'PAX5', 'FGFR4', 'KDM6A', 'NBN', 'SDHB', 'CREBBP', 'BCL6', 'GATA3', 'SUFU', 'FLCN', 'SDHD', 'BMPR1A', 'AKT3', 'FH', 'EP300', 'NOTCH2', 'PRKAR1A', 'SDHC', 'IL7R', 'RARA', 'FLT4', 'FLT1', 'FANCA', 'BLM', 'RAD51C', 'CIC', 'PHOX2B', 'TGFBR2', 'CDC73', 'ZRSR2', 'MRE11A', 'PHF6', 'ETV1', 'NFE2L2', 'MAP2K4', 'PIK3R2', 'DICER1']


In [12]:
#See how many genes are in CIViC but are not on the 10 panel list
civic_only = [] #create list for civic only genes
for item in variant_list: #iterate through variant list
    if item not in panel_genes_list: #see if variant not in panel gene list
        if item not in civic_only: #see if it is not in the civic only list
            civic_only.append(item) #if it is not in civic only list add it
print('Number of genes in CIViC but not in 10 gene Panels is: ', len(civic_only))
print('Genes in CIViC but not in 10 panels: ', civic_only)

Number of genes in CIViC but not in 10 gene Panels is:  165
Genes in CIViC but not in 10 panels:  ['NRG1', 'SPHK1', 'CASP8', 'TIMP1', 'MGMT', 'B4GALT1', 'MMP9', 'ALDH1A2', 'HSPH1', 'MMP2', 'KIAA1524', 'DNMT1', 'VEGFA', 'DDX43', 'PBK', 'TGFA', 'CDKN1B', 'BIRC5', 'PDCD4', 'CDKN1A', 'IGF2', 'PTP4A3', 'ALCAM', 'AGR2', 'UGT1A', 'PRKAA2', 'CFLAR', 'TFF3', 'DUSP6', 'HSPA5', 'RPS6', 'EREG', 'STMN1', 'CBLC', 'NQO1', 'TBK1', 'PIM1', 'DKK1', 'ABCB1', 'ABCC10', 'DEFA1', 'EPHB4', 'PROM1', 'ERCC1', 'MAGEH1', 'CD44', 'MIR218-1', 'NCOA3', 'PGR', 'EGF', 'RIT1', 'TYMS', 'SLFN11', 'TOP2A', 'CXCR4', 'HSPB1', 'FOXP3', 'HMOX1', 'TUBB3', 'HAVCR2', 'CD274', 'NT5E', 'AREG', 'SYK', 'CDX2', 'ROBO4', 'SIRT1', 'THBS2', 'POU5F1', 'NEDD9', 'ZEB1', 'HLA-DRA', 'EPAS1', 'HIF1A', 'FGF13', 'KRT18', 'EIF4EBP1', 'PSMB8', 'PTTG1', 'LEPR', 'MKI67', 'JUN', 'STAG3', 'ETV4', 'GAS6', 'RAD23B', 'PAX8', 'HGF', 'FOS', 'SGK1', 'KIF23', 'FGF2', 'RRM1', 'RRM2', 'MERTK', 'ACTA1', 'PTGS2', 'CCND2', 'SSX4', 'FLI1', 'ERG', 'PML', 'CBFB', 
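
The two comparisons above (panel genes missing from CIViC, and CIViC genes absent from the panels) are set differences. The sketch below computes both in one step. The function name `compare_gene_lists` and the toy gene lists are illustrative only. Unlike the loops above, it returns results in alphabetical order rather than input order.

```python
def compare_gene_lists(panel_genes, civic_genes):
    """Return (genes only on panels, genes only in CIViC), each sorted alphabetically."""
    panel, civic = set(panel_genes), set(civic_genes)
    missing_from_civic = sorted(panel - civic)  # on panels, no CIViC curation
    civic_only = sorted(civic - panel)          # curated in CIViC, not on panels
    return missing_from_civic, civic_only

# Toy input; duplicates in either list are collapsed by the set conversion
missing, only_civic = compare_gene_lists(['APC', 'BRAF', 'MYC'], ['BRAF', 'NRG1', 'BRAF'])
print(missing)     # ['APC', 'MYC']
print(only_civic)  # ['NRG1']
```

Set membership tests also avoid rescanning a list for every gene, which matters little at this scale but keeps the comparison short.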

In [13]:
##Make sure that all variants have SOIDs
#Iterate through capture seq to pull soids
SOID_labels = requests.get('https://civic.genome.wustl.edu/api/panels?count=1000000').json()['CaptureSeq']['sequence_ontology_terms'] #pull API
SOID = {} #Create new dictionary to hold SOIDs in API
#Iterate through the API interface
for item in SOID_labels:
    if item['soid'] not in SOID: #If the SOID has not been seen yet
        SOID[item['soid']] = [] #create new list if it is not already in SOID dictionary
        SOID[item['soid']].append(item['name']) #Add the term name to the dictionary
#Iterate through nanostring to pull soids
SOID_labels = requests.get('https://civic.genome.wustl.edu/api/panels?count=1000000').json()['NanoString']['sequence_ontology_terms']
for item in SOID_labels: #iterate through the NanoString SO terms
    if item['soid'] not in SOID: #If the soid is not already in the dictionary
        SOID[item['soid']] = [] #create holder
        SOID[item['soid']].append(item['name']) #add to the list

CIViC_SOID = [] #Create new list for all of the SOIDs that are in CIViC
no_SOID_in_CIViC = [] #Create new list for genes that have at least one variant without a SOID term attached

#Pull all of the variants from the CIViC API
SOID_API = requests.get('https://civic.genome.wustl.edu/api/variants?count=1000000').json()['records']
for item in SOID_API: #iterate through the API
    if item['variant_types'] != []: #If the variant_type is there
        so_term = [item['variant_types'][0]['so_id'], item['variant_types'][0]['display_name']] #SO id and name of the first variant type
        if so_term not in CIViC_SOID: #and the SO term is not in the CIViC SOID list
            CIViC_SOID.append(so_term) #add it to the list
    else: #If the variant type has not been created yet
        if item['entrez_name'] not in no_SOID_in_CIViC: #and the gene name has not already been evaluated
            no_SOID_in_CIViC.append(item['entrez_name']) #Add the gene name to the 'no SOID' list

print('Number of genes without Variant Type (SO_id):', len(no_SOID_in_CIViC))
print('These genes are:', no_SOID_in_CIViC)

Number of genes without Variant Type (SO_id): 85
These genes are: ['APC', 'TP53', 'EGFR', 'AR', 'GNAQ', 'GNA11', 'SULT1E1', 'JAK2', 'BRCA1', 'CRBN', 'BRD4', 'MLH1', 'TYMS', 'CHEK2', 'NTRK1', 'RUNX1', 'ERBB2', 'HLA-C', 'FGFR3', 'FGFR2', 'ALK', 'BRAF', 'KIT', 'CREBBP', 'MTAP', 'GNAS', 'STK11', 'PDGFRA', 'NOTCH1', 'KRAS', 'ATM', 'NRAS', 'AEBP1', 'MDM2', 'MDM4', 'CTAG1B', 'CTAG2', 'SMO', 'NTRK3', 'DCC', 'ARID1A', 'AKT3', 'MYC', 'HRAS', 'ABL1', 'FBXW7', 'MTOR', 'FLT3', 'PTCH1', 'ROS1', 'POLD1', 'SCN8A', 'PXDNL', 'PAPPA2', 'POLE4', 'MET', 'PIK3R2', 'LYN', 'ESR1', 'RAF1', 'PSMD4', 'CDK6', 'ACVR1', 'RET', 'CHEK1', 'VHL', 'POLE', 'CIC', 'ETV1', 'ETV5', 'ATXN1L', 'SUFU', 'PIK3CA', 'CTNNB1', 'DICER1', 'EZH2', 'CX3CL1', 'NRG1', 'FGFR4', 'RAD50', 'BTK', 'PLCG2', 'TSC2', 'TSPYL1', 'ERRFI1']
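
The SOID lookup at the top of this cell reads the `CaptureSeq` and `NanoString` entries from the same `/api/panels` response. The sketch below merges both from a single parsed response. It assumes the response has the shape used above (`sequence_ontology_terms` entries with `soid` and `name`). The function name `collect_so_terms` and the example payload are hypothetical.

```python
from collections import defaultdict

def collect_so_terms(panels, panel_names=('CaptureSeq', 'NanoString')):
    """Map each SOID to the list of distinct term names seen across the given panels."""
    terms = defaultdict(list)
    for panel_name in panel_names:
        for term in panels[panel_name]['sequence_ontology_terms']:
            if term['name'] not in terms[term['soid']]:  # skip repeated names
                terms[term['soid']].append(term['name'])
    return dict(terms)

# Illustrative payload only; not real API output
example_panels = {
    'CaptureSeq': {'sequence_ontology_terms': [
        {'soid': 'SO:0001583', 'name': 'missense_variant'}]},
    'NanoString': {'sequence_ontology_terms': [
        {'soid': 'SO:0001565', 'name': 'gene_fusion'},
        {'soid': 'SO:0001583', 'name': 'missense_variant'}]},
}
print(collect_so_terms(example_panels))
# {'SO:0001583': ['missense_variant'], 'SO:0001565': ['gene_fusion']}
```

With live data, the response could be fetched once with `requests.get(...).json()` and passed in, instead of requesting the same URL for each panel.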


In [14]:
##Determine if you need to add soid labels to your list on the API
SOID_labels = requests.get('https://civic.genome.wustl.edu/api/panels?count=1000000').json()['unbinned_terms']
if len(SOID_labels) == 0:
    print('There are no SO_ids that need to be added to the API!')
else:
    print('The Following SO_ids need to be added to the API')  # Header
    for item in SOID_labels: #iterate through the unbinned SO terms
        print(item['soid'] + ' - ' + item['name'])


There are no SO_ids that need to be added to the API!
