## Understudied Bubble Chart Creator

To drive the bubble chart, we need the data in a form JavaScript can read, namely a JSON file. Here we build it from the kinase tables with Python's pandas library and a few list comprehensions.
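As a minimal illustration of what "JavaScript-readable" means here, pandas rows can be turned into plain Python dicts and serialized with `json.dumps`. The toy DataFrame below is an assumption for illustration, with values copied from the previews further down. The notebook itself builds a nested tree rather than flat records.

```python
import json

import pandas as pd

# Toy table standing in for the kinase data (values from the preview tables)
df = pd.DataFrame({
    "symbol": ["ADCK1", "ALPK2"],
    "name": ["aarF domain containing kinase 1", "alpha kinase 2"],
})

# Each row becomes a plain dict, which json can serialize and JavaScript can parse
records = df.to_dict(orient="records")
payload = json.dumps(records)
print(payload)
```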


#### Instructions: editing the "Configure JSON inputs" section changes what appears in the final JSON

In [1]:
```python
# imports (functools, Counter, itertools and matplotlib are not used below)
import numpy as np
import pandas as pd
import json
import functools as ft
from collections import Counter
import itertools as it
import matplotlib.pyplot as plt
%matplotlib inline
```

## Import the data

We read the two kinase tables from the `../data/` directory.

In [2]:
```python
# read csv file into a DataFrame
understudied = pd.read_csv('../data/dark_kinases.csv')
understudied.head()
```

| Unnamed: 0 | hgnc_id | symbol | ensembl_gene_id | class | name | uniprot_ids | kinase_com_name |
|---|---|---|---|---|---|---|---|
| 0 | HGNC:19038 | ADCK1 | ENSG00000063761 | Dark | aarF domain containing kinase 1 | Q86TW2 | ADCK1 |
| 1 | HGNC:19039 | ADCK2 | ENSG00000133597 | Dark | aarF domain containing kinase 2 | Q7Z695 | ADCK2 |
| 2 | HGNC:21738 | ADCK5 | ENSG00000173137 | Dark | aarF domain containing kinase 5 | Q3MIX3 | ADCK5 |
| 3 | HGNC:20565 | ALPK2 | ENSG00000198796 | Dark | alpha kinase 2 | Q86TB3 | AlphaK2 |
| 4 | HGNC:17574 | ALPK3 | ENSG00000136383 | Dark | alpha kinase 3 | Q96L96 | AlphaK1 |


In [3]:
```python
# read csv file into a DataFrame
kin_classes = pd.read_csv('../data/Table_005_IDG_dark_kinome.csv')
kin_classes.head()
```

| Unnamed: 0 | gene_id | gene_symbol | name_nih | name | atp_binder | pharos_designation | pubmed_citation_2017 | comment | tier | justification |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 57143 | ADCK1 |  | aarF domain containing kinase 1 |  | Tdark | 14.0 |  |  | Kinome subnetwork integration undefined. |
| 1 | 90956 | ADCK2 | ADCK2 | aarF domain containing kinase 2 | Y | Tbio |  |  | 3.0 |  |
| 2 | 203054 | ADCK5 | ADCK5 | aarF domain containing kinase 5 | Y | Tdark |  |  | 3.0 |  |
| 3 | 115701 | ALPK2 | ALPK2 | alpha kinase 2 | ? | Tdark |  |  | 3.0 |  |
| 4 | 57538 | ALPK3 | ALPK3 | alpha kinase 3 | ? | Tdark |  |  | 3.0 |  |
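The `Unnamed: 0` column in both previews is what pandas reports when a CSV's first column has no header, typically a row index written out by `DataFrame.to_csv`. Assuming that is how these files were produced (not verified here), passing `index_col=0` to `pd.read_csv` would use that column as the index instead of carrying it along as data. A small self-contained check with hypothetical inline CSV text:

```python
import io

import pandas as pd

# Hypothetical CSV mimicking DataFrame.to_csv() output: the first column
# is an unnamed row index, which pandas labels "Unnamed: 0" on read
csv_text = ",symbol,class\n0,ADCK1,Dark\n1,ADCK2,Dark\n"

with_extra = pd.read_csv(io.StringIO(csv_text))
without_extra = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_extra.columns))     # ['Unnamed: 0', 'symbol', 'class']
print(list(without_extra.columns))  # ['symbol', 'class']
```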


In [4]:
```python
# left join pharos_designation onto understudied by symbol (161 vs 162 present); unmatched kinases get NaN
pharos = kin_classes.set_index('gene_symbol')['pharos_designation']
understudied['pharos_designation'] = understudied.set_index('symbol').join(pharos)['pharos_designation'].values
```
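The same left join can be written more directly with `Series.map`, which skips the round trip through `set_index`, `join` and `.values`. This is a sketch on toy data, assuming each `gene_symbol` appears only once in `kin_classes`: with duplicate keys, `map` raises an error, and the original `join` would return extra rows so the `.values` assignment would fail on length. The `ZZZ1` row is hypothetical and exists only to show an unmatched symbol.

```python
import pandas as pd

# Toy stand-ins for `understudied` and `kin_classes`; ZZZ1 is hypothetical
# and deliberately absent from the lookup table
understudied = pd.DataFrame({"symbol": ["ADCK1", "ADCK2", "ZZZ1"]})
kin_classes = pd.DataFrame({
    "gene_symbol": ["ADCK1", "ADCK2"],
    "pharos_designation": ["Tdark", "Tbio"],
})

# A Series indexed by gene symbol acts as a lookup table; map() keeps the
# left frame's row order and yields NaN for symbols with no match
lookup = kin_classes.set_index("gene_symbol")["pharos_designation"]
understudied["pharos_designation"] = understudied["symbol"].map(lookup)

print(understudied)
```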

## Configure JSON inputs

Here we assemble the lists and dictionaries that the JSON is built from.

In [5]:
```python
# kin_list: the kinase symbols, in table order
kin_list = understudied['symbol'].astype(str).tolist()

# kin_labels maps each symbol to its common name (kinase_com_name), used as the display label
kin_com_names = understudied['kinase_com_name'].astype(str).tolist()
kin_labels = dict(zip(kin_list, kin_com_names))

# kin_descriptors maps each symbol to its full name; full names aren't currently used in the visualization
kin_full_names = understudied['name'].astype(str).tolist()
kin_descriptors = dict(zip(kin_list, kin_full_names))

# bubble sizes: 5 by default, 60 for large_list, 30 for med_list
sizes = {k: 5 for k in kin_list}
large_list = ['PKMYT1', 'TLK2', 'BRSK2', 'CDK12', 'CDK13']
med_list = ['STK3', 'PIP5K1A', 'NEK7', 'ICK', 'NRBP2']
sizes.update({k: 60 for k in large_list})
sizes.update({k: 30 for k in med_list})

# classes: each kinase's Pharos designation as an integer code; this controls the color
class_name_list = list(understudied['pharos_designation'].unique())  # unique designations, in order of appearance
classes = understudied['pharos_designation'].map({x: y for y, x in enumerate(class_name_list)})  # Series of length len(kin_list)
kin_arr = np.array(kin_list)  # kinases as a numpy array, for boolean indexing by class
```
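The label dictionary and class codes can also be built with pandas directly. A sketch on a toy frame (rows taken from the previews above, designations assumed for illustration): `set_index(...).to_dict()` replaces the `zip` comprehension, and `pd.factorize` numbers values in order of first appearance, like enumerating `unique()`. One difference to keep in mind: `factorize` gives missing values the code -1 by default, whereas `unique()` lists NaN as one of the values.

```python
import pandas as pd

# Toy stand-in for `understudied` after the join
understudied = pd.DataFrame({
    "symbol": ["ADCK1", "ADCK2", "ALPK2"],
    "kinase_com_name": ["ADCK1", "ADCK2", "AlphaK2"],
    "pharos_designation": ["Tdark", "Tbio", "Tdark"],
})

# symbol -> display label, built straight from two columns
kin_labels = understudied.set_index("symbol")["kinase_com_name"].to_dict()

# codes: one integer per row; class_names: the unique designations
codes, class_names = pd.factorize(understudied["pharos_designation"])

print(kin_labels)         # {'ADCK1': 'ADCK1', 'ADCK2': 'ADCK2', 'ALPK2': 'AlphaK2'}
print(codes.tolist())     # [0, 1, 0]
print(list(class_names))  # ['Tdark', 'Tbio']
```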

## Build the JSON

This step combines the outputs of the previous steps into a nested object: a root node whose children are the Pharos classes, each holding its kinases as leaves.

In [6]:
```python
# json_out nests kinases under their Pharos class: root -> classes -> kinases
json_out = {"name": "viz", "children": [
    {"name": class_name_list[c],
     "children": [{"name": k, "label": kin_labels[k], "desc": kin_descriptors[k], "size": sizes[k]}
                  for k in kin_arr[classes == c].tolist()]}
    for c in np.unique(classes).tolist()]}
# str_out is the JSON string; the replace() calls only add line breaks to ease debugging
str_out = json.dumps(json_out).replace("},", "}, \n").replace('[{', '[\n{\n').replace(']},', ']}\n,\n')
```
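The same two-level hierarchy can be sketched with `groupby`, presumably in the `name`/`children` shape that D3-style hierarchical layouts read (an assumption, since the consuming JavaScript is not shown here). This toy version drops the `desc` field for brevity, and note that `groupby` skips NaN keys by default, so a kinase with no Pharos match would be left out rather than given its own class.

```python
import json

import pandas as pd

# Toy rows standing in for the per-kinase inputs built above
df = pd.DataFrame({
    "symbol": ["ADCK1", "ADCK2", "ALPK2"],
    "label": ["ADCK1", "ADCK2", "AlphaK2"],
    "pharos_designation": ["Tdark", "Tbio", "Tdark"],
    "size": [5, 5, 5],
})

# One node per class, each holding its kinases as leaves;
# sort=False keeps classes in order of first appearance
tree = {
    "name": "viz",
    "children": [
        {
            "name": cls,
            "children": [
                {"name": row.symbol, "label": row.label, "size": int(row.size)}
                for row in group.itertuples(index=False)
            ],
        }
        for cls, group in df.groupby("pharos_designation", sort=False)
    ],
}

print(json.dumps(tree, indent=1))
```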

## Output

Write the JSON string to `../dist/viz.json`.

In [7]:
```python
# write the JSON string to ../dist/viz.json
with open('../dist/viz.json', 'w') as f:
    f.write(str_out)
```
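As an alternative to the chained `replace` calls, `json.dumps(..., indent=...)` produces readable JSON that is always well formed, and `allow_nan=False` guards against a case this notebook could hit: if a kinase had no match in the join, its class name would be NaN, which `json.dumps` writes as the bare token `NaN` by default, and that is not valid JSON. A sketch, where the helper name `write_viz_json` is hypothetical and the example writes to a temporary directory rather than `../dist`:

```python
import json
import tempfile
from pathlib import Path


def write_viz_json(tree, path):
    """Hypothetical helper: serialize `tree` as strict, indented JSON and write it to `path`."""
    path = Path(path)
    # create the output directory (e.g. ../dist) if it does not exist yet
    path.parent.mkdir(parents=True, exist_ok=True)
    # allow_nan=False raises ValueError instead of emitting NaN, which is not valid JSON
    text = json.dumps(tree, indent=1, allow_nan=False)
    path.write_text(text)
    return text


# usage: write into a temporary directory and read the file back
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "dist" / "viz.json"
    written = write_viz_json({"name": "viz", "children": []}, out)
    round_trip = json.loads(out.read_text())
print(round_trip)  # {'name': 'viz', 'children': []}

# a NaN class name is rejected instead of being written out
try:
    json.dumps({"name": float("nan")}, allow_nan=False)
    nan_rejected = False
except ValueError:
    nan_rejected = True
print(nan_rejected)  # True
```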