In [11]:
import requests
import random
import pandas as pd
import uuid
import json

In [23]:
# This is a useful trick to explore API responses from https://github.com/caldwell/renderjson
from IPython.display import display_javascript, display_html, display

class RenderJSON(object):
    def __init__(self, json_data):
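        if isinstance(json_data, (dict, list)):
            # Serialise Python objects so they can be embedded in the JavaScript call
            self.json_str = json.dumps(json_data)
        else:
            # Otherwise assume the caller already passed a JSON string
            self.json_str = json_data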
        self.uuid = str(uuid.uuid4())

    def _ipython_display_(self):
        display_html('<div id="{}" style="height: 300px; width:100%;"></div>'.format(self.uuid),
            raw=True
        )
        display_javascript("""
        require(["https://rawgit.com/caldwell/renderjson/master/renderjson.js"], function() {
          document.getElementById('%s').appendChild(renderjson(%s))
        });
        """ % (self.uuid, self.json_str), raw=True)


# Find what's known for a list of genes

> e.g. Maria L in the fly group routinely screens lists of human genes for evidence of involvement in neurological diseases. She wants to understand a bit about the biology of these genes. She looks at Open Targets to find what is known about each target and which diseases it has already been connected to.

Compiling evidence for a list of genes can be a time-consuming and tedious task. The process can now be automated with the targetvalidation.org API.

This tutorial illustrates how to compile all the evidence contained in targetvalidation.org for a list of genes into a compact representation (e.g. a spreadsheet) using Python.
You might have a list of genes generated from an experiment or a genetic screen, or perhaps you are following a particular set of targets and would like to keep an up-to-date view of the evidence that has been compiled around them.
Reviewing all genes against the same evidence sources can help reduce bias when prioritizing a list of genes for experimental follow-up.

For this tutorial, we will use 20 random genes taken from the HGNC catalog. First, let's download the full catalog and create a list of symbols only.

In [3]:
%%capture
%%bash
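# Download the catalog only if neither the compressed nor the uncompressed file is present
[ -f ../data/hgnc_complete_set.txt ] || [ -f ../data/hgnc_complete_set.txt.gz ] || wget --directory-prefix=../data/ http://ftp.ebi.ac.uk/pub/databases/genenames/hgnc_complete_set.txt.gz
[ -f ../data/hgnc_complete_set.txt ] || gunzip ../data/hgnc_complete_set.txt.gz
# Keep the symbol column (field 2), skipping the header and withdrawn entries
tail -n +2 ../data/hgnc_complete_set.txt | cut -f2 | grep -v withdrawn > ../data/hgnc_symbol_set.txt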

To extract 20 random gene names:

In [1]:
import random
with open('../data/hgnc_symbol_set.txt') as f:  # file written by the bash cell above
    genes = [line.rstrip() for line in f]
random.shuffle(genes)
genes[:20]

['CACYBP',
 'CST4',
 'DUXA',
 'EIF4A1P10',
 'C6orf201',
 'ECSIT',
 'CC2D1B',
 'CFAP161',
 'FTO',
 'BCL2L15',
 'CDC27P3',
 'FREM1',
 'GOLIM4',
 'FER1L6-AS2',
 'ADCK5',
 'FAM168A',
 'COX6C',
 'C17orf96',
 'FAM151A',
 'AOC4P']
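
The API calls later in this notebook identify targets by Ensembl gene ID rather than by symbol, so the symbols need to be translated first. The following is a minimal sketch of one way to do that with pandas. It assumes the HGNC table has columns named `symbol` and `ensembl_gene_id` (check the header of your download), and the helper name `symbols_to_ensembl` is hypothetical. A tiny in-memory table stands in for the real file so that the example is self-contained; its last row is a made-up placeholder with no Ensembl ID.

```python
import io
import pandas as pd

def symbols_to_ensembl(hgnc, symbols):
    """Return {symbol: ensembl_gene_id} for the symbols that have an Ensembl ID.

    Assumes `hgnc` has columns named 'symbol' and 'ensembl_gene_id'.
    """
    subset = hgnc[hgnc["symbol"].isin(symbols)]
    subset = subset.dropna(subset=["ensembl_gene_id"])  # drop genes without an ID
    return dict(zip(subset["symbol"], subset["ensembl_gene_id"]))

# Small stand-in for the HGNC table; PLACEHOLDER1 is a made-up symbol
sample_tsv = (
    "symbol\tensembl_gene_id\n"
    "BRAF\tENSG00000157764\n"
    "SOD1\tENSG00000142168\n"
    "PLACEHOLDER1\t\n"
)
hgnc = pd.read_csv(io.StringIO(sample_tsv), sep="\t")

mapping = symbols_to_ensembl(hgnc, ["BRAF", "SOD1", "PLACEHOLDER1"])
print(mapping)
```

With the real catalog, the table could be loaded with `pd.read_csv('../data/hgnc_complete_set.txt', sep='\t', low_memory=False)` instead of the in-memory sample. Symbols without an Ensembl ID (many pseudogenes, for example) are silently dropped, so compare the size of the mapping with the size of the input list.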

## How many diseases in each therapeutic area are associated with each target

In [4]:
API = 'https://www.targetvalidation.org/api/1.1'

In [7]:
# Query the associations of two targets, identified by Ensembl gene ID (BRAF and SOD1)
r = requests.post(API + '/public/association/filter', json={'target': ['ENSG00000157764', 'ENSG00000142168']})

In [24]:
RenderJSON(r.json())
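
Once the shape of the response is clear from the rendered tree, it can be flattened into one row per target-disease pair. The sketch below is only illustrative: the field names (`data`, `target.gene_info.symbol`, `disease.efo_info.label`, `disease.efo_info.therapeutic_area.labels`, `association_score.overall`) are assumptions about the response layout and should be checked against what `RenderJSON` shows. The helper `flatten_associations` and the mock response are hypothetical.

```python
def flatten_associations(response_json):
    """Turn an association response into a list of flat dicts.

    All field names below are assumptions about the response layout;
    missing fields yield None (or an empty list) rather than raising.
    """
    rows = []
    for item in response_json.get("data", []):
        target = item.get("target", {})
        disease = item.get("disease", {})
        efo = disease.get("efo_info", {})
        rows.append({
            "target_id": target.get("id"),
            "target_symbol": target.get("gene_info", {}).get("symbol"),
            "disease_id": disease.get("id"),
            "disease_label": efo.get("label"),
            "therapeutic_areas": efo.get("therapeutic_area", {}).get("labels", []),
            "score": item.get("association_score", {}).get("overall"),
        })
    return rows

# Mock response with made-up disease values, shaped like the assumed layout
mock_response = {
    "data": [
        {
            "target": {"id": "ENSG00000157764", "gene_info": {"symbol": "BRAF"}},
            "disease": {
                "id": "EXAMPLE_0001",
                "efo_info": {
                    "label": "example disease",
                    "therapeutic_area": {"labels": ["example area"]},
                },
            },
            "association_score": {"overall": 0.5},
        }
    ]
}

rows = flatten_associations(mock_response)
print(rows)
```

In the notebook, `flatten_associations(r.json())` would take the place of the mock call, and `pd.DataFrame(rows)` gives a table that can be written to a spreadsheet.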


* For each target, display the top 3 associated diseases (name and association score), along with the total number of disease associations.
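
A per-target summary like this could be computed with pandas once the associations are available as flat rows. The sketch below assumes each row is a dict with the keys `target_symbol`, `disease_label` and `score`; these key names, the helper `summarise_targets`, and the example rows (placeholder disease names and scores) are all hypothetical.

```python
import pandas as pd

def summarise_targets(rows, top_n=3):
    """For each target, count its diseases and list the top_n by score."""
    df = pd.DataFrame(rows)
    summary = {}
    for symbol, group in df.groupby("target_symbol"):
        top = group.nlargest(top_n, "score")  # highest scores first
        summary[symbol] = {
            "n_diseases": int(group["disease_label"].nunique()),
            "top_diseases": list(zip(top["disease_label"], top["score"])),
        }
    return summary

# Placeholder rows; disease names and scores are made up for illustration
example_rows = [
    {"target_symbol": "BRAF", "disease_label": "disease A", "score": 1.0},
    {"target_symbol": "BRAF", "disease_label": "disease B", "score": 0.25},
    {"target_symbol": "BRAF", "disease_label": "disease C", "score": 0.5},
    {"target_symbol": "BRAF", "disease_label": "disease D", "score": 0.125},
    {"target_symbol": "SOD1", "disease_label": "disease E", "score": 0.75},
]

summary = summarise_targets(example_rows)
for symbol, info in summary.items():
    print(symbol, info["n_diseases"], info["top_diseases"])
```

Returning a plain dict keeps the summary easy to inspect in the notebook; it can be converted to a table with `pd.DataFrame.from_dict(summary, orient='index')` if a spreadsheet is the end goal.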
