# Query

What genes and pathways are uniquely targeted by HSCT conditioning drugs that are well- vs poorly- tolerated by FA patients?

# Workflow

**Input**: [Two HSCT conditioning drug sets: (1) well-tolerated by FA patients (Set1d); (2) poorly-tolerated by FA patients (Set2d)](#input)

**Step 1.** [Retrieve proteins targeted by set of well-tolerated HSCT conditioning drugs --> Set1p](#step1)

**Step 2.** [Retrieve proteins targeted by set of poorly-tolerated HSCT conditioning drugs --> Set2p](#step2)

**Step 3.** [Retrieve genes encoding proteins in Set1p vs Set2p --> Set1g, Set2g](#step3)

**Step 4.** [Retrieve pathways associated with genes in Set1g vs Set2g --> Set1pw, Set2pw](#step4)

**Step 5.** [Retrieve other genes involved in pathways in Set1pw vs Set2pw --> Set1g', Set2g'](#step5)

**Step 6.** [Execute set comparison analysis to return the set of genes that is uniquely targeted by poorly tolerated drugs (i.e. affected directly or indirectly by poorly tolerated drugs, but not affected by well-tolerated drugs)](#step6)

**Output**: Set of genes that may be uniquely targeted by pre-conditioning drugs that are poorly tolerated by FA patients.
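
The six steps amount to chaining identifier mappings and then taking a set difference. The following is a minimal offline sketch of that data flow. The lookup tables, the identifiers in them, and the `expand` and `pathway_genes` helpers are all hypothetical stand-ins for the real API calls used in the demo further down.

```python
# Hypothetical lookup tables standing in for the drug -> protein -> gene -> pathway services.
DRUG_TO_PROTEIN = {"drugA": ["P1"], "drugB": ["P2"], "drugC": ["P3"]}
PROTEIN_TO_GENE = {"P1": ["g1"], "P2": ["g2"], "P3": ["g3"]}
GENE_TO_PATHWAY = {"g1": ["pw1"], "g2": ["pw1", "pw2"], "g3": ["pw3"]}
PATHWAY_TO_GENE = {"pw1": ["g1", "g2", "g4"], "pw2": ["g2", "g5"], "pw3": ["g3", "g5", "g6"]}


def expand(ids, table):
    """Map every ID through a lookup table and return the union of the results."""
    out = set()
    for item in ids:
        out.update(table.get(item, []))
    return out


def pathway_genes(drugs):
    """Steps 1-5: drugs -> proteins -> genes -> pathways -> all genes in those pathways."""
    proteins = expand(drugs, DRUG_TO_PROTEIN)
    genes = expand(proteins, PROTEIN_TO_GENE)
    pathways = expand(genes, GENE_TO_PATHWAY)
    return expand(pathways, PATHWAY_TO_GENE)


well_tolerated = pathway_genes(["drugA", "drugB"])  # plays the role of Set1g'
poorly_tolerated = pathway_genes(["drugC"])         # plays the role of Set2g'

# Step 6: genes reached only through the poorly-tolerated drugs.
unique_to_poor = poorly_tolerated - well_tolerated
print(sorted(unique_to_poor))
```

In this toy data, `g5` is reachable from both drug sets through different pathways, so it is excluded from the result even though no well-tolerated drug targets it directly.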

# Required packages

1. [**biothings-client**](https://pypi.python.org/pypi/biothings-client) (`pip install biothings-client`)
2. [**biothings-explorer**](https://pypi.python.org/pypi/biothings-explorer/0.1.0) (`pip install biothings-explorer`)


The **biothings_explorer** Python package is built on JSON-LD and can be used to connect information across different biological entities, e.g. drug-protein-gene-pathway. It currently integrates API resources from MyGene.info, MyVariant.info, the Drug and Compound API, etc.

The **biothings_client** Python package is an easy-to-use Python wrapper for accessing any Biothings.api-based backend service.


# Demo


<a id='input'></a>
**Input**: Two HSCT conditioning drug sets: (1) well-tolerated by FA patients (Set1d); (2) poorly-tolerated by FA patients (Set2d)

In [30]:
'''
Assume well-tolerated drugs: Fludarabine, Carmustine
Assume poorly-tolerated drugs: Etoposide, Tacrolimus
'''
drug_set1 = ['Fludarabine', 'Carmustine']
drug_set2 = ['Etoposide', 'Tacrolimus']

In [31]:
'''
load biothings_client and biothings_explorer
'''
from biothings_client import get_client
from biothings_explorer import IdListHandler
md = get_client('drug')
ih = IdListHandler()

In [32]:
'''
Transform drug name to drugbank ID using biothings_client python package
'''
results_drug_set1 = md.querymany(drug_set1, scopes='drugbank.name', fields='drugbank.accession_number')
set1d = [_record['drugbank']['accession_number'] for _record in results_drug_set1]
print('Drugbank ID list for Set1: {}'.format(set1d))
results_drug_set2 = md.querymany(drug_set2, scopes='drugbank.name', fields='drugbank.accession_number')
set2d = [_record['drugbank']['accession_number'] for _record in results_drug_set2]
print('Drugbank ID list for Set2: {}'.format(set2d))

querying 1-2...done.
Finished.
Drugbank ID list for Set1: ['DB01073', 'DB00262']
querying 1-2...done.
Finished.
Drugbank ID list for Set2: ['DB00773', 'DB00864']
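
The list comprehensions above assume every drug name resolves. `querymany` flags names it cannot match with `notfound`, and such records carry no `drugbank` field, so indexing them directly would raise a `KeyError`. A defensive sketch follows; `extract_accessions` is a hypothetical helper, and the sample records only imitate the shape of a `querymany` result.

```python
def extract_accessions(records):
    """Split querymany-style records into found accession numbers and unresolved queries."""
    found, missing = [], []
    for record in records:
        accession = record.get("drugbank", {}).get("accession_number")
        if record.get("notfound") or accession is None:
            # Keep the original query string so unresolved names can be reported.
            missing.append(record.get("query"))
        else:
            found.append(accession)
    return found, missing


# Illustrative records imitating the shape of a querymany result.
sample = [
    {"query": "Fludarabine", "drugbank": {"accession_number": "DB01073"}},
    {"query": "NotADrug", "notfound": True},
]
found, missing = extract_accessions(sample)
print("Resolved:", found)
print("Unresolved:", missing)
```

Reporting the unresolved names matters here because a silently dropped drug would shrink its set and distort the final comparison.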


<a id='step1'></a>
**Step 1**. Retrieve proteins (*uniprot_id*) targeted by set of well-tolerated HSCT conditioning drugs --> Set1p

In [33]:
'''
Use IdListHandler to retrieve a list of Uniprot_IDs corresponding to Drugbank_IDs for Drug Set 1
'''
set1p = ih.list_handler(input_id_list=set1d, input_type='drugbank_id', output_type='uniprot_id', relation='oban:is_Target_of')
print('Protein Uniprot IDs related to Drugs in Drug Set 1 is: {}'.format(set1p))

Protein Uniprot IDs related to Drugs in Drug Set 1 is: ['P09884', 'P00390', 'P23921', 'P27707']


<a id='step2'></a>
**Step 2.** Retrieve proteins (*uniprot_id*) targeted by set of poorly-tolerated HSCT conditioning drugs --> Set2p

In [34]:
'''
Use IdListHandler to retrieve a list of Uniprot_IDs corresponding to Drugbank_IDs for Drug Set 2
'''
set2p = ih.list_handler(input_id_list=set2d, input_type='drugbank_id', output_type='uniprot_id', relation='oban:is_Target_of')
print('Protein Uniprot IDs related to Drugs in Drug Set 2 is: {}'.format(set2p))

Protein Uniprot IDs related to Drugs in Drug Set 2 is: ['P11388', 'P62942', 'Q02880']


<a id='step3'></a>
**Step 3**. Retrieve genes encoding proteins in Set1p vs Set2p --> Set1g, Set2g

In [35]:
'''
Use IdListHandler to retrieve a list of Entrez_Gene_IDs corresponding to Uniprot_IDs for Drug Set 1
'''
set1g = ih.list_handler(input_id_list=set1p, input_type='uniprot_id', output_type='entrez_gene_id')
print('Entrez Gene IDs related to Drugs in Drug Set 1 is: {}'.format(set1g))

['uniprot.Swiss-Prot']
uniprot.Swiss-Prot:P09884
Fetching 1 gene(s) . . .
Number of IDs from mygene.info related to this query uniprot.Swiss-Prot:P09884 is : 1
['uniprot.Swiss-Prot']
uniprot.Swiss-Prot:P00390
Fetching 1 gene(s) . . .
Number of IDs from mygene.info related to this query uniprot.Swiss-Prot:P00390 is : 1
['uniprot.Swiss-Prot']
uniprot.Swiss-Prot:P23921
Fetching 1 gene(s) . . .
Number of IDs from mygene.info related to this query uniprot.Swiss-Prot:P23921 is : 1
['uniprot.Swiss-Prot']
uniprot.Swiss-Prot:P27707
Fetching 1 gene(s) . . .
Number of IDs from mygene.info related to this query uniprot.Swiss-Prot:P27707 is : 1
Entrez Gene IDs related to Drugs in Drug Set 1 is: ['1633', '6240', '5422', '2936']


In [36]:
'''
Use IdListHandler to retrieve a list of Entrez_Gene_IDs corresponding to Uniprot_IDs for Drug Set 2
'''
set2g = ih.list_handler(input_id_list=set2p, input_type='uniprot_id', output_type='entrez_gene_id')
print('Entrez Gene IDs related to Drugs in Drug Set 2 is: {}'.format(set2g))

['uniprot.Swiss-Prot']
uniprot.Swiss-Prot:P11388
Fetching 1 gene(s) . . .
Number of IDs from mygene.info related to this query uniprot.Swiss-Prot:P11388 is : 1
['uniprot.Swiss-Prot']
uniprot.Swiss-Prot:P62942
Fetching 1 gene(s) . . .
Number of IDs from mygene.info related to this query uniprot.Swiss-Prot:P62942 is : 1
['uniprot.Swiss-Prot']
uniprot.Swiss-Prot:Q02880
Fetching 1 gene(s) . . .
Number of IDs from mygene.info related to this query uniprot.Swiss-Prot:Q02880 is : 1
Entrez Gene IDs related to Drugs in Drug Set 2 is: ['2280', '7155', '7153']


<a id='step4'></a>
**Step 4.** Retrieve pathways associated with genes in Set1g vs Set2g --> Set1pw, Set2pw

In [37]:
'''
Use IdListHandler to retrieve a list of Wikipathway_IDs corresponding to Entrez_Gene_IDs for Drug Set 1
'''
set1pw = ih.list_handler(input_id_list=set1g, input_type='entrez_gene_id', output_type='wikipathway_id')
print('Wikipathway IDs related to Drugs in Drug Set 1 is: {}'.format(set1pw))

Wikipathway IDs related to Drugs in Drug Set 1 is: ['WP692', 'WP100', 'WP2882', 'WP408', 'WP2884', 'WP702', 'WP404', 'WP2377', 'WP466', 'WP15', 'WP3925', 'WP1601', 'WP2446', 'WP3940']


In [38]:
'''
Use IdListHandler to retrieve a list of Wikipathway_IDs corresponding to Entrez_Gene_IDs for Drug Set 2
'''
set2pw = ih.list_handler(input_id_list=set2g, input_type='entrez_gene_id', output_type='wikipathway_id')
print('Wikipathway IDs related to Drugs in Drug Set 2 is: {}'.format(set2pw))

Wikipathway IDs related to Drugs in Drug Set 2 is: ['WP2431', 'WP2363', 'WP2377', 'WP1471', 'WP2361', 'WP560', 'WP536', 'WP2446']
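
The two pathway lists already overlap before any gene expansion. As a quick check, the sketch below copies the IDs printed above and intersects them; the variable names `shared_pw` and `set2pw_only` are introduced here for illustration only.

```python
# Pathway IDs copied from the printed output of the two cells above.
set1pw = ['WP692', 'WP100', 'WP2882', 'WP408', 'WP2884', 'WP702', 'WP404',
          'WP2377', 'WP466', 'WP15', 'WP3925', 'WP1601', 'WP2446', 'WP3940']
set2pw = ['WP2431', 'WP2363', 'WP2377', 'WP1471', 'WP2361', 'WP560', 'WP536', 'WP2446']

# Pathways reached from both drug sets.
shared_pw = set(set1pw) & set(set2pw)
# Pathways reached only from the poorly-tolerated drugs.
set2pw_only = set(set2pw) - set(set1pw)

print('Shared pathways: {}'.format(sorted(shared_pw)))
print('Pathways only in Set 2: {}'.format(sorted(set2pw_only)))
```

Every gene in a shared pathway ends up in both expanded gene sets, so it cannot appear in the Step 6 result.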


<a id='step5'></a>
**Step 5.** Retrieve other genes involved in pathways in Set1pw vs Set2pw --> Set1g', Set2g'

In [39]:
'''
Use IdListHandler to retrieve a list of Entrez_Gene_IDs corresponding to Wikipathway_IDs for Drug Set 1
'''
set1g_other = ih.list_handler(input_id_list=set1pw, input_type='wikipathway_id', output_type='entrez_gene_id')
print('Other Entrez Gene IDs related to Drugs in Drug Set 1 is: {}'.format(set1g_other))

['pathway.wikipathways.id']
pathway.wikipathways.id:WP692
Fetching 17 gene(s) . . .
Number of IDs from mygene.info related to this query pathway.wikipathways.id:WP692 is : 17
['pathway.wikipathways.id']
pathway.wikipathways.id:WP100
Fetching 20 gene(s) . . .
Number of IDs from mygene.info related to this query pathway.wikipathways.id:WP100 is : 20
['pathway.wikipathways.id']
pathway.wikipathways.id:WP2882
Fetching 316 gene(s) . . .
Number of IDs from mygene.info related to this query pathway.wikipathways.id:WP2882 is : 316
['pathway.wikipathways.id']
pathway.wikipathways.id:WP408
Fetching 30 gene(s) . . .
Number of IDs from mygene.info related to this query pathway.wikipathways.id:WP408 is : 30
['pathway.wikipathways.id']
pathway.wikipathways.id:WP2884
Fetching 142 gene(s) . . .
Number of IDs from mygene.info related to this query pathway.wikipathways.id:WP2884 is : 142
['pathway.wikipathways.id']
pathway.wikipathways.id:WP702
Fetching 177 gene(s) . . .
Number of IDs from mygene.info r

In [40]:
'''
Use IdListHandler to retrieve a list of Entrez_Gene_IDs corresponding to Wikipathway_IDs for Drug Set 2
'''
set2g_other = ih.list_handler(input_id_list=set2pw, input_type='wikipathway_id', output_type='entrez_gene_id')
print('Other Entrez Gene IDs related to Drugs in Drug Set 2 is: {}'.format(set2g_other))

['pathway.wikipathways.id']
pathway.wikipathways.id:WP2431
Fetching 117 gene(s) . . .
Number of IDs from mygene.info related to this query pathway.wikipathways.id:WP2431 is : 117
['pathway.wikipathways.id']
pathway.wikipathways.id:WP2363
Fetching 32 gene(s) . . .
Number of IDs from mygene.info related to this query pathway.wikipathways.id:WP2363 is : 32
['pathway.wikipathways.id']
pathway.wikipathways.id:WP2377
Fetching 170 gene(s) . . .
Number of IDs from mygene.info related to this query pathway.wikipathways.id:WP2377 is : 170
['pathway.wikipathways.id']
pathway.wikipathways.id:WP1471
Fetching 36 gene(s) . . .
Number of IDs from mygene.info related to this query pathway.wikipathways.id:WP1471 is : 36
['pathway.wikipathways.id']
pathway.wikipathways.id:WP2361
Fetching 29 gene(s) . . .
Number of IDs from mygene.info related to this query pathway.wikipathways.id:WP2361 is : 29
['pathway.wikipathways.id']
pathway.wikipathways.id:WP560
Fetching 55 gene(s) . . .
Number of IDs from mygene.i

<a id='step6'></a>
**Step 6.** Execute set comparison analysis to return the set of genes that is uniquely targeted by poorly tolerated drugs (i.e. affected directly or indirectly by poorly tolerated drugs, but not affected by well-tolerated drugs)

In [41]:
'''
Get Unique Entrez Gene IDs for both sets
'''
set1g_other_unique = set(set1g_other)
set2g_other_unique = set(set2g_other)
print('Total number of unique genes in gene set 1: {}'.format(len(set1g_other_unique)))
print('Total number of unique genes in gene set 2: {}'.format(len(set2g_other_unique)))

Total number of unique genes in gene set 1: 913
Total number of unique genes in gene set 2: 592


In [42]:
'''
Find the set of genes that is uniquely targeted by poorly tolerated drugs (i.e. only present in set2g_other_unique)
'''
set2g_only = set2g_other_unique - set1g_other_unique
print('Total number of genes uniquely targeted by poorly tolerated drugs: {}'.format(len(set2g_only)))

Total number of genes uniquely targeted by poorly tolerated drugs: 334
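
The three counts reported above also fix the sizes of the remaining regions of the comparison, since |B - A| = |B| - |A ∩ B|. A small sketch of that arithmetic, where `partition_sizes` is a hypothetical helper and not part of either package:

```python
def partition_sizes(size_a, size_b, size_b_only):
    """Derive |A & B| and |A - B| from |A|, |B| and |B - A|."""
    shared = size_b - size_b_only
    a_only = size_a - shared
    return shared, a_only


# Counts taken from the output above: 913 genes in set 1, 592 in set 2, 334 only in set 2.
shared, set1_only = partition_sizes(913, 592, 334)
print('Genes shared by both sets: {}'.format(shared))
print('Genes only in gene set 1: {}'.format(set1_only))
```

In the notebook itself the same numbers should follow directly from `len(set1g_other_unique & set2g_other_unique)` and `len(set1g_other_unique - set2g_other_unique)`.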
