# Query

What genes and pathways are uniquely targeted by HSCT conditioning drugs that are well- vs poorly- tolerated by FA patients?

# Workflow

**Input**: Two HSCT conditioning drug sets: (1) well-tolerated by FA patients (Set1d); (2) poorly-tolerated by FA patients (Set2d)

**Step 1.** Retrieve proteins targeted by set of well-tolerated HSCT conditioning drugs --> Set1p

**Step 2.** Retrieve proteins targeted by set of poorly-tolerated HSCT conditioning drugs --> Set2p

**Step 3.** Retrieve genes encoding proteins in Set1p vs Set2p --> Set1g, Set2g

**Step 4.** Retrieve pathways associated with genes in Set1g vs Set2g --> Set1pw, Set2pw

**Step 5.** Retrieve other genes involved in pathways in Set1pw vs Set2pw --> Set1g', Set2g'

**Step 6.** Execute set comparison analysis to return the set of genes that is uniquely targeted by poorly tolerated drugs (i.e. affected directly or indirectly by poorly tolerated drugs, but not affected by well-tolerated drugs)

**Output**: Set of genes that may be uniquely targeted by pre-conditioning drugs that are poorly tolerated by FA patients.

# Sources & Routes

**Steps 1 & 2**: **Drug to Protein** [Drug and Compound API](http://c.biothings.io)

**Step 3**: **Protein to Gene** [MyGene.info](http://mygene.info)

**Step 4**: **Gene to Pathway** [MyGene.info](http://mygene.info)

**Step 5**: **Pathway to Gene** [MyGene.info](http://mygene.info)

# Demo

**Input**: Two HSCT conditioning drug sets: (1) well-tolerated by FA patients (Set1d); (2) poorly-tolerated by FA patients (Set2d)

In [5]:
'''
Assume well-tolerated drugs: Fludarabine, Carmustine
Assume poorly-tolerated drugs: Etoposide, Tacrolimus
'''
drug_set1 = ['Fludarabine', 'Carmustine']
drug_set2 = ['Etoposide', 'Tacrolimus']

In [12]:
'''
Get DrugBank ID using drug and compound API (mydrug python package)
'''
import mydrug
md = mydrug.MyDrugInfo()
results_drug_set1 = md.querymany(drug_set1, scopes='drugbank.name', fields='drugbank.accession_number')
print(results_drug_set1)
set1d = [_record['drugbank']['accession_number'] for _record in results_drug_set1]
print('Drugbank ID list for Set1: {}'.format(set1d))

querying 1-2...done.
Finished.
[{'_id': 'GIUYCYHIANZCFB-FJFJXFQQSA-N', 'query': 'Fludarabine', 'drugbank': {'accession_number': 'DB01073'}, '_score': 16.305637}, {'_id': 'DLGOEMSEDOSKAD-UHFFFAOYSA-N', 'query': 'Carmustine', 'drugbank': {'accession_number': 'DB00262'}, '_score': 16.305637}]
Drugbank ID list for Set1: ['DB01073', 'DB00262']


In [13]:
results_drug_set2 = md.querymany(drug_set2, scopes='drugbank.name', fields='drugbank.accession_number')
print(results_drug_set2)
set2d = [_record['drugbank']['accession_number'] for _record in results_drug_set2]
print('Drugbank ID list for Set2: {}'.format(set2d))

querying 1-2...done.
Finished.
[{'_id': 'VJJPUSNTGOMMGY-MRVIYFEKSA-N', 'query': 'Etoposide', 'drugbank': {'accession_number': 'DB00773'}, '_score': 16.305143}, {'_id': 'QJJXYPPXXYFBGM-LFZNUXCKSA-N', 'query': 'Tacrolimus', 'drugbank': {'accession_number': 'DB00864'}, '_score': 16.30573}]
Drugbank ID list for Set2: ['DB00773', 'DB00864']
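
The list comprehensions above assume that every drug name resolves to a record carrying `drugbank.accession_number`. Below is a minimal sketch of a more defensive extraction. It assumes that unmatched query terms come back flagged with a `notfound` key and that `drugbank` is a single dictionary per record; the helper name `extract_accessions` and the `NotADrug` entry are hypothetical and only serve the illustration.

```python
def extract_accessions(records):
    """Split querymany-style records into found accession numbers and missed query terms."""
    found, missed = [], []
    for record in records:
        # Assumption: 'drugbank' is a dict; a missing key is treated as a miss.
        accession = record.get('drugbank', {}).get('accession_number')
        if record.get('notfound') or accession is None:
            missed.append(record.get('query'))
        else:
            found.append(accession)
    return found, missed


# Records shaped like the output above, plus one hypothetical unmatched name.
sample = [
    {'query': 'Fludarabine', 'drugbank': {'accession_number': 'DB01073'}},
    {'query': 'Carmustine', 'drugbank': {'accession_number': 'DB00262'}},
    {'query': 'NotADrug', 'notfound': True},
]
found, missed = extract_accessions(sample)
print('Found: {}'.format(found))
print('Missed: {}'.format(missed))
```

Reporting the missed names separately keeps a misspelled drug from silently shrinking a drug set before Step 1.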


**Step 1**. Retrieve proteins (*uniprot_id*) targeted by set of well-tolerated HSCT conditioning drugs --> Set1p

In [8]:
'''
Using Biothings Python Library to find Uniprot_ID related to each Drugbank_ID
Biothings Library is built on JSON-LD
The query is done through Drug and Compound API internally
'''
from biothings import IdListHandler
# IdListHandler is designed to handle a list of given IDs, e.g. drugbank ID, and return a list of IDs given the output type, e.g. uniprot_id
ih = IdListHandler()


In [14]:
'''
Use IdListHandler to retrieve a list of Uniprot_IDs corresponding to Drugbank_IDs for Drug Set 1
'''
set1p = ih.list_handler(input_id_list=set1d, input_type='drugbank_id', output_type='uniprot_id')
print('Protein Uniprot IDs related to Drugs in Drug Set 1 is: {}'.format(set1p))

Protein Uniprot IDs related to Drugs in Drug Set 1 is: ['P23921', 'P09884', 'P27707', 'P00390']


**Step 2.** Retrieve proteins targeted by set of poorly-tolerated HSCT conditioning drugs --> Set2p

In [15]:
'''
Use IdListHandler to retrieve a list of Uniprot_IDs corresponding to Drugbank_IDs for Drug Set 2
'''
set2p = ih.list_handler(input_id_list=set2d, input_type='drugbank_id', output_type='uniprot_id')
print('Protein Uniprot IDs related to Drugs in Drug Set 2 is: {}'.format(set2p))

Protein Uniprot IDs related to Drugs in Drug Set 2 is: ['P11388', 'Q02880', 'P62942']


**Step 3**. Retrieve genes encoding proteins in Set1p vs Set2p --> Set1g, Set2g

In [16]:
'''
Use IdListHandler to retrieve a list of Entrez_Gene_IDs corresponding to Uniprot_IDs for Drug Set 1
'''
set1g = ih.list_handler(input_id_list=set1p, input_type='uniprot_id', output_type='entrez_gene_id')
print('Entrez Gene IDs related to Drugs in Drug Set 1 is: {}'.format(set1g))

uniprot.Swiss-Prot:P23921
Fetching 1 genes(s) . . .
Number of IDs from mygene.info related to this query uniprot.Swiss-Prot:P23921 is : 1
uniprot.Swiss-Prot:P09884
Fetching 1 genes(s) . . .
Number of IDs from mygene.info related to this query uniprot.Swiss-Prot:P09884 is : 1
uniprot.Swiss-Prot:P27707
Fetching 1 genes(s) . . .
Number of IDs from mygene.info related to this query uniprot.Swiss-Prot:P27707 is : 1
uniprot.Swiss-Prot:P00390
Fetching 1 genes(s) . . .
Number of IDs from mygene.info related to this query uniprot.Swiss-Prot:P00390 is : 1
Entrez Gene IDs related to Drugs in Drug Set 1 is: ['6240', '5422', '1633', '2936']


In [17]:
'''
Use IdListHandler to retrieve a list of Entrez_Gene_IDs corresponding to Uniprot_IDs for Drug Set 2
'''
set2g = ih.list_handler(input_id_list=set2p, input_type='uniprot_id', output_type='entrez_gene_id')
print('Entrez Gene IDs related to Drugs in Drug Set 2 is: {}'.format(set2g))

uniprot.Swiss-Prot:P11388
Fetching 1 genes(s) . . .
Number of IDs from mygene.info related to this query uniprot.Swiss-Prot:P11388 is : 1
uniprot.Swiss-Prot:Q02880
Fetching 1 genes(s) . . .
Number of IDs from mygene.info related to this query uniprot.Swiss-Prot:Q02880 is : 1
uniprot.Swiss-Prot:P62942
Fetching 1 genes(s) . . .
Number of IDs from mygene.info related to this query uniprot.Swiss-Prot:P62942 is : 1
Entrez Gene IDs related to Drugs in Drug Set 2 is: ['7153', '7155', '2280']
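
The log above shows each UniProt accession being resolved through MyGene.info with a `uniprot.Swiss-Prot` query. As a rough sketch, the same lookup could be written against any client that exposes a `querymany` method. The function name `uniprot_to_entrez` and the `StubClient` class are hypothetical; the stub stands in for a real MyGene.info client so the example runs offline, and its canned answers are copied from the results printed above. The choice of `species='human'` is also an assumption.

```python
def uniprot_to_entrez(client, uniprot_ids):
    """Map UniProt accessions to Entrez Gene IDs through a querymany-style client."""
    records = client.querymany(uniprot_ids, scopes='uniprot.Swiss-Prot',
                               fields='entrezgene', species='human')
    # Skip records without a hit; normalise IDs to strings.
    return [str(r['entrezgene']) for r in records if 'entrezgene' in r]


class StubClient:
    """Offline stand-in for a MyGene.info client (hypothetical, for illustration)."""
    _known = {'P11388': '7153', 'Q02880': '7155', 'P62942': '2280'}

    def querymany(self, qterms, **kwargs):
        return [{'query': q, 'entrezgene': self._known[q]} if q in self._known
                else {'query': q, 'notfound': True}
                for q in qterms]


set2g_sketch = uniprot_to_entrez(StubClient(), ['P11388', 'Q02880', 'P62942'])
print(set2g_sketch)
```

With network access, an instance of `mygene.MyGeneInfo()` could be passed in place of `StubClient()`.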


**Step 4.** Retrieve pathways associated with genes in Set1g vs Set2g --> Set1pw, Set2pw

In [18]:
'''
Use IdListHandler to retrieve a list of Wikipathway_IDs corresponding to Entrez_Gene_IDs for Drug Set 1
'''
set1pw = ih.list_handler(input_id_list=set1g, input_type='entrez_gene_id', output_type='wikipathway_id')
print('Wikipathway IDs related to Drugs in Drug Set 1 is: {}'.format(set1pw))

Wikipathway IDs related to Drugs in Drug Set 1 is: ['WP1601', 'WP2377', 'WP404', 'WP2446', 'WP404', 'WP2446', 'WP466', 'WP2446', 'WP3925', 'WP2884', 'WP3940', 'WP702', 'WP100', 'WP408', 'WP2882', 'WP692', 'WP15']


In [19]:
'''
Use IdListHandler to retrieve a list of Wikipathway_IDs corresponding to Entrez_Gene_IDs for Drug Set 2
'''
set2pw = ih.list_handler(input_id_list=set2g, input_type='entrez_gene_id', output_type='wikipathway_id')
print('Wikipathway IDs related to Drugs in Drug Set 2 is: {}'.format(set2pw))

Wikipathway IDs related to Drugs in Drug Set 2 is: ['WP2377', 'WP2363', 'WP2446', 'WP2361', 'WP2431', 'WP2377', 'WP1471', 'WP560', 'WP536']
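
Both pathway lists contain repeated IDs (for example `WP404` and `WP2446` in Set 1, `WP2377` in Set 2), so passing them on unchanged repeats the same pathway queries. The sketch below is one possible way to deduplicate the lists while preserving order, and to compare the two sets at the pathway level, which speaks to the pathway half of the original query. The variable names ending in `_unique`, `shared_pw` and `set2pw_only` are introduced here for illustration.

```python
# Pathway lists copied from the outputs above.
set1pw = ['WP1601', 'WP2377', 'WP404', 'WP2446', 'WP404', 'WP2446', 'WP466',
          'WP2446', 'WP3925', 'WP2884', 'WP3940', 'WP702', 'WP100', 'WP408',
          'WP2882', 'WP692', 'WP15']
set2pw = ['WP2377', 'WP2363', 'WP2446', 'WP2361', 'WP2431', 'WP2377',
          'WP1471', 'WP560', 'WP536']

# dict.fromkeys keeps the first occurrence of each ID and preserves order.
set1pw_unique = list(dict.fromkeys(set1pw))
set2pw_unique = list(dict.fromkeys(set2pw))

set1pw_lookup = set(set1pw_unique)
shared_pw = [pw for pw in set2pw_unique if pw in set1pw_lookup]
set2pw_only = [pw for pw in set2pw_unique if pw not in set1pw_lookup]

print('Unique pathways: set 1 = {}, set 2 = {}'.format(
    len(set1pw_unique), len(set2pw_unique)))
print('Pathways shared by both sets: {}'.format(shared_pw))
print('Pathways only in set 2: {}'.format(set2pw_only))
```

For these lists, the two shared pathways are `WP2377` and `WP2446`, and six pathways appear only for the poorly tolerated drugs.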


**Step 5.** Retrieve other genes involved in pathways in Set1pw vs Set2pw --> Set1g', Set2g'

In [21]:
'''
Use IdListHandler to retrieve a list of Entrez_Gene_IDs corresponding to Wikipathway_IDs for Drug Set 1
'''
set1g_other = ih.list_handler(input_id_list=set1pw, input_type='wikipathway_id', output_type='entrez_gene_id')
print('Other Entrez Gene IDs related to Drugs in Drug Set 1 is: {}'.format(set1g_other))

pathway.wikipathways.id:WP1601
Fetching 34 genes(s) . . .
Number of IDs from mygene.info related to this query pathway.wikipathways.id:WP1601 is : 34
pathway.wikipathways.id:WP2377
Fetching 170 genes(s) . . .
Number of IDs from mygene.info related to this query pathway.wikipathways.id:WP2377 is : 170
pathway.wikipathways.id:WP404
Fetching 19 genes(s) . . .
Number of IDs from mygene.info related to this query pathway.wikipathways.id:WP404 is : 19
pathway.wikipathways.id:WP2446
Fetching 89 genes(s) . . .
Number of IDs from mygene.info related to this query pathway.wikipathways.id:WP2446 is : 89
pathway.wikipathways.id:WP404
Fetching 19 genes(s) . . .
Number of IDs from mygene.info related to this query pathway.wikipathways.id:WP404 is : 19
pathway.wikipathways.id:WP2446
Fetching 89 genes(s) . . .
Number of IDs from mygene.info related to this query pathway.wikipathways.id:WP2446 is : 89
pathway.wikipathways.id:WP466
Fetching 42 genes(s) . . .
Number of IDs from mygene.info related to thi

In [22]:
'''
Use IdListHandler to retrieve a list of Entrez_Gene_IDs corresponding to Wikipathway_IDs for Drug Set 2
'''
set2g_other = ih.list_handler(input_id_list=set2pw, input_type='wikipathway_id', output_type='entrez_gene_id')
print('Other Entrez Gene IDs related to Drugs in Drug Set 2 is: {}'.format(set2g_other))

pathway.wikipathways.id:WP2377
Fetching 170 genes(s) . . .
Number of IDs from mygene.info related to this query pathway.wikipathways.id:WP2377 is : 170
pathway.wikipathways.id:WP2363
Fetching 32 genes(s) . . .
Number of IDs from mygene.info related to this query pathway.wikipathways.id:WP2363 is : 32
pathway.wikipathways.id:WP2446
Fetching 89 genes(s) . . .
Number of IDs from mygene.info related to this query pathway.wikipathways.id:WP2446 is : 89
pathway.wikipathways.id:WP2361
Fetching 29 genes(s) . . .
Number of IDs from mygene.info related to this query pathway.wikipathways.id:WP2361 is : 29
pathway.wikipathways.id:WP2431
Fetching 117 genes(s) . . .
Number of IDs from mygene.info related to this query pathway.wikipathways.id:WP2431 is : 117
pathway.wikipathways.id:WP2377
Fetching 170 genes(s) . . .
Number of IDs from mygene.info related to this query pathway.wikipathways.id:WP2377 is : 170
pathway.wikipathways.id:WP1471
Fetching 36 genes(s) . . .
Number of IDs from mygene.info relat

**Step 6.** Execute set comparison analysis to return the set of genes that is uniquely targeted by poorly tolerated drugs (i.e. affected directly or indirectly by poorly tolerated drugs, but not affected by well-tolerated drugs)

In [23]:
'''
Get Unique Entrez Gene IDs for both sets
'''
set1g_other_unique = set(set1g_other)
set2g_other_unique = set(set2g_other)
print('Total number of unique genes in gene set 1: {}'.format(len(set1g_other_unique)))
print('Total number of unique genes in gene set 2: {}'.format(len(set2g_other_unique)))

Total number of unique genes in gene set 1: 913
Total number of unique genes in gene set 2: 592


In [24]:
'''
Find the set of genes that is uniquely targeted by poorly tolerated drugs (i.e. only present in set2g_other_unique)
'''
set2g_only = set2g_other_unique - set1g_other_unique
print('Total number of genes uniquely targeted by poorly tolerated drugs: {}'.format(len(set2g_only)))

Total number of genes uniquely targeted by poorly tolerated drugs: 334
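
The counts above also imply the size of the overlap: 592 unique genes in gene set 2 minus 334 found only in set 2 leaves 258 genes shared with set 1. A small sketch of the full comparison follows, returning all three partitions at once. The function name `compare_gene_sets` is hypothetical, and the toy ID lists are illustrative only: the overlap on `7153` is artificial and not a result of the workflow.

```python
def compare_gene_sets(well_tolerated, poorly_tolerated):
    """Partition gene IDs into poor-only, shared, and well-only sets."""
    well, poor = set(well_tolerated), set(poorly_tolerated)
    return {
        'poor_only': poor - well,   # the Step 6 output
        'shared': poor & well,
        'well_only': well - poor,
    }


# Toy input with duplicates and one artificial overlap ('7153').
toy_set1 = ['6240', '5422', '1633', '2936', '7153']
toy_set2 = ['7153', '7155', '2280', '7153']
result = compare_gene_sets(toy_set1, toy_set2)

# Sanity check: poor-only plus shared must account for every unique set 2 gene.
assert len(result['poor_only']) + len(result['shared']) == len(set(toy_set2))

print('Poor-only genes: {}'.format(sorted(result['poor_only'])))
print('Shared genes: {}'.format(sorted(result['shared'])))
```

Running the same check on the real data would confirm that `len(set2g_only)` plus the size of the intersection equals `len(set2g_other_unique)`.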
