# Mapping DrugBank drug targets onto a KEGG pathway

by Kozo Nishida (Riken, Japan)

Here we show an example of data integration: we map drug targets (from DrugBank) onto a KEGG pathway. To manage the several tables involved, we use pandas dataframes.

## Loading all data into pandas dataframes
First we import a KEGG pathway into Cytoscape: [Alanine, aspartate and glutamate metabolism, eco00250](http://www.genome.jp/kegg-bin/show_pathway?eco00250)


In [1]:
import requests
import json
import pandas as pd

# REST endpoint of the locally running Cytoscape instance
PORT_NUMBER = 1234
BASE = 'http://localhost:' + str(PORT_NUMBER) + '/v1/'
HEADERS = {'Content-Type': 'application/json'}

# Ask Cytoscape to load the pathway's KGML file from the KEGG REST API
kgml_urls = ['http://rest.kegg.jp/get/eco00250/kgml']
requests.post(BASE + 'networks?source=url&collection=KEGG', data=json.dumps(kgml_urls), headers=HEADERS)

<Response [200]>

### Get the node attribute table as alanine_nodes.tsv

In [2]:
res = requests.get(BASE + 'networks')
# Parse the JSON list of network SUIDs; avoids calling eval on a response body
networkIds = res.json()
print(networkIds)

[71442, 70708, 71718, 72010, 72396, 71166]
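Several networks are already loaded in this session, so the first entry of the list is not necessarily the pathway that was just imported. A minimal sketch of an alternative is to read the SUID from the body of the import response instead. The response shape used here (a list of objects with a `networkSUID` list) is an assumption that this notebook does not confirm, and both `suids_from_import_response` and the example payload are hypothetical.

```python
def suids_from_import_response(payload):
    """Collect network SUIDs from a parsed import response.

    Assumed (not confirmed by this notebook) response shape:
    [{"source": "<url>", "networkSUID": [<suid>, ...]}, ...]
    """
    suids = []
    for entry in payload:
        # Entries without the assumed key contribute nothing
        suids.extend(entry.get('networkSUID', []))
    return suids


# Hypothetical response body, shaped as assumed above
example_payload = [
    {'source': 'http://rest.kegg.jp/get/eco00250/kgml', 'networkSUID': [100]}
]
imported_suids = suids_from_import_response(example_payload)
```

If the assumption holds, the parsed body of the `requests.post` call in the first cell could be passed to this function in place of `example_payload`.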


In [3]:
res = requests.get(BASE + 'networks/' + str(networkIds[0]) + '/tables/defaultnode.tsv')
# Write the raw bytes; the with-block closes the file even if writing fails
with open('alanine_nodes.tsv', 'wb') as f:
    f.write(res.content)

#### Import alanine_nodes.tsv into a pandas dataframe

In [4]:
alanine_df = pd.read_table('alanine_nodes.tsv')
alanine_df.head()

Unnamed: 0,SUID,shared name,KEGG_NODE_X,KEGG_NODE_Y,KEGG_NODE_WIDTH,KEGG_NODE_HEIGHT,KEGG_NODE_LABEL,KEGG_NODE_LABEL_LIST_FIRST,KEGG_NODE_LABEL_LIST,KEGG_ID,...,ld20t16,ld20t56,ld20t14,ld20t60,ld20t72,ld20t28,id,chart,name,selected
0,71452,path:ath00020:28,526,649,46,17,K00174...,K00174...,K00174...,ko:K00174|ko:K00175|ko:K00177|ko:K00176,...,,,,,,,,,path:ath00020:28,False
1,71453,path:ath00020:29,467,618,46,17,mtLPD1...,mtLPD1...,mtLPD1...,ath:AT1G48030|ath:AT3G16950|ath:AT3G17240|ath:...,...,,,,,,,,,path:ath00020:29,False
2,71454,path:ath00020:30,661,574,46,17,AT3G55410...,AT3G55410...,AT3G55410...,ath:AT3G55410|ath:AT5G65750,...,,,,,,,,,path:ath00020:30,False
3,71455,path:ath00020:31,530,575,46,17,AT3G55410...,AT3G55410...,AT3G55410...,ath:AT3G55410|ath:AT5G65750,...,,,,,,,,,path:ath00020:31,False
4,71456,path:ath00020:32,403,574,46,17,AT4G26910...,AT4G26910...,AT4G26910...,ath:AT4G26910|ath:AT5G55070,...,,,,,,,,,path:ath00020:32,False
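The `KEGG_ID` column packs several identifiers into one cell, separated by `|`. As a rough sketch of how such cells can be unpacked, the snippet below rebuilds two rows from the preview above and produces one row per (node, KEGG ID) pair. The names `nodes` and `long_nodes` are illustrative only.

```python
import pandas as pd

# Two rows copied from the node table preview above
nodes = pd.DataFrame({
    'SUID': [71454, 71456],
    'KEGG_ID': ['ath:AT3G55410|ath:AT5G65750', 'ath:AT4G26910|ath:AT5G55070'],
})

# Split each cell on '|' and emit one row per (node, KEGG ID) pair
long_nodes = (
    nodes.assign(KEGG_ID=nodes['KEGG_ID'].str.split('|'))
         .explode('KEGG_ID')
         .reset_index(drop=True)
)
```

A long table of this kind can be joined against an ID conversion table directly, without looping over the cells.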


#### Next we download the DrugBank drug target and ID mapping table

In [5]:
!curl -O http://www.drugbank.ca/system/downloads/current/all_target_ids_all.csv.zip

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  203k  100  203k    0     0   281k      0 --:--:-- --:--:-- --:--:--  281k


In [6]:
!unzip all_target_ids_all.csv.zip

Archive:  all_target_ids_all.csv.zip
  inflating: all_target_ids_all.csv  


#### Import the DrugBank drug targets into a pandas dataframe

In [12]:
drugbank_df = pd.read_csv('all_target_ids_all.csv')
drugbank_df.head()

Unnamed: 0,ID,Name,Gene Name,GenBank Protein ID,GenBank Gene ID,UniProt ID,Uniprot Title,PDB ID,GeneCard ID,GenAtlas ID,HGNC ID,Species,Drug IDs
0,P45059,Peptidoglycan synthase FtsI,ftsI,1574687,L42023,P45059,FTSI_HAEIN,,,,,Haemophilus influenzae (strain ATCC 51907 / DS...,DB00303
1,P19113,Histidine decarboxylase,HDC,32109,X54297,P19113,DCHS_HUMAN,,HDC,HDC,HGNC:4855,Human,DB00114; DB00117
2,Q9UI32,"Glutaminase liver isoform, mitochondrial",GLS2,6650606,AF110330,Q9UI32,GLSL_HUMAN,,GLS2,GLS2,HGNC:29570,Human,DB00142
3,P00488,Coagulation factor XIII A chain,F13A1,182309,M22001,P00488,F13A_HUMAN,,F13A1,F13A1,HGNC:3531,Human,DB01839; DB02340
4,P35228,"Nitric oxide synthase, inducible",NOS2,292242,L09210,P35228,NOS2_HUMAN,,NOS2A,NOS2A,HGNC:7873,Human,DB00125; DB00155; DB01110; DB01234; DB01686; D...
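In the preview, `Drug IDs` holds one or more DrugBank IDs separated by `; `. One possible way to make later lookups cheap is a dictionary from UniProt ID to a list of drug IDs, sketched below on two rows copied from the preview. It assumes that each UniProt ID occurs once in the table and that `; ` is the only separator; neither is verified here, and `drugs_by_uniprot` is a hypothetical name.

```python
import pandas as pd

# Two rows copied from the DrugBank preview above
targets = pd.DataFrame({
    'UniProt ID': ['P45059', 'P19113'],
    'Drug IDs': ['DB00303', 'DB00114; DB00117'],
})

# Map each UniProt ID to its list of DrugBank drug IDs.
# Assumption: UniProt IDs are unique; a repeated ID would overwrite the earlier entry.
drugs_by_uniprot = {
    uniprot: drug_ids.split('; ')
    for uniprot, drug_ids in zip(targets['UniProt ID'], targets['Drug IDs'])
}
```

With such a dictionary, checking whether a protein is a drug target is a single key lookup rather than a scan over the whole table.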


#### Get the UniProt-KEGG ID conversion table. This takes a long time.

In [13]:
!curl -o conv_eco_uniprot.tsv http://rest.kegg.jp/conv/eco/uniprot

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:36 --:--:--     0^C


In [14]:
!head conv_eco_uniprot.tsv

head: conv_eco_uniprot.tsv: No such file or directory


#### Import the UniProt-KEGG ID conversion table

In [15]:
idconversion_df = pd.read_table('conv_eco_uniprot.tsv', header=None)
idconversion_df.head()

IOError: File conv_eco_uniprot.tsv does not exist
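Once the conversion file is available, naming its columns and stripping the `up:` prefix up front can make the later matching easier to read. The sketch below parses an in-memory sample rather than the real file. The two sample rows are illustrative only, and the two-column, tab-separated layout (UniProt ID first, KEGG ID second) is assumed from how the columns are used later in this notebook.

```python
import io
import pandas as pd

# Illustrative rows in the assumed format: "up:<UniProt ID>\t<KEGG gene ID>"
sample = "up:P00561\teco:b0002\nup:P00547\teco:b0003\n"

# Read the table with explicit column names instead of 0 and 1
conv = pd.read_csv(io.StringIO(sample), sep='\t', header=None,
                   names=['uniprot', 'kegg'])

# Drop the 'up:' prefix so the IDs are comparable with DrugBank's 'UniProt ID'
conv['uniprot'] = conv['uniprot'].str.replace('up:', '', regex=False)
```

For the real file, the `io.StringIO(sample)` argument would be replaced by the path `'conv_eco_uniprot.tsv'`.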

## Merging pandas dataframes

We integrate the three tables (network nodes, drug targets, and ID conversion). Here we append two columns, the drug target and the drug, to Cytoscape’s node table.

In [None]:
target_uniprot = []
target_drug = []

In [None]:
for i, keggtype in alanine_df['KEGG_NODE_TYPE'].items():
    # Default: this node is not a known drug target
    target_uniprot.append(None)
    target_drug.append(None)
    if keggtype == 'gene':
        # A gene node may bundle several loci, separated by '|'
        uniprotids = []
        for locus in alanine_df['KEGG_ID'][i].split('|'):
            uniprot = idconversion_df[idconversion_df[1] == locus][0]
            if uniprot.empty:
                continue  # this locus has no UniProt entry in the table
            uniprotid = uniprot.values[0].replace('up:', '')
            uniprotids.append(uniprotid)
        for j, unip in drugbank_df['UniProt ID'].items():
            if unip in uniprotids:
                # Overwrite the placeholder; the last matching target wins
                target_uniprot[-1] = unip
                target_drug[-1] = drugbank_df['Drug IDs'][j]

In [None]:
s1 = pd.Series(target_uniprot, name='TARGET_UNIPROT')
s2 = pd.Series(target_drug, name='TARGET_DRUG')
merged_df = pd.concat([alanine_df, s1, s2], axis=1)
merged_df.head()
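The same integration can also be expressed as table joins. The following is only a sketch on hypothetical miniature tables (`nodes`, `conv`, `drugs`) that mimic the column roles of the three dataframes; all IDs in it are made up. It keeps one hit per node, which is not guaranteed to be the same hit the loop would pick when a node has several drug targets.

```python
import pandas as pd

# Hypothetical miniature tables with the same column roles as above
nodes = pd.DataFrame({
    'SUID': [1, 2, 3],
    'KEGG_NODE_TYPE': ['gene', 'gene', 'compound'],
    'KEGG_ID': ['eco:b0001|eco:b0002', 'eco:b0003', 'cpd:C00001'],
})
conv = pd.DataFrame({
    'uniprot': ['up:P00001', 'up:P00002', 'up:P00003'],
    'kegg': ['eco:b0001', 'eco:b0002', 'eco:b0003'],
})
drugs = pd.DataFrame({
    'UniProt ID': ['P00002'],
    'Drug IDs': ['DB00001; DB00002'],
})

# One row per (gene node, locus)
genes = nodes[nodes['KEGG_NODE_TYPE'] == 'gene']
loci = genes.assign(kegg=genes['KEGG_ID'].str.split('|')).explode('kegg')

# Make the UniProt IDs comparable with the drug target table
conv = conv.assign(uniprot=conv['uniprot'].str.replace('up:', '', regex=False))

# Join locus -> UniProt -> drug, keeping one hit per node
hits = (
    loci.merge(conv, on='kegg')
        .merge(drugs, left_on='uniprot', right_on='UniProt ID')
        .drop_duplicates('SUID', keep='last')
        .rename(columns={'uniprot': 'TARGET_UNIPROT', 'Drug IDs': 'TARGET_DRUG'})
)

# Left join so that nodes without a hit stay in the table with missing values
merged = nodes.merge(hits[['SUID', 'TARGET_UNIPROT', 'TARGET_DRUG']],
                     on='SUID', how='left')
```

The left join preserves every node, which matches the loop's behaviour of filling non-targets with `None`.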

In [None]:
drugjson = json.loads(merged_df.to_json(orient="records"))

new_table_data = {
    "key": "KEGG_NODE_LABEL",
    "dataKey": "KEGG_NODE_LABEL",
    "data" : drugjson
}

update_table_url = BASE + "networks/" + str(networkIds[0]) + "/tables/defaultnode"
print(update_table_url)

requests.put(update_table_url, data=json.dumps(new_table_data), headers=HEADERS)
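The update above sends every column of the merged table back. A lighter variant, sketched here under the assumption that the endpoint accepts records carrying only the key and the columns to add, builds the payload from just those columns and skips nodes with no drug information. `build_table_update` is a hypothetical helper and the example dataframe is made up.

```python
import json
import pandas as pd


def build_table_update(df, key_column, value_columns):
    """Build an update payload from only the key and the new columns."""
    value_columns = list(value_columns)
    # Keep rows where at least one of the new columns has a value
    subset = df[[key_column] + value_columns].dropna(subset=value_columns, how='all')
    # to_json writes missing values as null; json.loads yields plain records
    records = json.loads(subset.to_json(orient='records'))
    return {'key': key_column, 'dataKey': key_column, 'data': records}


# Hypothetical merged table with one drug target and one non-target
example_df = pd.DataFrame({
    'KEGG_NODE_LABEL': ['geneA', 'geneB'],
    'TARGET_UNIPROT': ['P00002', None],
    'TARGET_DRUG': ['DB00001', None],
    'KEGG_NODE_X': [10, 20],
})
payload = build_table_update(example_df, 'KEGG_NODE_LABEL',
                             ['TARGET_UNIPROT', 'TARGET_DRUG'])
```

The resulting dictionary has the same `key`, `dataKey`, and `data` fields as `new_table_data`, so it could be passed to `json.dumps` in the `requests.put` call in its place.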