# Identifier mapping using BridgeDb WS

In this notebook I will present two use cases for BridgeDb with the purpose of identifier mapping: 
* Mapping data from a recognized data source by BridgeDb to another recognized data source ([see here](https://github.com/bridgedb/BridgeDb/blob/2dba5780260421de311cb3064df79e16a396b887/org.bridgedb.bio/resources/org/bridgedb/bio/datasources.tsv)). For example mapping data identifiers from Affy to Entrez Gene.
* Given a local identifier and a TSV mapping it to one of the BridgeDb data sources, how to map the local identifier to a different data source.

![](https://mermaid.ink/img/eyJjb2RlIjoiZmxvd2NoYXJ0IExSXG4gIHN1YmdyYXBoIFRTVlxuICAgIEFbTG9jYWwgaWRlbnRpZmllcl0gLS0-IEJbRW50cmV6XVxuICBlbmQgIFxuXG4gIHN1YmdyYXBoIEJyaWRnZURCXG4gICAgQ1tFbnRyZXpdIC0tPiBEW0FmZnldXG4gIGVuZFxuXG4gIHN1YmdyYXBoIFNjcmlwdFxuICAgIEVbTG9jYWwgaWRlbnRpZmllcl0gLS0-IEZbQWZmeV1cbiAgZW5kXG5cblxuICBUU1YgJiBCcmlkZ2VEQiAtLT4gU2NyaXB0XG4iLCJtZXJtYWlkIjp7InRoZW1lIjoibmV1dHJhbCIsInRoZW1lVmFyaWFibGVzIjp7InByaW1hcnlDb2xvciI6IiNlZWUiLCJjb250cmFzdCI6IiMyNmEiLCJzZWNvbmRhcnlDb2xvciI6ImhzbCgyMTAsIDY2LjY2NjY2NjY2NjclLCA5NSUpIiwiYmFja2dyb3VuZCI6IiNmZmZmZmYiLCJ0ZXJ0aWFyeUNvbG9yIjoiaHNsKC0xNjAsIDAlLCA5My4zMzMzMzMzMzMzJSkiLCJwcmltYXJ5Qm9yZGVyQ29sb3IiOiJoc2woMCwgMCUsIDgzLjMzMzMzMzMzMzMlKSIsInNlY29uZGFyeUJvcmRlckNvbG9yIjoiaHNsKDIxMCwgMjYuNjY2NjY2NjY2NyUsIDg1JSkiLCJ0ZXJ0aWFyeUJvcmRlckNvbG9yIjoiaHNsKC0xNjAsIDAlLCA4My4zMzMzMzMzMzMzJSkiLCJwcmltYXJ5VGV4dENvbG9yIjoiIzExMTExMSIsInNlY29uZGFyeVRleHRDb2xvciI6InJnYigyMS4yNSwgMTIuNzUsIDQuMjUpIiwidGVydGlhcnlUZXh0Q29sb3IiOiJyZ2IoMTcuMDAwMDAwMDAwMSwgMTcuMDAwMDAwMDAwMSwgMTcuMDAwMDAwMDAwMSkiLCJsaW5lQ29sb3IiOiIjNjY2IiwidGV4dENvbG9yIjoiIzAwMDAwMCIsImFsdEJhY2tncm91bmQiOiJoc2woMjEwLCA2Ni42NjY2NjY2NjY3JSwgOTUlKSIsIm1haW5Ca2ciOiIjZWVlIiwic2Vjb25kQmtnIjoiaHNsKDIxMCwgNjYuNjY2NjY2NjY2NyUsIDk1JSkiLCJib3JkZXIxIjoiIzk5OSIsImJvcmRlcjIiOiIjMjZhIiwibm90ZSI6IiNmZmEiLCJ0ZXh0IjoiIzMzMyIsImNyaXRpY2FsIjoiI2Q0MiIsImRvbmUiOiIjYmJiIiwiYXJyb3doZWFkQ29sb3IiOiIjMzMzMzMzIiwiZm9udEZhbWlseSI6IlwidHJlYnVjaGV0IG1zXCIsIHZlcmRhbmEsIGFyaWFsIiwiZm9udFNpemUiOiIxNnB4Iiwibm9kZUJrZyI6IiNlZWUiLCJub2RlQm9yZGVyIjoiIzk5OSIsImNsdXN0ZXJCa2ciOiJoc2woMjEwLCA2Ni42NjY2NjY2NjY3JSwgOTUlKSIsImNsdXN0ZXJCb3JkZXIiOiIjMjZhIiwiZGVmYXVsdExpbmtDb2xvciI6IiM2NjYiLCJ0aXRsZUNvbG9yIjoiIzMzMyIsImVkZ2VMYWJlbEJhY2tncm91bmQiOiJ3aGl0ZSIsImFjdG9yQm9yZGVyIjoiaHNsKDAsIDAlLCA4MyUpIiwiYWN0b3JCa2ciOiIjZWVlIiwiYWN0b3JUZXh0Q29sb3IiOiIjMzMzIiwiYWN0b3JMaW5lQ29sb3IiOiIjNjY2Iiwic2lnbmFsQ29sb3IiOiIjMzMzIiwic2lnbmFsVGV4dENvbG9yIjoiIzMzMyIsImxhYmVsQm94QmtnQ29sb3IiOiIjZWVlIiwibGFiZWxCb3hCb3JkZXJDb2xvciI6ImhzbCgwLCAwJSwgODMlKSIsImxhYmVsVGV4dENvbG9yIjoiIzMzMyIsImxvb3BUZXh0Q29sb3IiOiIjMzMzIiwibm90ZUJvcmRlckNvbG9yIjoiaHNsKDYwLCAxMDAlLCAyMy4zMzMzMzMzMzMzJSkiLCJub3RlQmtnQ29sb3IiOiIjZmZhIiwibm90ZVRleHRDb2xvciI6IiMzMzMiLCJhY3RpdmF0aW9uQm9yZGVyQ29sb3IiOiIjNjY2IiwiYWN0aXZhdGlvbkJrZ0NvbG9yIjoiI2Y0ZjRmNCIsInNlcXVlbmNlTnVtYmVyQ29sb3IiOiJ3aGl0ZSIsInNlY3Rpb25Ca2dDb2xvciI6ImhzbCgyMTAsIDY2LjY2NjY2NjY2NjclLCA3MCUpIiwiYWx0U2VjdGlvbkJrZ0NvbG9yIjoid2hpdGUiLCJzZWN0aW9uQmtnQ29sb3IyIjoiaHNsKDIxMCwgNjYuNjY2NjY2NjY2NyUsIDcwJSkiLCJ0YXNrQm9yZGVyQ29sb3IiOiJoc2woMjEwLCA2Ni42NjY2NjY2NjY3JSwgMzAlKSIsInRhc2tCa2dDb2xvciI6IiMyNmEiLCJ0YXNrVGV4dExpZ2h0Q29sb3IiOiJ3aGl0ZSIsInRhc2tUZXh0Q29sb3IiOiJ3aGl0ZSIsInRhc2tUZXh0RGFya0NvbG9yIjoiIzMzMyIsInRhc2tUZXh0T3V0c2lkZUNvbG9yIjoiIzMzMyIsInRhc2tUZXh0Q2xpY2thYmxlQ29sb3IiOiIjMDAzMTYzIiwiYWN0aXZlVGFza0JvcmRlckNvbG9yIjoiaHNsKDIxMCwgNjYuNjY2NjY2NjY2NyUsIDMwJSkiLCJhY3RpdmVUYXNrQmtnQ29sb3IiOiIjZWVlIiwiZ3JpZENvbG9yIjoiaHNsKDAsIDAlLCA5MCUpIiwiZG9uZVRhc2tCa2dDb2xvciI6IiNiYmIiLCJkb25lVGFza0JvcmRlckNvbG9yIjoiIzY2NiIsImNyaXRCa2dDb2xvciI6IiNkNDIiLCJjcml0Qm9yZGVyQ29sb3IiOiJoc2woMTAuOTA5MDkwOTA5MSwgNzMuMzMzMzMzMzMzMyUsIDQwJSkiLCJ0b2RheUxpbmVDb2xvciI6IiNkNDIiLCJsYWJlbENvbG9yIjoiYmxhY2siLCJlcnJvckJrZ0NvbG9yIjoiIzU1MjIyMiIsImVycm9yVGV4dENvbG9yIjoiIzU1MjIyMiIsImNsYXNzVGV4dCI6IiMxMTExMTEiLCJmaWxsVHlwZTAiOiIjZWVlIiwiZmlsbFR5cGUxIjoiaHNsKDIxMCwgNjYuNjY2NjY2NjY2NyUsIDk1JSkiLCJmaWxsVHlwZTIiOiJoc2woNjQsIDAlLCA5My4zMzMzMzMzMzMzJSkiLCJmaWxsVHlwZTMiOiJoc2woMjc0LCA2Ni42NjY2NjY2NjY3JSwgOTUlKSIsImZpbGxUeXBlNCI6ImhzbCgtNjQsIDAlLCA5My4zMzMzMzMzMzMzJSkiLCJmaWxsVHlwZTUiOiJoc2woMTQ2LCA2Ni42NjY2NjY2NjY3JSwgOTUlKSIsImZpbGxUeXBlNiI6ImhzbCgxMjgsIDAlLCA5My4zMzMzMzMzMzMzJSkiLCJmaWxsVHlwZTciOiJoc2woMzM4LCA2Ni42NjY2NjY2NjY3JSwgOTUlKSJ9fX0)

# Querying the WS

To query the Webservice we define below the url and the patterns for a single request and a batch request. You can find the docs [here](https://bridgedb.github.io/swagger/). We will use Python's requests library.

In [7]:
url = "https://webservice.bridgedb.org/"

In [8]:
single_request = url+"{org}/xrefs/{source}/{identifier}"

In [9]:
batch_request = url+"{org}/xrefsBatch/{source}"

In [10]:
import requests
import pandas as pd

Here we define a method that will turn the web service response into a dataframe with columns corresponding to:
* The original identifier
* The data source that the identifier is part of
* The mapped identifier
* The data source for the mapped identifier

In [11]:
def to_df(response, batch=False):
    if batch:
        records = []
        for tup in to_df(response).itertuples():
            if tup[3] != None:
                for mappings in tup[3].split(','):
                    target = mappings.split(':')
                    records.append((tup[1], tup[2], target[1], target[0]))
        return pd.DataFrame(records, columns = ['original', 'source', 'mapping', 'target'])
        
    return pd.DataFrame([line.split('\t') for line in response.text.split('\n')])

Here we define the organism and the data source from which we want to map

In [12]:
source = "X"
org = 'Zea%20mays'

# Case 1

Here we first load the case 1 example data.

In [13]:
case1 = pd.read_csv("data/case1-example.tsv", header=None)
case1.head()

Unnamed: 0,0
0,AFFX-Zm-ef1a-5_a_at
1,AFFX-Zm_Ubiquitin_M_f_at
2,AFFX-Zm_Ubiquitin_5_f_at


Then we batch request the mappings

In [14]:
response1 = requests.post(batch_request.format(org=org, source=source), data = case1.to_csv(index=False, header=False))

And use our `to_df` method to turn it into a DataFrame

In [15]:
case1_df = to_df(response1, batch=True)
case1_df

Unnamed: 0,original,source,mapping,target
0,AFFX-Zm-ef1a-5_a_at,Affy,A0A1D6M1H3,S
1,AFFX-Zm-ef1a-5_a_at,Affy,A0A1D6M1H2,S
2,AFFX-Zm-ef1a-5_a_at,Affy,A0A1D6M1J4,S
3,AFFX-Zm-ef1a-5_a_at,Affy,GO,T
4,AFFX-Zm-ef1a-5_a_at,Affy,A0A1D6M1J3,S
...,...,...,...,...
74,AFFX-Zm_Ubiquitin_5_f_at,Affy,Q41752,S
75,AFFX-Zm_Ubiquitin_5_f_at,Affy,Q41751,S
76,AFFX-Zm_Ubiquitin_5_f_at,Affy,XP_008645269.1,Q
77,AFFX-Zm_Ubiquitin_5_f_at,Affy,K7UBK6,S


# Case 2
Here we first load the case 2 example data and perform the same steps as before

In [16]:
case2 = pd.read_csv('data/case2-example.tsv', sep='\t', names=['local', 'source'])
case2.head()

Unnamed: 0,local,source
0,1234,AFFX-Zm-ef1a-5_a_at
1,6789,AFFX-Zm_Ubiquitin_M_f_at
2,5555,AFFX-Zm_Ubiquitin_5_f_at


In [17]:
response2 = requests.post(batch_request.format(org=org, source=source), data = case2.source.to_csv(index=False, header=False))

In [18]:
mappings = to_df(response2, batch=True)
mappings.head()

Unnamed: 0,original,source,mapping,target
0,AFFX-Zm-ef1a-5_a_at,Affy,A0A1D6M1H3,S
1,AFFX-Zm-ef1a-5_a_at,Affy,A0A1D6M1H2,S
2,AFFX-Zm-ef1a-5_a_at,Affy,A0A1D6M1J4,S
3,AFFX-Zm-ef1a-5_a_at,Affy,GO,T
4,AFFX-Zm-ef1a-5_a_at,Affy,A0A1D6M1J3,S


After obtaining the mappings we join with the TSV file on the Affy identifier, obtaining the desired mapping by selecting the columns `mapping` and local

In [19]:
local_mapping = mappings.join(case2.set_index('source'), on='original')

In [21]:
local_mapping[['mapping', 'local']]

Unnamed: 0,mapping,local
0,A0A1D6M1H3,1234
1,A0A1D6M1H2,1234
2,A0A1D6M1J4,1234
3,GO,1234
4,A0A1D6M1J3,1234
...,...,...
74,Q41752,5555
75,Q41751,5555
76,XP_008645269.1,5555
77,K7UBK6,5555


# Using Script

In [1]:
from bridgedb_script import get_mappings

In [2]:
get_mappings("data/case2-example.tsv", "Zea mays", "X", case=2)

Unnamed: 0,original,source,mapping,target,local
0,AFFX-Zm-ef1a-5_a_at,Affy,A0A1D6M1H3,S,1234
1,AFFX-Zm-ef1a-5_a_at,Affy,A0A1D6M1H2,S,1234
2,AFFX-Zm-ef1a-5_a_at,Affy,A0A1D6M1J4,S,1234
3,AFFX-Zm-ef1a-5_a_at,Affy,GO,T,1234
4,AFFX-Zm-ef1a-5_a_at,Affy,A0A1D6M1J3,S,1234
...,...,...,...,...,...
74,AFFX-Zm_Ubiquitin_5_f_at,Affy,Q41752,S,5555
75,AFFX-Zm_Ubiquitin_5_f_at,Affy,Q41751,S,5555
76,AFFX-Zm_Ubiquitin_5_f_at,Affy,XP_008645269.1,Q,5555
77,AFFX-Zm_Ubiquitin_5_f_at,Affy,K7UBK6,S,5555


In [3]:
get_mappings("data/case1-example.tsv", "Zea mays", "X", case=1)

Unnamed: 0,original,source,mapping,target
0,AFFX-Zm-ef1a-5_a_at,Affy,A0A1D6M1H3,S
1,AFFX-Zm-ef1a-5_a_at,Affy,A0A1D6M1H2,S
2,AFFX-Zm-ef1a-5_a_at,Affy,A0A1D6M1J4,S
3,AFFX-Zm-ef1a-5_a_at,Affy,GO,T
4,AFFX-Zm-ef1a-5_a_at,Affy,A0A1D6M1J3,S
...,...,...,...,...
74,AFFX-Zm_Ubiquitin_5_f_at,Affy,Q41752,S
75,AFFX-Zm_Ubiquitin_5_f_at,Affy,Q41751,S
76,AFFX-Zm_Ubiquitin_5_f_at,Affy,XP_008645269.1,Q
77,AFFX-Zm_Ubiquitin_5_f_at,Affy,K7UBK6,S
