# Identifier mapping using BridgeDb WS

In this notebook, I present two use cases for identifier mapping with BridgeDb:
* Mapping identifiers from one data source recognized by BridgeDb ([list of data sources](https://github.com/bridgedb/BridgeDb/blob/2dba5780260421de311cb3064df79e16a396b887/org.bridgedb.bio/resources/org/bridgedb/bio/datasources.tsv)) to another, for example from Affy to Entrez Gene.
* Given a local identifier and a TSV that maps it to one of the BridgeDb data sources, mapping the local identifier to a different data source.


*Workflow overview (diagram): the TSV maps each local identifier to an Affy ID, BridgeDb maps Affy IDs to Entrez, and the script combines both steps to map local identifier → Affy → Entrez.*

# Querying the WS

To query the web service, we define the base URL and the URL patterns for single and batch requests below (see the [API documentation](https://bridgedb.github.io/swagger/)). We use Python's `requests` library to send the requests.

In [7]:
url = "https://webservice.bridgedb.org/"

In [8]:
single_request = url+"{org}/xrefs/{source}/{identifier}"

In [28]:
batch_request = url+"{org}/xrefsBatch/{source}{}"

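The patterns above encode the organism by hand (`Zea%20mays`) and splice the query string into a positional `{}` field. As a sketch of an alternative, assuming the same endpoint layout as the patterns, the hypothetical helpers `single_url` and `batch_url` below use the standard library's `urllib.parse.quote` to percent-encode each path segment (with `safe=""`, so a `/` inside an identifier cannot split the path) and `urlencode` to build the `dataSource` query string. The resulting URLs can be passed to `requests.get` or `requests.post` as before.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://webservice.bridgedb.org/"

def single_url(org, source, identifier):
    """Build {org}/xrefs/{source}/{identifier}, percent-encoding each path segment."""
    segments = (quote(org, safe=""), "xrefs", quote(source, safe=""), quote(identifier, safe=""))
    return BASE_URL + "/".join(segments)

def batch_url(org, source, target=None):
    """Build {org}/xrefsBatch/{source}, optionally adding ?dataSource={target}."""
    url = BASE_URL + "/".join((quote(org, safe=""), "xrefsBatch", quote(source, safe="")))
    if target is not None:
        url += "?" + urlencode({"dataSource": target})
    return url

print(single_url("Zea mays", "X", "AFFX-Zm-ef1a-5_a_at"))
# → https://webservice.bridgedb.org/Zea%20mays/xrefs/X/AFFX-Zm-ef1a-5_a_at
print(batch_url("Zea mays", "X", target="En"))
# → https://webservice.bridgedb.org/Zea%20mays/xrefsBatch/X?dataSource=En
```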
In [10]:
import requests
import pandas as pd

Next, we define a function that turns a web service response into a DataFrame. For batch responses, the columns are:
* `original`: the original identifier
* `source`: the data source of the original identifier
* `mapping`: the mapped identifier
* `target`: the data source of the mapped identifier, as a BridgeDb system code

In [11]:
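def to_df(response, batch=False):
    """Turn a BridgeDb web service response into a DataFrame."""
    # Split the body into tab-separated fields, skipping blank lines such as the trailing newline
    rows = [line.split('\t') for line in response.text.splitlines() if line.strip()]
    if not batch:
        return pd.DataFrame(rows)
    records = []
    for row in rows:
        # Batch lines: <id> \t <source name> \t <code>:<id>,<code>:<id>,...
        if len(row) < 3 or not row[2]:
            continue  # no mappings for this identifier
        original, source = row[0], row[1]
        for mapping in row[2].split(','):
            # maxsplit=1 keeps identifiers that contain ':' (such as GO IDs) intact
            target, mapped = mapping.split(':', 1)
            records.append((original, source, mapped, target))
    return pd.DataFrame(records, columns=['original', 'source', 'mapping', 'target'])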

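To make the batch parsing concrete without a network call, here is a standard-library sketch that parses the response text directly into `(original, source, mapping, target)` tuples; passing them to `pd.DataFrame(records, columns=['original', 'source', 'mapping', 'target'])` gives the same layout as the batch branch of `to_df`. It assumes each line has the form `<id>\t<source name>\t<code>:<id>,<code>:<id>,...`, which is what `to_df` expects. The function `parse_batch` and the sample text are hypothetical (the sample is illustrative, not real web service output). Splitting each mapping with `maxsplit=1` matters because Gene Ontology identifiers contain a colon themselves (`T:GO:...`); splitting on every colon would keep only `GO`.

```python
def parse_batch(text):
    """Parse BridgeDb batch output into (original, source, mapping, target) tuples."""
    records = []
    for line in text.splitlines():
        fields = line.split("\t")
        # Skip blank lines and identifiers without any mappings
        if len(fields) < 3 or not fields[2]:
            continue
        original, source, mappings = fields[0], fields[1], fields[2]
        for item in mappings.split(","):
            # maxsplit=1: only the first ':' separates the system code from the identifier
            target, mapped = item.split(":", 1)
            records.append((original, source, mapped, target))
    return records

# Illustrative input (not real web service output)
sample = (
    "AFFX-Zm-ef1a-5_a_at\tAffy\tEn:Zm00001d037873,T:GO:0005634\n"
    "AFFX-Zm_Ubiquitin_M_f_at\tAffy\tEn:Zm00001d053838\n"
)
for record in parse_batch(sample):
    print(record)
# → ('AFFX-Zm-ef1a-5_a_at', 'Affy', 'Zm00001d037873', 'En')
# → ('AFFX-Zm-ef1a-5_a_at', 'Affy', 'GO:0005634', 'T')
# → ('AFFX-Zm_Ubiquitin_M_f_at', 'Affy', 'Zm00001d053838', 'En')
```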
Next, we set the organism and the data source we map from. `X` is the BridgeDb system code for Affy, and the space in the organism name is URL-encoded as `%20`.

In [12]:
source = "X"
org = 'Zea%20mays'

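The outputs below identify the target data source by its BridgeDb system code rather than its name. A small lookup table helps when reading them. The sketch below lists the codes relevant to this notebook with their usual names; the dictionary and the `describe` helper are hypothetical, and BridgeDb's `datasources.tsv` (linked above) is the authoritative list.

```python
# System codes relevant to this notebook, mapped to readable names
# (check BridgeDb's datasources.tsv for the authoritative list)
SYSTEM_CODES = {
    "X": "Affy",
    "En": "Ensembl",
    "L": "Entrez Gene",
    "S": "UniProt-TrEMBL",
    "T": "Gene Ontology",
    "Q": "RefSeq",
}

def describe(code):
    """Return a readable name for a system code, or the code itself if unknown."""
    return SYSTEM_CODES.get(code, code)

print(describe("X"), describe("En"), describe("Zz"))  # → Affy Ensembl Zz
```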
# Case 1

First, we load the case 1 example data, a single column of Affy probe set IDs.

In [13]:
case1 = pd.read_csv("data/case1-example.tsv", sep='\t', header=None)
case1.head()

|   | 0 |
|---|---|
| 0 | AFFX-Zm-ef1a-5_a_at |
| 1 | AFFX-Zm_Ubiquitin_M_f_at |
| 2 | AFFX-Zm_Ubiquitin_5_f_at |


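Then we request the mappings for all identifiers in a single batch call. The `?dataSource=En` parameter restricts the results to Ensembl (`En`) identifiers.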

In [35]:
response1 = requests.post(batch_request.format('?dataSource=En', org=org, source=source), data = case1.to_csv(index=False, header=False))

We then use `to_df` to turn the response into a DataFrame:

In [36]:
case1_df = to_df(response1, batch=True)
case1_df

|   | original | source | mapping | target |
|---|---|---|---|---|
| 0 | AFFX-Zm-ef1a-5_a_at | Affy | Zm00001d037873 | En |
| 1 | AFFX-Zm-ef1a-5_a_at | Affy | Zm00001d037877 | En |
| 2 | AFFX-Zm-ef1a-5_a_at | Affy | Zm00001d037875 | En |
| 3 | AFFX-Zm_Ubiquitin_M_f_at | Affy | Zm00001d053838 | En |
| 4 | AFFX-Zm_Ubiquitin_5_f_at | Affy | Zm00001d053838 | En |

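Identifiers without mappings are dropped when the response is turned into records, and one identifier can map to several targets (here `AFFX-Zm-ef1a-5_a_at` maps to three Ensembl genes). A quick check of both is worthwhile before using the result. The standard-library sketch below groups mapped identifiers per input ID and lists the inputs that received none; the function `summarize` and the extra input `HYPOTHETICAL_UNMAPPED_ID` are hypothetical, and the records are the rows from the table above.

```python
from collections import defaultdict

def summarize(input_ids, records):
    """Group mapped identifiers per input ID and list inputs with no mapping."""
    by_original = defaultdict(list)
    for original, _source, mapped, _target in records:
        by_original[original].append(mapped)
    unmapped = [i for i in input_ids if i not in by_original]
    return dict(by_original), unmapped

records = [
    ("AFFX-Zm-ef1a-5_a_at", "Affy", "Zm00001d037873", "En"),
    ("AFFX-Zm-ef1a-5_a_at", "Affy", "Zm00001d037877", "En"),
    ("AFFX-Zm-ef1a-5_a_at", "Affy", "Zm00001d037875", "En"),
    ("AFFX-Zm_Ubiquitin_M_f_at", "Affy", "Zm00001d053838", "En"),
    ("AFFX-Zm_Ubiquitin_5_f_at", "Affy", "Zm00001d053838", "En"),
]
inputs = ["AFFX-Zm-ef1a-5_a_at", "AFFX-Zm_Ubiquitin_M_f_at",
          "AFFX-Zm_Ubiquitin_5_f_at", "HYPOTHETICAL_UNMAPPED_ID"]
mapped, unmapped = summarize(inputs, records)
print(len(mapped["AFFX-Zm-ef1a-5_a_at"]))  # → 3
print(unmapped)  # → ['HYPOTHETICAL_UNMAPPED_ID']
```

With the DataFrames above, `set(case1[0]) - set(case1_df['original'])` gives the unmapped input IDs directly.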

# Case 2
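Here we load the case 2 example data, a TSV that pairs each local identifier with an Affy ID, and request the Ensembl mappings for the Affy IDs as before. Note that the `source` column of `case2` holds the Affy IDs.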

In [22]:
case2 = pd.read_csv('data/case2-example.tsv', sep='\t', names=['local', 'source'])
case2.head()

|   | local | source |
|---|---|---|
| 0 | 1234 | AFFX-Zm-ef1a-5_a_at |
| 1 | 6789 | AFFX-Zm_Ubiquitin_M_f_at |
| 2 | 5555 | AFFX-Zm_Ubiquitin_5_f_at |


In [30]:
response2 = requests.post(batch_request.format('?dataSource=En', org=org, source=source), data = case2.source.to_csv(index=False, header=False))

In [32]:
mappings = to_df(response2, batch=True)
mappings.head()

|   | original | source | mapping | target |
|---|---|---|---|---|
| 0 | AFFX-Zm-ef1a-5_a_at | Affy | Zm00001d037873 | En |
| 1 | AFFX-Zm-ef1a-5_a_at | Affy | Zm00001d037877 | En |
| 2 | AFFX-Zm-ef1a-5_a_at | Affy | Zm00001d037875 | En |
| 3 | AFFX-Zm_Ubiquitin_M_f_at | Affy | Zm00001d053838 | En |
| 4 | AFFX-Zm_Ubiquitin_5_f_at | Affy | Zm00001d053838 | En |


After obtaining the mappings, we join them with the TSV on the Affy identifier (`original` in `mappings`, `source` in `case2`). Selecting the `mapping` and `local` columns then gives the desired mapping from local identifiers to the target data source.

In [19]:
local_mapping = mappings.join(case2.set_index('source'), on='original')

In [21]:
local_mapping[['mapping', 'local']]

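The output below appears to come from an earlier run without the `?dataSource=En` filter: its cells ran before the request cells (`In [19]` and `In [21]` versus `In [30]`), and its rows match the unfiltered case 1 output of the script below. With the filter, the join yields the five Ensembl mappings shown in the case 2 script output.

|   | mapping | local |
|---|---|---|
| 0 | A0A1D6M1H3 | 1234 |
| 1 | A0A1D6M1H2 | 1234 |
| 2 | A0A1D6M1J4 | 1234 |
| 3 | GO | 1234 |
| 4 | A0A1D6M1J3 | 1234 |
| ... | ... | ... |
| 74 | Q41752 | 5555 |
| 75 | Q41751 | 5555 |
| 76 | XP_008645269.1 | 5555 |
| 77 | K7UBK6 | 5555 |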

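`mappings.join(case2.set_index('source'), on='original')` performs a left join: every mapping row is kept, a mapping whose Affy ID is missing from the TSV gets `NaN` in `local`, and an Affy ID shared by several local identifiers repeats the mapping row once per local identifier. The standard-library sketch below mirrors that behavior on three of the Ensembl mappings and the TSV rows shown above; `join_local` is a hypothetical name, and `None` stands in for pandas' `NaN`.

```python
from collections import defaultdict

def join_local(records, local_rows):
    """Left-join (original, source, mapping, target) records with (local, affy_id) rows."""
    # One Affy ID may belong to several local identifiers
    local_by_affy = defaultdict(list)
    for local, affy_id in local_rows:
        local_by_affy[affy_id].append(local)
    joined = []
    for original, _source, mapped, _target in records:
        # Unmatched records are kept with None, like NaN in a pandas left join
        for local in local_by_affy.get(original, [None]):
            joined.append((mapped, local))
    return joined

local_rows = [
    (1234, "AFFX-Zm-ef1a-5_a_at"),
    (6789, "AFFX-Zm_Ubiquitin_M_f_at"),
    (5555, "AFFX-Zm_Ubiquitin_5_f_at"),
]
records = [
    ("AFFX-Zm-ef1a-5_a_at", "Affy", "Zm00001d037873", "En"),
    ("AFFX-Zm_Ubiquitin_M_f_at", "Affy", "Zm00001d053838", "En"),
    ("AFFX-Zm_Ubiquitin_5_f_at", "Affy", "Zm00001d053838", "En"),
]
print(join_local(records, local_rows))
# → [('Zm00001d037873', 1234), ('Zm00001d053838', 6789), ('Zm00001d053838', 5555)]
```

If unmatched mappings should be dropped instead, `DataFrame.join` accepts `how='inner'`.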

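# Using the script

The `bridgedb_script` module wraps both cases in a single `get_mappings` function. Judging from the calls below, it takes the input TSV, the organism name (without URL encoding), the source system code, the case number and, optionally, a target system code.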

In [1]:
from bridgedb_script import get_mappings

In [2]:
get_mappings("data/case2-example.tsv", "Zea mays", "X", case=2, target='En')

|   | original | source | mapping | target | local |
|---|---|---|---|---|---|
| 0 | AFFX-Zm-ef1a-5_a_at | Affy | Zm00001d037873 | En | 1234 |
| 1 | AFFX-Zm-ef1a-5_a_at | Affy | Zm00001d037877 | En | 1234 |
| 2 | AFFX-Zm-ef1a-5_a_at | Affy | Zm00001d037875 | En | 1234 |
| 3 | AFFX-Zm_Ubiquitin_M_f_at | Affy | Zm00001d053838 | En | 6789 |
| 4 | AFFX-Zm_Ubiquitin_5_f_at | Affy | Zm00001d053838 | En | 5555 |


In [3]:
get_mappings("data/case1-example.tsv", "Zea mays", "X", case=1)

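Without `target`, mappings to every available data source are returned (system codes `S`, `T` and `Q` in the rows shown). The `GO` entry with target `T` appears to be a Gene Ontology identifier cut off at its first colon, since GO IDs contain a colon themselves.

|   | original | source | mapping | target |
|---|---|---|---|---|
| 0 | AFFX-Zm-ef1a-5_a_at | Affy | A0A1D6M1H3 | S |
| 1 | AFFX-Zm-ef1a-5_a_at | Affy | A0A1D6M1H2 | S |
| 2 | AFFX-Zm-ef1a-5_a_at | Affy | A0A1D6M1J4 | S |
| 3 | AFFX-Zm-ef1a-5_a_at | Affy | GO | T |
| 4 | AFFX-Zm-ef1a-5_a_at | Affy | A0A1D6M1J3 | S |
| ... | ... | ... | ... | ... |
| 74 | AFFX-Zm_Ubiquitin_5_f_at | Affy | Q41752 | S |
| 75 | AFFX-Zm_Ubiquitin_5_f_at | Affy | Q41751 | S |
| 76 | AFFX-Zm_Ubiquitin_5_f_at | Affy | XP_008645269.1 | Q |
| 77 | AFFX-Zm_Ubiquitin_5_f_at | Affy | K7UBK6 | S |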

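Because the unfiltered result mixes several data sources, counting mappings per target or keeping a single target is a useful post-processing step. Below is a standard-library sketch on a few rows copied from the table above; the function names are hypothetical. On the DataFrame returned by `get_mappings` (assuming it is a pandas DataFrame, as its printed output suggests), `df['target'].value_counts()` and `df[df['target'] == 'S']` would do the same.

```python
from collections import Counter

def count_targets(records):
    """Count mappings per target system code (the fourth field of each record)."""
    return Counter(record[3] for record in records)

def keep_target(records, code):
    """Keep only the mappings to one target data source."""
    return [record for record in records if record[3] == code]

# A few rows copied from the table above
records = [
    ("AFFX-Zm-ef1a-5_a_at", "Affy", "A0A1D6M1H3", "S"),
    ("AFFX-Zm-ef1a-5_a_at", "Affy", "A0A1D6M1H2", "S"),
    ("AFFX-Zm_Ubiquitin_5_f_at", "Affy", "XP_008645269.1", "Q"),
    ("AFFX-Zm_Ubiquitin_5_f_at", "Affy", "K7UBK6", "S"),
]
print(count_targets(records))  # → Counter({'S': 3, 'Q': 1})
print([record[2] for record in keep_target(records, "Q")])  # → ['XP_008645269.1']
```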