 <img src="SIFTS.png" style="width:500px"> 

# Mapping Pharos Receptors to PDB with SIFTS

Structure Integration with Function, Taxonomy and Sequence (SIFTS) provides mappings between UniProt and PDB, as well as annotations from the IntEnz, GO, InterPro, Pfam, CATH, SCOP, PubMed, Ensembl and HomoloGene resources.

In [2]:
import pandas as pd

Read the Pharos data for each receptor class; each file contains the UniProt ID of every receptor.

In [9]:
protein_kinase = pd.read_csv("data/Kinase/protein_kinase/export-target-325784f280/targets.csv", index_col=False)
non_protein_kinase = pd.read_csv("data/Kinase/non_protein_kinase/export-target-f07d061d60/targets.csv", index_col=False)
AGC = pd.read_csv("data/Kinase/AGC/export-target-da568e3060/targets.csv", index_col=False)
CAMK = pd.read_csv("data/Kinase/CAMK/export-target-f6584830c8/targets.csv", index_col=False)
CMGC = pd.read_csv("data/Kinase/CMGC/export-target-0cd37e2c75/targets.csv", index_col=False)
STE = pd.read_csv("data/Kinase/STE/export-target-40303ebc87/targets.csv", index_col=False)
TK = pd.read_csv("data/Kinase/TK/export-target-c9067101a0/targets.csv", index_col=False)
TKL = pd.read_csv("data/Kinase/TKL/export-target-aef08ae137/targets.csv", index_col=False)

Read the SIFTS UniProt-to-PDB mappings.

In [10]:
uniprot_to_pdb = pd.read_csv("data/uniprot_pdb.csv", skiprows=1)

In [11]:
uniprot_to_pdb.head()

Unnamed: 0,SP_PRIMARY,PDB
0,A0A010,5b00;5b01;5b02;5b03;5b0i;5b0j;5b0k;5b0l;5b0m;5...
1,A0A011,3vk5;3vka;3vkb;3vkc;3vkd
2,A0A014C6J9,6br7
3,A0A016UNP9,2md0
4,A0A023GPI4,2m6j
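
As the preview above shows, the PDB column stores all structures for a UniProt entry as one semicolon-separated string. A minimal sketch of expanding that into one row per UniProt/PDB pair, assuming a pandas version that provides `DataFrame.explode` (0.25 or later). The `mappings` frame is a toy stand-in built from two rows of the preview, not the full SIFTS table.

```python
import pandas as pd

# Toy stand-in for uniprot_to_pdb, using two rows from the preview above.
mappings = pd.DataFrame({
    "SP_PRIMARY": ["A0A011", "A0A014C6J9"],
    "PDB": ["3vk5;3vka;3vkb;3vkc;3vkd", "6br7"],
})

# Split each semicolon-separated string into a list, then emit one row per PDB ID.
long_form = (
    mappings.assign(PDB=mappings["PDB"].str.split(";"))
    .explode("PDB")
    .reset_index(drop=True)
)
print(long_form)
```

The long form is convenient for joins and for counting structures per entry, at the cost of repeating the UniProt ID on every row.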


Find the PDB IDs for each receptor, if available.

In [12]:
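def find_pdbs(df):
    """Look up the PDB IDs mapped to each UniProt ID in a Pharos Data Frame.

    Input: Data Frame of Pharos data with a "Uniprot ID" column.
    Output: List with one entry per row: a list of PDB IDs, or None
            if SIFTS has no mapping for that UniProt ID.
    """
    ids = []
    # Iterate over the column values directly instead of by index label,
    # so the function does not depend on a default integer index.
    for uniprot_id in df["Uniprot ID"]:
        pdb_ids = None
        mapping = uniprot_to_pdb[uniprot_to_pdb.SP_PRIMARY == uniprot_id]
        if len(mapping) != 0:
            # The PDB column holds a semicolon-separated string of IDs.
            pdb_ids = mapping.PDB.iloc[0].split(';')
        ids.append(pdb_ids)
    return ids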
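
The row-by-row lookup above rescans the whole SIFTS table for every receptor. A possible alternative, sketched here under assumptions, builds the lookup once and applies it with `Series.map`. The function name `find_pdbs_mapped`, the toy frames, and the ID `NOT_IN_SIFTS` are hypothetical and used only for illustration. Unlike the function above, unmatched rows come back as NaN rather than None, which `isna()` treats the same way.

```python
import pandas as pd

def find_pdbs_mapped(df, uniprot_to_pdb):
    """Return a Series of PDB ID lists (NaN where no mapping exists)."""
    # Build a UniProt ID -> list-of-PDB-IDs lookup. Keeping the first row
    # per ID mirrors the .iloc[0] choice in the loop-based version.
    lookup = (
        uniprot_to_pdb.drop_duplicates("SP_PRIMARY")
        .set_index("SP_PRIMARY")["PDB"]
        .str.split(";")
    )
    return df["Uniprot ID"].map(lookup)

# Toy data: one ID taken from the SIFTS preview, one made-up ID with no mapping.
toy_sifts = pd.DataFrame({
    "SP_PRIMARY": ["A0A011", "A0A014C6J9"],
    "PDB": ["3vk5;3vka;3vkb;3vkc;3vkd", "6br7"],
})
toy_targets = pd.DataFrame({"Uniprot ID": ["A0A011", "NOT_IN_SIFTS"]})

result = find_pdbs_mapped(toy_targets, toy_sifts)
print(result)
```

Passing the mapping table as an argument, rather than reading a global, also makes the function easier to test on small frames like these.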

Add the PDB IDs to each Data Frame.

In [13]:
for df in [protein_kinase, non_protein_kinase, AGC, CAMK,
           CMGC, STE, TK, TKL]:
    df['PDB_IDS'] = find_pdbs(df)

Number of receptors in each class with at least one structure in the Protein Data Bank:

In [14]:
for df in [protein_kinase, non_protein_kinase, AGC, CAMK,
           CMGC, STE, TK, TKL]:
    print(df.PDB_IDS.notna().sum())

290
27
42
46
40
30
74
34
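
The counts above are printed without labels, so they must be read in the order of the list in the loop. A minimal sketch of a labeled summary follows. The class names `class_a` and `class_b` and their contents are made-up toy values, not the Pharos data; with the real tables, the dictionary would map each class name to its Data Frame.

```python
import pandas as pd

# Toy frames standing in for the class tables; all values are illustrative only.
classes = {
    "class_a": pd.DataFrame({"PDB_IDS": [["1abc"], None, ["2xyz", "3def"]]}),
    "class_b": pd.DataFrame({"PDB_IDS": [None, None]}),
}

# One row per class: total targets, targets with a structure, and the ratio.
summary = pd.DataFrame({
    "targets": {name: len(df) for name, df in classes.items()},
    "with_structure": {
        name: int(df["PDB_IDS"].notna().sum()) for name, df in classes.items()
    },
})
summary["fraction"] = summary["with_structure"] / summary["targets"]
print(summary)
```

Reporting the fraction alongside the raw count makes classes of different sizes easier to compare.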
