# Antibody mapping curation

- we have a set of antibodies that are reported to target phosphorylated amino acid residues at given coordinates in particular genes
- we will verify that each reported residue is actually present at the given coordinate in the target protein's UniProt sequence

### How?
- if the amino acid at the given coordinate does not match, we will check the post-translational modifications annotated for the protein
    - if there is a phosphorylated residue of the same amino acid nearby, we will assign that residue as the correct phosphorylation site
    - if such a residue is too far away, we may assign the phosphorylation site to the nearest residue of the same amino acid
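
The fallback in the last bullet can be sketched as a small helper. This is a minimal sketch, not the procedure actually used in this notebook: `nearest_same_residue` and its `max_distance` parameter are hypothetical names, and the function works on a plain sequence string. It only finds residues of the same amino acid; whether a candidate is actually phosphorylated still has to be checked against the PTM annotations. The example uses the first 31 residues of histone H3 as printed in the outputs further down.

```python
def nearest_same_residue(seq, residue, coord, max_distance=None):
    """Return 1-based positions of `residue` in `seq`, nearest to `coord` first.

    Hypothetical helper: `max_distance` optionally drops candidates that
    are further than that many residues away from `coord`.
    """
    # Collect every 1-based position holding the requested amino acid
    positions = [i + 1 for i, aa in enumerate(seq) if aa == residue]
    if max_distance is not None:
        positions = [p for p in positions if abs(p - coord) <= max_distance]
    # Sort by distance to the reported coordinate, ties broken by position
    return sorted(positions, key=lambda p: (abs(p - coord), p))


# First 31 residues of histone H3 (P68431), taken from the printed output
h3_nterm = "MARTKQTARKSTGGKAPRKQLATKAARKSAP"

print(nearest_same_residue(h3_nterm, 'S', 10))                  # [11, 29]
print(nearest_same_residue(h3_nterm, 'S', 10, max_distance=5))  # [11]
```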

In [1]:
from indra.databases import uniprot_client as uc

In [2]:
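def site_check(up, residue, coord):
    """Print the sequence context around a 1-based coordinate and report
    whether the expected residue is found at that position."""
    seq = uc.get_sequence(up)
    # Pad with a leading space so that seq[n] is the residue at 1-based position n
    seq = " " + seq
    n = coord
    print('http://www.uniprot.org/uniprot/' + up)
    print('name ', uc.get_gene_name(up))
    # Note: this length includes the padding character (true length + 1)
    print('length ', len(seq))
    # Show up to 20 residues on either side of the site
    start_string = max(n - 20, 0)
    print(seq[start_string:n], seq[n], seq[n+1:n+21])
    if residue == seq[n]:
        print("ok.")
    else:
        print("X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X")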
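
`site_check` prints its result, and `seq[n]` raises an `IndexError` if the coordinate lies beyond the end of the sequence. For batch use, a non-printing variant that returns a boolean may be more convenient. The following is a minimal sketch under that assumption; `residue_matches` is a hypothetical name, and it takes the sequence as a plain string rather than fetching it through `uniprot_client`.

```python
def residue_matches(seq, residue, coord):
    """Return True if 1-based position `coord` in `seq` holds `residue`.

    Hypothetical helper: out-of-range coordinates return False instead
    of raising an IndexError.
    """
    if coord < 1 or coord > len(seq):
        return False
    # Convert the 1-based coordinate to a 0-based string index
    return seq[coord - 1] == residue


# First 31 residues of histone H3 (P68431), taken from the printed output
h3_nterm = "MARTKQTARKSTGGKAPRKQLATKAARKSAP"

print(residue_matches(h3_nterm, 'S', 10))   # False (position 10 is K)
print(residue_matches(h3_nterm, 'S', 11))   # True
print(residue_matches(h3_nterm, 'S', 500))  # False (out of range)
```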

---
# MAP2K1

In [3]:
site_check('Q02750-1', 'S', 218)

http://www.uniprot.org/uniprot/Q02750-1
name  MAP2K1
length  394
VNSRGEIKLCDFGVSGQLID S MANSFVGTRSYMSPERLQGT
ok.


---
# MAP2K2

In [4]:
site_check('P36507-1', 'S', 222)

http://www.uniprot.org/uniprot/P36507-1
name  MAP2K2
length  401
VNSRGEIKLCDFGVSGQLID S MANSFVGTRSYMAPERLQGT
ok.


---
# MAPK1

In [5]:
site_check('P28482-1', 'T', 202)

http://www.uniprot.org/uniprot/P28482-1
name  MAPK1
length  361
GFLTEYVATRWYRAPEIMLN S KGYTKSIDIWSVGCILAEML
X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X


In [6]:
site_check('P28482-1', 'Y', 204)

http://www.uniprot.org/uniprot/P28482-1
name  MAPK1
length  361
LTEYVATRWYRAPEIMLNSK G YTKSIDIWSVGCILAEMLSN
X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X


### check PTM section
- T185 Y187 pair found

In [7]:
site_check('P28482-1', 'T', 185)

http://www.uniprot.org/uniprot/P28482-1
name  MAPK1
length  361
ICDFGLARVADPDHDHTGFL T EYVATRWYRAPEIMLNSKGY
ok.


In [8]:
site_check('P28482-1', 'Y', 187)

http://www.uniprot.org/uniprot/P28482-1
name  MAPK1
length  361
DFGLARVADPDHDHTGFLTE Y VATRWYRAPEIMLNSKGYTK
ok.


---
# MAPK3

In [9]:
site_check('P27361-1', 'T', 185)

http://www.uniprot.org/uniprot/P27361-1
name  MAPK3
length  380
RDLKPSNLLINTTCDLKICD F GLARIADPEHDHTGFLTEYV
X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X


In [10]:
site_check('P27361-1', 'Y', 187)

http://www.uniprot.org/uniprot/P27361-1
name  MAPK3
length  380
LKPSNLLINTTCDLKICDFG L ARIADPEHDHTGFLTEYVAT
X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X


### check PTM section
- T202 Y204 pair found, matching the spacing of the given T185/Y187 pair
- there is also a T198, but it does not match the spacing
- supported by Phosphosite
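
The spacing argument can be checked mechanically. The sketch below is an illustration only: `find_spaced_pairs` and its `offset` parameter are hypothetical names, and the input is the 41-residue MAPK3 window printed by `site_check('P27361-1', 'T', 202)`, which starts at position 182 of the protein. It looks for a threonine followed two residues later by a tyrosine.

```python
def find_spaced_pairs(seq, first, second, spacing, offset=1):
    """Return (pos_first, pos_second) pairs where `second` occurs exactly
    `spacing` residues after `first`.

    Hypothetical helper: `offset` is the 1-based protein position of seq[0],
    so fragments can be reported in full-protein coordinates.
    """
    pairs = []
    for i in range(len(seq) - spacing):
        if seq[i] == first and seq[i + spacing] == second:
            pairs.append((i + offset, i + spacing + offset))
    return pairs


# MAPK3 (P27361) residues 182-222, taken from the printed output
mapk3_window = "ICDFGLARIADPEHDHTGFLTEYVATRWYRAPEIMLNSKGY"

print(find_spaced_pairs(mapk3_window, 'T', 'Y', 2, offset=182))  # [(202, 204)]
```

T198 is not reported because the residue two positions after it is F200, not a tyrosine.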

In [11]:
site_check('P27361-1', 'T', 202)

http://www.uniprot.org/uniprot/P27361-1
name  MAPK3
length  380
ICDFGLARIADPEHDHTGFL T EYVATRWYRAPEIMLNSKGY
ok.


In [12]:
site_check('P27361-1', 'Y', 204)

http://www.uniprot.org/uniprot/P27361-1
name  MAPK3
length  380
DFGLARIADPEHDHTGFLTE Y VATRWYRAPEIMLNSKGYTK
ok.


---
# RPS6KA1

In [13]:
site_check('Q15418-1', 'S', 380)

http://www.uniprot.org/uniprot/Q15418-1
name  RPS6KA1
length  736
PKDSPGIPPSAGAHQLFRGF S FVATGLMEDDGKPRAPQAPL
ok.


In [14]:
site_check('Q15418-1', 'T', 573)

http://www.uniprot.org/uniprot/Q15418-1
name  RPS6KA1
length  736
LRICDFGFAKQLRAENGLLM T PCYTANFVAPEVLKRQGYDE
ok.


---
# AKT1

In [15]:
site_check('P31749-1', 'T', 308)

http://www.uniprot.org/uniprot/P31749-1
name  AKT1
length  481
IKITDFGLCKEGIKDGATMK T FCGTPEYLAPEVLEDNDYGR
ok.


In [16]:
site_check('P31749-1', 'S', 473)

http://www.uniprot.org/uniprot/P31749-1
name  AKT1
length  481
DQDDSMECVDSERRPHFPQF S YSASGTA
ok.


---
# AKT2

In [17]:
site_check('P31751-1', 'S', 474)

http://www.uniprot.org/uniprot/P31751-1
name  AKT2
length  482
DRYDSLGLLELDQRTHFPQF S YSASIRE
ok.


---
# AKT3

In [18]:
site_check('Q9Y243-1', 'S', 475)

http://www.uniprot.org/uniprot/Q9Y243-1
name  AKT3
length  480
DGMDCMDNERRPHFPQFSYS A SGRE
X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X


### check PTM
- the only phosphoserine listed is at position 472
- it is the nearest annotated phosphoserine to position 475, so we use it

In [19]:
site_check('Q9Y243-1', 'S', 472)

http://www.uniprot.org/uniprot/Q9Y243-1
name  AKT3
length  480
YDEDGMDCMDNERRPHFPQF S YSASGRE
ok.


---
# MTOR

In [20]:
site_check('P42345-1', 'S', 2448)

http://www.uniprot.org/uniprot/P42345-1
name  MTOR
length  2550
NWRLMDTNTKGNKRSRTRTD S YSAGQSVEILDGVELGEPAH
ok.


---
# RPS6KB1

In [21]:
site_check('P23443-1', 'T', 389)

http://www.uniprot.org/uniprot/P23443-1
name  RPS6KB1
length  526
FKPLLQSEEDVSQFDSKFTR Q TPVDSPDDSTLSESANQVFL
X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X


### check PTM
- phosphothreonines at positions 252, 412, 444
- there is nothing near 389 listed on UniProt
- should be 412 as per Phosphosite and curated site map

In [22]:
site_check('P23443-1', 'T', 412)

http://www.uniprot.org/uniprot/P23443-1
name  RPS6KB1
length  526
VDSPDDSTLSESANQVFLGF T YVAPSVLESVKEKFSFEPKI
ok.


- Next is the pair of **T421/S424**

In [23]:
site_check('P23443-1', 'T', 421)

http://www.uniprot.org/uniprot/P23443-1
name  RPS6KB1
length  526
SESANQVFLGFTYVAPSVLE S VKEKFSFEPKIRSPRRFIGS
X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X


In [24]:
site_check('P23443-1', 'S', 424)

http://www.uniprot.org/uniprot/P23443-1
name  RPS6KB1
length  526
ANQVFLGFTYVAPSVLESVK E KFSFEPKIRSPRRFIGSPRT
X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X


### check PTM
- there is T444 and S447, also annotated as correct by Phosphosite

In [25]:
site_check('P23443-1', 'T', 444)

http://www.uniprot.org/uniprot/P23443-1
name  RPS6KB1
length  526
EKFSFEPKIRSPRRFIGSPR T PVSPVKFSPGDFWGRGASAS
ok.


In [26]:
site_check('P23443-1', 'S', 447)

http://www.uniprot.org/uniprot/P23443-1
name  RPS6KB1
length  526
SFEPKIRSPRRFIGSPRTPV S PVKFSPGDFWGRGASASTAN
ok.


---
# RPS6

In [27]:
site_check('P62753-1', 'S', 235)

http://www.uniprot.org/uniprot/P62753-1
name  RPS6
length  250
KRMKEAKEKRQEQIAKRRRL S SLRASTSKSESSQK
ok.


In [28]:
site_check('P62753-1', 'S', 236)

http://www.uniprot.org/uniprot/P62753-1
name  RPS6
length  250
RMKEAKEKRQEQIAKRRRLS S LRASTSKSESSQK
ok.


---
# PRKAA1

In [29]:
site_check('Q13131-1', 'T', 172)

http://www.uniprot.org/uniprot/Q13131-1
name  PRKAA1
length  560
KPENVLLDAHMNAKIADFGL S NMMSDGEFLRTSCGSPNYAA
X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X


### check PTM
- phosphothreonines at positions 23, 183, 269
- nearest phosphothreonine is at 183, also supported by curated site map and Phosphosite

In [30]:
site_check('Q13131-1', 'T', 183)

http://www.uniprot.org/uniprot/Q13131-1
name  PRKAA1
length  560
NAKIADFGLSNMMSDGEFLR T SCGSPNYAAPEVISGRLYAG
ok.


---
# MAPK8

In [31]:
site_check('P45983-1', 'T', 183)

http://www.uniprot.org/uniprot/P45983-1
name  MAPK8
length  428
CTLKILDFGLARTAGTSFMM T PYVVTRYYRAPEVILGMGYK
ok.


In [32]:
site_check('P45983-1', 'Y', 185)

http://www.uniprot.org/uniprot/P45983-1
name  MAPK8
length  428
LKILDFGLARTAGTSFMMTP Y VVTRYYRAPEVILGMGYKEN
ok.


---
# MAPK9

In [33]:
site_check('P45984-1', 'T', 183)

http://www.uniprot.org/uniprot/P45984-1
name  MAPK9
length  425
CTLKILDFGLARTACTNFMM T PYVVTRYYRAPEVILGMGYK
ok.


In [34]:
site_check('P45984-1', 'Y', 185)

http://www.uniprot.org/uniprot/P45984-1
name  MAPK9
length  425
LKILDFGLARTACTNFMMTP Y VVTRYYRAPEVILGMGYKEN
ok.


---
# MAPK10

In [35]:
site_check('P53779-1', 'T', 183)

http://www.uniprot.org/uniprot/P53779-1
name  MAPK10
length  465
HERMSYLLYQMLCGIKHLHS A GIIHRDLKPSNIVVKSDCTL
X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X


In [36]:
site_check('P53779-1', 'Y', 185)

http://www.uniprot.org/uniprot/P53779-1
name  MAPK10
length  465
RMSYLLYQMLCGIKHLHSAG I IHRDLKPSNIVVKSDCTLKI
X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X


### check PTM
- T221, Y223 pair found with correct spacing

In [37]:
site_check('P53779-1', 'T', 221)

http://www.uniprot.org/uniprot/P53779-1
name  MAPK10
length  465
CTLKILDFGLARTAGTSFMM T PYVVTRYYRAPEVILGMGYK
ok.


In [38]:
site_check('P53779-1', 'Y', 223)

http://www.uniprot.org/uniprot/P53779-1
name  MAPK10
length  465
LKILDFGLARTAGTSFMMTP Y VVTRYYRAPEVILGMGYKEN
ok.


---
# JUN

In [39]:
site_check('P05412-1', 'S', 63)

http://www.uniprot.org/uniprot/P05412-1
name  JUN
length  332
ADPVGSLKPHLRAKNSDLLT S PDVGLLKLASPELERLIIQS
ok.


---
# MAPK14

In [40]:
site_check('Q16539-1', 'T', 180)

http://www.uniprot.org/uniprot/Q16539-1
name  MAPK14
length  361
EDCELKILDFGLARHTDDEM T GYVATRWYRAPEIMLNWMHY
ok.


---
# HSPB1

In [41]:
site_check('P04792-1', 'S', 82)

http://www.uniprot.org/uniprot/P04792-1
name  HSPB1
length  206
AIESPAVAAPAYSRALSRQL S SGVSEIRHTADRWRVSLDVN
ok.


---
# RELA

In [42]:
site_check('Q04206-1', 'S', 536)

http://www.uniprot.org/uniprot/Q04206-1
name  RELA
length  552
APLGAPGLPNGLLSGDEDFS S IADMDFSALLSQISS
ok.


---
# HIST1H3A-J

In [43]:
site_check('P68431-1', 'S', 10)

http://www.uniprot.org/uniprot/P68431-1
name  HIST1H3H
length  137
 MARTKQTAR K STGGKAPRKQLATKAARKSA
X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X


### check PTM
- phosphoserine at position 11
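- looks like an off-by-one error, probably because histone residue numbering conventionally omits the initiator methionine, which the UniProt sequence counts as position 1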

In [44]:
site_check('P68431-1', 'S', 11)

http://www.uniprot.org/uniprot/P68431-1
name  HIST1H3H
length  137
 MARTKQTARK S TGGKAPRKQLATKAARKSAP
ok.


---
# HIST2H3A,C,D

In [45]:
site_check('Q71DI3-1', 'S', 10)

http://www.uniprot.org/uniprot/Q71DI3-1
name  HIST2H3D
length  137
 MARTKQTAR K STGGKAPRKQLATKAARKSA
X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X


### check PTM
- phosphoserine at position 11
- looks like an off-by-one error

In [46]:
site_check('Q71DI3-1', 'S', 11)

http://www.uniprot.org/uniprot/Q71DI3-1
name  HIST2H3D
length  137
 MARTKQTARK S TGGKAPRKQLATKAARKSAP
ok.


---
# H3F3A,B

In [47]:
site_check('P84243-1', 'S', 10)

http://www.uniprot.org/uniprot/P84243-1
name  H3F3B
length  137
 MARTKQTAR K STGGKAPRKQLATKAARKSA
X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X


### check PTM
- phosphoserine at position 11
- looks like an off-by-one error

In [48]:
site_check('P84243-1', 'S', 11)

http://www.uniprot.org/uniprot/P84243-1
name  H3F3B
length  137
 MARTKQTARK S TGGKAPRKQLATKAARKSAP
ok.
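
---
# Summary of corrections

The corrections found above can be collected into a lookup table for downstream use. This is one possible representation, not part of the original curation: `SITE_CORRECTIONS` and `corrected_position` are hypothetical names, and the entries are copied from the checks in this notebook. Sites that passed the check unchanged are not listed and are returned as-is.

```python
# (UniProt ID, residue, reported position) -> curated position
SITE_CORRECTIONS = {
    ('P28482-1', 'T', 202): 185,  # MAPK1
    ('P28482-1', 'Y', 204): 187,  # MAPK1
    ('P27361-1', 'T', 185): 202,  # MAPK3
    ('P27361-1', 'Y', 187): 204,  # MAPK3
    ('Q9Y243-1', 'S', 475): 472,  # AKT3
    ('P23443-1', 'T', 389): 412,  # RPS6KB1
    ('P23443-1', 'T', 421): 444,  # RPS6KB1
    ('P23443-1', 'S', 424): 447,  # RPS6KB1
    ('Q13131-1', 'T', 172): 183,  # PRKAA1
    ('P53779-1', 'T', 183): 221,  # MAPK10
    ('P53779-1', 'Y', 185): 223,  # MAPK10
    ('P68431-1', 'S', 10): 11,    # HIST1H3A-J
    ('Q71DI3-1', 'S', 10): 11,    # HIST2H3A,C,D
    ('P84243-1', 'S', 10): 11,    # H3F3A,B
}


def corrected_position(up_id, residue, position):
    """Return the curated position for a reported site, or the reported
    position itself if no correction was recorded."""
    return SITE_CORRECTIONS.get((up_id, residue, position), position)


print(corrected_position('P28482-1', 'T', 202))  # 185
print(corrected_position('Q02750-1', 'S', 218))  # 218 (no correction needed)
```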
