# Required Python library

First we import rdflib, a well-known Python library that adds support for RDF and its query language, SPARQL, to Python 2 and 3.


In [1]:
import sys
from rdflib import *

from SPARQLWrapper import SPARQLWrapper, JSON

# Protein Names

Protein names are modeled as name resources in the RDF format. There are 3 main types of protein names:

 1. The name recommended by the UniProt consortium: linked to a recommendedName element/property in the RDF format.
 2. Names provided by the submitter of the underlying nucleotide sequence (in UniProtKB/TrEMBL only):
    Shown in submittedName elements/properties in the RDF format.
 3. Alternative names:
    Shown in alternativeName elements/properties in the RDF format.

These types are further categorized into:

 1. Full name:
    Shown in a fullName element/property in the RDF format.
 2. Abbreviations or acronyms of the full name:
    Shown in shortName elements/properties in the RDF format.

There are furthermore a few categories with more specific meanings:

  1. Name of an allergen:
     Shown in an allergenName element/property in the RDF format.
  2. Names of CD antigens:
     Shown in cdAntigenName elements/properties in the RDF format.
  3. Name used in a biotechnological context:
     Shown in a biotechName element/property in the RDF format.
  4. International nonproprietary names:
     Shown in innName elements/properties in the RDF format.
  5. Enzyme Commission (EC) numbers:
     This links a name with the EC number that classifies the enzymatic activity of this entry.
     See the enzymes notebook for more details.


In [2]:
entry=Graph().parse(format='ttl',
                     data="""
base <http://purl.uniprot.org/uniprot/>  
prefix up: <http://purl.uniprot.org/core/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
prefix isoform:<http://purl.uniprot.org/isoforms/>
prefix enzyme: <http://purl.uniprot.org/enzyme/>

<P12820>
  a up:Protein ;
  up:recommendedName
    <P12820#SIP30A> ;
  up:alternativeName
    <P12820#SIP62B> ,
    <P12820#SIPE4F> ,
    <P12820#SIPFE1> ;
  up:enzyme
    enzyme:3.2.1.- ,
    enzyme:3.4.15.1 ;
  up:sequence
    isoform:P12820-1 .

<P12820#SIP30A>
  rdf:type up:Structured_Name ;
  up:fullName "Angiotensin-converting enzyme" ;
  up:shortName "ACE" ;
  up:ecName "3.2.1.-" ,
    "3.4.15.1" .

<P12820#SIP62B>
  rdf:type up:Structured_Name ;
  up:fullName "Dipeptidyl carboxypeptidase I" .

<P12820#SIPE4F>
  rdf:type up:Structured_Name ;
  up:fullName "Kininase II" .

<P12820#SIPFE1>
  rdf:type up:Structured_Name ;
  up:cdAntigenName "CD143" .

isoform:P12820-1
  rdf:type up:Simple_Sequence ;
  up:precursor true ;
  up:fragment "single" .""")


### Selecting a recommended full name

In [3]:
qres=entry.query("""
PREFIX up: <http://purl.uniprot.org/core/> 
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/> 
SELECT 
    ?protein 
    ?fullName
WHERE {
    ?protein a up:Protein ;
      up:recommendedName ?recommendedName .
    ?recommendedName up:fullName ?fullName .
}""")
# Uncomment to dump every triple in the graph while debugging:
# for t in entry:
#     print(t[0], t[1], t[2])
for row in qres:
    print("UniProt recommends that %s is called %s" % row)

UniProt recommends that http://purl.uniprot.org/uniprot/P12820 is called Angiotensin-converting enzyme


### Alternative names

In [6]:
qres=entry.query("""
PREFIX up: <http://purl.uniprot.org/core/> 
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/> 
SELECT 
    ?protein 
    ?fullName
WHERE {
    ?protein a up:Protein ;
      up:alternativeName ?alternativeName .
    ?alternativeName up:fullName ?fullName .
}""")

for row in qres:
    print("%s is also known as %s" % row)

http://purl.uniprot.org/uniprot/P12820 is also known as Dipeptidyl carboxypeptidase I
http://purl.uniprot.org/uniprot/P12820 is also known as Kininase II
