# Required Python library

First we import rdflib, a well-known Python library that adds support for RDF and its query language, SPARQL.


In [1]:
import sys
from rdflib import *

from SPARQLWrapper import SPARQLWrapper, JSON

# Entry identifiers and dates

The UniProt entry has a [primary accession](https://www.uniprot.org/help/accession_numbers), which is the best way to access the entry. In the RDF format, the primary accession is part of the IRI identifying the entry.

In [2]:
entry=Graph().parse(format='ttl',
                     data="""
base <http://purl.uniprot.org/uniprot/>  
prefix up: <http://purl.uniprot.org/core/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix isoform:<http://purl.uniprot.org/isoforms/>
<O22340>
  rdf:type up:Protein ;
  up:reviewed true ;
  up:created "2001-10-24"^^xsd:date ;
  up:modified "2015-04-01"^^xsd:date ;
  up:version 86 ;
  up:mnemonic "TPSDA_ABIGR" ;
  up:oldMnemonic
    "TPSD3_ABIGR" ,
    "TSD3_ABIGR" ;
  up:replaces <Q94FV9> ;
  up:sequence isoform:O22340-1 .
isoform:O22340-1
  rdf:type up:Simple_Sequence ;
  up:modified "1998-01-01"^^xsd:date ;
  up:version 1 .""")


### Extracting the primary accession from an IRI

This is easy enough with some string manipulation. While UniProt primary accessions are unique within UniProtKB, they may be reused, by accident or intentionally, by other data sources. If we provided them as plain strings and you used them in a query that way, you might accidentally retrieve completely wrong records.

In [17]:
qres=entry.query("""
PREFIX up: <http://purl.uniprot.org/core/> 
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/> 
SELECT 
    ?primaryAccession
WHERE {
  ?protein a up:Protein .
  BIND(substr(str(?protein), strlen(str(uniprotkb:))+1) AS ?primaryAccession)
}""")

for row in qres:
    print("%s is the PrimaryAccession" % row)

O22340 is the PrimaryAccession
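
The same string manipulation can also be done on the Python side once an entry IRI has been retrieved. The following is a minimal sketch using only the standard library; the helper names `accession_from_iri` and `iri_from_accession` are illustrative and not part of rdflib or any UniProt API.

```python
# Namespace shared by all UniProtKB entry IRIs (same value as the
# uniprotkb: prefix used in the SPARQL query).
UNIPROTKB_NS = "http://purl.uniprot.org/uniprot/"


def accession_from_iri(iri: str) -> str:
    """Return the primary accession encoded in a UniProtKB entry IRI."""
    if not iri.startswith(UNIPROTKB_NS):
        # Refuse IRIs from other data sources instead of guessing.
        raise ValueError(f"not a UniProtKB entry IRI: {iri}")
    return iri[len(UNIPROTKB_NS):]


def iri_from_accession(accession: str) -> str:
    """Build the entry IRI for a primary accession (hypothetical helper)."""
    return UNIPROTKB_NS + accession


print(accession_from_iri("http://purl.uniprot.org/uniprot/O22340"))
```

Checking the namespace before slicing keeps accessions from other data sources from being silently treated as UniProtKB identifiers.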


### [Entry Name](https://www.uniprot.org/help/entry_name)

The RDF format stores the entry name in the `mnemonic` property and, for convenience, also lists obsolete entry names in `oldMnemonic` properties.

In [18]:
qres=entry.query("""
PREFIX up: <http://purl.uniprot.org/core/> 
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/> 
SELECT 
  ?protein ?mnemonic
WHERE {
  ?protein a up:Protein ;
      up:mnemonic ?mnemonic.
}""")

for row in qres:
    print("%s is also known as %s" % row)

http://purl.uniprot.org/uniprot/O22340 is also known as TPSDA_ABIGR


In [21]:
qres=entry.query("""
PREFIX up: <http://purl.uniprot.org/core/> 
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/> 
SELECT 
  ?protein (GROUP_CONCAT(?oldMnemonic; separator=" and ") AS ?oldMnemonics)
WHERE {
  ?protein a up:Protein ;
      up:oldMnemonic ?oldMnemonic.
} GROUP BY ?protein
""")

for row in qres:
    print("%s used to be known as %s" % row)

http://purl.uniprot.org/uniprot/O22340 used to be known as TPSD3_ABIGR and TSD3_ABIGR


### Swiss-Prot (reviewed) or TrEMBL (unreviewed)

In [22]:
qres=entry.query("""
PREFIX up: <http://purl.uniprot.org/core/>
SELECT
    ?protein
    ?reviewed
WHERE {
  ?protein a up:Protein .
  ?protein up:reviewed ?reviewed .
}""")

for row in qres:
    print("%s is a Swiss-Prot entry: %s" % row)

http://purl.uniprot.org/uniprot/O22340 is a Swiss-Prot entry: true


In UniProt RDF, entries that are part of Swiss-Prot have the `reviewed` property set to true, while entries in TrEMBL have it set to false.

### Dates and versions

We store the date when an entry was integrated into UniProtKB in the `created` property. The entry's last modification date and current version are in its `modified` and `version` properties. The last modification date and current version of the sequence are in the `modified` and `version` properties of the `sequence` subject. Dates use the international standard (ISO 8601) date [notation](http://www.w3.org/QA/Tips/iso-date).

In [24]:
qres=entry.query("""prefix up: <http://purl.uniprot.org/core/>
SELECT
    ?protein 
    ?modified
    ?version
WHERE {
  ?protein a up:Protein ;
    up:modified ?modified ;
    up:version ?version .
}""")

for row in qres:
    print("%s was modified on %s and is at version %s" % row)

http://purl.uniprot.org/uniprot/O22340 was modified on 2015-04-01 and is at version 86
