<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Basic-information" data-toc-modified-id="Basic-information-1">Basic information</a></span></li><li><span><a href="#Required-Python-libraries" data-toc-modified-id="Required-Python-libraries-2">Required Python libraries</a></span><ul class="toc-item"><li><span><a href="#Entry-identifier" data-toc-modified-id="Entry-identifier-2.1">Entry identifier</a></span><ul class="toc-item"><li><span><a href="#Extracting-a-primaryAccession-from-a-IRI" data-toc-modified-id="Extracting-a-primaryAccession-from-a-IRI-2.1.1">Extracting a primaryAccession from a IRI</a></span></li></ul></li><li><span><a href="#UniProt-entry-name-(mnemonic)" data-toc-modified-id="UniProt-entry-name-(mnemonic)-2.2">UniProt entry name (mnemonic)</a></span><ul class="toc-item"><li><span><a href="#Old-mnemonics" data-toc-modified-id="Old-mnemonics-2.2.1">Old mnemonics</a></span></li></ul></li><li><span><a href="#Entry-status" data-toc-modified-id="Entry-status-2.3">Entry status</a></span></li><li><span><a href="#Dates-and-versions" data-toc-modified-id="Dates-and-versions-2.4">Dates and versions</a></span></li></ul></li></ul></div>

# Basic information

This notebook shows how to retrieve basic information about UniProtKB entries:  
- identifier (primary accession)  
- entry name (mnemonic)  
- entry status  
- dates and versions 

# Required Python libraries

If you are not familiar with the **RDFlib** and **SPARQLWrapper** libraries, please read `00_introduction.ipynb` first. 

In [1]:
from rdflib import Graph
from SPARQLWrapper import SPARQLWrapper, JSON

## Entry identifier

Each UniProt entry is identified by a [primary accession](https://www.uniprot.org/help/accession_numbers).  
This is the best way to access an entry.  
In the RDF format the primary accession is part of the IRI identifying an entry.  

In [2]:
entry=Graph().parse(format='ttl',
                     data="""
base <http://purl.uniprot.org/uniprot/>  
prefix up: <http://purl.uniprot.org/core/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
prefix isoform:<http://purl.uniprot.org/isoforms/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
<O22340> rdf:type up:Protein ;
         up:reviewed true ;
         up:created "2001-10-24"^^xsd:date ;
         up:modified "2015-04-01"^^xsd:date ;
         up:version 86 ;
         up:mnemonic "TPSDA_ABIGR" ;
         up:oldMnemonic "TPSD3_ABIGR" ,
                        "TSD3_ABIGR" ;
         up:replaces <Q94FV9> ;
         up:sequence isoform:O22340-1 .
isoform:O22340-1 rdf:type up:Simple_Sequence ;
                 up:modified "1998-01-01"^^xsd:date ;
                 up:version 1 .""")

In [11]:
qres=entry.query("""
PREFIX up: <http://purl.uniprot.org/core/> 
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/> 
SELECT ?protein
WHERE {
  ?protein a up:Protein .
}""")

for row in qres:
    print("UniProt entry URI = %s" % row)

UniProt entry URI = http://purl.uniprot.org/uniprot/O22340


### Extracting a primaryAccession from a IRI

The accession can be extracted from the IRI with some string manipulation, as in the query below.  
UniProt provides entries as IRIs rather than plain accession strings for a reason: while primary accessions are unique within UniProtKB, other data sources may reuse them, accidentally or intentionally. If you matched on bare strings in a query, you might accidentally retrieve completely wrong records.  


In [3]:
qres=entry.query("""
PREFIX up: <http://purl.uniprot.org/core/> 
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/> 
SELECT ?primaryAccession
       ?protein
WHERE {
  ?protein a up:Protein .
  BIND(substr(str(?protein), strlen(str(uniprotkb:))+1) AS ?primaryAccession)
}""")

for row in qres:
    print("'%s' is the PrimaryAccession of %s" % row)

'O22340' is the PrimaryAccession of http://purl.uniprot.org/uniprot/O22340
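The same extraction can be done on the Python side once an IRI has been retrieved. This is a small sketch, assuming the entry namespace used in the queries above; `primary_accession` is a hypothetical helper name. It rejects IRIs from other namespaces rather than guessing, which keeps the point about reused accession strings intact.

```python
# Namespace shared by UniProtKB entry IRIs (as used in the queries above).
UNIPROTKB = "http://purl.uniprot.org/uniprot/"

def primary_accession(iri):
    """Return the accession part of a UniProtKB entry IRI (hypothetical helper)."""
    text = str(iri)  # also accepts rdflib URIRef objects, which are str subclasses
    if not text.startswith(UNIPROTKB):
        raise ValueError("not a UniProtKB entry IRI: %s" % text)
    return text[len(UNIPROTKB):]

print(primary_accession("http://purl.uniprot.org/uniprot/O22340"))
```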


## UniProt entry name (mnemonic)

The UniProtKB/Swiss-Prot **entry name** consists of up to 11 uppercase alphanumeric characters with a naming convention that can be symbolized as **X_Y**, where:  

- **X** is a mnemonic protein identification code of at most 5 alphanumeric characters  
- The **'_'** sign serves as a separator
- **Y** is a mnemonic species identification code of at most 5 alphanumeric characters

The mnemonic code **X** is an abbreviation of the protein/gene name, which does not necessarily correspond to the recommended protein name or to the gene name.  

See the UniProt [entry name](https://www.uniprot.org/help/entry_name) documentation for more details. 

The RDF format stores the **entry name** in the `mnemonic` property and, for convenience, also lists obsolete entry names in `oldMnemonic` properties.

In [4]:
qres=entry.query("""
PREFIX up: <http://purl.uniprot.org/core/> 
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/> 
SELECT 
  ?protein ?mnemonic
WHERE {
  ?protein a up:Protein ;
      up:mnemonic ?mnemonic.
}""")

for row in qres:
    print("The entry name of %s is %s" % row)

The entry name of http://purl.uniprot.org/uniprot/O22340 is TPSDA_ABIGR
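The **X_Y** convention described above can be checked with a regular expression. This is a sketch only: the pattern and the helper name `split_entry_name` are assumptions made for illustration, and the pattern covers Swiss-Prot-style names only. TrEMBL entry names such as `A0A024R563_HUMAN` use the accession as **X** and are therefore longer.

```python
import re

# Up to 5 uppercase alphanumeric characters on each side of the '_' separator.
# Assumption: Swiss-Prot-style entry names only.
ENTRY_NAME = re.compile(r"([A-Z0-9]{1,5})_([A-Z0-9]{1,5})")

def split_entry_name(name):
    """Return (protein_code, species_code), or None if the name does not match."""
    match = ENTRY_NAME.fullmatch(name)
    return match.groups() if match else None

print(split_entry_name("TPSDA_ABIGR"))
print(split_entry_name("A0A024R563_HUMAN"))
```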


### Old mnemonics

In [5]:
qres=entry.query("""
PREFIX up: <http://purl.uniprot.org/core/> 
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/> 
SELECT 
  ?protein (GROUP_CONCAT(?oldMnemonic; separator=" and ") AS ?oldMnemonics)
WHERE {
  ?protein a up:Protein ;
      up:oldMnemonic ?oldMnemonic.
} GROUP BY ?protein
""")

for row in qres:
    print("%s used to be known as %s" % row)

http://purl.uniprot.org/uniprot/O22340 used to be known as TPSD3_ABIGR and TSD3_ABIGR


## Entry status 

UniProtKB has two sections:  
- UniProtKB/Swiss-Prot: entries that have been manually annotated and reviewed by UniProt biocurators 
- UniProtKB/TrEMBL: entries that have been annotated automatically by annotation pipelines and have not been manually reviewed 

The RDF format stores the **entry status** in the boolean property `reviewed`: `true` for Swiss-Prot entries and `false` for TrEMBL entries.

In [6]:
sp_entry=Graph().parse(format='ttl',
                     data="""
base <http://purl.uniprot.org/uniprot/>  
prefix up: <http://purl.uniprot.org/core/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
prefix isoform:<http://purl.uniprot.org/isoforms/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
<O22340> rdf:type up:Protein ;
         up:reviewed true ;
         up:created "2001-10-24"^^xsd:date ;
         up:modified "2015-04-01"^^xsd:date ;
         up:version 86 ;
         up:mnemonic "TPSDA_ABIGR" .
""")

qres=sp_entry.query("""
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein
       ?entryName 
       ?reviewed
WHERE {
  ?protein a up:Protein . 
  ?protein up:mnemonic ?entryName . 
  ?protein up:reviewed ?reviewed . 
}""" )

for row in qres:
    print("%s (%s) is a reviewed (Swiss-Prot) entry? %s" % row)

http://purl.uniprot.org/uniprot/O22340 (TPSDA_ABIGR) is a reviewed (Swiss-Prot) entry? true


In [7]:
tr_entry=Graph().parse(format='ttl',
                         data="""base <http://purl.uniprot.org/uniprot/>  
prefix up: <http://purl.uniprot.org/core/> 
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>  
prefix xsd: <http://www.w3.org/2001/XMLSchema#> 

<A0A024R563> rdf:type up:Protein ;
             up:reviewed false ;
             up:created "2014-07-09"^^xsd:date ;
             up:modified "2020-10-07"^^xsd:date ;
             up:version 30 ;
             up:mnemonic "A0A024R563_HUMAN" ."""
)

qres=tr_entry.query("""
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein
       ?entryName 
       ?reviewed
WHERE {
  ?protein a up:Protein . 
  ?protein up:mnemonic ?entryName . 
  ?protein up:reviewed ?reviewed . 
}""" )

for row in qres:
    print("%s (%s) is a reviewed (Swiss-Prot) entry? %s" % row)

http://purl.uniprot.org/uniprot/A0A024R563 (A0A024R563_HUMAN) is a reviewed (Swiss-Prot) entry? false


## Dates and versions

The date when an entry was integrated into UniProtKB is stored in the `created` property. The last modification date of the entry and its current version are stored in the `modified` and `version` properties of the entry. The last modification date of the sequence and its current version are stored in the `modified` and `version` properties of the `sequence` subject. Dates use the international standard date [notation](http://www.w3.org/QA/Tips/iso-date) (ISO 8601, `YYYY-MM-DD`).

In [9]:
entry=Graph().parse(format='ttl',
                     data="""
base <http://purl.uniprot.org/uniprot/>  
prefix up: <http://purl.uniprot.org/core/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
prefix isoform:<http://purl.uniprot.org/isoforms/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
<O22340> rdf:type up:Protein ;
         up:reviewed true ;
         up:created "2001-10-24"^^xsd:date ;
         up:modified "2015-04-01"^^xsd:date ;
         up:version 86 ;
         up:mnemonic "TPSDA_ABIGR" ;
         up:oldMnemonic "TPSD3_ABIGR" ,
                        "TSD3_ABIGR" ;
         up:replaces <Q94FV9> ;
         up:sequence isoform:O22340-1 .
isoform:O22340-1 rdf:type up:Simple_Sequence ;
                 up:modified "1998-01-01"^^xsd:date ;
                 up:version 1 .""")

qres=entry.query("""prefix up: <http://purl.uniprot.org/core/>
SELECT
    ?protein 
    ?created
    ?modified
    ?version
WHERE {
  ?protein a up:Protein ;
           up:created ?created ;
           up:modified ?modified ;
           up:version ?version .
}""")

for row in qres:
    print("%s was created on %s and modified on %s. It is at version %s" % row)

http://purl.uniprot.org/uniprot/O22340 was created on 2001-10-24 and modified on 2015-04-01. It is at version 86
