# Required Python libraries

First we import rdflib, a well-known Python library that adds support for RDF and its query language, SPARQL, to Python. We also import SPARQLWrapper, which sends SPARQL queries to remote endpoints.


```python
# rdflib: in-memory RDF graphs and SPARQL; SPARQLWrapper: remote SPARQL endpoints
from rdflib import Graph

from SPARQLWrapper import SPARQLWrapper, JSON
```

# Taxonomy

## Organism identifier

The organism that is the source of a protein sequence is identified by a unique identifier from the NCBI taxonomy database. This identifier is stored in the `up:organism` property of the UniProtKB entry, and it is the only taxonomy information stored in the RDF of a UniProtKB entry. The full NCBI taxonomy, however, is modelled and available as well.

```python
entry=Graph().parse(format='ttl',
                     data="""
base <http://purl.uniprot.org/uniprot/>  
prefix up: <http://purl.uniprot.org/core/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
prefix isoform:<http://purl.uniprot.org/isoforms/>
prefix skos: <http://www.w3.org/2004/02/skos/core#>
prefix taxon: <http://purl.uniprot.org/taxonomy/>

<P05067>
  a up:Protein ;
  up:organism taxon:9606 .""")
```

### Selecting the organism of a protein

```python
qres=entry.query("""
PREFIX up: <http://purl.uniprot.org/core/> 
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT 
    ?protein 
    ?taxon
WHERE {
    ?protein a up:Protein ;
      up:organism ?taxon .
}""")

for row in qres:
    print("%s is from %s" % row)
```

```text
http://purl.uniprot.org/uniprot/P05067 is from http://purl.uniprot.org/taxonomy/9606
```


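### Taxon records

Each taxon is an `up:Taxon` resource. The graph below holds the records for *Homo sapiens* (`9606`) and its genus *Homo* (`9605`).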

```python
taxon=Graph().parse(format='ttl',
                 data="""
base <http://purl.uniprot.org/taxonomy/> 
prefix up: <http://purl.uniprot.org/core/> 
prefix foaf: <http://xmlns.com/foaf/0.1/> 
prefix owl: <http://www.w3.org/2002/07/owl#> 
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
prefix skos: <http://www.w3.org/2004/02/skos/core#> 
prefix xsd: <http://www.w3.org/2001/XMLSchema#> 

<9606> a up:Taxon ;
  up:rank up:Species ;
  up:mnemonic "HUMAN" ;
  up:scientificName "Homo sapiens" ;
  up:commonName "Human" ;
  up:otherName "Home sapiens" ,
    "Homo sapiens Linnaeus, 1758" ,
    "man" ;
  rdfs:seeAlso <http://animaldiversity.org/site/accounts/information/Homo_sapiens.html> ,
    <http://archaeologyinfo.com/homo-sapiens/> ,
    <http://www.ensembl.org/Homo_sapiens/Info/Index> ,
    <https://www.sciencedaily.com/releases/2005/02/050223122209.htm> ;
  rdfs:subClassOf <9605> ;
  skos:narrowerTransitive <63221> ,
    <741158> ;
  up:partOfLineage false .

<9605> a up:Taxon ;
  up:rank up:Genus ;
  up:scientificName "Homo" ;
  up:otherName "Homo Linnaeus, 1758" ,
    "humans" ;
  rdfs:subClassOf <207598> ;
  skos:narrowerTransitive <9606> ,
    <1425170> ,
    <2665952> ;
  up:partOfLineage true .
""")
```

### rank and scientificName

The `up:rank` and `up:scientificName` properties are by far the most queried properties of a taxon.

```python
qres=taxon.query("""
PREFIX up: <http://purl.uniprot.org/core/> 

SELECT 
    ?taxon 
    ?rank
    ?scientificName
WHERE {
    ?taxon a up:Taxon ;
      up:rank ?rank ;
      up:scientificName ?scientificName .
}""")

# Columns are selected in the same order the format string expects: taxon, rank, name
for row in qres:
    print("%s rank:%s name:%s" % row)
```

```text
http://purl.uniprot.org/taxonomy/9606 rank:http://purl.uniprot.org/core/Species name:Homo sapiens
http://purl.uniprot.org/taxonomy/9605 rank:http://purl.uniprot.org/core/Genus name:Homo
```


### Hierarchy

Querying the taxonomic hierarchy is straightforward using the `rdfs:subClassOf` relationship. On the UniProt SPARQL endpoint this relationship is materialized, so every ancestor of a taxon is linked directly and no SPARQL property paths are required. On other endpoints you may need the path `rdfs:subClassOf+` to query by higher levels of the taxonomy.

```python
qres=taxon.query("""
PREFIX up: <http://purl.uniprot.org/core/> 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT 
    ?species ?genus 
WHERE {
    ?species a up:Taxon ;
        up:rank up:Species ;
        rdfs:subClassOf ?genus .
    ?genus a up:Taxon ;
        up:rank up:Genus .
}""")

for row in qres:
    print("%s is part of the genus %s" % row)
```

```text
http://purl.uniprot.org/taxonomy/9606 is part of the genus http://purl.uniprot.org/taxonomy/9605
```


### Host organisms

Sometimes an organism is known to live inside another one (e.g. a parasite, a symbiont, or an infectious agent). In that case the `up:host` property links the organism to its host.

```python
taxon2=Graph().parse(format='ttl',
                 data="""
base <http://purl.uniprot.org/taxonomy/> 
prefix up: <http://purl.uniprot.org/core/> 

<1241371> a up:Taxon ;
  up:mnemonic "ABHV" ;
  up:host <6451> .
""")
```

```python
qres=taxon2.query("""
PREFIX up: <http://purl.uniprot.org/core/> 

SELECT 
    ?virus ?host 
WHERE {
    ?virus up:host ?host .
}""")

for row in qres:
    print("%s infects %s" % row)
```

```text
http://purl.uniprot.org/taxonomy/1241371 infects http://purl.uniprot.org/taxonomy/6451
```
