# Enzymes

This notebook aims to show you how enzymes and their related data are represented in UniProt.  



## Import Python packages

First we import rdflib, a well-known Python library that adds support for RDF and its query language, SPARQL. We also import SPARQLWrapper, a client for sending SPARQL queries to remote endpoints.


In [1]:
from rdflib import Graph
from SPARQLWrapper import SPARQLWrapper, JSON

## Catalytic activity

A 'Catalytic activity' annotation describes a catalytic activity of an enzyme, i.e. a chemical reaction that the enzyme catalyzes. 

As UniProt curates reactions at the level of specific enzymes, we use chemical reaction descriptions from the Rhea database whenever possible. Rhea uses the ChEBI (Chemical Entities of Biological Interest) ontology to describe reaction participants that are small molecules, as well as the reactive groups of large molecules (such as amino acid residues within proteins); these large molecules are identified by RHEA-COMP identifiers. For catalytic activities that can only be described as free text, we follow the NC-IUBMB descriptions. We have also started to curate the physiological direction of a reaction, i.e. the direction of the net flow of reactants in vivo, where evidence for it is available.
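Each Rhea reaction comes as a set of four identifiers: a master reaction with undefined direction, plus left-to-right, right-to-left and bidirectional variants. Assuming the variants follow the master ID consecutively (as the IDs 16845, 16846 and 16847 in the example entry suggest), a hypothetical helper can label them. `rhea_variants`, `rhea_iri` and the offset table are illustrative names, not part of any Rhea or UniProt API.

```python
from typing import Dict

# Assumed offsets from a Rhea master ID to its directional variants.
RHEA_DIRECTION_OFFSETS = {
    0: "master (undefined direction)",
    1: "left-to-right",
    2: "right-to-left",
    3: "bidirectional",
}


def rhea_variants(master_id: int) -> Dict[int, str]:
    """Map every Rhea ID in a reaction's set of four to its direction label."""
    return {master_id + offset: label for offset, label in RHEA_DIRECTION_OFFSETS.items()}


def rhea_iri(rhea_id: int) -> str:
    """Build an RDF IRI in the form used in this notebook, e.g. http://rdf.rhea-db.org/16845."""
    return f"http://rdf.rhea-db.org/{rhea_id}"


for rhea_id, direction in rhea_variants(16845).items():
    print(rhea_iri(rhea_id), direction)
```

Under that assumption, the two `up:catalyzedPhysiologicalReaction` values in the example entry (16846 and 16847) are the left-to-right and right-to-left variants of the master reaction 16845 referenced by `up:catalyzedReaction`.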

Because the EC classification focuses on nomenclature, cross-references to Enzyme Commission (EC) numbers have historically been placed in the Protein names subsection of UniProtKB entries. To link EC numbers to the reactions on which they are based, we also add them to 'Catalytic activity' annotations.
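An EC number has four dot-separated levels (class, subclass, sub-subclass and serial number), so 2.3.3.16 sits inside 2.3.3, 2.3 and 2. A small sketch with hypothetical helper names splits an EC number into those levels and builds the IRI pattern used by the `enzyme:` prefix in the Turtle data below.

```python
from typing import List

ENZYME_NAMESPACE = "http://purl.uniprot.org/enzyme/"


def ec_levels(ec_number: str) -> List[str]:
    """Return the cumulative EC hierarchy, e.g. '2.3.3.16' -> ['2', '2.3', '2.3.3', '2.3.3.16']."""
    parts = ec_number.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec_number!r}")
    return [".".join(parts[: i + 1]) for i in range(4)]


def enzyme_iri(ec_number: str) -> str:
    """Build the IRI that the enzyme: prefix expands to for this EC number."""
    return ENZYME_NAMESPACE + ec_number


print(ec_levels("2.3.3.16"))
print(enzyme_iri("2.3.3.16"))
```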

In [14]:
A0A0S3QTD0ttl = """
base <http://purl.uniprot.org/uniport/> 
prefix up: <http://purl.uniprot.org/core/> 
prefix foaf: <http://xmlns.com/foaf/0.1/> 
prefix owl: <http://www.w3.org/2002/07/owl#> 
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
prefix skos: <http://www.w3.org/2004/02/skos/core#> 
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix enzyme: <http://purl.uniprot.org/enzyme/> 
prefix isoform: <http://purl.uniprot.org/isoforms/> 
prefix faldo: <http://biohackathon.org/resource/faldo#>

<A0A0S3QTD0> a up:Protein ;
    up:annotation <A0A0S3QTD0#SIPEF5AED70D74ABDD4> .

<A0A0S3QTD0#SIPEF5AED70D74ABDD4> rdf:type up:Catalytic_Activity_Annotation ;
  up:catalyticActivity <A0A0S3QTD0#SIP1A91565011EC50F6> ;
  up:catalyzedPhysiologicalReaction <http://rdf.rhea-db.org/16846> ,
    <http://rdf.rhea-db.org/16847> .
    
<A0A0S3QTD0#SIP1A91565011EC50F6> rdf:type up:Catalytic_Activity ;
  up:catalyzedReaction <http://rdf.rhea-db.org/16845> ;
  up:enzymeClass enzyme:2.3.3.16 .
"""

A0A0S3QTD0 = Graph().parse(format='ttl', data=A0A0S3QTD0ttl)

# Print every triple as subject, predicate, object; iteration order is not guaranteed.
for subj, pred, obj in A0A0S3QTD0:
    print(subj, pred, obj)


http://purl.uniprot.org/uniport/A0A0S3QTD0#SIPEF5AED70D74ABDD4 http://purl.uniprot.org/core/catalyticActivity http://purl.uniprot.org/uniport/A0A0S3QTD0#SIP1A91565011EC50F6
http://purl.uniprot.org/uniport/A0A0S3QTD0#SIPEF5AED70D74ABDD4 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://purl.uniprot.org/core/Catalytic_Activity_Annotation
http://purl.uniprot.org/uniport/A0A0S3QTD0#SIPEF5AED70D74ABDD4 http://purl.uniprot.org/core/catalyzedPhysiologicalReaction http://rdf.rhea-db.org/16847
http://purl.uniprot.org/uniport/A0A0S3QTD0#SIP1A91565011EC50F6 http://purl.uniprot.org/core/enzymeClass http://purl.uniprot.org/enzyme/2.3.3.16
http://purl.uniprot.org/uniport/A0A0S3QTD0 http://purl.uniprot.org/core/annotation http://purl.uniprot.org/uniport/A0A0S3QTD0#SIPEF5AED70D74ABDD4
http://purl.uniprot.org/uniport/A0A0S3QTD0#SIPEF5AED70D74ABDD4 http://purl.uniprot.org/core/catalyzedPhysiologicalReaction http://rdf.rhea-db.org/16846
http://purl.uniprot.org/uniport/A0A0S3QTD0 http://www.w3.org/19

## Retrieve the annotated catalytic activity of a protein

In [3]:
qres=A0A0S3QTD0.query("""
PREFIX up: <http://purl.uniprot.org/core/> 
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?protein ?reaction ?enzymeClass
WHERE {
  ?protein a up:Protein ;
            up:annotation ?annotation .
  ?annotation a up:Catalytic_Activity_Annotation ;
            up:catalyticActivity ?activity . 
  ?activity up:catalyzedReaction ?reaction .
  
  OPTIONAL {
        # Only a subset of all enzymatic activities is present in the Enzyme Classification
        ?activity up:enzymeClass ?enzymeClass
  }
}""")

for row in qres:
    print("Protein %s catalyzes reaction %s (enzyme class %s)" % row)

Protein http://purl.uniprot.org/uniport/A0A0S3QTD0 catalyzes reaction http://rdf.rhea-db.org/16845 (enzyme class http://purl.uniprot.org/enzyme/2.3.3.16)


## Catalytic activity data

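**Properties**:
- `up:catalyticActivity`: links a `up:Catalytic_Activity_Annotation` to its `up:Catalytic_Activity`
- `up:catalyzedReaction`: links the catalytic activity to the Rhea reaction it describes
- `up:enzymeClass`: links the catalytic activity to its EC number, when one exists
- `up:catalyzedPhysiologicalReaction`: links the annotation to the Rhea reaction(s) in the curated physiological direction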

## Active Site

Active sites of enzymes in UniProtKB entries are represented as so-called `up:Active_Site_Annotation` resources. Each one is linked through `up:range` to a FALDO region whose begin and end positions point into the protein sequence.

In [15]:
# Extend the previous entry data with an active site annotation: its range (a FALDO region), the begin/end positions and the referenced isoform sequence.

A0A0S3QTD0ttl = A0A0S3QTD0ttl + """
<A0A0S3QTD0> up:annotation <A0A0S3QTD0#SIP0474AA62187DCADD> .
<A0A0S3QTD0#SIP0474AA62187DCADD> rdf:type up:Active_Site_Annotation ;
  up:range <http://purl.uniprot.org/range/-9218584541931438034tt274tt274> .
<http://purl.uniprot.org/range/-9218584541931438034tt274tt274> rdf:type faldo:Region ;
  faldo:begin <http://purl.uniprot.org/position/-9218584541931438034tt274> ;
  faldo:end <http://purl.uniprot.org/position/-9218584541931438034tt274> .
<http://purl.uniprot.org/position/-9218584541931438034tt274> rdf:type faldo:Position ,
    faldo:ExactPosition ;
  faldo:position 274 ;
  faldo:reference isoform:A0A0S3QTD0-1 .
isoform:A0A0S3QTD0-1 rdf:type up:Simple_Sequence ;
  rdf:value "MKLKERLAELIPQWRAEVAEIRKKYGNRKTMDCTIGHAYGGMRGLKALVCDTSEVFPDEGVKFRGYTIPELREGPHKLPTAEGGFEPLPEGLWYLLLTGELPTEEDVKEISAEFTKRMQNVPQYVFDVLRAMPVDTHPMTMFAAGILAMQRESVFAKRYEEGMRREEHWEAMLEDSLNMLAALPVIAAYIYRRKYKGDTHIAPDPNLDWSANLAHMMGFDDFEVYELFRLYMFLHSDHEGGNVSAHTNLLVNSAYSDIYRSFSAAMNGLAGPLHGLANQEVLRWIQMLYKKFGGVPTKEQLERFAWDTLNSGQVIPGYGHAVLRVTDPRYVAQRDFALKHLPDDELFKIVSLCYEVIPEVLKKHGKAKNPWPNVDAHSGVLLWHYGIREYDFYTVLFGVSRALGCTAQAILVRGYMLPIERPKSITTRWVKEVAESLPVAGS" .
"""
A0A0S3QTD0=Graph().parse(format='ttl', data=A0A0S3QTD0ttl)

In [22]:
qres=A0A0S3QTD0.query("""
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX faldo: <http://biohackathon.org/resource/faldo#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?protein ?beginPosition ?activeSiteResidue
WHERE {
  ?protein a up:Protein ;
            up:annotation ?annotation .
  ?annotation a up:Active_Site_Annotation ;
            up:range ?activeSiteRegion .
  ?activeSiteRegion faldo:begin ?activeSiteBegin ;
                    faldo:end ?activeSiteEnd .
  # Read the integer positions from the begin and end position resources;
  # both must refer to the same sequence.
  ?activeSiteBegin faldo:position ?beginPosition ;
                   faldo:reference ?sequenceResource .
  ?activeSiteEnd   faldo:position ?endPosition ;
                   faldo:reference ?sequenceResource .
  ?sequenceResource rdf:value ?sequence .
  # SUBSTR is 1-based; a region covers (end - begin + 1) residues.
  BIND(SUBSTR(?sequence, ?beginPosition, (?endPosition - ?beginPosition) + 1) AS ?activeSiteResidue)
}""")

for row in qres:
    print("The active site of %s is at position %s, which is residue %s" % row)

The active site of http://purl.uniprot.org/uniport/A0A0S3QTD0 is at position 274, which is residue H
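The residue lookup in the query relies on SPARQL's `SUBSTR` being 1-based, like FALDO positions, with a length of end - begin + 1 for an inclusive region. A plain-Python sketch of the same arithmetic (the helper name `region_residues` is hypothetical) gives a quick way to check which residue position 274 of isoform A0A0S3QTD0-1 should yield.

```python
# Amino-acid sequence of isoform A0A0S3QTD0-1, copied from the Turtle data above.
SEQUENCE = "MKLKERLAELIPQWRAEVAEIRKKYGNRKTMDCTIGHAYGGMRGLKALVCDTSEVFPDEGVKFRGYTIPELREGPHKLPTAEGGFEPLPEGLWYLLLTGELPTEEDVKEISAEFTKRMQNVPQYVFDVLRAMPVDTHPMTMFAAGILAMQRESVFAKRYEEGMRREEHWEAMLEDSLNMLAALPVIAAYIYRRKYKGDTHIAPDPNLDWSANLAHMMGFDDFEVYELFRLYMFLHSDHEGGNVSAHTNLLVNSAYSDIYRSFSAAMNGLAGPLHGLANQEVLRWIQMLYKKFGGVPTKEQLERFAWDTLNSGQVIPGYGHAVLRVTDPRYVAQRDFALKHLPDDELFKIVSLCYEVIPEVLKKHGKAKNPWPNVDAHSGVLLWHYGIREYDFYTVLFGVSRALGCTAQAILVRGYMLPIERPKSITTRWVKEVAESLPVAGS"


def region_residues(sequence: str, begin: int, end: int) -> str:
    """Return residues begin..end of a FALDO-style region (1-based, both ends inclusive).

    Equivalent to SPARQL SUBSTR(sequence, begin, end - begin + 1); Python slices
    are 0-based and end-exclusive, hence the begin - 1.
    """
    if not 1 <= begin <= end <= len(sequence):
        raise ValueError(f"region {begin}..{end} is outside a sequence of length {len(sequence)}")
    return sequence[begin - 1:end]


print(region_residues(SEQUENCE, 274, 274))  # H
```

For a single-residue region such as this active site, begin and end are equal, so the substring length is 1.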


## Active Site Annotation data

**Properties**:
- `rdfs:comment`
- `up:range`