#### UniProt SPARQL Endpoint: https://sparql.uniprot.org/sparql

Setting the endpoint and the format:

In [1]:
%endpoint https://sparql.uniprot.org/sparql
%format JSON
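
The two magics above are specific to the SPARQL kernel. Outside the notebook, the same setup roughly amounts to sending the query text to the endpoint and asking for SPARQL JSON results. The sketch below is a possible Python equivalent, assuming only the standard SPARQL 1.1 Protocol (a `query` parameter on a GET request) and the standard JSON results layout (`head.vars`, `results.bindings`). The helper names `build_request_url` and `rows_from_sparql_json` are made up for illustration, and nothing here contacts the network.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs

UNIPROT_ENDPOINT = "https://sparql.uniprot.org/sparql"
# Media type a client would send in the Accept header to get JSON results.
ACCEPT_JSON = "application/sparql-results+json"


def build_request_url(endpoint, query):
    """Return the GET URL carrying the query (hypothetical helper)."""
    return endpoint + "?" + urlencode({"query": query})


def rows_from_sparql_json(text):
    """Flatten a SPARQL JSON results document into a list of dicts."""
    doc = json.loads(text)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # A variable that is unbound in a row is simply absent from the binding.
        rows.append({v: binding[v]["value"] if v in binding else None
                     for v in variables})
    return rows


# Hand-written sample in the standard layout, not a captured server response.
SAMPLE = (
    '{"head": {"vars": ["count"]}, '
    '"results": {"bindings": [{"count": {"type": "literal", "value": "360157660"}}]}}'
)

print(rows_from_sparql_json(SAMPLE))
```

All values arrive as strings in this format, so a count would still need an explicit `int(...)` conversion before doing arithmetic with it.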

Q1: 1 POINT  How many protein records are in UniProt? 

In [8]:
PREFIX up: <http://purl.uniprot.org/core/>

SELECT (COUNT (?protein) AS ?count)

WHERE
{
    ?protein a up:Protein .
}

count
360157660


Q2: 1 POINT How many Arabidopsis thaliana protein records are in UniProt?

In [9]:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX up: <http://purl.uniprot.org/core/>

SELECT (COUNT (?protein) AS ?count)

WHERE
{
    ?protein a up:Protein ;
             up:organism ?taxon_id .
    ?taxon_id a up:Taxon ;
              up:scientificName "Arabidopsis thaliana" .
}

count
136782


Q3: 1 POINT Retrieve pictures of Arabidopsis thaliana from UniProt.

In [10]:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX up: <http://purl.uniprot.org/core/>

SELECT ?image

WHERE
{
    ?taxon_id a up:Taxon ;
              up:scientificName "Arabidopsis thaliana" ;
              foaf:depiction ?image .
    ?image a foaf:Image .
}

image
https://upload.wikimedia.org/wikipedia/commons/3/39/Arabidopsis.jpg
https://upload.wikimedia.org/wikipedia/commons/thumb/6/60/Arabidopsis_thaliana_inflorescencias.jpg/800px-Arabidopsis_thaliana_inflorescencias.jpg


Q4: 1 POINT: What is the description of the enzyme activity of UniProt Protein Q9SZZ8?

In [14]:
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?activity_label

WHERE
{
    uniprotkb:Q9SZZ8 a up:Protein ;
                     up:enzyme ?enzyme .
    ?enzyme up:activity ?activity .
    ?activity a up:Catalytic_Activity ;
              rdfs:label ?activity_label .
}

activity_label
Beta-carotene + 4 reduced ferredoxin [iron-sulfur] cluster + 2 H(+) + 2 O(2) = zeaxanthin + 4 oxidized ferredoxin [iron-sulfur] cluster + 2 H(2)O.


Q5: 1 POINT: Retrieve the protein IDs and dates of submission for proteins that have been added to UniProt this year (HINT: Google for “SPARQL FILTER by date”)

BIND, REPLACE and STR are taken from: https://en.wikibooks.org/wiki/SPARQL/Expressions_and_Functions

The number of results has to be limited; otherwise this query does not complete in the Jupyter notebook (it keeps loading until the kernel eventually disconnects). Without LIMIT it runs fine on the SPARQL endpoint web page, where the results come back almost immediately.

In [19]:
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?id ?date

WHERE
{
    ?protein a up:Protein ;
               up:created ?date .
    FILTER (?date >= "2021-01-01"^^xsd:date) .
    BIND (REPLACE(STR(?protein), "http://purl.uniprot.org/uniprot/", "") AS ?id) .
    
} LIMIT 20

id,date
A0A1H7ADE3,2021-06-02
A0A1V1AIL4,2021-06-02
A0A2Z0L603,2021-06-02
A0A4J5GG53,2021-04-07
A0A6G8SU52,2021-02-10
A0A6G8SU69,2021-02-10
A0A7C9JLR7,2021-02-10
A0A7C9JMZ7,2021-02-10
A0A7C9KUQ4,2021-02-10
A0A7D4HP61,2021-02-10


Q6: 1 POINT How many species are in the UniProt taxonomy?

In [15]:
PREFIX up: <http://purl.uniprot.org/core/>

SELECT (COUNT (DISTINCT ?taxon) AS ?count)

WHERE
{
  ?taxon a up:Taxon;
          up:rank up:Species .
}

count
2029846


Q7: 2 POINTS How many species have at least one protein record? (This might take a long time to execute, so do this one last!)

In [13]:
PREFIX up: <http://purl.uniprot.org/core/>

SELECT (COUNT (DISTINCT ?taxon) AS ?count)

WHERE
{
  ?protein a up:Protein;
           up:organism ?taxon .
  ?taxon a up:Taxon;
          up:rank up:Species .
}

count
1057158


Q8: 3 POINTS: Find the AGI codes and gene names for all Arabidopsis thaliana proteins that have a protein function annotation description that mentions “pattern formation”

In [None]:
CONTAINS( string, comparestring )
# https://en.wikibooks.org/wiki/SPARQL/Expressions_and_Functions
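
A reminder of what the function noted above does: SPARQL `CONTAINS(string, comparestring)` is a case-sensitive substring test, so a filter on "pattern formation" would not match text that says "Pattern formation" unless both sides are lower-cased first (in SPARQL, with `LCASE`). The Python sketch below only imitates that behaviour. `sparql_contains` is a hypothetical name, and the annotation texts are invented for the example; they are not UniProt data.

```python
def sparql_contains(haystack, needle, ignore_case=False):
    """Imitate CONTAINS(haystack, needle); optionally lower-case both sides."""
    if ignore_case:
        # Corresponds to CONTAINS(LCASE(?text), LCASE("..."))
        haystack, needle = haystack.lower(), needle.lower()
    return needle in haystack


# Invented annotation texts, for illustration only.
EXAMPLES = [
    "Involved in embryonic pattern formation.",
    "Pattern formation regulator.",
    "Catalyzes an unrelated reaction.",
]

exact = [t for t in EXAMPLES if sparql_contains(t, "pattern formation")]
relaxed = [t for t in EXAMPLES
           if sparql_contains(t, "pattern formation", ignore_case=True)]

print(len(exact), len(relaxed))
```

Whether the case-insensitive variant is needed depends on how the annotation text is actually written, which this sketch does not check.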

#### From the MetaNetX metabolic networks for metagenomics database SPARQL Endpoint: https://rdf.metanetx.org/sparql

Defining the endpoint:

In [2]:
%endpoint https://rdf.metanetx.org/sparql

Q9: 4 POINTS: What is the MetaNetX Reaction identifier (starts with “mnxr”) for the UniProt Protein uniprotkb:Q18A79?

In [7]:
PREFIX mnx: <https://rdf.metanetx.org/schema/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/>

SELECT DISTINCT ?mnxr_label

WHERE{
    ?pept a mnx:PEPT ;
          mnx:peptXref uniprotkb:Q18A79 .
    ?cata a mnx:CATA ;
          mnx:pept ?pept .
    ?gpr a mnx:GPR ;
         mnx:cata ?cata ;
         mnx:reac ?reac .
    ?reac a mnx:REAC ;
          mnx:mnxr ?mnxr .
    ?mnxr rdfs:label ?mnxr_label .
}

mnxr_label
MNXR165934
MNXR145046


#### FEDERATED QUERY - UniProt and MetaNetX

Q10: 5 POINTS: What is the official Gene ID (UniProt calls this a “mnemonic”) and the MetaNetX Reaction identifier (mnxr…..) for the protein that has “Starch synthase” catalytic activity in Clostridium difficile (taxon 272563)?

In [None]:
PREFIX mnx: <https://rdf.metanetx.org/schema/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/>
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>

SELECT ?gene_id ?mnxr_label
WHERE
{
    ?pept a mnx:PEPT ;
          mnx:peptXref ?protein .
    ?cata a mnx:CATA ;
          mnx:pept ?pept .
    ?gpr a mnx:GPR ;
         mnx:cata ?cata ;
         mnx:reac ?reac .
    ?reac a mnx:REAC ;
          mnx:mnxr ?mnxr .
    ?mnxr rdfs:label ?mnxr_label .
    
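    SERVICE <https://sparql.uniprot.org/sparql> {
        # Taxon 272563 is the Clostridium difficile taxon named in the question.
        # This draft does not yet restrict ?protein to "Starch synthase" activity.
        ?protein a up:Protein ;
                   up:mnemonic ?gene_id ;
                   up:organism taxon:272563 ;
                   up:encodedBy ?gene .
        ?gene a up:Gene .
    }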
        
}
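
Conceptually, the SERVICE block contributes a second set of solutions that is joined with the MetaNetX solutions on the shared variable `?protein`. The sketch below imitates that join on small hand-made rows. `join_on` is a hypothetical helper, the two reaction labels are the ones printed for Q9, and the mnemonic is a placeholder string rather than a real UniProt value.

```python
def join_on(left, right, key):
    """Inner-join two lists of binding dicts on one shared variable."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


# Local (MetaNetX) side: protein/reaction pairs taken from the Q9 output.
local = [
    {"protein": "uniprotkb:Q18A79", "mnxr_label": "MNXR165934"},
    {"protein": "uniprotkb:Q18A79", "mnxr_label": "MNXR145046"},
]

# Remote (UniProt) side: the mnemonic is a placeholder, not a real value.
remote = [{"protein": "uniprotkb:Q18A79", "gene_id": "PLACEHOLDER_MNEMONIC"}]

rows = join_on(local, remote, "protein")
print(len(rows))
```

A protein that appears on only one side drops out of the result, which is why a wrong taxon identifier inside the SERVICE block leads to an empty answer rather than an error.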

In [None]:
CONTAINS( string, comparestring )
# https://en.wikibooks.org/wiki/SPARQL/Expressions_and_Functions