# Assignment 5 - SPARQL
### Guillermo Chumaceiro López

In [8]:
%endpoint https://sparql.uniprot.org/sparql
%format JSON

### Q1. How many protein records are in UniProt? 

In [3]:
PREFIX up: <http://purl.uniprot.org/core/>

SELECT (COUNT(?protein) AS ?prot_records)
WHERE {
	?protein a up:Protein .
}

prot_records
360157660


### Q2. How many Arabidopsis thaliana protein records are in UniProt? 

In [3]:
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>

SELECT (COUNT(?protein) AS ?ara_prot_records)
WHERE {
    ?protein a up:Protein .
    ?protein up:organism taxon:3702 .
}

ara_prot_records
136782


### Q3. Retrieve pictures of Arabidopsis thaliana from UniProt

In [4]:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>

SELECT DISTINCT ?image
WHERE {
	taxon:3702 foaf:depiction ?image .
}

image
https://upload.wikimedia.org/wikipedia/commons/3/39/Arabidopsis.jpg
https://upload.wikimedia.org/wikipedia/commons/thumb/6/60/Arabidopsis_thaliana_inflorescencias.jpg/800px-Arabidopsis_thaliana_inflorescencias.jpg


### Q4. What is the description of the enzyme activity of UniProt protein Q9SZZ8?

In [5]:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/>

SELECT DISTINCT ?equation
WHERE {
    uniprotkb:Q9SZZ8 a up:Protein ;
                     up:enzyme ?enzyme .
    ?enzyme up:activity ?activity .
    ?activity rdfs:label ?equation .
}

equation
Beta-carotene + 4 reduced ferredoxin [iron-sulfur] cluster + 2 H(+) + 2 O(2) = zeaxanthin + 4 oxidized ferredoxin [iron-sulfur] cluster + 2 H(2)O.


### Q5. Retrieve the protein IDs and submission dates for proteins that were added to UniProt this year (2021)

In [4]:
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT ?protein ?date
WHERE {
    ?protein a up:Protein .
    ?protein up:created ?date .
    FILTER((xsd:date(?date) >= "2021-01-01"^^xsd:date) && (xsd:date(?date) <= "2021-12-31"^^xsd:date))
}

protein,date
http://purl.uniprot.org/uniprot/A0A1H7ADE3,2021-06-02
http://purl.uniprot.org/uniprot/A0A1V1AIL4,2021-06-02
http://purl.uniprot.org/uniprot/A0A2Z0L603,2021-06-02
http://purl.uniprot.org/uniprot/A0A4J5GG53,2021-04-07
http://purl.uniprot.org/uniprot/A0A6G8SU52,2021-02-10
http://purl.uniprot.org/uniprot/A0A6G8SU69,2021-02-10
http://purl.uniprot.org/uniprot/A0A7C9JLR7,2021-02-10
http://purl.uniprot.org/uniprot/A0A7C9JMZ7,2021-02-10
http://purl.uniprot.org/uniprot/A0A7C9KUQ4,2021-02-10
http://purl.uniprot.org/uniprot/A0A7D4HP61,2021-02-10
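
The query above hard-codes 2021 as "this year". A minimal sketch of a variant that takes the year from the endpoint's clock instead, assuming the endpoint implements the standard SPARQL 1.1 functions `NOW()`, `YEAR()`, `STR()` and `STRSTARTS()`. It has not been run against UniProt here, and the `LIMIT 10` is an added assumption to keep the request small:

```sparql
PREFIX up: <http://purl.uniprot.org/core/>

SELECT DISTINCT ?protein ?date
WHERE {
    ?protein a up:Protein .
    ?protein up:created ?date .
    # Compare the year prefix of the creation date ("YYYY-MM-DD")
    # with the current year reported by the endpoint.
    FILTER(STRSTARTS(STR(?date), STR(YEAR(NOW()))))
}
# Assumption: cap the result size; remove to retrieve every record.
LIMIT 10
```

Run in a later year, this returns records created in that year, so the results will differ from the 2021 table above.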


### Q6. How many species are in the UniProt taxonomy?

In [19]:
PREFIX up: <http://purl.uniprot.org/core/>

SELECT (COUNT(?taxon) AS ?species)
WHERE
{
    ?taxon a up:Taxon ;
           up:rank up:Species .
}

species
2029846


### Q7. How many species have at least one protein record?

In [23]:
PREFIX up: <http://purl.uniprot.org/core/>

SELECT (COUNT (DISTINCT ?taxon) AS ?species)
WHERE
{
    ?protein a up:Protein .
    ?protein up:organism ?taxon .
    ?taxon a up:Taxon ;
    up:rank up:Species .
}

species
1057158


### Q8. Find the AGI codes and gene names for all Arabidopsis thaliana proteins whose function annotation description mentions “pattern formation”

In [7]:
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?AGI_code ?gene_name
WHERE {
    ?protein a up:Protein .
    ?protein up:organism taxon:3702 .
    ?protein up:encodedBy ?gene .
    ?gene a up:Gene .
    ?gene skos:prefLabel ?gene_name .
    ?gene up:locusName ?AGI_code .
    ?protein up:annotation ?annotation .
    ?annotation a up:Function_Annotation .
    ?annotation rdfs:comment ?text .
    FILTER CONTAINS(?text, 'pattern formation')
}

AGI_code,gene_name
At3g54220,SCR
At4g21750,ATML1
At1g13980,GN
At5g40260,SWEET8
At1g69670,CUL3B
At1g63700,YDA
At2g46710,ROPGAP3
At1g26830,CUL3A
At3g09090,DEX1
At4g37650,SHR


### Q9. What is the MetaNetX Reaction identifier (starts with “mnxr”) for the UniProt protein uniprotkb:Q18A79?

In [23]:
%endpoint https://rdf.metanetx.org/sparql
%format JSON

In [24]:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/>
PREFIX mnx: <https://rdf.metanetx.org/schema/>

SELECT DISTINCT ?reac_label
WHERE {
    ?protein mnx:peptXref uniprotkb:Q18A79 .
    ?cata mnx:pept ?protein .
    ?gpr mnx:cata ?cata ;
         mnx:reac ?reac .
    ?reac mnx:mnxr ?stable_reac .
    ?stable_reac rdfs:label ?reac_label .
}

reac_label
MNXR145046
MNXR165934


### Q10. What are the official Gene ID (UniProt calls this a “mnemonic”) and the MetaNetX Reaction identifier (mnxr…..) for the protein that has “Starch synthase” catalytic activity in Clostridium difficile (taxon 272563)?

In [20]:
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX mnx: <https://rdf.metanetx.org/schema/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?gene_id ?reac_label
WHERE {
    SERVICE <https://sparql.uniprot.org/sparql> {
        SELECT DISTINCT ?protein ?gene_id
        WHERE {
            ?protein a up:Protein .
            ?protein up:organism taxon:272563 .
            ?protein up:mnemonic ?gene_id .
            ?protein up:enzyme ?enzyme .
            ?enzyme skos:prefLabel ?text .
            FILTER CONTAINS(?text, 'Starch synthase')
        } 
    }
    SERVICE <https://rdf.metanetx.org/sparql> {
        ?prot mnx:peptXref ?protein .
        ?cata mnx:pept ?prot .
        ?gpr mnx:cata ?cata .
        ?gpr mnx:reac ?reac .
        ?reac mnx:mnxr ?stable_reac .
        ?stable_reac rdfs:label ?reac_label .
    }
} 

gene_id,reac_label
GLGA_CLOD6,MNXR145046
GLGA_CLOD6,MNXR165934
