# SPARQL from Python

`SPARQLWrapper` is a simple Python wrapper around a SPARQL service for remote query execution. Not only does it enable us to write more complex queries to extract information from RDF than those exposed through a library like `rdflib`, it can also convert query results into other formats like JSON and CSV!

## First, what is SPARQL?

SPARQL ("SPARQL Protocol and RDF Query Language") is a W3C standard for querying [RDF](https://rebeccabilbro.github.io/rdf-basics/). A typical query has three parts: `PREFIX` declarations that abbreviate long URIs, a `SELECT` clause naming the variables to return, and a `WHERE` clause of triple patterns to match (e.g. texts that are reviews of a particular product):

    """
    PREFIX ...
    SELECT ...
    WHERE  ...
    """

SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware. SPARQL contains capabilities for querying required and optional graph patterns along with their conjunctions and disjunctions. SPARQL also supports extensible value testing and constraining queries by source RDF graph. The results of SPARQL queries can be result sets or RDF graphs.

## `SPARQLWrapper`

The Python library `SPARQLWrapper` (which can be installed via `pip`) enables us to use the SPARQL query language to interact with remote or local SPARQL endpoints, such as [DBPedia](http://wiki.dbpedia.org/):

In [1]:
from SPARQLWrapper import SPARQLWrapper, JSON

# Specify the DBPedia endpoint
sparql = SPARQLWrapper("http://dbpedia.org/sparql")

# Query for the description of "Capsaicin", filtered by language
sparql.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?comment
    WHERE {
        <http://dbpedia.org/resource/Capsaicin> rdfs:comment ?comment .
        FILTER (LANG(?comment) = 'en')
    }
""")

# Convert results to JSON format
sparql.setReturnFormat(JSON)
result = sparql.query().convert()

for hit in result["results"]["bindings"]:
    print(hit["comment"]["value"])

Capsaicin (/kæpˈseɪ.ᵻsɪn/ (INN); 8-methyl-N-vanillyl-6-nonenamide) is an active component of chili peppers, which are plants belonging to the genus Capsicum. It is an irritant for mammals, including humans, and produces a sensation of burning in any tissue with which it comes into contact. Capsaicin and several related compounds are called capsaicinoids and are produced as secondary metabolites by chili peppers, probably as deterrents against certain mammals and fungi. Pure capsaicin is a volatile, hydrophobic, colorless, odorless, crystalline to waxy compound.
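
The nested lookups `result["results"]["bindings"]` and `hit["comment"]["value"]` follow the standard SPARQL JSON results format: `head.vars` lists the selected variables, and each entry in `results.bindings` maps a variable to a small dictionary with `type` and `value` keys. The sketch below flattens that structure into plain rows. The helper name `bindings_to_rows` is hypothetical, and the sample is hand-written in the shape of the format rather than real endpoint output.

```python
def bindings_to_rows(result):
    """Flatten a SPARQL JSON result into a list of {variable: value} dicts."""
    variables = result["head"]["vars"]
    rows = []
    for binding in result["results"]["bindings"]:
        # Unbound variables are simply absent from a binding, so default to None.
        rows.append({var: binding.get(var, {}).get("value") for var in variables})
    return rows


# Hand-written sample in the shape of the JSON results format (not real output).
sample = {
    "head": {"vars": ["comment", "source"]},
    "results": {
        "bindings": [
            {
                "comment": {
                    "type": "literal",
                    "xml:lang": "en",
                    "value": "An active component of chili peppers.",
                }
            }
        ]
    },
}

rows = bindings_to_rows(sample)
print(rows)
```

Passing the `result` dictionary from the cell above to the same helper should work the same way, assuming the endpoint returns the standard format.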


We can also use the Wikidata Query Service (WDQS) endpoint to query [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page).

In [8]:
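sparql = SPARQLWrapper("https://query.wikidata.org/sparql")

# WDQS predefines the wd:, wdt:, wikibase: and bd: prefixes, so no PREFIX lines are needed.
# Find every item that is a "subclass of" (P279) Q522171 (hot sauce), plus its label.
sparql.setQuery("""
SELECT ?item ?itemLabel
WHERE {
  ?item wdt:P279 wd:Q522171 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
""")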
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

In this case, we can use `pandas` to review the results as a dataframe.

In [10]:
import pandas as pd

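# pd.json_normalize flattens each binding into dotted columns such as 'item.value'
# (the older pd.io.json.json_normalize path is deprecated)
results_df = pd.json_normalize(results['results']['bindings'])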
results_df[['item.value', 'itemLabel.value']]

|  | item.value | itemLabel.value |
|---|---|---|
| 0 | http://www.wikidata.org/entity/Q249114 | salsa |
| 1 | http://www.wikidata.org/entity/Q335016 | Tabasco sauce |
| 2 | http://www.wikidata.org/entity/Q360459 | Adobo |
| 3 | http://www.wikidata.org/entity/Q460439 | Blair's 16 Million Reserve |
| 4 | http://www.wikidata.org/entity/Q966327 | harissa |
| 5 | http://www.wikidata.org/entity/Q1026822 | Chili oil |
| 6 | http://www.wikidata.org/entity/Q1392674 | sriracha sauce |
| 7 | http://www.wikidata.org/entity/Q2227032 | mojo |
| 8 | http://www.wikidata.org/entity/Q2279518 | Shito |
| 9 | http://www.wikidata.org/entity/Q2402909 | Valentina |


There's also a handy site for finding [live SPARQL endpoints](http://sparqles.ai.wu.ac.at/availability).

In [None]:
# What about a local RDF file?
# Load it into a local triple store and point your code at localhost
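
"Pointing your code to localhost" mostly means swapping the endpoint URL, because a SPARQL endpoint is just an HTTP service. As a rough sketch, the code below builds (but does not send) the kind of GET request the SPARQL protocol defines, using only the standard library. The URL `http://localhost:3030/dataset/sparql` and the helper name `build_sparql_request` are assumptions for illustration; substitute whatever address your triple store exposes.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical local endpoint; the host, port and path depend on your triple store.
LOCAL_ENDPOINT = "http://localhost:3030/dataset/sparql"


def build_sparql_request(endpoint, query):
    """Build a SPARQL protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(
    LOCAL_ENDPOINT, "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"
)
print(request.full_url)

# Sending it needs a running store, e.g.:
#   import json, urllib.request
#   with urllib.request.urlopen(request) as response:
#       result = json.load(response)
```

With `SPARQLWrapper` the equivalent step is simply passing the local URL to the constructor, as in `SPARQLWrapper(LOCAL_ENDPOINT)`.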

In [None]:
# Use `rdflib` to query an rdflib.graph.Graph() directly

import rdflib

filename = "path/to/filename"  # replace with something interesting
uri = "uri_of_interest"  # replace with something interesting

# rdflib (version 4 and later) ships with SPARQL support,
# so the old rdfextras plugin registration is no longer needed
g = rdflib.Graph()
g.parse(filename)

# Get every predicate and object about the uri
results = g.query("""
SELECT ?p ?o
WHERE {
<%s> ?p ?o.
}
ORDER BY (?p)
""" % uri)
