# SPARQL Queries for Local RDF Data

So you found some RDF data (yay!) from an archive somewhere, but there's no active SPARQL endpoint (it's all JSON APIs these days). You could use the methods implemented in the Python package `rdflib` to perform some lightweight searches of the triples, but what if you want to perform more complex queries without having to create a local triple store? In this post we'll see how to use the declarative language of SPARQL to perform complex queries on a local RDF file (inspired by [this Stack Overflow post](https://stackoverflow.com/questions/9877989/python-sparql-querying-local-file)).

_Note: If you want to follow along at home, check out the post on [RDF basics](https://rebeccabilbro.github.io/rdf-basics/) to find out how to get the [data](http://data.dws.informatik.uni-mannheim.de/structureddata/2014-12/quads/ClassSpecificQuads/schemaorgProduct.nq.sample.txt) used in this post, where it comes from, and what it's all about._

## Make a `ConjunctiveGraph`

First, make sure you have installed `rdflib` with `pip`. Now make a `ConjunctiveGraph` from the data. Here we specify that the data should be parsed as N-Quads (`format="nquads"`), since that's the form that this specific dataset takes.

In [21]:
import os

from rdflib import ConjunctiveGraph

base_folder = "data"
product_path = "products.nq"


def make_graph_from_nquads(input_data):
    g = ConjunctiveGraph(identifier="Products")
    # Use a context manager so the file handle is closed after parsing
    with open(input_data, "rb") as data:
        g.parse(data, format="nquads")

    return g

g = make_graph_from_nquads(os.path.join(base_folder, product_path))

## Query the `rdflib.graph.Graph()`

Now that we have a graph, we can pass SPARQL queries to its `Graph.query` method, e.g. to count the products in the dataset by binding `?o` to the `http://schema.org/Product` class:

In [55]:
from rdflib import URIRef 

product_uri = URIRef("http://schema.org/Product")

sparql_query = """
    SELECT ?s
    WHERE {
    ?s ?p ?o .
    }
    """

products = g.query(sparql_query, initBindings={'o' : product_uri})
print("{} total products".format(len(products)))
print("URI of first product: {}".format(list(products)[0]))

216 total products
URI of first product: (rdflib.term.BNode('N1715bb0ff467414298f80cacc323fa28'),)


We can also dig in and explore a bit more about a particular product using its blank node identifier:

In [56]:
from rdflib import BNode

sample_product = BNode('N5ff6dab3a3ec40b4a040220b8f2effbe')

sparql_query = """
    SELECT ?p ?o
    WHERE {
    ?s ?p ?o .
    }
    """

results = g.query(sparql_query, initBindings={'s' : sample_product})
for result in results:
    print(result)

(rdflib.term.URIRef('http://schema.org/Product/name'), rdflib.term.Literal('sandcastle1'))
(rdflib.term.URIRef('http://schema.org/Product/image'), rdflib.term.URIRef('http://cubify.blob.core.windows.net/account/QHUHU2JGQ9/GOUAOFHO5W/cb863880-2a9e-457b-ae6d-574d9b46e49f_e719dda8-c96b-4242-bccb-bfa5c28e061c_08fb68af-884d-4b5f-89f3-89'))
(rdflib.term.URIRef('http://schema.org/Product/description'), rdflib.term.Literal('Check out sandcastle1 on Cubify at http://cubify.com/Store/Design/AL98QAW7QL #getthereeasy'))
(rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), rdflib.term.URIRef('http://schema.org/Product'))


This tells us that the product has some additional information, including a name ("sandcastle1"), a description ("Check out sandcastle1 on Cubify at http://cubify.com/Store/Design/AL98QAW7QL #getthereeasy"), and an image:

![sandcastle1 image](http://cubify.blob.core.windows.net/account/QHUHU2JGQ9/GOUAOFHO5W/cb863880-2a9e-457b-ae6d-574d9b46e49f_e719dda8-c96b-4242-bccb-bfa5c28e061c_08fb68af-884d-4b5f-89f3-89)


In [None]:
PRODUCT_FIELDS = {"name"    : "http://schema.org/Product/name",
                  "image"   : "http://schema.org/Product/image",
                  "url"     : "http://schema.org/Product/url",
                  "desc"    : "http://schema.org/Product/description",
                  "sku"     : "http://schema.org/Product/sku",
                  "review"  : "http://schema.org/Product/review",
                  "manu"    : "http://schema.org/Product/manufacturer",
                  "reviews" : "http://schema.org/Product/reviews",
                  "prod_id" : "http://schema.org/Product/productID",
                  "mod_date": "http://schema.org/Product/dateModified",
                  "rel_date": "http://schema.org/Product/releaseDate",
                  "brand"   : "http://schema.org/Product/brand",
                  "model"   : "http://schema.org/Product/model",
                  "offers"  : "http://schema.org/Product/offers",
                  "thumb"   : "http://schema.org/Product/thumbnailUrl",
                  "logo"    : "http://schema.org/Product/logo",
                  "rating"  : "http://schema.org/Product/aggregateRating",
}