# SPARQL Queries for Local RDF Data

So you found some RDF data (yay!) in an archive somewhere, but there's no active SPARQL endpoint (it's all JSON APIs these days). You could use the methods in the Python package `rdflib` to perform some lightweight searches of the triples, but what if you want to run more complex queries without having to set up a local triple store? In this post we'll see how to use the declarative language of SPARQL to perform complex queries on a local RDF file (inspired by [this Stack Overflow post](https://stackoverflow.com/questions/9877989/python-sparql-querying-local-file)).

_Note: If you want to follow along at home, check out the post on [RDF basics](https://rebeccabilbro.github.io/rdf-basics/) to find out how to get the [data](http://data.dws.informatik.uni-mannheim.de/structureddata/2014-12/quads/ClassSpecificQuads/schemaorgProduct.nq.sample.txt) used in this post, where it comes from, and what it's all about._

## Make a `ConjunctiveGraph`

First, make sure you have installed `rdflib` with `pip`. Now make a `ConjunctiveGraph` from the data. Here we specify that the data should be parsed as N-Quads (`format="nquads"`), since that's the form this specific dataset takes.

In [1]:
import os

from rdflib import ConjunctiveGraph

base_folder = "data"
product_path = "products.nq"


def make_graph_from_nquads(input_data):
    g = ConjunctiveGraph(identifier="Products")
    # open in binary mode and close the file once parsing is done
    with open(input_data, "rb") as data:
        g.parse(data, format="nquads")

    return g

g = make_graph_from_nquads(os.path.join(base_folder, product_path))

## Query the `rdflib.graph.Graph()`

Now that we have a graph, `rdflib` exposes a `Graph.query` method that we can use to pass in SPARQL queries.

Let's create a reusable function `get_products` that takes the `ConjunctiveGraph` as an argument and returns a list of products.

In [6]:
from rdflib import URIRef, BNode

def get_products(graph):
    
    product = URIRef("http://schema.org/Product")

    sparql_query = """
        SELECT DISTINCT ?s
        WHERE {
        ?s ?p ?o .
        }
        """

    results = graph.query(sparql_query, initBindings={'o' : product})
    return [str(result[0]) for result in results]

Now we can use our function to get all the unique products in the dataset:

In [7]:
products = get_products(g)

print("{} total products".format(len(products)))
print("URI of first product: {}".format(products[0]))

216 total products
URI of first product: Neaca176676eb429cb27c9aff6ef60886


Now let's create a function `get_product_details` that takes as input the graph and a specific product's identifier, and returns all the results containing the details available for that product:

In [8]:
def get_product_details(graph, product_uri):
    
    sparql_query = """
    SELECT ?p ?o
    WHERE {
    ?s ?p ?o .
    }
    """
    
    return graph.query(sparql_query, initBindings={'s' : product_uri})

We can test out our function to explore a specific product from our dataset (in this case the one corresponding to the anonymous node 'N5ff6dab3a3ec40b4a040220b8f2effbe'):

In [9]:
sample_product = BNode('N5ff6dab3a3ec40b4a040220b8f2effbe')
details = get_product_details(g, sample_product)
for detail in details:
    print(detail,'\n')

This tells us that the product has some additional information, including a name ("sandcastle1"), a description ("Check out sandcastle1 on Cubify at http://cubify.com/Store/Design/AL98QAW7QL #getthereeasy"), and an image:

![sandcastle1 image](http://cubify.blob.core.windows.net/account/QHUHU2JGQ9/GOUAOFHO5W/cb863880-2a9e-457b-ae6d-574d9b46e49f_e719dda8-c96b-4242-bccb-bfa5c28e061c_08fb68af-884d-4b5f-89f3-89)

We can use the same function to get the details for every product in the dataset:

In [10]:
for product in products:
    print(product,':')
    for detail in get_product_details(g, BNode(product)):
        print('- ', detail)
    print('')

Neaca176676eb429cb27c9aff6ef60886 :
-  (rdflib.term.URIRef('http://schema.org/Product/url'), rdflib.term.Literal('http://ashworthgolf.com/ashworth/G5422326.html', lang='en'))
-  (rdflib.term.URIRef('http://schema.org/Product/name'), rdflib.term.Literal('Cardiff', lang='en'))
-  (rdflib.term.URIRef('http://schema.org/Product/image'), rdflib.term.URIRef('http://demandware.edgesuite.net/aais_prd/on/demandware.static/Sites-TMaG-Site/Sites-tmag-master-catalog/default/v1419672021462/zoom/G54223_zoom_D.jpg'))
-  (rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), rdflib.term.URIRef('http://schema.org/Product'))
-  (rdflib.term.URIRef('http://schema.org/Product/productID'), rdflib.term.Literal('G5422326', lang='en'))

N326f7a36c7ac4bb2bcda3d21ad65c11a :
-  (rdflib.term.URIRef('http://schema.org/Product/image'), rdflib.term.URIRef('http://ii.alatest.com/product/190x190/d/b/Haier-HVTB18DABB18-Bottle-Dual-zone-Wine-Cooler-with-Touch-Screen-Controls-0.jpg'))
-  (rdflib.term.URI

-  (rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), rdflib.term.URIRef('http://schema.org/Product'))
-  (rdflib.term.URIRef('http://schema.org/Product/review'), rdflib.term.BNode('Nb926c57ec0cf411ea999ba4958bc3876'))
-  (rdflib.term.URIRef('http://schema.org/Product/review'), rdflib.term.BNode('Nab7c5be6a0af4a058b7ce129548f1b8d'))
-  (rdflib.term.URIRef('http://schema.org/Product/name'), rdflib.term.Literal('ASUS PRO B43S-XH71 Laptop Computer - Intel Core i7-2620M 2.70GHz, 4GB DDR3, 500GB HDD, DVDRW, 14 Display, Windows 7 Professional 64-bit', lang='en-us'))
-  (rdflib.term.URIRef('http://schema.org/Product/image'), rdflib.term.URIRef('http://ii.alatest.com/product/190x190/f/8/Asus-S-XH71-165619449.jpg'))

N38389faa3d08453fa75e85ed0065ba22 :
-  (rdflib.term.URIRef('http://schema.org/Product/productID'), rdflib.term.Literal('Z1589327', lang='en'))
-  (rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), rdflib.term.URIRef('http://schema.org/Produ

-  (rdflib.term.URIRef('http://schema.org/Product/url'), rdflib.term.Literal('http://adidasgolf.com/Herringbone-Trouser/DW-UU742.html', lang='en'))
-  (rdflib.term.URIRef('http://schema.org/Product/productID'), rdflib.term.Literal('DW-UU742', lang='en'))
-  (rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), rdflib.term.URIRef('http://schema.org/Product'))
-  (rdflib.term.URIRef('http://schema.org/Product/name'), rdflib.term.Literal('Herringbone Trouser', lang='en'))
-  (rdflib.term.URIRef('http://schema.org/Product/image'), rdflib.term.URIRef('http://demandware.edgesuite.net/aais_prd/on/demandware.static/Sites-TMaG-Site/Sites-tmag-master-catalog/default/v1419153454085/zoom/Z42505_zoom.jpg'))

Na2eb62a7a8e3418b92a3994cc1b50a8d :
-  (rdflib.term.URIRef('http://schema.org/Product/image'), rdflib.term.URIRef('http://ii.alatest.com/product/190x190/f/4/Clover-2-Small-Camera-Mounting-Bracket-0.jpg'))
-  (rdflib.term.URIRef('http://schema.org/Product/offers'), rdflib.term.

-  (rdflib.term.URIRef('http://schema.org/Product/productID'), rdflib.term.Literal('DW-BB052', lang='en'))
-  (rdflib.term.URIRef('http://schema.org/Product/image'), rdflib.term.URIRef('http://demandware.edgesuite.net/aais_prd/on/demandware.static/Sites-TMaG-Site/Sites-tmag-master-catalog/default/v1419240262656/zoom/Z58291_zoom.jpg'))
-  (rdflib.term.URIRef('http://schema.org/Product/name'), rdflib.term.Literal('Angular Heathered Blocked V-Neck Sweater', lang='en'))
-  (rdflib.term.URIRef('http://schema.org/Product/url'), rdflib.term.Literal('http://adidasgolf.com/Angular-Heathered-Blocked-V-Neck-Sweater/DW-BB052.html', lang='en'))

N929b1ebd937c4721bccabf33cabf8947 :
-  (rdflib.term.URIRef('http://schema.org/Product/image'), rdflib.term.URIRef('http://wfpquantum.s3.amazonaws.com/images/autos/autos/medium/r0exhattudwp5higv90t-10608030.jpg'))
-  (rdflib.term.URIRef('http://schema.org/Product/manufacturer'), rdflib.term.Literal('Dodge'))
-  (rdflib.term.URIRef('http://schema.org/Product/

-  (rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), rdflib.term.URIRef('http://schema.org/Product'))
-  (rdflib.term.URIRef('http://schema.org/Product/description'), rdflib.term.Literal('MLB Los Angeles Dodgers 15oz. Ceramic Gameball Mug Set of 2. This set includes two large ceramic gameball mugs decorated with a high-quality metal team logos.', lang='en'))

N3eb21a67f5624eaa98309c80bd16ed0e :
-  (rdflib.term.URIRef('http://schema.org/Product/offers'), rdflib.term.BNode('Ndc737bfcacec4cf9a89fa29e34e482bc'))
-  (rdflib.term.URIRef('http://schema.org/Product/description'), rdflib.term.Literal('Enjoy excellent performance, connectivity, picture quality with enhanced clarity on Professional P1913 19-inch Widescreen Flat Panel Monitor from DellTM. The dynamic contrast ratio of 2 million: 1 provides to work in razor-sharp clarity, and experience smooth, jitter-free moving images. Get excellent clarity and rich images with 1400 x 900 at 60 Hz resolution with a color dep

-  (rdflib.term.URIRef('http://schema.org/Product/review'), rdflib.term.BNode('Ne42b822d92cb4589aac185fdd4ad3de8'))
-  (rdflib.term.URIRef('http://schema.org/Product/name'), rdflib.term.Literal('Gateway/DX4860-UR32P Desktop', lang='en-us'))
-  (rdflib.term.URIRef('http://schema.org/Product/image'), rdflib.term.URIRef('http://ii.alatest.com/product/190x190/5/7/Gateway-DX4850-57-PT-GBL02-022-Desktop-PC-1.jpg'))
-  (rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), rdflib.term.URIRef('http://schema.org/Product'))

Naf282649929a4f38842ebc96fbc83d43 :
-  (rdflib.term.URIRef('http://schema.org/Product/review'), rdflib.term.BNode('Nb8955fb3ad514a58b2f86f8c52eb0fce'))
-  (rdflib.term.URIRef('http://schema.org/Product/review'), rdflib.term.BNode('N802797c0f60949518d4e8061f1866ce8'))
-  (rdflib.term.URIRef('http://schema.org/Product/review'), rdflib.term.BNode('N080c440fc0a44c3ba25d2d2a8ecb2858'))
-  (rdflib.term.URIRef('http://schema.org/Product/review'), rdflib.term.BNode(

-  (rdflib.term.URIRef('http://schema.org/Product/logo'), rdflib.term.URIRef('http://andersonfloors.com/Flooring/Design/ASID_China/Hardwood/Design/ASID_China.aspx//Custom2013/FrontEnd/Products2013/Images/ProductLogos_Anderson.png'))
-  (rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), rdflib.term.URIRef('http://schema.org/Product'))
-  (rdflib.term.URIRef('http://schema.org/Product/description'), rdflib.term.Literal('\n                            Rustic and distressed, this enCore™ hickory floor looks as if it might have been in an ancient barn… or even a long-closed factory. But Antique Walk’s patina has such sophistication and elegance, it takes rustic to a new level. Antique Walk looks like the skip-sawn timber of a century ago—crafted for years  of use and aging handsomely over time. It goes anywhere—traditional or contemporary—with incredible style.', lang='en'))
-  (rdflib.term.URIRef('http://schema.org/Product/name'), rdflib.term.Literal('\n                

-  (rdflib.term.URIRef('http://schema.org/Product/offers'), rdflib.term.BNode('N7277a4fceef9481ea1cbdb391e0afc00'))
-  (rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), rdflib.term.URIRef('http://schema.org/Product'))
-  (rdflib.term.URIRef('http://schema.org/Product/aggregateRating'), rdflib.term.BNode('N4f93de66b3904be992fd5f2dad60fd05'))
-  (rdflib.term.URIRef('http://schema.org/Product/image'), rdflib.term.URIRef('http://media.emergencyessenti.netdna-cdn.com/catalog/product/cache/1/image/404x/9df78eab33525d08d6e5fb8d27136e95/c/u/cu_t075_gerber_gator_combo_axe_ii_with_saw__2.jpg'))
-  (rdflib.term.URIRef('http://schema.org/Product/name'), rdflib.term.Literal('\n                    Gerber Gator Combo Axe II with Saw\n                ', lang='en'))

N8ca5a5ce7a5e4553b059c296b09412fd :
-  (rdflib.term.URIRef('http://schema.org/Product/name'), rdflib.term.Literal('Plaid Print Stretch Wind Jacket', lang='en'))
-  (rdflib.term.URIRef('http://schema.org/Product/image'

Now let's say we want to go through each product and find not only the information contained in its primary triples, but also information in related triples, such as the product reviews, which are stored separately, and of which there may be several or none for a given product.

In [11]:
generic = URIRef("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")
review_desc = URIRef('http://schema.org/Review/description')
product_review = URIRef('http://schema.org/Product/review')   


def get_review(graph, review_uri):
    
    sparql_query = """
    SELECT ?p ?o 
    WHERE {
    ?s ?p ?o .
    }
    """
    
    return graph.query(sparql_query, initBindings={'s': review_uri})
    

# make a dictionary to hold all of the product data
product_dict = {}
    
for product in products:
    product_dict[product] = {"http://schema.org/Product/name" : None,
                             "http://schema.org/Product/image": None,
                             "http://schema.org/Product/url" :  None,
                             "http://schema.org/Product/description": None,
                             "http://schema.org/Product/sku" : None,
                             "http://schema.org/Product/manufacturer": None,
                             "http://schema.org/Product/productID": None,
                             "http://schema.org/Product/dateModified": None,
                             "http://schema.org/Product/releaseDate": None,
                             "http://schema.org/Product/brand": None,
                             "http://schema.org/Product/model": None,
                             "http://schema.org/Product/offers": None,
                             "http://schema.org/Product/thumbnailUrl": None,
                             "http://schema.org/Product/logo": None,
                             "http://schema.org/Product/aggregateRating": None,
                             "http://schema.org/Product/reviews": [],
        }
    for detail_type, detail_content in get_product_details(g, BNode(product)):
        if detail_type == generic:
            continue
        # if the details indicate a product review:
        elif detail_type == product_review:
            for review_type, review_content in get_review(g, detail_content):
                if review_type == review_desc:
                    product_dict[product]["http://schema.org/Product/reviews"].append(
                        str(review_content)
                    )
                else:
                    continue
        else:
            product_dict[product][detail_type] = str(detail_content)

In [12]:
product_dict

{'Neaca176676eb429cb27c9aff6ef60886': {'http://schema.org/Product/name': None,
  'http://schema.org/Product/image': None,
  'http://schema.org/Product/url': None,
  'http://schema.org/Product/description': None,
  'http://schema.org/Product/sku': None,
  'http://schema.org/Product/manufacturer': None,
  'http://schema.org/Product/productID': None,
  'http://schema.org/Product/dateModified': None,
  'http://schema.org/Product/releaseDate': None,
  'http://schema.org/Product/brand': None,
  'http://schema.org/Product/model': None,
  'http://schema.org/Product/offers': None,
  'http://schema.org/Product/thumbnailUrl': None,
  'http://schema.org/Product/logo': None,
  'http://schema.org/Product/aggregateRating': None,
  'http://schema.org/Product/reviews': [],
  rdflib.term.URIRef('http://schema.org/Product/url'): 'http://ashworthgolf.com/ashworth/G5422326.html',
  rdflib.term.URIRef('http://schema.org/Product/name'): 'Cardiff',
  rdflib.term.URIRef('http://schema.org/Product/image'): 'htt

Success! Now we have a dictionary representation of all of our products, together with their reviews.