# Access and query remote data 

 1. SPARQL tutorial
 2. Execute SPARQL queries with RDFLib on a local RDF file
 3. Query a SPARQL endpoint with RDFLib and SPARQLWrapper libraries
 
## 1. SPARQL tutorial

Let's look at a few useful SPARQL queries that can be run over **any** RDF dataset. We apply them to the ARTchives dataset, so that while we learn SPARQL we also get to know the contents of our case study better.

To see the queries at work, open the [ARTchives SPARQL endpoint](http://artchives.fondazionezeri.unibo.it/sparql) in a browser tab, copy and paste each query into the text area, and press the play button (top-right).

### 1.1. Get all the class URIs

This query helps us understand what the main entities described in the dataset are.

In [2]:
classes_query = """
SELECT DISTINCT ?class_uri
WHERE {
    ?anything a ?class_uri .  
}
"""

The query returns a list of 3 URIs.

 1. http://www.wikidata.org/entity/Q31855
 2.	http://www.wikidata.org/entity/Q5
 3.	http://www.wikidata.org/entity/Q9388534


### 1.2. Get all the class labels

However, the URIs are not very intuitive. Let's get their names. Class names are usually recorded as values of the property `rdfs:label`.

In [None]:
classes_and_labels_query = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?class_uri ?class_name
WHERE {
    ?anything a ?class_uri .
    ?class_uri rdfs:label ?class_name .
}
"""

The query returns zero results. How come?

In ARTchives we find concepts (classes) and relations (predicates) taken from the Wikidata vocabulary. However, the labels and the definitions of these concepts are not stored in ARTchives!

We will see later how to get this information while integrating ARTchives and Wikidata. For the time being, let's put the label as an optional value to be returned by our query.

In [None]:
classes_and_opt_labels_query = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?class_uri ?class_name
WHERE {
    ?anything a ?class_uri .
    OPTIONAL {?class_uri rdfs:label ?class_name .}
}
"""

The query returns the same list of three URIs and an empty column for the variable `class_name`.

### 1.3 Individuals belonging to classes

Let's get some individuals that fall under those classes to understand them better. ARTchives is a rather small dataset, so returning all the individuals does not take much effort for the SPARQL endpoint. However, datasets are usually far bigger, and querying for *all the individuals* might cause the endpoint to **time out** and never answer the query.

Since our objective is to get some insights, we can limit the output to 100 rows. We can also order our results by class, so that it's easier to understand what is what.

In [None]:
classes_and_individuals_100_query = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?class_uri ?individual ?individual_label
WHERE {
    ?individual a ?class_uri .
    ?individual rdfs:label ?individual_label .
} ORDER BY ?class_uri
LIMIT 100
"""

**NB** `LIMIT` only truncates the ordered list of results: it does not let you ask for a sample of every class, so some classes may be missing from the first 100 rows.

As you can see from the results, entities represent art historians (Q5), collections (Q9388534), and keepers (Q31855).

Some URIs are repeated because multiple labels were (erroneously) recorded for the same individual (e.g. "Getty Research institute" and "  Getty research institute"). To show only one of the possible values of `rdfs:label`, we tune the query variable `?individual_label`, asking for a **sample** of its values and **grouping results by** the other variables.

In [None]:
classes_and_individuals_100_query = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?class_uri ?individual (sample(?individual_label) as ?label)
WHERE {
    ?individual a ?class_uri .
    ?individual rdfs:label ?individual_label .
} 
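GROUP BY ?class_uri ?individual
ORDER BY ?class_uri ?label
LIMIT 100
"""

**NB** When grouping results, every variable in the `SELECT` clause that is not wrapped in an aggregate (such as `SAMPLE` or `COUNT`) must appear in the `GROUP BY` clause. The alias of the aggregate (here `?label`) does not need to be listed.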

### 1.4 Get all the properties having a certain class as domain

So far we still don't know precisely which information *can be* recorded for the individuals of the dataset, i.e. we don't know which predicates (also called properties) apply to instances of a certain class.

Let's select individuals of all the classes and look for all the possible properties recorded for individuals of that class. 

In [None]:
properties_by_class_query = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd: <http://www.wikidata.org/entity/>

SELECT DISTINCT ?class_uri ?property
WHERE {
    ?individual rdf:type ?class_uri ; ## rdf:type == a
                ?property ?value .
}
ORDER BY ?class_uri ?property
"""

**NB** The fact that a predicate *can apply* to a class does not imply that there is *always* a value for that `<subject> <predicate>` pair. For instance, although an art historian *may* be described with a biography, this does not mean that a biography is available for every historian.

### 1.5 Count the number of individuals in a class

Now let's look more closely at what is in ARTchives, for instance by counting the number of individuals belonging to each class.

In [3]:
count_individuals_by_class_query = """

SELECT ?class (COUNT(?individual) AS ?tot)
WHERE { ?individual a ?class .}
GROUP BY ?class
"""

As a result we have 6 keepers, 25 art historians and 26 collections.

### 1.6 Count the number of occurrences of properties

Let's see to what extent properties are represented in the dataset, e.g. let's see how many collections have historical notes.

In [None]:
property_count_by_class_query = """
SELECT ?class ?property (COUNT(?property) AS ?prop_count)
WHERE { ?individual a ?class ; ?property ?something .}
GROUP BY ?class ?property
ORDER BY ?class DESC(?prop_count)
"""

### 1.7 Select only certain values for a variable

Let's say we don't care about properties related to keepers and we want to select only certain classes. We can specify in the query a list of `VALUES` for the variable `?class` that we want to be returned.

In [None]:
property_count_class_values_query = """
SELECT ?class ?property (COUNT(?property) AS ?prop_count)
WHERE { 
    VALUES ?class {<http://www.wikidata.org/entity/Q5> <http://www.wikidata.org/entity/Q9388534>} # look at the syntax
    ?individual a ?class ; ?property ?something .
 }
GROUP BY ?class ?property
ORDER BY ?class DESC(?prop_count)
"""

The same operator works for any other variable. For instance:

 * select only individuals that are related to `1921` -> `VALUES ?something {'1921'}`
 * select only birthplaces -> `VALUES ?property {<http://www.wikidata.org/prop/direct/P19>}`
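Since the queries in this tutorial live in Python strings, the `VALUES` list can also be generated from a Python list. The following is a minimal sketch: `values_clause` is a hypothetical helper (not part of any library), and it assumes the URIs are well formed and come from a trusted source, since nothing is escaped.

```python
def values_clause(variable, uris):
    """Build a SPARQL VALUES clause that restricts `variable` to `uris`.

    Hypothetical helper written for this example only.
    """
    # sort and de-duplicate so the generated query is the same on every run
    terms = " ".join(f"<{uri}>" for uri in sorted(set(uris)))
    return f"VALUES ?{variable} {{ {terms} }}"


classes = [
    "http://www.wikidata.org/entity/Q5",
    "http://www.wikidata.org/entity/Q9388534",
]

# doubled braces {{ }} are literal braces inside an f-string
query = f"""
SELECT ?class ?property (COUNT(?property) AS ?prop_count)
WHERE {{
    {values_clause("class", classes)}
    ?individual a ?class ; ?property ?something .
}}
GROUP BY ?class ?property
ORDER BY ?class DESC(?prop_count)
"""
print(query)
```

The printed query can be pasted into the endpoint interface to check it before it is used in code.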

### 1.8 Let's play with the SPARQL endpoint interface

There are several GUIs (Graphical User Interfaces) that can be built on top of a SPARQL endpoint. Generally, these offer a number of operations that you can perform on the results. The most important ones are the following:

 * display data in several formats (tabular, raw JSON response, charts)
 * download data (usually in a default JSON format or CSV).
 
**Note for your future self. Which format should you choose for download?** It depends. If you are going to use the data locally, CSV is fine. If you plan to use the result data directly for visualization, JSON *might* be better, but it really depends on which library you will use (some require a JSON object as input, others prefer a table). So think about it ahead!

### 1.9 Analyse the raw response of a SPARQL query

Let's have a look at the JSON result of the query called `count_individuals_by_class_query`.

In [None]:
results = """
{
  "head" : {
    "vars" : [ "class", "tot" ]
  },
  "results" : {
    "bindings" : [ {
      "class" : {
        "type" : "uri",
        "value" : "http://www.wikidata.org/entity/Q31855"
      },
      "tot" : {
        "datatype" : "http://www.w3.org/2001/XMLSchema#integer",
        "type" : "literal",
        "value" : "6"
      }
    }, {
      "class" : {
        "type" : "uri",
        "value" : "http://www.wikidata.org/entity/Q5"
      },
      "tot" : {
        "datatype" : "http://www.w3.org/2001/XMLSchema#integer",
        "type" : "literal",
        "value" : "25"
      }
    }, {
      "class" : {
        "type" : "uri",
        "value" : "http://www.wikidata.org/entity/Q9388534"
      },
      "tot" : {
        "datatype" : "http://www.w3.org/2001/XMLSchema#integer",
        "type" : "literal",
        "value" : "26"
      }
    } ]
  }
}
"""

The JSON returned by **any SPARQL `SELECT` query** always has the same structure: a dictionary with two keys, `head` and `results`.

<pre>
{
  <span style="color:blue">"head"</span> : {
    "vars" : [ "class", "tot" ]
  },
  <span style="color:blue">"results"</span> : {
    "bindings" : [ {
      "class" : {
...
</pre>

**HEADINGS** The value of the key `head` is a dictionary with a key called `vars`, whose value is a list including all the query variables. In our case the query variables are `["class", "tot"]`, which correspond to the names of the columns in the tabular view of the results.
 
<pre>
{
  <b>"head"</b> : {
    <span style="color:blue">"vars" : [ "class", "tot" ]</span>
  },
  "results" : {
    "bindings" : [ {
      "class" : {
...
</pre>

**RESULTS** the value of the key `results` is a dictionary with a key called `bindings`, whose value is a list of dictionaries. 

<pre>
{
  "head" : {
    "vars" : [ "class", "tot" ]
  },
  <b>"results"</b> : {
    <span style="color:blue">"bindings"</span> : [ 
        {...},
        {...},
        {...}
    ]
  }
}    
</pre>

**ROW** Each dictionary in the list corresponds to a row shown in the tabular results.

**MAPPING ROW/COLUMN** Each dictionary/row contains one nested dictionary per query variable (in our case: class and tot). The keys in the dictionary/row are the names of the columns/query variables.

<pre>
{
  "head" : {
    "vars" : [ "class", "tot" ]
  },
  <b>"results"</b> : {
    "bindings" : [ 
        <span style="color:blue">{
            "class": {...},
            "tot": {...}
        }</span>,
        {...},
        {...}
    ]
  }
}    
</pre>

**CELL** The value of each key/column (class, tot) corresponds to a cell of the tabular result. Every cell records two or three keys, depending on the kind of value:

 * the `type`, which is either `uri` or `literal` in our results (the format also defines `bnode` for blank nodes)

 <pre>
      "class" : {
        <span style="color:blue">"type" : "uri",</span>
        "value" : "http://www.wikidata.org/entity/Q31855"
      },
      "tot" : {
        "datatype" : "http://www.w3.org/2001/XMLSchema#integer",
        <span style="color:blue">"type" : "literal",</span>
        "value" : "6"
      }
 </pre>

 * the actual `value` (either the http URI or the string)
 
 <pre>
      "class" : {
        "type" : "uri",
        <span style="color:blue">"value" : "http://www.wikidata.org/entity/Q31855"</span>
      },
      "tot" : {
        "datatype" : "http://www.w3.org/2001/XMLSchema#integer",
        "type" : "literal",
        <span style="color:blue">"value" : "6"</span>
      }
 </pre>
 
 * optionally, if specified in the data, the `datatype` of the literal:
 
 <pre>
      "class" : {
        "type" : "uri",
        "value" : "http://www.wikidata.org/entity/Q31855"
      },
      "tot" : {
        <span style="color:blue">"datatype" : "http://www.w3.org/2001/XMLSchema#integer",</span>
        "type" : "literal",
        "value" : "6"
      }
</pre>
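Note that the `value` is always a string, even when the `datatype` says it is a number. A minimal sketch of how a cell could be turned into a native Python value is shown below; `cell_to_python` is a hypothetical helper, and the list of datatypes it handles is deliberately short.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def cell_to_python(cell):
    """Convert one cell of a SPARQL JSON result into a plain Python value."""
    value = cell["value"]
    # some endpoints emit the older "typed-literal" type for literals with a datatype
    if cell["type"] not in ("literal", "typed-literal"):
        return value  # URIs (and blank node labels) stay strings
    datatype = cell.get("datatype")  # absent for plain or language-tagged literals
    if datatype == XSD + "integer":
        return int(value)
    if datatype in (XSD + "decimal", XSD + "double", XSD + "float"):
        return float(value)
    return value  # any other literal is kept as a string


class_cell = {"type": "uri", "value": "http://www.wikidata.org/entity/Q31855"}
tot_cell = {
    "datatype": "http://www.w3.org/2001/XMLSchema#integer",
    "type": "literal",
    "value": "6",
}

print(cell_to_python(class_cell))
print(cell_to_python(tot_cell) + 1)  # arithmetic works because "6" became an int
```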


### 1.10 Download and query the JSON result with python

I've downloaded the results of the query as `sparql_query_result.json` in the folder `resources`.

To query SPARQL JSON results we can use Python's built-in `json` library (in the end it's just a dictionary, so no special library is needed).

For instance, let's open the file and load the JSON into a Python dictionary.

In [4]:
import json
import pprint

pp = pprint.PrettyPrinter(indent=1)  # just to pretty print results

with open('../resources/sparql_query_result.json', 'r', encoding='utf-8') as results:
    data = json.load(results)  # parse the JSON text into a Python dictionary

pp.pprint(data)

{'head': {'vars': ['class', 'tot']},
 'results': {'bindings': [{'class': {'type': 'uri',
                                     'value': 'http://www.wikidata.org/entity/Q31855'},
                           'tot': {'datatype': 'http://www.w3.org/2001/XMLSchema#integer',
                                   'type': 'literal',
                                   'value': '6'}},
                          {'class': {'type': 'uri',
                                     'value': 'http://www.wikidata.org/entity/Q5'},
                           'tot': {'datatype': 'http://www.w3.org/2001/XMLSchema#integer',
                                   'type': 'literal',
                                   'value': '25'}},
                          {'class': {'type': 'uri',
                                     'value': 'http://www.wikidata.org/entity/Q9388534'},
                           'tot': {'datatype': 'http://www.w3.org/2001/XMLSchema#integer',
                                   'type': 'literal',
       

Let's iterate over the JSON and print out the column names.

In [5]:
for column in data["head"]["vars"]: # enter the list
    print(column)

class
tot


Let's iterate over the results and print the values.

In [6]:
for result in data["results"]["bindings"]:  # enter the list of dictionaries // do you remember "for row in rows"?
    res_class = result['class']['value']    # the value of the cell under column "class"
    res_tot = result['tot']['value']        # the value of the cell under column "tot"
    print('The class', res_class,'has', res_tot, 'individuals')

The class http://www.wikidata.org/entity/Q31855 has 6 individuals
The class http://www.wikidata.org/entity/Q5 has 25 individuals
The class http://www.wikidata.org/entity/Q9388534 has 26 individuals
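If a table is more convenient than nested dictionaries, the same structure can be flattened with the standard library. The sketch below is one possible approach: `bindings_to_rows` is a hypothetical helper, and the second row of `sample` is invented to show what happens when a variable is unbound (for instance because of `OPTIONAL`), in which case the key is simply missing from the row.

```python
import csv
import io


def bindings_to_rows(data):
    """Turn a SPARQL JSON result into (columns, rows) of plain strings."""
    columns = data["head"]["vars"]
    rows = []
    for binding in data["results"]["bindings"]:
        # an unbound variable has no key in the binding: use an empty string
        rows.append([binding.get(col, {}).get("value", "") for col in columns])
    return columns, rows


sample = {
    "head": {"vars": ["class", "tot"]},
    "results": {"bindings": [
        {"class": {"type": "uri", "value": "http://www.wikidata.org/entity/Q31855"},
         "tot": {"type": "literal", "value": "6",
                 "datatype": "http://www.w3.org/2001/XMLSchema#integer"}},
        # invented row with no "tot", to illustrate an unbound variable
        {"class": {"type": "uri", "value": "http://www.wikidata.org/entity/Q5"}},
    ]},
}

columns, rows = bindings_to_rows(sample)

# write to an in-memory buffer; pass an open file instead to save a CSV
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(columns)
writer.writerows(rows)
print(buffer.getvalue())
```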


## 2. Execute SPARQL queries with RDFLib on a local RDF file

Along with the methods provided by RDFLib for iterating over triples, we can run SPARQL queries on our local data (again, via RDFLib). To do that we need to:

 * parse a local RDF file (we have it! see `../resources/artchives.nq`) into an RDFLib graph
 * write a SPARQL query as a string variable (we have plenty already)
 * iterate over the results of the query (how are the data organised?)
 
A SPARQL query sent to a SPARQL endpoint returns a JSON object organised as we saw. When you query a local RDFLib graph, the result instead behaves like a **list of tuples**. Every tuple includes as many values as the number of query variables (in our case 2, `class` and `tot`), served in the same order as they appear in the `SELECT` clause. For instance, a `SELECT ?class ?tot` clause returns tuples like `("http://www.wiki..." , "6")`, whose values are RDFLib terms (`URIRef`, `Literal`) rather than plain strings.
Values of the tuples can be accessed by position (e.g. `query_res[0]`) or by variable name (e.g. `query_res["class"]`).

Let's print for each row the results.

In [8]:
import rdflib

# create an empty Graph
g = rdflib.ConjunctiveGraph()

# parse a local RDF file by specifying the format
result = g.parse("../resources/artchives.nq", format='nquads')

query_results = g.query(
    """SELECT ?class (COUNT(?individual) AS ?tot)
    WHERE { ?individual a ?class .}
    GROUP BY ?class""")

for query_res in query_results:
    print(query_res[0], query_res["tot"]) # notice the two alternative ways to recall values in the tuple

http://www.wikidata.org/entity/Q9388534 25
http://www.wikidata.org/entity/Q31855 5
http://www.wikidata.org/entity/Q5 24


## 3. Query a SPARQL endpoint with RDFLib and SPARQLWrapper libraries

Querying data via the web interface of a SPARQL endpoint forces you to **separate the code** that collects the data (the SPARQL query you run against the endpoint) from the code that manipulates the results. This hurts **reproducibility** (what if you forget where you saved the SPARQL query and need to run it again?), and it is highly discouraged when you need to show your results to a broader community (they want everything in one place, runnable with one command). Moreover, in many cases **dumping data** locally (whether query results or the entire dataset) and parsing them into an RDFLib graph is not possible or convenient, e.g. dumping the entire Wikidata graph would require ~60GB of storage (only for the zipped file!). Finally, while the online data keep being updated, the local copy easily goes **out of date**.

The RDFLib ecosystem allows you to **query remote SPARQL endpoints** and to get up-to-date result data in the same JSON format you'd get if you downloaded the results from the interface (see section 1.9 above). To query remote endpoints we use RDFLib together with `SPARQLWrapper`, a separate library from the same project that wraps calls to a SPARQL endpoint.

To query a remote SPARQL endpoint you'll need to:

 * if you are on a Mac, *you may need* to relax the HTTPS certificate check before querying an external service (import `ssl` and copy/paste the line below; note that it disables certificate verification, so keep it to tutorial use)
 * get the URL of the API of the SPARQL endpoint (sometimes it corresponds to the URL of the interface for querying via SPARQL, sometimes the URL is different!)
 * prepare the SPARQL query (this means you also need to study how data are organised in the source to be queried! Try the query on the GUI of the SPARQL endpoint, if available, to check that it is correct)
 * create the wrapper around the SPARQL API (via the SPARQLWrapper library)
 * send the query and get the JSON results

In [1]:
from SPARQLWrapper import SPARQLWrapper, JSON
import ssl

ssl._create_default_https_context = ssl._create_unverified_context

# get the endpoint API
wikidata_endpoint = "https://query.wikidata.org/bigdata/namespace/wdq/sparql"

# prepare the query: 10 arbitrary triples (LIMIT without ORDER BY returns whatever comes first)
my_SPARQL_query = """
SELECT *
WHERE {?s ?p ?o}
LIMIT 10
"""

# set the endpoint 
sparql_wd = SPARQLWrapper(wikidata_endpoint)
# set the query
sparql_wd.setQuery(my_SPARQL_query)
# set the returned format
sparql_wd.setReturnFormat(JSON)
# get the results
results = sparql_wd.query().convert()

# manipulate the result
for result in results["results"]["bindings"]:
    print(result["s"]["value"], result["p"]["value"], result["o"]["value"])

http://wikiba.se/ontology#Dump http://creativecommons.org/ns#license http://creativecommons.org/publicdomain/zero/1.0/
http://wikiba.se/ontology#Dump http://schema.org/softwareVersion 1.0.0
http://wikiba.se/ontology#Dump http://schema.org/dateModified 2020-03-02T23:00:02Z
http://wikiba.se/ontology#Dump http://schema.org/dateModified 2020-03-02T23:24:15Z
http://wikiba.se/ontology#Dump http://schema.org/dateModified 2020-03-02T23:24:16Z
http://wikiba.se/ontology#Dump http://schema.org/dateModified 2020-03-02T23:24:19Z
http://wikiba.se/ontology#Dump http://schema.org/dateModified 2020-03-02T23:24:20Z
http://wikiba.se/ontology#Dump http://schema.org/dateModified 2020-03-02T23:24:24Z
http://wikiba.se/ontology#Dump http://schema.org/dateModified 2020-03-02T23:51:40Z
http://wikiba.se/ontology#Dump http://schema.org/dateModified 2020-03-02T23:51:41Z
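For reference, the request that travels to the endpoint is an ordinary HTTP call defined by the SPARQL 1.1 Protocol. The sketch below prepares such a request with the standard library only and does not send it. It is an illustration of the protocol, not a description of what SPARQLWrapper does internally; `build_sparql_request` and the `User-Agent` string are made up for this example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs
from urllib.request import Request


def build_sparql_request(endpoint, query):
    """Prepare (but do not send) a SPARQL 1.1 Protocol GET request."""
    # the query text travels URL-encoded in the "query" parameter
    url = endpoint + "?" + urlencode({"query": query})
    headers = {
        # ask for the JSON serialisation of the results
        "Accept": "application/sparql-results+json",
        # hypothetical identifier: replace it with your own project and contact
        "User-Agent": "artchives-tutorial-example/0.1",
    }
    return Request(url, headers=headers)


request = build_sparql_request(
    "https://query.wikidata.org/bigdata/namespace/wdq/sparql",
    "SELECT * WHERE {?s ?p ?o} LIMIT 10",
)
print(request.get_method())
print(request.full_url)

# decoding the URL gives back the original query text
print(parse_qs(urlsplit(request.full_url).query)["query"][0])
```

Passing `request` to `urllib.request.urlopen` would send it; the response body could then be loaded with `json.load` and read exactly like the `results` dictionary above.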


### 3.1 Integrate art historians' birth places from Wikidata

In the last tutorial we saw how to add triples about historians' birthplaces (`wdt:P19`). We were able to add this information by looking up the URI of a historian's birthplace in Wikidata and adding the new triple to the graph manually. Now we can do this operation **systematically**, meaning that:

*For every art historian in ARTchives that corresponds to an individual available in Wikidata, we get their birthplaces and we add this information to our graph*

To do that we need to:

 * get the list of historians in ARTchives that are also available in Wikidata
 * prepare a SPARQL query that returns the birthplace 
 * for each of them, send a query to Wikidata to get the birthplace
 * if there is a result, we add a triple to our graph
 
**GET THE HISTORIANS' URIs** We need to take into account a couple of caveats.
 
 * How do we distinguish historians from other people mentioned in ARTchives? We use the pattern `?collection wdt:P170 ?creator` (P170, "creator"), which is the only mandatory predicate that distinguishes historians from other people.
 * How do we select only the historians that are both in ARTchives and Wikidata? We match a substring in the URI: if it includes the substring "wikidata.org/entity/", we are sure this is a Wikidata entity.

So:

 * We first iterate over the triples in the graph to get the URIs of the historians.
 * Then we transform these URIs (which RDFLib handles as `rdflib.URIRef` objects) into strings. We know that we will use this list of URIs in a SPARQL query, in the `VALUES` operator, so we wrap the strings in angle brackets `<>`.
 * Finally, we add these URIs to a set (a collection of unique values).

In [9]:
# import all we need
from rdflib import Namespace , Literal , URIRef
from rdflib.namespace import RDF , RDFS

# bind the uncommon namespaces
wd = Namespace("http://www.wikidata.org/entity/") # remember that a prefix matches a URI until the last slash (or hashtag #)
wdt = Namespace("http://www.wikidata.org/prop/direct/")
art = Namespace("https://w3id.org/artchives/")

# Get the list of art historians in our graph "g"
arthistorians_list = set()

# iterate over the triples in the graph
for s,p,o in g.triples(( None, wdt.P170, None)):   # people "o" are the creator "wdt.P170" of a collection "s"
    if "wikidata.org/entity/" in str(o):           # look for the substring to filter wikidata entities only
        arthistorians_list.add('<' + str(o) + '>')     # remember to transform them in strings! 
    
print(arthistorians_list)

{'<http://www.wikidata.org/entity/Q41616785>', '<http://www.wikidata.org/entity/Q18935222>', '<http://www.wikidata.org/entity/Q3051533>', '<http://www.wikidata.org/entity/Q60185>', '<http://www.wikidata.org/entity/Q55453618>', '<http://www.wikidata.org/entity/Q2824734>', '<http://www.wikidata.org/entity/Q1296486>', '<http://www.wikidata.org/entity/Q1089074>', '<http://www.wikidata.org/entity/Q85761254>', '<http://www.wikidata.org/entity/Q1641821>', '<http://www.wikidata.org/entity/Q1271052>', '<http://www.wikidata.org/entity/Q88907>', '<http://www.wikidata.org/entity/Q19997512>', '<http://www.wikidata.org/entity/Q1712683>', '<http://www.wikidata.org/entity/Q457739>', '<http://www.wikidata.org/entity/Q1715096>', '<http://www.wikidata.org/entity/Q61913691>', '<http://www.wikidata.org/entity/Q537874>', '<http://www.wikidata.org/entity/Q1373290>', '<http://www.wikidata.org/entity/Q90407>', '<http://www.wikidata.org/entity/Q6700132>', '<http://www.wikidata.org/entity/Q995470>', '<http://www

**QUERY THE ENDPOINT** Now that we have the list of historians we have two options:
 * **MMMNEE..** for each of them, we prepare a query to be sent to Wikidata. However, this implies sending many small queries to an external service that may have (reasonably) imposed a query limit. (If you ever get an error `429: Too Many Requests`, see [this discussion](https://stackoverflow.com/questions/62396801/how-to-handle-too-many-requests-on-wikidata-using-sparqlwrapper) for the reason.)
 * **BETTER** we send only one (bigger) query to the Wikidata endpoint, where all the historians are listed in a `VALUES` clause. The result table of our query will include for every row (1) the URI of the historian, (2) the URI of the birthplace, and (3) the label of the birthplace *in English only!* (be aware that Wikidata stores labels in many languages!)
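When the list of URIs grows very long, a middle ground between the two options is to split it into a few batches and send one `VALUES` query per batch. The sketch below only builds the batches: `batches` is a hypothetical helper and the batch size of 2 is arbitrary, chosen to keep the printed output short.

```python
def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    items = sorted(items)  # sets have no order; sorting makes batches repeatable
    for start in range(0, len(items), size):
        yield items[start:start + size]


# a few of the historian URIs, already wrapped in angle brackets
uris = {
    "<http://www.wikidata.org/entity/Q41616785>",
    "<http://www.wikidata.org/entity/Q18935222>",
    "<http://www.wikidata.org/entity/Q3051533>",
    "<http://www.wikidata.org/entity/Q60185>",
    "<http://www.wikidata.org/entity/Q55453618>",
}

for batch in batches(uris, size=2):
    # each batch becomes the VALUES list of one query
    print("VALUES ?historian {", " ".join(batch), "}")
```

For a dataset as small as ARTchives a single query is enough, so batching is only worth considering for larger lists.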
 
**INTEGRATE THE DATA INTO THE GRAPH** Once the wrapper and the query are set we manipulate results:
 
 * only if the birthplace is found we look also for its label 
 * only for those birthplaces that have both URI and label we create a new triple to be added to our graph.

In [10]:
# prepare the values to be queried
historians = ' '.join(arthistorians_list) # <uri1> <uri2> <uri3> ... <uriN>

# prepare the query
birthplace_query = """
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?historian ?birthplace ?birthplace_label
WHERE {
    VALUES ?historian {"""+historians+"""} . # look how we include a variable in a query string!
    ?historian wdt:P19 ?birthplace . 
    ?birthplace rdfs:label ?birthplace_label .
    # langMatches also accepts regional tags such as en-GB, so a place may come back with more than one label
    FILTER (langMatches(lang(?birthplace_label), "EN"))
    } 
"""

# set the endpoint 
sparql_wd = SPARQLWrapper(wikidata_endpoint)
# set the query
sparql_wd.setQuery(birthplace_query)
# set the returned format
sparql_wd.setReturnFormat(JSON)
# get the results
results = sparql_wd.query().convert()

# manipulate the result
for result in results["results"]["bindings"]:
    historian_uri = result["historian"]["value"]
    print("historian:", historian_uri)
    if "birthplace" in result: # safeguard: with this query, historians with no birthplace in Wikidata are simply missing from the results
        birthplace = result["birthplace"]["value"]
        if "birthplace_label" in result: 
            birthplace_label = result["birthplace_label"]["value"]
            print("found:", birthplace, birthplace_label)
            
            # only if both uri and label are found we add them to the graph
            g.add(( URIRef(historian_uri) , URIRef(wdt.P19) , URIRef(birthplace) ))
            g.add(( URIRef(birthplace) , RDFS.label , Literal(birthplace_label) ))
    else:
        print("nothing found in wikidata :(")

historian: http://www.wikidata.org/entity/Q18935222
found: http://www.wikidata.org/entity/Q64 Berlin
historian: http://www.wikidata.org/entity/Q18935222
found: http://www.wikidata.org/entity/Q64 Berlin
historian: http://www.wikidata.org/entity/Q18935222
found: http://www.wikidata.org/entity/Q64 Berlin
historian: http://www.wikidata.org/entity/Q1629748
found: http://www.wikidata.org/entity/Q64 Berlin
historian: http://www.wikidata.org/entity/Q1629748
found: http://www.wikidata.org/entity/Q64 Berlin
historian: http://www.wikidata.org/entity/Q1629748
found: http://www.wikidata.org/entity/Q64 Berlin
historian: http://www.wikidata.org/entity/Q537874
found: http://www.wikidata.org/entity/Q84 London
historian: http://www.wikidata.org/entity/Q537874
found: http://www.wikidata.org/entity/Q84 London
historian: http://www.wikidata.org/entity/Q537874
found: http://www.wikidata.org/entity/Q84 London
historian: http://www.wikidata.org/entity/Q1089074
found: http://www.wikidata.org/entity/Q220 Rome
h
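The output above repeats each historian/birthplace pair several times. One possible explanation, which the notebook does not confirm, is that the language filter matches several English variants of the same label. Adding the same triple to an RDFLib graph twice has no effect, so the graph is not harmed, but the repeats can be collapsed before printing or further processing. A minimal sketch, with `unique_birthplaces` as a hypothetical helper and hand-written rows shaped like the results above:

```python
def unique_birthplaces(bindings):
    """Keep one label per (historian, birthplace) pair, in first-seen order."""
    seen = {}
    for row in bindings:
        key = (row["historian"]["value"], row["birthplace"]["value"])
        # setdefault keeps the first label and ignores later repeats
        seen.setdefault(key, row["birthplace_label"]["value"])
    return [(hist, place, label) for (hist, place), label in seen.items()]


# hand-written rows: the first one is repeated three times on purpose
rows = [
    {"historian": {"value": "http://www.wikidata.org/entity/Q18935222"},
     "birthplace": {"value": "http://www.wikidata.org/entity/Q64"},
     "birthplace_label": {"value": "Berlin"}},
] * 3 + [
    {"historian": {"value": "http://www.wikidata.org/entity/Q537874"},
     "birthplace": {"value": "http://www.wikidata.org/entity/Q84"},
     "birthplace_label": {"value": "London"}},
]

for historian, place, label in unique_birthplaces(rows):
    print(historian, place, label)
```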

**STORE** Now that we have added all these new triples to our in-memory graph, we can store these data into a new file.

In [11]:
g.serialize(destination='../resources/artchives_birthplaces.nq', format='nquads')

## For your future self

For the sake of the project you have to:
 
 * query and manipulate artchives locally (use the `artchives.nq` file) via RDFlib methods or local SPARQL queries
 * query external sources remotely (Wikidata and others) using SPARQLWrapper
 * save the data extracted from external sources along with artchives data locally (create a new file)

## Resources 

Some SPARQL tutorials:
  
 * [SPARQL Tutorial - Apache](https://jena.apache.org/tutorials/sparql.html)
 * [SPARQL tutorial - stardog](https://www.stardog.com/tutorials/sparql/)
 * [SPARQL tutorial - Programming Historian](https://programminghistorian.org/en/lessons/retired/graph-databases-and-SPARQL)
 * [SPARQL tutorial - Wikidata](https://www.wikidata.org/wiki/Wikidata:SPARQL_tutorial) - very useful for your project! It teaches you how to query data on Wikidata 
 
SPARQL and RDFLib:
   
 * [RDFLib documentation on SPARQL](https://rdflib.readthedocs.io/en/stable/intro_to_sparql.html)
 
SPARQLWrapper:
 
 * [SPARQLWrapper documentation](https://sparqlwrapper.readthedocs.io/en/latest/main.html)
 
Wikidata resources:
 * [index of categories](https://www.wikidata.org/wiki/Category:Wikidata:SPARQL_query_service)