# Working with pywikibot, Wikidata API, and SPARQL

In my reconciliation work, my task is usually to take a string, send it to an API as a search, and evaluate the responses to link that string to a URI. This is what's meant by the phrase "from strings to things."

One use case I've encountered requires a different approach. I have already reconciled some strings to VIAF, so I now have a list of VIAF IDs. Many Wikidata items record a VIAF ID. So how can I reconcile those VIAF IDs to Wikidata?

There are relatively simple ways to do this if I have one VIAF ID. But what if I have thousands?

This notebook first uses the `pywikibot` Python library to explore the Wikidata API and see how the data is structured and modeled. We'll then see whether we can tackle our problem using SPARQL.

## Setup and working through the pywikibot tutorial

Of course, we should always start by reading the manual, and helpfully, Wikidata has its own tutorial for `pywikibot`. The following section outlines how to explore an item (a "page"), here Q76, Barack Obama. While this isn't our use case, it shows us what "claims" are. Claims matter because VIAF IDs, which are what we're after, are stored under a specific property: [P214](https://www.wikidata.org/wiki/Property:P214).

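```python
import pywikibot

# Connect to the Wikidata repository and load the item for Barack Obama (Q76)
site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()
item = pywikibot.ItemPage(repo, "Q76")

# item.get() fetches the item's data; claims are keyed by property ID
item_dict = item.get()
clm_dict = item_dict["claims"]
clm_list = clm_dict["P214"]  # P214 = VIAF ID
```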

At this point we could just print the claim value to get the VIAF ID. But first, let's explore the structure of this specific claim in case we need that information later.

```python
for clm in clm_list:
    print(clm.toJSON())
```

```
{'rank': 'preferred', 'references': [{'snaks': {'P143': [{'datavalue': {'value': {'numeric-id': 8447, 'entity-type': 'item'}, 'type': 'wikibase-entityid'}, 'datatype': 'wikibase-item', 'snaktype': 'value', 'property': 'P143'}]}, 'snaks-order': ['P143'], 'hash': 'd4bd87b862b12d99d26e86472d44f26858dee639'}], 'mainsnak': {'datavalue': {'value': '52010985', 'type': 'string'}, 'datatype': 'external-id', 'snaktype': 'value', 'property': 'P214'}, 'type': 'statement', 'id': 'q76$9AF526A1-C489-4E26-93E0-B831DE7EC2AD'}
```
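The printed JSON shows where the ID actually lives: the VIAF string sits under `mainsnak`, then `datavalue`, then `value`, and each claim also carries a `rank`. Below is a minimal sketch of pulling the values out of the plain dicts that `toJSON()` returns. It skips snaks whose type isn't `value` (Wikibase also has "somevalue" and "novalue" snaks, which carry no datavalue), and, as an assumption of this sketch, it also skips deprecated-rank claims. The function name `viaf_ids_from_claims` is hypothetical.

```python
def viaf_ids_from_claims(claim_dicts):
    """Return the VIAF ID strings from a list of claim dicts produced by toJSON()."""
    ids = []
    for claim in claim_dicts:
        snak = claim.get("mainsnak", {})
        # "somevalue"/"novalue" snaks have no datavalue, so there is nothing to read
        if snak.get("snaktype") != "value":
            continue
        # Assumption: deprecated statements should not be used for matching
        if claim.get("rank") == "deprecated":
            continue
        ids.append(snak["datavalue"]["value"])
    return ids


# The Q76 claim printed above, trimmed to the fields this function reads
obama_claim = {
    "rank": "preferred",
    "mainsnak": {
        "datavalue": {"value": "52010985", "type": "string"},
        "datatype": "external-id",
        "snaktype": "value",
        "property": "P214",
    },
    "type": "statement",
}

print(viaf_ids_from_claims([obama_claim]))  # ['52010985']
```

In the notebook this would be called as `viaf_ids_from_claims([clm.toJSON() for clm in clm_list])`.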


Now we can print the VIAF ID for Barack Obama:

```python
for clm in clm_list:
    clm_trgt = clm.getTarget()
    print(clm_trgt)
```

```
52010985
```


And it checks out. The [Wikidata page](https://www.wikidata.org/wiki/Q76) confirms this is the correct VIAF ID. We're starting to see what we might need for our use case. 

## Making a SPARQL query with a VIAF ID

Can we come at this from the opposite direction? That is, can we take a VIAF ID and query Wikidata via SPARQL? It's actually pretty simple, which makes sense: like SQL, SPARQL is designed to select records that match very specific values.

From the [Wikidata Query Service](https://query.wikidata.org/), we can run the following query:
```sparql
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX bd: <http://www.bigdata.com/rdf#>

SELECT ?person ?personLabel WHERE {
  ?person wdt:P214 "52010985" .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
```

The [result](http://tinyurl.com/llsqors) brings back the correct Wikidata item (its "Q" ID) from above, Q76, with the correct label.
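Since the end goal is to run this lookup for many IDs, it helps to treat the VIAF ID as a parameter instead of hard-coding it. Here is a minimal sketch, assuming VIAF IDs are purely numeric (as all the examples in this notebook are) and relying on the Wikidata Query Service's built-in prefixes, so no PREFIX lines are declared. It fills the ID into a template and rejects anything non-numeric, so a stray quote in the input can't change the shape of the query. `QUERY_TEMPLATE` and `viaf_query` are hypothetical names, not part of any library.

```python
import re

# The query above, with the VIAF ID left as a placeholder.
# Literal braces are doubled because this is a str.format() template.
QUERY_TEMPLATE = """
SELECT ?person ?personLabel WHERE {{
  ?person wdt:P214 "{viaf}" .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
"""


def viaf_query(viaf_id):
    """Build the single-ID lookup query, refusing anything that isn't all digits."""
    viaf_id = viaf_id.strip()
    # Assumption: the IDs being reconciled are purely numeric
    if not re.fullmatch(r"\d+", viaf_id):
        raise ValueError(f"not a numeric VIAF ID: {viaf_id!r}")
    return QUERY_TEMPLATE.format(viaf=viaf_id)


print(viaf_query("52010985"))
```

The returned string can be handed to any SPARQL client as-is.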

We did this using the Wikidata Query Service's web interface. It's convenient, has autocomplete, and recognizes the standard Wikidata prefixes such as `wd:`, `wdt:`, and `bd:`. But we need to do this from Python. Let's run the same query using the Python library [SPARQLWrapper](https://rdflib.github.io/sparqlwrapper/).

```python
from SPARQLWrapper import SPARQLWrapper, JSON
```

```python
sparql = SPARQLWrapper("https://query.wikidata.org/sparql")
# The endpoint predefines prefixes such as wdt:, wikibase:, and bd:, so none are declared here
sparql.setQuery("""
SELECT ?person ?personLabel
WHERE {
  ?person wdt:P214 "52010985" .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

# Each binding is a dict keyed by variable name; the string sits under "value"
for result in results["results"]["bindings"]:
    print(result["personLabel"]["value"])
    print(result["person"]["value"])
```

```
Barack Obama
http://www.wikidata.org/entity/Q76
```


This is what we're looking for: we get the label and the exact URI for Q76 in Wikidata.
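For reconciliation we usually want to keep the match rather than just print it, and the QID is often more convenient than the full URI. The JSON that comes back follows the SPARQL 1.1 query results JSON format: `results` > `bindings` is a list of rows, each a dict keyed by variable name with a `type` and a `value`. A minimal sketch that turns those rows into `(QID, label)` tuples; the function name is hypothetical and the `sample` dict is shaped like the result printed above.

```python
def matches_from_results(results):
    """Turn a SPARQL JSON result into (QID, label) tuples."""
    matches = []
    for row in results["results"]["bindings"]:
        uri = row["person"]["value"]   # e.g. http://www.wikidata.org/entity/Q76
        qid = uri.rsplit("/", 1)[-1]   # keep only the final path segment
        # The label service normally fills personLabel; fall back to the QID if absent
        label = row.get("personLabel", {}).get("value", qid)
        matches.append((qid, label))
    return matches


# A result shaped like the one returned above
sample = {
    "head": {"vars": ["person", "personLabel"]},
    "results": {"bindings": [
        {"person": {"type": "uri", "value": "http://www.wikidata.org/entity/Q76"},
         "personLabel": {"xml:lang": "en", "type": "literal", "value": "Barack Obama"}},
    ]},
}

print(matches_from_results(sample))  # [('Q76', 'Barack Obama')]
```

With the notebook's variables this would be `matches_from_results(results)`.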

## Iterating a SPARQL query, using a list of values 

We're getting close to solving our original problem. We have queried Wikidata with a VIAF ID and found the Wikidata URI it matches. But so far this is a one-by-one operation. We need to run this same query once for each of our VIAF IDs, potentially thousands of them. Knowing the basics of Python, you could probably guess we need a `for` loop over a list of VIAF IDs.

```python
sparql = SPARQLWrapper("https://query.wikidata.org/sparql")
sparql.setReturnFormat(JSON)  # only needs to be set once

viaf_id = ["52010985", "34562701", "108815043", "76323201"]

for f in viaf_id:
    # Splice the current VIAF ID into the query as a string literal
    queryString = 'SELECT ?person ?personLabel WHERE { ?person wdt:P214 "' + f + '" SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }}'
    sparql.setQuery(queryString)
    result1 = sparql.query().convert()
    print(queryString)
    # Print every match; an ID with no match prints nothing after its query
    for result in result1["results"]["bindings"]:
        print(result["personLabel"]["value"])
        print(result["person"]["value"])
```

```
SELECT ?person ?personLabel WHERE { ?person wdt:P214 "52010985" SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }}
Barack Obama
http://www.wikidata.org/entity/Q76
SELECT ?person ?personLabel WHERE { ?person wdt:P214 "34562701" SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }}
Herbert York
http://www.wikidata.org/entity/Q1609351
SELECT ?person ?personLabel WHERE { ?person wdt:P214 "108815043" SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }}
SELECT ?person ?personLabel WHERE { ?person wdt:P214 "76323201" SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }}
Avrum Stroll
http://www.wikidata.org/entity/Q4829518
```
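Two things stand out in this output. The third ID, 108815043, printed nothing, so with this loop a miss is silent. And the loop sends one HTTP request per ID, which for thousands of IDs means thousands of requests. One alternative (a different technique from the loop above, not the notebook's method) is the SPARQL 1.1 `VALUES` clause, which looks up a whole batch of IDs in a single query and returns `?viaf` next to each match, so results can be mapped back and misses recorded explicitly. This is a sketch under stated assumptions: all function names are hypothetical, the batch size of 200 is arbitrary, VIAF IDs are assumed numeric, the User-Agent contact string is a placeholder you would replace, and the query relies on the endpoint's predefined prefixes.

```python
import re


def chunks(items, size):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_query(viaf_ids):
    """Build one query that looks up many VIAF IDs at once via a VALUES block."""
    for v in viaf_ids:
        # Assumption: VIAF IDs are all digits; this also keeps quotes out of the query
        if not re.fullmatch(r"\d+", v):
            raise ValueError(f"not a numeric VIAF ID: {v!r}")
    values = " ".join(f'"{v}"' for v in viaf_ids)
    return (
        "SELECT ?viaf ?person ?personLabel WHERE {\n"
        f"  VALUES ?viaf {{ {values} }}\n"
        "  ?person wdt:P214 ?viaf .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}"
    )


def group_by_viaf(viaf_ids, bindings):
    """Map every input ID to its matching QIDs; unmatched IDs map to an empty list."""
    found = {v: [] for v in viaf_ids}
    for row in bindings:
        qid = row["person"]["value"].rsplit("/", 1)[-1]
        found.setdefault(row["viaf"]["value"], []).append(qid)
    return found


def reconcile(viaf_ids, batch_size=200):
    """Run the batched queries against WDQS (needs SPARQLWrapper and network access)."""
    from SPARQLWrapper import SPARQLWrapper, JSON, POST

    # Placeholder contact string: Wikimedia asks clients to send a descriptive User-Agent
    sparql = SPARQLWrapper("https://query.wikidata.org/sparql",
                           agent="viaf-reconciliation-notebook/0.1 (you@example.org)")
    sparql.setReturnFormat(JSON)
    sparql.setMethod(POST)  # long VALUES lists can exceed GET URL length limits
    bindings = []
    for batch in chunks(list(viaf_ids), batch_size):
        sparql.setQuery(batch_query(batch))
        bindings.extend(sparql.query().convert()["results"]["bindings"])
    return group_by_viaf(viaf_ids, bindings)
```

Calling `reconcile(viaf_id)` with the list above should return a dict with one entry per input ID, where an empty list marks a VIAF ID with no Wikidata match. Keeping `?viaf` in the SELECT is what makes the mapping back possible, since a batch's results don't come back in input order.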
