<a href="https://colab.research.google.com/github/royn5618/AILearningPath.io/blob/master/EuroPython2021/SPARQLWrapper_Demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Install SPARQLWrapper**

SPARQLWrapper is a simple Python wrapper around a SPARQL service that executes your queries remotely. It helps build the query invocation and, optionally, converts the result into a more manageable format.

**Ref** - https://github.com/RDFLib/sparqlwrapper

In [1]:
!pip install SPARQLWrapper

Collecting SPARQLWrapper
  Downloading SPARQLWrapper-1.8.5-py3-none-any.whl (26 kB)
Collecting rdflib>=4.0
  Downloading rdflib-6.0.0-py3-none-any.whl (376 kB)

Next, import the required Python libraries:

In [2]:
from SPARQLWrapper import SPARQLWrapper, JSON, N3
import pandas as pd

Set the SPARQL endpoint:

In [3]:
sparql = SPARQLWrapper('https://dbpedia.org/sparql')

**Create your query**

This query gets the label of the resource called Python. 'Python' is the **thing**, or **subject**, here, and we are referring to the **property** called 'rdfs:label' (the **predicate** in RDF terms), whose value is stored in the **object variable**, ?object.

In [4]:
sparql.setQuery('''
    SELECT ?object
    WHERE { dbr:Python rdfs:label ?object .}
''')
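
The query above relies on the `dbr:` and `rdfs:` prefixes being predefined by the DBpedia endpoint. As a minimal sketch, assuming the standard DBpedia resource and RDF Schema namespace IRIs, the same query can declare its prefixes explicitly so that it does not depend on endpoint defaults. The name `query_with_prefixes` is only illustrative.

```python
# Illustrative only: the same label query with explicit PREFIX declarations.
# The namespace IRIs are the standard ones for DBpedia resources and RDF Schema.
query_with_prefixes = '''
    PREFIX dbr:  <http://dbpedia.org/resource/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    SELECT ?object
    WHERE { dbr:Python rdfs:label ?object . }
'''

# It can be passed to the wrapper exactly like the original query:
# sparql.setQuery(query_with_prefixes)
print(query_with_prefixes)
```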

Specify the return format. Here we are using JSON:

In [5]:
sparql.setReturnFormat(JSON)

Execute the query:

In [6]:
results = sparql.query().convert()

Convert the JSON response to a Pandas dataframe using pd.json_normalize():

In [7]:
results = pd.json_normalize(results['results']['bindings'])

In [8]:
results.head()

| | object.type | object.xml:lang | object.value |
|---|---|---|---|
| 0 | literal | en | Python |
| 1 | literal | ar | بايثون (توضيح) |
| 2 | literal | cs | Python (rozcestník) |
| 3 | literal | de | Python |
| 4 | literal | eo | Pitono (apartigilo) |
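
To see where column names such as `object.value` come from, here is a small offline sketch. The `sample` dictionary is hand-written to mimic the shape of a SPARQL JSON response (a `head` listing the variable names and `results.bindings` holding one entry per row); it is not real endpoint output.

```python
import pandas as pd

# Hand-written stand-in for a SPARQL JSON response (not real endpoint output).
sample = {
    "head": {"vars": ["object"]},
    "results": {
        "bindings": [
            {"object": {"type": "literal", "xml:lang": "en", "value": "Python"}},
            {"object": {"type": "literal", "xml:lang": "de", "value": "Python"}},
        ]
    },
}

# Each binding is a nested dict; json_normalize flattens it into
# dot-separated columns such as 'object.value'.
df = pd.json_normalize(sample["results"]["bindings"])
print(list(df.columns))
print(df["object.value"].tolist())
```

Each query variable contributes one group of columns, which is why adding variables to the SELECT clause widens the dataframe.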


In [9]:
results.shape

(18, 3)

Note that this dataframe has 18 rows. This is because the response is multilingual: the label in each language is a separate row.

Create a function that contains all the query execution steps:

In [10]:
def exec_query(sparql, query):
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    results = pd.json_normalize(results['results']['bindings'])
    return results
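
If only the bound values are needed, the nested bindings can also be reduced without pandas. This is a sketch that assumes the response follows the SPARQL JSON layout returned by `sparql.query().convert()`; `bindings_to_rows` is a hypothetical helper, not part of SPARQLWrapper.

```python
def bindings_to_rows(response):
    """Return one plain dict per result row, mapping variable name to its value."""
    rows = []
    for binding in response["results"]["bindings"]:
        # Each binding maps a variable name to a dict with 'type', 'value', ...
        rows.append({var: cell["value"] for var, cell in binding.items()})
    return rows


# Hand-written sample in the SPARQL JSON layout (not real endpoint output).
sample = {
    "head": {"vars": ["object", "disamb"]},
    "results": {
        "bindings": [
            {
                "object": {"type": "literal", "xml:lang": "en", "value": "Python"},
                "disamb": {"type": "uri", "value": "http://dbpedia.org/resource/Monty_Python"},
            }
        ]
    },
}

rows = bindings_to_rows(sample)
print(rows)
```

This drops the type and language information, so it suits cases where only the values matter.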

The word 'Python' could mean different things.

In this query, I have added a property called dbo:wikiPageDisambiguates, which lists the wiki pages that the ambiguous word 'Python' may refer to.

Check out the DBpedia resource here - https://dbpedia.org/page/Python


In [11]:
query = ''' SELECT ?object, ?disamb 
WHERE { dbr:Python rdfs:label ?object; dbo:wikiPageDisambiguates ?disamb .} '''
query

' SELECT ?object, ?disamb \nWHERE { dbr:Python rdfs:label ?object; dbo:wikiPageDisambiguates ?disamb .} '

In [12]:
results = exec_query(sparql, query)

In [13]:
results.head()

| | object.type | object.xml:lang | object.value | disamb.type | disamb.value |
|---|---|---|---|---|---|
| 0 | literal | in | Python | uri | http://dbpedia.org/resource/Monty_Python |
| 1 | literal | in | Python | uri | http://dbpedia.org/resource/Python_(Ford_proto... |
| 2 | literal | in | Python | uri | http://dbpedia.org/resource/Python_(Monty)_Pic... |
| 3 | literal | in | Python | uri | http://dbpedia.org/resource/Python_(mythology) |
| 4 | literal | in | Python | uri | http://dbpedia.org/resource/Python_(nuclear_pr... |


Note that two new columns have been added: one for the type of the variable ?disamb and another for its value.

In [14]:
results.shape

(378, 5)
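
The row count grows because the query pattern matches every combination of a label and a disambiguation link. A small offline sketch with made-up values (two labels, three links) shows the effect; `itertools.product` stands in here for the pairing that the endpoint performs.

```python
from itertools import product

# Made-up values for illustration, not real endpoint output.
labels = ["Python@en", "Python@de"]
links = ["Monty_Python", "Python_(mythology)", "Python_(programming_language)"]

# Every label is paired with every link, so rows = len(labels) * len(links).
rows = list(product(labels, links))
print(len(rows))
for label, link in rows:
    print(label, link)
```

By the same reasoning, 378 rows would be consistent with each of the 18 labels seen earlier being paired with 21 links, assuming the label count has not changed.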

Now let's look at Python as a programming language.

On the resource page https://dbpedia.org/page/Python, follow the link under dbo:wikiPageDisambiguates to https://dbpedia.org/page/Python_(programming_language).

You should now see all the properties of the Python programming language. From there, I have constructed this query to get the label and the abstract, which is usually a paragraph describing the resource.

In [15]:
# Raw string so that the backslashes escaping the parentheses reach SPARQL unchanged.
query = r''' SELECT ?label, ?abstract  
WHERE { dbr:Python_\(programming_language\) rdfs:label ?label; dbo:abstract ?abstract .} '''
query

' SELECT ?label, ?abstract  \nWHERE { dbr:Python_\\(programming_language\\) rdfs:label ?label; dbo:abstract ?abstract .} '
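
The backslashes matter twice here: SPARQL needs them to escape the parentheses in the prefixed name, and Python must not interpret them as string escapes. A small sketch of equivalent spellings follows; the variable names are illustrative, and the full-IRI form assumes the standard DBpedia resource namespace.

```python
# Three ways to name the same resource inside a Python string.
raw_form = r"dbr:Python_\(programming_language\)"       # raw string, single backslashes
escaped_form = "dbr:Python_\\(programming_language\\)"  # regular string, doubled backslashes
full_iri = "<http://dbpedia.org/resource/Python_(programming_language)>"  # no escaping needed

# The first two produce identical text; repr() shows the doubled backslashes.
print(raw_form == escaped_form)
print(repr(raw_form))
print(full_iri)
```

The full IRI can be used in place of the prefixed name in the WHERE clause when escaping becomes hard to read.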

In [16]:
results = exec_query(sparql, query)
results.head()

| | label.type | label.xml:lang | label.value | abstract.type | abstract.xml:lang | abstract.value |
|---|---|---|---|---|---|---|
| 0 | literal | en | Python (programming language) | literal | es | Python es un lenguaje de programación interpre... |
| 1 | literal | ar | بايثون (لغة برمجة) | literal | es | Python es un lenguaje de programación interpre... |
| 2 | literal | ca | Python | literal | es | Python es un lenguaje de programación interpre... |
| 3 | literal | cs | Python | literal | es | Python es un lenguaje de programación interpre... |
| 4 | literal | de | Python (Programmiersprache) | literal | es | Python es un lenguaje de programación interpre... |


In [17]:
results.shape

(420, 6)

Note that we have 420 rows. This is because the label in every language is paired with the abstract in every language. Feel free to explore what is available.

SPARQL queries also support filtering. In the next query, I keep only the English-language label and abstract.

In [18]:
# Raw string so that the backslashes escaping the parentheses reach SPARQL unchanged.
query = r''' SELECT ?label, ?abstract  
WHERE { dbr:Python_\(programming_language\) rdfs:label ?label; dbo:abstract ?abstract .
FILTER(LANG(?abstract) = "en" && LANG(?label) = "en")} '''
query

' SELECT ?label, ?abstract  \nWHERE { dbr:Python_\\(programming_language\\) rdfs:label ?label; dbo:abstract ?abstract .\nFILTER(LANG(?abstract) = "en" && LANG(?label) = "en")} '

In [19]:
results = exec_query(sparql, query)
results.head()

| | label.type | label.xml:lang | label.value | abstract.type | abstract.xml:lang | abstract.value |
|---|---|---|---|---|---|---|
| 0 | literal | en | Python (programming language) | literal | en | Python is an interpreted high-level general-pu... |


In [20]:
results.shape

(1, 6)

Note that there is only one row now: the English-language label and abstract.
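
The language filter can be generalised to other languages. The following is a minimal sketch, not part of SPARQLWrapper: `build_label_abstract_query` is a hypothetical helper, and the regular expression is only a rough check for simple language tags. It separates the projected variables with whitespace, which is the form the SPARQL grammar defines.

```python
import re


def build_label_abstract_query(resource, lang="en"):
    """Build the label/abstract query for one prefixed resource and one language tag."""
    # Accept only simple language tags such as 'en' or 'pt-BR' so that
    # arbitrary text cannot be injected into the query string.
    if not re.fullmatch(r"[A-Za-z]{2,3}(-[A-Za-z0-9]+)*", lang):
        raise ValueError(f"unexpected language tag: {lang!r}")
    return (
        "SELECT ?label ?abstract\n"
        f"WHERE {{ {resource} rdfs:label ?label ; dbo:abstract ?abstract .\n"
        f'FILTER(LANG(?abstract) = "{lang}" && LANG(?label) = "{lang}") }}'
    )


query_de = build_label_abstract_query(r"dbr:Python_\(programming_language\)", "de")
print(query_de)

# The result can be run with the helper defined earlier:
# results = exec_query(sparql, query_de)
```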

**More Resources**


1. https://pypi.org/project/SPARQLWrapper/
2. https://readthedocs.org/projects/sparqlwrapper/
3. Endpoint - https://dbpedia.org/sparql (Test your queries here)
4. SPARQL Query Language for RDF - https://www.w3.org/TR/rdf-sparql-query/