# dbptools

An example module illustrating queries for the DBpedia SPARQL endpoint.

## DBpedia & SPARQL

[DBpedia](https://dbpedia.org/about) is a database that compiles structured information from Wikipedia and other Wikimedia projects. This information can be queried from a computer program without having to scrape and process the raw text of Wikipedia articles.

Entries in DBpedia are structured as linked data. Each entity (for example [Ronald Reagan](https://dbpedia.org/page/Ronald_Reagan) or [baked Alaska](https://dbpedia.org/page/Baked_Alaska)) is of a particular type (for example 'agent' or 'dessert'), and has various properties or relations (for example 'birthDate' or 'ingredient').

[SPARQL](https://en.wikipedia.org/wiki/SPARQL) is a query language similar to SQL, with some additional features specifically designed for querying linked data. DBpedia provides an [online form](https://dbpedia.org/sparql) for testing SPARQL queries against the database.

The example module [dbptools.py](dbptools.py) in this repository illustrates basic use of the DBpedia SPARQL endpoint from Python. It relies heavily on an existing Python package for handling SPARQL queries, [SPARQLWrapper](https://github.com/RDFLib/sparqlwrapper).

In [1]:
import dbptools

## Formatting Wikipedia names

Entry names in Wikipedia and in DBpedia begin with an uppercase letter and use underscores in place of spaces. The module provides a convenience function, `format_name()`, for converting terms to this format.

In [2]:
dbptools.format_name('jellied eels')

'Jellied_eels'
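As an illustration of the conversion, here is a minimal sketch of how such a function could be written. It is an assumption about the behaviour, not the module's actual code, and it only handles whitespace and the first letter.

```python
def format_name(term: str) -> str:
    """Convert a free-text term to Wikipedia/DBpedia title style."""
    # Split on any whitespace and join the words with underscores.
    name = "_".join(term.split())
    # Uppercase only the first character; the rest is left untouched
    # because titles are case-sensitive after the first letter.
    return name[:1].upper() + name[1:]


print(format_name("jellied eels"))  # Jellied_eels
```

Slicing with `name[:1]` rather than indexing with `name[0]` keeps the function safe for an empty string.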

## Checking if an entry exists

The module provides a function `entry_exists()` for checking whether an entry is present in the database.

In [3]:
dbptools.entry_exists('jellied eels')

True

In [4]:
dbptools.entry_exists('some non-notable person I met once at a convention')

False
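A check like this can be sent to the endpoint with SPARQLWrapper. The following is a minimal sketch rather than the module's actual code: the helper names `ask()` and `exists_query()` are hypothetical, and the query text mirrors what the module prints in verbose mode (see the next section). It assumes that SPARQLWrapper is installed, that the public endpoint is reachable, and that the name is already formatted and contains no characters that need escaping.

```python
ENDPOINT = "https://dbpedia.org/sparql"  # public DBpedia SPARQL endpoint


def exists_query(name: str) -> str:
    """Build an ASK query for an already formatted entry name (hypothetical helper)."""
    # The owl: prefix is not declared here; like the module's own query,
    # this relies on the prefixes that the DBpedia endpoint predefines.
    return (
        "PREFIX dbpedia: <http://dbpedia.org/resource/>\n"
        f"ASK WHERE {{ dbpedia:{name} a owl:Thing }}"
    )


def ask(query: str, endpoint: str = ENDPOINT) -> bool:
    """Send an ASK query and return its yes/no answer (hypothetical helper)."""
    # Imported here so the query builder above can be used without
    # the third-party SPARQLWrapper package installed.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    # For ASK queries the JSON result carries the answer under "boolean".
    return bool(sparql.query().convert()["boolean"])


# Usage (needs network access):
# ask(exists_query("Jellied_eels"))
```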

## Verbose mode

The module-level flag `VERBOSE` controls whether the text of each SPARQL query is printed when the query is made. This can be useful for debugging.

In [5]:
dbptools.VERBOSE = True

dbptools.entry_exists('jellied eels')


PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
ASK WHERE { dbpedia:Jellied_eels a owl:Thing }


True

Verbose mode can also be useful for learning how SPARQL works. In the example above we can see that the body of the query (in curly braces) asks whether the requested entry is of class `owl:Thing`. This is the root class of the OWL vocabulary that DBpedia uses, so every entry present in the database is of this type.

We also see that two 'prefixes' have been prepended to the query automatically. Each prefix is shorthand for a namespace. For example, the `dbpedia` prefix stands for `http://dbpedia.org/resource/`, the namespace of DBpedia entries, so `dbpedia:Jellied_eels` is short for `http://dbpedia.org/resource/Jellied_eels`. The `owl` prefix is not declared in the query; it is among the prefixes that the DBpedia endpoint predefines.

The main query is an `ASK` query, which asks a yes/no question. In this case, the question is: Is Jellied_eels a Thing?
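To make the structure of the printed query concrete, here is a minimal sketch of how the prefixes and the `ASK` body could be assembled and printed in verbose mode. The names `PREFIXES` and `build_ask()` are hypothetical; only `VERBOSE` and the query text come from the output above.

```python
VERBOSE = True  # stands in for the module's flag of the same name

# Prefix label -> namespace, as printed in the verbose output.
PREFIXES = {
    "dbpedia": "http://dbpedia.org/resource/",
    "dbo": "http://dbpedia.org/ontology/",
}


def build_ask(pattern: str) -> str:
    """Prepend the PREFIX declarations to an ASK query body."""
    header = "\n".join(
        f"PREFIX {label}: <{uri}>" for label, uri in PREFIXES.items()
    )
    query = f"{header}\nASK WHERE {{ {pattern} }}"
    if VERBOSE:
        # Show exactly what would be sent to the endpoint.
        print(query)
    return query


query = build_ask("dbpedia:Jellied_eels a owl:Thing")
```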

We can leave verbose mode on for the remaining examples to see more queries in action.

## Example functions

The module provides three more functions as an illustration of other simple `ASK` queries.

`is_person()` asks whether an entry refers to a person.

In [6]:
dbptools.is_person('Nicki Minaj')


PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
ASK WHERE { dbpedia:Nicki_Minaj a owl:Thing }

PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
ASK WHERE { dbpedia:Nicki_Minaj a dbo:Person }


True

Again, this is accomplished by using the `a` predicate (SPARQL shorthand for `rdf:type`) followed by a class. But this time, the class `Person` comes from a database of definitions (an 'ontology') provided as part of the DBpedia project. We can see this because it is preceded by the `dbo` prefix, which is defined at the beginning of the query as pointing to the online location of this ontology, `http://dbpedia.org/ontology/`.

We can also see that the `entry_exists()` query was issued first. This is because `is_person()` is decorated with a decorator provided in the module, called `check_exists()`. This decorator applies `entry_exists()` to the first argument of the decorated function, and raises `NotInDBPediaError` if the entry could not be found.

In [7]:
dbptools.is_person('some non-notable person I met once at a convention')


PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
ASK WHERE { dbpedia:Some_non-notable_person_I_met_once_at_a_convention a owl:Thing }


NotInDBPediaError: 'Some_non-notable_person_I_met_once_at_a_convention' not in DBpedia.
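The decorator pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the module's code: `entry_exists()` and `is_person()` are replaced by offline stand-ins backed by a small hypothetical set `KNOWN`, names are assumed to be already formatted, and the error message copies the output shown above.

```python
import functools

KNOWN = {"Nicki_Minaj", "Jellied_eels"}  # hypothetical offline stand-in for DBpedia


class NotInDBPediaError(Exception):
    """Raised when an entry cannot be found."""


def entry_exists(name: str) -> bool:
    # Stand-in: the real function sends an ASK query to the endpoint.
    return name in KNOWN


def check_exists(func):
    """Run entry_exists() on the first argument before calling func."""
    @functools.wraps(func)  # keep the decorated function's name and docstring
    def wrapper(name, *args, **kwargs):
        if not entry_exists(name):
            raise NotInDBPediaError(f"'{name}' not in DBpedia.")
        return func(name, *args, **kwargs)
    return wrapper


@check_exists
def is_person(name: str) -> bool:
    # Stand-in body: the real function asks whether the entry is a dbo:Person.
    return name == "Nicki_Minaj"
```

Putting the existence check in a decorator keeps each query function down to its own question, at the cost of one extra request per call.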

Two other functions, `is_politician()` and `is_dead()`, work in a similar way.
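The queries behind these two functions are not shown here, so the following is only a guess at their general shape. It assumes that a politician check tests membership of the ontology class `dbo:Politician` and that a death check tests whether the entry has any value for the property `dbo:deathDate`. The helper names are hypothetical.

```python
def class_pattern(name: str, dbo_class: str) -> str:
    """Triple pattern: is the entry an instance of a DBpedia ontology class?"""
    return f"dbpedia:{name} a dbo:{dbo_class}"


def property_pattern(name: str, dbo_property: str) -> str:
    """Triple pattern: does the entry have any value for a property?"""
    # ?value is a variable; an ASK query succeeds if anything matches it.
    return f"dbpedia:{name} dbo:{dbo_property} ?value"


# Assumed patterns, to be wrapped in "ASK WHERE { ... }" with the usual prefixes.
print(class_pattern("Ronald_Reagan", "Politician"))
print(property_pattern("Ronald_Reagan", "deathDate"))
```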