# dbptools

An example module illustrating queries for the DBpedia SPARQL endpoint.

## DBpedia & SPARQL

[DBpedia](https://dbpedia.org/about) is a database that compiles structured information from Wikipedia and other Wikimedia projects. This information can be queried from a computer program without having to scrape and process the raw text of Wikipedia articles.

Entries in DBpedia are structured as linked data. Each entity (for example [Angela Merkel](https://dbpedia.org/page/Angela_Merkel) or [candiru](https://dbpedia.org/page/Candiru)) is of a particular type (for example 'agent' or 'fish'), and has various properties or relations (for example 'birthDate' or 'taxon').

[SPARQL](https://en.wikipedia.org/wiki/SPARQL) is a query language similar to SQL, with some additional features specifically designed for querying linked data. DBpedia provides an [online form](https://dbpedia.org/sparql) for testing SPARQL queries against the database.

The example module [dbptools.py](dbptools.py) in this repository illustrates basic use of the DBpedia SPARQL endpoint from Python. It relies heavily on an existing Python package for handling SPARQL queries, [SPARQLWrapper](https://github.com/RDFLib/sparqlwrapper).
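As a rough illustration of what such a request involves, here is a minimal sketch of sending a query with SPARQLWrapper and reading the yes/no answer of an `ASK` query. The helper names `run_query` and `ask_result` are made up for this example and are not taken from dbptools.py. The sketch assumes that the endpoint accepts queries at the same URL as the online form and that results are requested as JSON.

```python
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"  # assumed: same URL as the online form


def run_query(query, endpoint=DBPEDIA_ENDPOINT):
    """Send a SPARQL query and return the decoded JSON result as a dict."""
    # Imported lazily so ask_result() can be tried without the package installed.
    from SPARQLWrapper import SPARQLWrapper, JSON

    client = SPARQLWrapper(endpoint)
    client.setQuery(query)
    client.setReturnFormat(JSON)
    return client.query().convert()


def ask_result(result):
    """Return the truth value carried by the JSON result of an ASK query."""
    return bool(result["boolean"])
```

With network access, a call such as `ask_result(run_query(query_text))` would send one request to the endpoint and return a Python boolean.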

In [1]:
import dbptools

## DBPEntity

The module uses the `DBPEntity` class to represent entities in DBpedia. To initialize an instance of the class, supply the name of an entry from Wikipedia. The optional argument `verbose` controls whether the text of each SPARQL query is printed.

In [2]:
entity = dbptools.DBPEntity('Angela Merkel', verbose=True)


PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
ASK WHERE { dbpedia:Angela_Merkel a owl:Thing }


One query is already issued at initialization. This `ASK` query checks that the requested entity exists in DBpedia by asking whether it is an instance of the base class `Thing`.

If the requested entity is not in DBpedia, we get a `NotInDBPediaError` exception.

In [3]:
dbptools.DBPEntity('Angelika Murkle')

NotInDBPediaError: ('Angelika_Murkle', 'not in DBPedia.')
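
The existence check can be sketched as follows. This is an illustration under assumptions, not the code in dbptools.py: `EntityNotFoundError`, `existence_query` and `check_exists` are hypothetical names, and the function that actually sends the query is passed in as `ask` so that the logic can be tried without network access. The query text mirrors the `ASK` query printed earlier.

```python
PREFIXES = (
    "PREFIX dbpedia: <http://dbpedia.org/resource/>\n"
    "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
)


class EntityNotFoundError(Exception):
    """Hypothetical stand-in for the module's own exception."""


def existence_query(name):
    """Build the ASK query that tests whether `name` is an owl:Thing."""
    # As in the printed query, owl: is not declared here; this presumably
    # relies on the endpoint predefining that prefix.
    return PREFIXES + "ASK WHERE { dbpedia:%s a owl:Thing }" % name


def check_exists(name, ask):
    """Raise EntityNotFoundError unless `ask(query)` returns True.

    `ask` is any callable that takes query text and returns a boolean.
    """
    if not ask(existence_query(name)):
        raise EntityNotFoundError(name, "not in DBpedia.")
```

Passing the query runner as an argument is a design choice made for this sketch only; it keeps the check separate from the network call.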

At initialization, the requested name is also 'resolved'. This involves first converting it to Wikipedia format (with spaces replaced by underscores) and then checking whether the name redirects to another entity.

The final resolved Wikipedia entity name is available in the `resolved_name` attribute.

In [4]:
entity = dbptools.DBPEntity('Angie Merkel', verbose=True)

entity.resolved_name


PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
ASK WHERE { dbpedia:Angie_Merkel a owl:Thing }

PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?entity WHERE {dbpedia:Angie_Merkel dbo:wikiPageRedirects ?entity} LIMIT 1


'Angela_Merkel'
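
The two resolution steps described above can be sketched like this. The function names `to_wikipedia_name`, `redirect_query` and `resolve` are hypothetical and not taken from dbptools.py. The sketch assumes that the `SELECT` result arrives in the standard SPARQL JSON results layout (`results` → `bindings`) and that a name without a redirect is kept as it is.

```python
RESOURCE_BASE = "http://dbpedia.org/resource/"
PREFIXES = (
    "PREFIX dbpedia: <http://dbpedia.org/resource/>\n"
    "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
)


def to_wikipedia_name(name):
    """Convert a human-readable name to Wikipedia format (underscores)."""
    return name.strip().replace(" ", "_")


def redirect_query(name):
    """Build the SELECT query that looks up a redirect target for `name`."""
    return PREFIXES + (
        "SELECT ?entity WHERE {dbpedia:%s dbo:wikiPageRedirects ?entity} LIMIT 1"
        % name
    )


def resolve(name, result):
    """Return the redirect target found in `result`, or `name` if there is none."""
    bindings = result["results"]["bindings"]
    if not bindings:
        return name  # no redirect: the name is already the final one
    uri = bindings[0]["entity"]["value"]
    # Strip the resource namespace to get back a bare entity name.
    if uri.startswith(RESOURCE_BASE):
        return uri[len(RESOURCE_BASE):]
    return uri
```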

## Example methods

A few methods illustrate how `DBPEntity` could be extended to make other queries about an entity, for example `is_person()` or `is_politician()`.

In [5]:
entity.is_person()


PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
ASK WHERE { dbpedia:Angela_Merkel a dbo:Person }


True

In [6]:
entity.is_politician()


PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
ASK WHERE { dbpedia:Angela_Merkel a dbo:Politician }


True

Many such queries are made simpler by the [DBpedia ontology](https://dbpedia.org/ontology/), which defines classes such as `Person` and `Politician`. But sometimes there is no predefined class for the query we wish to make. For example, to find out whether a person is dead, we have to check several death-related properties.

In [7]:
entity.is_dead()


PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
ASK WHERE { dbpedia:Angela_Merkel dbo:deathPlace|dbo:deathDate|dbo:deathCause|dbo:bodyDiscovered|dbo:placeOfBurial|dbo:deathYear|dbo:causeOfDeath|dbo:dateOfBurial|dbo:deadInFightDate|dbo:deadInFightPlace|dbo:deathAge ?value }


False
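
The long predicate in this query is a SPARQL property path: `|` separates alternatives, so the pattern matches if the entity has a value for any one of the listed properties. A minimal sketch of assembling such a query from a list of property names follows. `any_property_query` is a hypothetical helper, not a function from dbptools.py, and the list shown is only a subset of the properties in the printed query.

```python
PREFIXES = (
    "PREFIX dbpedia: <http://dbpedia.org/resource/>\n"
    "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
)

# A few of the death-related properties from the printed query.
DEATH_PROPERTIES = ["deathPlace", "deathDate", "deathCause", "placeOfBurial"]


def any_property_query(name, properties):
    """Build an ASK query that is true if `name` has any of `properties`."""
    if not properties:
        raise ValueError("at least one property is required")
    # Join the alternatives with '|' to form a property path.
    path = "|".join("dbo:" + prop for prop in properties)
    return PREFIXES + "ASK WHERE { dbpedia:%s %s ?value }" % (name, path)
```

Keeping the property names in a plain list makes it easy to add or drop alternatives without editing the query template.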