This is the SPARQL endpoint analyzer component of PepeSearch and friends: tools developed at the University of Oslo for simple exploration and querying of Linked Open Data sets.
Try PepeSearch in action with data from the Norwegian Company Registry. There is also an advanced version.
The user interface of PepeSearch employs a configuration file in order to present the user with available concepts, links and properties. This program automates the task of creating that configuration file.
- Explore the source tree
- As source (requires sbcl, version 1.0.44 or newer)
- As Linux binary: amd64 (from 22/05/14)
- As OS X binary: amd64 (from 29/04/14)
The source and binary versions are run in the same way:
./sparql-endpoint-analyzer.lisp settings.conf 1>output.js
This command reads the configuration file settings.conf and puts the result into the file output.js.
A configuration file for the SPARQL endpoint analyzer consists of 11 sections, described below. For a sample configuration file, see the next section.
- endpoint: URL of the SPARQL endpoint of interest.
- username: optional username for HTTP Basic authentication.
- password: optional password for HTTP Basic authentication.
- output-level: how verbose the program should be. Set this to “debug” to get a detailed log of all queries sent.
- hard-limit: hard limit for each query sent, before filtering is done. This affects the number of retrieved concepts, literals, links and subclass relations.
- page-limit: maximum number of pages to retrieve when a query times out and a paged retrieval is attempted.
- results-per-page-limit: maximum number of results per page during a paged retrieval.
- prefixes: list of predefined prefixes for convenient naming of JavaScript variables.
- exclusive-whitelist: list of URI prefixes. Results will only include URIs prefixed by a string in this list. An empty list disables this feature. When enabled, this feature overrides the normal blacklist and whitelist.
- blacklist: list of URI prefixes. Results will not include URIs prefixed by a string in this list, unless listed in whitelist. An empty list disables this feature.
- whitelist: list of URI prefixes. Results will not include URIs prefixed by strings listed in blacklist, unless they’re prefixed by a string in this list. Useful when one wants to whitelist particular URIs in a blacklisted domain.
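The interplay between the three prefix lists can be sketched in a few lines of Python. This is only an illustration of the rules described above, not the analyzer's actual code; the function name is_included and its argument order are made up for the example.

```python
def is_included(uri, exclusive_whitelist, blacklist, whitelist):
    """Return True if `uri` passes the prefix filters (hypothetical helper)."""
    def has_prefix(prefixes):
        return any(uri.startswith(p) for p in prefixes)

    # A non-empty exclusive whitelist overrides the other two lists.
    if exclusive_whitelist:
        return has_prefix(exclusive_whitelist)
    # Blacklisted URIs are dropped unless a whitelist prefix also matches.
    if has_prefix(blacklist) and not has_prefix(whitelist):
        return False
    return True


blacklist = ["http://www.w3.org/2000/01/rdf-schema#"]
whitelist = ["http://www.w3.org/2000/01/rdf-schema#label"]

# rdfs:label is rescued by the whitelist, rdfs:comment is not.
print(is_included("http://www.w3.org/2000/01/rdf-schema#label", [], blacklist, whitelist))
print(is_included("http://www.w3.org/2000/01/rdf-schema#comment", [], blacklist, whitelist))
```

With empty lists every URI is kept, which matches the statement that an empty list disables the corresponding feature.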
The following configuration file can be used to slurp the Norwegian Entity Registry endpoint:
# Sample configuration file to retrieve information from the Norwegian Entity
# Registry (http://data.computas.com/)
[endpoint]
http://data.computas.com:3030/sparql
[username]
[password]
[output-level]
normal
[hard-limit]
10000
[page-limit]
5
[results-per-page-limit]
10000
[prefixes]
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX lok: <http://data.computas.com/informasjonsmodell/lokasjon/>
PREFIX nace: <http://data.computas.com/enhetsregisteret/nace/>
PREFIX org: <http://data.computas.com/informasjonsmodell/organisasjon/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX reg: <http://data.computas.com/informasjonsmodell/regnskapsregisteret/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
[exclusive-whitelist]
[whitelist]
http://www.w3.org/2000/01/rdf-schema#label
[blacklist]
http://www.w3.org/1999/02/22-rdf-syntax-ns#
http://www.w3.org/2002/07/owl#
http://www.w3.org/2004/02/skos/core#
http://www.w3.org/2000/01/rdf-schema#
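For readers who want to inspect such a file from another tool, here is a minimal Python sketch of a reader for this layout. It is not the analyzer's own parser (which is written in Lisp); it assumes, based only on the sample above, that a line starting with # is a comment, that [name] opens a section, and that every other non-empty line is a value of the current section. The function name parse_settings is hypothetical.

```python
def parse_settings(text):
    """Parse the section-based settings format into {section: [lines]}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        # Only a leading '#' marks a comment; URIs may legitimately end in '#'.
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


sample = """\
# Sample configuration file
[endpoint]
http://data.computas.com:3030/sparql
[username]
[hard-limit]
10000
[blacklist]
http://www.w3.org/2002/07/owl#
"""

settings = parse_settings(sample)
print(settings["endpoint"][0])
print(settings["username"])
```

An empty section such as [username] comes back as an empty list, which corresponds to leaving an optional setting unset.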
The PepeSearch configuration file is in the JSON (JavaScript Object Notation) data format, which integrates naturally with the existing JavaScript code base. Four categories of data are collected in this file:
- Types
- Object properties
- Datatype properties
- Subclass relations
Each of these categories is described in the sections that follow.
Every type found in the dataset is recorded. That is, every ?type matched by the following RDF triple:
?concept a ?type .
Types are mapped to concepts in the user interface. Together with its URI, each type entry also contains a short ID for convenience, a human-readable label with possible translations, the ID of a human-readable datatype property for use in the interface, and whether or not the type has any subtypes.
Example entry:
{
"id": "foaf_Person",
"uri": "http://xmlns.com/foaf/0.1/Person",
"label": {
"en": "Person"
},
"display": "foaf_name",
"primary": true
}
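The short ID in the example appears to combine a prefix name from the prefixes section with the local part of the URI. The exact naming scheme is not documented here, so the following Python sketch is a guess inferred from the example entry alone; short_id is a hypothetical helper.

```python
def short_id(uri, prefixes):
    """Derive an ID like 'foaf_Person' from a URI and a {name: namespace} map.

    Assumed scheme: '<prefix name>_<local name>', using the longest
    matching namespace. Returns None if no prefix matches.
    """
    best = None
    for name, namespace in prefixes.items():
        if uri.startswith(namespace):
            if best is None or len(namespace) > len(best[1]):
                best = (name, namespace)
    if best is None:
        return None
    name, namespace = best
    return name + "_" + uri[len(namespace):]


prefixes = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}
print(short_id("http://xmlns.com/foaf/0.1/Person", prefixes))
```

Local names containing characters such as "/" or "-" would not yield valid JavaScript identifiers under this scheme; the sketch does not handle that case.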
We define an object property as any RDF property linking two resources that have an RDF type. That is, every ?object_property matched by the following RDF graph:
?subject a ?subject_type .
?subject ?object_property ?object .
?object a ?object_type .
Object properties are mapped to incoming and outgoing links in the user interface. Objects become targets of the subjects’ outgoing links, while subjects become targets of the objects’ incoming links.
We define datatype properties as literals linked to by concepts via any property. That is, every ?literal matched by the following RDF graph, filtered by the isLiteral SPARQL predicate:
?concept a ?type .
?concept ?property ?literal .
Subclasses are defined by the rdfs:subClassOf property. That is, every ?subclass matched by the following RDF graph, where ?subclass ≠ ?class:
?subclass rdfs:subClassOf ?class .
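To make the four graph patterns concrete, the Python sketch below wraps each one in a complete SPARQL SELECT query, with LIMIT standing in for hard-limit and OFFSET for a page during paged retrieval. The queries the analyzer really sends are not shown in this document, so the projected variables, the use of DISTINCT, and the names PATTERNS and build_query are all assumptions made for illustration.

```python
# Category -> (projected variables, graph pattern). Projections are guesses.
PATTERNS = {
    "types": ("?type", "?concept a ?type ."),
    "object_properties": (
        "?subject_type ?object_property ?object_type",
        "?subject a ?subject_type . "
        "?subject ?object_property ?object . "
        "?object a ?object_type .",
    ),
    "datatype_properties": (
        "?type ?property",
        "?concept a ?type . "
        "?concept ?property ?literal . "
        "FILTER(isLiteral(?literal))",
    ),
    "subclasses": (
        "?subclass ?class",
        "?subclass rdfs:subClassOf ?class . FILTER(?subclass != ?class)",
    ),
}


def build_query(category, limit, offset=0):
    """Return a SPARQL query string for one category of collected data."""
    variables, body = PATTERNS[category]
    query = (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        f"SELECT DISTINCT {variables}\n"
        f"WHERE {{ {body} }}\n"
        f"LIMIT {limit}"
    )
    if offset:
        # Paged retrieval: skip the results of earlier pages.
        query += f"\nOFFSET {offset}"
    return query


print(build_query("subclasses", 10000))
print(build_query("types", 10000, offset=20000))
```

SPARQL does not guarantee a stable result order without ORDER BY, so a real paged retrieval may need an explicit ordering to avoid overlapping or missing rows between pages.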