Skip to content

simenheg/sparql-endpoint-analyzer

Repository files navigation

SPARQL endpoint analyzer

This is the SPARQL endpoint analyzer component of PepeSearch and friends: tools developed at the University of Oslo for simple exploration and querying of Linked Open Data sets.

Try PepeSearch in action with data from the Norwegian Company Registry. There is also an advanced version.

The user interface of PepeSearch employs a configuration file in order to present the user with available concepts, links and properties. This program automates the task of creating that configuration file.

Download

Usage

Source- and binary versions are run in the same way:

./sparql-endpoint-analyzer.lisp settings.conf 1>output.js

This command reads the configuration file settings.conf, and puts the result into the file output.js.

Configuration

A configuration file for the SPARQL endpoint analyzer consists of 11 sections, described below. For a sample configuration file, see the next section.

  • endpoint: URL of the SPARQL endpoint of interest.
  • username: optional username for HTTP Basic authentication.
  • password: optional password for HTTP Basic authentication.
  • output-level: how verbose the program should be. Set this to “debug” to get a detailed log of all queries sent.
  • hard-limit: hard limit for each query sent, before filtering is done. This affects the number of retrieved concepts, literals, links and subclass relations.
  • page-limit: maximum number pages to retrieve after getting a timeout, and attempting a paged retrieval.
  • results-per-page-limit: maximum number of results per page during a paged retrieval.
  • prefixes: list of predefined prefixes for convenient naming of JavaScript variables.
  • exclusive-whitelist: list of URI prefixes. Results will only include URIs prefixed by a string in this list. An empty list disables this feature. When enabled, this feature overrides the normal black- and whitelist.
  • blacklist: list of URI prefixes. Results will not include URIs prefixed by a string in this list, unless listed in whitelist. An empty list disables this feature.
  • whitelist: list of URI prefixes. Results will not include URIs prefixed by strings listed in blacklist, unless they’re prefixed by a string in this list. Useful when one wants to whitelist particular URIs in a blacklisted domain.

Sample Configuration File

The following configuration file can be used to slurp the Norwegian Entity Registry endpoint:

# Sample configuration file to retrieve information from the Norwegian Entity
# Registry (http://data.computas.com/)

[endpoint]
http://data.computas.com:3030/sparql

[username]

[password]

[output-level]
normal

[hard-limit]
10000

[page-limit]
5

[results-per-page-limit]
10000

[prefixes]
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX lok: <http://data.computas.com/informasjonsmodell/lokasjon/>
PREFIX nace: <http://data.computas.com/enhetsregisteret/nace/>
PREFIX org: <http://data.computas.com/informasjonsmodell/organisasjon/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX reg: <http://data.computas.com/informasjonsmodell/regnskapsregisteret/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

[exclusive-whitelist]

[whitelist]
http://www.w3.org/2000/01/rdf-schema#label

[blacklist]
http://www.w3.org/1999/02/22-rdf-syntax-ns#
http://www.w3.org/2002/07/owl#
http://www.w3.org/2004/02/skos/core#
http://www.w3.org/2000/01/rdf-schema#

What is collected

The PepeSearch configuration file is in the JSON (JavaScript Object Notation) data format, which integrates naturally with the existing JavaScript code base. Four categories of data are collected in this file:

  • Types
  • Object properties
  • Datatype properties
  • Subclass relations

For each of these categories, further details are elaborated on in the sections that follow.

Types

Every type found in the dataset is recorded. That is, every ?type matched by the following RDF triple:

?concept a ?type .

Types are mapped to concepts in the user interface. Together with its URI, each type entry also contains a short ID for convenience, a human-readable label with possible translations, the ID of a human-readable datatype property for use in the interface, and whether or not the type has any subtypes.

Example entry:

{
    "id": "foaf_Person",
    "uri": "http://xmlns.com/foaf/0.1/Person",
    "label": {
        "en": "Person"
    },
    "display": "foaf_name",
    "primary": true
}

Object properties

We define an object property as any RDF property linking two resources that have an RDF type. That is, every ?object_property matched by the following RDF graph:

?subject a ?subject_type .
?subject ?object_property ?object .
?object a ?object_type .

Object properties are mapped to incoming- and outgoing links in the user interface. Objects become targets of the subjects’ outgoing links, while the subjects become target of the objects’ incoming links.

Datatype properties

We define datatype properties as literals linked to by concepts via any property. That is, every ?literal matched by the following RDF graph, filtered by the isLiteral SPARQL predicate:

?concept a ?type .
?concept ?property ?literal .

Subclass relations

Subclasses are defined by the rdfs:subClassOf property. That is, every ?subclass matched by the following RDF graph, where ?subclass?class:

?subclass rdfs:subClassOf ?class .

About

SPARQL endpoint analyzer component of PepeSearch and friends

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages