ElasticTriples

Elasticsearch powered triple storage.

Preparation: Elasticsearch installation

To use ElasticTriples, you first need a running Elasticsearch instance. It can be installed directly or run via Docker.

Preparation: Big data

If you want to process large files, the data should be available in N-Triples format (rather than, e.g., Turtle). N-Triples stores one triple per line, so a file can be read as a stream. One tool for the conversion is RDF2RDF. After installing it, run {GOPATH}/bin/rdf2rdf -in=in.ttl -out=out.nt to transform your data.
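
To illustrate why a line-based format helps with large files, here is a minimal sketch that counts triples in an N-Triples stream without loading the file into memory. The class name `NTriplesLineCounter` is made up for this example and is not part of ElasticTriples; the check is a simple line heuristic, not a full N-Triples parser.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of ElasticTriples.
final class NTriplesLineCounter {

    /** Counts lines that look like triples: not blank, not a comment, ending with '.'. */
    static long countTriples(BufferedReader reader) {
        return reader.lines()                       // lazily reads one line at a time
                .map(String::trim)
                .filter(line -> !line.isEmpty()
                        && !line.startsWith("#")    // N-Triples comment line
                        && line.endsWith("."))      // every triple is terminated by '.'
                .count();
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            // Usage: java NTriplesLineCounter.java out.nt
            try (BufferedReader reader =
                         Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
                System.out.println(countTriples(reader));
            }
        } else {
            // Small in-memory sample so the sketch runs without a file.
            String sample = "# sample data\n"
                    + "<http://example.org/s> <http://example.org/p> \"o\" .\n"
                    + "<http://example.org/s> <http://example.org/p> <http://example.org/o> .\n";
            System.out.println(countTriples(new BufferedReader(new StringReader(sample))));
        }
    }
}
```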

Import data

Importing around 90 million triples takes about 77 minutes, or roughly 19,400 triples per second (89,902,895 triples; 10.3 GB in Turtle format; 16.3 GB in N-Triples format; import time: 4642.451 seconds). A code example is given in OpalImport.java.
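
This README does not describe how OpalImport.java sends data to Elasticsearch. A common approach for imports of this size is the Elasticsearch bulk API, which accepts newline-delimited JSON with one action line and one document line per entry. The sketch below only builds such a request body. The index name `triples` and the field names `subject`, `predicate` and `object` are assumptions made for illustration, not the mapping used by ElasticTriples, and the action line without a type assumes Elasticsearch 7 or later.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not the ElasticTriples import code.
final class BulkBodySketch {

    /** Assumed index name, chosen for this example only. */
    static final String INDEX = "triples";

    /** Escapes a string for use inside a JSON string literal. */
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c)); // other control characters
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Builds an NDJSON bulk body: an action line followed by a document line per triple. */
    static String bulkBody(List<String[]> triples) {
        StringBuilder sb = new StringBuilder();
        for (String[] t : triples) {
            sb.append("{\"index\":{\"_index\":\"").append(INDEX).append("\"}}\n");
            sb.append("{\"subject\":\"").append(jsonEscape(t[0]))
              .append("\",\"predicate\":\"").append(jsonEscape(t[1]))
              .append("\",\"object\":\"").append(jsonEscape(t[2]))
              .append("\"}\n"); // the bulk API requires a trailing newline
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String[]> triples = Arrays.asList(new String[][] {
            {"<http://example.org/d>", "<http://purl.org/dc/terms/title>", "\"Title\"@en"}
        });
        System.out.print(bulkBody(triples));
    }
}
```

A body like this would be sent with POST to the `_bulk` endpoint using the content type `application/x-ndjson`. Sending many triples per request rather than one request per triple is what usually makes imports of this scale practical.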

Query data

A search query takes around 2-3 seconds. For example, extracting one DCAT dataset (out of a million) with 206 triples uses 2,281 queries bundled into 3 multi-queries. A code example is given in OpalQuery.java.
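
One way to read the figures above is a level-by-level traversal: each round collects all resources discovered so far and looks them up together in one batched request. The sketch below shows that idea against an in-memory list instead of Elasticsearch. It is an assumption about the general technique, not a description of OpalQuery.java; the `BatchLookup` interface, the `maxRounds` limit and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a level-wise graph extraction with batched lookups.
final class GraphExtractionSketch {

    /** Stand-in for one multi-query: returns all triples whose subject is in the batch. */
    interface BatchLookup {
        List<String[]> triplesWithSubjects(Set<String> subjects);
    }

    /** In-memory lookup used here in place of Elasticsearch. */
    static BatchLookup inMemory(List<String[]> store) {
        return subjects -> {
            List<String[]> out = new ArrayList<>();
            for (String[] t : store) {
                if (subjects.contains(t[0])) out.add(t);
            }
            return out;
        };
    }

    /** Collects triples reachable from root, issuing at most maxRounds batched lookups. */
    static List<String[]> extract(String root, BatchLookup lookup, int maxRounds) {
        List<String[]> result = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Set<String> frontier = new LinkedHashSet<>();
        frontier.add(root);
        for (int round = 0; round < maxRounds && !frontier.isEmpty(); round++) {
            visited.addAll(frontier);
            List<String[]> found = lookup.triplesWithSubjects(frontier); // one batched round trip
            Set<String> next = new LinkedHashSet<>();
            for (String[] t : found) {
                result.add(t);
                String object = t[2];
                // Literals start with a quote and cannot be subjects, so they are not followed.
                if (!object.startsWith("\"") && !visited.contains(object)) {
                    next.add(object);
                }
            }
            frontier = next;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> store = Arrays.asList(new String[][] {
            {"<http://example.org/d1>", "<http://purl.org/dc/terms/title>", "\"A\"@en"},
            {"<http://example.org/d1>", "<http://www.w3.org/ns/dcat#distribution>", "<http://example.org/dist1>"},
            {"<http://example.org/dist1>", "<http://www.w3.org/ns/dcat#accessURL>", "<http://example.org/file>"},
            {"<http://example.org/d2>", "<http://purl.org/dc/terms/title>", "\"B\"@en"}
        });
        List<String[]> graph = extract("<http://example.org/d1>", inMemory(store), 3);
        System.out.println(graph.size() + " triples extracted");
    }
}
```

Following every resource without a limit could wander far beyond one dataset, which is why this sketch caps the number of rounds; a real implementation would need its own stopping rule.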

Split data

Splitting data is done by requesting single dataset graphs (2-3 seconds each) and writing the resulting data to files in N-Triples format. A code example is given in OpalSplitter.java.
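
The writing step can be sketched as follows, assuming each extracted dataset graph is available as a list of subject, predicate and object terms that are already in N-Triples syntax. The class name and the `dataset-<number>.nt` file naming are hypothetical and not taken from OpalSplitter.java.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not the ElasticTriples splitter.
final class NTriplesWriterSketch {

    /** Serializes one triple whose terms are already valid N-Triples terms. */
    static String toLine(String[] triple) {
        return triple[0] + " " + triple[1] + " " + triple[2] + " .";
    }

    /** Writes one dataset graph to its own file and returns the file path. */
    static Path writeDataset(Path directory, int number, List<String[]> triples) {
        List<String> lines = new ArrayList<>();
        for (String[] t : triples) {
            lines.add(toLine(t));
        }
        Path file = directory.resolve("dataset-" + number + ".nt"); // assumed naming scheme
        try {
            return Files.write(file, lines, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("split");
        List<String[]> triples = Arrays.asList(new String[][] {
            {"<http://example.org/d1>", "<http://purl.org/dc/terms/title>", "\"A\"@en"}
        });
        System.out.println("Wrote " + writeDataset(dir, 1, triples));
    }
}
```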

Filter data

Data can be filtered based on language tags of title literals. A code example is given in OpalSplitter.java.
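
As a rough illustration of filtering by language tag, the sketch below reads the tag from an N-Triples literal such as "Titel"@de and checks whether a dataset has a title in a given language. It assumes that titles use the predicate http://purl.org/dc/terms/title, as is usual in DCAT; the class and method names are hypothetical and the actual filter in ElasticTriples may work differently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical sketch, not the ElasticTriples filter.
final class TitleLanguageFilterSketch {

    /** Assumed title predicate (dct:title). */
    static final String TITLE = "<http://purl.org/dc/terms/title>";

    /** Returns the lower-cased language tag of an N-Triples literal, if it has one. */
    static Optional<String> languageTag(String objectTerm) {
        int lastQuote = objectTerm.lastIndexOf('"');
        if (!objectTerm.startsWith("\"") || lastQuote <= 0) {
            return Optional.empty(); // not a literal (IRI or blank node)
        }
        // Whatever follows the closing quote is either @lang, ^^<datatype> or nothing.
        String suffix = objectTerm.substring(lastQuote + 1);
        if (suffix.startsWith("@") && suffix.length() > 1) {
            return Optional.of(suffix.substring(1).toLowerCase(Locale.ROOT));
        }
        return Optional.empty();
    }

    /** True if any title triple of the dataset carries the given language tag. */
    static boolean hasTitleInLanguage(List<String[]> datasetTriples, String language) {
        String wanted = language.toLowerCase(Locale.ROOT); // language tags are case-insensitive
        for (String[] t : datasetTriples) {
            if (TITLE.equals(t[1]) && languageTag(t[2]).filter(wanted::equals).isPresent()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String[]> dataset = Arrays.asList(new String[][] {
            {"<http://example.org/d1>", TITLE, "\"Titel\"@de"}
        });
        System.out.println("German title present: " + hasTitleInLanguage(dataset, "de"));
    }
}
```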

Credits

Data Science Group (DICE) at Paderborn University

This work has been supported by the German Federal Ministry of Transport and Digital Infrastructure (BMVI) in the project Open Data Portal Germany (OPAL) (funding code 19F2028A).
