Processing SPARQL TOP-k Queries Online with Web Preemption

This repository contains the source code, the configuration files, the queries and the datasets used in the experimental study presented in the paper Processing SPARQL TOP-k Queries Online with Web Preemption.

If you have any questions, feel free to contact the authors:

Julien AIMONIER-DAVAT via julien.aimonier-davat@univ-nantes.fr
Pascal MOLLI via pascal.molli@univ-nantes.fr
Hala SKAF-MOLLI via hala.skaf@univ-nantes.fr

Setup

To get started quickly, run the following commands on a single machine. They install everything you need to reproduce our experimental results.

Clone and install the project.

git clone https://github.com/momo54/sage-orderby-experiment.git topk
cd topk

conda env create -f environment.yaml
conda activate topk

Install Virtuoso v7.2.6.

wget https://github.com/openlink/virtuoso-opensource/releases/download/v7.2.6.1/virtuoso-opensource-7.2.6.tar.gz
tar -zxvf virtuoso-opensource-7.2.6.tar.gz

cd virtuoso-opensource-7.2.6
./configure
make
make install

To run the experiments, the bin directory of Virtuoso must be included in your PATH variable.
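As a quick sanity check for the PATH requirement, here is a minimal Python sketch that reports which executables cannot be found. The helper name `missing_tools` is hypothetical; `isql` is the client used in the loading step of this README, and any other executable name you add is your own assumption about your installation.

```python
import shutil


def missing_tools(tools, path=None):
    """Return the names in `tools` that cannot be found on the search path.

    When `path` is None, the PATH environment variable is used.
    """
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


# "isql" is the client used to load data into Virtuoso in this README.
missing = missing_tools(["isql"])
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All listed tools were found on PATH.")
```

If `isql` is reported as missing, add the Virtuoso bin directory to PATH in your shell profile and open a new terminal before running the experiments.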

Install SaGe.

# In the main directory of the GitHub repository
git clone https://github.com/sage-org/sage-engine.git

cd sage-engine
git checkout topk-xp

poetry install --extras "hdt"

Download RDF datasets.

# In the main directory of the GitHub repository
pip install gdown
gdown "https://drive.google.com/uc?id=1a-HxE-PxrwWBW70CDvAYeCTJTDPl45R0"
tar -zxvf datasets.tar.gz

Load data into Virtuoso.

isql "EXEC=ld_dir('datasets', '*.nt', 'http://example.com/datasets/default');"
isql "EXEC=rdf_loader_run();"
isql "EXEC=checkpoint;"

The Virtuoso installation can be skipped if you are not interested in checking the correctness and completeness of query results.

Quickstart

Experiments are powered by Snakemake, a scientific workflow management system written in Python. Once all configuration files are defined, run the following commands. Snakemake will generate an archive xp.tar.gz in the specified output directory. The data files in this archive can be loaded and visualized using the provided Jupyter notebook.

snakemake --configfile config/xp-watdiv.yaml -j1

snakemake --configfile config/xp-wikidata.yaml -j1

jupyter notebook topk.ipynb
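Before opening the notebook, it can be useful to confirm what Snakemake actually wrote into the archive. The following is a minimal sketch using only the Python standard library. The helper name `list_archive` and the example path `output/xp.tar.gz` are hypothetical, so adjust the path to the output directory set in your configuration file.

```python
import tarfile


def list_archive(archive_path):
    """Return the sorted names of the regular files stored in a .tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return sorted(m.name for m in archive.getmembers() if m.isfile())


# Example (hypothetical path; use the `output` value of your configuration):
# for name in list_archive("output/xp.tar.gz"):
#     print(name)
```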

Configuration files

Experiments are defined using YAML configuration files available in the config directory. Configuration files follow this template:

name: ... # the name of the configuration file
output: ... # output directory where data files will be generated
autostart: ... # True to let snakemake start SaGe and Virtuoso servers, False otherwise
endpoints:
  sage:
    url: ... # URL of the SaGe endpoint
    graph: ... # IRI of an RDF graph
  virtuoso:
    url: ... # URL of the Virtuoso endpoint
    graph: ... # IRI of an RDF graph
experiments:
  xp_1: # a name for the experiment
    approaches: [...] # accepted values are "sage", "sage-topk" or "sage-partial-topk"
    workloads: [...] # accepted values are "watdiv", "watdiv-desc" or "wikidata"
    limits: [...] # tested k, i.e. number of results returned by TOP-k queries
    runs: [...] # any identifier from 0 to 9 to differentiate each run. The mean of the runs will be computed later...
    quotas: [...] # tested quotas, i.e. duration of a quantum for SaGe
    stateless: ... # False to store saved query plans on the server, True otherwise
    early_pruning: ... # True to enable early-pruning, False otherwise
    max_limit: ... # limit K for the SaGe server
    check: ... # True to check query results using Virtuoso, False otherwise
  ...
  xp_n: ...
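To make the template more concrete, here is a small Python sketch that checks one experiment entry against the accepted values listed in the comments above and counts how many parameter combinations it describes. It is not part of the repository: the helper names, the example entry `xp_1` and its values are hypothetical, and the full cross product is an assumption about how the lists are combined (the Snakefile may expand them differently).

```python
from itertools import product

# Accepted values, taken from the comments of the configuration template.
APPROACHES = {"sage", "sage-topk", "sage-partial-topk"}
WORKLOADS = {"watdiv", "watdiv-desc", "wikidata"}


def check_experiment(xp):
    """Return a list of problems found in one experiment entry (empty if none)."""
    problems = []
    for approach in xp.get("approaches", []):
        if approach not in APPROACHES:
            problems.append(f"unknown approach: {approach}")
    for workload in xp.get("workloads", []):
        if workload not in WORKLOADS:
            problems.append(f"unknown workload: {workload}")
    for run in xp.get("runs", []):
        if run not in range(10):
            problems.append(f"run identifier out of range 0-9: {run}")
    return problems


def combinations(xp):
    """Enumerate (approach, workload, limit, quota, run) tuples.

    Assumption: every listed value is combined with every other one.
    """
    return list(product(xp["approaches"], xp["workloads"],
                        xp["limits"], xp["quotas"], xp["runs"]))


# Hypothetical entry, written as the dictionary a YAML parser would return.
xp_1 = {
    "approaches": ["sage", "sage-topk"],
    "workloads": ["watdiv"],
    "limits": [1, 10],
    "quotas": [100],  # illustrative value only
    "runs": [0, 1, 2],
}
print(check_experiment(xp_1))   # → []
print(len(combinations(xp_1)))  # → 12 (2 * 1 * 2 * 1 * 3)
```

Counting combinations this way gives a rough idea of how long a configuration will take before launching it with `-j1`.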
