Example: setting up a combinations based Linked Data Fragments experiment
This guide gives a quick example of how jbr can be used to initialize, prepare, and run a combinations-based experiment, which repeats the same experiment for different factor values.
This guide assumes knowledge about how to set up a basic jbr experiment.
In this guide, we're going to create an experiment for measuring the performance of Comunica over an LDF server (with an NGINX cache) using the WatDiv benchmark. Our experiment will have two combinations:
- LDF query execution with NGINX cache
- LDF query execution without NGINX cache
To initialize a combinations-based experiment, pass the -c flag:
$ jbr init -c watdiv ldf-performance-combinations
Initialized new combinations-based experiment in /Users/rtaelman/experiments/ldf-performance-combinations
This experiment requires handlers for the following hooks before it can be used:
- hookSparqlEndpoint
Initialize these hooks by calling 'jbr set-hook <hook> <handler>'
✨ Done in 11.30s
After executing this command, the ldf-performance-combinations directory will have been created, so let's navigate to it:
$ cd ldf-performance-combinations
The output of the init command told us that we still need to configure a handler for the hookSparqlEndpoint hook, which will be the SPARQL endpoint our benchmark targets.
Let's plug in a hook for the LDF-based engine:
$ jbr set-hook hookSparqlEndpoint sparql-endpoint-ldf
Handler 'sparql-endpoint-ldf' has been set for hook 'hookSparqlEndpoint' in experiment 'ldf-performance-combinations'
This hook requires the following sub-hooks before it can be used:
- hookSparqlEndpoint/hookSparqlEndpointLdfEngine
Initialize these hooks by calling 'jbr set-hook <hook> <handler>'
✨ Done in 2.80s
As the output shows, this hook requires another sub-hook, which is needed to execute SPARQL queries over the Triple Pattern Fragments interface of the LDF server.
For this, we can use the Comunica engine:
$ jbr set-hook hookSparqlEndpoint/hookSparqlEndpointLdfEngine sparql-endpoint-comunica
Handler 'sparql-endpoint-comunica' has been set for hook 'hookSparqlEndpoint/hookSparqlEndpointLdfEngine' in experiment 'ldf-performance-combinations'
If you inspect the contents of your experiment directory, you will see that it does not contain a jbr-experiment.json file.
Instead, it contains a jbr-experiment.json.template file, which is a parameterized file that is instantiated any number of times based on the factors that are defined in jbr-combinations.json.
Our goal is to define two experiment combinations: one where the Comunica engine targets the NGINX cache, and one where it bypasses the cache and directly targets the LDF server.
We can define the query target of the Comunica engine via the input/context-client.json file.
Since we need two variants of this file, we copy it so that we have the following two instances:
input/context-client-cache.json:
{
  "sources": [ "http://cache/dataset" ]
}
input/context-client-nocache.json:
{
  "sources": [ "http://ldfserver/dataset" ]
}
Next, we need to make sure that the contextClient entry of our jbr-experiment.json.template does not stay fixed on the static value input/context-client.json, but varies across input/context-client-cache.json and input/context-client-nocache.json.
To do this, we define a new factor variable contextClient, which we can use as follows within our jbr-experiment.json.template file:
- "contextClient": "input/context-client.json",
+ "contextClient": "%FACTOR-contextClient%",
We can define the values of this variable in the jbr-combinations.json file as follows:
{
  "@context": [
    "https://linkedsoftwaredependencies.org/bundles/npm/jbr/^0.0.0/components/context.jsonld"
  ],
  "@id": "urn:jrb:ldf-performance-combinations-combinations",
  "@type": "FullFactorialCombinationProvider",
  "commonGenerated": false,
  "factors": {
    "contextClient": [
      "input/context-client-cache.json",
      "input/context-client-nocache.json"
    ]
  }
}
You can define any number of variables in this way, and jbr will automatically take care of deriving all possible combinations of these factor values.
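The full factorial derivation itself is simply the cartesian product of all factor value lists. The following is a minimal Python sketch of that idea (not jbr's actual implementation), using the factor values defined above:

```python
from itertools import product

# Factor definitions, as in jbr-combinations.json above.
factors = {
    "contextClient": [
        "input/context-client-cache.json",
        "input/context-client-nocache.json",
    ],
    # A second, hypothetical factor would double the number of combinations:
    # "cacheSize": ["100m", "500m"],
}

# Full factorial: every possible assignment of one value per factor.
names = list(factors)
combinations = [
    dict(zip(names, values)) for values in product(*factors.values())
]

for i, combination in enumerate(combinations):
    print(f"combination_{i}: {combination}")
```

With the single two-valued factor above, this yields two combinations; adding a second two-valued factor would yield four, and so on.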
Once these changes have been made, you can generate the actual combinations by invoking the following command:
$ jbr generate-combinations
Generated 2 experiment combinations
✨ Done in 2.54s
This will create two experiment combinations within the combinations/ directory, which are the experiments that will effectively be prepared and run.
Any changes that you make to jbr-experiment.json.template, jbr-combinations.json, or any of the files within input/ will require invoking jbr generate-combinations again before they take effect.
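For our two factor values, the generated layout looks roughly as follows (assuming combinations are numbered in factor-definition order; each combination directory mirrors the structure of a regular jbr experiment):

combinations/
  combination_0/   (contextClient = input/context-client-cache.json)
  combination_1/   (contextClient = input/context-client-nocache.json)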
Before preparing our experiment, we set the commonGenerated value within jbr-combinations.json to true.
This makes sure that the generated/ directory is reused across all our combinations.
We can do this because our dataset and queries are identical across these combinations, which saves us some processing time.
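Concretely, this corresponds to the following change in jbr-combinations.json:

- "commonGenerated": false,
+ "commonGenerated": true,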
Now, we can prepare the experiment combinations as follows:
$ jbr prepare
🧩 Preparing experiment combination 0
Building LDF server Docker image
Building LDF server cache Docker image
Preparing LDF engine hook
Generating WatDiv dataset and queries
Converting WatDiv dataset to HDT
🧩 Preparing experiment combination 1
Building LDF server Docker image
Building LDF server cache Docker image
Preparing LDF engine hook
Generating WatDiv dataset and queries
Skipped
Converting WatDiv dataset to HDT
Skipped
Once our experimental setup has been finalized, we can run the two experiment combinations as follows:
$ jbr run
🧩 Running experiment combination 0
Endpoint not available yet, waiting for 1 second
Endpoint not available yet, waiting for 1 second
Endpoint not available yet, waiting for 1 second
Endpoint not available yet, waiting for 1 second
Endpoint not available yet, waiting for 1 second
Endpoint not available yet, waiting for 1 second
Endpoint not available yet, waiting for 1 second
Endpoint not available yet, waiting for 1 second
Warming up for 1 rounds
Executed all queries for iteration 1/1
Executing 20 queries with replication 3
Executed all queries for iteration 3/3
Writing results to /Users/rtaelman/experiments/ldf-performance-combinations/combinations/combination_0/output
🧩 Running experiment combination 1
Endpoint not available yet, waiting for 1 second
Endpoint not available yet, waiting for 1 second
Endpoint not available yet, waiting for 1 second
Endpoint not available yet, waiting for 1 second
Endpoint not available yet, waiting for 1 second
Endpoint not available yet, waiting for 1 second
Endpoint not available yet, waiting for 1 second
Warming up for 1 rounds
Executed all queries for iteration 1/1
Executing 20 queries with replication 3
Executed all queries for iteration 3/3
Writing results to /Users/rtaelman/experiments/ldf-performance-combinations/combinations/combination_1/output
✨ Done in 336.55s
Afterwards, the outputs of the two experiments will be available within the output/ directory of each combination, e.g. combinations/combination_0/output.
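To compare the two combinations, you can aggregate the per-query timings from both output directories. The sketch below is hypothetical: the exact output file name and columns depend on the configured benchmark handler, so treat query-times.csv, its "time" column, and the ";" delimiter as assumptions to adapt:

```python
import csv
from pathlib import Path

def mean_query_time(output_dir: str, file_name: str = "query-times.csv") -> float:
    """Compute the mean of the 'time' column in a query-times CSV.

    The file name, column name, and delimiter are assumptions;
    adapt them to the files your benchmark handler writes to output/.
    """
    path = Path(output_dir) / file_name
    with path.open(newline="") as f:
        times = [float(row["time"]) for row in csv.DictReader(f, delimiter=";")]
    return sum(times) / len(times)

# Usage (after `jbr run`):
# for combination in ("combination_0", "combination_1"):
#     print(combination, mean_query_time(f"combinations/{combination}/output"))
```

Comparing the two means then shows the effect of the NGINX cache on overall query performance.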