
Example: setting up a combinations based Linked Data Fragments experiment

Ruben Taelman edited this page Aug 13, 2021 · 2 revisions

This guide gives a quick example of how jbr can be used to initialize, prepare, and run a combinations-based experiment, which repeats the same experiment for different factor values.

This guide assumes knowledge about how to set up a basic jbr experiment.

1. Initialize

In this guide, we're going to create an experiment for measuring the performance of Comunica over an LDF server (with NGINX cache) using the WatDiv benchmark. Our experiment will have two combinations:

  • LDF query execution with NGINX cache
  • LDF query execution without NGINX cache

To initialize a combinations-based experiment, pass the -c flag:

$ jbr init -c watdiv ldf-performance-combinations

Initialized new combinations-based experiment in /Users/rtaelman/experiments/ldf-performance-combinations

This experiment requires handlers for the following hooks before it can be used:
  - hookSparqlEndpoint
Initialize these hooks by calling 'jbr set-hook <hook> <handler>'

✨ Done in 11.30s

After executing this command, the ldf-performance-combinations directory will have been created, so let's navigate to it:

$ cd ldf-performance-combinations

2. Set hooks

The output of the init command told us that we still need to configure a handler for the hookSparqlEndpoint hook, which will be the SPARQL endpoint our benchmark targets.

Let's plug in a hook for the LDF-based engine:

$ jbr set-hook hookSparqlEndpoint sparql-endpoint-ldf

Handler 'sparql-endpoint-ldf' has been set for hook 'hookSparqlEndpoint' in experiment 'ldf-performance-combinations'

This hook requires the following sub-hooks before it can be used:
  - hookSparqlEndpoint/hookSparqlEndpointLdfEngine
Initialize these hooks by calling 'jbr set-hook <hook> <handler>'

✨ Done in 2.80s

As the output shows, this hook requires another sub-hook, which is needed to execute SPARQL queries over the Triple Pattern Fragments interface of the LDF server.

For this, we can use the Comunica engine:

$ jbr set-hook hookSparqlEndpoint/hookSparqlEndpointLdfEngine sparql-endpoint-comunica

Handler 'sparql-endpoint-comunica' has been set for hook 'hookSparqlEndpoint/hookSparqlEndpointLdfEngine' in experiment 'ldf-performance-combinations'

3. Configure experiment combinations

If you inspect the contents of your experiment directory, you will see that it does not contain a jbr-experiment.json file. Instead, it contains a jbr-experiment.json.template file, which is a parameterized file that is instantiated any number of times based on the factors that are defined in jbr-combinations.json.

Our goal is to define two experiment combinations: one where the Comunica engine targets the NGINX cache, and one where the Comunica engine avoids the cache and directly targets the LDF server.

We can define the query target of the Comunica engine via the input/context-client.json file. Since we need two variants of this file, we need to copy this file so that we have the following instances:

input/context-client-cache.json:

{
  "sources": [ "http://cache/dataset" ]
}

input/context-client-nocache.json:

{
  "sources": [ "http://ldfserver/dataset" ]
}

Next, we need to make sure that the contextClient entry of our jbr-experiment.json.template does not stay fixed at the static value input/context-client.json, but instead varies across input/context-client-cache.json and input/context-client-nocache.json. To do this, we define a new factor variable contextClient, which we can use as follows within our jbr-experiment.json.template file:

-      "contextClient": "input/context-client.json",
+      "contextClient": "%FACTOR-contextClient%",

We can define the values of this variable in the jbr-combinations.json file as follows:

{
  "@context": [
    "https://linkedsoftwaredependencies.org/bundles/npm/jbr/^0.0.0/components/context.jsonld"
  ],
  "@id": "urn:jrb:ldf-performance-combinations-combinations",
  "@type": "FullFactorialCombinationProvider",
  "commonGenerated": false,
  "factors": {
    "contextClient": [
      "input/context-client-cache.json",
      "input/context-client-nocache.json"
    ]
  }
}

You can define any number of variables in this way, and jbr will automatically take care of deriving all possible combinations of these factor values.
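To make the full-factorial expansion concrete, here is a short Python sketch of what such a combination provider computes. The second factor ("replication") is hypothetical and only added to show how combinations multiply; jbr performs this expansion internally, not via this script.

```python
from itertools import product

# The contextClient factor from this guide, plus a hypothetical second
# factor to illustrate how factor values multiply.
factors = {
    "contextClient": [
        "input/context-client-cache.json",
        "input/context-client-nocache.json",
    ],
    "replication": [3, 5],
}

# A full-factorial provider instantiates the template once per element of
# the cartesian product of all factor value lists.
names = list(factors)
combinations = [dict(zip(names, values)) for values in product(*factors.values())]

for i, combo in enumerate(combinations):
    print(f"combination_{i}: {combo}")
# 2 values x 2 values = 4 combinations in total
```

With only the single contextClient factor from this guide, the same expansion yields exactly the 2 combinations that jbr generates below.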

Once these changes have been made, you can generate the actual combinations by invoking the following command:

$ jbr generate-combinations

Generated 2 experiment combinations
✨ Done in 2.54s

This will create two experiment combinations within the combinations/ directory, which are the experiments that will effectively be prepared and run.

Any changes that you make to jbr-experiment.json.template, jbr-combinations.json, or any of the files within input/ require invoking jbr generate-combinations again before they take effect.

4. Prepare

Before we prepare our experiment, we will set the commonGenerated value within jbr-combinations.json to true. This will make sure that the generated/ directory is reused across all our combinations. We can do this because our dataset and queries are identical across these combinations, which saves us some processing time.
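With that change, the jbr-combinations.json file from the previous step becomes:

```json
{
  "@context": [
    "https://linkedsoftwaredependencies.org/bundles/npm/jbr/^0.0.0/components/context.jsonld"
  ],
  "@id": "urn:jrb:ldf-performance-combinations-combinations",
  "@type": "FullFactorialCombinationProvider",
  "commonGenerated": true,
  "factors": {
    "contextClient": [
      "input/context-client-cache.json",
      "input/context-client-nocache.json"
    ]
  }
}
```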

Now, we can prepare the experiment combinations as follows:

$ jbr prepare

🧩 Preparing experiment combination 0
Building LDF server Docker image
Building LDF server cache Docker image
Preparing LDF engine hook
Generating WatDiv dataset and queries
Converting WatDiv dataset to HDT
🧩 Preparing experiment combination 1
Building LDF server Docker image
Building LDF server cache Docker image
Preparing LDF engine hook
Generating WatDiv dataset and queries
  Skipped
Converting WatDiv dataset to HDT
  Skipped

5. Run

Once our experimental setup has been finalized, we can run the two experiment combinations as follows:

$ jbr run

🧩 Running experiment combination 0
Endpoint not available yet, waiting for 1 second
Endpoint not available yet, waiting for 1 second
Endpoint not available yet, waiting for 1 second
Endpoint not available yet, waiting for 1 second
Endpoint not available yet, waiting for 1 second
Endpoint not available yet, waiting for 1 second
Endpoint not available yet, waiting for 1 second
Endpoint not available yet, waiting for 1 second
Warming up for 1 rounds
Executed all queries for iteration 1/1
Executing 20 queries with replication 3
Executed all queries for iteration 3/3
Writing results to /Users/rtaelman/experiments/ldf-performance-combinations/combinations/combination_0/output

🧩 Running experiment combination 1
Endpoint not available yet, waiting for 1 second
Endpoint not available yet, waiting for 1 second
Endpoint not available yet, waiting for 1 second
Endpoint not available yet, waiting for 1 second
Endpoint not available yet, waiting for 1 second
Endpoint not available yet, waiting for 1 second
Endpoint not available yet, waiting for 1 second
Warming up for 1 rounds
Executed all queries for iteration 1/1
Executing 20 queries with replication 3
Executed all queries for iteration 3/3
Writing results to /Users/rtaelman/experiments/ldf-performance-combinations/combinations/combination_1/output

✨ Done in 336.55s

Afterwards, the output of the two experiments will be available within the output/ directory of each combination, i.e., combinations/combination_0/output and combinations/combination_1/output.