Add BSBM experiment handler, Closes #9
rubensworks committed Jun 18, 2024
1 parent f33fb5a commit 0f89979
Showing 20 changed files with 1,123 additions and 173 deletions.
158 changes: 158 additions & 0 deletions packages/experiment-bsbm/README.md
@@ -0,0 +1,158 @@
# JBR Experiment - BSBM

[![Build status](https://github.com/rubensworks/jbr.js/workflows/CI/badge.svg)](https://github.com/rubensworks/jbr.js/actions?query=workflow%3ACI)
[![Coverage Status](https://coveralls.io/repos/github/rubensworks/jbr.js/badge.svg?branch=master)](https://coveralls.io/github/rubensworks/jbr.js?branch=master)
[![npm version](https://badge.fury.io/js/%40jbr-experiment%2Fbsbm.svg)](https://www.npmjs.com/package/@jbr-experiment/bsbm)

A [jbr](https://github.com/rubensworks/jbr.js/tree/master/packages/jbr) experiment type for the [Berlin SPARQL Benchmark (BSBM)](http://wbsg.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/).

## Requirements

* [Node.js](https://nodejs.org/en/) _(1.12 or higher)_
* [Docker](https://www.docker.com/) _(required for running the BSBM Docker image `vcity/bsbm:v1.0`)_
* [jbr](https://github.com/rubensworks/jbr.js/tree/master/packages/jbr) _(required for initializing, preparing, and running experiments on the command line)_

## Quick start

### 1. Install jbr

[jbr](https://github.com/rubensworks/jbr.js/tree/master/packages/jbr) is a command line tool that enables experiments to be initialized, prepared, and started.
It can be installed from the npm registry:

```bash
$ npm install -g jbr
```
or
```bash
$ yarn global add jbr
```

### 2. Initialize a new experiment

Using the `jbr` CLI tool, initialize a new experiment:

```bash
$ jbr init bsbm my-experiment
$ cd my-experiment
```

This will create a new `my-experiment` directory with default configs for this experiment type.

### 3. Configure the required hooks

This experiment type requires a SPARQL endpoint to send queries to, which is configured through the `hookSparqlEndpoint` hook.
A value for this hook can be set as follows, for example using [`sparql-endpoint-comunica`](https://github.com/rubensworks/jbr.js/tree/master/packages/hook-sparql-endpoint-comunica):

```bash
$ jbr set-hook hookSparqlEndpoint sparql-endpoint-comunica
```

### 4. Prepare the experiment

In order to run all preprocessing steps, such as creating all required datasets, invoke the prepare step:

```bash
$ jbr prepare
```

All prepared files will be contained in the `generated/` directory:
```text
generated/
  dataset.hdt
  dataset.hdt.index.v1-1
  dataset.nt
  td_data/
```
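
As a quick sanity check before running, the listing above can be compared against what is actually present in `generated/`. The sketch below is illustrative only: `missingPreparedFiles` is a hypothetical helper that is not part of jbr, and it assumes that the HDT files are only expected when `generateHdt` is enabled.

```typescript
// Hypothetical helper, not part of jbr: reports which prepared entries are absent.
// Assumption: the HDT files only exist when `generateHdt` is enabled.
function missingPreparedFiles(existing: string[], generateHdt: boolean): string[] {
  const expected = [ 'dataset.nt', 'td_data' ];
  if (generateHdt) {
    expected.push('dataset.hdt', 'dataset.hdt.index.v1-1');
  }
  return expected.filter(name => existing.indexOf(name) === -1);
}

// Example: directory entries as they could be read from `generated/`.
const missingEntries = missingPreparedFiles([ 'dataset.nt', 'td_data' ], true);
console.log(missingEntries);
```

The directory entries could come from any directory listing call; keeping the helper free of file system access makes it easy to test.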

### 5. Run the experiment

Once the experiment has been fully configured and prepared, you can run it:

```bash
$ jbr run
```

Once the run step completes, results will be present in the `output/` directory.

## Output

The following output is generated after an experiment has run.

`output/bsbm.xml`:
```xml
<?xml version="1.0"?><bsbm>
  <querymix>
    <scalefactor>100</scalefactor>
    <warmups>10</warmups>
    <seed>9834533</seed>
    <querymixruns>10</querymixruns>
    <minquerymixruntime>1.9948</minquerymixruntime>
    <maxquerymixruntime>3.5679</maxquerymixruntime>
    <totalruntime>23.993</totalruntime>
    <qmph>1500.46</qmph>
    <cqet>2.39926</cqet>
    <cqetg>2.35945</cqetg>
  </querymix>
  <queries>
    <query nr="1">
      <executecount>10</executecount>
      <aqet>0.007408</aqet>
      <aqetg>0.006883</aqetg>
      <qps>135.00</qps>
      <minqet>0.00484910</minqet>
      <maxqet>0.01610740</maxqet>
      <avgresults>0.10</avgresults>
      <minresults>0</minresults>
      <maxresults>1</maxresults>
      <timeoutcount>0</timeoutcount>
    </query>
  </queries>
</bsbm>
```

More output can be found in `output/logs/bsbm-run.txt`.
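
To give an idea of how the report can be consumed, here is a minimal sketch that reads single metrics from the XML with a regular expression. `readMetric` is a hypothetical helper, and the relation between `qmph` and `cqet` used at the end is an assumption (query mixes per hour as 3600 seconds divided by the average query mix runtime) that happens to match the sample values above.

```typescript
// Minimal sketch: pull a single numeric value out of the BSBM XML report.
// Assumption: the tag of interest appears once and holds a plain number;
// a real XML parser is the safer choice for anything beyond a quick look.
function readMetric(xml: string, tag: string): number {
  const match = new RegExp('<' + tag + '>([^<]+)</' + tag + '>').exec(xml);
  if (!match) {
    throw new Error('Tag <' + tag + '> not found');
  }
  return parseFloat(match[1]);
}

// Excerpt of the report shown above.
const report = '<bsbm><querymix><querymixruns>10</querymixruns>' +
  '<totalruntime>23.993</totalruntime><qmph>1500.46</qmph>' +
  '<cqet>2.39926</cqet></querymix></bsbm>';

const qmph = readMetric(report, 'qmph');
const cqet = readMetric(report, 'cqet');

// Assumption: qmph is 3600 seconds divided by cqet (seconds per query mix).
const derivedQmph = 3600 / cqet;
console.log(qmph, derivedQmph.toFixed(2));
```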

## Configuration

The default generated configuration file (`jbr-experiment.json`) for this experiment looks as follows:

```json
{
  "@context": [
    "https://linkedsoftwaredependencies.org/bundles/npm/jbr/^5.0.0/components/context.jsonld",
    "https://linkedsoftwaredependencies.org/bundles/npm/@jbr-experiment/bsbm/^5.0.0/components/context.jsonld"
  ],
  "@id": "urn:jrb:test-bsbm",
  "@type": "ExperimentBsbm",
  "productCount": 1000,
  "generateHdt": false,
  "endpointUrl": "http://localhost:3001/sparql",
  "endpointUrlExternal": "http://localhost:3001/sparql",
  "warmupRuns": 5,
  "runs": 50,
  "hookSparqlEndpoint": {
    "@id": "urn:jrb:test-bsbm:hookSparqlEndpoint",
    "@type": "HookNonConfigured"
  }
}
```

Any config changes require re-running the prepare step.
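
Since a changed config only takes effect after preparing again, it can help to check values before doing so. The following is a sketch under stated assumptions: `BsbmConfigFields` and `validateBsbmConfig` are hypothetical names that are not part of jbr, and the constraints (positive integers, HTTP(S) URLs) are plausible guesses rather than documented rules.

```typescript
// Hypothetical type mirroring the experiment-specific fields of jbr-experiment.json.
interface BsbmConfigFields {
  productCount: number;
  generateHdt: boolean;
  endpointUrl: string;
  endpointUrlExternal: string;
  warmupRuns: number;
  runs: number;
}

// Hypothetical check, not part of jbr: returns a list of human-readable problems.
// Assumption: counts must be whole numbers and URLs must use http or https.
function validateBsbmConfig(config: BsbmConfigFields): string[] {
  const problems: string[] = [];
  const isWholeNumber = (value: number): boolean => Math.floor(value) === value;
  if (config.productCount <= 0 || !isWholeNumber(config.productCount)) {
    problems.push('productCount must be a positive integer');
  }
  if (config.runs <= 0 || !isWholeNumber(config.runs)) {
    problems.push('runs must be a positive integer');
  }
  if (config.warmupRuns < 0 || !isWholeNumber(config.warmupRuns)) {
    problems.push('warmupRuns must be a non-negative integer');
  }
  for (const url of [ config.endpointUrl, config.endpointUrlExternal ]) {
    if (!/^https?:\/\//.test(url)) {
      problems.push('not an HTTP(S) URL: ' + url);
    }
  }
  return problems;
}

// Example values.
const exampleConfig: BsbmConfigFields = {
  productCount: 1000,
  generateHdt: false,
  endpointUrl: 'http://localhost:3001/sparql',
  endpointUrlExternal: 'http://localhost:3001/sparql',
  warmupRuns: 5,
  runs: 50,
};
console.log(validateBsbmConfig(exampleConfig));
```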

More background information on the benchmark behind these config options can be found on the [Berlin SPARQL Benchmark (BSBM)](http://wbsg.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/) website.

### Configuration fields

* `productCount`: The number of products in the dataset. 91 products make about 50K triples. Defaults to 1000.
* `generateHdt`: Whether a `dataset.hdt` file should also be generated. Defaults to `false`.
* `endpointUrl`: URL through which the SPARQL endpoint of the `hookSparqlEndpoint` hook will be exposed from within the Docker container. When the endpoint is hosted on your main machine outside of Docker, this will be something like `http://host.docker.internal:3001/sparql`.
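* `endpointUrlExternal`: URL through which the SPARQL endpoint of the `hookSparqlEndpoint` hook will be exposed to the host machine. It is used to wait until the endpoint is available before the experiment starts.
* `warmupRuns`: Number of warmup query mix runs, executed before measurements start.
* `runs`: Number of measured query mix runs.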
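
The difference between `endpointUrl` and `endpointUrlExternal` can be illustrated with a small sketch. `toContainerUrl` is a hypothetical helper that is not part of jbr, and it assumes that `host.docker.internal` resolves to the host machine, which depends on the Docker setup in use.

```typescript
// Hypothetical helper, not part of jbr: rewrites a host-local endpoint URL so that
// it can be reached from inside a Docker container.
// Assumption: `host.docker.internal` resolves to the host in your Docker setup.
function toContainerUrl(externalUrl: string): string {
  const url = new URL(externalUrl);
  if (url.hostname === 'localhost' || url.hostname === '127.0.0.1') {
    url.hostname = 'host.docker.internal';
  }
  return url.toString();
}

// A candidate value for `endpointUrl`, derived from `endpointUrlExternal`.
console.log(toContainerUrl('http://localhost:3001/sparql'));
```

When the endpoint itself runs in a container on the same Docker network, no such rewrite applies and the container name would typically be used as host instead.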

## License

jbr.js is written by [Ruben Taelman](http://www.rubensworks.net/).

This code is copyrighted by [Ghent University – imec](http://idlab.ugent.be/)
and released under the [MIT license](http://opensource.org/licenses/MIT).
2 changes: 2 additions & 0 deletions packages/experiment-bsbm/index.ts
@@ -0,0 +1,2 @@
export * from './lib/ExperimentHandlerBsbm';
export * from './lib/ExperimentBsbm';
166 changes: 166 additions & 0 deletions packages/experiment-bsbm/lib/ExperimentBsbm.ts
@@ -0,0 +1,166 @@
import * as Path from 'path';
import * as fs from 'fs-extra';
import { secureProcessHandler, HttpAvailabilityLatch, HdtConverter } from 'jbr';
import type { Experiment, Hook, ICleanTargets, ITaskContext, IRunTaskContext, DockerContainerHandler } from 'jbr';

/**
 * An experiment instance for BSBM.
 */
export class ExperimentBsbm implements Experiment {
  public static readonly DOCKER_IMAGE_BSBM = `vcity/bsbm:v1.0`;
  public readonly httpAvailabilityLatch = new HttpAvailabilityLatch();
  public readonly productCount: number;
  public readonly generateHdt: boolean;
  public readonly hookSparqlEndpoint: Hook;
  public readonly endpointUrl: string;
  public readonly endpointUrlExternal: string;
  public readonly warmupRuns: number;
  public readonly runs: number;

  /**
   * @param productCount The number of products in the dataset.
   * @param generateHdt If a `dataset.hdt` should also be generated.
   * @param hookSparqlEndpoint Hook that provides the SPARQL endpoint to benchmark.
   * @param endpointUrl URL of the SPARQL endpoint as seen from within the Docker container.
   * @param endpointUrlExternal URL of the SPARQL endpoint as seen from the host.
   * @param warmupRuns Number of warmup runs.
   * @param runs Number of actual query runs.
   */
  public constructor(
    productCount: number,
    generateHdt: boolean,
    hookSparqlEndpoint: Hook,
    endpointUrl: string,
    endpointUrlExternal: string,
    warmupRuns: number,
    runs: number,
  ) {
    this.productCount = productCount;
    this.generateHdt = generateHdt;
    this.hookSparqlEndpoint = hookSparqlEndpoint;
    this.endpointUrl = endpointUrl;
    this.endpointUrlExternal = endpointUrlExternal;
    this.warmupRuns = warmupRuns;
    this.runs = runs;
  }

  public async prepare(context: ITaskContext, forceOverwriteGenerated: boolean): Promise<void> {
    // Prepare hook
    await this.hookSparqlEndpoint.prepare(context, forceOverwriteGenerated);

    // Ensure logs directory exists
    await fs.ensureDir(Path.join(context.experimentPaths.output, 'logs'));

    // Prepare dataset
    context.logger.info(`Generating BSBM dataset`);
    if (!forceOverwriteGenerated && await fs.pathExists(Path.join(context.experimentPaths.generated, 'dataset.nt'))) {
      context.logger.info(` Skipped`);
    } else {
      await context.docker.imagePuller.pull({ repoTag: ExperimentBsbm.DOCKER_IMAGE_BSBM });
      await (await context.docker.containerCreator.start({
        imageName: ExperimentBsbm.DOCKER_IMAGE_BSBM,
        cmdArgs: [
          'generate',
          '-dir',
          '/data/td_data',
          '-pc', String(this.productCount),
          '-fc',
        ],
        hostConfig: {
          Binds: [
            `${context.experimentPaths.generated}:/data`,
          ],
        },
        logFilePath: Path.join(context.experimentPaths.output, 'logs', 'bsbm-generation.txt'),
      })).join();
    }

    if (this.generateHdt) {
      await new HdtConverter(context, forceOverwriteGenerated, 'bsbm').generate();
    }
  }

  public async run(context: IRunTaskContext): Promise<void> {
    // Create shared network
    const networkHandler = await context.docker.networkCreator.create({
      Name: context.docker.imageBuilder.getImageName(context, `experiment-bsbm-network`),
    });
    const network = networkHandler.network.id;

    // Setup SPARQL endpoint
    const endpointProcessHandler = await this.hookSparqlEndpoint.start(context);
    const endpointProcessHandlerSafe = secureProcessHandler(endpointProcessHandler, context);

    // Wait for the SPARQL endpoint to be fully available
    await this.waitForEndpoint(context);

    // Run experiment
    context.logger.info(`Running experiment`);
    const stopEndpointStats: () => void = await endpointProcessHandler.startCollectingStats();
    const testDriverHandler = await this.startTestDriver(context, [
      '-idir',
      '/data/td_data',
      '-seed',
      '9834533',
      '-o',
      'single.xml',
      '-w',
      String(this.warmupRuns),
      '-runs',
      String(this.runs),
      this.endpointUrl,
    ], network);

    // Wait for the experiment driver to end
    await testDriverHandler.join();
    stopEndpointStats();

    // Move output file to output directory
    await fs.move(
      Path.join(context.experimentPaths.generated, 'single.xml'),
      Path.join(context.experimentPaths.output, 'bsbm.xml'),
      {
        overwrite: true,
      },
    );

    // Close process safely
    await endpointProcessHandlerSafe();
    await networkHandler.close();
  }

  protected async startTestDriver(
    context: IRunTaskContext,
    args: string[],
    network: string,
  ): Promise<DockerContainerHandler> {
    return await context.docker.containerCreator.start({
      imageName: ExperimentBsbm.DOCKER_IMAGE_BSBM,
      cmdArgs: [
        'testdriver',
        ...args,
      ],
      hostConfig: {
        Binds: [
          `${context.experimentPaths.generated}:/data`,
        ],
        NetworkMode: network,
      },
      logFilePath: Path.join(context.experimentPaths.output, 'logs', 'bsbm-run.txt'),
    });
  }

  public async clean(context: ITaskContext, cleanTargets: ICleanTargets): Promise<void> {
    await this.hookSparqlEndpoint.clean(context, cleanTargets);

    if (cleanTargets.docker) {
      await context.docker.networkCreator.remove(
        context.docker.imageBuilder.getImageName(context, `experiment-bsbm-network`),
      );
    }
  }

  public async waitForEndpoint(context: ITaskContext): Promise<void> {
    await this.httpAvailabilityLatch.sleepUntilAvailable(context, this.endpointUrlExternal);
  }
}
31 changes: 31 additions & 0 deletions packages/experiment-bsbm/lib/ExperimentHandlerBsbm.ts
@@ -0,0 +1,31 @@
import type { IExperimentPaths } from 'jbr';
import { ExperimentHandler } from 'jbr';
import { ExperimentBsbm } from './ExperimentBsbm';

/**
 * An experiment handler for BSBM.
 */
export class ExperimentHandlerBsbm extends ExperimentHandler<ExperimentBsbm> {
  public constructor() {
    super('bsbm', ExperimentBsbm.name);
  }

  public getDefaultParams(experimentPaths: IExperimentPaths): Record<string, any> {
    return {
      productCount: 1000,
      generateHdt: false,
      endpointUrl: 'http://localhost:3001/sparql',
      endpointUrlExternal: 'http://localhost:3001/sparql',
      warmupRuns: 5,
      runs: 50,
    };
  }

  public getHookNames(): string[] {
    return [ 'hookSparqlEndpoint' ];
  }

  public async init(experimentPaths: IExperimentPaths, experiment: ExperimentBsbm): Promise<void> {
    // Do nothing
  }
}