Manual: Running Benchmark

To benchmark a graph-processing platform with Graphalytics, go through the following steps:

  1. Obtain the platform driver (platform-specific).
  2. Download the benchmark resources.
  3. Verify the necessary prerequisites (platform-specific).
  4. Adjust the benchmark configurations (platform-specific).
  5. Test the benchmark execution.
  6. Execute the benchmark.
  7. Examine the benchmark report.

Note that Steps 1, 3, and 4 are platform-specific: in addition to this manual, follow the detailed instructions in the README file of each platform driver.

1. Obtain the platform driver (platform-specific).

There are three possible ways to obtain a platform driver:

  1. Recommended: Build the platform driver from source: find the corresponding GitHub repository on our website and build it from source, as sketched after this list. See also Software Build for more details.

  2. Download a prebuilt Graphalytics platform driver: Graphalytics maintains a list of publicly available prebuilt platform driver distributions, which can be downloaded from our website.

  3. Develop a platform driver for a new platform: Graphalytics can be easily extended by developing platform drivers for platforms that are not yet supported. See Implementing Driver for more details.
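
As an illustration, building a driver from source typically follows the pattern below. The repository name is a placeholder, and the Maven commands are an assumption based on the reference drivers; each driver's README and the Software Build page give the authoritative steps.

```bash
# Clone the platform driver repository (placeholder name; find the
# actual repository for your platform on the Graphalytics website).
git clone https://github.com/ldbc/ldbc_graphalytics_platforms_myplatform.git
cd ldbc_graphalytics_platforms_myplatform

# Build the driver; the reference drivers are Maven projects.
mvn clean package

# Unpack the resulting distribution archive (name varies per driver).
tar -xzf graphalytics-*-bin.tar.gz
```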

2. Download the benchmark resources.

To execute the benchmark, the necessary benchmark resources must be available in the cluster environment:

  1. Input datasets: real-world and synthetic graphs selected for the benchmark.
  2. Validation datasets: reference outputs cross-validated by multiple platforms.

Download the required benchmark resources from the datasets page into your cluster environment.
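
For example, fetching one of the tiny example graphs could look as follows; the URL and target paths are placeholders, so take the real download links from the datasets page.

```bash
# Download an input graph plus its validation data (placeholder URL).
wget https://example.org/graphalytics/example-directed.zip
unzip example-directed.zip -d /data/graphalytics/graphs

# Each graph consists of a vertex file (.v) and an edge file (.e).
ls /data/graphalytics/graphs
```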

3. Verify the necessary prerequisites (platform-specific).

Large-scale graph-processing platforms are usually complex distributed or parallel systems, which might require various platform-specific dependencies. Follow the detailed instructions in the README file of each platform to configure the cluster environment properly.

4. Adjust the benchmark configurations (platform-specific).

The Graphalytics distribution includes a config-template directory containing (template) configuration files. Before editing any configuration files, it is recommended to create a copy of the config-template directory and name it config.
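
From the root of the Graphalytics distribution:

```bash
# Keep the pristine templates in config-template/ and make all edits
# in a working copy named config/.
cp -r config-template config
```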

Benchmark configuration

Select one of the three benchmark types (test, standard, custom) by editing config/benchmark.properties (include only the benchmark type you need). Finer-grained settings for each benchmark type can be adjusted in the corresponding benchmark properties file (config/benchmarks/*.properties).
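
As an illustration, the relevant part of config/benchmark.properties might look like the snippet below. The key name benchmark.type is an assumption; the comments in the template file document the actual syntax for selecting a benchmark type.

```properties
# Select exactly one benchmark type: test, standard, or custom.
# (The key name below is illustrative; check the comments in the
# config-template version of this file for the actual key.)
benchmark.type = test
```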

Data configuration

Large-scale graph datasets can require enormous amounts of storage. Place the downloaded datasets on an appropriate storage device and set the data directories in config/benchmark.properties (an example follows the list):

  • graphs.root-directory: input graph datasets (dataset.v and dataset.e).
  • graphs.cache-directory: formatted graph datasets (with only essential edge properties).
  • graphs.validation-directory: validation datasets (reference outputs).
  • graphs.output-directory: output datasets (results of executing algorithms).
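
For example, with all benchmark data on one volume (the paths below are illustrative):

```properties
# Data directories in config/benchmark.properties; adjust the paths
# to a storage device with enough capacity for the datasets.
graphs.root-directory = /data/graphalytics/graphs
graphs.cache-directory = /data/graphalytics/cache
graphs.validation-directory = /data/graphalytics/validation
graphs.output-directory = /data/graphalytics/output
```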

Find more details regarding the data flow during the benchmark execution in Chapter 3 Benchmark Process of the technical specification.

System configuration

The performance of the system-under-test (platform + environment) can be greatly impacted by proper tuning and optimization. Follow the detailed instructions in the README file of each platform driver to fine-tune the system configuration.

5. Test the benchmark execution.

Executing the benchmark can be a very time-consuming process. To verify that Step 3 (verify the necessary prerequisites) and Step 4 (adjust the benchmark configurations) were completed properly, the benchmark suite provides an optional test benchmark, which executes the 6 core algorithms on 2 tiny graph datasets, example-directed and example-undirected. This way, configuration errors are likely to be caught before the actual benchmark starts.
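
Assuming the test benchmark type was selected in Step 4, a dry run is just:

```bash
# Runs the 6 core algorithms on example-directed and example-undirected;
# misconfigurations (missing datasets, wrong paths) surface here within
# minutes instead of hours into the real benchmark.
bin/sh/run-benchmark.sh
```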

6. Execute the benchmark.

After completing the benchmark configuration, compile and run the benchmark:

  1. If applicable, run bin/sh/compile-benchmark.sh to compile the source code. This is usually required for C++ platforms and can be omitted for Java platforms.

  2. Run bin/sh/run-benchmark.sh to execute the benchmark. The benchmark suite summarizes the targeted benchmark execution, submits a list of benchmark jobs to the platform, and generates the benchmark report (see the sketch after this list).
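
Putting both steps together:

```bash
# 1. Compile platform-specific binaries (usually needed for C++
#    platforms only; Java platforms can typically skip this).
bin/sh/compile-benchmark.sh

# 2. Execute the configured benchmark and generate the report.
bin/sh/run-benchmark.sh
```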

Find more details regarding the benchmark process in Chapter 3 Benchmark Process of the technical specification.

7. Examine the benchmark report.

After the benchmark completes successfully, the report for each benchmark can be found in the report directory. Each report contains the following elements:

  • report.htm and html directory: the (human-readable) HTML report summarizing the benchmark results.
  • json directory: the (machine-readable) data archive of the benchmark results. To submit your benchmark results to the Global Competition, see Submitting Results for more detailed instructions.
  • archive directory (optional): performance archives of each platform run for finer-grained analysis.
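
For example, to inspect the most recent report (the per-run subdirectory name is a placeholder):

```bash
# List the generated reports, one per benchmark execution.
ls report/

# Open the human-readable summary in a browser (placeholder path).
xdg-open report/<benchmark-run>/report.htm

# The machine-readable results are in the json/ directory next to it.
ls report/<benchmark-run>/json
```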