
Manual: Using your own data sets

AJ edited this page Feb 25, 2019 · 1 revision

Using your own data sets

A Graphalytics data set consists of a vertex file, an edge list file, and validation data for each algorithm. As an example, download a small data set from our website. This section explains how to use your own data sets, with or without validation data.

Let's say you want to use a data set named test-dataset.

First, make sure that you have a vertex file (test-dataset.v) and an edge file (test-dataset.e) for your graph, in the same format as those in the existing Graphalytics data sets. Put both files in the directory set by graphs.root-directory. Note: if you only have an edge list and want to convert it to a vertex file and a (sorted) edge list file, you can use this script. If you have validation data, put it in the directory set by graphs.validation-directory and name each validation file according to its algorithm name. If you don't have any validation data, turn validation off.
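If the conversion script mentioned above is not at hand, the same idea can be sketched in a few lines of Python. This is not that script, only a minimal illustration under stated assumptions: the input is a whitespace-separated edge list with integer vertex ids, lines starting with # are comments, the vertex file holds one id per line in ascending order, and the edge file is sorted by source and then destination. The function names split_edge_list and write_dataset are hypothetical.

```python
from pathlib import Path


def split_edge_list(lines):
    """Return (sorted vertex ids, sorted edge rows) from raw edge-list lines.

    Assumes integer ids in the first two whitespace-separated columns.
    """
    vertices = set()
    edges = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comment lines
        src, dst = int(fields[0]), int(fields[1])
        vertices.update((src, dst))
        # Keep any trailing columns (for example a weight) unchanged.
        edges.append((src, dst, fields[2:]))
    edges.sort(key=lambda e: (e[0], e[1]))
    return sorted(vertices), edges


def write_dataset(edge_list_path, out_dir, name):
    """Write <name>.v and <name>.e into out_dir (hypothetical helper)."""
    with open(edge_list_path) as f:
        vertices, edges = split_edge_list(f)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / f"{name}.v").write_text("".join(f"{v}\n" for v in vertices))
    (out / f"{name}.e").write_text(
        "".join(" ".join([str(s), str(d), *rest]) + "\n" for s, d, rest in edges)
    )
```

Calling write_dataset("raw-edges.txt", "/path/to/graphs", "test-dataset") would then produce test-dataset.v and test-dataset.e in the given directory; both paths here are placeholders. Compare the output against one of the existing Graphalytics data sets before relying on it.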

  1. Copy an existing graph properties file in config/graphs/ and name the copy test-dataset.properties. Modify this file according to your graph's properties and your needs. The property names still contain the name of the graph you copied, so change them to use your new graph name as well.

  2. Edit config/graph.properties. In the graph.names property, add the name of your new graph along with the others. Also add an include and refer to your graph file, e.g.: include = graphs/test-dataset.properties

  3. In config/benchmarks/custom.properties, add test-dataset to benchmark.custom.graphs.

  4. Edit config/benchmark.properties so that it includes only the custom benchmark, i.e., comment out include = benchmarks/test.properties and include = benchmarks/standard.properties

Run the custom benchmark.
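The edits in steps 2 to 4 above are simple text changes, so they can be scripted if you add data sets often. The sketch below is one possible way to do it, not part of Graphalytics. It assumes that list-valued properties such as graph.names and benchmark.custom.graphs sit on a single line as comma-separated values and that # starts a comment in these files. The helper names comment_out_includes and add_to_list_property are hypothetical.

```python
import re


def comment_out_includes(text, targets):
    """Prefix '#' to every 'include = <target>' line whose target is listed."""
    out = []
    for line in text.splitlines():
        m = re.match(r"\s*include\s*=\s*(\S+)\s*$", line)
        if m and m.group(1) in targets:
            line = "#" + line
        out.append(line)
    return "\n".join(out) + "\n"


def add_to_list_property(text, key, value):
    """Append value to a single-line, comma-separated property if absent."""
    out = []
    for line in text.splitlines():
        m = re.match(rf"\s*{re.escape(key)}\s*=\s*(.*)$", line)
        if m:
            items = [s.strip() for s in m.group(1).split(",") if s.strip()]
            if value not in items:
                items.append(value)
            line = f"{key} = {', '.join(items)}"
        out.append(line)
    return "\n".join(out) + "\n"
```

For example, reading config/benchmark.properties into a string, passing it through comment_out_includes with the two include targets from step 4, and writing the result back applies that step. add_to_list_property covers graph.names in step 2 and benchmark.custom.graphs in step 3. Both helpers leave lines that are already commented out untouched.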