PDTL: Parallel and Distributed Triangle Listing for Massive Graphs
This repository contains the following:
- Ilias Giechaskiel's MPhil Dissertation.
- The University of Cambridge Computer Laboratory Technical Report.
- The ICPP Pre-Print.
- The ICPP Presentation.
Source and Compilation
If you want to alter the source code, at a high-level, the files are as follows:
mgt.[h/cpp]implements the MGT algorithm with our modifications.
localmgt.cppcontains a main to run MGT locally, while
pdtlmaster.cppimplement our distributed PDTL framework.
highdegreehandler.[h/cpp]implements the algorithm for the case when there are high-degree vertices, and
inmem.cppimplements one of the simple in-memory algorithms.
fileconverter.[h/cpp]implement various parsing and conversion functions, with the main in
- Everything else is used to make the code more modular.
Binaries and Execution
The code/bin directory contains all the executable binaries.
With the exception of some of the parser utilities, all binaries require the input in the following format: the graph must be undirected, and is split in 2 files, a degree file (ending in
.deg), and an adjacency file (ending in
.adj). The degree file contains the degrees of the vertices as
v, d(v) (with
v ranging from
|V|), while the adjacency file just stores the neighbors of each vertex in order (and sorted).
This format was chosen for compatibility with the original MGT implementation, which is explained in the manual. The format is also explained further in Section V-B of our paper, and a real example is given in code/graphs.
How to execute the various binaries is discussed below. The inputs and outputs always refer to the base name of the
Use this for an in-memory triangle listing algorithm. Simply execute
inmem.bin input output ordered, where
input is the base input name,
output is a 0 for counting, and non-0 for listing, and
ordered is non-0 if the adjacency list is already ordered.
Use this to remove high-degree vertices before orientation. Execute
highdegreehandler.bin input output maxdeg report, where
input is the (base) input name,
output is the (base) output name for the low-degree graph,
maxdeg is the maximum degree to allow in the new graph, and
report is non-0 for listing (instead of counting).
Use this for our version of the MGT algorithm. Execute
mgt.bin filename maxdeg output mem instances, where
filename is the (base) input name,
output is non-0 for listing,
mem is the maximum memory (in MB) to allocate per thread, and
instances is the number of threads to use.
maxdeg is 0 if orientation has not yet been performed, while it is non-zero when the file is already oriented, and has a maximum out-degree equal to
Use these two binaries to execute the distributed version of our algorithms. Run
pdtlclient.bin port delete on the remote machines, where
port refers to the port number to listen for incoming connections and
delete is non-zero to delete the files after the counting/listing for the particular graph has finished. This needs to be manually shutdown, e.g. via Ctrl-C.
pdtlmaster.bin filename maxdeg memsize instances output ip port mem instances ... is used to run the master.
filename is the name of graph,
maxdeg is as for
mgt.bin (0 for orientation, non-zero for already oriented),
instances is the memory (in MB) per thread and number of threads to allocate to the master, and
output is once again non-0 for listing.
For each client, add the following four arguments:
port for the IPv4 address and port of the client,
instances for the number of threads, and
mem the memory (in MB) per thread.
parser.bin contains all the utilities for converting between different file formats.
output always refer to the basenames of the input and output graphs.
parser.bin order input output to order the neighbors of all vertices. This function requires memory proportional to the maximum degree.
parser.bin undirect input output to convert a directed graph into an undirected graph. This function requires memory proportional to the total number of edges.
parser.bin orient input output [mem] [numthreads] to orient the given graph. Optionally, add a
mem parameter to specify the amount of memory to allocate (in MB), per thread (0 for unlimited), and
numthreads to specify the number of threads.
parser.bin convert input output opt/xstream to convert the graph from the PDTL format to either
parser.bin parse input output snap/xstream [mem] [vn] to convert a graph from either the
snap or the
xstream format into the format required by PDTL.
mem optionally specifies the maximum amount of memory to allocate (0 for unlimited), and in the case of
vn is equal to either 2 or 3, to indicate the type of X-Stream edges used.
The repository also contains an example of a real graph in the format required by PDTL. This is the Berkley-Stanford web graph from the SNAP Repository. Note that orientation has not yet been performed on this graph.