Approximation Pagerank algorithm for GraphLab. This repository contains all the GraphLab's code where the "synchronous_engine.hpp" file is updated to allows selective synchronization of replicas and vertex data. This updated version of the synchronous engine is exploited by the approximation PageRank application included in the "apps/rw_pagerank…
C++ Python CMake Makefile C Prolog Other
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
apps
cmake
cxxtest
demoapps
dist
doc
license
matlab
patches
scripts
src
tests
toolkits
BINARY_README
CMakeLists.txt
Doxyfile
Doxyfile_internal
README.md
configure

README.md

Approximation PageRank for GraphLab.

=====================================

This repository contains all the GraphLab code with the following two changes:

  1. The synchronous engine was slightly changed to support: (i) optional vertex data synchronization, (ii) synchronization of a replica with a probability defined by user, and (iii) user program gets the actual fraction of replicas that were synchronized ("activated").

  2. Approximation PageRank application: "rw_pagerank". The algorithm is based on random walks.

=====================================

Execution example of the rw_pagerank application:

mpiexec -n 20 -hostfile ~/machines rw_pagerank --graph grap_file --rwnum 800000 --maxwait 4 --replicap 0.7

This command executes the rw_pagerank application on the cluster of 20 machines. The directed graph should be provided in the graph_file in the "tsv" format. The application is launched with 800000 initial random walks, 4 iterations, and the replica activation probability 0.7.

======================================

The new synchronous engine may be used by any other application in the following way:

  1. Define the graphlab options object:

graphlab::graphlab_options rw_opts;

  1. Set the optional "vertex_data_sync" parameter using:

rw_opts.engine_args.set_option("vertex_data_sync",false);

I.e., here we ask from the synchronous engine not to synchronize vertex data.

  1. Set the optional "activation_prob" parameter using:

rw_opts.engine_args.set_option("activation_prob",replica_activation_prob);

  1. If "activation_prob" was set (to a value less than 1), add the following parameter to the vertex program:

public: float frac_of_active_replicas;

This variable will be set by the synchronous engine and will represent the actual fraction of replicas that were activated for the "scatter" in the current iteration.

=========================================