Skip to content

puzzlef/pagerank-barrierfree-openmp-dynamic

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

48 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Design of OpenMP-based Barrier-free Dynamic PageRank algorithm for link analysis.

Barrier-free PageRank (1) is an asynchronous PageRank variant (2) where each thread independently processes a subset of vertices in the graph without waiting for other threads to complete an iteration. This minimizes unnecessary waits and allows threads to be on different iteration numbers. We add self-loops to each vertex in the graph to avoid the cost of computing common teleport contribution to all vertices (3).

When dealing with dynamic graphs that change over time, computing ranks from scratch on every update (static PageRank) may not be suitable for interactive systems. In such cases, it is preferable to process only the ranks of vertices likely to have changed. Here, we propose the Dynamic Frontier approach for With-barrier and Barrier-free PageRank that incrementally identifies an active subset of vertices in the graph which are likely to be affected by the update, and then performs PageRank computation on only this subset of vertices. This approach performs better than Naive-dynamic and Traversal-based (BFS/DFS) approaches. We observe Traversal-based approaches to be slower in general compared to Naive-dynamic for all but the smallest of batch sizes. Accordingly, we do not include Traversal-based approach in our experiments.

The input data used for our experiments is available from the SuiteSparse Matrix Collection. This experiment was done with guidance from Prof. Kishore Kothapalli, Prof. Hemalatha Eedi, and Prof. Sathya Peri.


In the Absence of Faults

In this experiment we compare With-barrier and Barrier-free PageRank (Static, Naive-dynamic, and Dynamic Frontier) with a batch size of B = 10^-8 to 0.1 |E|. The batch update consists of a random batch of edge deletions (of size B), and then we insert the same edges back. We compute Dynamic PageRanks (Naive-dynamic, Dynamic Frontier) back to back (i.e., once upon deletions, and then upon insertions), and then compare obtained ranks wrt reference ranks on the original graph. Reference ranks are obtained using a very low tolerance (10^-100, limited to 500 iterations) With-barrier PageRank.

From results we observe that Dynamic Frontier outperforms Naive-dynamic upto a batch size of 10^-3 |E|. For a billion-edge graph, this amounts to edge deletions of 1 million vertices, followed by edge insertions of 1 million vertices. We also observe that Barrier-free outperforms With-barrier for all batch sizes, which can be attributed to the lack of waiting time spent waiting at the barrier. Further, we observe that Barrier-free obtains ranks of equivalent quality as With-barrier (slightly better in case of Naive-dynamic, Dynamic Frontier).

See code, output, or sheets.


With Uniform Thread delays

In this experiment, we compare With-barrier and Barrier-free PageRank (Static, Naive-dynamic, and Dynamic Frontier) in the presence of random thread sleep on a batch size of 10^-4 |E|. During rank computation of each vertex, a thread may randomly sleep with a probability of 10^-9 to 10^-6. On a graph with 10 million vertices, this amounts to an average of 0.01 to 10 threads sleeps per iteration. We try three different sleep durations, i.e., 50ms, 100ms, 200ms (JVM's GC pause is set to a soft limit of 200ms as reference). Batch updates are generated as above.

Results indicate that Barrier-free outperforms With-barrier by a significant margin when thread sleeps are more common. This can be useful in non-dedicated systems where we occasionally run other activities in addition to the main computation. Further, we observe that Barrier-free obtains ranks of equivalent quality (slightly better) as With-barrier.

See code, output, or sheets.


With Non-uniform Thread crashes

In this experiment, we test Dynamic Frontier based Barrier-free PageRank in the presence of random thread crash on a batch size of 10^-4 |E|. During rank computation of each vertex, a thread may randomly crash (crash-stop model) with a high probability of 10^-5. On a graph with 10 million vertices, this amounts to an average of 100 threads crashing per iteration. We limit the number of threads that may crash from 0 to 56 threads. Batch updates are generated as above. Note that With-barrier PageRank cannot tolerate thread crashes.

Results indicate that Barrier-free tolerates thread crashes to the maximum extent possible, with almost no drop is result quality. Note that runtime taken by the algorithm increases with the number of crashed threads, which is expected as the workload is distributed among the remaining threads. Further, there is no added cost to offering this safety to crash-stop faults. We anticipate this to be useful in robotic systems which operate in harsh enviroments.

See code, output, or sheets.


Strong scaling (in the absence of faults)

In this experiment, we test the strong-scaling behavior of Dynamic Frontier based Barrier-free PageRank in the absence of faults on a batch size of 10^-4 |E|. We adjust the number of threads from 1 to 64 threads in multiples of 2. Batch updates are generated as above.

Results indicate that Dynamic Frontier based Barrier-free PageRank offers a speedup of 21.3x on 64 threads over 1 thread. We observe that the rate of increase in speedup drops after 32 threads, which is likely due to the AMD CPU being internally partitioned into 4 NUMA nodes.

See code, output, or sheets.


Dynamic Contracting Frontier approach

We also experiment with a Dynamic Contracting Frontier approach, where we remove vertices from the affected set once their ranks are computed. However, this showed a small performance drop compared to the Dynamic Frontier approach, and thus we do not discuss it any further.


With Check and mark

With Check and mark, we mark a vertex as affected only if its not already marked, i.e., we check before marking. It is a very simple optimization, and we apply it to Dynamic Frontier based With-barrier and Barrier-free PageRank. Results indicate that this offer a small performance improvement (for large batch updates).


With Chunked mark

With Chunked mark, we consider vertices to be grouped by IDs of size 2^n, and mark entire chunk as affected instead of marking just a single vertex. This reduces the memory occupied by the affected flag vector, but has the side effect of marking additional (unaffected) vertices are affected. It is a very simple optimization, and we apply it to Dynamic Frontier based With-barrier and Barrier-free PageRank. Results indicate that this does not offer any improvement.



References




ORG