Feature Parallel ErdosRenyi Generator #221

manpen · 2018-08-10T13:47:03Z

Apologies for issuing this PR without a description.

Remark: NetworKit calls G(n,p) graphs Erdos-Renyi-Graphs. It is my understanding that G(n,p) are infact Gilbert graphs, while ER introduced the G(n,m) model. For consistency, I will keep on refering to G(n,p) as ER.

The G(n, p) model was first introduced by Edgar Gilbert in a 1959 paper studying the connectivity threshold mentioned above.[3] The G(n, M) model was introduced by Erdős and Rényi in their 1959 paper. As with Gilbert, their first investigations were as to the connectivity of G(n, M), with the more detailed analysis following in 1960. https://en.wikipedia.org/wiki/Erd%C5%91s%E2%80%93R%C3%A9nyi_model

The primary reason for this PR is the fact that the GraphBuilder benchmarks use ER graphs. Unfortunately it relied on a quadratic time sampling that required (for the parameters used) expected 1000 random floats for each edge generated. Hence, the runtime of the benchmarks was primarily dominated by the generation of test data and did not actually measure the GraphBuilder itself.

After migrating away from the quadratic time sampling to the parallel ER model, the runtime dropped from ~1900ms to ~30ms. This includes sampling the data and maintaining the GraphBuilder. I hence increased the problem size by a factor of 100 to get meaningful data. Also observe that previously the parallel benchmark did not measure a parallel structure as it passed parallel=false to the GraphBuilder. I also moved the benchmarks from generators/test to graph/test as the GraphBuilder is part of the latter.

As the benchmark cannot accept a Graph data structure, the ER generator itself is implemented as a StreamGenerator; it has a forEach callback and expects (e.g.) a lambda to be called for each edge. It basically uses the linear time algorithm previously implemented in ErdosRenyiGenerator with a few tweaks giving a sequential speed-up between 3x and 5x; it scales almost perfectly, as the hot-section runs pleasingly parallel.

I now demoted the original ErdosRenyiGenerator to a wrapper to the Enumerator. Here the speed-ups rather small, since the runtime is dominated by the GraphBuilder.

rionda · 2018-08-10T14:02:28Z

networkit/cpp/generators/ErdosRenyiEnumerator.h

+
+/**
+ * Generates a stream of edges of a G(n,p) graph. The edges are not
+ * written two memory, but emitted only via usual @a forEdges semantics


"two" -> "to"

manpen · 2018-08-10T15:45:10Z

I forgot to ask my important question:

The old ER generator generates only edge (u, v) with u > v. While I agree with this for undirected graphs, a directed G(n,p) graph may also contain edges (v < u) and esp. self-loops. The old generator was unable of emitting them. The new enumerator implements this, as it is the mathematically correct way to do it.

There is however code that relies on directed graphs with (u < v), e.g. CentralityGTest.testGroupDegreeDirected which cannot deal with self-loops.

IMHO the other code should be fixed, and the generator should produce graphs according to definition even if this breaks user-code. What's your opinion?

manpen · 2018-08-12T18:46:22Z

What do think about the following solution:

Deprecate ErdosRenyiGenerator (as its broken for the directed case and has a misleading name)
Rename ErdosRenyiEnumerator (and its wrapper) to GilbertGNPEnumerator/GilbertGNPGenerator
(later) Introduce an ErdosRenyiGNMGenerator (e.g. the simple parallel algorithm in https://arxiv.org/abs/1710.07565)

std::bind (and its indirection via std::function) is slow. Replace it by hard-coded chaining of PRNG and distribution gives a speed-up of nearly a factor 2.

It's an (optionally parallel) variant of the O(m) time algorithm by Batagelj and Brandes already implemented sequentially in ErdosRenyiGenerator.cpp. The difference is that it is an header-only solution invoking a callback for every edge generated. Thus it can be used as a faster Generator and also as a data source to benchmark the GraphBuilder.

According to my experiments generating a float or double random variate is 2.1 times slower than generating an integer variate. Additionally std::log2 is roughly 1.5 times faster than std::log. Hence we get a speed-up of more than 3x with this commit.

The old benchmark relied on a O(n*n) time algorithm to compute an Erdos-Renyi-Graph causing requiring the generation of 1'000 random numbers per edge inserted for the used parameters. Hence, the benchmark did not really measure the GraphBuilder but rather the generation process. After replacing the generator with the efficient ErdosRenyi enumerator it runtime dropped from roughly 1600ms to 30ms. Hence, in the original benchmark roughly 1% of time was spent on the actual experiment. I now increased the dataset size by a factor of 8 to have a somewhat meaningful runtimes.

The toGraphParallel section did pass parallel=false to the builder. Hence the time difference to the sequential variant seems to be only noise. Now that it's enabled, the parallel GraphBuilder is consistently slower on my machine. Also split several benchmarks into own Test cases to be able to select them using GTest filter.

The generator is now basically a pImpl wrapper to the enumerator. This should keep compilation times low despite the template magic of the enumerator.

I do not see any point on having them in GeneratorsBenchmark. This is misleading at best.

For p>0.5 it more sensible to draw the edges NOT included (rather than the ones to include). The enumerator now uses this strategy to reduce the number of computations required. It also now supports p = 0.0 and p = 1.0. efficently. These extremal values are efficiently supported as special cases and do not require drawing of random numbers. They are actively used by some tests.

manpen · 2018-08-29T12:18:47Z

Just to let you know: I included all changes discussed last week (optional support to generated directed graphs without self loops and support for the extreme cases of p=0.0 and p=1.0); in case of p>0.5 there's also a further optimisation that the implementation does not draw a random number of each edge but for each edge NOT included (which are expected to be less). This does not yield an asymptomatic improvement, but is faster in practice.

I consider all tests passing, as the one failing test case in Travis is not related to this PR and hence feel it is completed from a technical point of view. I did however not change the interface of the Python code to expose the flag for self-loop-free instances, as the naming issue is still open.

avdgrinten · 2018-10-01T08:00:56Z

I did not have the time to look at this PR yet. It also has a few merge conflicts. Could you resolve the conflicts while I review the PR?

After the conflicts are resolved, the CI can be restarted (with the new configuration) by closing/reopening the PR.

manpen · 2018-10-01T08:05:11Z

Yes, sure.

networkit/cpp/generators/ErdosRenyiEnumerator.h

networkit/cpp/generators/ErdosRenyiGenerator.h

avdgrinten · 2018-10-01T08:30:25Z

We will probably not rename the generator to GilbertGenerator (or similar). Even if ER is technically the wrong name, it has been in use for too long to change it now. If we want to add a true ER generator, we will have to use a different name for that.

avdgrinten

In general the PR looks good to me. I will merge it after some minor questions have been addressed.

networkit/cpp/generators/test/GeneratorsBenchmark.cpp

networkit/cpp/graph/test/GraphBuilderBenchmark.h

manpen · 2018-10-01T12:53:10Z

A small question regarding the name: is it okay if I rename the Enumerator part into GilbertEnumerator. This is new code, and the (correct) name does not break existing code; the only odd part is then that the ErdosRenyiGenerator uses the GilbertEnumerator ... I feel this is the right way to do it, but I completely understand if you prefer ErdosRenyiEnumerator.

avdgrinten · 2018-10-01T12:58:59Z

I think I'd still prefer ErdosRenyiEnumerator for consistency. If ER generator did not exist yet it would be a different matter...

…osrenyi

manpen · 2018-12-12T08:43:57Z

This failed CI test is again a sporadic Travis bug during system setup. I dealt with all remarks and consider this PR ready. Please let me know if there's anything else.

avdgrinten · 2018-12-14T13:50:43Z

LGTM

manpen changed the title ~~Feature parallel erdosrenyi~~ Feature Parallel EdosRenyi Generator Aug 10, 2018

rionda reviewed Aug 10, 2018

View reviewed changes

manpen force-pushed the feature_parallel_erdosrenyi branch from d5f4656 to da9ef8a Compare August 10, 2018 14:11

manpen changed the title ~~Feature Parallel EdosRenyi Generator~~ Feature Parallel ErdosRenyi Generator Aug 22, 2018

manpen added 14 commits August 22, 2018 16:13

Optimize GeneratorsBenchmark test for GraphBuilder

0147298

std::bind (and its indirection via std::function) is slow. Replace it by hard-coded chaining of PRNG and distribution gives a speed-up of nearly a factor 2.

Add documentation for ErdosRenyiEnumerator

f07c94d

Make fixed-point arith. optional in ErdosRenyiEnumerator

3324976

Use ErdosRenyiEnumerator in ErdosRenyiGenerator

d139c7d

The generator is now basically a pImpl wrapper to the enumerator. This should keep compilation times low despite the template magic of the enumerator.

Move GraphBuilder benchmark to GraphBuilderBenchmarks.cpp

3583ac5

I do not see any point on having them in GeneratorsBenchmark. This is misleading at best.

Minor silencing gcc

3c16355

Count number of edges produced in ErdosRenyiEnumerator

d52f493

ErdosRenyiEnumerator: Improve load balancing

681ece5

Allow directed ErdosRenyi without self loops

fd40637

manpen force-pushed the feature_parallel_erdosrenyi branch 2 times, most recently from c4255d8 to 55f66ab Compare August 25, 2018 14:45

manpen added 2 commits August 26, 2018 15:40

Improve testing of ErdosRenyiEnumerator

ea4e5a5

Remove ErdosRenyiEnumeratorGTest.h

b7b36dd

manpen force-pushed the feature_parallel_erdosrenyi branch from 55f66ab to b7b36dd Compare August 26, 2018 13:42

avdgrinten reviewed Oct 1, 2018

View reviewed changes

networkit/cpp/generators/ErdosRenyiEnumerator.h Outdated Show resolved Hide resolved

avdgrinten reviewed Oct 1, 2018

View reviewed changes

networkit/cpp/generators/ErdosRenyiGenerator.h Outdated Show resolved Hide resolved

avdgrinten reviewed Oct 1, 2018

View reviewed changes

networkit/cpp/generators/test/GeneratorsBenchmark.cpp Outdated Show resolved Hide resolved

networkit/cpp/generators/test/GeneratorsBenchmark.cpp Outdated Show resolved Hide resolved

networkit/cpp/graph/test/GraphBuilderBenchmark.h Outdated Show resolved Hide resolved

manpen added 3 commits December 11, 2018 17:27

Merge remote-tracking branch 'upstream/Dev' into feature_parallel_erd…

7322025

…osrenyi

Fix wrong comment

b959a3d

Minor improvements as suggested in PR discussion

5f4279b

angriman merged commit 26587a4 into networkit:Dev Dec 17, 2018

manpen deleted the feature_parallel_erdosrenyi branch June 5, 2019 14:41

manpen mentioned this pull request Jul 21, 2019

Add selfLoops flag to Python's ErdosRenyiGenerator #383

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature Parallel ErdosRenyi Generator #221

Feature Parallel ErdosRenyi Generator #221

manpen commented Aug 10, 2018 •

edited

Loading

rionda Aug 10, 2018

manpen Aug 10, 2018

manpen commented Aug 10, 2018

manpen commented Aug 12, 2018

manpen commented Aug 29, 2018

avdgrinten commented Oct 1, 2018 •

edited

Loading

manpen commented Oct 1, 2018

avdgrinten commented Oct 1, 2018

avdgrinten left a comment

manpen commented Oct 1, 2018

avdgrinten commented Oct 1, 2018

manpen commented Dec 12, 2018

avdgrinten commented Dec 14, 2018

Feature Parallel ErdosRenyi Generator #221

Feature Parallel ErdosRenyi Generator #221

Conversation

manpen commented Aug 10, 2018 • edited Loading

rionda Aug 10, 2018

Choose a reason for hiding this comment

manpen Aug 10, 2018

Choose a reason for hiding this comment

manpen commented Aug 10, 2018

manpen commented Aug 12, 2018

manpen commented Aug 29, 2018

avdgrinten commented Oct 1, 2018 • edited Loading

manpen commented Oct 1, 2018

avdgrinten commented Oct 1, 2018

avdgrinten left a comment

Choose a reason for hiding this comment

manpen commented Oct 1, 2018

avdgrinten commented Oct 1, 2018

manpen commented Dec 12, 2018

avdgrinten commented Dec 14, 2018

manpen commented Aug 10, 2018 •

edited

Loading

avdgrinten commented Oct 1, 2018 •

edited

Loading