-
Notifications
You must be signed in to change notification settings - Fork 224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/girgs #320
Feature/girgs #320
Conversation
I didn't have time to review the code yet. Hence, I will only try to answer your design questions. For 2) and 3): Why don't you simply add a (non-virtual) method While I don't think that precomputation in a constructor is problematic per se, I believe that most |
Thank you for your answer! A code review is not necessary at the moment -- I included the code only for completeness. Currently, I only have the questions posted above; so esp. 1) whether foreign coding styles are acceptable in clearly isolated file that are also encapsulated into an own namespace and how we deal with the band-based RHG generator. Regarding your So as I see it we have four options: I feel c) and d) are viable, but favoured c) since it does not break our paradigm. |
Regarding the foreign code: I dislike the idea of copying code into NetworKit (even though I know that it has been done in the past) -- I fear that the code will diverge over time and important updates will not be pulled. I don't really see how this is better than adding a submodule (provided that the foreign code is located in a well-maintained public repository). I would prefer it even more if the submodule could be made optional: i.e., it is built by default but if (for whatever reason) the user decides to not clone it, the build still succeeds (without the GIRG generator, obviously). I think this is a route that we should take in the future as will i) make handling dependencies less painful, and hence ii) allow us to use them more freely. Regarding the constructor: b) is not viable. a) is not pretty but there is precedent in NetworKit to fail at runtime if parameters are chosen badly. As far as I can see, this is only a problem in Python right? In C++ we can just add multiple constructors (and use tag dispatch if they would be ambiguous otherwise). My suggestion would be to not compromise the quality of the C++ code because Python is missing features. We could use proper constructors in the C++ case and revert to static factory methods for Python. What do you think about that? |
Regarding the constructor issue: Yes, this is a purely pythonic issue. The C++ code is all nice and clean using polymorphic constructors. Following your suggestion, I will remove the Regarding the copied code: it is change somewhat. I migrated to NetworKit's PRNG infrastructure, NetworKit's parallel sorting dispatcher and added SignalHandlers. This cannot be easily achieved in the external library, so we either needed a branch or fork. Both do not solve the issue of possible divergence. Hence I would vote against it and copy the code if that's fine with you -- if we do not rename all internal variable (etc.) then porting possible changes from the original library remains straight forward. |
Alright, if the code needs to be modified, that changes the situation somewhat. Yes, in that case it is preferable to copy it into NetworKit. Coding style is not a major issue, if it is at least internally consistent. Regarding the static method: what about just using tag dispatch? Something like struct UseExternalPointsTag { };
static constexpr UseExternalPointsTag useExternalPoints{};
// Declaration
GeometricInhomogenousGenerator::GeometricInhomogenousGenerator(UseExternalPointsTag, std::vector<...> points)
// Usage
GeometricInhomogenousGenerator myGenerator{useExternalPoints, myPointSet}; I consider this to be more idiomatic C++ than the static factory method. |
I just noticed that you're not using a static factory method in the C++ code anyway. In this case, my suggestion above does not make sense. |
Copying code: cool! Sorry for my inexact reply. I would not touch the C++ code; its currently using three polymorphic overloads: /**
* @param[in] n Number of nodes
* @param[in] avgDegree The desired average degree with 0 < avgDegree < n-1
* @param[in] powerlawExp Target exponent of power-law distribution with powerlawExp > 2.0
* @param[in] alpha parameter (1/temperature) with alpha > 1
* @param[in] dimension Dimension of the underlying geometry with 1 <= dimension <= 5
*/
GeometricInhomogenousGenerator(count n, double avgDegree, double powerlawExp=3, double alpha=std::numeric_limits<double>::infinity(), unsigned dim=1);
/**
* Construct generator from weights that are then scale to match avgDegree and T.
*
* @param[in] points Coordinates of points
* @param[in] weights Unscaled weights (assumed to be powerlaw distributed)
* @param[in] avgDegree The desired average degree with 0 < avgDegree < n-1
* @param[in] alpha parameter (1/temperature) with alpha > 1
*
* @warning points and weights are moved into the container. The we're not using
* rvalue refs because Cython does not handle them.
*/
GeometricInhomogenousGenerator(std::vector<coordinate_t>& points, std::vector<double>& weights, double avgDegree, double alpha);
/**
* Construct generator from *already scaled* weights.
*
* @param[in] points Coordinates of points
* @param[in] weights *Scaled* weights
* @param[in] alpha parameter (1/temperature) with alpha > 1
*
* @warning points and weights are moved into the container. The we're not using
* rvalue refs because Cython does not handle them.
*/
GeometricInhomogenousGenerator(std::vector<coordinate_t>& points, std::vector<double>& weights, double alpha); The static factory ( gen = nk.generators.GeometricInhomogenousGenerators.fromDataset(coords, weights)
G = gen.generate() |
* @param[in] avgDegree The desired average degree with 0 < avgDegree < n-1 | ||
* @param[in] T Temperature with 0 <= T < 1 | ||
*/ | ||
GeometricInhomogenousGenerator(std::vector<coordinate_t>&& points, std::vector<double>&& weights, double avgDegree, double T); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do the constructors take the vectors by rvalue reference? You're essentially forcing users who want to pass copies to do this explicitly. I would prefer passing the vectors by value. This will still be cheap (at least if the user passes a temporary or uses std::move
) but will also gracefully fall back to a copy.
EDIT: Taking them by (non-const) reference is not a good solution either as this prevents passing temporaries. If the class needs an internal copy, take the vectors by value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to keep the necessary memory footprint as small as possible, we do the following:
- Move the user provided data set into the container.
- Convert it into an internal data structure
- OPTIONAL: Delete the the user provided data structure.
- Build the graph <- this is when we need the most memory so deleting the input might make sense.
The reason it is moved into the class is to be able delete the data as soon as it's not needed anymore. If you consider this interface as too complicated, we can drop the capability of deleting the data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I understand that you want to do this. My point is that passing by value enables this behavior without unexpected pitfalls. If you take the vector by value, you still get zero-copy behavior if the user passes a rvalue. However, it also works correctly (but less efficiently) if the user passes an lvalue and decides to reuse it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Missed that and will fix it!
Just to let you know: I'm still hunting down a bug in that only occurs on NetworKit's CI on OSX (that's how I spotted #326) which might still take some time. |
70efaf2
to
e95d8e4
Compare
Contrary to the initial PR message, I will add
|
The failing CI build is only due to an Homebrew issue. Previous builds have been okay. |
What is the current status of this PR? |
I'm currently not aware of any bugs and welcome a review. |
Do you think this PR could be rebased to clean up the history? I am not suggesting to squash it into a single commit but maybe into a smaller number of commits? There are quite a few commits that fix up older commits (and merges from Dev to the feature branch); this makes reviewing the whole thing quite a bit tedious. |
I can give it a try this afternoon. |
Co-authored-by: Christopher Weyand <christopher.weyand@student.hpi.de>
Co-authored-by: Christopher Weyand <christopher.weyand@student.hpi.de>
Remove OMP max-reduction as not supported by MSVC Avoid UB on post-decremented iterator GIRGs without MSVC warnings Increase slack in probabilistic test Adhere to NetworKit code style
70efaf2
to
ad9b1d8
Compare
While not perfectly linear, the history should now be much more clean. Some intermediate commits do not compile (as we a couple of file have been moved and commits are now sorted semantically rather than historically). Hope that's alright with you. |
Yes, that's much better than before. It's a bit unfortunate that intermediate commits don't compile, but if we merge with a merge commit, it is possible to deal with that. |
I will try to review / merge this PR today. |
Thank you! It's not urgent. |
After our last discussion of how to deal with the GIRG / HyperGirg generators, I believe that the necessary changes can be done in the GIRGs library itself in a generic way. If we can add the GIRGs library as another git submodule, I'd close this PR and would prepare another one including only the necessary glue code. |
That would be great. If it cannot be done, we should treat GIRGs the same way as ttmath and isolate its code in an own directory under |
This PR will introduces two new (related) generators described in https://arxiv.org/abs/1905.06706.
A very large portion of the code is from @chistopher.
"girgs" (a generator for up to five-dimensional tori -> https://arxiv.org/abs/1511.00576). The girgs generator adds functionality to NetworKit and hence should be interesting in its own right.
"hypergirgs" (hyperbolic Graphs with positive temperature support)
In our experiments "hypergirgs" is consistently faster (~ 4times) than the band-based RHG generator currently integrated in NetworKit for threshold graphs.
HyperGIRGs also support temperatures T in [0; 1). Here, NetworKit currently uses a QuadTree-based approach with a complexity of (n^{3/2} + m)log n. HyperGirgs, in contrast, has an expected linear complexity, which yields significant speed-ups (we measured between two and three
orders of magnitude for n -> 1e6).
The implementation additionally has nice features, such as deterministic output for identical seeds, even in the context of parallel computation (which the current positive temperature generator has not).
The only caveat is that the current QuadTree generator support temperatures T>1; while those instances are not that interesting (this regime is quite quirky), it seems strange to reduce functionality.
Hence my suggestion would be to remove the band-based generator and replace it by HyperGIRGs, which would also take-over the T < 1 domain from the QuadTree. Then, the slow QuadTree is only used for T > 1. Is this okay?
Currently this PR contains only GIRGs ince input on some design decisions would be appreciate before I migrate hypergirgs.
1.) Since you are reluctant to add new dependencies, I copied the code from https://github.com/chistopher/girgs and put it into
networkit/cpp/generators/girgs
-- the files are isolated and the generator code itself lives within theNetworKit::girgs
namespace.The generator itself doesn't follow NetworKit coding style in order to ease porting changes from the upstream library. However, the NetworKit bindings (in
networkit/cpp/generator/GeometricInhomogenousGenerator.(cpp|h)
) to the generator obey the style guide. Is this acceptable?2.) Similarly to Hyperbolic Graphs, GIRGs are defined via a geometric embedding. Hence the generator starts by distributing points in space. In the past, I was unable to execute certain experiments since the current RHG implementation hides these points.
Hence I added the following interface:
When calling the GIRG constructor the points are already sampled (while we usally have no significant computation before calling
generate()
, sampling points is very fast and usually takes only a few milliseconds). The points are then accessible via thegetPositions()
/getWeights()
methods. When callinggenerate()
this data is freed to have more memory for the graph builder. If the user wishes to generate the graph and retain the points,generateKeepingInput()
is available (I opted for a new method rather than an overload ofgenerate
to avoid confusion on the virtual inheritance). OK?3.) Another use case one might have is to provide a user-prepared set of points to the generator. In the C++ interface this is easily implemented via polymorphic constructors. Unfortunately, we do not have them available in Python. Since I prefer an easy readable constructor signature over a
*args
/**kwargs
mess, I opted to declare a second Python classGeometricInhomogenousFromPointSetGenerator
which expects the point set in its constructor.Is this okay? Do you prefer a static factory function? Alternative suggestions?
Sorry for the long post and thanks!