-
Notifications
You must be signed in to change notification settings - Fork 224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor/graph writers #375
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR really simplifies code. Awesome. Most of my comments are "while-you're-at-it" style (that's what you get for touching old code ;) ), so feel free to disregard. But I think const write-methods should be considered.
* @param[in] graph The graph object | ||
* @param[in] path The file path to be written to | ||
*/ | ||
virtual void write(Graph& graph, std::string path) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a shame we are loosing the constness here. I just (quickly) went through all writers and it seems to me that only NetworKitBinaryWriter
(which needlessly write to its nodes
attribute) and GraphToolBinaryWriter
(which needlessly has non-const methods which can be marked const) have use non-const writers.
If you've time, please make the GraphWriter::write const; otherwise we can do it in another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, as you wrote those classes could easily be adapted to implement a const write
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree here. Why write()
is essentially the GraphWriter
s run
method. If you want to write in parallel, construct two writers. Making the virtual method const
heavily constraints what is efficiently possible in a GraphWriter
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To elaborate: The base class for GraphWriter
s should absolutely remain non-const
. I don't see what advantage const
offers for writers.
More broadly, I do not think that we should support parallel usage of any algorithm (or I/O interface) in NetworKit unless there is a good reason to do so. const
is an unnecessary pessimization here: it forces implementers to maintain local copies of algorithm state instead of storing the state in the algorithm's object. Often, this is much less efficient, especially if an algorithm's object is reused. Reuse of algorithm objects is arguably much more common than parallel use of the same algorithm's object.
Also note that const
does affect the API: turning a const
method to non-const
is a breaking change. We should only apply const
to functions that will not write to the algorithm's state, forever, regardless of implementation strategy. I am not sure if any of the GraphWriter
s should have const write()
methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I see your point and are kind of convinced that the base class should remain non-const, I still have two points:
- Currently there ARE writers that are const and the breaking of API you mentioned would occur in this PR
- AFAICS we currently have no examples where the state of the write should change (I do not think that
nodes
should be a member ofNetworKitBinaryWriter
but rather a local variable; it's only used inwrite
. InGraphToolBinaryWriter
no state is changed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that those count
variables should be local. However, there might be writers that have a legitimate reason to construct large data structures (e.g., std::vector
s or std::unordered_map
s to remap node IDs?) in memory. Those structures should probably be stored in the object to avoid reallocating them whenever write()
is called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree that for future writers it might be an issue, and -- given that -- I agree that the base class should remain non-const. I'm just saying that if we do this, we should make a conscious decision to break the interface (also as you already pointed out with probably no serious implications).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now the base class write
is non-const. Are there any other issues or we can merge this?
5c46d71
to
c729c10
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes and sorry for the fuzz about the const-ness.
I only spotted two minor issues. Everything else looks fine to me :)
|
||
virtual void write(const Graph& G, bool weighted, std::string path); | ||
virtual void write(const Graph &G, bool weighted, const std::string &path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not have to be virtual
-- the writer is declared final
#include <fstream> | ||
#include <iostream> | ||
#include <string> | ||
|
||
#include <networkit/io/GraphReader.hpp> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for bringing this up again: I believe #include <networkit/io/GraphReader.hpp>
is not required.
Also while fstream
was not required in the header, the implemenation needs it. So please include it in the .cpp file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed fstream
because it is already included in GraphWriter.hpp
. However, GraphWriter.hpp
does not need fstream
, only its derived classes use it. Should I move fstream
to the derived classes and remove it from GraphWriter.hpp
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I missed that. Sorry. Never mind then. While I try to not rely on indirect includes, it's quite common in networkit, so I guess it's fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that there is no valid reason for including fstream
in GraphWriter.hpp
, and I agree that we should not rely on indirect includes. I'll move fstream
to the derived classes.
c729c10
to
e1b3e07
Compare
Thank you :) |
I just wanted to leave a big thank you to @cndolo, @angriman, and @avdgrinten. I used this PR (based on #325) to cache my rather large ASCII-based inputs with NKB. In one case, reading the file dropped from ~5min to less than 10s; that's really cool! |
That nice to hear @manpen! Indeed, fast caching is one of the main motivations of the NKBG format. Unfortunately, #380 bumps the version number of the format once more, without adding compatibility fixes for the current version (as I didn't expect that there were any users already). Is that a problem for your use case? If yes, we should add those compatibility fixes (those fixes should only consist of a few lines anyway - basically we would need to add code to ignore the last two fields of the file header). My general plan is provide backwards-compatibility once the format appears in a released version of NetworKit. |
For me personally that's not an issue and I would even vote to not include backward compatibility just yet in favour of an easier code -- after all, it the |
Great, then our compatibility guarantee for the NKBG format is: once it appears in a release, we provide backwards compatibility indefinitely. |
@angriman This PR has conflicts against |
Signed-off-by: Eugenio Angriman <eugenio.angriman@informatik.hu-berlin.de>
…m GraphWriter Signed-off-by: Eugenio Angriman <eugenio.angriman@informatik.hu-berlin.de>
e1b3e07
to
3122c4c
Compare
@avdgrinten conflicts have been fixed. |
All classes that write graphs inherit from
GraphWriter
. ExposedNetworkitBinaryReader/Writer
to Python, together with theGraphWriter
hierarchy.