New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow Graph::filter_map to change IndexType #480
base: master
Are you sure you want to change the base?
Conversation
I believe the CI failures are completely unrelated to the PR |
@ABorgna ping This is a very minor (and hopefully uncontroversial) change |
Hi, sorry for the delay. Adding the extra parameter is a breaking change, as it can break type inference on the call sites (as shown by the failing test). |
Ah, I hadn't taken that into account, you're right that it's a breaking change in this way (but I'm pretty sure it should always be fixable by adding more typing information). My use-case is using |
@ABorgna I've updated the tests to make them pass |
Whoops, I ran the extended test suite this time, so should be good now, sorry @ABorgna |
Rebased onto master |
Now I haven't been active here, so take my thoughts with a grain of salt. Would it be possible to add a separate method that does this? I think that avoids the downsides, and you could still avoid duplicating the implementation. Here's what a maintainer thinks of: Error handling needs to be thought of in some way - the new index type could be smaller and could be too small to handle even the filtered graph. Is whatever it does now appropriate, or do you need to do further checks? Documentation also needs to be thought of in some way - should the doc comment be updated? StableGraph and Graph are at a near correspondance, if we add a new method or change an existing one in Graph, should StableGraph also be updated? (Not mandatory, but if it's easy.) |
I wouldn't be opposed as to changing the name of the method to get it upstreamed, do you have any suggestions? |
Maybe .filter_map_with_index_type (too long, but it follows the logical pattern). Think seriously about the error handling, because even if simple, changing index type adds an error scenario to this method that couldn't happen before. And let's check with @indietyp since he is planning newtyped indexes for the next version, how does it fit with that? |
In the current version the NodeIndex and EdgeIndex are defined as new type structs NodeIndex(usize) and EdgeIndex(usize). The idea behind this is simple: we're using graphs to essentially always index into some sort of continuous array of things, even with a GraphMap (those have a new trait that is used) so instead of choosing a size you are bound, just like Vec or HashMap by usize. The plan for filter_map and filter is for them to be more in line with iterators and the rest of the ecosystem, that means: they take self and return a read only GraphView instead, that can then be collected into another storage of choosing if one needs to mutate the view. Therefore changing the IndexType is likely no longer needed. (an explanation as to why an opaque newtype was chosen is also more und depth in the core crate docs of the 0.7.0 branch) |
If NodeIndex is always usize sized it sounds nice for some cases but it will be a performance cost for big graphs, which is the reason for the index type parameter - to be able to save both memory and cpu time by using an appropriately smaller index size. Of course it is possible to have NodeIndex(usize) in a public API while still using a smaller type in the stored data (then also changing index type by filter map doesn't become as relevant.) |
Yes, I think the benefits outweigh the costs here. I has several different iterations of the design, first a NodeIndex (previous one) but that was too confusing/a pitfall. Most cases studied don't actively make us of this except to artificially limit the amount of nodes that should exist, in that case graph storage implements should explicitly have facilities to limit them (or leave it up to the user like Vec and such). After that I had an iteration where the Id was completely opaque (and was Clone not Copy as requirement), but that had its own hurdles, for one implementing algorithms got more cumbersome and due to the double indirection in most/all cases (as we now take references instead of values) had a slowdown of 10% at maximum (I believe I posted speed analytics in one of the now merged PRs) - This actually split Indices into Managed or Arbitrary, which was interesting to explore but sadly convoluted the external API too much. So I found that setting it externally as an opaque value was the best way forward as it allows for easiest implementation while covering most if not all cases and crucially also removes the possibility of abuse or pitfalls like in GraphMap by making clear: the index (just like any sequential container) cannot be chosen by the user. I reckon that in most cases the NodeIndex isn't stored directly. DinoGraph for example uses 3/4th of the usize for an index and 1/4th for a generational index, GraphMap will be directly based on DinoGraph instead and the only place there still exists is Csr (which I am looking to replace mostly with ndarray or similar i. The underlying implementation) or the adjacency list which also uses a usize. So I don't really see how at least in the vast majority of projected cases a variability in the external non graph specific API outweighs the potential cost in implementation time + usage. |
From the documentation:
i.e. if nodes/edges are removed, the resulting graph may have incompatible node indices, which one might want to encode in the type of
IndexType
.This change allows for
filter_map
to be more general with respect to index type.