Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow Graph::filter_map to change IndexType #480

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

NickHu
Copy link
Contributor

@NickHu NickHu commented Jan 26, 2022

From the documentation:

The resulting graph has the structure of a subgraph of the original graph. If no nodes are removed, the resulting graph has compatible node indices; if neither nodes nor edges are removed, the result has the same graph indices as self.

i.e. if nodes/edges are removed, the resulting graph may have incompatible node indices, which one might want to encode in the type of IndexType.

This change allows for filter_map to be more general with respect to index type.

@NickHu
Copy link
Contributor Author

NickHu commented Jan 26, 2022

I believe the CI failures are completely unrelated to the PR

@NickHu
Copy link
Contributor Author

NickHu commented May 30, 2022

@ABorgna ping

This is a very minor (and hopefully uncontroversial) change

@ABorgna
Copy link
Member

ABorgna commented May 31, 2022

Hi, sorry for the delay.

Adding the extra parameter is a breaking change, as it can break type inference on the call sites (as shown by the failing test).
The only problem I see with this is that it may introduce a lot of boilerplate for the caller. Did you have a use case in mind ?

@NickHu
Copy link
Contributor Author

NickHu commented May 31, 2022

Ah, I hadn't taken that into account, you're right that it's a breaking change in this way (but I'm pretty sure it should always be fixable by adding more typing information).

My use-case is using NodeIndex<MyType> like a phantom type to help the type-checker make sure I'm not mixing up indices. For instance, at one point I have a big graph (say, where the node indices have type NodeIndex<DefaultIx>), and I use filter_map to split that up into a bunch of different subgraphs which I then process. The indices of those are notionally different, and not compatible, so I use this change to make the subgraphs have node index type NodeIndex<RestrictionIx> or something like that and prohibit myself from mixing up indices.

@NickHu
Copy link
Contributor Author

NickHu commented May 31, 2022

@ABorgna I've updated the tests to make them pass

@NickHu
Copy link
Contributor Author

NickHu commented May 31, 2022

Whoops, I ran the extended test suite this time, so should be good now, sorry @ABorgna

@NickHu
Copy link
Contributor Author

NickHu commented Jan 23, 2023

Rebased onto master

@bluss
Copy link
Member

bluss commented Mar 1, 2024

Now I haven't been active here, so take my thoughts with a grain of salt. Would it be possible to add a separate method that does this? I think that avoids the downsides, and you could still avoid duplicating the implementation.

Here's what a maintainer thinks of:

Error handling needs to be thought of in some way - the new index type could be smaller and could be too small to handle even the filtered graph. Is whatever it does now appropriate, or do you need to do further checks?

Documentation also needs to be thought of in some way - should the doc comment be updated?

StableGraph and Graph are at a near correspondance, if we add a new method or change an existing one in Graph, should StableGraph also be updated? (Not mandatory, but if it's easy.)

@NickHu
Copy link
Contributor Author

NickHu commented Mar 4, 2024

I wouldn't be opposed as to changing the name of the method to get it upstreamed, do you have any suggestions?

@bluss
Copy link
Member

bluss commented Mar 5, 2024

Maybe .filter_map_with_index_type (too long, but it follows the logical pattern). Think seriously about the error handling, because even if simple, changing index type adds an error scenario to this method that couldn't happen before. And let's check with @indietyp since he is planning newtyped indexes for the next version, how does it fit with that?

@indietyp
Copy link
Member

indietyp commented Mar 5, 2024

In the current version the NodeIndex and EdgeIndex are defined as new type structs NodeIndex(usize) and EdgeIndex(usize). The idea behind this is simple: we're using graphs to essentially always index into some sort of continuous array of things, even with a GraphMap (those have a new trait that is used) so instead of choosing a size you are bound, just like Vec or HashMap by usize.

The plan for filter_map and filter is for them to be more in line with iterators and the rest of the ecosystem, that means: they take self and return a read only GraphView instead, that can then be collected into another storage of choosing if one needs to mutate the view.

Therefore changing the IndexType is likely no longer needed. (an explanation as to why an opaque newtype was chosen is also more und depth in the core crate docs of the 0.7.0 branch)

@bluss
Copy link
Member

bluss commented Mar 5, 2024

If NodeIndex is always usize sized it sounds nice for some cases but it will be a performance cost for big graphs, which is the reason for the index type parameter - to be able to save both memory and cpu time by using an appropriately smaller index size.

Of course it is possible to have NodeIndex(usize) in a public API while still using a smaller type in the stored data (then also changing index type by filter map doesn't become as relevant.)

@indietyp
Copy link
Member

indietyp commented Mar 5, 2024

Yes, I think the benefits outweigh the costs here. I has several different iterations of the design, first a NodeIndex (previous one) but that was too confusing/a pitfall. Most cases studied don't actively make us of this except to artificially limit the amount of nodes that should exist, in that case graph storage implements should explicitly have facilities to limit them (or leave it up to the user like Vec and such).

After that I had an iteration where the Id was completely opaque (and was Clone not Copy as requirement), but that had its own hurdles, for one implementing algorithms got more cumbersome and due to the double indirection in most/all cases (as we now take references instead of values) had a slowdown of 10% at maximum (I believe I posted speed analytics in one of the now merged PRs) - This actually split Indices into Managed or Arbitrary, which was interesting to explore but sadly convoluted the external API too much.

So I found that setting it externally as an opaque value was the best way forward as it allows for easiest implementation while covering most if not all cases and crucially also removes the possibility of abuse or pitfalls like in GraphMap by making clear: the index (just like any sequential container) cannot be chosen by the user.

I reckon that in most cases the NodeIndex isn't stored directly. DinoGraph for example uses 3/4th of the usize for an index and 1/4th for a generational index, GraphMap will be directly based on DinoGraph instead and the only place there still exists is Csr (which I am looking to replace mostly with ndarray or similar i. The underlying implementation) or the adjacency list which also uses a usize. So I don't really see how at least in the vast majority of projected cases a variability in the external non graph specific API outweighs the potential cost in implementation time + usage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants