Allow Graph::filter_map to change IndexType #480

NickHu · 2022-01-26T11:09:52Z

From the documentation:

The resulting graph has the structure of a subgraph of the original graph. If no nodes are removed, the resulting graph has compatible node indices; if neither nodes nor edges are removed, the result has the same graph indices as self.

i.e. if nodes/edges are removed, the resulting graph may have incompatible node indices, which one might want to encode in the type of IndexType.

This change allows for filter_map to be more general with respect to index type.
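To make the index incompatibility concrete, here is a minimal self-contained sketch. It does not use petgraph; `filter_nodes` is a hypothetical helper that assumes nodes live in a plain slice and that survivors are packed into a fresh `Vec`, which is the behaviour the quoted documentation implies.

```rust
/// Hypothetical helper (not a petgraph API): keep the nodes accepted by `keep`
/// and pack the survivors into a fresh Vec.
/// Also returns a table mapping every old index to its new index,
/// or `None` if the node was removed.
pub fn filter_nodes<N: Clone>(
    nodes: &[N],
    keep: impl Fn(&N) -> bool,
) -> (Vec<N>, Vec<Option<usize>>) {
    let mut kept = Vec::new();
    let mut remap = Vec::with_capacity(nodes.len());
    for n in nodes {
        if keep(n) {
            // The survivor's new index is its position in the packed Vec.
            remap.push(Some(kept.len()));
            kept.push(n.clone());
        } else {
            remap.push(None);
        }
    }
    (kept, remap)
}
```

Removing the middle node of `["a", "b", "c"]` moves `"c"` from index 2 to index 1, so an old index silently refers to a different node (or to nothing) in the result.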

NickHu · 2022-01-26T16:55:59Z

I believe the CI failures are completely unrelated to the PR

NickHu · 2022-05-30T23:18:22Z

@ABorgna ping

This is a very minor (and hopefully uncontroversial) change

ABorgna · 2022-05-31T10:58:13Z

Hi, sorry for the delay.

Adding the extra parameter is a breaking change, as it can break type inference on the call sites (as shown by the failing test).
The only problem I see with this is that it may introduce a lot of boilerplate for the caller. Did you have a use case in mind?

NickHu · 2022-05-31T11:25:51Z

Ah, I hadn't taken that into account, you're right that it's a breaking change in this way (but I'm pretty sure it should always be fixable by adding more typing information).

My use-case is using NodeIndex<MyType> like a phantom type to help the type-checker make sure I'm not mixing up indices. For instance, at one point I have a big graph (say, where the node indices have type NodeIndex<DefaultIx>), and I use filter_map to split that up into a bunch of different subgraphs which I then process. The indices of those are notionally different, and not compatible, so I use this change to make the subgraphs have node index type NodeIndex<RestrictionIx> or something like that and prohibit myself from mixing up indices.
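A rough sketch of the phantom-type idea, written without petgraph so that it stands alone. Everything here is a simplified assumption: `Graph`, `NodeIndex`, and the marker `WholeIx` are hypothetical stand-ins, and only `RestrictionIx` is a name taken from the comment above. In petgraph itself the marker would instead have to be a real index type implementing `IndexType`.

```rust
use std::marker::PhantomData;

/// Marker for indices into the original graph (hypothetical name).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct WholeIx;
/// Marker for indices into a filtered subgraph.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct RestrictionIx;

/// An index that remembers, at the type level, which graph it belongs to.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeIndex<Tag>(u32, PhantomData<Tag>);

/// Simplified node-only graph; edges are omitted to keep the sketch short.
pub struct Graph<N, Tag> {
    nodes: Vec<N>,
    _tag: PhantomData<Tag>,
}

impl<N, Tag> Graph<N, Tag> {
    pub fn new() -> Self {
        Graph { nodes: Vec::new(), _tag: PhantomData }
    }

    pub fn node_count(&self) -> usize {
        self.nodes.len()
    }

    pub fn add_node(&mut self, weight: N) -> NodeIndex<Tag> {
        self.nodes.push(weight);
        NodeIndex((self.nodes.len() - 1) as u32, PhantomData)
    }

    /// Only accepts indices carrying this graph's tag.
    pub fn node_weight(&self, ix: NodeIndex<Tag>) -> Option<&N> {
        self.nodes.get(ix.0 as usize)
    }

    /// Builds a subgraph whose indices carry a different tag `Tag2`.
    pub fn filter_map<N2, Tag2>(
        &self,
        mut f: impl FnMut(NodeIndex<Tag>, &N) -> Option<N2>,
    ) -> Graph<N2, Tag2> {
        let mut out = Graph { nodes: Vec::new(), _tag: PhantomData };
        for (i, w) in self.nodes.iter().enumerate() {
            if let Some(w2) = f(NodeIndex(i as u32, PhantomData), w) {
                out.nodes.push(w2);
            }
        }
        out
    }
}
```

With a subgraph typed as `Graph<String, RestrictionIx>`, passing it a `NodeIndex<WholeIx>` is rejected by the compiler, which is the protection described above. The cost is that the caller has to name the result type, since nothing else determines `Tag2`.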

NickHu · 2022-05-31T13:27:47Z

@ABorgna I've updated the tests to make them pass

NickHu · 2022-05-31T14:07:22Z

Whoops, I ran the extended test suite this time, so should be good now, sorry @ABorgna

NickHu · 2023-01-23T12:10:30Z

Rebased onto master

bluss · 2024-03-01T16:29:45Z

Now I haven't been active here, so take my thoughts with a grain of salt. Would it be possible to add a separate method that does this? I think that avoids the downsides, and you could still avoid duplicating the implementation.

Here's what a maintainer would think about:

Error handling needs to be thought of in some way - the new index type could be smaller and could be too small to handle even the filtered graph. Is whatever it does now appropriate, or do you need to do further checks?

Documentation also needs to be thought of in some way - should the doc comment be updated?

StableGraph and Graph are at a near correspondence. If we add a new method or change an existing one in Graph, should StableGraph also be updated? (Not mandatory, but if it's easy.)
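On the error-handling point, one possible shape is a checked conversion that reports failure instead of panicking or wrapping. This is a sketch under assumptions, not what the PR does: `assign_indices` and `IndexOverflow` are hypothetical names, and the standard `TryFrom<usize>` trait stands in for petgraph's `IndexType`.

```rust
use std::convert::TryFrom;

/// Hypothetical error: the target index type cannot address `needed` nodes.
#[derive(Debug, PartialEq, Eq)]
pub struct IndexOverflow {
    pub needed: usize,
}

/// Assign indices of type `Ix` to `kept` surviving nodes.
/// Fails if any position does not fit into `Ix`.
pub fn assign_indices<Ix: TryFrom<usize>>(kept: usize) -> Result<Vec<Ix>, IndexOverflow> {
    (0..kept)
        .map(|i| Ix::try_from(i).map_err(|_| IndexOverflow { needed: kept }))
        .collect()
}
```

For example, 300 surviving nodes fit a `u32` index but not a `u8` one, so the `u8` call returns `Err`. Whether a `Result`, a panic, or a documented precondition is the right choice for petgraph is exactly the open question raised above.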

NickHu · 2024-03-04T10:38:29Z

I wouldn't be opposed to changing the name of the method to get it upstreamed. Do you have any suggestions?

bluss · 2024-03-05T20:52:28Z

Maybe .filter_map_with_index_type (too long, but it follows the logical pattern). Think seriously about the error handling, because even if simple, changing index type adds an error scenario to this method that couldn't happen before. And let's check with @indietyp since he is planning newtyped indexes for the next version, how does it fit with that?
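A sketch of how the separate-method idea could avoid both the breaking change and code duplication. The `Graph` below is a simplified stand-in (nodes only, no edges), `TooManyNodes` and `from_nodes` are hypothetical, and `TryFrom<usize>` again stands in for `IndexType`. The existing method keeps its signature and delegates to the generic one.

```rust
use std::convert::TryFrom;
use std::marker::PhantomData;

/// Simplified stand-in for a graph parameterised by index type.
pub struct Graph<N, Ix = u32> {
    nodes: Vec<N>,
    _ix: PhantomData<Ix>,
}

/// Hypothetical error for an index type that is too small.
#[derive(Debug, PartialEq, Eq)]
pub struct TooManyNodes;

impl<N, Ix: TryFrom<usize>> Graph<N, Ix> {
    pub fn from_nodes(nodes: Vec<N>) -> Result<Self, TooManyNodes> {
        // The largest index handed out would be len - 1; it must fit in Ix.
        if let Some(last) = nodes.len().checked_sub(1) {
            Ix::try_from(last).map_err(|_| TooManyNodes)?;
        }
        Ok(Graph { nodes, _ix: PhantomData })
    }

    pub fn node_count(&self) -> usize {
        self.nodes.len()
    }

    /// Unchanged signature: the index type is fixed to `Ix`,
    /// so existing call sites keep inferring without annotations.
    pub fn filter_map<N2>(&self, f: impl FnMut(&N) -> Option<N2>) -> Graph<N2, Ix> {
        // Cannot fail: the result never has more nodes than `self`.
        self.filter_map_with_index_type(f)
            .expect("filtered graph is no larger than the original")
    }

    /// New method: the caller picks `Ix2` and must handle overflow.
    pub fn filter_map_with_index_type<N2, Ix2: TryFrom<usize>>(
        &self,
        f: impl FnMut(&N) -> Option<N2>,
    ) -> Result<Graph<N2, Ix2>, TooManyNodes> {
        Graph::from_nodes(self.nodes.iter().filter_map(f).collect())
    }
}
```

Only callers who opt into the new method need to annotate the result type, for example `let small: Result<Graph<i32, u8>, _> = g.filter_map_with_index_type(...)`.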

indietyp · 2024-03-05T20:58:25Z

In the current version, NodeIndex and EdgeIndex are defined as newtype structs NodeIndex(usize) and EdgeIndex(usize). The idea behind this is simple: we essentially always use graphs to index into some sort of contiguous array of things, even with a GraphMap (those have a new trait that is used), so instead of choosing a size you are bound by usize, just like Vec or HashMap.

The plan for filter_map and filter is for them to be more in line with iterators and the rest of the ecosystem. That means they take self and return a read-only GraphView instead, which can then be collected into another storage of your choosing if you need to mutate the view.

Therefore changing the IndexType is likely no longer needed. (An explanation of why an opaque newtype was chosen is given in more depth in the core crate docs of the 0.7.0 branch.)

bluss · 2024-03-05T21:04:46Z

If NodeIndex is always usize-sized, that sounds nice for some cases, but it will be a performance cost for big graphs. That is the reason for the index type parameter: to be able to save both memory and CPU time by using an appropriately smaller index size.

Of course it is possible to have NodeIndex(usize) in a public API while still using a smaller type in the stored data (then also changing index type by filter map doesn't become as relevant.)
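To illustrate the last point, here is a small sketch of a public `NodeIndex(usize)` over narrower stored data. `CompactGraph` and all of its methods are hypothetical and are not part of petgraph; the sketch assumes `u32` storage and directed edges.

```rust
use std::convert::TryFrom;

/// Public, opaque index: always usize-sized from the caller's point of view.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeIndex(usize);

/// Hypothetical storage that keeps edge endpoints as u32 internally.
pub struct CompactGraph {
    node_count: u32,
    edges: Vec<[u32; 2]>,
}

impl CompactGraph {
    pub fn new() -> Self {
        CompactGraph { node_count: 0, edges: Vec::new() }
    }

    /// Returns `None` once the u32 storage is exhausted.
    pub fn add_node(&mut self) -> Option<NodeIndex> {
        let ix = self.node_count;
        self.node_count = ix.checked_add(1)?;
        Some(NodeIndex(ix as usize))
    }

    /// Narrow a public index to the stored width, rejecting unknown nodes.
    fn narrow(&self, ix: NodeIndex) -> Option<u32> {
        u32::try_from(ix.0).ok().filter(|&i| i < self.node_count)
    }

    /// Adds a directed edge; returns false if either endpoint is invalid.
    pub fn add_edge(&mut self, a: NodeIndex, b: NodeIndex) -> bool {
        match (self.narrow(a), self.narrow(b)) {
            (Some(a), Some(b)) => {
                self.edges.push([a, b]);
                true
            }
            _ => false,
        }
    }

    /// Targets of edges leaving `a`, widened back to the public index type.
    pub fn neighbors(&self, a: NodeIndex) -> impl Iterator<Item = NodeIndex> + '_ {
        self.edges
            .iter()
            .filter(move |e| e[0] as usize == a.0)
            .map(|e| NodeIndex(e[1] as usize))
    }

    /// Bytes used per stored edge: two u32 endpoints.
    pub fn bytes_per_edge() -> usize {
        std::mem::size_of::<[u32; 2]>()
    }
}
```

Each stored edge takes 8 bytes here, versus 16 for two `usize` endpoints on a 64-bit target, while callers only ever see `NodeIndex`.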

indietyp · 2024-03-05T21:40:50Z

Yes, I think the benefits outweigh the costs here. I had several different iterations of the design, first a NodeIndex (previous one) but that was too confusing/a pitfall. Most cases studied don't actively make use of this except to artificially limit the number of nodes that should exist; in that case graph storage implementations should explicitly have facilities to limit them (or leave it up to the user, like Vec and such).

After that I had an iteration where the Id was completely opaque (and was Clone, not Copy, as a requirement), but that had its own hurdles. For one, implementing algorithms got more cumbersome, and due to the double indirection in most/all cases (as we now take references instead of values) it had a slowdown of 10% at maximum (I believe I posted speed analytics in one of the now merged PRs). This actually split Indices into Managed or Arbitrary, which was interesting to explore but sadly convoluted the external API too much.

So I found that setting it externally as an opaque value was the best way forward as it allows for easiest implementation while covering most if not all cases and crucially also removes the possibility of abuse or pitfalls like in GraphMap by making clear: the index (just like any sequential container) cannot be chosen by the user.

I reckon that in most cases the NodeIndex isn't stored directly. DinoGraph, for example, uses 3/4 of the usize for an index and 1/4 for a generational index. GraphMap will be directly based on DinoGraph instead, and the only place where it still exists is Csr (which I am looking to mostly replace with ndarray or similar in the underlying implementation) or the adjacency list, which also uses a usize. So I don't really see how, at least in the vast majority of projected cases, variability in the external non-graph-specific API outweighs the potential cost in implementation time + usage.
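The 3/4 and 1/4 split mentioned for DinoGraph can be pictured as bit-packing inside a single usize. The sketch below is only an illustration of that idea: `PackedId` is a hypothetical type, and placing the generation in the high bits is an assumption, not DinoGraph's actual layout.

```rust
/// A quarter of the usize bits for the generation (assumed layout).
const GEN_BITS: u32 = usize::BITS / 4;
/// The remaining three quarters for the slot index.
const INDEX_BITS: u32 = usize::BITS - GEN_BITS;
const INDEX_MASK: usize = (1usize << INDEX_BITS) - 1;

/// Hypothetical packed identifier: generation in the high bits, index in the low bits.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PackedId(usize);

impl PackedId {
    /// Returns `None` if either part does not fit into its share of the bits.
    pub fn new(index: usize, generation: usize) -> Option<Self> {
        if index > INDEX_MASK || generation >= (1usize << GEN_BITS) {
            return None;
        }
        Some(PackedId((generation << INDEX_BITS) | index))
    }

    pub fn index(self) -> usize {
        self.0 & INDEX_MASK
    }

    pub fn generation(self) -> usize {
        self.0 >> INDEX_BITS
    }
}
```

On a 64-bit target this leaves 48 bits for the index and 16 for the generation, so the value handed to users stays usize-sized while the addressable range is already narrower than a full usize.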

NickHu force-pushed the filter_map branch from f71140f to fd2798f Compare May 30, 2022 23:13

ABorgna added enhancement breaking-change labels May 31, 2022

NickHu force-pushed the filter_map branch from 3fc730d to 8c22226 Compare May 31, 2022 14:06

NickHu force-pushed the filter_map branch from cc06c7c to 5a10d71 Compare October 26, 2022 10:32

NickHu force-pushed the filter_map branch from 5a10d71 to a6bd2ae Compare January 23, 2023 12:10

NickHu added 2 commits February 7, 2024 14:35

Allow Graph::filter_map to change IndexType

425bf51

Fix filter_map test by specifying Index type

80fa2cf

NickHu force-pushed the filter_map branch from a6bd2ae to 80fa2cf Compare February 7, 2024 14:37
