Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Future of this crate #551

Open
indietyp opened this issue May 11, 2023 · 29 comments
Open

Proposal: Future of this crate #551

indietyp opened this issue May 11, 2023 · 29 comments
Labels
C-proposal Category: A proposal that proposes new functionality or a new direction in the crate

Comments

@indietyp
Copy link
Member

indietyp commented May 11, 2023

Hi!

Thank you all for creating such an incredible library. I use petgraph nearly on a day-to-day basis in both my professional and personal projects. After observing progress (and the different issues) over the last couple of months, I thought that it might make sense for me to contribute to petgraph itself, not only potentially adding some new functionality but also maybe lead petgraph into a new direction while maintaining its usefulness and broad adoption.

After thinking about it, I have come up with some action points I'd like to realize, and I know they are pretty drastic. Nevertheless, to not duplicate work or implementation, it makes sense to build onto the foundation of the existing petgraph.

If this is suitable for this crate, please let me know, and I'll start working on it. I still have some open questions, like:

  • How would we go about code review/PRs? Do I create intermittent PRs that add functionality, slowly transforming everything, or have one big blob that will then be reviewed/merged?
  • Should this be done on the main branch? (maybe a dedicated next branch might make sense)
  • I want to commit to being a long-term maintainer of the crate. Is this alright? Is this okay?

Thank you so much for considering all of this! (and I guess this is where I need to mention @bluss?) I look forward to working with you all!

I am open to having a lively discussion and going more in-depth, potentially leading to a change in plans.

Plan

1. Reorganization

Split up the crate:

  • petgraph-core: Includes all traits. To reduce churn, we should consider stabilizing this first and avoid breaking changes. This way, other crates can feel comfortable implementing their algorithms onto it.
  • petgraph-graph: includes Graph and StableGraph (as they are commonly used interchangeably, I'd bundle them in a single crate)
  • petgraph-adjacency-list: includes List
  • petgraph-csr: includes CSR graph
  • petgraph-graphmap: includes GraphMap
  • petgraph-matrix: includes MatrixGraph (name might need changes, to not confuse with matrices in a non-graph type sense)
  • petgraph-algorithms: includes all algorithms currently in petgraph; we might want to feature gate them (not only scc and friends, but also, if possible, Bfs)
  • petgraph-dot: support for dot writing (and maybe reading in the future)
  • petgraph: re-export of all sub-crates, main entry point for binaries

(names are ofc up to debate)

2. GATification (#552)

This would raise the MSRV to 1.65 but enable a lot more flexibility; every trait that could benefit from this should be converted into its GAT form, allowing things like mutable vs. reference access.

3. Trait rework (#552)

Think if we can extend the current set of traits to enable more flexible approaches, things like: Filter, FilterMap, and Map as traits.

IMO there are a lot of valuable traits that are hidden or need to be correctly explained. I want to surface traits more as well as adequately describe them. Some traits, like IntoNeighbours, are (again IMO) named quite misleading. While technically correct, it is only implemented for &Graph and friends, meaning it is non-consuming (even tho Into as a trait is consuming).

We should also look at all functions implemented directly and see if we can reconcile them into traits. Even marker traits may be of use.

4. Backlog Bonanza

Over the past few months (/ year?), many issues and pull requests have been opened once the new foundation of the crate has been created. I want to go through all issues and address (or close them).

5. My friend, my foe NetworkX

I enjoy the NetworkX library from Python, and I miss some of the capabilities it provided; as a long-term plan, I'd love to be able to use similar functionality in Rust. I think petgraph has the potential to live up to it.

This includes the rich set of graph algorithms it provides. I want petgraph to be able to do most of the possible things in NetworkX. I want to implement most of the functions and generators that NetworkX provides (and even some novel ones?) so that petgraph is a comprehensive toolkit for graph analysis.

6. no-std

Judging from the issues (and my own selfish needs), I think no-std is a somewhat important goal for such a "fundamental" library and enables the use of petgraph in many more use cases. Initially, I'd like petgraph to be available with alloc (I don't think that should be too hard™). In the future, we might be able also to make alloc opt-in, creating a separate crate for, e.g., fixed-size graphs or use of the nightly allocator API.

7. CI

While doing some major rework on the library itself, I'd like to touch up some areas, adding more tests, benchmarks, and coverage statistics, as well as tests using cargo-hack to ensure that all feature permutations work.

8. Documentation Touchup

I think the existing documentation is already excellent, but the docs in the lib.rs file do not adequately introduce the available traits, etc.; while doing this rework, I'd like to review the docs and add some sections.

Timeline

This would most definitely be a breaking change (warranting a 0.7.0 release). 1-3 are more immediate goals, while I'd like to focus on 4-8 longer term.

I guess that 1-3 might take one to three months maximum, depending on review speed, discussions, etc., for 4-8, as they are more of an ongoing effort, I have no time guesstimate.

@bluss
Copy link
Member

bluss commented May 11, 2023

cc @XVilka @ABorgna

becoming a new long-term maintainer sounds great, I'm happy to read about your initiative. I am not able to be very active in this.
My advice would be to think of petgraph releases - releases is when the code really reaches other users on crates.io. So remember to design for them - and gather a status about current master vs crates.io releases.

@indietyp
Copy link
Member Author

Thank you for your kind words @bluss !

I plan on keeping a change log of all changes while doing them (as to not forget anything) as well as maybe write up a blog post once ready. (Tho I don't know where to publish it to just yet)

I fully expected to work on this by myself so no worries!

To adjust to the resource constraints of the team my proposed plan of action is (depending on how comfortable you are with it):

  • create a new origin/next branch
  • do all changes there in there without PRs that need review as there aren't enough resources to do so
  • once ready have someone look over it (optional) merge to master and publish to crates.io

The last step will only happen once (A) enough test coverage has been achieved - I am aiming for 90+% (I know that coverage is not the best statistic, but it's a good indicator IMO) (B) everything has been properly documented.

While working on the changes I'd be happy to answer any questions and/or report progress!

While doing all of this I'd love to be able to privately message you in case I find something that is unclear to me (I don't expect this to be a regular occurrence)

@bluss
Copy link
Member

bluss commented May 13, 2023

I see that you are asking me, but I'm not the active maintainer unfortunately, so we'll have to ask everyone. Maybe @ABorgna has some thoughts.

@ABorgna
Copy link
Member

ABorgna commented May 13, 2023

That sounds like a great proposal!
Here are some general comments, we can make specific issues for each as needed.

Some thoughts

  • 1: I think there was some previous talk about something similar this, but I can't find it.
    I'm in favour of this. I just hope it doesn't add too much boilerplate, CI runtime, etc.

    The dot module is a simple and small impl, it may not warrant an stand-alone module. The rest I think are ok.

  • 2+3: I see as the same package. Raising the MSRV is always a delicate point, but this should be the time to clean things up using GATs.

    The current trait definitions I agree can be quite confusing for a new user. A first step would be doing the analysis you mention, and proposing a new/renamed set of traits. We should consider how to manage the API breakage, add deprecation warnings, etc.

    Developing this in a branch/fork sounds alright.

  • 4: The growing backlog is a very present issue. It's always to have helping hands :)

  • 5: This sounds great, esp after implementing (1).

  • 6: A highly requested feature. There are some PRs with discussion to start from: No_std feature switch  #238 NO_STD #370


These are quite a few ideas. I think they can be implemented each on their own, and we can progressively merge them.

I just added you as a collaborator. Thanks for offering your time !

@bluss
Copy link
Member

bluss commented May 13, 2023

@ABorgna Can I add you as direct owner of the crate on petgraph for future proofness? Ownership through team is not like full ownership, that I know. Great that petgraph has you! And let me know if there is somthing admin-y that I need to do.

If one would contact me, maybe it works to contact me with dms from #rust-sci:matrix.org on Matrix/Element or https://social.linux.pizza/@bluss on mastodon. The latter is the likely the best, right now.

@ABorgna
Copy link
Member

ABorgna commented May 13, 2023

Sure!

@bluss
Copy link
Member

bluss commented May 13, 2023

The current trait_template! macro and what it does is truly a monster. An achievement at the time, funny still, but it would be great to replace all of that with HRTB or GAT systems for visitor traits. And that has always been the plan! GAT just took some time to arrive..

@indietyp
Copy link
Member Author

indietyp commented May 13, 2023

That is the plan! I want to eliminate most, if not all, macros used to implement traits. I have made several proposals in the linked issue. Due to the complexity of the iterators (and their usage of GATs), I'd like to wait until we have decided how to "modernize" all the other traits. Here experimentation is key.

@indietyp
Copy link
Member Author

@ABorgna I seem unable to create new branches, meaning I cannot create my working branches (without a fork) or the next branch.

@ABorgna
Copy link
Member

ABorgna commented May 14, 2023

@ABorgna I seem unable to create new branches, meaning I cannot create my working branches (without a fork) or the next branch.

Can you check now? I updated the permissions, I hope that's enough.

@indietyp
Copy link
Member Author

Didn't seem to have worked, I still get:

remote: Permission to petgraph/petgraph.git denied to indietyp.
fatal: unable to access 'https://github.com/petgraph/petgraph.git/': The requested URL returned error: 403

@indietyp
Copy link
Member Author

@bluss I've been looking through the code and have been wondering, is there a specific reason why DfsSpace exists? It was added 7 years ago, but functionally it is just a wrapper around Dfs

@indietyp
Copy link
Member Author

Unrelated to the initial proposal, but maybe someone knows here: was there any specific reason the Graph serialization format was used?

It seems to be non-standard, and from what I gathered, there are several "standard-ish" formats out there, like GEXF (being the most popular, but also being XML based), Cytoscape JSON, sigma.js.

The reason I am asking is because I think adhering to a common standard would be quite beneficial, it would allow graphs generated in e.g. petgraph be serialized, loaded in networkx then visualized in cytoscape.

@pnevyk
Copy link
Contributor

pnevyk commented May 21, 2023

I just contributed one algorithm in the past, but here are some of my thoughts. I have been doing some independent experimentation in this repository (this is not to promote it, just a reference for those who are interested in the ideas that I am going to present).

  1. I believe that it would be helpful to users, especially beginners, to organize the algorithms into modules depending on the problem they solve (like shortest paths, strongly connected components, etc.). The module documentation could nicely describe the problem at hand, list all algorithms that are implemented for it and present their advantages and disadvantages. It could also help discoverability of the algorithms, users know what they need to solve, but not necessarily the names of the algorithms for it.

  2. Use new types for return values from the algorithms instead of generic types like Vec<T> or HashMap<T>. This allows changing the internals in backwards-compatible manner. For example, dijkstra now returns HashMap<G::NodeId, K>. If this was hidden in a newtype, it could easily be changed to use fxhash hasher, or to add predecessors hashmap so that the shortest path can be reconstructed. Hiding the internals might also allow having the results in a form that is not very convenient to use directly by the user, but is effective and easily wrapped to provide a nice interface. Quite a shameless plug, because I contributed this algorithm, but I think that Matching demonstrates this quite nicely.

  3. Builder pattern for optional parameters of algorithms. I think I took inspiration for this from Ergonomic APIs for hard problems talk (link starts at the time when this particular topic is presented). Some examples of optional parameters are the goal node for Dijkstra or reusing state from previous algorithm run on the same or similar graph (btw, I personally understood DfsSpace to have this use case). This probably doesn't sound too compelling, but there may be more interesting examples. Important is that new optional parameters can again be added in backwards-compatible way.

  4. Separating graph storage (adjacency list, CSR, ...) from the high-level semantics (generic graph, path, ...). The graph storages could be required to implement only a minimal set of APIs, while the high-level encapsulations could wrap them and provide broad range of convenient API. This would force all implementations to have the same API (although the question is if that is a good thing, I am thinking about GraphMap for example which has different method signature for adding an edge). It may also simplify the support for no_std, as there could exist a no_std implementation for a storage, while the high-level encapsulation could be used unchanged. It would be somewhat similar to allocator_api.

These are the main ones. Everything would be a massive breaking change. I would definitely understand that it would not be worthy. Or deciding to postpone it and release a version with traits rework first, which in itself will probably be quite significant.

Regarding the comments about dot crate and serialization format, how about having a crate that implements various serialization formats and allows to export and load a graph from them. This would justify a standalone module while providing flexibility in the choice of the format.

@indietyp
Copy link
Member Author

indietyp commented May 21, 2023

Hey @pnevyk, thank you for your opinion. Mine are pretty closely aligned with yours, which is excellent!

I already did (1) in #561 (with backward-compatible imports in the main crate).

All in all, these are all quite drastic changes (with mine combined), and I have been contemplating just ripping off the band-aid and also touching up the algorithms as well. I like the idea of using the builder pattern for the API; it may make sense to facilitate the builder-lite pattern.

You have an entry point: floyd_warshal(graph: &G) -> FloydWarshall

Then on FloydWarshall you have the functions:

  • with_edge_cost()
  • run(self) -> Result

Where Result is an opaque new-type. Theoretically, we could also use the function as a "convenience" method and expose the FloydWarshall struct with a more flexible constructor.

The goal is to convert every algorithm into a generic one, so no more complement(graph: &Graph<...>), but instead use the traits defined in petgraph-core. Specifically for the traits, I have an open proposal at #552, which should hopefully make things more straightforward and open up implementing more algorithms in the petgraph-algorithms crate.

About 4: that is precisely what #561 is doing; everything is its crate, so you have petgraph-csr, petgraph-graph, petgraph-graphmap, etc.

About dot: this currently remains in the petgraph crate, but I thought it would make sense to instead have a petgraph-io crate; this would also be a massive breaking change; instead of directly implementing serde; we would implement different formats in petgraph-io, and people could then choose which one to use (using, e.g., #[serde(with =)]), I dislike that the current format is not standards compliant.

(by the way I like the API of matching that you created)

@pnevyk
Copy link
Contributor

pnevyk commented May 21, 2023

Theoretically, we could also use the function as a "convenience" method and expose the FloydWarshall struct with a more flexible constructor.

This sounds like a great compromise!

The goal is to convert every algorithm into a generic one

I like this.

About 4: that is precisely what #561 is doing

Hmm, I took a look and it's not precisely what I meant, if I didn't overlook something. My idea was to allow something like this:

let matrix = Graph::new_in(AdjMatrix::new());
let csr = Graph::new_in(Csr::new());

This might allow to be minimalistic with required API on the storages themselves, but implement variety of methods on the Graph (not to be confused with current Graph type) which would use the minimal API as building blocks. For example, it is possible to implement replace_node by mem::replace(self.node_mut(v).unwrap(), weight).

I thought it would make sense to instead have a petgraph-io crate

That's exactly what I had in mind.

instead of directly implementing serde; we would implement different formats in petgraph-io, and people could then choose which one to use

I think these could be orthogonal features. When you implement serde Serialize and Deserialize, you don't have control on which format the user (de)serializes to/from. So petgraph could implement serde as well as various formats in petgraph-io which the user could use directly (something like Dot::write(&graph, &mut buf), but I am not suggesting any particular form of the API).

All in all, these are all quite drastic changes (with mine combined), and I have been contemplating just ripping off the band-aid and also touching up the algorithms as well.

I think it would be good to proceed in smaller steps, especially as petgraph is quite an established library. Smaller steps have also higher chance for being reviewed and accepted by the maintainers. But it's good to have an ambitious vision :)

@indietyp
Copy link
Member Author

Hmm, I took a look and it's not precisely what I meant, if I didn't overlook something. My idea was to allow something like this:

I really, really like this. I haven't considered such an API just yet, but the idea that you could simply do the following:

struct Graph<S: Storage> {...}

... sounds like a very interesting idea. It would remove the burden of every single graph implementation to implement every single trait (which we do right now :/) and would make the API a whole lot cleaner.

There are some things to consider, as there are some differences between graphs that we currently have; some do not support self-loops, and others do not support multiple edges between the same nodes, but I wonder if we can overcome those, by only having a select number of traits that Storage can implement.

I will create a competing proposal to #552 to explore this idea further.

At the top of my head, Storage would need to at least:

  1. differentiate between directed and undirected
  2. ability to manipulate nodes and edges (get them, remove them, insert them)
  3. have a Node/Edge Index and Weight (for some, like GraphMap, this might be synonymous)
  4. get a specific node and its edges (depending on if directed or undirected)

... and I think that's it 🤔

... but I guess (like you mentioned) this would be "even more" of a breaking change than I am proposing.

... but now that I think about it, the trait rework and your allocator API style idea do not necessarily XOR themselves; what I could imagine is that petgraph-algorithms still only depend on generics, and we transition in 0.8.0 from everything implementing all traits to the storage API for all built-in graphs. This would still enable interop with other crates which might not be able to fully buy into our new storage API. 🧐

@pnevyk
Copy link
Contributor

pnevyk commented May 21, 2023

  1. get neighbors of a node

But let's keep this thread relatively focused on the plan that you proposed in your original post. I just wanted to share the ideas so they could potentially influence the work somehow, but I definitely don't want to put even more work on you.

If you are interested, you could have a look on my work, where I implemented the ideas that I presented here. It might be useful. I don't link it to promote my "library" here, personally I would rather have these features in petgraph.

@indietyp indietyp added the C-proposal Category: A proposal that proposes new functionality or a new direction in the crate label May 22, 2023
@marvin-hansen
Copy link

@indietyp is there a chance you can merge #547 ?

I'm asking because the PR contains a simple fix for an integer overflow resulting from an ancient default setting that prevents the MatrixGraph from silently overwriting nodes when hitting the 50k limit.

-No breaking changes

  • All tests are passing
    -All benches passing w/o regression

BTW, totally love the new generic storage design. Thank you for all your good work.

@indietyp
Copy link
Member Author

indietyp commented Jun 19, 2023

DefaultIx is used in the generic; wouldn't it be possible to set that as a generic as a stop-gap?

My current plan of action is as follows:

  1. finish Split out petgraph into separate crates #561 (hopefully this week, porting all tests has been a ride)
  2. work on traits
  3. Review the PRs to see what we can resolve/merge.

I will likely change DefaultIx in the 0.7.0 release anyway (unifying it).

@marvin-hansen
Copy link

@indietyp setting in generic is perfectly fine for the time being. Can you do that or should I propose a PR?

I am asking for this patch because I am about to release my crate which currently depends on a patched fork of petgraph, but I would rather prefer to switch to the official release if its possible this gets fixed.

@indietyp
Copy link
Member Author

What I mean is, wouldn't you be able to do: MatrixGraph<N, E, Directed, Option<E>, u32> instead of changing the DefaultIx?

@marvin-hansen
Copy link

marvin-hansen commented Jun 26, 2023 via email

@bluss
Copy link
Member

bluss commented Mar 1, 2024

@indietyp about the serialization format, it's just bespoke, written to be a version-stable format (instead of an autoderived one). Using a standard format would be better, but does that fit into the serde model, and how does the user's weight types fit into that, they can be arbitrary serde serialized entities too.

@indietyp about DfsSpace, I don't really know, if the code looks like it has no use, it can of course be refactored.

@indietyp
Copy link
Member Author

indietyp commented Mar 1, 2024

I've done some research into different formats and it seems like the best way to do this is to simply allow exports to defined formats as well as a default one. I think (but need to check) there's a format supported by NetworkX that supports arbitrary node and and edge weights.

about DfsSpace during my exhaustive refactoring (which is now finally going to continue as I have a bit more free time) DfsSpace has gone away.

I am also sorry for the delay, 0.7.0 has transformed a bit in scope and what it does and it has been quite a bit more work than anticipated^^

@bluss
Copy link
Member

bluss commented Mar 1, 2024

Well, don't be sorry, I haven't had the mind space to work on this myself. We all want the project to be stable and have a community of contributors, that's ultimately better than any single individuals carrying it by themselves. Is it possible to reduce the scope or release an alpha version early? Or maybe some other way of of breaking it up into multiple and incremental PRs.

As big as the 0.7 PR is now, it's probably better to merge it and develop in the tree instead of a PR, that could be more fun? If 0.6.x patches are needed, a branch could track that.

@indietyp
Copy link
Member Author

indietyp commented Mar 1, 2024

Yea for sure we can do that!

I planned for an alpha release soon, I think I've got the essentials finally figured out (as to how I want things to work, looking at implementations etc)

I am currently working on a comprehensive test suite for graph implementations and once that's done I think I'd be ready to open this up in tree.

This is because we then have

  1. Trait based algorithm implementation done (see ShortestPath) and how they are supposed to work as a template
  2. Essential Graph traits
  3. New default graph implementation (Dino)

All of these can then be used to port existing code to 0.7.0 quite easily, but finding the right balance of how things might look like was a challenge (hence why I took it on mostly alone)

It might make sense here to create a project board with open things for 0.7.0 release and see when we can create an alpha.

The most demotivating thing for me rn was creating documentation as that is quite the laborious task, so for example help on that front would be huge. I thought about delaying the documentation until it's in an alpha state/ready to ship, but I found that in some cases formulating the documentation lead to some more introspection and redefinition.

@indietyp
Copy link
Member Author

indietyp commented Mar 1, 2024

after the graph test suite my next goal was to either move to graph views (for reverse etc) and/or traversal algorithms. I really want to push this forward in the coming weeks as I (personally) really like the new direction of the project that developed over the last months.

@indietyp
Copy link
Member Author

indietyp commented Mar 1, 2024

about incremental, it might be best to focus on the most common algorithms and adapt/implement those and then release an alpha. Most things besides other graph implementations are already working.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-proposal Category: A proposal that proposes new functionality or a new direction in the crate
Projects
None yet
Development

No branches or pull requests

5 participants