Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New algorithm for ranking spanning arborescences for (#4322) #4796

Closed
wants to merge 25 commits into from

Conversation

edxu96
Copy link

@edxu96 edxu96 commented May 13, 2021

I have done:

  • Use black.
  • Run all the tests for algorithms and docs locally.
  • Unit tests for core classes and functions used by these algorithms.
  • Use decorator for graph type.

As far as I know, there are problems:

  • All the edges are supposed to have the specified numerical attribute.
  • There is only one test case with 5 spanning arborescences.
  • The passed graph is deep copied and frozen inside the generator, in case it is modified later.

Copy link
Member

@dschult dschult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes might get this through the tests...
Also, the test file should be treated by black...

@dschult
Copy link
Member

dschult commented Jun 13, 2021

This looks like it is going to be very hard to review.
You've got Type Hinting which we don't use in the package itself.
You are subclassing the base classes with multiple subclasses or subclasses when all it seems like you are doing is adding data structures and methods to to find the source and target of edges. My first quick look suggests that all of this can be handled using NetworkX's paradigm of edges as 2-tuples (source, target). No complex class structure is needed. The type hinting suggests that nodes and edges are going to be strings, which makes it hard to incorporate into the NetworkX environment where nodes are any non-None hashable and edges are 2-tuples (or 3-tuples for MultiEdges). It would also be helpful if you didn't put what are essentially doc_strings in the middle of code as if they are comments. Put the docstrings just after the viewable function or class signature so they get processed by our autodoc sphinx tools into the reference manual for the package. docstrings on __init__ are ok... but will never get seen, so you should put most everything into the class doc_string, especially the parameters that the user should provide. It might also help to have a general overview of the whole module in the docstring at the top of the module so someone reading the code (or looking at this module to try to figure out whether it is what they want) can get a handle on the almost 1700 lines of code. There seems to be a lot of unnecessary code in this module. Subclasses that only provide an __init__ method that only calls the __init__ from the base class aren't needed. Also, use Python and NetworkX data structures where possible -- like 2-tuples for edges.

I guess a summary of all these comments is a request that you go through the code again with the goal of refactoring it into a minimalist, sleak set of functions and data structures.

The code clearly is a lot of work and it passes the tests and provides a nice feature. Thank you for this!

@edxu96
Copy link
Author

edxu96 commented Jun 13, 2021

Thanks for the reply.

You've got Type Hinting which we don't use in the package itself.

  • No problem. Will adapt it to networkx's style.

You are subclassing the base classes with multiple subclasses or subclasses when all it seems like you are doing is adding data structures and methods to to find the source and target of edges. My first quick look suggests that all of this can be handled using NetworkX's paradigm of edges as 2-tuples (source, target).

This is the first thing I thought about when implementing this algorithm. The data structure with three dictionaries for sources, targets, weights of edges is used in the paper [camerini1980ranking] all along. There is no other paper or code implementation in the past 41 years, as far as I know. I learnt basic graph algorithms systematically before and (I think) am familiar with networkx. The way I see it, to redesign and code this algorithm in a more efficient way with a better data structure will take a long time.

The type hinting suggests that nodes and edges are going to be strings, which makes it hard to incorporate into the NetworkX environment where nodes are any non-None hashable and edges are 2-tuples (or 3-tuples for MultiEdges).

I would suggest users to stick to string as node name. This is not an algorithm that someone can play around ... and at least I am not able to design any unit tests for more generic cases.

It would also be helpful if you didn't put what are essentially doc_strings in the middle of code as if they are comments. Put the docstrings just after the viewable function or class signature so they get processed by our autodoc sphinx tools into the reference manual for the package. docstrings on init are ok... but will never get seen, so you should put most everything into the class doc_string, especially the parameters that the user should provide. It might also help to have a general overview of the whole module in the docstring at the top of the module so someone reading the code (or looking at this module to try to figure out whether it is what they want) can get a handle on the almost 1700 lines of code.

  • No problem. Will do that.

There seems to be a lot of unnecessary code in this module. Subclasses that only provide an init method that only calls the init from the base class aren't needed.

  • Will do that. (I didn't implement this algorithm alone. It was integrated in a huge program in my thesis, and it did took me a long time to decouple it from the rest.)

refactoring it into a minimalist, sleak set of functions and data structures.

Will see what I can do after solving those easy ones. It might be very hard to break them even smaller. In [camerini1980ranking], three giant algorithms, written in extremely brief mathematical terms, are present directly, along with some weird definitions ...

I did learn a lot by implementing it. It works okay with a larger case in my thesis, so I pass the defense :-)


  • [camerini1980ranking] Camerini, P. M., Fratta, L., & Maffioli, F. (1980). Ranking arborescences in O (Km log n) time. European Journal of Operational Research, 4(4), 235-242.

Copy link
Contributor

@rossbar rossbar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this one is stuck. I'm +1 for the idea, but -1 for the implementation that introduces many new graph classes which (IMO) makes the code very difficult to understand. Even if the newly-introduced subclasses were made private (which would be necessary IMO), it still wouldn't help the readability very much as the OO design introduces a lot of indirection and new interfaces to learn before you can start to get down to the algorithm itself.

@mjschwenne
Copy link
Contributor

I believe that this functionality has already be implemented with the ArborescenceIterator, documentation here, although the complexities of the two methods are different with this one performing better for graphs with more arborescences (either because they are larger or denser).

To get the $K$ maximum arborescences create a maximum ArborescenceIterator and break the loop after $K$ arborescences have been generated.

@edxu96
Copy link
Author

edxu96 commented Jun 28, 2022

I believe that this functionality has already be implemented with the ArborescenceIterator, documentation here, although the complexities of the two methods are different with this one performing better for graphs with more arborescences (either because they are larger or denser).

To get the K maximum arborescences create a maximum ArborescenceIterator and break the loop after K arborescences have been generated.

I agree this algorithm does the same thing as ArborescenceIterator. They also both create an iterator. Thank both of you for pointing it out.

I agree the implementation is not good, because I exactly followed [camerini1980ranking]. The paper also mentioned that it was not efficient, because the algorithm was found for the first time. The type annotations I added are not accurate, so some work is needed for mypy tests.

@edxu96 edxu96 closed this Jun 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants