
Revisiting nxp algorithms #63

Merged
merged 9 commits into networkx:main on Jun 6, 2024

Conversation

@Schefflera-Arboricola (Member) commented May 8, 2024

  • Added chunking and get_chunks to all algorithms (in my GSoC proposal this was an "expanding" goal, not a "revisiting" goal, but it seemed more apt to include it here); a minimal sketch of this pattern follows this list.
  • Testing all the get_chunks together (right now this ignores functions that require arguments other than G).
  • Enhanced is_reachable and tournament_is_strongly_connected in tournament.py (these can be experimented with and enhanced further, but that should probably be left for a different PR).
  • Made the docstrings more aligned with the Sphinx guidelines.
  • Any updates from nx 3.3 (none).
  • Updated the pip installs in test.yml in the GitHub workflow.
  • Renamed the Dispatcher class to BackendInterface (ref. comment) and updated it so that running nxp.ParallelGraph() or nxp.ParallelGraph([(1, 2), (2, 3)]) doesn't give any error, and added __str__ so that "ParallelGraph with n nodes and m edges" is printed instead of <nx_parallel.interface.ParallelGraph object at 0x10495e6d0> when a ParallelGraph is passed to print() (this is in the experimental section of the proposal).
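
A minimal, illustrative sketch of the chunking + get_chunks pattern mentioned in the first point (the toy function, the default chunk size, and the n_jobs handling below are assumptions for illustration, not the actual nx-parallel implementation):

```python
from joblib import Parallel, delayed
import networkx as nx
import nx_parallel as nxp


def _chunk_degree_sum(G, chunk):
    # degree sum over a single chunk of nodes
    return sum(d for _, d in G.degree(chunk))


def parallel_degree_sum(G, get_chunks="chunks", n_jobs=2):
    """Toy example: sum of all node degrees, computed chunk by chunk.

    get_chunks : "chunks" (default) or callable
        The default splits the nodes with nxp.chunks; a callable should take
        an iterable of nodes and return an iterable of node chunks.
    """
    nodes = list(G.nodes)
    if get_chunks == "chunks":
        # default chunking: chunks of roughly len(nodes) // 4 nodes each
        node_chunks = nxp.chunks(nodes, max(len(nodes) // 4, 1))
    else:
        node_chunks = get_chunks(nodes)
    chunk_sums = Parallel(n_jobs=n_jobs)(
        delayed(_chunk_degree_sum)(G, chunk) for chunk in node_chunks
    )
    return sum(chunk_sums)


print(parallel_degree_sum(nx.path_graph(10)))  # 18, i.e. twice the number of edges
```

A custom callable passed as get_chunks replaces the default node split; that is what the per-function get_chunks tests are meant to exercise.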

@Schefflera-Arboricola marked this pull request as ready for review on May 15, 2024, 17:56
return nxp.chunks(_nodes, num_in_chunk)

get_chunks_funcs = get_functions_with_get_chunks()
ignore_funcs = [
Member Author:

The functions that require some argument(s) other than G to be passed in the function call are ignored.

Should the get_chunks for these be tested in test_get_chunks.py, or in a separate test folder in the respective modules to which these functions belong?

I think it would be better to have all the get_chunks smoke tests in one place. But I don't know how many functions in networkx would fall into this category (i.e. algorithms that require arguments other than just G). Let me know what you think.
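
For concreteness, here is a rough sketch of what the body of such a combined smoke test could look like (the graph, the tolerance, and the dict-vs-scalar comparison are illustrative assumptions, not the actual contents of test_get_chunks.py):

```python
import math

import networkx as nx
import nx_parallel as nxp


def two_chunk_split(nodes):
    # custom chunking used only for the comparison: two roughly equal chunks
    _nodes = list(nodes)
    num_in_chunk = max(len(_nodes) // 2, 1)
    return nxp.chunks(_nodes, num_in_chunk)


def smoke_test_get_chunks(func):
    """Check that the default chunking and a custom chunking agree for `func`."""
    H = nxp.ParallelGraph(nx.fast_gnp_random_graph(40, 0.6, seed=42))
    default_result = func(H)
    custom_result = func(H, get_chunks=two_chunk_split)
    if isinstance(default_result, dict):
        assert all(
            math.isclose(default_result[n], custom_result[n], abs_tol=1e-12)
            for n in default_result
        )
    else:
        assert default_result == custom_result
```

The idea is to loop this over every function that needs only G; the functions in ignore_funcs would need their extra arguments supplied by hand.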

Member:

What do you think about having tests of the functionality of get_chunks as a function? Those would be separate from any tests that check whether the other functions are using get_chunks correctly (which is, I think, what the smoke tests are doing... right?).

From a theoretical perspective, if the smoke tests are testing that the nxp functions are using get_chunks correctly, that should probably be in a test module for that function.
And how many functions in networkx have parameters other than G is less important than how many functions supported by nx-parallel do. I suspect that most functions do have some input other than G.

Do we need to test every function's get_chunks parameter? I guess it is a check of whether each function is using chunks correctly.

Manually listing functions that can (or can't) be run with G as the only parameter doesn't scale nicely to all networkx functions. Another option is to set up test files for each module -- similar to networkx. But that adds a lot of testing files. Yet another option is to find some way to run the networkx test suite with all calls to nx-parallel setting a specified value for the get_chunks parameter. That might not be practical, and I haven't thought much about it. This is tricky... :)
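
As a rough illustration of one more alternative (an untested sketch, not something proposed in this PR): the "can this be called with only G" check could be done by signature introspection instead of a hand-maintained list.

```python
import inspect


def callable_with_graph_only(func):
    """Rough check: can `func` be called with just a graph?

    True if every parameter after the first (assumed to be the graph) has a
    default, is *args/**kwargs, or is the get_chunks parameter.
    """
    params = list(inspect.signature(func).parameters.values())
    for p in params[1:]:
        if p.name == "get_chunks" or p.kind in (p.VAR_POSITIONAL, p.VAR_KEYWORD):
            continue
        if p.default is inspect.Parameter.empty:
            return False
    return True
```

The ignore list could then be computed rather than maintained by hand, at the cost of trusting the signatures to tell the whole story.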

Member Author:

IMO it would be OK to have all the smoke tests for all the functions' get_chunks in one file right now, because we don't have a lot of functions in ignore_funcs, and having so many test files with just one smoke test (all of which can be summed up in a loop like this) seems unnecessary and redundant. But this is definitely not scalable, so we can handle this differently once we have a bunch of functions in nx-parallel that require arguments other than just G. And we can have separate tests folders if necessary, like we currently have for betweenness_centrality (which not only tests get_chunks but also shows how custom chunking can be useful). Let me know what you think of this.

And should we add the smoke tests for the functions in ignore_funcs in this file only, or in separate tests folders in the respective modules?

> What do you think about having tests of the functionality of get_chunks as a function? Those would be separate from any tests that check whether the other functions are using get_chunks correctly (which is, I think, what the smoke tests are doing... right?).

I think we can test get_chunks separately for all functions in separate tests folders once we have other additional tests for nx-parallel functions. But right now the get_chunks smoke test is the only additional test for all nx-parallel functions.

> Yet another option is to find some way to run the networkx test suite with all calls to nx-parallel setting a specified value for the get_chunks parameter.

But then this wouldn't check our default chunking.

@@ -1,6 +1,4 @@
import time
import random
import types

Member Author:

I initially created a separate pull request for updating the timing script (PR #61), as I believed it to be an independent change. This decision was also influenced by the mention in my SciPyCon poster (to be presented on 10th July '24) stating "refer the timing folder to know more about speedups." At the time, having it as a separate PR made sense.

However, with the recent updates and enhancements to the tournament algorithms in this PR, it has become apparent that PR #61 is now dependent on these changes.

Could you please advise whether I should continue working on the PR #61 branch or start updating the timing script in the current PR's branch?

Please let me know which approach would be most convenient for you to review these in a timely manner.

Member:

I think it is better to keep the timing PR as a separate PR. It is true that there will likely be merge conflicts. But large PRs are hard to review and #63 is already very big. (maybe #63 should be split up, but I don't know -- it depends how easy that is to do). I think it is better to embrace rebasing/merging with conflicts than to try to avoid it. It gives more reviewable PRs. But it is definitely more work for making the PRs.

The changes in this module of this PR don't seem difficult to rebase/merge when the time comes, so I would just keep the PRs separate based on this module at least...

Member Author:

> But large PRs are hard to review and #63 is already very big. (maybe #63 should be split up, but I don't know -- it depends how easy that is to do).

All the commits in this PR correspond to the checklist points in the first comment of this PR. So, I think reviewing this PR commit-wise would be easier than reviewing it in one go.

> The changes in this module of this PR don't seem difficult to rebase/merge when the time comes, so I would just keep the PRs separate based on this module at least...

I can work on updating the timing script PR separately, and the merge conflicts won't be that difficult for me to resolve, so this PR will just need to be merged before the timing script PR. My main concern was just to have the updated heatmaps in the repo before SciPyCon.

@dschult (Member) left a comment:

The renaming looks good.
I have some suggestions for the __init__ method of ParallelGraph.
Otherwise it seems fine.
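
For context, a minimal sketch of an __init__ and the __str__ described in the PR summary (an illustration only, not the actual suggestions in the review threads below, nor the merged interface.py):

```python
import networkx as nx


class ParallelGraphSketch:
    """Thin wrapper around a networkx graph (illustration only)."""

    def __init__(self, graph_object=None):
        if graph_object is None:
            self.graph_object = nx.Graph()              # nxp.ParallelGraph()
        elif isinstance(graph_object, nx.Graph):
            self.graph_object = graph_object            # wrap an existing graph
        else:
            self.graph_object = nx.Graph(graph_object)  # e.g. [(1, 2), (2, 3)]

    def __str__(self):
        return (
            f"ParallelGraph with {self.graph_object.number_of_nodes()} nodes "
            f"and {self.graph_object.number_of_edges()} edges"
        )


print(ParallelGraphSketch([(1, 2), (2, 3)]))  # ParallelGraph with 3 nodes and 2 edges
```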

Two review threads on nx_parallel/interface.py (outdated, resolved).
@dschult merged commit 94adb09 into networkx:main on Jun 6, 2024
11 checks passed
@jarrodmillman added this to the 0.3 milestone on Jun 6, 2024