
Revisiting nxp algorithms #63

Merged
merged 9 commits into networkx:main on Jun 6, 2024

Conversation

@Schefflera-Arboricola (Member) commented May 8, 2024

  • Added chunking and get_chunks to all algorithms (in my GSoC proposal this was an "expanding" goal, not a "revisiting" goal, but it seemed more apt to include it here); a minimal sketch of this pattern follows this list.
  • Testing all the get_chunks together (right now this ignores functions that require arguments other than G).
  • Enhanced is_reachable and tournament_is_strongly_connected in tournament.py (these can be experimented with and enhanced further, but that should probably be left for a different PR).
  • Made the docstrings more aligned with the Sphinx guidelines.
  • Any updates from nx 3.3 (none).
  • Updated the pip installs in test.yml in the GitHub workflow.
  • Renamed the Dispatcher class to BackendInterface (ref. comment) and updated it so that running nxp.ParallelGraph() or nxp.ParallelGraph([(1, 2), (2, 3)]) doesn't give any error, and added __str__ so that "ParallelGraph with n nodes and m edges" is printed instead of <nx_parallel.interface.ParallelGraph object at 0x10495e6d0> when a ParallelGraph is passed to print() (this is in the experimental section of the proposal).
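
A minimal, illustrative sketch of the chunking + get_chunks pattern mentioned in the first point (the toy function, the default chunk size, and the n_jobs handling below are assumptions for illustration, not the actual nx-parallel implementation):

```python
from joblib import Parallel, delayed
import networkx as nx
import nx_parallel as nxp


def _chunk_degree_sum(G, chunk):
    # degree sum over a single chunk of nodes
    return sum(d for _, d in G.degree(chunk))


def parallel_degree_sum(G, get_chunks="chunks", n_jobs=2):
    """Toy example: sum of all node degrees, computed chunk by chunk.

    get_chunks : "chunks" (default) or callable
        The default splits the nodes with nxp.chunks; a callable should take
        an iterable of nodes and return an iterable of node chunks.
    """
    nodes = list(G.nodes)
    if get_chunks == "chunks":
        # default chunking: chunks of roughly len(nodes) // 4 nodes each
        node_chunks = nxp.chunks(nodes, max(len(nodes) // 4, 1))
    else:
        node_chunks = get_chunks(nodes)
    chunk_sums = Parallel(n_jobs=n_jobs)(
        delayed(_chunk_degree_sum)(G, chunk) for chunk in node_chunks
    )
    return sum(chunk_sums)


print(parallel_degree_sum(nx.path_graph(10)))  # 18, i.e. twice the number of edges
```

A custom callable passed as get_chunks replaces the default node split; that is what the per-function get_chunks tests are meant to exercise.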

@Schefflera-Arboricola marked this pull request as ready for review on May 15, 2024, 17:56
return nxp.chunks(_nodes, num_in_chunk)

get_chunks_funcs = get_functions_with_get_chunks()
ignore_funcs = [
Member Author:

The functions that require some argument(s) other than G to be passed in the function call are ignored.

Should the get_chunks for these be tested in test_get_chunks.py, or in a separate test folder in the respective modules to which these functions belong?

I think it would be better to have all the get_chunks smoke tests in one place. But I don't know how many functions in networkx would fall into this category (i.e. algorithms that require arguments other than just G). Let me know what you think.
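
For concreteness, here is a rough sketch of what the body of such a combined smoke test could look like (the graph, the tolerance, and the dict-vs-scalar comparison are illustrative assumptions, not the actual contents of test_get_chunks.py):

```python
import math

import networkx as nx
import nx_parallel as nxp


def two_chunk_split(nodes):
    # custom chunking used only for the comparison: two roughly equal chunks
    _nodes = list(nodes)
    num_in_chunk = max(len(_nodes) // 2, 1)
    return nxp.chunks(_nodes, num_in_chunk)


def smoke_test_get_chunks(func):
    """Check that the default chunking and a custom chunking agree for `func`."""
    H = nxp.ParallelGraph(nx.fast_gnp_random_graph(40, 0.6, seed=42))
    default_result = func(H)
    custom_result = func(H, get_chunks=two_chunk_split)
    if isinstance(default_result, dict):
        assert all(
            math.isclose(default_result[n], custom_result[n], abs_tol=1e-12)
            for n in default_result
        )
    else:
        assert default_result == custom_result
```

The idea is to loop this over every function that needs only G; the functions in ignore_funcs would need their extra arguments supplied by hand.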

Member:

What do you think about having tests of the functionality of get_chunks as a function? Those would be separate from any tests that check whether the other functions are using get_chunks correctly (which is, I think, what the smoke tests are doing... right?).

From a theoretical perspective, if the smoke tests are testing that the nxp functions are using get_chunks correctly, that should probably be in a test module for that function.
And how many functions in networkx have parameters other than G is less important than how many functions supported by nx-parallel do. I suspect that most functions do have some input other than G.

Do we need to test every function's get_chunks parameter? I guess it is a check of whether each function is using chunks correctly.

Manually listing functions that can (or can't) be run with G as the only parameter doesn't scale nicely to all networkx functions. Another option is to set up test files for each module -- similar to networkx. But that adds a lot of testing files. Yet another option is to find some way to run the networkx test suite with all calls to nx-parallel setting a specified value for the get_chunks parameter. That might not be practical, and I haven't thought much about it. This is tricky... :)
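
As a rough illustration of one more alternative (an untested sketch, not something proposed in this PR): the "can this be called with only G" check could be done by signature introspection instead of a hand-maintained list.

```python
import inspect


def callable_with_graph_only(func):
    """Rough check: can `func` be called with just a graph?

    True if every parameter after the first (assumed to be the graph) has a
    default, is *args/**kwargs, or is the get_chunks parameter.
    """
    params = list(inspect.signature(func).parameters.values())
    for p in params[1:]:
        if p.name == "get_chunks" or p.kind in (p.VAR_POSITIONAL, p.VAR_KEYWORD):
            continue
        if p.default is inspect.Parameter.empty:
            return False
    return True
```

The ignore list could then be computed rather than maintained by hand, at the cost of trusting the signatures to tell the whole story.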

Member Author:

IMO it would be OK to have all the smoke tests for all the functions' get_chunks in one file right now, because we don't have a lot of functions in ignore_funcs, and having so many test files with just one smoke test (all of which can be summed up in a loop like this) seems unnecessary and redundant. But this is definitely not scalable, so we can handle this differently once we have a bunch of functions in nx-parallel that require arguments other than just G. And we can have separate tests folders if necessary, like we currently have for betweenness_centrality (which not only tests get_chunks but also shows how custom chunking can be useful). Let me know what you think of this.

And should we add the smoke tests for the functions in ignore_funcs in this file only, or in separate tests folders in the respective modules?

> What do you think about having tests of the functionality of get_chunks as a function? Those would be separate from any tests that check whether the other functions are using get_chunks correctly (which is, I think, what the smoke tests are doing... right?).

I think we can test get_chunks separately for all functions in separate tests folders once we have other additional tests for nx-parallel functions. But right now the get_chunks smoke test is the only additional test for all nx-parallel functions.

> Yet another option is to find some way to run the networkx test suite with all calls to nx-parallel setting a specified value for the get_chunks parameter.

But then this wouldn't check our default chunking.

@@ -1,6 +1,4 @@
import time
import random
import types

Member Author:

I initially created a separate pull request for updating the timing script (PR #61), as I believed it to be an independent change. This decision was also influenced by the mention in my SciPyCon poster (to be presented on 10th July '24) stating "refer the timing folder to know more about speedups." At the time, having it as a separate PR made sense.

However, with the recent updates and enhancements to the tournament algorithms in this PR, it has become apparent that PR #61 is now dependent on these changes.

Could you please advise whether I should continue working on the PR #61 branch or start updating the timing script in the current PR's branch?

Please let me know which approach would be most convenient for you to review these in a timely manner.

Member:

I think it is better to keep the timing PR as a separate PR. It is true that there will likely be merge conflicts. But large PRs are hard to review and #63 is already very big. (maybe #63 should be split up, but I don't know -- it depends how easy that is to do). I think it is better to embrace rebasing/merging with conflicts than to try to avoid it. It gives more reviewable PRs. But it is definitely more work for making the PRs.

The changes in this module of this PR don't seem difficult to rebase/merge when the time comes, so I would just keep the PRs separate based on this module at least...

Member Author:

> But large PRs are hard to review and #63 is already very big. (maybe #63 should be split up, but I don't know -- it depends how easy that is to do).

All the commits in this PR correspond to the checklist points in the first comment of this PR. So, I think reviewing this PR commit-wise would be easier than reviewing it in one go.

> The changes in this module of this PR don't seem difficult to rebase/merge when the time comes, so I would just keep the PRs separate based on this module at least...

I can work on updating the timing script PR separately, and the merge conflicts won't be that difficult for me to resolve, so this PR will just need to be merged before the timing script PR. My main concern was just to have the updated heatmaps in the repo before SciPyCon.

@dschult (Member) left a comment:

The renaming looks good.
I have some suggestions for the __init__ method of ParallelGraph.
Otherwise it seems fine.
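
For context, a minimal sketch of an __init__ and the __str__ described in the PR summary (an illustration only, not the actual suggestions in the review threads below, nor the merged interface.py):

```python
import networkx as nx


class ParallelGraphSketch:
    """Thin wrapper around a networkx graph (illustration only)."""

    def __init__(self, graph_object=None):
        if graph_object is None:
            self.graph_object = nx.Graph()              # nxp.ParallelGraph()
        elif isinstance(graph_object, nx.Graph):
            self.graph_object = graph_object            # wrap an existing graph
        else:
            self.graph_object = nx.Graph(graph_object)  # e.g. [(1, 2), (2, 3)]

    def __str__(self):
        return (
            f"ParallelGraph with {self.graph_object.number_of_nodes()} nodes "
            f"and {self.graph_object.number_of_edges()} edges"
        )


print(ParallelGraphSketch([(1, 2), (2, 3)]))  # ParallelGraph with 3 nodes and 2 edges
```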

Two review threads on nx_parallel/interface.py (outdated, resolved).
@dschult merged commit 94adb09 into networkx:main on Jun 6, 2024
11 checks passed
@jarrodmillman added this to the 0.3 milestone on Jun 6, 2024