Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Speed up common/non_neighbors by using _adj dict operations #7244

Merged
merged 7 commits into from Feb 2, 2024

Conversation

MridulS
Copy link
Member

@MridulS MridulS commented Jan 21, 2024

I'm not sure why we should not use direct set operations on the _adj dictionary to find common neighbors. This will require a deprecation probably as we will change the behavior of the return type (generator to set).

This was spurred by #7243.

@dschult
Copy link
Member

dschult commented Jan 21, 2024

This looks good. The reasons for the generator approach are old and no longer relevant (sets were slower and _adj was only used in the base classes).

I am not sure that we need a deprecation -- it is true that the type of the return value is changed from generator to set. But that will only break code that is calling next directly on the return value. Anything using len or looping over the result will work quite nicely. Searching github for nx.common_neighbors shows no cases where it won't just work (of the first 3 pages of results -- I can look further). But I tend to lean toward "just putting it in", so I'm biased here. :) Are there cases where a set would not work but in a silent way?

There are quite a few other functions in the function.py module that might be similarly sped up by large factors. That code has not been looked at since the sets have been improved and it's pretty clear that performance wasn't the intent in some choices (like using neighbors(G, u, v) instead of G.neighbors(u, v) in nonneighbors).

@MridulS MridulS changed the title ENH: Speed up common_neighbors by using _adj dict operations ENH: Speed up common/non_neighbors by using _adj dict operations Jan 22, 2024
@dschult
Copy link
Member

dschult commented Jan 22, 2024

The non_neighbors looks good too.
Do you want to change all the G.adj[ and G[ to G._adj[ in this module? Or am I trying to put too much into this PR... :}
Thanks!

@MridulS
Copy link
Member Author

MridulS commented Jan 22, 2024

I think I have covered the all adj calls in functions.py, not sure if I missed something!

Copy link
Member

@dschult dschult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be cool to see if these changes make any difference. That could help motivate us to do similar things elsewhere. Perhaps timing the tests is a place to start -- though they are not big graphs there...

Other possibilities:
1292: Maybe node in G._adj is faster than node in G?
654: values[n] -> v because otherwise why loop over items()
902-904: return from inside the if-structure rather than assigning to values

That's what I found -- and I think this is getting into the weeds. :)

@eriknw
Copy link
Contributor

eriknw commented Jan 22, 2024

1292: Maybe node in G._adj is faster than node in G?

This changes behavior. Graph.__contains__ has a try/except statement, so using node in G._adj may raise where code does not currently. This may not be a bad thing, but it's different.

@dschult
Copy link
Member

dschult commented Jan 22, 2024

This changes behavior. Graph.contains has a try/except statement,

Nice catch!

I guess that raises the question of whether a single try/except with a loop inside is better than many try/excepts inside a loop. That should be a python dependent question maybe answered elsewhere -- but I didn't find in a quick search.

Co-authored-by: Dan Schult <dschult@colgate.edu>
Copy link
Member

@dschult dschult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approve this PR.

style changes if you want:

  • in the test module, change nbors -> nbrs
  • the suggestion below for avoiding ugly(IMO) line breaks

networkx/algorithms/link_prediction.py Outdated Show resolved Hide resolved
@rossbar
Copy link
Contributor

rossbar commented Jan 25, 2024

I hope this doesn't distract the discussion too much... but since this PR largely deals with performance, I thought it'd be a good opportunity to try adding some benchmarks!

I went ahead and pushed up benchmarks for nx.non_neighbors (bcd2d2e) and nx.common_neighbors (6a36798) just so others can try out benchmarks while reviewing. You can then use these new benchmarks to compare the performance improvements of these changes relative to main. For example:

# --bench accepts a regex for limiting which benchmarks are run
asv continuous --bench CommonNeighbors main speedup_neighbors

Which on my system (m1 mac running asahi linux) gives

asv continuous --bench CommonNeighbors main speedup_neighbors
| Change   | Before [d6569adf] <main>   | After [6a367983] <speedup_neighbors>   |   Ratio | Benchmark (Parameter)                                          |
|----------|----------------------------|----------------------------------------|---------|----------------------------------------------------------------|
| -        | 1.18±0.02μs                | 742±10ns                               |    0.63 | benchmark_neighbors.CommonNeighbors.time_star_rim_rim(100)     |
| -        | 1.18±0.03μs                | 735±10ns                               |    0.62 | benchmark_neighbors.CommonNeighbors.time_star_rim_rim(1000)    |
| -        | 1.19±0.01μs                | 728±7ns                                |    0.61 | benchmark_neighbors.CommonNeighbors.time_star_rim_rim(10)      |
| -        | 3.79±0.1μs                 | 941±10ns                               |    0.25 | benchmark_neighbors.CommonNeighbors.time_complete(10)          |
| -        | 5.07±0.08μs                | 665±5ns                                |    0.13 | benchmark_neighbors.CommonNeighbors.time_star_center_rim(10)   |
| -        | 30.5±0.3μs                 | 3.80±0.05μs                            |    0.12 | benchmark_neighbors.CommonNeighbors.time_complete(100)         |
| -        | 311±8μs                    | 28.6±0.4μs                             |    0.09 | benchmark_neighbors.CommonNeighbors.time_complete(1000)        |
| -        | 44.0±0.6μs                 | 683±10ns                               |    0.02 | benchmark_neighbors.CommonNeighbors.time_star_center_rim(100)  |
| -        | 439±7μs                    | 732±10ns                               |    0    | benchmark_neighbors.CommonNeighbors.time_star_center_rim(1000) 

Anyways - IMO it's really nice to be able to get a sense for the magnitude of performance changes in these types of PRs1, and I wanted to share my experimentation! I'm also happy to pull these two commits back out and put them in a separate PR!

Footnotes

  1. These changes are actually a little trickier to benchmark than you'd expect since the return types are switching from a generator to a set, so the benchmarks have to have something baked-in to consume the generator and ensure the outputs are comparable.

@MridulS
Copy link
Member Author

MridulS commented Feb 2, 2024

Let's merge this in with the benchmarks :)

@MridulS MridulS merged commit 72d1f68 into networkx:main Feb 2, 2024
39 checks passed
@jarrodmillman jarrodmillman added this to the 3.3 milestone Feb 2, 2024
cvanelteren pushed a commit to cvanelteren/networkx that referenced this pull request Apr 22, 2024
…workx#7244)

* ENH: Speed up common_neighbors by using _adj dict operations

* Speed up non_neighbors

* need for speed

* Update networkx/classes/function.py

Co-authored-by: Dan Schult <dschult@colgate.edu>

* Update networkx/algorithms/link_prediction.py

Co-authored-by: Dan Schult <dschult@colgate.edu>

* Add benchmarks for non_neighbors.

* Add benchmarks for common_neighbors.

---------

Co-authored-by: Dan Schult <dschult@colgate.edu>
Co-authored-by: Ross Barnowski <rossbar@berkeley.edu>
Co-authored-by: Ross Barnowski <rossbar@caltech.edu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

5 participants