ENH: Speed up common/non_neighbors by using _adj dict operations #7244

MridulS · 2024-01-21T12:49:19Z

I'm not sure why we should not use direct set operations on the _adj dictionary to find common neighbors. This will require a deprecation probably as we will change the behavior of the return type (generator to set).

This was spurred by #7243.

dschult · 2024-01-21T13:42:45Z

This looks good. The reasons for the generator approach are old and no longer relevant (sets were slower and _adj was only used in the base classes).

I am not sure that we need a deprecation -- it is true that the type of the return value is changed from generator to set. But that will only break code that is calling next directly on the return value. Anything using len or looping over the result will work quite nicely. Searching github for nx.common_neighbors shows no cases where it won't just work (of the first 3 pages of results -- I can look further). But I tend to lean toward "just putting it in", so I'm biased here. :) Are there cases where a set would not work but in a silent way?

There are quite a few other functions in the function.py module that might be similarly sped up by large factors. That code has not been looked at since the sets have been improved and it's pretty clear that performance wasn't the intent in some choices (like using neighbors(G, u, v) instead of G.neighbors(u, v) in nonneighbors).

dschult · 2024-01-22T14:31:52Z

The non_neighbors looks good too.
Do you want to change all the G.adj[ and G[ to G._adj[ in this module? Or am I trying to put too much into this PR... :}
Thanks!

MridulS · 2024-01-22T14:59:34Z

I think I have covered the all adj calls in functions.py, not sure if I missed something!

dschult

It would be cool to see if these changes make any difference. That could help motivate us to do similar things elsewhere. Perhaps timing the tests is a place to start -- though they are not big graphs there...

Other possibilities:
1292: Maybe node in G._adj is faster than node in G?
654: values[n] -> v because otherwise why loop over items()
902-904: return from inside the if-structure rather than assigning to values

That's what I found -- and I think this is getting into the weeds. :)

eriknw · 2024-01-22T16:36:14Z

1292: Maybe node in G._adj is faster than node in G?

This changes behavior. Graph.__contains__ has a try/except statement, so using node in G._adj may raise where code does not currently. This may not be a bad thing, but it's different.

dschult · 2024-01-22T16:49:23Z

This changes behavior. Graph.contains has a try/except statement,

Nice catch!

I guess that raises the question of whether a single try/except with a loop inside is better than many try/excepts inside a loop. That should be a python dependent question maybe answered elsewhere -- but I didn't find in a quick search.

networkx/classes/function.py

Co-authored-by: Dan Schult <dschult@colgate.edu>

dschult

I approve this PR.

style changes if you want:

in the test module, change nbors -> nbrs
the suggestion below for avoiding ugly(IMO) line breaks

networkx/algorithms/link_prediction.py

Co-authored-by: Dan Schult <dschult@colgate.edu>

rossbar · 2024-01-25T07:34:18Z

I hope this doesn't distract the discussion too much... but since this PR largely deals with performance, I thought it'd be a good opportunity to try adding some benchmarks!

I went ahead and pushed up benchmarks for nx.non_neighbors (bcd2d2e) and nx.common_neighbors (6a36798) just so others can try out benchmarks while reviewing. You can then use these new benchmarks to compare the performance improvements of these changes relative to main. For example:

# --bench accepts a regex for limiting which benchmarks are run
asv continuous --bench CommonNeighbors main speedup_neighbors

Which on my system (m1 mac running asahi linux) gives

asv continuous --bench CommonNeighbors main speedup_neighbors

| Change   | Before [d6569adf] <main>   | After [6a367983] <speedup_neighbors>   |   Ratio | Benchmark (Parameter)                                          |
|----------|----------------------------|----------------------------------------|---------|----------------------------------------------------------------|
| -        | 1.18±0.02μs                | 742±10ns                               |    0.63 | benchmark_neighbors.CommonNeighbors.time_star_rim_rim(100)     |
| -        | 1.18±0.03μs                | 735±10ns                               |    0.62 | benchmark_neighbors.CommonNeighbors.time_star_rim_rim(1000)    |
| -        | 1.19±0.01μs                | 728±7ns                                |    0.61 | benchmark_neighbors.CommonNeighbors.time_star_rim_rim(10)      |
| -        | 3.79±0.1μs                 | 941±10ns                               |    0.25 | benchmark_neighbors.CommonNeighbors.time_complete(10)          |
| -        | 5.07±0.08μs                | 665±5ns                                |    0.13 | benchmark_neighbors.CommonNeighbors.time_star_center_rim(10)   |
| -        | 30.5±0.3μs                 | 3.80±0.05μs                            |    0.12 | benchmark_neighbors.CommonNeighbors.time_complete(100)         |
| -        | 311±8μs                    | 28.6±0.4μs                             |    0.09 | benchmark_neighbors.CommonNeighbors.time_complete(1000)        |
| -        | 44.0±0.6μs                 | 683±10ns                               |    0.02 | benchmark_neighbors.CommonNeighbors.time_star_center_rim(100)  |
| -        | 439±7μs                    | 732±10ns                               |    0    | benchmark_neighbors.CommonNeighbors.time_star_center_rim(1000)

Anyways - IMO it's really nice to be able to get a sense for the magnitude of performance changes in these types of PRs¹, and I wanted to share my experimentation! I'm also happy to pull these two commits back out and put them in a separate PR!

These changes are actually a little trickier to benchmark than you'd expect since the return types are switching from a generator to a set, so the benchmarks have to have something baked-in to consume the generator and ensure the outputs are comparable. ↩

MridulS · 2024-02-02T08:34:16Z

Let's merge this in with the benchmarks :)

…workx#7244) * ENH: Speed up common_neighbors by using _adj dict operations * Speed up non_neighbors * need for speed * Update networkx/classes/function.py Co-authored-by: Dan Schult <dschult@colgate.edu> * Update networkx/algorithms/link_prediction.py Co-authored-by: Dan Schult <dschult@colgate.edu> * Add benchmarks for non_neighbors. * Add benchmarks for common_neighbors. --------- Co-authored-by: Dan Schult <dschult@colgate.edu> Co-authored-by: Ross Barnowski <rossbar@berkeley.edu> Co-authored-by: Ross Barnowski <rossbar@caltech.edu>

ENH: Speed up common_neighbors by using _adj dict operations

c787790

MridulS added the type: API label Jan 21, 2024

Speed up non_neighbors

d856e4f

MridulS changed the title ~~ENH: Speed up common_neighbors by using _adj dict operations~~ ENH: Speed up common/non_neighbors by using _adj dict operations Jan 22, 2024

need for speed

71bb531

dschult reviewed Jan 22, 2024

View reviewed changes

networkx/classes/function.py Outdated Show resolved Hide resolved

Update networkx/classes/function.py

5e6b76d

Co-authored-by: Dan Schult <dschult@colgate.edu>

dschult approved these changes Jan 24, 2024

View reviewed changes

networkx/algorithms/link_prediction.py Outdated Show resolved Hide resolved

rossbar and others added 3 commits January 24, 2024 23:13

Update networkx/algorithms/link_prediction.py

5560b9b

Co-authored-by: Dan Schult <dschult@colgate.edu>

Add benchmarks for non_neighbors.

bcd2d2e

Add benchmarks for common_neighbors.

6a36798

MridulS merged commit 72d1f68 into networkx:main Feb 2, 2024
39 checks passed

jarrodmillman added this to the 3.3 milestone Feb 2, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: Speed up common/non_neighbors by using _adj dict operations #7244

ENH: Speed up common/non_neighbors by using _adj dict operations #7244

MridulS commented Jan 21, 2024

dschult commented Jan 21, 2024 •

edited

dschult commented Jan 22, 2024

MridulS commented Jan 22, 2024

dschult left a comment

eriknw commented Jan 22, 2024

dschult commented Jan 22, 2024

dschult left a comment

rossbar commented Jan 25, 2024

MridulS commented Feb 2, 2024

ENH: Speed up common/non_neighbors by using _adj dict operations #7244

ENH: Speed up common/non_neighbors by using _adj dict operations #7244

Conversation

MridulS commented Jan 21, 2024

dschult commented Jan 21, 2024 • edited

dschult commented Jan 22, 2024

MridulS commented Jan 22, 2024

dschult left a comment

Choose a reason for hiding this comment

eriknw commented Jan 22, 2024

dschult commented Jan 22, 2024

dschult left a comment

Choose a reason for hiding this comment

rossbar commented Jan 25, 2024

Footnotes

MridulS commented Feb 2, 2024

dschult commented Jan 21, 2024 •

edited