
REF: refactor Graph.to_W to avoid perf bottleneck #691

Merged · 2 commits · Mar 9, 2024

Conversation

martinfleis
Member

Closes #672

For adjacency of 1,404,080 edges, the conversion time is 42.7s on main and 4.5s in this PR.

@jGaboardi not sure how to properly label this...

@jGaboardi
Member

I'd say this is an "enhancement" due to the performance improvement.

@martinfleis
Member Author

Thinking about this more, I think we should use the same logic in the weights and neighbors properties, which are unbearably slow for large graphs. Hard to catch these things when developing on small graphs...

codecov bot commented Mar 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.0%. Comparing base (018f1e2) to head (bcabdbc).
Report is 4 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##            main    #691     +/-   ##
=======================================
+ Coverage   84.7%   85.0%   +0.3%     
=======================================
  Files        141     141             
  Lines      15200   15203      +3     
=======================================
+ Hits       12880   12924     +44     
+ Misses      2320    2279     -41     
Files                     Coverage Δ
libpysal/graph/base.py    97.9% <100.0%> (+<0.1%) ⬆️

... and 11 files with indirect coverage changes

@knaaptime
Member

I got a question about the performance of the weights builders in the workshop yesterday, and I had to proudly scoff that they're the fastest you can find... 'Martin uses these routinely on datasets in the millions' :D

@knaaptime left a comment
Member

Looks nice. The idea is that the groupby already gives you all the info you need, so you just bypass the apply and build the lists cheaply?
Agreed, let's adopt this same style elsewhere
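To illustrate the pattern being discussed, here is a minimal sketch of the "bypass the apply" idea on a toy adjacency. The Series layout (a MultiIndex of `focal`/`neighbor` with weight values) and the variable names are assumptions for illustration, not the exact libpysal internals or the code in this PR.

```python
import pandas as pd

# Hypothetical adjacency in a Graph-like layout: a Series indexed by
# (focal, neighbor) pairs with edge weights as values.
adjacency = pd.Series(
    [1.0, 1.0, 1.0, 1.0],
    index=pd.MultiIndex.from_tuples(
        [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b")],
        names=["focal", "neighbor"],
    ),
)

# Slow pattern: groupby().apply() pays a Python-level callback per group.
neighbors_slow = (
    adjacency.reset_index(level=1)
    .groupby(level=0)["neighbor"]
    .apply(list)
    .to_dict()
)

# Faster pattern: iterate the grouper once and build plain dicts directly
# from the index labels, skipping the apply machinery entirely.
grouper = adjacency.groupby(level=0)
neighbors_fast = {
    focal: sub.index.get_level_values("neighbor").tolist()
    for focal, sub in grouper
}
weights_fast = {focal: sub.tolist() for focal, sub in grouper}
```

Both variants produce the same neighbors dict; the second simply avoids the per-group `apply` overhead, which is where the cost shows up on million-edge adjacencies.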

@jGaboardi
Member

Do we know what's causing the failures on macOS & ubuntu-dev?

@knaaptime
Member

The mac test looks like it's probably a fluke. The ubuntu stuff is all over the place.

@martinfleis
Member Author

Builders are fast, but some compatibility bits seem to be worse :). I'm noticing it only now that I regularly use Graph on large data. There's a reason it is still experimental :)

@ljwolf
Member

ljwolf commented Mar 8, 2024

This looks fine to me! I think we may want to consider some benchmarks with asv for these kinds of things... I think that construction, serialization, conversion, lag, and standardization are the big targets?
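For reference, asv benchmarks are plain Python classes whose `time_*` methods asv discovers and times. The sketch below shows what a benchmark for the adjacency-to-dict conversion might look like; the class name, data layout, and sizes are assumptions, not project code.

```python
import numpy as np
import pandas as pd


class TimeAdjacencyOps:
    """asv-style benchmark: asv runs setup() once per round, then times time_*()."""

    def setup(self):
        # Random adjacency of 10k edges as a stand-in for real data.
        rng = np.random.default_rng(0)
        n_edges = 10_000
        self.adjacency = pd.Series(
            np.ones(n_edges),
            index=pd.MultiIndex.from_arrays(
                [rng.integers(0, 500, n_edges), rng.integers(0, 500, n_edges)],
                names=["focal", "neighbor"],
            ),
        )

    def time_to_dicts(self):
        # The conversion under test: build neighbors dicts from the grouper.
        grouper = self.adjacency.groupby(level=0)
        {f: s.index.get_level_values("neighbor").tolist() for f, s in grouper}
```

Construction, serialization, conversion, lag, and standardization would each get their own `time_*` method in the same style.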

@jGaboardi
Member

Looks like all the failures will be fixed by #692. Going ahead with that merge now; we'll see if we can get this green.

@jGaboardi
Member

All green (for now --> the macOS failure just before had to do with a connection issue for geodatasets).

Not sure if we wanted to see about implementing @ljwolf's idea for asv benchmarking here, or whether that's something for the future. Thinking it's probably for the future, but I'll wait on merging until confirmed.

@martinfleis
Member Author

Not going to do asv here. I'm generally a bit skeptical about it, given we have it in geopandas and no one is running it or anything.

@martinfleis martinfleis merged commit 8748bdd into pysal:main Mar 9, 2024
11 checks passed
@martinfleis martinfleis deleted the to_w branch March 9, 2024 08:06
Development

Successfully merging this pull request may close these issues.

Graph.to_W is slow
4 participants