perf: memory usage for dask pca #4126
Open
ilan-gold wants to merge 3 commits into
❌ 1 Tests Failed:
This should give very nice memory performance by not having the intermediate product of `x.T @ x` in memory. I am not sure whether the upper-triangular performance benefit is real or not. It appears to be real on the tahoe dataset, but I'm not 100% sure given how large the dataset is and how reliant this pipeline is on I/O.

The failing test here is a warning because the kernel is not parallel, so I see two options there:
fau `@njit(nogil=True, nopython=True)` manually (since this is only ever run from dask anyway)
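
For readers who want the memory argument spelled out, here is a minimal sketch of the general idea in plain NumPy. It is not the kernel from this PR: the function name `gram_upper_by_chunks` and the `chunk_rows` parameter are hypothetical, and it uses neither numba nor dask. It accumulates `x.T @ x` one row block at a time, writes only the upper triangle, and mirrors it at the end, so a single feature-by-feature accumulator is the only large array held beyond the current block.

```python
import numpy as np


def gram_upper_by_chunks(x, chunk_rows):
    """Accumulate x.T @ x over row blocks, writing only the upper triangle.

    Hypothetical helper for illustration; not the PR's implementation.
    """
    n_features = x.shape[1]
    gram = np.zeros((n_features, n_features), dtype=np.float64)
    for start in range(0, x.shape[0], chunk_rows):
        block = x[start:start + chunk_rows]
        for i in range(n_features):
            # Row i of the upper triangle: columns i .. n_features - 1.
            gram[i, i:] += block[:, i] @ block[:, i:]
    # The lower triangle is still zero here; mirror the strict upper part.
    return gram + np.triu(gram, k=1).T


rng = np.random.default_rng(0)
x = rng.normal(size=(50, 6))
result = gram_upper_by_chunks(x, chunk_rows=7)
```

Because the Gram matrix is symmetric, filling only the upper triangle roughly halves the multiply-adds per block. Whether that shows up as wall-clock time on an I/O-bound pipeline is exactly the open question raised above.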