Use 'dask_auto' when rechunk=True in change_dtype. #2645
Codecov Report
@@ Coverage Diff @@
## RELEASE_next_patch #2645 +/- ##
===================================================
Coverage 76.36% 76.36%
===================================================
Files 202 202
Lines 29675 29677 +2
Branches 6488 6489 +1
===================================================
+ Hits 22661 22663 +2
Misses 5237 5237
Partials 1777 1777
I tested the implementation using:
```python
import dask.array as da
import hyperspy.api as hs

# Build a lazy 4D dataset with small, isotropic chunks and save it to disk
chunks = (32, 32, 32, 32)
dask_array = da.zeros((128, 128, 256, 256), chunks=chunks)
s = hs.signals.Signal2D(dask_array).as_lazy()
s.save("test_data.hspy", chunks=chunks)
```
Then:
```python
import hyperspy.api as hs

# Case 1: change dtype without rechunking
s0 = hs.load("test_data.hspy", lazy=True)
s0.change_dtype('float32', rechunk=False)
s0.compute_navigator() # 3.9 s, chunksize (32, 32, 32, 32)
s0T = s0.T
s0T.compute_navigator() # 1.0 s, chunksize (32, 32, 32, 32)
s0.close_file()

# Case 2: change dtype with rechunking
s1 = hs.load("test_data.hspy", lazy=True)
s1.change_dtype('float32', rechunk=True)
s1.compute_navigator() # 181 s, chunksize (16, 16, 256, 256)
s1T = s1.T
s1T.compute_navigator() # 0.8 s, chunksize (256, 256, 16, 16)
s1.close_file()
```
Gives:
Rechunk | Transpose | compute_navigator() | plot() | Chunksize |
---|---|---|---|---|
False | False | 0.9 s | 5 s | (32, 32, 32, 32) |
False | True | 4.1 s | 1.3 s | (32, 32, 32, 32) |
True | False | 4.0 s | 15.7 s | (64, 64, 64, 64) |
True | True | 15.2 s | 4.2 s | (64, 64, 64, 64) |
This is an improvement. The runtime is still higher with rechunk=True
because the chunks are bigger (64 vs 32 along each dimension).
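To put rough numbers on the chunk-size difference, here is a minimal sketch in plain Python (no dask or HyperSpy needed). It assumes the `(128, 128, 256, 256)` test array from above stored as `float32` (4 bytes per element); `chunk_stats` is a hypothetical helper written for this illustration, not part of either library.

```python
from math import ceil, prod


def chunk_stats(shape, chunks, itemsize):
    """Return (number of chunks, bytes per full chunk) for a regular chunking."""
    n_chunks = prod(ceil(s / c) for s, c in zip(shape, chunks))
    chunk_bytes = prod(chunks) * itemsize
    return n_chunks, chunk_bytes


shape = (128, 128, 256, 256)  # test array used in this comment
FLOAT32_BYTES = 4             # assumption: data already converted to float32

small = chunk_stats(shape, (32, 32, 32, 32), FLOAT32_BYTES)
large = chunk_stats(shape, (64, 64, 64, 64), FLOAT32_BYTES)

print(small)  # (1024, 4194304)  -> 1024 chunks of 4 MiB
print(large)  # (64, 67108864)   -> 64 chunks of 64 MiB
```

Each 64-wide chunk holds 16 times as much data as a 32-wide one, so any operation that only needs a small part of a chunk has to read and process far more than it uses. This is only a size estimate; it does not model the actual task scheduling or I/O.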
So even though it isn't a perfect solution, it is a good improvement, so I suggest merging it. Further optimizations can be done in the future, if there is a need.
I'll merge this after all the tests have passed.
Closes #2637.