Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use 'dask_auto' when rechunk=True in change_dtype. #2645

Merged

Conversation

ericpre
Copy link
Member

@ericpre ericpre commented Feb 11, 2021

Closes #2637.

Progress of the PR

  • Use 'dask_auto' when rechunk=True in change_dtype,
  • add entry to CHANGES.rst (if appropriate),
  • update tests,
  • ready for review.

Minimal example of the bug fix or the new feature

import dask.array as da
import hyperspy.api as hs

chunks = (32, 32, 32, 32)
dask_array = da.random.random_integers(low=100, size=(256, 256, 256, 256), chunks=chunks)
s = hs.signals.Signal2D(dask_array).as_lazy()

s.change_dtype('uint8')
sT = s.T

sT.compute_navigator()
sT.plot()
Rechunk Transpose compute_navigator() plot() Chunksize
False False 0.3 s 0.6 s (32, 32, 32, 32)
False True 0.4 s 0.4 s (32, 32, 32, 32)
True False 1.6 s 1.7 s ((96, 96, 64), (96, 96, 64), (96, 96, 64), (96, 96, 64))
True True 1.5 s 1.5 s ((96, 96, 64), (96, 96, 64), (96, 96, 64), (96, 96, 64))

@codecov
Copy link

codecov bot commented Feb 11, 2021

Codecov Report

Merging #2645 (ca7fac7) into RELEASE_next_patch (c73b3e8) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@                 Coverage Diff                 @@
##           RELEASE_next_patch    #2645   +/-   ##
===================================================
  Coverage               76.36%   76.36%           
===================================================
  Files                     202      202           
  Lines                   29675    29677    +2     
  Branches                 6488     6489    +1     
===================================================
+ Hits                    22661    22663    +2     
  Misses                   5237     5237           
  Partials                 1777     1777           
Impacted Files Coverage Δ
hyperspy/_signals/lazy.py 89.23% <100.00%> (+0.03%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c73b3e8...a1c85bd. Read the comment docs.

Copy link
Contributor

@magnunor magnunor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested the implementation using:

import dask.array as da
import hyperspy.api as hs
chunks = (32, 32, 32, 32)
dask_array = da.zeros((128, 128, 256, 256), chunks=chunks)
s = hs.signals.Signal2D(dask_array).as_lazy()
s.save("test_data.hspy", chunks=chunks)

Then:

import hyperspy.api as hs
s0 = hs.load("test_data.hspy", lazy=True)
s0.change_dtype('float32', rechunk=False)
s0.compute_navigator() # 3.9 s, chunksize (32, 32, 32, 32)
s0T = s0.T
s0T.compute_navigator() # 1.0 s, chunksize (32, 32, 32, 32)
s0.close_file()
s1 = hs.load("test_data.hspy", lazy=True)
s1.change_dtype('float32', rechunk=True)
s1.compute_navigator() # 181 s, chunksize (16, 16, 256, 256)
s1T = s1.T
s1T.compute_navigator() # 0.8 s, chunksize (256, 256, 16, 16)
s1.close_file()

Gives:

Rechunk Transpose compute_navigator() plot() Chunksize
False False 0.9 s 5 s (32, 32, 32, 32)
False True 4.1 s 1.3 s (32, 32, 32, 32)
True False 4.0 s 15.7 s (64, 64, 64, 64)
True True 15.2 s 4.2 s (64, 64, 64, 64)

Which is an improvement. The reason for the increase in runtime with rechunk=True is due to the bigger chunks (32 vs 64).

So even though it isn't a perfect solution, it is a good improvement, so I suggest merging it. Further optimizations can be done in the future, if there is a need.

CHANGES.rst Outdated Show resolved Hide resolved
@magnunor
Copy link
Contributor

I'll merge this after all the tests have passed.

@magnunor magnunor merged commit cc33313 into hyperspy:RELEASE_next_patch Feb 16, 2021
@ericpre ericpre deleted the fix_rechunk_change_dtype branch February 16, 2021 16:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants