Remove process dask array #827

CSSFrancis · 2022-04-21T17:48:08Z

name: Remove process dask array
about: This removes all instances of process_dask_array and replaces them with map, for more consistency throughout the code.

Checklist

Updated CHANGELOG.md
(if finished) Requested a review (from pc494 if you are unsure who is most suitable)

@magnunor Do you see any reason against doing this?


magnunor

I agree with the changes! Much more streamlined code.

I don't have the time to check the functionality, but if the unit tests still pass, it should be ok.

@CSSFrancis, maybe do a simple runtime check with a dataset and see whether the computation time is the same for the current version of hyperspy_release_next_minor and for your pull request.

pyxem/signals/diffraction2d.py

magnunor · 2022-04-24T17:50:27Z

pyxem/tests/signals/test_diffraction2d.py

@@ -862,7 +862,7 @@ def test_non_uniform_chunks(self):
    def test_return_shifts_non_lazy(self):
        s = self.s
        s_shifts = s.center_direct_beam(method="blur", sigma=1, return_shifts=True)
-        assert s_shifts._lazy is False
+        assert s_shifts._lazy is True


Why this change? I'm thinking a non-lazy input should return a non-lazy output?

Yes, that is probably what should happen. I didn't look that over carefully enough; I'll change that back.
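The convention agreed on here (a non-lazy input yields a non-lazy result, a lazy input yields a lazy one) can be sketched with a toy example. This is not pyxem or HyperSpy code: `LazyResult`, `EagerResult`, and `find_shifts` are hypothetical names, and only the `_lazy` attribute mirrors what the test above checks.

```python
class EagerResult:
    """Hypothetical stand-in for a non-lazy signal holding computed values."""
    _lazy = False

    def __init__(self, values):
        self.values = values


class LazyResult:
    """Hypothetical stand-in for a lazy signal: work is deferred until compute()."""
    _lazy = True

    def __init__(self, thunk):
        self._thunk = thunk

    def compute(self):
        # Run the deferred work and return a non-lazy result.
        return EagerResult(self._thunk())


def find_shifts(frames, lazy_input):
    """Toy 'shift' calculation: the mean of each frame.

    The work is always built as a deferred object first; it is computed
    immediately only when the input was not lazy.
    """
    deferred = LazyResult(lambda: [sum(f) / len(f) for f in frames])
    return deferred if lazy_input else deferred.compute()


frames = [[1, 2, 3], [4, 6]]
eager = find_shifts(frames, lazy_input=False)
lazy = find_shifts(frames, lazy_input=True)
print(eager._lazy, lazy._lazy)
```

The point of the sketch is only that the decision to compute is tied to the laziness of the input, which is the behaviour the original assertion (`s_shifts._lazy is False`) expects.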

pyxem/signals/diffraction2d.py

CSSFrancis · 2022-04-25T21:09:02Z

I tested this using 8 cores on a cluster and:

Alignment of a 128x128x256x256 dataset using the map method takes:

40.2 s ± 858 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

And using _process_dask_array it takes:

38.8 s ± 978 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

So the new method is marginally slower, which is to be expected considering the extra overhead of the map function. The two are within error of each other.
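A quick way to check the "within error" statement is to compare the difference of the two means with their combined spread. The sketch below uses only the numbers quoted above and assumes the two measurements are independent, so that their standard deviations add in quadrature; that assumption is not stated in the thread.

```python
import math

# Timings quoted above, in seconds (mean, std. dev. of 7 runs).
new_mean, new_std = 40.2, 0.858   # map-based version
old_mean, old_std = 38.8, 0.978   # _process_dask_array version

diff = new_mean - old_mean                       # absolute slowdown
combined_std = math.hypot(new_std, old_std)      # quadrature sum (assumes independence)
n_sigma = diff / combined_std                    # difference in units of combined spread
relative_slowdown = diff / old_mean              # fraction of the old runtime

print(f"difference: {diff:.2f} s")
print(f"combined std. dev.: {combined_std:.2f} s")
print(f"difference / combined std. dev.: {n_sigma:.2f}")
print(f"relative slowdown: {relative_slowdown:.1%}")
```

Under that assumption the difference is close to one combined standard deviation, which matches the reading that the two runtimes are not clearly distinguishable.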

I also did a test on my local computer:

```python
import numpy as np
from pyxem.signals import Diffraction2D

# Synthetic 4D dataset: one bright pixel per pattern, at a randomly offset position
data = np.zeros((30, 30, 20, 16), dtype=np.int16)
x_pos_list = np.random.randint(8 - 2, 8 + 2, 30, dtype=np.int16)
x_pos_list[x_pos_list == 8] = 9
y_pos_list = np.random.randint(10 - 2, 10 + 2, 30, dtype=np.int16)
for ix in range(len(x_pos_list)):
    for iy in range(len(y_pos_list)):
        data[iy, ix, y_pos_list[iy], x_pos_list[ix]] = 9

s = Diffraction2D(data)
s.axes_manager[0].scale = 0.5
s.axes_manager[1].scale = 0.6
s.axes_manager[2].scale = 3
s.axes_manager[3].scale = 4
s_lazy = s.as_lazy()

# The call that was timed
s.center_direct_beam(method='blur', sigma=1)
```

New: 1.06 s ± 57.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Old: 1.13 s ± 2.35 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
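The "mean ± std. dev. of 7 runs, 1 loop each" figures resemble the output of IPython's `%timeit`, although the thread does not say which tool was used. A comparable measurement can be sketched with the standard library alone; `workload` below is a hypothetical placeholder for the call being timed, and the use of the population standard deviation is an assumption.

```python
import statistics
import timeit


def workload():
    # Hypothetical placeholder for the call being timed,
    # e.g. s.center_direct_beam(method='blur', sigma=1).
    return sum(i * i for i in range(10_000))


# 7 runs of 1 loop each, matching the format of the figures above.
times = timeit.repeat(workload, repeat=7, number=1)

mean = statistics.mean(times)
std = statistics.pstdev(times)  # population std. dev. (an assumption)

summary = (
    f"{mean * 1e3:.3g} ms ± {std * 1e3:.3g} ms per loop "
    f"(mean ± std. dev. of {len(times)} runs, 1 loop each)"
)
print(summary)
```

Running the same script against both branches gives figures that can be compared directly, as in the "New" and "Old" lines above.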

On another note, I think this solves the problem I was having with #782: I can now use dask-distributed to center and save a 1028x1028x256x256 dataset with 8 cores and only around 10-20 GB of RAM, whereas previously it would take upwards of 100 GB.
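For scale, the size of a 1028x1028x256x256 dataset can be estimated from its shape. The thread does not give the dtype, so the bytes-per-element values below are assumptions chosen only to bracket the likely range.

```python
import math

shape = (1028, 1028, 256, 256)
n_elements = math.prod(shape)

# Assumed element sizes in bytes; the actual dtype is not stated in the thread.
bytes_per_element = {"uint8": 1, "uint16": 2, "float32": 4}

sizes_gb = {name: n_elements * nbytes / 1e9 for name, nbytes in bytes_per_element.items()}

print(f"elements: {n_elements:,}")
for name, size in sizes_gb.items():
    print(f"{name}: {size:.0f} GB")
```

Whatever the dtype, the full array is far larger than 10-20 GB, so a peak memory use in that range suggests the data is being processed chunk by chunk rather than held in memory at once.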


magnunor · 2022-04-27T09:41:07Z

Looks good, there are some possible optimizations in center_direct_beam I think should be easy to implement. I'll make a separate pull request for that.

magnunor · 2022-04-27T15:06:03Z

> there are some possible optimizations in center_direct_beam I think should be easy to implement. I'll make a separate pull request for that.

I checked this, and the things I thought would improve the runtime did not work, so I won't make the aforementioned "optimization" pull request.

Refactor: Removed _process_function_blockwise from centering the di…

6a7ce11

…rect beam and replaced it with `map`

CSSFrancis changed the base branch from master to hyperspy_release_next_minor April 21, 2022 17:48

CSSFrancis requested a review from magnunor April 21, 2022 17:48

magnunor reviewed Apr 24, 2022


BugFix: Fixed lazy casting to non-lazy when centers --> shifts in `ge…

ad6a78d

…t_direct_beam_position` Removed commented code

CSSFrancis force-pushed the Remove_process_dask_array branch from a3177ec to ad6a78d Compare April 25, 2022 21:18

BugFix: Roll back to non-lazy signal--> non lazy shifts for `center_d…

f9f551f

…irect_beam`

CSSFrancis mentioned this pull request Apr 26, 2022

Version 0.14.0 #788

Closed

magnunor merged commit 47d8336 into pyxem:hyperspy_release_next_minor Apr 27, 2022

CSSFrancis mentioned this pull request Aug 10, 2022

Memory Leak with get_direct_beam_position #782

Closed
