Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove process dask array #827

Conversation

CSSFrancis
Copy link
Member


name: Remove process dask array
about: This removes all instances of process_dask_array and replaces them with map instead for more constancy throughout the code.


Checklist

  • Updated CHANGELOG.md
  • (if finished) Requested a review (from pc494 if you are unsure who is most suitable)

@magnunor Do you see any reason against doing this?

@CSSFrancis CSSFrancis changed the base branch from master to hyperspy_release_next_minor April 21, 2022 17:48
Copy link
Collaborator

@magnunor magnunor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the changes! Much more streamlined code.

I don't have the time to check the functionality, but if the unit tests still pass, it should be ok.

@CSSFrancis, maybe do a simple runtime check with a dataset, and see if the computation time is the same for both the current version of hyperspy_release_next_minor, and your pull request.

pyxem/signals/diffraction2d.py Show resolved Hide resolved
@@ -862,7 +862,7 @@ def test_non_uniform_chunks(self):
def test_return_shifts_non_lazy(self):
s = self.s
s_shifts = s.center_direct_beam(method="blur", sigma=1, return_shifts=True)
assert s_shifts._lazy is False
assert s_shifts._lazy is True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change? I'm thinking a non-lazy input should return a non-lazy output?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that is probably what should happen. I didn't look that over too well I'll make that change back.

pyxem/signals/diffraction2d.py Show resolved Hide resolved
@CSSFrancis
Copy link
Member Author

CSSFrancis commented Apr 25, 2022

I tested this using 8 cores on a cluster and:

Alignment of a 128x128x256x256 dataset using the map method takes:

40.2 s ± 858 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

And using _process_dask_array it takes:

38.8 s ± 978 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

So the new method is marginally slower which is to be expected considering the extra overhead for the map function. The two are within error of each other.


I also did a test on my local computer

'''python

import numpy as np
from pyxem.signals import Diffraction2D
data = np.zeros((30,30, 20, 16), dtype=np.int16)
x_pos_list = np.random.randint(8 - 2, 8 + 2, 30, dtype=np.int16)
x_pos_list[x_pos_list == 8] = 9
y_pos_list = np.random.randint(10 - 2, 10 + 2, 30,dtype=np.int16)
for ix in range(len(x_pos_list)):
for iy in range(len(y_pos_list)):
data[iy, ix, y_pos_list[iy], x_pos_list[ix]] = 9
s = Diffraction2D(data)
s.axes_manager[0].scale = 0.5
s.axes_manager[1].scale = 0.6
s.axes_manager[2].scale = 3
s.axes_manager[3].scale = 4
s_lazy = s.as_lazy()

s.center_direct_beam(method='blur', sigma=1)```

New: 1.06 s ± 57.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Old : 1.13 s ± 2.35 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

On another note I think that this solves whatever the problem I was having with #782 as I can now use dask-distributed to center and save a 1028x1028x256x256 dataset with 8 cores and only around 10-20 Gb of RAM compared to previously when it would take upwards of 100 Gb.

…t_direct_beam_position`

Removed commented code
@CSSFrancis CSSFrancis mentioned this pull request Apr 26, 2022
@magnunor
Copy link
Collaborator

Looks good, there are some possible optimizations in center_direct_beam I think should be easy to implement. I'll make a separate pull request for that.

@magnunor magnunor merged commit 47d8336 into pyxem:hyperspy_release_next_minor Apr 27, 2022
@magnunor
Copy link
Collaborator

there are some possible optimizations in center_direct_beam I think should be easy to implement. I'll make a separate pull request for that.

I checked this, and the things which I thought would improve the runtime, did not work. So I won't make the aforementioned "optimization" pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants