Improve plotting of lazy signals, by keeping current chunk #2568

Merged
merged 28 commits into RELEASE_next_minor on Dec 8, 2021

Conversation

magnunor
Contributor

@magnunor magnunor commented Oct 30, 2020

Summary of the pull request

Old way

Currently, plotting lazy signals in HyperSpy can be quite slow. One of the reasons is how data is retrieved from the dask array. For a 4-dimensional dataset with chunking (32, 32, 32, 32), plotting works something like this when the navigator is moved to a new position:

  • Load the full chunk (32, 32, 256, 256) into memory
    • This is not explicitly done in the HyperSpy code, but is how dask + h5py handles retrieving data "under the hood": the whole chunk is loaded
  • Extract the data from the one position the navigator is on
  • Discard the rest of the chunk
  • If the navigator is moved to a new position, this whole process is repeated
    • Even if the new position is in the same chunk

This leads to a whole lot of inefficiency, and very slow navigation.

New way

This pull request addresses this by temporarily keeping the current chunk in memory, leading to much better responsiveness when navigating. Compared to the current way of plotting lazy signals (a rough sketch of this logic follows the list below):

  • Load the full chunk (32, 32, 256, 256) into memory
  • Extract the one position the navigator is on
  • Keep the chunk in s._temp_plot_data, and its "slice" in s._temp_plot_data_slice
  • If the navigator is moved to a new position, it is checked if this new position is inside the already loaded chunk
    • If it is in the chunk: extract the data from the one position the navigator is on. Keep the chunk in memory
    • If it is NOT in the chunk: delete the old chunk, and load the new one into memory. Extract data from the navigator position.
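
For illustration, here is a minimal, self-contained sketch of this chunk-caching logic. The names get_chunk_slice and ChunkCache are hypothetical and only meant to show the idea; the actual PR keeps the chunk and its slice in s._temp_plot_data and s._temp_plot_data_slice as described above.

import numpy as np
import dask.array as da


def get_chunk_slice(dask_array, nav_indices):
    # Return the slice of dask_array covering the chunk that contains the
    # navigation position nav_indices. Only the navigation dimensions (the
    # leading len(nav_indices) axes) are sliced; the signal axes stay whole.
    slices = []
    for dim, index in enumerate(nav_indices):
        # dask_array.chunks[dim] is a tuple of chunk sizes along that axis,
        # e.g. (32, 32, 32, 32); the cumulative sum gives the chunk edges.
        edges = np.cumsum((0,) + dask_array.chunks[dim])
        i = np.searchsorted(edges, index, side="right") - 1
        slices.append(slice(int(edges[i]), int(edges[i + 1])))
    return tuple(slices)


class ChunkCache:
    # Keep the most recently used navigation chunk as a NumPy array
    def __init__(self, dask_array):
        self.dask_array = dask_array
        self._chunk = None        # cached chunk as a NumPy array
        self._chunk_slice = None  # slice of the dask array the cache covers

    def get_current_frame(self, nav_indices):
        chunk_slice = get_chunk_slice(self.dask_array, nav_indices)
        if chunk_slice != self._chunk_slice:
            # The navigator left the cached chunk: compute and cache the new one
            self._chunk = self.dask_array[chunk_slice].compute()
            self._chunk_slice = chunk_slice
        # Index the cached chunk with chunk-local coordinates
        local = tuple(i - s.start for i, s in zip(nav_indices, chunk_slice))
        return self._chunk[local]


# Example: repeated access inside one chunk only computes the chunk once
data = da.random.randint(0, 256, size=(128, 128, 256, 256), chunks=(32, 32, 32, 32))
cache = ChunkCache(data)
frame_a = cache.get_current_frame((10, 5))  # loads the chunk covering [0:32, 0:32]
frame_b = cache.get_current_frame((11, 5))  # same chunk, no dask computation needed

The key point is that moving the navigator within a chunk only indexes a NumPy array, while crossing a chunk boundary triggers a single compute() of the new chunk.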

This way, navigating within one chunk is just as fast as with a non-lazy signal, which leads to a much smoother user experience. For a comparison on an artificial dataset chunked (32, 32, 32, 32), where I held down the right arrow key to make the navigator move:

Old: (animation: lazy_plotting_old)

New: (animation: lazy_plotting_new)

The one downside is that this slightly increases memory use when plotting, but only by as much as one chunk. I think that in most applications this is not an issue.

One question is whether the temporary plotting chunk should be kept in memory after closing the plot. On one hand, keeping it in memory will make any subsequent plotting quicker, but it will take a little bit of memory.

Changes

  • Add a function for finding out which navigation chunk a navigation position "belongs" to
  • Add a method to handle the logic outlined above: when to keep and when to discard a chunk
  • Small change to BaseSignal.__call__ for handling lazy signals (roughly sketched below)
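
As a rough illustration of the last point, frame extraction can branch on whether the signal is lazy. This is a simplified, hypothetical sketch rather than the actual BaseSignal.__call__ code, and _get_cached_chunk_frame is a made-up stand-in for the new caching method:

def get_current_frame(signal, nav_indices):
    # In-memory signals index their NumPy array directly, while lazy signals
    # go through the chunk cache so repeated calls inside one chunk avoid
    # re-reading the chunk from disk.
    if signal._lazy:
        return signal._get_cached_chunk_frame(nav_indices)  # hypothetical method
    return signal.data[nav_indices]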

Todo

To test this

>>> import dask.array as da
>>> from hyperspy._lazy_signals import LazySignal2D
>>> s = LazySignal2D(da.random.randint(0, 256, size=(128, 128, 256, 256), chunks=(32, 32, 32, 32)))
>>> s_nav = s.isig[0, 0].T
>>> s.plot(navigator=s_nav)

@hakonanes
Contributor

This looks very nice.

One question is if the temporary plotting chunk should be kept in memory after closing the plot.

I think yes, because I often find myself inspecting 4D signals (EBSD) with different navigators (quality/similarity/orientation maps), and thus have to plot, close, plot again and so on. I think my replotting would benefit from keeping temporary chunks after closing a plot.

@magnunor
Contributor Author

magnunor commented Nov 5, 2020

All done! The MacOS Python 3.6 test failed due to a timeout, which I don't think is related to the code changes in this PR. Travis is failing due to several errors in IO, which do not seem to be related to this PR either.

@thomasaarholt
Contributor

This is disturbingly much faster. Well done. Happy to merge this unless there is anything else you'd like to change?

@ericpre
Member

ericpre commented Nov 16, 2020

There are a few things I wanted to comment on in this PR. I should be able to review it in the next few days.

@codecov

codecov bot commented Dec 2, 2020

Codecov Report

Merging #2568 (ed46373) into RELEASE_next_minor (a2e7871) will decrease coverage by 0.03%.
The diff coverage is 100.00%.


@@                  Coverage Diff                   @@
##           RELEASE_next_minor    #2568      +/-   ##
======================================================
- Coverage               77.11%   77.07%   -0.04%     
======================================================
  Files                     206      206              
  Lines                   31656    31682      +26     
  Branches                 6934     7145     +211     
======================================================
+ Hits                    24412    24420       +8     
- Misses                   5489     5502      +13     
- Partials                 1755     1760       +5     
Impacted Files Coverage Δ
hyperspy/_signals/signal2d.py 80.40% <ø> (-0.14%) ⬇️
hyperspy/_signals/lazy.py 89.46% <100.00%> (+0.27%) ⬆️
hyperspy/signal.py 73.95% <100.00%> (+0.01%) ⬆️
hyperspy/misc/array_tools.py 81.25% <0.00%> (-3.13%) ⬇️
hyperspy/utils/peakfinders2D.py 55.86% <0.00%> (-1.36%) ⬇️
hyperspy/learn/ornmf.py 87.23% <0.00%> (-1.13%) ⬇️
hyperspy/learn/rpca.py 87.44% <0.00%> (-0.86%) ⬇️
hyperspy/drawing/utils.py 74.32% <0.00%> (-0.49%) ⬇️
hyperspy/drawing/image.py 73.83% <0.00%> (-0.24%) ⬇️
... and 6 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@ericpre
Member

ericpre commented Dec 2, 2020

Is it not possible to use the numpy array dask is actually using? Or any of the dask caching? See for example: napari/napari#1173

@magnunor
Contributor Author

magnunor commented Dec 8, 2020

The best option there is probably the dask caching. I'll give it a try, and see how it performs.

@magnunor
Contributor Author

Working on this now, testing some implementations using persist and caching.

@magnunor
Contributor Author

Very short summary: the current implementation in this pull request seems to give the best performance and "smooth" user experience when plotting.

I suggest using this methodology (keeping a temporary NumPy array), then adding opportunistic caching for lazy signals in a separate pull request.


I did a few tests, trying to figure out which implementation we should use.

There are a couple of functionalities in dask for doing this:

  • Related to this is optimization.fuse, which I'm not 100% familiar with, but by disabling it (see the config line below) we effectively get the whole chunk cached in memory when asking for a single position within the chunk
  • dask_array.persist(), which essentially computes the results of a dask_array, but rather than returning a NumPy array it keeps the results as a dask array. Accessing this dask array is then really fast

dask.base.config.set({"optimization.fuse.active": False})

The main aim of this pull request is to keep one or several chunks in RAM during plotting.
There are three main ways of doing this:

  1. The current implementation in this pull request: temporarily store the chunk as a NumPy array inside the signal
  2. Use dask_array.persist() and store this "computed" dask_array inside the signal
  3. Use caching + optimization.fuse

Note that 1 and 2 also benefit from the caching.


After testing this quite a bit, this is my summary:

                   NumPy array        Persist           Cache + fuse=False
Performance        6e-6 s per frame   0.02 s per frame  0.02 s per frame
Code complexity    High               Medium            Low

I think the "temporary" NumPy array implementation is the way to go, as it leads to a "smoother" user experience when plotting, even if it adds some code complexity.


Benchmarking

Making the dataset

import numpy as np
from tqdm import tqdm
import h5py
import dask.array as da
from hyperspy._lazy_signals import LazySignal2D

#################################################
f = h5py.File("001_test_data_256x256.hdf5", mode='w')
data = f.create_dataset(
        'data', shape=(256, 256, 256, 256), dtype=np.uint16, chunks=(32, 32, 32, 32),
        compression='gzip')

x_pos_list = np.linspace(128-20, 128+20, 256, dtype=np.uint16)
y_pos_list = np.linspace(128-40, 128+40, 256, dtype=np.uint16)

image = np.zeros((256, 256), dtype=np.uint16)
pbar = tqdm(total=256*256)

for ix_cube in range(8):
    for iy_cube in range(8):
        data_cube = np.zeros((32, 32, 256, 256), dtype=np.uint16)
        for ix in range(32):
            for iy in range(32):
                ix_pos = ix_cube * 32 + ix
                iy_pos = iy_cube * 32 + iy
                x_pos = x_pos_list[ix_pos]
                y_pos = y_pos_list[iy_pos]
                image[x_pos, y_pos] = 10
                data_cube[ix, iy, :, :] = image
                image[x_pos, y_pos] = 0
                pbar.update(1)
        data[ix_cube*32:(ix_cube+1)*32, iy_cube*32:(iy_cube+1)*32, :, :] = data_cube
        del data_cube

f.close()

#################################################
dask_array = da.from_array(h5py.File("001_test_data_256x256.hdf5")['data'], chunks=(32, 32, 32, 32))
s = LazySignal2D(dask_array)
s.save("001_test_data_256x256.hspy", chunks=(32, 32, 32, 32), overwrite=True)

#################################################
s_nav = s.sum(axis=(-1, -2)).T
s_nav.compute()
s_nav.save("001_test_data_256x256_nav.hspy", overwrite=True)

Running the caching and persist:

from time import time
import numpy as np
import dask
import dask.array as da
import hyperspy.api as hs
from dask.cache import Cache
cache = Cache(2e9)
cache.register()

s = hs.load("001_test_data_256x256.hspy", lazy=True)

# Loading a chunk
t0 = time()
data_chunk = s.data[0:32, 0:32].persist()
print("Loading full chunk: {0}".format(time() - t0))

t0 = time()
value = np.array(data_chunk[15, 10])
print("Loading single frame: {0}".format(time() - t0))

t0 = time()
value = np.array(data_chunk[14, 10])
print("Loading another single frame: {0}".format(time() - t0))

t0 = time()
value = np.array(data_chunk[15, 10])
print("Loading same frame as earlier: {0}".format(time() - t0))

######## New chunk
t0 = time()
data_chunk = s.data[32:64, 0:32].persist()
print("Loading new full chunk: {0}".format(time() - t0))

t0 = time()
value = np.array(data_chunk[20, 30])
print("Loading new frame: {0}".format(time() - t0))

# Loading the same chunk as earlier. Which will be pretty quick, due to caching
t0 = time()
data_chunk = s.data[0:32, 0:32].persist()
print("Loading same chunk as earlier: {0}".format(time() - t0))

Output:

Loading full chunk: 1.0767991542816162
Loading single frame: 0.019429445266723633
Loading another single frame: 0.018543720245361328
Loading same frame as earlier: 0.00683283805847168
Loading new full chunk: 1.0792076587677002
Loading new frame: 0.018936872482299805
Loading same chunk as earlier: 0.014795541763305664

Running the caching + fuse=False

from time import time
import numpy as np
import dask.base
import dask.array as da
import hyperspy.api as hs
from dask.cache import Cache
cache = Cache(2e9)
cache.register()

s = hs.load("001_test_data_256x256.hspy", lazy=True)

dask.base.config.set({"optimization.fuse.active": False})

t0 = time()
value = np.array(s.data[15, 10])
print("Loading single frame: {0}".format(time() - t0))

t0 = time()
value = np.array(s.data[14, 10])
print("Loading another single frame: {0}".format(time() - t0))

t0 = time()
value = np.array(s.data[15, 10])
print("Loading same frame as earlier: {0}".format(time() - t0))

t0 = time()
value = np.array(s.data[104, 211])
print("Loading frame from new chunk: {0}".format(time() - t0))

t0 = time()
value = np.array(s.data[109, 211])
print("Loading frame from same chunk: {0}".format(time() - t0))

Output:

Loading single frame: 0.9813816547393799
Loading another single frame: 0.019327402114868164
Loading same frame as earlier: 0.007616281509399414
Loading frame from new chunk: 1.0024607181549072
Loading frame from same chunk: 0.01819133758544922

Running the NumPy + cache

from time import time
import numpy as np
import dask.base
import dask.array as da
import hyperspy.api as hs
from dask.cache import Cache
cache = Cache(2e9)
cache.register()

s = hs.load("001_test_data_256x256.hspy", lazy=True)

t0 = time()
data = np.array(s.data[0:32, 0:32])
print("Loading chunk: {0}".format(time() - t0))

t0 = time()
value = data[15, 10]
print("Loading single frame: {0}".format(time() - t0))

t0 = time()
value = data[14, 10]
print("Loading another single frame: {0}".format(time() - t0))

t0 = time()
value = data[15, 10]
print("Loading same frame as earlier: {0}".format(time() - t0))

t0 = time()
data = np.array(s.data[32:64, 0:32])
print("Loading new chunk: {0}".format(time() - t0))

t0 = time()
value = data[24, 15]
print("Loading frame from new chunk: {0}".format(time() - t0))

# Loading previous chunk, which will be quick due to caching
t0 = time()
data = np.array(s.data[0:32, 0:32])
print("Loading previous chunk: {0}".format(time() - t0))

Output:

Loading chunk: 1.1524648666381836
Loading single frame: 4.76837158203125e-06
Loading another single frame: 1.9073486328125e-06
Loading same frame as earlier: 7.152557373046875e-07
Loading new chunk: 1.1481475830078125
Loading frame from new chunk: 0.0003986358642578125
Loading previous chunk: 0.1158292293548584

Summarizing the results

Rounding the numbers (all times in seconds):

                               NumPy array  NumPy array + cache  Persist + cache  Cache + fuse
Load chunk1                    1.15         1.15                 1.05             N/A
Load image from chunk1         0.000006     0.000006             0.016            1.0
Load second image from chunk1  0.000001     0.000001             0.016            0.016
Load same image from chunk1    0.0000007    0.0000009            0.006            0.006
Load chunk2                    1.15         1.15                 1.0              N/A
Load image from chunk2         0.0003       0.0005               0.015            0.016
Load chunk1 again              1.15         0.11                 0.011            0.007

Thus, there seems to be some added overhead when accessing a dask array, which we can avoid by temporarily storing the current plotting "chunk" as a NumPy array. While the 0.016 seconds to access an image (for example "Load image from chunk1" in the Cache + fuse column) doesn't seem like much, to me it definitely feels quite a bit slower compared to the NumPy array approach. Using the opportunistic caching together with the "NumPy" method seems to work really well, as it keeps a certain number of chunks in memory, which makes them fairly quick to access (0.01).

Member

@ericpre ericpre left a comment


I was initially not convinced about the benefit of this approach because it doesn't work with ROIs, and I don't think that it is HyperSpy's task to do the caching of dask arrays. However, since the overhead of slicing in dask is noticeable here, and until the latter is addressed, it is useful to have this workaround. It should also be useful for multifit of lazy signals.
A better approach would be to reduce the slicing overhead to an extent where it is no longer significant for our use case, which should be possible, but this would have to be done in dask and will be more complicated to get right.

Instead of creating and deleting attributes at various points in the code, I think that the following alternative would be clearer and more explicit (sketched after the list):

  • define initial value (None) for the cache in __init__, add a comment to explain how it works
  • connect clearing cache method to data_changed event in __init__
  • add a private method to clear the cache - reset to initial value (None)
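
A minimal sketch of what this structure could look like (the class and attribute names here are illustrative, not necessarily the ones used in lazy.py):

from hyperspy.signal import BaseSignal


class CachedChunkSignal(BaseSignal):
    # Illustration of the suggested structure, not the actual LazySignal code

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Cache for the navigation chunk currently shown by the plot:
        # the chunk as a NumPy array and the dask slice it covers.
        # None means "nothing cached".
        self._cache_dask_chunk = None
        self._cache_dask_chunk_slice = None
        # Any change to the underlying data invalidates the cache
        self.events.data_changed.connect(self._clear_cache_dask_data)

    def _clear_cache_dask_data(self, obj=None):
        # Reset the cache to its initial (empty) state
        self._cache_dask_chunk = None
        self._cache_dask_chunk_slice = None

Connecting the clearing method to data_changed means the cache is invalidated automatically whenever the underlying data changes, instead of having to delete attributes manually in the right places.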

@magnunor
Contributor Author

Todo for myself: check that the following code does not produce any errors.

import numpy as np
import dask.array as da
import hyperspy.api as hs

data = da.zeros((128, 128, 256, 256), chunks=(32, 32, 32, 32), dtype=np.uint16)
s = hs.signals.Signal2D(data).as_lazy()
s.save("000_dataset.hspy", chunks=(32, 32, 32, 32))

@magnunor
Contributor Author

There are some unit tests which fail in some strange ways. For example test_estimate_parameters_binned:

 >       if not self._clear_cache_dask_data in self.events.data_changed._connected_all:
E       AttributeError: 'Signal1D' object has no attribute '_clear_cache_dask_data'

This error happens at https://github.com/hyperspy/hyperspy/blob/RELEASE_next_minor/hyperspy/_signals/signal1d.py#L1659, and I'm guessing it is due to LazySignal1D not having access to LazySignal's methods when running LazySignal's __init__ method?

LazySignal1D inherits both LazySignal and Signal1D, so I'm guessing that might be causing some strangeness...


Secondly, when is s.events created? Another test is failing due to

E       AttributeError: 'LazySignal1D' object has no attribute 'events'

Previously, when self was lazy, get_current_signal would call self.__class__ with attributes={'_lazy': False}, leading to the non-lazy version of the signal running LazySignal.__init__. However, the non-lazy signal did not have access to a function which is only available in LazySignal. This led to an error, meaning get_current_signal would fail.

Some LazySignals are initialized without events, when they are created with full_initialisation=False. For these very few signals, the LazySignal.__init__ function failed. By checking in the __init__ function that events exists, the init no longer fails. However, this might(?) lead to memory leaks when plotting with newly created signals.
@magnunor
Contributor Author

I figured out why the previous tests failed, but I'm uncertain if the fixes are the best ones:


The first one is related to BaseSignal.get_current_signal, where calling it on (for example) a LazySignal1D led to a Signal1D object running LazySignal1D's __init__ function. This led to an error.

The problem was this part of get_current_signal (with self being a LazySignal1D):

cs = self.__class__(
    self(),
    axes=self.axes_manager._get_signal_axes_dicts(),
    metadata=metadata.as_dictionary(),
    attributes={'_lazy': False})

Somehow, attributes={'_lazy': False} led to the Signal1D running the wrong __init__.

This was fixed with

cs = self.__class__(
    self(),
    axes=self.axes_manager._get_signal_axes_dicts(),
    metadata=metadata.as_dictionary())
    
if self._lazy:
    cs._lazy = False
    cs._assign_subclass()

The second issue was with s.events not existing in certain situations. This led to problems, because LazySignal.__init__ wants to connect the "cache clearing function" to the data_changed event.

This is due to full_initialisation=False being used for some signals, which does not initialize s.events.

The easy fix was to do:

if hasattr(self, "events"):
    if not self._clear_cache_dask_data in self.events.data_changed._connected_all:
        self.events.data_changed.connect(self._clear_cache_dask_data)

However, I'm not sure if this is the best solution.

@magnunor
Contributor Author

I've implemented all the feedback, and answered the questions in the code. I left the questions "Unresolved".

I ran into some issues with the LazySignal.__init__ function, where I'm a bit uncertain if the fixes are optimal: #2568 (comment).

Other than that, this pull request is ready for review!
