Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazy html representation #2855

Merged

Conversation

CSSFrancis
Copy link
Member

@CSSFrancis CSSFrancis commented Nov 16, 2021

This adds in an html representation for a lazy signal. I think this helps to visualize and understand higher dimensional data. It separates chunks into the signal and navigation axes.

A few sentences and/or a bulleted list to describe and motivate the change:

  • I think it was pretty hard to know how the chunking was being applied to the signal and navigation axes
  • A lot of times I was using display(s.data) for lazy_signals to use the dask html representation.

Progress of the PR

  • Change implemented (can be split into several points),
  • update docstring
  • update user guide
  • add an changelog entry in the upcoming_changes folder (see [upcoming_changes/README.rst]
  • Check formatting changelog entry in the readthedocs doc build of this PR (link in github checks)
  • add tests (Not really sure how to test this?)
  • ready for review

Minimal example of the bug fix or the new feature

import hyperspy.api as hs
import dask.array as da
x = da.zeros((10,203,202,101))
s = hs.signals.Signal2D(x).as_lazy()
s.set_signal_type("diffraction")
s.metadata.General.title="Test"
display(s)

image

For anyone using lazy signals @ericpre @pc494 @magnunor @hakonanes I'm open to any suggestions as to what you think might be useful to add to the html representation.

Other Things to Consider:

  • Adding in some information about the axes, potentially adding real units to the chunk images rather than pixels.
  • Displaying the "fast" axes, or the axes you can operate on without rechunking.
  • Method of loading the data (Is the data compressed, memmapped, or computed) This is related to the "cost" to rechunk the data.

Anything else?

@magnunor
Copy link
Contributor

Good idea! Especially since the doing the "wrong" chunking in the wrong dimension can lead to some really slow processing, or the whole computer crashing due to lack of memory.

I'll try to chip in some review-time!

@hakonanes
Copy link
Contributor

This looks very nice and informative! I like this representation much more than checking print(s.data) and print(s.data.chunksize) all the time, which I'm currently doing.

I think the representation is dense enough, and shouldn't contain much more information. Perhaps the "fast" axes can be bold?

I'm not sure how many more new dependencies changing dask[array] to dask introduces, is it possible to only include the widgets and the array modules? Should perhaps this nice representation be an optional download with hyperspy?

@CSSFrancis
Copy link
Member Author

I think the representation is dense enough, and shouldn't contain much more information. Perhaps the "fast" axes can be bold?

Good idea! I can easily make that change.

I'm not sure how many more new dependencies changing dask[array] to dask introduces, is it possible to only include the widgets and the array modules? Should perhaps this nice representation be an optional download with hyperspy?

The widgets module was just added to help out with this kind of representations so I'm not sure you can install just dask.widgets. At least it isn't listed under packages

https://github.com/dask/dask/blob/f5881891505b9a2ba2da195befb11ad7b4c7bb23/setup.py#L41-L50

I think that an optional download is probably the best case scenario for now.

@codecov
Copy link

codecov bot commented Nov 16, 2021

Codecov Report

Merging #2855 (3d025cc) into RELEASE_next_minor (c4d1b69) will increase coverage by 0.02%.
The diff coverage is 100.00%.

Impacted file tree graph

@@                  Coverage Diff                   @@
##           RELEASE_next_minor    #2855      +/-   ##
======================================================
+ Coverage               77.10%   77.12%   +0.02%     
======================================================
  Files                     206      206              
  Lines                   31821    31868      +47     
  Branches                 7152     7161       +9     
======================================================
+ Hits                    24535    24579      +44     
- Misses                   5533     5534       +1     
- Partials                 1753     1755       +2     
Impacted Files Coverage Δ
hyperspy/__init__.py 100.00% <100.00%> (ø)
hyperspy/_signals/lazy.py 90.27% <100.00%> (+0.80%) ⬆️
hyperspy/signal.py 73.79% <100.00%> (-0.02%) ⬇️
hyperspy/misc/eels/eelsdb.py 60.86% <0.00%> (-4.35%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c4d1b69...3d025cc. Read the comment docs.

@hakonanes
Copy link
Contributor

Regarding dask packages, I just did

conda create -n dask-test python=3.9
conda activate dask-test
pip install dask[array]
python -c "from dask.widgets import TEMPLATE_PATH"

which imported TEMPLATE_PATH successfully. Looking at the init for dask.widgets it seems like you'll get that module with dask[diagnostics], and it seems like you'll get the diagnostics module with dask[array]. Based on this I think the current dask[array] suffices, or did you have a different experience?

@ericpre
Copy link
Member

ericpre commented Nov 17, 2021

xref #1931.

  • add tests (Not really sure how to test this?)

Already testing that the code run is already good, even if the output is not checked. You could save the html code and compare to this, but it would mean that it is heavily dependent on the specific dask widgets implementation and therefore it is not good.

Other Things to Consider:

  • Adding in some information about the axes, potentially adding real units to the chunk images rather than pixels.
  • Displaying the "fast" axes, or the axes you can operate on without rechunking.
  • Method of loading the data (Is the data compressed, memmapped, or computed) This is related to the "cost" to rechunk the data.

In my opinion, this is good as it is in the image of your summary. If users want more information about the dask graph and the source of the task, they should use dask directly - there are some widgets to visualise it.

Copy link
Member

@ericpre ericpre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, I made some suggestions/comments.

Some alignments are a bit off, but this is not a big deal:

  • the space about above "Title"
  • "Navigation Axes` is misaligned with the chunks of the navigation space

image

Also, could you please rebase this branch to tidy up the history and remove some spurious files/commits?

hyperspy/tests/signal/test_mask Outdated Show resolved Hide resolved
hyperspy/__init__.py Outdated Show resolved Hide resolved
setup.py Outdated Show resolved Hide resolved
hyperspy/_signals/lazy.py Outdated Show resolved Hide resolved
Copy link
Member

@ericpre ericpre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good.

This doesn't work when the array contain only one single chunk. It doesn't raise an error because it falls back to non html display, but it is confusing and it should be made consistent

Example to reproduce and raise the error by calling _repr_html_ directly:

import dask.array as da
import hyperspy.api as hs

s = hs.signals.BaseSignal(da.random.random((500, 1000))).as_lazy()

s._repr_html_()

which raises:

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
/tmp/ipykernel_80943/4231446018.py in <module>
----> 1 s._repr_html_()

~/Dev/hyperspy/hyperspy/_signals/lazy.py in _repr_html_(self)
    170             from dask.widgets import get_template
    171             from dask.utils import format_bytes
--> 172             nav_grid = svg(chunks=self.get_chunk_size(axis=self.axes_manager.navigation_axes),
    173                            size=config.get("array.svg.size", 160))
    174             sig_grid = svg(chunks=self.get_chunk_size(axis=self.axes_manager.signal_axes),

/opt/miniforge/lib/python3.9/site-packages/dask/array/svg.py in svg(chunks, size, **kwargs)
     27         raise NotImplementedError("Can't generate SVG with 0-length dimensions")
     28     if len(chunks) == 0:
---> 29         raise NotImplementedError("Can't generate SVG with 0 dimensions")
     30     if len(chunks) == 1:
     31         return svg_1d(chunks, size=size, **kwargs)

NotImplementedError: Can't generate SVG with 0 dimensions

doc/dev_guide/lazy_computations.rst Outdated Show resolved Hide resolved
upcoming_changes/2855.new.rst Outdated Show resolved Hide resolved
doc/dev_guide/lazy_computations.rst Outdated Show resolved Hide resolved
hyperspy/_signals/lazy.py Show resolved Hide resolved
hyperspy/_signals/lazy.py Outdated Show resolved Hide resolved
hyperspy/_signals/lazy.py Outdated Show resolved Hide resolved
hyperspy/_signals/lazy.py Show resolved Hide resolved
Copy link
Member

@ericpre ericpre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rebased to get the test suite to pass because of the ipython 8.0 issue and I tidy/fix minor issues with the doc and packaging.

@ericpre ericpre merged commit db5743b into hyperspy:RELEASE_next_minor Jan 24, 2022
@ericpre ericpre added this to the v1.7 milestone Jan 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants