Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix lagging 3d view for big data in auto color mode #6411

Merged
merged 41 commits into from Nov 14, 2023

Conversation

Czaki
Copy link
Collaborator

@Czaki Czaki commented Nov 4, 2023

References and relevant issues

closes #6397

Description

This array fixes fps performance issues in OpenGL introduced by #3308. In that PR, the texture type was changed to float32 in order to directly pass the labels values to the texture. It turns out that OpenGL performance for float32 textures is much worse than for uint8 textures.

Here we change the code to use uint8 whenever the final number of colors is less than 255 in automatic coloring mode, or uint16 if the number is less than 65535.

This is achieved by transforming original data using a modulo-like operation that avoids the background label landing on 0.

This PR introduces numba dependency, which might not be a long-term solution. We may try to move this utility to some package that already contains compiled code. We can revisit the decision if it causes issues (such as a delay in supporting newer Python versions), and perhaps push such a function to a compiled dependency such as scikit-image.

This PR also disables the caching strategy used to speed up painting until someone actually starts painting. It is a significant speedup in startup time and reduces memory usage.

@github-actions github-actions bot added tests Something related to our tests qt Relates to qt labels Nov 5, 2023
Copy link

codecov bot commented Nov 5, 2023

Codecov Report

Merging #6411 (8e13fc7) into main (932054d) will decrease coverage by 0.06%.
The diff coverage is 95.65%.

@@            Coverage Diff             @@
##             main    #6411      +/-   ##
==========================================
- Coverage   92.19%   92.13%   -0.06%     
==========================================
  Files         599      601       +2     
  Lines       52987    53088     +101     
==========================================
+ Hits        48853    48915      +62     
- Misses       4134     4173      +39     
Files Coverage Δ
napari/_qt/_tests/test_qt_viewer.py 94.70% <100.00%> (ø)
napari/_qt/layer_controls/qt_labels_controls.py 94.64% <ø> (ø)
napari/_vispy/_tests/test_vispy_labels_layer.py 98.18% <100.00%> (ø)
napari/_vispy/utils/visual.py 96.22% <100.00%> (ø)
napari/_vispy/visuals/image.py 88.88% <100.00%> (+1.38%) ⬆️
napari/_vispy/visuals/util.py 100.00% <100.00%> (ø)
napari/_vispy/visuals/volume.py 100.00% <100.00%> (ø)
napari/layers/labels/_labels_utils.py 95.83% <100.00%> (ø)
napari/layers/labels/_tests/test_labels.py 100.00% <100.00%> (ø)
napari/utils/colormaps/_tests/test_colormap.py 100.00% <100.00%> (ø)
... and 6 more

... and 12 files with indirect coverage changes

@Czaki Czaki marked this pull request as ready for review November 5, 2023 22:56
@imagesc-bot
Copy link

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/request-for-testing-napari-0-4-19-release-candidate/87972/14

Copy link
Member

@jni jni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Czaki I've left some initial comments. Still getting my head around the flow, but I think I understand the gist. It's very cool and I love that you fixed the direct case too where possible. ❤️

napari/_tests/test_windowsettings.py Outdated Show resolved Hide resolved
napari/_vispy/layers/image.py Show resolved Hide resolved
):
self._on_display_change(data)
else:
node.set_data(data)

node.visible = not self.layer._slice.empty and self.layer.visible
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change intentional? I find it surprising... Is it because _on_display_change correctly sets the node visibility anyway?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It is intentional not to trigger obsolete events.

napari/_vispy/layers/labels.py Outdated Show resolved Hide resolved
napari/_vispy/layers/labels.py Outdated Show resolved Hide resolved
napari/layers/labels/labels.py Show resolved Hide resolved
def _cast_labels_to_minimum_type_auto(
data: np.ndarray, num_colors: int, dtype
) -> np.ndarray:
result_array = np.zeros_like(data, dtype=dtype)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oof. This is a big step. However, I tried my best to optimise this in NumPy, avoiding unnecessary allocations, and got 10x worse speed...

In [14]: %timeit cm._cast_labels_to_minimum_type_auto(data, 50, np.uint8)
OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
33 ms ± 2.19 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [16]: %%timeit
    ...: np.add(np.mod(data, np.uint8(50), out=out, casting='unsafe'),
    ...:        np.uint8(1),
    ...:        where=(data >= 50) | (data < 0),
    ...:        out=out)
    ...:
388 ms ± 830 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

Even if we optimise for unsigned ints, we are still close to 10x worse:

In [15]: %%timeit
    ...: np.add(np.mod(data, np.uint8(50), out=out, casting='unsafe'),
    ...:        np.uint8(1),
    ...:        where=(data >= 50),
    ...:        out=out)
    ...:
280 ms ± 458 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

So, maybe we should try to bite the bullet. Alternatively, maybe we can dispatch to the above NumPy code if numba is not available, but recommend numba (and include it in [all]).

What do you think @Czaki?

Certainly, depending on numba would open up a lot of optimisations. But it feels like we are making a big decision under time pressure. 😅

Copy link
Member

@jni jni Nov 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I just saw what you think in the PR description 😅. Yes I guess you are right that for 0.4.19 it is not a huge risk. But maybe since this is going into main, we make the dependency optional (as discussed above), and when backporting to 0.4.19 we tear out the compatibility code? Or vice versa, we can add non-numba option in a future PR, your call.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the best option will be to donate such code to some of our compiled dependencies (vispy, scikit?)

Here I see a double allocation:

where=(data >= 50) | (data < 0),

I would rather open an issue for this problem than add much slower/memory consummimg code.

napari/utils/colormaps/colormap.py Show resolved Hide resolved
napari/utils/colormaps/colormap.py Show resolved Hide resolved
napari/_vispy/visuals/util.py Outdated Show resolved Hide resolved
@jni
Copy link
Member

jni commented Nov 6, 2023

This PR disables caching used for speedup painting until someone starts painting. It is a significant speedup and reduces memory usage.

Nice catch!

Co-authored-by: Juan Nunez-Iglesias <jni@fastmail.com>
@Carreau
Copy link
Contributor

Carreau commented Nov 8, 2023

restarting what looks like random failure.

@Czaki
Copy link
Collaborator Author

Czaki commented Nov 8, 2023

restarting what looks like random failure.

I have opened PR to fix this random failures #6423

@Czaki Czaki closed this Nov 13, 2023
@Czaki Czaki reopened this Nov 13, 2023
@brisvag
Copy link
Contributor

brisvag commented Nov 13, 2023

My impression from the previous talk (during the last fixing of labels rendering) is that we want to support setting different backgrounds from 0. Maybe we should just drop it. I just do not want to break this.

I remember this being added specifically cause someone asked for it and they had data with -1 as background... I think @kevinyamauchi was implementing this at the time maybe?

@jni
Copy link
Member

jni commented Nov 13, 2023

Good thing @Czaki convinced me to keep the functionality then. 😅 Thanks for chiming in @brisvag! 🙏

@jni
Copy link
Member

jni commented Nov 14, 2023

Gaaah, @Czaki while I was rewording the description, I thought of another issue with this PR: we should not use num_colors to do the mod operation, or labels that have the same color will always have the same color. The mod operation (a) should only occur when the label data dtype doesn't fit in uint8 in the first place, and (b) should be by a prime number that is greater than num_colors but still fits in the desired dtype, and that prime number should get changed when Shuffle is hit. 😕 And we should figure out a test that catches this issue in the future...

At any rate, though, in order to avoid complicating this PR further, I think that we should merge this, and fix it in a subsequent PR. We can raise an issue with the 0.4.19 milestone to make sure it gets handled...

Have I got the analysis right?

@jni
Copy link
Member

jni commented Nov 14, 2023

Ok after a quick chat with @Czaki we have determined that this issue also exists on main. Therefore, I'm going to merge this and make an issue, which we can then fix. (It may even go back further, but I'm not certain about this, will need to test properly.)

@jni jni merged commit a6e1569 into napari:main Nov 14, 2023
41 of 49 checks passed
@jni jni deleted the bugfix/draw_performance branch November 14, 2023 09:13
@jni jni removed the ready to merge Last chance for comments! Will be merged in ~24h label Nov 14, 2023
@jni
Copy link
Member

jni commented Nov 14, 2023

The issue is explained in detail in #6448.

Czaki added a commit that referenced this pull request Nov 14, 2023
closes #6397

This array fixes fps performance issues in OpenGL introduced by #3308.
In that PR, the texture type was changed to float32 in order to directly
pass the labels values to the texture. It turns out that OpenGL
performance for float32 textures is much worse than for uint8 textures.

Here we change the code to use uint8 whenever the final number of colors
is less than 255 in automatic coloring mode, or uint16 if the number is
less than 65535.

This is achieved by transforming original data using a modulo-like
operation that avoids the background label landing on 0.

This PR introduces numba dependency, which might not be a long-term
solution. We may try to move this utility to some package that already
contains compiled code. We can revisit the decision if it causes issues
(such as a delay in supporting newer Python versions), and perhaps push
such a function to a compiled dependency such as scikit-image.

This PR also disables caching used for speedup painting until someone starts
painting. It is a significant speedup and reduces memory usage.

---------

Co-authored-by: Juan Nunez-Iglesias <jni@fastmail.com>
Co-authored-by: Matthias Bussonnier <bussonniermatthias@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
jni added a commit that referenced this pull request Nov 17, 2023
# Description

Changes from #6411 make the thumbnail not work correctly as the
slice is storing cast values, not the original ones for thumbnail
generation.

Before this PR:

![Zrzut ekranu z 2023-11-15
12-24-56](https://github.com/napari/napari/assets/3826210/1d5942a9-22dc-4f70-8be0-3c9a7c8e1b9f)

With this PR:
 
![Zrzut ekranu z 2023-11-15
12-23-32](https://github.com/napari/napari/assets/3826210/01c65497-6c8c-4bf0-8887-5c58db04052b)

Code:

```python
import napari
import numpy as np

data = np.asarray([[0, 1], [2, 3]])

viewer = napari.Viewer()
viewer.add_labels(data, opacity=1)

napari.run()
```

---------

Co-authored-by: Juan Nunez-Iglesias <jni@fastmail.com>
Czaki added a commit that referenced this pull request Nov 22, 2023
Changes from #6411 make the thumbnail not work correctly as the
slice is storing cast values, not the original ones for thumbnail
generation.

Before this PR:

![Zrzut ekranu z 2023-11-15
12-24-56](https://github.com/napari/napari/assets/3826210/1d5942a9-22dc-4f70-8be0-3c9a7c8e1b9f)

With this PR:

![Zrzut ekranu z 2023-11-15
12-23-32](https://github.com/napari/napari/assets/3826210/01c65497-6c8c-4bf0-8887-5c58db04052b)

Code:

```python
import napari
import numpy as np

data = np.asarray([[0, 1], [2, 3]])

viewer = napari.Viewer()
viewer.add_labels(data, opacity=1)

napari.run()
```

---------

Co-authored-by: Juan Nunez-Iglesias <jni@fastmail.com>
Czaki added a commit that referenced this pull request Nov 24, 2023
closes #6397

This array fixes fps performance issues in OpenGL introduced by #3308.
In that PR, the texture type was changed to float32 in order to directly
pass the labels values to the texture. It turns out that OpenGL
performance for float32 textures is much worse than for uint8 textures.

Here we change the code to use uint8 whenever the final number of colors
is less than 255 in automatic coloring mode, or uint16 if the number is
less than 65535.

This is achieved by transforming original data using a modulo-like
operation that avoids the background label landing on 0.

This PR introduces numba dependency, which might not be a long-term
solution. We may try to move this utility to some package that already
contains compiled code. We can revisit the decision if it causes issues
(such as a delay in supporting newer Python versions), and perhaps push
such a function to a compiled dependency such as scikit-image.

This PR also disables caching used for speedup painting until someone starts
painting. It is a significant speedup and reduces memory usage.

---------

Co-authored-by: Juan Nunez-Iglesias <jni@fastmail.com>
Co-authored-by: Matthias Bussonnier <bussonniermatthias@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Czaki added a commit that referenced this pull request Nov 24, 2023
Changes from #6411 make the thumbnail not work correctly as the
slice is storing cast values, not the original ones for thumbnail
generation.

Before this PR:

![Zrzut ekranu z 2023-11-15
12-24-56](https://github.com/napari/napari/assets/3826210/1d5942a9-22dc-4f70-8be0-3c9a7c8e1b9f)

With this PR:

![Zrzut ekranu z 2023-11-15
12-23-32](https://github.com/napari/napari/assets/3826210/01c65497-6c8c-4bf0-8887-5c58db04052b)

Code:

```python
import napari
import numpy as np

data = np.asarray([[0, 1], [2, 3]])

viewer = napari.Viewer()
viewer.add_labels(data, opacity=1)

napari.run()
```

---------

Co-authored-by: Juan Nunez-Iglesias <jni@fastmail.com>
jni added a commit that referenced this pull request Dec 1, 2023
…#6467)

# Description

This is follow-up to #6411. 

When labels data are int8, uint8, int16, or uint16, no copy is made —
the data are sent directly as a uint8 or uint16 view to the GPU for
rendering.

Higher bit depth integer data is first converted on the CPU to uint8,
uint16, or float32 (depending on the number of colors in the colormap)
using the formula `texture = (values - 1) % ncolors + 1` — but with the
background value always mapping to 0.

---------

Co-authored-by: Juan Nunez-Iglesias <jni@fastmail.com>
Co-authored-by: Lorenzo Gaifas <brisvag@gmail.com>
jni added a commit that referenced this pull request Dec 15, 2023
…rect color mode (#6439)

Closes #6518
Closes #6084

# Description

In this PR, similarly to #6411, instead of using `float32` to pass data
to the GPU there we introduce heuristics for choosing smaller data
types, while keeping high performance.

Instead of complex calculation of color in the shader, a precomputed
texture array is used.
To avoid repetitive texture calculation, the textures are cached in the
`Colormap` objects.

For data of type uint8/int8/uint16/int16 we do not perform any transform
of data. We send them to the GPU as it is. This allows to reduce
computational time.

Based on experiments, the rendering performance is a little worse for
uint16/int16 than for uint8/int8. But it may depend on the GPU. Also,
using uint16/int16 means usage more GPU memory than for 8 bits type.
Still less than current main.

For datatypes using at least 32 bits, we add a preprocessing step where
we identify a set of labels that are mapped to the same color and map
all of them to the same value.
This often saves enough space to fall back to uint8/uint16. It allows
using a smaller additional array, and use less GPU memory. If there are
more than `2**16` distinct colors, then float32 is used, though
performance will be reduced.

We support only up to `2**23` distinct colors for now. 

For reduced memory usage, part of the functions used for data
preprocessing are compiled using numba. We provide a version of the
function that does not require `numba` but it limits the number of
distinct colors to `2**16` and involves additional array creation (more
memory usage).

---------

Co-authored-by: Juan Nunez-Iglesias <jni@fastmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Lorenzo Gaifas <brisvag@gmail.com>
Co-authored-by: Andy Sweet <andrew.d.sweet@gmail.com>
Czaki added a commit that referenced this pull request Dec 15, 2023
…#6467)

This is follow-up to #6411.

When labels data are int8, uint8, int16, or uint16, no copy is made —
the data are sent directly as a uint8 or uint16 view to the GPU for
rendering.

Higher bit depth integer data is first converted on the CPU to uint8,
uint16, or float32 (depending on the number of colors in the colormap)
using the formula `texture = (values - 1) % ncolors + 1` — but with the
background value always mapping to 0.

---------

Co-authored-by: Juan Nunez-Iglesias <jni@fastmail.com>
Co-authored-by: Lorenzo Gaifas <brisvag@gmail.com>
Czaki added a commit that referenced this pull request Dec 15, 2023
…rect color mode (#6439)

Closes #6518
Closes #6084

In this PR, similarly to #6411, instead of using `float32` to pass data
to the GPU there we introduce heuristics for choosing smaller data
types, while keeping high performance.

Instead of complex calculation of color in the shader, a precomputed
texture array is used.
To avoid repetitive texture calculation, the textures are cached in the
`Colormap` objects.

For data of type uint8/int8/uint16/int16 we do not perform any transform
of data. We send them to the GPU as it is. This allows to reduce
computational time.

Based on experiments, the rendering performance is a little worse for
uint16/int16 than for uint8/int8. But it may depend on the GPU. Also,
using uint16/int16 means usage more GPU memory than for 8 bits type.
Still less than current main.

For datatypes using at least 32 bits, we add a preprocessing step where
we identify a set of labels that are mapped to the same color and map
all of them to the same value.
This often saves enough space to fall back to uint8/uint16. It allows
using a smaller additional array, and use less GPU memory. If there are
more than `2**16` distinct colors, then float32 is used, though
performance will be reduced.

We support only up to `2**23` distinct colors for now.

For reduced memory usage, part of the functions used for data
preprocessing are compiled using numba. We provide a version of the
function that does not require `numba` but it limits the number of
distinct colors to `2**16` and involves additional array creation (more
memory usage).

---------

Co-authored-by: Juan Nunez-Iglesias <jni@fastmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Lorenzo Gaifas <brisvag@gmail.com>
Co-authored-by: Andy Sweet <andrew.d.sweet@gmail.com>
jni added a commit that referenced this pull request Feb 13, 2024
…#6652)

# Description

In #6411, we missed these lines:

https://github.com/napari/napari/blob/4f4c063ae5dd79d6d188e201d44b8d57eba71909/napari/_vispy/layers/image.py#L25-L32

These were added in #3990 because 'auto' has a slightly different
meaning in VisPy than *really* fully auto: it chooses the default
texture dtype for the input NumPy array's dtype. For float arrays, this
is float. However, if your OpenGL implementation doesn't support float
textures, VisPy will raise.

Instead, passing `texture_format=None` when creating the ImageNode tells
VisPy to transform the data to whatever texture format it sees fit.

In #6411, when getting the VisPy node for a given dtype, we only check
for `texture_format != 'auto'` as the "catch-all" texture format. But,
in fact, if we are on a machine that doesn't support float32 textures,
by this point in the code the format has been changed to None,
incorrectly triggering these lines:

https://github.com/napari/napari/blob/89f8194d3fa4eef620755804806ac69ef684df63/napari/_vispy/layers/image.py#L59-L73

and causing a ValueError.

This PR fixes that by also checking for None in that same clause.

This PR also adds a test by monkeypatching the function that checks for
float32 texture support.

# References

#3988
#3990

---------

Co-authored-by: Juan Nunez-Iglesias <jni@fastmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Czaki added a commit that referenced this pull request Feb 20, 2024
…#6652)

# Description

In #6411, we missed these lines:

https://github.com/napari/napari/blob/4f4c063ae5dd79d6d188e201d44b8d57eba71909/napari/_vispy/layers/image.py#L25-L32

These were added in #3990 because 'auto' has a slightly different
meaning in VisPy than *really* fully auto: it chooses the default
texture dtype for the input NumPy array's dtype. For float arrays, this
is float. However, if your OpenGL implementation doesn't support float
textures, VisPy will raise.

Instead, passing `texture_format=None` when creating the ImageNode tells
VisPy to transform the data to whatever texture format it sees fit.

In #6411, when getting the VisPy node for a given dtype, we only check
for `texture_format != 'auto'` as the "catch-all" texture format. But,
in fact, if we are on a machine that doesn't support float32 textures,
by this point in the code the format has been changed to None,
incorrectly triggering these lines:

https://github.com/napari/napari/blob/89f8194d3fa4eef620755804806ac69ef684df63/napari/_vispy/layers/image.py#L59-L73

and causing a ValueError.

This PR fixes that by also checking for None in that same clause.

This PR also adds a test by monkeypatching the function that checks for
float32 texture support.

# References

#3988
#3990

---------

Co-authored-by: Juan Nunez-Iglesias <jni@fastmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Relates to performance qt Relates to qt tests Something related to our tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Labels layer is lagging in 3d for big data
5 participants