Efficient labels mapping for drawing in Labels (60 FPS even with 8000x8000 images) #5732

Merged
merged 42 commits into from May 11, 2023

@ksofiyuk (Contributor) commented Apr 15, 2023

Description

This PR introduces two independent optimizations for rendering labels while drawing: local updates and caching.

Together, these optimizations remove virtually all performance issues related to drawing in the Labels layer: with them, drawing can reach 60 FPS even on 8000x8000 images.

Local updates

The original implementation updates the whole label map and copies it to the shader on every brush update. With high-resolution images this becomes the main bottleneck, no matter how efficient the label-mapping implementation is.

In this PR, only partial updates are sent to the VisPy labels layer. Each time Labels.data_setitem is called, it tracks the bounds of the modified region. Instead of calling Labels.refresh(), which triggers an update of the whole labels image, it calls Labels._partial_labels_refresh, which emits the Labels.events.labels_update event carrying a slice that localizes the modified region. That event is handled by VispyLabelsLayer._on_partial_labels_update, which relies on the fact that VisPy textures can be partially updated.
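The local-update idea can be illustrated with a minimal NumPy-only sketch. It assumes the modified indices arrive as a tuple of per-axis integer arrays (the format np.nonzero returns) and uses a plain array as a stand-in for the VisPy texture; bounding_slices, partial_refresh and to_float are hypothetical helpers, not napari's actual code.

```python
import numpy as np


def bounding_slices(indices, shape):
    """Return a tuple of slices that covers every index in ``indices``.

    ``indices`` is assumed to be a tuple of integer arrays, one per axis
    (the format returned by ``np.nonzero``).
    """
    region = []
    for axis_indices, size in zip(indices, shape):
        axis_indices = np.asarray(axis_indices)
        start = max(int(axis_indices.min()), 0)
        stop = min(int(axis_indices.max()) + 1, size)
        region.append(slice(start, stop))
    return tuple(region)


def partial_refresh(labels, texture, region, colormap):
    """Recompute colors only inside ``region`` and write them into ``texture``.

    ``texture`` is a hypothetical stand-in for a GPU texture that supports
    sub-region uploads.
    """
    texture[region] = colormap(labels[region])


def to_float(values):
    # Hypothetical label-to-color mapping; napari's real mapping is richer.
    return values.astype(np.float32)


labels = np.zeros((512, 512), dtype=np.int32)
texture = to_float(labels)  # initial full upload

# A brush stroke touches an 8x8 square of pixels.
rows, cols = np.mgrid[100:108, 200:208]
indices = (rows.ravel(), cols.ravel())
labels[indices] = 3

region = bounding_slices(indices, labels.shape)
partial_refresh(labels, texture, region, to_float)
```

Here only the 64 pixels inside the bounding box are re-mapped and written, instead of all 262,144 pixels of the image.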

Caching optimization

The idea is to recompute the color mapping only for the elements of the label map that changed since the previous update. Even where the color mapping is already implemented quite efficiently, this can give up to a 5x speedup by avoiding the slow np.float32 conversion for most pixels.

In practice, when using the brush, less than 1% of pixels are updated at each iteration (on large images the percentage is even smaller). As a result, this optimization should give a significant boost in 99%+ of typical use cases.
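A minimal sketch of the caching idea, assuming a mapping function that works element-wise on label values (it may also return an extra trailing color axis). CachedColorMapper and to_float are hypothetical names; napari's actual implementation differs. The image-shaped mask and the 1-D array of changed values deliberately get separate names.

```python
import numpy as np


class CachedColorMapper:
    """Hypothetical sketch of a diff-based color-mapping cache.

    Only pixels whose raw label changed since the previous call are passed
    through ``mapping``; all other colors are reused from the cache.
    """

    def __init__(self, mapping):
        self.mapping = mapping
        self._cached_raw = None
        self._cached_mapped = None
        self.n_last_mapped = 0  # number of pixels re-mapped on the last call

    def __call__(self, raw):
        if self._cached_raw is None or self._cached_raw.shape != raw.shape:
            # Cold cache: map every pixel once.
            self._cached_raw = raw.copy()
            self._cached_mapped = self.mapping(raw)
            self.n_last_mapped = raw.size
            return self._cached_mapped
        # The comparison still touches every pixel; the saving comes from
        # running ``mapping`` on far fewer values.
        changed_mask = raw != self._cached_raw  # image-shaped boolean mask
        changed_values = raw[changed_mask]  # 1-D array of changed labels
        self._cached_mapped[changed_mask] = self.mapping(changed_values)
        self._cached_raw[changed_mask] = changed_values
        self.n_last_mapped = changed_values.size
        return self._cached_mapped


def to_float(values):
    # Stand-in for an expensive label-to-color mapping.
    return values.astype(np.float32)


mapper = CachedColorMapper(to_float)
labels = np.zeros((256, 256), dtype=np.int32)
mapper(labels)
labels[10:12, 10:12] = 7  # a brush stroke touches 4 pixels
colors = mapper(labels)
```

The returned array is the cache itself, so in this sketch callers must treat it as read-only.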

Benchmark results

I added a new benchmark (benchmark_labels_layer.LabelsDrawing2DSuite) that measures brush-drawing timings in different modes (auto/direct color mode, contour == 0/1) with different brush sizes. It simulates drawing with the brush from position (0, 0) to (n - 1, n - 1), with 30 refresh updates along the way.
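The shape of that benchmark can be sketched without napari. This is a hypothetical simplification, not the code of LabelsDrawing2DSuite: a square stamp replaces napari's round brush, the stroke is approximated by 30 stamps along the diagonal, and refresh is any callable that consumes the label array.

```python
import time

import numpy as np


def simulate_diagonal_stroke(n, brush_size, refresh, n_refreshes=30):
    """Stamp a square brush at ``n_refreshes`` points on the diagonal.

    Hypothetical simplification: napari's brush is round and its benchmark
    may paint differently. ``refresh`` is called after each stamp.
    """
    labels = np.zeros((n, n), dtype=np.int32)
    half = brush_size // 2
    centers = np.linspace(0, n - 1, n_refreshes).round().astype(int)
    start = time.perf_counter()
    for center in centers:
        lo, hi = max(center - half, 0), min(center + half + 1, n)
        labels[lo:hi, lo:hi] = 1
        refresh(labels)
    elapsed = time.perf_counter() - start
    return labels, elapsed


def full_refresh(labels):
    # Baseline: re-map the entire array on every refresh.
    return labels.astype(np.float32)


labels, elapsed = simulate_diagonal_stroke(512, 8, full_refresh)
print(f"30 full refreshes took {elapsed * 1e3:.1f} ms")
```

Swapping full_refresh for a cached or region-limited refresh is how the two optimizations would show up in such a measurement.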

This PR:

              ====== ============ ============ ============ ============ ============
              --                                  color_mode / contour
              ------------------- ---------------------------------------------------
                n     brush_size    auto / 0     auto / 1    direct / 0   direct / 1
              ====== ============ ============ ============ ============ ============
               512        8        28.2±0.3ms    33.6±1ms     29.2±1ms    32.9±0.7ms
               512        64       22.8±0.1ms   28.1±0.2ms   22.9±0.1ms   29.1±0.3ms
               512       256       128±0.9ms    153±0.7ms    128±0.7ms     153±1ms
               3072       8        147±0.6ms    165±0.7ms     147±3ms     166±0.4ms
               3072       64       131±0.9ms     151±2ms     132±0.8ms     150±1ms
               3072      256        450±2ms      502±2ms     452±0.7ms     501±2ms
              ====== ============ ============ ============ ============ ============

main 7b3f7ec:

              ====== ============ ============ ============ ============ ============
              --                                  color_mode / contour
              ------------------- ---------------------------------------------------
                n     brush_size    auto / 0     auto / 1    direct / 0   direct / 1
              ====== ============ ============ ============ ============ ============
               512        8        86.6±0.7ms    242±2ms      100±1ms      256±1ms
               512        64        75.6±1ms     230±1ms      90.6±1ms     241±1ms
               512       256       163±0.9ms     316±2ms      180±1ms      328±1ms
               3072       8        1.20±0.01s   6.33±0.02s   1.80±0.02s   6.90±0.02s
               3072       64       1.15±0.01s    6.30±0s      1.77±0s      6.90±0s
               3072      256       1.30±0.01s   6.40±0.01s    1.92±0s     7.04±0.02s
              ====== ============ ============ ============ ============ ============

The difference is most noticeable when the heavy label-mapping code is used (contour=1): on 3072x3072 images with brush sizes 8 and 64, the speedup is roughly 38-46x.

Keep in mind that this benchmark only measures the time it takes to update layer.data plus the time it takes to convert labels to colors before sending them to VisPy. It does not include the time needed to transfer data from napari to OpenGL or to process labels in any OpenGL shader. The local updates also substantially speed up this unmeasured part.
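The speedup figures can be recomputed directly from the two tables above. This snippet only does the arithmetic for n = 3072 with contour = 1, using the mean timings (the ± spreads are ignored).

```python
# Timings in seconds, copied from the two tables above: n = 3072, contour = 1.
main_s = {
    ("auto", 8): 6.33, ("auto", 64): 6.30, ("auto", 256): 6.40,
    ("direct", 8): 6.90, ("direct", 64): 6.90, ("direct", 256): 7.04,
}
this_pr_s = {
    ("auto", 8): 0.165, ("auto", 64): 0.151, ("auto", 256): 0.502,
    ("direct", 8): 0.166, ("direct", 64): 0.150, ("direct", 256): 0.501,
}

speedups = {key: main_s[key] / this_pr_s[key] for key in main_s}
for (mode, brush_size), ratio in sorted(speedups.items()):
    print(f"{mode:>6} / brush {brush_size:>3}: {ratio:.1f}x")
```

The ratios come out at about 38-46x for brush sizes 8 and 64 and about 13-14x for brush size 256, where each stroke touches many more pixels.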

Type of change

  • Optimization (non-breaking change which speeds up existing code)

How has this been tested?

  • all tests pass with my change

Final checklist:

  • My PR is the minimum possible work for the desired functionality
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • If I included new strings, I have used trans. to make them localizable.
    For more information see our translations guide.

@alisterburt (Contributor) left a comment

very nice work @ksofiyuk !

Changes look great to me; the typing test failure seems unrelated (as in #5723). For me this is ready to merge.

supersedes #5723

napari/layers/labels/labels.py (outdated review thread)
@alisterburt added the "ready to merge" label Apr 15, 2023
@ksofiyuk (Contributor, Author) commented Apr 15, 2023

@alisterburt Actually, there is a small problem: some tests for the Labels layer fail. It turned out that this optimization cannot easily be applied when Labels.contour > 0, because contours are computed with morphological operations on the raw array. I'll disable it when Labels.contour > 0.

@alisterburt (Contributor)

ooh, I just saw you brought it back, could you explain how this is working?

@ksofiyuk (Contributor, Author)

It turns out there is no need to disable it when contour > 0: it is enough to move the cache initialization after the if block in which contours are computed. Computing contours at every update makes caching much less useful, but it should still give some gain.
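A hedged sketch of why the ordering matters. It assumes a simplified contour that keeps a label only where a 4-neighbour differs (not napari's morphological implementation), and contour_image, map_with_cache and refresh are hypothetical helpers. Because the cache is diffed against the post-contour image, a pixel whose raw label is unchanged but whose contour status flips (because a neighbour was painted) is still re-mapped.

```python
import numpy as np


def contour_image(labels):
    """Keep label values only on object boundaries (4-connectivity).

    Hypothetical, simplified stand-in for napari's contour computation
    (thickness 1 only).
    """
    padded = np.pad(labels, 1, mode="edge")
    center = padded[1:-1, 1:-1]
    boundary = (
        (center != padded[:-2, 1:-1])
        | (center != padded[2:, 1:-1])
        | (center != padded[1:-1, :-2])
        | (center != padded[1:-1, 2:])
    )
    return np.where(boundary, labels, 0)


def map_with_cache(image, cache, mapping):
    """Re-map only the pixels of ``image`` that differ from ``cache["raw"]``."""
    if cache.get("raw") is None or cache["raw"].shape != image.shape:
        cache["raw"] = image.copy()
        cache["mapped"] = mapping(image)
        return cache["mapped"]
    changed_mask = image != cache["raw"]
    cache["mapped"][changed_mask] = mapping(image[changed_mask])
    cache["raw"][changed_mask] = image[changed_mask]
    return cache["mapped"]


def refresh(labels, contour, cache, mapping):
    # Contours need the full raw array, so they are computed first...
    image = contour_image(labels) if contour > 0 else labels
    # ...and only then is the cache consulted, keyed on the post-contour
    # image. Diffing the raw labels instead would miss pixels whose contour
    # status changed only because a neighbouring pixel was painted.
    return map_with_cache(image, cache, mapping)


def to_float(values):
    return values.astype(np.float32)


labels = np.zeros((8, 8), dtype=np.int32)
labels[2:6, 2:6] = 5
cache = {}
refresh(labels, 1, cache, to_float)
labels[2:6, 6] = 5  # widen the square; part of column 5 becomes interior
colors = refresh(labels, 1, cache, to_float)
```

Pixel (3, 5) keeps raw label 5 but drops out of the contour, so a cache keyed on raw labels would have left it stale.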

codecov bot commented Apr 15, 2023

Codecov Report

Merging #5732 (570aa2a) into main (7b3f7ec) will increase coverage by 0.01%.
The diff coverage is 96.03%.

@@            Coverage Diff             @@
##             main    #5732      +/-   ##
==========================================
+ Coverage   89.89%   89.91%   +0.01%     
==========================================
  Files         614      615       +1     
  Lines       52283    52474     +191     
==========================================
+ Hits        47002    47180     +178     
- Misses       5281     5294      +13     
Impacted Files Coverage Δ
napari/_vispy/layers/labels.py 93.33% <90.90%> (-6.67%) ⬇️
napari/layers/labels/labels.py 95.86% <93.65%> (+0.37%) ⬆️
napari/layers/labels/_labels_utils.py 95.83% <100.00%> (+0.91%) ⬆️
napari/layers/labels/_tests/test_labels.py 99.51% <100.00%> (+0.02%) ⬆️
napari/utils/colormaps/colormap_utils.py 93.30% <100.00%> (+0.03%) ⬆️

... and 17 files with indirect coverage changes

@alisterburt alisterburt added this to the 0.4 milestone Apr 15, 2023
@Czaki (Collaborator) commented Apr 15, 2023

@alisterburt Labels and milestone (are you sure about 0.4)?

@jni (Member) left a comment

Couple of comments here,

  1. #5723 (Fix inefficient label mapping in direct color mode (10-20x speedup)) seems reasonably independent of this; could we merge that first? It will make the changes more granular and easier to review, and #5723 was already ready to merge, so I kinda wanna get that in while we review this. Does that work @ksofiyuk? They look to me like they sort of touch different parts of the file. But,

  2. Another reason I want the two reviewed separately is that I think the code in the middle needs a bit more restructuring to make it clearer what's going on. For example, I don't like reusing variable names (raw_modified = raw_modified[changed_mask]), as it makes the code harder to reason about (is this an image-shaped array or a linear array of changed values?). I also don't like that now, in most cases, image = will not actually hold an image but a linear set of values. So I'd like the logic and variable names updated to better reflect what's going on.

  3. Finally, while #5723 has no bearing on #3308 (Use a shader for low discrepancy label conversion), this PR does: it looks like caching will benefit #3308 too, except that there we would be caching .astype(np.float32) instead of the complicated legacy logic here. So having a separate PR will make it easier to port the relevant changes to #3308.

btw, I'm super curious: I thought this would actually track changed pixels from painting and so on, in which case the speedup would be obvious. But no, it actually checks for changes with a full array != value comparison, which is not obviously faster than doing array.astype(). Yet your benchmarks (and my own experimentation with this PR 🙏) suggest that it definitely is faster! Do I understand correctly that that's what you're doing here?

@jni (Member) commented Apr 16, 2023

I just saw that indeed this simply has all of the original commits from #5723, so it will be very straightforward to rebase/merge main after that merges. I'm gonna go ahead and do that.

@jni added the "highlight" and "performance" labels Apr 16, 2023
@jni (Member) commented Apr 16, 2023

Oh! One more Q: could you please test this solution with zarr, tensorstore, or dask arrays as labels? I'm concerned about some of these operations (boolean indexing) in those contexts...

@jni (Member) commented Apr 16, 2023

@ksofiyuk are you comfortable with cherry-picking (should be easier than merging here)? If not I can do it for you and force push.

@alisterburt alisterburt removed this from the 0.4 milestone Apr 16, 2023
@kevinyamauchi (Contributor)

Really nice work. Thank you @ksofiyuk !

@psobolewskiPhD removed the "ready to merge" label May 12, 2023
jni pushed a commit that referenced this pull request May 25, 2023
…ching) (#5841)

# Description

The labels caching from #5732 broke the "show selected" feature. This
PR fixes it by invalidating the cache when "show selected" is
activated.

I also added a test that catches this bug.
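Why a diff-based cache breaks such a feature can be shown with a small hypothetical sketch (CachedLabelsView and its attributes are illustrative names, not napari's API): toggling the display setting changes the colors of pixels whose label values did not change, so the diff finds nothing to update unless the cache is reset.

```python
import numpy as np


class CachedLabelsView:
    """Hypothetical sketch: a diff-based color cache that must be reset when
    a display setting changes the colors of pixels whose labels did not change.
    """

    def __init__(self, labels, selected_label=1):
        self.labels = labels
        self.selected_label = selected_label
        self._show_selected = False
        self._cached_raw = None
        self._cached_mapped = None

    @property
    def show_selected(self):
        return self._show_selected

    @show_selected.setter
    def show_selected(self, value):
        self._show_selected = bool(value)
        # No label value changed, so a diff against the cache would find
        # nothing to update: invalidate it to force a full recompute.
        self._cached_raw = None

    def _mapping(self, values):
        if self._show_selected:
            values = np.where(values == self.selected_label, values, 0)
        return values.astype(np.float32)

    def refresh(self):
        raw = self.labels
        if self._cached_raw is None:
            self._cached_raw = raw.copy()
            self._cached_mapped = self._mapping(raw)
        else:
            changed_mask = raw != self._cached_raw
            self._cached_mapped[changed_mask] = self._mapping(raw[changed_mask])
            self._cached_raw[changed_mask] = raw[changed_mask]
        return self._cached_mapped


view = CachedLabelsView(np.array([[1, 2], [2, 1]]))
view.refresh()
view.show_selected = True
colors = view.refresh()
```

Without the reset in the setter, the second refresh would find no changed pixels and keep showing label 2. In this sketch, changing selected_label while show_selected is on would need the same reset.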
@Czaki Czaki mentioned this pull request Jun 7, 2023
@jni jni mentioned this pull request Jun 19, 2023
Czaki pushed a commit that referenced this pull request Jun 19, 2023
…x8000 images) (#5732)

---------

Co-authored-by: alisterburt <alisterburt@gmail.com>
Co-authored-by: Juan Nunez-Iglesias <jni@fastmail.com>
Czaki pushed a commit that referenced this pull request Jun 19, 2023
…ching) (#5841)

Czaki pushed a commit that referenced this pull request Jun 21, 2023
…x8000 images) (#5732)

Czaki pushed a commit that referenced this pull request Jun 21, 2023
…ching) (#5841)

Czaki pushed a commit that referenced this pull request Jun 21, 2023
…x8000 images) (#5732)

Czaki pushed a commit that referenced this pull request Jun 21, 2023
…ching) (#5841)

Czaki pushed a commit that referenced this pull request Jun 21, 2023
…x8000 images) (#5732)

# Description

This PR introduces two independent optimizations for labels rendering
during drawing: local updates and caching optimizations.

These optimizations virtually solve any performance issues related to
drawing in the Labels layer. With them, it is possible to achieve 60 FPS
when drawing even in 8000x8000 images.

### Local updates

The initial implementation updates the whole labels map and copies it to
the shader on each brush update, which becomes a major bottleneck when
you work with high-resolution images no matter how effective the labels
mapping implementation is.

In this PR, only partial updates are sent to the VisPy labels layer.
Each time `Layers.data_setitem` is called, it tracks the bounds of
modified region, and instead of calling `Labels.refresh()` that triggers
the update of the whole labels image, it calls
`Labels._partial_labels_refresh` that emits the
`Labels.events.labels_update` event that comes with the slice localizing
the modified region, which is then handled by
`VispyLabelsLayer._on_partial_labels_update`. VisPy textures can be
partially updated, which is used in the `_on_partial_labels_update`
method.

### Caching optimization

The idea is to recompute color mapping only for the elements of a label
map that are changed from a previous update. Even in the parts of the
code where the color mapping is implemented quite efficiently it can
give up to 5x speedup by avoiding slow `np.float32` recomputations for
most pixels.

In practice, when you use brush, less than 1% of pixels are updated at
each iteration (on large images the percentage is even smaller). As a
result, this optimization should work and give a significant boost in
99%+ of typical use case scenarios.

## Benchmark results

I added a new benchmark (`benchmark_labels_layer.LabelsDrawing2DSuite`)
that measures the time of brush drawing in different modes
(auto/direct, contour == 0/1) with different brush sizes. It simulates
brush drawing from position (0, 0) to (n - 1, n - 1) with 30 refresh
updates along the way.
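For intuition about what the benchmark exercises, here is a rough, napari-free sketch of the simulated stroke: a disk-shaped brush stamped at 30 evenly spaced points along the diagonal. This is not the code of `LabelsDrawing2DSuite`; the brush shape, the painted label value and the omission of the label-to-color conversion (which the real benchmark does time) are assumptions.

```python
import time

import numpy as np


def simulate_diagonal_stroke(n, brush_size, n_steps=30, label=1):
    """Stamp a disk brush at n_steps points from (0, 0) to (n - 1, n - 1).

    Returns the painted array and the elapsed wall-clock time in seconds.
    """
    data = np.zeros((n, n), dtype=np.int32)
    radius = brush_size / 2
    start = time.perf_counter()
    for c in np.linspace(0, n - 1, n_steps):
        # Touch only the bounding box of the disk centred at (c, c).
        lo = max(int(np.floor(c - radius)), 0)
        hi = min(int(np.ceil(c + radius)) + 1, n)
        yy, xx = np.ogrid[lo:hi, lo:hi]
        disk = (yy - c) ** 2 + (xx - c) ** 2 <= radius ** 2
        data[lo:hi, lo:hi][disk] = label
    return data, time.perf_counter() - start


data, elapsed = simulate_diagonal_stroke(512, brush_size=8)
print(f"{elapsed * 1e3:.2f} ms")
```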

This PR:
```
              ====== ============ ============ ============ ============ ============
              --                                  color_mode / contour
              ------------------- ---------------------------------------------------
                n     brush_size    auto / 0     auto / 1    direct / 0   direct / 1
              ====== ============ ============ ============ ============ ============
               512        8        28.2±0.3ms    33.6±1ms     29.2±1ms    32.9±0.7ms
               512        64       22.8±0.1ms   28.1±0.2ms   22.9±0.1ms   29.1±0.3ms
               512       256       128±0.9ms    153±0.7ms    128±0.7ms     153±1ms
               3072       8        147±0.6ms    165±0.7ms     147±3ms     166±0.4ms
               3072       64       131±0.9ms     151±2ms     132±0.8ms     150±1ms
               3072      256        450±2ms      502±2ms     452±0.7ms     501±2ms
              ====== ============ ============ ============ ============ ============
```

`main` at 7b3f7ec:
```
              ====== ============ ============ ============ ============ ============
              --                                  color_mode / contour
              ------------------- ---------------------------------------------------
                n     brush_size    auto / 0     auto / 1    direct / 0   direct / 1
              ====== ============ ============ ============ ============ ============
               512        8        86.6±0.7ms    242±2ms      100±1ms      256±1ms
               512        64        75.6±1ms     230±1ms      90.6±1ms     241±1ms
               512       256       163±0.9ms     316±2ms      180±1ms      328±1ms
               3072       8        1.20±0.01s   6.33±0.02s   1.80±0.02s   6.90±0.02s
               3072       64       1.15±0.01s    6.30±0s      1.77±0s      6.90±0s
               3072      256       1.30±0.01s   6.40±0.01s    1.92±0s     7.04±0.02s
              ====== ============ ============ ============ ============ ============
```

The difference is most noticeable when the heavy labels mapping code is
used (contour=1): for n = 3072 with brush sizes 8 and 64, drawing is
roughly 38x to 46x faster, close to a 50x performance boost.
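The speedup for the heavy contour=1 path can be recomputed from the two tables above (mean timings only, ignoring the ± spread):

```python
# Mean timings for n = 3072 and contour = 1, taken from the two tables (seconds).
before = {("auto", 8): 6.33, ("auto", 64): 6.30, ("direct", 8): 6.90, ("direct", 64): 6.90}
after = {("auto", 8): 0.165, ("auto", 64): 0.151, ("direct", 8): 0.166, ("direct", 64): 0.150}

speedups = {key: before[key] / after[key] for key in before}
for (mode, brush_size), speedup in sorted(speedups.items()):
    print(f"{mode:>6}, brush_size={brush_size:<3} {speedup:5.1f}x")
# Speedups: auto/8 38.4x, auto/64 41.7x, direct/8 41.6x, direct/64 46.0x
```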

Please keep in mind that this benchmark only measures the time it takes
to update `layer.data` plus the time it takes to convert labels to colors
before sending them to VisPy. It does not account for the time it takes
to transfer data from napari to OpenGL or to process labels in any
OpenGL shader. The local updates also give a substantial speedup in that
unmeasured part.

## Type of change
- [X] Optimization (non-breaking change which speeds up existing code)

# How has this been tested?
- [X] all tests pass with my change

## Final checklist:
- [X] My PR is the minimum possible work for the desired functionality
- [X] I have commented my code, particularly in hard-to-understand areas
- [X] I have made corresponding changes to the documentation
- [X] I have added tests that prove my fix is effective or that my feature works
- [ ] If I included new strings, I have used `trans.` to make them localizable.
For more information see our [translations guide](https://napari.org/developers/translations.html).

---------

Co-authored-by: alisterburt <alisterburt@gmail.com>
Co-authored-by: Juan Nunez-Iglesias <jni@fastmail.com>
Czaki pushed a commit that referenced this pull request Jun 21, 2023
…ching) (#5841)

# Description

The labels caching from #5732 broke the "show selected" feature. This PR
fixes it by invalidating the cache when "show selected" is activated.
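A simplified illustration of why a diff-based cache needs explicit invalidation here: toggling "show selected" changes how labels are mapped without changing the label data, so a cache that only re-maps changed pixels presumably finds nothing to do. `DiffCachedView` and its methods are hypothetical and model only this one interaction.

```python
import numpy as np


class DiffCachedView:
    """Hypothetical toy labels view whose output is updated only where the data changed."""

    def __init__(self, labels, selected_label):
        self.labels = labels
        self.selected_label = selected_label
        self.show_selected = False
        self._cached_labels = None
        self._cached_out = None

    def _map(self, values):
        # With "show selected" on, every other label is hidden (mapped to 0).
        if self.show_selected:
            return np.where(values == self.selected_label, values, 0)
        return values.copy()

    def set_show_selected(self, flag):
        if flag != self.show_selected:
            self.show_selected = flag
            # The mapping changed but the data did not, so the diff in render()
            # would find nothing to update: drop the cache to force a full re-map.
            self._cached_labels = None

    def render(self):
        if self._cached_labels is None:
            self._cached_out = self._map(self.labels)
            self._cached_labels = self.labels.copy()
        else:
            changed = self.labels != self._cached_labels
            self._cached_out[changed] = self._map(self.labels[changed])
            self._cached_labels[changed] = self.labels[changed]
        return self._cached_out


labels = np.array([[1, 2], [2, 1]])
view = DiffCachedView(labels, selected_label=1)
full = view.render().copy()       # [[1, 2], [2, 1]]
view.set_show_selected(True)
selected_only = view.render()     # [[1, 0], [0, 1]]
```

Without the `self._cached_labels = None` line, the second `render()` call would return the stale `[[1, 2], [2, 1]]`.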

I also added a test that catches this bug.
@imagesc-bot

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/announcement-napari-0-4-18-released/83322/1

jni added a commit that referenced this pull request Aug 25, 2023
Fixes #6079

It turns out that the caching behaviour introduced in #5732 depends on the
slice data being updated by painting. This works out for NumPy arrays because
the slice data is a view of the original data, so updating the original (as
painting does) updates the slice. However, when the data is a zarr or
tensorstore array, the slice is a NumPy copy of the original data, so the
caching mechanism believes that nothing has changed and the display is not
updated.

This adds tests for the behaviour and fixes it by painting directly into the
slice data if the data array is not a NumPy array. It's a bit of a band-aid
fix, but it works and is
[endorsed](#6079 (comment))
by our slicing expert @andy-sweet. 😂
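The view-versus-copy problem and the band-aid fix can be reproduced in plain NumPy. `CopyOnReadArray` is a hypothetical stand-in for a zarr or tensorstore array whose indexing returns a fresh NumPy copy, and `paint` is a simplified model of the fix described above, not napari's actual painting code.

```python
import numpy as np


class CopyOnReadArray:
    """Hypothetical zarr/tensorstore stand-in: indexing returns a NumPy copy."""

    def __init__(self, array):
        self._array = array

    def __getitem__(self, key):
        return np.array(self._array[key])  # always a fresh copy

    def __setitem__(self, key, value):
        self._array[key] = value


def paint(data, slice_data, z, region, label):
    """Write `label` into `region` of plane `z` and keep the displayed slice in sync."""
    data[(z, *region)] = label
    if not isinstance(data, np.ndarray):
        # slice_data is a copy, not a view, so it did not see the write above.
        slice_data[region] = label


region = (slice(5, 10), slice(5, 10))

np_data = np.zeros((4, 32, 32), dtype=np.int32)
np_slice = np_data[2]               # a view into np_data
paint(np_data, np_slice, 2, region, 7)

zarr_like = CopyOnReadArray(np.zeros((4, 32, 32), dtype=np.int32))
zarr_slice = zarr_like[2]           # an independent copy
paint(zarr_like, zarr_slice, 2, region, 7)

print(np_slice[6, 6], zarr_slice[6, 6])  # 7 7
```

Dropping the `isinstance` branch leaves `zarr_slice` all zeros, which is the stale display described in #6079.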

(I've also made a couple of drive-by updates to the code, because some
methods became unused after #5732 but were not removed at the time.)

## Type of change

- [x] Bug-fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Grzegorz Bokota <bokota+github@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Czaki added a commit that referenced this pull request Oct 17, 2023
ksofiyuk deleted the efficient_label_mapping2 branch January 8, 2024 10:26
Czaki added a commit that referenced this pull request Jan 22, 2024
# References and relevant issues

closes #6579
supersedes #6583

# Description

#5732 introduced a cache of mapped data so that only changed indices
were mapped to texture dtypes/values and sent on to the GPU. In this PR,
an alternate strategy is introduced: rather than caching
previously-transformed data and then doing a diff with the cache, we
paint the data *and* the texture-mapped data directly.

The partial update of the on-GPU texture also introduced in #5732 is
maintained, as it can dramatically reduce the amount of data needing to
be transferred from CPU to GPU memory.
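A minimal sketch of the alternate strategy, assuming a hypothetical scalar mapping `to_texture_value` from label id to texture value: each paint operation writes the label into the data and the mapped value into the texture buffer at the same time, then returns the touched region so only that part needs to be uploaded to the GPU. This illustrates the idea only and is not the code from the PR.

```python
import numpy as np


def to_texture_value(label):
    # Hypothetical label -> texture mapping (stand-in for the real lookup).
    return np.float32(label % 255) / np.float32(255)


def paint_both(labels, texture, region, label):
    """Write `label` into the data and its mapped value into the texture array.

    Returns the region so the caller can upload only that part to the GPU.
    """
    labels[region] = label
    texture[region] = to_texture_value(label)
    return region


labels = np.zeros((256, 256), dtype=np.int32)
texture = np.zeros((256, 256), dtype=np.float32)
dirty = paint_both(labels, texture, (slice(40, 60), slice(40, 60)), 12)
```

Because the texture buffer is updated in the same step as the data, there is no cached copy to diff against, and the returned region still allows a partial GPU upload.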

This PR is built on top of #6602.

---------

Co-authored-by: Juan Nunez-Iglesias <jni@fastmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Czaki added a commit that referenced this pull request Jan 23, 2024
Labels: highlight (PR that should be mentioned in next release notes), performance (relates to performance), tests (something related to our tests)
8 participants