Efficient labels mapping for drawing in Labels (60 FPS even with 8000x8000 images) #5732

ksofiyuk · 2023-04-15T08:58:58Z

Description

This PR introduces two independent optimizations for labels rendering during drawing: local updates and caching optimizations.

These optimizations essentially eliminate the performance issues related to drawing in the Labels layer. With them, it is possible to reach 60 FPS when drawing, even on 8000x8000 images.

Local updates

The initial implementation updates the whole labels map and copies it to the shader on each brush update. This becomes a major bottleneck when working with high-resolution images, no matter how efficient the labels mapping implementation is.

In this PR, only partial updates are sent to the VisPy labels layer. Each time Labels.data_setitem is called, it tracks the bounds of the modified region. Instead of calling Labels.refresh(), which triggers an update of the whole labels image, it calls Labels._partial_labels_refresh. That method emits the Labels.events.labels_update event carrying a slice that localizes the modified region, and the event is handled by VispyLabelsLayer._on_partial_labels_update. VisPy textures can be partially updated, and _on_partial_labels_update relies on this.
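A minimal NumPy-only sketch of the local-update idea: compute the bounding box of the modified indices, re-map only that box, and write it into a stand-in "texture". The helpers `modified_region_slices` and `partial_refresh`, and the plain array standing in for the VisPy texture, are hypothetical and for illustration only; this is not napari's actual implementation.

```python
import numpy as np


def modified_region_slices(indices):
    """Return a tuple of slices bounding every modified pixel.

    `indices` is a tuple of integer index arrays, one per axis (e.g. the
    output of np.nonzero). Hypothetical helper, not napari API.
    """
    return tuple(
        slice(int(np.min(axis_idx)), int(np.max(axis_idx)) + 1)
        for axis_idx in indices
    )


def partial_refresh(labels, texture, indices, value, to_texture):
    """Write `value` at `indices`, then refresh only the bounding box.

    `texture` is a NumPy array standing in for the GPU texture, and
    `to_texture` is the (possibly expensive) labels-to-colors mapping.
    Both are assumptions of this sketch.
    """
    labels[indices] = value
    region = modified_region_slices(indices)
    # Only the bounding box is re-mapped and "uploaded"; the rest of the
    # texture is left untouched.
    texture[region] = to_texture(labels[region])
    return region
```

For an 8x8 brush dab on a 512x512 image, only 64 of the 262,144 pixels are re-mapped and copied, instead of the full image.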

Caching optimization

The idea is to recompute the color mapping only for the elements of the label map that have changed since the previous update. Even in the parts of the code where the color mapping is already implemented quite efficiently, this can give up to a 5x speedup by avoiding slow np.float32 conversions for most pixels.
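The caching idea can be sketched as follows, assuming a simple elementwise `mapping` function in place of napari's real colormapping code. The hypothetical `CachedLabelMapper` compares the new labels against a cached copy and re-maps only the pixels that differ.

```python
import numpy as np


class CachedLabelMapper:
    """Re-map only pixels whose raw label changed since the last call.

    `mapping` is any elementwise labels-to-float function (an assumption of
    this sketch; napari's actual colormapping is more involved).
    """

    def __init__(self, mapping):
        self.mapping = mapping
        self._cached_raw = None
        self._cached_mapped = None

    def __call__(self, raw):
        if self._cached_raw is None or self._cached_raw.shape != raw.shape:
            # First call (or shape change): map every pixel once.
            self._cached_raw = raw.copy()
            self._cached_mapped = self.mapping(raw)
            return self._cached_mapped
        changed_mask = raw != self._cached_raw  # image-shaped boolean array
        changed_values = raw[changed_mask]  # 1D array of the changed labels
        # Run the mapping only on the changed values and patch the cache.
        self._cached_mapped[changed_mask] = self.mapping(changed_values)
        self._cached_raw[changed_mask] = changed_values
        return self._cached_mapped
```

The image-shaped mask and the 1D array of changed values are kept under separate names so it is always clear which one a variable holds. The returned array is the cache itself, so a caller should copy it before modifying it.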

In practice, when you use the brush, less than 1% of pixels are updated at each iteration (on large images the percentage is even smaller). As a result, this optimization should give a significant boost in 99%+ of typical use cases.

Benchmark results

I added a new benchmark (benchmark_labels_layer.LabelsDrawing2DSuite) that measures brush-drawing timings in different modes (auto/direct color mode, contour == 0/1) with different brush sizes. It simulates a brush stroke from position (0, 0) to (n - 1, n - 1) with 30 refresh updates along the way.
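A rough NumPy-only sketch of the stroke described above, assuming a square brush and refreshes evenly spaced along the diagonal. `simulate_diagonal_stroke` is a hypothetical stand-in, not the code in LabelsDrawing2DSuite, which paints through napari's Labels layer.

```python
import numpy as np


def simulate_diagonal_stroke(n, brush_size, n_refreshes=30, on_refresh=None):
    """Paint a square brush along the diagonal of an n x n label image.

    Hypothetical stand-in for the benchmark stroke. Calls
    `on_refresh(labels, region)` with the bounding box painted since the
    previous refresh, and returns (labels, number_of_refreshes).
    """
    labels = np.zeros((n, n), dtype=np.int32)
    half = brush_size // 2
    # Stroke steps after which a refresh fires, evenly spaced from 0 to n - 1.
    refresh_steps = set(
        np.linspace(0, n - 1, n_refreshes).round().astype(int).tolist()
    )
    dirty_start = None  # first row/column touched since the last refresh
    n_refreshed = 0
    for i in range(n):
        lo = max(i - half, 0)
        hi = min(i - half + brush_size, n)
        labels[lo:hi, lo:hi] = 1  # square brush centred near (i, i)
        if dirty_start is None:
            dirty_start = lo
        if i in refresh_steps:
            if on_refresh is not None:
                # Bounding box of everything painted since the last refresh.
                region = (slice(dirty_start, hi), slice(dirty_start, hi))
                on_refresh(labels, region)
            dirty_start = None
            n_refreshed += 1
    return labels, n_refreshed
```

Because the brush only moves forward along the diagonal, the region painted between two refreshes is always a single square bounding box, which is exactly the kind of slice a partial refresh needs.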

This PR:

              ====== ============ ============ ============ ============ ============
              --                                  color_mode / contour
              ------------------- ---------------------------------------------------
                n     brush_size    auto / 0     auto / 1    direct / 0   direct / 1
              ====== ============ ============ ============ ============ ============
               512        8        28.2±0.3ms    33.6±1ms     29.2±1ms    32.9±0.7ms
               512        64       22.8±0.1ms   28.1±0.2ms   22.9±0.1ms   29.1±0.3ms
               512       256       128±0.9ms    153±0.7ms    128±0.7ms     153±1ms
               3072       8        147±0.6ms    165±0.7ms     147±3ms     166±0.4ms
               3072       64       131±0.9ms     151±2ms     132±0.8ms     150±1ms
               3072      256        450±2ms      502±2ms     452±0.7ms     501±2ms
              ====== ============ ============ ============ ============ ============

main 7b3f7ec:

              ====== ============ ============ ============ ============ ============
              --                                  color_mode / contour
              ------------------- ---------------------------------------------------
                n     brush_size    auto / 0     auto / 1    direct / 0   direct / 1
              ====== ============ ============ ============ ============ ============
               512        8        86.6±0.7ms    242±2ms      100±1ms      256±1ms
               512        64        75.6±1ms     230±1ms      90.6±1ms     241±1ms
               512       256       163±0.9ms     316±2ms      180±1ms      328±1ms
               3072       8        1.20±0.01s   6.33±0.02s   1.80±0.02s   6.90±0.02s
               3072       64       1.15±0.01s    6.30±0s      1.77±0s      6.90±0s
               3072      256       1.30±0.01s   6.40±0.01s    1.92±0s     7.04±0.02s
              ====== ============ ============ ============ ============ ============

The difference is most noticeable when the heavy labels mapping code is used (contour=1): on 3072x3072 images with small brush sizes, it gives roughly a 40x speedup (up to about 46x).
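The ratios behind this comparison can be recomputed from the mean values in the two tables (ignoring the ± spread). This plain-Python snippet does so for contour == 1. With these means, the largest ratio is 6900 / 150 = 46x (n = 3072, brush_size = 64, direct mode), while for n = 512 with small brushes the gain is closer to 7-8x.

```python
# Mean timings (ms) for contour == 1, copied from the two tables above;
# the ± spread is ignored. Keys are (n, brush_size, color_mode).
this_pr_ms = {
    (512, 8, "auto"): 33.6, (512, 8, "direct"): 32.9,
    (512, 64, "auto"): 28.1, (512, 64, "direct"): 29.1,
    (512, 256, "auto"): 153, (512, 256, "direct"): 153,
    (3072, 8, "auto"): 165, (3072, 8, "direct"): 166,
    (3072, 64, "auto"): 151, (3072, 64, "direct"): 150,
    (3072, 256, "auto"): 502, (3072, 256, "direct"): 501,
}
main_ms = {
    (512, 8, "auto"): 242, (512, 8, "direct"): 256,
    (512, 64, "auto"): 230, (512, 64, "direct"): 241,
    (512, 256, "auto"): 316, (512, 256, "direct"): 328,
    (3072, 8, "auto"): 6330, (3072, 8, "direct"): 6900,
    (3072, 64, "auto"): 6300, (3072, 64, "direct"): 6900,
    (3072, 256, "auto"): 6400, (3072, 256, "direct"): 7040,
}

# Speedup = time on main / time with this PR.
speedups = {key: main_ms[key] / this_pr_ms[key] for key in this_pr_ms}
for key, ratio in sorted(speedups.items()):
    print(key, f"{ratio:.1f}x")

best = max(speedups, key=speedups.get)
print("largest speedup:", best, f"{speedups[best]:.1f}x")
```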

Please keep in mind that this benchmark only measures the time it takes to update layer.data plus the time it takes to convert labels to colors before sending them to VisPy. It does not account for the time it takes to transfer data from napari to OpenGL, or the time it takes to process labels in any OpenGL shader. The local updates also give a substantial speedup in this unmeasured part.

Type of change

Optimization (non-breaking change which speeds up existing code)

How has this been tested?

All tests pass with my change.

Final checklist:

My PR is the minimum possible work for the desired functionality
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have added tests that prove my fix is effective or that my feature works
If I included new strings, I have used trans. to make them localizable.
For more information see our translations guide.

alisterburt

very nice work @ksofiyuk !

Changes look great to me. The typing test failure seems unrelated (as in #5723); for me this is ready to merge.

supersedes #5723

napari/layers/labels/labels.py

ksofiyuk · 2023-04-15T09:49:10Z

@alisterburt Actually, there is a small problem: it failed some tests for the Labels layer. It turned out that this optimization cannot be easily applied when Layer.contour > 0, because contours are computed with morphological operations on the raw array. I'll disable it when Layer.contour > 0.


alisterburt · 2023-04-15T10:18:30Z

ooh, I just saw you brought it back, could you explain how this is working?

ksofiyuk · 2023-04-15T10:19:49Z

I found out that there is no need to disable it when contour > 0; it is enough to move the cache initialization after the if block in which contours are computed. Computing contours at each update makes caching much less useful, but it should still give some gain.
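Why the cache has to come after the contour step can be illustrated with a simplified boundary filter. `label_boundaries` below is a hypothetical NumPy stand-in for a thickness-1 contour, not napari's morphological implementation. Extending a painted square changes the contour value of pixels that were never repainted, so a cache that compares raw arrays would miss those pixels, while one that compares post-contour arrays catches them.

```python
import numpy as np


def label_boundaries(labels):
    """Keep each label only where it differs from a 4-connected neighbour.

    Simplified stand-in for a thickness-1 contour: interior pixels become 0.
    """
    edge = np.zeros(labels.shape, dtype=bool)
    diff_rows = labels[1:, :] != labels[:-1, :]
    diff_cols = labels[:, 1:] != labels[:, :-1]
    edge[1:, :] |= diff_rows
    edge[:-1, :] |= diff_rows
    edge[:, 1:] |= diff_cols
    edge[:, :-1] |= diff_cols
    return np.where(edge, labels, 0)


raw_before = np.zeros((10, 10), dtype=np.int32)
raw_before[2:6, 2:6] = 5
raw_after = raw_before.copy()
raw_after[2:6, 6:8] = 5  # extend the square two columns to the right

raw_changed = raw_after != raw_before
contour_changed = label_boundaries(raw_after) != label_boundaries(raw_before)
# Pixel (3, 5) was never repainted, but it went from boundary to interior,
# so only the post-contour comparison flags it.
print(raw_changed[3, 5], contour_changed[3, 5])
```

This also shows why caching helps less with contours enabled: the contour itself must still be recomputed over the whole array on every update before the comparison can run.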

codecov · 2023-04-15T10:54:30Z

Codecov Report

Merging #5732 (570aa2a) into main (7b3f7ec) will increase coverage by 0.01%.
The diff coverage is 96.03%.

@@            Coverage Diff             @@
##             main    #5732      +/-   ##
==========================================
+ Coverage   89.89%   89.91%   +0.01%     
==========================================
  Files         614      615       +1     
  Lines       52283    52474     +191     
==========================================
+ Hits        47002    47180     +178     
- Misses       5281     5294      +13

Impacted Files	Coverage Δ
napari/_vispy/layers/labels.py	`93.33% <90.90%> (-6.67%)`	⬇️
napari/layers/labels/labels.py	`95.86% <93.65%> (+0.37%)`	⬆️
napari/layers/labels/_labels_utils.py	`95.83% <100.00%> (+0.91%)`	⬆️
napari/layers/labels/_tests/test_labels.py	`99.51% <100.00%> (+0.02%)`	⬆️
napari/utils/colormaps/colormap_utils.py	`93.30% <100.00%> (+0.03%)`	⬆️

... and 17 files with indirect coverage changes

Czaki · 2023-04-15T17:53:06Z

@alisterburt Labels and milestone (are you sure for 0.4)?

jni

Couple of comments here,

#5723 (Fix inefficient label mapping in direct color mode, 10-20x speedup) seems to be reasonably independent of this. Could we merge that first? It will make the changes more granular and easier to review, and #5723 was already ready to merge, so I kind of want to get that in while we review this. Does that work, @ksofiyuk? They look to me like they touch different parts of the file. But,
Another reason why I want to review the two separately is that I think there should be a bit more of a change in the middle to make it clearer what's going on. For example, I don't like reusing variable names (raw_modified = raw_modified[changed_mask]), as it makes it harder to reason about the code (is this an image-shaped array or a linear array of changed values?). I also don't like that now, in most cases, image = will actually not be an image but a linear set of values. So I'd like a bit of an update to the logic and variable names to better reflect what's going on.
Finally, while #5723 has no bearing on #3308 (Use a shader for low discrepancy label conversion), this PR does: it looks like caching will be beneficial to #3308 as well, except that there we would be caching .astype(np.float32) instead of the complicated legacy logic here. So having a separate PR will make it easier to port the relevant changes to #3308.

btw, I'm super curious: I thought that this would actually track changed pixels from painting and so on, in which case the speedup would be obvious. But no, it actually checks for changes by doing a full array != value comparison, which is not obviously faster than doing array.astype(). Yet your benchmarks (and my own experimentation with this PR 🙏) suggest that it definitely is faster! Do I understand correctly that that's what you're doing here?
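One way to check the relative cost locally is to time the two primitive operations on a benchmark-sized array. This sketch assumes random int32 labels and a cache that differs from the current labels only in an 8x8 patch. It times just `!=` against `astype(np.float32)`; in the PR, the cache additionally lets the heavier legacy colormapping be skipped for unchanged pixels. Timings depend on the machine, so none are quoted here.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
raw = rng.integers(0, 100, size=(3072, 3072), dtype=np.int32)
cached = raw.copy()
cached[:8, :8] = 0  # pretend a small brush stroke changed a few pixels


def detect_changes():
    # Full-array comparison: reads two int32 arrays, writes 1 byte per pixel.
    return raw != cached


def convert_all():
    # Full conversion: reads one int32 array, writes 4 bytes per pixel.
    return raw.astype(np.float32)


t_cmp = timeit.timeit(detect_changes, number=10) / 10
t_cvt = timeit.timeit(convert_all, number=10) / 10
print(f"!= : {t_cmp * 1e3:.1f} ms   astype(float32): {t_cvt * 1e3:.1f} ms")
```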

jni · 2023-04-16T08:33:22Z

I just saw that indeed this simply has all of the original commits from #5723, so it will be very straightforward to rebase/merge main after that merges. I'm gonna go ahead and do that.

jni · 2023-04-16T08:38:00Z

Oh! One more Q: could you please test this solution with zarr, tensorstore, or dask arrays as labels? I'm concerned about some of these operations (boolean indexing) in those contexts...
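One defensive pattern for lazy or on-disk labels is to slice out only the touched region, materialize it with `np.asarray`, and do the boolean indexing on that plain NumPy block. This assumes the backing array supports basic slicing and `np.asarray` conversion (dask arrays, for instance, compute on `np.asarray`); behaviour for zarr and tensorstore would still need to be verified. `changed_values_in_region` is a hypothetical helper, not napari API.

```python
import numpy as np


def changed_values_in_region(data, cached, region):
    """Return (mask, values) for pixels in `region` that differ from `cached`.

    `data` is any array-like supporting basic slicing and np.asarray;
    `cached` is a NumPy array of the same shape. Boolean indexing happens
    only on the small materialized block, never on the lazy array itself.
    """
    block = np.asarray(data[region])  # materialize only the touched region
    mask = block != cached[region]  # region-shaped boolean array
    return mask, block[mask]  # 1D array of changed labels
```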

jni · 2023-04-16T08:41:22Z

@ksofiyuk are you comfortable with cherry-picking (should be easier than merging here)? If not, I can do it for you and force push.

kevinyamauchi · 2023-05-12T13:01:59Z

Really nice work. Thank you @ksofiyuk !