
Making Points layer creation and update faster #6727

Merged · 22 commits · May 15, 2024

Conversation

@jules-vanaret (Contributor) commented Mar 7, 2024

References and relevant issues

I work with large points datasets (~10⁶ vertices), and creating or setting the data of a Points layer is rather slow (on the order of 2 seconds on my machine). I dug into the code and found that the culprit is this function in napari/layers/points/_points_utils.py, which handles the symbols associated with points and accounts for around 80% of the total computation time:

def coerce_symbols(array: Iterable) -> np.ndarray:
    """
    Parse an array of symbols and convert it to the correct strings.

    Ensures that all strings are valid symbols and converts aliases.

    Parameters
    ----------
    array : np.ndarray
        Array of strings matching Symbol values.
    """
    # dtype has to be object, otherwise np.vectorize will cut it down to `U(N)`,
    # where N is the biggest string currently in the array.
    array = [SYMBOL_ALIAS.get(k, k) for k in (str(x).lower() for x in array)]
    return np.vectorize(Symbol, otypes=[object])(array)

The purpose of coerce_symbols is to transform "raw" point symbols (e.g. strings like 'o') into proper Symbol instances.
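For illustration, a minimal usage sketch (assuming, as in napari's SYMBOL_ALIAS, that 'o' is an alias for 'disc'):

import numpy as np
from napari.layers.points._points_constants import Symbol
from napari.layers.points._points_utils import coerce_symbols

# raw user input mixing an alias ('o') and a canonical name ('ring')
symbols = coerce_symbols(np.array(['o', 'ring'], dtype=object))
assert symbols[0] == Symbol.DISC  # 'o' resolves to the 'disc' symbol
assert symbols[1] == Symbol.RING
assert symbols.dtype == object    # Symbol instances, not fixed-width strings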

Note that even if symbol is not specified when creating the Points layer (in which case it defaults to 'o'), or is specified as a single symbol for all points, the symbol setter in napari/layers/points/points.py:

@symbol.setter
def symbol(self, symbol: Union[str, np.ndarray, list]) -> None:
    symbol = np.broadcast_to(symbol, self.data.shape[0])

immediately transforms this single symbol into a potentially huge numpy array of size N_vertices.

The main reason coerce_symbols is slow for large N_vertices is that it loops twice through this huge numpy array of (potentially just duplicated) symbols: once with (str(x).lower() for x in array), and roughly once more with np.vectorize(Symbol, otypes=[object])(array). I believe this can be improved with minor changes.

Description

The PR stems from two ideas to reduce computation time.

  1. If the user provides no symbol when creating the Points layer, or provides a single string or Symbol instance, it is much faster to first convert it to a proper Symbol instance and THEN broadcast it to an array of size N_vertices.
  2. If the user provides a list of N_vertices symbols for the points (composed of strings or Symbol instances), there are not that many valid symbols (currently 14 different ones) to choose from. This means that instead of converting each string symbol into a Symbol instance individually, it is faster to first find all unique raw symbols in the symbol argument provided by the user, convert those into Symbol instances, and then simply map each raw symbol to its corresponding Symbol instance (illustrated just below).
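To make idea 2 concrete: deduplicating first means the conversion cost scales with the number of distinct symbols (at most 14), not with N_vertices. A quick illustration (the symbol choices here are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
raw = rng.choice(['o', 'ring', 'x', 'square'], size=1_000_000).tolist()

unique = set(raw)   # at most 14 distinct raw symbols, whatever N_vertices is
print(len(unique))  # 4 here: only 4 conversions needed instead of 1_000_000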

The new coerce_symbols function I propose looks like this:

def coerce_symbols(symbol: Union[str, Symbol, np.ndarray, list]) -> np.ndarray:
    """
    Parse an array of symbols and convert it to the correct strings.

    Ensures that all strings are valid symbols and converts aliases.

    Parameters
    ----------
    array : np.ndarray
        Array of strings matching Symbol values.
    """
    # if symbol is a unique string or Symbol instance, convert it to a
    # proper Symbol instance
    if isinstance(symbol, (str, Symbol)):
        return np.array(symbol_conversion(symbol), dtype=object)

    # otherwise, create a dictionary to map raw symbols to their
    # Symbol instance counterpart
    symbol_dict = create_symbol_dict(symbol)
    # then use a vectorized "dict.get" to convert the raw symbols to
    # their Symbol instance counterpart quickly
    return fast_dict_get(symbol, symbol_dict)

Instead of operating on an already-broadcast array of symbols, it takes as input either a single symbol or a list of symbols.
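The helpers symbol_conversion, create_symbol_dict and fast_dict_get are not shown above; their exact implementations in the PR may differ, but a minimal sketch of the idea could look like this (assuming, as in the original coerce_symbols, that str() of a Symbol yields its value):

import numpy as np
from napari.layers.points._points_constants import SYMBOL_ALIAS, Symbol

def symbol_conversion(symbol):
    # convert one raw symbol (str or Symbol) to a Symbol instance,
    # resolving aliases such as 'o' -> 'disc'
    name = str(symbol).lower()
    return Symbol(SYMBOL_ALIAS.get(name, name))

def create_symbol_dict(symbols):
    # map each *unique* raw symbol to its Symbol counterpart; with only
    # 14 valid symbols, this dict stays tiny however many points there are
    return {s: symbol_conversion(s) for s in set(symbols)}

def fast_dict_get(symbols, symbol_dict):
    # vectorized dict lookup; otypes=[object] keeps Symbol instances
    # instead of collapsing them to fixed-width strings
    return np.vectorize(symbol_dict.get, otypes=[object])(symbols)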

The symbol setter in napari/layers/points/points.py is modified to look like this:

@symbol.setter
def symbol(self, symbol: Union[str, np.ndarray, list]) -> None:
    coerced_symbols = coerce_symbols(symbol)
    coerced_symbols = np.broadcast_to(
        coerced_symbols, self.data.shape[0]
    )

It now tries to broadcast symbols to the number of vertices AFTER converting them to proper Symbol instances.
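For intuition on why this ordering is cheap: np.broadcast_to returns a read-only, zero-stride view rather than an N-sized copy, so broadcasting one already-coerced Symbol costs O(1). A small sketch:

import numpy as np
from napari.layers.points._points_constants import Symbol

coerced = np.array(Symbol.RING, dtype=object)  # 0-d object array, one Symbol
view = np.broadcast_to(coerced, 10_000_000)    # O(1): a view, not a copy

assert view.shape == (10_000_000,)
assert view.strides == (0,)  # zero stride: the single Symbol is never duplicated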

Finally, coerce_symbols is also used in the current_symbol setter, but this new implementation does not interfere with its current behavior.

Performance changes

Performance changes can be assessed using the following script, which creates a random points dataset with N_vertices vertices.

import numpy as np
import napari
from time import time
from napari.layers.points._points_constants import SYMBOL_ALIAS, Symbol
    
np.random.seed(2024)

N_vertices = 1_000_000
symbols = list(SYMBOL_ALIAS.keys())
symbols = np.random.choice(symbols, N_vertices).tolist()

# simulate 3D points with 4th dimension (time)
points = np.random.rand(N_vertices, 4) * 100

t0 = time()
points_layer = napari.layers.Points(points, name='points')
print(f"Creating layer with {N_vertices} vertices: {time() - t0:.4f} seconds")

t0 = time()
points_layer = napari.layers.Points(points, name='points', symbol=symbols)
print(f"Creating layer with {N_vertices} vertices and as many symbols: {time() - t0:.4f} seconds")

t0 = time()
points_layer = napari.layers.Points(points, name='points', symbol=Symbol.RING)
print(f"Creating layer with {N_vertices} vertices and a Symbol instance: {time() - t0:.4f} seconds")

For different values of N_vertices, I timed the three layer-creation cases from the script above (no symbol, N_vertices symbols, a single Symbol instance):

|                          | Old implementation   | New implementation       |
|--------------------------|----------------------|--------------------------|
| N_vertices = 10_000      | 0.06s, 0.04s, 0.03s  | 0.04s, 0.0075s, 0.0065s  |
| N_vertices = 1_000_000   | 1.35s, 1.45s, 1.05s  | 0.24s, 0.24s, 0.20s      |
| N_vertices = 10_000_000  | 13.3s, 14.0s, 18.0s  | 1.94s, 2.48s, 1.92s      |

Note that the proposed implementation scales much better with the number of vertices.

Benchmarks

From what I have been able to see, the current benchmarking suite already measures the time it takes to set the data of a Points layer, so there is no need to test anything else.

Final Checklist

  • My PR is the minimum possible work for the desired functionality
  • I have commented my code, particularly in hard-to-understand areas
  • I have added tests that prove my fix is effective or that my feature works

@jules-vanaret jules-vanaret changed the title from "Points setter optim" to "Making Points layer creation and update faster" Mar 7, 2024
@jules-vanaret (Contributor, Author)

I realized that broadcast_to was also used as a check that the input symbol parameter had the right shape when given as a list or an array. I have added it back (not only for single symbols).
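For reference, the check in question: np.broadcast_to raises a ValueError when a list or array of symbols cannot be broadcast to the number of points, as in this sketch:

import numpy as np

symbols = np.array(['o', 'x', 's'], dtype=object)
try:
    np.broadcast_to(symbols, 5)  # 3 symbols vs 5 points: incompatible shapes
except ValueError:
    print('symbol list does not match the number of points')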

codecov bot commented Mar 7, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.44%. Comparing base (1dcbade) to head (762426e).
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6727      +/-   ##
==========================================
- Coverage   92.45%   92.44%   -0.01%     
==========================================
  Files         617      617              
  Lines       55166    55181      +15     
==========================================
+ Hits        51001    51010       +9     
- Misses       4165     4171       +6     


@psobolewskiPhD (Member)

This looks to be addressing: #6275
Have you checked with vispy main (vispy/vispy#2533)?

@DragaDoncila (Contributor)

Thanks for the fantastic PR description @jules-vanaret, and for your attention on the points layer 👀

> This looks to be addressing: #6275

@psobolewskiPhD yep I'd say you're right. Based on the profiling and discussion in that thread, it looks like both vispy's set_data and our own coerce_symbols could be improved. Unless vispy/vispy#2533 removes the need for us to call coerce_symbols, it sounds like this PR would be an improvement regardless of the vispy PR?

@jules-vanaret (Contributor, Author)

Thank you both for taking a look!
@psobolewskiPhD all the tests were run in a fresh environment with Vispy 0.14.1, which I believe is the most recent released version. Do you think an upcoming version of Vispy will make this PR obsolete?

@Czaki (Collaborator) commented Mar 8, 2024

@brisvag Do you know what is required to release vispy 0.14.2?

@brisvag (Contributor) commented Mar 12, 2024

Nice work, I was never happy about this code ^^' @Czaki, releasing a patch version for vispy can be done whenever we like, I think.

@Czaki (Collaborator) commented Mar 12, 2024

So maybe you could make a release, and we could check how this PR works with the new vispy?

@brisvag (Contributor) commented Mar 14, 2024

Vispy 0.14.2 released :)

@psobolewskiPhD (Member)

FYI I ran the script from the OP (N_vertices = 10,000,000) with the newest vispy and psygnal and I get:

Creating layer with 10000000 vertices: 11.0905 seconds
Creating layer with 10000000 vertices and as many symbols: 12.3066 seconds
Creating layer with 10000000 vertices and a Symbol instance: 7.7045 seconds

Virtually the same as prior to updating them. However, with this PR:

Creating layer with 10000000 vertices: 0.9560 seconds
Creating layer with 10000000 vertices and as many symbols: 1.3793 seconds
Creating layer with 10000000 vertices and a Symbol instance: 0.8674 seconds

Adding a layer using viewer.add_points(data) (with data = rng.random(size=(10_000_000, 2)) * 1000) takes <10 s, which is >10 s faster than the same operation on main.

@jules-vanaret (Contributor, Author)

Whoops, you beat me to it! :)
I see the same orders of magnitude on my end with the newest Vispy.

@psobolewskiPhD psobolewskiPhD added the performance and highlight labels Mar 14, 2024
@psobolewskiPhD (Member) left a comment

This looks great to me!
But I'd like @Czaki who wrote the previous implementation to take a look too.

@psobolewskiPhD psobolewskiPhD added this to the 0.5.0 milestone Mar 21, 2024
@jules-vanaret (Contributor, Author)

@Czaki should I run additional benchmarks or add comments to the code?

@Czaki (Collaborator) commented Apr 5, 2024

I will try to review tomorrow. Please resolve the conflicts.

@DragaDoncila (Contributor)

@Czaki any further comments on this one?

Review threads on napari/layers/points/_points_utils.py (outdated, resolved)
Co-authored-by: Grzegorz Bokota <bokota+github@gmail.com>
The Qt benchmark run requested by PR #6727 (e37f69c vs 2a22052) has finished with status 'failure'. See the CI logs and artifacts for further details.
The non-Qt benchmark run requested by PR #6727 (e37f69c vs 2a22052) has finished with status 'failure'. See the CI logs and artifacts for further details.

@Czaki Czaki mentioned this pull request Apr 30, 2024
@Czaki Czaki added and then removed the run-benchmarks label Apr 30, 2024
The Qt benchmark run requested by PR #6727 (f41f486 vs 2a22052) has finished with status 'failure'. See the CI logs and artifacts for further details.
The non-Qt benchmark run requested by PR #6727 (f41f486 vs 2a22052) has finished with status 'failure'. See the CI logs and artifacts for further details.


@Czaki (Collaborator) commented Apr 30, 2024

It looks like a global translation dict solves the performance problems:

| Change   | Before [2a220521]    | After [f41f4862]    |   Ratio | Benchmark (Parameter)                                          |
|----------|----------------------|---------------------|---------|----------------------------------------------------------------|
| -        | 16.2±0.2ms           | 9.19±0.1ms          |    0.57 | benchmark_points_layer.Points3DSuite.time_create_layer(4096)   |
| -        | 15.5±0.3ms           | 8.24±0.04ms         |    0.53 | benchmark_points_layer.Points2DSuite.time_create_layer(4096)   |
| -        | 40.8±0.9ms           | 12.5±0.06ms         |    0.31 | benchmark_points_layer.Points2DSuite.time_create_layer(16384)  |
| -        | 40.8±0.3ms           | 12.5±0.1ms          |    0.31 | benchmark_points_layer.Points3DSuite.time_create_layer(16384)  |
| -        | 141±2ms              | 30.0±0.6ms          |    0.21 | benchmark_points_layer.Points2DSuite.time_create_layer(65536)  |
| -        | 138±0.9ms            | 27.4±0.6ms          |    0.2  | benchmark_points_layer.Points3DSuite.time_create_layer(65536)  |
| -        | 1.66±0.01s           | 62.4±1ms            |    0.04 | benchmark_python_layer.CoerceSymbolsSuite.time_coerce_symbols1 |
| -        | 1.66±0.01s           | 63.3±0.5ms          |    0.04 | benchmark_python_layer.CoerceSymbolsSuite.time_coerce_symbols2 |

@jules-vanaret are you ok with these changes?

For typing, I prefer to wait for #6866, as it will bump mypy and the current output is a false negative.
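For context, a "global translation dict" here means building the raw-symbol-to-Symbol mapping once at module import instead of on every call; a minimal sketch (names may differ from the merged code):

from napari.layers.points._points_constants import SYMBOL_ALIAS, Symbol

# built once at module import: every accepted spelling -> Symbol instance
SYMBOL_DICT = {
    **{s.value: s for s in Symbol},  # canonical names
    **{alias: Symbol(name) for alias, name in SYMBOL_ALIAS.items()},  # aliases
}

Per-call work then reduces to plain dict lookups, which is what the benchmark ratios above reflect.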

jules-vanaret and others added 2 commits May 1, 2024 19:07
Co-authored-by: Grzegorz Bokota <bokota+github@gmail.com>
@jules-vanaret (Contributor, Author) commented May 1, 2024

Sorry for the delay. Thank you for your work @Czaki, precomputing the dict makes much more sense! Some tests are still failing though.

@Czaki (Collaborator) commented May 1, 2024

Yes, but I prefer to merge #6866 first, as it will affect the failing tests.

@Czaki Czaki added and then removed the run-benchmarks label May 7, 2024
github-actions bot commented May 7, 2024

The Qt benchmark run requested by PR #6727 (0b86e44 vs 1dcbade) has finished with status 'success'. See the CI logs and artifacts for further details.
The non-Qt benchmark run requested by PR #6727 (0b86e44 vs 1dcbade) has finished with status 'success'. See the CI logs and artifacts for further details.

@Czaki Czaki added the ready to merge label, then removed the run-benchmarks and ready to merge labels May 7, 2024
Signed-off-by: Grzegorz Bokota <bokota+github@gmail.com>
@Czaki (Collaborator) commented May 7, 2024

@jules-vanaret Could you check the current state? I have cleaned up the code a little and tweaked the documentation. For me, it is close to merging.

@psobolewskiPhD could you also check?

@psobolewskiPhD (Member)

Everything still looks good to me.
It's even faster:

Creating layer with 1000000 vertices: 0.1736 seconds
Creating layer with 1000000 vertices and as many symbols: 0.3118 seconds
Creating layer with 1000000 vertices and a Symbol instance: 0.0992 seconds

@Czaki Czaki added the ready to merge label May 13, 2024
@jules-vanaret (Contributor, Author)

I just rechecked the code and some cases are indeed faster (those that are not are only marginally slower).
From my point of view, everything seems ok!

@jni jni merged commit 0e6f0af into napari:main May 15, 2024
37 checks passed
@github-actions github-actions bot removed the ready to merge label May 15, 2024
@jules-vanaret (Contributor, Author)

Thanks @jni for merging the PR! :)
And thank you @brisvag for the Vispy push, @psobolewskiPhD for the thorough benchmarking, and particularly @Czaki for all your work on optimization, cleanup, and general coordination of the PR.

Labels: performance (Relates to performance), highlight (PR that should be mentioned in next release notes)