
Making Points layer creation and update faster #6727

Merged · 22 commits · May 15, 2024

Conversation

@jules-vanaret (Contributor) commented Mar 7, 2024

References and relevant issues

I work with large points datasets (~10⁶ vertices), and creating or setting the data of a Points layer is rather slow (on the order of 2 seconds on my machine). I dug into the code and found that the culprit is this function in napari/layers/points/_points_utils.py, which handles the symbols associated with points and accounts for around 80% of the total computation time:

def coerce_symbols(array: Iterable) -> np.ndarray:
    """
    Parse an array of symbols and convert it to the correct strings.

    Ensures that all strings are valid symbols and converts aliases.

    Parameters
    ----------
    array : np.ndarray
        Array of strings matching Symbol values.
    """
    # dtype has to be object, otherwise np.vectorize will cut it down to `U(N)`,
    # where N is the biggest string currently in the array.
    array = [SYMBOL_ALIAS.get(k, k) for k in (str(x).lower() for x in array)]
    return np.vectorize(Symbol, otypes=[object])(array)

The purpose of coerce_symbols is to transform "raw" point symbols (e.g. strings like 'o') into proper Symbol instances.
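For illustration, a minimal usage sketch (assuming, as in napari's SYMBOL_ALIAS, that 'o' is an alias for 'disc'):

import numpy as np
from napari.layers.points._points_constants import Symbol
from napari.layers.points._points_utils import coerce_symbols

# raw user input mixing an alias ('o') and a canonical name ('ring')
symbols = coerce_symbols(np.array(['o', 'ring'], dtype=object))
assert symbols[0] == Symbol.DISC  # 'o' resolves to the 'disc' symbol
assert symbols[1] == Symbol.RING
assert symbols.dtype == object    # Symbol instances, not fixed-width strings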

Note that even if symbol is not specified when creating the Points layer (in which case it defaults to 'o'), or is specified as a single symbol for all points, the symbol setter in napari/layers/points/points.py:

@symbol.setter
def symbol(self, symbol: Union[str, np.ndarray, list]) -> None:
    symbol = np.broadcast_to(symbol, self.data.shape[0])

immediately transforms this single symbol into a potentially huge numpy array of size N_vertices.

The main reason coerce_symbols is slow for large N_vertices is that it loops twice through this huge numpy array of (potentially just duplicated) symbols: once with (str(x).lower() for x in array), and roughly once more with np.vectorize(Symbol, otypes=[object])(array). I believe this can be improved with minor changes.

Description

The PR stems from two ideas to reduce computation time.

  1. If the user provides no symbol when creating the Points layer, or provides a single string or Symbol instance, it is much faster to first convert it to a proper Symbol instance and THEN broadcast it to an array of size N_vertices.
  2. If the user provides a list of N_vertices symbols for the points (composed of strings or Symbol instances), there are not that many valid symbols (currently 14 different ones) to choose from. This means that instead of converting each string symbol into a Symbol instance individually, it is faster to first find all unique raw symbols in the symbol argument provided by the user, convert those into Symbol instances, and then simply map each raw symbol to its corresponding Symbol instance (illustrated just below).
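To make idea 2 concrete: deduplicating first means the conversion cost scales with the number of distinct symbols (at most 14), not with N_vertices. A quick illustration (the symbol choices here are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
raw = rng.choice(['o', 'ring', 'x', 'square'], size=1_000_000).tolist()

unique = set(raw)   # at most 14 distinct raw symbols, whatever N_vertices is
print(len(unique))  # 4 here: only 4 conversions needed instead of 1_000_000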

The new coerce_symbols function I propose looks like this:

def coerce_symbols(symbol: Union[str, Symbol, np.ndarray, list]) -> np.ndarray:
    """
    Parse an array of symbols and convert it to the correct strings.

    Ensures that all strings are valid symbols and converts aliases.

    Parameters
    ----------
    array : np.ndarray
        Array of strings matching Symbol values.
    """
    # if symbol is a unique string or Symbol instance, convert it to a
    # proper Symbol instance
    if isinstance(symbol, (str, Symbol)):
        return np.array(symbol_conversion(symbol), dtype=object)

    # otherwise, create a dictionary to map raw symbols to their
    # Symbol instance counterpart
    symbol_dict = create_symbol_dict(symbol)
    # then use a vectorized "dict.get" to convert the raw symbols to
    # their Symbol instance counterpart quickly
    return fast_dict_get(symbol, symbol_dict)

Instead of operating on an already-broadcast array of symbols, it takes as input either a single symbol or a list of symbols.
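The helpers symbol_conversion, create_symbol_dict and fast_dict_get are not shown above; their exact implementations in the PR may differ, but a minimal sketch of the idea could look like this (assuming, as in the original coerce_symbols, that str() of a Symbol yields its value):

import numpy as np
from napari.layers.points._points_constants import SYMBOL_ALIAS, Symbol

def symbol_conversion(symbol):
    # convert one raw symbol (str or Symbol) to a Symbol instance,
    # resolving aliases such as 'o' -> 'disc'
    name = str(symbol).lower()
    return Symbol(SYMBOL_ALIAS.get(name, name))

def create_symbol_dict(symbols):
    # map each *unique* raw symbol to its Symbol counterpart; with only
    # 14 valid symbols, this dict stays tiny however many points there are
    return {s: symbol_conversion(s) for s in set(symbols)}

def fast_dict_get(symbols, symbol_dict):
    # vectorized dict lookup; otypes=[object] keeps Symbol instances
    # instead of collapsing them to fixed-width strings
    return np.vectorize(symbol_dict.get, otypes=[object])(symbols)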

The symbol setter in napari/layers/points/points.py is modified to look like this:

@symbol.setter
def symbol(self, symbol: Union[str, np.ndarray, list]) -> None:
    coerced_symbols = coerce_symbols(symbol)
    coerced_symbols = np.broadcast_to(
        coerced_symbols, self.data.shape[0]
    )

It now tries to broadcast symbols to the number of vertices AFTER converting them to proper Symbol instances.
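For intuition on why this ordering is cheap: np.broadcast_to returns a read-only, zero-stride view rather than an N-sized copy, so broadcasting one already-coerced Symbol costs O(1). A small sketch:

import numpy as np
from napari.layers.points._points_constants import Symbol

coerced = np.array(Symbol.RING, dtype=object)  # 0-d object array, one Symbol
view = np.broadcast_to(coerced, 10_000_000)    # O(1): a view, not a copy

assert view.shape == (10_000_000,)
assert view.strides == (0,)  # zero stride: the single Symbol is never duplicated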

Finally, coerce_symbols is also used in the current_symbol setter, but this new implementation does not interfere with its current behavior.

Performance changes

Performance changes can be assessed using the following script, which creates a random points dataset with N_vertices vertices.

import numpy as np
import napari
from time import time
from napari.layers.points._points_constants import SYMBOL_ALIAS, Symbol
    
np.random.seed(2024)

N_vertices = 1_000_000
symbols = list(SYMBOL_ALIAS.keys())
symbols = np.random.choice(symbols, N_vertices).tolist()

# simulate 3D points with 4th dimension (time)
points = np.random.rand(N_vertices, 4) * 100

t0 = time()
points_layer = napari.layers.Points(points, name='points')
print(f"Creating layer with {N_vertices} vertices: {time() - t0:.4f} seconds")

t0 = time()
points_layer = napari.layers.Points(points, name='points', symbol=symbols)
print(f"Creating layer with {N_vertices} vertices and as many symbols: {time() - t0:.4f} seconds")

t0 = time()
points_layer = napari.layers.Points(points, name='points', symbol=Symbol.RING)
print(f"Creating layer with {N_vertices} vertices and a Symbol instance: {time() - t0:.4f} seconds")

For different values of N_vertices, I timed the three layer-creation cases from the script above (no symbol, N_vertices symbols, a single Symbol instance):

|                          | Old implementation   | New implementation       |
|--------------------------|----------------------|--------------------------|
| N_vertices = 10_000      | 0.06s, 0.04s, 0.03s  | 0.04s, 0.0075s, 0.0065s  |
| N_vertices = 1_000_000   | 1.35s, 1.45s, 1.05s  | 0.24s, 0.24s, 0.20s      |
| N_vertices = 10_000_000  | 13.3s, 14.0s, 18.0s  | 1.94s, 2.48s, 1.92s      |

Note that the proposed implementation scales much better with the number of vertices.

Benchmarks

From what I have been able to see, the current benchmarking suite already measures the time it takes to set the data of a Points layer, so there is no need to test anything else.

Final Checklist

  • My PR is the minimum possible work for the desired functionality
  • I have commented my code, particularly in hard-to-understand areas
  • I have added tests that prove my fix is effective or that my feature works

@jules-vanaret jules-vanaret changed the title from "Points setter optim" to "Making Points layer creation and update faster" Mar 7, 2024
@jules-vanaret (Contributor, Author)

I realized that broadcast_to was also used as a check that the input symbol parameter had the right shape when given as a list or an array. I have added it back (not only for single symbols).
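For reference, the check in question: np.broadcast_to raises a ValueError when a list or array of symbols cannot be broadcast to the number of points, as in this sketch:

import numpy as np

symbols = np.array(['o', 'x', 's'], dtype=object)
try:
    np.broadcast_to(symbols, 5)  # 3 symbols vs 5 points: incompatible shapes
except ValueError:
    print('symbol list does not match the number of points')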

codecov bot commented Mar 7, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.44%. Comparing base (1dcbade) to head (762426e).
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6727      +/-   ##
==========================================
- Coverage   92.45%   92.44%   -0.01%     
==========================================
  Files         617      617              
  Lines       55166    55181      +15     
==========================================
+ Hits        51001    51010       +9     
- Misses       4165     4171       +6     


@psobolewskiPhD (Member)

This looks to be addressing: #6275
Have you checked with vispy main (vispy/vispy#2533)?

@DragaDoncila (Contributor)

Thanks for the fantastic PR description @jules-vanaret, and for your attention on the points layer 👀

> This looks to be addressing: #6275

@psobolewskiPhD yep I'd say you're right. Based on the profiling and discussion in that thread, it looks like both vispy's set_data and our own coerce_symbols could be improved. Unless vispy/vispy#2533 removes the need for us to call coerce_symbols, it sounds like this PR would be an improvement regardless of the vispy PR?

@jules-vanaret (Contributor, Author)

Thank you both for taking a look!
@psobolewskiPhD all the tests were run in a fresh environment with Vispy 0.14.1, which I believe is the most recent released version. Do you think an upcoming version of Vispy will make this PR obsolete?

@Czaki (Collaborator) commented Mar 8, 2024

@brisvag Do you know what is required to release vispy 0.14.2?

@brisvag (Contributor) commented Mar 12, 2024

Nice work, I was never happy about this code ^^' @Czaki, releasing a patch version for vispy can be done whenever we like, I think.

@Czaki (Collaborator) commented Mar 12, 2024

So maybe you could make a release, and we could check how this PR works with the new vispy?

@brisvag (Contributor) commented Mar 14, 2024

Vispy 0.14.2 released :)

@psobolewskiPhD (Member)

FYI I ran the script from the OP (N_vertices = 10,000,000) with the newest vispy and psygnal and I get:

Creating layer with 10000000 vertices: 11.0905 seconds
Creating layer with 10000000 vertices and as many symbols: 12.3066 seconds
Creating layer with 10000000 vertices and a Symbol instance: 7.7045 seconds

Virtually the same as prior to updating them. However, with this PR:

Creating layer with 10000000 vertices: 0.9560 seconds
Creating layer with 10000000 vertices and as many symbols: 1.3793 seconds
Creating layer with 10000000 vertices and a Symbol instance: 0.8674 seconds

Adding a layer using viewer.add_points(data) (with data = rng.random(size=(10_000_000, 2)) * 1000) takes <10 s, which is >10 s faster than the same operation on main.

@jules-vanaret (Contributor, Author)

Whoops, you beat me to it! :)
I see the same orders of magnitude on my end with the newest Vispy.

@psobolewskiPhD psobolewskiPhD added the performance and highlight labels Mar 14, 2024
@psobolewskiPhD (Member) left a comment

This looks great to me!
But I'd like @Czaki who wrote the previous implementation to take a look too.

@psobolewskiPhD psobolewskiPhD added this to the 0.5.0 milestone Mar 21, 2024
@jules-vanaret (Contributor, Author)

@Czaki should I run additional benchmarks or add comments to the code?

@Czaki (Collaborator) commented Apr 5, 2024

I will try to review tomorrow. Please resolve the conflicts.

@DragaDoncila (Contributor)

@Czaki any further comments on this one?

Review threads on napari/layers/points/_points_utils.py (outdated, resolved)
Co-authored-by: Grzegorz Bokota <bokota+github@gmail.com>
The Qt benchmark run requested by PR #6727 (e37f69c vs 2a22052) has finished with status 'failure'. See the CI logs and artifacts for further details.
The non-Qt benchmark run requested by PR #6727 (e37f69c vs 2a22052) has finished with status 'failure'. See the CI logs and artifacts for further details.

@Czaki Czaki mentioned this pull request Apr 30, 2024
@Czaki Czaki added and then removed the run-benchmarks label Apr 30, 2024
The Qt benchmark run requested by PR #6727 (f41f486 vs 2a22052) has finished with status 'failure'. See the CI logs and artifacts for further details.
The non-Qt benchmark run requested by PR #6727 (f41f486 vs 2a22052) has finished with status 'failure'. See the CI logs and artifacts for further details.


@Czaki (Collaborator) commented Apr 30, 2024

It looks like a global translation dict solves the performance problems:

| Change   | Before [2a220521]    | After [f41f4862]    |   Ratio | Benchmark (Parameter)                                          |
|----------|----------------------|---------------------|---------|----------------------------------------------------------------|
| -        | 16.2±0.2ms           | 9.19±0.1ms          |    0.57 | benchmark_points_layer.Points3DSuite.time_create_layer(4096)   |
| -        | 15.5±0.3ms           | 8.24±0.04ms         |    0.53 | benchmark_points_layer.Points2DSuite.time_create_layer(4096)   |
| -        | 40.8±0.9ms           | 12.5±0.06ms         |    0.31 | benchmark_points_layer.Points2DSuite.time_create_layer(16384)  |
| -        | 40.8±0.3ms           | 12.5±0.1ms          |    0.31 | benchmark_points_layer.Points3DSuite.time_create_layer(16384)  |
| -        | 141±2ms              | 30.0±0.6ms          |    0.21 | benchmark_points_layer.Points2DSuite.time_create_layer(65536)  |
| -        | 138±0.9ms            | 27.4±0.6ms          |    0.2  | benchmark_points_layer.Points3DSuite.time_create_layer(65536)  |
| -        | 1.66±0.01s           | 62.4±1ms            |    0.04 | benchmark_python_layer.CoerceSymbolsSuite.time_coerce_symbols1 |
| -        | 1.66±0.01s           | 63.3±0.5ms          |    0.04 | benchmark_python_layer.CoerceSymbolsSuite.time_coerce_symbols2 |

@jules-vanaret are you ok with these changes?

For typing, I prefer to wait for #6866, as it will bump mypy and the current output is a false negative.
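For context, a "global translation dict" here means building the raw-symbol-to-Symbol mapping once at module import instead of on every call; a minimal sketch (names may differ from the merged code):

from napari.layers.points._points_constants import SYMBOL_ALIAS, Symbol

# built once at module import: every accepted spelling -> Symbol instance
SYMBOL_DICT = {
    **{s.value: s for s in Symbol},  # canonical names
    **{alias: Symbol(name) for alias, name in SYMBOL_ALIAS.items()},  # aliases
}

Per-call work then reduces to plain dict lookups, which is what the benchmark ratios above reflect.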

jules-vanaret and others added 2 commits May 1, 2024 19:07
Co-authored-by: Grzegorz Bokota <bokota+github@gmail.com>
@jules-vanaret (Contributor, Author) commented May 1, 2024

Sorry for the delay. Thank you for your work @Czaki, precomputing the dict makes much more sense! Some tests are still failing though.

@Czaki (Collaborator) commented May 1, 2024

Yes, but I prefer to merge #6866 first, as it will affect the failing tests.

@Czaki Czaki added and then removed the run-benchmarks label May 7, 2024
github-actions bot commented May 7, 2024

The Qt benchmark run requested by PR #6727 (0b86e44 vs 1dcbade) has finished with status 'success'. See the CI logs and artifacts for further details.
The non-Qt benchmark run requested by PR #6727 (0b86e44 vs 1dcbade) has finished with status 'success'. See the CI logs and artifacts for further details.

@Czaki Czaki added the ready to merge label, then removed the run-benchmarks and ready to merge labels May 7, 2024
Signed-off-by: Grzegorz Bokota <bokota+github@gmail.com>
@Czaki (Collaborator) commented May 7, 2024

@jules-vanaret Could you check the current state? I have cleaned up the code a little and tweaked the documentation. For me, it is close to merging.

@psobolewskiPhD could you also check?

@psobolewskiPhD (Member)

Everything still looks good to me.
It's even faster:

Creating layer with 1000000 vertices: 0.1736 seconds
Creating layer with 1000000 vertices and as many symbols: 0.3118 seconds
Creating layer with 1000000 vertices and a Symbol instance: 0.0992 seconds

@Czaki Czaki added the ready to merge label May 13, 2024
@jules-vanaret (Contributor, Author)

I just rechecked the code and some cases are indeed faster (those that are not are only marginally slower).
From my point of view, everything seems ok!

@jni jni merged commit 0e6f0af into napari:main May 15, 2024
37 checks passed
@github-actions github-actions bot removed the ready to merge label May 15, 2024
@jules-vanaret (Contributor, Author)

Thanks @jni for merging the PR! :)
And thank you @brisvag for the Vispy push, @psobolewskiPhD for the thorough benchmarking, and particularly @Czaki for all your work on optimization, cleanup, and general coordination of the PR.

Labels: performance (Relates to performance), highlight (PR that should be mentioned in next release notes)