
Much faster FormatStringEncoding #5315

Merged 5 commits into napari:main on Nov 10, 2022
Conversation

@jeylau (Contributor) commented Nov 8, 2022

Description

Loading a large pose estimation data file with a simple "{id}–{label}" format (using the napari-deeplabcut plugin; the resulting feature table has >6M rows and four columns) took over 3 minutes.
While I first thought it had something to do with rendering the keypoints, a bit of profiling (see pink lines below) indicated that 92% of the time it took to load the annotations (177 s!) was spent in the TextManager, specifically in _get_feature_row.

[Screenshots: profiling results showing time spent in TextManager._get_feature_row]

I substituted df.iloc with df.itertuples, and loading now takes only about 5 s (a ~35x speedup).
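The swap described above can be sketched on a toy table (column names made up; this mirrors the idea, not the PR's exact code):

```python
import pandas as pd

# Toy stand-in for the pose-estimation feature table (column names made up).
features = pd.DataFrame({"id": [0, 1, 2], "label": ["nose", "ear", "tail"]})
fmt = "{id}-{label}"

# Before: one df.iloc lookup per row, which builds a Series each time.
slow = [fmt.format(**features.iloc[i].to_dict()) for i in range(len(features))]

# After: itertuples yields plain tuples, avoiding per-row Series creation.
feature_names = features.columns.to_list()
fast = [
    fmt.format(**dict(zip(feature_names, row)))
    for row in features.itertuples(index=False, name=None)
]

assert slow == fast
print(fast)  # ['0-nose', '1-ear', '2-tail']
```

The per-row Series construction (with its dtype handling) is what dominates the slow path; the fast path does one column-name lookup up front and then only tuple unpacking per row.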

Type of change

  • New feature (non-breaking change which adds functionality)

References

How has this been tested?

  • example: the test suite for my feature covers cases x, y, and z
  • example: all tests pass with my change
  • example: I checked that my changes work with both PySide and PyQt backends,
    as there are small differences between the two Qt bindings

Final checklist:

  • My PR is the minimum possible work for the desired functionality
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • If I included new strings, I have used trans. to make them localizable.
    For more information see our translations guide.

@andy-sweet (Member)

Thanks for the contribution and detailed write up! And sorry about the performance problems! I should be able to take a look either today or tomorrow.

@andy-sweet (Member) left a comment

Fantastic speedup here! Approving with just a minor comment about a variable name.

One thing to watch out for is that Layer.features will not necessarily be a pandas DataFrame in the future. Currently we always coerce it to be exactly a pandas DataFrame, but in the future we may want to support other dataframe libraries (dask.dataframe, cuDF) using a Protocol.

I've currently done a half-baked job of indicating that with the docstring and typing. But ideally, we can define exactly what attributes and methods we expect to find in a dataframe Protocol.

Previously we needed the dataframe to define iloc and support len, but now we need itertuples. I checked dask's and cudf's dataframe APIs and they both provide itertuples, so there's no problem here.
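Such a contract could be pinned down with typing.Protocol; a rough sketch follows (the name DataFrameLike and the member set are assumptions for illustration, not napari's actual definition):

```python
from typing import Any, Iterator, Protocol, runtime_checkable

import pandas as pd


@runtime_checkable
class DataFrameLike(Protocol):
    """Hypothetical minimal contract for Layer.features (illustration only)."""

    columns: Any  # sequence of column labels

    def itertuples(self, index: bool = ..., name: Any = ...) -> Iterator[tuple]:
        ...

    def __len__(self) -> int:
        ...


# pandas satisfies the protocol structurally; no inheritance required.
assert isinstance(pd.DataFrame({"a": [1]}), DataFrameLike)
```

Any dataframe library exposing these three members would then work with the string encoding code without napari naming it explicitly.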

napari/layers/utils/string_encoding.py (review suggestion, resolved)
Co-authored-by: Andy Sweet <andrew.d.sweet@gmail.com>
@andy-sweet (Member)

Also, if you have the profile data, or even just a screenshot of snakeviz for the optimized version, that would be ideal. I'm curious what takes up the rest of those 5 s.

@jeylau (Contributor, Author) commented Nov 8, 2022

I was unclear above: the 5 s are spent in the TextManager. The ColorManager accounts for another 9 s, but that may be something I have to improve at the level of the plugin; I'll take a look.

[Screenshot: updated snakeviz profile]

@andy-sweet (Member)

I was unclear above: the 5 s are spent in the TextManager. The ColorManager accounts for another 9 s, but that may be something I have to improve at the level of the plugin; I'll take a look.

Thanks for clarifying. The optimization here should just go in as is. There was an effort to replace ColorManager, currently on hold, but if we pick it back up we might be able to look for an optimization there. Feel free to look for one yourself too.

@andy-sweet (Member)

Also, FYI, I ran the relevant ASV benchmark for the format string case and we also see a big speedup there. At least 10x on my machine, though we only go up to 65536 (2^16) elements and timings are not linear.

(napari-dev) ➜  napari git:(pr/jeylau/5315) ✗ asv run --python=same --bench "TextManagerSuite.time_create"
· Discovering benchmarks
· Running 1 total benchmarks (1 commits * 1 environments * 1 benchmarks)
[  0.00%] ·· Benchmarking existing-py_Users_asweet_software_miniconda3_envs_napari-dev_bin_python
[ 50.00%] ··· Running (benchmark_text_manager.TextManagerSuite.time_create--).
[100.00%] ··· benchmark_text_manager.TextManagerSuite.time_create                                                                                                                                                                                                                                                                                                       ok
[100.00%] ··· ======= =========================================
              --                        string                 
              ------- -----------------------------------------
                 n     {string_property}: {float_property:.2f} 
              ======= =========================================
                 16                    829±50μs                
                 64                    855±20μs                
                256                  1.14±0.06ms               
                1024                 2.23±0.05ms               
                4096                 6.15±0.06ms               
               16384                  21.5±0.2ms               
               65536                  84.9±0.9ms               
              ======= =========================================

(napari-dev) ➜  napari git:(pr/jeylau/5315) ✗ git switch main          
M	napari/benchmarks/benchmark_text_manager.py
Switched to branch 'main'
Your branch is ahead of 'origin/main' by 4 commits.
  (use "git push" to publish your local commits)
(napari-dev) ➜  napari git:(main) ✗ asv run --python=same --bench "TextManagerSuite.time_create"
· Discovering benchmarks
· Running 1 total benchmarks (1 commits * 1 environments * 1 benchmarks)
[  0.00%] ·· Benchmarking existing-py_Users_asweet_software_miniconda3_envs_napari-dev_bin_python
[ 50.00%] ··· Running (benchmark_text_manager.TextManagerSuite.time_create--).
[100.00%] ··· benchmark_text_manager.TextManagerSuite.time_create                                                                                                                                                                                                                                                                                                       ok
[100.00%] ··· ======= =========================================
              --                        string                 
              ------- -----------------------------------------
                 n     {string_property}: {float_property:.2f} 
              ======= =========================================
                 16                   1.04±0.1ms               
                 64                  1.87±0.08ms               
                256                   5.81±0.6ms               
                1024                   19.9±2ms                
                4096                   78.9±8ms                
               16384                   290±7ms                 
               65536                  1.15±0.01s               
              ======= =========================================

@andy-sweet andy-sweet added the performance Relates to performance label Nov 8, 2022
@jeylau (Contributor, Author) commented Nov 8, 2022

Sweet!

@andy-sweet (Member)

I'll merge this after 48 hours unless anyone objects.

@brisvag (Contributor) left a comment

Awesome! Small suggestion, but otherwise approving.

Comment on lines +168 to 172

-        values = [
-            self.format.format(**_get_feature_row(features, i))
-            for i in range(len(features))
-        ]
+        feature_names = features.columns.to_list()
+        values = [
+            self.format.format(**dict(zip(feature_names, row)))
+            for row in features.itertuples(index=False, name=None)
+        ]

I think we should add a comment explaining the code here. Before, the function's name at least explained something, but now it's rather cryptic.

@JoOkuma (Contributor) commented Nov 9, 2022

The functionality is very similar to pd.DataFrame.to_dict("records"), which does essentially the same thing with a few extra checks.
This PR is still 2x faster than to_dict("records").

This PR:

[100.00%] ··· ======= =========================================
              --
              ------- -----------------------------------------
                 n     {string_property}: {float_property:.2f}
              ======= =========================================
                 16                    676±40μs
                 64                    864±80μs
                256                  1.02±0.05ms
                1024                  2.00±0.1ms
                4096                  5.67±0.1ms
               16384                  19.6±0.2ms
               65536                   78.6±2ms
              ======= =========================================

with values = [self.format.format(**row) for row in features.to_dict("records")]

[100.00%] ··· ======= =========================================
             --
             ------- -----------------------------------------
                n     {string_property}: {float_property:.2f}
             ======= =========================================
                16                    759±30μs
                64                    803±30μs
               256                  1.17±0.05ms
               1024                 2.90±0.03ms
               4096                  9.54±0.3ms
              16384                  34.4±0.4ms
              65536                   144±4ms
             ======= =========================================
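For what it's worth, the two variants produce identical row dictionaries; a quick illustrative check (toy data, not from the PR):

```python
import pandas as pd

df = pd.DataFrame(
    {"string_property": ["a", "b"], "float_property": [1.234, 5.678]}
)

# Variant 1: pandas builds the row dictionaries.
records = df.to_dict("records")

# Variant 2: the PR's zip-over-itertuples approach.
feature_names = df.columns.to_list()
zipped = [
    dict(zip(feature_names, row))
    for row in df.itertuples(index=False, name=None)
]

assert records == zipped

fmt = "{string_property}: {float_property:.2f}"
print([fmt.format(**row) for row in zipped])  # ['a: 1.23', 'b: 5.68']
```

The ~2x gap in the benchmarks above comes from the extra type boxing to_dict performs, not from any difference in the dictionaries produced.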

@jeylau (Contributor, Author) commented Nov 9, 2022

@JoOkuma, df.itertuples's memory footprint will also be very low, since it returns an iterator.
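A quick illustration of that point (toy data, not from the PR): itertuples hands back an iterator of plain tuples, so rows are produced one at a time rather than materialized all at once:

```python
import collections.abc

import pandas as pd

# Toy stand-in for a features table.
df = pd.DataFrame({"id": range(3), "label": list("abc")})

rows = df.itertuples(index=False, name=None)

# Rows are generated lazily; nothing is materialized until consumed.
assert isinstance(rows, collections.abc.Iterator)
assert next(rows) == (0, "a")
```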

@beckernick

Hi! I came across this PR due to the cuDF mention. Performance gains look fantastic with this change!

I wanted to add some context about cuDF and itertuples:

Previously we needed the dataframe to define iloc and support len, but now we need itertuples. I checked dask's and cudf's dataframe APIs and they both provide itertuples, so there's no problem here.

cuDF doesn't support this kind of row-based iteration via itertuples or iterrows. Iterating one row at a time from raw Python would cause a GPU->CPU transfer for each row, which is very inefficient due to the transfer overhead. If there's interest in cuDF support in the future, it may be fine to jump back and forth between cuDF and pandas here, as one bulk transfer (pdf = gdf.to_pandas(); ...; gdf = cudf.from_pandas(...)) will not be as slow.

I also wanted to mention that cuDF is adding support for DataFrame.to_dict, which will essentially do the pandas conversion for you and then call pd.DataFrame.to_dict. to_dict provides a similar list-of-row-dictionaries output.

import cudf

df = cudf.datasets.randomdata(nrows=3)
pdf = df.to_pandas()
print(pdf.to_dict(orient="records"))

# from the function in the PR
feature_names = pdf.columns.to_list()
print([dict(zip(feature_names, row)) for row in pdf.itertuples(index=False, name=None)])
[{'id': 1023, 'x': -0.3437607026238143, 'y': 0.4419788553645101}, {'id': 1012, 'x': 0.2760312523846038, 'y': 0.8273451449034162}, {'id': 1013, 'x': -0.35755297004454434, 'y': -0.13542889747873632}]
[{'id': 1023, 'x': -0.3437607026238143, 'y': 0.4419788553645101}, {'id': 1012, 'x': 0.2760312523846038, 'y': 0.8273451449034162}, {'id': 1013, 'x': -0.35755297004454434, 'y': -0.13542889747873632}]

Happy to chat further if helpful.

@andy-sweet (Member)

cuDF doesn't support this kind of row-based iteration via itertuples or iterrows. Iterating one row at a time from raw Python would cause a GPU->CPU transfer for each row, which is very inefficient due to the transfer overhead. If there's interest in cuDF support in the future, it may be fine to jump back and forth between cuDF and pandas here, as one bulk transfer (pdf = gdf.to_pandas(); ...; gdf = cudf.from_pandas(...)) will not be as slow.

Thanks for the very useful information!

napari doesn't yet support different types of tables/dataframes, but we can at least imagine that it could in the not-too-distant future. We'll definitely reference this information then and may pick your brains more. I imagine support would look something like defining a core functional protocol, with specialized implementations for specific types where needed (e.g. doing something special here if it's a cuDF dataframe).
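A specialized implementation along those lines might just bulk-convert up front; here is a sketch under the assumption that cuDF is an optional dependency (the helper names are hypothetical, not part of napari):

```python
import pandas as pd


def _as_pandas(features):
    """Bulk-convert a GPU dataframe to pandas before row iteration.

    cudf is treated as an optional dependency; this helper and its name
    are hypothetical, for illustration only.
    """
    try:
        import cudf
    except ImportError:
        return features
    if isinstance(features, cudf.DataFrame):
        # One GPU->CPU transfer for the whole table, instead of one per row.
        return features.to_pandas()
    return features


def format_rows(features, fmt):
    """Apply a format string to every row (same pattern as the PR)."""
    features = _as_pandas(features)
    feature_names = features.columns.to_list()
    return [
        fmt.format(**dict(zip(feature_names, row)))
        for row in features.itertuples(index=False, name=None)
    ]


print(format_rows(pd.DataFrame({"id": [1], "x": [0.5]}), "{id}: {x:.1f}"))
# ['1: 0.5']
```

With pandas input the adapter is a no-op, so the fast path above is unchanged; only GPU-backed tables pay the single bulk-transfer cost.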

@exactlyallan

FYI, RAPIDS viz works closely with HoloViews and has done similar cuDF implementations with success, for example.

@andy-sweet (Member) commented Nov 10, 2022

Merging after 24 but before 48 hours since we have multiple approvals here.

@andy-sweet andy-sweet merged commit cd5c314 into napari:main Nov 10, 2022
@jeylau jeylau deleted the fast_format_encoding branch November 11, 2022 11:30
@Czaki Czaki mentioned this pull request Jun 7, 2023
@Czaki Czaki added this to the 0.4.18 milestone Jun 13, 2023
Czaki pushed a commit that referenced this pull request Jun 16, 2023
* Much faster iteration over a DataFrame's rows

* Remove unused import

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update napari/layers/utils/string_encoding.py

Co-authored-by: Andy Sweet <andrew.d.sweet@gmail.com>

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Andy Sweet <andrew.d.sweet@gmail.com>
Czaki pushed further commits with the same message referencing this pull request on Jun 17, 18, 19, and 21, 2023.
Labels: enhancement, performance
7 participants