Use tuple instead of string for (LOWER|UPPER)_TABLEs. #23795

ttsugriy · 2023-05-22T23:50:45Z

This avoids an unnecessary join and speeds up translation.

You can see benchmarks and memory measurements in the [colab](https://colab.research.google.com/gist/ttsugriy/461ae12926d42a69f0f19aa7780b06ef/str-tuple-english_upper.ipynb).

The summary:

```
%timeit "".join(_all_chars[:97] + _ascii_upper + _all_chars[97+26:])
3.25 µs ± 120 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```

vs

```
%timeit _all_chars[:97] + _ascii_upper + _all_chars[97+26:]
1.37 µs ± 234 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```

which reduces import time, and

```
%timeit english_upper('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_')
1.08 µs ± 227 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```

vs

```
%timeit english_upper2('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_')
872 ns ± 6.37 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```

which suggests that `english_upper` is more than 10% faster at runtime. The cons of this approach are a potentially breaking change if any clients rely on these constants being `str`s (but I couldn't find any usages on GitHub or by googling), and that the tuple uses more memory: 2088 vs 329 bytes.
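For readers without access to the colab, here is a minimal sketch of the two table styles being compared. The names `_all_chars` and `_ascii_upper` come from the benchmark above, but their definitions here are assumptions, and `UPPER_TABLE_STR`, `UPPER_TABLE_TUPLE`, `english_upper_str` and `english_upper_tuple` are hypothetical names used only for illustration. It relies on `str.translate` accepting any object that can be indexed by a character's ordinal.

```python
import sys

# Assumed definitions: one single-character string per code point 0..255.
_all_chars = tuple(map(chr, range(256)))
_ascii_upper = _all_chars[65:65 + 26]  # 'A'..'Z'

# Old style: join the pieces into a 256-character str.
UPPER_TABLE_STR = "".join(_all_chars[:97] + _ascii_upper + _all_chars[97 + 26:])

# New style: keep the concatenated tuple and skip the join.
UPPER_TABLE_TUPLE = _all_chars[:97] + _ascii_upper + _all_chars[97 + 26:]


def english_upper_str(s):
    """Upper-case ASCII letters only, using the str table."""
    return s.translate(UPPER_TABLE_STR)


def english_upper_tuple(s):
    """Upper-case ASCII letters only, using the tuple table."""
    return s.translate(UPPER_TABLE_TUPLE)


sample = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_"

# Both tables produce the same translation.
assert english_upper_str(sample) == english_upper_tuple(sample)

# Object sizes vary with Python version and platform, so no values are assumed.
print("str table:  ", sys.getsizeof(UPPER_TABLE_STR))
print("tuple table:", sys.getsizeof(UPPER_TABLE_TUPLE))
```

Characters with an ordinal of 256 or more fall outside both tables; `str.translate` leaves them unchanged because the lookup raises `IndexError`. Timings can be compared with `timeit` in the same way as in the summary above.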

MatteoRaso

LGTM. Thanks!

> The cons of this approach are a potentially breaking change if any clients rely on these constants being `str`s

I don't think this will be a problem, since these constants are only meant for internal use.

charris · 2023-05-30T15:25:26Z

Thanks @ttsugriy.

MatteoRaso approved these changes May 24, 2023

charris merged commit 174dfae into numpy:main May 30, 2023
60 checks passed

ttsugriy deleted the patch-1 branch May 30, 2023 15:27
