Use tuple instead of string for (LOWER|UPPER)_TABLEs. #23795

Merged
merged 1 commit into from May 30, 2023

ttsugriy
Contributor

This avoids unnecessary join and speeds up translation.

You can see benchmarks and memory measurements in the [colab](https://colab.research.google.com/gist/ttsugriy/461ae12926d42a69f0f19aa7780b06ef/str-tuple-english_upper.ipynb).

The summary:
```
%timeit "".join(_all_chars[:97] + _ascii_upper + _all_chars[97+26:])
3.25 µs ± 120 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```
vs
```
%timeit _all_chars[:97] + _ascii_upper + _all_chars[97+26:]
1.37 µs ± 234 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```
which reduces import time and
```
%timeit english_upper('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_')
1.08 µs ± 227 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```
vs
```
%timeit english_upper2('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_')
872 ns ± 6.37 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```
which suggests `english_upper` runs more than 10% faster.
The downsides of this approach are a potentially breaking change if any clients rely on these constants being `str`s (but I couldn't find any usages on GitHub or by googling), and that the tuple uses more memory: 2088 vs 329.
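
For readers without access to the colab, here is a minimal sketch of the comparison, not the actual NumPy source. The `_all_chars` and `_ascii_upper` definitions are assumptions inferred from the slices in the benchmark lines, and the names `UPPER_TABLE_STR`, `UPPER_TABLE_TUPLE`, `english_upper_str` and `english_upper_tuple` are hypothetical. It relies on `str.translate` accepting any object indexable by code point, and on it leaving a character unchanged when the lookup raises `LookupError`.

```python
import sys
import timeit

# Assumed setup: all 256 Latin-1 characters, and the 26 ASCII capitals.
_all_chars = tuple(map(chr, range(256)))
_ascii_upper = _all_chars[65:65 + 26]

# Old form: a 256-character str. New form: a 256-element tuple of 1-char strs.
UPPER_TABLE_STR = "".join(_all_chars[:97] + _ascii_upper + _all_chars[97 + 26:])
UPPER_TABLE_TUPLE = _all_chars[:97] + _ascii_upper + _all_chars[97 + 26:]


def english_upper_str(s):
    """Upper-case ASCII letters only, using the str table."""
    return s.translate(UPPER_TABLE_STR)


def english_upper_tuple(s):
    """Upper-case ASCII letters only, using the tuple table."""
    return s.translate(UPPER_TABLE_TUPLE)


sample = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_"

# Both tables should agree, including for code points >= 256, where the
# lookup raises IndexError and str.translate keeps the character as is.
assert english_upper_str(sample) == english_upper_tuple(sample)
assert english_upper_str("naïve Ω") == english_upper_tuple("naïve Ω")

if __name__ == "__main__":
    # Sizes and timings depend on the Python version and machine.
    print("str table size:  ", sys.getsizeof(UPPER_TABLE_STR))
    print("tuple table size:", sys.getsizeof(UPPER_TABLE_TUPLE))
    for fn in (english_upper_str, english_upper_tuple):
        seconds = timeit.timeit(lambda: fn(sample), number=100_000)
        print(f"{fn.__name__}: {seconds / 100_000 * 1e9:.0f} ns per call")
```

The memory figures quoted above look consistent with what `sys.getsizeof` reports for the two tables, but that is a guess about how they were measured, and the exact values vary between Python versions.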
Contributor

@MatteoRaso MatteoRaso left a comment

LGTM. Thanks!

> The cons of this approach are potentially breaking change if anyone clients rely on these constants to be strs

I don't think this will be a problem, since these constants are only meant for internal use.
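
To make the compatibility concern concrete, here is a small sketch of what would and would not change for hypothetical external code touching the tables directly. The names `table_str` and `table_tuple` are illustrative stand-ins, not the NumPy constants themselves.

```python
# Illustrative stand-ins for a str table and a tuple table of the same characters.
_all_chars = tuple(map(chr, range(256)))
table_str = "".join(_all_chars)
table_tuple = _all_chars

# Unchanged: indexing by code point, length, and use with str.translate.
assert table_str[65] == table_tuple[65] == "A"
assert len(table_str) == len(table_tuple) == 256
assert "abc".translate(table_str) == "abc".translate(table_tuple)

# Changed: slicing returns a tuple instead of a str ...
assert isinstance(table_str[65:68], str)
assert isinstance(table_tuple[65:68], tuple)

# ... and str-only methods such as .encode() are no longer available.
assert hasattr(table_str, "encode")
assert not hasattr(table_tuple, "encode")
```

Code that only passes the table to `str.translate` behaves the same either way; only code that treats the constant as a `str` would notice.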

@charris charris merged commit 174dfae into numpy:main May 30, 2023
60 checks passed
Member

charris commented May 30, 2023

Thanks @ttsugriy .

@ttsugriy ttsugriy deleted the patch-1 branch May 30, 2023 15:27