Use tuple instead of string for (LOWER|UPPER)_TABLEs. #23795

Merged 1 commit on May 30, 2023

Commits on May 22, 2023

  1. Use tuple instead of string for (LOWER|UPPER)_TABLEs.

    This avoids an unnecessary `join` and speeds up translation.
    
    You can see benchmarks and memory measurements in the [colab](https://colab.research.google.com/gist/ttsugriy/461ae12926d42a69f0f19aa7780b06ef/str-tuple-english_upper.ipynb).
    
    The summary:
    ```
    %timeit "".join(_all_chars[:97] + _ascii_upper + _all_chars[97+26:])
    3.25 µs ± 120 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    ```
    vs
    ```
    %timeit _all_chars[:97] + _ascii_upper + _all_chars[97+26:]
    1.37 µs ± 234 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
    ```
    which reduces import time and
    ```
    %timeit english_upper('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_')
    1.08 µs ± 227 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
    ```
    vs
    ```
    %timeit english_upper2('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_')
    872 ns ± 6.37 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
    ```
    which suggests that `english_upper` runs more than 10% faster.
    The downsides of this approach are a potentially breaking change if any clients rely on these constants being `str`s (but I couldn't find any usages on GitHub or by googling), and higher memory use: the tuple takes 2088 bytes vs 329 for the string.
    ttsugriy committed May 22, 2023
    Commit 3abee05
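
The trade-off described in the commit message can be illustrated with a minimal sketch. It builds the uppercase table both ways and checks that `str.translate` gives the same result with either one. The names `UPPER_TABLE_TUPLE`, `UPPER_TABLE_STR`, and `upper_with` are hypothetical and chosen only for this illustration; `_all_chars` and `_ascii_upper` follow the snippets quoted above, and their definitions here are an assumption about how the module builds them.

```python
import sys

# Assumed definitions, mirroring the names used in the benchmark snippets.
_all_chars = tuple(map(chr, range(256)))
_ascii_upper = _all_chars[65:65 + 26]

# Tuple table (no join) and the equivalent joined string table.
UPPER_TABLE_TUPLE = _all_chars[:97] + _ascii_upper + _all_chars[97 + 26:]
UPPER_TABLE_STR = "".join(UPPER_TABLE_TUPLE)


def upper_with(table, s):
    """Uppercase only ASCII letters using the given translation table.

    str.translate indexes the table by code point; a LookupError
    (such as IndexError for code points >= 256) leaves the character as-is.
    """
    return s.translate(table)


sample = "ABCxyz0123_\u0131"  # U+0131 (dotless i) lies outside the table
result_tuple = upper_with(UPPER_TABLE_TUPLE, sample)
result_str = upper_with(UPPER_TABLE_STR, sample)

print(result_tuple == result_str)
print(sys.getsizeof(UPPER_TABLE_TUPLE), sys.getsizeof(UPPER_TABLE_STR))
```

Both tables work because `str.translate` accepts any object that supports indexing by Unicode ordinal. The exact sizes printed by `sys.getsizeof` depend on the Python version and platform, so they may not match the figures quoted in the commit message.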