Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IntFlag makes re.compile slower #75852

Closed
methane opened this issue Oct 3, 2017 · 29 comments
Closed

IntFlag makes re.compile slower #75852

methane opened this issue Oct 3, 2017 · 29 comments
Labels
3.7 performance stdlib

Comments

@methane
Copy link
Member

@methane methane commented Oct 3, 2017

BPO 31671
Nosy @gvanrossum, @warsaw, @rhettinger, @vstinner, @methane, @ethanfurman, @serhiy-storchaka
PRs
  • #3862
  • #3867
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2017-10-05.08:20:18.804>
    created_at = <Date 2017-10-03.11:00:30.826>
    labels = ['3.7', 'library', 'performance']
    title = 'IntFlag makes re.compile slower'
    updated_at = <Date 2017-10-05.10:18:23.889>
    user = 'https://github.com/methane'

    bugs.python.org fields:

    activity = <Date 2017-10-05.10:18:23.889>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2017-10-05.08:20:18.804>
    closer = 'methane'
    components = ['Library (Lib)']
    creation = <Date 2017-10-03.11:00:30.826>
    creator = 'methane'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 31671
    keywords = ['patch']
    message_count = 29.0
    messages = ['303594', '303601', '303603', '303605', '303607', '303609', '303610', '303617', '303626', '303666', '303667', '303679', '303681', '303684', '303687', '303688', '303690', '303691', '303706', '303732', '303737', '303740', '303746', '303747', '303749', '303750', '303753', '303754', '303756']
    nosy_count = 7.0
    nosy_names = ['gvanrossum', 'barry', 'rhettinger', 'vstinner', 'methane', 'ethan.furman', 'serhiy.storchaka']
    pr_nums = ['3862', '3867']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue31671'
    versions = ['Python 3.7']

    @methane
    Copy link
    Member Author

    @methane methane commented Oct 3, 2017

    flags exposed by re module are IntFlag from Python 3.4.

    Since it is passed to underlying sre_parse and sre_compile,
    tests in loop (e.g. flags & SRE_FLAG_IGNORECASE) calls
    IntFlag.and and creates new instance everytime.

    It makes re.compile() slower.

    @methane methane added 3.7 stdlib labels Oct 3, 2017
    @serhiy-storchaka
    Copy link
    Member

    @serhiy-storchaka serhiy-storchaka commented Oct 3, 2017

    See also bpo-28637. Using IntFlag in the re module impacted the Python startup time. This was "fixed" by getting rid of re in site.py.

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Oct 3, 2017

    Oh, Python already accepts floating point numbers:

    haypo@selma$ ./python
    Python 3.7.0a1+ (heads/master-dirty:efb560eee2, Oct  3 2017, 12:15:58) 
    [GCC 7.2.1 20170915 (Red Hat 7.2.1-2)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import re
    >>> re.compile("[a-z]", flags=re.I).match("A")
    <_sre.SRE_Match object; span=(0, 1), match='A'>
    >>> re.I
    <RegexFlag.IGNORECASE: 2>
    >>> re.compile("[a-z]", flags=2).match("A")
    <_sre.SRE_Match object; span=(0, 1), match='A'>
    
    # and now a float
    >>> re.compile("[a-z]", flags=2.0).match("A")
    <_sre.SRE_Match object; span=(0, 1), match='A'>

    @serhiy-storchaka
    Copy link
    Member

    @serhiy-storchaka serhiy-storchaka commented Oct 3, 2017

    :) This is due to caching. 2.0 == 2.

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Oct 3, 2017

    Oh, right. Strange.

    >>> re.compile("e", flags=2.0)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/haypo/prog/python/master/Lib/re.py", line 240, in compile
        return _compile(pattern, flags)
      File "/home/haypo/prog/python/master/Lib/re.py", line 288, in _compile
        p = sre_compile.compile(pattern, flags)
      File "/home/haypo/prog/python/master/Lib/sre_compile.py", line 747, in compile
        p = sre_parse.parse(p, flags)
      File "/home/haypo/prog/python/master/Lib/sre_parse.py", line 890, in parse
        p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
    TypeError: unsupported operand type(s) for &: 'float' and 'int'

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Oct 3, 2017

    :) This is due to caching. 2.0 == 2.

    I proposed PR 3867 to fix the re.compile() cache: check flags type.

    @serhiy-storchaka
    Copy link
    Member

    @serhiy-storchaka serhiy-storchaka commented Oct 3, 2017

    Victor, how large is performance regression of your patch?

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Oct 3, 2017

    Serhiy: "Victor, how large is performance regression of your patch?"

    I tested bm_regex_compile.py of the Python performance benchmark suite:
    https://pyperformance.readthedocs.io/benchmarks.html#regex-compile

    My patch has no impact on *uncached* re.compile() performances according to this benchmark.

    haypo@selma$ ./python -m perf compare_to ref.json patch.json
    Benchmark hidden because not significant (1): regex_compile

    haypo@selma$ ./python -m perf compare_to ref.json patch.json -v
    Mean +- std dev: [ref] 386 ms +- 6 ms -> [patch] 387 ms +- 6 ms: 1.00x slower (+0%)
    Not significant!

    --

    On a microbenchmark on the *cache* itself, the difference is visible:

    1018 ./python -m perf timeit -s 'import re; re_compile=re.compile' 're_compile("a", flags=2)' --duplicate=1024 -o ref.json --inherit=PYTHONPATH -v
    1019 git co re_compile_flags_type
    1020 make
    1021 ./python -m perf timeit -s 'import re; re_compile=re.compile' 're_compile("a", flags=2)' --duplicate=1024 -o patch.json --inherit=PYTHONPATH -v
    1022 ./python -m perf compare_to ref.json patch.json
    1023 git diff master..

    haypo@selma$ ./python -m perf compare_to ref.json patch.json
    Mean +- std dev: [ref] 364 ns +- 6 ns -> [patch] 459 ns +- 10 ns: 1.26x slower (+26%)

    If you replace type(flags) with flags.__class__, the change has no impact on performances :-) But obj.__class__ can be different than type(obj).

    @serhiy-storchaka
    Copy link
    Member

    @serhiy-storchaka serhiy-storchaka commented Oct 3, 2017

    I don't think there is a problem which is worth to be solved by PR 3867. It is very unlikely that anyone uses re functions with a pattern and float flags which accidentally matches already cached valid pattern and integer flag. If float value is passed as a flag it is likely is passed the first time. Fractional floats are rejected in any case. This is different from the problem with PR 3862.

    @rhettinger
    Copy link
    Contributor

    @rhettinger rhettinger commented Oct 4, 2017

    When proposing to undo recent decisions, please add the people to the nosy list who were involved in making that decision in the first place.

    @methane
    Copy link
    Member Author

    @methane methane commented Oct 4, 2017

    When proposing to undo recent decisions, please add the people to the nosy list who were involved in making that decision in the first place.

    I don't propose reverting IntFlag.
    I just propose convert IntFlag to int in re module, before passing it to sre_compile.
    So this change is not visible to re users.

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Oct 4, 2017

    Nice speedup!

    I ran a benchmark on PR 3862:

    1002 ./python ~/prog/python/performance/performance/benchmarks/bm_regex_compile.py --inherit=PYTHONPATH -v -o patch.json
    1003 git co master
    1004 make
    1005 ./python ~/prog/python/performance/performance/benchmarks/bm_regex_compile.py --inherit=PYTHONPATH -v -o ref.json
    1007 ./python -m perf compare_to ref.json patch.json

    Mean +- std dev: [ref] 396 ms +- 16 ms -> [patch] 347 ms +- 8 ms: 1.14x faster (-12%)

    It's the following benchmark on *uncached* regular expressions. So it really measures re.compile() performance, not the cache.
    https://pyperformance.readthedocs.io/benchmarks.html#regex-compile

    @methane
    Copy link
    Member Author

    @methane methane commented Oct 4, 2017

    Thank you for benchmarking. I've added news entry about it.

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Oct 4, 2017

    Naoki: "I've added news entry about it."

    Nice. You might also mention the "optimization" in What's New in Python 3.7 doc, in the Optimisations section. The issue here is that it's only faster than Python 3.6, but Python 3.6 was slower than 3.5 :-) The "optimization" just makes Python 3.7 as fast as Python 3.5 ... :-)

    @serhiy-storchaka
    Copy link
    Member

    @serhiy-storchaka serhiy-storchaka commented Oct 4, 2017

    Did you compare the benchmarking results against 3.7 or 3.6?

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Oct 4, 2017

    Did you compare the benchmarking results against 3.7 or 3.6?

    My lasted benchmark is on the master branch: original code ("ref") vs code patched with PR 3862 ("patch").

    @serhiy-storchaka
    Copy link
    Member

    @serhiy-storchaka serhiy-storchaka commented Oct 4, 2017

    It makes sense to report the performance gain only in comparison with the previous released version. I expect that re compiling is slower in 3.7 due to new features and optimizations. Generating more optimal code takes a time.

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Oct 4, 2017

    I don't really care if Python 3.7 is still slower than 3.6 with PR 3862. The speedup is significant, the change is short and safe, so the PR LGTM :-) We can implement further optimizations later ;-)

    By the way, speed.python.org can be used to track performances over time. Sadly, I didn't have much time to take care of the website: run manually to job to draw new plots :-)

    @ethanfurman
    Copy link
    Member

    @ethanfurman ethanfurman commented Oct 4, 2017

    IntFlag.__and__ does not create a new instance every time -- all new instances are cached in the IntFlag machinery (so RegexFlag(7) is only created once).

    If all the RegexFlag combinations are created before the regex compile benchmark do we still see a speed-up?

    @methane
    Copy link
    Member Author

    @methane methane commented Oct 4, 2017

    IntFlag.__and__ does not create a new instance every time -- all new instances are cached in the IntFlag machinery (so RegexFlag(7) is only created once).

    I'm sorry, I misunderstood.
    But while new instance is not created each time, 4 Python method calls
    (e,g. IntFlag.__and__() -> IntFlag.__new__() -> IntFlag._missing_() -> IntFlag._create_pseudo_member_()) are much slower than int & int.

    If all the RegexFlag combinations are created before the regex compile benchmark do we still see a speed-up?

    I believe that's what Victor benchmarked.

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Oct 5, 2017

    The perf module always starts with a "warmup" run to fill caches. If enum
    has a cache, it should be filled automatically.

    @ethanfurman
    Copy link
    Member

    @ethanfurman ethanfurman commented Oct 5, 2017

    INADA Naoki said:

    But while new instance is not created each time, 4 Python method
    calls (e,g. IntFlag.__and__() -> IntFlag.__new__()
    -> IntFlag._missing_() -> IntFlag._create_pseudo_member_())
    are much slower than int & int.

    Only the first two calls always happen -- the last two calls only happen for numbers that haven't been seen yet. And yes, two Python level calls are much slower than an int & int.

    @methane
    Copy link
    Member Author

    @methane methane commented Oct 5, 2017

    New changeset c1c47c1 by INADA Naoki in branch 'master':
    bpo-31671: re: Convert RegexFlag to int before compile (GH-3862)
    c1c47c1

    @methane methane closed this as completed Oct 5, 2017
    @serhiy-storchaka
    Copy link
    Member

    @serhiy-storchaka serhiy-storchaka commented Oct 5, 2017

    Thanks Naoki!

    @serhiy-storchaka serhiy-storchaka added the performance label Oct 5, 2017
    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Oct 5, 2017

    My PR 3867 fixes a corner case when the re.compile() is misused ("on purpose"?). I'm going to reject (abandon) my own PR since it makes re.compile() slower:

    Mean +- std dev: [ref] 364 ns +- 6 ns -> [patch] 459 ns +- 10 ns: 1.26x slower (+26%)

    @serhiy, @naoki: Are you ok to reject this PR and keep the corner case bug?

    @methane
    Copy link
    Member Author

    @methane methane commented Oct 5, 2017

    Instead caching type(flags), how about this?

      if not isinstance(flags, int):
          raise TypeError(f"flags must be int or RegexFlag, got {flags!r}")
      flags = int(flags)

    @serhiy-storchaka
    Copy link
    Member

    @serhiy-storchaka serhiy-storchaka commented Oct 5, 2017

    PR 3867 looks unpythonic to me. We usually don't check the type of arguments. This complicate and slow down a code.

    Do you have a realistic example when the current behavior harms?

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Oct 5, 2017

    Hum, I ran again my microbenchmark on re.compile() cache:

    haypo@selma$ ./python -m perf timeit -s 'import re; re_compile=re.compile' 're_compile("a", flags=2)' --duplicate=1024 -o ref.json --inherit=PYTHONPATH -v

    Sadly, the commit c1c47c1 made the cache slower:

    Mean +- std dev: [ref] 364 ns +- 26 ns -> [patch] 545 ns +- 19 ns: 1.50x slower (+50%)

    Just to check, I reverted the change on Scanner, the benchmark is still "560 ns +- 27 ns".

    "if isinstance(flags, RegexFlag): flags = flags.value" added 181 nanoseconds? A quick "isinstance(flags, RegexFlag)" timeit microbenchmark says that this operation has a cost of 46.2 ns (+- 1.6 ns). Benchmarks are strange, as usual :-)

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Oct 5, 2017

    Serhiy: "PR 3867 looks unpythonic to me. We usually don't check the type of arguments. This complicate and slow down a code. Do you have a realistic example when the current behavior harms?"

    No, I don't have any realistic example. I just noticed the bug and I was surprised. In practice, flags=2.0 instead of flags=2 is not totally wrong, it's okish to accept 2.0. Let's just forget this corner case. I abandon my change.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 performance stdlib
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants