Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Short repeated regex patterns can skip signal handling #109631

Closed
pan324 opened this issue Sep 20, 2023 · 4 comments · Fixed by #109886
Closed

Short repeated regex patterns can skip signal handling #109631

pan324 opened this issue Sep 20, 2023 · 4 comments · Fixed by #109886
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-regex type-bug An unexpected behavior, bug, or error

Comments

@pan324
Copy link
Contributor

pan324 commented Sep 20, 2023

Bug report

Bug description:

I mentioned regex but this probably applies to other modules as well. In the code below, if the # line right after the findall is commented out (as it is right now), the code raises the exception past the try-except block on the print("Safely reached EOF.") statement. The exception happens one statement later than expected. NB: I tested all Python versions on Windows but only 3.11 on Linux.

import _thread, re
from threading import Timer

s = "ab"*20000000
pattern = re.compile("ab+")
Timer(0.2, _thread.interrupt_main).start()
try:
    pattern.findall(s)
#    print() ###################################################################
except:
    print("Exception block")
print("Safely reached EOF.")

CPython versions tested on:

3.9, 3.10, 3.11, 3.12, CPython main branch

Operating systems tested on:

Linux, Windows

Linked PRs

@pan324 pan324 added the type-bug An unexpected behavior, bug, or error label Sep 20, 2023
@terryjreedy terryjreedy added topic-regex interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Sep 21, 2023
@brandtbucher
Copy link
Member

brandtbucher commented Sep 25, 2023

I don't think we guarantee exactly when signals are handled, as long as they are handled eventually (meaning, reasonably promptly in pure-Python code).

For example, on recent Python versions, the following function will only ever return 3 after being interrupted (since we only check for signals at the bottom of the loop):

def f():
    try:
        while True:
            x = 0
            x = 1
            x = 2
            x = 3
    except:
        return x

Same idea in your code: even if the signal was delivered during the findall call, it may not be handled there.

So I'm not sure that this is a bug, but it does look like something about the way this pattern runs is bypassing the signal handling in the matching engine (if I had to guess, it's probably because we're repeating a single small operation instead of dispatching between multiple operations). That's probably worth improving, perhaps by adding MAYBE_CHECK_SIGNALS to the exit label in sre_lib.h (before jumping back into the previous "context").

@serhiy-storchaka, thoughts?

@brandtbucher brandtbucher changed the title SIGINT during regex falls through try-except block unless try block contains multiple statements Short repeated regex patterns can skip signal handling Sep 25, 2023
@brandtbucher
Copy link
Member

brandtbucher commented Sep 25, 2023

Ah, I think I misunderstood the cause. We're performing many short matches, and none of them actually increment the counter enough to trigger the signal handling branch in the matching engine?

@serhiy-storchaka serhiy-storchaka self-assigned this Sep 25, 2023
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Sep 25, 2023
Counting for signal checking now continues in new match from the point where
it ended in the previous match instead of starting from 0.
@serhiy-storchaka
Copy link
Member

There is two things:

  1. Short repeated matches are not-interruptible. You can get KeyboardInterrupt only after returning from findall().
  2. Simple opcodes like POP_TOP perhaps is not interruptible too. And it is for good.
  3. print("Safely reached EOF.") immediately follows findall(), so it is the place where interruption can happen. except does not produce any bytecode. And it is for good too.

So the only issue is (1). #109867 fixes it.

Thank you for your report @pan324. It was an interesting issue.

serhiy-storchaka added a commit that referenced this issue Sep 26, 2023
)

Counting for signal checking now continues in new match from the point where
it ended in the previous match instead of starting from 0.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Sep 26, 2023
…ythonGH-109867)

Counting for signal checking now continues in new match from the point where
it ended in the previous match instead of starting from 0.
(cherry picked from commit 8ac2085)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Sep 26, 2023
…ythonGH-109867)

Counting for signal checking now continues in new match from the point where
it ended in the previous match instead of starting from 0.
(cherry picked from commit 8ac2085)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
serhiy-storchaka added a commit that referenced this issue Sep 26, 2023
…H-109867) (GH-109885)

Counting for signal checking now continues in new match from the point where
it ended in the previous match instead of starting from 0.
(cherry picked from commit 8ac2085)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@pan324
Copy link
Contributor Author

pan324 commented Sep 26, 2023

Thanks! After posting I had gotten a bit anxious whether it actually qualified as a bug. Thank you both for the good info on signals and interruptibility!

csm10495 pushed a commit to csm10495/cpython that referenced this issue Sep 28, 2023
…ythonGH-109867)

Counting for signal checking now continues in new match from the point where
it ended in the previous match instead of starting from 0.
Yhg1s pushed a commit that referenced this issue Oct 2, 2023
…H-109867) (#109886)

gh-109631: Allow interruption of short repeated regex matches (GH-109867)

Counting for signal checking now continues in new match from the point where
it ended in the previous match instead of starting from 0.
(cherry picked from commit 8ac2085)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-regex type-bug An unexpected behavior, bug, or error
Projects
None yet
4 participants