Check for signals during regular expression matches #39573
This patch adds a call to PyErr_CheckSignals to…

Rationale: Regular expressions can run away inside the C code. When the signal was received, the signal function was…

I am unsure whether the PyErr_CheckSignals is…
Can you give an example of an SRE match that is so slow…
Fredrik, what do you think about this patch?
Here is an example (from http://swtch.com/~rsc/regexp/regexp1.html):

    python -c 'import re; num=25; r=re.compile("a?"*num+"a"*num);

At work I have seen a real-world case of a regular expression which ran…
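The pattern runs away because of backtracking. The sketch below is a naive backtracking matcher written only for the pattern `"a?"*n + "a"*n` against `"a"*n`. It is a hypothetical model and not SRE's actual implementation. The names `count_steps`, `match_optional` and `match_required` are made up for illustration.

Each greedy `a?` first tries to consume an `a`. It skips the `a` only when the rest of the pattern fails. The only successful path, in which every optional atom is skipped, is therefore tried last. As a result, the search visits every node of a full binary tree of depth n.

```python
def count_steps(n):
    """Match "a?"*n + "a"*n against "a"*n with a naive greedy backtracker.

    Returns (matched, steps), where steps counts how many times an
    optional "a?" atom was visited. Hypothetical model, not SRE's code.
    """
    text = "a" * n
    steps = 0

    def match_required(pos):
        # The n mandatory "a"s must all fit starting at pos.
        return text[pos:pos + n] == "a" * n

    def match_optional(i, pos):
        nonlocal steps
        steps += 1
        if i == n:
            # Every optional atom has been decided; try the mandatory tail.
            return match_required(pos)
        # Greedy: first try consuming an "a" for this optional atom...
        if pos < len(text) and text[pos] == "a" and match_optional(i + 1, pos + 1):
            return True
        # ...and only on failure backtrack and skip it.
        return match_optional(i + 1, pos)

    return match_optional(0, 0), steps


for n in (5, 10, 15):
    matched, steps = count_steps(n)
    print(n, matched, steps)
```

In this model, each additional `a?` doubles the number of visits, giving 2**(n+1) - 1 in total. With n=25 that is already tens of millions of steps, all spent inside a single match call.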
I'm attaching a working patch against 2.5.1 and a short test program:

    #! /usr/bin/env python
    import signal
    import re
    import time

    def main():
        num = 28  # needs more than 60 s on a 2.4 GHz Core 2
        r = re.compile("a?"*num + "a"*num)
        # Make SIGALRM raise KeyboardInterrupt, just as Ctrl-C would.
        signal.signal(signal.SIGALRM, signal.default_int_handler)
        signal.alarm(1)
        stime = time.time()
        try:
            r.match("a"*num)
        except KeyboardInterrupt:
            # The match must be interrupted shortly after the 1 s alarm fires.
            assert time.time() - stime < 3
        else:
            raise RuntimeError("no keyboard interrupt")

    if __name__ == '__main__':
        main()
Hm, I just noticed that calling PyErr_CheckSignals slows down regular… I'm adding another patch, which only checks every 4096th iteration for…
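The idea of checking only on every 4096th iteration can be modelled in Python as follows. This is an illustrative model, not the patch itself. The names `run_with_periodic_check`, `check_signals` and `mask` are hypothetical.

`check_signals` stands in for `PyErr_CheckSignals`. It returns True where the C function would return -1 with an exception set. Because 4096 is a power of two, `(sigcount & 0xFFF) == 0` is a cheap equivalent of `sigcount % 4096 == 0`.

```python
def run_with_periodic_check(iterations, check_signals, mask=0xFFF):
    """Run a dummy matching loop, calling check_signals() every mask+1 steps.

    Returns True if the loop finished and False if check_signals() asked
    it to abort. Hypothetical model of the patch, not CPython's code.
    """
    sigcount = 0
    for _ in range(iterations):
        sigcount += 1
        # The bitwise test runs on every iteration and is cheap. The
        # expensive check runs only when (sigcount & mask) == 0, which
        # with the default mask means every 4096th iteration.
        if (sigcount & mask) == 0 and check_signals():
            return False
        # ... one step of the real matching work would happen here ...
    return True


calls = []

def fake_check():
    calls.append(1)
    return False  # pretend no signal is pending

print(run_with_periodic_check(10_000, fake_check), len(calls))
```

The mask trades responsiveness for overhead. A larger mask means fewer calls to the check, but also a longer delay before a pending signal is noticed.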
I couldn't apply the patch cleanly, as it appears to be a diff in other… Anyway, I applied it by hand, and now I attach the correct svn diff.

The test cases run OK with this change, and the problem is solved.

Regarding the delay introduced, I tested it with:

    $ ./python timeit.py -s "import re;r=re.compile('a?a?a?a?a?aaaaa')" "r.match('aaaaa')"

Trunk: …
Patch applied: …

I don't like that. Anyway, for timing I do NOT trust the system where… Suggestions?
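The same measurement can also be taken from within Python using the standard `timeit` module. This is a minimal sketch. It assumes you run it once with the unpatched interpreter and once with the patched one, then compare the printed numbers. The helper name `best_time_per_loop` is hypothetical.

Taking the minimum of several repeats, as the `timeit` documentation suggests, reduces but does not eliminate noise from other processes on an untrustworthy timing system.

```python
import timeit

# Same pattern and subject string as the command-line benchmark.
SETUP = "import re; r = re.compile('a?a?a?a?a?aaaaa')"
STMT = "r.match('aaaaa')"


def best_time_per_loop(number=100_000, repeat=5):
    """Return the fastest per-call time in seconds over `repeat` runs."""
    timings = timeit.repeat(STMT, setup=SETUP, number=number, repeat=repeat)
    # Higher values are usually caused by interference, not by the code itself.
    return min(timings) / number


print(f"{best_time_per_loop() * 1e6:.3f} usec per loop")
```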
    ./python Lib/timeit.py -n 1000000 -s "import

Trunk: …
Patched: …

That would be OK, I guess. (This is on 64-bit Debian testing with gcc 4.2.3.)

Can you test with the following:

    if ((0 == (sigcount & 0xffffffff)) && PyErr_CheckSignals())

(i.e., the code will almost never even call PyErr_CheckSignals). I guess this is some C compiler optimization issue (it seems like mine does…
Mind if I assign this to Facundo? Facundo, if you wish to pass this on,…
I retried it on a platform where I trust the timing, and it proved OK. So: problem solved, no performance impact, and all tests pass. Committed… Thank you all!
I think this is worth backporting to 2.5.2. This and r60054 are the…
Backported to 2.5.2 as r60576. (The other deltas are not backported.)