Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check for signals during regular expression matches #39573

Closed
joshhoyt mannequin opened this issue Nov 21, 2003 · 12 comments
Closed

Check for signals during regular expression matches #39573

joshhoyt mannequin opened this issue Nov 21, 2003 · 12 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@joshhoyt
Copy link
Mannequin

joshhoyt mannequin commented Nov 21, 2003

BPO 846388
Nosy @gvanrossum, @loewis, @facundobatista
Files
  • _sre.c.patch: Allow signal handlers to run during _sre searches
  • patch
  • patch.speedup
  • sre_exception.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/facundobatista'
    closed_at = <Date 2008-01-09.14:22:46.703>
    created_at = <Date 2003-11-21.06:29:21.000>
    labels = ['interpreter-core']
    title = 'Check for signals during regular expression matches'
    updated_at = <Date 2008-02-05.04:13:26.208>
    user = 'https://bugs.python.org/joshhoyt'

    bugs.python.org fields:

    activity = <Date 2008-02-05.04:13:26.208>
    actor = 'gvanrossum'
    assignee = 'facundobatista'
    closed = True
    closed_date = <Date 2008-01-09.14:22:46.703>
    closer = 'facundobatista'
    components = ['Interpreter Core']
    creation = <Date 2003-11-21.06:29:21.000>
    creator = 'joshhoyt'
    dependencies = []
    files = ['5695', '8679', '8680', '9064']
    hgrepos = []
    issue_num = 846388
    keywords = ['patch']
    message_count = 12.0
    messages = ['44910', '44911', '44912', '57056', '57067', '57068', '59246', '59247', '59267', '59600', '61925', '62062']
    nosy_count = 6.0
    nosy_names = ['gvanrossum', 'loewis', 'effbot', 'facundobatista', 'joshhoyt', 'schmir']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue846388'
    versions = ['Python 2.4']

    @joshhoyt
    Copy link
    Mannequin Author

    joshhoyt mannequin commented Nov 21, 2003

    This patch adds a call to PyErr_CheckSignals to
    SRE_MATCH so that signal handlers can be invoked during
    long regular expression matches. It also adds a new
    error return value indicating that an exception
    occurred in a signal handler during the match, allowing
    exceptions in the signal handler to propagate up to the
    main loop.

    Rationale:

    Regular expressions can run away inside of the C code.
    There is no way for Python code to stop the C code from
    running away, so we attempted to use setrlimit to
    interrupt the process when the CPU usage exceeded a limit.

    When the signal was received, the signal function was
    triggered. The sre code does not allow the main loop to
    run, so the triggered handlers were not called until
    the regular expression finished, if ever. This
    behaviour makes the interruption by the signal useless
    for the purposes of constraining the running time of
    regular expression matches.

    I am unsure whether the PyErr_CheckSignals is
    lightweight enough to be called inside of the for loop
    in SRE_MATCH, so I took the conservative approach and
    only checked on recursion or match invocation. I
    believe that the performance hit from this check would
    not be prohibitive inside of the loop, since
    PyErr_CheckSignals does very little work unless there
    is a signal to handle.

    @joshhoyt joshhoyt mannequin assigned effbot Nov 21, 2003
    @joshhoyt joshhoyt mannequin added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Nov 21, 2003
    @joshhoyt joshhoyt mannequin assigned effbot Nov 21, 2003
    @joshhoyt joshhoyt mannequin added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Nov 21, 2003
    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Nov 22, 2003

    Logged In: YES
    user_id=21627

    Can you give an example for a SRE matching that is so slow
    that the user may press Ctrl-C?

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Nov 13, 2006

    Logged In: YES
    user_id=21627

    Fredrik, what do you think about this patch?

    @schmir
    Copy link
    Mannequin

    schmir mannequin commented Nov 2, 2007

    here is an example (from http://swtch.com/~rsc/regexp/regexp1.html)

    python -c 'import re; num=25; r=re.compile("a?"*num+"a"*num);
    r.match("a"*num)'

    At work I have seen a real world case of a regular expression which ran
    for minutes rendering the application unresponsive. We would have been
    glad to be able to interrupt the regular expression engine and getting a
    traceback.

    @schmir
    Copy link
    Mannequin

    schmir mannequin commented Nov 2, 2007

    I'm attaching a working patch against 2.5.1 and a short test program.

    #! /usr/bin/env python

    import signal
    import re
    import time
    
    
    def main():
        num=28 # need more than 60s on a 2.4Ghz core 2
        r=re.compile("a?"*num+"a"*num)
    
        signal.signal(signal.SIGALRM, signal.default_int_handler)
        signal.alarm(1)
        stime = time.time()
        try:
            r.match("a"*num)
        except KeyboardInterrupt:
            assert time.time()-stime<3
        else:
            raise RuntimeError("no keyboard interrupt")
    
    if __name__=='__main__':
        main()

    @schmir
    Copy link
    Mannequin

    schmir mannequin commented Nov 2, 2007

    hm. just noticed that calling PyErr_CheckSignals slows down regular
    expression matching considerably (50%).

    I'm adding another patch, which only checks every 4096th iteration for
    signals.

    @tiran tiran added type-feature A feature request or enhancement labels Jan 4, 2008
    @facundobatista
    Copy link
    Member

    Couldn't apply cleanly the patch, as it appears to be a diff in other
    format.

    Anyway, applied it by hand, and now I attach the correct svn diff.

    The test cases run ok with this change, and the problem is solved.

    Regarding the delay introduced, I tested it with:

      $ ./python timeit.py -s "import re;r=re.compile('a?a?a?a?a?aaaaa')"
    "r.match('aaaaa')"

    Trunk:
    100000 loops, best of 3: 5.4 usec per loop
    100000 loops, best of 3: 5.32 usec per loop
    100000 loops, best of 3: 5.41 usec per loop

    Patch applied:
    100000 loops, best of 3: 7.28 usec per loop
    100000 loops, best of 3: 6.79 usec per loop
    100000 loops, best of 3: 7.00 usec per loop

    I don't like that. Anyway, I do NOT trust for timing the system where
    I'm making the timing, so you may get different results.

    Suggestions?

    @facundobatista facundobatista removed type-feature A feature request or enhancement labels Jan 4, 2008
    @schmir
    Copy link
    Mannequin

    schmir mannequin commented Jan 4, 2008

    ./python Lib/timeit.py -n 1000000 -s "import
    re;r=re.compile('a?a?a?a?a?aaaaa')" "r.match('aaaaa')" gives me for

    Trunk:
    1000000 loops, best of 3: 3.02 usec per loop
    1000000 loops, best of 3: 2.99 usec per loop
    1000000 loops, best of 3: 3.01 usec per loop

    Patched:
    1000000 loops, best of 3: 3.04 usec per loop
    1000000 loops, best of 3: 3.04 usec per loop
    1000000 loops, best of 3: 3.14 usec per loop

    which would be ok, I guess.

    (This is on a 64bit debian testing with gcc 4.2.3).

    Can you test with the following:

    if ((0 == (sigcount & 0xffffffff)) && PyErr_CheckSignals())

    (i.e. the code will (nearly) not even call PyErr_CheckSignals).

    I guess this is some c compiler optimization issue (seems like mine does
    a better job at optimizing :)

    @gvanrossum
    Copy link
    Member

    Mind if I assign this to Facundo? Facundo, if you wish to pass this on,
    just unassign it.

    @gvanrossum gvanrossum assigned facundobatista and unassigned effbot Jan 4, 2008
    @facundobatista
    Copy link
    Member

    Retried it in a platform where I trust timing, and it proved ok.

    So, problem solved, no performance impact, all tests pass ok. Commited
    in r59862.

    Thank you all!

    @gvanrossum
    Copy link
    Member

    I think this is worth backporting to 2.5.2. This and r60054 are the
    *only* differences between _sre.c in 2.5.2 and 2.6.

    @gvanrossum
    Copy link
    Member

    Backported to 2.5.2 as r60576. (The other deltas are not backported.)

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs)
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants