Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SRE engine should release the GIL when/if possible #42625

Open
ericnoyau mannequin opened this issue Nov 25, 2005 · 10 comments
Open

SRE engine should release the GIL when/if possible #42625

ericnoyau mannequin opened this issue Nov 25, 2005 · 10 comments
Assignees
Labels
3.8 only security fixes extension-modules C modules in the Modules dir topic-regex type-feature A feature request or enhancement

Comments

@ericnoyau
Copy link
Mannequin

ericnoyau mannequin commented Nov 25, 2005

BPO 1366311
Nosy @loewis, @warsaw, @birkenfeld, @serhiy-storchaka
Files
  • regex.diff: updated patch for _sre.c and sre.h
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = None
    created_at = <Date 2005-11-25.13:57:15.000>
    labels = ['extension-modules', 'type-feature', '3.8']
    title = 'SRE engine should release the GIL when/if possible'
    updated_at = <Date 2018-05-25.18:58:39.484>
    user = 'https://bugs.python.org/ericnoyau'

    bugs.python.org fields:

    activity = <Date 2018-05-25.18:58:39.484>
    actor = 'barry'
    assignee = 'serhiy.storchaka'
    closed = False
    closed_date = None
    closer = None
    components = ['Extension Modules']
    creation = <Date 2005-11-25.13:57:15.000>
    creator = 'eric_noyau'
    dependencies = []
    files = ['6894']
    hgrepos = []
    issue_num = 1366311
    keywords = []
    message_count = 10.0
    messages = ['49122', '49123', '49124', '49125', '49126', '49127', '49128', '317692', '317700', '317701']
    nosy_count = 6.0
    nosy_names = ['loewis', 'effbot', 'barry', 'georg.brandl', 'eric_noyau', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue1366311'
    versions = ['Python 3.8']

    @ericnoyau
    Copy link
    Mannequin Author

    ericnoyau mannequin commented Nov 25, 2005

    In a multi-threaded program that does lots of regular
    expression searching, some of them on very long strings
    with complex regex we've noticed that everything stops
    when a regular expression is searching.

    One of the issue is that the re engine does not release
    the interpreter lock while it is running. All the
    other threads are therefore blocked for the entire time
    it takes to do the regular expression search.

    See the thread in python-dev about it:

    http://mail.python.org/pipermail/python-dev/2005-November/058316.html

    @ericnoyau ericnoyau mannequin closed this as completed Nov 25, 2005
    @ericnoyau ericnoyau mannequin assigned effbot Nov 25, 2005
    @ericnoyau ericnoyau mannequin closed this as completed Nov 25, 2005
    @ericnoyau ericnoyau mannequin assigned effbot Nov 25, 2005
    @ericnoyau
    Copy link
    Mannequin Author

    ericnoyau mannequin commented Nov 25, 2005

    Logged In: YES
    user_id=1388768

    I'm attaching a diff to this bug that remove this limitation
    if it sane to do so. If a search is done on a string or a
    unicode object (which by definition are immutable) the GIL
    is released and reacquired everytime a search is done.

    I've tested this patch in both a simple tests (start a
    thread with a greedy regex on a monstruous string and verify
    that the othe python threads are still active) and by
    running our internal application verifying that nothing is
    blocking anymore.

    @arigo
    Copy link
    Mannequin

    arigo mannequin commented Nov 25, 2005

    Logged In: YES
    user_id=4771

    The patch looks good, but I wonder if it is safe. The SRE_STATE structure that SRE_SEARCH_INNER uses is potentially visible to the application-level Python code, via the (undocumented) scanner objects:

    >>> r = re.compile(r"hello")
    >>> s = r.scanner("big string in which to search")
    >>> s.search()
    <_sre.SRE_Match object at 0x12345678>

    Each call to s.search() continues the previous search with the same SRE_STATE. The problem with releasing the GIL as you do is that several threads could call s.search() concurrently, which would most probably crash CPython.

    This probably means that you need to add a lock in SRE_STATE and acquire it while searching, to serialize its usage. Of course, we should then be careful about what overhead this gives to applications that use regexps on a lot of small strings...

    Another note: for consistency, match() should also release the GIL if search() does.

    @ericnoyau
    Copy link
    Mannequin Author

    ericnoyau mannequin commented Nov 28, 2005

    Logged In: YES
    user_id=1388768

    Thanks for your comments. I've updated the patch to fix your
    issues, but without introducing a per-state object lock.

    What I did instead is to mark a state as not supporting
    concurrency when a scanner object creates it. So the GIL
    will not be released for scanners objects at all.

    For consistency match also release the GIL now, if possible.

    @birkenfeld
    Copy link
    Member

    Logged In: YES
    user_id=1188172

    Fredrik, do you have time to review this?

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Nov 25, 2006

    I believe the patch is incorrect. While matching, sre may allocate memory through Python API, and it may raise exceptions through Python API. Neither is allowed when the GIL is released

    Tentatively rejecting the patch.

    Eric, if you think the patch is correct or can be corrected, please update it to the current subversion trunk.

    @ericnoyau
    Copy link
    Mannequin Author

    ericnoyau mannequin commented Nov 27, 2006

    Albeit I still think releasing the GIL during regex matching would be beneficial, I agree with Martin that the patch is not good enough for that purpose. I was not aware of the requirement to hold the GIL in order to do memory allocation.

    Anyway, since implementing this patch, we have reviewed our usage of regex and supressed the really greedy ones. As such this patch is no longer needed by us either. It would probably make our application a tiny fractional bit faster, but not the order of magnitude faster than we experienced before removing the big regexes.

    In conclusion I thank Martin for the review as I've learned something new, and instead of trying to do a more fine grained fix I'm closing this bug as the current behaviour is good enough if you avoid using stupid regexes...

    @warsaw
    Copy link
    Member

    warsaw commented May 25, 2018

    I'm reopening this bug even though it's been long closed and even though the attached patch is no longer relevant.

    We recently found a degenerative performance problem with entrypoints and threads. As the number of threads increases, it becomes increasingly horrible to extract entrypoints.

    https://github.com/takluyver/entrypoints

    The problem is that entrypoints uses glob.iglob() to find the .{dist,egg}-info directory. iglob() uses regexps on a fairly simple pattern. GIL contention can cause entrypoint discovery with 16 threads to take ~10 seconds for all threads to complete.

    We have some test cases that I'll try to attach later, but since my bugs archeology found this bug, I decided to reopen it rather than start with a new bug.

    @warsaw warsaw added 3.8 only security fixes extension-modules C modules in the Modules dir labels May 25, 2018
    @warsaw warsaw reopened this May 25, 2018
    @warsaw warsaw changed the title SRE engine do not release the GIL SRE engine should release the GIL when/if possible May 25, 2018
    @warsaw warsaw added 3.8 only security fixes extension-modules C modules in the Modules dir labels May 25, 2018
    @warsaw warsaw reopened this May 25, 2018
    @warsaw warsaw changed the title SRE engine do not release the GIL SRE engine should release the GIL when/if possible May 25, 2018
    @serhiy-storchaka
    Copy link
    Member

    It is possible to implement this feature, but it has some limitations.

    Did you try to use regex which has this feature? For short strings and simple patterns there may be no benefit.

    @serhiy-storchaka serhiy-storchaka self-assigned this May 25, 2018
    @serhiy-storchaka serhiy-storchaka added the type-feature A feature request or enhancement label May 25, 2018
    @serhiy-storchaka serhiy-storchaka self-assigned this May 25, 2018
    @serhiy-storchaka serhiy-storchaka added the type-feature A feature request or enhancement label May 25, 2018
    @warsaw
    Copy link
    Member

    warsaw commented May 25, 2018

    Did you try to use regex which has this feature? For short strings and simple patterns there may be no benefit.

    In my use case, it’s not possible, since the problematic API is glob.iglob() through multiple levels of calls.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes extension-modules C modules in the Modules dir topic-regex type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants