Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix for bpo-36402 (threading._shutdown() race condition) causes reference leak #81969

Closed
akruis mannequin opened this issue Aug 7, 2019 · 23 comments
Closed

fix for bpo-36402 (threading._shutdown() race condition) causes reference leak #81969

akruis mannequin opened this issue Aug 7, 2019 · 23 comments
Labels
3.9 only security fixes 3.10 only security fixes 3.11 only security fixes performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@akruis
Copy link
Mannequin

akruis mannequin commented Aug 7, 2019

BPO 37788
Nosy @mdickinson, @jaraco, @pitrou, @vstinner, @vadmium, @koobs, @akruis, @pablogsal, @miss-islington, @shihai1991, @haoren3696, @ViktorPegy, @victorvorobev
PRs
  • bpo-37788: fix reference leak caused by threading._shutdown_locks #15175
  • bpo-37788: Fix a reference leak if a thread is not joined #15228
  • [3.7] bpo-37788: Fix a reference leak if a thread is not joined (GH-15228) #15336
  • [3.8] bpo-37788: Fix a reference leak if a thread is not joined (GH-15228) #15337
  • Revert "bpo-37788: Fix a reference leak if a thread is not joined" #15338
  • [3.7] bpo-37788: Fix continuous memory leak occurs when multiple subthreads which are not joined are started. #25148
  • bpo-37788: Fix a reference leak if a thread is not joined. #25226
  • bpo-37788: Fix reference leak when Thread is never joined #26103
  • [3.10] bpo-37788: Fix reference leak when Thread is never joined (GH-26103) #26138
  • [3.9] bpo-37788: Fix reference leak when Thread is never joined (GH-26103) #26142
  • bpo-37788: Fix reference leak when Thread is never joined #27473
  • [3.8] bpo-37788: Fix reference leak when Thread is never joined (GH-26103) #28239
  • Files
  • threading-leak-test-case.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-05-15.09:53:03.268>
    created_at = <Date 2019-08-07.16:22:52.588>
    labels = ['3.11', 'library', '3.9', '3.10', 'performance']
    title = 'fix for bpo-36402 (threading._shutdown() race condition) causes reference leak'
    updated_at = <Date 2021-09-08.12:30:51.049>
    user = 'https://github.com/akruis'

    bugs.python.org fields:

    activity = <Date 2021-09-08.12:30:51.049>
    actor = 'victorvorobev'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-05-15.09:53:03.268>
    closer = 'pitrou'
    components = ['Library (Lib)']
    creation = <Date 2019-08-07.16:22:52.588>
    creator = 'anselm.kruis'
    dependencies = []
    files = ['48534']
    hgrepos = []
    issue_num = 37788
    keywords = ['patch', '3.7regression']
    message_count = 23.0
    messages = ['349173', '349225', '349973', '349977', '349980', '354460', '354463', '358641', '359477', '359479', '371866', '380391', '388606', '388609', '388611', '388618', '389671', '389696', '390118', '390817', '393709', '393710', '394307']
    nosy_count = 16.0
    nosy_names = ['mark.dickinson', 'jaraco', 'pitrou', 'vstinner', 'python-dev', 'martin.panter', 'koobs', 'anselm.kruis', 'pablogsal', 'miss-islington', 'shihai1991', 'Zhipeng Xie', 'krypticus', 'Arkady M', 'ViktorPegy', 'victorvorobev']
    pr_nums = ['15175', '15228', '15336', '15337', '15338', '25148', '25226', '26103', '26138', '26142', '27473', '28239']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'resource usage'
    url = 'https://bugs.python.org/issue37788'
    versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']

    @akruis
    Copy link
    Mannequin Author

    akruis mannequin commented Aug 7, 2019

    Starting with commit 468e5fe (bpo-36402: Fix threading._shutdown() race condition (GH-13948)) the following trivial test case leaks one reference and one memory block.

    class MiscTestCase(unittest.TestCase):
        def test_without_join(self):
            # Test that a thread without join does not leak references.
            # Use a debug build and run "python -m test -R: test_threading"
            threading.Thread().start()

    Attached is a patch, that adds this test case to Lib/test/test_threading.py. After you apply this patch "python -m test -R: test_threading" leaks one (additional) reference. This leak is also present in Python 3.7.4 and 3.8.

    I'm not sure, if it correct not to join a thread, but it did work flawlessly and didn't leak in previous releases.
    I didn't analyse the root cause yet.

    @akruis akruis mannequin added 3.7 (EOL) end of life 3.8 (EOL) end of life 3.9 only security fixes stdlib Python modules in the Lib dir performance Performance or resource usage labels Aug 7, 2019
    @akruis
    Copy link
    Mannequin Author

    akruis mannequin commented Aug 8, 2019

    The root cause for the reference leak is the global set threading._shutdown_locks. It contains Thread._tstate_lock locks of non-daemon threads. If a non-daemon thread terminates and no other thread joins the terminated thread, the _tstate_lock remains in threading._shutdown_locks forever.

    I could imagine that a long running server could accumulate many locks in threading._shutdown_locks over time. Therefore the leak should be fixed.

    There are probably several ways to deal with this issue. A straight forward approach is to discard the lock from within tstate->on_delete hook, that is function "void release_sentinel(void *)" in _threadmodule.c. Pull request (GH-15175) implements this idea. Eventually I should add another C-Python specific test-case to the PR.

    @vstinner
    Copy link
    Member

    New changeset d3dcc92 by Victor Stinner in branch 'master':
    bpo-37788: Fix a reference leak if a thread is not joined (GH-15228)
    d3dcc92

    @vstinner
    Copy link
    Member

    d3dcc92

    This change introduced a regression :-(
    #15228 (comment)

    I created PR 15338 to revert it

    @vstinner
    Copy link
    Member

    New changeset d11c2c6 by Victor Stinner in branch 'master':
    Revert "bpo-37788: Fix a reference leak if a thread is not joined (GH-15228)" (GH-15338)
    d11c2c6

    @vstinner
    Copy link
    Member

    Copy of wmanley's comment:
    #13948 (comment)
    """
    This caused a regression for people overriding Thread.join to implement custom thread interruption. See https://stackoverflow.com/questions/50486083/ending-non-daemon-threads-when-shutting-down-an-interactive-python-session
    """

    @pablogsal
    Copy link
    Member

    Hummm....is this really a regression? The docs say:

    No other methods (except for the constructor) should be overridden in a subclass. In other words, only override the __init__() and run() methods of this class.

    So if someone is overriding join() they are out of contract

    @krypticus
    Copy link
    Mannequin

    krypticus mannequin commented Dec 18, 2019

    I ran into this bug as well, and opened an issue for it (before I saw this issue): https://bugs.python.org/issue39074

    Was there a conclusion on the best way to fix this? It seems like the previous __del__ implementation would correct the resource leakage by removing the _tstate_lock from _shutdown_locks.

    @vstinner
    Copy link
    Member

    vstinner commented Jan 6, 2020

    I marked bpo-39074 as a duplicate of this issue.

    @vstinner
    Copy link
    Member

    vstinner commented Jan 6, 2020

    I marked bpo-39201 as a duplicate of this issue.

    @ArkadyM
    Copy link
    Mannequin

    ArkadyM mannequin commented Jun 19, 2020

    I have reproduced a similar memory leak in the multiprocessing

    This is the trace:

    913 memory blocks: 33232 Bytes: File "/usr/lib/python3.7/threading.py", line 890; self._bootstrap_inner(); File "/usr/lib/python3.7/threading.py", line 914; self._set_tstate_lock(); File "/usr/lib/python3.7/threading.py", line 904; self._tstate_lock = _set_sentinel();

    @jaraco
    Copy link
    Member

    jaraco commented Nov 5, 2020

    I marked bpo-42263 as a duplicate of this issue.

    This issue is implicated in preventing the desired fix for bpo-37193, where a thread wishes to remove the handle to itself after performing its duty. By removing its own handle, it can never be joined, and thus obviates the most straightforward way to directly remove the handle for a thread within that thread.

    @jaraco jaraco added the 3.10 only security fixes label Nov 5, 2020
    @vstinner
    Copy link
    Member

    bpo-37193 has been fixed.

    @mdickinson
    Copy link
    Member

    @victor: How does the fix for bpo-37193 solve this issue? I think I'm missing something.

    @mdickinson
    Copy link
    Member

    Re-opening; I believe this issue is still valid.

    Here's output on a debug build on current master (commit 7591d94) on my machine:

    lovelace:cpython mdickinson$ ./python.exe
    Python 3.10.0a6+ (remotes/upstream/master:7591d9455e, Mar 13 2021, 12:04:31) [Clang 11.0.0 (clang-1100.0.33.17)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys, threading, time
    >>> for _ in range(10):
    ...     threading.Thread().start()
    ...     time.sleep(0.1)
    ...     print(sys.gettotalrefcount())
    ... 
    109901
    109905
    109907
    109909
    109911
    109913
    109915
    109917
    109919
    109921

    @mdickinson mdickinson reopened this Mar 13, 2021
    @vstinner
    Copy link
    Member

    Oh, I'm confused. This issue is about the threading module, whereas bpo-37193 is unrelated: it's about the socketserver module.

    @haoren3696
    Copy link
    Mannequin

    haoren3696 mannequin commented Mar 29, 2021

    ping

    I also encountered this problem. Is there a fix now?

    @jaraco
    Copy link
    Member

    jaraco commented Mar 29, 2021

    No. The issue remains open.

    @jaraco jaraco removed the 3.7 (EOL) end of life label Mar 29, 2021
    @vadmium
    Copy link
    Member

    vadmium commented Apr 3, 2021

    Anselm's pull request PR 15175 looked hopeful to me. I've been using those changes in our builds at work for a while.

    @shihai1991
    Copy link
    Member

    I created PR 25226. It's a another way to solve this problem.
    Compared to PR 15175, _set_sentinel() don't need to receive any params :)

    @pitrou pitrou added the 3.11 only security fixes label May 13, 2021
    @pitrou pitrou removed the 3.8 (EOL) end of life label May 15, 2021
    @miss-islington
    Copy link
    Contributor

    New changeset 71dca6e by Miss Islington (bot) in branch '3.10':
    [3.10] bpo-37788: Fix reference leak when Thread is never joined (GH-26103) (GH-26138)
    71dca6e

    @miss-islington
    Copy link
    Contributor

    New changeset b30b25b by Antoine Pitrou in branch '3.9':
    [3.9] bpo-37788: Fix reference leak when Thread is never joined (GH-26103) (GH-26142)
    b30b25b

    @pitrou pitrou closed this as completed May 15, 2021
    @vstinner
    Copy link
    Member

    Fix in the main branch:

    commit c10c2ec
    Author: Antoine Pitrou <antoine@python.org>
    Date: Fri May 14 21:37:20 2021 +0200

    bpo-37788: Fix reference leak when Thread is never joined (GH-26103)
    
    
    
    When a Thread is not joined after it has stopped, its lock may remain in the _shutdown_locks set until interpreter shutdown.  If many threads are created this way, the _shutdown_locks set could therefore grow endlessly.  To avoid such a situation, purge expired locks each time a new one is added or removed.
    

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes 3.10 only security fixes 3.11 only security fixes performance Performance or resource usage stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    8 participants