
Make semaphore_tracker track other system resources #81048

Closed
pierreglaser mannequin opened this issue May 9, 2019 · 15 comments
Labels
3.8 (EOL) end of life performance Performance or resource usage stdlib Python modules in the Lib dir

Comments


pierreglaser mannequin commented May 9, 2019

BPO 36867
Nosy @pitrou, @vstinner, @ogrisel, @pablogsal, @tirkarthi, @pierreglaser
PRs
  • bpo-36867: Make semaphore_tracker track other system resources. #13222
  • bpo-36867: Create the resource_tracker before launching SharedMemoryManagers #13276
  • bpo-36894: Fix regression in test_multiprocessing_spawn (no tests run on Windows) #13290
  • bpo-36867: Avoid weak syncronization primitive in resource_tracker tests #13292
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2019-05-17.18:22:48.167>
    created_at = <Date 2019-05-09.17:36:01.009>
    labels = ['3.8', 'library', 'performance']
    title = 'Make semaphore_tracker track other system resources'
    updated_at = <Date 2019-06-12.11:47:54.660>
    user = 'https://github.com/pierreglaser'

    bugs.python.org fields:

    activity = <Date 2019-06-12.11:47:54.660>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-05-17.18:22:48.167>
    closer = 'vstinner'
    components = ['Library (Lib)']
    creation = <Date 2019-05-09.17:36:01.009>
    creator = 'pierreglaser'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 36867
    keywords = ['patch']
    message_count = 15.0
    messages = ['341987', '342133', '342201', '342246', '342307', '342308', '342309', '342319', '342321', '342326', '342365', '342374', '342749', '342752', '345322']
    nosy_count = 6.0
    nosy_names = ['pitrou', 'vstinner', 'Olivier.Grisel', 'pablogsal', 'xtreak', 'pierreglaser']
    pr_nums = ['13222', '13276', '13290', '13292']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'resource usage'
    url = 'https://bugs.python.org/issue36867'
    versions = ['Python 3.8']


    pierreglaser mannequin commented May 9, 2019

    Hi all,

    Olivier Grisel, Thomas Moreau and myself are currently working on increasing
    the range of action of the semaphore_tracker in Python.

    multiprocessing.semaphore_tracker is a little-known module that launches a
    server process used to track the life cycle of semaphores created in a Python
    session, and to clean up those semaphores after all Python processes of the
    session have terminated. Normally, Python processes clean up the semaphores
    they create. This is, however, not guaranteed if the processes are violently
    interrupted (for example, with the bash command "killall python").

    A note on why the semaphore_tracker was introduced: Cleaning up semaphores
    after termination is important because the system only supports a limited
    number of named semaphores, and they will not be automatically removed till the
    next reboot.

    Now, Python 3.8 introduces shared memory segment creation. Shared memory is
    another sensitive global system resource. Currently, unexpected termination
    of processes that created memory segments will leak those segments. This can
    be problematic on large compute clusters that have many users and are rarely
    rebooted.

    For this reason, we expanded the semaphore_tracker to also track shared memory
    segments, and renamed it resource_tracker. Shared memory segments get
    automatically tracked by the resource tracker when they are created. This is a
    first, self-contained fix. (1)
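    To illustrate the behavior described above, here is a minimal round trip
    with the Python 3.8 multiprocessing.shared_memory API: creating a segment
    implicitly registers it with the tracker, and unlinking it unregisters it,
    so a clean shutdown leaves nothing for the tracker to reclaim.

```python
from multiprocessing import shared_memory

# Creating a segment implicitly registers it with the resource_tracker.
shm = shared_memory.SharedMemory(create=True, size=128)
shm.buf[:5] = b"hello"
print(bytes(shm.buf[:5]))  # b'hello'

# Closing and unlinking removes the segment and unregisters it, so the
# tracker has nothing left to clean up at interpreter shutdown.
shm.close()
shm.unlink()
```

    If the process were killed before the unlink, the resource_tracker would
    remove the segment instead.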

    Additionally, supporting shared memory tracking led to a more generic design
    for the resource_tracker: it can now easily be extended to track arbitrary
    resource types.
    A public API could potentially be exposed for users who want to track other
    resource types. For example, one may want to add tracking for temporary
    folders created during Python sessions. Another use case is joblib, a
    widely-used parallel-computing package that is also the backend of
    scikit-learn. Joblib relies heavily on memory mapping. A public API could
    extend the resource_tracker to track memmapped objects with very little
    code.

    Therefore, this issue serves two purposes:

    • referencing the semaphore_tracker enhancement mentioned in (1)
    • discussing a potentially public resource_tracker API.

    @pierreglaser pierreglaser mannequin added 3.8 (EOL) end of life stdlib Python modules in the Lib dir performance Performance or resource usage labels May 9, 2019

    pitrou commented May 10, 2019

    New changeset f22cc69 by Antoine Pitrou (Pierre Glaser) in branch 'master':
    bpo-36867: Make semaphore_tracker track other system resources (GH-13222)
    f22cc69


    pierreglaser mannequin commented May 11, 2019

    Shared memory segments are now tracked by the brand new resource_tracker!
    Thanks Antoine for the review.

    Does anyone have an opinion on introducing a public API for users to make the
    resource_tracker track resources of their choice?

    What we have in mind is:

    • making public the existing resource_tracker.register/unregister

    • adding a new function, resource_tracker.make_trackable(resource_type,
      cleanup_func), where:

      • resource_type is a string identifier that will be passed each time a
        resource of type resource_type needs tracking, via the call
        resource_tracker.register(resource_name, resource_type)
      • cleanup_func must be a callable taking a single string argument (the
        name of the resource to clean up). This function will be called after
        a process ends, for each resource of type resource_type that the
        process did not clean up properly

    Under the hood, make_trackable simply populates resource_tracker._CLEANUP_FUNCS
    with new items.

    Here is a simple example:
    import os
    import shutil
    from multiprocessing import resource_tracker, util

    class ClassCreatingAFolder:
        """Class where each instance creates a temporary folder.

        Each temporary folder is supposed to exist for the duration of the
        instance that created it.
        """
        def __init__(self, folder_name):
            self.folder_name = folder_name
            os.mkdir(folder_name)

            # Any instance that is garbage-collected normally removes its
            # folder and notifies the resource_tracker that the folder was
            # correctly removed.
            util.Finalize(self, ClassCreatingAFolder.cleanup,
                          args=(folder_name,))

            # If this session quits abruptly, the finalizer will not be
            # called for the instances of ClassCreatingAFolder that were
            # still alive before the shutdown. The resource_tracker then
            # comes into play and removes the folders associated with each
            # of these resources.
            resource_tracker.register(
                folder_name,  # argument to shutil.rmtree
                "ClassCreatingAFolder")

        @staticmethod
        def cleanup(folder_name):
            resource_tracker.unregister(folder_name, "ClassCreatingAFolder")
            shutil.rmtree(folder_name)

    # Tell the resource_tracker how to clean up resources created by
    # ClassCreatingAFolder instances
    resource_tracker.make_trackable("ClassCreatingAFolder", shutil.rmtree)

    Typical resources that can be made trackable include memory-mapped objects
    and temporary folders. Our use case is joblib, which has its own mmap type
    that we would like to track using the resource_tracker.

    Any thoughts?

    @tirkarthi

    Since the commit there is a warning in CI and locally while running tests.

    Travis CI :

    Before commit :

    ./python.exe -X tracemalloc -m test --fail-env-changed test_multiprocessing_forkserver
    Run tests sequentially
    0:00:00 load avg: 4.71 [1/1] test_multiprocessing_forkserver
    test_multiprocessing_forkserver passed in 2 min 21 sec

    == Tests result: SUCCESS ==

    1 test OK.

    Total duration: 2 min 21 sec
    Tests result: SUCCESS

    After commit f22cc69 :

    ./python.exe -X tracemalloc -m test --fail-env-changed test_multiprocessing_forkserver
    Run tests sequentially
    0:00:00 load avg: 4.08 [1/1] test_multiprocessing_forkserver
    /Users/karthikeyansingaravelan/stuff/python/cpython/Lib/multiprocessing/resource_tracker.py:198: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
    warnings.warn('resource_tracker: There appear to be %d '
    /Users/karthikeyansingaravelan/stuff/python/cpython/Lib/unittest/case.py:683: ResourceWarning: unclosed file <_io.BufferedReader name=7>
    testMethod()
    Object allocated at (most recent call last):
    File "/Users/karthikeyansingaravelan/stuff/python/cpython/Lib/subprocess.py", lineno 819
    self.stdout = io.open(c2pread, 'rb', bufsize)
    test_multiprocessing_forkserver passed in 2 min 20 sec

    == Tests result: SUCCESS ==

    1 test OK.

    Total duration: 2 min 20 sec
    Tests result: SUCCESS

    == Tests result: SUCCESS ==

    1 test OK.

    Total duration: 2 min 26 sec
    Tests result: SUCCESS


    pierreglaser mannequin commented May 13, 2019

    Yes, one test I wrote in an unrelated commit does not unlink a memory segment. Now the ResourceTracker complains. Fixing it now.


    pierreglaser mannequin commented May 13, 2019

    Actually, I was properly unlinking the shared_memory segments. The warning messages are due to bad interactions between the ResourceTracker and the SharedMemoryManager object. In this particular case, it's easy to change the problematic test a little to avoid the warnings. I will focus on solving those bad interactions right after.

    @vstinner

    test_shared_memory_cleaned_after_process_termination() uses time as a weak synchronization primitive:

            # killing abruptly processes holding reference to a shared memory
            # segment should not leak the given memory segment.
            p.terminate()
            p.wait()
            time.sleep(1.0)  # wait for the OS to collect the segment
    
            with self.assertRaises(FileNotFoundError):
                smm = shared_memory.SharedMemory(name, create=False)

    Would it be possible to use more reliable synchronization? Such tests usually fail randomly.
    https://pythondev.readthedocs.io/unstable_tests.html


    ogrisel mannequin commented May 13, 2019

    As Victor said, the time.sleep(1.0) might lead to Heisen failures. I am not sure how to write proper strong synchronization in this case, but we could instead go for something intermediate, such as the following pattern:

        ...
        p.terminate()
        p.wait()
        for i in range(60):
            try:
                shared_memory.SharedMemory(name, create=False)
            except FileNotFoundError:
                # the OS successfully collected the segment as expected
                break
            time.sleep(1.0)  # wait for the OS to collect the segment
        else:
            raise AssertionError(
                f"Failed to collect shared_memory segment {name}")

    What do you think?


    pierreglaser mannequin commented May 13, 2019

    We can do that, or maybe we can try to wait on the resource_tracker's pid?

    @vstinner

    I like Olivier's pattern. Maybe we can slowly increase the sleep, so that we stop quickly if the resource goes away quickly.

    deadline = time.monotonic() + 60.0
    sleep = 0.010
    while ...:
        if ....: break
        if time.monotonic() > deadline: ... assert error ...
        time.sleep(sleep)
        sleep = min(sleep * 2, 5.0)

    It's kind of a common pattern. Maybe it should be a helper in the test.support module.
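    A sketch of what such a helper might look like (hypothetical name
    wait_until; test.support had no such function at the time, and the
    timeout/delay defaults below are just illustrative):

```python
import time

def wait_until(predicate, timeout=60.0, first_delay=0.010, max_delay=5.0):
    """Poll `predicate` with exponentially increasing sleeps until it
    returns True or `timeout` seconds have elapsed.

    Returns True if the predicate succeeded, False on timeout.
    """
    deadline = time.monotonic() + timeout
    delay = first_delay
    while True:
        if predicate():
            return True
        if time.monotonic() > deadline:
            return False
        time.sleep(delay)
        delay = min(delay * 2, max_delay)  # back off, capped at max_delay
```

    The test above could then assert on wait_until(...) instead of a fixed
    time.sleep(1.0).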

    We can do that, or maybe we can try to wait on the resource_tracker's pid?

    I prefer to make sure that the resource goes away without inspecting multiprocessing internals.


    pitrou commented May 13, 2019

    New changeset 50466c6 by Antoine Pitrou (Pierre Glaser) in branch 'master':
    bpo-36867: DOC update multiprocessing.rst (GH-13289)
    50466c6


    pitrou commented May 13, 2019

    New changeset b1dfcad by Antoine Pitrou (Pierre Glaser) in branch 'master':
    bpo-36867: Create the resource_tracker before launching SharedMemoryManagers (GH-13276)
    b1dfcad

    @vstinner

    New changeset cbe72d8 by Victor Stinner (Pierre Glaser) in branch 'master':
    bpo-36867: _test_multiprocessing: avoid weak sync primitive (GH-13292)
    cbe72d8

    @vstinner

    It seems like all known bugs are fixed, so I'm closing the issue again. Thanks!

    @vstinner

    The new test is not reliable, see: bpo-37244 "test_multiprocessing_forkserver: test_resource_tracker() failed on x86 Gentoo Refleaks 3.8".

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022