Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop using imp.find_module() in multiprocessing #61516

Closed
brettcannon opened this issue Feb 27, 2013 · 9 comments
Closed

Stop using imp.find_module() in multiprocessing #61516

brettcannon opened this issue Feb 27, 2013 · 9 comments
Assignees
Labels
stdlib Python modules in the Lib dir

Comments

@brettcannon
Copy link
Member

BPO 17314
Nosy @brettcannon
Files
  • remove_imp.find_module.diff: remove use of imp.find_module() from multiprocessing/forking.py
  • mp-importlib.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/brettcannon'
    closed_at = <Date 2013-06-07.15:46:09.012>
    created_at = <Date 2013-02-27.19:56:34.952>
    labels = ['library']
    title = 'Stop using imp.find_module() in multiprocessing'
    updated_at = <Date 2013-06-07.15:46:09.011>
    user = 'https://github.com/brettcannon'

    bugs.python.org fields:

    activity = <Date 2013-06-07.15:46:09.011>
    actor = 'brett.cannon'
    assignee = 'brett.cannon'
    closed = True
    closed_date = <Date 2013-06-07.15:46:09.012>
    closer = 'brett.cannon'
    components = ['Library (Lib)']
    creation = <Date 2013-02-27.19:56:34.952>
    creator = 'brett.cannon'
    dependencies = []
    files = ['29270', '29274']
    hgrepos = []
    issue_num = 17314
    keywords = ['patch']
    message_count = 9.0
    messages = ['183177', '183191', '183194', '189967', '189991', '190031', '190036', '190089', '190758']
    nosy_count = 3.0
    nosy_names = ['brett.cannon', 'python-dev', 'sbt']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue17314'
    versions = ['Python 3.4']

    @brettcannon
    Copy link
    Member Author

    I'm trying to remove all uses of imp.find_module()/load_module() and multiprocessing seems to have a single use of both purely for (re)loading a module. The attached patch moves over to importlib.find_loader() and subsequent load_module() call to match the semantics of imp.find_module()/load_module(). If a guaranteed reload is not necessary then importlib.import_module() is a single-line change.

    I ran the test suite, but there don't seem to be any explicit tests for this chunk of code (or am I missing something?).

    @brettcannon brettcannon added the stdlib Python modules in the Lib dir label Feb 27, 2013
    @sbt
    Copy link
    Mannequin

    sbt mannequin commented Feb 27, 2013

    I think this change will potentially make the main module get imported twice under different names when we transfer pickled data between processes. The current code (which is rather a mess) goes out of its way to avoid that.

    Basically the main process makes sys.modules['__mp_main__'] an alias for the main module, and other process import the parent's main module with __name__ == '__mp_main__' and make sys.modules['__main__'] an alias for that. This means that any functions/classes defined in the main module (from whatever process) will have

    obj.__module__ in {'__main__', '__mp_main__'}
    

    Unpickling such an object will succeed in any process without reimporting the main module.

    Attached is an alternative patch which is more like the original code and seems to work. (Maybe modifying loader.name is an abuse of the API.)

    @brettcannon
    Copy link
    Member Author

    It is an abuse since I didn't design that part of the API to function that way, but it's cool that it just happens to. =)

    I do see your use-case and it is legitimate, although extremely rare and narrow. Let me think about whether I want to add specific support either through your approach, Richard, or if I want to decouple the setting of module attributes so that it is more along the lines of::

      main_module = imp.new_module('__mp_main__')
      loader.set_attributes(main_module)  # BRAND-NEW; maybe private to the stdlib?
      main_module.__name__ = '__mp_main__'
      code_object = loader.get_code(main_name)
      sys.modules['__main__'] = sys.modules['__mp_main__'] = main_module  # OLD
      exec(code_object, main_module.__dict__)

    I'm currently leaning towards the latter option since it's an annoying bit to get right and it doesn't hurt anything to expose.

    @brettcannon brettcannon assigned brettcannon and unassigned sbt Feb 28, 2013
    @brettcannon
    Copy link
    Member Author

    So I think I have come up with a way to expose a new method that makes this use-case doable and in a sane manner. Richard, let me know what you think so that I know that this makes sense before I commit myself to the new method (init_module_attrs())::

    --- a/Lib/multiprocessing/forking.py	Fri May 24 13:51:21 2013 +0200
    +++ b/Lib/multiprocessing/forking.py	Fri May 24 08:06:17 2013 -0400
    @@ -449,7 +449,7 @@
             elif main_name != 'ipython':
                 # Main modules not actually called __main__.py may
                 # contain additional code that should still be executed
    -            import imp
    +            import importlib
     
                 if main_path is None:
                     dirs = None
    @@ -460,16 +460,17 @@
     
                 assert main_name not in sys.modules, main_name
                 sys.modules.pop('__mp_main__', None)
    -            file, path_name, etc = imp.find_module(main_name, dirs)
    +            # We should not try to load __main__
    +            # since that would execute 'if __name__ == "__main__"'
    +            # clauses, potentially causing a psuedo fork bomb.
    +            loader = importlib.find_loader(main_name, path=dirs)
    +            main_module = imp.new_module(main_name)
                 try:
    -                # We should not do 'imp.load_module("__main__", ...)'
    -                # since that would execute 'if __name__ == "__main__"'
    -                # clauses, potentially causing a psuedo fork bomb.
    -                main_module = imp.load_module(
    -                    '__mp_main__', file, path_name, etc
    -                    )
    -            finally:
    -                if file:
    -                    file.close()
    +                loader.init_module_attrs(main_module)
    +            except AttributeError:
    +                pass
    +            main_module.__name__ = '__mp_main__'
    +            code = loader.get_code(main_name)
    +            exec(code, main_module.__dict__)
     
                 sys.modules['__main__'] = sys.modules['__mp_main__'] = main_module

    @sbt
    Copy link
    Mannequin

    sbt mannequin commented May 25, 2013

    Looks good to me.

    (Any particular reason for ignoring AttributeError?)

    @brettcannon
    Copy link
    Member Author

    Catching the AttributeError is in case a loader is used that doesn't define init_module_attrs(). Since it will be new to Python 3.4 I can't guarantee that it will exist (in case someone doesn't subclass the proper ABC).

    @sbt
    Copy link
    Mannequin

    sbt mannequin commented May 25, 2013

    The unit tests pass with the patch already (if we don't delete the "import imp" line).

    What attributes will be set by init_module_attrs()?

    @brettcannon
    Copy link
    Member Author

    In the common case of SourceLoader it will set __loader__, __package__, __file__, and __cached__.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jun 7, 2013

    New changeset 97adaa820353 by Brett Cannon in branch 'default':
    Issue bpo-17314: Stop using imp in multiprocessing.forking and move over
    http://hg.python.org/cpython/rev/97adaa820353

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    1 participant