Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a keyword-only spec argument to types.ModuleType #64582

Closed
brettcannon opened this issue Jan 24, 2014 · 28 comments
Closed

Add a keyword-only spec argument to types.ModuleType #64582

brettcannon opened this issue Jan 24, 2014 · 28 comments
Assignees
Labels
stdlib Python modules in the Lib dir

Comments

@brettcannon
Copy link
Member

BPO 20383
Nosy @brettcannon, @rhettinger, @ncoghlan, @mitsuhiko, @ericsnowcurrently
Files
  • module_from_spec.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/brettcannon'
    closed_at = <Date 2014-05-30.18:57:20.024>
    created_at = <Date 2014-01-24.17:11:47.874>
    labels = ['library']
    title = 'Add a keyword-only spec argument to types.ModuleType'
    updated_at = <Date 2014-05-31.01:24:37.704>
    user = 'https://github.com/brettcannon'

    bugs.python.org fields:

    activity = <Date 2014-05-31.01:24:37.704>
    actor = 'ncoghlan'
    assignee = 'brett.cannon'
    closed = True
    closed_date = <Date 2014-05-30.18:57:20.024>
    closer = 'brett.cannon'
    components = ['Library (Lib)']
    creation = <Date 2014-01-24.17:11:47.874>
    creator = 'brett.cannon'
    dependencies = []
    files = ['35286']
    hgrepos = []
    issue_num = 20383
    keywords = ['patch']
    message_count = 28.0
    messages = ['209100', '215554', '215629', '215648', '215649', '215653', '215662', '217925', '218759', '218981', '219043', '219046', '219135', '219156', '219168', '219170', '219173', '219224', '219229', '219388', '219389', '219397', '219399', '219402', '219411', '219412', '219418', '219434']
    nosy_count = 7.0
    nosy_names = ['brett.cannon', 'rhettinger', 'ncoghlan', 'aronacher', 'Arfrever', 'python-dev', 'eric.snow']
    pr_nums = []
    priority = 'low'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue20383'
    versions = ['Python 3.5']

    @brettcannon
    Copy link
    Member Author

    Would allow for the name attribute to be optional since it can be grabbed from the spec. Since having module.__spec__ set is now expected we should help ensure that by supporting it in the constructor.

    @brettcannon brettcannon added the stdlib Python modules in the Lib dir label Jan 24, 2014
    @brettcannon
    Copy link
    Member Author

    I envision making this happen would also allow for importlib to be updated to rely on __spec__ when possible, with the idea that sometime in the future we can deprecate pulling attributes from a module directly and shift to always working from __spec__.

    @brettcannon brettcannon self-assigned this Apr 4, 2014
    @ericsnowcurrently
    Copy link
    Member

    Sounds good to me.

    @rhettinger
    Copy link
    Contributor

    Would allow for the name attribute to be optional
    ...
    with the idea that sometime in the future we can
    deprecate pulling attributes from a module directly

    Is there any chance that this would ever happen?
    The __name__ attribute has been around almost forever
    is widely relied on. Deprecating it would be a very
    disruptive event for nearly zero benefit.

    @ericsnowcurrently
    Copy link
    Member

    I made roughly the same point in the current import-sig thread that relates here:

    https://mail.python.org/pipermail/import-sig/2014-April/000805.html

    Basically, I agree we should be careful with both __name__ and __file__.

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Apr 6, 2014

    No, the attribute level arguments won't go away - __name__ deliberately differs from __spec__.name in some cases (notably in __main__), __path__ may be manipulated after the module is loaded, and __name and __file__ are both used too heavily within module code for it to be worth the hassle of deprecating them in favour of something else.

    I think Brett's push to simplify things as much as possible is good though - that's the main brake on creeping API complexity in the overall import system as we try to make the internals easier to comprehend and manipulate.

    @brettcannon
    Copy link
    Member Author

    I can dream about getting rid of the attributes, but I doubt it would happen any time soon, if at all. But we do need to make it easier to set __spec__ on a new module than it currently is to help promote its use.

    @brettcannon
    Copy link
    Member Author

    My current thinking on this it to introduce in importlib.util:

      def module_from_spec(spec, module=None):
        """Create/initialize a module based on the provided spec.
    If a module is provided then spec.loader.create_module()
    will not be consulted.
    """
    

    This serves two purposes. One is that it abstracts the loader.create_module() dance out so that's no longer a worry. But more crucially it also means that if you have the function create the module for you then it will be returned with all of its attributes set without having to worry about forgetting that step. The module argument is just for convenience in those instances where you truly only want to override the module creation dance for some reason and really just want the attribute setting bit.

    @brettcannon
    Copy link
    Member Author

    Here is an implementation of importlib.util.module_from_spec(). With this it makes PEP-451 conceptually work like:

      spec = importlib.util.find_spec(name)
      module = importlib.util.module_from_spec(spec)
      spec.loader.exec_module(module)

    About the only other thing I can think of that people might still want is something like importlib.util.load(spec) so that they don't even need to care about whether load_module() or exec_module() is defined, but that can be a separate issue if it turns out people actually want something like that.

    @ericsnowcurrently
    Copy link
    Member

    @brett: Did those last two messages (and the patch) get on the wrong issue. The issue of module_from_spec() or the like seems somewhat orthogonal to a spec argument to ModuleType. Perhaps you meant issue bpo-21436 or bpo-21235?

    Otherwise, I have a few comments. :)

    @brettcannon
    Copy link
    Member Author

    Nope, I commented where I meant to. I wanted a way to promote people to **always** create modules with properly initialized attributes while also dealing with the module creation dance at the same time. Otherwise it will require expanding the API of types.ModuleType() to accommodate specs while also needing to expose a function to do the proper thing to get a module for a loader and **still** have a function to set up a module properly based on what was potentially returned from create_module(). Rolling it all into a single function that just gets you a module ready for use seems like the most practical solution from an API perspective.

    @ncoghlan
    Copy link
    Contributor

    I'd ask "Why not a class method?", but I already know the answer (types.ModuleType is implemented in C, so it would be unnecessarily painful to implement it that way).

    Given that, the utility function approach sounds good to me.

    @ericsnowcurrently
    Copy link
    Member

    Okay, I didn't read closely enough. :) It may be worth updating the title.

    FWIW, the name "module_from_spec" confused me at first because my brain interpreted that as "load_from_spec". Keeping the name and purpose more focused might be helpful. I have comments below to that effect.

    If your proposed change still makes sense, could we keep it simpler for now? Something like this:

    # in importlib.util
    def module_from_spec(spec, module=None):
        """..."""
        methods = _bootstrap._SpecMethods(spec)
        if module is None:
            return methods.create()
        else:
            methods.init_module_attrs(methods)
            return module

    Keeping the 2 methods on _SpecMethods is helpful for the PEP-406 (ImportEngine) superseder that I still want to get back to for 3.5. :)

    ----------------------------------

    This serves two purposes. One is that it abstracts the
    loader.create_module() dance out so that's no longer a worry.

    I'm not sure what dance you mean here and what the worry is.

    But more crucially it also means that if you have the function
    create the module for you then it will be returned with all of
    its attributes set without having to worry about forgetting that
    step.

    So we would discourage calling ModuleType directly and encourage the use of a function in importlib.util that is equivalent to _SpecMethods.create(). That sounds good to me. The use case for creating module objects directly is a pretty advanced one, but having a public API for that would still be good.

    From this point of view I'd expect it to just look like this:

    def new_module(spec):
        return _SpecMethods(spec).create()

    or given what I expect is the common use case currently:

    def new_module(name, loader):
        spec = spec_from_loader(name, loader)
        return _SpecMethods(spec).create()

    or together:

    def new_module(spec_or_name, /, loader=None):
        if isinstance(spec_or_name, str):
            name = spec_or_name
            if loader is None:
                raise TypeError('missing loader')
            spec = spec_from_loader(name, loader)
        else:
            if loader is not None:
                raise TypeError('got unexpected keyword argument "loader"')
            spec = spec_or_name
        return _SpecMethods(spec).create()

    To kill 2 birds with 1 stone, you could even make that the new signature of ModuleType(), which would just do the equivalent under the hood. That way people keep using the same API that they already are (no need to communicate a new one to them), but they still get the appropriate attributes set properly.

    The module argument is just for convenience in those
    instances where you truly only want to override the module
    creation dance for some reason and really just want the
    attribute setting bit.

    Perhaps it would be better to have a separate function for that (equivalent to just _SpecMethods.init_module_attrs()). However, isn't that an even more uncommon use case outside of the import system itself?

    About the only other thing I can think of that people might
    still want is something like importlib.util.load(spec)

    I agree it's somewhat orthogonal. I'll address comments to issue bpo-21235.

    @brettcannon
    Copy link
    Member Author

    First, about breaking up _SpecMethods: that was entirely on purpose. =) I honestly have found _SpecMethods a bit of a pain to work with because at every place where I have a spec object and I need to operate on it I end up having to wrap it and then call a method on it instead of simply calling a function (it also doesn't help that spec methods is structured as a collection of methods which can just operate as functions since they almost all have spec = self.spec at the beginning). I get you're hoping to make PEP-406 happen, but it actually needs to happen first. =) And as I said, the methods are essentially functions anyway so undoing any unraveling I may do won't be difficult to revert if PEP-406 actually comes about. IOW _SpecMethods feels like OOP just 'cause and not for any design reasons; it's been making my eye twitch when I look at that class.

    Second, the dance that an advanced user has to worry about is "does create_module exist, and if so did it not return None and if any of that is not true then return what types.ModuleType would give you" is not exactly one line of code ATM. There is no reason to not abstract that out to do the right thing in the face of a loader.

    Third, yes this would be to encourage people not to directly call types.ModuleType but call the utility function instead.

    Fourth, I purposefully didn't bifurcate the code of types.ModuleType based on the type of what the first argument was. At best you could change it to take an optional spec as a second argument and use that, but if you did that and the name doesn't match the spec then what? I'm not going to promote passing in a loader because spec_from_loader() cannot necessarily glean all of the same information a hand-crafted spec could if the loader doesn't have every possible introspection method defined (I'm calling "explicit is better than implicit" to back that up). I also don't think any type should depend on importlib having been previously loaded to properly function if it isn't strictly required, so the code would have to be written entirely in C which I just don't want to write. =) Since it's going beyond simply constructing a new module but also initializing it I think it's fine to keeping it in importlib.util which also makes it all more discoverable for people than having to realize that types.ModuleType is the place to go to create a module and get its attributes initialized.

    Fifth, fair enough on not needing the module argument. I will refactor the code for internal use of attribute initialization in testing and leave it at that.

    @ericsnowcurrently
    Copy link
    Member

    tl;dr I'm okay with pulling the functions out of _SpecMethods (and
    dropping the class) but I'd prefer the distinct functions continue to
    exist as they are. Also, I still think updating the ModuleType
    signature is the way to go (given current use cases). :)

    First, about breaking up _SpecMethods: that was entirely on purpose. =) ... IOW _SpecMethods feels like OOP just 'cause and not for any design reasons; it's been making my eye twitch when I look at that class.

    Keep in mind that _SpecMethods is basically a proxy for ModuleSpec
    that has the methods that ModuleSpec had originally. That was the
    simplest way to not expose the methods on every module.__spec__. If
    we didn't care about exposing more API on __spec__, we'd have it all
    on ModuleSpec. I still wonder if we could have used subclassing
    instead of proxying (obviating the pain points we've run into) and
    used a SimpleModuleSpec for __spec__ and ModuleSpec for all other APIs
    (or ModuleSpec/FullModuleSpec):

    SimpleModuleSpec # What currently is ModuleSpec.
    ModuleSpec(SimpleModuleSpec) # Basically this is _SpecMethods.

    Regardless, _SpecMethods made sense at the time and I still find it
    helpful to have the functions grouped into a distinct namespace. That
    said, I agree that it's a pain. :) I'm okay with ditching the class,
    but would rather we kept the distinct functions that exist now.

    Second, the dance that an advanced user has to worry about is "does create_module exist, and if so did it not return None and if any of that is not true then return what types.ModuleType would give you" is not exactly one line of code ATM. There is no reason to not abstract that out to do the right thing in the face of a loader.

    That's what _SpecMethods.create() already does for you.

    Third, yes this would be to encourage people not to directly call types.ModuleType but call the utility function instead.

    I'm totally on board. :)

    Fourth, I purposefully didn't bifurcate the code of types.ModuleType based on the type of what the first argument was. At best you could change it to take an optional spec as a second argument and use that, but if you did that and the name doesn't match the spec then what?

    I see your point. I just see the trade-offs a little differently. :)
    This is all about finding the best way of eliminating the problem of
    people not setting the module attributes properly (and making it
    easier to do so). From my perspective:

    • Currently every module must have a spec (and consequently a loader).
    • Updating the ModuleType signature will encourage setting the spec,
      etc. better than a separate factory would.

    If every module must have a spec, then I'd expect that to be part of
    the ModuleType signature. I can see the use case (though uncommon)
    for directly creating a module object. Is there a use case for
    creating a module without a spec? Either way, if it's important
    enough to ensure that people set the module attrs properly, then we
    should consider having direct instantiation of ModuleType() issue a
    warning when created without setting the spec, etc.

    Regarding the second point, with a separate factory function, people
    will have to learn about the function and then switch to using it
    before they get the benefit.

    Backward compatibility for an updated signature shouldn't be too hard:

    currently: ModuleType(name, doc=None)
    updated: ModuleType(name_or_spec, doc=None, *, loader=None)
    actual: ModuleType(name_or_spec=None, /, name=None, doc=None, *, loader=None)

    I'm not going to promote passing in a loader because spec_from_loader() cannot necessarily glean all of the same information a hand-crafted spec could if the loader doesn't have every possible introspection method defined (I'm calling "explicit is better than implicit" to back that up).

    Regardless of new signature or new factory, I still think the
    signature should allow for passing in a spec or passing in a
    name/loader pair. Passing name/loader instead of spec is a
    convenience for what I anticipate is the most common use case now:
    given a loader, create a new module. For comparison:

      mod = new_module(name, loader=loader)

    vs.

      spec = spec_from_loader(name, loader=loader)
      mod = new_module(spec)

    I'll argue that we can accommodate the common case and if that doesn't
    work, they can still create their own spec. If this were new
    functionality then I'd agree with you.

    I also don't think any type should depend on importlib having been previously loaded to properly function if it isn't strictly required, so the code would have to be written entirely in C which I just don't want to write. =)

    If it's in _bootstrap then it's already available. No C required. :)

    Since it's going beyond simply constructing a new module but also initializing it I think it's fine to keeping it in importlib.util which also makes it all more discoverable for people than having to realize that types.ModuleType is the place to go to create a module and get its attributes initialized.

    It seems to me that people haven't been thinking about initializing a
    module's attributes. This is popping up now because people are
    starting to take their advanced importing use cases to Python 3 (which
    I take as a very good sign!). So ModuleType seems like the right
    place for me. :)

    @brettcannon
    Copy link
    Member Author

    I think we view the fundamentals of built-in types differently as well. =) A module instance can exist without a spec no problem. As long as you don't pass that module instance through some chunk of code that expects __spec__ -- or any other attribute for that matter -- then the instance is fine and operates as expected. There is nothing fundamental to module objects which dictates any real attribute is set (not even __name__). If I wanted to construct a module from scratch and just wanted a blank module object which I assign attributes to manually then I can do that, and that's what types.ModuleType provides (sans its module name requirement in its constructor which really isn't necessary either).

    This also means that tying the functionality of the module type to importlib is the wrong direction to have the dependency. That means I refuse to add code to the module type which requires that importlib have been imported and set up. Just think of someone who embeds Python and has no use for imports; why should the module type then be broken because of that choice? That means anything done to the module type needs to be done entirely in C.

    But luckily for you bytes broke the glass ceiling of pivoting on a type's constructor based on its input type (it is the only one, though). So I'll let go on arguing that one. =)

    Anyway, I'll think about changing types.ModuleType, but having to implement init_module_attrs() in pure C and then exposing it in _imp just doesn't sound like much fun.

    And as for your preference that "the distinct functions continue to exist as they are", are you saying you want the code duplicated or that you just don't like me merging the create() and init_module_attrs() functions?

    @ericsnowcurrently
    Copy link
    Member

    I give. :) You've made good points about builtins and C implementations. Also, thinking about issue bpo-21235 has changed my perspective a bit.

    As to _SpecMethods, I mean just drop the class and turn the methods into functions:

    • name: <name> -> _spec_<name> (or whatever)
    • signature: self -> spec
    • body: self.spec -> spec
    • body: self.<method> -> method (there is interplay between methods)

    And then importlib.util.new_module:

    def new_module(spec):
        return _bootstrap._spec_create(spec)

    @brettcannon
    Copy link
    Member Author

    Why do you want a one-liner wrapper for the functions for the public API when they are exactly the same?

    @ericsnowcurrently
    Copy link
    Member

    You're right that it doesn't have to be a one-line wrapper or anything more than an import-from in importlib.util. :)

    @brettcannon
    Copy link
    Member Author

    Giving Eric is polymorphic first argument to types.ModuleType() is going to be tricky thanks to the fact that it is not hard-coded anywhere in the C code nor documented that the first argument must be a string (the only way it might come up is from PyModule_GetName() and even then that's not required to work as you would expect). And since we purposefully kept specs type-agnostic, you can't do a type check for a SimpleNamespace.

    I think the only way for it to work is by attribute check (e.g. is 'name' or 'loader' defined on the object, regardless of value).

    @brettcannon
    Copy link
    Member Author

    Another issue with the polymorphic argument is that the module type is one of those rare things written in C with keyword parameter support, so renaming the 'name' argument to 'name_or_spec' could potentially break code.

    @ericsnowcurrently
    Copy link
    Member

    Yeah, it just looks too complicated to take the ModuleType signature approach, as much as I prefer it. :) I appreciate you taking a look though.

    @brettcannon
    Copy link
    Member Author

    And another complication is the compatibility hack to set loader when submodule_search_locations is set but loader is not since _NamespaceLoader is not exposed in C without importlib which gets us back into the whole question of whether types should function entirely in isolation from other subsystems of Python.

    @ericsnowcurrently
    Copy link
    Member

    But that part is less of a concern since we don't need namespace packages before or during bootstrapping, and afterward we have access to interp->importlib. Or am I missing something?

    which gets us back into the whole question of whether types should
    function entirely in isolation from other subsystems of Python.

    That is a great question, regardless. In part I imagine it depends on the subsystem and how intrinsic it is to the interpreter, which seems like a relatively implementation-agnostic point from a high-level view.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 30, 2014

    New changeset b26d021081d2 by Brett Cannon in branch 'default':
    Issue bpo-20383: Introduce importlib.util.module_from_spec().
    http://hg.python.org/cpython/rev/b26d021081d2

    @brettcannon
    Copy link
    Member Author

    After all the various revelations today about how much of a hassle and murky it would be to get types.ModuleType to do what we were after, I went ahead and kept importlib.util.module_from_spec(), but dropped the module argument bit. I also deconstructed _SpecMethods along the way while keeping the abstractions as Eric requested.

    @ericsnowcurrently
    Copy link
    Member

    Thanks for doing that Brett and for accommodating me. :) Also, the various little cleanups are much appreciated.

    @ncoghlan
    Copy link
    Contributor

    Another thing to keep in the back of your minds: one of my goals for PEP-432 is to make it possible to have a fully functional embedded interpreter
    up and running *without* configuring the Python level import system. This
    is the state the interpreter would be in between the beginning and end of
    initialisation.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants