Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarification about how to implement namespace packages (as in PEP 420) via import hooks for PEP 660 use case #92054

Open
abravalheri opened this issue Apr 29, 2022 · 21 comments
Labels
docs Documentation in the Doc dir topic-importlib

Comments

@abravalheri
Copy link

This is a follow up of 2 previous discussions in the Python discourse and an issue in pypa/packaging-problems:

Summary

One of the popular ways of implementing PEP 660 is via import hooks (since other alternatives like symlinking package folders are not available in all platforms).

However the reference implementation editables currently does not support namespaces.
Specifically the dynamic path computation aspect of PEP 420 is a challenge.

After inspecting the implementation for importlib._bootstrap_external, I may have managed to find 2 solutions for this problem, described in the detail in this discourse post and demonstrated as a PoC in this gist:

  • Solution A: Create a custom "path class" that tries to emulate the _NamespacePath behaviour.
  • Solution B: Create a custom PathEntryFinder and add a “bogus” entry to sys.path just to trigger it (implicitly relying that importlib.machinery.PathFinder will take care of setting __path__ to _NamespacePath object under the hood).

However, the problem is that I don't know if these approaches are reliable in the long run and in accordance with the language specification.

Open Questions

Questions like the following are still not clear to me:

  1. Is there a chance that an internal check for namespace packages will fail if __path__ uses a custom class instead of _NamespacePath even if the behaviour described in PEP 420 is observed?
  2. Can we rely on the fact that the import machinery will automatically set the namespace loader for specs in the form of ModuleSpec("name", None, is_package=True)?
  3. Is pkgutil (and particularly pkgutil.extend_path) still considered first class citizen of the standard library or is it better to avoid it when writing new code?
  4. Can we rely on the behaviour of sys.path_hooks + importlib.machinery.PathFinder to create ModuleSpec objects for namespaces that will automatically perform dynamic path computation? Or is this an internal implementation detail that can change in the future?

If the discussed approaches are not recommended, how can we support namespace packages using import hooks and deliver a complete solution for PEP 660?

@abravalheri abravalheri added the docs Documentation in the Doc dir label Apr 29, 2022
@abravalheri abravalheri changed the title Clarification about how to implement namespace packages (as in PEP 420) via import hooks Clarification about how to implement namespace packages (as in PEP 420) via import hooks for PEP 660 use case Apr 29, 2022
@brettcannon
Copy link
Member

I'm going to preface all of this with @ericvsmith is the namespace package expert. 😅

  1. Is there a chance that an internal check for namespace packages will fail if __path__ uses a custom class instead of _NamespacePath even if the behaviour described in PEP 420 is observed?

I don't think so. Easiest way to check is to search for _NamespacePath and make sure there aren't any isinstance() checks. Otherwise experiment and let us know. 😉

2. Can we rely on the fact that the import machinery will automatically set the namespace loader for specs in the form of ModuleSpec("name", None, is_package=True)?

As in that exact call signature for ModuleSpec? I wouldn't count on that being like that forever as we don't make any promises around that.

3. Is pkgutil (and particularly pkgutil.extend_path) still considered first class citizen of the standard library or is it better to avoid it when writing new code?

pkgutil is on its way out as soon as people have the time to deprecate it.

4. Can we rely on the behaviour of sys.path_hooks + importlib.machinery.PathFinder to create ModuleSpec objects for namespaces that will automatically perform dynamic path computation?

The language spec just says it "works" without giving details in how.

I think there are two possible ways to solve this (and we probably want both long-term). One, you register a custom finder that effectively re-implements namespace packages the way you need it to work. Two, we develop whatever APIs, etc. you need to make this as pluggable as possible.

@abravalheri
Copy link
Author

Thank you very much for the clarifications @brettcannon.

I agree with your conclusions, if there were APIs officially supported for implementing the namespace packages using a custom finders, I would be more than happy to use them. The main problem that I am facing right now is the impression that things work "accidentally".

I think it is also worth a shot considering the importance of editable installs in the community.

@brettcannon
Copy link
Member

I think it is also worth a shot considering the importance of editable installs in the community.

Yep, this is just pushing up against the niche nature of namespace packages.

All of the hard-coded locations of using namespace-related classes seem to be:

else:
spec = _bootstrap.ModuleSpec(fullname, None)
spec.submodule_search_locations = namespace_path
return spec

if namespace_path:
# We found at least one namespace path. Return a spec which
# can create the namespace package.
spec.origin = None
spec.submodule_search_locations = _NamespacePath(fullname, namespace_path, cls._get_spec)
return spec

if loader is None:
# A backward compatibility hack.
if spec.submodule_search_locations is not None:
if _bootstrap_external is None:
raise NotImplementedError
NamespaceLoader = _bootstrap_external.NamespaceLoader
loader = NamespaceLoader.__new__(NamespaceLoader)

The first two are all in PathFinder, so that could grow a keyword argument or something in its initializer to allow for overriding the defaults.

But the latter is deep in the import system, and so that would require something on sys.

This is also assuming I didn't miss a spot. 😅

@abravalheri
Copy link
Author

Would this entry also be relevant?

# Also invalidate the caches of _NamespacePaths
# https://bugs.python.org/issue45703
_NamespacePath._epoch += 1

It also seems that in 3.11 NamespaceLoader is promoted to a "public API", but just for the sake of introspection (the example given in the docs shows an isinstance check). The documentation implies that direct instantiation is not allowed.

It would seem that the existing behaviour is highly dependent on _NamespacePath and NamespaceLoader, but there is no public API to construct the objects.

@brettcannon
Copy link
Member

Would this entry also be relevant?

Yep.

It would seem that the existing behaviour is highly dependent on _NamespacePath and NamespaceLoader, but there is no public API to construct the objects.

Correct. We purposefully didn't over-expose an API we would have to support for a decade or more when no one had really asked for access previously.

@abravalheri
Copy link
Author

Please let me know if there is anything I can do to help with this.

As I previously mentioned, approach B seems to work right now across the major active versions of CPython (tested in setuptools), and it does not involve using soon-to-be-deprecated functionality as pkgutil or re-implementing big parts of the import machinery. So maybe the easiest approach here would be to add tests to capture this expectation and prevent accidental regressions.


Brainstoming of alternatives/complementary approaches

  • By looking at the _bootstrap_external code (and considering the fact that the newest API already exposes NamespaceLoader for users to compare via isinstance), I imagine that exposing both NamespaceLoader and NamespacePath to allow users to directly instantiate them would do the trick.

  • Maybe adding a public constructor for namespace specs (e.g. NamespaceSpec or importlib.util.namespace_spec_from_portion) could also do the trick.

  • _NamespacePath would very likely require append operations to not be undone in the _recalculate method...

@brettcannon
Copy link
Member

Please let me know if there is anything I can do to help with this.

More free time or get @warsaw , @ericvsmith , or @ericsnowcurrently to weigh in.

@warsaw
Copy link
Member

warsaw commented Jul 6, 2022

Specifically the dynamic path computation aspect of PEP 420 is a challenge.

TBH, I'd totally forgotten about that language in the PEP. @ericvsmith, I don't remember why this was added. That said, while I can understand that there might be some usefulness to this in deployed code, I'm not sure how useful it is in the same environments where you'd want editable installs (during the development process, right?). I personally can't remember any time where I or a package I've used relied on this behavior.

Is there a chance that an internal check for namespace packages will fail if path uses a custom class instead of _NamespacePath even if the behaviour described in PEP 420 is observed?

The one place folks seem to have missed is in importlib.resources.reader. There's a shallow dependency on the str() of the argument to NamespaceReader, but that would be easily faked (i.e. by calling your class _EditableNamespacePath). @jaraco for thoughts on the implications or ways to make this more generic.

It also seems that in 3.11 NamespaceLoader is promoted to a "public API", but just for the sake of introspection (the example given in the docs shows an isinstance check). The documentation implies that direct instantiation is not allowed.

I've thought about exposing _NamespacePath as a public name for introspection purposes, but it didn't seem as urgent as NamespaceLoader.

@abravalheri
Copy link
Author

Thank you very much for the comments @warsaw.

I'm not sure how useful it is in the same environments where you'd want editable installs (during the development process, right?). I personally can't remember any time where I or a package I've used relied on this behavior.

As long as developers are allowed to create and publish namespace packages, editable installs will be relevant during the development process of those packages. I don't know how many namespace packages are published in PyPI, but I would not be surprised if their authors/maintainers use an editable install when implementing changes/bugfixes/features etc.

As an anecdote (that might not be representative of the general tendencies), when I asked about feedback from the community about setuptools candidate implementation of PEP 660, one of the few responses I got back was about a package using namespaces1.

Footnotes

  1. This project in particular is using pkgutil namespaces to be precise, not PEP 420. But I think that is fair to imagine that eventually all namespace packages will end up migrating to PEP 420. Another example are the popular sphinxcontrib-* packages which use pkg_resources style namespaces.

@warsaw
Copy link
Member

warsaw commented Jul 6, 2022

As long as developers are allowed to create and publish namespace packages, editable installs will be relevant during the development process of those packages. I don't know how many namespace packages are published in PyPI, but I would not be surprised if their authors/maintainers use an editable install when implementing changes/bugfixes/features etc.

I should have been more clear in my comment. In fact, I'm an author of several packages in the flufl namespace (albeit only flufl.i18n and flufl.lock are still actively maintained), and yes I still want editable installs to work with them. What I was specifically wondering the utility of was whether dynamic path calculation was actually useful while developing those namespace packages, i.e. when editable installs are useful. In deployed code, I can't think of why I'd want editable installs.

@pfmoore
Copy link
Member

pfmoore commented Jul 6, 2022

The dynamic path calculation is the only aspect of namespace packages that is problematic (as far as I can tell, it's been a while since I looked at the code) and to be perfectly honest I'm not completely sure what the use cases are for that feature. In general, we don't try to guarantee what will happen in a running Python process if new packages are installed "in the background", so most of the cases I've been able to come up with for the dynamic path feature feel like things I wouldn't recommend doing anyway...

Can anyone explain what the intended use of the dynamic path feature is? If there's no use cases that we actually want to support, would it be easier to simply deprecate the feature?

@warsaw
Copy link
Member

warsaw commented Jul 6, 2022

Can anyone explain what the intended use of the dynamic path feature is?

I can't. As I said, I didn't even remember that being part of the PEP! 😆

If there's no use cases that we actually want to support, would it be easier to simply deprecate the feature?

WFM. Or maybe another way to frame the question is, has anybody been asking for this feature?

@ericvsmith
Copy link
Member

IIRC, Nick and PJE wanted the feature.

/me checks …

https://peps.python.org/pep-0420/#id8 mentions it, and links to where it was discussed. PJE talks about it here: https://mail.python.org/pipermail/python-dev/2012-May/119603.html

I think this was included so we could interoperate with legacy namespace packages in setuptools. That use case is probably long gone.

@pfmoore
Copy link
Member

pfmoore commented Jul 7, 2022

Cool. So what would be involved in deprecating that behaviour? Would it need a PEP? For the purposes of editable installs we could just not support it, citing this discussion and any subsequent deprecation as the reason, but it would be good to formally deprecate and then remove the feature, to avoid confusion.

@abravalheri
Copy link
Author

For the purposes of editable installs we could just not support it, citing this discussion and any subsequent deprecation as the reason

Thank you very much, I will rework the exiting implementation taking this discussion into consideration.

Should I also be worried about using the constructor of ModuleSpec directly and accessing the submodule_search_locations attribute? Both items seem to be part of the documented public API on docs.python.org, and I think they would required to implement a MetaPathFinder.find_spec that is able to handle namespace packages (even with the dynamic path computation requirement dropped) ... (Or do you have a different suggestion?)

@ericvsmith
Copy link
Member

Cool. So what would be involved in deprecating that behaviour? Would it need a PEP? For the purposes of editable installs we could just not support it, citing this discussion and any subsequent deprecation as the reason, but it would be good to formally deprecate and then remove the feature, to avoid confusion.

The problem is that we know why the feature was originally added, but we don't know who is relying on it for other reasons. I think this would need a wide discussion, and probably a PEP.

@pfmoore
Copy link
Member

pfmoore commented Jul 7, 2022

On reflection, I'm not sure deprecation is the right thing to do here. Having re-read the PEP, the encodings example is convincing - people do modify sys.path, and that is supported, and it is reasonable to expect that if someone does import foo.xxx, then modifies sys.path to add a directory that already contains foo/yyy, then import foo.yyy would work. That's very different from modifying the filesystem while the application is running (which is the thing that we discourage because it has non-obvious effects).

So I think we probably should retain the current behaviour, and ideally make it easier for custom importers to hook into the namespace path mechanism.

@ericsnowcurrently
Copy link
Member

So I think we probably should retain the current behaviour,

+1

and ideally make it easier for custom importers to hook into the namespace path mechanism.

+1

This is where my mind went pretty quickly. 😄 Clearly there's a meaningful case for hooking into/duplicating the namespace path logic. It seems like helpers in importlib.util and/or importlib.machinery (exposing or derived from the existing _bootstrap*.py code) would meet the need here.

FWIW, every time I look in _bootstrap*.py and bump into the special-casing for namespace packages, I wonder if there might be a less special-case-y way to get there. At the least, having no public API to replicate the functionality has always seemed off to me. (That said, there is value, of course, in not providing public API until actually needed. 😄)

@warsaw
Copy link
Member

warsaw commented Jul 8, 2022

That's very different from modifying the filesystem while the application is running (which is the thing that we discourage because it has non-obvious effects).

Agreed.

I'm still wondering, short of deprecating this feature, are there users who are asking for support for this with editable installs?

@pfmoore
Copy link
Member

pfmoore commented Jul 8, 2022

I'm not aware of any. My library editables doesn't support it when using a custom importer. I don't know how many people are using the library in that mode, but I've not had any complaints. As setuptols has a far larger user base, @abravalheri may have more useful feedback, though.

@abravalheri
Copy link
Author

As setuptols has a far larger user base, @abravalheri may have more useful feedback, though.

I just recently invited the community to start experimenting with the new editables support, and the current implementation does try its best to be compatible with the dynamic path computation (I am using the approach B discussed above). So I also don't have any input about that.

I could try to modify the implementation to not support dynamic path computation and see what would be the reaction of the user base, if that is the core team recommendation. (I imagine if someone were to complain, we are just going to find out after the feature is included in a stable release for setuptools).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Documentation in the Doc dir topic-importlib
Projects
None yet
Development

No branches or pull requests

7 participants