-
-
Notifications
You must be signed in to change notification settings - Fork 30.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expand the 'Extending' docs with an example. #113187
Changes from all commits
2dd1db7
2180bea
72a18cc
5e6b903
49ad6b2
88db416
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -406,6 +406,84 @@ metadata in locations other than the file system, subclass | |
a custom finder, return instances of this derived ``Distribution`` in the | ||
``find_distributions()`` method. | ||
|
||
Example | ||
------- | ||
|
||
Consider for example a custom finder that loads Python | ||
modules from a database:: | ||
|
||
class DatabaseImporter(importlib.abc.MetaPathFinder): | ||
def __init__(self, db): | ||
self.db = db | ||
|
||
def find_spec(self, fullname, target=None) -> ModuleSpec: | ||
return self.db.spec_from_name(fullname) | ||
|
||
sys.meta_path.append(DatabaseImporter(connect_db(...))) | ||
|
||
That importer now presumably provides importable modules from a | ||
database, but it provides no metadata or entry points. For this | ||
custom importer to provide metadata, it would also need to implement | ||
``DistributionFinder``:: | ||
|
||
from importlib.metadata import DistributionFinder | ||
|
||
class DatabaseImporter(DistributionFinder): | ||
... | ||
|
||
def find_distributions(self, context=DistributionFinder.Context()): | ||
query = dict(name=context.name) if context.name else {} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's not clear why it's OK to only respect the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In a3af10d, I've added a paragraph describing why path was ignored and when it should be considered. |
||
for dist_record in self.db.query_distributions(query): | ||
yield DatabaseDistribution(dist_record) | ||
|
||
In this way, ``query_distributions`` would return records for | ||
each distribution served by the database matching the query. For | ||
example, if ``requests-1.0`` is in the database, ``find_distributions`` | ||
would yield a ``DatabaseDistribution`` for ``Context(name='requests')`` | ||
or ``Context(name=None)``. | ||
|
||
For the sake of simplicity, this example ignores ``context.path``\. The | ||
``path`` attribute defaults to ``sys.path`` and is the set of import paths to | ||
be considered in the search. A ``DatabaseImporter`` could potentially function | ||
without any concern for a search path. Assuming the importer does no | ||
partitioning, the "path" would be irrelevant. In order to illustrate the | ||
purpose of ``path``, the example would need to illustrate a more complex | ||
``DatabaseImporter`` whose behavior varied depending on | ||
``sys.path``/``PYTHONPATH``. In that case, the ``find_distributions`` should | ||
honor the ``context.path`` and only yield ``Distribution``\ s pertinent to that | ||
path. | ||
|
||
``DatabaseDistribution``, then, would look something like:: | ||
|
||
class DatabaseDistribution(importlib.metadata.Distributon): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Where is it documented that you only need to implement Why is it OK to omit There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
The parent class (
It's not documented, and much of it is in the implementation details. The Ideally, the packaging ecosystem would have a more rigorous definition of what constitutes metadata. Ideally, the metadata wouldn't be structured based on incidental file formats, but on a proper API and data structures. Unfortunately, that's not what we have, so providers have to simulate the interface that the
I'm not sure RECORD is rigorously defined anywhere. This doc indicates that when installing a wheel that the RECORD should be updated, so if an installer is installing a distribution into the database, it presumably could create that "file". Alternately, the Distribution could dynamically generate the RECORD based on internal structures. It's not obvious to me that the RECORD has any meaning in a provider that doesn't present any files on the file system. Much of the packaging ecosystem is designed around assumptions of installation to a file system or zip file. Basically it's undocumented because it's ill-defined. I welcome the PyPA or others to design a system that addresses these concerns. My instinct is it's not worth documenting these incidental details.
It depends on your callers. If they're expecting to be able to resolve My advice would be to read through the code, figure out what works and what doesn't for a particular use-case, and capture bugs or documentation that would help. In 9599fd8, I've added a small paragraph suggesting that other files might be provided and directed to the source. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
OK, but that's not an argument for not documenting this - if Sphinx isn't rendering the contract expected from users correctly, then you need to add prose text explaining the details.
Yes, that's essentially my point 🙁
That's the definitive description (along with the core metadata spec) but that just explains what the file is for. What I'm asking is how I know what methods of the
Not true. It's defined here. But again, that's not the point - the question is how I know what
https://packaging.python.org/en/latest/specifications/recording-installed-packages/#the-record-file
I'm not sure I agree with that statement. If nothing else,
If I'm writing an import hook, I have no control over what my callers will do. That's the whole point of having a standardised API and interface - callers can do whatever they want with no consideration of whether there are any custom importers in the chain.
That's precisely the sort of statement I expect to see in the documentation.
To be blunt, that's way more work than I'm willing to do. I don't think "read the code" is a reasonable thing to suggest to a user who's simply trying to use the API in a way that's (documented as) supported. But as the module maintainer, it's up to you whether you agree with me on that. But I guess that means that I'm simply going to go back to my user and say that if they want There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I see now that Sphinx does in fact render an abstract prefix for those methods. I'm unsure if it did before and I missed it or something improved, but that's the way one would interpret which methods need to be overridden in a subclass.
That's fair. I'm somewhat reluctant to provide an intermediate documentation between the guide and the documented source API, because that adds yet another dimension that needs to be maintained and kept in sync, especially when there are already synchronicity challenges across the stdlib and the standalone package. What I'd like to try to do instead of creating a new, intermediate layer between the user guide and the API docs is to close the gap between those through the following:
How does that sound as a plan?
That's helpful. I sometimes struggle to understand what gaps the users see because I'm so entrenched in the implementation that I carry an intuitive understanding of the design. In python/importlib_metadata@8635240, I've updated the docstrings to include more of this guidance.
The importlib metadata design takes a pragmatic approach, attempting to satisfy the users' current needs around packaging metadata based on implicit designs from pkg_resources and pip implementations. Where standards exist, it will honor those. The design has another goal, to support the extensibility afforded by the Python import system. That is, in the same way that importers and finders and specs allow arbitrary customization, importlib metadata wishes to support that customization.
The main problem is that many systems assume a file-system implementation or less commonly a zip-file based implementation. They assume packages are installed and uninstalled. But the Python ecosystem doesn't have these constraints. It's conceivable that Python packages could be supplied through a database or loaded in memory from the web or even generated by an AI without any location on disk. In these cases, the concept of a "RECORD" of installed "files" makes much less sense, and the python-packaging docs seem oblivious to this possibility. And maybe that's fine. Maybe PyPA wishes to focus solely on the filesystem-based packages and leave concerns of custom finders as an exercise for the reader. Unfortunately, this area is not one where I have time or energy to invest, so I'm relying on others to explore the space and where necessary to ask questions or (preferably) suggest changes to support the stated goals better.
But the point of an extensible interface is that it may expose a richer surface than what's available to the general caller. Imagine, for example, a company creates a custom importer that presents a suite of packages that are partitioned by a This mechanism gives the custom importer a means to solicit additional details from the caller beyond "name" and "path" when searching distributions. I'll elaborate on this aspect in the docs (python/importlib_metadata@a6038c3f2b). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I'm not particularly happy with the idea of the stdlib documentation (which is basically the "canonical" documentation in many people's minds, including mine) being essentially examples. But I understand that you're not willing to go down the route taken by the other stdlib docs of including handwritten API documentation, so I think that leaves us at a bit of an impasse. Your suggestion is probably an improvement, so I don't want to discourage you, but conversely I don't want to give you the impression that it addresses my concerns when it doesn't.
I'm not sure that addresses my point. Yes,
I can't speak for the PyPA as a whole, but I think the focus there is on packaging rather than on import mechanisms. As such, filesystem-based approaches are really the only ones that fit the PyPA's scope. It's similar to the way that import finders and hooks aren't under the PyPA's remit, but are part of the core (or stdlib). Maybe that's something that could change, but someone would need to propose the idea (on the Packaging category on Discourse) in order to get the community involved in the idea. Is that something you want to do? I don't personally have either the bandwidth or the interest in tackling it right now.
I can completely understand and accept this. But if no-one is willing to commit time to non-filesystem based import scenarios, then maybe we shouldn't be blocking progress in areas where there is benefit just in order to maintain a (fairly obscure? is that fair?) capability in the area of esoteric1 import mechanisms? If you're talking more generally, then ideally, I'd offer to help, but as
I think I disagree. If the caller has to know what import hooks are installed, and the import hooks have to expose data in a way that the caller can use, everybody's tightly coupled together. Which is the opposite of what I consider to be an "interoperability standard". Maybe I'm wrong to use that terminology and mindset for In any case, if this is a genuine intended use case for the library, I think it should be documented. There's no indication anywhere that I can see that something like this is anything other than an accident of implementation, and I certainly wouldn't have anticipated that sort of usage. I'm not sure I see the value of this much flexibility. I can understand the example you give, but are there any real-world use cases of this flexibility being used "in the wild" - or is it simply something that came out of a "don't prohibit anything that doesn't need to be" design approach? I assume it's simply a matter of me not having encountered this type of use case, but again, having a discussion in the docs would make it less of a stretch to understand why the API is designed in this way. Anyway, I don't know that this discussion is particularly productive - if you're getting benefit from what I'm saying, please say so and I'm happy to continue, but otherwise it just feels like two people with very different design philosophies trying to explain why they disagree. Also, I think that if this discussion is worthwhile, it should probably be occurring somewhere more public (for example, Discourse) - because as it stands, I have no sense of whether my views or yours are more in line with the community view. So I'm happy to continue, either here or in a more public forum, but I don't want to mislead you into thinking that we're converging on any sort of agreement here. (Edit: Saying a comment is uncharitable doesn't make it any less uncharitable. I apologise if you saw the original version - it was uncalled for. I've reworded the offending sentence.) Footnotes
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm getting caught up on this again. It's been a rough new year with some seriously distracting life events.
I see how PyPA deals with things like core metadata specs, which are about metadata for packages and independent of the import mechanisms, and that's similar to how
Not at the moment, as I've got more important concerns to address, but possibly in the future. I agree the starting point would be a thread in discourse.
There are definitely users relying on the non-filesystem based interfaces, including the mempip project and the rinohtype project. Moreover, the zip-based import scenarios originally required separate handling before I didn't realize that progress was blocked in any dimension. Yes, I agree, a prominent need should not be blocked for an esoteric concern. Maybe I just missed an opportunity in python/importlib_metadata#427 where all you were seeking was a simple way to expose an additional PathDistribution for an editable package... and the concern of non-filesystem based interfaces was only confounding the problem.
Definitely not. I try to be welcoming to contributions and if I have concerns, raise those in a way that the contributor would ideally agree. I try to avoid ruling with authority and consider these projects to be community-owned. The one area where I'd push back is if a contribution is likely to create more maintenance burden for which I would be expected to bear.
It came about when I realized that the signature to I think you're right that ultimately this interface has stayed pretty static, with only the
I'm definitely getting value from the conversation, but I also want to work toward something useful. I understand the frustration - the design is complicated, possibly unnecessarily so, but I'm not sure what the alternative is. The prognosis can't be "go back and try again". What I'd really like to see is concrete examples of what's broken/blocked with the current design so we can discuss and implement improvements to address those defects. I want to get these changes merged so at least they're not lingering. Then let's figure out what the next steps are toward convergence. That is, what changes are you proposing? If the problem is "the current design is incomprehensible", let's work on a new design that's more comprehensible and a transition plan to that approach. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for the comments. I don't really have enough time right now to comment fully, but I agree that this should be merged, as it's definitely an improvement even if there are things that might still need to be done.
It's hardly the end of the world, but pfmoore/editables#23 is stalled, mostly because I was struggling to understand how to implement an My problem here is that I don't want to rely on unsupported details that might change - Let's not leave this discussion here - I'll try to get back to it when I get some time. Is it OK to continue discussions on this PR even after the change itself gets merged? If not, maybe we should open a new issue. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks! I'm not opposed to continuing the discussion here, but I'd probably prefer to reset to a new issue or continue in python/importlib_metadata#427. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Given that python/importlib_metadata#427 is closed, let's pick it up in a new issue. I'll open something when I have the time. |
||
def __init__(self, record): | ||
self.record = record | ||
|
||
def read_text(self, filename): | ||
""" | ||
Read a file like "METADATA" for the current distribution. | ||
""" | ||
if filename == "METADATA": | ||
return f"""Name: {self.record.name} | ||
Version: {self.record.version} | ||
""" | ||
if filename == "entry_points.txt": | ||
return "\n".join( | ||
f"""[{ep.group}]\n{ep.name}={ep.value}""" | ||
for ep in self.record.entry_points) | ||
|
||
def locate_file(self, path): | ||
raise RuntimeError("This distribution has no file system") | ||
|
||
This basic implementation should provide metadata and entry points for | ||
packages served by the ``DatabaseImporter``, assuming that the | ||
``record`` supplies suitable ``.name``, ``.version``, and | ||
``.entry_points`` attributes. | ||
|
||
The ``DatabaseDistribution`` may also provide other metadata files, like | ||
``RECORD`` (required for ``Distribution.files``) or override the | ||
implementation of ``Distribution.files``. See the source for more inspiration. | ||
|
||
|
||
.. _`entry point API`: https://setuptools.readthedocs.io/en/latest/pkg_resources.html#entry-points | ||
.. _`metadata API`: https://setuptools.readthedocs.io/en/latest/pkg_resources.html#metadata-api |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This says that to provide metadata, one must also implement
DistributionFinder
but the parent class,importlib.abc.MetaPathFinder
, is replaced byDistributionFinder
in the snippet below. The reader must infer thatDistributionFinder
is a subclass ofimportlib.abc.MetaPathFinder
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see a couple of ways to highlight this detail. One could change the narrative to something like:
Alternatively, the
DistributionFinder
could be changed from a raw identifier to a reference to:class:DistributionFinder
, which captures that detail. That, of course, would require that the API docs be implemented, which is something I'm reluctant to do because it's something a machine can do (just not in the context of CPython AFAIK). I guess the CPython docs could just link back to the importlib_metadata docs, i.e.:WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I would much rather the full documentation gets consolidated into the CPython documentation, rather than calling out to the
importlib_metadata
docs. While I appreciate this is extra work, I think it is worthwhile nevertheless, because it brings theimportlib.metadata
in line with the rest of the CPython documentation (in terms of style, maintenance workflow, and ease of access).It's not me that would be doing the work, though, so I'm just offering this as my viewpoint. If you feel as maintainer that the cost isn't justified by the benefits, then that's your call.