Import machinery documentation #59500
I believe Barry said he was going to handle the documentation for PEP-420. |
One request: while the docs are being written, please look at importlib.find_loader() and let me know if the name no longer applies (it's new in Python 3.3, so it can easily be renamed). |
Ping. Barry? (It's not strictly necessary to have the docs for b2, but could you give me a rough estimate of when you'll do this?) |
On Jul 21, 2012, at 02:23 PM, Georg Brandl wrote:
Unfortunately, I lost a bunch of work with a disk crash, but I might have |
From the import-sig discussions, this wasn't just about documenting PEP-420; it was about finally bringing the full import system specification into the language reference (now that it doesn't need to be loaded with caveats about the old default import mechanisms). |
First draft is complete, along with updates to the importlib ABCs for the new protocols. You'll see the language reference has a new importmachinery.rst file which describes finding and loading modules. You'll see that the import statement docs have been simplified to point to this for step (1), and now only describe the name binding operations, i.e. step (2). Various other documentation updates are made, including new glossary terms. Everything lives in features/pep-420 in the importdocs branch. I don't know if it's possible to just attach that branch to this tracker issue. I'd rather not post a patch right now since that's much less convenient for the inevitable deluge of comments I'm sure I'll get. Off to email python-dev now. |
Awesome addition, Barry! Bless you for slogging through this. Here are some thoughts (prepare one grain of salt for each):
Whew. All in all, Barry, nice work on a difficult and tedious project! This is such an improvement and long overdue. Other notes:

[1] A package doesn't necessarily have to correspond to a directory, does it? Meta path importers should be able to generate packages just as well as path importers. bpo-1644818 hints at this.

[2] Doesn't "path importer" refer to the callables on sys.path_hooks that decide the path-based finder to use for a module during filesystem-based imports? In light of "sys path finder", maybe "sys path importer" is more appropriate. So "sys path finder" refers to the specialized finder used during FS-based imports.

[3] A module may replace itself in sys.modules. This came up during the importlib integration when several people pointed out that they relied on this previously unspecified side effect of the old import machinery (and importlib didn't cooperate). Django was involved, if I recall correctly. You've alluded to the situation in the footnote in import_machinery.rst. I'm pretty sure this still isn't specified, and I'm not sure it should be. And yet... Anyway, this would somewhat imply that all module attributes set by the loader should likewise be set before the module's code is executed (which _is_ already at least vaguely specified). |
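Point [3] above, a module replacing itself in sys.modules, can be observed directly. Here is a minimal sketch, assuming a modern CPython and a writable temporary directory. The module name `selfreplacing_demo` is made up for illustration. The import returns whatever object sits in sys.modules after the module's code has run, not the original module object.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source for a hypothetical module that swaps itself out of sys.modules.
SOURCE = """\
import sys
import types

sys.modules[__name__] = types.SimpleNamespace(answer=42)
"""

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "selfreplacing_demo.py").write_text(SOURCE)
    sys.path.insert(0, tmp)
    try:
        # The import system hands back whatever sys.modules holds after execution.
        module = importlib.import_module("selfreplacing_demo")
    finally:
        sys.path.remove(tmp)

print(type(module).__name__, module.answer)  # SimpleNamespace 42
```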
I would title the new section "Import system" rather than "Import machinery", as it is meant to be specification documentation rather than an implementation description.

Import statement: The statement that "from X import A, B, C" only performs a single import lookup is incorrect. The trick is that if A, B or C refers to a submodule of X, then it will be imported. I'll use a couple of examples from the logging package to make this clear:

# Attribute access will fail for submodules that haven't been imported yet
>>> import logging
>>> logging.DEBUG
10
>>> logging.handlers
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'handlers'
# Direct imports will fail for attributes that are not submodules
>>> import logging.DEBUG
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'logging.DEBUG'
>>> import logging.handlers
# From imports check for an existing attribute first, but check for a submodule if the attribute is missing
>>> import sys
>>> del sys.modules["logging"]
>>> del sys.modules["logging.handlers"]
>>> from logging import DEBUG
>>> from logging import handlers

Aside from this flaw, the new content in the import statement docs looks good. More on the import system section in a subsequent comment. |
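The attribute-first, submodule-second lookup that "from X import A" performs can be approximated with importlib. This is a rough sketch only. The helper `from_import` is hypothetical, and the real statement raises a "cannot import name" ImportError rather than the submodule error this sketch would surface.

```python
import importlib


def from_import(package_name, attr):
    """Rough emulation of `from package_name import attr` (hypothetical helper)."""
    package = importlib.import_module(package_name)
    if not hasattr(package, attr):
        # Missing attribute: try a submodule with that name. Once imported,
        # the submodule is also bound as an attribute of its parent package.
        importlib.import_module(f"{package_name}.{attr}")
    return getattr(package, attr)


print(from_import("logging", "DEBUG"))              # 10
print(from_import("logging", "handlers").__name__)  # logging.handlers
```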
General comment: runpy, pkgutil, et al. should all get "See Also" links at the top pointing to the new import system section.

Import system intro: As noted above, I suggest changing the name :) The opening section should refer to importlib.import_module(). Any mentions of __import__ should point out that its API is designed for the convenience of the interpreter, and thus it's a pain to use directly. We should also document explicitly that, unlike the import statement and calling __import__ directly, importlib.import_module will ignore any module level replacements of __import__. Replacing builtins.__import__ won't reliably override the entire import system (due to module level __import__ definitions, most notably importlib.__import__), nor will it affect other tools that work with the process global import state directly (e.g. pkgutil, runpy).

5.1 Packages: Don't tease readers, just tell them: the defining characteristic of a package is that it is a module object with a __path__ attribute. Since we have the privilege of defining *the* standard terminology for old-style packages, I suggest we use the term "initialised" packages (since having an __init__.py is what makes them special). We should also note explicitly that an initialised package can also behave as a namespace package, by setting __path__ appropriately in __init__.py.

Also, I suggest adding a 5.1.3 Package Example subheading; currently you define an initialised package under the namespace package heading.

Finally, I think this section needs to explicitly define the terms *import path* and *path entry*. The meta path docs later refer to find_module() accepting a module name and path, and the reader could be forgiven for thinking that meant a filesystem path, when it is actually an import path (which is a sequence of path entries, which may themselves be filesystem paths).
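The "module with a __path__ attribute" definition of a package is easy to check against the standard library. Here is a minimal sketch, using the stdlib `json` package and its plain submodule `json.decoder` as examples. The helper `is_package` is hypothetical. The package's `__path__` is its import path: a sequence of path entries.

```python
import importlib


def is_package(module):
    # The defining characteristic: a package is a module with a __path__.
    return hasattr(module, "__path__")


json_pkg = importlib.import_module("json")              # json/__init__.py
json_decoder = importlib.import_module("json.decoder")  # json/decoder.py
print(is_package(json_pkg), is_package(json_decoder))   # True False
print(list(json_pkg.__path__))  # the package's import path (its path entries)
```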
5.2.2 Finders and loaders: The term "sys path finder" is incorrect, as registered path hooks are invoked for both sys.path entries *and* package __path__ entries. I suggest "path entry finder". (I agree a longer name is needed to better distinguish them from meta path finders.)

5.2.3 Import hooks: While it does get cleared up in 5.2.4, this section could be clearer that the hooks *cannot* override the initial check of the module cache.

5.2.4 Metapath: See the above comment about clarifying that an import path is passed to find_module() rather than a filesystem path. The description of the path importer is incorrect. It only knows how to scan an import path and interrogate the path hooks. It's the individual path entry finders that know how to do things like load modules from the filesystem or zip files.

5.2.5 Meta path loaders: I don't like the title here. There's no such thing as a meta path loader; there are only module loaders. Once they're created, it doesn't matter how you found them. Clarify that the loader only has to remove the modules it inserted itself. Other modules that were loaded *successfully* as a side effect of the code execution will remain in the cache.

5.3 The Path Importer: As noted above, the path importer is *NOT* restricted to filesystem imports. All it cares about is arbitrary text sequences and path hooks. With the right path hook, you could use URLs or database connection strings as path entries.

5.5 References: I'd also point to PEP-328 (absolute imports by default and explicit relative import syntax) and PEP-338 (using the import system to find __main__). |
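The claim that the path importer only cares about path entry strings and path hooks can be checked with a non-filesystem path entry. Here is a minimal sketch under stated assumptions. The `mem://` scheme, the in-memory store and both classes are hypothetical. The code uses the PEP 451 `find_spec()`/`exec_module()` protocol (Python 3.4+), which postdates this thread, rather than `find_module()`/`load_module()`.

```python
import importlib
import importlib.util
import sys

# Hypothetical in-memory store: path entry string -> {module name: source}.
MEMORY_STORE = {
    "mem://demo": {"memdemo_hello": "MESSAGE = 'loaded from mem://demo'\n"},
}


class MemoryLoader:
    """Executes module source held in memory."""

    def __init__(self, source):
        self.source = source

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        exec(self.source, module.__dict__)


class MemoryPathEntryFinder:
    """Path entry finder for made-up "mem://" path entries."""

    def __init__(self, path_entry):
        # Raising ImportError declines the entry so the next path hook can try.
        if not (isinstance(path_entry, str) and path_entry.startswith("mem://")):
            raise ImportError("not a mem:// path entry")
        self.sources = MEMORY_STORE.get(path_entry, {})

    def find_spec(self, fullname, target=None):
        source = self.sources.get(fullname)
        if source is None:
            return None
        return importlib.util.spec_from_loader(fullname, MemoryLoader(source))

    def invalidate_caches(self):
        pass


# The class itself is the path hook: calling it with a path entry either
# returns a finder or raises ImportError.
sys.path_hooks.insert(0, MemoryPathEntryFinder)
sys.path.append("mem://demo")
try:
    hello = importlib.import_module("memdemo_hello")
finally:
    sys.path.remove("mem://demo")
    sys.path_hooks.remove(MemoryPathEntryFinder)
    sys.path_importer_cache.pop("mem://demo", None)

print(hello.MESSAGE)  # loaded from mem://demo
```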
Great start here, Barry! I'll switch my checkout over to read/write access and start contributing fixes. |
Pushed the import machinery -> import system change (which hopefully won't break Barry's world). Also merged in a more recent version of trunk. This probably screwed up the default branch in this clone, but the clone should be done after these docs updates. |
Updated the statement docs to accurately describe the from X import Y case. I also noted that unlike the statement form, importlib.import_module ignores module level __import__ overrides. |
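The difference between the statement form and importlib.import_module can be seen by swapping out builtins.__import__, the builtins-level case mentioned earlier in the thread. Here is a minimal sketch, assuming a modern CPython. The names `seen` and `recording_import` are hypothetical. `json` is cached first so that no module code runs (and makes its own imports) while recording is active.

```python
import builtins
import importlib

importlib.import_module("json")  # make sure json is already cached

seen = []
original_import = builtins.__import__


def recording_import(name, globals=None, locals=None, fromlist=(), level=0):
    """Record each request, then delegate to the original __import__."""
    seen.append(name)
    return original_import(name, globals, locals, fromlist, level)


builtins.__import__ = recording_import
try:
    import json                      # the statement calls builtins.__import__
    importlib.import_module("json")  # import_module bypasses it
finally:
    builtins.__import__ = original_import

print(seen)  # ['json']
```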
On Jul 29, 2012 2:09 AM, "Nick Coghlan" <report@bugs.python.org> wrote:
Just so this doesn't get lost and in case it is important enough to block -brett
|
Ah, the perils of email readers with quote folding and issue trackers without it. The important part of Brett's email is that PEP-420 has started splitting the meta path finder and path entry finder APIs, but importlib still uses a single ABC for both of them. That's probably a mistake, and something we want to address prior to the release of 3.3. I'll create a separate issue for that. I just pushed a docs update to the PEP-420 repo that should address all of my comments. I went ahead with the "regular package" -> "initialized package" and "sys path finder" -> "path entry finder" name changes - they just make more sense given the way the components are used. I wanted to avoid "regular package" as I expect namespace packages to eventually become the norm and initialized packages the more unusual case. "sys path finder" was simply misleading, as those finders are used for *all* path entries, including those in package __path__ attributes. I haven't reviewed Eric's comments in detail, so I don't know if I also picked up all of those. |
bpo-15502 records Brett's concern about the merged ABC |
Thanks for the review Eric. I'm slogging through these and many other I'll respond just to a few of your comments. Whatever I omit you can consider On Jul 28, 2012, at 06:54 AM, Eric Snow wrote:
I certainly struggled with this term. I almost picked PathFinder (or "path You ask in [2] whether "path importer" refers specifically to the callables on If we agree that "path importer" is the name of the things on sys.path_hooks,
TBH, I'm not crazy about the term "sys path finder" either but I couldn't Keep the suggestions coming for both of these terms. I'll ruminate on it too
There's importlib.find_loader() and importlib.util.resolve_name(), but OTOH, In a subsequent comment, Nick suggests this whole chapter be called the I'm still thinking about this.
An ImportError gets raised? Were you suggesting that some additional
I rewrote the introductory paragraphs, and added a mention of
I've added an XXX for this. I think the right thing to do is to update the
I don't understand what you're suggesting here.
I don't think so. I lifted it from somewhere (hard to remember exactly where
Yeah.
Indeed! AFAICT, it was only required by the module object's default repr, but
Great question. I see no official recommendation in anything I've consulted,
Currently, it doesn't afaict. The timing of the related PEPs was such that I
Let's cheer him on! I have added a footnote about this based on the
C'mon Brett, let's see 'em!
I couldn't figure out what to say differently here.
I think __all__ should be considered a public, official API. Certainly it's a
I've rewritten some of this, so I think the distinction is clearer. More
Crazy! But I've added a footnote about this. |
A small note in passing: “protocol” is used for things like the sequence protocol, the iterator protocol, or closer to home the finder and loader protocols, so it would sound weird or potentially confusing to me. Import system is how I’ve always thought about it (probably took that term from the docs). |
To answer a couple of Barry's comments in reply to Eric... __package__ should be set to the empty string if I'm reading PEP-366 correctly (and importlib isn't broken): "When the import system encounters an explicit relative import in a module without __package__ set (or with it set to None), it will calculate and store the correct value (name.rpartition('.')[0] for normal modules and __name__ for package initialisation modules)". If someone sets __package__ to None, then importlib fills it in as necessary. As for the diagram(s), I have attached the overall PDF that I still have from my original OmniGraffle file (which I don't have a license for anymore) that I built my PyCon 2008 presentation with. It's probably outdated at this point. I will have to redo them for my PyCon Argentina/Brasil (maybe US?) import talks anyway. |
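The __package__ rule quoted in the comment above is short enough to write down and compare with what the import system actually sets. Here is a minimal sketch. The helper `compute_package` is hypothetical. A top-level module yields the empty string.

```python
import importlib


def compute_package(name, is_package):
    """The __package__ rule quoted above (hypothetical helper)."""
    return name if is_package else name.rpartition(".")[0]


print(repr(compute_package("logging.handlers", is_package=False)))  # 'logging'
print(repr(compute_package("logging", is_package=True)))            # 'logging'
print(repr(compute_package("string", is_package=False)))            # ''

# Cross-check against what the import system actually sets.
handlers = importlib.import_module("logging.handlers")
print(handlers.__package__)  # logging
```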
Considering that the goal is for importlib to be the common import machinery for the various Python implementations, this might not be inappropriate.
Unfortunately not. There aren't many people that use import hook terminology and I already have a terrible memory. :) Regardless, I find "path importer" a little too ambiguous.
What don't you like about the "sys path thingee" names? I find them to be nice and explicit. I'll mull this over some more.
I agree with Nick.
And it does a good job of it.
I guess I was just noting a possible hole in the Import System (sounds nice, doesn't it <wink>) specification. Since importlib is a complete reference implementation, it's not critical to have every detail spelled out (at least, that seems to be the status quo).
Perfect.
Yeah, that was poorly worded. I'd meant to suggest that you could document the alternative to find_module() and load_module() being regular methods of the same object. For instance:

    class MyMetaHook:
        @classmethod
        def find_module(cls, name, path=None):
            return cls()

        def load_module(self, name):
            raise ImportError("You lose!")

Thus, the "finder" is the class, and the "loader" is the instance.
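The class-as-finder, instance-as-loader pattern in the snippet above still works with the PEP 451 `find_spec()`/`exec_module()` protocol that later Python versions use in place of `find_module()`/`load_module()`. Here is a minimal sketch under that assumption. The `mymetahook_` prefix and the `greeting` attribute are made up for illustration.

```python
import importlib
import importlib.util
import sys


class MyMetaHook:
    """Hypothetical meta path hook: the class finds, an instance loads."""

    PREFIX = "mymetahook_"  # made-up prefix; other names are left alone

    @classmethod
    def find_spec(cls, name, path=None, target=None):
        if not name.startswith(cls.PREFIX):
            return None  # defer to the remaining finders on sys.meta_path
        return importlib.util.spec_from_loader(name, cls())

    def create_module(self, spec):
        return None  # default module creation

    def exec_module(self, module):
        module.greeting = f"hello from {module.__name__}"


sys.meta_path.insert(0, MyMetaHook)  # the class object itself is the finder
try:
    demo = importlib.import_module("mymetahook_demo")
finally:
    sys.meta_path.remove(MyMetaHook)

print(demo.greeting)  # hello from mymetahook_demo
```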
Nick made the point more clearly. :)
No, you've got it covered. |
As far as the path importer goes, it's important to keep in mind there are *four* different pieces in play:

1. Path importer: This is a meta path finder installed on sys.meta_path, which implements the find_module API. It scans the supplied search path (or sys.path) for path entries, using sys.path_importer_cache and sys.path_hooks to find the locate path entry finders. "Path importer" is an eminently appropriate name as it is responsible for *all* of the standard semantics of sys.path and package __path__ attribute processing. It could potentially be qualified with "standard path importer" or "default path importer" to distinguish it from other cases.

2. Path hooks: These are installed in sys.path_hooks, and are simply callables that accept a path entry and return an appropriate path entry handler or else raise ImportError. The specification is designed to make it easy to use the classes for path entry handlers directly as path hooks (since __init__ can throw ImportError, but it can't return None). For these, "path hook" is just fine as a name.

3. Path entry handlers: These are the objects returned by the path hooks. Historically, they implemented find_module() (without the second "search path" parameter), and now they can implement the find_loader() API instead. The reason I don't like "sys path finder" for these is that it misses their essential role in handling package __path__ attributes. I have previously suggested "path entry finder", but that's a little ambiguous (since it suggests they're tools for *finding* path entries, rather than tools for finding module loaders *given* a path entry). Thus, my new suggestion here: "path entry handler". They're objects that handle particular path entries on behalf of the path importer, so the name is perfectly appropriate, and it better distinguishes them from the meta path finder objects.

4. Module loaders: As with any import, the module loaders implement the load_module() API to create, cache, initialise and return a loaded module object. |
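The create, cache, initialise and return steps of load_module(), plus the cleanup rule from 5.2.5 (remove only what you inserted), can be sketched as follows. This is a hypothetical PEP 302-style loader. It is called directly rather than installed, because modern Python prefers exec_module(). Its name and the demo module names are made up.

```python
import sys
import types


class DictSourceLoader:
    """Hypothetical PEP 302-style loader running source text from a dict."""

    def __init__(self, sources):
        self.sources = sources  # module name -> source text

    def load_module(self, name):
        if name in sys.modules:
            return sys.modules[name]     # already cached: reuse it
        module = types.ModuleType(name)  # create
        module.__loader__ = self         # initialise attributes before running code
        sys.modules[name] = module       # cache *before* executing the code
        try:
            exec(self.sources[name], module.__dict__)
        except BaseException:
            # Remove only the entry this loader inserted; modules imported
            # successfully as a side effect stay cached.
            sys.modules.pop(name, None)
            raise
        return sys.modules[name]         # return (the module may have replaced itself)


loader = DictSourceLoader({
    "dictdemo_ok": "ANSWER = 42\n",
    "dictdemo_bad": "import json\nraise ValueError('boom')\n",
})
print(loader.load_module("dictdemo_ok").ANSWER)  # 42
try:
    loader.load_module("dictdemo_bad")
except ValueError:
    pass
print("dictdemo_bad" in sys.modules, "json" in sys.modules)  # False True
```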
s/locate path entry finders/appropriate path entry handlers/ |
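The per-entry lookup Nick describes for the path importer works like this: consult sys.path_importer_cache, then try each callable on sys.path_hooks until one accepts the path entry. Here is a simplified sketch. The helper `find_path_entry_finder` is hypothetical, and the real path importer also special-cases the empty string entry as the current directory.

```python
import importlib.machinery
import sys
import tempfile


def find_path_entry_finder(path_entry):
    """Simplified sketch of the path importer's per-entry lookup (hypothetical helper)."""
    try:
        return sys.path_importer_cache[path_entry]  # reuse a cached answer
    except KeyError:
        pass
    finder = None
    for hook in sys.path_hooks:
        try:
            finder = hook(path_entry)  # a hook either returns a finder...
            break
        except ImportError:
            continue                   # ...or raises ImportError to decline
    sys.path_importer_cache[path_entry] = finder  # None means "no hook claimed it"
    return finder


with tempfile.TemporaryDirectory() as tmp:
    finder = find_path_entry_finder(tmp)
    # True on a standard CPython install, where the directory hook builds a FileFinder.
    print(isinstance(finder, importlib.machinery.FileFinder))
    sys.path_importer_cache.pop(tmp, None)
```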
Sounds good to me. As I understood them:
A "path entry handler" would stand in contrast to a "meta path finder". These two would also map well to ABCs for bpo-15502. |
More on import-related terms. Given Nick's recommendation, here's a broader view, as related to the import state: sys.meta_path: One unfortunate name is "sys.path_importer_cache", which implies either a cache of "path importers" or a cache belonging to "path importer", both of which are still rather ambiguous. In light of all the above, I've attached an updated patch just for the glossary. The import system reference then goes further into the protocols that the different objects implement and so forth. |
While saying "default path importer" vs. "meta path finder" somewhat muddles the term "importer", it definitely gets the point across that PathFinder does a lot more than any other default meta path finder. While _we_ might know that import does nothing more than call a method on sys.meta_path and has no concept of sys.path and friends, most people will consider the default path importer as part of import's semantics and thus not make the distinction. IOW I like Nick's suggestion. |
On Jul 29, 2012, at 05:10 AM, Nick Coghlan wrote:
"Import system" it is.
I think I see where you and Eric are coming from on this. Actually, I don't |
Part of the problem with the import nomenclature is that PEP-302 doesn't really nail it down and mixes the terms up a bit. This is understandable considering it broke new ground in some regards. However, at this point we have a more comfortable relationship with the import system. Would it be feasible to lightly update PEP-302 to have a more concrete and consistent use of import terminology? |
On Jul 29, 2012, at 06:09 AM, Nick Coghlan wrote:
I've put an XXX in the import.rst file for this, but I probably won't get to
While I've added a mention of import_module() in several places, I don't think I would much rather add a section that goes into more detail about coarsely
I don't like the term "initialized package" (even with the Americanized What about "concrete package"? In a sense, namespace packages are virtual, so
This is getting somewhere. I like using the term "path importer" for the What we have are several default finders, one that knows how to locate frozen This seems to make for much better reading, and while I've worded it |
On Jul 30, 2012, at 09:41 PM, Brett Cannon wrote:
Thanks. This isn't quite the level I was looking for, but we can add a (I think I've improved the discussion on __package__ based on your feedback |
On Jul 31, 2012, at 12:28 AM, Eric Snow wrote:
Dang. I've grown to really like "path importer" for the thing on Thinking about Nick's suggestion then, the callables on sys.path_hooks would I think this terminology holds together well, and I think I'm going to land it
Nick put his finger on it. "sys path" implies that only sys.path is involved,
While true, it's not required in the specification, so I'd like to leave this |
Well, I'm more -0 than -1 on "path importer", though I do like "default path importer" better. As to the rest, sounds good to me. |
I think I was unclear in my previous follow-up. Here are the objects:

import path
   A list of locations (or :term:`path entries <path entry>`) that are
   searched by the :term:`path importer` for modules to import. During
   import, this list of locations usually comes from :data:`sys.path`, but
   for subpackages it may also come from the parent package's ``__path__``
   attribute.

meta path finder

path entry

path entry finder

path entry hook

path importer |
Shouldn't it be committed already? I don't see the point of refining documentation in a separate repo rather than in the main repo. |
On Jul 31, 2012, at 03:21 AM, Eric Snow wrote:
+1, although currently I am refraining from using "default" when describing
I have called these "path entry hooks"
I still call these "path entry finders". I understand the ambiguity, and
I've pulled "Loaders" out into a separate higher level section because as you |
New changeset c933ec7cafcf by Barry Warsaw in branch 'default':
New changeset d5317b8f455a by Barry Warsaw in branch 'default':
|
The import path definition is a little misleading, as sys.path is only used when None is passed in for 'path'. Otherwise 'path' is whatever a package's __path__ is set to, so technically sys.path never even comes into play except by choice of PathFinder, which simply chooses to treat None as meaning sys.path. |
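Brett's point above, that sys.path only comes into play when None is passed for the path, can be seen by calling the path importer directly. Here is a minimal sketch using `importlib.machinery.PathFinder.find_spec()` (Python 3.4+) with the stdlib `json` package as the example.

```python
import importlib
from importlib.machinery import PathFinder

json_pkg = importlib.import_module("json")

# path=None: the path importer falls back to sys.path.
top_level = PathFinder.find_spec("json")
# path=json.__path__: the package's own import path is searched instead.
submodule = PathFinder.find_spec("json.decoder", json_pkg.__path__)

print(top_level.name, top_level.submodule_search_locations is not None)  # json True
print(submodule.name)  # json.decoder
```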
On Jul 31, 2012, at 08:30 PM, Brett Cannon wrote:
Do you think the glossary entry needs to be so precise? It may be difficult |
I guess just saying it can be None depending on context would be enough. |
On Jul 31, 2012, at 02:56 PM, Eric Snow wrote:
Maybe not an update to PEP-302, but probably a big red warning that the This also points out an interesting, more general problem, with PEPs that get |
Hope I'm not too late to the bikeshed painting party; just wanted to chip in with the suggestion of "self-contained package" for non-namespace packages (i.e., a self-contained package is one that cannot be split across different sys.path entries due to its use of an __init__ module). Also, technically, namespace portions do not only contribute subpackages; they can contribute modules as well.

Another point of possible confusion: meta finders and path finders are both described as "hooks", but this terminology seems in conflict with the use of "hook" as describing a callable on path_hooks. Perhaps we could drop the term "hook" from this section, and retitle it "Import System Extensions" and say you can extend the system by writing meta finders and path entry finders. This would let the term "hook" be the exclusive property of path_hooks, which is how you extend the import system to use your custom finders.

The statement about __path__ being a list is also out-of-date; as of PEP-420, it can be an immutable, iterable object. Specification-wise, __path__ need only be a re-iterable object, and people reading its value must NOT assume that it's a list or even indexable.

The term "sys path finder" should also be replaced by "path entry finder". The former term is both incorrect and misleading, as it both implies that such a finder actually searches sys.path, and that it is exclusive to sys.path. Path entry finders are used to look for modules within a location specified by a "path entry": a string found in sys.path or in a __path__ attribute.

The term "path importer" is also horribly confusing in context... after some reflection, I suggest the following additional terminology changes:
Now we can say that you extend the import system by adding import handlers to sys.meta_path, and that one of the default handlers is the sys.path import handler, which processes imports using sys.path and module __path__ attributes. The sys.path import handler can of course in turn be extended by adding path hooks to sys.path_hooks, which are used to create module finder objects for the path entry strings found in sys.path and module __path__ attributes. A path hook must return a finder object, which implements similar methods to those of an import handler, but with some important differences. Whew. It's a bit of a mouthful, but I think that this set of terms would keep all the roles and functions clear, along with their relationships to one another. In addition, I think it provides greater clarity as to which pieces you need to extend when, why, and how. What do you think? |
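The point above that, as of PEP-420, __path__ need only be re-iterable can be demonstrated with a namespace package split across two directories. Here is a minimal sketch, assuming CPython 3.3+ and writable temporary directories. The package and module names are made up. The portions are read inside the `try` because a namespace `__path__` is recomputed from sys.path when it is iterated.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    # Two portions of one namespace package: neither directory has __init__.py.
    for root, name in ((first, "alpha"), (second, "beta")):
        package_dir = Path(root, "nsdemo_pkg")
        package_dir.mkdir()
        (package_dir / f"{name}.py").write_text(f"NAME = {name!r}\n")

    sys.path[:0] = [first, second]
    try:
        ns = importlib.import_module("nsdemo_pkg")
        alpha = importlib.import_module("nsdemo_pkg.alpha")
        beta = importlib.import_module("nsdemo_pkg.beta")
        # Read __path__ by iterating; don't assume it's a list or indexable.
        portions = list(ns.__path__)
    finally:
        del sys.path[:2]

print(isinstance(ns.__path__, list), len(portions), alpha.NAME, beta.NAME)  # False 2 alpha beta
```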
We changed quite a bit already as we tried to make everything consistent, |