-
-
Notifications
You must be signed in to change notification settings - Fork 30.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Re-implement parts of imp in pure Python #58167
Comments
A bunch of code in Python/import.c exists purely for the imp module, but a large chunk of it does not need to be implemented in C but instead can be implemented in Python code. Once importlib is bootstrapped in then an attempt to scale back the C code should be done by re-implementing parts of imp in pure Python (either in some _imp module or directly in _importlib itself). |
Attached is a start for re-implementing imp. Still need code for cache_from_source, source_from_cache, find_module, load_module, reload, load_compiled, load_source, and load_package. |
Here's an incomplete (though essentially test-passing) patch that flips more of imp into importlib. It builds on Brett's patch. The gist is that imp.py is a minimal wrapper around importlib, which uses _imp directly. Ultimately the things implemented in Lib/importlib/_bootstrap.py should be yanked out of Python/import.c (and Python/dynload_*.c). I expect that this will wait for the bootstrap merge (bpo-2377) to land. |
Here's a listing of the places where import-ish modules are used in CPython. Once things all the import stuff currently on the table is wrapped up, I'll probably work on switching things over to using importlib directly. |
Of note for my patch is the addition of SUFFIXES and MODE to the 3 main "file loader" classes. I did this to reverse the dependence on imp.get_suffixes(). As well, a couple of extra functions are added to Python/importlib/_bootstrap.py for a similar reason, but should also be meaningful externally in their own right. load_module_from_file(), is one of these, though the current incarnation relies on a mythical new method on loaders that may not pass muster. (FWIW, I accidently included the use_of_imp.txt file in the patch and don't intend it to get included) |
Please leave distutils unchanged; we don’t clean it up or improve it in any way, it’s bugfixes only. Changing packaging is fine; I’ll just keep using the imp module in the distutils2-python3 backport. |
Sounds good. It will be a while before we get there, but at that point I was already planning on getting in touch with the maintainers of relevant modules before messing with them. |
(Éric, I'd reply in the review if reitveld recognized my current username -- see http://psf.upfronthosting.co.za/roundup/meta/issue402. FYI, I also did not get an email for your review, which is likely related to that issue.) Regarding __all__, I didn't realize that about being a tuple. I'll fix it. As to the "dm" in the SOABI value, that is definitely something we'll have to factor in. There are actually a few such places in the current patch that rely on problematic data. It's either hidden in C or not available during bootstrapping. That's something we'll have to iron out before the solution for this issue gets committed. |
I think this should be a blocker for 3.3. |
Just because I was thinking about it, I wonder if necessarily all the frozen stuff really needs to stay in import.c. I mean a frozen module is really just an entry in an array of structs that has a name of an char*[]. I don't see why one couldn't simply have a get_frozen_bytes() method to convert that char*[] into a bytes object and use that to construct a module in pure Python code. |
New changeset d777f854a66e by Brett Cannon in branch 'default': |
OK, so I have started to check this stuff in, but I think it's best to do it piecemeal. Going forward I would like to commit in units of functions being replaced, and prioritize stuff that cuts out C code (e.g. the load_*() methods, find_module(), etc.). That way it's clear that progress is being made. Obviously the best way to tell if code is hanging on just because of imp is to comment out the interface code and see what your compiler complains about in terms of dead code (or at least clang is good at this). |
It looks like in order to get a clear sign of what it will take to remove various parts of import.c, imp.load_module() needs to go along with imp.load_package() (since they call each other in the C code). You also have to take care of imp.reload(), but I am simplifying the C code greatly and will take care of referencing other code in the module. |
New changeset 4dce3afc392c by Brett Cannon in branch 'default': |
I am seeing how this is going to go down. the load_dynamic, load_source, etc. family of functions are simply dispatched to by load_module(). So to keep some semblance of backwards-compatibility, each of those modules need to be implemented and then have load_module() simply dispatch to them based on the "type" of module it is. |
New changeset 2df37938b8e1 by Brett Cannon in branch 'default': |
New changeset 4256df44023b by Brett Cannon in branch 'default': |
From Eric Smith on python-dev:
Should this be: to match:
|
New changeset 3b5b4b4bb43c by Brett Cannon in branch 'default': |
This is a mostly-working sketch of find_module() in imp.py. I've run out of time tonight, but wanted to get it up in case it's useful to anyone else (a.k.a Brett). If nobody's touched it before then, I'll finish it up tomorrow. |
I doubt I will beat you to it, Eric, but I did want to say that your overall design was what I had in my head when I was thinking about how to re-implement the function, so keep at it! |
New changeset 66bd85bcf916 by Brett Cannon in branch 'default': |
2 "problems" with that last patch:
|
rather, tokenize.detect_encoding() |
You could change Lib/imp.py to have As for Lib/test/badsyntax_pep3120.py, it *does* have a source encoding of UTF-8 since it does not explicitly specify an encoding. Based on the name I'm assuming the file itself has bad UTF-8 characters, but that doesn't mean that the file is not supposed to be interpreted as UTF-8. |
On the _frozen_importlib point, I'm fine with that. It was just counter-intuitive that the importlib._bootstrap functions were returning loaders whose classes weren't also defined there. I'll have another look at that test tonight and run the patch through the full test suite, but otherwise I think find_module() is basically done. |
Looking it over, I'm confident that tokenizer.detect_encoding() does not raise a SyntaxError where PyTokenizer_FindEncodingFilename() does. I've run out of time tonight, but I'll look at it more tomorrow. Once find_module() is done, I'd like to move on to reload(), which I expect will be pretty straightforward at this point. Then the feasibility of bpo-14618 should be clear. |
New changeset c820aa9c0c00 by Brett Cannon in branch 'default': |
OK, I'm waiting on issue bpo-14657 (avoiding the _frozen_importlib/importlib dichotomy) is resolved before I document the new imp.extension_suffixes() as it would be good to know where I should pull in the source and bytecode suffixes. I think this only leaves get_magic() and get_tag() as the only things to move + trying to figure out how to handle PyImport_ExecCodeModuleObject() and its grip on various bits of C code. |
New changeset 22b0689346f9 by Brett Cannon in branch 'default': |
New changeset b81ddaf0db47 by Brett Cannon in branch 'default': |
Question on this one: <snip>
@@ -126,7 +131,7 @@ def load_compiled(name, pathname, file=N
# XXX deprecate
def load_package(name, path):
if os.path.isdir(path):
- extensions = _bootstrap._SOURCE_SUFFIXES + [_bootstrap._BYTECODE_SUFFIX]
+ extensions = machinery.SOURCE_SUFFIXES[:] + [machinery.BYTECODE_SUFFIXES]
for extension in extensions:
path = os.path.join(path, '__init__'+extension)
if os.path.exists(path):
</snip> Should that be the following? extensions = machinery.SOURCE_SUFFIXES[:] + machinery.BYTECODE_SUFFIXES[:] Also, why the "[:]"? Finally, in a couple spots you use the first element of the list (like the old case of "machinery.SOURCE_SUFFIXES[0]" in source_from_cache() and the new one in find_module()). Will this be a problem where the source file has one of the other suffixes? I'm not sure it's a big enough deal for the moment to worry about, but thought I'd ask. :) |
Yes. And the [:] copies the list so I don't accidentally mutate in-place (did that once already, so now I'm just being paranoid; I doubt I need it hardly anywhere I don't do +=). As for using SOURCE_SUFFIXES[0], I'm done following the ported code. I really don't care if .pyw isn't supported in that case. But in general people shouldn't assume just .py (or .py at all). |
New changeset 626d5c6fbd95 by Brett Cannon in branch 'default': |
New changeset 7bf8ac742d2f by Brett Cannon in branch 'default': |
I have importlib.find_loader() coded up, but getting rid of find_module() is going to be a pain thanks to pkgutil, multiprocessing.forking, modulefinder, and idlelib.EditorWindow. |
New changeset 59870239813c by Brett Cannon in branch 'default': |
Need to update the docstrings for imp.find_module() and load_module() to mention the functions are deprecated since there is no specific raised deprecation, only documented deprecation. |
This may be a known problem, but imp.load_module fails when trying to load a C extension. This is with the 3.3a4 release on Windows, though I suspect the problem is cross-platform. |
Does it work in Python 3.2, John? If it does then it's just an oversight thanks to the lack of tests in test_imp and it shouldn't be too difficult to support. But do realize I have deprecated the function. =) |
On 6/5/12 1:08 PM, Brett Cannon wrote:
Attached is a short test file to demonstrate this. It assumes the I've already rewritten my code to use importlib when running in Python 3.3. |
You're right about the solution, John. I (or someone else) just needs to code it up. |
New changeset 034c814eb187 by Brett Cannon in branch 'default': |
Which parts are still missing here? |
Here are the things I'm aware of:
From my discussions with Brett, there was a laundry list he was aiming to get in for 3.3 (for import stuff in general). I know much of it got done, but not all of it. Of the remainder, I'm not sure how much of that he still wants to see get in. Some of it pertains to this issue, though not all. |
OK, sounds like none of it would block beta1. |
Sorry, been on vacation the past week. Georg is right, nothing left is a b1 blocker, just stuff I want to get in before 3.3 ships. |
Moving back to blocker for beta2. |
New changeset efb5e6ab10f4 by Brett Cannon in branch 'default': |
Since the final issue is just a nicety, I am considering this issue closed. |
hurray! |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: