Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-1635741: Port _collections extension module to multiphase initialization(PEP 489) #18066

Closed
wants to merge 2 commits into from

Conversation

shihai1991
Copy link
Member

@shihai1991 shihai1991 commented Jan 19, 2020

@rhettinger
Copy link
Contributor

rhettinger commented Jan 26, 2020

Perhaps Nick can review this one. I'm unclear on whether this PR actually makes anything better (something you can do afterwards that you couldn't do before). I was not under the impression that PEP 489 required that every single module change its initialization code.

If there is an improvement and Nick approved, I have no objection to this going forward.

@ncoghlan
Copy link
Contributor

The key benefits that multi-phase initialisation brings to a relatively straightforward extension module like collections is that it gains support for true reloading, and ceases to be a backchannel for potential communication between subinterpreters. The latter aspect is one of the pre-requisites for the long term goal of potentially moving away from sharing the GIL between subinterpreters (no shared GIL -> no shared refcounts -> no shared Python objects).

If we look at one of the types that collections defines, we see that we normally can't persuade the interpreter to re-create it, even if we're accessing the module from a subinterpreter:

>>> import collections
>>> print(hex(id(collections.deque)))
0x751a00
>>> del sys.modules["collections"]
>>> import collections
>>> print(hex(id(collections.deque)))
0x751a00
>>> from importlib import reload
>>> reload(collections)
<module 'collections' from '/home/ncoghlan/devel/cpython/Lib/collections/__init__.py'>
>>> print(hex(id(collections.deque)))
0x751a00
>>> import _xxsubinterpreters as interp
>>> si = interp.create()
>>> interp.run_string(si, "import collections; print(hex(id(collections.deque)))")
0x751a00

Even trashing the module to the point where reload doesn't work any more, still doesn't clear the hidden module state cache (the object ID only changed from above because this is a different REPL session):

>>> import collections
>>> from importlib import reload
>>> print(hex(id(collections.deque)))
0x7fc37355d080
>>> collections.__dict__.clear()
>>> vars(collections)
{}
>>> reload(collections)
Traceback (most recent call last):
 ...
AttributeError: module has no attribute '__name__'
>>> import sys
>>> del sys.modules["collections"]
>>> import collections
>>> print(hex(id(collections.deque)))
0x7fc37355d080

By contrast, once it has migrated to multi-phase initialisation, then reloading it (or loading it in a subinterpreter) will actually recreate the contents, not just the containing dictionary and module metadata.

(cc @ericsnowcurrently @encukou in case I missed something above).

ncoghlan
ncoghlan previously approved these changes Jan 26, 2020
Copy link
Contributor

@ncoghlan ncoghlan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ncoghlan
Copy link
Contributor

Regarding the codecov complaints: I believe most of the issue is the fact we don't explicitly test the error returns from module initialisation functions. That's not going to change any time soon (and likely not ever).

However, I did find a bug in our code coverage configuration, where news file snippets are being counted as uncovered: #18194

@shihai1991
Copy link
Member Author

Nick, thank you for your explanation.
The latter aspect is one of the pre-requisites for the long term goal of potentially moving away from sharing the GIL between subinterpreters (no shared GIL -> no shared refcounts -> no shared Python objects). This sentence is very helpful to me.

petr have leave a comment in #18032.

And i find the historical email have discussed this problem too:
https://mail.python.org/pipermail/python-3000/2006-April/000727.html

@ncoghlan
Copy link
Contributor

Reviewing the comments on #18032 reminded me that there is an issue impacting the collections module that would make it advisable to hold off on merging this patch: our own _Py_IDENTIFIER macro uses C static variables rather than interpreter managed storage, so while it is fine when it comes to extension module reloading, and interpreter shutdown takes care of cleaning up the static references, it's a problem for full subinterpreter support (since that requires that the string references be stored separately for each interpreter instance).

That means migrating modules that use that macro is currently a bit messy, as we don't have a subinterpreter-friendly replacement that is anywhere near as tidy. (See 5a7d2e1 for an example of converting a module away from using it).

Given that the collections module regularly uses _Py_IDENTIFIER in method implementations, that also means it can't be ported until after PEP 573 [1] has been accepted and implemented.

In the meantime, continuing to use single-phase initialisation better reflects the actual status of the module.

[1] https://www.python.org/dev/peps/pep-0573/

@ncoghlan ncoghlan dismissed their stale review January 27, 2020 13:15

I missed the _Py_IDENTIFIER issue in my initial review

@ncoghlan
Copy link
Contributor

Given the _Py_IDENTIFIER issue, and the PEP 573 dependency, I'm going to close this PR for now.

This isn't a "we'll never port the collections module to multi-phase initialisation", but rather a matter of there being enough blockers to doing it right now that it makes sense to leave it alone until more of those blocking issues have been resolved.

@ncoghlan ncoghlan closed this Jan 27, 2020
@ncoghlan
Copy link
Contributor

@shihai1991 Thanks for your work on this PR. You may find https://bugs.python.org/issue39465 interesting, as its an issue I just filed regarding the fact that _Py_IDENTIFIER has now been identified as a potential issue more than once when it comes to extension modules properly supporting subinterpreters (reloading is less of a concern, since it isn't really any different from any other form of string interning in that regard).

@shihai1991
Copy link
Member Author

@shihai1991 Thanks for your work on this PR. You may find https://bugs.python.org/issue39465 interesting, as its an issue I just filed regarding the fact that _Py_IDENTIFIER has now been identified as a potential issue more than once when it comes to extension modules properly supporting subinterpreters (reloading is less of a concern, since it isn't really any different from any other form of string interning in that regard).

@ncoghlan Of course, I'm interested in this bpo ;)
Can we continue to port some simple extension(not involving _Py_IDENTIFIERglobal static c varsPyState_FindModule(pep 573)) modules to multiphase initalization or all block issues should be resolved before all porting operations(I am not sure there have potential risk or not)?

@ncoghlan
Copy link
Contributor

If we have any modules that don't use _Py_IDENTIFIER, or hit any of the other challenges of writing extension module code that could cope with subinterpreters that don't share the GIL, porting would still make sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants