Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-phase extension module (PEP 489): don't call m_traverse, m_clear nor m_free before the module state is allocated #84005

Closed
vstinner opened this issue Mar 2, 2020 · 20 comments
Labels
3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@vstinner
Copy link
Member

vstinner commented Mar 2, 2020

BPO 39824
Nosy @ncoghlan, @scoder, @vstinner, @encukou, @Dormouse759, @pablogsal, @miss-islington, @shihai1991
PRs
  • bpo-39824: module_traverse() don't call m_traverse if md_state=NULL #18738
  • bpo-39824: Convert PyModule_GetState() to get_module_state() #19076
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-03-17.17:10:56.689>
    created_at = <Date 2020-03-02.09:52:22.175>
    labels = ['interpreter-core', '3.9']
    title = "Multi-phase extension module (PEP 489): don't call m_traverse, m_clear nor m_free before the module state is allocated"
    updated_at = <Date 2020-03-21.16:40:23.881>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2020-03-21.16:40:23.881>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-03-17.17:10:56.689>
    closer = 'vstinner'
    components = ['Interpreter Core']
    creation = <Date 2020-03-02.09:52:22.175>
    creator = 'vstinner'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 39824
    keywords = ['patch']
    message_count = 20.0
    messages = ['363145', '363146', '363151', '363152', '363153', '363155', '363591', '363599', '363708', '363912', '363938', '363939', '364280', '364318', '364348', '364454', '364455', '364616', '364738', '364758']
    nosy_count = 8.0
    nosy_names = ['ncoghlan', 'scoder', 'vstinner', 'petr.viktorin', 'Dormouse759', 'pablogsal', 'miss-islington', 'shihai1991']
    pr_nums = ['18738', '19076']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue39824'
    versions = ['Python 3.9']

    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 2, 2020

    Currently, when a module implements m_traverse(), m_clear() or m_free(), these methods can be called with md_state=NULL even if the module implements the "Multi-phase extension module initialization" API (PEP-489).

    I'm talking about these module methods:

    • tp_traverse: module_traverse() calls md_def->m_traverse() if m_traverse is not NULL
    • tp_clear: module_clear() calls md_def->m_clear() if m_clear is not NULL
    • tp_dealloc: module_dealloc() calls md_def->m_free() is m_free is not NULL

    Because of that, the implementation of these methods must check manually if md_state is NULL or not.

    I propose to change module_traverse(), module_clear() and module_dealloc() to not call m_traverse(), m_clear() and m_free() if md_state is NULL and m_size > 0.

    "m_size > 0" is an heuristic to check if the module implements the "Multi-phase extension module initialization" API (PEP-489). For example, the _pickle module doesn't fully implements the PEP-489: m_size > 0, but PyInit__pickle() uses PyModule_Create().

    See bpo-32374 which documented that "m_traverse may be called with m_state=NULL" (GH-5140).

    @vstinner vstinner added 3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Mar 2, 2020
    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 2, 2020

    The question came up while I reviewed PR 18608 which ports the audioop extension module to multiphase initialization (PEP-489):
    https://github.com/python/cpython/pull/18608/files#r386061746

    Petr Viktorin referred to m_clear() documentation which says that md_state *can* be NULL when m_clear() is called:
    https://docs.python.org/3/c-api/module.html#c.PyModuleDef.m_clear

    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 2, 2020

    There are different kinds of extension modules:

    (1) no module state (m_size <= 0): **not affected** by this change. Example: _asyncio which implements m_free() to clear global variables and free lists.

    (2) Have a module state but PyInit_xxx() calls PyModule_Create(): PyModule_Create() always allocates md_state. I understand that such extension module is **not affected** by this change.

    (3) Multi-phase extension: PyInit_xxx() function calls PyModuleDef_Init(). Such extension module **is affected** if m_traverse, m_clear or m_free() is not NULL. Example: atexit module implements m_traverse, m_clear and m_free.

    PyModuleObject structure contains Python objects like md_dict (dict), md_name (str) or md_weaklist (list):

    • module_traverse() always traverses md_dict: m_traverse() is no needed for that.
    • module_clear() always clears md_dict: m_clear() is no needed for that.
    • module_dealloc() always deallocates md_dict, md_name and md_weaklist: m_free() is no needed for that.
    • md_name is a string, it cannot be involved in a reference cycle.

    I don't think that it's possible to extend PyModuleObject structure (as done by PyListObject for PyObject) to add other Python objects: md_state is designed for that. PyModule_Create() allocates exactly sizeof(PyModuleObject) bytes.

    If an extension module has a module state, stores Python objects *outside* this state and uses m_traverse, m_clear and m_free to handle these objects: the GC will no longer be able to handle these objects before the module is executed with this change. If such extension module exists, I consider that it's ok to only handle objects stored outside the module state once the module is executed. The window between <the module is created> and <the module is executed> is very short.

    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 2, 2020

    I wrote PR 18738 to implement this change.

    @vstinner vstinner changed the title Multi-phase extension module (PEP 489): don't call m_traverse, m_clear nor m_free if md_state is NULL Multi-phase extension module (PEP 489): don't call m_traverse, m_clear nor m_free before the module state is allocated Mar 2, 2020
    @vstinner vstinner changed the title Multi-phase extension module (PEP 489): don't call m_traverse, m_clear nor m_free if md_state is NULL Multi-phase extension module (PEP 489): don't call m_traverse, m_clear nor m_free before the module state is allocated Mar 2, 2020
    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 2, 2020

    I propose to change module_traverse(), module_clear() and module_dealloc() to not call m_traverse(), m_clear() and m_free() if md_state is NULL and m_size > 0.

    Note: This change also means that m_traverse, m_clear and m_free are no longer called if md_state is set to NULL. But it never occurs in practice.

    module_dealloc() calls PyMem_FREE(m->md_state) but it doesn't set md_state to NULL. It's not needed, since the module memory is deallocated anyway.

    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 2, 2020

    Proposed PR checking if the module state is NULL:

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Mar 7, 2020

    One of the intended use cases for Py_mod_create is to return instances of ModuleType subclasses rather than straight ModuleType instances. And those are definitely legal to define:

    >>> import __main__
    >>> class MyModule(type(__main__)): pass
    ... 
    >>> m = MyModule('example')
    >>> m
    <module 'example'>

    So it isn't valid to skip calling the cleanup functions just because md_state is NULL - we have no idea what Py_mod_create might have done that needs to be cleaned up.

    It would *probably* be legitimate to skip calling the cleanup functions when there's no Py_mod_create slot defined, but then the rules for "Do I need to account for md_state potentially being NULL or not?" are getting complicated enough that the safest option for a module author is to always assume that md_state might be NULL and handle that case appropriately.

    @shihai1991
    Copy link
    Member

    we have no idea what Py_mod_create might have done that needs to be cleaned up.

    Looks like no extension module author use Py_mod_create slots now.

    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 9, 2020

    So it isn't valid to skip calling the cleanup functions just because md_state is NULL - we have no idea what Py_mod_create might have done that needs to be cleaned up.

    In your example, I don't see what m_clear/m_free would be supposed to clear/free.

    I don't see how a module can store data without md_state which would require m_clear/m_free to clear/free such data. module_clear() continue to clear Python objects of the PyModuleObject structure with PR 18738.

    Would you mind to elaborate?

    The intent of PR 18738 is to simplify the implementation of C extension modules which implements the PEP-489 and uses a module state (md_state > 0).

    @encukou
    Copy link
    Member

    encukou commented Mar 11, 2020

    If you use a module subclass that needs some additional C-level infrastructure, it would be more appropriate to override tp_clear/tp_free directly.

    IMO limiting m_clear/m_free to work just with the module state won't hurt. But it is an API change.

    @vstinner
    Copy link
    Member Author

    Proposed PR checking if the module state is NULL:

    Petr suggested to not hold these PRs with this controversial issue. I agree, so I merged these 3 PRs.

    @vstinner
    Copy link
    Member Author

    Stefan Behnel: as the 3rd author of the PEP-489, what's your call on this issue?

    @ncoghlan
    Copy link
    Contributor

    Petr's point that any subclass state should be managed in the subclass cleanup functions is a good one, so I withdraw my concern:

    • custom module subclasses should clean up like any other class instance
    • the module slots are then only needed if the module state actually gets populated, so we can safely skip them if it never even gets allocated

    @vstinner
    Copy link
    Member Author

    I updated PR 18738 to document the incompatible change in What's New In Python 3.9. Sadly, I expect that almost no third-party extension module implement the PEP-489 yet. So I expect that little or no third-party code is impacted in pratice.

    the module slots are then only needed if the module state actually gets populated, so we can safely skip them if it never even gets allocated

    That's also my understanding.

    custom module subclasses should clean up like any other class instance

    That sounds like a reasonable compromise to me.

    @pablogsal
    Copy link
    Member

    I think Cython makes use of PEP-489 so unless I am missing something all generated extensions use PEP-489 structures.

    @vstinner
    Copy link
    Member Author

    New changeset 5b1ef20 by Victor Stinner in branch 'master':
    bpo-39824: module_traverse() don't call m_traverse if md_state=NULL (GH-18738)
    5b1ef20

    @vstinner
    Copy link
    Member Author

    Thanks Petr and Nick for the review ;-)

    Pablo Galindo Salgado:

    I think Cython makes use of PEP-489 so unless I am missing something all generated extensions use PEP-489 structures.

    Alright. I still consider that my change is correct and will no harm anyone ;-)

    @miss-islington
    Copy link
    Contributor

    New changeset 13397ee by Hai Shi in branch 'master':
    bpo-39824: Convert PyModule_GetState() to get_module_state() (GH-19076)
    13397ee

    @scoder
    Copy link
    Contributor

    scoder commented Mar 21, 2020

    I think Cython makes use of PEP-489 so unless I am missing something all generated extensions use PEP-489 structures.

    Cython doesn't make complete use of PEP-489 yet, specifically not of the module state feature (nor module subclasses). This change looks good from my side. Good idea, Victor.

    @vstinner
    Copy link
    Member Author

    Cython doesn't make complete use of PEP-489 yet, specifically not of the module state feature (nor module subclasses). This change looks good from my side. Good idea, Victor.

    Thanks for the confirmation Stefan ;-)

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    This issue was closed.
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs)
    Projects
    None yet
    Development

    No branches or pull requests

    7 participants