Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C API] Avoid accessing PyObject and PyVarObject members directly: add Py_SET_TYPE() and Py_IS_TYPE(), disallow Py_TYPE(obj)=type #83754

Closed
vstinner opened this issue Feb 6, 2020 · 96 comments

Comments

@vstinner
Copy link
Member

@vstinner vstinner commented Feb 6, 2020

BPO 39573
Nosy @vstinner
PRs
  • #18388
  • #18389
  • #18390
  • #18391
  • #18392
  • #18393
  • #18394
  • #18398
  • #18400
  • #18402
  • #18411
  • #18419
  • #18488
  • #18496
  • #18507
  • #18508
  • #18521
  • #18601
  • #18789
  • #18798
  • #18799
  • #18804
  • #18809
  • #19882
  • #19975
  • #20290
  • #20391
  • #20429
  • #20610
  • #21262
  • #21433
  • #23366
  • #23375
  • #26493
  • #26550
  • #26596
  • #28128
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-09-08.16:32:15.871>
    created_at = <Date 2020-02-06.23:07:12.131>
    labels = ['expert-C-API', '3.11']
    title = '[C API] Avoid accessing PyObject and PyVarObject members directly: add Py_SET_TYPE() and Py_IS_TYPE(), disallow Py_TYPE(obj)=type'
    updated_at = <Date 2022-01-20.00:24:00.180>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2022-01-20.00:24:00.180>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-09-08.16:32:15.871>
    closer = 'vstinner'
    components = ['C API']
    creation = <Date 2020-02-06.23:07:12.131>
    creator = 'vstinner'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 39573
    keywords = ['patch']
    message_count = 96.0
    messages = ['361513', '361514', '361515', '361516', '361517', '361518', '361519', '361522', '361523', '361526', '361527', '361529', '361531', '361540', '361549', '361555', '361557', '361590', '361593', '361607', '361611', '361626', '361631', '361639', '361904', '361960', '361961', '361963', '361964', '361965', '361971', '361977', '361987', '361988', '362033', '362034', '362133', '362134', '362166', '362212', '362216', '362445', '363345', '363494', '363564', '365690', '366473', '366493', '368047', '369896', '369898', '370074', '370303', '370638', '370663', '370665', '370666', '370671', '370729', '370902', '370932', '372308', '373460', '379675', '379679', '379680', '379757', '379759', '381337', '381345', '381365', '381374', '381403', '381404', '382260', '382534', '382539', '382780', '382781', '382783', '394954', '394971', '395018', '395205', '395206', '395287', '395323', '395536', '401365', '401370', '401378', '401395', '401396', '401399', '403252', '410995']
    nosy_count = 1.0
    nosy_names = ['vstinner']
    pr_nums = ['18388', '18389', '18390', '18391', '18392', '18393', '18394', '18398', '18400', '18402', '18411', '18419', '18488', '18496', '18507', '18508', '18521', '18601', '18789', '18798', '18799', '18804', '18809', '19882', '19975', '20290', '20391', '20429', '20610', '21262', '21433', '23366', '23375', '26493', '26550', '26596', '28128']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue39573'
    versions = ['Python 3.11']

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Feb 6, 2020

    Today, CPython is leaking too many implementation through its public C API. We cannot easily change the "default" C API, but we can enhance the "limited" C API (when Py_LIMITED_API macro is defined). Example of leaking implementation details: memory allocator, garbage collector, structure layouts, etc.

    Making PyObject an opaque structure would allow in the long term of modify structures to implement more efficient types (ex: list specialized for small integers), and it can prepare CPython to experiment tagged pointers.

    Longer rationale:

    I propose to incremental evolve the existing limited C API towards opaque PyObject, by trying to reduce the risk of breakage.

    We may test changes on PyQt which uses the limited C API.

    Another idea would be to convert some C extensions of the standard library to the limited C API. It would ensure that the limited C API contains enough functions to be useful, but would also notify us directly if the API is broken.

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Feb 6, 2020

    Another idea would be to convert some C extensions of the standard library to the limited C API. It would ensure that the limited C API contains enough functions to be useful, but would also notify us directly if the API is broken.

    First issues that I met when I tried that:

    * C code generated by Argument Clinic is incompatible the limited C API: METH_FASTCALL, _PyArg_CheckPositional(), static _PyArg_Parser, etc. are excluded from the limited C API.
    * PyTypeObject is opaque and so it's not possible to implement a deallocator function (tp_dealloc) which calls tp_free like:
      Py_TYPE(self)->tp_free((PyObject*)self);
    * _Py_IDENTIFIER() is not part of the limited C API

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Feb 6, 2020

    New changeset a93c51e by Victor Stinner in branch 'master':
    bpo-39573: Use Py_REFCNT() macro (GH-18388)
    a93c51e

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Feb 7, 2020

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Feb 7, 2020

    In the limited C API, Py_REFCNT() should be converted to:

    static inline Py_ssize_t _Py_REFCNT(const PyObject *ob)
    { return ob->ob_refcnt; }
    #define Py_REFCNT(ob) _Py_REFCNT(_PyObject_CAST(ob))

    It would enforce the usage of newly added Py_SET_REFCNT() (PR 18389) and advertise that the object is not modified (const).

    That would only be the first step towards a really opaque Py_REFCNT() function.

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Feb 7, 2020

    TODO: Add Py_IS_TYPE() macro:

    #define Py_IS_TYPE(ob, tp) (Py_TYPE(ob) == (tp)) 

    For example, replace:

        #define PyBool_Check(x) (Py_TYPE(x) == &PyBool_Type) 

    with:

        #define PyBool_Check(x) Py_IS_TYPE(x, &PyBool_Type)

    IMHO it makes the code more readable.

    nascheme@c156300

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Feb 7, 2020

    New changeset c86a112 by Victor Stinner in branch 'master':
    bpo-39573: Add Py_SET_REFCNT() function (GH-18389)
    c86a112

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Feb 7, 2020

    New changeset 0d76d2b by Victor Stinner in branch 'master':
    bpo-39573: Use Py_TYPE() in abstract.c (GH-18390)
    0d76d2b

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Feb 7, 2020

    Py_TYPE() is commonly used to render the type name in an error message. Example:

    PyErr_Format(PyExc_TypeError,
                 "cannot convert '%.200s' object to bytearray",
                 Py_TYPE(arg)->tp_name);

    This code has multiple issues:

    • It truncates type name to 200 characters: there is no Python exception, not even a marker to indicate that the string has been truncated
    • It's only the short name: the qualified name (tp_qualname) would be more helpful. The best would be to generate the fully qualified name: module + qualname.
    • Py_TYPE() returns a borrowed reference which is causing multiple issues: https://pythoncapi.readthedocs.io/bad_api.html#borrowed-references

    In September 2018, I created bpo-34595: "PyUnicode_FromFormat(): add %T format for an object type name". But there was disagreement, so I rejected my change.

    I started "bpo-34595: How to format a type name?" thread on python-dev:

    I didn't continue this work (until now), since it wasn't my priority.

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Feb 7, 2020

    New changeset a102ed7 by Victor Stinner in branch 'master':
    bpo-39573: Use Py_TYPE() macro in Python and Include directories (GH-18391)
    a102ed7

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Feb 7, 2020

    Make PyObject an opaque structure is also a first step towards the more ambitious project "HPy" project which is fully opaque:
    https://github.com/pyhandle/hpy

    This API is written from scratch and currently implemented on top on the existing C API.

    The following article is a nice introduction to the overall idea:
    https://morepypy.blogspot.com/2019/12/hpy-kick-off-sprint-report.html

    From my point of view, the long term goal would be to get better performance on PyPy and having a single API for C extension which would be efficient on all Python implementations (not only CPython).

    Currently, the C API is not only a performance issue to run C extensions on PyPy. It's also an issue in CPython. Because the C API leaks too many implementation details, we cannot experiment optimizations.

    See also: https://pythoncapi.readthedocs.io/rationale.html

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Feb 7, 2020

    New changeset 58ac700 by Victor Stinner in branch 'master':
    bpo-39573: Use Py_TYPE() macro in Objects directory (GH-18392)
    58ac700

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Feb 7, 2020

    New changeset daa9756 by Victor Stinner in branch 'master':
    bpo-39573: Use Py_TYPE() macro in Modules directory (GH-18393)
    daa9756

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Feb 7, 2020

    New changeset d2ec81a by Victor Stinner in branch 'master':
    bpo-39573: Add Py_SET_TYPE() function (GH-18394)
    d2ec81a

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Feb 7, 2020

    To make PyObject opaque, we would have to convert Py_INCREF() and Py_DECREF() to opaque function calls. Example:

    #define Py_XINCREF(op) Py_IncRef(op)
    #define Py_XDECREF(op) Py_DecRef(op)

    Benchmarks should be run to measure to overhead and balance the advantages and drawbacks.

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Feb 7, 2020

    Would a Py_TYPE_IS() macro help code readability?

    For example:
        #define Future_CheckExact(obj) (Py_TYPE(obj) == &FutureType)
    would become:
        #define Future_CheckExact(obj) (Py_TYPE_IS(obj, &FutureType))

    Py_TYPE_IS() would be more efficient for tagged pointers.

    I'm not sure about the macro name. Neil used Py_IS_TYPE(obj, type).

    Note: Py_TYPE_EQ(obj, type) name sounds confusing since the first parameter is an object, whereas the second one is a type.

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Feb 7, 2020

    New changeset c65b320 by Victor Stinner in branch 'master':
    bpo-39573: Use Py_TYPE() macro in object.c (GH-18398)
    c65b320

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Feb 7, 2020

    New changeset b10dc3e by Victor Stinner in branch 'master':
    bpo-39573: Add Py_SET_SIZE() function (GH-18400)
    b10dc3e

    @serhiy-storchaka
    Copy link
    Member

    @serhiy-storchaka serhiy-storchaka commented Feb 7, 2020

    You have merged so much PRs today. What they do?

    PyObject cannot just be made an opaque structure. The user code reads and writes its fields directly and via macros. This change would break working code.

    We can encourage the user code to prepare to making PyObject an opaque structure. We need to provide a stable C API for access of PyObject fields for this. Note that there is a performance penalty of using functions instead of direct access, so you should have very good reasons to do this.

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Feb 7, 2020

    You have merged so much PRs today. What they do?

    I merged changes which prepares CPython code base to make PyObject opaque. I only merged changes which should have no impact on performance, but prepare the API to make the structure opaque.

    Right now, Py_SET_REFNCT() stills access directly to PyObject.ob_refcnt. But it becomes possible to make Py_SET_REFNCT() an opaque function call.

    Do you see any issue with the changes that I already merged? Using PGO+LTO, static inline functions should be as efficient as the previous code using Py_REFCNT() & cie macros.

    PyObject cannot just be made an opaque structure. The user code reads and writes its fields directly and via macros. This change would break working code.

    I'm trying to modifying the limited C API to make it possible: all access to PyObject fields should go through macros or function calls. The question is now how which fields are accessed and how.

    We can encourage the user code to prepare to making PyObject an opaque structure. We need to provide a stable C API for access of PyObject fields for this.

    For the short term, I don't plan to make PyObject opaque, so I don't plan to enforce usage of Py_TYPE(), Py_SET_REFCNT(), etc.

    Note that there is a performance penalty of using functions instead of direct access, so you should have very good reasons to do this.

    Yeah, replacing Py_REFCNT() macro with an opaque function call is likely to have an impact on performance. It should be properly measure, I'm well aware of that, I already wrote it in a previous comment ;-) I don't plan to push change such right now. And I will wait for the review of my peers (like you) for such change ;-)

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Feb 7, 2020

    New changeset 60ac6ed by Victor Stinner in branch 'master':
    bpo-39573: Use Py_SET_SIZE() function (GH-18402)
    60ac6ed

    @zooba
    Copy link
    Member

    @zooba zooba commented Feb 8, 2020

    "static inline" functions are not opaque - as they get inlined into 3rd-party compiled code, we can't change anything they reference, and so the structure layout is still fixed and has to be visible to the user's compiler.

    I'm not totally against the changes, but it's worth pointing out that you aren't achieving what the issue title claims, so it's really just code cleanliness (and/or introducing macro-users to static inline functions ;) ).

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Feb 8, 2020

    "static inline" functions are not opaque

    I'm well aware of that :-) But once the CPython code base will stop accessing directly PyObject fields directly, it would become possible to experiment changing PyObject layout, at least testing it in CPython.

    First changes are just to prepare the code base to experiment the real change. But as Serhiy pointed out, the second part will have an impact on performance and so should be carefully benchmarked to balance advantages and drawbacks, even if it's only done in the limited C API.

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Feb 8, 2020

    New changeset 7f6f7ee by Dong-hee Na in branch 'master':
    bpo-39573: Use Py_TYPE() macro in ctypes.h (GH-18411)
    7f6f7ee

    @corona10
    Copy link
    Member

    @corona10 corona10 commented Feb 12, 2020

    FYI, I am working on to add Py_IS_TYPE macro. :)

    @shihai1991
    Copy link
    Member

    @shihai1991 shihai1991 commented Feb 13, 2020

    Hi, guys. Is there value in adding PyNone_Check macro?(_PyNone_Type is not esposed to CAPI directly, so I am not sure about it)
    If the answer is 'yes', i can add it ;)

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Feb 13, 2020

    Hi, guys. Is there value in adding PyNone_Check macro?

    "obj == Py_None" is a very common pattern.

    You have check how it is done in HPy: https://github.com/pyhandle/hpy

    See also bpo-39511: "[subinterpreters] Per-interpreter singletons (None, True, False, etc.)".

    @shihai1991
    Copy link
    Member

    @shihai1991 shihai1991 commented Feb 13, 2020

    "obj == Py_None" is a very common pattern.
    You have check how it is done in HPy: https://github.com/pyhandle/hpy
    See also bpo-39511: "[subinterpreters] Per-interpreter singletons (None, >True, False, etc.)".

    Thanks, I will check it.

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Nov 19, 2020

    And breezy:
    https://bugzilla.redhat.com/show_bug.cgi?id=1890880 (not yet reported upstream)

    Oh, I didn't notice that this project is broken by the Py_REFCNT() change. I expected it to be broken by the Py_TYPE() change as others.

    Should we revert the Py_REFCNT() change as well? So far, breezy is the only impacted project, and the fix should be simple, no?

    breezy uses "Py_REFCNT(self) -= 1;" instead of "Py_DECREF(self);" in its StaticTuple_Intern() function:

    static StaticTuple *
    StaticTuple_Intern(StaticTuple *self)
    {
        PyObject *canonical_tuple = NULL;
    
        if (_interned_tuples == NULL || _StaticTuple_is_interned(self)) {
            Py_INCREF(self);
            return self;
        }
        /* SimpleSet_Add returns whatever object is present at self
         * or the new object if it needs to add it.
         */
        canonical_tuple = SimpleSet_Add(_interned_tuples, (PyObject *)self);
        if (!canonical_tuple) {
            // Some sort of exception, propogate it.
            return NULL;
        }
        if (canonical_tuple != (PyObject *)self) {
            // There was already a tuple with that value
            return (StaticTuple *)canonical_tuple;
        }
        self->flags |= STATIC_TUPLE_INTERNED_FLAG;
        // The two references in the dict do not count, so that the StaticTuple
        // object does not become immortal just because it was interned.
        Py_REFCNT(self) -= 1;
        return self;
    }

    But it also uses "Py_REFCNT(self) = 2;" to "revive dead object temporarily for Discard".

    static void
    StaticTuple_dealloc(StaticTuple *self)
    {
        int i, len;
    
        if (_StaticTuple_is_interned(self)) {
            /* revive dead object temporarily for Discard */
            Py_REFCNT(self) = 2;
            if (SimpleSet_Discard(_interned_tuples, (PyObject*)self) != 1)
                Py_FatalError("deletion of interned StaticTuple failed");
            self->flags &= ~STATIC_TUPLE_INTERNED_FLAG;
        }
        len = self->size;
        for (i = 0; i < len; ++i) {
            Py_XDECREF(self->items[i]);
        }
        Py_TYPE(self)->tp_free((PyObject *)self);
    }

    It sounds like an optimization using a set of "interned" tuples. Maybe to reduce the memory footprint.

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Nov 19, 2020

    And breezy:
    https://bugzilla.redhat.com/show_bug.cgi?id=1890880 (not yet reported upstream)

    I reported the issue to breezy upstream:
    https://bugs.launchpad.net/brz/+bug/1904868

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Dec 1, 2020

    I created the https://github.com/pythoncapi/upgrade_pythoncapi project which updates automatically a C extension to newer C API. For example, it uses Py_TYPE(), Py_SIZE(), Py_REFCNT(), Py_SET_SIZE(), Py_SET_SIZE() and Py_SET_REFCNT() functions.

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Dec 4, 2020

    Status:

    • Py_SET_REFCNT(), Py_SET_TYPE() and Py_SET_SIZE() functions added to Python 3.9.
    • Python and Cython have been modified to use Py_TYPE(), Py_SET_REFCNT(), Py_IS_TYPE(), etc.
    • pythoncapi_compat.h header file has been created to provide new functions to Python 3.6: https://github.com/pythoncapi/pythoncapi_compat
    • Script has been created to upgrade C extensions to add support for Python 3.10 without losing support for old Python versions: https://github.com/pythoncapi/pythoncapi_compat
    • PEP-620 "Hide implementation details from the C API" written
    • Py_TYPE() and Py_SIZE() were converted to a static inline function to deny "Py_TYPE(obj) = type;" syntax, but this change has been reverted during Python 3.10 development cycle since it broke too many C extension modules. (msg381337)

    TODO:

    • Maybe add a new formatter for type names (msg361523)
    • Avoid sizeof(PyObject) in PyType_FromSpec() (msg366473)

    The purpose of this issue is only to fix the API part. Replacing static inline functions with opaque function calls (stable ABI) is not in the scope of this issue.

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Dec 4, 2020

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Dec 9, 2020

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Dec 9, 2020

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Dec 9, 2020

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Jun 2, 2021

    Most projects broken by Py_TYPE and Py_SIZE changes have been fixed since last year. I succeeded to use my new pythoncapi_compat project on multiple C extensions. So I propose again to convert Py_TYPE and Py_SIZE macros to static inline functions: #26493

    I also proposed to promote the pythoncapi_compat project in the "C API: Porting to Python 3.10" section of "What's New In Python 3.10?" on python-dev:
    https://mail.python.org/archives/list/python-dev@python.org/thread/KHDZGCNOYEDUTSPAATUDP55ZSSQM5RRC/

    I already announced the pythoncapi_compat project on the capi-sig last December:
    https://mail.python.org/archives/list/capi-sig@python.org/thread/LFLXFMKMZ77UCDUFD5EQCONSAFFWJWOZ/

    @corona10
    Copy link
    Member

    @corona10 corona10 commented Jun 3, 2021

    So I propose again to convert Py_TYPE and Py_SIZE macros to static inline functions

    +1

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Jun 3, 2021

    New changeset f3fa63e by Victor Stinner in branch 'main':
    bpo-39573: Py_TYPE becomes a static inline function (GH-26493)
    f3fa63e

    @Fidget-Spinner
    Copy link
    Member

    @Fidget-Spinner Fidget-Spinner commented Jun 6, 2021

    @victor, git bisect tells me the change f3fa63e caused test_exceptions.ExceptionTests.test_recursion_in_except_handler to stack overflow only on windows debug builds. 3 windows buildbots using python debug mode is affected. Python compiled with release mode is *not* affected and passes the test. Here's an example error on one of the buildbots:

    https://buildbot.python.org/all/#/builders/596/builds/354/steps/4/logs/stdio

    I can also reproduce this locally. I tracked this issue down after a recursion in AST also caused a stack overflow, see my message here:
    https://bugs.python.org/msg395172

    TLDR: Windows builds seems to set stack size to 2MB, on *nix it's probably higher (usually 8MB). I suspect the static inline functions are not being inlined in windows debug builds, so every function call adds to the stack. In that message I proposed to increase the stack size on windows but there are some concerns (see msg395177). What do you think?

    @WildCard65
    Copy link
    Mannequin

    @WildCard65 WildCard65 mannequin commented Jun 6, 2021

    MSVC by default disables method inlining (/Ob0) when '/Od' is specified on the command line while the optimization options specify '/Ob2'.

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Jun 7, 2021

    Ken Jin: Please open a separated issue for test_exceptions.test_recursion_in_except_handler(). It's not directly related to marking PyObject opaque, as William Pickard explained.

    See my notes on the stack size and stack overflow on a recursion error on Windows:
    https://pythondev.readthedocs.io/unstable_tests.html#unlimited-recursion

    @pablogsal
    Copy link
    Member

    @pablogsal pablogsal commented Jun 8, 2021

    New changeset 6d518bb by Pablo Galindo in branch 'main':
    bpo-44348: Revert "bpo-39573: Py_TYPE becomes a static inline function (GH-26493)" (GH-26596)
    6d518bb

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Jun 10, 2021

    See also bpo-44378: "Py_IS_TYPE(): cast discards ‘const’ qualifier from pointer target type".

    If Py_TYPE() is converted again to a static inline function which takes a "const PyObject*" type, Py_IS_TYPE() can be modified again at the same time to use Py_TYPE().

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Sep 8, 2021

    New changeset cb15afc by Victor Stinner in branch 'main':
    bpo-39573: Py_TYPE becomes a static inline function (GH-28128)
    cb15afc

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Sep 8, 2021

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Sep 8, 2021

    At commit cb15afc, I am able to rename PyObject members (to make sure that the structure is not accessed directly), I only had to modify header files:

    • Py_REFCNT(), Py_SET_REFCNT()
    • Py_INCREF(), Py_DECREF()
    • Py_TYPE(), Py_SET_TYPE()
    • Py_IS_TYPE()

    And just two more C files which corner cases:

    --

    I did the same with PyVarObject, rename the ob_size member. I had to modify header files:

    • Py_SIZE(), Py_SET_SIZE()

    But I had to modify the following function of the array module:

    static int
    array_buffer_getbuf(arrayobject *self, Py_buffer *view, int flags)
    {
        ...
        if ((flags & PyBUF_ND)==PyBUF_ND) {
            view->shape = &((PyVarObject*)self)->ob_size;
        }
        ...
        return 0;
    }

    I'm not sure how to patch this function.

    --

    This experience doesn't check usage of sizeof(PyObject) and sizeof(PyVarObject) which would break if these structures become opaque. sizeof() issues are listed in previous comments.

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Sep 8, 2021

    Oh and obviously, it's not possible possible to define structures which *include* PyObject or PyVarObject if PyObject and PyVarObject become opaque. Example:

    typedef struct {
        PyObject ob_base;
        Py_ssize_t ob_size; /* Number of items in variable part */
    } PyVarObject;

    This C code requires the PyObject structure to be fully defined (not being opaque).

    A new C API and ABI where structures *don't* include PyObject or PyVarObject should be designed to allocate their members "before" the PyObject* pointer value. Something like the current PyGC_Head structure which is excluded from PyObject and stored *before* the "PyObject*" pointer.

    Simplified code which allocates memory for an object implementin the GC protocol:

    static PyObject *
    _PyObject_GC_Malloc(size_t basicsize)
    {
        ...
        size_t size = sizeof(PyGC_Head) + basicsize;
        ...
        PyGC_Head *g = (PyGC_Head *)PyObject_Malloc(size);
        ...
        PyObject *op = (PyObject *)(g + 1);
        return op;
    }

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Sep 8, 2021

    I changed the issue title to restrict its scope: "[C API] Avoid accessing PyObject and PyVarObject members directly: add Py_SET_TYPE() and Py_IS_TYPE(), disallow Py_TYPE(obj)=type".

    Making PyObject and PyVarObject structures opaque is a broader topic which should be splited into sub-issues.

    "Py_TYPE(obj)=type;" is now disallowed. I consider that the work of this issue is now completed and I close the issue.

    Thanks everyone who help to fix these tedious issues!

    You can continue to use this issue if you need my help to adapt your C extensions to Py_SET_TYPE()/Py_SET_SIZE().

    See also the upgrade_pythoncapi.py script of the pythoncapi_compat project which helps to port your C extensions without losing support for old Python versions:
    https://github.com/pythoncapi/pythoncapi_compat

    See also the Py_TYPE() change announcement on the capi-sig list:
    https://mail.python.org/archives/list/capi-sig@python.org/thread/WGRLTHTHC32DQTACPPX36TPR2GLJAFRB/

    @vstinner vstinner added 3.11 and removed 3.9 labels Sep 8, 2021
    @vstinner vstinner closed this as completed Sep 8, 2021
    @vstinner vstinner changed the title [C API] Make PyObject an opaque structure in the limited C API [C API] Avoid accessing PyObject and PyVarObject members directly: add Py_SET_TYPE() and Py_IS_TYPE(), disallow Py_TYPE(obj)=type Sep 8, 2021
    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Oct 5, 2021

    I wrote an article about these changes:
    https://vstinner.github.io/c-api-abstract-pyobject.html

    It elaborates the rationale for making these changes.

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Jan 20, 2022

    @victor, git bisect tells me the change f3fa63e caused test_exceptions.ExceptionTests.test_recursion_in_except_handler to stack overflow only on windows debug builds.

    FYI this regression was handled last year in bpo-44348 "test_exceptions.ExceptionTests.test_recursion_in_except_handler stack overflow on Windows debug builds" and fixed at 2021-09-07 by using the trashcan mecanism in the BaseException deallocator function:

    New changeset fb30509 by Victor Stinner in branch 'main':
    bpo-44348: BaseException deallocator uses trashcan (GH-28190)
    fb30509

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Projects
    None yet
    Development

    No branches or pull requests

    9 participants