C API: Add PyObject_AsObjectArray() function: get tuple/list items as PyObject** array #106593

@vstinner

The Python C API has an efficient API to access tuple/list items:

  • seq = PySequence_Fast(obj, "error message");
  • size = PySequence_Fast_GET_SIZE(seq);
  • item = PySequence_Fast_GET_ITEM(seq, index);
  • items = PySequence_Fast_ITEMS(seq); -- then you can use items[0], items[1], ...
  • Py_DECREF(seq); -- release the "view" on the tuple/list

Problem: if obj is not a tuple or a list, PySequence_Fast() is inefficient because it creates a temporary list. It is also not possible to implement an "object array view" protocol in 3rd party C extension types.

The other problem is that the &PyTuple_GET_ITEM(tuple, 0) and &PyList_GET_ITEM(list, 0) code used to get direct access to an object array gives no clear control over how long the array remains valid. The returned pointer becomes a dangling pointer if the tuple/list is destroyed in the meantime.

I propose designing a new more generic API:

PyAPI_FUNC(int) PySequence_AsObjectArray(
    PyObject *,
    PyResource *res,
    PyObject ***parray,
    Py_ssize_t *psize);

The API gives a PyObject** array and its Py_ssize_t size, and relies on a new PyResource API to "release the resource" (the view on this sequence).

The PyResource API is proposed separately: see issue #106592.

Example of usage:

void func(PyObject *seq)
{
    PyResource res;
    PyObject **items;
    Py_ssize_t nitem;
    if (PySequence_AsObjectArray(seq, &res, &items, &nitem) < 0) {
        if (PyErr_ExceptionMatches(PyExc_TypeError)) {
            PyErr_SetString(PyExc_TypeError, "items() returned non-iterable");
        }
        return;  // the exception stays set for the caller
    }
    if (nitem != 2) {
        PyErr_SetString(PyExc_TypeError,
                        "items() returned item which size is not 2");
        PyResource_Release(&res);
        return;
    }

    // The sequence may be modified or cleared by the code below (for example
    // while accessing __abstractmethod__), so keep strong references to the
    // key and the value before releasing the view.
    PyObject *key = Py_NewRef(items[0]);
    PyObject *value = Py_NewRef(items[1]);
    PyResource_Release(&res);

    // ... use key and value ...
    
    Py_DECREF(key);
    Py_DECREF(value);
}

This design is more generic: later, we can add a protocol to let a custom type implement its own "object array view" and implement the "release function" with arbitrary code, so it doesn't have to rely on PyObject reference counting. For example, a view can be a memory block allocated on the heap, and the release function would just free that memory.
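
To make the heap-allocated view idea concrete, here is a minimal sketch in plain C, with no dependency on Python.h. All names in it (Resource, resource_release, Node, list_as_array) are hypothetical stand-ins invented for illustration. They are not the proposed PyResource or PySequence_AsObjectArray definitions. The sketch assumes a custom container (a linked list) whose items are not stored contiguously, so the view has to be a temporary array, and releasing the view only frees that array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the proposed PyResource: a callback plus the
   data it needs to release a view. Not the actual CPython definition. */
typedef struct {
    void (*release_func)(void *data);
    void *data;
} Resource;

static void resource_release(Resource *res)
{
    assert(res != NULL);
    if (res->release_func != NULL) {
        res->release_func(res->data);
    }
    /* Reset the fields so releasing twice is harmless. */
    res->release_func = NULL;
    res->data = NULL;
}

/* Hypothetical custom container: a singly linked list, so its items are
   not contiguous and cannot be exposed directly as an array. */
typedef struct Node {
    void *item;
    struct Node *next;
} Node;

/* Build a temporary heap-allocated array view of the list.
   Return 0 on success, -1 on memory allocation failure. */
static int list_as_array(const Node *head, Resource *res,
                         void ***parray, size_t *psize)
{
    size_t n = 0;
    for (const Node *node = head; node != NULL; node = node->next) {
        n++;
    }

    /* Allocate at least one slot so an empty list is not mistaken
       for an allocation failure. */
    void **array = malloc((n ? n : 1) * sizeof(void *));
    if (array == NULL) {
        return -1;
    }
    size_t i = 0;
    for (const Node *node = head; node != NULL; node = node->next) {
        array[i++] = node->item;
    }

    res->release_func = free;  /* releasing the view just frees the block */
    res->data = array;
    *parray = array;
    *psize = n;
    return 0;
}
```

A caller would use it like the example above: call list_as_array(), read items[0] .. items[nitem - 1], then call resource_release(). No reference counting is involved in this sketch, and the array only holds borrowed pointers to the items.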

Providing such a protocol is out of scope for this issue. Maybe we can reuse the Py_buffer protocol for that.
