Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C API: Add PyResource API: generic API to "close a resource" #106592

Closed
vstinner opened this issue Jul 10, 2023 · 13 comments
Closed

C API: Add PyResource API: generic API to "close a resource" #106592

vstinner opened this issue Jul 10, 2023 · 13 comments
Labels
topic-C-API type-feature A feature request or enhancement

Comments

@vstinner
Copy link
Member

vstinner commented Jul 10, 2023

Currently, the Python C API is designed to fit exactly the current CPython implementation. It makes CPython changes really complicated, and it forces other Python implementation to mimick exactly the exact CPython implementation, likely missing nice optimization opportunities.

Example: PyUnicode_AsUTF8() relies on the fact that a Unicode object caches the UTF-8 encoded string, and the returned pointer becomes a dangling pointer once the Python str object is deleted. This API is unsafe.

Example: PySequence_Fast() is designed for tuple and list types, but 3rd party C extensions cannot implement their own "object arrary view" protocol, the API returns a Python object and expects that Py_DECREF(result); will immediately delete the "fast sequence". The function is inefficient for 3rd party types which could directly provide an "object array view", since currently a temporary Python list is created.

I would like to decouple the C API from its implementation: have a more generic API to "close a function".

typedef struct {
    void (*close_func) (void *data);
    void *data;
} PyResource;

PyAPI_FUNC(void) PyResource_Close(PyResource *res);

#ifdef Py_BUILD_CORE
PyAPI_FUNC(void) _PyResource_DECREF(void *data);
#endif

Example of an hypothetical PyUnicode_AsUTF8Resource() API which gives the caller control on when the returned string is valid.

void _PyResource_DECREF(void *data)
{
    PyObject *obj = _PyObject_CAST(data);
    Py_DECREF(obj);
}

const char *
PyUnicode_AsUTF8Resource(PyObject *unicode, PyResource *res)
{
    res->close_func = _PyResource_DECREF;
    res->data = Py_NewRef(unicode);
    return PyUnicode_AsUTF8AndSize(unicode, NULL);
}

Usage of PyUnicode_AsUTF8Resource():

PyObject *unicode = PyUnicode_FromString("temporary string");
PyResource res;
const char* utf8 = PyUnicode_AsUTF8Resource(unicode, &res);
Py_DECREF(unicode);  // delete the object!!!

// ... code using utf8 ...
PyResource_Close(&res);

This code is valid even if the Python str object is deleted by Py_DECREF(unicode);.

This example uses reference counting, but PyUnicode_AsUTF8Resource() can create allocate a copy on the heap, and the close function would just free the memory.

UPDATE: I renamed PyResource_Release() to PyResource_Close(): "closing" a resource expression sounds commonly used.

Linked PRs

@vstinner
Copy link
Member Author

See also "Function must not return a pointer to content without an explicit resource management" issue: capi-workgroup/problems#57

@vstinner
Copy link
Member Author

I created "C API: Add PyObject_AsObjectArray() function: get tuple/list items as PyObject** array" which uses this proposed PyResource API: issue #106593.

@vstinner
Copy link
Member Author

Proof-of-Concept PR to show how this API can be implemented and used: PR #106596.

@encukou
Copy link
Member

encukou commented Jul 10, 2023

This needs a non-opaque struct, which is a pretty heavy addition. Does it need to be opaque? Should it be versioned?

@encukou
Copy link
Member

encukou commented Jul 10, 2023

IMO, this should be a PEP, as it affects many various kinds of objects, and it should get feedback from alternate implementations.

@vstinner
Copy link
Member Author

PyResource structure members can be public. Maybe it's better to make it private, but I don't think that it's a requirement.

If we want to make it private, we can declare "fake members" to make the structure big enough for our needs, and use a different structure in the implementation. I would like to avoid having to allocate PyResource structure on the heap memory, so just write PyResource res; to allocate it on the stack memory (fast, reliable).

@vstinner
Copy link
Member Author

For comparison with other "opaque handle APIs", you can see PyResource_Release() as:

  • Windows HANDLE: CloseHandle(handle);
  • Unix file descriptor: close(fd);
  • HPy handle: HPy_Close(handle);

The difference here is that PyResource usage is associated with a pointer which is not opaque :-)

IMO, this should be a PEP, as it affects many various kinds of objects, and it should get feedback from alternate implementations.

I propose an API to discuss it and see if it would help fixing a class of bugs: prevent the risk of dangling pointer.

The Python C API has multiple functions returning a pointer without explicit lifetime management (no function to release the resource):

  • PyBytes_AsString(), PyBytes_AS_STRING()
  • PyByteArray_AsString(), PyByteArray_AS_STRING()
  • PyEval_GetFuncName()
  • PyUnicode_AsUTF8(), PyUnicode_AsUTF8AndSize()
  • PyCapsule_GetName()

There are different questions here:

  • Is if such API is generic enough to fix enough cases?
  • Is it annoying that the API is "too generic"? Or do users prefer specialized APIs?

For example, Python 3.12 has private _PySequence_BytesToCharpArray() API:

Flatten a sequence of bytes() objects into a C array of NULL terminated string pointers with a NULL char* terminating the array.

This API creates a buffer which must be released with _Py_FreeCharPArray().

PyResource API would avoid the need to have a specialized _Py_FreeCharPArray() function: it would reduce the size of the C API and make the code "more regular. Pseudo-code:

PyResource res;
argv = _PySequence_BytesToCharpArray(converted_args, &res);
// ... use argv ...
PyResource_Release(&res);

Internally, maybe PyResource_Release() can call _Py_FreeCharPArray(). The difference is that _Py_FreeCharPArray() doesn't have to be exposed as a public function in the C API.


I don't think that PyResource_Release() should be used too broadly. For example, if a function allocates a string with PyMem_Malloc(), IMO it's fine to request the caller to call PyMem_Free() on it when it's done with it.

The advantage of generalizing PyResource usage is that somehow it's a good hint that the function result is a new resource which may need to be released.

@encukou
Copy link
Member

encukou commented Jul 10, 2023

Another question for the PEP: Why not use Py_buffer?

@vstinner
Copy link
Member Author

Another question for the PEP: Why not use Py_buffer?

If you take PyUnicode_AsUTF8() as an example, how would the API look like to encode a Python str object to a char*?

For me, Py_buffer is kind of overcomplicated for such function: it supports memory layout with shapes, strides, suboffset, and it relies on a bf_buffer(obj, view, flags) and bf_releasebuffer(obj, view) protocol. I don't see how to bend it to fit to the discussed needs.

@encukou
Copy link
Member

encukou commented Jul 11, 2023

If you take PyUnicode_AsUTF8() as an example, how would the API look like to encode a Python str object to a char*?

Py_buffer buffer;
int result = PyUnicode_AsUTF8Buffer(string_obj, &buffer);
if(result < 0) goto error;
char *some_destination = malloc(buffer.len);
memcpy(some_destination, buffer.buf, buffer.len);
PyBuffer_Release(buffer);

I don't see how to bend it to fit to the discussed needs.

Set readonly=itemsize=ndim=1, format=shape=strides=suboffsets=NULL. Guarantee that so the caller doesn't need to check.
bf_buffer is not necessary for C API that fills a buffer.
bf_releasebuffer of the exporting object can be NULL if the data is stored/cached.

@vstinner vstinner changed the title C API: Add PyResource API: generic API to "release a resource" C API: Add PyResource API: generic API to "close a resource" Jul 13, 2023
@vstinner
Copy link
Member Author

If such API is added, it would be interesting to convert _PyType_Name() to a public function: this function is quite useful.

vstinner added a commit to vstinner/cpython that referenced this issue Jul 24, 2023
* Add PyResource structure and PyResource_Close() function
* Add Include/pyresource.h
* Add PyResource_Close() to the stable ABI
* Add PyBytes_AsStringRes() function.
vstinner added a commit to vstinner/cpython that referenced this issue Jul 24, 2023
* Add PyResource structure and PyResource_Close() function.
  Add them to the stable ABI.
* Add Include/pyresource.h.
* Add PyBytes_AsStringRes() function.
vstinner added a commit to vstinner/cpython that referenced this issue Jul 24, 2023
* Add PyResource structure and PyResource_Close() function.
  Add them to the stable ABI.
* Add Include/pyresource.h.
* Add PyBytes_AsStringRes() function.
@vstinner
Copy link
Member Author

@encukou:

IMO, this should be a PEP, as it affects many various kinds of objects, and it should get feedback from alternate implementations.

I wrote a draft PEP: https://discuss.python.org/t/pep-draft-add-pyresource-callback-c-api-to-close-resources/30331

@vstinner
Copy link
Member Author

I wrote a draft PEP: https://discuss.python.org/t/pep-draft-add-pyresource-callback-c-api-to-close-resources/30331

Sadly, I'm not confident that this API is solving a "real world" problem. It seems like there is little demand (me) to change the API. We may revisit the topic once of the concerned API would cause more troubles. For now, I prefer to give up on the issue.

I closed my draft PEP PR and I close this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-C-API type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

4 participants