gh-85283: Add PyMem_RawMalloc() to the limited C API #108570

Merged
vstinner merged 1 commit into python:main from the limited_pymem_raw branch on Oct 17, 2023


vstinner
Member

@vstinner vstinner commented Aug 28, 2023

Add PyMem_RawMalloc(), PyMem_RawCalloc(), PyMem_RawRealloc() and PyMem_RawFree() to the limited C API.

These functions were added by Python 3.4 and are needed to port stdlib extensions to the limited C API, like grp and pwd.
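As a rough illustration of the four functions named above, the sketch below calls them from Python through `ctypes.pythonapi`. It is not part of this PR: the helper name `raw_roundtrip` is hypothetical, and the sketch assumes a CPython build that exports these symbols (the usual case). C extension code would call the same functions directly after including `Python.h`.

```python
import ctypes

# ctypes.pythonapi exposes the C API of the running CPython interpreter.
api = ctypes.pythonapi

# Declare the C signatures so pointers are not truncated to a C int.
api.PyMem_RawMalloc.restype = ctypes.c_void_p
api.PyMem_RawMalloc.argtypes = [ctypes.c_size_t]
api.PyMem_RawCalloc.restype = ctypes.c_void_p
api.PyMem_RawCalloc.argtypes = [ctypes.c_size_t, ctypes.c_size_t]
api.PyMem_RawRealloc.restype = ctypes.c_void_p
api.PyMem_RawRealloc.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
api.PyMem_RawFree.restype = None
api.PyMem_RawFree.argtypes = [ctypes.c_void_p]


def raw_roundtrip(payload: bytes):
    """Hypothetical helper: copy payload through a raw-domain buffer.

    Returns (initial buffer contents, contents after copy and realloc).
    """
    n = len(payload)
    buf = api.PyMem_RawCalloc(n, 1)  # zero-initialized block of n bytes
    if not buf:
        raise MemoryError
    zeroed = ctypes.string_at(buf, n)
    ctypes.memmove(buf, payload, n)
    bigger = api.PyMem_RawRealloc(buf, 2 * n)  # keeps the first n bytes
    if not bigger:
        api.PyMem_RawFree(buf)  # on failure the old block is still valid
        raise MemoryError
    copied = ctypes.string_at(bigger, n)
    api.PyMem_RawFree(bigger)  # memory must be freed by the same domain
    return zeroed, copied
```

Memory obtained from one of these functions should be released with `PyMem_RawFree()`, not with libc `free()` or `PyMem_Free()`.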


📚 Documentation preview 📚: https://cpython-previews--108570.org.readthedocs.build/

@vstinner
Member Author

See also the "What To Do About Custom Allocators and the GIL?" discussion. I don't think that PyMem_RawMalloc() is causing any trouble here.

cc @ericsnowcurrently

@vstinner
Member Author

These functions (...) are needed to port stdlib extensions to the limited C API, like grp and pwd.

The alternative is to not add these functions to the limited C API, and to convert grp and pwd to libc malloc() and free(). The drawback is that tracemalloc will no longer be able to trace their memory allocations.
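The tracemalloc drawback can be observed from Python. The sketch below is only an illustration, not code from this PR: it assumes a POSIX platform where `ctypes.CDLL(None)` exposes libc's `malloc()` and `free()`, and the helper name `traced_growth` is hypothetical. It allocates 1 MiB once through `PyMem_RawMalloc()` and once through libc `malloc()`, and compares how much tracemalloc's traced total grows.

```python
import ctypes
import tracemalloc

SIZE = 1024 * 1024  # 1 MiB, large enough to dominate incidental allocations

raw_malloc = ctypes.pythonapi.PyMem_RawMalloc
raw_malloc.restype = ctypes.c_void_p
raw_malloc.argtypes = [ctypes.c_size_t]
raw_free = ctypes.pythonapi.PyMem_RawFree
raw_free.restype = None
raw_free.argtypes = [ctypes.c_void_p]

# Assumption: POSIX. CDLL(None) looks up symbols in the running process,
# which includes libc's malloc() and free().
libc = ctypes.CDLL(None)
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.free.restype = None
libc.free.argtypes = [ctypes.c_void_p]


def traced_growth(alloc, free):
    """Hypothetical helper: bytes added to tracemalloc's total by one alloc."""
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        ptr = alloc(SIZE)
        after, _ = tracemalloc.get_traced_memory()
        free(ptr)
        return after - before
    finally:
        tracemalloc.stop()


raw_growth = traced_growth(raw_malloc, raw_free)      # seen by tracemalloc
libc_growth = traced_growth(libc.malloc, libc.free)   # invisible to it
```

Under these assumptions, `raw_growth` is at least `SIZE`, while `libc_growth` only reflects small incidental allocations made by ctypes itself.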

@encukou
Member

encukou commented Aug 31, 2023

If nogil/mimalloc is merged, will there be any reason to use these instead of PyMem_Malloc/PyMem_Free?

@vstinner
Member Author

vstinner commented Sep 1, 2023

If nogil/mimalloc is merged, will there be any reason to use these instead of PyMem_Malloc/PyMem_Free?

I understood that with the nogil implementation:

  • PyMem_RawMalloc() will still call malloc() as before, there is no change.
  • PyMem_Malloc() will use mimalloc, instead of pymalloc.
  • PyMem_Malloc() no longer requires the caller to hold the GIL, since there is no GIL anymore in the nogil build.

For me, PyMem_RawMalloc() will still be different from PyMem_Malloc().

On Python 3.12 and older, PyMem_RawMalloc() must be used if the GIL is not held. That case was not compatible with the limited C API, since PyMem_RawMalloc() is not part of the Python 3.12 limited C API. Having PyMem_RawMalloc() in the limited C API may help compatibility with older Python versions (which use the regular API, not the limited API).
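A small sketch of the "GIL not held" case, written with ctypes rather than C. It relies on the documented difference between `ctypes.CDLL`, which releases the GIL around each foreign call, and `ctypes.PyDLL` (used by `ctypes.pythonapi`), which keeps it. The sketch assumes a POSIX platform where `ctypes.CDLL(None)` can resolve the interpreter's own symbols; the helper name `alloc_without_gil` is hypothetical.

```python
import ctypes

# Assumption: POSIX. CDLL(None) resolves symbols of the running process,
# including the interpreter's C API. Calls made through a CDLL handle run
# with the GIL released, unlike calls made through ctypes.pythonapi.
no_gil = ctypes.CDLL(None)
no_gil.PyMem_RawMalloc.restype = ctypes.c_void_p
no_gil.PyMem_RawMalloc.argtypes = [ctypes.c_size_t]
no_gil.PyMem_RawFree.restype = None
no_gil.PyMem_RawFree.argtypes = [ctypes.c_void_p]


def alloc_without_gil(size: int) -> bytes:
    """Hypothetical helper: allocate and free a raw block with the GIL released."""
    ptr = no_gil.PyMem_RawMalloc(size)  # raw domain: no GIL required
    if not ptr:
        raise MemoryError
    try:
        ctypes.memset(ptr, 0x2A, size)  # fill with b"*"
        return ctypes.string_at(ptr, size)
    finally:
        no_gil.PyMem_RawFree(ptr)
```

Calling `PyMem_Malloc()` or `PyObject_Malloc()` the same way would not be safe, since those domains require the GIL to be held.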

cc @colesbury

@vstinner
Member Author

vstinner commented Sep 1, 2023

Code search using the regex PyMem_Raw on the PyPI top 5,000 projects (as of 2023-07-04).

The following 34 projects use PyMem_Raw*() functions:

  • Cython (0.29.36)
  • Pillow (10.0.0)
  • biopython (1.81)
  • bitstruct (8.17.0)
  • catboost (1.2)
  • cx_Freeze (6.15.2)
  • ddtrace (1.15.1)
  • dlib (19.24.2)
  • fastobo (0.12.2)
  • gnureadline (8.1.2)
  • mariadb (1.1.6)
  • memray (1.8.1)
  • mpi4py (3.1.4)
  • mujoco (2.3.6)
  • numba (0.57.1)
  • numpy (1.25.0)
  • onnx (1.14.0)
  • onnxoptimizer (0.3.13)
  • onnxsim (0.4.33)
  • orjson (3.9.1)
  • osmium (3.6.0)
  • osqp (0.6.3)
  • praat-parselmouth (0.4.3)
  • pyarmor (8.2.9)
  • pybind11 (2.10.4)
  • pygame (2.5.0)
  • pyinstaller (5.13.0)
  • pyjson5 (1.6.3)
  • pylibmc (1.6.3)
  • pyreadline (2.1)
  • pyreadline3 (3.4.1)
  • scs (3.2.3)
  • typed_ast (1.5.5)
  • uvloop (0.17.0)

@colesbury
Contributor

What @vstinner wrote about nogil is accurate, except that PyMem_Malloc still requires holding the GIL (or equivalent in --disable-gil builds) due to the way GC heap traversal is implemented. It's possible that could be relaxed in the future.

@vstinner
Member Author

vstinner commented Sep 1, 2023

What @vstinner wrote about nogil is accurate, except that PyMem_Malloc still requires holding the GIL (or equivalent in --disable-gil builds) due to the way GC heap traversal is implemented. It's possible that could be relaxed in the future.

The important information is that even with nogil, PyMem_RawMalloc() still fits a use case that PyMem_Malloc() does not cover. So it's still relevant to use PyMem_RawMalloc() in "nogil Python".

@ericsnowcurrently
Member

This shouldn't affect per-interpreter GIL since the only specific concern there is thread-safety.

Note that the docs now (3.12+) say that all 3 domains must be thread-safe. (This was discussed in a DPO thread and changed for gh-105766.) So does the rationale here apply to the other two memory domains (PyMem_Malloc(), PyObject_Malloc(), etc.)?

As to my opinion about this change, I think we should be a bit more cautious about expanding the limited API until we have more clarity on where we're going with the C-API (hopefully at the core sprint). I'm particularly reluctant to add PyMem_Raw*() if it's just for the sake of converting a stdlib extension module to the limited API.

@ericsnowcurrently
Member

Note that the docs now (3.12+) say that all 3 domains must be thread-safe. (This was discussed in a DPO thread and changed for gh-105766.)

Just to clarify, the GIL must still be held when using the "mem" and "object" domains (PyMem_Malloc(), PyObject_Malloc(), etc.), since each GIL still protects that interpreter's pymalloc state. (Obviously that would change under no-gil.)

@vstinner
Member Author

vstinner commented Sep 6, 2023

As to my opinion about this change, I think we should be a bit more cautious about expanding the limited API until we have more clarity on where we're going with the C-API (hopefully at the core sprint). I'm particularly reluctant to add PyMem_Raw*() if it's just for the sake of converting a stdlib extension module to the limited API.

My long-term goal is not specific to the Python stdlib extensions: it is to convert all C extensions to the limited C API.

Cython uses PyMem_RawMalloc() and has a build mode which targets the limited C API: define the CYTHON_LIMITED_API macro in your Cython code.

A quick search of the PyPI top 5,000 projects tells me that 113 projects use PyMem_RawMalloc(). Of these 113 projects, 79 use PyMem_RawMalloc() in code generated by Cython. I used the command:

$ ./search_pypi_top.py PYPI-2023-07-04/ '(PyMem_RawMalloc|PyMem_RawCalloc|PyMem_RawRealloc|PyMem_RawFree)' --cython -o raw_malloc -q

Affected projects (113):

  • ConfigSpace (0.7.1)
  • Cython (0.29.36)
  • DoubleMetaphone (1.1)
  • Pillow (10.0.0)
  • PyWavelets (1.4.1)
  • Theano-PyMC (1.1.2)
  • aesara (2.9.1)
  • affinegap (1.12)
  • aiohttp (3.8.4)
  • aiokafka (0.8.1)
  • arch (6.1.0)
  • asyncpg (0.27.0)
  • av (10.0.0)
  • bezier (2021.2.12)
  • biopython (1.81)
  • bitstruct (8.17.0)
  • catboost (1.2)
  • causalml (0.13.0)
  • cchardet (2.1.7)
  • cityhash (0.4.7)
  • clickhouse-driver (0.2.6)
  • cx_Freeze (6.15.2)
  • cyksuid (2.0.2)
  • cytoolz (0.12.1)
  • ddtrace (1.15.1)
  • debugpy (1.6.7)
  • dependency-injector (4.41.0)
  • diffq (0.2.4)
  • dlib (19.24.2)
  • dtaidistance (2.3.10)
  • econml (0.14.1)
  • editdistance (0.6.2)
  • fastavro (1.7.4)
  • fastdtw (0.3.4)
  • fastobo (0.12.2)
  • fastparquet (2023.7.0)
  • frozenlist (1.3.3)
  • fuzzysearch (0.7.3)
  • gensim (4.3.1)
  • gluonnlp (0.10.0)
  • gnureadline (8.1.2)
  • grapheme (0.6.0)
  • grpcio (1.56.0)
  • grpcio-tools (1.56.0)
  • hidapi (0.14.0)
  • httptools (0.5.0)
  • imagecodecs (2023.3.16)
  • insightface (0.7.3)
  • jq (1.4.1)
  • lightfm (1.17)
  • lupa (2.0)
  • lxml (4.9.2)
  • mariadb (1.1.6)
  • marisa-trie (0.8.0)
  • memray (1.8.1)
  • mojimoji (0.0.12)
  • mpi4py (3.1.4)
  • msgpack (1.0.5)
  • mujoco (2.3.6)
  • numba (0.57.1)
  • numcodecs (0.11.0)
  • numpy (1.25.0)
  • onnx (1.14.0)
  • onnxoptimizer (0.3.13)
  • onnxsim (0.4.33)
  • orjson (3.9.1)
  • osmium (3.6.0)
  • osqp (0.6.3)
  • peewee (3.16.2)
  • plyvel (1.5.0)
  • praat-parselmouth (0.4.3)
  • ptvsd (4.3.2)
  • pyarmor (8.2.9)
  • pybedtools (0.9.0)
  • pybind11 (2.10.4)
  • pycapnp (1.3.0)
  • pycld3 (0.22)
  • pyclipper (1.3.0.post4)
  • pydevd (2.9.6)
  • pygame (2.5.0)
  • pyinstaller (5.13.0)
  • pyjson5 (1.6.3)
  • pylibmc (1.6.3)
  • pymoo (0.6.0.1)
  • pypcap (1.3.0)
  • pyreadline (2.1)
  • pyreadline3 (3.4.1)
  • pyreadstat (1.2.2)
  • pysam (0.21.0)
  • pysimdjson (5.0.2)
  • python-crfsuite (0.9.9)
  • pyworld (0.3.3)
  • pyzmq (25.1.0)
  • quadprog (0.1.11)
  • recordclass (0.19)
  • ruamel.yaml.clib (0.2.7)
  • ruptures (1.1.8)
  • sasl (0.3.1)
  • scikit-surprise (1.1.3)
  • scs (3.2.3)
  • spacy (3.5.4)
  • srsly (2.4.6)
  • ssh-python (1.0.0)
  • ssh2-python (1.0.0)
  • statsmodels (0.14.0)
  • thriftpy2 (0.4.16)
  • tslearn (0.6.0)
  • typed_ast (1.5.5)
  • uamqp (1.6.4)
  • uvloop (0.17.0)
  • versioneer (0.28)
  • wordcloud (1.9.2)
  • yarl (1.9.2)
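For readers who want to reproduce this kind of survey on their own source trees, here is a minimal sketch. It is not the `search_pypi_top.py` script used above: the function name `classify_raw_users` and the set of file suffixes are assumptions made for illustration. It applies the same regex idea and separates hand-written files from C files that start with the "Generated by Cython" header comment.

```python
import re
from pathlib import Path

# Same four functions as the regex in the command above.
RAW_API = re.compile(r"PyMem_Raw(?:Malloc|Calloc|Realloc|Free)\b")
# Cython-generated C files start with this comment.
CYTHON_HEADER = "/* Generated by Cython"
# Assumption: only these suffixes are considered source files.
SOURCE_SUFFIXES = {".c", ".h", ".cpp", ".pyx"}


def classify_raw_users(root):
    """Return (handwritten, generated) file names under root using PyMem_Raw*()."""
    handwritten, generated = [], []
    for path in sorted(Path(root).rglob("*")):
        if path.suffix not in SOURCE_SUFFIXES or not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        if not RAW_API.search(text):
            continue
        if text.startswith(CYTHON_HEADER):
            generated.append(path.name)
        else:
            handwritten.append(path.name)
    return handwritten, generated
```

Run against an unpacked source distribution, the two lists give a rough split between direct uses and uses that only come from Cython output.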

@vstinner
Member Author

@encukou @colesbury @ericsnowcurrently: I'm sorry, I didn't get your opinion about adding these 4 functions to the limited C API. Do you approve this change? Or are you against it?

@ericsnowcurrently
Member

For just PyMem_RawMalloc(), I don't see a problem, but I'm not particularly invested in the decision. 😄

(In general, however, I think we should slow down on moving things to the limited API until after the discussions at the core sprint next month. I'm not saying that should block this PR though.)

@colesbury
Contributor

Seems fine to me too. (To be clear, I don't think this affects PEP 703 one way or the other.)

@vstinner
Member Author

I rebased my PR on the main branch.

Add PyMem_RawMalloc(), PyMem_RawCalloc(), PyMem_RawRealloc() and
PyMem_RawFree() to the limited C API.

These functions were added by Python 3.4 and are needed to port
stdlib extensions to the limited C API, like grp and pwd.
@vstinner vstinner merged commit cc71cc9 into python:main Oct 17, 2023
27 checks passed
@vstinner vstinner deleted the limited_pymem_raw branch October 17, 2023 00:41
@vstinner
Member Author

I merged my PR, thanks for the reviews. It should help many projects to use the limited C API.

aisk pushed a commit to aisk/cpython that referenced this pull request Feb 11, 2024
…8570)

Add PyMem_RawMalloc(), PyMem_RawCalloc(), PyMem_RawRealloc() and
PyMem_RawFree() to the limited C API.

These functions were added by Python 3.4 and are needed to port
stdlib extensions to the limited C API, like grp and pwd.

Co-authored-by: Erlend E. Aasland <erlend@python.org>