Use PyMem_RawMalloc on Python 3.4 and newer #7404
Extract of the gdb traceback where PyMem_Malloc() is currently called with the GIL released:
PyMem_Malloc() is called in 3 other places:

- fortranobject.c: It looks like fortran_doc() is called with the GIL held. (PyMem_Free() is also called with the GIL held in this file.)
- numpy/random/mtrand/mtrand.pyx: It looks like PyMem_Malloc() is also called with the GIL held, but I don't know enough Cython to double-check :-/ (Same for PyMem_Free(); it also looks to be used correctly.)
It appears this does not fix the issue for Python <= 3.3 (in particular 2.7)?
In practice, PyMem_Malloc() can be called without holding the GIL on Python versions before 3.6, but you may run into issues with the tracemalloc debugger or other debug tools. This may change in Python 3.6 with my patch: with it, calling PyMem_Malloc() without holding the GIL can fail badly.
Do we do this normally? ufunc.at is unfortunately not in the best state. I would be surprised if we really call PyMem_Malloc without the GIL in many places, though I wouldn't rule it out. If it is always safe with this patch, I guess we do not have to worry about it. But if it can fail badly with the new behaviour, isn't there some reason to prefer the new behaviour as well? Holding the GIL a little longer in ufunc.at is probably not an issue either. Just curious.
I don't see anything wrong with the patch; I'm just asking what the best thing for us to do here is.
"There are some places where memory is acquired without the GIL, but I think we use malloc in those places. @Haypo What is the advantage of using PyMem_RawMalloc()?"

PyMem_RawMalloc() calls malloc() in practice. The advantage of calling PyMem_RawMalloc() instead of calling malloc() directly is that you can use tracemalloc to track memory leaks: The overhead of calling PyMem_RawMalloc() instead of malloc() is negligible: By the way, I'm working on an extension of the tracemalloc module for numpy:
@Haypo That sounds good. I guess the question comes down to whether it is better to fix the places where
How do you plan to "fix" the code? It's not trivial to lock/unlock the GIL. If you know that the GIL can be released in some cases, always use PyMem_RawMalloc(). You might also replace direct calls to malloc() with PyMem_RawMalloc() (on Python 3.4+) so that tracemalloc can track all memory allocated by numpy.
You can use the PyGILState_* functions to reacquire the GIL. Of course, this has some performance penalty (and I don't remember whether it breaks subinterpreters, but we don't support those in any case). But it may be better to just use malloc or PyMem_RawMalloc instead.
Yeah, it's likely to kill performance :-)
In this case, I suggest using PyMem_RawMalloc() on Python 3.4+ and malloc() on older Python versions. I see that numpy has a PY_USE_PYMEM define (you can see it in my change).
Let me rephrase the question: is there an advantage to using PyMem_Malloc() when the GIL is not a problem?
For debug tools like tracemalloc, PyMem_Malloc() is faster because tracemalloc doesn't have to acquire the GIL. If my change https://bugs.python.org/issue26249 is accepted, PyMem_Malloc() will also be faster than PyMem_RawMalloc() (and maybe faster than malloc()?) for allocations of 512 bytes or less.
@Haypo Thanks for the clarification. I ask because, IIRC, we started using PyMem_Malloc more to take advantage of the memory pool for small objects. OTOH, 512 bytes = 8 doubles, which is not a very large array. @juliantaylor Thoughts?
Do you mean the pymalloc memory allocator optimized for small objects? PyMem_Malloc() doesn't use pymalloc; only PyObject_Malloc() does. My change https://bugs.python.org/issue26249 proposes modifying PyMem_Malloc() to use pymalloc too.
PyArray_malloc should not be used anywhere it really matters: we use PyObject_Malloc for objects and our own small-data allocator for small arrays. We also have our own variant of tracemalloc, so that's not so relevant either. Matching it with Python's has been discussed in the past, but Python's API is not flexible enough for that.
Did you see the issue #26530 I recently opened for numpy? I wasn't aware that numpy has its own tool to track memory allocations. Interesting.
Well, it has its own interface for hooking memory allocations to track them. AFAIK no one actually uses these hooks, though, so tracemalloc is interesting because it lets us free-ride on the larger Python ecosystem :-)
Just so. There is a definite advantage to using standard tools that people are more familiar with and probably expect.
Could use a mention in the
arigo commented Mar 15, 2016
@charris: 512 bytes is 64 doubles, not 8.
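The arithmetic behind the correction: a double is 8 bytes on mainstream platforms (an assumption about the target ABI, though IEEE 754 binary64 doubles are universal in practice), so a 512-byte block holds 512 / 8 = 64 doubles; 8 doubles would only be 64 bytes. The constant and helper below are hypothetical and just compute this.

```c
#include <stddef.h>

/* Small-block threshold discussed in this thread, in bytes. */
#define SMALL_BLOCK_BYTES 512

/* Hypothetical helper: how many doubles fit in a block of `bytes` bytes. */
static size_t doubles_per_block(size_t bytes)
{
    return bytes / sizeof(double);
}
```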
My pull request fixes support for Python 3.6. Can it be merged? I suggest deferring changes that hold the GIL and call PyMem_Malloc() instead of PyMem_RawMalloc(). PyMem_Malloc() and PyMem_RawMalloc() use exactly the same memory allocator in Python 3.4 and 3.5. My change optimizing PyMem_Malloc() has not been merged into CPython 3.6 yet, so there it still uses the same allocator.
@Haypo: sure, it sounds like maybe we should think about revisiting how

If PyMem_Malloc and PyMem_RawMalloc are the same in 3.4 and earlier, then why does your patch have an
My patch uses PyMem_RawMalloc on Python >= 3.4. Extract of the patch:
It cannot be used on older Python versions since the function was introduced in Python 3.4 ;-)
ah, right, silly me :-) Can you expand the comment a bit, to note that (a) numpy sometimes calls
Done. Python 3.6 has a new PYTHONMALLOC=debug environment variable; with it, PyMem_Malloc() now fails with a fatal error if called without holding the GIL. I'm also now more confident about pushing my change that makes PyMem_Malloc() reuse the PyObject_Malloc() allocator, and so get the fast allocator for memory blocks <= 512 bytes.
@njsmith: By the way, what's the status of tracemalloc support in numpy? It's not the first time someone has joined the #python-fr help channel with a question about the memory consumption of an application using numpy.
@njsmith: Oh, I now recall that I was waiting for your feedback on https://bugs.python.org/issue26530 to be able to track more memory allocations in numpy!
LGTM, thanks -- and sorry about the slowness
Thank you. You didn't reply to my tracemalloc questions ;-)
vstinner commented Mar 10, 2016
Change PyArray_malloc() macro to use PyMem_RawMalloc() on Python 3.4
and newer. This macro can be called indirectly from ufunc_at() which
releases the GIL, whereas PyMem_Malloc() requires the GIL to be held:
https://docs.python.org/dev/c-api/memory.html#memory-interface
PyMem_RawMalloc() can be called without the GIL.