Add *Calloc functions to CPython memory allocation API #65432
Numpy would like to switch to using the CPython allocator interface in order to take advantage of the new tracemalloc infrastructure in 3.4. But numpy relies on the availability of calloc(), and the CPython allocator API does not expose calloc(). So we should add *Calloc variants. This met general approval on python-dev. Thread here:

This would involve adding a new .calloc field to the PyMemAllocator struct, exposed through new API functions PyMem_RawCalloc, PyMem_Calloc, and PyObject_Calloc. [It's not clear that all three would really be used, but since we have only one PyMemAllocator struct that they all share, it'd be hard to add support to only one or two of these domains and not the rest. And the higher-level calloc variants might well be used: numpy array buffers are often small (e.g., holding only a single value), and these small buffers benefit from small-alloc optimizations; meanwhile, large buffers benefit from calloc optimizations. So it might be optimal to use a single allocator that has both.]

We might also have to rename the PyMemAllocator struct to ensure that compiling old code with the new headers doesn't silently leave garbage in the .calloc field and lead to crashes.
Here is a first patch adding the following functions:

void* PyMem_RawCalloc(size_t n);
void* PyMem_Calloc(size_t n);
void* PyObject_Calloc(size_t n);
PyObject* _PyObject_GC_Calloc(size_t);

It adds the following field after the malloc field in the PyMemAllocator structure:

void* (*calloc) (void *ctx, size_t size);

It changes the tracemalloc module to trace "calloc" allocations, adds new tests, and documents the new functions. The patch also contains an important change: PyType_GenericAlloc() now uses calloc instead of malloc+memset(0). It may be faster, but I didn't check.
So what is the point of _PyObject_GC_Calloc?
General comment on the patch: for the flag value that toggles zeroing, perhaps use a different name, e.g. setzero, clearmem, initzero or some such, instead of calloc. calloc already gets used to refer to both the C standard function and the function pointer structure member; it's mildly confusing to have it *also* refer to a boolean flag.
Additional comment on clarity: might it make sense to make the calloc structure member take both the num and size arguments that the underlying calloc takes? That is, instead of:

void* (*calloc) (void *ctx, size_t size);

declare it as:

void* (*calloc) (void *ctx, size_t num, size_t size);

Beyond potentially allowing more detailed tracing info at some later point (and, much like the original calloc, potentially allowing us to verify that the components do not overflow on multiply, instead of assuming every caller must multiply and check for themselves), it also seems a bit friendlier for the prototype of the structure calloc to follow the same pattern as the other members for consistency (Principle of Least Surprise): a context pointer, plus the arguments expected by the equivalent C function.
Sorry for breaking it up, but the same comment about consistent prototypes mirroring the C standard library calloc would apply to all the API functions as well, e.g. PyMem_RawCalloc, PyMem_Calloc, PyObject_Calloc and _PyObject_GC_Calloc, not just the structure function pointer.
It calls calloc(size) instead of malloc(size); calloc() can be faster than malloc()+memset(), see:

_PyObject_GC_Calloc() is used by PyType_GenericAlloc(). If I understand correctly, it is the default allocator used to allocate Python objects.
In numpy, I found the two following functions:

/*NUMPY_API
 * Allocates memory for array data.
 */
void* PyDataMem_NEW(size_t size);

/*NUMPY_API
 * Allocates zeroed memory for array data.
 */
void* PyDataMem_NEW_ZEROED(size_t size, size_t elsize);

So it looks like it needs two size_t parameters. Prototype of the C function calloc():

void *calloc(size_t nmemb, size_t size);

I agree that it's better to provide the same prototype as calloc().
New patch:
On 16/04/2014 04:40, STINNER Victor wrote:
No, the question is why you didn't simply change _PyObject_GC_Malloc.
It will only make a difference if the allocated region is large enough.
Oh ok, I didn't understand. I don't like changing the behaviour of _PyObject_GC_Malloc().
2014-04-16 3:18 GMT-04:00 Charles-François Natali <report@bugs.python.org>:
Even if there are only 10% of cases where it may be faster, I think it's worth trying. I didn't check which objects use (indirectly) _PyObject_GC_Calloc().
I left a Rietveld comment, which probably did not get mailed.
On mer., 2014-04-16 at 08:06 +0000, STINNER Victor wrote:
I've checked: lists, tuples, dicts and sets at least seem to use it.
Patch version 3: remove _PyObject_GC_Calloc() and modify _PyObject_GC_Malloc() instead, so that it uses calloc() instead of malloc()+memset(0).
Do you have benchmarks? (I'm not looking for an improvement, just no regression.)
Won't replacing _PyObject_GC_Malloc with a calloc cause var objects (PyObject_NewVar) to be completely zeroed, which I think they weren't before?
Julian: No. See the diff: http://bugs.python.org/review/21233/diff/11644/Objects/typeobject.c The original GC_Malloc was explicitly memset-ing after confirming that it received a non-NULL pointer from the underlying malloc call; that memset is removed in favor of the calloc call.
Well, to be more specific, PyType_GenericAlloc was originally calling one of two methods that didn't zero the memory (one of which was GC_Malloc), then memset-ing. Just realized you're talking about something else; I'm not sure whether you're correct about this, but I have to get to work and will check later if no one else does.
I just tested it: PyObject_NewVar seems to use RawMalloc, not the GC malloc, so it's probably fine.
I read again some remarks about alignment: it was suggested to provide allocators returning an address aligned to a requested alignment. This topic was already discussed in bpo-18835. If Python doesn't provide such memory allocators, it was suggested to provide a "trace" function which can be called on the result of a successful allocation to "trace" it (and a similar function for free). But this is very different from the design of PEP 445 (the new malloc API); basically, it requires rewriting the PEP.
The alignment issue is really orthogonal to the calloc one, so IMO it should be discussed separately.
2014-04-27 10:30 GMT+02:00 Charles-François Natali <report@bugs.python.org>:
This issue was opened to be able to use tracemalloc on numpy. I would prefer to keep it focused on that.
Then please at least rename the issue.

Regarding the *Calloc functions: how about we provide a sane API? I mean, why not expose:

PyAPI_FUNC(void *) PyMem_Calloc(size_t size);

instead of:

PyAPI_FUNC(void *) PyMem_Calloc(size_t nelem, size_t elsize);

AFAICT, the two arguments are purely historical. See http://www.eglibc.org/cgi-bin/viewvc.cgi/trunk/libc/malloc/malloc.c?view=markup

I'm also concerned about the change to _PyObject_GC_Malloc(): it now zeroes the memory unconditionally.
Note to numpy devs: it would be great if some of you followed the discussion.
I wrote a short microbenchmark allocating objects using my benchmark.py script. It looks like the operation "(None,) * N" is slower with calloc-3.patch, but it's unclear how many times slower it is. I don't understand why only this operation has a different speed. Do you have ideas for other benchmarks?

Using the timeit module:

$ ./python.orig -m timeit '(None,) * 10**5'
1000 loops, best of 3: 357 usec per loop
$ ./python.calloc -m timeit '(None,) * 10**5'
1000 loops, best of 3: 698 usec per loop

But with different parameters, the difference is smaller:

$ ./python.orig -m timeit -r 20 -n '1000' '(None,) * 10**5'
1000 loops, best of 20: 362 usec per loop
$ ./python.calloc -m timeit -r 20 -n '1000' '(None,) * 10**5'
1000 loops, best of 20: 392 usec per loop

Results of bench_alloc.py: (platform details and comparison table truncated)
Isn't the point of reproducing the C API to allow quickly switching from calloc() to PyObject_Calloc()?
The order of the nelem/elsize arguments matters for readability; otherwise it is confusing. Why would you assert that 'nelem' is one?
What is it you want to see? NumPy already uses calloc; we benchmarked it when we added it, and it made a huge difference to various realistic workloads :-). What NumPy gets out of this isn't calloc, it's access to tracemalloc.
Patch version 6:
Stefan & Charles-François: I hope that the patch looks better to you.
LGTM!
@Stefan: Can you please review calloc-6.patch? Charles-François wrote that the patch looks good, but for such a critical operation (memory allocation), I would prefer a second review ;)
Victor, sure, maybe not right away. If you prefer to commit very soon, go ahead.
There is no need to hurry.
New changeset 5b0fda8f5718 by Victor Stinner in branch 'default':
I changed my mind :-p It should be easier for numpy to test the development version of Python. Let's wait for the buildbots.
Antoine Pitrou wrote:
I created the issue bpo-21419 for this idea.
New changeset 62438d1b11c7 by Victor Stinner in branch 'default':
I did a post-commit review. A couple of things:

1) calloc(size_t nmemb, size_t size)

If a memory region of nbytes is allocated, IMO 'nbytes' should be in the nmemb position:

calloc(nbytes, 1)

In the commit the parameters are reversed in many places, which confuses me:

calloc(1, nbytes)

2) I'm not happy with the refactoring in bytearray_init():

>>> x = bytearray(0)
>>> m = memoryview(x)
>>> x.__init__(10)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
BufferError: Existing exports of data: object cannot be re-sized

>>> x = bytearray(0)
>>> m = memoryview(x)
>>> x.__init__(10)
>>> x[0]
0
>>> m[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: index out of bounds
I forgot one thing:
Another thing:
Python 3.5.0a0 (default:62438d1b11c7+, May 3 2014, 23:35:03)
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import failmalloc
>>> failmalloc.enable()
>>> bytes(1)
Segmentation fault (core dumped)
A simple solution would be to change the name of the struct, so that non-updated libraries will get a compile error instead of a runtime crash. |
My final commit includes an addition to the What's New in Python 3.5 doc. Even if the API is public, the PyMemAllocator thing is low level; it's not used by much code.
Did you see my second commit? Isn't it already fixed?
I don't think so, I have revision 5d076506b3f5 here. |
PyObject_Malloc(100) asks to allocate one object of 100 bytes. For PyMem_Malloc() and PyMem_RawMalloc(), it's more difficult to guess, but I consider that char data[100] is an object of 100 bytes, whereas you call it an array of 100 elements. I don't think that using nelem or elsize matters in practice.
STINNER Victor <report@bugs.python.org> wrote:
Okay, then let's please call it:

_PyObject_Calloc(void *ctx, size_t nobjs, size_t objsize)
_PyObject_Alloc(int use_calloc, void *ctx, size_t nobjs, size_t objsize)
STINNER Victor <report@bugs.python.org> wrote:
I'm not sure: the usual case with ABI changes is that extensions may segfault when they are *not* recompiled. Here the extension *is* recompiled and still segfaults.

Perhaps it's worth asking on python-dev. Nathaniel's suggestion isn't bad [1]. I was told on python-dev that many people in fact do not recompile.
New changeset 358a12f4d4bc by Victor Stinner in branch 'default': |
New changeset 6374c2d957a9 by Victor Stinner in branch 'default': |
Ok, I renamed the structure PyMemAllocator to PyMemAllocatorEx, so compilation fails because the PyMemAllocator name is no longer defined. Modules compiled for Python 3.4 will crash on Python 3.5 if they are not recompiled, but I hope that you recompile your modules when you don't use the stable ABI. Using PyMemAllocator is now more complex because it depends on the Python version. See for example the patch for pyfailmalloc: Using the C preprocessor, it's possible to limit the changes.
"Okay, then let's please call it: "void * PyMem_RawCalloc(size_t nelem, size_t elsize);" prototype comes from the POSIX standad: I'm don't want to change the prototype in Python. Extract of Python documentation: .. c:function:: void* PyMem_RawCalloc(size_t nelem, size_t elsize) Allocates *nelem* elements each whose size in bytes is *elsize* (...) |
New changeset dff6b4b61cac by Victor Stinner in branch 'default':
"2) I'm not happy with the refactoring in bytearray_init(). (...)
Ok, I reverted the change on bytearray(int) and opened the issue bpo-21644 to discuss these two optimizations. |
I reread the issue. I hope that I now addressed all issues. The remaining issue, bytearray(int) is now tracked by the new issue bpo-21644. |