Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add *Calloc functions to CPython memory allocation API #65432

Closed
njsmith opened this issue Apr 15, 2014 · 95 comments
Closed

Add *Calloc functions to CPython memory allocation API #65432

njsmith opened this issue Apr 15, 2014 · 95 comments
Labels
interpreter-core Interpreter core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@njsmith
Copy link
Contributor

njsmith commented Apr 15, 2014

BPO 21233
Nosy @pitrou, @vstinner, @njsmith, @skrah, @MojoVampire
Files
  • calloc.patch
  • calloc-2.patch
  • calloc-3.patch
  • bench_alloc.py
  • test.c
  • calloc-4.patch
  • use_calloc.patch
  • bench_alloc2.py
  • calloc-5.patch
  • calloc-6.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2014-06-02.20:29:45.268>
    created_at = <Date 2014-04-15.08:56:01.630>
    labels = ['interpreter-core', 'type-feature']
    title = 'Add *Calloc functions to CPython memory allocation API'
    updated_at = <Date 2014-06-02.20:29:45.267>
    user = 'https://github.com/njsmith'

    bugs.python.org fields:

    activity = <Date 2014-06-02.20:29:45.267>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2014-06-02.20:29:45.268>
    closer = 'vstinner'
    components = ['Interpreter Core']
    creation = <Date 2014-04-15.08:56:01.630>
    creator = 'njs'
    dependencies = []
    files = ['34897', '34903', '34924', '35052', '35059', '35063', '35064', '35065', '35072', '35097']
    hgrepos = []
    issue_num = 21233
    keywords = ['patch']
    message_count = 95.0
    messages = ['216281', '216390', '216394', '216399', '216403', '216404', '216422', '216425', '216431', '216433', '216444', '216451', '216452', '216455', '216515', '216567', '216668', '216671', '216681', '216682', '216686', '217228', '217242', '217246', '217251', '217252', '217253', '217254', '217255', '217256', '217257', '217262', '217274', '217276', '217282', '217283', '217284', '217291', '217294', '217295', '217297', '217298', '217302', '217303', '217304', '217305', '217306', '217307', '217308', '217309', '217310', '217323', '217324', '217325', '217326', '217330', '217331', '217333', '217346', '217348', '217349', '217351', '217357', '217375', '217380', '217382', '217423', '217436', '217445', '217549', '217553', '217594', '217617', '217619', '217785', '217786', '217794', '217797', '217826', '217829', '217831', '217832', '217838', '217839', '217840', '217841', '217844', '217866', '217972', '219627', '219628', '219630', '219631', '219634', '219635']
    nosy_count = 8.0
    nosy_names = ['pitrou', 'vstinner', 'njs', 'skrah', 'neologix', 'python-dev', 'jtaylor', 'josh.r']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue21233'
    versions = ['Python 3.5']

    @njsmith
    Copy link
    Contributor Author

    njsmith commented Apr 15, 2014

    Numpy would like to switch to using the CPython allocator interface in order to take advantage of the new tracemalloc infrastructure in 3.4. But, numpy relies on the availability of calloc(), and the CPython allocator API does not expose calloc().
    https://docs.python.org/3.5/c-api/memory.html#c.PyMemAllocator

    So, we should add *Calloc variants. This met general approval on python-dev. Thread here:
    https://mail.python.org/pipermail/python-dev/2014-April/133985.html

    This would involve adding a new .calloc field to the PyMemAllocator struct, exposed through new API functions PyMem_RawCalloc, PyMem_Calloc, PyObject_Calloc. [It's not clear that all 3 would really be used, but since we have only one PyMemAllocator struct that they all share, it'd be hard to add support to only one or two of these domains and not the rest. And the higher-level calloc variants might well be used. Numpy array buffers are often small (e.g., holding only a single value), and these small buffers benefit from small-alloc optimizations; meanwhile, large buffers benefit from calloc optimizations. So it might be optimal to use a single allocator that has both.]

    We might also have to rename the PyMemAllocator struct to ensure that compiling old code with the new headers doesn't silently leave garbage in the .calloc field and lead to crashes.

    @njsmith njsmith added interpreter-core Interpreter core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Apr 15, 2014
    @vstinner
    Copy link
    Member

    vstinner commented Apr 15, 2014

    Here is a first patch adding the following functions:

      void* PyMem_RawCalloc(size_t n);
      void* PyMem_Calloc(size_t n);
      void* PyObject_Calloc(size_t n);
      PyObject* _PyObject_GC_Calloc(size_t);

    It adds the following field after malloc field to PyMemAllocator structure:

      void* (*calloc) (void *ctx, size_t size);

    It changes the tracemalloc module to trace "calloc" allocations, add new tests and document new functions.

    The patch also contains an important change: PyType_GenericAlloc() uses calloc instead of malloc+memset(0). It may be faster, I didn't check.

    @pitrou
    Copy link
    Member

    pitrou commented Apr 15, 2014

    So what is the point of _PyObject_GC_Calloc ?

    @MojoVampire
    Copy link
    Mannequin

    MojoVampire mannequin commented Apr 15, 2014

    General comment on patch: For the flag value that toggles zero-ing, perhaps use a different name, e.g. setzero, clearmem, initzero or somesuch instead of calloc? calloc already gets used to refer to both the C standard function and the function pointer structure member; it's mildly confusing to have it *also* refer to a boolean flag as well.

    @MojoVampire
    Copy link
    Mannequin

    MojoVampire mannequin commented Apr 15, 2014

    Additional comment on clarity: Might it make sense to make the calloc structure member take both the num and size arguments that the underlying calloc takes? That is, instead of:

    void* (*calloc) (void *ctx, size_t size);

    Declare it as:

    void* (*calloc) (void *ctx, size_t num, size_t size);

    Beyond potentially allowing more detailed tracing info at some later point (and much like the original calloc, potentially allowing us to verify that the components do not overflow on multiply, instead of assuming every caller must multiply and check for themselves), it also seems like it's a bit more friendly to have the prototype for the structure calloc to follow the same pattern as the other members for consistency (Principle of Least Surprise): A context pointer, plus the arguments expected by the equivalent C function.

    @MojoVampire
    Copy link
    Mannequin

    MojoVampire mannequin commented Apr 15, 2014

    Sorry for breaking it up, but the same comment on consistent prototypes mirroring the C standard lib calloc would apply to all the API functions as well, e.g. PyMem_RawCalloc, PyMem_Calloc, PyObject_Calloc and _PyObject_GC_Calloc, not just the structure function pointer.

    @vstinner
    Copy link
    Member

    vstinner commented Apr 16, 2014

    So what is the point of _PyObject_GC_Calloc ?

    It calls calloc(size) instead of malloc(size), calloc() which can be faster than malloc()+memset(), see:
    https://mail.python.org/pipermail/python-dev/2014-April/133985.html

    _PyObject_GC_Calloc() is used by PyType_GenericAlloc(). If I understand directly, it is the default allocator to allocate Python objects.

    @vstinner
    Copy link
    Member

    vstinner commented Apr 16, 2014

    In numpy, I found the two following functions:

    /*NUMPY_API
     * Allocates memory for array data.
     */
    void* PyDataMem_NEW(size_t size);
    
    /*NUMPY_API
     * Allocates zeroed memory for array data.
     */
    void* PyDataMem_NEW_ZEROED(size_t size, size_t elsize);

    So it looks like it needs two size_t parameters. Prototype of the C function calloc():

    void *calloc(size_t nmemb, size_t size);

    I agree that it's better to provide the same prototype than calloc().

    @vstinner
    Copy link
    Member

    vstinner commented Apr 16, 2014

    New patch:

    • replace "size_t size" with "size_t nelem, size_t elsize" in the prototype of calloc functions (the parameter names come from the POSIX standard)
    • replace "int calloc" with "int zero" in helper functions

    @pitrou
    Copy link
    Member

    pitrou commented Apr 16, 2014

    Le 16/04/2014 04:40, STINNER Victor a écrit :

    STINNER Victor added the comment:

    > So what is the point of _PyObject_GC_Calloc ?

    It calls calloc(size) instead of malloc(size)

    No, the question is why you didn't simply change _PyObject_GC_Malloc
    (which is a private function).

    @neologix
    Copy link
    Mannequin

    neologix mannequin commented Apr 16, 2014

    > So what is the point of _PyObject_GC_Calloc ?

    It calls calloc(size) instead of malloc(size), calloc() which can be faster than malloc()+memset(), see:
    https://mail.python.org/pipermail/python-dev/2014-April/133985.html

    It will only make a difference if the allocated region is large enough
    to be allocated by mmap (so not for 90% of objects).

    @vstinner
    Copy link
    Member

    vstinner commented Apr 16, 2014

    >> So what is the point of _PyObject_GC_Calloc ?
    >
    > It calls calloc(size) instead of malloc(size)

    No, the question is why you didn't simply change _PyObject_GC_Malloc
    (which is a private function).

    Oh ok, I didn't understand. I don't like changing the behaviour of
    functions, but it's maybe fine if the function is private.

    @vstinner
    Copy link
    Member

    vstinner commented Apr 16, 2014

    2014-04-16 3:18 GMT-04:00 Charles-François Natali <report@bugs.python.org>:

    > It calls calloc(size) instead of malloc(size), calloc() which can be faster than malloc()+memset(), see:
    > https://mail.python.org/pipermail/python-dev/2014-April/133985.html

    It will only make a difference if the allocated region is large enough
    to be allocated by mmap (so not for 90% of objects).

    Even if there are only 10% of cases where it may be faster, I think
    that it's interesting to use calloc() to allocate Python objects. You
    may create large Python objects ;-)

    I didn't check which objects use (indirectly) _PyObject_GC_Calloc().

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Apr 16, 2014

    I left a Rietveld comment, which probably did not get mailed.

    @pitrou
    Copy link
    Member

    pitrou commented Apr 16, 2014

    On mer., 2014-04-16 at 08:06 +0000, STINNER Victor wrote:

    I didn't check which objects use (indirectly) _PyObject_GC_Calloc().

    I've checked: lists, tuples, dicts and sets at least seem to use it.
    Obviously, objects which are not tracked by the GC (such as str and
    bytes) won't use it.

    @vstinner
    Copy link
    Member

    vstinner commented Apr 16, 2014

    Patch version 3: remove _PyObject_GC_Calloc(), modify _PyObject_GC_Malloc() instead of use calloc() instead of malloc()+memset(0).

    @neologix
    Copy link
    Mannequin

    neologix mannequin commented Apr 17, 2014

    Do you have benchmarks?

    (I'm not looking for an improvement, just no regression.)

    @jtaylor
    Copy link
    Mannequin

    jtaylor mannequin commented Apr 17, 2014

    won't replacing _PyObject_GC_Malloc with a calloc cause Var objects (PyObject_NewVar) to be completely zeroed which I think they didn't before?
    Some numeric programs stuff a lot of data into var objects and could care about python suddenly setting them to zero when they don't need it.
    An example would be tinyarray.

    @MojoVampire
    Copy link
    Mannequin

    MojoVampire mannequin commented Apr 17, 2014

    Julian: No. See the diff: http://bugs.python.org/review/21233/diff/11644/Objects/typeobject.c

    The original GC_Malloc was explicitly memset-ing after confirming that it received a non-NULL pointer from the underlying malloc call; that memset is removed in favor of using the calloc call.

    @MojoVampire
    Copy link
    Mannequin

    MojoVampire mannequin commented Apr 17, 2014

    Well, to be more specific, PyType_GenericAlloc was originally calling one of two methods that didn't zero the memory (one of which was GC_Malloc), then memset-ing. Just realized you're talking about something else; not sure if you're correct about this now, but I have to get to work, will check later if no one else does.

    @jtaylor
    Copy link
    Mannequin

    jtaylor mannequin commented Apr 17, 2014

    I just tested it, PyObject_NewVar seems to use RawMalloc not the GC malloc so its probably fine.

    @vstinner
    Copy link
    Member

    vstinner commented Apr 27, 2014

    I read again some remarks about alignement, it was suggested to provide allocators providing an address aligned to a requested alignement. This topic was already discussed in bpo-18835.

    If Python doesn't provide such memory allocators, it was suggested to provide a "trace" function which can be called on the result of a successful allocator to "trace" an allocation (and a similar function for free). But this is very different from the design of the PEP-445 (new malloc API). Basically, it requires to rewrite the PEP-445.

    @neologix
    Copy link
    Mannequin

    neologix mannequin commented Apr 27, 2014

    I read again some remarks about alignement, it was suggested to provide allocators providing an address aligned to a requested alignement. This topic was already discussed in bpo-18835.

    The alignement issue is really orthogonal to the calloc one, so IMO
    this shouldn't be discussed here (and FWIW I don't think we should
    expose those: alignement only matters either for concurrency or SIMD
    instructions, and I don't think we should try to standardize this kind
    of API, it's way to special-purpose (then we'd have to think about
    huge pages, etc...). Whereas calloc is a simple and immediately useful
    addition, not only for Numpy but also CPython).

    @vstinner
    Copy link
    Member

    vstinner commented Apr 27, 2014

    2014-04-27 10:30 GMT+02:00 Charles-François Natali <report@bugs.python.org>:

    > I read again some remarks about alignement, it was suggested to provide allocators providing an address aligned to a requested alignement. This topic was already discussed in bpo-18835.

    The alignement issue is really orthogonal to the calloc one, so IMO
    this shouldn't be discussed here (and FWIW I don't think we should
    expose those: alignement only matters either for concurrency or SIMD
    instructions, and I don't think we should try to standardize this kind
    of API, it's way to special-purpose (then we'd have to think about
    huge pages, etc...). Whereas calloc is a simple and immediately useful
    addition, not only for Numpy but also CPython).

    This issue was opened to be able to use tracemalloc on numpy. I would
    like to make sure that calloc is enough for numpy. I would prefer to
    change the malloc API only once.

    @neologix
    Copy link
    Mannequin

    neologix mannequin commented Apr 27, 2014

    This issue was opened to be able to use tracemalloc on numpy. I would
    like to make sure that calloc is enough for numpy. I would prefer to
    change the malloc API only once.

    Then please at least rename the issue. Also, I don't see why
    everything should be done at once: calloc support is a self-contained
    change, which is useful outside of numpy. Enhanced tracemalloc support
    for numpy certainly belongs to another issue.

    Regarding the *Calloc functions: how about we provide a sane API
    instead of reproducing the cumbersome C API?

    I mean, why not expose:
    PyAPI_FUNC(void *) PyMem_Calloc(size_t size);
    insteaf of
    PyAPI_FUNC(void *) PyMem_Calloc(size_t nelem, size_t elsize);

    AFAICT, the two arguments are purely historical (it was used when
    malloc() didn't guarantee suitable alignment, and has the advantage of
    performing overflow check when doing the multiplication, but in our
    code we always check for it anyway).
    See
    https://groups.google.com/forum/#!topic/comp.lang.c/jZbiyuYqjB4
    http://stackoverflow.com/questions/4083916/two-arguments-to-calloc

    And http://www.eglibc.org/cgi-bin/viewvc.cgi/trunk/libc/malloc/malloc.c?view=markup
    to check that calloc(nelem, elsize) is implemented as calloc(nelem *
    elsize)

    I'm also concerned about the change to _PyObject_GC_Malloc(): it now
    calls calloc() instead of malloc(): it can definitely be slower.

    @neologix
    Copy link
    Mannequin

    neologix mannequin commented Apr 27, 2014

    Note to numpy devs: it would be great if some of you followed the
    python-dev mailing list (I know it can be quite volume intensive, but
    maybe simple filters could help keep the noise down): you guys have
    definitely both expertise and real-life applications that could be
    very valuable in helping us design the best possible public/private
    APIs. It's always great to have downstream experts/end-users!

    @vstinner
    Copy link
    Member

    vstinner commented Apr 27, 2014

    I wrote a short microbenchmark allocating objects using my benchmark.py script.

    It looks like the operation "(None,) * N" is slower with calloc-3.patch, but it's unclear how much times slower it is. I don't understand why only this operation has different speed.

    Do you have ideas for other benchmarks?

    Using the timeit module:

    $ ./python.orig -m timeit '(None,) * 10**5'
    1000 loops, best of 3: 357 usec per loop
    $ ./python.calloc -m timeit '(None,) * 10**5'
    1000 loops, best of 3: 698 usec per loop

    But with different parameters, the difference is lower:

    $ ./python.orig -m timeit -r 20 -n '1000' '(None,) * 10**5'
    1000 loops, best of 20: 362 usec per loop
    $ ./python.calloc -m timeit -r 20 -n '1000' '(None,) * 10**5'
    1000 loops, best of 20: 392 usec per loop

    Results of bench_alloc.py:

    Common platform:
    CFLAGS: -Wno-unused-result -Werror=declaration-after-statement -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
    CPU model: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
    Python unicode implementation: PEP-393
    Timer info: namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)', monotonic=True, resolution=1e-09)
    Timer: time.perf_counter
    SCM: hg revision=462470859e57+ branch=default date="2014-04-26 19:01 -0400"
    Platform: Linux-3.13.8-200.fc20.x86_64-x86_64-with-fedora-20-Heisenbug
    Bits: int=32, long=64, long long=64, size_t=64, void*=64

    Platform of campaign orig:
    Timer precision: 42 ns
    Date: 2014-04-27 12:27:26
    Python version: 3.5.0a0 (default:462470859e57, Apr 27 2014, 11:52:55) [GCC 4.8.2 20131212 (Red Hat 4.8.2-7)]

    Platform of campaign calloc:
    Timer precision: 45 ns
    Date: 2014-04-27 12:29:10
    Python version: 3.5.0a0 (default:462470859e57+, Apr 27 2014, 12:04:57) [GCC 4.8.2 20131212 (Red Hat 4.8.2-7)]

    -----------------------------------+--------------+---------------
    Tests                              |         orig |         calloc
    -----------------------------------+--------------+---------------
    object()                           |    61 ns () |          62 ns
    b'A' * 10                          |    55 ns (
    ) |    51 ns (-7%)
    b'A' * 10**3                       |    99 ns () |          94 ns
    b'A' * 10**6                       |  37.5 us (
    ) |        36.6 us
    'A' * 10                           |    62 ns () |    58 ns (-7%)
    'A' * 10**3                        |   107 ns (
    ) |         104 ns
    'A' * 10**6                        |    37 us () |        36.6 us
    'A' * 10**8                        |  16.2 ms (
    ) |        16.4 ms
    decode 10 null bytes from ASCII    |   253 ns () |         248 ns
    decode 10**3 null bytes from ASCII |   359 ns (
    ) |         357 ns
    decode 10**6 null bytes from ASCII |  78.8 us () |        78.7 us
    decode 10**8 null bytes from ASCII |  26.2 ms (
    ) |        25.9 ms
    (None,) * 10**0                    |    30 ns () |          30 ns
    (None,) * 10**1                    |    78 ns (
    ) |          77 ns
    (None,) * 10**2                    |   427 ns () |   460 ns (+8%)
    (None,) * 10**3                    |   3.5 us (
    ) |   3.7 us (+6%)
    (None,) * 10**4                    |  34.7 us () |  37.2 us (+7%)
    (None,) * 10**5                    |   357 us (
    ) |   390 us (+9%)
    (None,) * 10**6                    |  3.86 ms () | 4.43 ms (+15%)
    (None,) * 10**7                    |  50.4 ms (
    ) |        50.3 ms
    (None,) * 10**8                    |   505 ms () |         504 ms
    ([None] * 10)[1:-1]                |   121 ns (
    ) |         120 ns
    ([None] * 10**3)[1:-1]             |  3.57 us () |        3.57 us
    ([None] * 10**6)[1:-1]             |  4.61 ms (
    ) |        4.59 ms
    ([None] * 10**8)[1:-1]             |   585 ms () |         582 ms
    -----------------------------------+--------------+---------------
    Total                              | 1.19 sec (
    ) |       1.19 sec
    -----------------------------------+--------------+---------------

    @pitrou
    Copy link
    Member

    pitrou commented Apr 27, 2014

    Regarding the *Calloc functions: how about we provide a sane API
    instead of reproducing the cumbersome C API?

    Isn't the point of reproducing the C API to allow quickly switching from calloc() to PyObject_Calloc()?
    (besides, it seems the OpenBSD guys like the two-argument form :-))

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Apr 28, 2014

    The order of the nelem/elsize matters for readability. Otherwise it is
    not intuitive what happens after the jump to redirect in _PyObject_Alloc().

    Why would you assert that 'nelem' is one?

    @njsmith
    Copy link
    Contributor Author

    njsmith commented Apr 28, 2014

    It would be interesting to see some NumPy benchmarks (Nathaniel?).

    What is it you want to see? NumPy already uses calloc; we benchmarked it when we added it and it made a huge difference to various realistic workloads :-). What NumPy gets out of this isn't calloc, it's access to tracemalloc.

    @vstinner
    Copy link
    Member

    vstinner commented Apr 29, 2014

    Patch version 6:

    • I renamed "int zero" parameter to "int use_calloc" and move the new parameter at the first position to avoid confusion with nelem. For example, _PyObject_Alloc(ctx, 1, nbytes, 0) becomes _PyObject_Alloc(0, ctx, 1, nbytes). It also more logical to put it in the first position. In bytesobject.c, I leaved it at the parameter at the end since its meaning is different (fill bytes with zero or not) IMO.

    • I removed my hack (premature optimization) "assert(nelem == 1); ... malloc(elsize);" and replaced it with a less surprising "... malloc(nelem * elsize);"

    Stefan & Charles-François: I hope that the patch looks better to you.

    @neologix
    Copy link
    Mannequin

    neologix mannequin commented Apr 29, 2014

    LGTM!

    @vstinner
    Copy link
    Member

    vstinner commented Apr 30, 2014

    @Stefan: Can you please review calloc-6.patch? Charles-François wrote that the patch looks good, but for such critical operation (memory allocation), I would prefer a second review ;)

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Apr 30, 2014

    Victor, sure, maybe not right away. If you prefer to commit very soon,
    I promise to do a post commit review.

    @vstinner
    Copy link
    Member

    vstinner commented Apr 30, 2014

    If you prefer to commit very soon,
    I promise to do a post commit review.

    There is no need to hurry.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 2, 2014

    New changeset 5b0fda8f5718 by Victor Stinner in branch 'default':
    Issue bpo-21233: Add new C functions: PyMem_RawCalloc(), PyMem_Calloc(),
    http://hg.python.org/cpython/rev/5b0fda8f5718

    @vstinner
    Copy link
    Member

    vstinner commented May 2, 2014

    There is no need to hurry.

    I changed my mind :-p It should be easier for numpy to test the development version of Python.

    Let's wait for buildbots.

    @vstinner
    Copy link
    Member

    vstinner commented May 2, 2014

    Antoine Pitrou wrote:

    The real use case I envision is with huge powers of two. If I write:
    x = 2 ** 1000000

    I created the issue bpo-21419 for this idea.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 2, 2014

    New changeset 62438d1b11c7 by Victor Stinner in branch 'default':
    Issue bpo-21233: Oops, Fix _PyObject_Alloc(): initialize nbytes before going to
    http://hg.python.org/cpython/rev/62438d1b11c7

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented May 3, 2014

    I did a post-commit review. A couple of things:

    1. I think Victor and I have a different view of the calloc() parameters.
          calloc(size_t nmemb, size_t size)

    If a memory region of bytes is allocated, IMO 'nbytes' should be in the
    place of 'nmemb' and '1' should be in the place of 'size'. That is,
    "allocate nbytes elements of size 1":

          calloc(nbytes, 1)

    In the commit the parameters are reversed in many places, which confuses
    me quite a bit, since it means "allocate one element of size nbytes".

          calloc(1, nbytes)
    1. I'm not happy with the refactoring in bytearray_init(). I think it would
      be safer to make focused minimal changes in PyByteArray_Resize() instead.
      In fact, there is a behavior change which isn't correct:

      Before:
      =======

            >>> x = bytearray(0)
            >>> m = memoryview(x)
            >>> x.__init__(10)
            Traceback (most recent call last):
              File "<stdin>", line 1, in <module>
            BufferError: Existing exports of data: object cannot be re-sized
     Now:
     \====
    
            >>> x = bytearray(0)
            >>> m = memoryview(x)
            >>> x.__init__(10)
            >>> x[0]
            0
            >>> m[0]
            Traceback (most recent call last):
              File "<stdin>", line 1, in <module>
            IndexError: index out of bounds
    1. Somewhat similarly, I wonder if it was necessary to refactor
      PyBytes_FromStringAndSize(). I find the new version more difficult
      to understand.

    2. _PyObject_Alloc(): assert(nelem <= PY_SSIZE_T_MAX / elsize) can be called
      with elsize = 0.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented May 3, 2014

    I forgot one thing:

    1. If WITH_VALGRIND is defined, nbytes is uninitialized in _PyObject_Alloc().

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented May 3, 2014

    Another thing:

    1. We need some kind of prominent documentation that existing
      programs need to be changed:
    Python 3.5.0a0 (default:62438d1b11c7+, May  3 2014, 23:35:03) 
    [GCC 4.8.2] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import failmalloc
    >>> failmalloc.enable()
    >>> bytes(1)
    Segmentation fault (core dumped)

    @njsmith
    Copy link
    Contributor Author

    njsmith commented May 3, 2014

    A simple solution would be to change the name of the struct, so that non-updated libraries will get a compile error instead of a runtime crash.

    @vstinner
    Copy link
    Member

    vstinner commented May 3, 2014

    1. We need some kind of prominent documentation that existing
      programs need to be changed:

    My final commit includes an addition to What's New in Python 3.5 doc,
    including a notice in the porting section. It is not enough?

    Even if the API is public, the PyMemAllocator thing is low level. It's not
    part of the stable ABI. Except failmalloc, I don't know any user. I don't
    expect a lot of complain and it's easy to port the code.

    @vstinner
    Copy link
    Member

    vstinner commented May 3, 2014

    1. If WITH_VALGRIND is defined, nbytes is uninitialized in
      _PyObject_Alloc().

    Did you see my second commit? It's nlt already fixed?

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented May 3, 2014

    > 5) If WITH_VALGRIND is defined, nbytes is uninitialized in
    _PyObject_Alloc().

    Did you see my second commit? It's nlt already fixed?

    I don't think so, I have revision 5d076506b3f5 here.

    @vstinner
    Copy link
    Member

    vstinner commented May 3, 2014

    "allocate nbytes elements of size 1"

    PyObject_Malloc(100) asks to allocate one object of 100 bytes.

    For PyMem_Malloc() and PyMem_RawMalloc(), it's more difficult to guess, but
    IMO it's sane to bet that a single memory block of size bytes is requested.

    I consider that char data[100] is a object of 100 bytes, but you call it
    100 object of 1 byte.

    I don't think that using nelem or elsize matters in practice.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented May 3, 2014

    STINNER Victor <report@bugs.python.org> wrote:

    PyObject_Malloc(100) asks to allocate one object of 100 bytes.

    Okay, then let's please call it:

    _PyObject_Calloc(void *ctx, size_t nobjs, size_t objsize)
    
    _PyObject_Alloc(int use_calloc, void *ctx, size_t nobjs, size_t objsize)

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented May 4, 2014

    STINNER Victor <report@bugs.python.org> wrote:

    My final commit includes an addition to What's New in Python 3.5 doc,
    including a notice in the porting section. It is not enough?

    I'm not sure: The usual case with ABI changes is that extensions may segfault
    if they are *not* recompiled [1]. In that case documenting it in What's New is
    standard procedure.

    Here the extension *is* recompiled and still segfaults.

    Even if the API is public, the PyMemAllocator thing is low level. It's not
    part of the stable ABI. Except failmalloc, I don't know any user. I don't
    expect a lot of complain and it's easy to port the code.

    Perhaps it's worth asking on python-dev. Nathaniel's suggestion isn't bad
    either (e.g. name it PyMemAllocatorEx).

    [1] I was told on python-dev that many people in fact do not recompile.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 6, 2014

    New changeset 358a12f4d4bc by Victor Stinner in branch 'default':
    Issue bpo-21233: Fix _PyObject_Alloc() when compiled with WITH_VALGRIND defined
    http://hg.python.org/cpython/rev/358a12f4d4bc

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jun 2, 2014

    New changeset 6374c2d957a9 by Victor Stinner in branch 'default':
    Issue bpo-21233: Rename the C structure "PyMemAllocator" to "PyMemAllocatorEx" to
    http://hg.python.org/cpython/rev/6374c2d957a9

    @vstinner
    Copy link
    Member

    vstinner commented Jun 2, 2014

    I'm not sure: The usual case with ABI changes is that extensions may segfault if they are *not* recompiled [1].

    Ok, I renamed the structure PyMemAllocator to PyMemAllocatorEx, so the compilation fails because PyMemAllocator name is not defined. Modules compiled for Python 3.4 will crash on Python 3.5 if they are not recompiled, but I hope that you recompile your modules when you don't use the stable ABI.

    Using PyMemAllocator is now more complex because it depends on the Python version. See for example the patch for pyfailmalloc:
    https://bitbucket.org/haypo/pyfailmalloc/commits/9db92f423ac5f060d6ff499ee4bb74ebc0cf4761

    Using the C preprocessor, it's possible to limit the changes.

    @vstinner
    Copy link
    Member

    vstinner commented Jun 2, 2014

    "Okay, then let's please call it:
    _PyObject_Calloc(void *ctx, size_t nobjs, size_t objsize)
    _PyObject_Alloc(int use_calloc, void *ctx, size_t nobjs, size_t objsize)"

    "void * PyMem_RawCalloc(size_t nelem, size_t elsize);" prototype comes from the POSIX standad:
    http://pubs.opengroup.org/onlinepubs/009695399/functions/calloc.html

    I'm don't want to change the prototype in Python. Extract of Python documentation:

    .. c:function:: void* PyMem_RawCalloc(size_t nelem, size_t elsize)

    Allocates *nelem* elements each whose size in bytes is *elsize* (...)

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jun 2, 2014

    New changeset dff6b4b61cac by Victor Stinner in branch 'default':
    Issue bpo-21233: Revert bytearray(int) optimization using calloc()
    http://hg.python.org/cpython/rev/dff6b4b61cac

    @vstinner
    Copy link
    Member

    vstinner commented Jun 2, 2014

    "2) I'm not happy with the refactoring in bytearray_init(). (...)

    1. Somewhat similarly, I wonder if it was necessary to refactor
      PyBytes_FromStringAndSize(). (...)"

    Ok, I reverted the change on bytearray(int) and opened the issue bpo-21644 to discuss these two optimizations.

    @vstinner
    Copy link
    Member

    vstinner commented Jun 2, 2014

    I reread the issue. I hope that I now addressed all issues. The remaining issue, bytearray(int) is now tracked by the new issue bpo-21644.

    @vstinner vstinner closed this as completed Jun 2, 2014
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core Interpreter core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants