API for setting the memory allocator used by Python #47579

Closed
jlaurila mannequin opened this issue Jul 9, 2008 · 48 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@jlaurila
Mannequin

jlaurila mannequin commented Jul 9, 2008

BPO 3329
Nosy @warsaw, @rhettinger, @gpshead, @jcea, @amauryfa, @ncoghlan, @pitrou, @kristjanvalur, @vstinner, @jszakmeister, @tpn, @miss-islington
PRs
  • bpo-3329: Fix typo in PyObjectArenaAllocator doc #24795
  • [3.9] bpo-3329: Fix typo in PyObjectArenaAllocator doc (GH-24795) #24799
  • [3.8] bpo-3329: Fix typo in PyObjectArenaAllocator doc (GH-24795) #24800
Files
  • pymem.h: locally patched version of pymem.h
  • ccpmem.h
  • Capture.JPG: Profiling email
  • py_setallocators-filename.patch
  • pybench.txt
  • benchmarks.txt
  • py_setallocators-9.patch
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2013-07-07.15:25:49.113>
    created_at = <Date 2008-07-09.19:48:51.979>
    labels = ['interpreter-core', 'type-feature']
    title = 'API for setting the memory allocator used by Python'
    updated_at = <Date 2021-03-09.11:39:48.738>
    user = 'https://bugs.python.org/jlaurila'

    bugs.python.org fields:

    activity = <Date 2021-03-09.11:39:48.738>
    actor = 'miss-islington'
    assignee = 'none'
    closed = True
    closed_date = <Date 2013-07-07.15:25:49.113>
    closer = 'vstinner'
    components = ['Interpreter Core']
    creation = <Date 2008-07-09.19:48:51.979>
    creator = 'jlaurila'
    dependencies = []
    files = ['30451', '30452', '30496', '30537', '30559', '30560', '30753']
    hgrepos = []
    issue_num = 3329
    keywords = ['patch']
    message_count = 48.0
    messages = ['69482', '69483', '69484', '69494', '69497', '69499', '69511', '78995', '79309', '91957', '142981', '183587', '183590', '183591', '183947', '183950', '183951', '190429', '190528', '190529', '190534', '190539', '190741', '190937', '190940', '190951', '190962', '191029', '191030', '191049', '191050', '191074', '191077', '191165', '191184', '191379', '191436', '191508', '192220', '192504', '192506', '192508', '192509', '192570', '192576', '388351', '388352', '388356']
    nosy_count = 18.0
    nosy_names = ['barry', 'rhettinger', 'gregory.p.smith', 'jcea', 'amaury.forgeotdarc', 'ncoghlan', 'Rhamphoryncus', 'pitrou', 'kristjan.jonsson', 'vstinner', 'jszakmeister', 'tlesher', 'jlaurila', 'trent', 'neilo', 'pjmcnerney', 'python-dev', 'miss-islington']
    pr_nums = ['24795', '24799', '24800']
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue3329'
    versions = ['Python 3.4']

    @jlaurila
    Mannequin Author

    jlaurila mannequin commented Jul 9, 2008

    Currently Python always uses the C library malloc/realloc/free as the
    underlying mechanism for requesting memory from the OS, but especially
    on memory-limited platforms it is often desirable to be able to override
    the allocator and to redirect all Python's allocations to use a special
    heap. This will make it possible to free memory back to the operating
    system without restarting the process, and to reduce fragmentation by
    separating Python's allocations from the rest of the program.

    The proposal is to make it possible to set the allocator used by the
    Python interpreter by calling the following function before Py_Initialize():

    void Py_SetAllocator(void* (*alloc)(size_t), void* (*realloc)(void*,
    size_t), void (*free)(void*))

    Direct function calls to malloc/realloc/free in obmalloc.c must be
    replaced with calls through the function pointers set through this
    function. By default these would of course point to the C stdlib
    malloc/realloc/free.
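A minimal sketch of the mechanism proposed above, assuming nothing about the real obmalloc.c: allocation goes through three function pointers that default to the C library and can be swapped before the first allocation. All `demo_*` and `counting_*` names are hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names: this only illustrates the proposed mechanism and is
 * not CPython code. */
typedef void *(*demo_malloc_fn)(size_t);
typedef void *(*demo_realloc_fn)(void *, size_t);
typedef void (*demo_free_fn)(void *);

/* By default the pointers refer to the C library allocator. */
static demo_malloc_fn  cur_malloc  = malloc;
static demo_realloc_fn cur_realloc = realloc;
static demo_free_fn    cur_free    = free;

/* Analogue of the proposed Py_SetAllocator(): meant to be called before the
 * first allocation so that no block crosses from one allocator to another. */
void demo_set_allocator(demo_malloc_fn m, demo_realloc_fn r, demo_free_fn f)
{
    cur_malloc = m;
    cur_realloc = r;
    cur_free = f;
}

/* Every allocation inside the interpreter would go through these wrappers
 * instead of calling malloc/realloc/free directly. */
void *demo_malloc(size_t n)           { return cur_malloc(n); }
void *demo_realloc(void *p, size_t n) { return cur_realloc(p, n); }
void  demo_free(void *p)              { cur_free(p); }

/* Example replacement: forwards to the C library but counts live blocks. */
static long live_blocks = 0;

static void *counting_malloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        live_blocks++;
    return p;
}

static void *counting_realloc(void *p, size_t n)
{
    void *q = realloc(p, n);
    if (p == NULL && q != NULL)
        live_blocks++;   /* realloc(NULL, n) behaves like malloc(n) */
    return q;
}

static void counting_free(void *p)
{
    if (p != NULL)
        live_blocks--;
    free(p);
}
```

An embedding application would call `demo_set_allocator(counting_malloc, counting_realloc, counting_free)` once at startup; the cost is one pointer indirection per call.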

    @jlaurila jlaurila mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Jul 9, 2008
    @brettcannon
    Member

    Is registering pointers to functions really necessary, or would defining
    macros work as well? From a performance perspective I would like to
    avoid having a pointer indirection step every time malloc/realloc/free
    is called.

    I guess my question becomes, Jukka, is this more for alternative
    implementations of Python where changes to source are already expected,
    or for apps that embed Python where a change of malloc/realloc/free
    varies from app to app that dynamically loads Python?

    @Rhamphoryncus
    Mannequin

    Rhamphoryncus mannequin commented Jul 9, 2008

    How would this allow you to free all memory? The interpreter will still
    reference it, so you'd have to have called Py_Finalize already, and
    promise not to call Py_Initialize afterwards. This further supposes the
    process will live a long time after killing off the interpreter, but in
    that case I recommend putting python in a child process instead.

    @jlaurila
    Mannequin Author

    jlaurila mannequin commented Jul 10, 2008

    Brett, the ability to define the allocator dynamically at runtime could
    be a compile time option, turned on by default only on small memory
    platforms. On most platforms you can live with plain old malloc and may
    want to avoid the indirection. If no other platform is interested in
    this, we can just make it a Symbian-specific extension but I wanted to
    see if there's general interest in this.

    The application would control the lifecycle of the Python heap, and this
    seemed like the most natural way for the application to tell the
    interpreter which heap instance to use.

    Adam, the cleanup would work by freeing the entire heap used by Python
    after calling Py_Finalize. In the old PyS60 code we made Python 2.2.2
    clean itself completely by freeing the Python-specific heap and making
    sure all pointers to heap-allocated items are properly reinitialized.

    Yes, there are various static pointers that are initially set to NULL,
    initialized to point at things on the heap and not reset to NULL at
    Py_Finalize, and these are currently an obstacle to calling
    Py_Initialize again. I'm considering submitting a separate ticket about
    that since it seems like the ability to free the heap combined with the
    ability to reinitialize the static pointers could together make full
    cleanup possible.
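To illustrate what "freeing the entire heap" can mean, here is a rough sketch of a private bump allocator: every block comes from one region, and destroying the region releases all of them at once. The `demo_heap` type and its functions are hypothetical; the PyS60 code is not shown in this thread and may work quite differently.

```c
#include <stddef.h>
#include <stdlib.h>

#define DEMO_ALIGN 16   /* assumed to be enough for any ordinary type */

/* Hypothetical bump allocator over one private region. */
typedef struct {
    unsigned char *base;   /* start of the private region */
    size_t capacity;       /* size of the region in bytes */
    size_t used;           /* bytes handed out so far */
} demo_heap;

/* Returns 0 on success, -1 if the region could not be obtained. */
int demo_heap_init(demo_heap *h, size_t capacity)
{
    h->base = malloc(capacity);
    h->capacity = (h->base != NULL) ? capacity : 0;
    h->used = 0;
    return (h->base != NULL) ? 0 : -1;
}

void *demo_heap_alloc(demo_heap *h, size_t n)
{
    /* Round up so that every block stays aligned. */
    size_t rounded = (n + DEMO_ALIGN - 1) / DEMO_ALIGN * DEMO_ALIGN;
    if (rounded < n || rounded > h->capacity - h->used)
        return NULL;            /* arithmetic overflow or region exhausted */
    void *p = h->base + h->used;
    h->used += rounded;
    return p;
}

/* Individual frees are no-ops; memory only returns in demo_heap_destroy(). */
void demo_heap_free(demo_heap *h, void *p)
{
    (void)h;
    (void)p;
}

/* Releases every block at once. Any pointer into the region is dangling
 * afterwards, which is why the static pointers must be reset as well. */
void demo_heap_destroy(demo_heap *h)
{
    free(h->base);
    h->base = NULL;
    h->capacity = 0;
    h->used = 0;
}
```

The sketch shows why the two pieces belong together: dropping the heap is cheap, but it is only safe once nothing outside the heap still points into it.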

    @ncoghlan
    Contributor

    Given where we are in the release cycle, I've bumped the target releases
    to 2.7/3.1. So Symbian are probably going to have to do something
    port-specific anyway in order to get 2.6/3.0 up and running.

    And in terms of hooking into this kind of thing, some simple macros that
    can be overridden in pyport.h (as Brett suggested) may be a better idea
    than baking any specific approach into the core interpreter.
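A sketch of the macro approach, under the assumption that a port supplies its own definitions in a port-specific header before this code is compiled. The `DEMO_*` macro names and the helper functions are hypothetical, not names used by CPython.

```c
#include <stdlib.h>

/* Hypothetical macro names. A port would define these before this point;
 * otherwise the C library allocator is used. */
#ifndef DEMO_MALLOC
#define DEMO_MALLOC(n)      malloc(n)
#endif
#ifndef DEMO_REALLOC
#define DEMO_REALLOC(p, n)  realloc((p), (n))
#endif
#ifndef DEMO_FREE
#define DEMO_FREE(p)        free(p)
#endif

/* Call sites use the macros, so the choice is made at compile time and the
 * default build pays no pointer indirection. */
char *demo_make_buffer(size_t n)
{
    char *buf = DEMO_MALLOC(n);
    if (buf != NULL && n > 0)
        buf[0] = '\0';
    return buf;
}

char *demo_grow_buffer(char *buf, size_t n)
{
    return DEMO_REALLOC(buf, n);
}

void demo_release_buffer(char *buf)
{
    DEMO_FREE(buf);
}
```

The trade-off is that the allocator is fixed when the interpreter is built, so an embedding application cannot change it without recompiling.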

    @rhettinger
    Contributor

    I think it is reasonable to get a macro definition change into 2.6.
    The OP's request is essential for his application (running Python
    on Nokia phones) and it would be a loss to wait two years for this.
    Also, his request for a macro will enable another important piece
    of functionality -- allowing a build to intercept and instrument all
    calls to the memory allocator.

    Barry, can you rule on whether to keep this open for consideration in
    2.6. It seems daft to postpone this discussion indefinitely. If we
    can agree to a simple, non-invasive solution while there is still yet
    another beta, then it makes sense to proceed.

    @Rhamphoryncus
    Mannequin

    Rhamphoryncus mannequin commented Jul 10, 2008

    Basically you just want to kick the malloc implementation into doing
    some housekeeping, freeing its caches? I'm kinda surprised you don't
    add the hook directly to your libc's malloc.

    IMO, there's no use-case for this until Py_Finalize can completely tear
    down the interpreter, which requires a lot of special work (killing(!)
    daemon threads, unloading C modules, etc), and nobody intends to do that
    at this point.

    The practical alternative, as I said, is to run python in a subprocess.
    Let the OS clean up after us.

    @neilo
    Mannequin

    neilo mannequin commented Jan 3, 2009

    I'll be in agreement here. I integrated Python into a game engine not
    too long ago, and had to do a fair chunk of work to isolate Python
    into its own heap, given that fragmentation on low-memory systems can
    be a bit of a killer. It would also make future upgrades a heck of a lot
    easier, as there'd be no need to search for all references and
    carefully replace them all.

    @jlaurila
    Mannequin Author

    jlaurila mannequin commented Jan 7, 2009

    Brett is right. Macroing the memory allocator is a better choice than
    forcing indirection on all platforms. We did this on Python for S60,
    using the macros PyCore_{MALLOC,REALLOC,FREE}_FUNC for the interpreter's
    allocations, and then redirected those to a mechanism that allows setting
    the allocator at runtime.

    Sorry we don't have a clean patch at present for this change only, but
    in case anyone's interested the full source is at
    https://garage.maemo.org/frs/?group_id=854

    @pjmcnerney
    Mannequin

    pjmcnerney mannequin commented Aug 25, 2009

    Has the ability to set the memory allocator been added to Python 2.7/3.1?

    Thanks,
    PJ

    @warsaw warsaw removed their assignment Feb 18, 2010
    @pitrou
    Member

    pitrou commented Aug 25, 2011

    All this needs is a patch.
    Note that there are some places where we call malloc()/free() without going through our abstraction API. This is not in allocation-heavy paths, though.

    @vstinner
    Member

    vstinner commented Mar 6, 2013

    I attached a patch that I wrote for Wyplay: py_setallocators.patch. The patch adds two functions:

    PyAPI_FUNC(int) Py_GetAllocators(
        char api,
        void* (**malloc_p) (size_t),
        void* (**realloc_p) (void*, size_t),
        void (**free_p) (void*)
        );
    
    PyAPI_FUNC(int) Py_SetAllocators(
        char api,
        void* (*malloc) (size_t),
        void* (*realloc) (void*, size_t),
        void (*free) (void*)
        );

    Where api is one of these values:

    • PY_ALLOC_SYSTEM_API: the system API (malloc, realloc, free)
    • PY_ALLOC_MEM_API: the PyMem_Malloc() API
    • PY_ALLOC_OBJECT_API: the PyObject_Malloc() API

    These functions are used by the pytracemalloc project to hook PyMem_Malloc() and PyObject_Malloc() API. pytracemalloc traces all Python memory allocations to compute statistics per Python file.
    https://pypi.python.org/pypi/pytracemalloc

    Wyplay also uses Py_SetAllocators() internally to completely replace the system allocators *before* Python is started. We have another private patch on Python that adds a function. This function installs its own memory allocators; it is called before Python starts thanks to an "__attribute__((constructor))" attribute.

    --

    If you use Py_SetAllocators() to completely replace a memory allocator (any memory allocation API), you have to do it before the first Python memory allocation (before Py_Main()) *or* your memory allocator must be able to recognize that a pointer was not allocated by it and pass the operation (realloc or free) to the previous memory allocator.

    For example, PyObject_Free() is able to recognize that a pointer is part of its memory pool, or fallback to the system allocator (extract of the original code):

        if (Py_ADDRESS_IN_RANGE(p, pool)) {
            ...
            return;
        }
        free(p);
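The same ownership test can be sketched outside pymalloc. In the rough example below a replacement allocator serves requests from a fixed pool and hands every pointer it does not recognize to the system allocator. The pool layout and all `demo_*` names are hypothetical and much simpler than pymalloc's arenas.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_POOL_SIZE 4096
#define DEMO_ALIGN 16

/* Hypothetical fixed pool; the extra union members only force alignment. */
static union {
    unsigned char bytes[DEMO_POOL_SIZE];
    long double align_ld;
    long long align_ll;
    void *align_ptr;
} demo_pool;
static size_t demo_pool_used = 0;
static long demo_fallback_frees = 0;

/* Analogue of Py_ADDRESS_IN_RANGE(): does p point into our pool? */
static int demo_owns(const void *p)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t start = (uintptr_t)demo_pool.bytes;
    return addr >= start && addr < start + DEMO_POOL_SIZE;
}

void *demo_malloc(size_t n)
{
    if (n == 0)
        n = 1;   /* always hand out a distinct address inside the pool */
    size_t rounded = (n + DEMO_ALIGN - 1) / DEMO_ALIGN * DEMO_ALIGN;
    if (rounded >= n && rounded <= DEMO_POOL_SIZE - demo_pool_used) {
        void *p = demo_pool.bytes + demo_pool_used;
        demo_pool_used += rounded;
        return p;
    }
    return malloc(n);   /* pool exhausted: use the previous allocator */
}

void demo_free(void *p)
{
    if (p == NULL)
        return;
    if (demo_owns(p))
        return;         /* ours; this sketch never reuses pool memory */
    demo_fallback_frees++;
    free(p);            /* not ours: pass it to the allocator that owns it */
}
```

The fallback branch is what makes a late installation safe: blocks handed out by the system allocator before the switch are still released by the system allocator.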

    --

    If you use Py_SetAllocators() to hook memory allocators (do something before and/or after calling the previous function, *without* touching the pointer nor the size), you can do it anytime.

    --

    I haven't run a benchmark yet to measure the overhead of the patch on Python's performance.

    New functions are not documented nor tested yet. If we want to test these new functions, we can write a simple hook tracing calls to the memory allocators and call the memory allocator.

    @vstinner
    Member

    vstinner commented Mar 6, 2013

    To be exhaustive, another patch should be developed to replace all calls to malloc/realloc/free with PyMem_Malloc/PyMem_Realloc/PyMem_Free. PyObject_Malloc() is still using mmap() or malloc() internally, for example.

    Other examples of functions calling malloc/realloc/free directly: _PySequence_BytesToCharpArray(), block_new() (of pyarena.c), find_key() (of thread.c), PyInterpreterState_New(), win32_wchdir(), posix_getcwd(), Py_Main(), etc.

    @amauryfa
    Member

    amauryfa commented Mar 6, 2013

    Some customizable memory allocators I know have an extra parameter "void *opaque" that is passed to all functions:

    OTOH, expat, libxml, libmpdec don't have this extra parameter.

    @kristjanvalur
    Mannequin

    kristjanvalur mannequin commented Mar 11, 2013

    At ccp we have something similar. We are embedding python in the UnrealEngine on the PS3 and need to get everything through their allocators. For the purpose of flexibility, we added an api similar to the OPs, but more flexible:

    /* Support for custom allocators */
    typedef void *(*PyCCP_Malloc_t)(size_t size, void *arg, const char *file, int line, const char *msg);
    typedef void *(*PyCCP_Realloc_t)(void *ptr, size_t size, void *arg, const char *file, int line, const char *msg);
    typedef void (*PyCCP_Free_t)(void *ptr, void *arg, const char *file, int line, const char *msg);
    typedef size_t (*PyCCP_Msize_t)(void *ptr, void *arg);
    typedef struct PyCCP_CustomAllocator_t
    {
        PyCCP_Malloc_t  pMalloc;
        PyCCP_Realloc_t pRealloc;
        PyCCP_Free_t    pFree;
        PyCCP_Msize_t   pMsize;    /* can be NULL, or return -1 if no size info is avail. */
        void            *arg;      /* opaque argument for the functions */
    } PyCCP_CustomAllocator_t;
    
    /* To set an allocator!  use 0 for the regular allocator, 1 for the block allocator.
     * pass a null pointer to reset to internal default
     */
    PyAPI_FUNC(void) PyCCP_SetAllocator(int which, const PyCCP_CustomAllocator_t *);

    For a module to install itself as a "hook" at runtime, this approach can be extended by querying the current allocator, so that such a hook can then delegate to the previous allocator.

    The "block" allocator here is intended as the underlying allocator to be used by obmalloc.c. Depending on the platform, this can then allocate aligned virtual memory directly, which is more efficient than layering that on top of a malloc-like allocator.

    There are areas in CPython that use malloc() directly. Those are actually not needed in all cases, but to cope with them we change them all to new RAW API calls (using preprocessor macros).
    Essentially, malloc() maps to PyCCP_RawMalloc() or PyMem_MALLOC_INNER() (both local additions) based on whether the particular site using malloc() requires a truly GIL-free malloc or not.

    For this reason, the custom allocators mentioned cannot be assumed to be called with the GIL held. However, it is easily possible to extend the system above so that there is a GIL and a non-GIL version of the 'regular' allocator.

    I'll put details of the stuff we have done for EVE Online / Dust 514 on my blog. It is this, but much much more too.

    Hopefully we can arrive at a way to abstract memory allocation away from Python in a flexible and extendible manner :)

    @ncoghlan
    Contributor

    Note that I'm definitely open to including extra settings to set up custom allocators as part of Py_CoreConfig in PEP-432 (http://www.python.org/dev/peps/pep-0432/#pre-initialization-phase).

    I don't really want to continue the tradition of additional PySet_* APIs with weird conditions on when they have to be called, though (trying to prevent more of that kind of organic growth in complexity is why I wrote PEP-432 in the first place)

    @kristjanvalur
    Mannequin

    kristjanvalur mannequin commented Mar 11, 2013

    Absolutely. Although there is a very useful scenario where this could be considered a run-time setting:

      # turboprofiler.py
      # Load up the memory hooker which will supply us with all the info
      import _turboprofiler
      _turboprofiler.hookup()

    Perhaps people interested in memory optimizations and profiling could hook up at pycon? It is the most common regular query I get from people in my organization: How can I find out how python is using/leaking/wasting memory?

    @vstinner
    Member

    typedef void *(*PyCCP_Malloc_t)(size_t size, void *arg, const char *file, int line, const char *msg);

    I don't understand the purpose of the filename and line number. Python does not have such information. Is it just to have the API expected by Unreal engine?

    What is the message? How is it filled?

    --

    I'm proposing a simpler prototype:

    void* (*malloc) (size_t);

    Simply because Python neither uses nor has anything more than that. I'm not against adding an arbitrary void* argument: it should not hurt, and may be required by some other applications or libraries.

    @kristjan.jonsson: Can you adapt your tool to fit the following API?

    PyAPI_FUNC(int) Py_SetAllocators(
        char api,
        void* (*malloc) (size_t size, void *data),
        void* (*realloc) (void* ptr, size_t size, void *data),
        void (*free) (void* ptr, void *data)
        );

    --

    My pytracemalloc project hooks the allocation functions and then uses CPython functions to get the current filename and line number. There is no need to modify the C code to pass __FILE__ and __LINE__.

    It can produce such summary:

    2013-02-28 23:40:18: Top 5 allocations per file
    #1: .../Lib/test/regrtest.py: 3998 KB
    #2: .../Lib/unittest/case.py: 2343 KB
    #3: .../ctypes/test/init.py: 513 KB
    #4: .../Lib/encodings/init.py: 525 KB
    #5: .../Lib/compiler/transformer.py: 438 KB
    other: 32119 KB
    Total allocated size: 39939 KB

    You can also configure it to display the line number.

    https://pypi.python.org/pypi/pytracemalloc

    @kristjanvalur
    Mannequin

    kristjanvalur mannequin commented Jun 3, 2013

    Hi.
    The file and line arguments are for expanding from macros such as PyMem_MALLOC. I had them added because they provide the features of a comprehensive debugging API.

    Of course, I'm not showing you the entire set of modifications that we have made to the memory allocation scheme. They include more extensive versions of the memory allocation tools, in order to more easily monitor memory allocations from within C.

    For your information, I'm uploading pymemory.h from our 2.7 patch. The extent of our modifications can be gleaned from there.

    Basically, we have layered the macros into outer and inner versions, in order to better support internal diagnostics.

    I'm happy with the api you provide, with a small addition:
    PyAPI_FUNC(int) Py_SetAllocators(
        char api,
        void* (*malloc) (size_t size, void *data),
        void* (*realloc) (void* ptr, size_t size, void *data),
        void (*free) (void* ptr, void *data),
        void *data
        );

    The 'data' pointer is pointless unless you can provide it as part of the api. This sort of extra indirection is necessary for C callbacks to provide instance specific context to statically compiled and linked callback functions.
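A small sketch of why the registered pointer matters: one set of statically compiled callbacks serves two independent instances, because each registration carries its own context. The struct layout and all `demo_*`/`stats_*` names are hypothetical and only mirror the shape of the callbacks discussed above.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-instance state reached through the opaque pointer. */
typedef struct {
    const char *name;
    size_t calls;
    size_t bytes_requested;
} demo_stats;

/* Callbacks plus the pointer that was registered together with them. */
typedef struct {
    void *(*malloc_fn)(size_t size, void *data);
    void *(*realloc_fn)(void *ptr, size_t size, void *data);
    void (*free_fn)(void *ptr, void *data);
    void *data;
} demo_allocator;

static void *stats_malloc(size_t size, void *data)
{
    demo_stats *s = data;
    s->calls++;
    s->bytes_requested += size;
    return malloc(size);
}

static void *stats_realloc(void *ptr, size_t size, void *data)
{
    demo_stats *s = data;
    s->calls++;
    s->bytes_requested += size;
    return realloc(ptr, size);
}

static void stats_free(void *ptr, void *data)
{
    (void)data;   /* nothing to record on free in this sketch */
    free(ptr);
}

/* Two registrations share the functions but not the state. */
static demo_stats mem_stats = { "mem", 0, 0 };
static demo_stats obj_stats = { "object", 0, 0 };
static demo_allocator mem_alloc = { stats_malloc, stats_realloc, stats_free, &mem_stats };
static demo_allocator obj_alloc = { stats_malloc, stats_realloc, stats_free, &obj_stats };

/* The caller always passes back the data pointer it was given. */
void *demo_alloc(demo_allocator *a, size_t size)
{
    return a->malloc_fn(size, a->data);
}

void demo_release(demo_allocator *a, void *ptr)
{
    a->free_fn(ptr, a->data);
}
```

Without the data pointer the callbacks would have to reach for global variables, and the two instances could not be told apart.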

    @kristjanvalur
    Mannequin

    kristjanvalur mannequin commented Jun 3, 2013

    I'm also attaching our ccpmem.h, the interface to ccpmem.cpp, our internal flexible memory allocator framework.
    Again, just FYI. There are no trade secrets here, so please ask me for more details, if interested. One particular trick we have been using, which might be of interest, is to be able to tag each allocation with a "context" id. This is then set according to a global sys.memcontext variable, which the program will modify according to what it is doing. This can then be used to track memory usage by different parts of the program.
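The context-tagging trick might look roughly like the sketch below. It assumes a small header stored in front of each block and a plain C global standing in for the sys.memcontext value; the names are hypothetical and the real ccpmem code is not reproduced here.

```c
#include <stddef.h>
#include <stdlib.h>

#define DEMO_MAX_CONTEXTS 8

/* Hypothetical stand-in for the global "sys.memcontext" value. */
static int demo_memcontext = 0;
static size_t demo_bytes_by_context[DEMO_MAX_CONTEXTS];

/* Header stored in front of every block. The extra union members keep the
 * user pointer aligned for any ordinary type. */
typedef union {
    struct {
        size_t size;
        int context;
    } info;
    long double align_ld;
    long long align_ll;
    void *align_ptr;
} demo_header;

/* The program calls this whenever it switches to a different activity. */
void demo_set_context(int context)
{
    if (context >= 0 && context < DEMO_MAX_CONTEXTS)
        demo_memcontext = context;
}

void *demo_malloc(size_t size)
{
    if (size > (size_t)-1 - sizeof(demo_header))
        return NULL;                        /* request too large */
    demo_header *h = malloc(sizeof(demo_header) + size);
    if (h == NULL)
        return NULL;
    h->info.size = size;
    h->info.context = demo_memcontext;      /* tag with the current context */
    demo_bytes_by_context[demo_memcontext] += size;
    return h + 1;                           /* user memory follows the header */
}

void demo_free(void *ptr)
{
    if (ptr == NULL)
        return;
    demo_header *h = (demo_header *)ptr - 1;
    /* Credit the context that made the allocation, not the current one. */
    demo_bytes_by_context[h->info.context] -= h->info.size;
    free(h);
}
```

Reading `demo_bytes_by_context` at any moment then gives the live bytes attributed to each part of the program.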

    @vstinner
    Member

    vstinner commented Jun 3, 2013

    """
    I'm happy with the api you provide, with a small addition:
    PyAPI_FUNC(int) Py_SetAllocators(
        char api,
        void* (*malloc) (size_t size, void *data),
        void* (*realloc) (void* ptr, size_t size, void *data),
        void (*free) (void* ptr, void *data),
        void *data
        );
    """

    Oops, I forgot "void *data". Yeah, each group of allocator functions (malloc, free and realloc) will get its own "data" pointer.

    @vstinner
    Member

    vstinner commented Jun 3, 2013

    New patch (version 2), more complete:

    • add "void *data" argument to all allocator functions

    • add "block" API used by the pymalloc allocator to allocate arenas. It uses mmap or malloc, but may use VirtualAlloc in the near future (see bpo-13483). Callback prototypes:

      • void* block_malloc (size_t, void*);
      • void block_free (void*, size_t, void*);
    • remove PY_ALLOC_SYSTEM_API

    Main API:
    ---

    #define PY_ALLOC_MEM_API 'm'      /* PyMem_Malloc() API */
    #define PY_ALLOC_OBJECT_API 'o'   /* PyObject_Malloc() API */
    
    PyAPI_FUNC(int) Py_GetAllocators(
        char api,
        void* (**malloc_p) (size_t size, void *user_data),
        void* (**realloc_p) (void *ptr, size_t size, void *user_data),
        void (**free_p) (void *ptr, void *user_data),
        void **user_data_p
        );
    
    PyAPI_FUNC(int) Py_SetAllocators(
        char api,
        void* (*malloc) (size_t size, void *user_data),
        void* (*realloc) (void *ptr, size_t size, void *user_data),
        void (*free) (void *ptr, void *user_data),
        void *user_data
        );
    
    PyAPI_FUNC(void) Py_GetBlockAllocators(
        void* (**malloc_p) (size_t size, void *user_data),
        void (**free_p) (void *ptr, size_t size, void *user_data),
        void **user_data_p
        );
    
    PyAPI_FUNC(int) Py_SetBlockAllocators(
        void* (*malloc) (size_t size, void *user_data),
        void (*free) (void *ptr, size_t size, void *user_data),
        void *user_data
        );
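A sketch of a pair of callbacks with the block-allocator shape shown above, where the free function receives the size again (as munmap does), so the allocator does not have to remember it. The statistics struct and the `demo_*` names are hypothetical, and plain malloc stands in for mmap or VirtualAlloc.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping reached through user_data. */
typedef struct {
    size_t live_blocks;
    size_t live_bytes;
} demo_block_stats;

/* Allocates one block (an arena, in pymalloc terms) and records it. */
static void *demo_block_malloc(size_t size, void *user_data)
{
    demo_block_stats *stats = user_data;
    void *ptr = malloc(size);
    if (ptr != NULL) {
        stats->live_blocks++;
        stats->live_bytes += size;
    }
    return ptr;
}

/* The caller passes the same size it asked for when allocating. */
static void demo_block_free(void *ptr, size_t size, void *user_data)
{
    demo_block_stats *stats = user_data;
    if (ptr == NULL)
        return;
    stats->live_blocks--;
    stats->live_bytes -= size;
    free(ptr);
}
```

Under the proposed API these two functions and a `demo_block_stats` instance would be the three arguments handed to Py_SetBlockAllocators().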

    I see the following use cases using allocators:

    • Don't use malloc nor mmap but your own allocator: replace PyMem and PyObject allocators
    • Track memory leaks (my pytracemalloc project, or Antoine's simple _Py_AllocatedBlocks counter): hook PyMem and PyObject allocators
    • Fill newly allocated memory with a pattern and check for buffer underflow and overflow: hook PyMem and PyObject allocators
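The third use case can be sketched as a hook that pads each block with guard bytes and checks them on free. This is only an illustration with arbitrary pattern values and hypothetical `demo_*` names; it is not the code of _PyMem_DebugMalloc().

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_HEADER 16         /* stores the block size, keeps alignment */
#define DEMO_GUARD 16          /* bytes of padding on each side */
#define DEMO_GUARD_BYTE 0xFB   /* arbitrary pattern for the padding */
#define DEMO_FRESH_BYTE 0xCB   /* arbitrary pattern for new user memory */

/* The "previous" allocator that the hook wraps. */
static void *(*demo_prev_malloc)(size_t) = malloc;
static void (*demo_prev_free)(void *) = free;
static long demo_corruptions = 0;

/* Layout: [header][guard][user bytes][guard] */
void *demo_guard_malloc(size_t n)
{
    if (n > (size_t)-1 - (DEMO_HEADER + 2 * DEMO_GUARD))
        return NULL;                              /* request too large */
    unsigned char *raw = demo_prev_malloc(DEMO_HEADER + 2 * DEMO_GUARD + n);
    if (raw == NULL)
        return NULL;
    memcpy(raw, &n, sizeof n);                    /* remember the size */
    memset(raw + DEMO_HEADER, DEMO_GUARD_BYTE, DEMO_GUARD);
    memset(raw + DEMO_HEADER + DEMO_GUARD, DEMO_FRESH_BYTE, n);
    memset(raw + DEMO_HEADER + DEMO_GUARD + n, DEMO_GUARD_BYTE, DEMO_GUARD);
    return raw + DEMO_HEADER + DEMO_GUARD;
}

static int demo_guard_intact(const unsigned char *guard)
{
    for (size_t i = 0; i < DEMO_GUARD; i++) {
        if (guard[i] != DEMO_GUARD_BYTE)
            return 0;
    }
    return 1;
}

/* Returns 0 if both guards were intact, -1 if an underflow or overflow
 * was detected. The block is released in both cases. */
int demo_guard_free(void *ptr)
{
    if (ptr == NULL)
        return 0;
    unsigned char *user = ptr;
    unsigned char *raw = user - DEMO_GUARD - DEMO_HEADER;
    size_t n;
    memcpy(&n, raw, sizeof n);
    int ok = demo_guard_intact(user - DEMO_GUARD) && demo_guard_intact(user + n);
    if (!ok)
        demo_corruptions++;
    demo_prev_free(raw);
    return ok ? 0 : -1;
}
```

Because the hook only calls the previous allocator through `demo_prev_malloc` and `demo_prev_free`, it could sit on top of either the system allocator or another hook.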

    "Hook" means adding extra code before and/or after calling the original function.

    The final API should allow hooking the APIs multiple times as well as replacing the allocators. So it should be possible to track memory leaks, detect buffer overflows, and use your own allocators, all at the same time. This is not yet possible with patch 2, because _PyMem_DebugMalloc() calls malloc() directly.

    _PyMem_DebugMalloc is no longer used by PyObject_Malloc. This code should be rewritten to use the hook approach instead of replacing memory allocators.

    Example tracing PyMem calls using the hook approach:
    -----------------------------------

    #include <stdio.h>

    typedef struct {
        void* (*malloc) (size_t, void*);
        void* (*realloc) (void*, size_t, void*);
        void (*free) (void*, void*);
        void *data;
    } allocators_t;

    /* Previous allocators of each API, saved so that the hooks can delegate to them */
    allocators_t pymem, pyobject;

    void* trace_malloc (size_t size, void* data)
    {
        allocators_t *alloc = (allocators_t *)data;
        printf("malloc(%zu)\n", size);
        return alloc->malloc(size, alloc->data);
    }

    void* trace_realloc (void* ptr, size_t size, void* data)
    {
        allocators_t *alloc = (allocators_t *)data;
        printf("realloc(%p, %zu)\n", ptr, size);
        return alloc->realloc(ptr, size, alloc->data);
    }

    void trace_free (void* ptr, void* data)
    {
        allocators_t *alloc = (allocators_t *)data;
        printf("free(%p)\n", ptr);
        alloc->free(ptr, alloc->data);
    }

    void hook_pymem(void)
    {
       /* Save the current allocators, then install the tracing hooks with the saved set as user data */
       Py_GetAllocators(PY_ALLOC_MEM_API, &pymem.malloc, &pymem.realloc, &pymem.free, &pymem.data);
       Py_SetAllocators(PY_ALLOC_MEM_API, trace_malloc, trace_realloc, trace_free, &pymem);

       Py_GetAllocators(PY_ALLOC_OBJECT_API, &pyobject.malloc, &pyobject.realloc, &pyobject.free, &pyobject.data);
       Py_SetAllocators(PY_ALLOC_OBJECT_API, trace_malloc, trace_realloc, trace_free, &pyobject);
    }

    I didn't try the example :-p It is just to give you an idea of the API and how to use it.

    @kristjanvalur
    Mannequin

    kristjanvalur mannequin commented Jun 7, 2013

    I'd like to argue for providing a "file" and "line number" to the allocation API. I know that this is currently not provided, e.g. by the PyMem_Malloc() functions, but I think it would be wise to provide a "debug" version of these functions that passes in the call site. An allocator API that also allows these values to be passed to the malloc/realloc/free routines is then future-proof in that respect.

    Case in point: we have a memory profiler running which uses an allocator hook system similar to what Victor is proposing. But in addition, it provides a "file" and "line" argument to every function.

    Now, the profiler is currently not using this information. Here is how the "malloc" function looks:

    static void *
    PyMalloc(size_t size, void *arg, const char *file, int line, const char *msg)
    {
        void *r = DustMalloc(size);
        if (r) {
            tmAllocEx(g_telemetryContext, file, line, r, size, "Python alloc: %s", msg);
            ReportAllocInfo(AllocEvent, 0, r, size);
        }
        return r;
    }

    tmAllocEx calls the Telemetry memory profiler and passes in the allocation site (http://www.radgametools.com/telemetry.htm; see also my blog post about using it: http://cosmicpercolator.com/2012/05/25/optimizing-python-condition-variables-with-telemetry/ ).

    But our profiler, called with ReportAllocInfo, isn't using this. It relies solely on extracting the python callstack.

    Today, I got this email (see the attached file Capture.JPG).

    Basically, the profiler sees a lot of allocated memory with no python call stack. Now it would be useful if we had the C call site information, to know where it came from.

    So my suggestion is that the allocator API should:

    1. be a struct, which allows for a cleaner API function
    2. include the C filename and line number.

    Even though the current Python memory APIs (e.g. PyMem_Malloc(), PyObject_Malloc()) do not support it, this would allow us to internally have _extended_ versions of these APIs that do support it, and macros that pass in that information. This can be added at a later stage. Having it in the allocator API function would make it more future-proof.

    See also my "pymem.h" and "ccpmem.h" files attached to this defect for examples on how we have tweaked python's internal memory apis to support this information.
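The idea of extended entry points plus macros that pass in the call site can be sketched as follows. The `demo_*` names are hypothetical and this is not the content of the attached pymem.h; it only shows how `__FILE__` and `__LINE__` reach the allocator without changing the plain API.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record of the most recent allocation site. */
typedef struct {
    const char *file;   /* NULL when unknown */
    int line;           /* -1 when unknown */
    size_t size;
} demo_site;

static demo_site demo_last_site = { NULL, -1, 0 };

/* Extended entry point: same job as malloc(), plus the C call site. */
void *demo_malloc_at(size_t size, const char *file, int line)
{
    demo_last_site.file = file;
    demo_last_site.line = line;
    demo_last_site.size = size;
    return malloc(size);
}

/* Plain entry point for callers that have no site information. */
void *demo_malloc(size_t size)
{
    return demo_malloc_at(size, NULL, -1);
}

/* The macro captures the location of the caller, not of this definition. */
#define DEMO_MALLOC(size) demo_malloc_at((size), __FILE__, __LINE__)
```

Code that calls `DEMO_MALLOC(n)` gets its own file and line recorded, while code that calls `demo_malloc(n)` directly shows up with an unknown site, much like the "???:?" kind of entry a profiler would report.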

    @vstinner
    Member

    py_setallocators-filename.patch: Here is an attempt to define an API providing the filename and line number of the C code. The Py_SetAllocators() API is unchanged:

    PyAPI_FUNC(int) Py_SetAllocators(
        char api,
        void* (*malloc) (size_t size, void *user_data),
        void* (*realloc) (void *ptr, size_t size, void *user_data),
        void (*free) (void *ptr, void *user_data),
        void *user_data
        );

    If Python is compiled with -DPYMEM_TRACE_MALLOC, user_data is not the last parameter passed to Py_SetAllocators() but a pointer to a _PyMem_Trace structure:

    typedef struct {
        void *data;
        /* NULL and -1 when unknown */
        const char *filename;
        int lineno;
    } _PyMem_Trace;

    The problem is that the module using Py_SetAllocators() must be compiled differently depending on PYMEM_TRACE_MALLOC. Example from pytracemalloc, modified for this patch:
    ---

        _PyMem_Trace *ctrace;
        trace_api_t *api;
        void *call_data;
        void *ptr;
    #ifdef PYMEM_TRACE_MALLOC
        ctrace = (_PyMem_Trace *)data;
        api = (trace_api_t *)ctrace->data;
        ctrace->data = api->data;
        call_data = data;
    #else
        ctrace = NULL;
        api = (trace_api_t *)data;
        call_data = api->data;
    #endif
        ptr = api->malloc(size, call_data);
        ...

    I didn't like the "ctrace->data = api->data;" instruction: pytracemalloc modifies the input _PyMem_Trace structure.

    pytracemalloc code is a little bit more complex, but "it works". pytracemalloc can reuse the filename and line number of the C module, or of the Python module. It can be configured at runtime. Example of output for the C module:
    ---
    2013-06-11 00:36:30: Top 15 allocations per file and line (compared to 2013-06-11 00:36:25)
    #1: Objects/dictobject.c:352: size=6 MiB (+4324 KiB), count=9818 (+7773), average=663 B
    #2: Objects/unicodeobject.c:1085: size=6 MiB (+2987 KiB), count=61788 (+26197), average=111 B
    #3: Objects/tupleobject.c:104: size=4054 KiB (+2176 KiB), count=44569 (+24316), average=93 B
    #4: Objects/typeobject.c:770: size=2440 KiB (+1626 KiB), count=13906 (+10360), average=179 B
    #5: Objects/bytesobject.c:107: size=2395 KiB (+1114 KiB), count=24846 (+11462), average=98 B
    #6: Objects/funcobject.c:12: size=1709 KiB (+1103 KiB), count=11516 (+7431), average=152 B
    #7: Objects/codeobject.c:117: size=1760 KiB (+871 KiB), count=11267 (+5578), average=160 B
    #8: Objects/dictobject.c:399: size=784 KiB (+627 KiB), count=10040 (+8028), average=80 B
    #9: Objects/listobject.c:159: size=420 KiB (+382 KiB), count=5386 (+4891), average=80 B
    #10: Objects/frameobject.c:649: size=1705 KiB (+257 KiB), count=3374 (+505), average=517 B
    #11: ???:?: size=388 KiB (+161 KiB), count=588 (+240), average=676 B
    #12: Objects/weakrefobject.c:36: size=241 KiB (+138 KiB), count=2579 (+1482), average=96 B
    #13: Objects/dictobject.c:420: size=135 KiB (+112 KiB), count=2031 (+1736), average=68 B
    #14: Objects/classobject.c:59: size=109 KiB (+105 KiB), count=1400 (+1345), average=80 B
    #15: Objects/unicodeobject.c:727: size=188 KiB (+86 KiB), count=1237 (+687), average=156 B
    37 more: size=828 KiB (+315 KiB), count=8421 (+5281), average=100 B
    Total Python memory: size=29 MiB (+16 MiB), count=212766 (+117312), average=145 B
    Total process memory: size=68 MiB (+22 MiB) (ignore tracemalloc: 0 B)
    ---

    I also had to modify the following GC functions to get more accurate information:

    • _PyObject_GC_Malloc(size)
    • _PyObject_GC_New(tp)
    • _PyObject_GC_NewVar(tp, nitems)
    • PyObject_GC_Del(op)

    For example, PyTuple_New() calls PyObject_GC_NewVar() to allocate its memory. With my patch, you get "Objects/tupleobject.c:104" instead of a generic "Modules/gcmodule.c:1717".

    @vstinner
    Member

    New version of the patch, py_setallocators-3.patch:

    • _PyMem_DebugMalloc(), _PyMem_DebugFree() and _PyMem_DebugRealloc() are now set up as hooks on the system allocator, and are installed on the PyMem API *and* on the PyObject API
    • move "if (size > PY_SSIZE_T_MAX)" check into PyObject_Malloc() and PyObject_Realloc()
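
    A minimal sketch of what such a size check looks like when it lives in the outer entry point. `DEMO_SSIZE_T_MAX`, `raw_malloc`, `checked_malloc` and `checked_realloc` are hypothetical stand-ins, not the names used in the patch.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for PY_SSIZE_T_MAX: the largest value of a signed size type. */
#define DEMO_SSIZE_T_MAX (((size_t)-1) >> 1)

/* Underlying allocator; it no longer needs to validate the size itself. */
static void *raw_malloc(size_t size)
{
    return malloc(size ? size : 1);
}

static void *raw_realloc(void *ptr, size_t size)
{
    return realloc(ptr, size ? size : 1);
}

/* Outer entry points: reject oversized requests before delegating. */
static void *checked_malloc(size_t size)
{
    if (size > DEMO_SSIZE_T_MAX)
        return NULL;
    return raw_malloc(size);
}

static void *checked_realloc(void *ptr, size_t size)
{
    if (size > DEMO_SSIZE_T_MAX)
        return NULL;  /* the original block is left untouched */
    return raw_realloc(ptr, size);
}
```

    Doing the check once in the outer function means a replacement allocator plugged in underneath does not have to repeat it.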

    This patch does not propose a simple API to reuse internal debug hooks when replacing system (PyMem) allocators.

    @amauryfa
    Member

    I prefer the new version without PYMEM_TRACE_MALLOC :-)

    Can we rename "API" and "api_id" to something more specific? maybe DOMAIN and domain_id?

    @vstinner
    Member

    Amaury Forgeot d'Arc added the comment:

    I prefer the new version without PYMEM_TRACE_MALLOC :-)

    Well, py_setallocators-filename.patch is more of a proof of concept
    showing how to use my Py_SetAllocators() API to pass the C trace
    (filename/line number) than a real proposal. The patch is huge and
    very intrusive, so I also prefer py_setallocators-3.patch :-)

    Can we rename "API" and "api_id" to something more specific? maybe DOMAIN and domain_id?

    Something like:
    {PY_ALLOC_MEM_DOMAIN, PY_ALLOC_OBJECT_DOMAIN}.
    or
    {PYMEM_DOMAIN, PYOBJECT_DOMAIN}
    ?

    There are only two values, so another option is to duplicate the functions:

    • PyMem_GetAllocators(), PyMem_SetAllocators(), PyMem_Malloc(), ..
    • PyObject_GetAllocators(), PyObject_SetAllocators(), PyObject_Malloc(), ..

    I prefer PyMem_SetAllocators() over PYOBJECT_DOMAIN.

    @vstinner
    Member

    Benchmark of py_setallocators-3.patch:

    • benchmarks suite (-b 2n3): some tests are 1.04x faster, some tests are 1.04x slower, and the "significant" value is between 115 and -191. I don't understand this output, but I guess that the overhead cannot be seen with such tests.
    • pybench: "+0.1%" (diff between -4.9% and +5.6%)

    If I understood correctly, the overhead is really really low (near zero).

    @vstinner
    Member

    If I understood correctly, the overhead is really really low (near zero).

    See attached output pybench.txt and benchmarks.txt.

    @vstinner
    Member

    New version (4) of the patch:

    • move the opaque pointer (now called "void *ctx", "context") as the first parameter instead of the last parameter, as done in zlib, lzma and Oracle's OCI APIs; ctx is also the first parameter of Py*_GetFunctions() and Py*_SetFunctions() instead of the last

    • rename public functions:

      • Py_GetAllocators() -> PyMem_GetAllocators(), PyObject_GetAllocators()
      • Py_SetAllocators() -> PyMem_SetAllocators(), PyObject_SetAllocators()
      • Py_GetBlockAllocators() -> PyObject_GetArenaAllocators()
      • Py_SetBlockAllocators() -> PyObject_SetArenaAllocators()
    • move declaration of PyObject_*() functions from pymem.h to objimpl.h

    • split _PyMem big structure into smaller structures: _PyMem, _PyObject, _PyObject_Arena

    • move "if (size == 0) size = 1;" from PyMem_Malloc() to _PyMem_Malloc(), so the custom allocator can decide how to implement PyMem_Malloc(0) (maybe something more efficient)
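
    The design described in this list (context pointer first, get/set functions returning void, zero-size handling left to the allocator) can be sketched as a stand-alone C model. Every name below (`demo_allocator`, `demo_get_allocator`, `counting_hook`, and so on) is hypothetical and only mirrors the shape of the proposal; it is not the code from the patch.

```c
#include <stddef.h>
#include <stdlib.h>

/* Allocator table: the opaque context is the first argument of each function. */
typedef struct {
    void *ctx;
    void *(*malloc_fn)(void *ctx, size_t size);
    void *(*realloc_fn)(void *ctx, void *ptr, size_t size);
    void (*free_fn)(void *ctx, void *ptr);
} demo_allocator;

/* Default allocator: it decides how a zero-byte request is handled. */
static void *default_malloc(void *ctx, size_t size)
{
    (void)ctx;
    if (size == 0)
        size = 1;
    return malloc(size);
}

static void *default_realloc(void *ctx, void *ptr, size_t size)
{
    (void)ctx;
    if (size == 0)
        size = 1;
    return realloc(ptr, size);
}

static void default_free(void *ctx, void *ptr)
{
    (void)ctx;
    free(ptr);
}

static demo_allocator current = {
    NULL, default_malloc, default_realloc, default_free
};

/* Get/set cannot fail, so they return void. */
static void demo_get_allocator(demo_allocator *out) { *out = current; }
static void demo_set_allocator(const demo_allocator *in) { current = *in; }

/* Public entry points simply dispatch through the current table. */
static void *demo_malloc(size_t size)
{
    return current.malloc_fn(current.ctx, size);
}

static void *demo_realloc(void *ptr, size_t size)
{
    return current.realloc_fn(current.ctx, ptr, size);
}

static void demo_free(void *ptr)
{
    current.free_fn(current.ctx, ptr);
}

/* A hook keeps the previous allocator in its context and wraps it. */
typedef struct {
    demo_allocator inner;
    size_t n_alloc;
    size_t n_free;
} counting_hook;

static void *counting_malloc(void *ctx, size_t size)
{
    counting_hook *hook = (counting_hook *)ctx;
    hook->n_alloc++;
    return hook->inner.malloc_fn(hook->inner.ctx, size);
}

static void *counting_realloc(void *ctx, void *ptr, size_t size)
{
    counting_hook *hook = (counting_hook *)ctx;
    return hook->inner.realloc_fn(hook->inner.ctx, ptr, size);
}

static void counting_free(void *ctx, void *ptr)
{
    counting_hook *hook = (counting_hook *)ctx;
    hook->n_free++;
    hook->inner.free_fn(hook->inner.ctx, ptr);
}

/* Save the current allocator inside the hook, then install the wrapper. */
static void install_counting_hook(counting_hook *hook)
{
    demo_allocator wrapper;
    hook->n_alloc = 0;
    hook->n_free = 0;
    demo_get_allocator(&hook->inner);
    wrapper.ctx = hook;
    wrapper.malloc_fn = counting_malloc;
    wrapper.realloc_fn = counting_realloc;
    wrapper.free_fn = counting_free;
    demo_set_allocator(&wrapper);
}
```

    Because the hook stores the previous table in its own context, it can be removed again by passing `&hook.inner` back to `demo_set_allocator()`.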

    Does the new API look better? py_setallocators-4.patch is ready for a final review. If nobody complains, I'm going to commit it.

    @vstinner
    Member

    py_setallocators-4.patch:

    • Oh, I forgot another change: Py*_Get/SetAllocators() cannot fail anymore (because of an unknown API identifier), so the return type is now void

    I just saw that I forgot ".. versionadded:: 3.4" in the doc.

    @vstinner
    Member

    This patch does not propose a simple API to reuse internal
    debug hooks when replacing system (PyMem) allocators.

    Ok, this is now fixed with the new patch (version 5). Nick does not want a new environment variable, so I instead added a new function, PyMem_SetupDebugHooks(), which reinstalls the hooks that detect bugs if the allocator functions were replaced with PyMem_SetAllocators() or PyObject_SetAllocators(). The function does nothing if Python is not compiled in debug mode or if the hooks are already installed (so the function can be called twice).
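
    For readers unfamiliar with what a bug-detecting hook does, here is a toy sketch of one such check: guard bytes placed after each block to catch writes past the end. It is a simplified illustration, not CPython's debug hook implementation; `GUARD_BYTE`, `GUARD_LEN`, `guard_header` and the `guarded_*` functions are all hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

#define GUARD_BYTE 0xFD   /* arbitrary marker chosen for this sketch */
#define GUARD_LEN  8

/* Header stored before the user data; the union keeps the data aligned. */
typedef union {
    size_t size;
    long double ld;
    long long ll;
    void *p;
} guard_header;

#define HEADER_LEN sizeof(guard_header)

/* Layout: [header][user data: size bytes][GUARD_LEN guard bytes] */
static void *guarded_malloc(size_t size)
{
    unsigned char *raw;
    size_t i;

    if (size > (size_t)-1 - HEADER_LEN - GUARD_LEN)
        return NULL;
    raw = (unsigned char *)malloc(HEADER_LEN + size + GUARD_LEN);
    if (raw == NULL)
        return NULL;
    ((guard_header *)raw)->size = size;
    for (i = 0; i < GUARD_LEN; i++)
        raw[HEADER_LEN + size + i] = GUARD_BYTE;
    return raw + HEADER_LEN;
}

/* Returns 1 if the bytes after the block are untouched, 0 otherwise. */
static int guard_is_intact(const void *ptr)
{
    const unsigned char *user = (const unsigned char *)ptr;
    size_t size = ((const guard_header *)(user - HEADER_LEN))->size;
    size_t i;

    for (i = 0; i < GUARD_LEN; i++) {
        if (user[size + i] != GUARD_BYTE)
            return 0;
    }
    return 1;
}

/* Abort on a detected overflow, otherwise release the whole block. */
static void guarded_free(void *ptr)
{
    if (ptr == NULL)
        return;
    if (!guard_is_intact(ptr))
        abort();
    free((unsigned char *)ptr - HEADER_LEN);
}
```

    A hook of this kind only works if it wraps whatever allocator is currently installed, which is why it has to be reinstalled after the allocator functions are replaced.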

    I also added unit tests for PyMem_SetAllocators() and PyObject_SetAllocators()! And I added "versionadded:: 3.4" to the C API documentation.

    @vstinner
    Member

    To be exhaustive, another patch should be developed to replace
    all calls for malloc/realloc/free by
    PyMem_Malloc/PyMem_Realloc/PyMem_Free.

    I created issue bpo-18203 for this point.

    PyObject_Malloc() is still using mmap() or malloc() internally
    for example.

    The arena allocator can be replaced or hooked with PyObject_SetArenaAllocators() from my latest patch.

    @python-dev
    Mannequin

    python-dev mannequin commented Jun 14, 2013

    New changeset 6661a8154eb3 by Victor Stinner in branch 'default':
    Issue bpo-3329: Add new APIs to customize memory allocators
    http://hg.python.org/cpython/rev/6661a8154eb3

    @python-dev
    Mannequin

    python-dev mannequin commented Jun 15, 2013

    New changeset b1455dd08000 by Victor Stinner in branch 'default':
    Revert changeset 6661a8154eb3: Issue bpo-3329: Add new APIs to customize memory allocators
    http://hg.python.org/cpython/rev/b1455dd08000

    @vstinner
    Member

    Convert changeset 6661a8154eb3 into a patch: py_setallocators-6.patch.

    @vstinner
    Member

    Update the patch to follow the API described in the PEP-445 (2013-06-18 22:33:41 +0200).

    @vstinner
    Member

    Update the patch according to the latest version of the PEP.

    @vstinner
    Member

    vstinner commented Jul 2, 2013

    Updated patch (version 9):

    • update the API to the latest version of the PEP
    • PYMEM_DOMAIN_RAW now also has a well-defined behaviour when requesting an allocation of zero bytes: PyMem_RawMalloc(0) now calls malloc(1)
    • enhance the documentation (e.g. mention the default allocators)
    • _testcapi also checks that PyMem_RawMalloc(0) is non-NULL

    @python-dev
    Mannequin

    python-dev mannequin commented Jul 7, 2013

    New changeset ca78c974e938 by Victor Stinner in branch 'default':
    Issue bpo-3329: Implement the PEP-445
    http://hg.python.org/cpython/rev/ca78c974e938

    @vstinner
    Member

    vstinner commented Jul 7, 2013

    Ok, let's see if the buildbots like PEP-445 (keep this issue open until we have the results of all 3.4 buildbots).

    I created the issue bpo-18392 to document PyObject_Malloc().

    @vstinner
    Member

    vstinner commented Jul 7, 2013

    It looks like the changeset ca78c974e938 broke the "x86 XP-4 3.x" buildbot:
    buildbot.python.org/all/builders/x86 XP-4 3.x/builds/8795/

    Traceback (most recent call last):
      File "../lib/test/regrtest.py", line 1305, in runtest_inner
        test_runner()
      File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\test\test_tools.py", line 459, in test_main
        support.run_unittest(*[obj for obj in globals().values()
      File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\test\support.py", line 1600, in run_unittest
        _run_suite(suite)
      File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\test\support.py", line 1566, in _run_suite
        result = runner.run(suite)
      File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\unittest\runner.py", line 175, in run
        result.printErrors()
      File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\unittest\runner.py", line 109, in printErrors
        self.printErrorList('ERROR', self.errors)
      File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\unittest\runner.py", line 117, in printErrorList
        self.stream.writeln("%s" % err)
      File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\unittest\runner.py", line 25, in writeln
        self.write(arg)
    MemoryError
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "../lib/test/regrtest.py", line 1615, in <module>
        main_in_temp_cwd()
      File "../lib/test/regrtest.py", line 1590, in main_in_temp_cwd
        main()
      File "../lib/test/regrtest.py", line 796, in main
        match_tests=match_tests)
      File "../lib/test/regrtest.py", line 998, in runtest
        debug, display_failure=False)
      File "../lib/test/regrtest.py", line 1330, in runtest_inner
        msg = traceback.format_exc()
      File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\traceback.py", line 254, in format_exc
        return "".join(format_exception(*sys.exc_info(), limit=limit, chain=chain))
      File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\traceback.py", line 180, in format_exception
        return list(_format_exception_iter(etype, value, tb, limit, chain))
      File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\traceback.py", line 152, in _format_exception_iter
        yield from _format_list_iter(_extract_tb_iter(tb, limit=limit))
      File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\traceback.py", line 17, in _format_list_iter
        for filename, lineno, name, line in extracted_list:
      File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\traceback.py", line 64, in _extract_tb_or_stack_iter
        line = linecache.getline(filename, lineno, f.f_globals)
      File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\linecache.py", line 15, in getline
        lines = getlines(filename, module_globals)
      File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\linecache.py", line 41, in getlines
        return updatecache(filename, module_globals)
      File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\linecache.py", line 127, in updatecache
        lines = fp.readlines()
      File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\codecs.py", line 301, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    MemoryError

    @python-dev
    Mannequin

    python-dev mannequin commented Jul 7, 2013

    New changeset 51ed51d10e60 by Victor Stinner in branch 'default':
    Issue bpo-3329: Fix _PyObject_ArenaVirtualFree()
    http://hg.python.org/cpython/rev/51ed51d10e60

    @vstinner
    Member

    vstinner commented Jul 7, 2013

    Buildbots are happy, changeset 51ed51d10e60 fixed the memory leak on Windows XP. Let's close this issue, 5 years after its creation!

    @vstinner vstinner closed this as completed Jul 7, 2013
    @kristjanvalur
    Mannequin

    kristjanvalur mannequin commented Jul 7, 2013

    Well done.

    @vstinner
    Member

    vstinner commented Mar 9, 2021

    New changeset 0d6bd1c by Victor Stinner in branch 'master':
    bpo-3329: Fix typo in PyObjectArenaAllocator doc (GH-24795)
    0d6bd1c

    @miss-islington
    Contributor

    New changeset 5ca02c4 by Miss Islington (bot) in branch '3.8':
    bpo-3329: Fix typo in PyObjectArenaAllocator doc (GH-24795)
    5ca02c4

    @miss-islington
    Contributor

    New changeset ea46c7b by Miss Islington (bot) in branch '3.9':
    bpo-3329: Fix typo in PyObjectArenaAllocator doc (GH-24795)
    ea46c7b

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022