Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Debug memory allocators: remove useless "serialno" field to reduce memory footprint #80792

Closed
vstinner opened this issue Apr 12, 2019 · 10 comments
Labels
3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@vstinner
Copy link
Member

BPO 36611
Nosy @tim-one, @vstinner, @methane, @miss-islington
PRs
  • bpo-36611: Remove serialno from debug memory allocators #12795
  • bpo-36611: Disable serialno field of debug memory allocators #12796
  • bpo-36611: Fix test_sys.test_getallocatedblocks() #12797
  • [3.7] bpo-36611: Fix test_sys.test_getallocatedblocks() (GH-12797) #12801
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-04-12.20:01:03.894>
    created_at = <Date 2019-04-12.10:16:34.859>
    labels = ['interpreter-core', '3.8']
    title = 'Debug memory allocators: remove useless "serialno" field to reduce memory footprint'
    updated_at = <Date 2019-04-12.20:01:03.893>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2019-04-12.20:01:03.893>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-04-12.20:01:03.894>
    closer = 'vstinner'
    components = ['Interpreter Core']
    creation = <Date 2019-04-12.10:16:34.859>
    creator = 'vstinner'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 36611
    keywords = ['patch']
    message_count = 10.0
    messages = ['340019', '340022', '340023', '340025', '340026', '340027', '340037', '340041', '340103', '340105']
    nosy_count = 4.0
    nosy_names = ['tim.peters', 'vstinner', 'methane', 'miss-islington']
    pr_nums = ['12795', '12796', '12797', '12801']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue36611'
    versions = ['Python 3.8']

    @vstinner
    Copy link
    Member Author

    When PYTHONMALLOC=debug environment variable or -X dev command line option is used, Python installs debug hooks on memory allocators which add 2 size_t before and 2 size_t after each memory block: it adds 32 bytes to every memory allocation.

    I'm debugging crashes and memory leaks in CPython for 10 years, and I simply never had to use "serialno". So I simply propose attached pull request to remove it to reduce the memory footprint: I measured a reduction around -5% (ex: 1.2 MiB on 33.0 MiB when running test_asyncio). A smaller memory footprint allows to use this feature on devices with small memory, like embedded devices.

    The change also fix race condition in debug memory allocators: bpo-31473, "Debug hooks on memory allocators are not thread safe (serialno variable)".

    Using tracemalloc, it is already possible (since Python 3.6) to find where a memory block has been allocated, and so decide where to put a breakpoint when debugging.

    If someone cares about the "serialno" field, maybe we can keep code using a compilation flag, like a C #define.

    "serialno" is documented as: "an excellent way to set a breakpoint on the next run, to capture the instant at which this block was passed out." But again, I never used it...

    --

    Some examples of the *peak* memory usage without => with the change:

    • -c pass: 2321.8 kB => 2437.1 kB (-115.3 kiB, -5%)
    • -m test test_os test_sys: 14252.3 kB => 13598.6 kB (-653.7 kiB, -5%)
    • -m test test_asyncio: 34194.2 kB => 32963.1 kB (-1231.1 kiB, -4%)

    Command used to measure the memory consumption:

    $ ./python -i -X tracemalloc -c pass
    >>> import tracemalloc; print("%.1f kB" % (tracemalloc.get_traced_memory()[1] / 1024.))

    With the patch:

    diff --git a/Modules/_tracemalloc.c b/Modules/_tracemalloc.c
    index c5d5671032..e010c2ef84 100644
    --- a/Modules/_tracemalloc.c
    +++ b/Modules/_tracemalloc.c
    @@ -582,6 +582,8 @@ tracemalloc_add_trace(unsigned int domain, uintptr_t ptr,
         _Py_hashtable_entry_t* entry;
         int res;
     
    +    size += 4 * sizeof(size_t);
    +
         assert(_Py_tracemalloc_config.tracing);
     
         traceback = traceback_new();

    Replace 4 with 3 to measure memory used with the change.

    --

    Since Python 3.6, when the debug memory allocator detects a bug (ex: buffer overflow), it now also displays the Python traceback where the memory block has been allocated if tracemalloc is tracing Python memory allocations.

    Example with buffer_overflow.py:
    ---

    import _testcapi
    
    def func():
        _testcapi.pymem_buffer_overflow()
    
    def main():
        func()
    
    if __name__ == "__main__":
        main()

    Output:
    ---

    $ ./python -X tracemalloc=10 -X dev bug.py

    Debug memory block at address p=0x7f45e85c3270: API 'm'
    16 bytes originally requested
    The 7 pad bytes at p-7 are FORBIDDENBYTE, as expected.
    The 8 pad bytes at tail=0x7f45e85c3280 are not all FORBIDDENBYTE (0xfd):
    at tail+0: 0x78 *** OUCH
    at tail+1: 0xfd
    at tail+2: 0xfd
    at tail+3: 0xfd
    at tail+4: 0xfd
    at tail+5: 0xfd
    at tail+6: 0xfd
    at tail+7: 0xfd
    Data at p: cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd

    Memory block allocated at (most recent call first):
    File "bug.py", line 4
    File "bug.py", line 7
    File "bug.py", line 10

    Fatal Python error: bad trailing pad byte

    Current thread 0x00007f45f5660740 (most recent call first):
    File "bug.py", line 4 in func
    File "bug.py", line 7 in main
    File "bug.py", line 10 in <module>
    Aborted (core dumped)
    ---

    The interesting part is "Memory block allocated at (most recent call first):". Traceback reconstructed manually:
    ---
    Memory block allocated at (most recent call first):
    File "bug.py", line 4
    _testcapi.pymem_buffer_overflow()
    File "bug.py", line 7
    func()
    File "bug.py", line 10
    main()
    ---

    You can see exactly where the memory block has been allocated.

    Note: Internally, the _PyTraceMalloc_GetTraceback() function is used to get the traceback where a memory block has been allocated.

    --

    Extract of _PyMem_DebugRawAlloc() in Objects/obmalloc.c:

    /* Let S = sizeof(size_t). The debug malloc asks for 4*S extra bytes and
    fills them with useful stuff, here calling the underlying malloc's result p:

    p[0: S]
    Number of bytes originally asked for. This is a size_t, big-endian (easier
    to read in a memory dump).
    p[S]
    API ID. See PEP-445. This is a character, but seems undocumented.
    p[S+1: 2*S]
    Copies of FORBIDDENBYTE. Used to catch under- writes and reads.
    p[2*S: 2*S+n]
    The requested memory, filled with copies of CLEANBYTE.
    Used to catch reference to uninitialized memory.
    &p[2*S] is returned. Note that this is 8-byte aligned if pymalloc
    handled the request itself.
    p[2*S+n: 2*S+n+S]
    Copies of FORBIDDENBYTE. Used to catch over- writes and reads.
    p[2*S+n+S: 2*S+n+2*S]
    A serial number, incremented by 1 on each call to _PyMem_DebugMalloc
    and _PyMem_DebugRealloc.
    This is a big-endian size_t.
    If "bad memory" is detected later, the serial number gives an
    excellent way to set a breakpoint on the next run, to capture the
    instant at which this block was passed out.
    */

    /* Layout: [SSSS IFFF CCCC...CCCC FFFF NNNN]

    •      ^--- p    ^--- data   ^--- tail
      
      S: nbytes stored as size_t
      I: API identifier (1 byte)
      F: Forbidden bytes (size_t - 1 bytes before, size_t bytes after)
      C: Clean bytes used later to store actual data
      N: Serial number stored as size_t */

    The last size_t written at the end of each memory block is "serialno". It is documented as: "an excellent way to set a breakpoint on the next run, to capture the instant at which this block was passed out."

    @vstinner vstinner added 3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Apr 12, 2019
    @vstinner
    Copy link
    Member Author

    I wrote 2 pull requests for the two options:

    • PR 12795 removes serialno field
    • PR 12796 adds PYMEM_DEBUG_SERIALNO define which is unset by default

    @vstinner
    Copy link
    Member Author

    While testing my changes, I found a bug in test_sys:

    ./python -X tracemalloc -m test test_sys -v -m test_getallocatedblocks

    ======================================================================
    ERROR: test_getallocatedblocks (test.test_sys.SysModuleTest)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/vstinner/prog/python/master/Lib/test/test_sys.py", line 770, in test_getallocatedblocks
        alloc_name = _testcapi.pymem_getallocatorsname()
    RuntimeError: cannot get allocators name

    Attached PR 12797 fix it.

    @methane
    Copy link
    Member

    methane commented Apr 12, 2019

    I never used the serialno too.

    @vstinner
    Copy link
    Member Author

    This issue is related to the following thread on python-dev which discuss disabling Py_TRACE_REFS by default, bpo-36465, to reduce the memory footprint in debug mode:
    https://mail.python.org/pipermail/python-dev/2019-April/157015.html

    @vstinner
    Copy link
    Member Author

    The serialno has been added at the same time than the whole debug hooks on Python memory allocated by Tim Peters in 2002, 17 years ago:

    commit ddea208
    Author: Tim Peters <tim.peters@gmail.com>
    Date: Sat Mar 23 10:03:50 2002 +0000

    Give Python a debug-mode pymalloc, much as sketched on Python-Dev.
    
    When WITH_PYMALLOC is defined, define PYMALLOC_DEBUG to enable the debug
    allocator.  This can be done independent of build type (release or debug).
    A debug build automatically defines PYMALLOC_DEBUG when pymalloc is
    enabled.  It's a detected error to define PYMALLOC_DEBUG when pymalloc
    isn't enabled.
    
    Two debugging entry points defined only under PYMALLOC_DEBUG:
    
    + _PyMalloc_DebugCheckAddress(const void *p) can be used (e.g., from gdb)
      to sanity-check a memory block obtained from pymalloc.  It sprays
      info to stderr (see next) and dies via Py_FatalError if the block is
      detectably damaged.
    
    + _PyMalloc_DebugDumpAddress(const void *p) can be used to spray info
      about a debug memory block to stderr.
    
    A tiny start at implementing "API family" checks isn't good for
    anything yet.
    
    _PyMalloc_DebugRealloc() has been optimized to do little when the new
    size is <= old size.  However, if the new size is larger, it really
    can't call the underlying realloc() routine without either violating its
    contract, or knowing something non-trivial about how the underlying
    realloc() works.  A memcpy is always done in this case.
    
    This was a disaster for (and only) one of the std tests:  test_bufio
    creates single text file lines up to a million characters long.  On
    Windows, fileobject.c's get_line() uses the horridly funky
    getline_via_fgets(), which keeps growing and growing a string object
    hoping to find a newline.  It grew the string object 1000 bytes each
    time, so for a million-character string it took approximately forever
    (I gave up after a few minutes).
    
    So, also:
    
    fileobject.c, getline_via_fgets():  When a single line is outrageously
    long, grow the string object at a mildly exponential rate, instead of
    just 1000 bytes at a time.
    
    That's enough so that a debug-build test_bufio finishes in about 5 seconds
    on my Win98SE box.  I'm curious to try this on Win2K, because it has very
    different memory behavior than Win9X, and test_bufio always took a factor
    of 10 longer to complete on Win2K.  It *could* be that the endless
    reallocs were simply killing it on Win2K even in the release build.
    

    @vstinner
    Copy link
    Member Author

    New changeset 9b8314c by Victor Stinner in branch 'master':
    bpo-36611: Fix test_sys.test_getallocatedblocks() (GH-12797)
    9b8314c

    @miss-islington
    Copy link
    Contributor

    New changeset 7182e65 by Miss Islington (bot) in branch '3.7':
    bpo-36611: Fix test_sys.test_getallocatedblocks() (GH-12797)
    7182e65

    @vstinner
    Copy link
    Member Author

    New changeset e8f9acf by Victor Stinner in branch 'master':
    bpo-36611: Disable serialno field of debug memory allocators (bpo-12796)
    e8f9acf

    @vstinner
    Copy link
    Member Author

    We decided to only disable the code by default, but the code stays until we are sure that nobody uses it.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs)
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants