
Expose malloc statistics #1275

Open
pitrou opened this issue May 8, 2018 · 28 comments
Comments

@pitrou
Contributor

pitrou commented May 8, 2018

I don't know if it's exactly in the scope for psutil, but just in case: it could be useful to expose per-platform malloc() statistics, for example using mallinfo() on GNU/Linux:
http://man7.org/linux/man-pages/man3/mallinfo.3.html

@giampaolo
Owner

Hey Antoine! I don't know either but it looks kinda too low-levelish. Do you have a use case?

@pitrou
Contributor Author

pitrou commented May 8, 2018

My use case was debugging memory use here: https://bugs.python.org/issue33444
I'm not sure there's a production use :-) Though I'm sure authors of sophisticated parallel computing frameworks (such as Dask -- @mrocklin, @jakirkham or @ogrisel) would like to be able to diagnose whether a memory consumption problem is a memory leak caused by Python objects.

@jakirkham

Not having used mallinfo much, can't say much about it. That said, I can grok why it would be useful.

Just to understand the use case a bit more. Are you hoping to call mallinfo using ctypes (?) within a process that you are trying to debug? Namely by inserting mallinfo calls wherever it seems to matter to get some intuition about how memory usage is changing over time?

@pitrou
Contributor Author

pitrou commented May 8, 2018

What I mean is that a framework (like Dask) which already exposes memory usage statistics (such as RSS) could expose additional useful information thanks to mallinfo. The main info of interest IMHO is how much memory is kept in the allocator despite being nominally released (because of fragmentation).

@mrocklin

mrocklin commented May 8, 2018

I've been trying to track down memory leaks when using Pandas in parallel in situations that look similar to the bug report pointed to by @pitrou . I agree that increased visibility would be of value.

@jakirkham

To rephrase my question in the Dask context, does this make sense to call from the nanny process or does it only make sense in the worker process?

@pitrou
Contributor Author

pitrou commented May 8, 2018

It only makes sense in the worker process, IMO.

@giampaolo
Owner

If I'm understanding this right, mallinfo() returns info about the current (calling) process. It's not system-wide info, nor can it be fetched on a per-process basis, which makes it incompatible with the psutil.Process class.

@giampaolo
Owner

giampaolo commented May 9, 2018

As for the usefulness of this, I'm skeptical. We are already able to determine memory leaks by using Process.memory_info().rss or Process.memory_full_info().uss before and after a function call. Knowing detailed stats about malloc() specifically looks like something which is more useful to debug a C program while developing it.
Also, it seems mallinfo() is basically deprecated: https://stackoverflow.com/questions/40878169/64-bit-capable-alternative-to-mallinfo

@pitrou
Contributor Author

pitrou commented May 9, 2018

We are already able to determine memory leaks by using Process.memory_info().rss or Process.memory_full_info().uss before and after a function call.

See https://bugs.python.org/issue33444. A higher rss tells you that there may be a memory leak, not that there is one. Python uses malloc() for all allocations > 512 bytes. When Python releases memory to the glibc (by calling free()), the glibc isn't always able to return memory to the system, so rss appears to remain high.

Also, it seems mallinfo() is basically deprecated

I see, perhaps malloc_info is more worth exposing then? Though its return values aren't documented...

@giampaolo
Owner

A higher rss tells you that there may be a memory leak, not that there is one.

You are right, RSS is an approximation. In fact I bumped into false positives for a long time before introducing USS which apparently either solved or mitigated the issue (not sure):

```python
@staticmethod
def _get_mem():
    # By using USS memory it seems it's less likely to bump
    # into false positives.
    if LINUX or WINDOWS or OSX:
        return thisproc.memory_full_info().uss
    else:
        return thisproc.memory_info().rss
```

I say "not sure" because psutil's memory leak test script allows some tolerance:

```python
diff1 = mem2 - mem1
if diff1 > tolerance:
```

Would mallinfo() / malloc_info() metrics be more precise than USS at helping identify missing free()s? According to this blog post https://scaryreasoner.wordpress.com/2007/10/17/finding-memory-leaks-with-mallinfo/ uordblks is the value one would typically want to use.

With that said, I'm not against exposing mallinfo() / malloc_info() or whatever per se. My concern is that they are pretty low level, it would be a Linux-only API, and I'm not sure which values other than uordblks are really useful and worth exposing.

@pitrou
Contributor Author

pitrou commented May 9, 2018

In Python's case fordblks is the better choice (it gives you the number of bytes available in the allocator but not returned to the system). uordblks can be misleading because Python uses its own allocator for small-sized blocks (< 256 or 512 bytes).

USS is not better than RSS for finding out memory fragmentation.
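For reference, reading both counters from Python can be sketched with `ctypes` (an illustrative sketch, not part of psutil; it assumes glibc on Linux, with the struct layout taken from the mallinfo(3) man page):

```python
import ctypes

class Mallinfo(ctypes.Structure):
    # Field layout from mallinfo(3); every field is a plain C int,
    # which is why values wrap on heaps larger than 2 GiB.
    _fields_ = [(name, ctypes.c_int) for name in (
        "arena", "ordblks", "smblks", "hblks", "hblkhd",
        "usmblks", "fsmblks", "uordblks", "fordblks", "keepcost")]

libc = ctypes.CDLL("libc.so.6")
libc.mallinfo.restype = Mallinfo  # mallinfo() returns the struct by value

info = libc.mallinfo()
print("in use (uordblks):", info.uordblks)
print("free but not returned to the OS (fordblks):", info.fordblks)
```

Per the discussion above, `fordblks` is the fragmentation-related number, while `uordblks` is what the linked blog post uses for leak hunting.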

@giampaolo
Owner

If we figure out what the malloc_info() values stand for, and we are able to provide a concrete code sample in the doc showing how to detect memory leaks for a function call, then I suppose we could have a psutil.malloc_memory() or something. Basically I want there to be a concrete use case other than merely "reporting low-level memory metrics", which very few people would use. If we can extend the same concept to other platforms (basically Windows and/or OSX) that's even better, especially because relying on RSS/USS is apparently not fully reliable.

@giampaolo
Owner

giampaolo commented May 10, 2018

To push this even further: test_memory_leaks.py tries hard to detect a function's memory leaks by:
1 - picking the right memory stat to use (RSS/USS)
2 - warming up first
3 - calling the function a certain number of times
4 - running the garbage collector
5 - allowing some failure tolerance

Since that's not straightforward maybe we can have a utility function like this:

```python
>>> # signature
>>> test_leak(callable, times=1000, warmup_times=10, tolerance=4096)
>>>
>>> # success (returns None)
>>> test_leak(fun)
>>>
>>> # failure
>>> test_leak(fun)
AssertionError("46523 extra process memory after 1000 calls")
```

Depending on how reliable such a function turns out to be it can live either in psutil namespace, psutil.test namespace, psutil doc or a blog post.

@giampaolo
Owner

The Windows counterpart appears to be called _heapwalk:
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/heapwalk?redirectedfrom=MSDN&view=vs-2019

@giampaolo
Owner

See comment about mallinfo() in #1757. It still would not grant 100% stability in detecting memory leaks, despite being more precise.

giampaolo added a commit that referenced this issue May 12, 2020
Preamble
=======

We have a [memory leak test suite](https://github.com/giampaolo/psutil/blob/e1ea2bccf8aea404dca0f79398f36f37217c45f6/psutil/tests/__init__.py#L897), which calls a function many times and fails if the process memory increased afterwards. We do this in order to detect missing `free()` or `Py_DECREF` calls in the C modules: if memory grew, we likely have a memory leak.

The problem
==========

A problem we've been having for probably over 10 years is false positives. That's because memory fluctuates: sometimes it may increase (or even decrease!) due to how the OS handles memory, Python's garbage collector, the fact that RSS is an approximation, and who knows what else. Thus far we tried to compensate for that with the following logic:
- warm up (call fun 10 times)
- call the function many times (1000)
- if memory increased after calling the function 1000 times, keep calling it for another 3 secs
- if it still increased at all (> 0), fail

This logic didn't really solve the problem, as we still had occasional false positives, especially lately on FreeBSD. 

The solution
=========

This PR changes the internal algorithm so that, in case of failure (mem > 0 after calling fun() N times), we retry the test up to 5 times, increasing N (repetitions) each time, and we consider it a failure only if the memory **keeps increasing** between runs. For instance, here's a legitimate failure:

```
psutil.tests.test_memory_leaks.TestModuleFunctionsLeaks.test_disk_partitions ... 
Run #1: extra-mem=696.0K, per-call=3.5K, calls=200
Run #2: extra-mem=1.4M, per-call=3.5K, calls=400
Run #3: extra-mem=2.1M, per-call=3.5K, calls=600
Run #4: extra-mem=2.7M, per-call=3.5K, calls=800
Run #5: extra-mem=3.4M, per-call=3.5K, calls=1000
FAIL
```

If, on the other hand, the memory increased on one run (say 200 calls) but decreased on the next run (say 400 calls), then it's clearly a false positive: memory consumption may still be > 0 on the second run, but if it's lower than the previous run with fewer repetitions, it cannot possibly represent a leak (just a fluctuation):

```
psutil.tests.test_memory_leaks.TestModuleFunctionsLeaks.test_net_connections ... 
Run #1: extra-mem=568.0K, per-call=2.8K, calls=200
Run #2: extra-mem=24.0K, per-call=61.4B, calls=400
OK
```
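The pass/fail decision described above can be sketched as follows (an illustrative sketch, not the actual psutil code; `measure_extra_mem` is a hypothetical callable returning the extra memory observed after N calls):

```python
def looks_like_leak(measure_extra_mem, base_calls=200, retries=5):
    # Fail only if extra memory increases monotonically across runs,
    # each run using more repetitions than the previous one.
    prev = None
    for i in range(1, retries + 1):
        extra = measure_extra_mem(base_calls * i)
        if prev is not None and extra <= prev:
            return False  # memory went down with more calls: fluctuation
        prev = extra
    return True  # kept increasing on every run: legitimate leak
```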

Note about mallinfo()
================

Aka #1275. `mallinfo()` on Linux is supposed to provide memory metrics about how many bytes get allocated on the heap by `malloc()`, so it's supposed to be way more precise than RSS and also [USS](http://grodola.blogspot.com/2016/02/psutil-4-real-process-memory-and-environ.html). In another branch where I exposed it, I verified that fluctuations still occur even when using `mallinfo()`, though less often. So that means even `mallinfo()` would not grant 100% stability.
@crusaderky
Contributor

I have a real life use case in dask.distributed. The package really struggles right now to tell apart genuine memory leaks from freed memory that hasn't been returned to the OS yet. This measure is used in heuristics for memory rebalancing and OOM safety-net systems.

Using the demo workbook attached to dask/distributed#4774 I can reliably produce such a "leak", where I allocate a bunch of large-ish numpy arrays (160 KiB each) and then free them after a few seconds. After that operation, my GUI reads:

RSS: 1244 MiB
managed memory (Python objects tracked by dask and measured with sizeof): 749 MiB
unmanaged memory (RSS - managed: memory leaks, modules, global variables, and unreleased memory): 495 MiB

memory_full_info says nothing interesting; note how uss and rss are almost the same:

pfullmem(rss=1304870912, vms=2051538944, shared=31059968, text=2240512, lib=0, data=1326317568, dirty=0, uss=1274175488, pss=1275800576, swap=0)

however, if I run this on the process:

```python
import ctypes
libc = ctypes.CDLL("libc.so.6")
libc.malloc_stats()
```

I read:

```
Total (incl. mmap):
system bytes     = 1238429696
in use bytes     =  797209696
```

which is exactly the information I need.
Thanks to the malloc_stats output I can get these numbers out:

RSS: 1244 MiB
managed: 749 MiB
not in use: (1181 - 760) = 421 MiB
unmanaged: (1244 - 749 - 421) = 74 MiB
managed+unmanaged: (749+74) = 823 MiB

When I run my rebalancing and anti-OOM algorithms, if I had this information I could consider 823 MiB instead of 1244 MiB, knowing that the rest will be reused at the next malloc.

macOS Big Sur has exactly the same problem; I don't know where to get the same information there, though.
I could not reproduce the issue on Windows.

CC @fjetter @gjoseph92

@giampaolo
Owner

if I run this on the process:
import ctypes
libc = ctypes.CDLL("libc.so.6")
libc.malloc_stats()
I read:

Total (incl. mmap):
system bytes = 1238429696
in use bytes = 797209696

On my Linux system I get this:

```
Arena 0:
system bytes     =     790528
in use bytes     =     722832
Total (incl. mmap):
system bytes     =     942080
in use bytes     =     874384
max mmap regions =          1
max mmap bytes   =     151552
0
```

Which one of these values should we expose, in your opinion? Which one is useful to detect a memory leak?

@giampaolo
Owner

For the record, there's an old experimental branch where I exposed mallinfo on Linux and _heapwalk on Windows:
https://github.com/giampaolo/psutil/compare/malloc-info?expand=1

@giampaolo
Owner

giampaolo commented May 27, 2021

Looking at malloc_stats: it can only stream its output to stderr, which is a major pain. It seems the evolution of malloc_stats is malloc_info() (https://man7.org/linux/man-pages/man3/malloc_info.3.html). Quote:

The malloc_info() function exports an XML string that describes the current state of the memory-allocation implementation in the caller. [...] The open_memstream(3) function can be used to send the output of malloc_info() directly into a buffer in memory, rather than to a file. The malloc_info() function is designed to address deficiencies in malloc_stats(3) and mallinfo(3).

The data returned, though, is undocumented and completely different from malloc_stats', and I'm not sure what to make of it.

In summary we have:

  • mallinfo: it apparently returns the most useful info in a nice struct, but it's deprecated and completely unsuitable for 64-bit machines
  • malloc_stats: it returns info (sort of) similar to mallinfo but it's streamed over stderr, making it unsuitable for a library, as we would have to mess with stderr without the user's consent
  • malloc_info: undocumented

These must be the worst designed APIs in Linux.
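For what it's worth, capturing malloc_info()'s XML without touching a real file can be sketched via open_memstream through `ctypes` (an illustrative sketch, glibc-only, not a psutil API):

```python
import ctypes

libc = ctypes.CDLL("libc.so.6")
libc.open_memstream.restype = ctypes.c_void_p  # returns a FILE *

buf = ctypes.c_char_p()
size = ctypes.c_size_t()
# Open an in-memory FILE* whose contents land in `buf` after fclose().
stream = libc.open_memstream(ctypes.byref(buf), ctypes.byref(size))
libc.malloc_info(0, ctypes.c_void_p(stream))  # write the XML into the stream
libc.fclose(ctypes.c_void_p(stream))

xml = ctypes.string_at(buf, size.value)  # e.g. b'<malloc version="1">...'
libc.free(buf)  # open_memstream malloc'd the buffer; we must free it
```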

@crusaderky
Contributor

crusaderky commented May 28, 2021

Which one of these values should we expose, in your opinion? Which one is useful to detect a memory leak?

The difference between these two is the amount of memory that will be released if you invoke malloc_trim(0), or that will likely be reused on the next malloc:

```
Total (incl. mmap):
system bytes     =     942080
in use bytes     =     874384
```

I'm unsure what "system bytes" means. Note that it's slightly lower than USS.
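For context, malloc_trim can be invoked from Python the same way as malloc_stats above (glibc-only; per malloc_trim(3) it returns 1 if some memory was actually released back to the system, 0 otherwise):

```python
import ctypes

libc = ctypes.CDLL("libc.so.6")
# Ask glibc to return free heap memory to the OS; 0 = keep no extra padding.
released = libc.malloc_trim(0)
print("memory released:", bool(released))
```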

@crusaderky
Contributor

@giampaolo glibc 2.33 (released 2021-02-01) adds mallinfo2, which replaces mallinfo and supports >2GiB
https://man7.org/linux/man-pages/man3/mallinfo.3.html
Note that ubuntu 20.04 currently ships glibc 2.31.
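A sketch of probing for mallinfo2 with a graceful fallback (illustrative only; the struct mirrors mallinfo but with size_t fields, per mallinfo(3), so it doesn't wrap past 2 GiB):

```python
import ctypes

class Mallinfo2(ctypes.Structure):
    # Same fields as mallinfo, but size_t-sized: no 2 GiB wraparound.
    _fields_ = [(name, ctypes.c_size_t) for name in (
        "arena", "ordblks", "smblks", "hblks", "hblkhd",
        "usmblks", "fsmblks", "uordblks", "fordblks", "keepcost")]

libc = ctypes.CDLL("libc.so.6")
try:
    libc.mallinfo2.restype = Mallinfo2
    info = libc.mallinfo2()
except AttributeError:
    info = None  # glibc < 2.33: would fall back to mallinfo()/malloc_info()
```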

@crusaderky
Contributor

crusaderky commented May 28, 2021

Notes:

  1. This comment in the man page of mallinfo (linked above) seems incorrect to me. Reading the source code, it's clearly summing all arenas, i.e. it reports the same as the "Total (incl. mmap)" paragraph of malloc_stats:

       Information is returned for only the main memory allocation area.
       Allocations in other arenas are excluded.  See malloc_stats(3)
       and malloc_info(3) for alternatives that include information
       about other arenas.

  2. Also by reading the git tip of the glibc source code, I spotted that the "Total" paragraph of malloc_stats will experience an integer overflow when you get beyond 4 GiB.

I'm opening bug reports for both issues.

[EDIT]
actually, all measures from malloc_stats will break when you go beyond 4 GiB, not just the total.
Bug reports:

  1. https://sourceware.org/bugzilla/show_bug.cgi?id=27928
  2. https://sourceware.org/bugzilla/show_bug.cgi?id=21556

@giampaolo
Owner

@giampaolo glibc 2.33 (released 2021-02-01) adds mallinfo2, which replaces mallinfo and supports >2GiB
https://man7.org/linux/man-pages/man3/mallinfo.3.html

Sweet! I missed that. It's so recent that we can't assume it's available, though. When mallinfo2 is not available we should use malloc_info in order to mimic the same results. I'm not sure if that's possible, but malloc_info's source code is here:
https://github.com/bminor/glibc/blob/master/malloc/malloc.c
Also, another interesting link:
https://bitbucket.org/einsteintoolkit/tickets/issues/2352/support-64-bit-numbers-for

@crusaderky
Contributor

crusaderky commented May 28, 2021

Does anybody have a clue where to get the same information on Mac? The problem there seems even more pronounced, e.g. RSS hardly ever deflates.

@giampaolo
Owner

[EDIT]
actually, all measures from malloc_stats will break when you go beyond 4 GiB, not just the total.
Bug reports:
https://sourceware.org/bugzilla/show_bug.cgi?id=27928
https://sourceware.org/bugzilla/show_bug.cgi?id=21556

Ouch! I just saw your edit. It seems all APIs are bugged one way or another. :-\

@crusaderky
Contributor

To my knowledge malloc_info isn't buggy...
