memory usage significantly increased with memkind #708
Memkind uses jemalloc 5.2.1 as a heap allocator. It is highly configurable - see http://jemalloc.net/jemalloc.3.html#tuning. Memkind uses the jemk_mallctl call to tune some jemalloc options. To get information about allocations you can use the memkind_stats_print() function.

As for your problem - what kind of memory are you using? What is your allocation pattern? In a recent commit we added an optimization to the FS_DAX type that helps in scenarios where you have many small allocations - see c3fb4e4

Also, to debug your problem, you can try using jemalloc directly instead of memkind. Just compile your application without memkind and use LD_PRELOAD to use jemalloc instead of the standard malloc:

```shell
LD_PRELOAD=memkind/jemalloc/lib/libjemalloc.so /path/to/app
```
@adrianjhpc Hi Adrian, did you manage to find anything?
Hi @bratpiorka, thanks for the replies - tuning with jemalloc and investigating my code helped me identify some usage issues I had.
Ok, I've now got a standalone benchmark that highlights the issue I'm seeing here, so I'm re-opening this, hope that's ok. |
The benchmark is this:
The benchmark is run like this:
or like this:
I have a small library that collects memory usage data. Running the smallest array possible I get these results:
The above output shows the extra memory used. I've also tried using a standalone version of jemalloc:
But as you can see above, it doesn't change the underlying memory requirements. If I scale up the array size, I can see memkind is clearly working, i.e.:
But is there any way to get the base memory usage down with memkind? It makes it hard to track the actual memory consumption over time for benchmarking when using memkind to offload data from DRAM to NVRAM.
btw, I'm trying to build against master to see if that changes the memory usage, but the build is failing like this:
Let me know if you want me to open a separate issue for this, or if I'm just being stupid. |
Give me some time to look at this.
I see that there's indeed a base memory cost of ~0.5GB per process when using any kind other than DRAM, but it doesn't grow when adding huge allocations. Thus, a process that allocates a few KB wastes that 0.5GB, while a process with hundreds of GB also needs just that extra 0.5GB. What's your use case? Do you have plenty of small tasks, or a single big one? If the latter, fixing this issue might not be so urgent.
I'm doing a range of benchmarking at the moment, mainly on parallel programs, so as many processes as there are cores on a node. However, if I know the overhead, I can factor it out of the memory calculations and work around the issue. In reality it's only a big issue for things that don't use much memory, but as you can imagine, on a 48-core system 24GB is a considerable base overhead.
When using memkind but targeting DRAM and running my C++ application, the amount of memory used by the application more than doubles (when looking at usage information in /proc/$PID/status). i.e. I'm using ~2GB per process without memkind but over 4GB per process when memkind is linked. Is this to do with the allocator being used? And is this tunable? It's significantly interfering with my ability to differentiate between memory and NVRAM usage when testing NVRAM allocation.