Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.Sign up
tcmalloc can have pretty large memory fragmentation (was: TCMalloc could hold most of the memory useless.) #756
I'm working on an application which memory limits about 500MB(FLAGS_tcmalloc_heap_limit_mb). It just malloc in two size, and free randomly.
Consider the following scenario:
This is more obvious in the multi thread environment, any idea for it? Thanks.
You're right. In worth case and even in some real-world cases tcmalloc can fragment memory a lot. This is also true (sometimes to lesser extend) about any malloc implementation. jemalloc could be even worse in some cases and better in some other cases.
glibc malloc which is still somewhat based on Doug Lea's malloc is actually decent w.r.t. dealing with corner cases, but it may still fragment memory a lot sometimes.
If your app triggers some hard corner cases for malloc implementations, then only advice I can give you is to adapt your app somehow.
Thanks for your reply.
I noticed that the Span>objects is a simple freelist in single linked list, fetched by a range/release one by one.
According to the idea of skip list, I have tried to organize it into a skip list node for range fetching.
on average, skiplist has a better performance when the span->objects is long enough(in my test, pagesize == 64kb and the size of object < 256).
It balanced the lookup and prepend operation, and when the number of fetching objects >= list length, It can be removed directly.
You can disable thread cache if that's a problem for you. Just build with -DTCMALLOC_SMALL_BUT_SLOW. But it will not in general affect worst cases of fragmentation as far as I understand. That worst case fragmentation occurs from spans that have most but not all objects free and we need objects of other size class. And that doesn't look like something that can be fixed with improved freelists representation.
Regarding this idea, I was thinking about something along this lines but very different. But didn't have time in last months to actually implement this stuff. All I have is some prototyping code that I used to see how quick and compact the code could be. And so far it looks promising. It can be seen here: https://gist.github.com/alk/09a387957fc78aa25b29
The idea is inspired by "binary representations" chapter from Okasaki's "Purely Functional Data Structures" book.
My thinking so far is that this representation has chance to be fast for "push/pop one object" case and for "get N objects" and for "add N objects". With all cases doing just few memory accesses. Only needing two words per object and some manageable overhead for per-thread freelists, transfer caches (which in this case could be just single free-list per size class and per-cpu I think) and free lists in spans.
But all that needs more work and I could be wrong. In new few months I'm unlikely to have time to work on this idea.
In any case, feel free to pursue idea of skip lists if you think it'll work fine.