tcmalloc can have pretty large memory fragmentation (was: TCMalloc could hold most of the memory useless.) #756
Comments
You're right. In the worst case, and even in some real-world cases, tcmalloc can fragment memory a lot. This is also true (sometimes to a lesser extent) of any malloc implementation. jemalloc could be even worse in some cases and better in others. glibc malloc, which is still somewhat based on Doug Lea's malloc, is actually decent at dealing with corner cases, but it may still fragment memory a lot sometimes. If your app triggers some hard corner cases for malloc implementations, then the only advice I can give you is to adapt your app somehow.
You can disable the thread cache if that's a problem for you: just build with -DTCMALLOC_SMALL_BUT_SLOW. But as far as I understand, it will not in general affect the worst cases of fragmentation. That worst-case fragmentation comes from spans that have most, but not all, of their objects free while we need objects of another size class. And that doesn't look like something that can be fixed with an improved free-list representation.

Regarding this idea, I was thinking about something along these lines, but very different. I haven't had time in the last few months to actually implement it. All I have is some prototype code that I used to see how quick and compact the code could be, and so far it looks promising. It can be seen here: https://gist.github.com/alk/09a387957fc78aa25b29

The idea is inspired by the "binary representations" chapter of Okasaki's "Purely Functional Data Structures" book. My thinking so far is that this representation has a chance to be fast for the "push/pop one object" case as well as for "get N objects" and "add N objects", with all cases doing just a few memory accesses. It needs only two words per object plus some manageable overhead for per-thread free lists, transfer caches (which in this case could be just a single free list per size class and per CPU, I think) and free lists in spans. But all that needs more work, and I could be wrong. In the next few months I'm unlikely to have time to work on this idea. In any case, feel free to pursue the skip-list idea if you think it'll work fine.
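To make the worst case described above (spans with most but not all objects free) more concrete, here is a minimal toy model in Python. It is not tcmalloc code and does not use any tcmalloc API. The function name `simulate`, the span count, the 128 objects per span and the 90% free fraction are all assumptions chosen for illustration. The only rule it borrows from the discussion is that a span serves a single size class and can be handed to another size class only once every object in it is free.

```python
import random


def simulate(num_spans=1000, objects_per_span=128, free_fraction=0.9, seed=0):
    """Toy model (hypothetical, not tcmalloc internals).

    Every span is filled with objects of one size class. A fraction of all
    objects is then freed uniformly at random. A span counts as reusable for
    another size class only if all of its objects have been freed.
    """
    rng = random.Random(seed)
    total = num_spans * objects_per_span

    # Number of still-live objects in each span; all spans start full.
    live = [objects_per_span] * num_spans

    # Free a fixed number of distinct objects chosen at random.
    to_free = int(total * free_fraction)
    for obj in rng.sample(range(total), to_free):
        live[obj // objects_per_span] -= 1

    empty_spans = sum(1 for n in live if n == 0)
    return {
        "free_object_fraction": to_free / total,
        "reusable_span_fraction": empty_spans / num_spans,
    }


if __name__ == "__main__":
    print(simulate())
```

Under these assumptions, the chance that a given span ends up completely empty is roughly 0.9 raised to the 128th power, which is close to zero. So although about 90% of the objects are free, almost none of that memory can serve a request from a different size class, regardless of how the free lists themselves are represented.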
I'm working on an application whose memory limit is about 500MB (FLAGS_tcmalloc_heap_limit_mb). It only mallocs objects of two sizes and frees them randomly.
Consider the following scenario:
Wouldn't that be a big waste?
This is even more obvious in a multi-threaded environment. Any ideas for it? Thanks.