
Performance issue with linked lists #6

Open
vladimir-cheverdyuk-altium opened this issue Jun 4, 2020 · 5 comments

@vladimir-cheverdyuk-altium

Hi

We are evaluating FastMM5 for our application, and in general we see improvements over our current memory manager, but in one scenario performance is about 2 times slower (6 minutes with FastMM5 versus less than 3 with the current memory manager).

The code allocates linked-list items and later does multiple scans over these lists. There are a few threads that do the same operation. Each thread always uses its own data and never uses data from another thread. Our current memory manager allocates memory pretty much sequentially, and as a result the memory is nicely cached by the CPU and the scan operations are really fast.

But with FastMM5 the allocations are scattered all around memory, and as a result CPU caching does not help.

I wonder if it is possible to tune FastMM5 for this scenario, so that each thread has its "own" memory manager/memory pool?

Thank you.
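(Editor's note: the scenario described above is essentially a per-thread bump allocator. A minimal Delphi sketch of the idea follows; `TNodePool` and all other names here are hypothetical illustrations, not part of FastMM5 or the reporter's code. The point is that nodes drawn from one contiguous array sit on adjacent cache lines, so a forward scan of the list stays cache-friendly.)

```pascal
type
  PNode = ^TNode;
  TNode = record
    Next: PNode;
    Value: Integer;
  end;

  // Hypothetical per-thread pool: each thread owns one instance, so no
  // locking is needed and all of its nodes are contiguous in memory.
  TNodePool = record
    Nodes: array of TNode;  // pre-sized backing array
    Count: Integer;
    function Alloc: PNode;
  end;

function TNodePool.Alloc: PNode;
begin
  // Bump allocation: sequential addresses, no per-node heap call,
  // so a later list scan walks forward through the cache.
  Result := @Nodes[Count];
  Inc(Count);
end;
```

A pool like this only works when the lists have a bounded size and are freed all at once (drop the array), which appears to match the scan-heavy workload described.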

@pleriche
Owner

pleriche commented Jun 4, 2020

Hi Vladimir,

Thank you for the feedback.

With regards to the scenario where FastMM5 is slower, I am considering adding support for having an arena affinity per thread, so that blocks of the same size allocated by the same thread will be adjacent in the address space.

In the meantime I would like to investigate this a bit further. What memory manager are you currently using, and have you perhaps been able to reduce it to a small test case that I can run in a profiler to see where the bottleneck lies? Perhaps it is something that is easily fixable.

Best regards,
Pierre

@vladimir-cheverdyuk-altium
Author

We are using TbbMalloc right now. We did log the addresses of all allocations, and with TbbMalloc they are pretty much sequential. With FastMM5 they are all mixed.

@pleriche
Owner

pleriche commented Jun 5, 2020

I suspect it might be due to cache thrashing. Could you please try forcing 64-byte alignment by calling FastMM_EnterMinimumAddressAlignment(maa64Bytes)?

If that improves it then an arena affinity per thread will help, otherwise not.
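(Editor's note: `FastMM_EnterMinimumAddressAlignment(maa64Bytes)` is the API named above; the program scaffolding below is an assumed minimal sketch of where such a call would go, not code from this thread.)

```pascal
program AlignmentTest;

uses
  FastMM5;  // FastMM5 should be the first unit so it installs itself early

begin
  // Force every block to start on a 64-byte boundary (one cache line),
  // so two hot list nodes never share a cache line and thrash each other.
  FastMM_EnterMinimumAddressAlignment(maa64Bytes);

  // ... run the linked-list allocation and scan benchmark here ...
end.
```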

@vladimir-cheverdyuk-altium
Author

I did try everything: 32- and 64-byte alignment, and changing different configuration variables. But our application has a lot of other threads that are also allocating, and I believe that leads to allocations being scattered all around memory.

@Nashev

Nashev commented May 13, 2024

Very sad that no small test case app was provided to reproduce this issue.
