
Performance issue with linked lists #6

Open
vladimir-cheverdyuk-altium opened this issue Jun 4, 2020 · 5 comments

@vladimir-cheverdyuk-altium

Hi

We are evaluating FastMM5 for our application, and in general we see improvements over our current memory manager, but in one scenario performance is about 2 times slower (6 minutes with FastMM5 versus less than 3 with the current memory manager).

The code allocates linked-list items and later does multiple scans over these lists. There are a few threads that do the same operation. Each thread always uses its own data and never uses data from another thread. Our current memory manager allocates memory pretty much sequentially, and as a result the memory is nicely cached by the CPU and the scan operations are really fast.

But with FastMM5 the allocations are scattered all around memory, and as a result CPU caching does not help.

I wonder if it is possible to tune FastMM5 for this scenario, so that each thread has its "own" memory manager/memory pool?

Thank you.
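(Editor's note: the scenario described above is essentially a per-thread bump allocator. A minimal Delphi sketch of the idea follows; `TNodePool` and all other names here are hypothetical illustrations, not part of FastMM5 or the reporter's code. The point is that nodes drawn from one contiguous array sit on adjacent cache lines, so a forward scan of the list stays cache-friendly.)

```pascal
type
  PNode = ^TNode;
  TNode = record
    Next: PNode;
    Value: Integer;
  end;

  // Hypothetical per-thread pool: each thread owns one instance, so no
  // locking is needed and all of its nodes are contiguous in memory.
  TNodePool = record
    Nodes: array of TNode;  // pre-sized backing array
    Count: Integer;
    function Alloc: PNode;
  end;

function TNodePool.Alloc: PNode;
begin
  // Bump allocation: sequential addresses, no per-node heap call,
  // so a later list scan walks forward through the cache.
  Result := @Nodes[Count];
  Inc(Count);
end;
```

A pool like this only works when the lists have a bounded size and are freed all at once (drop the array), which appears to match the scan-heavy workload described.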

@pleriche
Owner

pleriche commented Jun 4, 2020

Hi Vladimir,

Thank you for the feedback.

With regards to the scenario where FastMM5 is slower, I am considering adding support for having an arena affinity per thread, so that blocks of the same size allocated by the same thread will be adjacent in the address space.

In the meantime I would like to investigate this a bit further. What memory manager are you currently using, and have you perhaps been able to reduce it to a small test case that I can run in a profiler to see where the bottleneck lies? Perhaps it is something that is easily fixable.

Best regards,
Pierre

@vladimir-cheverdyuk-altium
Author

We are using TbbMalloc right now. We did log the addresses of all allocations, and with TbbMalloc they are pretty much sequential. With FastMM5 they are all mixed.

@pleriche
Owner

pleriche commented Jun 5, 2020

I suspect it might be due to cache thrashing. Could you please try forcing 64-byte alignment by calling FastMM_EnterMinimumAddressAlignment(maa64Bytes)?

If that improves it then an arena affinity per thread will help, otherwise not.
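(Editor's note: `FastMM_EnterMinimumAddressAlignment(maa64Bytes)` is the API named above; the program scaffolding below is an assumed minimal sketch of where such a call would go, not code from this thread.)

```pascal
program AlignmentTest;

uses
  FastMM5;  // FastMM5 should be the first unit so it installs itself early

begin
  // Force every block to start on a 64-byte boundary (one cache line),
  // so two hot list nodes never share a cache line and thrash each other.
  FastMM_EnterMinimumAddressAlignment(maa64Bytes);

  // ... run the linked-list allocation and scan benchmark here ...
end.
```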

@vladimir-cheverdyuk-altium
Author

I did try everything: 32- and 64-byte alignment, and changing different configuration variables. But our application has a lot of other threads that are also allocating, and I believe that leads to allocations being scattered all around memory.

@Nashev

Nashev commented May 13, 2024

Very sad that no small test case app was provided to reproduce this issue.
