Performance loss between 2020.3 and 2021.1.1 #355

Closed · mikemccarty-vertex opened this issue Mar 3, 2021 · 15 comments

@mikemccarty-vertex

I have an Embree 3.12.2-based ray-tracing application (built on Ubuntu 18.04) where, if I update TBB from 2020.3 to 2021.1.1, throughput drops significantly under load (I would estimate up to an order of magnitude slower). It is very consistent and obvious.

Both TBB and Embree are rebuilt locally from scratch. The only application source change during the migration is the removal of tbb::task_scheduler_init from the initialization phase, which I understand was effectively non-functional anyway.
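
For reference, if an explicit concurrency limit is ever needed again after the migration, the closest oneTBB 2021 equivalent of tbb::task_scheduler_init with an argument is tbb::global_control; a minimal sketch (the limit of 8 is arbitrary):

#include <oneapi/tbb/global_control.h>

int main() {
    // Cap the number of threads the oneTBB scheduler may use in this process,
    // roughly what tbb::task_scheduler_init init(8); used to do in TBB 2020.
    tbb::global_control limit(tbb::global_control::max_allowed_parallelism, 8);

    // ... run the Embree / TBB workload while `limit` stays in scope ...
    return 0;
}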

Any ideas/recommendations?

@alexey-katranov
Contributor

Does the application use TBB only inside Embree, or do other parts of the application also use TBB?

@mikemccarty-vertex
Author

I would say no -- there are a few uses of tbb::task_group in the source code, but I'm fairly sure they are not exercised by the tests I am running.

@mikemccarty-vertex
Author

Thinking about it, the 2021.1.1 build configuration was

cmake ../oneTBB-${TBB_VERSION}/ -DCMAKE_CXX_STANDARD=14

I will try:

cmake ../oneTBB-${TBB_VERSION}/ -DCMAKE_CXX_STANDARD=14 -DTBB_TEST=OFF -DCMAKE_BUILD_TYPE=Release

... just to see if that makes a difference.

@alexey-katranov
Contributor

> The only application source change during the migration is the removal of tbb::task_scheduler_init from the initialization phase

Did it have any arguments or was it created by default?
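
For context, a minimal sketch of the distinction being asked about, using the old TBB 2020 API (tbb::task_scheduler_init was removed in oneTBB 2021; the thread count of 8 below is arbitrary):

#include <tbb/task_scheduler_init.h>  // TBB 2020.3 and earlier only

int main() {
    // Default construction: TBB picks the number of worker threads itself,
    // which is what lazy initialization does anyway if the object is removed.
    tbb::task_scheduler_init init;

    // With an argument, e.g. tbb::task_scheduler_init init(8);, the scheduler
    // would have been capped at 8 threads and would need a tbb::global_control
    // replacement after the migration.
    return 0;
}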

@mikemccarty-vertex
Author

No arguments

@mikemccarty-vertex
Author

At this point, I think I should explain our runtime configuration, because this is looking more like a runtime issue than a build-time one.

The Embree application runs in its own Docker container on a server machine, and there may be multiple instances of the container running concurrently (6 concurrent containers in the case I'm testing). In the test scenario, each of the processes is under roughly equal (fairly significant) load. They are each fed work tasks from another process via RPC.

Here's a dump of embree's runtime configuration:

Embree Ray Tracing Kernels 3.12.2 ()
  Compiler  : GCC 7.5.0
  Build     : Release
  Platform  : Linux (64bit)
  CPU       : Xeon Sky Lake (GenuineIntel)
   Threads  : 36
   ISA      : XMM YMM ZMM SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 POPCNT AVX F16C RDRAND AVX2 FMA3 LZCNT BMI1 BMI2 AVX512F AVX512DQ AVX512CD AVX512BW AVX512VL
   Targets  : SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 AVX AVXI AVX2 AVX512
   MXCSR    : FTZ=1, DAZ=1
  Config
    Threads : default
    ISA     : XMM YMM ZMM SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 POPCNT AVX F16C RDRAND AVX2 FMA3 LZCNT BMI1 BMI2 AVX512F AVX512DQ AVX512CD AVX512BW AVX512VL
    Targets : SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 AVX AVXI AVX2 AVX512  (supported)
              SSE2 SSE4.2 AVX AVX2  (compile time enabled)
    Features: intersection_filter
    Tasking : TBB2021.1 TBB_header_interface_12010 TBB_lib_interface_12010

We don't set up any CPU affinity, etc. through the docker-compose.yml; we just let TBB in each of the container processes work out the CPU load balancing amongst themselves. Historically, this has worked pretty well. With 2021.1.1, it's as if TBB is stubbornly under-utilizing the hardware it has at its disposal (that's just a guess -- I don't have much direct evidence to support that theory).

Has TBB's oversubscription logic changed significantly with 2021.1.1? Again, switching back to TBB 2020.3 completely changes the performance profile.
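
One experiment that might be worth trying (just a sketch, not a confirmed fix): cap each container's TBB parallelism explicitly so that six busy processes don't each try to create 36 workers on the same 36-thread host. The TBB_WORKERS environment variable below is made up for this example:

#include <oneapi/tbb/global_control.h>
#include <cstdlib>

int main() {
    // Hypothetical per-container cap, e.g. 36 hardware threads / 6 containers = 6.
    int workers = 6;
    if (const char* env = std::getenv("TBB_WORKERS"))  // made-up variable name
        workers = std::atoi(env);

    tbb::global_control limit(tbb::global_control::max_allowed_parallelism, workers);

    // ... start the Embree/RPC workload while `limit` stays in scope ...
    return 0;
}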

@alexey-katranov
Contributor

> Interestingly, my ray-tracing unit tests crash now having switched to the Release tbb build configuration. These tests have been used for a couple of years now -- I've not seen them crash like this before. The plot thickens...

There are some known issues with the Release configuration (see #342); try RelWithDebInfo.

> Has TBB's oversubscription logic changed significantly with 2021.1.1? Again, switching back to TBB 2020.3 completely changes the performance profile.

There are no specific changes related to oversubscription, but oneTBB was improved to put external threads to sleep when there is a work imbalance and nothing for them to do. In theory, oversubscription might increase that imbalance, and this logic might cause some issues, but that is just a supposition. To get the previous TBB behavior in this respect, try replacing the body of the external_waiter::pause function with the following code:

my_backoff.pause();
return;
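
For orientation, a sketch of where that change would go (the function appears to live in src/tbb/waiters.h in the 2021.1 sources; the location and exact signature may differ between releases, so treat this as approximate):

// Inside class external_waiter in the oneTBB sources (src/tbb/waiters.h in the
// 2021.1 tree -- verify the signature against your checkout):
void pause(std::uintptr_t) {
    // Restore the TBB 2020-style behavior: spin/back off instead of letting
    // the external thread go to sleep on work imbalance.
    my_backoff.pause();
    return;
}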

@mikemccarty-vertex
Author

Thanks for the info. I will look into it.

@mikemccarty-vertex
Author

@alexey-katranov your suggested change to external_waiter::pause() did not have any effect. Do you have any other suggestions?

@alexey-katranov
Contributor

> Do you have any other suggestions?

Unfortunately, no. Do you have a reproducer or steps to reproduce the issue?

@mikemccarty-vertex
Author

No, I don't... and it's going to be tedious to trim it down to a form that I can release to you. I will see what I can do.

@anton-potapov
Contributor

@mikemccarty-vertex, any updates?

By the way, there have been a number of fixes since then. Could you try v2021.4?

@mikemccarty-vertex
Author

@anton-potapov thanks for checking in. We've been holding at 2020.3 since this ticket was logged. I will give 2021.4 a try when I get some time.

@anton-potapov
Contributor

@mikemccarty-vertex, friendly reminder :)

@mikemccarty-vertex
Author

@anton-potapov I think I'm going to close this issue for now. When I get some time to retest, I will reopen if necessary. Thanks for the reminder.
