Set thread QoS to USER_INITIATED on Apple Silicon#3278
ssp3nc3r wants to merge 1 commit into stan-dev:develop
Changes look good and relatively self-contained to me, but I don't have a machine to test on. @bob-carpenter is our resident Mac Silicon fan and has used some of this functionality recently, so maybe he can take a peek before we merge.
Jenkins Console Log
Machine information
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.3 LTS
Release: 20.04
Codename: focal
CPU:
G++:
Clang:
Summary
On Apple Silicon Macs, TBB worker threads are created with the default QoS class, which macOS may schedule to efficiency cores even when performance cores are available. This significantly degrades parallel performance.
This adds a pthread_set_qos_class_self_np() call in on_scheduler_entry() to set USER_INITIATED QoS, signaling to macOS that these are compute threads the user is waiting for. This causes macOS to prefer performance cores when available.
Details
- #include <pthread.h> and #include <sys/qos.h> on Apple
- pthread_set_qos_class_self_np(QOS_CLASS_USER_INITIATED, 0) when TBB worker threads enter the scheduler
- Apple Silicon only (__arm64__ or __aarch64__)
Testing
Tested on macOS 26.2 (Tahoe) with Apple M3 Ultra (24 P-cores, 8 E-cores).
Before: CPU usage drops from ~800% to ~100-300% per chain after ~4 minutes (threads demoted to E-cores)
After: CPU usage remains stable on P-cores
Fixes #3277
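For readers who want to see the shape of the change described above, here is a minimal sketch. It is not the code from this PR. The names `request_user_initiated_qos`, `qos_observer_sketch`, and `SKETCH_HAS_APPLE_QOS` are hypothetical, and the struct only imitates the `on_scheduler_entry(bool)` hook of a TBB observer so that the example compiles without TBB. On anything other than Apple Silicon the call is compiled out and the helper reports success.

```cpp
// Sketch only: hypothetical names, not the PR's actual code.
#if defined(__APPLE__) && (defined(__arm64__) || defined(__aarch64__))
#include <pthread.h>
#include <sys/qos.h>
#define SKETCH_HAS_APPLE_QOS 1
#else
#define SKETCH_HAS_APPLE_QOS 0
#endif

// Ask macOS to treat the calling thread as USER_INITIATED work.
// Returns 0 on success, and 0 on platforms where the call is compiled out.
inline int request_user_initiated_qos() {
#if SKETCH_HAS_APPLE_QOS
  // Second argument is the relative priority within the QoS class (0 = default).
  return pthread_set_qos_class_self_np(QOS_CLASS_USER_INITIATED, 0);
#else
  return 0;
#endif
}

// Hypothetical stand-in for a TBB task_scheduler_observer subclass.
// In a real observer, on_scheduler_entry() runs on the thread that is
// joining the scheduler, which is why the QoS call targets "self".
struct qos_observer_sketch {
  int last_status = 0;

  void on_scheduler_entry(bool /*is_worker*/) {
    last_status = request_user_initiated_qos();
  }
};
```

The QoS class is a per-thread property, so the request has to be made from the worker thread itself rather than from the thread that creates the pool. That is why a scheduler-entry hook is a natural place for it.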