Describe your environment. macOS
Steps to reproduce.
Customer hit it randomly, I have their spindump.
What is the expected behavior?
Not 100% CPU :)
What is the actual behavior?
100% CPU usage due to spin yield-loop in CancelAllRequests
Additional context.
Two problems:
-
No lock on m_httpCallbacks — it reads m_httpCallbacks.empty() without holding m_httpCallbacksMtx, but every other access to m_httpCallbacks (in handleSendRequest and onHttpResponse) is guarded by that mutex. This is a data race (undefined behavior). On arm64 especially, the thread may never see the list become empty due to store/load ordering.
-
The yield-loop can spin forever — CancelAllRequests() tells the HTTP client to cancel, but the callbacks are removed from m_httpCallbacks in onHttpResponse(), which is scheduled asynchronously via scheduleOnHttpResponse() → scheduleTask(). If the task dispatcher is blocked or shut down (e.g., PauseTransmission has stopped dispatching), the response handler never runs, m_httpCallbacks never empties, and this loop spins indefinitely — which is exactly what the spindump shows.
The call chain in the spindump confirms this: PauseTransmission() calls into cancelAllRequests(), but pausing transmission likely also stops or stalls the task dispatcher that would normally run onHttpResponse to drain the callback list. The yield-loop has no timeout or fallback, so it burns CPU forever.
A fix would need to either:
Use a condition variable signaled from onHttpResponse instead of polling
Add a timeout to the loop and forcibly clear callbacks after it expires
Ensure onHttpResponse can still execute even when transmission is paused (or drain synchronously)
Spindump:
CPU Time: 9.756s (42.6G cycles, 143.3G instructions, 0.30c/i, 4.37GHz avg)
Num threads: 7
Note: 2 idle work queue threads omitted
Thread 0x3b1aee 1001 samples (1-1001) priority 0-20 (base 20) cpu time 9.755s (42.6G cycles, 143.3G instructions, 0.30c/i, 4.37GHz avg)
1001 thread_start + 8 (libsystem_pthread.dylib + 7196) [0x185ac6c1c]
1001 _pthread_start + 136 (libsystem_pthread.dylib + 27736) [0x185acbc58]
1001 void* std::__1::__thread_proxy[abi:ne190102]
<redacted>
1001 sense::Telemetry::OneDsTelemetryClient::PauseTransmission() (in telemetryd_v2.dwarf) (/Users/cloudtest/vss/_work/1/s/src/edr/libtelemetry/1ds/OneDsTelemetryClient.cpp:348) [0x104c99f64]
1001 Microsoft::Applications::Events::LogManagerImpl::PauseTransmission() (in telemetryd_v2.dwarf) (/Users/cloudtest/vss/_work/1/s/src/edr/MAT/MAT/lib/api/LogManagerImpl.cpp:461) [0x104be8344]
1001 std::__1::__function::__func<Microsoft::Applications::Events::TelemetrySystem::TelemetrySystem(Microsoft::Applications::Events::ILogManager&, Microsoft::Applications::Events::IRuntimeConfig&, Microsoft::Applications::Events::IOfflineStorage&, Microsoft::Applications::Events::IHttpClient&, Microsoft::Applications::Events::ITaskDispatcher&, Microsoft::Applications::Events::IBandwidthController*, Microsoft::Applications::Events::LogSessionDataProvider&)::$_2, std::__1::allocator<Microsoft::Applications::Events::TelemetrySystem::TelemetrySystem(Microsoft::Applications::Events::ILogManager&, Microsoft::Applications::Events::IRuntimeConfig&, Microsoft::Applications::Events::IOfflineStorage&, Microsoft::Applications::Events::IHttpClient&, Microsoft::Applications::Events::ITaskDispatcher&, Microsoft::Applications::Events::IBandwidthController*, Microsoft::Applications::Events::LogSessionDataProvider&)::$_2>, bool ()>::operator()() (in telemetryd_v2.dwarf) (/Applications/Xcode_16.4.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX15.5.sdk/usr/include/c++/v1/__functional/function.h:313) [0x104bda300]
996 Microsoft::Applications::Events::HttpClientManager::cancelAllRequests() (in telemetryd_v2.dwarf) (/Users/cloudtest/vss/_work/1/s/src/edr/MAT/MAT/lib/http/HttpClientManager.cpp:153) [0x104c01174]
801 swtch_pri + 8 (libsystem_kernel.dylib + 3208) [0x185a87c88] (running)
193 swtch_pri + 8 (libsystem_kernel.dylib + 3208) [0x185a87c88]
*94 ??? (kernel.release.t6050 + 24180) [0xfffffe000b3dde74] (running)
*62 ??? (kernel.release.t6050 + 24180) [0xfffffe000b3dde74]
*59 ??? (kernel.release.t6050 + 2149104) [0xfffffe000b5e4af0] (running)
*3 ??? (kernel.release.t6050 + 2150656) [0xfffffe000b5e5100]
*2 ??? (kernel.release.t6050 + 2122228) [0xfffffe000b5de1f4]
*2 ??? (kernel.release.t6050 + 843264) [0xfffffe000b4a5e00]
*2 ??? (kernel.release.t6050 + 842980) [0xfffffe000b4a5ce4] (running)
*1 ??? (kernel.release.t6050 + 2122276) [0xfffffe000b5de224] (running)
*21 ??? (kernel.release.t6050 + 843532) [0xfffffe000b4a5f0c] (runnable)
*15 ??? (kernel.release.t6050 + 27000) [0xfffffe000b3de978] (running)
*1 ??? (kernel.release.t6050 + 27000) [0xfffffe000b3de978]
*1 ??? (kernel.release.t6050 + 843632) [0xfffffe000b4a5f70]
*1 ??? (kernel.release.t6050 + 2168000) [0xfffffe000b5e94c0]
*1 ??? (kernel.release.t6050 + 2148624) [0xfffffe000b5e4910]
*1 ??? (kernel.release.t6050 + 26064) [0xfffffe000b3de5d0]
*1 ??? (kernel.release.t6050 + 316424) [0xfffffe000b425408] (running)
2 cthread_yield + 36 (libsystem_pthread.dylib + 15400) [0x185ac8c28] (running)
3 cthread_yield + 44 (libsystem_pthread.dylib + 15408) [0x185ac8c30] (running)
2 Microsoft::Applications::Events::HttpClientManager::cancelAllRequests() (in telemetryd_v2.dwarf) (/Users/cloudtest/vss/_work/1/s/src/edr/MAT/MAT/lib/http/HttpClientManager.cpp:152) [0x104c01168] (running)
Thread 0x3b1aec 1001 samples (1-1001) priority 20 (base 20) last ran 1011.253s ago
1001 thread_start + 8 (libsystem_pthread.dylib + 7196) [0x185ac6c1c]
1001 _pthread_start + 136 (libsystem_pthread.dylib + 27736) [0x185acbc58]
1001 void* std::__1::__thread_proxy[abi:ne190102]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (*)(void*), void*>>(void*) (in telemetryd_v2.dwarf) (/Applications/Xcode_16.4.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX15.5.sdk/usr/include/c++/v1/__thread/thread.h:209) [0x104c20c58]
1001 Microsoft::Applications::Events::PlatformAbstraction::WorkerThread::threadFunc(void*) (in telemetryd_v2.dwarf) (/Users/cloudtest/vss/_work/1/s/src/edr/MAT/MAT/lib/pal/WorkerThread.cpp:0) [0x104c20410]
1001 Microsoft::Applications::Events::PlatformAbstraction::Event::wait(unsigned int) const (in telemetryd_v2.dwarf) (/Users/cloudtest/vss/_work/1/s/src/edr/MAT/MAT/lib/./pal/WorkerThread.hpp:55) [0x104c20924]
1001 bool std::__1::condition_variable::wait_for[abi:ne190102]<long long, std::__1::ratio<1l, 1000l>, Microsoft::Applications::Events::PlatformAbstraction::Event::wait(unsigned int) const::'lambda0'()>(std::__1::unique_lock<std::__1::mutex>&, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l>> const&, Microsoft::Applications::Events::PlatformAbstraction::Event::wait(unsigned int) const::'lambda0'()) (in telemetryd_v2.dwarf) (/Applications/Xcode_16.4.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX15.5.sdk/usr/include/c++/v1/__condition_variable/condition_variable.h:205) [0x104c20a54]
1001 std::__1::condition_variable::__do_timed_wait(std::__1::unique_lock<std::__1::mutex>&, std::__1::chrono::time_point<std::__1::chrono::system_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>) + 104 (libc++.1.dylib + 26864) [0x1859e18f0]
1001 <patched truncated backtrace>
1001 __psynch_cvwait + 8 (libsystem_kernel.dylib + 17676) [0x185a8b50c]
*1001 psynch_cvcontinue + 0 (pthread + 17948) [0xfffffe000b316abc]
Thread 0x3b1aed 1001 samples (1-1001) priority 20 (base 20) last ran 22618.041s ago
1001 thread_start + 8 (libsystem_pthread.dylib + 7196) [0x185ac6c1c]
1001 _pthread_start + 136 (libsystem_pthread.dylib + 27736) [0x185acbc58]
1001 void* std::__1::__thread_proxy[abi:ne190102]
< redacted >
1001 sense::Telemetry::OneDsTelemetryClient::LogEvent(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, bool) (in telemetryd_v2.dwarf) (/Users/cloudtest/vss/_work/1/s/src/edr/libtelemetry/1ds/OneDsTelemetryClient.cpp:344) [0x104c99de8]
1001 Microsoft::Applications::Events::Logger::LogEvent(Microsoft::Applications::Events::EventProperties const&) (in telemetryd_v2.dwarf) (/Users/cloudtest/vss/_work/1/s/src/edr/MAT/MAT/lib/api/Logger.cpp:334) [0x104bedfac]
1001 Microsoft::Applications::Events::Logger::submit(CsProtocol::Record&, Microsoft::Applications::Events::EventProperties const&) (in telemetryd_v2.dwarf) (/Users/cloudtest/vss/_work/1/s/src/edr/MAT/MAT/lib/api/Logger.cpp:589) [0x104bf15cc]
1001 Microsoft::Applications::Events::LogManagerImpl::sendEvent(Microsoft::Applications::Events::IncomingEventContext* const&) (in telemetryd_v2.dwarf) (/Users/cloudtest/vss/_work/1/s/src/edr/MAT/MAT/lib/api/LogManagerImpl.cpp:726) [0x104be8f10]
1001 std::__1::recursive_mutex::lock() + 16 (libc++.1.dylib + 29536) [0x1859e2360]
1001 _pthread_mutex_firstfit_lock_slow + 216 (libsystem_pthread.dylib + 6368) [0x185ac68e0]
1001 <patched truncated backtrace>
1001 __psynch_mutexwait + 8 (libsystem_kernel.dylib + 14812) [0x185a8a9dc]
*1001 psynch_mtxcontinue + 0 (pthread + 9696) [0xfffffe000b314a80] (blocked by turnstile waiting for telemetryd_v2 [29655] [unique pid 129217] thread 0x3b1aee)
Describe your environment. macOS
Steps to reproduce.
Customer hit it randomly, I have their spindump.
What is the expected behavior?
Not 100% CPU :)
What is the actual behavior?
100% CPU usage due to spin yield-loop in CancelAllRequests
Additional context.
Two problems:
No lock on m_httpCallbacks — it reads m_httpCallbacks.empty() without holding m_httpCallbacksMtx, but every other access to m_httpCallbacks (in handleSendRequest and onHttpResponse) is guarded by that mutex. This is a data race (undefined behavior). On arm64 especially, the thread may never see the list become empty due to store/load ordering.
The yield-loop can spin forever — CancelAllRequests() tells the HTTP client to cancel, but the callbacks are removed from m_httpCallbacks in onHttpResponse(), which is scheduled asynchronously via scheduleOnHttpResponse() → scheduleTask(). If the task dispatcher is blocked or shut down (e.g., PauseTransmission has stopped dispatching), the response handler never runs, m_httpCallbacks never empties, and this loop spins indefinitely — which is exactly what the spindump shows.
The call chain in the spindump confirms this: PauseTransmission() calls into cancelAllRequests(), but pausing transmission likely also stops or stalls the task dispatcher that would normally run onHttpResponse to drain the callback list. The yield-loop has no timeout or fallback, so it burns CPU forever.
A fix would need to either:
Use a condition variable signaled from onHttpResponse instead of polling
Add a timeout to the loop and forcibly clear callbacks after it expires
Ensure onHttpResponse can still execute even when transmission is paused (or drain synchronously)
Spindump: