
naftaly (Contributor) commented May 9, 2025

This PR adds a service that captures hangs. Hangs are sent the same way as on Android, as described here. In the interim, a similar span using iOS terminology is also sent so that it appears in the timeline.

  • Stack traces are not sent yet; that is left for a future PR.
  • The iOS span appears as "emb_thread_blockage" in the Performance timeline.
  • The limit is 200 hangs per session (currently hard-coded).
  • The limit is 0 samples per hang (currently hard-coded).
  • Adds a session attribute named emb-thread-blockage containing the number of hangs in the session.
  • The type is currently ux; I think it should be perf.thread_blockage.
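A minimal Swift sketch of the per-session bookkeeping described above, assuming the limit is enforced by a simple counter that is reset when a new session starts. `HangSessionCounter` and its methods are hypothetical names, not the PR's API; only the 200/0 defaults and the `emb-thread-blockage` attribute name come from the description. Counting only the hangs that were actually recorded (rather than every detected hang) is also an assumption.

```swift
// Hypothetical per-session hang bookkeeping; not the PR's actual types.
final class HangSessionCounter {
    /// Maximum number of hangs recorded per session (hard-coded to 200 in the PR).
    let hangsPerSession: UInt
    /// Stack samples captured per hang (hard-coded to 0 in the PR, i.e. none).
    let samplesPerHang: UInt
    private(set) var recordedHangs: UInt = 0

    init(hangsPerSession: UInt = 200, samplesPerHang: UInt = 0) {
        self.hangsPerSession = hangsPerSession
        self.samplesPerHang = samplesPerHang
    }

    /// Returns true if a newly detected hang should be emitted as a span,
    /// false once the per-session limit has been reached.
    func shouldRecordHang() -> Bool {
        guard recordedHangs < hangsPerSession else { return false }
        recordedHangs += 1
        return true
    }

    /// Session attribute carrying the number of hangs recorded so far.
    var sessionAttribute: (key: String, value: String) {
        (key: "emb-thread-blockage", value: String(recordedHangs))
    }

    /// Called when a new session starts.
    func resetForNewSession() {
        recordedHangs = 0
    }
}

// Usage: with a limit of 2, the third hang in the session is dropped.
let counter = HangSessionCounter(hangsPerSession: 2)
let decisions = (0..<3).map { _ in counter.shouldRecordHang() }
print(decisions, counter.sessionAttribute.value)  // [true, true, false] 2
```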

Client Work

  • Add limits.
  • Decide which spans to log (ANR or Hang) and get the corresponding backend work done.
  • If there is an unterminated hang span on startup, we should infer that the app most likely crashed due to a watchdog termination (SIGKILL, 0x8badf00d), but still close the span so that it appears in the timeline.
  • Understand the overflow error that is currently commented out.
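The startup check for an unterminated hang span could look roughly like the Swift sketch below. It assumes hang spans are persisted with an optional end time and that the SDK keeps a last-known-alive timestamp from the previous launch; `PersistedHangSpan`, `closeUnfinishedHangSpans`, and `lastKnownAliveNanos` are hypothetical. An unfinished span only suggests a watchdog kill (0x8badf00d): a user force-quitting a hung app would leave the same trace.

```swift
// Hypothetical record of a hang span persisted by a previous launch.
struct PersistedHangSpan: Equatable {
    let startNanos: UInt64
    var endNanos: UInt64?  // nil: the span was never closed
}

/// Closes every unfinished hang span at the last time the previous process was
/// known to be alive, so the spans can still be shown in the timeline.
/// Returns true when at least one unfinished span was found, which suggests
/// (but does not prove) that the process was killed while hung, e.g. by the
/// watchdog with SIGKILL / 0x8badf00d.
func closeUnfinishedHangSpans(_ spans: inout [PersistedHangSpan],
                              lastKnownAliveNanos: UInt64) -> Bool {
    var foundUnfinished = false
    for i in spans.indices where spans[i].endNanos == nil {
        // Never end a span before it started.
        spans[i].endNanos = max(lastKnownAliveNanos, spans[i].startNanos)
        foundUnfinished = true
    }
    return foundUnfinished
}

var spans = [
    PersistedHangSpan(startNanos: 1_000, endNanos: 2_000),
    PersistedHangSpan(startNanos: 5_000, endNanos: nil),
]
let suspectWatchdogKill = closeUnfinishedHangSpans(&spans, lastKnownAliveNanos: 9_000)
print(suspectWatchdogKill, spans[1].endNanos ?? 0)  // true 9000
```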

Backend work

  • Make sure the correct naming is used on each platform (iOS: Hang; Android: ANR).
  • Show perf.thread_blockage spans in timeline.
  • Add remote config limits. This is the structure used in the app:
hang_limits: {
    hang_per_session: Uint,
    samples_per_hang: Uint
}
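For illustration, the `hang_limits` object above could be decoded on the client with `Codable`. This is a sketch, not the PR's `RemoteConfigPayload` code: the type names are hypothetical, `Uint` is mapped to Swift's `UInt`, and missing keys are assumed to fall back to the current hard-coded values (200 hangs, 0 samples).

```swift
import Foundation

// Hypothetical decoding of the `hang_limits` remote config object.
struct HangLimitsPayload: Decodable {
    static let defaultHangsPerSession: UInt = 200
    static let defaultSamplesPerHang: UInt = 0

    var hangPerSession: UInt
    var samplesPerHang: UInt

    enum CodingKeys: String, CodingKey {
        case hangPerSession = "hang_per_session"
        case samplesPerHang = "samples_per_hang"
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        // Fall back to the hard-coded defaults when a key is absent.
        hangPerSession = try container.decodeIfPresent(UInt.self, forKey: .hangPerSession)
            ?? Self.defaultHangsPerSession
        samplesPerHang = try container.decodeIfPresent(UInt.self, forKey: .samplesPerHang)
            ?? Self.defaultSamplesPerHang
    }
}

// Top-level payload; `hang_limits` itself may be missing.
struct RemoteConfigSketch: Decodable {
    var hangLimits: HangLimitsPayload?

    enum CodingKeys: String, CodingKey {
        case hangLimits = "hang_limits"
    }
}

let json = #"{"hang_limits": {"hang_per_session": 50, "samples_per_hang": 0}}"#
let config = try! JSONDecoder().decode(RemoteConfigSketch.self, from: Data(json.utf8))
print(config.hangLimits?.hangPerSession ?? 0)  // 50
```

Because the fields are `UInt`, a negative value in the payload makes decoding throw rather than silently producing a nonsensical limit.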

Gate

Set hang_per_session = 0 to ensure the watchdog doesn't run.
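A minimal sketch of that gate, assuming the check happens once when capture services are set up; `startHangCaptureIfEnabled` and its `start` closure are hypothetical placeholders for whatever installs the run loop observer.

```swift
/// Hypothetical gate: a `hang_per_session` of 0 acts as a kill switch, so the
/// watchdog is never installed. Returns whether capture was started.
func startHangCaptureIfEnabled(hangPerSession: UInt, start: () -> Void) -> Bool {
    guard hangPerSession > 0 else { return false }
    start()
    return true
}

var watchdogStarts = 0
let disabled = startHangCaptureIfEnabled(hangPerSession: 0) { watchdogStarts += 1 }
let enabled = startHangCaptureIfEnabled(hangPerSession: 200) { watchdogStarts += 1 }
print(disabled, enabled, watchdogStarts)  // false true 1
```

Checking the limit before installing the observer, rather than discarding hangs afterwards, means a disabled configuration adds no monitoring overhead at all.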

Visuals

[Screenshot, 2025-09-03 1:36:48 PM]


github-actions bot commented May 9, 2025

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None


github-actions bot commented May 9, 2025

Warnings
⚠️ No CHANGELOG entry added.

Generated by 🚫 Danger Swift against 2225322


codecov bot commented May 10, 2025

Codecov Report

❌ Patch coverage is 74.82877% with 147 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.10%. Comparing base (5ea59ef) to head (4b7fccc).
⚠️ Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
.../EmbraceCore/Capture/Hang/HangCaptureService.swift 7.27% 102 Missing ⚠️
...onfigurable/RemoteConfig/RemoteConfigPayload.swift 36.00% 16 Missing ⚠️
...ceCore/Capture/Hang/Watchdog/NanosecondClock.swift 65.51% 10 Missing ⚠️
...braceCore/Capture/Hang/Watchdog/HangWatchdog.swift 95.55% 8 Missing ⚠️
...ts/TestSupport/Mocks/MockEmbraceConfigurable.swift 28.57% 5 Missing ⚠️
Sources/EmbraceConfiguration/HangLimits.swift 42.85% 4 Missing ⚠️
Sources/EmbraceCore/Capture/CaptureServices.swift 96.87% 1 Missing ⚠️
...reTests/Capture/Hang/HangCaptureServiceTests.swift 99.34% 1 Missing ⚠️
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #233      +/-   ##
==========================================
- Coverage   89.44%   89.10%   -0.34%     
==========================================
  Files         465      470       +5     
  Lines       29366    29936     +570     
==========================================
+ Hits        26265    26675     +410     
- Misses       3101     3261     +160     
Files with missing lines Coverage Δ
Sources/EmbraceCaptureService/CaptureService.swift 90.38% <100.00%> (+1.25%) ⬆️
...urces/EmbraceCommonInternal/Locks/UnfairLock.swift 100.00% <ø> (ø)
...figInternal/EmbraceConfigurable/RemoteConfig.swift 79.43% <100.00%> (-5.86%) ⬇️
...figuration/EmbraceConfigurable/DefaultConfig.swift 91.30% <100.00%> (+0.39%) ⬆️
...braceCore/Capture/UX/View/ViewCaptureService.swift 65.59% <100.00%> (-1.22%) ⬇️
...rces/EmbraceCoreDataInternal/CoreDataWrapper.swift 77.77% <ø> (-0.13%) ⬇️
...ces/EmbraceIO/Capture/CaptureService+Helpers.swift 100.00% <100.00%> (ø)
...rces/EmbraceIO/Capture/CaptureServiceBuilder.swift 100.00% <100.00%> (ø)
...ests/Capture/UX/View/ViewCaptureServiceTests.swift 100.00% <100.00%> (ø)
...s/EmbraceCoreTests/Public/Embrace+SetupTests.swift 85.71% <ø> (ø)
... and 10 more

... and 3 files with indirect coverage changes
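The coverage figures above follow from hits divided by total lines. The Swift sketch below reproduces them, assuming Codecov's default display of two decimal places rounded down; that assumption is why the head shows 89.10% rather than 89.11%, and the delta -0.34% rather than -0.33%. It also checks that the per-file missing lines add up to the 147 reported.

```swift
import Foundation

/// Coverage percentage truncated to `precision` decimals
/// (assumed Codecov default: 2 decimals, rounded down).
func coveragePercent(hits: Int, lines: Int, precision: Int = 2) -> Double {
    let scale = pow(10.0, Double(precision))
    return (Double(hits) / Double(lines) * 100 * scale).rounded(.down) / scale
}

let base = coveragePercent(hits: 26265, lines: 29366)  // main
let head = coveragePercent(hits: 26675, lines: 29936)  // #233
print(String(format: "%.2f %.2f %.2f", base, head, head - base))  // 89.44 89.10 -0.34

// Per-file missing lines from the patch table; they should sum to the reported 147.
let missingPerFile = [102, 16, 10, 8, 5, 4, 1, 1]
print(missingPerFile.reduce(0, +))  // 147
```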


@naftaly naftaly force-pushed the acohen/hang-capture-service branch 2 times, most recently from aa6bb0a to 86dc2a7 Compare May 20, 2025 19:28
@naftaly naftaly force-pushed the acohen/hang-capture-service branch from ac29d23 to a107311 Compare May 21, 2025 21:49
@naftaly naftaly force-pushed the acohen/hang-capture-service branch from f501b9d to fa93535 Compare May 29, 2025 16:17
@naftaly naftaly requested a review from ArielDemarco May 29, 2025 20:16
@naftaly naftaly force-pushed the acohen/hang-capture-service branch from 5c3fcc5 to febebe2 Compare June 3, 2025 22:35
@naftaly naftaly force-pushed the acohen/hang-capture-service branch 3 times, most recently from d852558 to 6af2a1b Compare June 19, 2025 22:00
@naftaly naftaly force-pushed the acohen/hang-capture-service branch 2 times, most recently from 1a30db7 to 2225322 Compare June 24, 2025 20:34
@naftaly naftaly force-pushed the acohen/hang-capture-service branch 2 times, most recently from 6667e4c to 3ebda71 Compare July 15, 2025 20:37
@naftaly naftaly force-pushed the acohen/hang-capture-service branch 3 times, most recently from 822f1e7 to 1e9ae37 Compare July 30, 2025 15:45
@naftaly naftaly force-pushed the acohen/hang-capture-service branch 3 times, most recently from ee8093d to 21a23d6 Compare August 6, 2025 15:21
@naftaly naftaly force-pushed the acohen/hang-capture-service branch from 21a23d6 to c3d260d Compare August 8, 2025 19:22
@naftaly naftaly force-pushed the acohen/hang-capture-service branch from c3d260d to a36ccaa Compare August 11, 2025 15:25
@naftaly naftaly force-pushed the acohen/hang-capture-service branch 3 times, most recently from 3333080 to 4e78004 Compare August 19, 2025 14:58
@naftaly naftaly changed the title from "[WIP] Hang Capture Service" to "Hang Capture Service" Aug 19, 2025
@naftaly naftaly marked this pull request as ready for review August 19, 2025 21:15
@naftaly naftaly requested a review from a team as a code owner August 19, 2025 21:15
@naftaly naftaly force-pushed the acohen/hang-capture-service branch 4 times, most recently from 026ee39 to 2274265 Compare August 26, 2025 20:34
@naftaly naftaly requested a review from Copilot August 27, 2025 14:48

Copilot AI left a comment


Pull Request Overview

This PR implements a hang capture service to detect and report app hangs on iOS. The service monitors the main RunLoop and generates OpenTelemetry spans when hangs exceed Apple's default 0.25-second threshold.

Key changes include:

  • Creates a HangWatchdog that monitors the main RunLoop using CFRunLoopObserver to detect when the thread is blocked
  • Implements HangCaptureService that converts hang events into OpenTelemetry spans with "emb-thread-blockage" naming
  • Adds configuration support with HangLimits to control the number of hangs per session (200) and samples per hang (0)
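The detection idea in the first bullet can be modeled without any platform APIs. The Swift sketch below is an assumption about the approach, not the PR's `HangWatchdog`: it treats the interval between the main run loop waking up (`afterWaiting`) and going back to sleep (`beforeWaiting`) as busy time, and reports a hang when that interval reaches 250 ms. A real implementation would receive these activities from a `CFRunLoopObserver`; and because this model only classifies an interval after the main thread resumes, catching a hang while it is still in progress would additionally need a background thread.

```swift
// Simplified model of the run loop activities a CFRunLoopObserver would report.
enum RunLoopActivity { case afterWaiting, beforeWaiting }

struct HangEvent: Equatable {
    let startNanos: UInt64
    let durationNanos: UInt64
}

/// Hypothetical sketch: measures how long the main run loop stays busy between
/// waking up (.afterWaiting) and going back to sleep (.beforeWaiting).
struct BusyIntervalDetector {
    let thresholdNanos: UInt64
    private var busySince: UInt64?

    init(thresholdNanos: UInt64 = 250_000_000) {  // 0.25 s
        self.thresholdNanos = thresholdNanos
    }

    /// Feed each observed activity with a monotonic timestamp in nanoseconds.
    /// Returns a hang when a busy interval met or exceeded the threshold.
    mutating func observe(_ activity: RunLoopActivity, at nanos: UInt64) -> HangEvent? {
        switch activity {
        case .afterWaiting:
            busySince = nanos
            return nil
        case .beforeWaiting:
            guard let start = busySince, nanos >= start else { return nil }
            busySince = nil
            let duration = nanos - start
            guard duration >= thresholdNanos else { return nil }
            return HangEvent(startNanos: start, durationNanos: duration)
        }
    }
}

var detector = BusyIntervalDetector()
_ = detector.observe(.afterWaiting, at: 0)
let shortBusy = detector.observe(.beforeWaiting, at: 100_000_000)    // 100 ms: not a hang
_ = detector.observe(.afterWaiting, at: 1_000_000_000)
let longBusy = detector.observe(.beforeWaiting, at: 1_400_000_000)   // 400 ms: hang
print(shortBusy == nil, longBusy?.durationNanos ?? 0)  // true 400000000
```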

Reviewed Changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.

File Description
Tests/TestSupport/Mocks/MockEmbraceConfigurable.swift Adds hangLimits property to mock configuration
Tests/TestSupport/EditableConfig.swift Adds hangLimits property to test configuration
Tests/EmbraceIOTests/CaptureServiceBuilderTests.swift Updates tests to include HangCaptureService in service count validation
Tests/EmbraceCoreTests/Capture/Hang/HangCaptureServiceTests.swift Comprehensive test suite for hang detection functionality
Sources/EmbraceIO/Capture/CaptureServiceBuilder.swift Integrates hang watchdog into default capture services
Sources/EmbraceIO/Capture/CaptureService+Helpers.swift Adds helper method to create HangCaptureService
Sources/EmbraceCore/Capture/Hang/Watchdog/NanosecondClock.swift High-precision timing utility for hang measurement
Sources/EmbraceCore/Capture/Hang/Watchdog/HangWatchdog.swift Core hang detection implementation using RunLoop observers
Sources/EmbraceCore/Capture/Hang/HangCaptureService.swift Service that converts hang events to OpenTelemetry spans
Sources/EmbraceCore/Capture/CaptureServices.swift Integrates hang limits configuration and session lifecycle
Sources/EmbraceConfiguration/HangLimits.swift Configuration class for hang capture limits
Sources/EmbraceConfiguration/EmbraceConfigurable/DefaultConfig.swift Adds hangLimits to default configuration
Sources/EmbraceConfiguration/EmbraceConfigurable.swift Adds hangLimits property to configuration protocol
Sources/EmbraceConfigInternal/EmbraceConfigurable/RemoteConfig/RemoteConfigPayload.swift Remote configuration support for hang limits
Sources/EmbraceConfigInternal/EmbraceConfigurable/RemoteConfig.swift Implements hangLimits property for remote config
Sources/EmbraceCommonInternal/Locks/UnfairLock.swift Minor extension reorganization
Sources/EmbraceCaptureService/CaptureService.swift Adds session lifecycle methods to base CaptureService
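As a rough illustration of what a nanosecond clock for hang timing needs, here is a Swift sketch built on `DispatchTime.now().uptimeNanoseconds`, a monotonic source that is unaffected by wall-clock changes. `NanosecondTimestamp` is a hypothetical name; the PR's `NanosecondClock` may use a different time source.

```swift
import Dispatch

/// Hypothetical monotonic timestamp in nanoseconds, suitable for measuring hang durations.
struct NanosecondTimestamp {
    let uptimeNanos: UInt64

    static func now() -> NanosecondTimestamp {
        NanosecondTimestamp(uptimeNanos: DispatchTime.now().uptimeNanoseconds)
    }

    /// Nanoseconds elapsed from `earlier` to `self`, clamped at 0 so a misordered
    /// pair cannot underflow the unsigned subtraction.
    func nanoseconds(since earlier: NanosecondTimestamp) -> UInt64 {
        uptimeNanos >= earlier.uptimeNanos ? uptimeNanos - earlier.uptimeNanos : 0
    }

    /// Convenience conversion, e.g. for comparing against a 0.25 s threshold.
    func seconds(since earlier: NanosecondTimestamp) -> Double {
        Double(nanoseconds(since: earlier)) / 1_000_000_000
    }
}

let start = NanosecondTimestamp.now()
let total = (1...100_000).reduce(0, +)  // some work to measure
let end = NanosecondTimestamp.now()
print(total, end.nanoseconds(since: start))
```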


@naftaly naftaly force-pushed the acohen/hang-capture-service branch 3 times, most recently from ad0f5d4 to 1231b39 Compare September 2, 2025 16:06
const prAvg = '${{ steps.pr-tests.outputs.pr_avg }}';
const mainAvg = '${{ steps.main-tests.outputs.main_avg }}';
const percentDiff = '${{ steps.comparison.outputs.percent_diff }}';
const absDiff = '${{ steps.comparison.outputs.abs_diff }}';

Check notice

Code scanning / zizmor

code injection via template expansion

const mainAvg = '${{ steps.main-tests.outputs.main_avg }}';
const percentDiff = '${{ steps.comparison.outputs.percent_diff }}';
const absDiff = '${{ steps.comparison.outputs.abs_diff }}';
const result = '${{ steps.comparison.outputs.result }}';

Check notice

Code scanning / zizmor

code injection via template expansion

Comment on lines 114 to 117
- name: Checkout main branch
  uses: actions/checkout@v4
  with:
    ref: main

Check warning

Code scanning / zizmor

credential persistence through GitHub Actions artifacts

- name: Calculate performance comparison
  id: comparison
  run: |
    pr_avg=${{ needs.performance-test-pr.outputs.pr_avg }}

Check notice

Code scanning / zizmor

code injection via template expansion

id: comparison
run: |
  pr_avg=${{ needs.performance-test-pr.outputs.pr_avg }}
  main_avg=${{ needs.performance-test-main.outputs.main_avg }}

Check notice

Code scanning / zizmor

code injection via template expansion

run: |
  pr_avg=${{ needs.performance-test-pr.outputs.pr_avg }}
  main_avg=${{ needs.performance-test-main.outputs.main_avg }}
  pr_std=${{ needs.performance-test-pr.outputs.pr_std }}

Check notice

Code scanning / zizmor

code injection via template expansion

pr_avg=${{ needs.performance-test-pr.outputs.pr_avg }}
main_avg=${{ needs.performance-test-main.outputs.main_avg }}
pr_std=${{ needs.performance-test-pr.outputs.pr_std }}
main_std=${{ needs.performance-test-main.outputs.main_std }}

Check notice

Code scanning / zizmor

code injection via template expansion

const percentDiff = '${{ steps.comparison.outputs.percent_diff }}';
const absDiff = '${{ steps.comparison.outputs.abs_diff }}';
const result = '${{ steps.comparison.outputs.result }}';
const emoji = '${{ steps.comparison.outputs.emoji }}';

Check notice

Code scanning / zizmor

code injection via template expansion

const absDiff = '${{ steps.comparison.outputs.abs_diff }}';
const result = '${{ steps.comparison.outputs.result }}';
const emoji = '${{ steps.comparison.outputs.emoji }}';
const prStd = '${{ needs.performance-test-pr.outputs.pr_std }}';

Check notice

Code scanning / zizmor

code injection via template expansion

const result = '${{ steps.comparison.outputs.result }}';
const emoji = '${{ steps.comparison.outputs.emoji }}';
const prStd = '${{ needs.performance-test-pr.outputs.pr_std }}';
const mainStd = '${{ needs.performance-test-main.outputs.main_std }}';

Check notice

Code scanning / zizmor

code injection via template expansion

const emoji = '${{ steps.comparison.outputs.emoji }}';
const prStd = '${{ needs.performance-test-pr.outputs.pr_std }}';
const mainStd = '${{ needs.performance-test-main.outputs.main_std }}';
const tStat = '${{ steps.comparison.outputs.t_stat }}';

Check notice

Code scanning / zizmor

code injection via template expansion

const prStd = '${{ needs.performance-test-pr.outputs.pr_std }}';
const mainStd = '${{ needs.performance-test-main.outputs.main_std }}';
const tStat = '${{ steps.comparison.outputs.t_stat }}';
const significant = '${{ steps.comparison.outputs.significant }}';

Check notice

Code scanning / zizmor

code injection via template expansion

@naftaly naftaly force-pushed the acohen/hang-capture-service branch from ac9ac6c to 99639aa Compare September 2, 2025 20:00

github-actions bot commented Sep 2, 2025

🚀 Swift Test Performance Report

✅ Performance Improvement

Branch Average Duration Standard Deviation Sample Size
PR Branch 1.553s ±1.418672s 20 runs
Main Branch 1.994s ±1.568736s 20 runs

Performance Impact

  • Absolute Difference: -0.441s
  • Relative Change: -22.00%
  • T-statistic: -0.932

📊 Difference not statistically significant
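The reported T-statistic can be reproduced from the table with Welch's unequal-variance formula, which appears to be the "two-sample t-test approximation" listed under the detailed statistics (an assumption; the workflow script is not shown). The sketch also computes the Welch-Satterthwaite degrees of freedom, about 37.6; at that many degrees of freedom the two-sided 5% critical value is roughly 2.02, so |t| of about 0.93 is well within the noise. The relative change computed directly from the averages is about -22.1%, slightly larger in magnitude than the displayed -22.00%, which is consistent with the report truncating intermediate values.

```swift
import Foundation

/// Welch's t statistic for two samples with unequal variances.
func welchT(meanA: Double, sdA: Double, nA: Double,
            meanB: Double, sdB: Double, nB: Double) -> Double {
    (meanA - meanB) / (sdA * sdA / nA + sdB * sdB / nB).squareRoot()
}

/// Welch-Satterthwaite approximation of the degrees of freedom.
func welchDegreesOfFreedom(sdA: Double, nA: Double, sdB: Double, nB: Double) -> Double {
    let va = sdA * sdA / nA
    let vb = sdB * sdB / nB
    return (va + vb) * (va + vb) / (va * va / (nA - 1) + vb * vb / (nB - 1))
}

// PR branch (A) vs main branch (B), 20 runs each, from the table above.
let t = welchT(meanA: 1.553, sdA: 1.418672, nA: 20, meanB: 1.994, sdB: 1.568736, nB: 20)
let df = welchDegreesOfFreedom(sdA: 1.418672, nA: 20, sdB: 1.568736, nB: 20)
let relativeChange = (1.553 - 1.994) / 1.994 * 100
print(String(format: "t = %.3f, df = %.1f, relative change = %.1f%%", t, df, relativeChange))
// t = -0.932, df = 37.6, relative change = -22.1%
```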

Analysis

The PR branch was faster on average, but the difference is within run-to-run noise and is not statistically significant.

📊 Detailed statistics
  • Test Configuration: 20 runs per branch with 3 warmup runs
  • Runner: macOS Latest (parallel execution)
  • Swift Version: Latest available
  • Statistical Test: Two-sample t-test approximation
  • Workflow: Swift Test Performance Comparison
  • Run ID: 17414136709

Note: This test uses a reduced sample size (20 runs) for faster CI execution.
For production performance validation, consider running with larger sample sizes.


This comment will be updated on subsequent commits.

naftaly and others added 2 commits September 3, 2025 10:08
Co-Authored-By: Copilot <175728472+Copilot@users.noreply.github.com>
@naftaly naftaly force-pushed the acohen/hang-capture-service branch from ac277f7 to 5521530 Compare September 3, 2025 14:08
@naftaly naftaly merged commit b535c25 into main Sep 3, 2025
20 of 24 checks passed
@naftaly naftaly deleted the acohen/hang-capture-service branch September 3, 2025 21:21