Skip to content

test: stabilize flaky tests#1614

Merged
jpnurmi merged 17 commits into
masterfrom
jpnurmi/test/flaky-timestamps
Mar 31, 2026
Merged

test: stabilize flaky tests#1614
jpnurmi merged 17 commits into
masterfrom
jpnurmi/test/flaky-timestamps

Conversation

@jpnurmi
Copy link
Copy Markdown
Collaborator

@jpnurmi jpnurmi commented Mar 31, 2026

Fixes two distinct issues with assert_timestamp in the test suite:

1. Hard-coded 10s threshold too tight for busy CI

assert_timestamp checked that timestamps were less than 10 seconds old, but tests can exceed that in busy CI environments (e.g. test_breakpad_oom_stdout took 10.6s on the Windows ClangCL CI runner):

FAILED tests/test_integration_stdout.py::test_breakpad_oom_stdout - assert 10.609089 < 10

Replace the arbitrary threshold with a before/after approach: an autouse fixture records the test start time, and the assertion verifies the timestamp falls between test start and now. This eliminates timing sensitivity entirely — it doesn't matter how long the test takes, while still verifying that the timestamp is sensible.

2. Host-device clock skew on Android

The test binary runs on the Android device but pytest runs on the host, so their clocks can differ. Even though CI syncs the emulator clock before test execution, the clock can drift during the run (e.g. NTP correction). Measure the offset per test via adb shell date +%s and adjust timestamps accordingly.

Other changes

  • Extract a shared adb() helper into tests/__init__.py to replace duplicated helpers across test files.
  • Revert flaky inproc & breakpad OOM stdout tests added in a848058 (keep native & crashpad OOM tests).

jpnurmi and others added 2 commits March 31, 2026 12:15
`assert_timestamp` checked that timestamps were less than 10 seconds
old, but tests can exceed that in busy CI environments (e.g.
`test_breakpad_oom_stdout` took 10.6s on Windows CI). Replace the
hard-coded threshold with a before/after approach: an autouse fixture
records the test start time, and the assertion verifies the timestamp
falls between test start and now. This eliminates timing sensitivity
entirely — it doesn't matter how long the test takes, while still
verifying that the timestamp is sensible.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The test binary runs on the Android device but pytest runs on the
host, so their clocks can differ. Measure the offset once per session
via `adb shell date +%s` and expose `tests.now()` that adjusts for it.

Also extract a shared `adb()` helper into `tests/__init__.py` to
replace duplicated helpers in test_dotnet_signals and test_inproc_stress.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jpnurmi jpnurmi force-pushed the jpnurmi/test/flaky-timestamps branch 2 times, most recently from fe944fa to 891cabb Compare March 31, 2026 11:19
Comment thread tests/assertions.py Outdated
The "in the future" check compared device timestamps against raw host
time, while the "in the past" check correctly used the offset-adjusted
test_start. Apply the clock offset to both bounds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jpnurmi and others added 3 commits March 31, 2026 13:46
The emulator clock can drift after initial sync (e.g. NTP correction),
so measuring the offset once at session start is not reliable enough.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When ANDROID_API is set but no device is connected, `adb shell date`
returns empty output. Fall back to zero offset instead of crashing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
is_android from conditions.py is the raw ANDROID_API env string, not
a bool. When the skipif expression evaluates to a string, pytest tries
to eval it as Python code causing a SyntaxError.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread tests/conftest.py
jpnurmi and others added 2 commits March 31, 2026 15:04
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The OOM tests added in a848058 cause flaky failures (empty output,
proxy test interference). Keep the native & crashpad OOM tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jpnurmi jpnurmi changed the title test: stabilize flaky timestamp assertions test: stabilize flaky tests Mar 31, 2026
@supervacuus
Copy link
Copy Markdown
Collaborator

supervacuus commented Mar 31, 2026

Revert flaky inproc & breakpad OOM stdout tests added in a848058 (keep native & crashpad OOM tests).

Just saw this:

_______________________________ test_native_oom _______________________________

cmake = <bound method CMake.compile of <tests.cmake.CMake object at 0x000001DFE7C2CA70>>
httpserver = <HTTPServer host=localhost port=49541>

    @pytest.mark.skipif(not has_oom, reason="OOM test unreliable in this environment")
    def test_native_oom(cmake, httpserver):
        """Test OOM crash capture with native backend"""
        tmp_path = cmake(["sentry_example"], {"SENTRY_BACKEND": "native"})
    
        httpserver.expect_request("/api/123456/envelope/").respond_with_data("OK")
    
        run_crash(
            tmp_path,
            "sentry_example",
            ["log", "stdout", "oom"],
            env=dict(os.environ, SENTRY_DSN=make_dsn(httpserver)),
        )
    
        time.sleep(2)
    
        run(
            tmp_path,
            "sentry_example",
            ["log", "no-setup"],
            env=dict(os.environ, SENTRY_DSN=make_dsn(httpserver)),
        )
    
>       assert len(httpserver.log) >= 1
E       assert 0 >= 1
E        +  where 0 = len([])
E        +    where [] = <HTTPServer host=localhost port=49541>.log

tests\test_integration_native.py:120: AssertionError

here: https://github.com/getsentry/sentry-native/actions/runs/23800241648/job/69358394729#step:30:36751
and here: https://github.com/getsentry/sentry-native/actions/runs/23800368814/job/69358858226#step:30:42018

Maybe the OOM tests are generally flaky.

@jpnurmi
Copy link
Copy Markdown
Collaborator Author

jpnurmi commented Mar 31, 2026

getsentry/sentry-native/actions/runs/23800241648/job/69358394729#step:30:36751

Maybe the OOM tests are generally flaky.

D*mn, even with native. Sorry about that! We've been running similar tests with Crashpad in sentry-unreal without any issues. 🙁

Edit: or maybe the native daemon just needs more tweaks to work reliably under memory pressure... ==> native skipped for now

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jpnurmi jpnurmi requested a review from supervacuus March 31, 2026 14:34
@supervacuus
Copy link
Copy Markdown
Collaborator

D*mn, even with native. Sorry about that! We've been running similar tests with Crashpad in sentry-unreal without any issues. 🙁

no need to be sorry, i am actually happy when test fail sometimes... not all tests are regression tests, but our red/green fetish completely eliminated any nuance.

Edit: or maybe the native daemon just needs more tweaks to work reliably under memory pressure... ==> native skipped for now

i am not sure this is backend specific, at least not in the handling sense (but could imagine in the timing sense).

@supervacuus
Copy link
Copy Markdown
Collaborator

jpnurmi and others added 4 commits March 31, 2026 16:48
choco install returns 0 even when the install fails. Add a
sccache --version check to catch installation failures early.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Chocolatey can fail due to transient network issues. Add
continue-on-error to the CI step, and guard sccache calls in
conftest.py with shutil.which so tests still run without it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: sccache continue-on-error won't degrade gracefully without cmake guard
    • cmake_configure now only sets sccache compiler launcher options when both USE_SCCACHE is set and the sccache binary is available in PATH.

Create PR

Or push these changes by commenting:

@cursor push fef1bdf1ab
Preview (fef1bdf1ab)
diff --git a/tests/cmake.py b/tests/cmake.py
--- a/tests/cmake.py
+++ b/tests/cmake.py
@@ -130,7 +130,8 @@
     __tracebackhide__ = True
 
     options = dict(options)
-    if os.environ.get("USE_SCCACHE"):
+    has_sccache = os.environ.get("USE_SCCACHE") and shutil.which("sccache")
+    if has_sccache:
         options.update(
             {
                 "CMAKE_C_COMPILER_LAUNCHER": "sccache",

This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.

Comment thread .github/workflows/ci.yml
jpnurmi and others added 2 commits March 31, 2026 17:17
The native backend's daemon sends the crash envelope directly via HTTP
during crash processing, so there's no need to restart the app. Wait
for the daemon's request instead of restarting and hoping for a resend.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move has_sccache to conditions.py and use it in cmake.py to skip
compiler launcher options when sccache is missing. The Ninja generator
is still used when USE_SCCACHE is set regardless.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread tests/test_integration_native.py
Add a restart run inside the wait block as a safety net to send any
remaining crash data the daemon may not have sent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Comment thread tests/cmake.py
@jpnurmi
Copy link
Copy Markdown
Collaborator Author

jpnurmi commented Mar 31, 2026

Great, now the whole infra broke. What a day... 😅

Cloning into 'kcov'...
remote: Internal Server Error
fatal: unable to access 'https://github.com/SimonKagstrom/kcov.git/': The requested URL returned error: 500
fetch https://dl-cdn.alpinelinux.org/alpine/v3.21/main/x86_64/APKINDEX.tar.gz
WARNING: updating https://dl-cdn.alpinelinux.org/alpine/v3.21/main: temporary error (try again later)

The reinit stress tests spin 8 threads in tight loops, which causes
extreme slowdown under valgrind. Adding sched_yield() lets the OS
scheduler (and valgrind's thread serializer) make progress without
throttling the test's throughput.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jpnurmi jpnurmi merged commit bcd4633 into master Mar 31, 2026
53 checks passed
@jpnurmi jpnurmi deleted the jpnurmi/test/flaky-timestamps branch March 31, 2026 17:21
BernhardMarconato pushed a commit to elgatosf/sentry-native that referenced this pull request Apr 21, 2026
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants