test: stabilize flaky tests #1614
`assert_timestamp` checked that timestamps were less than 10 seconds old, but tests can exceed that in busy CI environments (e.g. `test_breakpad_oom_stdout` took 10.6s on Windows CI). Replace the hard-coded threshold with a before/after approach: an autouse fixture records the test start time, and the assertion verifies the timestamp falls between test start and now. This eliminates timing sensitivity entirely — it doesn't matter how long the test takes, while still verifying that the timestamp is sensible. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
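A minimal sketch of the before/after check described above, assuming event timestamps are ISO-8601 strings in UTC. The names `assert_timestamp_in_window` and `test_start` are illustrative, not the project's actual helpers; in the real suite the start time would come from an autouse pytest fixture. Flooring the lower bound to a whole second is also an assumption, made in case the producer truncates sub-second precision.

```python
from datetime import datetime, timezone


def assert_timestamp_in_window(timestamp, test_start, now=None):
    """Assert that an ISO-8601 timestamp lies between test start and now.

    Hypothetical helper: no fixed "max age" threshold is involved, so the
    check does not depend on how long the test took to run.
    """
    if now is None:
        now = datetime.now(timezone.utc)

    # Accept a trailing "Z" on Python versions whose fromisoformat rejects it.
    parsed = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: naive timestamps are UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)

    # Floor the lower bound so a timestamp truncated to whole seconds
    # is not rejected as "before the test started".
    lower = test_start.replace(microsecond=0)
    assert lower <= parsed <= now, f"{timestamp} not within [{lower}, {now}]"
```

In a pytest suite, an autouse fixture could record `datetime.now(timezone.utc)` before each test and hand it to the assertion as `test_start`.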
The test binary runs on the Android device but pytest runs on the host, so their clocks can differ. Measure the offset once per session via `adb shell date +%s` and expose `tests.now()` that adjusts for it. Also extract a shared `adb()` helper into `tests/__init__.py` to replace duplicated helpers in test_dotnet_signals and test_inproc_stress. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fe944fa to 891cabb
The "in the future" check compared device timestamps against raw host time, while the "in the past" check correctly used the offset-adjusted test_start. Apply the clock offset to both bounds. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The emulator clock can drift after initial sync (e.g. NTP correction), so measuring the offset once at session start is not reliable enough. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When ANDROID_API is set but no device is connected, `adb shell date` returns empty output. Fall back to zero offset instead of crashing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
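The clock-offset handling from the commits above could look roughly like the following. This is a sketch under assumptions, not the code in `tests/__init__.py`: the function names and the injectable `run` parameter are hypothetical, and only the `adb shell date +%s` command and the zero-offset fallback come from the commit messages.

```python
import subprocess
import time


def device_clock_offset(run=subprocess.run):
    """Return (device time - host time) in seconds, or 0.0 if unavailable.

    `run` is injectable only so the sketch can be exercised without adb.
    """
    try:
        result = run(
            ["adb", "shell", "date", "+%s"],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        # adb missing or the call timed out: behave as if clocks agree.
        return 0.0

    output = result.stdout.strip()
    if not output.isdigit():
        # No device connected yields empty output: fall back to zero offset.
        return 0.0
    return int(output) - time.time()


def device_now(run=subprocess.run):
    """Host time shifted onto the device clock (hypothetical helper)."""
    return time.time() + device_clock_offset(run)
```

Calling `device_now()` both when a test starts and when the assertion runs applies the same adjustment to the lower and the upper bound, and measuring on each call avoids relying on an offset taken once per session.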
is_android from conditions.py is the raw ANDROID_API env string, not a bool. When the skipif expression evaluates to a string, pytest tries to eval it as Python code causing a SyntaxError. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
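To illustrate the fix: pytest evaluates a `skipif` condition as a Python expression when it is a string, so passing a real `bool` avoids that code path. The helper name below is hypothetical and the snippet does not reproduce `conditions.py`.

```python
import os


def is_android_enabled(environ=os.environ):
    """Return a real bool instead of the raw ANDROID_API string.

    Hypothetical helper; `environ` is injectable for demonstration only.
    """
    return bool(environ.get("ANDROID_API"))


# Possible use in a test module (pytest is not imported in this sketch):
#   @pytest.mark.skipif(is_android_enabled(), reason="not supported on Android")
```

An unset or empty `ANDROID_API` yields `False`, and any non-empty value yields `True`, so the marker never receives a string.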
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The OOM tests added in a848058 cause flaky failures (empty output, proxy test interference). Keep the native & crashpad OOM tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Just saw this here: https://github.com/getsentry/sentry-native/actions/runs/23800241648/job/69358394729#step:30:36751 Maybe the OOM tests are generally flaky.
D*mn, even with native. Sorry about that! We've been running similar tests with Crashpad in sentry-unreal without any issues. 🙁 Edit: or maybe the native daemon just needs more tweaks to work reliably under memory pressure... ==> native skipped for now
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
no need to be sorry, i am actually happy when tests fail sometimes... not all tests are regression tests, but our red/green fetish completely eliminated any nuance.
i am not sure this is backend specific, at least not in the handling sense (but could imagine in the timing sense).
choco install returns 0 even when the install fails. Add a sccache --version check to catch installation failures early. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Chocolatey can fail due to transient network issues. Add continue-on-error to the CI step, and guard sccache calls in conftest.py with shutil.which so tests still run without it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
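A sketch of the `shutil.which` guard mentioned above, assuming sccache is invoked through a small wrapper. The wrapper name and its injectable `which`/`run` parameters are hypothetical; the actual calls in `conftest.py` may be structured differently.

```python
import shutil
import subprocess


def run_sccache(args, which=shutil.which, run=subprocess.run):
    """Run sccache with `args` if it is on PATH; return None otherwise.

    Returning None lets callers skip cache handling when the install
    step failed, instead of crashing the test session.
    """
    path = which("sccache")
    if path is None:
        return None
    return run([path, *args], capture_output=True, text=True)


# Example: returns None on machines without sccache instead of raising.
version = run_sccache(["--version"])
```

With `continue-on-error` on the CI install step, a guard of this kind is what allows the tests to keep running without the cache.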
This reverts commit 76173ec.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: sccache `continue-on-error` won't degrade gracefully without cmake guard
  - `cmake_configure` now only sets sccache compiler launcher options when both USE_SCCACHE is set and the sccache binary is available in PATH.
Or push these changes by commenting:
@cursor push fef1bdf1ab
Preview (fef1bdf1ab)
diff --git a/tests/cmake.py b/tests/cmake.py
--- a/tests/cmake.py
+++ b/tests/cmake.py
@@ -130,7 +130,8 @@
     __tracebackhide__ = True
     options = dict(options)
-    if os.environ.get("USE_SCCACHE"):
+    has_sccache = os.environ.get("USE_SCCACHE") and shutil.which("sccache")
+    if has_sccache:
         options.update(
             {
                 "CMAKE_C_COMPILER_LAUNCHER": "sccache",
The native backend's daemon sends the crash envelope directly via HTTP during crash processing, so there's no need to restart the app. Wait for the daemon's request instead of restarting and hoping for a resend. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move has_sccache to conditions.py and use it in cmake.py to skip compiler launcher options when sccache is missing. The Ninja generator is still used when USE_SCCACHE is set regardless. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a restart run inside the wait block as a safety net to send any remaining crash data the daemon may not have sent. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Great, now the whole infra broke. What a day... 😅
The reinit stress tests spin 8 threads in tight loops, which causes extreme slowdown under valgrind. Adding sched_yield() lets the OS scheduler (and valgrind's thread serializer) make progress without throttling the test's throughput. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


Fixes two distinct issues with `assert_timestamp` in the test suite:

1. Hard-coded 10s threshold too tight for busy CI

`assert_timestamp` checked that timestamps were less than 10 seconds old, but tests can exceed that in busy CI environments (e.g. `test_breakpad_oom_stdout` took 10.6s on the Windows ClangCL CI runner).

Replace the arbitrary threshold with a before/after approach: an autouse fixture records the test start time, and the assertion verifies the timestamp falls between test start and now. This eliminates timing sensitivity entirely: it doesn't matter how long the test takes, and the check still verifies that the timestamp is sensible.

2. Host-device clock skew on Android

The test binary runs on the Android device but pytest runs on the host, so their clocks can differ. Even though CI syncs the emulator clock before test execution, the clock can drift during the run (e.g. NTP correction). Measure the offset per test via `adb shell date +%s` and adjust timestamps accordingly.

Other changes

- Extract a shared `adb()` helper into `tests/__init__.py` to replace duplicated helpers across test files.