fix(session-log): fix Windows test flake from dropped log lines #1246
Merged
Aaronontheweb merged 1 commit on May 31, 2026
`SessionLogActorTests.Successive_dispatchers_append_to_same_canonical_file` flaked on Windows CI: the per-line open→write→close writer triggers Defender scan-on-close, whose transient write-excluding handle blocks the next append. When `AppendLine` exhausted its retry budget, the actor dropped the line (logging a warning), so the polling assertion spun until the `AwaitAssert` deadline and failed with an opaque "substring not found".

- `SessionLogFile.AppendLine`: the retry budget claimed 10/20/40/80ms (150ms total), but the catch filter `attempt < MaxAttempts - 1` ran only 3 sleeps (70ms) and never used the 4th entry. Fix the off-by-one (one retry per backoff entry), extend the schedule to ~585ms, and add jitter so retries don't phase-lock with a periodic scanner. Bounded blocking on an external OS resource (the AV handle) is the legitimate use of `Thread.Sleep` here.
- `SessionLogActorTests`: wrap the test in `EventFilter.Warning(contains: "Dropped").ExpectAsync(0, ...)` so a dropped audit line fails fast with the real cause instead of a ~17s opaque timeout.

A deeper follow-up (a persistent per-actor append handle, opened in `PreStart` and closed in `PostStop`) would eliminate the per-line scan-on-close churn entirely; this change keeps the writer's shape and just makes the retry cover the spike.
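To make the retry change concrete, here is a minimal Python sketch of the corrected loop shape: one retry per backoff entry, with jitter applied to each sleep. The real `SessionLogFile.AppendLine` is C# and is not shown here. Assumptions: the lock is modelled as `PermissionError` (what Python raises for a Windows sharing violation), the 0.5x to 1.5x jitter range is illustrative, and the schedule uses only the four documented entries (10/20/40/80ms) because the extended ~585ms schedule isn't listed. `append_line`, `write`, `sleep`, and `rng` are hypothetical names.

```python
import random
import time

# Only the four documented entries; the PR's extended (~585ms) schedule is not listed.
BACKOFF_MS = [10, 20, 40, 80]


def append_line(write, line, backoff_ms=BACKOFF_MS, sleep=time.sleep, rng=random.random):
    """Call write(line), retrying once per backoff entry while the file is locked.

    A Windows sharing violation is modelled as PermissionError (an assumption;
    the C# original's exception handling is not shown). Returns the number of
    retries used; re-raises once the budget is exhausted so the caller can
    decide whether to drop and log the line.
    """
    for attempt in range(len(backoff_ms) + 1):  # 1 initial try + 1 retry per entry
        try:
            write(line)
            return attempt
        except PermissionError:
            if attempt == len(backoff_ms):
                raise  # every backoff entry has been used
            # Jitter: scale the delay by a random factor in [0.5, 1.5) so that
            # retries don't phase-lock with a periodic scanner.
            delay_ms = backoff_ms[attempt] * (0.5 + rng())
            sleep(delay_ms / 1000.0)
```

Injecting `sleep` and `rng` keeps the sketch testable. With `rng` pinned to 0.5, every delay equals its backoff entry, so a write that fails four times sleeps 10, 20, 40, and 80ms before succeeding on the fifth attempt.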
This was referenced on May 31, 2026 by #1249, "SessionLogFile: replace per-line open/close + Thread.Sleep retry with a persistent file handle" (closed).
What's wrong
One test, `SessionLogActorTests.Successive_dispatchers_append_to_same_canonical_file`, fails intermittently on the Windows CI runner. It never fails on Linux or macOS.
Why
The session log is written one line at a time: open the file, write a line, close it. On Windows, antivirus (Defender) scans a file right after it's closed, and while scanning it briefly locks the file. If the next log line tries to write during that scan, the write is blocked.
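The writer retries when that happens, but two problems combined:
- A counting bug meant it only went through 3 of its 4 planned retries (about 70ms of waiting instead of the intended 150ms).
- Even the full 150ms was too short to outlast a scan on a slow or busy runner.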
So on a slow or busy Windows runner the line was dropped, and the test then waited ~17 seconds for a line that was never coming before failing with a confusing "substring not found".
(Linux and macOS don't lock files this way, which is why it only happens on Windows.)
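The off-by-one is easiest to see by counting sleeps when every attempt hits a locked file. This Python sketch assumes `MaxAttempts` is 4, which is what the old filter `attempt < MaxAttempts - 1` running three sleeps implies. It models the fix as one extra attempt so that every backoff entry gets a retry. `total_sleep_ms` is a hypothetical helper, not code from the PR.

```python
BACKOFF_MS = [10, 20, 40, 80]  # the schedule the old code claimed (150ms total)


def total_sleep_ms(backoff_ms, max_attempts, should_retry):
    """Simulate a writer whose every attempt hits a locked file.

    should_retry plays the role of the catch filter: if it returns False the
    exception escapes and the line is dropped. Returns total ms slept.
    """
    total = 0
    for attempt in range(max_attempts):
        if not should_retry(attempt):
            break  # exception escapes the loop: line dropped
        total += backoff_ms[attempt]
    return total


# Old: MaxAttempts == 4 with filter `attempt < MaxAttempts - 1` -> 3 sleeps.
buggy = total_sleep_ms(BACKOFF_MS, 4, lambda a: a < 4 - 1)
# Fixed: one retry per backoff entry -> one extra attempt, 4 sleeps.
fixed = total_sleep_ms(BACKOFF_MS, len(BACKOFF_MS) + 1, lambda a: a < len(BACKOFF_MS))
print(buggy, fixed)  # 70 150
```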
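The fix
- `SessionLogFile.AppendLine` now retries once per backoff entry (fixing the off-by-one), the retry window is extended to about 585ms, and each wait gets random jitter so retries don't keep landing on a periodic scan.
- The test now fails immediately with the real cause if a log line is dropped, instead of timing out after ~17 seconds with "substring not found".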
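The test-side change relies on Akka.NET's `EventFilter` to turn a "Dropped" warning into a test failure. As a rough Python analogue (this is not the Akka.NET API), a logging handler can record matching warnings while a polling assertion checks it on every iteration. A dropped line then fails right away with its real cause instead of at the deadline. `DroppedLineWatcher` and `await_substring` are hypothetical names, and the timeout values are illustrative.

```python
import logging
import time


class DroppedLineWatcher(logging.Handler):
    """Records WARNING-or-higher log messages that contain a marker such as "Dropped"."""

    def __init__(self, marker="Dropped"):
        super().__init__(level=logging.WARNING)
        self.marker = marker
        self.hits = []

    def emit(self, record):
        message = record.getMessage()
        if self.marker in message:
            self.hits.append(message)


def await_substring(read_text, expected, watcher, timeout_s=5.0, interval_s=0.05):
    """Poll read_text() until it contains expected, failing fast on a dropped line."""
    deadline = time.monotonic() + timeout_s
    while True:
        if watcher.hits:
            # The real cause surfaces immediately instead of at the deadline.
            raise AssertionError(f"log line was dropped: {watcher.hits[0]}")
        if expected in read_text():
            return
        if time.monotonic() >= deadline:
            raise AssertionError(f"substring not found: {expected!r}")
        time.sleep(interval_s)
```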
Not in this PR
The cleaner long-term fix is to keep the log file open for the life of the session instead of opening and closing it for every line — that avoids the antivirus collision entirely. Left out here to keep this change small and low-risk; happy to do it separately.
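Here is a minimal Python sketch of that follow-up, keeping one append handle open for the writer's lifetime. `PersistentAppender`, `start`, and `stop` are hypothetical; `start` and `stop` stand in for the actor's `PreStart` and `PostStop`. Because the file is closed once per session rather than once per line, the antivirus scan-on-close described earlier would be triggered once rather than after every line. That follows the PR's reasoning; the sketch does not measure it.

```python
class PersistentAppender:
    """Keeps one append handle open for the writer's lifetime.

    start()/stop() are hypothetical stand-ins for the actor's PreStart/PostStop.
    """

    def __init__(self, path):
        self._path = path
        self._file = None

    def start(self):
        # Opened once, instead of an open/close pair for every line.
        self._file = open(self._path, "a", encoding="utf-8")

    def append_line(self, line):
        if self._file is None:
            raise RuntimeError("append_line called before start()")
        self._file.write(line + "\n")
        self._file.flush()  # hand the line to the OS so readers see it promptly

    def stop(self):
        if self._file is not None:
            self._file.close()  # the single close (per the PR, the single scan-on-close)
            self._file = None
```

Two appenders used one after the other on the same path (as in the flaky test's "successive dispatchers") still produce one file with all lines in order. The second appender simply opens in append mode after the first has stopped.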
Testing
slopwatch and copyright-header checks clean; no baseline changes.