
test(logging): TestLogFileWatchdog_ReopensOnDelete is flaky on macOS under sequential runs #375

@ericfitz

Description


Observation

While running the full file_watchdog test set sequentially via make test-unit name=TestLogFileWatchdog on macOS, TestLogFileWatchdog_ReopensOnDelete occasionally fails with:

file_watchdog_test.go:67: expected reopen Warn in slogger output; got ""

The test passes 5/5 in isolation (name=TestLogFileWatchdog_ReopensOnDelete count1=true).

Hypothesis

macOS kqueue (the fsnotify backend on Darwin) appears to have resource-pressure or event-coalescing edge cases when multiple watchers are created and torn down in rapid succession across consecutive tests in the same package. The 500ms polling fallback in internal/slogging/file_watchdog.go catches most missed events. However, the test's waitForFile deadline of 2s, combined with the 200ms post-delete sleep, may occasionally let the test read the buffer before the reopen Warn has been written. In particular, 200ms is shorter than the 500ms poll interval: if kqueue drops the delete event and only the fallback poll detects it, the Warn may land after the test has already checked.

Suggested fix

Increase the post-delete wait, or switch the test to poll for the Warn message until a deadline rather than sleeping for a fixed interval and checking once. Alternatively, refactor the watchdog tests to share a single watcher across cases, or use t.TempDir more consistently so that each test gets a fresh kqueue fd in a fresh directory.

Reproduction

make test-unit name=TestLogFileWatchdog count1=true
# Repeat ~10 times; failure rate ~10%.

Origin

Discovered during implementation of #372. Production code is unaffected — only the test reliability is at issue.

Metadata


Assignees

Labels

No labels

Projects

Status

Done

Relationships

None yet

Development

No branches or pull requests
