Summary
`DailyStatsActor.PreStart()` performs synchronous SQLite I/O: connection
open, two `CREATE TABLE IF NOT EXISTS` statements, and a commit (see
`src/Netclaw.Daemon/Gateway/DailyStatsActor.cs:105-110, 391-435`). The
`QuerySkillUsageStats` handler opens a second SQLite connection. On
Windows CI cold-start, AV / Defender scanning the freshly-created `.db`
file plus first-time fsync on COMMIT regularly pushes the round-trip
beyond the test's 3-second `Ask` timeout.
Symptom
`Netclaw.Daemon.Tests.Gateway.DailyStatsActorTests.QuerySkillUsageStats_returns_groupable_rows_for_each_method`
flakes on Windows CI with `Akka.Actor.AskTimeoutException : Timeout
after 3.00 seconds` at `DailyStatsActorTests.cs:44`. Linux passes
consistently. The same SHA can pass and fail on consecutive reruns, so
the failure is genuinely flaky, not deterministic.
Observed once on PR #916 rerun
(`https://github.com/netclaw-dev/netclaw/actions/runs/25523848615`).
Root cause analysis
Both akka-net-specialist and dotnet-concurrency-specialist agents
agree: this is a latency issue, not a race. The actor's mailbox is FIFO,
so the four `Tell`s sent before the `Ask` are processed in order.
`PreStart` runs before the actor processes its first message, so every
queued message, including the `Ask`, waits behind the startup disk I/O.
Latency contributors on Windows CI cold start:
- `PreStart`: SQLite connection open + transaction + 2 CREATE TABLEs +
COMMIT (fsync). First-time file creation triggers an AV scan.
- Query handler: a second SQLite connection open + SELECT, with
Microsoft.Data.Sqlite pool keyed by connection string (which is
unique per test, so pooling does not help).
- xUnit parallelism on a contended Windows runner amplifies both.
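To check which of these steps dominates on a given runner, each one can be timed separately against a brand-new database file. The following is a minimal sketch, not code from the repository: `SqliteColdStartProbe` and the `probe_a` / `probe_b` tables are hypothetical names, and the only dependency assumed is the `Microsoft.Data.Sqlite` package.

```csharp
using System;
using System.Diagnostics;
using Microsoft.Data.Sqlite;

// Hypothetical diagnostic helper; not part of Netclaw.
public static class SqliteColdStartProbe
{
    // Times open, DDL, and commit separately for a database file at dbPath.
    // Pass a path that does not exist yet to approximate the cold-start case.
    public static (TimeSpan Open, TimeSpan Ddl, TimeSpan Commit) Measure(string dbPath)
    {
        var connectionString =
            new SqliteConnectionStringBuilder { DataSource = dbPath }.ToString();

        var sw = Stopwatch.StartNew();
        using var connection = new SqliteConnection(connectionString);
        connection.Open();
        var open = sw.Elapsed;

        sw.Restart();
        using var tx = connection.BeginTransaction();
        // Placeholder tables standing in for the actor's two real tables.
        foreach (var table in new[] { "probe_a", "probe_b" })
        {
            using var cmd = connection.CreateCommand();
            cmd.Transaction = tx;
            cmd.CommandText =
                $"CREATE TABLE IF NOT EXISTS {table} (id INTEGER PRIMARY KEY)";
            cmd.ExecuteNonQuery();
        }
        var ddl = sw.Elapsed;

        sw.Restart();
        tx.Commit(); // the step where the write is flushed to disk
        var commit = sw.Elapsed;

        return (open, ddl, commit);
    }
}
```

Logging the three values from a Windows CI job and a Linux job on the same SHA would show whether the file creation and commit steps account for the gap, as the analysis above suggests.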
Proposed fixes (priority order)
1. Move table creation off the actor's startup path. Run
`EnsureTable()` once at `ActorSystem` setup or as part of the
existing `SchemaMigrationHostedService`. The actor should not
block its mailbox on schema DDL. This eliminates the latency at its
source for both tests and production startup.
2. Pre-warm in tests. Open and close a SQLite connection plus run
the CREATE TABLE statements before `ActorOf` in the test fixture.
Test-only mitigation; does not help production cold start.
3. Cheap drive-by: bump the test's `Ask` timeout from 3s to 10s.
Hides the slow path but does not fix it. Worth shipping immediately
to unblock CI; do (1) as the proper fix.
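One possible shape for the first fix is a hosted service that runs the DDL before anything creates the actor. This is a sketch under stated assumptions: `DailyStatsSchemaInitializer`, the placeholder table names, and their columns are all hypothetical, and the real `EnsureTable()` and `SchemaMigrationHostedService` may look quite different. It assumes the `Microsoft.Data.Sqlite` and `Microsoft.Extensions.Hosting.Abstractions` packages.

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Data.Sqlite;
using Microsoft.Extensions.Hosting;

// Hypothetical hosted service; names and schema are illustrative only.
public sealed class DailyStatsSchemaInitializer : IHostedService
{
    // Placeholder DDL standing in for the actor's two real tables.
    private static readonly string[] PlaceholderDdl =
    {
        "CREATE TABLE IF NOT EXISTS daily_stats_example (day TEXT PRIMARY KEY, total INTEGER NOT NULL)",
        "CREATE TABLE IF NOT EXISTS skill_usage_example (id INTEGER PRIMARY KEY, skill TEXT NOT NULL)",
    };

    private readonly string _connectionString;

    public DailyStatsSchemaInitializer(string connectionString)
        => _connectionString = connectionString;

    // Runs the schema DDL once, in a single transaction, at host startup.
    public async Task StartAsync(CancellationToken cancellationToken)
    {
        using var connection = new SqliteConnection(_connectionString);
        await connection.OpenAsync(cancellationToken);

        using var tx = connection.BeginTransaction();
        foreach (var ddl in PlaceholderDdl)
        {
            using var cmd = connection.CreateCommand();
            cmd.Transaction = tx;
            cmd.CommandText = ddl;
            await cmd.ExecuteNonQueryAsync(cancellationToken);
        }
        tx.Commit();
    }

    public Task StopAsync(CancellationToken cancellationToken) => Task.CompletedTask;
}
```

By default the generic host starts hosted services one at a time in registration order, so registering a service like this ahead of whatever starts the `ActorSystem` would give the deterministic ordering the fix calls for. The same `StartAsync` call could also be invoked directly from a test fixture before `ActorOf`, which would cover the pre-warm option without a separate code path.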
Acceptance criteria
- `DailyStatsActor.PreStart` no longer performs SQLite DDL.
- The test passes with a tight (≤5s) `Ask` timeout on Windows CI
reruns.
- Schema initialization runs deterministically before the actor
system starts processing messages.
Reference