
DailyStatsActor: move SQLite EnsureTable off PreStart hot path #925

@Aaronontheweb


Summary

DailyStatsActor.PreStart() performs synchronous SQLite I/O: a connection
open, two CREATE TABLE IF NOT EXISTS statements, and a commit (see
`src/Netclaw.Daemon/Gateway/DailyStatsActor.cs:105-110, 391-435`). The
`QuerySkillUsageStats` handler opens a second SQLite connection. On a
Windows CI cold start, antivirus (Defender) scanning of the freshly
created `.db` file, plus the first-time fsync on COMMIT, regularly pushes
the round-trip beyond the test's 3-second `Ask` timeout.

Symptom

`Netclaw.Daemon.Tests.Gateway.DailyStatsActorTests.QuerySkillUsageStats_returns_groupable_rows_for_each_method`
flakes on Windows CI with `Akka.Actor.AskTimeoutException : Timeout
after 3.00 seconds` at `DailyStatsActorTests.cs:44`. Linux passes
consistently. The same SHA can pass and fail on consecutive reruns, so
the failure is intermittent rather than deterministic.

Observed once on a rerun of PR #916
(`https://github.com/netclaw-dev/netclaw/actions/runs/25523848615`).

Root cause analysis

Both the akka-net-specialist and dotnet-concurrency-specialist agents
agree: this is a latency issue, not a race. The actor's FIFO mailbox
guarantees that the four `Tell`s sent before the `Ask` are processed in
order. The latency comes from disk I/O on the actor's startup path, not
from any synchronization race.

Latency contributors on Windows CI cold start:

  • `PreStart`: SQLite connection open + transaction + 2 CREATE TABLEs
    + COMMIT (fsync). First-time file creation triggers an AV scan.
  • Query handler: a second SQLite connection open + SELECT.
    Microsoft.Data.Sqlite pools connections by connection string, which
    is unique per test, so pooling does not help.
  • xUnit parallelism on a contended Windows runner amplifies both.

Proposed fixes (priority order)

  1. Move table creation off the actor's startup path. Run
    `EnsureTable()` once at `ActorSystem` setup or as part of the
    existing `SchemaMigrationHostedService`. The actor should not
    block its mailbox on schema DDL. This eliminates the latency at its
    source for both tests and production startup.

  2. Pre-warm in tests. Open and close a SQLite connection and run the
    CREATE TABLE statements in the test fixture before calling `ActorOf`.
    This is a test-only mitigation; it does not help production cold start.

  3. Cheap drive-by: bump the test's `Ask` timeout from 3s to 10s.
    Hides the slow path but does not fix it. Worth shipping immediately
    to unblock CI; do (1) as the proper fix.
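A minimal sketch of the shared piece behind fixes (1) and (2), assuming the Microsoft.Data.Sqlite package. `DailyStatsSchema` and `EnsureTables` are hypothetical names, not existing code. The DDL strings are passed in so the actor's existing two CREATE TABLE IF NOT EXISTS statements can be moved over unchanged rather than re-typed here.

```csharp
using System.Collections.Generic;
using Microsoft.Data.Sqlite;

// Hypothetical helper: runs the schema DDL that DailyStatsActor.PreStart()
// currently executes, so it can run once before the actor is created.
public static class DailyStatsSchema
{
    // ddlStatements: the actor's existing CREATE TABLE IF NOT EXISTS
    // statements. IF NOT EXISTS makes repeated calls harmless.
    public static void EnsureTables(string connectionString, IReadOnlyList<string> ddlStatements)
    {
        using var connection = new SqliteConnection(connectionString);
        connection.Open();

        // A single transaction, so all tables are created by one COMMIT.
        using var transaction = connection.BeginTransaction();
        foreach (var ddl in ddlStatements)
        {
            using var command = connection.CreateCommand();
            command.Transaction = transaction; // commands must join the pending transaction
            command.CommandText = ddl;
            command.ExecuteNonQuery();
        }
        transaction.Commit();
    }
}
```

For (1), the helper would be called from the existing `SchemaMigrationHostedService` (or just before the `ActorSystem` is created), so the DDL has committed before `DailyStatsActor` is spawned and `PreStart` no longer touches the schema. For (2), the test fixture calls the same helper before `ActorOf`. Option (3) is a call-site change using the timeout overload of Akka.NET's `Ask<T>`, e.g. `actor.Ask<TResponse>(query, TimeSpan.FromSeconds(10))`, where `TResponse` and `query` stand in for the test's actual types.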

Acceptance criteria

  • DailyStatsActor's `PreStart` no longer performs SQLite DDL.
  • The test passes with a tight (≤5s) `Ask` timeout on Windows CI
    reruns.
  • Schema initialization runs deterministically before the actor
    system starts processing messages.

Reference

Metadata


Assignees

No one assigned

    Labels

    reliability (Retries, resilience, graceful degradation)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
