Skip to content

Resilience (ADR 0015): bound blocking adapter IO off the async executor #3

@ogrodev

Description

@ogrodev

Deferred from PR #2 review (raised by both CodeRabbit and cubic). Tracked here so it lands as one uniform pass rather than a divergent spot-fix.

Problem

Several adapter calls run blocking IO directly on the Tauri/tokio async executor, with no per-probe/per-fetch timeout. If a child process or fs call stalls, it ties up a runtime worker thread.

  • crates/adapters/src/sources.rsLocalSourceProbe::is_present (async) calls ClaudeCredentials::is_present()keychain_present() (sync std::process::Command running security) and a std::fs stat.
  • crates/adapters/src/claude.rsClaudeCredentials::load() (async) calls std::fs::read_to_string and read_keychain() (sync security -w, which can raise a Keychain ACL prompt and block on user input).
  • crates/adapters/src/claude.rsdetect_user_agent() spawns claude --version synchronously, and claude_strategy() calls it on every fetch (~every 60s).

The metadata probe (security without -w) is the low-risk case (no prompt, returns promptly); the -w read on the consented load path is the prompt-capable one.

Why it matters

ADR 0015 mandates a per-probe timeout (§4) and a failure gate as core resilience requirements; today neither the probe nor the usage fetch is bounded. This is pre-existing and codebase-wide, not specific to one call — so it should be fixed uniformly.

Proposed work

  • Route each blocking subprocess/fs call through tokio::task::spawn_blocking (or tokio::process::Command) and wrap it in tokio::time::timeout, killing the child on timeout.
  • Apply uniformly across the presence probe and the usage fetch path (load, read_keychain, detect_user_agent).
  • Cache detect_user_agent() instead of spawning claude --version on every refresh tick.
  • Cover with tests using the injected Clock per ADR 0015 (deterministic, no real waiting).

Acceptance

  • No blocking subprocess/fs call is awaited directly on the async executor in adapters.
  • Every external probe/fetch is bounded by a timeout; a hung child cannot starve the runtime.
  • claude --version is detected at most once per process (or refreshed on a long cadence), not per fetch.

Refs: docs/adr/0015-resilience-patterns.md, PR #2 review threads (sources.rs:22, claude.rs:29).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions