
Refresh blocked by configure: configuration write lock held during all locator.configure() calls #461

@eleanorjboyd


Summary

apply_configure_options (crates/pet/src/jsonrpc.rs#L661-L719, introduced by #416) holds configuration.write() for the entire for locator in locators.iter() { locator.configure(...) } loop. Previously the lock was released before configuring locators.

Refresh threads take configuration.read() in three places, all of which now block on the configure writer.

Result: when configure and refresh overlap, every per-environment report can block on locator I/O happening inside a different RPC handler. The Python Environments extension team attributes the pet.refresh p90 increase (+91% on darwin and +44% on win32 in the v1.33 insiders cohort vs v1.30 stable, per May 25, 2026 telemetry) to this regression.

Repro

In a workspace with ≥ 3 workspace folders and ≥ 50 discovered Python envs, issue configure and refresh RPCs concurrently from the client and measure per-event durations. Pre-#416, the two are independent. Post-#416, refresh duration tracks the slowest locator's configure() runtime.

Proposed fix

  1. Add a configure_in_progress: Mutex<()> to Context. Acquire it at the top of apply_configure_options.
  2. Briefly take configuration.write() only to compute next_config and next_generation (without publishing).
  3. Drop the write lock; call locator.configure(&next_config) for each locator with no lock held.
  4. Re-take configuration.write() for a fast publish (state.config = next_config; state.generation = next_generation).
  5. Keep the panic::catch_unwind + rollback semantics from #416 (fix: serialize configure locator updates) unchanged.

This preserves the invariant introduced in #416 (the new generation is only observable after every locator has finished configuring) while restoring concurrency between configure and refresh.
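
The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual code in crates/pet/src/jsonrpc.rs: ConfigState, the Locator trait shape, the closure-based locator impl, the String payload, and the Result<u64, String> return type are all assumptions made for the example. The rollback logic from #416 is not shown in this issue, so on a panic the sketch only skips the publish.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::{Arc, Mutex, RwLock};

/// Hypothetical stand-in for the state guarded by `configuration`.
#[derive(Default)]
pub struct ConfigState {
    pub config: String, // placeholder for the real configuration payload
    pub generation: u64,
}

/// Hypothetical stand-in for the locator trait; only `configure` matters here.
pub trait Locator: Send + Sync {
    fn configure(&self, config: &str);
}

// Convenience for the sketch: any closure over `&str` can act as a locator.
impl<F: Fn(&str) + Send + Sync> Locator for F {
    fn configure(&self, config: &str) {
        self(config)
    }
}

pub struct Context {
    pub configuration: RwLock<ConfigState>,
    pub configure_in_progress: Mutex<()>,
    pub locators: Vec<Arc<dyn Locator>>,
}

pub fn apply_configure_options(ctx: &Context, requested: &str) -> Result<u64, String> {
    // Step 1: serialize whole configure calls against each other.
    let _serial = ctx.configure_in_progress.lock().map_err(|e| e.to_string())?;

    // Step 2: hold the write lock only long enough to compute the next state.
    let (next_config, next_generation) = {
        let state = ctx.configuration.write().map_err(|e| e.to_string())?;
        (requested.to_string(), state.generation + 1)
    }; // write guard dropped here; nothing has been published yet

    // Step 3: configure every locator with no configuration lock held,
    // so refresh threads calling configuration.read() are not blocked.
    let outcome = panic::catch_unwind(AssertUnwindSafe(|| {
        for locator in ctx.locators.iter() {
            locator.configure(&next_config);
        }
    }));
    if outcome.is_err() {
        // Assumption: the real rollback from #416 would run here.
        return Err("locator.configure() panicked; new generation not published".to_string());
    }

    // Step 4: fast publish under a short write lock.
    let mut state = ctx.configuration.write().map_err(|e| e.to_string())?;
    state.config = next_config;
    state.generation = next_generation;
    Ok(next_generation)
}
```

Because configure_in_progress is held for the whole call, no other configure can change the generation between steps 2 and 4, so the value computed in step 2 is still valid at publish time.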

Acceptance criteria

  • Existing test test_configure_publishes_state_after_shared_locators_are_configured still passes (the new generation is only observable after every locator's configure() returns).
  • New test: while one configure is mid-locator.configure() (use a test locator that blocks on a barrier), a concurrent refresh's report_environment calls complete without blocking on the configure thread.
  • New test: two concurrent configure calls serialize (no interleaved locator.configure() invocations).
  • cargo fmt --all + cargo clippy --all -- -D warnings clean.
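
One possible shape for the barrier-based test in the second criterion is sketched below. It is a self-contained illustration, not code from the repository: BlockingLocator, configure, and observe_generation_during_configure are hypothetical names, the configuration is reduced to a bare generation counter, and a non-blocking try_read stands in for the configuration.read() that report_environment would perform.

```rust
use std::sync::{Arc, Barrier, Mutex, RwLock};
use std::thread;

/// Hypothetical test locator that parks inside configure() until released.
struct BlockingLocator {
    entered: Barrier,
    release: Barrier,
}

impl BlockingLocator {
    fn configure(&self) {
        self.entered.wait(); // signal: configure is now mid-flight
        self.release.wait(); // park until the test lets us finish
    }
}

/// Minimal stand-in for the proposed configure path: the configuration
/// lock is not held while the locator runs.
fn configure(generation: &RwLock<u64>, serial: &Mutex<()>, locator: &BlockingLocator) {
    let _serial = serial.lock().unwrap();
    let next = *generation.write().unwrap() + 1; // brief lock, guard dropped at `;`
    locator.configure(); // no configuration lock held here
    *generation.write().unwrap() = next; // fast publish
}

/// Returns (generation seen by a reader mid-configure, generation afterwards).
/// `None` in the first slot would mean the reader found the lock held.
pub fn observe_generation_during_configure() -> (Option<u64>, u64) {
    let generation = Arc::new(RwLock::new(0u64));
    let serial = Arc::new(Mutex::new(()));
    let locator = Arc::new(BlockingLocator {
        entered: Barrier::new(2),
        release: Barrier::new(2),
    });

    let worker = {
        let generation = Arc::clone(&generation);
        let serial = Arc::clone(&serial);
        let locator = Arc::clone(&locator);
        thread::spawn(move || configure(&generation, &serial, &locator))
    };

    locator.entered.wait(); // the worker is now inside locator.configure()
    // try_read never blocks: it fails immediately if a writer holds the lock.
    let mid = generation.try_read().ok().map(|g| *g);
    locator.release.wait(); // let the worker finish and publish
    worker.join().unwrap();

    let after = *generation.read().unwrap();
    (mid, after)
}
```

Using try_read rather than read means a regression shows up as a failed assertion instead of a hung test. The reader also observes the old generation while configure is mid-flight, which matches the invariant checked by the first criterion.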

Context

Filed in response to the Python Environments extension team's May 25, 2026 regression report (PET v1.33 insiders vs v1.30 stable). Combined with the Hatch I/O amplifier tracked separately, this is the leading hypothesis for the observed cross-platform p90 doubling. Related PRs: #416 (root cause), #460 (Hatch amplifier).

Metadata


    Labels

    bug (Issue identified by VS Code Team member as probable bug), important (Issue identified as high-priority)
