
fix(cluster-labels): single-flight guard for vocab cache builds #274

Merged: lstein merged 1 commit into master from lstein/fix/vocab-build-guard on May 23, 2026

@lstein (Owner) commented on May 23, 2026

Summary

  • On a cold vocab cache, concurrent /cluster_labels and /image_label requests both dispatched get_or_build_vocab_embeddings through asyncio.to_thread. Every caller saw cache_path missing, loaded the encoder, and re-encoded the full vocabulary before the first writer's atomic .tmp → rename landed. This showed up as 5+ duplicate "Building vocab embeddings cache at …" lines per cold start.
  • Added a per-encoder threading.Lock registry around the build path in cluster_labels.py, with a re-check of _read_cached_vocab inside the lock so the second waiter picks up the first builder's atomic rename instead of redundantly re-encoding.
  • Per-spec (rather than global) locks preserve the existing ability for two albums with different encoders to build in parallel; only redundant builds of the same encoder are serialized.

Test plan

  • pytest tests/backend/test_cluster_labels.py — 38/38 pass, including a new test_concurrent_builds_are_serialized that spawns 4 threads against a gated fake encoder and asserts encode_calls == 1.
  • ruff check photomap/backend/cluster_labels.py tests/backend/test_cluster_labels.py clean.
  • Manual: open the app with a cleared ~/.cache/photomap/cluster_vocab/ and trigger several concurrent slide-drawer opens; log should show one "Building vocab embeddings cache" line followed by one "Vocab embeddings cached".
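
A test in the spirit of test_concurrent_builds_are_serialized might look like the sketch below. It is an assumption-laden stand-in, not the test from the PR: `GatedFakeEncoder`, `run_concurrent_builds`, and the in-memory dict used in place of the on-disk cache are all hypothetical.

```python
import threading


class GatedFakeEncoder:
    """Hypothetical test double: counts encode() calls and blocks until released."""

    def __init__(self):
        self.encode_calls = 0
        self.started = threading.Event()  # set once the first encode begins
        self.gate = threading.Event()     # encode() blocks until this is set
        self._count_lock = threading.Lock()

    def encode(self, words):
        with self._count_lock:
            self.encode_calls += 1
        self.started.set()
        self.gate.wait(timeout=5)  # hold the build open so other threads queue up
        return [float(len(w)) for w in words]


def run_concurrent_builds(n_threads: int = 4) -> int:
    """Run n_threads guarded builds against one encoder; return the encode count."""
    encoder = GatedFakeEncoder()
    cache = {}                     # stands in for the on-disk cache
    build_lock = threading.Lock()  # stands in for the per-encoder lock

    def get_or_build():
        if "vocab" in cache:
            return cache["vocab"]
        with build_lock:
            if "vocab" not in cache:  # re-check inside the lock
                cache["vocab"] = encoder.encode(["cat", "dog"])
            return cache["vocab"]

    threads = [threading.Thread(target=get_or_build) for _ in range(n_threads)]
    for t in threads:
        t.start()
    encoder.started.wait(timeout=5)  # the first builder is now inside encode()
    encoder.gate.set()               # release it
    for t in threads:
        t.join()
    return encoder.encode_calls
```

The gate keeps the first build in flight while the remaining threads arrive, which is the situation the guard is meant to handle; the assertion is then simply that the returned count equals 1.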

🤖 Generated with Claude Code

Concurrent /cluster_labels and /image_label requests both dispatch
get_or_build_vocab_embeddings through asyncio.to_thread, so on a cold
cache every caller would re-load the encoder and re-encode the full
vocabulary before the first writer's atomic rename landed. Add a
per-encoder threading.Lock with a re-check inside the lock so the
second waiter picks up the first builder's output instead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lstein merged commit 85dfd51 into master on May 23, 2026 (5 checks passed)
@lstein deleted the lstein/fix/vocab-build-guard branch on May 23, 2026 20:21