feat(llm): extend per-tenant BYO to embeddings — mandatory tenant OpenAI key, gated enqueue, usage tracking#295
Merged
Merged
Conversation
…nAI key, gated enqueue, usage tracking Closes the operator-funded embedding-spend gap: POST /api/v1/articles (role :agent, the lowest role) fired the embedding pipeline on every create/update/ publish using a GLOBAL operator OpenAI key with no tenant scoping, cap, or usage record — a single agent key could loop it to run up the operator's bill. Mandatory BYO for embeddings (mirrors #294's Anthropic BYO exactly): - Schema/migration: add encrypted embedding_api_key (Cloak Vault.Binary, redact: true, never cast — set via put_change) + free-form embedding_model to tenant_llm_settings (no new table; existing RLS covers the columns). - Context: Loopctl.Llm.resolve(tenant_id, :embedding) resolves the SEPARATE embedding key + model (default text-embedding-3-small); has_embedding_key?/1; :embedding recorded in the shared llm_usage_events ledger (input=prompt_tokens, output=0); usage_summary groups it automatically. - EmbeddingClient.generate_embedding/2 now takes tenant_id, resolves the TENANT's key (Authorization: Bearer), records an :embedding usage event, and returns {:error, :no_api_key} for a keyless tenant (no provider call, no operator key). tenant_id threaded through the behaviour + every caller (worker, proposal gate, knowledge search paths, search controller) + the Mox mock. - Enforcement: ArticleEmbeddingWorker gates on the client's {:error, :no_api_key} and cleanly {:discard, {:no_embedding_key, id}} + telemetry — no crash, no retry, no operator-key fallback. Article creation is never blocked; a keyless tenant's article is simply not vector-searchable until a key is configured. The keyless case does NOT trip the global embedding circuit breaker. - Global operator embedding key removed from runtime.exs :embedding_provider (only base_url/model defaults remain). - Config API + MCP: PATCH/GET /tenants/me/llm-config and set_llm_config/llm_config handle embedding_api_key + embedding_model (role :user, key never returned — has_embedding_key + last-4 hint only). OpenApiSpex schemas updated; mcp-server bumped 2.31.1 -> 2.32.0 + CHANGELOG. Security: key encrypted at rest, never logged (verified across every branch), never serialized/audited with its value (embedding_key_set audit carries only a hint), RLS-covered, strictly tenant-scoped resolve, config at role :user, filter_parameters already redacts embedding_api_key (api_key substring). Tests: resolve(:embedding) + defaults + isolation, encryption-at-rest, audit, usage ledger; worker mandatory-BYO discard; real EmbeddingClient key resolution + usage + cross-tenant via Req.Test; config API set/leak/role/isolation. Full precommit green (compile --warnings-as-errors, format, credo --strict, dialyzer, 3629 tests) + 73 mcp node tests.
mkreyman
added a commit
that referenced
this pull request
Jul 4, 2026
…aker, route worker/proposal-gate through it, sanitize provider errors, permanent-error/idempotent embed, keyless degrade Addresses the 15 BYO-introduced regressions from the enhanced review on #295. Circuit-breaker cluster: - The embedding circuit breaker was process-GLOBAL (keyed only by :failures / :circuit_open_until). Now keyed PER-TENANT ({tenant_id, :failures} / :window_start / :open_until) so a failing tenant only degrades ITSELF — one tenant's 401/429/timeout can no longer open the shared breaker and degrade every other tenant's semantic search (CRIT #1). Failures are CLASSIFIED: only 5xx / transport / timeout / crash count; 4xx (401/403/429) and :no_api_key are per-tenant credential/quota problems and are exempt. Increment is now atomic via :ets.update_counter (MED #13). - ArticleEmbeddingWorker and ProposalGate no longer call the client directly; both route through the guarded Knowledge.generate_embedding/3 so circuit-open is respected AND contributed to (HIGH #4). ProposalGate ran synchronously in the HTTP write path — it now fast-fails on outage instead of hammering a dead provider (LB-timeout risk). Provider-error-body key-leak cluster (one shared helper, both vendors): - New Loopctl.Llm.ProviderError.sanitize/1 + log_tag/1 drop any provider response body (which echoes a masked key fragment, e.g. "Incorrect API key provided: sk-...ZXY") — keeping only the status. Applied in EmbeddingClient.post (HIGH #3) and Anthropic.post (MED #11), and ProposalGate now logs a value-free tag instead of inspecting the raw tuple (CRIT #2). Stops the fragment reaching logs and oban_jobs.errors (which is surfaced to tenants via list_extraction_errors). Retry / idempotency / classification: - Loopctl.Llm.permanent_provider_error?/1: a 4xx (except 408/429) is permanent; ArticleEmbeddingWorker {:discard, {:embedding_permanent_error, _}}s a bad/revoked-key embed instead of burning retries (HIGH #5). - New articles.embedding_content_hash column (migration) records the SHA-256 of the embedded text; the worker skips re-calling the paid provider on retry when the stored hash matches the current content (MED #12). Keyless UX + audit + timeout + schema: - mode=semantic degrades to keyword-only (fallback: true) for a keyless tenant instead of 503 (MED #8). - Llm.record_blocked(tenant_id, :embedding) is emitted from the worker skip and ProposalGate's fall-open, matching the Anthropic audit trail (MED #9). - The embedding client's single-attempt budget (4s, no client retries) is kept strictly below Knowledge's 5s Task.yield so a slow-but-valid embed isn't killed-then-miscounted (MED #10). - "embedding" added to the LlmUsageResponse operation enum (MED #7). Tests: per-tenant breaker (:no_api_key + 4xx exempt; 5xx opens; the two-tenant no-cross-degrade repro); permanent-error discard vs transient retry; idempotent skip on re-run; keyless semantic degrade; success-path no-key-leak; a realistic OpenAI 401 body dropped from the error term + logs; skip telemetry asserted. Full precommit green (compile --warnings-as-errors, format, credo --strict, dialyzer 0, 3638 tests / 0 failures) + 73 mcp node tests.
mkreyman
added a commit
that referenced
this pull request
Jul 4, 2026
…r-classed circuit breaker, provider-error sanitizer, permanent-error/idempotent embed, keyless degrade (#296) * fix(embeddings): tenant-scope + error-class the embedding circuit breaker, route worker/proposal-gate through it, sanitize provider errors, permanent-error/idempotent embed, keyless degrade Addresses the 15 BYO-introduced regressions from the enhanced review on #295. Circuit-breaker cluster: - The embedding circuit breaker was process-GLOBAL (keyed only by :failures / :circuit_open_until). Now keyed PER-TENANT ({tenant_id, :failures} / :window_start / :open_until) so a failing tenant only degrades ITSELF — one tenant's 401/429/timeout can no longer open the shared breaker and degrade every other tenant's semantic search (CRIT #1). Failures are CLASSIFIED: only 5xx / transport / timeout / crash count; 4xx (401/403/429) and :no_api_key are per-tenant credential/quota problems and are exempt. Increment is now atomic via :ets.update_counter (MED #13). - ArticleEmbeddingWorker and ProposalGate no longer call the client directly; both route through the guarded Knowledge.generate_embedding/3 so circuit-open is respected AND contributed to (HIGH #4). ProposalGate ran synchronously in the HTTP write path — it now fast-fails on outage instead of hammering a dead provider (LB-timeout risk). Provider-error-body key-leak cluster (one shared helper, both vendors): - New Loopctl.Llm.ProviderError.sanitize/1 + log_tag/1 drop any provider response body (which echoes a masked key fragment, e.g. "Incorrect API key provided: sk-...ZXY") — keeping only the status. Applied in EmbeddingClient.post (HIGH #3) and Anthropic.post (MED #11), and ProposalGate now logs a value-free tag instead of inspecting the raw tuple (CRIT #2). Stops the fragment reaching logs and oban_jobs.errors (which is surfaced to tenants via list_extraction_errors). Retry / idempotency / classification: - Loopctl.Llm.permanent_provider_error?/1: a 4xx (except 408/429) is permanent; ArticleEmbeddingWorker {:discard, {:embedding_permanent_error, _}}s a bad/revoked-key embed instead of burning retries (HIGH #5). - New articles.embedding_content_hash column (migration) records the SHA-256 of the embedded text; the worker skips re-calling the paid provider on retry when the stored hash matches the current content (MED #12). Keyless UX + audit + timeout + schema: - mode=semantic degrades to keyword-only (fallback: true) for a keyless tenant instead of 503 (MED #8). - Llm.record_blocked(tenant_id, :embedding) is emitted from the worker skip and ProposalGate's fall-open, matching the Anthropic audit trail (MED #9). - The embedding client's single-attempt budget (4s, no client retries) is kept strictly below Knowledge's 5s Task.yield so a slow-but-valid embed isn't killed-then-miscounted (MED #10). - "embedding" added to the LlmUsageResponse operation enum (MED #7). Also adds Knowledge.reset_circuit_breaker/1 (tenant-scoped) and switches the breaker tests to it: the tenant-scoped ETS breaker made the old global delete_all_objects reset race concurrent async tests (a wiped counter = a flaky "breaker didn't open"), so per-tenant reset keeps async isolation. Tests: per-tenant breaker (:no_api_key + 4xx exempt; 5xx opens; the two-tenant no-cross-degrade repro); permanent-error discard vs transient retry; idempotent skip on re-run; keyless semantic degrade; success-path no-key-leak; a realistic OpenAI 401 body dropped from the error term + logs; skip telemetry asserted. Follow-up to #295 (squash-merged to master before this review pass landed), so it ships as its own PR rather than an update to that (now-merged) branch. Full precommit green (compile --warnings-as-errors, format, credo --strict, dialyzer 0, full test suite green) + 73 mcp node tests. * fix(embeddings): complete provider-error sanitization + atomic breaker window + supervised breaker table + keyless-embedding log text Addresses the 6 findings from the expedited re-review of #296 (1 Critical, 4 Medium, 1 Low). master ships the round-1 regressions, so this must merge clean. Provider-error sanitization (complete the half-applied #11): - CRIT #1: Anthropic.post's 200-UNEXPECTED-SHAPE branch still returned the raw body via redact_body/1 (a no-op for map bodies). A misconfigured/compromised endpoint returning HTTP 200 with an error-shaped body echoing a masked key fragment leaked it into oban_jobs.errors (surfaced by GET /knowledge/pipeline). Now sanitized like every other branch: {:error, {:api_error, 200, :provider_error}}. redact_body/1 DELETED. - MED #4: Anthropic.post's transport-error branch logged inspect(reason); now uses ProviderError.log_tag/1 (value-free), matching EmbeddingClient. - MED #5: content_ingestion_worker.finalize_errors/4 and review_knowledge_worker.classify_result/1 (and ArticleEmbeddingWorker) now route the error term through ProviderError.sanitize/1 BEFORE it becomes an Oban discard/error reason — defense-in-depth at every worker boundary. - LOW #6: ProviderError.sanitize/1 is now total and value-free for the key-bearing shapes: {:api_error,_,body}, a non-atom {:request_failed,_}, {:embedding_crash,_}, and a BARE body (string or map) all collapse; a bare atom and a STRUCTURED domain-error tuple (e.g. {:insert_failed, step, changeset}) are provably key-free and preserved (collapsing them would replace a precise operator reason with a misleading :provider_error). The API key only ever reaches sanitize inside those provider shapes. Breaker correctness: - MED #2: the per-tenant failure counter is now keyed by the WINDOW PERIOD ({tenant_id, :failures, div(now, window)}) so an expired window is a DISTINCT key — the non-atomic check-then-reset in maybe_reset_window is gone, and a burst of concurrent failures for one tenant counts atomically to the threshold. - The re-review's required concurrent test exposed a latent OWNERSHIP bug: the breaker ETS table was created lazily by whatever transient process (request / Oban job / Task) first touched it and VANISHED when that process died — silently resetting the breaker and crashing concurrent peers mid-update_counter. Added a supervised owner (Loopctl.Knowledge.EmbeddingCircuitBreaker, mirroring ExportConcurrency) so the table has a stable, node-lifetime owner. Correct log text: - MED #3: Llm.record_blocked/2 hard-coded "no Anthropic API key configured" even for :embedding. Now branches: "no embedding API key configured" for :embedding. Tests: ProviderError sanitize/log_tag across every shape (incl. the bare-body #6 cases + structured-tuple preservation); Anthropic 200-shape + non-200 exact sanitized value with a key-echoing body + no-leak; a Task.async_stream concurrent one-tenant failure burst opening the breaker reliably; worker-boundary sanitization in review + ingestion + embedding workers (key-bearing api_error -> value-free Oban reason); record_blocked log naming the right credential for both paths. Full precommit green (compile --warnings-as-errors, format, credo --strict, dialyzer 0, 3652 tests / 0 failures) + 73 mcp node tests.
This was referenced Jul 4, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
Closes a Critical from the enhanced review: the embeddings pipeline was an ungated, operator-funded, untracked external-API spend path.
POST /api/v1/articles(role:agent, the lowest role) firedEmbeddingClient.generate_embedding/1on every create/update/publish using the global operator OpenAI key — no tenant scoping, no cap, no usage record. A single agent key could loop it to run up the operator's OpenAI bill.Mark's decision: mandatory BYO for embeddings too — each tenant supplies its OWN embedding key; a tenant without one gets NO embeddings (their articles simply aren't vector-searchable until they configure it). This mirrors the just-merged #294 (per-tenant BYO Anthropic) exactly.
What
embedding_api_key(CloakVault.Binary,redact: true, never cast — set viaput_change) + free-formembedding_modeltotenant_llm_settings. No new table; the existing RLS policy covers the new columns.Loopctl.Llm) —resolve(tenant_id, :embedding)resolves the SEPARATE embedding key + model (defaulttext-embedding-3-small);has_embedding_key?/1;:embeddingrecorded in the SAMEllm_usage_eventsledger (input_tokens= OpenAIprompt_tokens/total_tokens,output_tokens= 0);usage_summarygroups it automatically.EmbeddingClient.generate_embedding/2now takestenant_id, resolves the TENANT's key (Authorization: Bearer), records an:embeddingusage event on success, and returns{:error, :no_api_key}for a keyless tenant (no provider call, no operator key).tenant_idthreaded through the behaviour + every caller (worker, proposal gate, knowledge search paths, search controller) + the Mox mock.ArticleEmbeddingWorkergates on the client's{:error, :no_api_key}and cleanly{:discard, {:no_embedding_key, id}}+ telemetry — no crash, no retry, no operator-key fallback. Article creation is never blocked. The keyless case does NOT trip the global embedding circuit breaker (a per-tenant config gap must not degrade embeddings for tenants that DO have a key).runtime.exs:embedding_provider(onlybase_url/modeldefaults remain).PATCH/GET /tenants/me/llm-configandset_llm_config/llm_confighandleembedding_api_key+embedding_model(role:user, key never returned —has_embedding_key+ last-4 hint only). OpenApiSpex schemas updated;mcp-serverbumped2.31.1→2.32.0+ CHANGELOG.Security
llm_config.embedding_key_setaudit carries only a last-4 hint), RLS-covered, strictly tenant-scoped resolve (no cross-tenant key use), config at role:user, andfilter_parametersalready redactsembedding_api_key(theapi_keydiscard-substring).Tests
resolve(:embedding)+ defaults + tenant isolation, encryption-at-rest, audit, usage ledger; worker mandatory-BYO discard; the realEmbeddingClient(viaReq.Test) — tenant-key resolution + usage recording + cross-tenant (A uses A's key, never B's) + no-key-leak; config API set/leak/role/isolation.Full precommit green (compile
--warnings-as-errors, format,credo --strict, dialyzer 0, 3629 tests / 0 failures) + 73 mcp node tests.Follow-up (not in this PR)
Backfilling embeddings for a tenant's EXISTING articles once they first configure an embedding key is out of scope here — noted for a separate story. Until then, a tenant that sets a key gets embeddings for NEW/updated articles going forward; pre-existing articles stay keyword-searchable only until re-published or a backfill ships.