Skip to content

feat(llm): extend per-tenant BYO to embeddings — mandatory tenant OpenAI key, gated enqueue, usage tracking#295

Merged
mkreyman merged 1 commit into
masterfrom
feat/embeddings-byo
Jul 3, 2026
Merged

feat(llm): extend per-tenant BYO to embeddings — mandatory tenant OpenAI key, gated enqueue, usage tracking#295
mkreyman merged 1 commit into
masterfrom
feat/embeddings-byo

Conversation

@mkreyman

@mkreyman mkreyman commented Jul 3, 2026

Copy link
Copy Markdown
Owner

Why

Closes a Critical from the enhanced review: the embeddings pipeline was an ungated, operator-funded, untracked external-API spend path. POST /api/v1/articles (role :agent, the lowest role) fired EmbeddingClient.generate_embedding/1 on every create/update/publish using the global operator OpenAI key — no tenant scoping, no cap, no usage record. A single agent key could loop it to run up the operator's OpenAI bill.

Mark's decision: mandatory BYO for embeddings too — each tenant supplies its OWN embedding key; a tenant without one gets NO embeddings (their articles simply aren't vector-searchable until they configure it). This mirrors the just-merged #294 (per-tenant BYO Anthropic) exactly.

What

  • Schema/migration — add encrypted embedding_api_key (Cloak Vault.Binary, redact: true, never cast — set via put_change) + free-form embedding_model to tenant_llm_settings. No new table; the existing RLS policy covers the new columns.
  • Context (Loopctl.Llm)resolve(tenant_id, :embedding) resolves the SEPARATE embedding key + model (default text-embedding-3-small); has_embedding_key?/1; :embedding recorded in the SAME llm_usage_events ledger (input_tokens = OpenAI prompt_tokens/total_tokens, output_tokens = 0); usage_summary groups it automatically.
  • ClientEmbeddingClient.generate_embedding/2 now takes tenant_id, resolves the TENANT's key (Authorization: Bearer), records an :embedding usage event on success, and returns {:error, :no_api_key} for a keyless tenant (no provider call, no operator key). tenant_id threaded through the behaviour + every caller (worker, proposal gate, knowledge search paths, search controller) + the Mox mock.
  • EnforcementArticleEmbeddingWorker gates on the client's {:error, :no_api_key} and cleanly {:discard, {:no_embedding_key, id}} + telemetry — no crash, no retry, no operator-key fallback. Article creation is never blocked. The keyless case does NOT trip the global embedding circuit breaker (a per-tenant config gap must not degrade embeddings for tenants that DO have a key).
  • Config removal — the global operator embedding key is removed from runtime.exs :embedding_provider (only base_url/model defaults remain).
  • Config API + MCPPATCH/GET /tenants/me/llm-config and set_llm_config/llm_config handle embedding_api_key + embedding_model (role :user, key never returned — has_embedding_key + last-4 hint only). OpenApiSpex schemas updated; mcp-server bumped 2.31.12.32.0 + CHANGELOG.

Security

  • Key encrypted at rest, never logged (verified across every branch of the client + worker + resolve path), never serialized or audited with its value (llm_config.embedding_key_set audit carries only a last-4 hint), RLS-covered, strictly tenant-scoped resolve (no cross-tenant key use), config at role :user, and filter_parameters already redacts embedding_api_key (the api_key discard-substring).

Tests

resolve(:embedding) + defaults + tenant isolation, encryption-at-rest, audit, usage ledger; worker mandatory-BYO discard; the real EmbeddingClient (via Req.Test) — tenant-key resolution + usage recording + cross-tenant (A uses A's key, never B's) + no-key-leak; config API set/leak/role/isolation.

Full precommit green (compile --warnings-as-errors, format, credo --strict, dialyzer 0, 3629 tests / 0 failures) + 73 mcp node tests.

Follow-up (not in this PR)

Backfilling embeddings for a tenant's EXISTING articles once they first configure an embedding key is out of scope here — noted for a separate story. Until then, a tenant that sets a key gets embeddings for NEW/updated articles going forward; pre-existing articles stay keyword-searchable only until re-published or a backfill ships.

…nAI key, gated enqueue, usage tracking

Closes the operator-funded embedding-spend gap: POST /api/v1/articles (role
:agent, the lowest role) fired the embedding pipeline on every create/update/
publish using a GLOBAL operator OpenAI key with no tenant scoping, cap, or usage
record — a single agent key could loop it to run up the operator's bill.

Mandatory BYO for embeddings (mirrors #294's Anthropic BYO exactly):

- Schema/migration: add encrypted embedding_api_key (Cloak Vault.Binary,
  redact: true, never cast — set via put_change) + free-form embedding_model
  to tenant_llm_settings (no new table; existing RLS covers the columns).
- Context: Loopctl.Llm.resolve(tenant_id, :embedding) resolves the SEPARATE
  embedding key + model (default text-embedding-3-small); has_embedding_key?/1;
  :embedding recorded in the shared llm_usage_events ledger (input=prompt_tokens,
  output=0); usage_summary groups it automatically.
- EmbeddingClient.generate_embedding/2 now takes tenant_id, resolves the TENANT's
  key (Authorization: Bearer), records an :embedding usage event, and returns
  {:error, :no_api_key} for a keyless tenant (no provider call, no operator key).
  tenant_id threaded through the behaviour + every caller (worker, proposal gate,
  knowledge search paths, search controller) + the Mox mock.
- Enforcement: ArticleEmbeddingWorker gates on the client's {:error, :no_api_key}
  and cleanly {:discard, {:no_embedding_key, id}} + telemetry — no crash, no
  retry, no operator-key fallback. Article creation is never blocked; a keyless
  tenant's article is simply not vector-searchable until a key is configured. The
  keyless case does NOT trip the global embedding circuit breaker.
- Global operator embedding key removed from runtime.exs :embedding_provider
  (only base_url/model defaults remain).
- Config API + MCP: PATCH/GET /tenants/me/llm-config and set_llm_config/llm_config
  handle embedding_api_key + embedding_model (role :user, key never returned —
  has_embedding_key + last-4 hint only). OpenApiSpex schemas updated; mcp-server
  bumped 2.31.1 -> 2.32.0 + CHANGELOG.

Security: key encrypted at rest, never logged (verified across every branch),
never serialized/audited with its value (embedding_key_set audit carries only a
hint), RLS-covered, strictly tenant-scoped resolve, config at role :user,
filter_parameters already redacts embedding_api_key (api_key substring).

Tests: resolve(:embedding) + defaults + isolation, encryption-at-rest, audit,
usage ledger; worker mandatory-BYO discard; real EmbeddingClient key resolution
+ usage + cross-tenant via Req.Test; config API set/leak/role/isolation. Full
precommit green (compile --warnings-as-errors, format, credo --strict, dialyzer,
3629 tests) + 73 mcp node tests.
@mkreyman mkreyman marked this pull request as ready for review July 3, 2026 23:30
@mkreyman mkreyman merged commit 10b871e into master Jul 3, 2026
10 checks passed
mkreyman added a commit that referenced this pull request Jul 4, 2026
…aker, route worker/proposal-gate through it, sanitize provider errors, permanent-error/idempotent embed, keyless degrade

Addresses the 15 BYO-introduced regressions from the enhanced review on #295.

Circuit-breaker cluster:
- The embedding circuit breaker was process-GLOBAL (keyed only by :failures /
  :circuit_open_until). Now keyed PER-TENANT ({tenant_id, :failures} /
  :window_start / :open_until) so a failing tenant only degrades ITSELF — one
  tenant's 401/429/timeout can no longer open the shared breaker and degrade every
  other tenant's semantic search (CRIT #1). Failures are CLASSIFIED: only 5xx /
  transport / timeout / crash count; 4xx (401/403/429) and :no_api_key are
  per-tenant credential/quota problems and are exempt. Increment is now atomic via
  :ets.update_counter (MED #13).
- ArticleEmbeddingWorker and ProposalGate no longer call the client directly; both
  route through the guarded Knowledge.generate_embedding/3 so circuit-open is
  respected AND contributed to (HIGH #4). ProposalGate ran synchronously in the
  HTTP write path — it now fast-fails on outage instead of hammering a dead
  provider (LB-timeout risk).

Provider-error-body key-leak cluster (one shared helper, both vendors):
- New Loopctl.Llm.ProviderError.sanitize/1 + log_tag/1 drop any provider response
  body (which echoes a masked key fragment, e.g. "Incorrect API key provided:
  sk-...ZXY") — keeping only the status. Applied in EmbeddingClient.post (HIGH #3)
  and Anthropic.post (MED #11), and ProposalGate now logs a value-free tag instead
  of inspecting the raw tuple (CRIT #2). Stops the fragment reaching logs and
  oban_jobs.errors (which is surfaced to tenants via list_extraction_errors).

Retry / idempotency / classification:
- Loopctl.Llm.permanent_provider_error?/1: a 4xx (except 408/429) is permanent;
  ArticleEmbeddingWorker {:discard, {:embedding_permanent_error, _}}s a
  bad/revoked-key embed instead of burning retries (HIGH #5).
- New articles.embedding_content_hash column (migration) records the SHA-256 of the
  embedded text; the worker skips re-calling the paid provider on retry when the
  stored hash matches the current content (MED #12).

Keyless UX + audit + timeout + schema:
- mode=semantic degrades to keyword-only (fallback: true) for a keyless tenant
  instead of 503 (MED #8).
- Llm.record_blocked(tenant_id, :embedding) is emitted from the worker skip and
  ProposalGate's fall-open, matching the Anthropic audit trail (MED #9).
- The embedding client's single-attempt budget (4s, no client retries) is kept
  strictly below Knowledge's 5s Task.yield so a slow-but-valid embed isn't
  killed-then-miscounted (MED #10).
- "embedding" added to the LlmUsageResponse operation enum (MED #7).

Tests: per-tenant breaker (:no_api_key + 4xx exempt; 5xx opens; the two-tenant
no-cross-degrade repro); permanent-error discard vs transient retry; idempotent
skip on re-run; keyless semantic degrade; success-path no-key-leak; a realistic
OpenAI 401 body dropped from the error term + logs; skip telemetry asserted.

Full precommit green (compile --warnings-as-errors, format, credo --strict,
dialyzer 0, 3638 tests / 0 failures) + 73 mcp node tests.
mkreyman added a commit that referenced this pull request Jul 4, 2026
…r-classed circuit breaker, provider-error sanitizer, permanent-error/idempotent embed, keyless degrade (#296)

* fix(embeddings): tenant-scope + error-class the embedding circuit breaker, route worker/proposal-gate through it, sanitize provider errors, permanent-error/idempotent embed, keyless degrade

Addresses the 15 BYO-introduced regressions from the enhanced review on #295.

Circuit-breaker cluster:
- The embedding circuit breaker was process-GLOBAL (keyed only by :failures /
  :circuit_open_until). Now keyed PER-TENANT ({tenant_id, :failures} /
  :window_start / :open_until) so a failing tenant only degrades ITSELF — one
  tenant's 401/429/timeout can no longer open the shared breaker and degrade every
  other tenant's semantic search (CRIT #1). Failures are CLASSIFIED: only 5xx /
  transport / timeout / crash count; 4xx (401/403/429) and :no_api_key are
  per-tenant credential/quota problems and are exempt. Increment is now atomic via
  :ets.update_counter (MED #13).
- ArticleEmbeddingWorker and ProposalGate no longer call the client directly; both
  route through the guarded Knowledge.generate_embedding/3 so circuit-open is
  respected AND contributed to (HIGH #4). ProposalGate ran synchronously in the
  HTTP write path — it now fast-fails on outage instead of hammering a dead
  provider (LB-timeout risk).

Provider-error-body key-leak cluster (one shared helper, both vendors):
- New Loopctl.Llm.ProviderError.sanitize/1 + log_tag/1 drop any provider response
  body (which echoes a masked key fragment, e.g. "Incorrect API key provided:
  sk-...ZXY") — keeping only the status. Applied in EmbeddingClient.post (HIGH #3)
  and Anthropic.post (MED #11), and ProposalGate now logs a value-free tag instead
  of inspecting the raw tuple (CRIT #2). Stops the fragment reaching logs and
  oban_jobs.errors (which is surfaced to tenants via list_extraction_errors).

Retry / idempotency / classification:
- Loopctl.Llm.permanent_provider_error?/1: a 4xx (except 408/429) is permanent;
  ArticleEmbeddingWorker {:discard, {:embedding_permanent_error, _}}s a
  bad/revoked-key embed instead of burning retries (HIGH #5).
- New articles.embedding_content_hash column (migration) records the SHA-256 of the
  embedded text; the worker skips re-calling the paid provider on retry when the
  stored hash matches the current content (MED #12).

Keyless UX + audit + timeout + schema:
- mode=semantic degrades to keyword-only (fallback: true) for a keyless tenant
  instead of 503 (MED #8).
- Llm.record_blocked(tenant_id, :embedding) is emitted from the worker skip and
  ProposalGate's fall-open, matching the Anthropic audit trail (MED #9).
- The embedding client's single-attempt budget (4s, no client retries) is kept
  strictly below Knowledge's 5s Task.yield so a slow-but-valid embed isn't
  killed-then-miscounted (MED #10).
- "embedding" added to the LlmUsageResponse operation enum (MED #7).

Also adds Knowledge.reset_circuit_breaker/1 (tenant-scoped) and switches the breaker
tests to it: the tenant-scoped ETS breaker made the old global delete_all_objects
reset race concurrent async tests (a wiped counter = a flaky "breaker didn't open"),
so per-tenant reset keeps async isolation.

Tests: per-tenant breaker (:no_api_key + 4xx exempt; 5xx opens; the two-tenant
no-cross-degrade repro); permanent-error discard vs transient retry; idempotent
skip on re-run; keyless semantic degrade; success-path no-key-leak; a realistic
OpenAI 401 body dropped from the error term + logs; skip telemetry asserted.

Follow-up to #295 (squash-merged to master before this review pass landed), so it
ships as its own PR rather than an update to that (now-merged) branch.

Full precommit green (compile --warnings-as-errors, format, credo --strict,
dialyzer 0, full test suite green) + 73 mcp node tests.

* fix(embeddings): complete provider-error sanitization + atomic breaker window + supervised breaker table + keyless-embedding log text

Addresses the 6 findings from the expedited re-review of #296 (1 Critical, 4
Medium, 1 Low). master ships the round-1 regressions, so this must merge clean.

Provider-error sanitization (complete the half-applied #11):
- CRIT #1: Anthropic.post's 200-UNEXPECTED-SHAPE branch still returned the raw body
  via redact_body/1 (a no-op for map bodies). A misconfigured/compromised endpoint
  returning HTTP 200 with an error-shaped body echoing a masked key fragment leaked
  it into oban_jobs.errors (surfaced by GET /knowledge/pipeline). Now sanitized like
  every other branch: {:error, {:api_error, 200, :provider_error}}. redact_body/1
  DELETED.
- MED #4: Anthropic.post's transport-error branch logged inspect(reason); now uses
  ProviderError.log_tag/1 (value-free), matching EmbeddingClient.
- MED #5: content_ingestion_worker.finalize_errors/4 and
  review_knowledge_worker.classify_result/1 (and ArticleEmbeddingWorker) now route
  the error term through ProviderError.sanitize/1 BEFORE it becomes an Oban
  discard/error reason — defense-in-depth at every worker boundary.
- LOW #6: ProviderError.sanitize/1 is now total and value-free for the key-bearing
  shapes: {:api_error,_,body}, a non-atom {:request_failed,_}, {:embedding_crash,_},
  and a BARE body (string or map) all collapse; a bare atom and a STRUCTURED
  domain-error tuple (e.g. {:insert_failed, step, changeset}) are provably key-free
  and preserved (collapsing them would replace a precise operator reason with a
  misleading :provider_error). The API key only ever reaches sanitize inside those
  provider shapes.

Breaker correctness:
- MED #2: the per-tenant failure counter is now keyed by the WINDOW PERIOD
  ({tenant_id, :failures, div(now, window)}) so an expired window is a DISTINCT key
  — the non-atomic check-then-reset in maybe_reset_window is gone, and a burst of
  concurrent failures for one tenant counts atomically to the threshold.
- The re-review's required concurrent test exposed a latent OWNERSHIP bug: the
  breaker ETS table was created lazily by whatever transient process (request / Oban
  job / Task) first touched it and VANISHED when that process died — silently
  resetting the breaker and crashing concurrent peers mid-update_counter. Added a
  supervised owner (Loopctl.Knowledge.EmbeddingCircuitBreaker, mirroring
  ExportConcurrency) so the table has a stable, node-lifetime owner.

Correct log text:
- MED #3: Llm.record_blocked/2 hard-coded "no Anthropic API key configured" even for
  :embedding. Now branches: "no embedding API key configured" for :embedding.

Tests: ProviderError sanitize/log_tag across every shape (incl. the bare-body #6
cases + structured-tuple preservation); Anthropic 200-shape + non-200 exact
sanitized value with a key-echoing body + no-leak; a Task.async_stream concurrent
one-tenant failure burst opening the breaker reliably; worker-boundary sanitization
in review + ingestion + embedding workers (key-bearing api_error -> value-free Oban
reason); record_blocked log naming the right credential for both paths.

Full precommit green (compile --warnings-as-errors, format, credo --strict,
dialyzer 0, 3652 tests / 0 failures) + 73 mcp node tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant