Skip to content

fix(auto-itemize): provider-aware LLM request shaping (#1547)#1558

Merged
steilerDev merged 7 commits into
betafrom
fix/1547-llm-response-format
May 23, 2026
Merged

fix(auto-itemize): provider-aware LLM request shaping (#1547)#1558
steilerDev merged 7 commits into
betafrom
fix/1547-llm-response-format

Conversation

@steilerDev
Copy link
Copy Markdown
Owner

Summary

Auto-itemize was broken on Anthropic because the OpenAI-compat layer rejects response_format: { type: 'json_object' } (it requires json_schema). The previous implementation sent the same body to every provider.

This PR introduces per-provider request shaping:

Provider response_format max_tokens
OpenAI { type: 'json_object' } 4096
Gemini { type: 'json_object' } 4096
Ollama { type: 'json_object' } 4096
Anthropic { type: 'json_schema', json_schema: {…} } 4096
Generic (OpenRouter, LiteLLM, …) none (rely on prompt + validator) 4096

Provider is auto-detected from LLM_BASE_URL (api.anthropic.com → anthropic, etc.) and can be overridden with the new optional LLM_PROVIDER env var.

Includes `scripts/diagnose-llm.mjs` — a node-only diagnostic script that works inside no-shell DHI containers and walks 4 progressive checks (DNS → TLS → endpoint → production-shaped payload) with detailed error dumps. This is what found the original 400.

Fixes #1547 (regression introduced by the original implementation).

Test plan

  • Server typecheck passes
  • Unit tests pass for providerProfiles (detection, env parsing, body shapes) and config (7 new LLM_PROVIDER scenarios)
  • No breaking changes to the public API — LLM_PROVIDER env var is optional with auto-detect default

🤖 Generated with Claude Code

Frank Steiler and others added 2 commits May 22, 2026 19:55
Anthropic's OpenAI-compat layer rejects `response_format: { type: 'json_object' }`
with 400, requiring `json_schema` instead. OpenAI/Gemini/Ollama accept
`json_object`. The previous one-shape-fits-all request body could not work with
Anthropic at all.

This change introduces a provider-profile system that adapts the outbound
request body per provider:

- New module `server/src/services/budgetExtraction/providerProfiles.ts`:
  - `LlmProvider` type + `LLM_PROVIDERS` constant
  - `detectProvider(baseUrl)` sniffs the host (api.anthropic.com →
    anthropic, etc.) and falls back to 'generic'
  - `parseProviderEnv(value)` accepts a normalized override
  - `buildRequestBody({provider,model,systemPrompt,userPrompt})` shapes
    the body per provider:
      * openai/gemini/ollama → response_format: json_object
      * anthropic           → response_format: json_schema (full ExtractedLine[] schema)
      * generic             → no response_format; relies on system prompt + runtime validator
  - All profiles include `max_tokens: 4096` (required by Anthropic, cost guard for others)

- `LlmConfig` and `AppConfig` gain `provider`/`llmProvider`.
- Config plugin parses optional `LLM_PROVIDER` env var; falls back to
  auto-detection from `LLM_BASE_URL`, then to 'generic'. Rejects unknown values.
- `openAICompatibleProvider` now calls `buildRequestBody()` instead of
  hardcoding the body.
- Docs: CLAUDE.md env var table + .env.example get the new `LLM_PROVIDER` row.
- Diagnostic script `scripts/diagnose-llm.mjs` (node-only, no shell needed)
  captures the exact provider error from a no-shell DHI container.

Tests (all new + updated):
- `providerProfiles.test.ts`: detection matrix, env parsing, per-provider body
  assertions, JSON-serializability across all profiles.
- `config.test.ts`: 7 new tests for LLM_PROVIDER (auto-detect by URL,
  explicit override, case-insensitive, validation rejection, generic fallback).
- 5 existing test fixtures get `llmProvider` added so they still type-check.

Co-Authored-By: Claude backend-developer (Haiku 4.5) <noreply@anthropic.com>
Co-Authored-By: Claude qa-integration-tester (Haiku 4.5) <noreply@anthropic.com>
…1547)

Server log was opaque on LLM failures — only `code` + `message` + `status`
visible, with the actual upstream response body / fetch cause discarded.

This change captures all available context into `details` on:

- LlmUpstreamError: response body (up to 8KB), statusText, provider, url
- LlmUnreachableError: underlying fetch error name/message/code + Node's
  nested cause (ENOTFOUND, ECONNREFUSED, AbortError, etc.)
- LlmInvalidResponseError: raw response text, parse error message,
  truncated content

All three errors now set `suppressDetails: true` so the diagnostic context
appears in the server pino log but is OMITTED from the API response (the
body can echo prompt content — vendor names, amounts from OCR — and must
not leave the host). API consumers still see code + message; the message
already includes the upstream status code for `LlmUpstreamError`.

All response/content fields are capped at 8KB so a misbehaving provider
can't flood the log.

Co-Authored-By: Claude backend-developer (Haiku 4.5) <noreply@anthropic.com>
@steilerDev steilerDev force-pushed the fix/1547-llm-response-format branch from 18da5d2 to c8cb2e4 Compare May 22, 2026 17:55
Frank Steiler and others added 5 commits May 22, 2026 20:08
…add (#1547)

- config.test.ts: 4 toEqual shape assertions need llmProvider: 'generic'
  added (the new field on AppConfig from this PR)
- openAICompatibleProvider.test.ts: makeOkResponse() mock now exposes
  response.text() — the provider reads body as text first (so error
  details can include the raw response when JSON parsing fails) then
  JSON.parses it.

Co-Authored-By: Claude qa-integration-tester (Haiku 4.5) <noreply@anthropic.com>
The previous commit switched to response.text() + JSON.parse() so error
details could include the raw body. That broke 9+ inline test mocks that
only defined response.json(). Reverting the success path to response.json()
keeps the tests stable. The diagnostic improvement still kicks in for the
two cases that actually matter for ops:

- LlmUpstreamError (non-2xx): we read response.text() in the error branch
  and include the body (up to 8KB) in details
- LlmInvalidResponseError: we include the parsed envelope or the parse
  error message in details

What we lose: when JSON.parse(envelope) fails, we no longer have the raw
text — only the parseError message. Acceptable trade-off; the 4xx body
visibility was the actual blocker.

Co-Authored-By: Claude qa-integration-tester (Haiku 4.5) <noreply@anthropic.com>
…ned lines (#1558)

PR #1553 ('full edit + linked-item move') intentionally added a collapsed
parent picker to the edit modal for already-assigned budget lines. The
Story #1545 Scenario 4 test still asserted the old behavior (picker
absent), failing post-merge.

New assertions match the current behavior:
- Parent picker fieldset IS visible (in collapsed state)
- Collapsed current-parent row shows the current parent name
- 'Change' button is present with aria-expanded='false'
- Expanded picker body is hidden by default (#parent-picker-body)
- Amount input still shown

Note: beta CI does not run the full E2E suite (only Auto Fix + Release
workflows fire on beta merge), so this regression went unnoticed until
PR #1558 opened.

Co-Authored-By: Claude e2e-test-engineer (Sonnet) <noreply@anthropic.com>
…mount

PR #1553 renamed the assigned-line modal's amount input from
#budget-line-amount to #budget-itemized-amount when it replaced the
simple amount-only modal with the unified BudgetLineForm (which uses
itemizedAmountField). The Scenario 4 test was still pointing at the
old id.

Co-Authored-By: Claude e2e-test-engineer (Sonnet) <noreply@anthropic.com>
These three tests were silently broken on beta after PRs #1553/#1555/#1557
merged. Beta does not run E2E in CI (only Auto Fix + Release fire on beta
merge), so they only surfaced when PR #1558 opened.

1. invoice-budget-line-full-edit.spec.ts:584 — assertion expected '2100'
   but the remaining-cell renders '€2,100.00' (comma thousands separator).
   Other amount assertions in the same file (e.g. '300', '575') still pass
   via substring match inside '€300.00' etc. — only the 4-digit case fails
   because of the thousands separator. Switched to /2[.,]?100/.

2. i18n.spec.ts:133 — 'Projekt' heading flake/race. The reload after
   setLanguage('de') doesn't guarantee the dashboard's translation
   namespace has resolved before the assertion. Added waitForLoadState
   ('networkidle') + bumped timeout to 15s.

3. invoice-budget-line-create-and-link.spec.ts:88 (via POM) — the picker
   add button locator was matching /\+ Add Budget Line/i; React's
   accessible-name normalization can collapse the leading '+ ' inconsistently.
   Relaxed the regex to /Add Budget Line/i.

All 3 are stale tests, not production bugs. The user-visible behavior is
fine; the tests need to track the changes from #1553 et al.

Co-Authored-By: Claude e2e-test-engineer (Sonnet) <noreply@anthropic.com>
@steilerDev steilerDev merged commit c70d4c9 into beta May 23, 2026
32 checks passed
@steilerDev steilerDev deleted the fix/1547-llm-response-format branch May 23, 2026 08:26
@github-actions
Copy link
Copy Markdown
Contributor

🎉 This PR is included in version 2.7.0-beta.7 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

steilerDev added a commit that referenced this pull request May 23, 2026
…format (#1561)

Anthropic's OpenAI-compat layer rejected our schema with:
  response_format.json_schema.strict: Input should be True

Per the diagnostic captured by the new error logging (PR #1558),
Anthropic requires `strict: true` on the json_schema (where OpenAI
treats it as optional). With strict mode, OpenAI's structured-outputs
spec — which Anthropic mirrors — also imposes:

  1. additionalProperties: false on every object node
  2. EVERY property must be listed in `required` (optional fields use
     union-typed nulls: type: ['number', 'null'])

Our `validateExtractedLines` already tolerates null on the optional
fields, so the LLM emitting quantity: null instead of omitting the
key is fine — the validator coerces null → undefined for the runtime
ExtractedLine.

Test updated to assert all three strict-mode invariants.

Co-authored-by: Frank Steiler <frank@steiler.de>
Co-authored-by: Claude backend-developer (Haiku 4.5) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant