
fix(traces): cap backend trace Data to keep admin UI responsive#9960

Merged: mudler merged 3 commits into master from fix/backend-trace-data-cap on May 23, 2026
Conversation

@localai-bot
Collaborator

Summary

  • The previous fix (fix(traces): cap captured body size to keep admin Traces UI responsive #9946) capped API trace bodies, but /api/backend-traces still ships unbounded Data fields, so the admin Traces page stays in "loading" forever once a chatty agent-pool workload runs.
  • LLM backend traces carry the full chat messages JSON, the full response, and the full streaming chat_deltas content. With RAG, every reasoning step queues 50-500 KiB; 100 entries trivially reach tens of MiB.
  • TTS / audio_transform / transcript traces embed a 30s audio snippet as base64 (~1.3 MiB per trace).
  • Both make /api/backend-traces take longer to download and parse than the 5s auto-refresh interval, which is the same symptom the API-side fix addressed for API traces.

This PR applies two complementary caps, both honoring LOCALAI_TRACING_MAX_BODY_BYTES (same knob as the API-side cap):

  • Option A - generic safety net (core/trace/backend_trace.go): RecordBackendTrace walks the Data map recursively and replaces any string value larger than the cap with <truncated: N bytes>. Catches anything a future producer forgets to truncate.
  • Option B - head-preserving truncation at producers:
    • core/backend/llm.go: trace.TruncateToBytes(...) applied to messages, response, and chat_deltas.{content,reasoning_content}. Keeps the leading content readable in the UI.
    • core/trace/audio_snippet.go: omit audio_wav_base64 when the encoded blob would exceed the cap (truncated base64 is undecodable). The quality metrics still ship and the React WaveformPlayer already no-ops when the field is absent.
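
The omit-when-oversize rule for audio can be sketched as below. This is a minimal illustration, not the code in core/trace/audio_snippet.go: the helper name encodeIfUnderCap is hypothetical, and the 16 kHz mono 16-bit WAV format used in main is an assumption chosen because it gives an encoded size of about 1.28 million bytes for 30 seconds, which is on the order of the figure quoted above.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// encodeIfUnderCap is a hypothetical helper. It returns the base64 form of
// wav only when the encoded size fits within maxBytes. The blob is omitted
// rather than cut, because a truncated base64 WAV cannot be decoded.
// maxBytes <= 0 is treated as "cap disabled" (an assumption of this sketch).
func encodeIfUnderCap(wav []byte, maxBytes int) (string, bool) {
	if maxBytes > 0 && base64.StdEncoding.EncodedLen(len(wav)) > maxBytes {
		return "", false
	}
	return base64.StdEncoding.EncodeToString(wav), true
}

func main() {
	// Assumed format: 44-byte WAV header plus 30 s of 16 kHz mono 16-bit PCM.
	wav := make([]byte, 44+30*16000*2)

	// Encoded size in bytes, computed without allocating the string.
	fmt.Println(base64.StdEncoding.EncodedLen(len(wav)))

	// With the 65536 default cap the blob is dropped; metrics would still ship.
	_, included := encodeIfUnderCap(wav, 65536)
	fmt.Println("audio included:", included)
}
```

Checking EncodedLen before encoding avoids building a large string that would be thrown away immediately.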

TruncateToBytes guarantees output <= maxBytes, so Option A leaves the producer's head-preserving output alone instead of replacing it with the bare marker.

Test plan

  • go test ./core/trace/... ./core/backend/... ./core/http/middleware/... (3 packages, all ok)
  • New Ginkgo specs:
    • RecordBackendTrace Data capping - top-level oversize, nested map oversize, untouched-when-small, no-double-truncate
    • TruncateToBytes - happy path, disabled cap, oversize, pathologically small cap
    • AudioSnippetFromPCM byte cap - drop+keep metrics when oversize, include when under cap, include when cap disabled
  • Build clean (go build ./core/trace/... ./core/backend/...)
  • All callsites of the changed signatures (InitBackendTracingIfEnabled, AudioSnippet*) updated
  • Manual: deploy to local-ai-distributed namespace and confirm /api/backend-traces payload stays under a few hundred KiB and the admin Traces page renders within the 5s auto-refresh window

mudler added 3 commits May 23, 2026 08:09
…nsive

The previous fix (#9946) capped API trace bodies but missed backend traces,
which carry the same blast radius:

  - LLM backend traces store the full chat messages JSON, full response, and
    full streaming deltas. Every agent-pool reasoning step ships the full
    RAG-augmented history (50-500 KiB per trace, often 100+ traces queued).
  - TTS / audio_transform / transcript traces embed a 30s audio snippet as
    base64, around 1.3 MiB per trace.

Both blow the /api/backend-traces JSON past tens of MiB. Downloading and
parsing the buffer then takes longer than the 5s auto-refresh interval, so
the admin Traces page stays in the loading state forever, the same symptom
the API-side fix addressed.

Apply two complementary caps, both honoring LOCALAI_TRACING_MAX_BODY_BYTES:

Option A (safety net in core/trace): RecordBackendTrace walks the Data map
recursively and replaces any string value larger than the cap with
"<truncated: N bytes>". Catches anything a future producer forgets.

Option B (head-preserving at the producer):
  - core/backend/llm.go: TruncateToBytes on messages, response, and
    chat_deltas content/reasoning_content so the leading content stays
    readable in the UI.
  - core/trace/audio_snippet.go: omit audio_wav_base64 when the encoded
    blob would exceed the cap (truncated base64 is undecodable). The
    quality metrics still ship and the UI's WaveformPlayer simply skips
    when the field is absent.

TruncateToBytes is bounded to <= maxBytes so Option A leaves the producer's
head-preserving output alone instead of replacing it with the bare marker.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7
…anels

The setting was already plumbed through env (LOCALAI_TRACING_MAX_BODY_BYTES),
CLI flag, and the runtime_settings.json GET/PUT schema, but neither the main
Settings page nor the inline Traces panel offered an input for it. Admins
hitting the "Traces UI stuck loading" symptom had to know to set an env var
or PUT raw JSON to /api/settings to dial the cap.

Add a "Max Body Bytes" row next to "Max Items" in both places. Same input
type, same disabled-when-tracing-off semantics, placeholder shows the 65536
default so users see what they're inheriting.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7
…Bytes

The Tracing settings panel now has two number inputs. The previous spec
matched 'input[type="number"]' which became ambiguous and triggered a
Playwright strict-mode violation in CI. Switch to getByPlaceholder('100')
for Max Items and add a parallel spec for the new Max Body Bytes field
using getByPlaceholder('65536').

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7
@mudler mudler merged commit 1198d10 into master May 23, 2026
57 of 58 checks passed
@mudler mudler deleted the fix/backend-trace-data-cap branch May 23, 2026 12:50
@localai-bot localai-bot added the bug Something isn't working label May 24, 2026