Task 158 follow-up: enrich LLM diagnostic suggestions from provider_message text #354

@spinje

Context

Task 158 §40 added provider_message: str | None to every LLMCallError, exposing the raw upstream LiteLLM/provider exception text in _diagnostic_context["provider_message"]. Agents reading the diagnostic JSON now see what the provider said (the WHY) alongside pflow's wrapped framing (the WHAT).

But pflow's own remediation suggestions (Diagnostic.suggestions) are static per (error_class, kind/reason) pair; they don't yet use provider_message to surface provider-specific context that's already in hand.

Problem

Generic suggestions miss provider-specific actionable detail that's already captured in provider_message. Concrete examples:

UnknownModelError(reason="unknown_name") for a deprecated model:

  • provider_message: "Model 'claude-2.1' was retired on 2025-07-21 — use claude-3-5-sonnet instead"
  • Current suggestions: "Check the model name against the provider's current model catalogue."
  • Could surface: "Provider says: model retired. Try the replacement they suggested."

MissingSdkError when provider_message already contains the install command:

  • provider_message: "Google Cloud SDK not found. Install it with: pip install 'litellm[google]'"
  • Current suggestions (already good): the package name is parsed and shown in the install hint.
  • Could be more honest: surface the upstream "install with" text verbatim when present, not only the parsed package name.

InvalidRequestError for context-window overflow:

  • provider_message: "Request exceeds maximum context length of 200000 tokens (got 215431)"
  • Current suggestions: "Check the request shape against the provider's documentation."
  • Could surface: "Provider says request was 215431 tokens; the model's max is 200000. Reduce prompt size or use a model with a larger context."

InvalidRequestError for content policy violations:

  • provider_message: "Content policy violation: prompt may be unsafe"
  • Current suggestions: generic.
  • Could surface: "Provider blocked the prompt as policy-violating. Adjust the prompt to avoid the trigger."

Scope

Add light-touch enrichment in each subclass's to_diagnostics() override that recognizes a small, stable set of patterns in self.provider_message and appends targeted suggestions when matched. Important constraints:

  • Pattern detection must stay narrow: substring-match only on stable provider phrases (e.g. "retired", "context length", "content policy", "quota"). Don't try to parse free-form provider text.
  • Always fall back to generic suggestions: enrichment is additive, not a replacement. If no pattern matches, suggestions stay as they are today.
  • Document the recognized patterns in the subclass docstring so future maintainers know which substrings the diagnostic is looking for and can update if a provider rewords.
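
The constraints above could be sketched roughly as follows. This is a minimal illustration, not pflow's actual code: the Diagnostic and InvalidRequestError classes here are simplified stand-ins, and enrich_suggestions, INVALID_REQUEST_PATTERNS and GENERIC_SUGGESTION are hypothetical names. Only provider_message, to_diagnostics() and the quoted phrases come from this issue.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Diagnostic:
    """Simplified stand-in for pflow's Diagnostic; only suggestions is modelled."""

    suggestions: list[str] = field(default_factory=list)


GENERIC_SUGGESTION = "Check the request shape against the provider's documentation."

# Hypothetical pattern table: (stable provider substring, extra suggestion).
INVALID_REQUEST_PATTERNS: tuple[tuple[str, str], ...] = (
    (
        "context length",
        "Provider reports a context-length overflow. Reduce prompt size "
        "or use a model with a larger context.",
    ),
    (
        "content policy",
        "Provider blocked the prompt as policy-violating. "
        "Adjust the prompt to avoid the trigger.",
    ),
)


def enrich_suggestions(
    generic: list[str],
    provider_message: str | None,
    patterns: tuple[tuple[str, str], ...],
) -> list[str]:
    """Return the generic suggestions plus any pattern-matched extras."""
    suggestions = list(generic)  # additive: the generic text always stays
    if not provider_message:
        return suggestions
    lowered = provider_message.lower()
    for needle, extra in patterns:
        # Narrow, case-insensitive substring match; no free-form parsing.
        if needle in lowered:
            suggestions.append(extra)
    return suggestions


class InvalidRequestError(Exception):
    """Stand-in for the real subclass.

    Recognized provider_message substrings: "context length", "content policy".
    """

    def __init__(self, message: str, provider_message: str | None = None):
        super().__init__(message)
        self.provider_message = provider_message

    def to_diagnostics(self) -> Diagnostic:
        return Diagnostic(
            suggestions=enrich_suggestions(
                [GENERIC_SUGGESTION], self.provider_message, INVALID_REQUEST_PATTERNS
            )
        )
```

Keeping the patterns in one table per subclass, mirrored in the docstring, means a provider rewording needs a change in only one place.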

Tradeoff vs typed sub-discriminators

There's a related issue proposing typed sub-discriminators on MissingApiKeyError (kind="quota_exceeded", etc.). The two approaches differ:

| Approach | Where detection lives | Stability |
| --- | --- | --- |
| Typed sub-discriminator (the other issue) | _classify_litellm_error at the seam | Detection is centralized; consumers branch on a stable enum; one place to update if provider rewords. |
| Suggestion enrichment (this issue) | Each subclass's to_diagnostics() | Detection is co-located with the suggestion text; cheaper; doesn't preclude typed discriminators later. |

These are complementary, not alternatives:

  • Typed discriminators are the right shape for sub-cases that need different remediation pathways (quota → billing UI; suspended → contact support). Worth the architectural investment.
  • Suggestion enrichment is right for sub-cases where the same remediation pathway applies but with provider-specific detail (deprecated-model name, exceeded-token-count). Lighter touch.

If both ship: the seam-side discriminators set kind, and to_diagnostics() may further enrich suggestions from provider_message text within a kind bucket.
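
One way the layering might look, using the existing UnknownModelError(reason="unknown_name") case as the typed bucket: the discriminator selects the base suggestion and the substring match only adds detail inside that bucket. The function suggestions_for, both tables and the FALLBACK text are hypothetical; only the reason value, the "retired" phrase and the two suggestion strings are taken from this issue.

```python
from __future__ import annotations

# The typed discriminator picks the remediation pathway.
BASE_BY_REASON: dict[str, str] = {
    "unknown_name": "Check the model name against the provider's current model catalogue.",
}

# Invented fallback text for reasons without a table entry (assumption).
FALLBACK = "Check the model configuration."

# Substring enrichment is scoped to a single discriminator bucket.
ENRICH_BY_REASON: dict[str, tuple[tuple[str, str], ...]] = {
    "unknown_name": (
        ("retired", "Provider says: model retired. Try the replacement they suggested."),
    ),
}


def suggestions_for(reason: str, provider_message: str | None) -> list[str]:
    """Base suggestion from the typed reason, plus in-bucket enrichment."""
    out = [BASE_BY_REASON.get(reason, FALLBACK)]
    lowered = (provider_message or "").lower()
    for needle, extra in ENRICH_BY_REASON.get(reason, ()):
        if needle in lowered:
            out.append(extra)
    return out
```

Because the enrichment table is keyed by the discriminator, a phrase such as "retired" cannot leak into a bucket where it would be misleading.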

Why this is worth doing even with the typed-discriminator option

  • Lower architectural cost — small per-subclass changes, no enum extensions, no seam-side detection logic.
  • Wins on patterns that don't deserve a discriminator — e.g. surfacing "retired on" model dates is useful but doesn't justify a new UnknownModelError(reason) value.
  • Can ship incrementally as patterns surface; doesn't require a coordinated change.

Why deferred from Task 158

Task 158's structural pass already added provider_message to the diagnostic context. Suggestion enrichment is an extension that's nice-to-have but not load-bearing — agents reading the JSON output already get the raw text and can render it themselves. Filing as follow-up so the surface is captured for prioritization.

Scope of implementation

  • 3-5 stable patterns per subclass to start.
  • Tests in each subclass's test file (test_llm_client.py::TestLLMDiagnostics) verifying that recognized patterns produce the enriched suggestion AND that unrecognized text falls back cleanly to the generic suggestion.
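
The two test requirements could take roughly this shape. The helper unknown_model_suggestions is a hypothetical stand-in for whatever UnknownModelError.to_diagnostics() ends up returning, so that the sketch runs on its own; the real tests would construct the exception and inspect its diagnostics instead.

```python
from __future__ import annotations

GENERIC = "Check the model name against the provider's current model catalogue."
ENRICHED = "Provider says: model retired. Try the replacement they suggested."


def unknown_model_suggestions(provider_message: str | None) -> list[str]:
    """Hypothetical stand-in for UnknownModelError.to_diagnostics().suggestions."""
    out = [GENERIC]
    if provider_message and "retired" in provider_message.lower():
        out.append(ENRICHED)
    return out


class TestLLMDiagnostics:
    def test_recognized_pattern_adds_enriched_suggestion(self):
        msg = "Model 'claude-2.1' was retired on 2025-07-21"
        # The generic suggestion stays first; the enrichment is appended.
        assert unknown_model_suggestions(msg) == [GENERIC, ENRICHED]

    def test_unrecognized_text_falls_back_to_generic(self):
        assert unknown_model_suggestions("something unexpected") == [GENERIC]

    def test_missing_provider_message_falls_back_to_generic(self):
        assert unknown_model_suggestions(None) == [GENERIC]
```

Asserting on the full list, rather than on membership, also pins down that enrichment never displaces the generic suggestion.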

References

  • provider_message introduction: .taskmaster/tasks/task_158/implementation/progress-log.md §40.
  • Existing static suggestions: src/pflow/core/exceptions.py per-subclass to_diagnostics() overrides.
  • Related (typed sub-discriminators on MissingApiKeyError): the other Task 158 follow-up issue.
