
feat: Refactor existing LLM resolution into provider adapters #55

Open
andrejvysny wants to merge 14 commits into huggingface:main from andrejvysny:feat/provider-adapters

Conversation


@andrejvysny andrejvysny commented Apr 22, 2026

Summary

  • refactor the existing Anthropic, OpenAI, and Hugging Face router LiteLLM param resolution into small provider adapter classes
  • keep existing CLI suggestions and web model/session behavior unchanged while reusing shared model-id validation in the CLI switcher
  • centralize provider error classification so the CLI, /model, session errors, and /health/llm show clean auth/quota/model-not-found messages instead of raw provider traces
  • add focused regression tests for adapter parity, validation edge cases, and provider error handling
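The adapter extraction described above can be sketched roughly as follows. This is a hedged reconstruction from the PR discussion, not the actual code: `resolve_adapter` and `build_params` appear in the commit messages, but `ProviderAdapter`, the `prefix` field, and the registry layout here are assumptions.

```python
# Illustrative sketch of the provider-adapter shape this PR describes.
# ProviderAdapter, prefix, and _ADAPTERS are hypothetical names; only
# resolve_adapter()/build_params() are mentioned in the PR itself.
from dataclasses import dataclass


@dataclass
class ProviderAdapter:
    prefix: str

    def matches(self, model_name: str) -> bool:
        # Route a model id to this adapter by provider prefix.
        return model_name.startswith(self.prefix)

    def build_params(self, model_name: str) -> dict:
        # Base case: pass the model id through to LiteLLM unchanged.
        return {"model": model_name}


class BedrockAdapter(ProviderAdapter):
    def __init__(self):
        super().__init__(prefix="bedrock/")


_ADAPTERS = [BedrockAdapter()]


def resolve_adapter(model_name: str) -> ProviderAdapter:
    for adapter in _ADAPTERS:
        if adapter.matches(model_name):
            return adapter
    # Catch-all fallback (the HF router adapter in the real PR).
    return ProviderAdapter(prefix="")
```

Each provider-specific adapter then owns its own `build_params` overrides (e.g. Anthropic thinking config), while `llm_params.py` stays a thin `resolve_adapter(...).build_params(...)` dispatcher.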

Verification

  • uv run --with pytest --with pytest-asyncio pytest -q
  • uv run ml-intern "Say Hello!" --model "MiniMaxAI/MiniMax-M2.7"
  • ANTHROPIC_API_KEY= uv run ml-intern "Say Hello!" --model "anthropic/claude-opus-4-6"
  • OPENAI_API_KEY= uv run ml-intern "Say Hello!" --model "openai/gpt-5"
  • uv run ml-intern "Say Hello!" --model "fake-org/not-a-real-model-12345"

@andrejvysny andrejvysny force-pushed the feat/provider-adapters branch from 064c99b to 267618d on April 22, 2026 13:59
@andrejvysny andrejvysny changed the title from "Add config-driven provider adapters for local and routed LLMs" to "Refactor existing LLM resolution into provider adapters" on Apr 22, 2026
@andrejvysny andrejvysny marked this pull request as ready for review April 22, 2026 16:33
andrejvysny and others added 4 commits April 23, 2026 10:29

Resolve conflicts by taking upstream for llm_params.py (will be
rewritten as thin adapter dispatcher) and main.py (model_switcher
extraction supersedes our changes).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Split NativeAdapter into AnthropicAdapter (thinking config +
output_config.effort) and OpenAIAdapter (reasoning_effort top-level).
Each adapter owns its accepted effort set and raises
UnsupportedEffortError in strict mode, preserving the effort_probe
cascade with zero changes to effort_probe.py or agent_loop.py.

llm_params.py becomes a thin dispatcher delegating to
resolve_adapter().build_params() while keeping the litellm
effort-validation patch and re-exporting UnsupportedEffortError.

model_switcher.py reads suggested models from the adapter registry
instead of maintaining a separate SUGGESTED_MODELS list.

backend/routes/agent.py replaces AVAILABLE_MODELS with
build_model_catalog().

OpenCodeGoAdapter deferred to PR huggingface#60.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Restore the existing web model behavior so PR huggingface#55 stays a behavior-preserving refactor while keeping shared runtime and CLI validation logic.
@andrejvysny
Author

Pushed a4a7532. Narrowed the PR back to adapter extraction + validation reuse only, restored existing web behavior, and reran the full Python test suite (26 passed in 1.80s).

Classify auth, credits, and missing-model failures once so the CLI, model switcher, and health checks show clean user-facing errors instead of raw provider traces.
@andrejvysny andrejvysny marked this pull request as ready for review April 23, 2026 11:34
@andrejvysny
Author

Heads up: #60 builds on top of this refactor and adds the follow-up provider adapters / additional provider support, so #55 should land first.

@andrejvysny andrejvysny changed the title from "Refactor existing LLM resolution into provider adapters" to "feat: Refactor existing LLM resolution into provider adapters" on Apr 23, 2026
@akseljoonas
Collaborator

@claude please review this PR. Focus on correctness, security, LiteLLM/Bedrock routing impact, and whether it's safe to merge against current main. Keep it concise and prefer inline comments where it matters.

@claude

claude Bot commented Apr 24, 2026

Claude finished @akseljoonas's task in 3m 6s.


PR Review: feat/provider-adapters

  • Read repo context and gather diff
  • Review provider_adapters.py, llm_errors.py, llm_params.py
  • Review agent_loop.py, model_switcher.py, backend files
  • Check LiteLLM/Bedrock routing impact
  • Security review
  • Assess correctness and test coverage

Verdict: SAFE TO MERGE — clean refactor, no regressions found, good test coverage. A few nits and one minor misclassification edge case worth knowing before shipping.


Correctness

_CREDITS_MARKERS runs before _RATE_LIMIT_MARKERS and contains "quota" (llm_errors.py:45)

Rate-limit errors that mention quota (e.g. "rate quota exceeded", AWS Bedrock: "ThrottlingException: You have exceeded your quota") will be classified as credits instead of rate_limit, showing users "Insufficient API credits" when they're actually just throttled. The fix is simple: move the "quota" marker to _RATE_LIMIT_MARKERS or add a tighter quota check only after confirming no rate-limit signal is present.
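One way to implement the suggested fix is to check rate-limit signals before any credits/quota check. The marker tuples and function body below are illustrative stand-ins, not the repo's actual `llm_errors.py` contents:

```python
# Sketch of the classification-order fix: rate-limit markers are checked
# first, so a Bedrock ThrottlingException mentioning "quota" is reported
# as rate_limit rather than credits. Marker lists are assumptions.
_RATE_LIMIT_MARKERS = ("rate limit", "throttl", "429", "quota")
_CREDITS_MARKERS = ("insufficient credits", "payment required", "402")


def classify(message: str) -> str:
    text = message.lower()
    if any(m in text for m in _RATE_LIMIT_MARKERS):
        return "rate_limit"
    if any(m in text for m in _CREDITS_MARKERS):
        return "credits"
    return "unknown"


print(classify("ThrottlingException: You have exceeded your quota"))  # rate_limit
print(classify("402 Payment Required"))  # credits
```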

Removed institutional knowledge from llm_params.py

The old _patch_litellm_effort_validation and _resolve_llm_params docstrings explained why Anthropic needs thinking + output_config as top-level kwargs (not extra_body), and why the LiteLLM monkey-patch is needed. That context is now gone from the codebase (it will still be in git history but won't be visible to the next reader). The shortened docstring ("Patch LiteLLM's Anthropic effort validation for Claude Opus 4.7") doesn't capture the nuance. Consider keeping at least the top-level "why" sentence.

_friendly_error_message shim in agent_loop.py (line 192) — kept as a one-liner wrapper. It's harmless but dead weight since it's a private function used only in that file. Not blocking.


Security

Traceback no longer sent to users (agent_loop.py:834-840) — The change from str(e) + "\n" + traceback.format_exc() to render_llm_error_message(e) is a net security improvement. Raw tracebacks shown in chat UI could leak file paths and internal variable names. The full trace is now captured via logger.debug(..., exc_info=True) instead.
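The pattern described (clean user-facing summary, full traceback confined to debug logs) looks roughly like this. `render_llm_error_message` is the PR's name; its body and the surrounding handler here are hedged reconstructions:

```python
# Sketch of the traceback-hygiene change: the user sees a short rendered
# message while the full exception goes to debug logging only.
import logging

logger = logging.getLogger("agent_loop")


def render_llm_error_message(error: Exception) -> str:
    # Stand-in for the PR's renderer: map known provider failures to
    # clean messages, fall back to str(error) for unknown exceptions.
    text = str(error).lower()
    if "401" in text or "invalid api key" in text:
        return "Authentication failed. Check your API key."
    return str(error)


def handle_llm_failure(error: Exception) -> str:
    # Raw traceback is captured for operators, never shown in chat UI.
    logger.debug("LLM call failed", exc_info=True)
    return render_llm_error_message(error)
```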

render_llm_error_message falls back to str(error) for unknown exceptions — this means novel provider errors still get surfaced verbatim via session_manager and agent_loop. No regression vs. before (old code also did str(e)), but worth noting that any provider that embeds tokens/credentials in exception messages (not common, but possible) will still leak them. No action needed here.

"401" / "402" as bare substring markers in _AUTH_MARKERS/_CREDITS_MARKERS — could theoretically false-positive on model IDs like gpt-4-01 embedded in an error. Negligible risk in practice for LLM error strings.


LiteLLM / Bedrock Routing

Bedrock routing is preserved exactly: BedrockAdapter.build_params returns {"model": model_name}, matching the old if model_name.startswith("bedrock/"): return {"model": model_name} one-liner. ✓

model_switcher.py now correctly skips HF catalog lookup for bedrock/ models (model_switcher.py:64) — the old code only bypassed the catalog for anthropic/ and openai/. Bedrock models would have hit _print_hf_routing_info and gotten a spurious catalog miss warning. This is a correctness fix.

is_valid_model_id now accepts bedrock/ IDs — old code used a raw "/" check and would have accepted bedrock/foo incidentally; new code routes through BedrockAdapter.allows_model_name / _has_model_suffix, same result. ✓

_has_model_suffix validation for Bedrock — accepts any non-empty token after bedrock/, including dotted inference profile IDs (us.anthropic.claude-opus-4-7) and ARN prefixes (arn:aws:bedrock:... → tail is arn). This is intentionally permissive since Bedrock model ID formats vary widely. ✓


Minor Nits

  • _NETWORK_MARKERS contains "connect" as a bare substring — matches "reconnect", "disconnect", etc. Fine for LLM error context but worth awareness.
  • health_error_type intentionally maps "model" and "provider" errors to "unknown" — correct since the health API contract doesn't include those types. The test test_health_error_type_keeps_public_categories_stable guards this.
  • HfRouterAdapter.matches is intentionally a catch-all (anything not starting with a provider prefix). The routing/validation split (matches vs allows_model_name) is clean and correct.
  • Test test_model_switcher_shows_clean_inconclusive_warning asserts "timeout talking to provider" is not in the output — but ProbeInconclusive("timeout talking to provider") would be classified as "network" by classify_llm_error, so render_llm_error_message returns "The model provider is unavailable or timed out.". ✓
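The routing/validation split mentioned in the nits (catch-all `matches` for routing, separate `allows_model_name` for validation) could look roughly like this; method names come from the review, bodies are guesses:

```python
# Hypothetical HfRouterAdapter sketch: matches() decides routing
# (anything without a known provider prefix), allows_model_name()
# separately validates the org/model shape.
class HfRouterAdapter:
    _PROVIDER_PREFIXES = ("anthropic/", "openai/", "bedrock/")

    def matches(self, model_name: str) -> bool:
        return not model_name.startswith(self._PROVIDER_PREFIXES)

    def allows_model_name(self, model_name: str) -> bool:
        org, sep, model = model_name.partition("/")
        return bool(org) and bool(sep) and bool(model)
```

Keeping the two concerns separate is what lets the CLI switcher reuse validation without touching routing.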

Summary

Refactor is solid. The adapter extraction is clean, Bedrock routing is improved, and the shift from raw tracebacks to structured error messages is a meaningful UX and security improvement. The one functional issue worth fixing before merge is the "quota" marker causing rate-limit errors (specifically Bedrock ThrottlingExceptions) to render as credits errors.
