
feat: Refactor existing LLM resolution into provider adapters #55

Open
andrejvysny wants to merge 14 commits into huggingface:main from andrejvysny:feat/provider-adapters

Conversation


@andrejvysny andrejvysny commented Apr 22, 2026

Summary

  • refactor the existing Anthropic, OpenAI, and Hugging Face router LiteLLM param resolution into small provider adapter classes
  • keep existing CLI suggestions and web model/session behavior unchanged while reusing shared model-id validation in the CLI switcher
  • centralize provider error classification so the CLI, /model, session errors, and /health/llm show clean auth/quota/model-not-found messages instead of raw provider traces
  • add focused regression tests for adapter parity, validation edge cases, and provider error handling
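The adapter extraction described above can be sketched roughly as follows. This is a hedged reconstruction from the PR discussion, not the actual code: `resolve_adapter` and `build_params` appear in the commit messages, but `ProviderAdapter`, the `prefix` field, and the registry layout here are assumptions.

```python
# Illustrative sketch of the provider-adapter shape this PR describes.
# ProviderAdapter, prefix, and _ADAPTERS are hypothetical names; only
# resolve_adapter()/build_params() are mentioned in the PR itself.
from dataclasses import dataclass


@dataclass
class ProviderAdapter:
    prefix: str

    def matches(self, model_name: str) -> bool:
        # Route a model id to this adapter by provider prefix.
        return model_name.startswith(self.prefix)

    def build_params(self, model_name: str) -> dict:
        # Base case: pass the model id through to LiteLLM unchanged.
        return {"model": model_name}


class BedrockAdapter(ProviderAdapter):
    def __init__(self):
        super().__init__(prefix="bedrock/")


_ADAPTERS = [BedrockAdapter()]


def resolve_adapter(model_name: str) -> ProviderAdapter:
    for adapter in _ADAPTERS:
        if adapter.matches(model_name):
            return adapter
    # Catch-all fallback (the HF router adapter in the real PR).
    return ProviderAdapter(prefix="")
```

Each provider-specific adapter then owns its own `build_params` overrides (e.g. Anthropic thinking config), while `llm_params.py` stays a thin `resolve_adapter(...).build_params(...)` dispatcher.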

Verification

  • uv run --with pytest --with pytest-asyncio pytest -q
  • uv run ml-intern "Say Hello!" --model "MiniMaxAI/MiniMax-M2.7"
  • ANTHROPIC_API_KEY= uv run ml-intern "Say Hello!" --model "anthropic/claude-opus-4-6"
  • OPENAI_API_KEY= uv run ml-intern "Say Hello!" --model "openai/gpt-5"
  • uv run ml-intern "Say Hello!" --model "fake-org/not-a-real-model-12345"

@andrejvysny andrejvysny force-pushed the feat/provider-adapters branch from 064c99b to 267618d on April 22, 2026 13:59
@andrejvysny andrejvysny changed the title from "Add config-driven provider adapters for local and routed LLMs" to "Refactor existing LLM resolution into provider adapters" on Apr 22, 2026
@andrejvysny andrejvysny marked this pull request as ready for review April 22, 2026 16:33
andrejvysny and others added 4 commits April 23, 2026 10:29

Resolve conflicts by taking upstream for llm_params.py (will be
rewritten as thin adapter dispatcher) and main.py (model_switcher
extraction supersedes our changes).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Split NativeAdapter into AnthropicAdapter (thinking config +
output_config.effort) and OpenAIAdapter (reasoning_effort top-level).
Each adapter owns its accepted effort set and raises
UnsupportedEffortError in strict mode, preserving the effort_probe
cascade with zero changes to effort_probe.py or agent_loop.py.

llm_params.py becomes a thin dispatcher delegating to
resolve_adapter().build_params() while keeping the litellm
effort-validation patch and re-exporting UnsupportedEffortError.

model_switcher.py reads suggested models from the adapter registry
instead of maintaining a separate SUGGESTED_MODELS list.

backend/routes/agent.py replaces AVAILABLE_MODELS with
build_model_catalog().

OpenCodeGoAdapter deferred to PR huggingface#60.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Restore the existing web model behavior so PR huggingface#55 stays a behavior-preserving refactor while keeping shared runtime and CLI validation logic.
@andrejvysny
Author

Pushed a4a7532. Narrowed the PR back to adapter extraction + validation reuse only, restored existing web behavior, and reran the full Python test suite (26 passed in 1.80s).

Classify auth, credits, and missing-model failures once so the CLI, model switcher, and health checks show clean user-facing errors instead of raw provider traces.
@andrejvysny andrejvysny marked this pull request as ready for review April 23, 2026 11:34
@andrejvysny
Author

Heads up: #60 builds on top of this refactor and adds the follow-up provider adapters / additional provider support, so #55 should land first.

@andrejvysny andrejvysny changed the title from "Refactor existing LLM resolution into provider adapters" to "feat: Refactor existing LLM resolution into provider adapters" on Apr 23, 2026
@akseljoonas
Collaborator

@claude please review this PR. Focus on correctness, security, LiteLLM/Bedrock routing impact, and whether it's safe to merge against current main. Keep it concise and prefer inline comments where it matters.

@claude

claude Bot commented Apr 24, 2026

Claude finished @akseljoonas's task in 3m 6s.


PR Review: feat/provider-adapters

  • Read repo context and gather diff
  • Review provider_adapters.py, llm_errors.py, llm_params.py
  • Review agent_loop.py, model_switcher.py, backend files
  • Check LiteLLM/Bedrock routing impact
  • Security review
  • Assess correctness and test coverage

Verdict: SAFE TO MERGE — clean refactor, no regressions found, good test coverage. A few nits and one minor misclassification edge case worth knowing before shipping.


Correctness

_CREDITS_MARKERS runs before _RATE_LIMIT_MARKERS and contains "quota" (llm_errors.py:45)

Rate-limit errors that mention quota (e.g. "rate quota exceeded", AWS Bedrock: "ThrottlingException: You have exceeded your quota") will be classified as credits instead of rate_limit, showing users "Insufficient API credits" when they're actually just throttled. The fix is simple: move the "quota" marker to _RATE_LIMIT_MARKERS or add a tighter quota check only after confirming no rate-limit signal is present.
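One way to implement the suggested fix is to check rate-limit signals before any credits/quota check. The marker tuples and function body below are illustrative stand-ins, not the repo's actual `llm_errors.py` contents:

```python
# Sketch of the classification-order fix: rate-limit markers are checked
# first, so a Bedrock ThrottlingException mentioning "quota" is reported
# as rate_limit rather than credits. Marker lists are assumptions.
_RATE_LIMIT_MARKERS = ("rate limit", "throttl", "429", "quota")
_CREDITS_MARKERS = ("insufficient credits", "payment required", "402")


def classify(message: str) -> str:
    text = message.lower()
    if any(m in text for m in _RATE_LIMIT_MARKERS):
        return "rate_limit"
    if any(m in text for m in _CREDITS_MARKERS):
        return "credits"
    return "unknown"


print(classify("ThrottlingException: You have exceeded your quota"))  # rate_limit
print(classify("402 Payment Required"))  # credits
```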

Removed institutional knowledge from llm_params.py

The old _patch_litellm_effort_validation and _resolve_llm_params docstrings explained why Anthropic needs thinking + output_config as top-level kwargs (not extra_body), and why the LiteLLM monkey-patch is needed. That context is now gone from the codebase (it will still be in git history but won't be visible to the next reader). The shortened docstring ("Patch LiteLLM's Anthropic effort validation for Claude Opus 4.7") doesn't capture the nuance. Consider keeping at least the top-level "why" sentence.

_friendly_error_message shim in agent_loop.py (line 192) — kept as a one-liner wrapper. It's harmless but dead weight since it's a private function used only in that file. Not blocking.


Security

Traceback no longer sent to users (agent_loop.py:834-840) — The change from str(e) + "\n" + traceback.format_exc() to render_llm_error_message(e) is a net security improvement. Raw tracebacks shown in chat UI could leak file paths and internal variable names. The full trace is now captured via logger.debug(..., exc_info=True) instead.
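The pattern described (clean user-facing summary, full traceback confined to debug logs) looks roughly like this. `render_llm_error_message` is the PR's name; its body and the surrounding handler here are hedged reconstructions:

```python
# Sketch of the traceback-hygiene change: the user sees a short rendered
# message while the full exception goes to debug logging only.
import logging

logger = logging.getLogger("agent_loop")


def render_llm_error_message(error: Exception) -> str:
    # Stand-in for the PR's renderer: map known provider failures to
    # clean messages, fall back to str(error) for unknown exceptions.
    text = str(error).lower()
    if "401" in text or "invalid api key" in text:
        return "Authentication failed. Check your API key."
    return str(error)


def handle_llm_failure(error: Exception) -> str:
    # Raw traceback is captured for operators, never shown in chat UI.
    logger.debug("LLM call failed", exc_info=True)
    return render_llm_error_message(error)
```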

render_llm_error_message falls back to str(error) for unknown exceptions — this means novel provider errors still get surfaced verbatim via session_manager and agent_loop. No regression vs. before (old code also did str(e)), but worth noting that any provider that embeds tokens/credentials in exception messages (not common, but possible) will still leak them. No action needed here.

"401" / "402" as bare substring markers in _AUTH_MARKERS/_CREDITS_MARKERS — could theoretically false-positive on model IDs like gpt-4-01 embedded in an error. Negligible risk in practice for LLM error strings.


LiteLLM / Bedrock Routing

Bedrock routing is preserved exactly: BedrockAdapter.build_params returns {"model": model_name}, matching the old if model_name.startswith("bedrock/"): return {"model": model_name} one-liner. ✓

model_switcher.py now correctly skips HF catalog lookup for bedrock/ models (model_switcher.py:64) — the old code only bypassed the catalog for anthropic/ and openai/. Bedrock models would have hit _print_hf_routing_info and gotten a spurious catalog miss warning. This is a correctness fix.

is_valid_model_id now accepts bedrock/ IDs — old code used a raw "/" check and would have accepted bedrock/foo incidentally; new code routes through BedrockAdapter.allows_model_name / _has_model_suffix, same result. ✓

_has_model_suffix validation for Bedrock — accepts any non-empty token after bedrock/, including dotted inference profile IDs (us.anthropic.claude-opus-4-7) and ARN prefixes (arn:aws:bedrock:... → tail is arn). This is intentionally permissive since Bedrock model ID formats vary widely. ✓


Minor Nits

  • _NETWORK_MARKERS contains "connect" as a bare substring — matches "reconnect", "disconnect", etc. Fine for LLM error context but worth awareness.
  • health_error_type intentionally maps "model" and "provider" errors to "unknown" — correct since the health API contract doesn't include those types. The test test_health_error_type_keeps_public_categories_stable guards this.
  • HfRouterAdapter.matches is intentionally a catch-all (anything not starting with a provider prefix). The routing/validation split (matches vs allows_model_name) is clean and correct.
  • Test test_model_switcher_shows_clean_inconclusive_warning asserts "timeout talking to provider" is not in the output — but ProbeInconclusive("timeout talking to provider") would be classified as "network" by classify_llm_error, so render_llm_error_message returns "The model provider is unavailable or timed out.". ✓
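The routing/validation split mentioned in the nits (catch-all `matches` for routing, separate `allows_model_name` for validation) could look roughly like this; method names come from the review, bodies are guesses:

```python
# Hypothetical HfRouterAdapter sketch: matches() decides routing
# (anything without a known provider prefix), allows_model_name()
# separately validates the org/model shape.
class HfRouterAdapter:
    _PROVIDER_PREFIXES = ("anthropic/", "openai/", "bedrock/")

    def matches(self, model_name: str) -> bool:
        return not model_name.startswith(self._PROVIDER_PREFIXES)

    def allows_model_name(self, model_name: str) -> bool:
        org, sep, model = model_name.partition("/")
        return bool(org) and bool(sep) and bool(model)
```

Keeping the two concerns separate is what lets the CLI switcher reuse validation without touching routing.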

Summary

Refactor is solid. The adapter extraction is clean, Bedrock routing is improved, and the shift from raw tracebacks to structured error messages is a meaningful UX and security improvement. The one functional issue worth fixing before merge is the "quota" marker causing rate-limit errors (specifically Bedrock ThrottlingExceptions) to render as credits errors.
