v0.30.0

@manykarim manykarim released this 16 Feb 20:06
70a245d

rf-mcp v0.30.0 Release Notes

Small LLM Optimization (ADR-006 / ADR-007 / ADR-008 / ADR-009 / ADR-010)

This release focuses on making rf-mcp usable with small and medium-sized LLMs (8K-32K context windows) that previously could not operate the tool surface reliably. Five new ADRs introduce a layered optimization strategy.

Intent Action Tool (ADR-007)

New intent_action tool provides a library-agnostic abstraction over execute_step. Instead of requiring the LLM to know that Browser Library uses Click while SeleniumLibrary uses Click Element, a single tool handles intent resolution:

intent_action(intent="click", target="text=Login", session_id="...")

Supported intents: navigate, click, fill, hover, select, assert_visible, extract_text, wait_for. The server resolves the intent to the correct keyword and locator syntax based on the session's active library.
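
As a rough illustration of the idea, intent resolution can be pictured as a per-library lookup table. The sketch below is not the server's implementation: the table contents beyond the two click keywords named above, the function name resolve_intent, and the pass-through locator handling are all assumptions made for illustration.

```python
from typing import Dict, Tuple

# Hypothetical mapping table. Only the two "click" entries come from the
# release notes; the other entries are standard keywords of each library,
# chosen here for illustration.
INTENT_KEYWORDS: Dict[str, Dict[str, str]] = {
    "Browser": {"click": "Click", "fill": "Fill Text", "hover": "Hover"},
    "SeleniumLibrary": {
        "click": "Click Element",
        "fill": "Input Text",
        "hover": "Mouse Over",
    },
}


def resolve_intent(library: str, intent: str, target: str) -> Tuple[str, str]:
    """Return (keyword, locator) for an intent in the session's library.

    Locator translation between libraries is omitted in this sketch; the
    target is passed through unchanged.
    """
    try:
        keyword = INTENT_KEYWORDS[library][intent]
    except KeyError:
        raise ValueError(
            f"No mapping for intent {intent!r} in library {library!r}"
        ) from None
    return keyword, target


print(resolve_intent("Browser", "click", "text=Login"))
print(resolve_intent("SeleniumLibrary", "click", "id=login"))
```

The point of the design is that the LLM only has to pick from eight intent names, while library-specific keyword names stay on the server side.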

Dynamic Tool Profiles (ADR-006)

Tool profiles dynamically control which MCP tools are visible to the LLM. Smaller models see fewer tools with compact descriptions, reducing token overhead from ~7,000 tokens to ~1,000 tokens for the active profile.

Available profiles: browser_exec, api_exec, discovery, minimal_exec, full. Profiles can be activated via manage_session(action="set_tool_profile", tool_profile="browser_exec") or the ROBOTMCP_TOOL_PROFILE environment variable.
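
A minimal sketch of how a profile might be chosen from the two activation paths. The profile names and the ROBOTMCP_TOOL_PROFILE variable come from the notes above; the function name select_profile, the precedence (explicit value over environment variable), and the "full" default are assumptions, not documented behavior.

```python
import os
from typing import Mapping, Optional

PROFILES = ("browser_exec", "api_exec", "discovery", "minimal_exec", "full")
ENV_VAR = "ROBOTMCP_TOOL_PROFILE"


def select_profile(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    default: str = "full",  # assumed default, not confirmed by the notes
) -> str:
    """Pick a tool profile: explicit argument, then env var, then default."""
    source = os.environ if env is None else env
    candidate = explicit or source.get(ENV_VAR) or default
    if candidate not in PROFILES:
        raise ValueError(
            f"Unknown tool profile {candidate!r}; expected one of {PROFILES}"
        )
    return candidate


# The env mapping is injected here so the example does not depend on the
# real process environment.
print(select_profile(env={"ROBOTMCP_TOOL_PROFILE": "browser_exec"}))
print(select_profile("api_exec", env={"ROBOTMCP_TOOL_PROFILE": "browser_exec"}))
```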

Response Optimization (ADR-008)

Most tools now accept a detail_level parameter that sets response verbosity to one of three levels: minimal, standard, or full. Lower levels reduce response token consumption for models with limited context budgets.
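
To make the effect concrete, here is a small sketch of verbosity-based response trimming. The three level names come from the notes; the field names, the per-level field sets, and the function shape_response are hypothetical and only illustrate the general technique.

```python
from typing import Any, Dict

# Hypothetical field sets per level; the real responses may differ.
FIELDS_BY_LEVEL = {
    "minimal": {"success", "error"},
    "standard": {"success", "error", "keyword", "output"},
}


def shape_response(
    response: Dict[str, Any], detail_level: str = "standard"
) -> Dict[str, Any]:
    """Drop fields that the requested detail level does not include."""
    if detail_level == "full":
        return dict(response)
    if detail_level not in FIELDS_BY_LEVEL:
        raise ValueError(f"Unknown detail_level {detail_level!r}")
    allowed = FIELDS_BY_LEVEL[detail_level]
    return {key: value for key, value in response.items() if key in allowed}


raw = {
    "success": True,
    "keyword": "Click",
    "output": "",
    "session_variables": {"${URL}": "https://example.com"},
}
print(shape_response(raw, "minimal"))
print(shape_response(raw, "full"))
```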

Type-Constrained Parameters (ADR-009)

All string-typed action/mode/strategy parameters now use Literal types, producing enum constraints in the JSON Schema. This eliminates value hallucination (e.g., an LLM inventing action="setup" instead of action="init").

Affected parameters across 9 tools, plus the shared detail_level parameter:

  • manage_session action (20 valid values including aliases)
  • intent_action intent (8 values)
  • find_keywords strategy (4 values)
  • execute_step mode (2 values)
  • execute_flow structure (3 values)
  • recommend_libraries mode (5 values)
  • analyze_scenario context (6 values)
  • manage_library_plugins action (3 values)
  • manage_attach action (11 values)
  • detail_level on all tools (3 values)

All values accept case-insensitive input with whitespace trimming.
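
The combination of a Literal type and lenient input handling can be sketched as follows. The eight intent values are the ones listed for intent_action in these notes; the alias name Intent and the helper normalize_intent are hypothetical. Schema generators such as Pydantic render a Literal annotation as a JSON Schema enum, although the notes do not say which generator rf-mcp relies on.

```python
from typing import Literal, get_args

# The eight intent values named in the release notes.
Intent = Literal[
    "navigate",
    "click",
    "fill",
    "hover",
    "select",
    "assert_visible",
    "extract_text",
    "wait_for",
]


def normalize_intent(raw: str) -> str:
    """Trim whitespace, lower-case, and validate against the Literal values."""
    value = raw.strip().lower()
    allowed = get_args(Intent)
    if value not in allowed:
        raise ValueError(f"Invalid intent {raw!r}; expected one of {allowed}")
    return value


print(normalize_intent("  Click "))
print(len(get_args(Intent)))
```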

Parameter Coercion and Guided Recovery (ADR-010)

Server-side resilience for common small LLM mistakes:

  • Array coercion: libraries: "[\"Browser\", \"BuiltIn\"]" (string) is automatically parsed to ["Browser", "BuiltIn"] (array). Also handles comma-separated strings and single values.
  • Deprecated keyword guidance: GET, POST, PUT, DELETE from RequestsLibrary automatically map to their On Session equivalents with a hint in the response.
  • Session ID hints: Init responses include explicit guidance to reuse the session ID.
  • Catalog strategy guidance: find_keywords(strategy="catalog") error message now explains that an active session is required.
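
The array coercion described in the first bullet can be approximated with a few lines of Python. This is a sketch under assumptions: the helper name coerce_to_list and the exact fallback order (JSON first, then comma splitting) are illustrative, not taken from the rf-mcp source.

```python
import json
from typing import List, Optional, Union


def coerce_to_list(value: Union[str, List[str], None]) -> List[str]:
    """Accept a real list, a JSON-encoded list, a comma-separated string,
    or a single value, and always return a list of strings."""
    if value is None:
        return []
    if isinstance(value, list):
        return [str(item) for item in value]
    text = value.strip()
    if text.startswith("["):
        parsed: Optional[object]
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            parsed = None  # not valid JSON; fall through to comma splitting
        if isinstance(parsed, list):
            return [str(item) for item in parsed]
    return [part.strip() for part in text.split(",") if part.strip()]


print(coerce_to_list('["Browser", "BuiltIn"]'))
print(coerce_to_list("Browser, BuiltIn"))
print(coerce_to_list("Browser"))
```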

Concurrency Fix (v0.29.1 → v0.29.2)

  • Race condition in keyword execution: Removed _suppress_stdout() from the keyword execution path. The os.dup2(2, 1) redirect is process-global and caused a race where concurrent asyncio.to_thread() keyword executions could redirect MCP responses to stderr. console='none' in RobotSettings is sufficient to suppress RF output during runner.run().
  • BuiltIn keyword availability: Added safety checks before every keyword execution to verify BuiltIn is in the RF namespace's keyword store, re-importing if missing.

OpenCode E2E Testing with Small LLMs

New E2E test infrastructure for validating rf-mcp with small LLMs via OpenRouter:

  • tests/e2e/test_intent_action_models.py: pytest-based tests verifying intent_action tool discovery and usage across multiple models. Gated by RUN_INTENT_E2E=true.
  • tests/e2e/run_realistic_e2e.py: Standalone script running realistic multi-step prompts (TodoMVC, REST API, Demoshop) with tool call efficiency metrics.
  • CI integration: New opencode-e2e job in the weekly E2E workflow runs Qwen3 Coder and GLM-4.5 AIR via OpenRouter.
  • Model override: OPENCODE_MODELS env var allows overriding the default model list.

Tested models: GLM-4.7, GLM-4.5 AIR, gpt-oss-20b, Qwen3 Coder, Llama-4 Scout, GLM-4.7 Flash.

Navigate Intent Fallback

When intent_action(intent="navigate", target="https://...") fails because no browser or page is open, the server now detects the error and automatically executes the appropriate recovery sequence before retrying:

  • Browser Library: New Browser + New Page (no browser) or just New Page (page closed)
  • SeleniumLibrary: Open Browser about:blank chrome

The response includes fallback_applied: true and fallback_steps count so the LLM knows recovery happened. Small LLMs no longer need to handle "no browser open" errors themselves, saving 2-4 tool calls.
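
The recover-then-retry flow can be sketched like this. The recovery keywords and the fallback_applied / fallback_steps response fields come from the notes above. Everything else is an assumption for illustration: the NoBrowserOpen exception, the run_keyword callable, the use of Go To as the navigation keyword, and the simplification that Browser Library always runs both New Browser and New Page.

```python
from typing import Any, Callable, Dict, List, Tuple


class NoBrowserOpen(Exception):
    """Hypothetical error signalling that no browser or page is available."""


# Recovery sequences from the release notes, as (keyword, *args) tuples.
# The "page closed" case (New Page only) is omitted in this sketch.
RECOVERY_STEPS: Dict[str, List[Tuple[str, ...]]] = {
    "Browser": [("New Browser",), ("New Page",)],
    "SeleniumLibrary": [("Open Browser", "about:blank", "chrome")],
}


def navigate_with_fallback(
    run_keyword: Callable[..., Any], library: str, url: str
) -> Dict[str, Any]:
    """Try to navigate; on NoBrowserOpen, run recovery steps and retry once."""
    try:
        run_keyword("Go To", url)
        return {"success": True, "fallback_applied": False, "fallback_steps": 0}
    except NoBrowserOpen:
        steps = RECOVERY_STEPS[library]
        for name, *args in steps:
            run_keyword(name, *args)
        run_keyword("Go To", url)  # a second failure propagates to the caller
        return {
            "success": True,
            "fallback_applied": True,
            "fallback_steps": len(steps),
        }
```

A caller would pass its own keyword runner as run_keyword; the LLM then sees a single successful response instead of an error it has to repair with extra tool calls.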

Strict Mode Hint Improvement

When a Browser Library keyword fails with "resolved to N elements" (Playwright strict mode), the error hint now:

  • Shows the actual element count in the message
  • Suggests >> nth=0 with a note that nth is zero-based
  • Includes >> nth=1 and >> visible=true alternatives
  • Uses the actual failing keyword name in examples
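
A hint of this kind could be built roughly as follows. The "resolved to N elements" pattern and the suggested >> nth=0, >> nth=1, and >> visible=true suffixes come from the notes; the function name strict_mode_hint and the exact wording of the hint are assumptions.

```python
import re
from typing import Optional

_STRICT_MODE = re.compile(r"resolved to (\d+) elements")


def strict_mode_hint(
    error_message: str, keyword: str, locator: str
) -> Optional[str]:
    """Return a recovery hint for a strict mode failure, or None otherwise."""
    match = _STRICT_MODE.search(error_message)
    if match is None:
        return None
    count = int(match.group(1))
    return (
        f"Locator matched {count} elements. Pick one explicitly, e.g. "
        f"'{keyword}    {locator} >> nth=0' (nth is zero-based, so nth=1 "
        f"is the second match) or '{keyword}    {locator} >> visible=true'."
    )


message = 'strict mode violation: locator("button") resolved to 3 elements'
print(strict_mode_hint(message, "Click", "css=button"))
```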

Bug Fixes

  • Pattern store cleanup: cleanup_old_entries(max_age_days=0) now correctly removes all entries (changed > to >= comparison).
  • Windows CI: Fixed benchmark threshold comparisons and fd-redirect tests for Windows compatibility.
  • Build test suite newline escaping: build_test_suite now escapes literal \n, \r, \t in keyword arguments so they don't break the generated .robot file's line structure. Previously, an argument like "123 Flow Street\nSan Francisco" would produce a line break in the output instead of an escaped \\n.
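
The escaping fix in the last bullet amounts to replacing control characters with their two-character escape sequences before writing the argument. A minimal sketch, assuming a hypothetical helper name and covering only the three characters named above:

```python
def escape_robot_argument(argument: str) -> str:
    """Replace literal newline, carriage return, and tab characters with
    backslash escape sequences so the argument stays on one line."""
    return (
        argument.replace("\n", "\\n")
        .replace("\r", "\\r")
        .replace("\t", "\\t")
    )


address = "123 Flow Street\nSan Francisco"
escaped = escape_robot_argument(address)
print(escaped)
print("\n" in escaped)
```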

Test Suite

Suite          v0.29.0    v0.30.0
Unit           2,286      3,138
Integration    397        510
Benchmarks     140        256

New coverage areas: intent resolution (579 tests), tool profile services (379), response optimization (517), ADR-009 type aliases (663), ADR-010 coercion (840), ADR integration (366), ADR benchmarks (895), intent action E2E (7 models), navigate fallback (47 tests), strict mode hints (6 tests), argument escaping (9 tests).