feat(grounding): Tavily web search, opt-in via TAVILY_API_KEY #7

Merged
heitor-am merged 1 commit into main from feat/grounding on Apr 19, 2026

@heitor-am

Owner

Summary

Adds the optional grounding upgrade outlined in ADR-006: a TavilySearch tool wired into the agent when TAVILY_API_KEY is set. Reduces hallucination on the things LLMs reliably get wrong — API signatures, version-specific behaviour, library APIs.

Changes

  • app/tools.py: build_tools() returns [TavilySearch(max_results=3, include_answer="basic", search_depth="basic")] when the env var is set; [] otherwise. The Tavily-summarised answer + 3 sources keep the LLM-side payload at roughly 300 tokens, versus dumping raw snippets.
  • app/agent.py — no change required (already computed with_grounding=bool(tools) and passed tools=build_tools()).
  • app/prompts/tutor.py — no change required (rule 9 already toggleable via build_system_prompt(with_grounding=)).
  • tests/test_tools.py — locks both branches with a fake key; defensive "grounding disabled by default" test.
  • tests/test_agent.py adds a TestSystemPromptToggle class that spies on build_system_prompt to assert with_grounding matches bool(build_tools()). Catches the silent regression where the prompt would instruct the model to call a tool that doesn't exist.
  • docs/adr/006-grounding-via-tavily-search.md — Tavily over DuckDuckGo (snippet quality, scraper stability) / over context7 (indexes libs, not Python core) / over custom RAG over docs.python.org (out of scope).
  • README + PRD updated: grounding status planned → delivered, with link to ADR-006.
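The opt-in toggle described in the changes above can be sketched roughly as follows. This is an illustration, not the code in app/tools.py: SearchTool is a hypothetical stand-in for the real TavilySearch tool so the snippet runs without third-party packages, the env parameter is added here only to make the function easy to exercise, and treating an empty string as "not set" is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchTool:
    """Hypothetical stand-in for TavilySearch, carrying the same three settings."""

    max_results: int = 3
    include_answer: str = "basic"
    search_depth: str = "basic"


def build_tools(env=None):
    """Return one search tool when TAVILY_API_KEY is set, otherwise an empty list."""
    # `env` is an assumption for testability; by default the real process env is read.
    env = os.environ if env is None else env
    # Assumption: an empty value counts as "not set", so grounding stays off.
    if env.get("TAVILY_API_KEY"):
        return [SearchTool()]
    return []


# The agent side only needs the truthiness of the list to decide on grounding.
tools = build_tools({"TAVILY_API_KEY": "tvly-fake"})
with_grounding = bool(tools)
```

Deriving with_grounding from the tool list itself, rather than reading the env var a second time, is what keeps the prompt and the tool list from drifting apart.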

Operator step (after merge, optional)

To enable grounding in production:

fly secrets set TAVILY_API_KEY=tvly-... --app python-tutor-chatbot

Without the secret, the deploy keeps current v1.0.0 behaviour exactly (tool list is empty, system prompt rule 9 is dropped). The agent never lies about what it can do.
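A minimal sketch of how a conditionally included prompt rule might behave, assuming the prompt is assembled from a list of rules. Only the build_system_prompt(with_grounding=) signature comes from this PR; the rule texts, BASE_RULES and GROUNDING_RULE are hypothetical, and the real prompt has more rules (the grounding rule is rule 9 there).

```python
# Hypothetical rule texts; the real ones live in app/prompts/tutor.py.
BASE_RULES = [
    "Answer questions about Python clearly and concisely.",
    "Say so when you are unsure of an answer.",
]
GROUNDING_RULE = (
    "Use the web search tool for version-specific or API-signature questions."
)


def build_system_prompt(with_grounding: bool) -> str:
    """Assemble a numbered rule list, adding the search rule only when asked."""
    rules = list(BASE_RULES)
    if with_grounding:
        # Appended last, so the numbering of the base rules never shifts.
        rules.append(GROUNDING_RULE)
    return "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))


prompt_without = build_system_prompt(with_grounding=False)
prompt_with = build_system_prompt(with_grounding=True)
```

With the flag off, the prompt never mentions a search tool, so the model is not told about a capability it lacks.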

Test plan

  • uv run ruff check . / format --check — passes
  • uv run mypy app — passes (strict)
  • uv run pytest — 34 passed, 100% coverage on app/
  • Post-merge with TAVILY_API_KEY set: smoke a version-specific question (e.g. asyncio.TaskGroup) → Chainlit Step shows tavily_search call
  • Smoke a basic question (e.g. "o que é uma lista?", Portuguese for "what is a list?") → answers directly, no tool call

Reduces hallucination on the things LLMs reliably get wrong (API
signatures, version-specific behaviour, library APIs).

- app/tools.py — build_tools() returns [TavilySearch(max_results=3,
  include_answer="basic", search_depth="basic")] when TAVILY_API_KEY
  is set; [] otherwise. The 3 sources + Tavily-summarised answer keep
  the LLM-side payload around ~300 tokens vs raw snippet dumps.
- app/agent.py — already passes tools=build_tools() and computes
  with_grounding=bool(tools); no change needed.
- app/prompts/tutor.py — already wires rule 9 conditionally via
  build_system_prompt(with_grounding=); no change needed.

Tests:
- test_tools.py updated — locks both branches (key set / unset) +
  defensive default-disabled test.
- test_agent.py adds TestSystemPromptToggle — spies on
  build_system_prompt to assert with_grounding flag matches
  bool(build_tools()). Catches the silent regression where the
  prompt would tell the model to call a tool that doesn't exist.
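
The spy-based check described under Tests could look something like the sketch below, using unittest.mock from the standard library. Everything here is a simplified stand-in: prompts plays the role of the app.prompts.tutor module, make_agent is a hypothetical name for the agent factory, and the real TestSystemPromptToggle may be organised differently.

```python
from types import SimpleNamespace
from unittest import mock


def _build_system_prompt(with_grounding: bool) -> str:
    return "rules + search rule" if with_grounding else "rules"


# Hypothetical stand-in for the prompts module, so there is an attribute to patch.
prompts = SimpleNamespace(build_system_prompt=_build_system_prompt)


def build_tools(env):
    # Stand-in tool list: a placeholder string instead of a real search tool.
    return ["search-tool"] if env.get("TAVILY_API_KEY") else []


def make_agent(env):
    """Hypothetical agent factory: prompt flag is derived from the tool list."""
    tools = build_tools(env)
    prompt = prompts.build_system_prompt(with_grounding=bool(tools))
    return {"tools": tools, "prompt": prompt}


def check_flag_matches_tools(env):
    """Spy on build_system_prompt and assert the flag equals bool(tools)."""
    with mock.patch.object(
        prompts, "build_system_prompt", wraps=_build_system_prompt
    ) as spy:
        agent = make_agent(env)
    # wraps= keeps the real behaviour while recording how the function was called.
    spy.assert_called_once_with(with_grounding=bool(agent["tools"]))
    return spy.call_args.kwargs["with_grounding"]
```

Calling check_flag_matches_tools with and without a fake key covers both branches; if a later change hard-coded the flag, assert_called_once_with would raise for one of them.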

Docs:
- ADR-006 — Tavily over DuckDuckGo (snippet quality, lib stability) /
  over context7 (mismatch — indexes libs, not Python core) / over
  custom RAG over docs.python.org (out of scope).
- README + PRD §2.3 reflect "delivered" status.

34 tests, 100% coverage on app/. Without TAVILY_API_KEY the v1.0.0
behaviour is preserved exactly; with it the agent picks up grounding
+ rule 9 automatically on next session.
@heitor-am heitor-am merged commit 16dfb13 into main Apr 19, 2026
4 checks passed
@heitor-am heitor-am deleted the feat/grounding branch April 19, 2026 00:58