feat(grounding): Tavily web search, opt-in via TAVILY_API_KEY#7
Merged
Conversation
Reduces hallucination on the things LLMs reliably get wrong (API signatures, version-specific behaviour, library APIs). - app/tools.py — build_tools() returns [TavilySearch(max_results=3, include_answer="basic", search_depth="basic")] when TAVILY_API_KEY is set; [] otherwise. The 3 sources + Tavily-summarised answer keep the LLM-side payload around ~300 tokens vs raw snippet dumps. - app/agent.py — already passes tools=build_tools() and computes with_grounding=bool(tools); no change needed. - app/prompts/tutor.py — already wires rule 9 conditionally via build_system_prompt(with_grounding=); no change needed. Tests: - test_tools.py updated — locks both branches (key set / unset) + defensive default-disabled test. - test_agent.py adds TestSystemPromptToggle — spies on build_system_prompt to assert with_grounding flag matches bool(build_tools()). Catches the silent regression where the prompt would tell the model to call a tool that doesn't exist. Docs: - ADR-006 — Tavily over DuckDuckGo (snippet quality, lib stability) / over context7 (mismatch — indexes libs, not Python core) / over custom RAG over docs.python.org (out of scope). - README + PRD §2.3 reflect "delivered" status. 34 tests, 100% coverage on app/. Without TAVILY_API_KEY the v1.0.0 behaviour is preserved exactly; with it the agent picks up grounding + rule 9 automatically on next session.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds the optional grounding upgrade outlined in ADR-006: a
TavilySearchtool wired into the agent whenTAVILY_API_KEYis set. Reduces hallucination on the things LLMs reliably get wrong — API signatures, version-specific behaviour, library APIs.Changes
app/tools.py—build_tools()returns[TavilySearch(max_results=3, include_answer="basic", search_depth="basic")]when the env var is set;[]otherwise. The Tavily-summarised answer + 3 sources keep the LLM-side payload around ~300 tokens vs dumping raw snippets.app/agent.py— no change required (already computedwith_grounding=bool(tools)and passedtools=build_tools()).app/prompts/tutor.py— no change required (rule 9 already toggleable viabuild_system_prompt(with_grounding=)).tests/test_tools.py— locks both branches with a fake key; defensive "grounding disabled by default" test.tests/test_agent.py—TestSystemPromptToggleclass spies onbuild_system_promptto assertwith_groundingmatchesbool(build_tools()). Catches the silent regression where the prompt would instruct the model to call a tool that doesn't exist.docs/adr/006-grounding-via-tavily-search.md— Tavily over DuckDuckGo (snippet quality, scraper stability) / over context7 (indexes libs, not Python core) / over custom RAG overdocs.python.org(out of scope).planned→delivered, with link to ADR-006.Operator step (after merge, optional)
To enable grounding in production:
Without the secret, the deploy keeps current
v1.0.0behaviour exactly (tool list is empty, system prompt rule 9 is dropped). The agent never lies about what it can do.Test plan
uv run ruff check ./format --check— passesuv run mypy app— passes (strict)uv run pytest— 34 passed, 100% coverage onapp/TAVILY_API_KEYset: smoke a version-specific question (e.g.asyncio.TaskGroup) → Chainlit Step showstavily_searchcall