Architecture

Syed Asif edited this page Jun 8, 2026 · 10 revisions

web-search-mcp follows a modular architecture with a clear separation of concerns to ensure maintainability and extensibility.

Project Structure

  • server.py: The entry point of the FastMCP server. It defines 9 MCP tools and exposes them to clients.
  • ddg.py: DuckDuckGo search logic (ddg_search) and web content extraction (fetch_page) using trafilatura — consolidated from the former search.py and reader.py.
  • reddit/: Keyless Reddit search via RSS + shreddit enrichment (reddit_search_tool). Multi-tier pipeline with query expansion and parallel fan-out.
  • hackernews.py: Hacker News search via Algolia API (search_hackernews). Free, no API key needed.
  • github.py: GitHub Issues/PRs search (search_github). Optionally authenticates via GITHUB_TOKEN or gh CLI.
  • polymarket.py: Polymarket prediction market search via Gamma API (search_polymarket). Free, no API key needed.
  • groq_tools.py: Groq-powered tools: browse (GPT-OSS interactive search), research (auto-selecting compound search), and analyze_page (URL visit + interpretation).
  • groq_client.py: Shared Groq API client wrapper with retry logic.
  • http_client.py: Shared HTTP client for keyless API calls with timeout handling.
  • models.py: Contains Pydantic models for strict request/response validation.
  • config.py: Manages application settings and API keys using pydantic-settings.
  • utils.py: Provides shared utility functions for error formatting and rate limiting.
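
To make the "keyless" source modules more concrete, here is a minimal sketch of how a Hacker News search against the public Algolia endpoint could be assembled and its response reduced to a structured result. The function names (build_hn_url, parse_hn_hits), the chosen query parameters, and the output fields are assumptions for illustration; the actual hackernews.py may be organized differently. The sample payload is hand-written, not real data.

```python
import json
from urllib.parse import urlencode

# Public Algolia endpoint for Hacker News search; no API key is required.
HN_SEARCH_URL = "https://hn.algolia.com/api/v1/search"


def build_hn_url(query: str, limit: int = 10) -> str:
    """Build a search URL (hypothetical helper, not the project's API)."""
    params = {"query": query, "tags": "story", "hitsPerPage": limit}
    return f"{HN_SEARCH_URL}?{urlencode(params)}"


def parse_hn_hits(payload: dict) -> list[dict]:
    """Reduce a raw Algolia response to the fields a tool might return."""
    results = []
    for hit in payload.get("hits", []):
        # Text-only posts carry no external URL, so fall back to the
        # discussion page built from the item's objectID.
        fallback = f"https://news.ycombinator.com/item?id={hit['objectID']}"
        results.append({
            "title": hit.get("title"),
            "url": hit.get("url") or fallback,
            "points": hit.get("points") or 0,
            "comments": hit.get("num_comments") or 0,
        })
    return results


# Hand-written sample shaped like an Algolia response (illustrative only).
sample = json.loads(
    '{"hits": [{"objectID": "1", "title": "Example", "url": null,'
    ' "points": 5, "num_comments": 2}]}'
)
parsed = parse_hn_hits(sample)
```

Keeping URL construction and response parsing as separate pure functions leaves the network call itself to a shared client such as http_client.py, which is where timeout handling is described as living.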

Design Principles

  1. Modular Tooling: Each search engine or extraction method is isolated in its own module.
  2. Strict Validation: Pydantic models ensure that all inputs and outputs adhere to the expected schema.
  3. Consistent Error Handling: A centralized utility (utils.py) ensures that all MCP tools return errors in a consistent format.
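  4. Keyless by Default: The Reddit, Hacker News, Polymarket, and DuckDuckGo tools require no API keys. GitHub search also works without credentials and can optionally authenticate via GITHUB_TOKEN or the gh CLI. Only the Groq tools require a GROQ_API_KEY.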
  5. Tiered Enrichment: Sources like Reddit and GitHub use multi-tier pipelines (quick → default → deep) to balance speed against result depth and comment enrichment.
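
As an illustration of how strict validation and tiered enrichment can work together, the sketch below validates a request and resolves its tier into a work budget. It uses a standard-library dataclass as a stand-in for a Pydantic model so that it runs without dependencies. The names SearchRequest, TIERS, and plan, and every number in the tier table, are hypothetical; this page does not state the real per-tier limits.

```python
from dataclasses import dataclass

# Hypothetical per-tier budgets; the real values are not given on this page.
TIERS = {
    "quick": {"max_results": 5, "fetch_comments": False},
    "default": {"max_results": 15, "fetch_comments": True},
    "deep": {"max_results": 40, "fetch_comments": True},
}


@dataclass(frozen=True)
class SearchRequest:
    """Stand-in for a Pydantic model: rejects bad input at construction."""
    query: str
    tier: str = "default"

    def __post_init__(self):
        if not self.query.strip():
            raise ValueError("query must not be empty")
        if self.tier not in TIERS:
            raise ValueError(
                f"unknown tier {self.tier!r}; expected one of {sorted(TIERS)}"
            )


def plan(request: SearchRequest) -> dict:
    """Resolve an already-validated request into its tier's budget."""
    return {"query": request.query, **TIERS[request.tier]}


quick_plan = plan(SearchRequest("rust async", tier="quick"))
```

Because invalid tiers are rejected when the request object is built, the pipeline code that consumes a plan never has to re-check its inputs.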

Data Flow

LLM Client → MCP Tool Call → server.py route → source module → external API
                                                         ↓
                                              ← structured dict / markdown
                                              ← ErrorResponse on failure
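
The flow above can be sketched in plain Python as a small registry that routes a tool call to a source function and normalizes both outcomes. This is an assumption-laden illustration, not the project's code: the real server registers tools through FastMCP, and the names TOOLS, tool, call_tool, error_response, and the shape of the returned dictionaries are all hypothetical.

```python
from typing import Callable

# Hypothetical registry mapping a tool name to its source-module function.
TOOLS: dict[str, Callable[..., dict]] = {}


def tool(name: str):
    """Register a source function under a tool name."""
    def register(func):
        TOOLS[name] = func
        return func
    return register


def error_response(tool_name: str, exc: Exception) -> dict:
    """One shape for every failure, mirroring the role described for utils.py."""
    return {
        "ok": False,
        "tool": tool_name,
        "error": type(exc).__name__,
        "message": str(exc),
    }


def call_tool(name: str, **arguments) -> dict:
    """Route a call to its source function and normalize the outcome."""
    func = TOOLS.get(name)
    if func is None:
        return error_response(name, LookupError(f"no such tool: {name}"))
    try:
        return {"ok": True, "tool": name, "data": func(**arguments)}
    except Exception as exc:  # the external API step can fail in many ways
        return error_response(name, exc)


@tool("search_hackernews")
def fake_hackernews(query: str) -> dict:
    # Placeholder standing in for the external API call.
    if not query:
        raise ValueError("query must not be empty")
    return {"query": query, "results": []}


success = call_tool("search_hackernews", query="mcp")
failure = call_tool("search_hackernews", query="")
```

Catching exceptions at the routing layer is one way to guarantee that a client always receives either structured data or an error object, never a raw traceback.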
