
sift-mcp

A transparent search-result vectorizer for AI agents.

sift wraps Brave Search and assigns every result a structured quality vector — a 9-level tier, editorial standards, commercial intent, authoritative weight, and signals. Downstream agents receive the augmented SERP plus aggregate metrics so they can reason about source quality, not just content.

The prompt, the tier definitions, and the reasons behind every classification are in source and in the output. sift does not pre-filter — it surfaces what's there and lets the agent decide.
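The exact output schema is defined in sift's documentation and AGENTS.md. As rough orientation, a downstream agent might type one augmented result as below. Only `quality_vector`, `signals[]`, and `reason` appear as identifiers in this README. The snake_case names for editorial standards, commercial intent, and authoritative weight, plus every type, value range, and the nesting, are assumptions. `auditLine` is a hypothetical helper that shows how `reason` and `signals[]` make a classification reviewable.

```typescript
// Hypothetical shape of one augmented SERP result. Field names partly follow
// the README's wording; types, value ranges and nesting are assumptions.
interface QualityVector {
  tier: string;                 // one of 9 tiers, e.g. "regulated_primary" ... "content_farm"
  editorial_standards: string;  // assumed categorical label
  commercial_intent: string;    // assumed categorical label
  authoritative_weight: number; // assumed to lie in [0, 1]
  signals: string[];            // evidence the classifier relied on
  reason: string;               // free-text justification
}

interface VectorizedResult {
  url: string;
  title: string;
  description: string;
  quality_vector: QualityVector;
}

// Render one result as a single audit line: tier, weight, URL, reason, signals.
function auditLine(r: VectorizedResult): string {
  const q = r.quality_vector;
  return `${q.tier} (w=${q.authoritative_weight.toFixed(2)}) ${r.url}: ${q.reason} [${q.signals.join(", ")}]`;
}

// Illustrative values only; not real sift output.
const example: VectorizedResult = {
  url: "https://example.org/guidance",
  title: "Example guidance document",
  description: "Illustrative description",
  quality_vector: {
    tier: "regulated_primary",
    editorial_standards: "high",
    commercial_intent: "none",
    authoritative_weight: 0.9,
    signals: ["official publisher", "primary document"],
    reason: "Illustrative reason text",
  },
};

console.log(auditLine(example));
```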

Drop-in MCP server — works with Claude Desktop, Claude Code, Cursor, Windsurf, Zed, or any framework that speaks MCP.

📖 Full documentation: https://enmito.github.io/sift-mcp/


Agent search failure modes sift addresses

sift's design is deliberately shaped around specific, recurring weaknesses of agent-driven web search. Each feature maps to one failure mode (see the full mapping in the documentation). Headline cases:

  • Vocabulary mismatch: generally phrased queries ("what makes a good leader") return vendor content, while academic phrasing of the same concept returns peer-reviewed sources. sift's mean_authoritative_weight makes this visible so the agent can re-query.
  • Structural vendor dominance: for SaaS-operational queries, the entire SERP is commercial thought leadership with nothing to triangulate against. sift fires a specific summary hint so the agent doesn't treat VC blog posts as research.
  • Affiliate listicle contamination: even reputable publishers produce "Best X" commercial articles. sift classifies the specific article, not the brand.
  • TLD-anchored false authority: .edu pages include both peer-reviewed papers and degree-program marketing. sift's classifier distinguishes them instead of blanket-trusting the TLD.
  • Hosted-search opacity: Tavily / Exa return opaque scores. Every sift classification carries reason + signals[], making the decision auditable.
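For the vocabulary-mismatch case, the agent-side decision can be very small. The sketch below assumes that authoritative weights lie in [0, 1] and that mean_authoritative_weight is their arithmetic mean. The 0.4 threshold and the helper names are hypothetical choices for illustration, not sift defaults.

```typescript
// Hypothetical agent-side check for the vocabulary-mismatch case.
// Assumptions: weights lie in [0, 1], mean_authoritative_weight is their
// arithmetic mean, and the 0.4 threshold is illustrative, not a sift default.
interface AggregateView {
  mean_authoritative_weight: number;
}

const REQUERY_THRESHOLD = 0.4;

// How the metric is presumably derived from per-result weights.
function meanAuthoritativeWeight(weights: number[]): number {
  if (weights.length === 0) return 0;
  let sum = 0;
  for (const w of weights) sum += w;
  return sum / weights.length;
}

// True when the agent should rephrase (e.g. into academic vocabulary) and search again.
function shouldRequery(agg: AggregateView, threshold: number = REQUERY_THRESHOLD): boolean {
  return agg.mean_authoritative_weight < threshold;
}

const generalPhrasing: AggregateView = { mean_authoritative_weight: meanAuthoritativeWeight([0.25, 0.25, 0.5, 0]) };
const academicPhrasing: AggregateView = { mean_authoritative_weight: meanAuthoritativeWeight([0.75, 0.5, 0.75, 1]) };
console.log(shouldRequery(generalPhrasing));  // true  (mean 0.25)
console.log(shouldRequery(academicPhrasing)); // false (mean 0.75)
```

A fixed threshold is the simplest policy. An agent could instead compare the means of two phrasings and keep whichever SERP scores higher.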

What sift does

  • Classifies each SERP result into a 9-tier quality vector (ranging from regulated_primary to content_farm) with reasoning
  • Aggregates the SERP into landscape metrics — tier distribution, vendor dominance, diversity entropy
  • Surfaces summary_hints[] — meta-observations the calling agent incorporates verbatim
  • Logs every classification for drift review and prompt refinement
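The landscape metrics can be pictured with a small aggregation sketch. It assumes "diversity entropy" is Shannon entropy (base 2) over the tier distribution and "vendor dominance" is the share of results flagged as vendor content. The `isVendor` flag and the hint wording are hypothetical, and sift's actual definitions live in its source.

```typescript
// Sketch of SERP-level aggregation under stated assumptions:
// entropy = Shannon entropy in bits over tiers; vendor dominance = vendor share.
interface Classified {
  tier: string;
  isVendor: boolean; // hypothetical flag derived from commercial intent
}

interface AggregateSketch {
  tierDistribution: Record<string, number>;
  diversityEntropy: number;
  vendorDominance: number;
  summaryHints: string[];
}

function aggregate(results: Classified[]): AggregateSketch {
  // Count results per tier.
  const tierDistribution: Record<string, number> = {};
  for (const r of results) {
    tierDistribution[r.tier] = (tierDistribution[r.tier] || 0) + 1;
  }
  const n = results.length;

  // H = -sum(p * log2 p); log2 computed as ln(p) / ln(2).
  let diversityEntropy = 0;
  for (const tier in tierDistribution) {
    const p = tierDistribution[tier] / n;
    diversityEntropy -= p * (Math.log(p) / Math.LN2);
  }

  const vendorDominance = n === 0 ? 0 : results.filter((r) => r.isVendor).length / n;

  // Hypothetical hint for the "structural vendor dominance" case.
  const summaryHints: string[] = [];
  if (n > 0 && vendorDominance === 1) {
    summaryHints.push("Every result is vendor content; there is no independent source to triangulate against.");
  }
  return { tierDistribution, diversityEntropy, vendorDominance, summaryHints };
}

const serp: Classified[] = [
  { tier: "regulated_primary", isVendor: false },
  { tier: "content_farm", isVendor: true },
];
console.log(aggregate(serp));
```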

What sift does not do

  • Rank or re-order results — Brave's original ranking is preserved
  • Fetch page content — classification uses SERP metadata only (URL, title, description)
  • Filter hard by default — recommended_action (keep / tag / block) is advisory; the agent decides
  • Replace Google Safe Browsing — safety_flag is a parallel axis, not a tier
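Because recommended_action is advisory and safety_flag is a separate axis, an agent's own policy decides what happens to each result. The sketch below is one hypothetical policy in which nothing is deleted. The keep / tag / block values come from this README; treating safety_flag as a boolean, and the four buckets, are assumptions.

```typescript
// Hypothetical agent-side triage. recommended_action values are from the README;
// the boolean safety_flag and the bucket names are assumptions.
type RecommendedAction = "keep" | "tag" | "block";

interface Annotated {
  url: string;
  recommended_action: RecommendedAction;
  safety_flag: boolean;
}

interface Triage {
  use: Annotated[];           // cite freely
  useWithCaveat: Annotated[]; // cite, but state why the source is weaker
  setAside: Annotated[];      // retained for transparency, not cited
  unsafe: Annotated[];        // safety is judged independently of the tier
}

function triage(results: Annotated[]): Triage {
  const t: Triage = { use: [], useWithCaveat: [], setAside: [], unsafe: [] };
  for (const r of results) {
    // Safety is checked first and does not depend on recommended_action.
    if (r.safety_flag) {
      t.unsafe.push(r);
      continue;
    }
    switch (r.recommended_action) {
      case "keep": t.use.push(r); break;
      case "tag": t.useWithCaveat.push(r); break;
      case "block": t.setAside.push(r); break;
    }
  }
  return t;
}
```

Keeping "block" results in a set-aside bucket, rather than dropping them, preserves the no-pre-filter property on the agent side as well.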

What sift won't do

  • In-process fine-tuning — prompt refinement is offline, informed by the observation log
  • Pre-filter below a threshold — opacity defeats the entire design
  • Generate sources that aren't in the SERP — diagnostic, not generative
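Offline prompt refinement starts from the observation log. As an illustration only, the sketch assumes a hypothetical log format of one JSON object per line with at least `url` and `tier` fields. It lists URLs that received more than one tier across logged runs, which is one simple drift signal worth reviewing.

```typescript
// Minimal drift check over a classification log. The JSON-lines format with
// `url` and `tier` fields is a hypothetical assumption, not sift's log schema.
interface LogEntry {
  url: string;
  tier: string;
}

// Returns URL -> distinct tiers, for URLs classified into more than one tier.
function findDrift(jsonl: string): Record<string, string[]> {
  const tiersByUrl: Record<string, string[]> = {};
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue; // tolerate blank lines
    const entry = JSON.parse(line) as LogEntry;
    if (!(entry.url in tiersByUrl)) tiersByUrl[entry.url] = [];
    const seen = tiersByUrl[entry.url];
    if (seen.indexOf(entry.tier) === -1) seen.push(entry.tier);
  }
  const drifted: Record<string, string[]> = {};
  for (const url in tiersByUrl) {
    if (tiersByUrl[url].length > 1) drifted[url] = tiersByUrl[url];
  }
  return drifted;
}

const sampleLog = [
  '{"url":"https://x.example/a","tier":"regulated_primary"}',
  '{"url":"https://x.example/a","tier":"content_farm"}',
  '{"url":"https://x.example/b","tier":"content_farm"}',
].join("\n");
console.log(findDrift(sampleLog));
```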

Positioning

sift is built for agents that need trust-aware search, not for humans browsing results. If a binary block/keep decision is enough for your use case, a Chrome extension handles it without a server. sift's value shows up when the receiver of the results is a reasoning system: one that can weigh a vendor blog differently from a primary source, provided the classification is visible to it.

| | Hosted (Tavily / Exa) | sift |
|---|---|---|
| Ranking / filter logic | Proprietary, opaque scores | Open, in source |
| Per-result quality classification | None (Exa returns content excerpts / summaries; neither returns a tier label) | quality_vector + signals[] + reason |
| SERP-level metrics | None | aggregate_vector (tier distribution, diversity entropy, vendor dominance) |
| Agent guidance (output) | None | summary_hints[]: meta-observations agents must incorporate |
| Filter customization | Domain / date / topic filters only | Full (prompt, tier policy, recommend policy) |
| Model choice | Fixed to vendor (Tavily /research offers mini/pro tiers; no BYO-model on either) | BYO: any OpenAI-compatible endpoint |
| Lock-in | API contract | OSS, no lock-in |

Quick start

```shell
git clone https://github.com/enmito/sift-mcp.git
cd sift-mcp
npm install
cp .env.example .env   # fill in BRAVE_API_KEY and LLM_API_KEY
npm run build
```

Register with your MCP client:

```json
{
  "mcpServers": {
    "sift": {
      "command": "node",
      "args": ["/absolute/path/to/sift-mcp/dist/index.js"],
      "env": {
        "BRAVE_API_KEY": "...",
        "LLM_API_KEY": "...",
        "LLM_JUDGE_ENABLED": "true",
        "LLM_ENDPOINT": "https://openrouter.ai/api/v1",
        "LLM_MODEL": "meta-llama/llama-3.3-70b-instruct"
      }
    }
  }
}
```

Restart the client. The search_vectorized tool becomes available.
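The LLM_ENDPOINT, LLM_MODEL, and LLM_API_KEY settings point sift at an OpenAI-compatible endpoint. To illustrate what that implies (this is not sift's code), a judge request built from SERP metadata alone might look like the sketch below. The `/chat/completions` path, bearer-token header, and `model` / `messages` body follow the OpenAI-compatible chat API. `buildJudgeRequest` and the prompt wording are hypothetical.

```typescript
// Sketch of an OpenAI-compatible chat-completions request for classifying one
// SERP result. Only URL, title and description are sent (SERP metadata only).
// buildJudgeRequest and the system prompt text are hypothetical.
interface SerpItem {
  url: string;
  title: string;
  description: string;
}

interface JudgeRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildJudgeRequest(endpoint: string, model: string, apiKey: string, item: SerpItem): JudgeRequest {
  const base = endpoint.replace(/\/+$/, ""); // tolerate a trailing slash
  return {
    url: `${base}/chat/completions`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model,
      messages: [
        { role: "system", content: "Classify the search result into a quality tier and explain why." },
        { role: "user", content: JSON.stringify(item) },
      ],
    }),
  };
}
```

On Node 18 or later the request can be sent with `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`. Parsing the reply is left out because its format depends on sift's actual prompt.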

Full install, configuration, and concept documentation: https://enmito.github.io/sift-mcp/guides/getting-started/

Writing an agent that calls sift? See AGENTS.md for the downstream contract — how to read the output, when to re-query, and use-case patterns.
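As a rough illustration of the "incorporate verbatim" rule, an agent might place summary_hints ahead of the results when building its working context, so that the hints frame how each source is read. The response shape and `buildContext` below are assumptions; AGENTS.md is the actual contract.

```typescript
// Hypothetical consumer of a search_vectorized response. summary_hints are
// copied verbatim; each result follows with its tier and reason.
// The response shape is an assumption, not the documented schema.
interface SiftResponse {
  results: { url: string; title: string; quality_vector: { tier: string; reason: string } }[];
  summary_hints: string[];
}

function buildContext(resp: SiftResponse): string {
  const lines: string[] = [];
  for (const hint of resp.summary_hints) lines.push(`NOTE: ${hint}`); // verbatim hint text
  for (const r of resp.results) {
    lines.push(`- [${r.quality_vector.tier}] ${r.title} <${r.url}>: ${r.quality_vector.reason}`);
  }
  return lines.join("\n");
}
```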


Contributing

Bug reports, backend implementations, and prompt improvements are welcome. See CONTRIBUTING.md for scope, development setup, and the conventions around prompt / taxonomy changes.

License

MIT.

About

MCP server that wraps Brave Search and augments each result with a transparent quality vector for AI agents.
