A transparent search-result vectorizer for AI agents.
sift wraps Brave Search and assigns every result a structured quality vector — a 9-level tier, editorial standards, commercial intent, authoritative weight, and signals. Downstream agents receive the augmented SERP plus aggregate metrics so they can reason about source quality, not just content.
The prompt, the tier definitions, and the reasons behind every classification are in source and in the output. sift does not pre-filter — it surfaces what's there and lets the agent decide.
Drop-in MCP server — works with Claude Desktop, Claude Code, Cursor, Windsurf, Zed, or any framework that speaks MCP.
📖 Full documentation: https://enmito.github.io/sift-mcp/
sift's design is deliberately shaped around specific, recurring weaknesses of agent-driven web search. Each feature maps to one failure mode — see full mapping. Headline cases:
- Vocabulary mismatch: general-phrased queries (`what makes a good leader`) return vendor content; academic phrasing of the same concept returns peer-reviewed sources. sift's `mean_authoritative_weight` makes this visible so the agent can re-query.
- Structural vendor dominance: for SaaS-operational queries, the entire SERP is commercial thought leadership with nothing to triangulate against. sift fires a specific summary hint so the agent doesn't treat VC blog posts as research.
- Affiliate listicle contamination: even reputable publishers produce "Best X" commercial articles. sift classifies the specific article, not the brand.
- TLD-anchored false authority: `.edu` pages include both peer-reviewed papers and degree-program marketing. sift's classifier distinguishes them instead of blanket-trusting the TLD.
- Hosted-search opacity: Tavily / Exa return opaque scores. Every sift classification carries `reason` + `signals[]`, making the decision auditable.
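The re-query behaviour from the vocabulary-mismatch case can be sketched on the agent side. This is a minimal sketch, not sift's implementation: only the name `mean_authoritative_weight` comes from the text above. The `AggregateVector` shape, the assumed 0 to 1 scale, the `0.5` threshold, and the `planNextQuery` and `toAcademicPhrasing` helpers are all hypothetical.

```typescript
// Hypothetical slice of sift's aggregate output; only the field name is from the README.
interface AggregateVector {
  mean_authoritative_weight: number; // assumed to lie in [0, 1]
}

// Hypothetical rewriter; a real agent would more likely ask its own LLM to rephrase.
function toAcademicPhrasing(query: string): string {
  return `${query} peer-reviewed research`;
}

// Returns a rephrased query when the SERP looks weak, or null to keep the current results.
function planNextQuery(
  query: string,
  aggregate: AggregateVector,
  minWeight: number = 0.5 // assumed threshold, tune per use case
): string | null {
  if (aggregate.mean_authoritative_weight >= minWeight) {
    return null;
  }
  return toAcademicPhrasing(query);
}
```

An agent would call `planNextQuery` after each search and stop once it returns `null` or a retry budget runs out, so that a persistently weak SERP does not loop forever.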
- Classifies each SERP result into a 9-tier quality vector (`regulated_primary` → `content_farm`) with reasoning
- Aggregates the SERP into landscape metrics: tier distribution, vendor dominance, diversity entropy
- Surfaces `summary_hints[]`: meta-observations the calling agent incorporates verbatim
- Logs every classification for drift review and prompt refinement
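To make the landscape metrics concrete, here is one way a tier distribution, a diversity entropy, and a dominance share could be computed. It is a sketch under stated assumptions: it uses Shannon entropy in bits, which may differ from sift's actual formula, and the `TierDistribution` type and all three function names are hypothetical. Only the tier labels `regulated_primary` and `content_farm` come from the text above.

```typescript
// Hypothetical representation: tier label mapped to the number of results in that tier.
type TierDistribution = Record<string, number>;

// Count how many results fall into each tier.
function tierDistribution(tiers: string[]): TierDistribution {
  const counts: TierDistribution = {};
  for (const tier of tiers) {
    counts[tier] = (counts[tier] ?? 0) + 1;
  }
  return counts;
}

// Shannon entropy in bits: 0 when one tier holds every result, larger as results spread out.
function diversityEntropy(distribution: TierDistribution): number {
  const counts = Object.keys(distribution).map((tier) => distribution[tier] ?? 0);
  const total = counts.reduce((sum, n) => sum + n, 0);
  if (total === 0) return 0;
  let entropy = 0;
  for (const n of counts) {
    if (n === 0) continue; // empty tiers contribute nothing
    const p = n / total;
    entropy -= p * (Math.log(p) / Math.LN2);
  }
  return entropy;
}

// Share of results whose tier is in `tiers`, e.g. whichever tiers count as vendor content.
function tierShare(distribution: TierDistribution, tiers: string[]): number {
  let total = 0;
  let matched = 0;
  for (const tier of Object.keys(distribution)) {
    const n = distribution[tier] ?? 0;
    total += n;
    if (tiers.indexOf(tier) !== -1) matched += n;
  }
  return total === 0 ? 0 : matched / total;
}
```

A SERP split evenly between two tiers scores 1 bit, while a SERP drawn entirely from one tier scores 0, which is the low-diversity case an agent would want flagged.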
What sift does not do:

- Rank or re-order results: Brave's original ranking is preserved
- Fetch page content: classification uses SERP metadata only (URL, title, description)
- Filter hard by default: `recommended_action` (`keep` / `tag` / `block`) is advisory; the agent decides
- Replace Google Safe Browsing: `safety_flag` is a parallel axis, not a tier
- In-process fine-tuning: prompt refinement is offline, informed by the observation log
- Pre-filter below a threshold: opacity defeats the entire design
- Generate sources that aren't in the SERP: diagnostic, not generative
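Because `recommended_action` is advisory, the filtering decision sits with the calling agent. The sketch below shows one possible agent-side policy. The field names `recommended_action` and `reason` and the three action values come from the text above; the `AdvisedResult` and `AgentDecision` shapes, the `honorBlock` switch, and the `decide` function are hypothetical.

```typescript
type RecommendedAction = "keep" | "tag" | "block";

// Hypothetical per-result shape; sift's real output carries more fields.
interface AdvisedResult {
  url: string;
  recommended_action: RecommendedAction;
  reason: string;
}

interface AgentDecision {
  result: AdvisedResult;
  used: boolean;         // whether the agent will draw on this source
  caveat: string | null; // sift's reason, carried along when the source is not a plain keep
}

// Applies the advice under an agent-chosen policy. Input order is left untouched,
// so the original search ranking survives.
function decide(results: AdvisedResult[], honorBlock: boolean): AgentDecision[] {
  return results.map((result): AgentDecision => {
    const action = result.recommended_action;
    if (action === "keep") {
      return { result, used: true, caveat: null };
    }
    if (action === "tag") {
      return { result, used: true, caveat: result.reason };
    }
    // "block": drop only if the agent's own policy says to honor it.
    return { result, used: !honorBlock, caveat: result.reason };
  });
}
```

Nothing is deleted from the list: a blocked result stays visible with its reason attached, and the agent can still override the advice by passing `honorBlock = false`.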
sift is built for agents that need trust-aware search, not for humans browsing results. If a binary block/keep decision is enough for your use case, a Chrome extension handles it without a server. sift's value shows up when the receiver of the results is a reasoning system — one that can weigh a vendor blog differently from a primary source if only the classification is made visible.
| | Hosted (Tavily / Exa) | sift |
|---|---|---|
| Ranking / filter logic | Proprietary, opaque scores | Open, in source |
| Per-result quality classification | None (Exa returns content excerpts / summaries; neither returns a tier label) | quality_vector + signals[] + reason |
| SERP-level metrics | None | aggregate_vector (tier distribution, diversity entropy, vendor dominance) |
| Agent guidance (output) | None | summary_hints[] — meta-observations agents must incorporate |
| Filter customization | Domain / date / topic filters only | Full (prompt, tier policy, recommend policy) |
| Model choice | Fixed to vendor (Tavily /research offers mini/pro tiers; no BYO-model on either) | BYO: any OpenAI-compatible endpoint |
| Lock-in | API contract | OSS, no lock-in |
git clone https://github.com/enmito/sift-mcp.git
cd sift-mcp
npm install
cp .env.example .env # fill in BRAVE_API_KEY and LLM_API_KEY
npm run build

Register with your MCP client:
{
"mcpServers": {
"sift": {
"command": "node",
"args": ["/absolute/path/to/sift-mcp/dist/index.js"],
"env": {
"BRAVE_API_KEY": "...",
"LLM_API_KEY": "...",
"LLM_JUDGE_ENABLED": "true",
"LLM_ENDPOINT": "https://openrouter.ai/api/v1",
"LLM_MODEL": "meta-llama/llama-3.3-70b-instruct"
}
}
}
}

Restart the client. The `search_vectorized` tool becomes available.
Full install, configuration, and concept documentation: https://enmito.github.io/sift-mcp/guides/getting-started/
Writing an agent that calls sift? See AGENTS.md for the downstream contract — how to read the output, when to re-query, and use-case patterns.
Bug reports, backend implementations, and prompt improvements are welcome. See CONTRIBUTING.md for scope, development setup, and the conventions around prompt / taxonomy changes.
MIT.