An AI-powered /ask endpoint for static sites on Cloudflare Pages. Adds Q&A to any markdown-based site using Cloudflare Workers AI. Compatible with the NLWeb protocol.
- Hybrid search: keyword scoring (with log-scaled occurrence capping) + semantic embeddings
- LLM answers: generates natural language answers grounded in your content, with streaming support
- Source filtering: extracts sources the LLM actually referenced (by URL and title matching), with fallback
- Search weighting: per-document
searchWeightvia frontmatter or content type defaults — no hardcoded type boosts - Conversation context: multi-turn follow-ups via the
prevparameter, with automatic query augmentation for vague follow-ups - Prompt caching: uses Cloudflare's
x-session-affinityheader so follow-up queries reuse cached prompt state - Debug mode: append
?debug=trueto see timing, retrieval scores, index size, and model info - NLWeb protocol: compatible with AI agents that speak NLWeb
- Zero infrastructure: no database, no vector store — everything runs on Cloudflare's free/cheap tiers
functions/
ask.js # Entry point (Cloudflare Pages Function)
_ask/
config.js # Constants, model config, helpers
retrieval.js # Search, scoring, embedding, query augmentation
generation.js # LLM prompting, context building, source extraction
nlweb.config.mjs # Site configuration (edit this)
generate-index.mjs # Build-time index generator
Clone the repo or copy the files directly into your project.
Copy nlweb.config.mjs to your project root and update it:
export default {
site: 'yoursite.com',
siteUrl: 'https://yoursite.com',
siteDescription: 'A blog by You about your topics.',
contentDirs: [
{ dir: 'src/content/blog', type: 'BlogPosting', baseUrl: '/', defaultSearchWeight: 1.0 },
{ dir: 'src/content/pages', type: 'WebPage', baseUrl: '/', defaultSearchWeight: 1.2 },
],
outputDir: 'src/generated',
indexFile: 'nlweb-index',
indexImport: '../src/generated/nlweb-index.mjs',
embeddingModel: '@cf/baai/bge-base-en-v1.5',
chatModel: '@cf/meta/llama-3.3-70b-instruct-fp8-fast',
maxContextChars: 10000,
maxEmbedChars: 2000,
maxTokens: 512,
temperature: 0.3,
maxQueryLength: 500,
aiTimeoutMs: 10000,
queryAliases: [],
typeLabels: {
'WebPage': 'page',
'BlogPosting': 'blog post',
'VideoObject': 'video',
},
};CF_ACCOUNT_ID=your-account-id CF_API_TOKEN=your-api-token node generate-index.mjsOr add it to your build scripts:
{
"scripts": {
"prebuild": "CF_ACCOUNT_ID=xxx CF_API_TOKEN=xxx node generate-index.mjs"
}
}The embeddings are optional -- if you don't set the Cloudflare credentials, the index is generated without embeddings and search falls back to keyword-only.
Copy the entire functions/ directory (including _ask/) into your project root. The import path in functions/ask.js expects nlweb.config.mjs and the generated index at the paths configured above.
In your Cloudflare Pages project settings:
- Go to Settings -> Functions -> Bindings
- Add a Workers AI binding with variable name
AI
Parameters:
| Parameter | Required | Description |
|---|---|---|
q / query |
yes | Natural language query (max 500 chars) |
mode |
no | list (default), summarize, generate, or stream |
site |
no | Site identifier (default: your configured site) |
prev |
no | JSON array of {query, answer} objects for conversation context |
decontextualized_query |
no | Pre-processed query (bypasses the raw query) |
query_id |
no | Request tracking ID (auto-generated if omitted) |
debug |
no | Set to true for timing and retrieval diagnostics |
{
"query_id": "uuid",
"site": "yoursite.com",
"mode": "generate",
"query": "your question",
"results": [
{
"url": "/post-slug/",
"name": "Post Title",
"site": "yoursite.com",
"score": 42,
"description": "Post excerpt...",
"schema_object": { }
}
],
"answer": "The answer with [inline links](https://yoursite.com/post/)...",
"sources": [
{ "url": "https://yoursite.com/post/", "title": "Post Title" }
]
}Returns a text/event-stream with Server-Sent Events:
data: {"token": "The "}
data: {"token": "answer "}
data: {"token": "is..."}
data: {"sources": [...], "done": true}
On error or timeout, the final event includes an error field. If the streamed answer is too short/empty, a fallback answer is sent instead.
list: returns search results only (no AI, fast)summarize: returns results + AI-generated summarygenerate: returns results + AI-generated answer with inline linksstream: returns results via SSE with real-time token streaming (recommended for interactive UIs)
The index generator expects markdown files with frontmatter:
---
title: "Post Title"
publishDate: 2024-01-15 # or `date:`
excerpt: "Optional excerpt"
categories: # or `tags:`
- Category
draft: true # drafts are excluded
searchWeight: 1.5 # optional, overrides the content type default
---Each document gets a searchWeight multiplier applied to its blended (keyword + semantic) score. This replaces hardcoded type-based boosts with a flexible per-document system.
- Set
defaultSearchWeighton each content type in config (e.g., 1.2 for pages, 0.5 for video transcripts) - Override per-document with
searchWeightin frontmatter - Default is
1.0if neither is set
Expand domain-specific abbreviations in search queries by adding regex patterns to queryAliases in config:
queryAliases: [
['\\bwp\\b', 'wordpress'],
['\\bcms share\\b', 'cms market share'],
],Each entry is [regexPattern, replacement]. Patterns are applied with case-insensitive global flags.
- Embeddings (build time): ~100 posts = free tier. Cached, so subsequent builds cost nothing for unchanged content.
- Query embedding (runtime): 1 embedding call per question. Negligible.
- LLM answer (runtime): 1 Llama 3.3 70B call per
generate/summarize/streamrequest. Free tier covers light usage.
MIT