AgentRail is a Cloudflare edge layer that gives known AI agents deterministic Markdown responses from the same URLs humans already visit.
Browser or search crawler -> /pricing -> origin HTML
Known AI agent -> /pricing -> generated Markdown if ready
Known AI agent -> /pricing -> origin HTML if Markdown is unavailable
The crawler runs in the background. Request handling never waits for extraction, so cache misses fall through to the original site without adding generation latency.
When a known AI agent requests a page that is not in KV yet, AgentRail returns the origin page and uses ctx.waitUntil to warm KV from that same origin response. A later AI-agent request can then receive the prepared Markdown.
flowchart TD
browser["Human browser"] --> worker["Cloudflare Worker route"]
search["Search crawler"] --> worker
ai["Known AI agent"] --> worker
worker --> classify{"Classify request"}
classify -->|"Browser, search crawler, unknown bot, asset, or non-GET/HEAD"| origin["Origin website HTML"]
classify -->|"Known AI agent"| kvcheck{"KV record exists?"}
kvcheck -->|"ready or fresh stale"| markdown["Return deterministic Markdown"]
markdown --> headers["text/markdown + x-ai-response-layer"]
kvcheck -->|"missing"| originfetch["Fetch origin HTML"]
originfetch --> firstbot["Return origin HTML to first bot"]
originfetch --> waituntil["ctx.waitUntil warmup"]
waituntil --> extract["Extract deterministic Markdown"]
extract --> store["Store page:<normalized-url> in AGENTRAIL_RESOURCES KV"]
kvcheck -->|"pending, failed, skipped, or too stale"| origin
cron["Cloudflare Cron Trigger"] --> sitemap["Fetch sitemap"]
sitemap --> crawl["Crawl sitemap URLs"]
crawl --> extract
store --> nextbot["Next AI-agent request"]
nextbot --> kvcheck
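The miss-then-warm path in the flow above can be sketched roughly as follows. This is a minimal illustration, not the actual `@agentrail/worker` code: `serveAiAgent`, `resourceKey`, `KvLike`, `CtxLike`, the `fetchOrigin` and `extract` parameters, and the `x-ai-response-layer` header value are all assumptions, and the sketch ignores record status (ready, stale, pending, and so on) for brevity.

```typescript
// Structural stand-ins for the Workers KV and ExecutionContext types,
// so the sketch also runs under plain Node 22.
interface KvLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}
interface CtxLike {
  waitUntil(promise: Promise<unknown>): void;
}

// Hypothetical key builder: the real normalization rules are not shown here.
function resourceKey(url: string): string {
  const u = new URL(url);
  return `page:${u.origin}${u.pathname}`;
}

async function serveAiAgent(
  url: string,
  kv: KvLike,
  ctx: CtxLike,
  fetchOrigin: (url: string) => Promise<Response>,
  extract: (html: string) => string,
): Promise<Response> {
  const key = resourceKey(url);
  const markdown = await kv.get(key);
  if (markdown !== null) {
    return new Response(markdown, {
      headers: {
        "content-type": "text/markdown; charset=utf-8",
        "x-ai-response-layer": "agentrail", // placeholder value (assumption)
      },
    });
  }

  const originResponse = await fetchOrigin(url);
  // Clone first: the client gets the original body, the warmup reads the copy.
  const copy = originResponse.clone();
  // The response is returned immediately; extraction and the KV write finish later.
  ctx.waitUntil(copy.text().then((html) => kv.put(key, extract(html))));
  return originResponse;
}
```

The first call for a URL returns the origin response untouched, and a later call for the same URL reads the Markdown that the background task stored.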
- @agentrail/bot-detector: classifies AI agents, search crawlers, browsers, and unknown bots.
- @agentrail/markdown-extractor: deterministic HTML to Markdown extraction.
- @agentrail/crawler: sitemap parsing, link discovery, resource keys, and crawl processing.
- @agentrail/worker: Cloudflare Worker runtime.
- create-agentrail: scaffold generator for Cloudflare projects.
AgentRail expects Node 22 or newer. Current Wrangler 4 releases require it.
npm test
The repository uses Node's built-in test runner and has no runtime test dependency.
From this repository:
node --import tsx packages/create-agentrail/bin/create-agentrail.ts my-site \
--origin=https://example.com \
'--route=example.com/*' \
--schedule="0 */6 * * *"
The CLI checks Cloudflare through Wrangler and reuses an existing AGENTRAIL_RESOURCES KV namespace if one is present, or creates it automatically if it is missing. When that setup succeeds, the generated project contains a Wrangler-compatible Worker entrypoint and config with the real KV namespace id already written into wrangler.jsonc. If automatic setup is skipped or fails, the config keeps a placeholder and the generated README explains the manual KV setup.
It also runs npm install inside the generated project by default, so the normal next step is deploy:
cd my-site
npm run deploy
AgentRail includes a Cron Trigger for background crawling. On a fresh Cloudflare account, open the Cloudflare dashboard and visit Workers & Pages once before the first deploy. Cloudflare creates the required workers.dev subdomain there. If npm run deploy fails with Cloudflare code: 10063, do that dashboard step and rerun the deploy command.
If you want to generate files only:
node --import tsx packages/create-agentrail/bin/create-agentrail.ts my-site \
--origin=https://example.com \
'--route=example.com/*' \
--skip-install
If you are offline, not logged into Wrangler, or want to wire Cloudflare later:
node --import tsx packages/create-agentrail/bin/create-agentrail.ts my-site \
--origin=https://example.com \
'--route=example.com/*' \
--skip-cloudflare
The generated wrangler.jsonc will contain this placeholder until you add the real KV namespace id:
{
"binding": "AGENTRAIL_RESOURCES",
"id": "replace-with-agentrail-resources-kv-id"
}
If you already have a namespace id:
node --import tsx packages/create-agentrail/bin/create-agentrail.ts my-site \
--origin=https://example.com \
'--route=example.com/*' \
--kv-id=your-kv-namespace-id
Use this when automatic Cloudflare setup was skipped or failed.
First make sure Wrangler is logged in:
npx wrangler login
Check whether the namespace already exists:
npx wrangler kv namespace list --json
If the output includes a namespace with "title": "AGENTRAIL_RESOURCES", copy its "id".
If it does not exist, create it:
npx wrangler kv namespace create AGENTRAIL_RESOURCES
Wrangler prints an id. It may look like this:
id = "abc123..."
Paste that id into wrangler.jsonc:
{
"kv_namespaces": [
{
"binding": "AGENTRAIL_RESOURCES",
"id": "abc123..."
}
]
}
Then deploy:
npm install
npm run deploy
Generated projects are local deployment workspaces. Keep them under projects/; that folder is ignored so your site-specific Cloudflare config does not get committed to the AgentRail source repo.
Copy the example config and edit the route and origin:
cp wrangler.example.jsonc wrangler.jsonc
Follow the manual KV setup above if AGENTRAIL_RESOURCES is not configured yet, then deploy:
npm install
npm run deploy
If this is the first Worker on the Cloudflare account, open Workers & Pages in the Cloudflare dashboard once before deploying so Cloudflare creates the required workers.dev subdomain for cron schedules.
AgentRail only returns Markdown when a stored resource is safe to serve:
- ready: return Markdown.
- stale: return Markdown only inside the configured stale window.
- missing, pending, failed, skipped, or too stale: pass through to origin.
Humans, traditional search crawlers, unknown bots, assets, and non-GET/HEAD requests always pass through to origin. Known AI-agent GET requests with no KV record also schedule a background warmup from the origin response before passing through. That keeps the first miss fast and prepares the next bot request.
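The serving rules above can be written as one pure decision function. This is a sketch under assumptions, not the project's implementation: `decide`, `DecisionInput`, `StoredRecord`, and the `staleSince` field are hypothetical names, and the way the stale window is measured (milliseconds since the record became stale) is a guess at one reasonable reading.

```typescript
type RequestClass = "ai-agent" | "search-crawler" | "browser" | "unknown-bot";
type StoredStatus = "ready" | "stale" | "pending" | "failed" | "skipped";
type Decision = "markdown" | "origin" | "origin-and-warmup";

interface StoredRecord {
  status: StoredStatus;
  staleSince?: number; // hypothetical field: epoch ms when the record became stale
}

interface DecisionInput {
  method: string;
  isAsset: boolean;
  requestClass: RequestClass;
  record: StoredRecord | null; // null means no KV record ("missing")
  now: number; // epoch ms
  staleWindowMs: number;
}

function decide(input: DecisionInput): Decision {
  const method = input.method.toUpperCase();
  if (method !== "GET" && method !== "HEAD") return "origin";
  if (input.isAsset || input.requestClass !== "ai-agent") return "origin";

  const record = input.record;
  if (record === null) {
    // The rules above mention warmup for GET misses only.
    return method === "GET" ? "origin-and-warmup" : "origin";
  }
  if (record.status === "ready") return "markdown";
  if (record.status === "stale") {
    // A stale record without a timestamp is treated as too stale.
    const staleFor = input.now - (record.staleSince ?? 0);
    return staleFor <= input.staleWindowMs ? "markdown" : "origin";
  }
  return "origin"; // pending, failed, skipped
}
```

Keeping the decision free of I/O makes every row of the rule list easy to cover with a unit test under Node's built-in test runner.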
AgentRail treats these user agents as AI-agent traffic by default:
Applebot
GPTBot
ChatGPT-User
OAI-SearchBot
Google-CloudVertexBot
ClaudeBot
Claude-User
Claude-SearchBot
Anthropic-AI
PerplexityBot
Perplexity-User
YouBot
Cohere-AI
Amazonbot
Anchor Browser
Bytespider
Cloudflare Crawler
CCBot
DuckAssistBot
FacebookBot
Manus Bot
Meta-ExternalAgent
Meta-ExternalFetcher
MistralAI-User
Novellum AI Crawl
PetalBot
ProRataInc
TikTok Spider
Timpibot
Googlebot, Bingbot, DuckDuckBot, YandexBot, Baiduspider, archive.org_bot, Arquivo Web Crawler, Terracotta Bot, Slurp, and other traditional search crawlers stay on the origin path.
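As an illustration of how a split like this could work, the sketch below classifies a User-Agent header with case-insensitive substring matching. It is an assumption about the approach, not the `@agentrail/bot-detector` API: `classifyUserAgent` and the three token arrays are hypothetical, the arrays hold only a subset of the names listed above, and the fallback heuristics for browsers and unknown bots are invented for the example.

```typescript
type RequestClass = "ai-agent" | "search-crawler" | "browser" | "unknown-bot";

// Subsets of the lists above, lowercased for case-insensitive matching.
const AI_AGENT_TOKENS = ["gptbot", "chatgpt-user", "oai-searchbot", "claudebot", "perplexitybot", "ccbot"];
const SEARCH_TOKENS = ["googlebot", "bingbot", "duckduckbot", "yandexbot", "baiduspider", "slurp"];
// Assumed heuristic: generic hints that mark an unrecognized automated client.
const GENERIC_BOT_HINTS = ["bot", "crawler", "spider"];

function classifyUserAgent(userAgent: string | null): RequestClass {
  const ua = (userAgent ?? "").toLowerCase();
  if (ua === "") return "unknown-bot";
  // AI agents are checked first so they never fall into a broader bucket.
  if (AI_AGENT_TOKENS.some((token) => ua.includes(token))) return "ai-agent";
  if (SEARCH_TOKENS.some((token) => ua.includes(token))) return "search-crawler";
  if (GENERIC_BOT_HINTS.some((token) => ua.includes(token))) return "unknown-bot";
  // Assumed heuristic: mainstream browsers send a UA that starts with "Mozilla/".
  return ua.startsWith("mozilla/") ? "browser" : "unknown-bot";
}
```

Only the `ai-agent` result is eligible for Markdown; every other class stays on the origin path.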
The basic mode uses:
- Worker routes for request switching.
- Cron Trigger for sitemap crawling.
- KV namespace named AGENTRAIL_RESOURCES for Markdown records.
- Request-time warmup for AI-agent misses.
- Persisted Worker logs through Cloudflare observability.
Cron can crawl sitemap pages directly into KV. A production deployment can add Queues and D1 later, but they are not required for the first useful version.
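For the sitemap step of the cron crawl, a minimal sketch of URL discovery might look like this. `parseSitemapUrls` is a hypothetical helper, not the `@agentrail/crawler` API; it assumes a plain `<urlset>` sitemap and does not handle CDATA sections, namespaced tags, or sitemap index files.

```typescript
// Collect absolute http(s) URLs from <loc> elements, de-duplicated and sorted
// so repeated crawls visit pages in a stable order.
function parseSitemapUrls(xml: string): string[] {
  const urls = new Set<string>();
  const locPattern = /<loc>\s*([^<]+?)\s*<\/loc>/gi;
  for (const match of xml.matchAll(locPattern)) {
    // Sitemaps escape "&" as "&amp;"; other entities are left as-is here.
    const raw = match[1].replace(/&amp;/g, "&");
    try {
      const url = new URL(raw);
      if (url.protocol === "http:" || url.protocol === "https:") {
        urls.add(url.toString());
      }
    } catch {
      // Skip entries that are not absolute URLs.
    }
  }
  return Array.from(urls).sort();
}
```

A scheduled handler could fetch the sitemap, pass the body to a function like this, and then crawl each returned URL into KV.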
Local Wrangler does not run Cron Triggers by itself. AgentRail's dev script uses --test-scheduled, so you can run npm run dev and trigger the crawler manually:
curl "http://localhost:8787/__scheduled?cron=0+*/6+*+*+*"
For deployed Workers, AgentRail enables persisted logs and invocation logs in wrangler.jsonc. Use npm run tail or the Cloudflare dashboard logs view to inspect requests while testing.
Each record stores Markdown with this shape:
# Page Title
Canonical URL: https://example.com/page
Last generated: 2026-06-03T00:00:00.000Z
Source: public HTML
## Description
Meta description or first meaningful paragraph.
## Content
Clean extracted page content.
The extractor preserves source ordering where practical and does not use LLM summarization.
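A renderer for the record shape above could be as small as the sketch below. `PageRecord` and `renderRecord` are hypothetical names rather than the `@agentrail/markdown-extractor` API, and the blank-line placement between blocks is an assumption. The timestamp is passed in instead of read from the clock, so identical input always yields identical output.

```typescript
interface PageRecord {
  title: string;
  canonicalUrl: string;
  generatedAt: Date;
  description: string; // meta description or first meaningful paragraph
  content: string; // already-extracted page content
}

// Pure function: no clock access, no randomness, fixed field order.
function renderRecord(page: PageRecord): string {
  return [
    `# ${page.title.trim()}`,
    "",
    `Canonical URL: ${page.canonicalUrl}`,
    `Last generated: ${page.generatedAt.toISOString()}`,
    "Source: public HTML",
    "",
    "## Description",
    "",
    page.description.trim(),
    "",
    "## Content",
    "",
    page.content.trim(),
    "",
  ].join("\n");
}
```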
Apache-2.0. See LICENSE.