
chore: edge cache headers + agent-aware robots + bot-block middleware #119

Merged
rohitg00 merged 1 commit into main from chore/edge-budget-fixes on Apr 28, 2026
Conversation

@rohitg00 (Owner) commented Apr 28, 2026

Summary

Two Vercel projects are covered: skillkit (Vite marketing site, docs/skillkit/) and skillkit-docs (Next.js fumadocs, docs/fumadocs/).

skillkit (Vite, docs/skillkit/)

  • vercel.json headers: Cache-Control on assets (1d browser, 7d edge, 30d SWR), /assets/* immutable for 1y (Vite hashes filenames), HTML (5min/1d/7d), /api JSON (5min/1d/7d)
  • vercel.json redirects: deflect known training+SEO bots to /robots.txt via has user-agent matcher; source uses negative lookahead so /robots.txt itself never matches (no redirect loop)
  • public/robots.txt: explicit allow + deny lists
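In vercel.json terms, the 1d/7d/30d triple maps onto max-age, s-maxage, and stale-while-revalidate. A minimal sketch of the shape (the source patterns here are assumptions based on the bullets above, not copied from the diff):

```json
{
  "headers": [
    {
      "source": "/assets/(.*)",
      "headers": [
        { "key": "Cache-Control", "value": "public, max-age=31536000, immutable" }
      ]
    },
    {
      "source": "/(.*)\\.(png|svg|woff2)",
      "headers": [
        { "key": "Cache-Control", "value": "public, max-age=86400, s-maxage=604800, stale-while-revalidate=2592000" }
      ]
    }
  ]
}
```

Here 86400 = 1 day (browser), 604800 = 7 days (edge), and 2592000 = 30 days (SWR window).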

skillkit-docs (Next.js fumadocs, docs/fumadocs/)

  • next.config.mjs headers(): /_next/static/* immutable for 1y, assets 1d/7d/30d, /docs/* and / 5min/1d/7d
  • src/middleware.ts: UA allow-list passes through, deny-list returns 403 with Cache-Control: max-age=86400 so the rejection itself is edge-cached
  • public/robots.txt: same agent-aware allow + deny lists
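The headers() block described above can be sketched roughly as follows. Route matchers and exact values here are assumptions reconstructed from the bullet points, not copied from the diff:

```javascript
// Sketch of a next.config.mjs headers() block (illustrative only).
// The real next.config.mjs would end with `export default config`.
/** @type {import('next').NextConfig} */
const config = {
  async headers() {
    return [
      {
        // Hashed build output: safe to cache for a year, immutable
        source: '/_next/static/:path*',
        headers: [
          { key: 'Cache-Control', value: 'public, max-age=31536000, immutable' },
        ],
      },
      {
        // Docs pages: 5min browser, 1d edge, 7d stale-while-revalidate
        source: '/docs/:path*',
        headers: [
          {
            key: 'Cache-Control',
            value: 'public, max-age=300, s-maxage=86400, stale-while-revalidate=604800',
          },
        ],
      },
    ];
  },
};
```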

Why

The Vercel hobby plan hit its edge-request and Web Analytics caps. Per the Vercel usage chart for April:

  • skillkit: 181K edge req
  • skillkit-docs: 85K edge req

docs/skillkit/vercel.json rewrites /_next/* and /docs/* to https://skillkit-docs.vercel.app/..., so every docs pageview fires an edge request on both projects. That compounding is why both projects need cache headers plus bot deflection.
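The rewrite setup that causes the double count looks roughly like this (a sketch of the shape; only the destination host is taken from the PR text):

```json
{
  "rewrites": [
    { "source": "/_next/:path*", "destination": "https://skillkit-docs.vercel.app/_next/:path*" },
    { "source": "/docs/:path*", "destination": "https://skillkit-docs.vercel.app/docs/:path*" }
  ]
}
```

Each request matched here is billed once on skillkit (the proxying project) and once on skillkit-docs (the origin), hence the compounding.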

Bot policy (both projects)

Allowed: Googlebot, Bingbot, DuckDuckBot, Applebot, ChatGPT-User, OAI-SearchBot, PerplexityBot, Perplexity-User, Claude-User, Claude-SearchBot, FirecrawlAgent, Context7Bot, Crawl4AI, Clawdbot, OpenClaw, Hermes — plus default User-agent: * allow.

Disallowed: GPTBot, ClaudeBot, anthropic-ai, CCBot, Google-Extended, Applebot-Extended, Bytespider, Amazonbot, Meta-ExternalAgent, cohere-ai, Diffbot, ImagesiftBot, Omgilibot, peer39_crawler, YouBot, Timpibot, ICC-Crawler, AhrefsBot, SemrushBot, MJ12bot, DotBot, PetalBot, BLEXBot, MegaIndex, SeznamBot, DataForSeoBot.
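In robots.txt form, the policy above reduces to per-agent blocks plus a default allow. An abridged sketch (the real files list every agent named above; the Sitemap URL is a placeholder):

```txt
User-agent: Googlebot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: *
Allow: /

Sitemap: https://<site>/sitemap.xml
```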

Test plan

  • Both Vercel preview builds succeed
  • https://<preview-skillkit>.vercel.app/robots.txt returns expected content
  • https://<preview-skillkit-docs>.vercel.app/robots.txt returns expected content
  • Asset response includes Cache-Control: public, max-age=86400, ...
  • /_next/static/* includes Cache-Control: public, max-age=31536000, immutable
  • curl -A 'SemrushBot/7.0' https://<preview-skillkit> returns 307 to /robots.txt
  • curl -A 'SemrushBot/7.0' https://<preview-skillkit>/robots.txt returns 200 (no loop)
  • curl -A 'SemrushBot/7.0' https://<preview-docs>/docs/... returns 403
  • curl -A 'ChatGPT-User' https://<preview-docs>/docs/... returns 200
  • curl -A 'FirecrawlAgent/1.0' https://<preview-docs>/docs/... returns 200


Summary by CodeRabbit

  • Chores
    • Enhanced site performance through optimized caching configurations for static assets and routes, including long-lived immutable caching for frequently accessed resources
    • Improved search engine optimization with crawler access rules and sitemap directives
    • Implemented request filtering and management for bot access across the platform

Cuts Vercel hobby edge-request burn on both skillkit (Vite static)
and skillkit-docs (Next.js fumadocs) projects. Site stays
human-first AND agent-first — Firecrawl, Context7, Crawl4AI,
OpenClaw, Hermes, ChatGPT-User, Claude-User, PerplexityBot
explicitly allowed; training crawlers and SEO scrapers blocked.

skillkit (Vite, docs/skillkit):
- vercel.json: Cache-Control on assets (1d browser, 7d edge,
  30d SWR; immutable for hashed /assets/*), HTML (5min browser,
  1d edge, 7d SWR), and /api JSON
- redirects with has user-agent: deflect known training+SEO
  bots to /robots.txt; negative lookahead on source prevents
  /robots.txt redirect loop
- public/robots.txt: explicit allow + deny lists

skillkit-docs (Next.js, docs/fumadocs):
- next.config.mjs: headers() block for /_next/static (1y
  immutable), assets (1d/7d/30d), /docs/* and / (5min/1d/7d)
- src/middleware.ts: UA-aware allow/deny pipeline; allowed
  agents pass through, blocked bots get 403 with cache header
  so Vercel edge serves the rejection cheaply
- public/robots.txt: same allow/deny list as marketing site
@vercel Bot commented Apr 28, 2026

The latest updates on your projects.

Project        Deployment   Actions            Updated (UTC)
skillkit       Ready        Preview, Comment   Apr 28, 2026 10:17am
skillkit-docs  Ready        Preview, Comment   Apr 28, 2026 10:17am


@coderabbitai Bot commented Apr 28, 2026

📝 Walkthrough

Walkthrough

Bot and crawler management policies are implemented across two documentation projects via robots.txt files, HTTP caching headers in configuration, and user-agent based request filtering. Fumadocs adds Next.js headers configuration and middleware for filtering, while Skillkit adds Vercel-based caching headers and bot redirects.

Changes

  • Fumadocs Caching Configuration (docs/fumadocs/next.config.mjs): adds a headers() configuration that sets Cache-Control policies for static assets (/_next/static/... immutable), route assets with stale-while-revalidate, and specific routes such as /docs/:path* and the root /.
  • Fumadocs Bot & Crawler Management (docs/fumadocs/public/robots.txt, docs/fumadocs/src/middleware.ts): creates robots.txt with allow/disallow rules for major bots and AI agents; adds Next.js middleware that filters requests by user-agent against allowlist/blocklist regexes, returning 403 for blocked agents.
  • Skillkit Bot & Crawler Management (docs/skillkit/public/robots.txt): creates robots.txt defining crawler access rules for major bots and AI/search agents, with allow/disallow directives and a sitemap reference.
  • Skillkit Caching & Bot Redirects (docs/skillkit/vercel.json): adds global Cache-Control headers for asset types and routes in the Vercel configuration; adds a redirect rule that routes non-robots.txt requests from blocked bots to /robots.txt.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Bots and bunnies, now they know
Which paths are safe and which won't go
With caches set and robots wise,
The web crawls on with fewer sighs
hops away in optimization glee

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: docstring coverage is 0.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (4 passed)
  • Description Check ✅ Passed: check skipped; CodeRabbit's high-level summary is enabled.
  • Title check ✅ Passed: the pull request title accurately and concisely describes the three main changes: edge cache headers, agent-aware robots configuration, and bot-blocking middleware across both projects.
  • Linked Issues check ✅ Passed: check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check ✅ Passed: check skipped because no linked issues were found for this pull request.



@devin-ai-integration Bot left a comment


Devin Review found 2 potential issues.

View 3 additional findings in Devin Review.


import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';

const BLOCK = /GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|Applebot-Extended|Bytespider|Amazonbot|Meta-ExternalAgent|cohere-ai|Diffbot|ImagesiftBot|Omgilibot|peer39_crawler|YouBot|Timpibot|ICC-Crawler|AhrefsBot|SemrushBot|MJ12bot|DotBot|PetalBot|BLEXBot|MegaIndex|SeznamBot|DataForSeoBot/i;


🟡 FacebookBot missing from middleware BLOCK regex despite being in robots.txt Disallow list

The robots.txt at docs/fumadocs/public/robots.txt:78-79 explicitly disallows FacebookBot, but the BLOCK regex in the middleware omits it. This means FacebookBot will pass through the middleware (falling through to the default NextResponse.next() at line 17) and serve content normally, undermining the intended bot-blocking enforcement.

Suggested change
const BLOCK = /GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|Applebot-Extended|Bytespider|Amazonbot|Meta-ExternalAgent|cohere-ai|Diffbot|ImagesiftBot|Omgilibot|peer39_crawler|YouBot|Timpibot|ICC-Crawler|AhrefsBot|SemrushBot|MJ12bot|DotBot|PetalBot|BLEXBot|MegaIndex|SeznamBot|DataForSeoBot/i;
const BLOCK = /GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|Applebot-Extended|Bytespider|Amazonbot|FacebookBot|Meta-ExternalAgent|cohere-ai|Diffbot|ImagesiftBot|Omgilibot|peer39_crawler|YouBot|Timpibot|ICC-Crawler|AhrefsBot|SemrushBot|MJ12bot|DotBot|PetalBot|BLEXBot|MegaIndex|SeznamBot|DataForSeoBot/i;


Comment thread docs/skillkit/vercel.json
{
"source": "/((?!robots\\.txt$).*)",
"has": [
{ "type": "header", "key": "user-agent", "value": "(?i).*(GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|Applebot-Extended|Bytespider|Amazonbot|Meta-ExternalAgent|cohere-ai|Diffbot|ImagesiftBot|Omgilibot|peer39_crawler|YouBot|Timpibot|ICC-Crawler|AhrefsBot|SemrushBot|MJ12bot|DotBot|PetalBot|BLEXBot|MegaIndex|SeznamBot|DataForSeoBot).*" }


🟡 FacebookBot missing from vercel.json redirect user-agent pattern despite being in robots.txt Disallow list

The robots.txt at docs/skillkit/public/robots.txt:78-79 explicitly disallows FacebookBot, but the user-agent regex in the vercel.json redirect rule omits it. This means FacebookBot requests will not be redirected to /robots.txt and will be served content normally, undermining the intended bot-blocking enforcement for the skillkit docs site.

Suggested change
{ "type": "header", "key": "user-agent", "value": "(?i).*(GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|Applebot-Extended|Bytespider|Amazonbot|Meta-ExternalAgent|cohere-ai|Diffbot|ImagesiftBot|Omgilibot|peer39_crawler|YouBot|Timpibot|ICC-Crawler|AhrefsBot|SemrushBot|MJ12bot|DotBot|PetalBot|BLEXBot|MegaIndex|SeznamBot|DataForSeoBot).*" }
{ "type": "header", "key": "user-agent", "value": "(?i).*(GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|Applebot-Extended|Bytespider|Amazonbot|FacebookBot|Meta-ExternalAgent|cohere-ai|Diffbot|ImagesiftBot|Omgilibot|peer39_crawler|YouBot|Timpibot|ICC-Crawler|AhrefsBot|SemrushBot|MJ12bot|DotBot|PetalBot|BLEXBot|MegaIndex|SeznamBot|DataForSeoBot).*" }


@coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/fumadocs/src/middleware.ts`:
- Around line 4-11: The middleware currently tests ALLOW before BLOCK which lets
mixed UAs bypass blocking and also omits FacebookBot from BLOCK; update the
middleware function to first test BLOCK (e.g., if (BLOCK.test(ua)) return
NextResponse.rewrite(.../NextResponse.redirect/NextResponse.next with block
code) so block precedence wins, then test ALLOW afterwards, and add the missing
"FacebookBot" token to the BLOCK RegExp constant so explicit disallowed agents
are caught; keep using the existing symbols BLOCK, ALLOW, middleware,
NextRequest and NextResponse to locate and modify the code.

In `@docs/skillkit/vercel.json`:
- Around line 57-65: The redirect blocklist in the "redirects" array (the object
that has the header with "key": "user-agent" and the "value" regex) is missing
FacebookBot whereas robots.txt disallows it; update the user-agent regex value
to include FacebookBot (add the token "FacebookBot" into the alternation list)
so the redirect that maps to "/robots.txt" will also match and block
FacebookBot, ensuring the header-based redirect and
docs/skillkit/public/robots.txt remain consistent.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ea240d64-f2ab-418e-a058-d460155622f4

📥 Commits

Reviewing files that changed from the base of the PR and between bf6ec90 and e42ae9d.

📒 Files selected for processing (5)
  • docs/fumadocs/next.config.mjs
  • docs/fumadocs/public/robots.txt
  • docs/fumadocs/src/middleware.ts
  • docs/skillkit/public/robots.txt
  • docs/skillkit/vercel.json

Comment on lines +4 to +11
const BLOCK = /GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|Applebot-Extended|Bytespider|Amazonbot|Meta-ExternalAgent|cohere-ai|Diffbot|ImagesiftBot|Omgilibot|peer39_crawler|YouBot|Timpibot|ICC-Crawler|AhrefsBot|SemrushBot|MJ12bot|DotBot|PetalBot|BLEXBot|MegaIndex|SeznamBot|DataForSeoBot/i;

const ALLOW = /Googlebot|Bingbot|DuckDuckBot|Applebot(?!-Extended)|ChatGPT-User|OAI-SearchBot|PerplexityBot|Perplexity-User|Claude-User|Claude-SearchBot|FirecrawlAgent|firecrawl|Context7Bot|Crawl4AI|Clawdbot|OpenClaw|Hermes/i;

export function middleware(req: NextRequest) {
const ua = req.headers.get('user-agent') || '';
if (ALLOW.test(ua)) return NextResponse.next();
if (BLOCK.test(ua)) {


⚠️ Potential issue | 🟠 Major

Blocklist/enforcement mismatch and precedence bug in UA checks.

FacebookBot is disallowed in docs/fumadocs/public/robots.txt (Line 78) but missing from BLOCK (Line 4). Also, Line 10 checks ALLOW before BLOCK, so a mixed UA containing both patterns can bypass blocking.

Suggested fix
-const BLOCK = /GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|Applebot-Extended|Bytespider|Amazonbot|Meta-ExternalAgent|cohere-ai|Diffbot|ImagesiftBot|Omgilibot|peer39_crawler|YouBot|Timpibot|ICC-Crawler|AhrefsBot|SemrushBot|MJ12bot|DotBot|PetalBot|BLEXBot|MegaIndex|SeznamBot|DataForSeoBot/i;
+const BLOCK = /GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|Applebot-Extended|Bytespider|Amazonbot|FacebookBot|Meta-ExternalAgent|cohere-ai|Diffbot|ImagesiftBot|Omgilibot|peer39_crawler|YouBot|Timpibot|ICC-Crawler|AhrefsBot|SemrushBot|MJ12bot|DotBot|PetalBot|BLEXBot|MegaIndex|SeznamBot|DataForSeoBot/i;

 export function middleware(req: NextRequest) {
   const ua = req.headers.get('user-agent') || '';
-  if (ALLOW.test(ua)) return NextResponse.next();
   if (BLOCK.test(ua)) {
     return new NextResponse('disallowed by robots.txt', {
       status: 403,
       headers: { 'Cache-Control': 'public, max-age=86400' },
     });
   }
+  if (ALLOW.test(ua)) return NextResponse.next();
   return NextResponse.next();
 }
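The precedence bug can be reproduced outside the middleware with the two regexes from the diff. The mixed user-agent string here is hypothetical, chosen only because it contains both an allowed token (Googlebot) and a blocked token (GPTBot):

```javascript
// Regexes copied from the PR diff (ALLOW and BLOCK constants).
const ALLOW = /Googlebot|Bingbot|DuckDuckBot|Applebot(?!-Extended)|ChatGPT-User|OAI-SearchBot|PerplexityBot|Perplexity-User|Claude-User|Claude-SearchBot|FirecrawlAgent|firecrawl|Context7Bot|Crawl4AI|Clawdbot|OpenClaw|Hermes/i;
const BLOCK = /GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|Applebot-Extended|Bytespider|Amazonbot|Meta-ExternalAgent|cohere-ai|Diffbot|ImagesiftBot|Omgilibot|peer39_crawler|YouBot|Timpibot|ICC-Crawler|AhrefsBot|SemrushBot|MJ12bot|DotBot|PetalBot|BLEXBot|MegaIndex|SeznamBot|DataForSeoBot/i;

// Hypothetical UA matching both lists.
const mixedUA = 'Mozilla/5.0 (compatible; GPTBot/1.0; +Googlebot)';

// Allow-first (the merged order): the mixed UA sails through.
function allowFirst(ua) {
  if (ALLOW.test(ua)) return 'pass';
  if (BLOCK.test(ua)) return 'block';
  return 'pass';
}

// Block-first (the suggested order): the blocked token wins.
function blockFirst(ua) {
  if (BLOCK.test(ua)) return 'block';
  if (ALLOW.test(ua)) return 'pass';
  return 'pass';
}

console.log(allowFirst(mixedUA)); // "pass" (bypasses the block)
console.log(blockFirst(mixedUA)); // "block"
```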

Comment thread docs/skillkit/vercel.json
Comment on lines +57 to +65
"redirects": [
{
"source": "/((?!robots\\.txt$).*)",
"has": [
{ "type": "header", "key": "user-agent", "value": "(?i).*(GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|Applebot-Extended|Bytespider|Amazonbot|Meta-ExternalAgent|cohere-ai|Diffbot|ImagesiftBot|Omgilibot|peer39_crawler|YouBot|Timpibot|ICC-Crawler|AhrefsBot|SemrushBot|MJ12bot|DotBot|PetalBot|BLEXBot|MegaIndex|SeznamBot|DataForSeoBot).*" }
],
"destination": "/robots.txt",
"permanent": false
}


⚠️ Potential issue | 🟠 Major

FacebookBot is disallowed in robots but not matched in redirect blocklist.

Line 61 omits FacebookBot, while docs/skillkit/public/robots.txt (Line 78) disallows it. This creates policy drift and allows that crawler to bypass this redirect control.

Suggested fix
-        { "type": "header", "key": "user-agent", "value": "(?i).*(GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|Applebot-Extended|Bytespider|Amazonbot|Meta-ExternalAgent|cohere-ai|Diffbot|ImagesiftBot|Omgilibot|peer39_crawler|YouBot|Timpibot|ICC-Crawler|AhrefsBot|SemrushBot|MJ12bot|DotBot|PetalBot|BLEXBot|MegaIndex|SeznamBot|DataForSeoBot).*" }
+        { "type": "header", "key": "user-agent", "value": "(?i).*(GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|Applebot-Extended|Bytespider|Amazonbot|FacebookBot|Meta-ExternalAgent|cohere-ai|Diffbot|ImagesiftBot|Omgilibot|peer39_crawler|YouBot|Timpibot|ICC-Crawler|AhrefsBot|SemrushBot|MJ12bot|DotBot|PetalBot|BLEXBot|MegaIndex|SeznamBot|DataForSeoBot).*" }

@rohitg00 rohitg00 merged commit a9e5c83 into main Apr 28, 2026
10 checks passed
@rohitg00 rohitg00 deleted the chore/edge-budget-fixes branch April 28, 2026 10:34