Releases: lucioduran/ax-audit
v3.6.0 — AI licensing, cloaking detection, crawl efficiency & full documentation
This release consolidates everything since v3.1.0 (v3.2.0 → v3.6.0): four new checks, Content Signals support, parallel batch auditing, fetch retries, a Markdown reporter, and a complete documentation set. ax-audit goes from 15 to 18 checks, and from 229 to 301 tests.
All four new checks are informational in 3.x — they run and report full findings but carry weight 0, so your existing scores and baselines are unchanged. They gain weight in v4.0.
✨ New checks
content-negotiation — Markdown for Agents (v3.2 surface, shipped 3.1; hardened since)
Probes the homepage with Accept: text/markdown — the pattern served by Cloudflare and Vercel and requested by Claude Code, Cursor, and OpenCode (~80% fewer tokens than HTML). Validates the negotiated Content-Type, that the body is real Markdown (not relabeled HTML), Vary: Accept for cache correctness, and reports the size reduction vs HTML. Partial credit for a <link rel="alternate" type="text/markdown"> fallback.
rsl — Really Simple Licensing (v3.3)
Validates RSL 1.0, the machine-readable content-licensing standard endorsed by 1,500+ publishers (Reddit, Yahoo, Medium, O'Reilly). Discovery via all three spec mechanisms — robots.txt License: directive, Link: rel="license" header, and <link rel="license" type="application/rsl+xml">. Document validation: namespace, <content url> requirement, <license> presence, permits/prohibits vocabulary (usage/user/geo), and payment types. Flags pre-1.0 draft tokens with migration hints.
agent-access — Cloaking detection (v3.4)
Probes the homepage with realistic user-agents for the 8 core AI crawlers and compares status and visible-text volume against the baseline. Catches the failure mode invisible to operators: robots.txt allows GPTBot while your WAF returns it a 403 (Cloudflare's "Block AI Crawlers" toggle produces exactly this). Blocks consistent with an explicit robots.txt Disallow are treated as intentional. Includes a verified-bots caveat for WAFs using Web Bot Auth.
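The comparison described above can be sketched as a pure classifier. This is an illustration only, not ax-audit's implementation: the `ProbeResult` shape, the `classifyProbe` name, and the 50% text-volume threshold are all assumptions made for the example.

```typescript
// Hypothetical shapes; ax-audit's real types may differ.
interface ProbeResult {
  status: number; // HTTP status returned to this user-agent
  visibleTextLength: number; // characters of visible text in the body
}

type Verdict = "ok" | "intentional-block" | "cloaked";

// Assumed threshold: flag a crawler that sees under half the baseline text.
const MIN_TEXT_RATIO = 0.5;

function classifyProbe(
  baseline: ProbeResult,
  crawler: ProbeResult,
  robotsDisallows: boolean,
): Verdict {
  const blocked = crawler.status >= 400;
  const thinned =
    baseline.visibleTextLength > 0 &&
    crawler.visibleTextLength / baseline.visibleTextLength < MIN_TEXT_RATIO;

  if (!blocked && !thinned) return "ok";
  // A block that matches an explicit robots.txt Disallow is deliberate.
  if (robotsDisallows) return "intentional-block";
  // robots.txt allows the crawler, yet the server treats it differently.
  return "cloaked";
}
```

Keeping the decision separate from the HTTP probing makes the "robots.txt allows, WAF blocks" case easy to unit-test without a live server.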
crawl-efficiency (v3.5)
Measures the cost of crawling your pages: compression (rewards Brotli, accepts gzip/deflate/zstd), conditional GET (verifies an ETag/Last-Modified validator and that the server answers If-None-Match/If-Modified-Since with a real 304), and response size.
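A conditional GET probe along these lines might look like the sketch below. It is not the check's actual code: `probeConditionalGet`, `FetchLike`, and the result shape are hypothetical, and the fetch function is injected so the logic can be exercised without a network.

```typescript
// Minimal structural types so any fetch-like function can be injected.
type HeaderBag = { get(name: string): string | null };
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> },
) => Promise<{ status: number; headers: HeaderBag }>;

interface ConditionalGetResult {
  hasValidator: boolean; // ETag or Last-Modified present on the first response
  honors304: boolean; // revalidation request answered with 304 Not Modified
}

async function probeConditionalGet(
  url: string,
  fetchFn: FetchLike,
): Promise<ConditionalGetResult> {
  const first = await fetchFn(url);
  const etag = first.headers.get("etag");
  const lastModified = first.headers.get("last-modified");
  if (!etag && !lastModified) return { hasValidator: false, honors304: false };

  // Echo the validators back, as a crawler revisiting the page would.
  const headers: Record<string, string> = {};
  if (etag) headers["If-None-Match"] = etag;
  if (lastModified) headers["If-Modified-Since"] = lastModified;

  const second = await fetchFn(url, { headers });
  return { hasValidator: true, honors304: second.status === 304 };
}
```

A server that advertises an ETag but answers the second request with a full 200 would report `hasValidator: true, honors304: false`, which is the case the check is designed to surface.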
🔧 Improvements
Content Signals Policy in robots-txt (v3.2)
The robots-txt check now parses Content-Signal: directives (contentsignals.org, CC0) — the search / ai-input / ai-train preferences Cloudflare serves by default on 3.8M+ managed domains. Declared signals are reported per User-agent group; malformed segments, unknown names, and out-of-group placement produce warnings. Informational — no score impact.
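To make the warning cases concrete, here is a small parser sketch for the value of a single Content-Signal: line. It assumes the comma-separated `name=yes|no` syntax; the function name, result shape, and warning wording are hypothetical and not taken from ax-audit.

```typescript
const KNOWN_SIGNALS = ["search", "ai-input", "ai-train"] as const;
type SignalName = (typeof KNOWN_SIGNALS)[number];

interface ParsedSignals {
  signals: Partial<Record<SignalName, boolean>>;
  warnings: string[];
}

function isKnownSignal(name: string): name is SignalName {
  return (KNOWN_SIGNALS as readonly string[]).includes(name);
}

// `value` is the text after "Content-Signal:" on one robots.txt line.
function parseContentSignal(value: string): ParsedSignals {
  const result: ParsedSignals = { signals: {}, warnings: [] };
  for (const raw of value.split(",")) {
    const segment = raw.trim();
    if (segment === "") continue;
    const match = /^([a-z-]+)\s*=\s*(yes|no)$/i.exec(segment);
    if (!match) {
      result.warnings.push(`malformed segment: ${segment}`);
      continue;
    }
    const [, rawName = "", rawValue = ""] = match;
    const signal = rawName.toLowerCase();
    if (!isKnownSignal(signal)) {
      result.warnings.push(`unknown signal: ${signal}`);
      continue;
    }
    result.signals[signal] = rawValue.toLowerCase() === "yes";
  }
  return result;
}
```

Out-of-group placement is not covered here; detecting it needs the surrounding User-agent context, which a line-level parser does not see.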
Infrastructure (v3.6)
- Fetch retries with exponential backoff for transient failures (network errors, timeouts, 408/425/429/5xx). --retries <n> (default 2). Previously a single transient timeout scored a check 0.
- Parallel batch auditing via --concurrency <n> and the new BatchOptions type, with order-preserving output. Default remains sequential.
- Markdown reporter: --output markdown for CI logs and PR comments (single + batch). New exports: renderMarkdown, renderBatchMarkdown.
- Added Google's official signed AI-agent user-agent Google-Agent (agent.bot.goog) to the known-crawler list.
- CLI now validates --retries, --concurrency, and --output.
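The retry behaviour in the first item can be sketched roughly as follows. The retryable statuses and the default of 2 retries come from the notes above; the doubling schedule, the 250 ms base delay, and the names `withRetries`, `isRetryableStatus`, and `backoffDelayMs` are assumptions for illustration.

```typescript
// Statuses treated as transient in the notes above: 408/425/429/5xx.
function isRetryableStatus(status: number): boolean {
  return (
    status === 408 ||
    status === 425 ||
    status === 429 ||
    (status >= 500 && status <= 599)
  );
}

// Assumed schedule: the base delay doubles on each retry.
function backoffDelayMs(attempt: number, baseMs: number): number {
  return baseMs * 2 ** attempt;
}

async function withRetries<T extends { status: number }>(
  attemptFn: () => Promise<T>,
  retries = 2,
  baseMs = 250,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      const res = await attemptFn();
      // Return on success, or when no retries are left.
      if (!isRetryableStatus(res.status) || attempt === retries) return res;
    } catch (err) {
      // Network errors and timeouts surface as rejections.
      lastError = err;
      if (attempt === retries) throw err;
    }
    await new Promise((resolve) =>
      setTimeout(resolve, backoffDelayMs(attempt, baseMs)),
    );
  }
  throw lastError; // not reached; satisfies the return type
}
```

With `retries = 2` a request is attempted at most three times, so one transient timeout no longer decides a check's score on its own.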
📚 Documentation
A complete documentation set under docs/, shipped in the npm package and mirrored at lucioduran.com/projects/ax-audit/docs:
- getting-started — first audit, reading the report, impact-ordered remediation, baselines
- concepts — the AX standards landscape (discovery, interaction, governance/licensing, transport)
- checks — exact per-finding scoring for all 18 checks
- cli / api / ci / architecture / faq — full reference, with an API-stability policy
- New CONTRIBUTING.md and SECURITY.md
Every new finding has a matching remediation guide at /projects/ax-audit/guides.
🐛 Fixes
- Scorer division by zero: running only weight-0 checks (e.g. --checks rsl) returned NaN; now falls back to a plain average.
⚙️ Compatibility
- No breaking changes. New checks are informational (weight 0); scores and baselines are unchanged from 3.1.x. Retries can raise scores on flaky endpoints that previously timed out, but the scoring model itself is unchanged.
📦 Install
npx ax-audit@3.6.0 https://your-site.com
Full changelog: v3.1.0...v3.6.0
v3.1.0 — Markdown for Agents: content-negotiation check (15 checks)
Added — content-negotiation check (informational)
- content-negotiation (weight 0 in 3.x): probes the homepage with Accept: text/markdown to detect Markdown for Agents support, the content-negotiation pattern implemented by Cloudflare and Vercel and requested by Claude Code, Cursor, and OpenCode. Markdown cuts token usage by ~80% vs HTML for the same content.
  - Validates the negotiated Content-Type (text/markdown).
  - Detects relabeled HTML documents masquerading as Markdown (−25).
  - Validates Vary: Accept so shared caches and CDNs keep the HTML and Markdown representations apart (−15 when missing; accepts Vary: *).
  - Reports the size reduction vs the HTML representation (informational).
  - Partial credit (40) when negotiation is unsupported but a <link rel="alternate" type="text/markdown"> fallback is advertised.
  - Distinguishes HTTP 406 from plain "ignores Accept" in the failure detail.
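The penalties listed above could combine along the lines of this sketch. The −25, −15, and 40 values are from the list; the starting score of 100, the score of 0 when nothing is supported, the HTML-detection heuristic, and every name in the block are assumptions rather than the check's actual logic.

```typescript
// Hypothetical input shape for one Accept: text/markdown probe.
interface MarkdownProbe {
  contentType: string | null; // Content-Type of the negotiated response
  vary: string | null; // Vary header of the negotiated response
  body: string;
  hasAlternateLink: boolean; // <link rel="alternate" type="text/markdown"> seen in the HTML
}

// Assumed heuristic: a body that opens with an HTML doctype or <html> tag.
function looksLikeHtml(body: string): boolean {
  return /^\s*(<!doctype html|<html[\s>])/i.test(body);
}

function varyCoversAccept(vary: string | null): boolean {
  if (!vary) return false;
  return vary
    .split(",")
    .map((token) => token.trim().toLowerCase())
    .some((token) => token === "*" || token === "accept");
}

function scoreNegotiation(probe: MarkdownProbe): number {
  const negotiated = (probe.contentType ?? "")
    .toLowerCase()
    .startsWith("text/markdown");
  // Unsupported negotiation: partial credit only for an advertised fallback.
  if (!negotiated) return probe.hasAlternateLink ? 40 : 0;

  let score = 100; // assumed starting score
  if (looksLikeHtml(probe.body)) score -= 25; // relabeled HTML
  if (!varyCoversAccept(probe.vary)) score -= 15; // cache-correctness risk
  return score;
}
```

Under these assumptions, a response labeled text/markdown that is really an HTML document and lacks Vary: Accept would score 60.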
Added — per-request fetch headers
- CheckContext.fetch now accepts an optional { headers } argument (new exported type: FetchOptions). Custom headers merge case-insensitively over the defaults, so a custom Accept replaces the default instead of being sent alongside it.
- The in-memory request cache now keys on URL + normalized (lowercased, sorted) headers, mirroring Vary semantics on the wire, so the HTML and Markdown probes of the same URL never collide.
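One way to build such a cache key is sketched below. The `cacheKey` name and the exact key layout are hypothetical; this version lowercases header names only and leaves values untouched, which is an assumption about what "normalized" covers.

```typescript
// Build a cache key from the URL plus headers, ignoring header-name case
// and insertion order.
function cacheKey(url: string, headers: Record<string, string> = {}): string {
  const normalized = Object.entries(headers)
    .map(([headerName, value]) => [headerName.toLowerCase(), value] as const)
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return `${url}\n${JSON.stringify(normalized)}`;
}
```

Two probes of the same URL with different Accept values produce different keys, while `{ Accept: ... }` and `{ accept: ... }` map to the same entry.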
Fixed
- Scorer division by zero: calculateOverallScore returned NaN when every selected check had weight 0 (e.g. --checks content-negotiation). It now falls back to a plain average, and returns 0 for empty input.
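The fallback can be illustrated with a small weighted-average sketch. `overallScore` and `CheckScore` are hypothetical names for this example, not the library's calculateOverallScore, and any rounding the real scorer applies is left out.

```typescript
interface CheckScore {
  score: number; // 0-100 result of one check
  weight: number; // 0 for informational checks
}

function overallScore(checks: CheckScore[]): number {
  if (checks.length === 0) return 0; // empty input

  const totalWeight = checks.reduce((sum, c) => sum + c.weight, 0);
  if (totalWeight === 0) {
    // Every selected check is informational: a weighted mean would be 0/0,
    // so fall back to a plain average.
    return checks.reduce((sum, c) => sum + c.score, 0) / checks.length;
  }
  return checks.reduce((sum, c) => sum + c.score * c.weight, 0) / totalWeight;
}
```

In a mixed run, weight-0 checks contribute nothing to either sum, which is why adding informational checks leaves existing scores unchanged.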
Scoring
- The new check is informational in 3.x: it runs and reports findings but does not affect the overall score, so existing scores and baselines are unchanged. It will gain weight in v4.0, consistent with treating score-affecting changes as breaking (see v3.0.0).
Tests
- 229 tests total (31 new): content-negotiation (19), fetcher integration against a real local HTTP server (9), and scorer coverage for weight-0 checks (3).
Try it:
npx ax-audit@3.1.0 https://your-site.com --checks content-negotiation
v3.0.0 — Full agent-optimization coverage (14 checks)
Added — five new checks (full agent-optimization coverage)
- html-rendering (weight 9%): detects whether the static HTML response actually contains content, since most AI crawlers (GPTBot, ClaudeBot, CCBot, …) do not execute JavaScript. Heuristics: text length, word count, text-to-markup ratio, empty SPA mount points (#root, #__next, #__nuxt, #app, #svelte, #gatsby), semantic landmarks (<main>, <article>, <header>, <footer>, <nav>), single <h1>, <noscript> fallback, and <img alt> coverage.
- sitemap (weight 4%): locates the sitemap via robots.txt Sitemap: directive or /sitemap.xml, validates XML shape, parses <urlset> and <sitemapindex>, samples child sitemaps from indexes, scores <lastmod> coverage and freshness (>365d → stale), enforces 50k-URL / 50MB limits.
- seo-basics (weight 7%): <title> length 20–70, <meta name="description"> length 70–160, <link rel="canonical"> (absolute, single), <html lang> (BCP 47), <meta charset="utf-8">, <meta name="viewport">, hreflang completeness with x-default. Title/description duplication detection.
- tls-https (weight 5%): site is served over HTTPS, HTTP redirects to HTTPS, HSTS max-age >= 6 months (1 year for preload), includeSubDomains, preload directive eligibility per https://hstspreload.org.
- well-known-ai (weight 3%): emerging AI-specific discovery files: /.well-known/ai.txt (Spawning), /.well-known/genai.txt, /ai-plugin.json (legacy ChatGPT plugin), /agents.json (Wildcard / OpenAgents), /.well-known/nlweb.json (Microsoft NLWeb). Each present file scores; coverage is bonus rather than baseline.
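As an example of the tls-https criteria, the HSTS portion might be evaluated like this. It is a header-only sketch: `inspectHsts` and `HstsReport` are hypothetical names, "6 months" is read as 180 days, and the redirect requirements on hstspreload.org are not modeled.

```typescript
const SIX_MONTHS_SECONDS = 15552000; // 180 days; an assumed reading of "6 months"
const ONE_YEAR_SECONDS = 31536000; // 365 days, the preload minimum

interface HstsReport {
  maxAge: number | null;
  includeSubDomains: boolean;
  preload: boolean;
  longEnough: boolean; // max-age of at least 6 months
  preloadEligible: boolean; // header-level preload criteria only
}

// `headerValue` is the raw Strict-Transport-Security header value.
function inspectHsts(headerValue: string): HstsReport {
  const directives = headerValue
    .split(";")
    .map((d) => d.trim().toLowerCase())
    .filter((d) => d !== "");

  let maxAge: number | null = null;
  for (const directive of directives) {
    const match = /^max-age\s*=\s*"?(\d+)"?$/.exec(directive);
    if (match) maxAge = Number(match[1]);
  }

  const includeSubDomains = directives.includes("includesubdomains");
  const preload = directives.includes("preload");
  return {
    maxAge,
    includeSubDomains,
    preload,
    longEnough: maxAge !== null && maxAge >= SIX_MONTHS_SECONDS,
    preloadEligible:
      preload &&
      includeSubDomains &&
      maxAge !== null &&
      maxAge >= ONE_YEAR_SECONDS,
  };
}
```

A header such as `max-age=15552000; includeSubDomains` would pass the six-month threshold but not the preload criteria.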
Improved — existing checks
- meta-tags: now validates Open Graph completeness (og:title, og:description, og:url, og:type, og:image, og:site_name) and Twitter Card completeness (twitter:card, twitter:title, twitter:description, twitter:image). Reuses shared HTML utilities for tag matching.
- agent-json: validates the url field is absolute and matches the audited origin, and that every skills[] entry has both id and description.
- llms-txt / agent-json / mcp / openapi: validate Content-Type of the fetched resource (text/plain / text/markdown for llms.txt; application/json for the JSON manifests). Penalty: −5 per mismatch.
- robots-txt: CORE_AI_CRAWLERS extended (now 8 entries: GPTBot, ClaudeBot, ChatGPT-User, Claude-SearchBot, Google-Extended, PerplexityBot, OAI-SearchBot, CCBot). ALL_AI_CRAWLERS extended with MistralAI-User, KagiBot, GeminiBot, Goose, AwarioBot family, Bingbot, ImagesiftBot, omgili, Webzio-Extended, and others (47 known crawlers total).
Refactored
- New shared module src/checks/html-utils.ts with regex-based primitives for HTML inspection (getMetaContent, findLinkTags, findMetaTagsByPrefix, extractVisibleText, countExecutableScripts, getTagAttribute, …). Eliminates duplicated regex code across meta-tags, seo-basics, html-rendering, and structured-data.
- New shared utility checkContentType in src/checks/utils.ts for consistent Content-Type validation.
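A Content-Type validation helper of this kind might look like the sketch below. The signature of the real checkContentType is not shown in these notes, so `mediaType` and `contentTypePenalty` are hypothetical stand-ins; only the 5-point penalty per mismatch comes from the changelog.

```typescript
// Extract the bare media type, dropping parameters such as charset.
function mediaType(contentType: string | null): string {
  const [bare = ""] = (contentType ?? "").split(";");
  return bare.trim().toLowerCase();
}

// Returns the points to subtract: 0 on a match, 5 on a mismatch.
function contentTypePenalty(
  contentType: string | null,
  accepted: string[],
): number {
  return accepted.includes(mediaType(contentType)) ? 0 : 5;
}
```

For llms.txt the accepted list would be text/plain and text/markdown; for the JSON manifests it would be application/json.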
Scoring
- Weights redistributed across 14 checks, total still sums to 100. New highest-weight signals are llms-txt and robots-txt (11% each) followed by html-rendering / structured-data / http-headers (9%).
Tests
- 198 tests total (77 new). New suites: html-rendering (14), sitemap (12), seo-basics (19), tls-https (11), well-known-ai (8). Plus expanded meta-tags / agent-json / mcp / openapi / llms-txt suites for the new validations.
Breaking
- Score deltas vs v2.x are expected on the same site because (a) weights were redistributed across 14 checks instead of 9, and (b) Content-Type validation on /llms.txt and the .well-known JSON manifests now applies a −5 penalty per mismatch. Sites previously scoring 100 may drop a few points until the new signals are addressed. Use --baseline to track regressions explicitly.
v2.4.0 — Baseline Comparison
Baseline Comparison Mode
Track AX score changes over time by saving baselines and comparing against them in subsequent runs.
New CLI flags
- --save-baseline <path>: save audit results as a baseline JSON file
- --baseline <path>: compare against a previous baseline, show per-check score deltas (▲/▼)
- --fail-on-regression <points>: exit with code 1 if any check regresses by more than N points
Works with all output formats (terminal, JSON, HTML).
CI/CD usage
# Save baseline on main branch
- run: npx ax-audit https://your-site.com --save-baseline .ax-baseline.json
# Gate PRs on regressions
- run: npx ax-audit https://your-site.com --baseline .ax-baseline.json --fail-on-regression 5
Programmatic API
New exports: saveBaseline(), loadBaseline(), diffBaseline(), toBaselineData() with full TypeScript types (BaselineData, BaselineDiff, CheckDiff).
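The regression gate can be sketched independently of the library. The helpers below are hypothetical and do not reproduce the signatures of diffBaseline() or the BaselineDiff type; they only illustrate per-check deltas and the "regresses by more than N points" rule.

```typescript
type Scores = Record<string, number>; // check id -> score

interface Delta {
  check: string;
  before: number;
  after: number;
  delta: number; // positive means improvement, negative means regression
}

function diffScores(baseline: Scores, current: Scores): Delta[] {
  const deltas: Delta[] = [];
  for (const [check, after] of Object.entries(current)) {
    const before = baseline[check];
    if (before === undefined) continue; // check absent from the baseline
    deltas.push({ check, before, after, delta: after - before });
  }
  return deltas;
}

// True when any check dropped by MORE than `threshold` points.
function hasRegression(deltas: Delta[], threshold: number): boolean {
  return deltas.some((d) => d.delta < -threshold);
}
```

With a threshold of 5, a check that falls from 90 to 85 passes the gate, while a fall to 84 fails it.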
Other
- 15 new tests (total: 121)
- Fixed test runner glob that was silently skipping root-level test files
v2.2.1
- Improve type safety, remove duplication, fix nullish coalescing
- Suggest ax-init when score < 100
- Add remediation guide links to all findings
- Add green electric logo to README
- Fix: format robots-txt.ts to pass Prettier check
v2.1.0
- Add remediation hints to all audit findings
v2.0.0
- Add HTML reporter with score gauge and dark mode
- Update version to 2.0.0
v1.14.0
- Minor internal improvements and version bump
v1.13.0
- Add batch URLs support for auditing multiple sites in one run
v1.12.0
- Fix Link header parsing with proper RFC 5988 parser