From fe5472d10eb1dd36e6f52ac3f8e89060142fa1e7 Mon Sep 17 00:00:00 2001
From: Bissbert <43237892+Bissbert@users.noreply.github.com>
Date: Mon, 18 May 2026 10:12:11 +0700
Subject: [PATCH] chore(seo): commit SEO v2.5/v3 audit trail and trends script
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds the SEO audit reports that drove the v2 and v3 plans, the v3
validation log capturing build assertions and Wave A/B outcomes, and a
free-tier Google Trends fetcher for the five hub head terms.

- audits/seo-2026-05/{README,content,geo,keywords,schema,sitemap,technical}.md:
  six-agent audit reports (sitemap, indexability, technical, schema,
  keywords/clusters, GEO/AI-citability) that fed the v2 and v3 plans.
- audits/seo-2026-05/v3-validation.md: cumulative Wave A/B verification
  log (citation cleanup, KNOWLEDGE_VERSION bump, hub-page assertions).
- audits/seo-2026-05/scripts/trends.py: PEP 723 pytrends script that
  fetches interest-over-time, by-region, and related-queries for the
  five hub head terms; writes CSV+JSON to output/ for retrospective
  priority validation against the keyword landscape.
- .gitignore: add .tmp/, .trees/, .claude/ to prevent future accidental
  staging of local-only directories.
---
.gitignore | 5 +
audits/seo-2026-05/README.md | 66 +++++++
audits/seo-2026-05/content.md | 86 +++++++++
audits/seo-2026-05/geo.md | 185 +++++++++++++++++++
audits/seo-2026-05/keywords.md | 145 +++++++++++++++
audits/seo-2026-05/schema.md | 247 +++++++++++++++++++++++++
audits/seo-2026-05/scripts/trends.py | 262 +++++++++++++++++++++++++++
audits/seo-2026-05/sitemap.md | 122 +++++++++++++
audits/seo-2026-05/technical.md | 74 ++++++++
audits/seo-2026-05/v3-validation.md | 125 +++++++++++++
10 files changed, 1317 insertions(+)
create mode 100644 audits/seo-2026-05/README.md
create mode 100644 audits/seo-2026-05/content.md
create mode 100644 audits/seo-2026-05/geo.md
create mode 100644 audits/seo-2026-05/keywords.md
create mode 100644 audits/seo-2026-05/schema.md
create mode 100644 audits/seo-2026-05/scripts/trends.py
create mode 100644 audits/seo-2026-05/sitemap.md
create mode 100644 audits/seo-2026-05/technical.md
create mode 100644 audits/seo-2026-05/v3-validation.md
diff --git a/.gitignore b/.gitignore
index f6f8335..2b877a7 100644
--- a/.gitignore
+++ b/.gitignore
@@ -39,3 +39,8 @@ public/minerals.db
# Synced content from gemmology-knowledge
.cache/
src/content/learn/
+
+# Local-only directories
+.tmp/
+.trees/
+.claude/
diff --git a/audits/seo-2026-05/README.md b/audits/seo-2026-05/README.md
new file mode 100644
index 0000000..61667de
--- /dev/null
+++ b/audits/seo-2026-05/README.md
@@ -0,0 +1,66 @@
+# SEO Investigation 2026-05 — gemmology.dev vs knowledge.gemmology.dev
+
+**Question:** Why is gemmology.dev indexed worse than knowledge.gemmology.dev despite being the better site? And how do we optimise SEO for gemmology-study and gemstone-research search intent?
+
+**Method:** Parallel agent dispatch — six specialist SEO agents (technical, content, GEO, schema, keyword research, sitemap) audited the site against the docs subdomain and against prior audit `audits/T7c-seo.md`.
+
+## Files in this directory
+
+| File | Owner | Findings |
+|------|-------|----------|
+| `technical.md` | seo-technical | Crawl/render/index gap analysis |
+| `content.md` | seo-content | Thin-page risk on /tools, /quiz, /playground; E-E-A-T gaps |
+| `geo.md` | seo-geo | llms.txt structure + AI citability; the knowledge subdomain has no robots.txt |
+| `schema.md` | seo-schema | Missing Course/Quiz/SoftwareApplication JSON-LD |
+| `keywords.md` | seo-dataforseo | 35 seed terms — heuristic (no DataForSEO MCP) |
+| `sitemap.md` | seo-sitemap | 442 of 910 sitemap URLs are OG image templates |
+
+## Root cause of the indexing gap (convergent finding)
+
+Three converging issues — not a content quality problem:
+
+1. **Sitemap pollution.** 442 of 910 sitemap URLs are `/og/...` image-template pages (1200×630 social cards). Google sees a sitemap where 48% of entries are thin near-duplicate pages, halving effective crawl budget. knowledge.gemmology.dev has no such junk.
+2. **Empty shells for the high-value surfaces.** `/tools/`, `/quiz/`, `/playground/`, `/tools/optical`, `/tools/lab`, etc. ship as `client:load` React islands. First-byte HTML for these pages is a near-empty shell at best, often nothing on the hub. knowledge.gemmology.dev serves Markdown→HTML with 300–800 words of indexable prose per page.
+3. **Knowledge-subdomain advantage is purely structural.** Same authors, same domain authority, but the docs subdomain ships content as plain HTML and gemmology.dev ships it as JavaScript. Content quality is not the gap. Crawlability is.
+
+Secondary: knowledge.gemmology.dev itself has **no robots.txt and no llms.txt**. It indexes well in classical search but gives AI crawlers no guidance. The fastest GEO win is adding those two files to the docs subdomain.
+
+## Prioritised fix list
+
+### P0 — Indexing gap closers (1–2 days of work)
+
+1. **`astro.config.mjs` sitemap filter.** Exclude `/og/`, `/og-image/`, `/study/review`, `/study/settings`. Exact diff in `sitemap.md`. Effect: sitemap drops from 910 → ~466 clean URLs.
+2. **Add `noindex` to OG templates.** `src/pages/og/learn/[...slug].astro` and `src/pages/og/minerals/[slug].astro` need a `noindex` robots `<meta>` tag.
+3. **Add SSG intro prose to tool pages.** 5 example paragraphs in `content.md`. Each tool category page (`measurement.astro`, `optical.astro`, `lab.astro`, `identification.astro`, `advanced.astro`, `conversions.astro`) gets 300–500 server-rendered words before the React island. Also add an SSG intro + `<h1>` to the `/tools/` hub (currently fully client-rendered).
+4. **Fix LearnSchema BreadcrumbList fragment URL.** `LearnSchema.astro` line 73 emits `learn#fundamentals` as a BreadcrumbList item — invalid. Change to canonical `/learn`. (Carried over from prior audit, still open.)
+
+### P1 — Schema and structure (2–3 days)
+
+5. **`Course` + `EducationalOccupationalCredential` schema on `/learn/index`.** Tells Google the 139 articles form an FGA-prep curriculum. Highest single AI-visibility lever per the schema audit.
+6. **`Quiz` + `hasPart` on learn articles (F-04 unblock).** Now that `LearnQuizWidget` is wired, `LearnSchema.astro` can declare per-article quizzes. Boilerplate JSON-LD in `schema.md`.
+7. **`SoftwareApplication` JSON-LD on `/tools/*`.** Single shared `ToolsSchema.astro` covers all 6 category pages.
+8. **Tools hub h1 + intro.** Server-render an h1 and 200-word intro on `/tools/index.astro` (currently 100% client-rendered).
+9. **`/quiz/` `Course` schema** with the 8 categories declared as `hasCourseInstance`.
+
+### P2 — GEO / AI-search readiness (3–5 days)
+
+10. **Restructure `llms.txt`** — currently omits all `/tools/` pages and lists mineral URLs as bare slugs without descriptions. Add `## Tools` section with one line per category. Convert mineral bare URLs to titled entries. Source: `src/pages/llms.txt.ts`.
+11. **Named AI-crawler stanzas in robots.txt.** The default `User-agent: *` rule already lets these crawlers in; named stanzas make the opt-in explicit, which Perplexity's crawler docs recommend for `PerplexityBot`. Add stanzas for GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, Google-Extended.
+12. **knowledge.gemmology.dev gets robots.txt + llms.txt.** Two files, two hours. Unlocks AI-crawler access to the better-structured docs content.
+13. **Expand `/learn/[slug]` Introduction blocks to 130–165 words** in YAML source — the citation-optimum length for AI engines. Per-passage examples in `geo.md`.
+14. **Add named author + credentials** to `/about` and to `LearnSchema` Person node. Currently every page bylines an Organisation, not a credentialed individual — weak E-E-A-T signal.
+
+### P3 — Keyword & content programme (ongoing)
+
+15. **Build a static crawlable mineral-properties reference table** (the mineral DB is currently sql.js, JS-only, invisible to crawlers).
+16. **Consolidate fragmented identification pages into one `/identify` hub.** High-volume terms like "gemstone identification chart" map to nothing today.
+17. **Add explicit FGA/Gem-A positioning statement** to `/learn` hub and About — clarifies which exam syllabus the curriculum maps to.
+18. **Question-format H2s** in learn articles ("What is birefringence?") — AI Overviews match heading text against natural-language query form.
+
+## Quick wins (today, under 1 hour total)
+
+- One-line filter change in `astro.config.mjs` (removes 442 junk URLs from sitemap).
+- Add a `noindex` robots `<meta>` tag to the two OG template files.
+- Add an `<h1>` and a 50-word intro paragraph to `/tools/index.astro`.
+
+These three changes alone should noticeably lift gemmology.dev's classical-search visibility within 1–2 crawl cycles, because the share of indexable URLs in the sitemap roughly doubles, from about 51% (468 of 910) to nearly 100%.
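The P0 sitemap fix in the README above comes down to a path-prefix filter. The following is a minimal Python sketch of that logic only, not the actual `astro.config.mjs` change (the exact diff is said to live in `sitemap.md`). `EXCLUDED_PREFIXES` mirrors the four paths named in the fix list; the function names and the sample URLs are hypothetical.

```python
from urllib.parse import urlparse

# Path prefixes the README proposes to drop from the sitemap.
EXCLUDED_PREFIXES = ("/og/", "/og-image/", "/study/review", "/study/settings")


def keep_in_sitemap(url: str) -> bool:
    """Return True when the URL should stay in the sitemap."""
    path = urlparse(url).path
    # str.startswith accepts a tuple and matches any of its members.
    return not path.startswith(EXCLUDED_PREFIXES)


def junk_share(urls: list[str]) -> float:
    """Fraction of sitemap URLs that the filter would remove."""
    if not urls:
        return 0.0
    removed = sum(1 for url in urls if not keep_in_sitemap(url))
    return removed / len(urls)


if __name__ == "__main__":
    # Invented sample, two clean URLs and two that the filter drops.
    sample = [
        "https://gemmology.dev/learn/",
        "https://gemmology.dev/minerals/diamond",
        "https://gemmology.dev/og/minerals/diamond",
        "https://gemmology.dev/study/settings",
    ]
    print([u for u in sample if keep_in_sitemap(u)])
    print(f"share removed: {junk_share(sample):.0%}")
```

Run against the real sitemap, the same check would be a quick way to confirm the audit's figure of 442 OG-template URLs out of 910, which is just under half of all entries.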
diff --git a/audits/seo-2026-05/content.md b/audits/seo-2026-05/content.md
new file mode 100644
index 0000000..4525655
--- /dev/null
+++ b/audits/seo-2026-05/content.md
@@ -0,0 +1,86 @@
+# Content & E-E-A-T Audit — gemmology.dev vs knowledge.gemmology.dev
+**Audited:** 2026-05-11 | **Auditor role:** Content Quality / Google QRG Sept 2025
+
+---
+
+## TL;DR
+
+knowledge.gemmology.dev serves flat Markdown files that Googlebot reads in a single HTTP request. gemmology.dev serves the same information better — but buries it inside client:load React islands that Googlebot either skips or defers. The ranking gap is not an E-E-A-T gap; it is a crawlability gap masquerading as one.
+
+- **Content quality score (gemmology.dev /learn/):** 78/100
+- **Content quality score (gemmology.dev /tools/):** 31/100
+- **E-E-A-T overall (gemmology.dev):** 61/100 — credible but anonymous
+
+---
+
+## Why knowledge.gemmology.dev wins on content
+
+1. **Full prose at first byte.** Every Markdown file delivers 300–800 words of body text, structured tables, and callouts in the initial HTML response. No JavaScript required.
+2. **Consistent heading hierarchy.** Each doc has one H1, logical H2 sections (RI, Birefringence, Pleochroism, Dispersion), and inline code examples. This is exactly what Google's citation pipeline extracts for featured snippets and AI Overviews.
+3. **Quotable facts in plain text.** Sentences like "High birefringence causes visible doubling of back facet edges — diagnostic for zircon, sphene, and peridot" are extractable verbatim. Google rewards prose that answers a question in one sentence.
+4. **No duplicate-content risk.** Each doc covers one topic; canonical structure is implicit.
+
+---
+
+## gemmology.dev content gaps
+
+### P0: Thin client-rendered pages
+
+Every page under `/tools/` pre-renders exactly two elements: one H1 and one `<p>` description sentence (confirmed in `src/pages/tools/measurement.astro`). The React island loads only after JavaScript executes. Googlebot's first-wave crawl sees 22 words of indexable content on a page that actually delivers 8 calculators and a reference table.
+
+The same pattern applies to `/tools/optical`, `/tools/lab`, `/tools/identification`, `/tools/advanced`, `/tools/conversions`, `/quiz`, and `/playground`. Combined, these are roughly 50 URLs with sub-50-word pre-rendered bodies.
+
+**Impact:** Google's quality systems classify these as thin pages. Because they share the same domain as the /learn/ content, they dilute the site's topical authority signal across the whole property.
+
+### P1: Missing E-E-A-T signals
+
+The `/about` page covers editorial standards and source citations well (FGA alignment, Anderson/Webster, GIA journal, Mindat.org). However:
+
+- **No named author or reviewer.** Every learn article and mineral page omits a byline. The about page refers to "the gemmology-dev open-source project" — a GitHub org, not a person. Google's QRG explicitly requires a named, credentialled individual for YMYL-adjacent educational content.
+- **No author schema.** `MineralSchema.astro` and `StructuredData.astro` emit `WebPage` and `BreadcrumbList` but no `author` or `Person` node.
+- **dateModified is present on /learn/ pages** (confirmed in project notes) — this is a genuine positive signal that knowledge.gemmology.dev lacks.
+- **No "last reviewed" visible text on tool pages.** It exists on learn articles; it must also appear on tool description sections.
+
+E-E-A-T factor scores (gemmology.dev):
+
+| Factor | Score | Weight | Weighted |
+|--------|-------|--------|---------|
+| Experience | 50/100 | 20% | 10 |
+| Expertise | 65/100 | 25% | 16 |
+| Authoritativeness | 55/100 | 25% | 14 |
+| Trustworthiness | 70/100 | 30% | 21 |
+| **Total** | | | **61/100** |
+
+### P2: Missing topical hubs
+
+`/learn/index.astro` renders a card grid of all 139 articles grouped by the 8 study categories. This is a strong internal hub. However there is no equivalent hub page for tools — `/tools/` links to 6 category pages but contains no prose explaining what gemmological measurement is or why each category matters. A 400-word hub introduction per category page would fix this.
+
+---
+
+## Page-by-page recommendations
+
+| Page | Current pre-rendered words | Min for page type | Action |
+|------|---------------------------|------------------|--------|
+| `/tools/measurement` | ~22 | 500 (service) | Add 500-word SSR intro: what SG and RI measure, when to use each tool |
+| `/tools/optical` | ~20 | 500 | Add prose: polariscope vs dichroscope workflow, when optic sign matters |
+| `/tools/lab` | ~18 | 500 | Add prose: how spectroscope absorption bands are read, UV safety note |
+| `/tools/identification` | ~15 | 500 | Add prose: systematic identification sequence, decision logic |
+| `/tools/advanced` | ~20 | 500 | Add prose: treatment detection methodology, GIA proportion grading |
+| `/tools/conversions` | ~12 | 300 | Add 1-paragraph context: metric/troy/decimal carat system history |
+| `/quiz` | ~30 | 300 | Add description of FGA exam alignment and question categories |
+| `/about` | ~350 | 500 | Add named contributor(s) with credentials; add `Person` schema |
+| `/minerals/[slug]` | SSG, rich | meets threshold | Add `author`/`reviewer` schema node per mineral page |
+
+---
+
+## 5 example passages to add to tool pages
+
+**measurement.astro intro:** Specific gravity and refractive index are the two primary quantitative tests in coloured-gemstone identification. SG is determined hydrostatically using Archimedes' principle: the gem is weighed in air and in water, and the ratio reveals density to two decimal places, enough to separate spinel (3.60) from synthetic spinel (about 3.64) or glass fills. RI is read directly from a critical-angle refractometer and narrows identification to a handful of species within seconds.
+
+**optical.astro intro:** The polariscope and dichroscope answer different questions. The polariscope tests whether a stone is singly or doubly refractive, which separates isotropic species (diamond, spinel, garnet) from all others at a glance. The dichroscope reveals how many distinct body colours a stone shows in different vibration directions — a tanzanite shows blue, violet, and bronze, while a synthetic blue spinel shows only one colour. Neither instrument requires a prepared surface, making them ideal first-pass tests.
+
+**lab.astro intro:** The spectroscope records which wavelengths of visible light a gem absorbs. Strong selective absorption bands are diagnostic: the 693 nm doublet in ruby, the 450 nm band in natural blue sapphire, the 415 nm line in cape diamonds. UV fluorescence adds a complementary data point: a strong blue LWUV reaction in diamond, absent under SWUV, points away from moissanite or CZ without any contact with the stone.
+
+**identification.astro intro:** Systematic gem identification follows a fixed sequence to avoid confirmation bias. Start with non-destructive observations — colour, transparency, lustre — then move to RI, then SG, then spectroscope if the RI falls in an ambiguous range. Only after these quantitative steps should qualitative tests (Chelsea filter, fluorescence) be applied to confirm or refute a working hypothesis. This sequence matches the FGA Diploma practical examination protocol.
+
+**advanced.astro intro:** Treatment detection requires correlating multiple lines of evidence. Heat treatment in corundum leaves characteristic stress fractures around rutile inclusions and healed fingerprints, but not all heated stones show these features. The treatment wizard on this page assigns positive or negative evidence weights to each observable clue and surfaces a confidence-banded conclusion — high confidence when three or more independent indicators align, low confidence when evidence is mixed or a single clue stands alone.
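The "current pre-rendered words" column in the page-by-page table above can be reproduced with a small script. The sketch below is one way to do it, assuming the page HTML has already been fetched or read from the build output. The class name, the function name, and the sample markup are all hypothetical, and the count is a rough proxy rather than a model of how any crawler extracts text.

```python
from html.parser import HTMLParser


class VisibleTextCounter(HTMLParser):
    """Counts words in text nodes, ignoring <script> and <style> bodies."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.words = 0
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only count text that sits outside skipped elements.
        if self._skip_depth == 0:
            self.words += len(data.split())


def prerendered_word_count(html: str) -> int:
    """Word count of the text present in static HTML, before any JS runs."""
    parser = VisibleTextCounter()
    parser.feed(html)
    parser.close()
    return parser.words


if __name__ == "__main__":
    # Invented markup: a heading, one sentence, an empty mount point, a script.
    page = (
        "<h1>Measurement tools</h1>"
        "<p>Calculators for SG and RI.</p>"
        "<div id='root'></div>"
        "<script>const a = 1;</script>"
    )
    print(prerendered_word_count(page))
```

Comparing this number against the per-page minimums in the table gives a repeatable pass/fail check once the intro prose has been added.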
diff --git a/audits/seo-2026-05/geo.md b/audits/seo-2026-05/geo.md
new file mode 100644
index 0000000..492cdc6
--- /dev/null
+++ b/audits/seo-2026-05/geo.md
@@ -0,0 +1,185 @@
+# GEO Audit — gemmology.dev
+**Auditor**: GEO specialist (static + live analysis)
+**Date**: 2026-05-11
+**Prior audit read**: `audits/T7c-seo.md` (2026-05-05) — findings F-01 through F-12 acknowledged; this document does not repeat them and assumes P1s (noindex on /quiz, missing study routes) are tracked there.
+
+---
+
+## GEO Readiness Score
+
+| Dimension | Score | Weight | Weighted |
+|---|---|---|---|
+| Citability | 42/100 | 25% | 10.5 |
+| Structural Readability | 55/100 | 20% | 11.0 |
+| Multi-Modal Content | 70/100 | 15% | 10.5 |
+| Authority & Brand Signals | 28/100 | 20% | 5.6 |
+| Technical Accessibility | 72/100 | 20% | 14.4 |
+| **Total** | | | **52/100** |
+
+---
+
+## AI Crawler Access Status
+
+| Crawler | Status | Note |
+|---|---|---|
+| GPTBot | **Allowed** (via `User-agent: *`) | No explicit allow rule |
+| OAI-SearchBot | **Allowed** | No explicit allow rule |
+| ClaudeBot | **Allowed** | No explicit allow rule |
+| PerplexityBot | **Allowed** | No explicit allow rule |
+| Google-Extended | **Allowed** | No explicit rule |
+| CCBot | **Allowed** | Not blocked; training crawler |
+| anthropic-ai | **Allowed** | Not blocked; training crawler |
+| cohere-ai | **Allowed** | Not blocked; training crawler |
+
+F-10 from T7c-seo.md (duplicate `Sitemap:` pointing to llms.txt) is **resolved** in the live robots.txt — the file now uses the correct `LLM-Content:` directive only.
+
+---
+
+## llms.txt Status
+
+**Present and structurally valid.** The live file (`/llms.txt`) is generated at build time from `src/pages/llms.txt.ts`. RSL 1.0 licensing block is absent.
+
+### What the current llms.txt gets right
+- Groups learn articles by category with title and description per URL.
+- Lists all mineral pages.
+- Declares licensing (MIT / CC BY-SA 4.0).
+
+### What it gets wrong
+
+1. **Tool pages are entirely absent.** `/tools/measurement`, `/tools/optical`, `/tools/lab`, `/tools/identification`, `/tools/advanced`, and `/tools/conversions` are not listed. These are the pages most likely to answer direct procedural queries ("how to calculate specific gravity", "how to read a refractometer") that ChatGPT and Perplexity serve. An LLM reading the llms.txt has no signal these pages exist.
+
+2. **Mineral URLs are bare slugs without titles or descriptions.** `- https://gemmology.dev/minerals/diamond` gives an LLM zero context about what is at that URL. Compare to the learn section which provides `[Diamond](url): description`. Bare URLs are effectively invisible to LLM indexers that rely on llms.txt for summarisation.
+
+3. **No `## Tools` section with per-tool descriptions.** The hub description ("CDL playground, quiz system, calculators") in the `## Core surfaces` block is too vague for a model to understand the 15+ distinct calculators present.
+
+4. **No `## Glossary` pointer.** The mineral pages collectively function as a glossary but are not framed as such; LLMs do not recognise 300+ bare URLs as definitional content.
+
+5. **No RSL 1.0 block.** Without explicit training-use permissions, some crawlers default to conservative interpretation of CC BY-SA 4.0 (which technically requires share-alike for derivatives, creating ambiguity for model training).
+
+---
+
+## Comparison: gemmology.dev vs knowledge.gemmology.dev
+
+knowledge.gemmology.dev is a static MkDocs/documentation site with definition-first headings, numbered steps, and a left-sidebar topic hierarchy. It scores roughly 68/100 on the same GEO rubric. The table compares the two sites on four structural factors; the docs site leads on passage self-containment and definition-first openings, and trails only on robots.txt:
+
+| Factor | knowledge.gemmology.dev | gemmology.dev |
+|---|---|---|
+| Headings phrased as concepts | Yes — "Physical Properties", "Hardness Scale" | Yes — but same pattern; no question headings on either |
+| Passage self-containment | Sections average 90–130 words, standalone | Sections vary 40–650 words; tables break extractability |
+| Definition in first sentence | Consistent — every module opens with a one-sentence definition | Inconsistent — some articles open with context rather than definition |
+| robots.txt | 404 (no robots.txt exists) | Present and correct |
+
+The knowledge subdomain has no llms.txt and no robots.txt. Any AI crawler that discovers it via the llms.txt backlink finds no guidance and treats it as unconfigured. This is an opportunity: adding a minimal llms.txt to knowledge.gemmology.dev would immediately make its clean structured content available for citation.
+
+---
+
+## P0 Issues
+
+None identified (no crawl-blocking or active misinformation).
+
+---
+
+## P1 Issues
+
+### P1-GEO-01 — Tools section missing from llms.txt
+
+All six tool category pages are absent from llms.txt. These pages contain the site's most directly answerable content (formulas, tables, worked examples) and are the primary candidates for Perplexity and ChatGPT citation on procedural gemmology queries. A model reading the llms.txt has no path to discover them.
+
+**Fix**: Add a `## Tools` section to `src/pages/llms.txt.ts` with a descriptive entry per category page. Each entry should include the URL and a one-sentence description naming the specific calculations or lookups available.
+
+### P1-GEO-02 — No AI-crawler-specific robots.txt stanzas
+
+The current `User-agent: *` allow-all approach works but misses a signal. Explicitly naming GPTBot, ClaudeBot, PerplexityBot, and OAI-SearchBot in positive `Allow: /` stanzas tells AI search products that the site actively opts in; Perplexity's crawler documentation in particular recommends allowing `PerplexityBot` explicitly. Separately, CCBot (Common Crawl, used for LLM training by multiple providers) and `anthropic-ai` (Anthropic training crawler) are allowed by default. If the intent is to permit AI search but not raw training data collection, these should be explicitly addressed.
+
+**Fix**: Add explicit stanzas for each AI search crawler. Decide and document whether CCBot and anthropic-ai should be allowed or blocked.
+
+### P1-GEO-03 — Passage length mismatch kills citability on most pages
+
+The optimal citation passage length for AI engines is 134–167 words. The learn articles have a structural problem: the Introduction sections are ~40–60 words (too short for citation), while the detailed sections (Sturman patterns: 650 words; property tables: 300+ words) are far too long and contain non-prose content. AI engines prefer a compact, self-contained prose paragraph that can be quoted with attribution. The tables especially cannot be rendered as a citation — they get serialised as undifferentiated text.
+
+The corundum article Introduction is 47 words. The chatoyancy Introduction is 49 words. These are the most citable sections on each page and they are both below the citation threshold.
+
+**Fix**: Expand the Introduction section of every learn article to 130–165 words by adding a self-contained summary that includes: (1) a definition, (2) the key diagnostic significance, and (3) one concrete example. This expansion should be made in the YAML source, not in wrapper components.
+
+---
+
+## P2 Issues
+
+### P2-GEO-01 — No question-format headings anywhere on the site
+
+AI Overviews and Perplexity preferentially cite pages with headings that match the question form of the underlying query. The site uses topic headings ("Refractive Index", "How It Works") rather than question headings ("How do you measure refractive index?", "What is birefringence?"). The equipment and species articles are the highest-priority targets for this change, since they address the most common natural-language gemmology queries.
+
+**Fix**: Add an optional `question` field to the YAML section schema, rendered either as a visually hidden heading or as a visible heading replacing or preceding the current one, so that search engines see the question form. Target the 20 most-queried articles first.
+
+### P2-GEO-02 — `author` is an Organisation, not a named Person
+
+LearnSchema.astro emits `"@type": "Organization"` for both `author` and `publisher`. Google's E-E-A-T signals for educational content weight named-expert authorship higher than anonymous organisational authorship. For AI engines, a `Person` author with a `knowsAbout` field listing gemmological credentials increases the authority score of a cited passage. The site has an `/about` page but no structured `Person` entity.
+
+**Fix**: Add a `Person` JSON-LD block (FGA credentials, FGAA or similar where applicable) to LearnSchema.astro as the primary `author`, with the Organisation as `publisher`. This requires deciding on a canonical author identity for the site.
+
+### P2-GEO-03 — Mineral pages use `Thing` schema without Wikipedia/Wikidata sameAs
+
+MineralSchema.astro correctly uses `additionalType: wikidata/Q43533` but does not include `sameAs` URIs linking individual minerals to their Wikidata entries (e.g., diamond → `https://www.wikidata.org/wiki/Q5283`). Without `sameAs`, Google's Knowledge Graph cannot resolve that the site's Diamond page describes the same entity as Wikidata Q5283, so the page does not contribute to the site's entity authority for diamond-related queries — a missed signal for Google AIO and Bing Copilot.
+
+**Fix**: Add `sameAs` to MineralSchema.astro, mapped from a `wikidataId` field in the mineral database YAML. Populate `wikidataId` for the 20 highest-traffic mineral families first.
+
+### P2-GEO-04 — knowledge.gemmology.dev has no llms.txt or robots.txt
+
+The subdomain serves structured definition-first content that is higher-quality for AI citation than the main site (shorter, self-contained sections; cleaner topic structure). It gives AI crawlers no guidance because it lacks both robots.txt and llms.txt. This is likely the single fastest GEO win: a minimal llms.txt on knowledge.gemmology.dev pointing to its module pages would make its content available for Perplexity and ChatGPT citation immediately.
+
+**Fix**: Add `robots.txt` (allow all) and `llms.txt` to knowledge.gemmology.dev. The llms.txt should list the 13 module pages with one-line descriptions.
+
+### P2-GEO-05 — `datePublished` absent from all learn article schemas
+
+`resolveDateModified()` in `[...slug].astro` correctly returns `reviewedAt` if present, otherwise falls back to file mtime. But `datePublished` is never populated in any YAML file and the schema conditional on line 65 of LearnSchema.astro means it is omitted from every article. AI engines use publication date to assess content freshness; articles with no `datePublished` are treated as undated, which reduces citation preference for time-sensitive queries.
+
+**Fix**: Add `publishedAt` to the YAML schema (alongside `reviewedAt`) and populate it for all current articles with a conservative estimate (site launch date or a per-article first-commit date). Wire it into the `datePublished` field in LearnSchema.astro.
+
+---
+
+## Platform-Specific Scores
+
+| Platform | Score | Key Gap |
+|---|---|---|
+| Google AI Overviews | 48/100 | Missing Course schema on /learn/, no `datePublished`, no question headings |
+| ChatGPT (web search) | 44/100 | Tools absent from llms.txt; passage length below citation threshold |
+| Perplexity | 51/100 | Best positioned due to SSR content; gaps: no explicit PerplexityBot Allow, no FAQ headings |
+| Bing Copilot | 46/100 | No sameAs Wikidata on mineral entities; sitemap F-10 now fixed |
+
+---
+
+## 10 AI-Citable Passages the Site Should Add
+
+These are model passages of 134–165 words that should be added as the Introduction or first prose block of the listed articles. They are self-contained and attributable to gemmology.dev.
+
+1. **Crystal Systems** — "All crystals belong to one of seven systems: cubic, tetragonal, orthorhombic, hexagonal, trigonal, monoclinic, and triclinic. Each system is defined by the geometry of its unit cell — the relationship between three crystallographic axes (a, b, c) and the angles between them (α, β, γ). The cubic system, with three equal axes at 90°, has the highest symmetry and produces isotropic gems such as diamond, spinel, and garnet that show a single refractive index. The triclinic system, with no equal axes and no right angles, has the lowest symmetry. Identifying the crystal system of an unknown gem from morphology and symmetry narrows the list of possible species, because each system constrains which optical characters, cleavage directions, and crystal forms are possible. Crystal systems are therefore the foundation of systematic gem identification."
+
+2. **Birefringence** — "Birefringence is the difference between the maximum and minimum refractive index of an anisotropic gemstone. It is calculated as BR = RI_max − RI_min. All gems except those in the cubic system and amorphous materials (glass, opal) are anisotropic and therefore birefringent. High birefringence (above 0.020) is visible as doubling of back facets when viewed through a 10× loupe: zircon (0.059) and calcite (0.172) show dramatic doubling, while quartz (0.009) and beryl (0.006) show minimal doubling. A standard gemmological refractometer measures birefringence directly: the stone is rotated on the hemisphere and the largest separation between the two shadow-edge readings is recorded. Birefringence is a primary diagnostic property: a measured value outside the published range for a suspected species immediately excludes that identification."
+
+3. **Refractometer use** — "The gemmological refractometer measures refractive index by observing the critical angle of total internal reflection at the gem-to-glass interface. A polished facet is placed on the high-RI glass hemisphere using a single drop of contact liquid (RI ≈ 1.81). The shadow edge on the scale indicates the gem's RI. For anisotropic gems, rotating the stone 90° produces two readings whose difference is the birefringence. The scale covers RI 1.35–1.81; gems with higher RI (zircon, demantoid garnet, sphene) require the Hanneman–Hodgkinson spot method or a heavy liquid comparison. Common sources of error include too much contact liquid (blurs the shadow edge), a dirty hemisphere (shifts the reading), and using polychromatic light rather than sodium yellow (589 nm), which broadens the edge and reduces accuracy to ±0.005."
+
+4. **Chatoyancy** — "Chatoyancy (the cat's eye effect) is a single band of light that appears to glide across a cabochon surface when the stone is rotated under a point light source. It requires two conditions: a high concentration of parallel fibrous, tubular, or needle-like inclusions oriented in one direction, and a cabochon cut perpendicular to that inclusion direction. Light reflects from the sides of the inclusions and concentrates along one axis, creating the band. Chrysoberyl cat's eye produces the strongest chatoyancy of any gem and is the only variety that may be called simply 'cat's eye' without qualification; all other chatoyant gems must be named (e.g., 'quartz cat's eye', 'tourmaline cat's eye'). The finest specimens show the 'milk and honey' effect: one half of the stone appears milky white, the other honey-gold."
+
+5. **Specific gravity measurement** — "Specific gravity (SG) is the ratio of a gem's weight in air to the weight of an equal volume of water at 4 °C. It is measured by hydrostatic weighing: the gem is weighed in air (W_air), then suspended in water (W_water), and SG = W_air ÷ (W_air − W_water), corrected for water temperature. SG is density-dependent and therefore characteristic of chemical composition, making it a useful secondary confirmation after refractive index. Diamond (3.52), corundum (3.99–4.01), and zircon (4.69) each have distinctive values. Errors arise from air bubbles adhering to the stone during water weighing (the added buoyancy lowers the apparent weight in water and therefore the calculated SG), and from inclusions or fractures that reduce effective density. The tolerance for most gem species is ±0.03."
+
+6. **Ruby diagnostic inclusions** — "Ruby (red corundum) from different geographic origins carries distinctive inclusions that allow origin determination under microscopic examination. Burmese (Mogok) rubies typically contain short silk (rutile needles) in three orientations at 60°, negative crystals, and fingerprint inclusions healed along crystal planes; they may show a strong red fluorescence under UV. Thai/Cambodian rubies contain small crystals of apatite, pyrite, and zircon with stress haloes, and rarely fluoresce. African rubies (Mozambique, Madagascar) show twinning lamellae, amphibole needles, and typically low fluorescence. No single inclusion type is diagnostic on its own; origin determination requires a combination of inclusions, trace element chemistry, and UV response. Heated rubies of any origin show disrupted or dissolved silk, bleached fingerprints, and melted haloes around zircon crystals."
+
+7. **Optic character — uniaxial vs biaxial** — "Gemstones are classified by optic character as isotropic (one RI, cubic and amorphous), uniaxial (two RIs: ordinary ray ω and extraordinary ray ε, trigonal/tetragonal/hexagonal systems), or biaxial (three principal RIs: α, β, γ; orthorhombic, monoclinic, triclinic systems). The polariscope determines optic character: a stone that remains dark throughout a full 360° rotation (in all positions) is isotropic; one that alternates light and dark four times per rotation is anisotropic. The conoscope attachment distinguishes uniaxial from biaxial: uniaxial stones show a centred cross (isogyres) with concentric rings (isochromes); biaxial stones show a hyperbolic brush pattern. Optic sign is determined from the refractometer: uniaxial negative when ε < ω (corundum, tourmaline); biaxial negative when β is closer to γ (andalusite, kyanite) and biaxial positive when β is closer to α (topaz, alexandrite)."
+
+8. **Heat treatment of corundum** — "Heat treatment is the most common enhancement applied to ruby and sapphire. Stones are heated to 1200–1800 °C in a controlled atmosphere to dissolve rutile silk (improving transparency), alter chromophore oxidation states (improving or changing colour), and heal fractures. Over 90% of commercially traded rubies and sapphires are heated. Detection relies on microscopic examination: heated stones show partially or fully dissolved silk (short 'commas' rather than intact needles), melted and rounded zircon crystals, altered fingerprint inclusions, and tension cracks around inclusions. Residues of flux material in fractures indicate fracture-filling treatment at lower temperatures (a distinct, lesser-accepted enhancement). Unheated rubies and sapphires of fine quality command a significant price premium and require laboratory certification from GRS, Gübelin, or SSEF to substantiate the 'no heat' determination."
+
+9. **Pleochroism** — "Pleochroism is the property of anisotropic gems to show different colours when viewed along different crystallographic axes. Dichroic gems (uniaxial) show two colours; trichroic gems (biaxial) show three. The dichroscope reveals pleochroism by splitting the transmitted light into two polarised beams, showing both pleochroic colours simultaneously in adjacent windows. Strong pleochroism is both a diagnostic property and a cutting consideration: tanzanite (trichroic: violet-blue, blue, red-brown) must be oriented to show its finest blue face-up. Iolite is strongly trichroic (violet-blue, pale blue, yellowish) and was historically used as a navigation tool (Viking 'sunstone') because it shows near-zero transmission in one direction and near-maximum in the perpendicular. Cubic gems (diamond, spinel, garnet) and amorphous materials (glass, opal) are singly refractive and show no pleochroism."
+
+10. **Emerald inclusions and the jardin** — "Emerald (green beryl coloured by chromium and/or vanadium) almost universally contains a characteristic three-phase inclusion called a 'jardin' (French: garden): a healed fracture containing a liquid film, a gas bubble, and one or more solid crystals. The composition of the three-phase inclusion — specifically the daughter crystal species — is a primary origin indicator: Colombian emeralds typically contain sodium chloride and albite crystals; Brazilian emeralds often contain tremolite; Zambian emeralds show actinolite and phlogopite crystals with a distinctive darker green. Unlike ruby and sapphire, emeralds are almost universally fracture-filled (cedarwood oil, resin, or glass), and the filling degree is graded F1 (none/minor) to F4 (significant) by laboratories. An emerald of any significant size described as 'no filling' commands a substantial premium."
+
+---
+
+## Top 5 Highest-Impact Changes
+
+| Priority | Change | Effort | Primary Platform Gain |
+|---|---|---|---|
+| 1 | Add `## Tools` section to llms.txt with per-page descriptions | 1 hour | ChatGPT, Perplexity |
+| 2 | Expand Introduction blocks on 138 learn articles to 130–165 words | 3–5 days (can automate from YAML) | All platforms |
+| 3 | Add explicit AI-crawler stanzas to robots.txt (GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot) | 30 min | Perplexity, ChatGPT |
+| 4 | Add llms.txt and robots.txt to knowledge.gemmology.dev | 2 hours | All platforms — leverages higher-quality content already written |
+| 5 | Add `sameAs` Wikidata URIs to MineralSchema for top-20 mineral families | 4 hours | Google AIO, Bing Copilot |
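For change 3 in the table above, a quick local check can confirm that the named AI crawlers are actually allowed once the stanzas are added. The sketch below uses Python's standard `urllib.robotparser`; the `ROBOTS_TXT` content and the `blocked_agents` helper are illustrative assumptions, not the site's real robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Crawler names taken from the table above.
AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "OAI-SearchBot"]

# Hypothetical robots.txt with one explicit stanza per AI crawler (assumption).
ROBOTS_TXT = "\n".join(
    f"User-agent: {agent}\nAllow: /\n" for agent in AI_AGENTS
) + "\nUser-agent: *\nAllow: /\n"


def blocked_agents(robots_txt: str, url: str, agents: list[str]) -> list[str]:
    """Return the agents that robots_txt would stop from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    # An empty list means none of the listed crawlers is blocked.
    print(blocked_agents(ROBOTS_TXT, "https://gemmology.dev/learn", AI_AGENTS))
```

Running the same helper against the deployed robots.txt text (fetched separately) would show whether a catch-all `Disallow` is shadowing any of the four crawlers.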
diff --git a/audits/seo-2026-05/keywords.md b/audits/seo-2026-05/keywords.md
new file mode 100644
index 0000000..69a2ef0
--- /dev/null
+++ b/audits/seo-2026-05/keywords.md
@@ -0,0 +1,145 @@
+# Keyword Landscape — gemmology.dev
+**Audit date**: 2026-05-11
+**Analyst**: DataForSEO analyst agent
+**Data source**: HEURISTIC ESTIMATES — DataForSEO MCP server not available in this environment (`mcpServers: {}`). All volume, difficulty, and competitive figures below are derived from known niche search patterns, public industry benchmarks, and structural analysis of the site. Every cell marked (est.) is an estimate; treat as directional, not actionable without live API confirmation. When DataForSEO access is restored, re-run against `dataforseo.keywords_for_site`, `keywords_data.google_ads.search_volume`, and `serp.google.organic_results`.
+
+---
+
+## 1. Opportunity Table
+
+Columns: Term | Vol/mo (est.) | KD 0–100 (est.) | Intent | Best gemmology.dev page | Gap
+
+| # | Term | Vol (est.) | KD (est.) | Intent | Target page | Gap |
+|---|------|-----------|----------|--------|-------------|-----|
+| 1 | gemmology study | 1,200 | 28 | Informational / Nav | `/learn` (hub) | No keyword in `` or ``; no "study guide" angle |
+| 2 | FGA exam preparation | 800 | 22 | Informational | `/learn` + `/quiz` | No FGA-specific landing page; quiz is still gated (T7c F-01) |
+| 3 | gemstone identification | 9,900 | 55 | Informational | `/tools/identification` | Tool exists but page has thin copy; no article body for indexing |
+| 4 | how to use a refractometer gemstone | 2,400 | 30 | Informational | `/learn/equipment/refractometer` + `/tools/optical` | Learn page exists; not cross-linked from tool page |
+| 5 | specific gravity gem calculation | 1,600 | 25 | Informational / Tool | `/tools/measurement` | SG tool exists but page `` is generic; no inline formula explanation |
+| 6 | gemmology quiz | 720 | 20 | Navigational / Transactional | `/quiz` | Page gated — soft-404 risk (T7c F-01); cannot rank until noindex removed + gate dropped |
+| 7 | gemstone properties chart | 3,600 | 42 | Informational | `/minerals/[slug]` + `/tools/identification` | No consolidated "properties chart" page; mineral DB is closest match |
+| 8 | gem identification chart | 2,900 | 38 | Informational | `/tools/identification` | Same gap as above; no printable/scannable reference table page |
+| 9 | crystal systems gemmology | 1,100 | 24 | Informational | `/learn/fundamentals/crystal-systems` | Page exists; title is "Crystal Systems" — drop "gemmology" brand suffix |
+| 10 | refractive index gemstone list | 2,200 | 35 | Informational / Tool | `/tools/measurement` (RI lookup) | No dedicated RI table page; tool is undiscoverable without nav |
+| 11 | birefringence gemstones | 1,400 | 28 | Informational | `/learn/fundamentals/optical-properties` + `/tools/measurement` | Learn article exists; tool not cross-linked |
+| 12 | diamond cut grade calculator | 1,800 | 40 | Transactional / Tool | `/tools/advanced` (Proportion Analyzer) | Feature exists; "cut grade" not in page title or description |
+| 13 | specific gravity of gems table | 1,900 | 30 | Informational | `/minerals` DB or `/tools/measurement` | No static reference table page — DB is dynamic/client-side JS |
+| 14 | how to identify gemstones | 8,100 | 52 | Informational | `/learn/identification/` + `/tools/identification` | High-volume; existing learn content is fragmented across 10 sub-pages |
+| 15 | polariscope gemology | 880 | 22 | Informational | `/learn/equipment/polariscope` + `/tools/optical` | Good content depth; `` missing the term "gemology" (alt spelling) |
+| 16 | alexandrite colour change | 2,400 | 45 | Informational | `/learn/phenomena/colour-change` + `/minerals/alexandrite` | No dedicated alexandrite species page in learn |
+| 17 | synthetic diamond identification | 3,200 | 48 | Informational | `/learn/identification/cvd-diamond` + `/learn/identification/hpht-diamond` | Pages exist but buried; no consolidated "synthetic vs natural" index |
+| 18 | ruby vs spinel | 1,600 | 38 | Informational | `/minerals` + `/learn/species/corundum` | No comparison page; mineral DB can surface both but no URL for it |
+| 19 | gemmology online course free | 2,100 | 35 | Transactional | `/learn` (hub) | Site is not positioned as a "course" anywhere; Course schema missing (T7c F-04) |
+| 20 | gemstone hardness scale | 4,800 | 38 | Informational | `/learn/fundamentals/physical-properties` | Page exists; Mohs table not confirmed in rendered copy |
+| 21 | optic sign determination | 480 | 18 | Informational | `/learn/fundamentals/optic-sign-determination` + `/tools/optical` | Excellent specificity match; very low competition; both page and tool exist |
+| 22 | FGA gemmology | 1,900 | 32 | Navigational | `/learn` + `/quiz` (when unblocked) | Brand-adjacent; site could rank if FGA is prominent in structured data |
+| 23 | gemstone spectroscope | 1,100 | 30 | Informational / Tool | `/tools/lab` (Spectroscope Band-Matcher) | Tool exists; no learn article at `/learn/equipment/spectroscope` level linking to it |
+| 24 | inclusions in gemstones | 3,400 | 44 | Informational | `/learn/identification/inclusions/overview` | Deep content tree exists; no consolidating "inclusions guide" hub page |
+| 25 | carat weight calculator gemstone | 2,600 | 28 | Transactional / Tool | `/tools/measurement` (Carat Estimator) | Tool exists; zero organic copy on page to explain method |
+| 26 | heat treatment ruby detection | 1,200 | 35 | Informational | `/learn/identification/treatments-deep/` | Four deep-dive pages exist; no topical cluster landing page |
+| 27 | lab grown ruby vs natural | 2,800 | 50 | Informational | `/learn/identification/synthetics` | General synthetics page covers it; no species-specific comparison |
+| 28 | gemstone origin determination | 980 | 30 | Informational | `/learn/origin/overview` + species origin pages | Origin section is the most complete on any free site — under-promoted |
+| 29 | Kashmir sapphire | 1,400 | 52 | Informational | `/learn/origin/kashmir` | Page exists; potential for featured snippet on physical description |
+| 30 | what is specific gravity in gemology | 1,100 | 22 | Informational | `/learn/fundamentals/physical-properties` | "What is" question form not answered in `` on the page |
+| 31 | gemstone fluorescence chart | 2,000 | 33 | Informational | `/tools/lab` (UV Fluorescence Lookup) | Tool exists; no static reference table for crawlers |
+| 32 | pleochroism gems | 880 | 24 | Informational | `/tools/optical` (Pleochroism Reasoner) | Tool exists; no accompanying learn article |
+| 33 | GIA diamond grading | 22,000 | 72 | Informational | `/tools/advanced` (Proportion Analyzer) | Too competitive for a study tool; better as a supporting term on `/tools/advanced` |
+| 34 | gemstone crystal playground | 320 | 12 | Navigational | `/playground` | Very low volume but near-zero competition; brand-defining term |
+| 35 | how to read a refractometer | 3,200 | 28 | Informational | `/tools/optical` (Refractometer Simulator) | Simulator exists; no text walkthrough on the page |
+
+---
+
+## 2. Priority Tiers
+
+**Critical (win now — low KD, tool/content already exists):**
+- optic sign determination (#21) — KD 18, page + tool exist, near-zero competition
+- gemmology study (#1) — KD 28, learn hub exists, copy gap only
+- FGA exam preparation (#2) — KD 22, one title tweak + schema away
+- what is specific gravity in gemology (#30) — KD 22, FAQ `` addition only
+- gemstone crystal playground (#34) — KD 12, brand-defining, `/playground` already exists
+
+**High (tool exists, copy/title gap):**
+- how to use a refractometer gemstone (#4), birefringence gemstones (#11), carat weight calculator (#25), gemstone fluorescence chart (#31), pleochroism gems (#32)
+
+**Medium (content exists but needs cluster consolidation):**
+- how to identify gemstones (#14), inclusions in gemstones (#24), synthetic diamond identification (#17), heat treatment ruby detection (#26)
+
+**Low / Deferred (high volume, high KD, need authoritative backlinks first):**
+- gemstone identification (#3), GIA diamond grading (#33), alexandrite colour change (#16)
+
+---
+
+## 3. SERP Feature Notes (Estimated)
+
+The following term classes are likely generating rich SERP features based on query form — confirmed live data needed:
+
+| Term pattern | Likely feature | gemmology.dev eligible? |
+|---|---|---|
+| "what is [property] in gemology" | Featured snippet (definition box) | Yes — if `` + first paragraph follows definition pattern |
+| "how to use [instrument]" | Featured snippet (numbered steps) | Yes — if learn pages use `<ol>` for procedure steps |
+| "gemstone properties chart" | Table rich result | Yes — if DB page outputs static `<table>` (currently JS-rendered) |
+| "[gem species] identification" | People Also Ask (PAA) | Possible — PAA box likely dominated by GIA, Britannica |
+| FGA exam preparation | AI Overview (edu queries) | Yes — Course + LearningResource schema (T7c F-04) increases eligibility |
+| gemstone fluorescence chart | Image result / table | Partially — UV tool is interactive, not a crawlable table |
+
+---
+
+## 4. Content-Cluster Proposals (Top 5 Hubs)
+
+### Hub 1 — "How to Identify Gemstones" (Identification Cluster)
+**Target URL**: `/learn/identification/` (needs a substantive hub page)
+**Head term**: "how to identify gemstones" (8,100/mo, KD 52)
+**Supporting pages already exist**:
+- `/learn/identification/inclusions/overview` and 8 sub-pages
+- `/learn/identification/synthetics`
+- `/learn/identification/treatments.yaml` + 4 deep-dive pages
+- `/tools/identification` (Gem Identifier)
+**Gap**: No hub page with `` targeting the head term; the 10+ sub-pages are not linked from a scannable index. A 600-word hub page with a link grid + schema `Course` would pull all this content into a single topical authority signal.
+**SERP opportunity**: Featured snippet for "how to identify gemstones" (step-by-step process); PAA for "gemstone inclusions", "synthetic vs natural gems".
+
+### Hub 2 — "Gemstone Properties Reference" (Reference Cluster)
+**Target URL**: `/reference/properties` (new) or repurpose `/minerals` index
+**Head term**: "gemstone properties chart" (3,600/mo) + "refractive index gemstone list" (2,200/mo)
+**Supporting tools already exist**:
+- SG tool, RI lookup, birefringence calculator (all at `/tools/measurement`)
+- Mineral DB (96 families)
+**Gap**: All reference data is inside interactive JS widgets — invisible to crawlers. A static HTML table page (even if simplified to top-50 gems, 6 columns: RI / SG / hardness / birefringence / crystal system / cleavage) would capture table-result eligibility and be a backlink target for gem school students.
+**SERP opportunity**: Table rich result; cited by GIA student forums, Reddit r/Gemstones.
+
+### Hub 3 — "FGA Exam Study Guide" (Study Cluster)
+**Target URL**: `/learn` (existing hub — needs copy rewrite)
+**Head term**: "FGA exam preparation" (800/mo, KD 22) + "gemmology study" (1,200/mo, KD 28)
+**Supporting content already exists**: 138 YAML learn articles across 8 FGA-aligned categories + quiz system (when unblocked)
+**Gap**: The `/learn` hub page does not mention "FGA", "Gem-A", "diploma", or "exam preparation" anywhere. Adding a 3-sentence positioning statement + Course schema (T7c F-04) + a "Study by category" section would make this the most comprehensive free FGA prep resource indexed by Google.
+**SERP opportunity**: AI Overview mention for "free FGA study materials"; "gemmology online course free" featured snippet.
+
+### Hub 4 — "Gemmological Instruments Guide" (Equipment Cluster)
+**Target URL**: `/learn/equipment/` (needs a hub page) + cross-links to `/tools/optical` and `/tools/lab`
+**Head term**: "how to use a refractometer gemstone" (2,400/mo) + "polariscope gemology" (880/mo) + "gemstone spectroscope" (1,100/mo)
+**Supporting pages already exist**: 10 equipment YAML articles; 3 interactive simulators
+**Gap**: No equipment hub page; simulators at `/tools/optical` and `/tools/lab` are not cross-linked from the corresponding learn articles. "Learn theory here → practice with simulator" is a unique UX differentiator no competitor offers. Wire `LearnQuizWidget` (T7c F-03) on these pages for engagement signal.
+**SERP opportunity**: "How to" featured snippets; Google may surface the simulator for "refractometer gemstone practice" navigational queries.
+
+### Hub 5 — "Gem Treatments & Synthetics Detection" (Authenticity Cluster)
+**Target URL**: `/learn/identification/treatments/` (new consolidating hub) linking to existing deep-dives
+**Head term**: "heat treatment ruby detection" (1,200/mo) + "synthetic diamond identification" (3,200/mo) + "lab grown ruby vs natural" (2,800/mo)
+**Supporting pages already exist**: 4 treatment deep-dives (beryllium diffusion, CVD diamond, HPHT diamond, lead-glass ruby), Treatment Wizard at `/tools/advanced`, general synthetics page
+**Gap**: The four treatment deep-dives are orphaned — no internal link path from the site's main nav. The Treatment Wizard is one of the most technically sophisticated tools on the site but has no learn article backing it with crawlable text explaining the evidence-weight method.
+**SERP opportunity**: "Lab grown vs natural ruby" comparison page; featured snippet on detection criteria; backlinks from gem lab blogs.
+
+---
+
+## 5. Quick-Win Actions (No new content required)
+
+| Action | Pages affected | Effort | SEO impact |
+|--------|---------------|--------|-----------|
+| Add "FGA exam preparation" and "gemmology study" to `/learn` `` and `` | `/learn/index.astro` | 15 min | Hub-term indexability |
+| Flip title order to keyword-first on top 10 learn pages (T7c F-11) | `BaseLayout.astro` | 30 min | CTR +5–10% |
+| Add `noindex` to `/quiz` while gated (T7c F-01 — already filed) | `quiz/index.astro` | 5 min | Prevents soft-404 penalty |
+| Add Course + LearningResource schema to `/learn` (T7c F-04) | `LearnSchema.astro` | 2 hr | AI Overview + edu rich result eligibility |
+| Cross-link each equipment learn article to its matching simulator | 10 learn pages | 1 hr | Internal PageRank to tools; also UX win |
+| Add static fallback `<table>` to SG and RI tool pages (bot-readable) | `/tools/measurement` | 3 hr | Table rich result eligibility |
+
+---
+
+*Note: Re-run this analysis with live DataForSEO data once `keywords_data.google_ads.search_volume` bulk endpoint is available. Priority order may shift significantly for terms like "gemstone identification" (#3) and "lab grown ruby vs natural" (#27) where estimated KD could be 10–15 points off due to YMYL and brand-authority dynamics.*
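Once live figures replace the estimates, the low-difficulty shortlist in section 2 can be re-derived mechanically. The sketch below filters opportunity rows by a KD ceiling. The rows are copied from the table in section 1 (all estimates), while the `Opportunity` type, the `shortlist` helper, and the ceiling of 22 are illustrative assumptions rather than the audit's actual tiering rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opportunity:
    rank: int    # row number in the opportunity table
    term: str
    volume: int  # estimated monthly searches
    kd: int      # estimated keyword difficulty, 0-100


# A few rows from section 1; every figure is an estimate.
ROWS = [
    Opportunity(2, "FGA exam preparation", 800, 22),
    Opportunity(3, "gemstone identification", 9900, 55),
    Opportunity(21, "optic sign determination", 480, 18),
    Opportunity(30, "what is specific gravity in gemology", 1100, 22),
    Opportunity(33, "GIA diamond grading", 22000, 72),
    Opportunity(34, "gemstone crystal playground", 320, 12),
]


def shortlist(rows, max_kd):
    """Keep rows at or below max_kd, easiest first, larger volume breaking ties."""
    kept = [row for row in rows if row.kd <= max_kd]
    return sorted(kept, key=lambda row: (row.kd, -row.volume))


if __name__ == "__main__":
    for row in shortlist(ROWS, max_kd=22):
        print(f"#{row.rank} {row.term} (KD {row.kd}, {row.volume}/mo)")
```

A KD ceiling alone does not reproduce the tiers above, which also weigh whether a page or tool already exists; the filter is only a starting point for re-ranking.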
diff --git a/audits/seo-2026-05/schema.md b/audits/seo-2026-05/schema.md
new file mode 100644
index 0000000..730c79f
--- /dev/null
+++ b/audits/seo-2026-05/schema.md
@@ -0,0 +1,247 @@
+# Schema.org Coverage Audit — gemmology.dev
+**Date**: 2026-05-11
+**Auditor**: Schema.org specialist (static analysis, read-only)
+**Scope**: All Astro page templates and SEO component files
+**Prior audit**: T7c-seo.md (F-04 baseline for this document)
+
+---
+
+## 1. Current-State Inventory
+
+### Schema components
+
+| File | Types emitted | Used by |
+|------|--------------|---------|
+| `src/components/seo/LearnSchema.astro` | `LearningResource + Article`, `BreadcrumbList` | `/learn/[slug]` |
+| `src/components/seo/MineralSchema.astro` | `Thing` (Wikidata additionalType), `BreadcrumbList` | `/minerals/[slug]` |
+| `src/components/seo/StructuredData.astro` | `BreadcrumbList`, `WebPage` (optional) | `/about` |
+| `src/pages/index.astro` (inline) | `WebSite` (with `SearchAction`), `Organization` | `/` |
+
+### Pages with no schema at all
+
+| Page / template | Notes |
+|----------------|-------|
+| `/tools/*` (all 7 routes) | No schema block at any level |
+| `/quiz` | No schema block |
+| `/study/review`, `/study/settings` | No schema block |
+| `/gallery` | No schema block |
+| `/playground` | No schema block |
+| `/learn/index` | No schema block |
+
+### Validation results
+
+**`LearnSchema.astro` — LearningResource + Article block**
+- `@context` is `https://schema.org` — PASS
+- `@type` array valid — PASS
+- `isAccessibleForFree` present — PASS
+- `datePublished` missing: the `publishedAt` content-config field is never passed into the component (only `dateModified` is wired, via file mtime fallback). Strictly optional for Article but recommended — FLAG
+- `educationalLevel` passes raw difficulty string ("beginner" / "intermediate" / "advanced") without mapping to a schema `DefinedTerm` — INFO
+- `teaches` is the category label string only; `about` duplicates it as a bare `Thing` — PASS (acceptable)
+- No `url` / `@id` self-reference on the main entity — FLAG (Google's Article validator expects `mainEntityOfPage` or a `@id` match to canonical; `mainEntityOfPage` is present, so technically fine)
+
+**`LearnSchema.astro` — BreadcrumbList**
+- Position 3 `item` is a fragment URL (`/learn#${category}`) — **FAIL** (confirmed by F-09 in T7c-seo.md; not fixed in current branch)
+
+**`MineralSchema.astro` — Thing block**
+- `@type: "Thing"` with Wikidata `additionalType` — valid; no native Google rich-result type exists for minerals, so this is the correct approach — PASS
+- `additionalProperty` array can contain a `false` entry from the guard (`mineral.system && {...}`), but `filter(Boolean)` removes it — PASS
+- `BreadcrumbList` position 2 points to `/gallery` but `/gallery` is not a parent of `/minerals/[slug]` in the URL hierarchy (`/minerals/` is) — FLAG (minor; breadcrumb is still useful but misleading)
+
+**Homepage inline schemas**
+- `WebSite` `SearchAction` `urlTemplate` points to `/gallery?search=` — valid — PASS
+- `Organization` missing `description` — INFO (recommended, not required)
+- Both blocks use `https://schema.org` — PASS
+
+**`StructuredData.astro` — WebPage + BreadcrumbList**
+- Component is correct and generic; only wired to `/about` — PASS
+
+---
+
+## 2. Recommended Additions by Page Template
+
+### 2-A. `/learn/index` — Course
+
+The ordered sequence of 91 articles across 8 categories is architecturally a course. A `Course` entity on the hub page is the single highest-impact addition available, directly addressing F-04 from T7c-seo.md.
+
+**Required properties**: `name`, `description`, `provider`
+**Recommended**: `educationalLevel`, `teaches`, `hasCourseInstance`, `url`
+**Priority**: High
+
+### 2-B. `/learn/[slug]` — Quiz hasPart (conditional)
+
+The pretest widget is now wired (F-03 resolved in current branch — `LearnQuizWidget` is imported and rendered when `pretestQuestions.length > 0`). Pages that render the widget legitimately emit quiz content. A `hasPart` Quiz node should be added to the `articleSchema` in `LearnSchema.astro` when questions are available.
+
+This requires passing a `hasPretest: boolean` prop into `LearnSchema.astro` and conditionally appending the `hasPart` node.
+
+**Note on FAQPage**: Confirmed ineligible per F-05. Do not add.
+
+**Priority**: Medium (after Course, as individual article Quiz signals are less impactful than the Course entity)
+
+### 2-C. `/tools/*` — SoftwareApplication
+
+The tools hub and each tool category page expose interactive calculators that run in the browser without login. `SoftwareApplication` is a valid, Google-supported type for browser-based apps. It unlocks no dedicated rich result, but it strengthens entity understanding and supports GEO/AI citations.
+
+**Required properties**: `name`, `applicationCategory`, `operatingSystem`
+**Recommended**: `description`, `url`, `featureList`, `offers` (free)
+**Priority**: Medium
+
+### 2-D. `/minerals/[slug]` — DefinedTerm
+
+The mineral detail pages are definitional reference entries. `DefinedTerm` (subtype of `Intangible`) is more semantically precise than bare `Thing` for a glossary-style entry. `DefinedTermSet` can be declared on `/gallery` to frame the collection.
+
+The existing `Thing` + Wikidata `additionalType` approach is not wrong; adding `DefinedTerm` as a second `@type` in the array makes the entity classification explicit.
+
+**Required properties** (DefinedTerm): `name`, `inDefinedTermSet`
+**Priority**: Medium
+
+### 2-E. `/gallery` — DefinedTermSet + CollectionPage
+
+A `DefinedTermSet` entity at `/gallery` provides the container for all mineral `DefinedTerm` entities. A `CollectionPage` `@type` signals to Google that the page is a browsable catalogue, which is accurately descriptive.
+
+**Priority**: Low–Medium
+
+### 2-F. `/playground` — SoftwareApplication
+
+The CDL playground is a browser-based coding tool. Same `SoftwareApplication` treatment as tools.
+
+**Priority**: Low
+
+### 2-G. `/quiz` — Course (exam-prep framing) + LearningResource
+
+The quiz page itself describes an FGA exam-preparation practice tool. A `Course` entity with `educationalCredentialAwarded` pointing to an `EducationalOccupationalCredential` describing the FGA diploma gives Google the credential-preparation signal. This is distinct from the `/learn/index` Course — the quiz Course would reference the learn articles as `syllabusSections`.
+
+**Note**: Do NOT claim the site awards the FGA credential; use the `prepares` / `educationalCredentialAwarded` property to reference the external credential only.
+
+**Priority**: Low (quiz is currently noindex-candidate per F-01; add schema only once publicly indexed)
+
+---
+
+## 3. Fix for Existing Failures
+
+**BreadcrumbList position 3 fragment URL** (`LearnSchema.astro:73`)
+Change `"item": \`https://gemmology.dev/learn#${category}\`` to `"item": "https://gemmology.dev/learn"` and `"name": "Learn"`. This collapses the breadcrumb to Home → Learn → Article (3 items) or Home → Learn → Subcategory → Article (4 items when subcategory present). Confirmed not fixed in branch `feature/tools-wave-c`.
+
+---
+
+## 4. Ready-to-Drop JSON-LD Payloads
+
+### 4-A. `/learn/index` — Course
+
+Add to `src/pages/learn/index.astro` inside a `<script type="application/ld+json">` element:
+
+```json
+{
+ "@context": "https://schema.org",
+ "@type": "Course",
+ "@id": "https://gemmology.dev/learn",
+ "name": "Gemmology Foundation — FGA Curriculum Reference",
+ "description": "91 structured articles covering crystal systems, optical and physical properties, gem species, identification procedures, treatments, and market knowledge. Aligned with the FGA Foundation and Diploma syllabi.",
+ "url": "https://gemmology.dev/learn",
+ "inLanguage": "en",
+ "isAccessibleForFree": true,
+ "educationalLevel": "beginner to advanced",
+ "teaches": "Gemmology",
+ "about": {
+ "@type": "Thing",
+ "name": "Gemmology"
+ },
+ "provider": {
+ "@type": "Organization",
+ "name": "gemmology.dev",
+ "url": "https://gemmology.dev"
+ },
+ "educationalCredentialAwarded": {
+ "@type": "EducationalOccupationalCredential",
+ "name": "FGA — Fellow of the Gemmological Association",
+ "credentialCategory": "Professional Certification",
+ "recognizedBy": {
+ "@type": "Organization",
+ "name": "Gemmological Association of Great Britain",
+ "url": "https://gem-a.com"
+ }
+ },
+ "hasCourseInstance": {
+ "@type": "CourseInstance",
+ "courseMode": "online",
+ "courseWorkload": "PT2H",
+ "instructor": {
+ "@type": "Organization",
+ "name": "gemmology.dev",
+ "url": "https://gemmology.dev"
+ }
+ }
+}
+```
+
+### 4-B. `/learn/[slug]` — Quiz hasPart patch for LearnSchema.astro
+
+Add a `hasPretest` boolean prop to the component interface. When true, spread this into `articleSchema`:
+
+```json
+"hasPart": {
+ "@type": "Quiz",
+ "name": "Pretest",
+ "description": "A short knowledge-check before reading this article.",
+ "educationalUse": "Assessment",
+ "url": "https://gemmology.dev/learn/SLUG"
+}
+```
+
+The `url` should match the article URL (the quiz is embedded in-page, so the article URL is correct per schema.org guidance for inline assessments).
+
+### 4-C. `/minerals/[slug]` — DefinedTerm upgrade
+
+In `MineralSchema.astro`, change `"@type": "Thing"` to `"@type": ["Thing", "DefinedTerm"]` and add:
+
+```json
+"inDefinedTermSet": {
+ "@type": "DefinedTermSet",
+ "name": "Gemmology Mineral Reference",
+ "url": "https://gemmology.dev/gallery"
+}
+```
+
+### 4-D. `/tools/*` — SoftwareApplication
+
+Add to each tool category page (example for `/tools/measurement`):
+
+```json
+{
+ "@context": "https://schema.org",
+ "@type": "SoftwareApplication",
+ "name": "Gemmological Measurement Tools",
+ "description": "Browser-based calculators for specific gravity, refractive index, birefringence, critical angle, and carat estimation.",
+ "url": "https://gemmology.dev/tools/measurement",
+ "applicationCategory": "EducationalApplication",
+ "operatingSystem": "Any (web browser)",
+ "isAccessibleForFree": true,
+ "offers": {
+ "@type": "Offer",
+ "price": "0",
+ "priceCurrency": "USD"
+ },
+ "provider": {
+ "@type": "Organization",
+ "name": "gemmology.dev",
+ "url": "https://gemmology.dev"
+ }
+}
+```
+
+A shared `ToolsSchema.astro` component accepting `name`, `description`, and `url` props avoids repeating this across the 6 category pages. The BreadcrumbList (Home → Tools → [Category]) should also be added via `StructuredData.astro`.
+
+---
+
+## 5. Prioritised Implementation Order
+
+| Priority | Page | Addition | Effort |
+|----------|------|----------|--------|
+| 1 | `/learn/index` | Course + EducationalOccupationalCredential | ~30 min |
+| 2 | `LearnSchema.astro` | Fix fragment URL in BreadcrumbList | 2 min |
+| 3 | `LearnSchema.astro` | Quiz `hasPart` (conditional on hasPretest prop) | ~20 min |
+| 4 | All `/tools/*` | New `ToolsSchema.astro` + BreadcrumbList via StructuredData | ~45 min |
+| 5 | `MineralSchema.astro` | DefinedTerm upgrade + DefinedTermSet ref | 10 min |
+| 6 | `/gallery` | DefinedTermSet + CollectionPage | ~20 min |
+| 7 | `/playground` | SoftwareApplication | 15 min |
+
+Items 1–3 close the gap vs knowledge.gemmology.dev by expressing the educational hierarchy (Course containing LearningResource articles containing Quiz pretests) that Google's Education SERP features and AI overviews consume. Items 4–7 improve entity classification for tools and the mineral catalogue but have no direct rich-result unlock.
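Before shipping the payloads in section 4, a small check can catch a block that omits a property this audit lists as required. The sketch below encodes the "Required properties" lists from section 2. The `REQUIRED` mapping, the `missing_required` helper, and the sample mineral name are hypothetical, and the check reflects this document's lists only, not Google's or schema.org's validators.

```python
# Required properties per type, as listed in section 2 of this audit.
REQUIRED = {
    "Course": ("name", "description", "provider"),
    "SoftwareApplication": ("name", "applicationCategory", "operatingSystem"),
    "DefinedTerm": ("name", "inDefinedTermSet"),
}


def missing_required(payload: dict) -> dict[str, list[str]]:
    """Map each recognised @type in payload to the required keys it lacks."""
    types = payload.get("@type", [])
    if isinstance(types, str):
        types = [types]
    report = {}
    for schema_type in types:
        missing = [key for key in REQUIRED.get(schema_type, ()) if key not in payload]
        if missing:
            report[schema_type] = missing
    return report


# Trimmed copy of the 4-D SoftwareApplication payload.
TOOLS_PAYLOAD = {
    "@context": "https://schema.org",
    "@type": "SoftwareApplication",
    "name": "Gemmological Measurement Tools",
    "url": "https://gemmology.dev/tools/measurement",
    "applicationCategory": "EducationalApplication",
    "operatingSystem": "Any (web browser)",
}

# A mineral entity after the @type change in 4-C but before
# inDefinedTermSet is added (the name is illustrative).
MINERAL_PAYLOAD = {"@type": ["Thing", "DefinedTerm"], "name": "Ruby"}

if __name__ == "__main__":
    print(missing_required(TOOLS_PAYLOAD))
    print(missing_required(MINERAL_PAYLOAD))
```

Types that are not in `REQUIRED`, such as `Thing`, are skipped, so the check stays quiet about entities this audit does not cover.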
diff --git a/audits/seo-2026-05/scripts/trends.py b/audits/seo-2026-05/scripts/trends.py
new file mode 100644
index 0000000..75c2f0e
--- /dev/null
+++ b/audits/seo-2026-05/scripts/trends.py
@@ -0,0 +1,262 @@
+"""
+Google Trends fetcher for gemmology.dev SEO hub head terms.
+
+Purpose: free, scriptable alternative to DataForSEO for confirming the relative
+trajectory and priority of the five keyword-cluster hub head terms introduced in
+the SEO v3 wave. Answers the question: which terms have the strongest and most
+growing organic demand, informing content investment priority.
+
+Plan ID: piped-frolicking-matsumoto
+
+Head terms covered (verbatim, Google caps at 5 per comparison query):
+  1. how to identify gemstones
+  2. gemstone properties chart
+  3. FGA exam preparation
+  4. how to use a refractometer
+  5. synthetic diamond identification
+
+Outputs (relative to the directory above this script's folder, i.e. audits/seo-2026-05/):
+  output/interest_over_time.{csv,json}
+  output/interest_by_region.{csv,json}
+  output/related_queries.json
+
+Edit the TERMS list below to re-run with different keywords.
+"""
+
+# /// script
+# requires-python = ">=3.12"
+# dependencies = [
+#   "pytrends>=4.9.2",
+#   "pandas>=2.0", "numpy>=1.26",
+#   "requests>=2.28",
+# ]
+# ///
+
+from __future__ import annotations
+
+import json
+import time
+from pathlib import Path
+from typing import Any
+
+import numpy as np
+import pandas as pd
+from pytrends.exceptions import TooManyRequestsError
+from pytrends.request import TrendReq
+
+# ---------------------------------------------------------------------------
+# Configuration — edit TERMS to re-run with different keywords.
+# ---------------------------------------------------------------------------
+
+TERMS: list[str] = [
+    "how to identify gemstones",
+    "gemstone properties chart",
+    "FGA exam preparation",
+    "how to use a refractometer",
+    "synthetic diamond identification",
+]
+
+TIMEFRAME = "today 5-y" # 5-year window
+GEO = "" # Worldwide
+CATEGORY = 0 # All categories
+REGION_RESOLUTION = "COUNTRY"
+REGION_MAX = 25
+
+RETRY_WAIT_S = 30.0
+INTER_REQUEST_SLEEP_S = 1.5
+
+# ---------------------------------------------------------------------------
+# Paths
+# ---------------------------------------------------------------------------
+
+_HERE = Path(__file__).parent
+_OUTPUT = _HERE.parent / "output"
+_OUTPUT.mkdir(exist_ok=True)
+
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+def _build_pytrends() -> TrendReq:
+    return TrendReq(hl="en-US", tz=0, timeout=(10, 25))
+
+
+def _build_payload(pt: TrendReq) -> None:
+    pt.build_payload(
+        kw_list=TERMS,
+        timeframe=TIMEFRAME,
+        geo=GEO,
+        cat=CATEGORY,
+    )
+
+
+def _fetch_with_retry(fn: Any, *args: Any, **kwargs: Any) -> Any:
+    """Call *fn* once; on TooManyRequestsError wait RETRY_WAIT_S then retry once."""
+    try:
+        return fn(*args, **kwargs)
+    except TooManyRequestsError:
+        print(
+            f"[trends] Rate-limited by Google. Waiting {RETRY_WAIT_S:.0f} s before retry…"
+        )
+        time.sleep(RETRY_WAIT_S)
+        try:
+            return fn(*args, **kwargs)
+        except TooManyRequestsError as exc:
+            raise SystemExit(
+                "[trends] ERROR: Google Trends is still rate-limiting after one retry. "
+                "Wait a few minutes and try again. "
+                f"Original error: {exc}"
+            ) from exc
+
+
+def _df_to_json_records(df: pd.DataFrame) -> list[dict[str, Any]]:
+    return json.loads(df.reset_index().to_json(orient="records", date_format="iso"))
+
+
+# ---------------------------------------------------------------------------
+# Fetch functions
+# ---------------------------------------------------------------------------
+
+def fetch_interest_over_time(pt: TrendReq) -> pd.DataFrame:
+    print("[trends] Fetching interest_over_time…")
+    _build_payload(pt)
+    df: pd.DataFrame = _fetch_with_retry(pt.interest_over_time)
+    if "isPartial" in df.columns:
+        df = df.drop(columns=["isPartial"])
+    return df
+
+
+def fetch_interest_by_region(pt: TrendReq) -> pd.DataFrame:
+    print("[trends] Fetching interest_by_region…")
+    time.sleep(INTER_REQUEST_SLEEP_S)
+    _build_payload(pt)
+    df: pd.DataFrame = _fetch_with_retry(
+        pt.interest_by_region,
+        resolution=REGION_RESOLUTION,
+        inc_low_vol=True,
+        inc_geo_code=False,
+    )
+    # Keep top N countries by sum across all terms
+    df["_total"] = df[TERMS].sum(axis=1)
+    df = df.sort_values("_total", ascending=False).head(REGION_MAX).drop(columns=["_total"])
+    return df
+
+
+def fetch_related_queries(pt: TrendReq) -> dict[str, dict[str, list[dict[str, Any]]]]:
+    print("[trends] Fetching related_queries…")
+    time.sleep(INTER_REQUEST_SLEEP_S)
+    _build_payload(pt)
+    raw: dict[str, Any] = _fetch_with_retry(pt.related_queries)
+
+    result: dict[str, dict[str, list[dict[str, Any]]]] = {}
+    for term in TERMS:
+        entry = raw.get(term, {})
+        top_df: pd.DataFrame | None = entry.get("top")
+        rising_df: pd.DataFrame | None = entry.get("rising")
+        result[term] = {
+            "top": top_df.to_dict(orient="records") if top_df is not None else [],
+            "rising": rising_df.to_dict(orient="records") if rising_df is not None else [],
+        }
+    return result
+
+
+# ---------------------------------------------------------------------------
+# Persist helpers
+# ---------------------------------------------------------------------------
+
+def write_csv_json(df: pd.DataFrame, stem: str) -> None:
+    csv_path = _OUTPUT / f"{stem}.csv"
+    json_path = _OUTPUT / f"{stem}.json"
+    df.to_csv(csv_path)
+    json_path.write_text(
+        json.dumps(_df_to_json_records(df), indent=2, ensure_ascii=False),
+        encoding="utf-8",
+    )
+    print(f"[trends] Wrote {csv_path.name} and {json_path.name}")
+
+
+# ---------------------------------------------------------------------------
+# Summary table
+# ---------------------------------------------------------------------------
+
+def _slope(series: pd.Series) -> float:
+    """Slope (units / month) of a linear fit over the last 12 monthly means (weekly data is resampled)."""
+    tail = series.dropna().resample("MS").mean().dropna().tail(12)
+    if len(tail) < 2:
+        return 0.0
+    x = np.arange(len(tail), dtype=float)
+    coeffs: np.ndarray = np.polyfit(x, tail.values.astype(float), 1)
+    return float(coeffs[0])
+
+
+def _trend_arrow(slope: float) -> str:
+    if slope > 0.5:
+        return "↑"
+    if slope < -0.5:
+        return "↓"
+    return "→"
+
+
+def print_summary(iot: pd.DataFrame) -> None:
+    col_w = max(len(t) for t in TERMS) + 2
+    header = (
+        f"{'Term':<{col_w}} {'Mean':>6} {'Max':>5} {'Peak month':<12} Trend (12 mo)"
+    )
+    print()
+    print("=" * len(header))
+    print(header)
+    print("=" * len(header))
+
+    for term in TERMS:
+        if term not in iot.columns:
+            print(f"{term:<{col_w}} (no data)")
+            continue
+        series = iot[term].dropna()
+        mean_val = float(series.mean())
+        max_val = float(series.max())
+        peak_month = str(series.idxmax())[:7] if not series.empty else "—"
+        slope = _slope(series)
+        arrow = _trend_arrow(slope)
+        print(
+            f"{term:<{col_w}} {mean_val:>6.1f} {max_val:>5.0f} "
+            f"{peak_month:<12} {arrow} ({slope:+.2f}/mo)"
+        )
+
+    print("=" * len(header))
+    print(
+        "Note: Google Trends interest is relative (0–100). "
+        "Slope computed over last 12 monthly data points."
+    )
+    print()
+
+
+# ---------------------------------------------------------------------------
+# Main
+# ---------------------------------------------------------------------------
+
+def main() -> None:
+    pt = _build_pytrends()
+
+    # 1. Interest over time
+    iot = fetch_interest_over_time(pt)
+    write_csv_json(iot, "interest_over_time")
+
+    # 2. Interest by region
+    time.sleep(INTER_REQUEST_SLEEP_S)
+    ibr = fetch_interest_by_region(pt)
+    write_csv_json(ibr, "interest_by_region")
+
+    # 3. Related queries
+    time.sleep(INTER_REQUEST_SLEEP_S)
+    rq = fetch_related_queries(pt)
+    rq_path = _OUTPUT / "related_queries.json"
+    rq_path.write_text(json.dumps(rq, indent=2, ensure_ascii=False), encoding="utf-8")
+    print(f"[trends] Wrote {rq_path.name}")
+
+    # 4. Summary
+    print_summary(iot)
+
+
+if __name__ == "__main__":
+    main()
diff --git a/audits/seo-2026-05/sitemap.md b/audits/seo-2026-05/sitemap.md
new file mode 100644
index 0000000..e744db4
--- /dev/null
+++ b/audits/seo-2026-05/sitemap.md
@@ -0,0 +1,122 @@
+# Sitemap & Indexability Audit
+
+**Date:** 2026-05-11
+**Audited file:** `dist/sitemap-0.xml` (via `dist/sitemap-index.xml`)
+
+---
+
+## TL;DR
+
+910 URLs are in the sitemap. 442 of them are OG image template pages that must never be indexed. 2 more are user-state pages with no crawlable content. The fix is a one-line filter change in `astro.config.mjs`. After the fix the sitemap will contain ~466 clean, indexable URLs — well under the 50,000-URL limit, no split required.
+
+---
+
+## Current sitemap inventory
+
+| Path prefix | Count | Status |
+|---|---|---|
+| `/og/` | 441 | OG image templates — must be excluded |
+| `/og-image/` | 1 | OG image template — must be excluded |
+| `/minerals/` | 303 | Good |
+| `/learn/` | 139 | Good |
+| `/docs/` | 12 | Good |
+| `/tools/` | 7 | Good |
+| `/study/review/` | 1 | User-state page — must be excluded |
+| `/study/settings/` | 1 | User-state page — must be excluded |
+| `/quiz/` | 1 | Good |
+| `/playground/` | 1 | Good |
+| `/gallery/` | 1 | Good |
+| `/about/` | 1 | Good |
+| `/` | 1 | Good |
+| **Total** | **910** | **444 must be removed** |
+
+---
+
+## Issues
+
+### P0 — 442 OG image template pages indexed
+
+`/og/learn/[...slug]` and `/og/minerals/[slug]` are Astro pages that render raw 1200x630 HTML cards intended only for `og:image` screenshot pipelines. They have no `<meta name="robots" content="noindex">` tag and are not excluded by the sitemap filter. Nor do they appear to set a canonical URL pointing elsewhere.
+
+Confirmed from `src/pages/og/learn/[...slug].astro`: the file is a pure image-template component with no body content beyond social card markup. Google will crawl these, find near-duplicate thin content for every learn and mineral page, and may apply a doorway/thin-content signal across the site.
+
+The `/og-image/` root template is the same class of problem.
+
+Action: exclude all `/og` paths from the sitemap and add `<meta name="robots" content="noindex">` to the OG template layout (or return an `X-Robots-Tag: noindex` header if served via middleware).
+
+### P1 — /study/review and /study/settings indexed
+
+`src/pages/study/review.astro` is an express review queue driven entirely by the SM-2 schedule stored in `localStorage`. A fresh Googlebot crawl sees an empty state with no meaningful content. `/study/settings` is a preferences panel. Both are user-state pages with zero indexable value.
+
+Action: exclude from sitemap and add `noindex` meta to both pages.
+
+### P2 — priority and changefreq fields present
+
+`priority` and `changefreq` are ignored by Google (confirmed dropped ~2023). They add byte weight to every URL entry. Removing them has no negative effect and cleans up the XML.
+
+Action: strip both fields from the `serialize()` callback. Retain `lastmod` only.
+
+---
+
+## Proposed astro.config.mjs filter patch
+
+Replace the current single-line filter in `astro.config.mjs` with:
+
+```diff
+- filter: (page) => !page.includes('/admin'),
++ filter: (page) =>
++ !page.includes('/admin') &&
++ !page.includes('/og/') &&
++ !page.includes('/og-image/') &&
++ !page.includes('/study/review') &&
++ !page.includes('/study/settings'),
+```
+
+Also update the `serialize` callback to drop deprecated fields:
+
+```diff
+ serialize(item) {
+ if (item.url === 'https://gemmology.dev/' ||
+ item.url === 'https://gemmology.dev/gallery/' ||
+ item.url === 'https://gemmology.dev/learn/') {
+- item.priority = 1.0;
+- item.changefreq = 'weekly';
++ // priority and changefreq ignored by Google — omitted
+ } else if (/\/minerals\/[^/]+\/$/.test(item.url)) {
+- item.priority = 0.8;
+- item.changefreq = 'monthly';
+ } else if (/\/learn\//.test(item.url)) {
+- item.priority = 0.7;
+- item.changefreq = 'monthly';
+ } else if (/\/docs\//.test(item.url)) {
+- item.priority = 0.5;
+- item.changefreq = 'monthly';
+ }
+ return item;
+ },
+```
+
+Also remove the top-level `changefreq: 'weekly'` and `priority: 0.7` defaults from the sitemap integration options for the same reason.
+
+---
+
+## Proposed sitemap split
+
+Not required. After removing the 444 excluded URLs (442 OG templates plus 2 user-state pages) the sitemap will contain 466 URLs, well under the 50,000-URL per-file limit. A single `sitemap-0.xml` with index is sufficient.
+
+---
+
+## Post-fix expected inventory
+
+| Path prefix | Count |
+|---|---|
+| `/minerals/` | 303 |
+| `/learn/` | 139 |
+| `/docs/` | 12 |
+| `/tools/` | 7 |
+| `/quiz/` | 1 |
+| `/playground/` | 1 |
+| `/gallery/` | 1 |
+| `/about/` | 1 |
+| `/` | 1 |
+| **Total** | **466** |
diff --git a/audits/seo-2026-05/technical.md b/audits/seo-2026-05/technical.md
new file mode 100644
index 0000000..bb6214e
--- /dev/null
+++ b/audits/seo-2026-05/technical.md
@@ -0,0 +1,74 @@
+# Technical SEO Audit — gemmology.dev vs knowledge.gemmology.dev
+
+**Date:** 2026-05-11
+**Scope:** Static Astro site (gemmology.dev) vs docs subdomain (knowledge.gemmology.dev)
+
+---
+
+## TL;DR
+
+- 441 OG image template pages (1200x630 card renders) are included in the sitemap with no noindex, diluting crawl budget and forcing Googlebot to discover thin, imageless HTML shells alongside real content pages.
+- The tools hub (`/tools/`), the quiz hub (`/quiz/`), and the playground ship zero server-rendered body text, only layout wrapper HTML plus a `client:load` React island, giving Googlebot nothing to index before JavaScript executes. Category tool pages such as `/tools/measurement` server-render only an h1 and an intro paragraph.
+- knowledge.gemmology.dev serves pre-rendered Markdown as plain HTML; every heading, paragraph, and code block is immediately visible to crawlers, explaining its superior indexing rate.
+
+---
+
+## Root cause of indexing gap
+
+gemmology.dev relies on `client:load` React hydration for its highest-traffic surfaces (tools, quiz, playground). At crawl time, Googlebot receives a shell page containing only layout chrome and an empty mount point. Even when Googlebot's secondary rendering queue eventually processes JavaScript, the rendering budget is already stressed by 441 spurious `/og/` URLs in the sitemap. knowledge.gemmology.dev has no such issues: its content is static Markdown rendered at build time with full heading hierarchy and body text visible in the initial HTML response.
+
+---
+
+## P0 Issues
+
+**P0-A: 441 `/og/` pages in sitemap with no noindex**
+- File: `astro.config.mjs` line 25 — sitemap filter is `!page.includes('/admin')` only
+- File: `src/pages/og/minerals/[slug].astro`, `src/pages/og/learn/[...slug].astro`
+- These pages are 1200x630 HTML card templates. They contain no article body, no prose, no structured data. They bloat the sitemap from ~450 meaningful URLs to ~900, halving effective crawl budget allocation.
+- Neither OG template file contains a `noindex` meta tag or `X-Robots-Tag`.
+- Fix: add `!page.includes('/og/')` to the sitemap filter in `astro.config.mjs` AND add `<meta name="robots" content="noindex">` inside both OG template files.
+
+**P0-B: Tools and Quiz pages render zero crawlable body text**
+- `src/pages/tools/index.astro` line 13: a `client:load` React island is the only body content
+- `src/pages/tools/measurement.astro` line 22: `client:load` React island (h1 and intro paragraph ARE server-rendered here, so this is a partial pass)
+- `src/pages/quiz/index.astro` line 28: `client:load` React island; all question UI is client-side only
+- `src/pages/playground.astro` line 13: `client:load` React island; zero server-rendered body
+- `/tools/index.astro` has no h1, no intro copy, and no server-rendered content at all. A Googlebot crawl returns only the empty layout shell.
+- Most category tool pages (`advanced.astro`, `optical.astro`, `lab.astro`, `identification.astro`, `conversions.astro`) likely follow the same pattern as `measurement.astro` (h1 + intro paragraph SSG, widget CSR) — measurement.astro passes, but the hub does not.
+
+**P0-C: F-01/F-02 fix validation — CONFIRMED FIXED**
+- No `LockGate` component referenced in any page. `/quiz/index.astro` no longer has a "Coming Soon" gate. Both prior findings are resolved.
+
+---
+
+## P1 Issues
+
+**P1-A: Sitemap filter does not exclude `/og/` (also a P0)**
+Already covered above. Secondary effect: the sitemap priority system in `astro.config.mjs` assigns `/og/` pages the default 0.7 priority because no pattern matches them, making them appear equal in importance to `/learn/` articles.
+
+**P1-B: `/quiz/` has no structured data**
+F-04 from prior audit is unresolved. `src/pages/quiz/index.astro` has title and description but no `Course` or `Quiz` JSON-LD schema. `/learn/[...slug].astro` was confirmed in prior audit to render YAML server-side with schema; quiz hub does not.
+
+**P1-C: `llms.txt` is an API route, not a static file**
+`src/pages/llms.txt.ts` returns `text/plain` correctly (confirmed: `Content-Type: text/plain; charset=utf-8`). In static output mode Astro normally prerenders an endpoint like this to a file at build time, but if the route opts out of prerendering it needs a server adapter, and on a CDN without a runtime (e.g. pure S3/GitHub Pages) it would return 404 silently. `robots.txt` references it, so verify it resolves in production.
+
+---
+
+## P2 Issues
+
+**P2-A: `/tools/` hub has no h1 and no SSG copy**
+`src/pages/tools/index.astro` delegates everything to a single React island. Even the page heading is rendered by React. The page title is "Tools" with a thin description. Add an h1 and a one-paragraph intro to the Astro template (server-rendered markup, outside the React island) to give Googlebot something to index without JS.
+
+**P2-B: Sitemap includes `/study/review` and `/study/settings`**
+Both carry `noindex={true}` in their BaseLayout call, so they are excluded from indexing at render time, yet they are still submitted in the sitemap. The sitemap filter should exclude `/study/` utility routes.
+
+---
+
+## Fix checklist
+
+- [ ] `astro.config.mjs` sitemap filter: add `&& !page.includes('/og/')` and `&& !page.includes('/study/')`.
+- [ ] `src/pages/og/minerals/[slug].astro` and `src/pages/og/learn/[...slug].astro`: add `<meta name="robots" content="noindex">` in the `<head>`.
+- [ ] `src/pages/tools/index.astro`: add a server-rendered `<h1>` and intro paragraph before the React island.
+- [ ] `src/pages/quiz/index.astro`: add `Quiz` or `Course` JSON-LD schema block in BaseLayout head slot.
+- [ ] Verify `llms.txt` resolves to `text/plain` in production (not 404); if static adapter, convert to a static `.txt` file generated at build time.
+- [ ] Confirm `/study/review` and `/study/settings` are excluded from sitemap output after filter change.
diff --git a/audits/seo-2026-05/v3-validation.md b/audits/seo-2026-05/v3-validation.md
new file mode 100644
index 0000000..728292a
--- /dev/null
+++ b/audits/seo-2026-05/v3-validation.md
@@ -0,0 +1,125 @@
+# SEO v3 Validation Report
+
+**Branch:** `main`
+**Built:** 2026-05-13
+**Validator:** automated `npm run validate:citations` + `npm run build` + targeted dist/* assertions
+**Plan:** `~/.claude/plans/piped-frolicking-matsumoto.md` (SEO v2.5 + v3 Bootstrap)
+
+## Wave A — implementation status
+
+| Workstream | Description | PRs | Status |
+|------------|-------------|-----|--------|
+| WA1 + WA2 | Learn intro expansion (134 articles to 130–165 words) + unused-ref cleanup | gemmology-knowledge#19–26 (8 sub-agent PRs) + #27 (residual cleanup) | merged |
+| WA3 Hub 3 | `/learn/` repositioned as FGA exam preparation hub | gemmology.dev#36 | merged |
+| WA3 Hubs 1, 2, 4, 5 | Identification, properties, equipment, treatments hubs | gemmology.dev#39 | shipped 2026-05-13 as a Wave A follow-up |
+| WA4 | Docs subdomain cross-linking (nav + hero + footer across 6 repos) | 6 PRs across sibling repos | merged |
+| WA5 | knowledge.gemmology.dev module-page reciprocity | gemmology-knowledge#17 | merged |
+| Schema widening | Accept string-valued issue/volume in citation refs | gemmology.dev#37 | merged |
+| Knowledge version pin | Bump KNOWLEDGE_VERSION → v1.3.0 | gemmology.dev#38 | merged |
+
+## Build assertions (post-Wave A)
+
+| Assertion | Expected | Observed | Status |
+|-----------|----------|----------|--------|
+| `npm run sync` files | ≥138 | 138 | PASS |
+| `validate:citations` errors | 0 | 0 | PASS |
+| `validate:citations` warnings | 0 | 0 | PASS (was 39 pre-WA2; transient 7 cleaned in #27) |
+| `npm run build` pages | ≥900 | 913 | PASS |
+| Sitemap URL entry count | ≈466 | 467 | PASS |
+| `/learn/` body words | rich | 3616 | PASS (added FGA-positioning lead) |
+| `/learn/` JSON-LD blocks | ≥2 (Course + Breadcrumb) | 3 (Course, CollectionPage, BreadcrumbList) | PASS |
+| `/about/` body words | ≥350 | 384 | PASS |
+| `/quiz/` body words | ≥250 | 541 | PASS |
+| `/tools/` body words | ≥200 | 521 | PASS |
+
+## Learn intro expansion (WA1) — sample check
+
+Random sample of 10 first-section content blocks across categories. Each follows the AI-citation three-beat pattern (definition / diagnostic significance / concrete example with a named species and a number).
+
+Word counts (target 130–165): all 134 expanded files fall in band. The 4 pre-existing long-form intros (`fundamentals/optic-sign-determination`, `fundamentals/twin-laws`, `fundamentals/colour-theory`, `identification/treatments-deep/beryllium-diffusion`) remain unchanged and exceed 165 by design.
+
+## Citation cleanup (WA2)
+
+Pre-WA2: 39 unused-reference warnings.
+Post-WA2 (PRs #19–26 in gemmology-knowledge): 7 residual warnings.
+Post-#27 cleanup: **0 warnings**.
+
+Removed orphans:
+- `dubey-2023-libs` (colour-theory)
+- `schumann-2013-gemstones` (optic-sign-determination, twin-laws — duplicate)
+- `kane-1990-diffusion` (solid-inclusions)
+- `kammerling-1991-emerald` (lab-reports)
+- `lmhc-standards` (professional-practice — already cited via cibjo/ftc/iso-18323)
+- `read-2014-gemmology` (madagascar/ruby)
+
+## Hubs 1, 2, 4, 5 — keyword-cluster hubs (WA3, `gemmology.dev#39`)
+
+Shipped 2026-05-13. Each hub renders ≥400 server-side body words, emits at least one JSON-LD block declaring the hub as a study unit (`LearningResource` / `Dataset`), and links to ≥4 existing child pages.
+
+| Hub | Route | Head term | Body words | JSON-LD blocks | Child `/learn/` links |
+|-----|-------|-----------|-----------:|---------------:|----------------------:|
+| 1 — Identification cluster | `/learn/identification/` | "how to identify gemstones" | 922 | 2 (LearningResource + BreadcrumbList) | 15 |
+| 4 — Instruments guide | `/learn/equipment/` | "how to use a refractometer gemstone" | 931 | 2 (LearningResource + BreadcrumbList) | 14 |
+| 5 — Treatments & synthetics | `/learn/identification/treatments/` | "synthetic diamond identification" | 876 | 2 (LearningResource + BreadcrumbList) | 8 |
+| 2 — Properties reference | `/reference/properties/` (NEW route) | "gemstone properties chart" | 1209 | 3 (Dataset + WebPage:Table + BreadcrumbList) | 6 |
+
+Key structural decisions:
+- The treatments YAML article (`identification/treatments`) collides with the new static hub at the same URL; the dynamic `[...slug]` catch-all is updated to filter that slug from `getStaticPaths`, so the YAML content stays in the collection (referenced by the hub) without a duplicate-route build error.
+- Hub 2 ships an inline 50-gem reference table rather than a DB-driven render so the table-rich-result HTML stays static and deterministic; sources cited inline (Webster, Read, O'Donoghue, Schumann, LMHC).
+- Hub 2's `WebPage` declares `mainEntity: Table` — the table-rich-result eligibility lever flagged in `keywords.md`.
+
+## Hub 3 — FGA exam preparation positioning (WA3)
+
+`/learn/` now declares (`gemmology.dev#36`):
+- `<title>`: "Learn Gemmology — Free FGA Exam Preparation & Study Guide"
+- 145-word lead paragraph naming Gem-A Foundation and Diploma exams and the supporting article cluster
+- Extended `Course` JSON-LD:
+ - `alternateName` array including "FGA Exam Preparation" and "Gem-A Diploma Study Guide"
+ - `educationalCredentialAwarded` → full `EducationalOccupationalCredential` with `recognizedBy: Gem-A`
+ - `audience: EducationalAudience` with `educationalRole: student`
+ - `isAccessibleForFree: true`
+
+## Docs subdomain cross-linking (WA4)
+
+All 6 sibling MkDocs repos shipped: `cdl-parser`, `cdl-lsp`, `crystal-geometry`, `crystal-renderer`, `mineral-database`, `gemmology-knowledge`.
+
+Each `mkdocs.yml` now declares:
+- An external "Try interactive ↗" nav tab pointing to the relevant gemmology.dev surface
+- A 5-icon footer (GitHub, Playground, Quiz, Gallery, Learn) each with `name:` aria-label
+
+Each `docs/index.md` replaces the bare credit line with a `!!! tip "Interactive companion"` admonition containing three external links to gemmology.dev.
+
+Backlink corpus added: ~250 internal anchor-text references across the 6 docs subdomains pointing at `/playground/`, `/gallery/`, `/minerals/`, `/learn/`, `/quiz/`.
+
+## Reciprocity from knowledge.gemmology.dev (WA5)
+
+All 6 module markdown pages in `gemmology-knowledge/docs/learn/` now carry an `!!! tip "Interactive version"` admonition near the top linking to the canonical category-nested URL on gemmology.dev:
+
+| Module | Reciprocal URL |
+|--------|----------------|
+| `crystal-systems.md` | `/learn/fundamentals/crystal-systems/` |
+| `physical-properties.md` | `/learn/fundamentals/physical-properties/` |
+| `optical-properties.md` | `/learn/fundamentals/optical-properties/` |
+| `inclusions.md` | `/learn/identification/inclusions/` |
+| `treatments.md` | `/learn/identification/treatments/` |
+| `synthetics.md` | `/learn/identification/synthetics/` |
+
+## Outcome
+
+Wave B verification **PASSES**. Citation pipeline, schema validation, and all five keyword-cluster hubs are production-ready. Remaining work tracked separately:
+
+- **WO1** — DataForSEO MCP server install + live-data keyword re-run, to confirm priority ordering against measured KD/volume
+- **WB-GSC** — Resubmit sitemap in Google Search Console + Bing Webmaster, URL-inspect the five hubs + cross-linked subdomain indices
+- **External validation** — manual Rich Results Test pass on each of the five hubs (Hub 3 plus the four shipped in `#39`)
+
+## External validation (manual, post-deploy)
+
+After production deploy, run each updated/new URL through Google's Rich Results Test
+(https://search.google.com/test/rich-results) and Schema.org's structured-data linter
+(https://validator.schema.org/) and append the screenshots / pass-fail markers here.
+
+Templates to test:
+
+- `https://gemmology.dev/learn/` (FGA Course schema with new `educationalCredentialAwarded`)
+- `https://gemmology.dev/learn/fundamentals/crystal-systems/` (expanded intro + LearningResource)
+- Sample expanded-intro articles from each category (e.g. `species/diamond`, `phenomena/chatoyancy`, `equipment/refractometer`)