5 changes: 5 additions & 0 deletions .gitignore
@@ -39,3 +39,8 @@ public/minerals.db
# Synced content from gemmology-knowledge
.cache/
src/content/learn/

# Local-only directories
.tmp/
.trees/
.claude/
66 changes: 66 additions & 0 deletions audits/seo-2026-05/README.md
@@ -0,0 +1,66 @@
# SEO Investigation 2026-05 — gemmology.dev vs knowledge.gemmology.dev

**Question:** Why is gemmology.dev indexed worse than knowledge.gemmology.dev despite being the better site? And how do we optimise SEO for gemmology-study and gemstone-research search intent?

**Method:** Parallel agent dispatch — six specialist SEO agents (technical, content, GEO, schema, keyword research, sitemap) audited the site against the docs subdomain and against prior audit `audits/T7c-seo.md`.

## Files in this directory

| File | Owner | Findings |
|------|-------|----------|
| `technical.md` | seo-technical | Crawl/render/index gap analysis |
| `content.md` | seo-content | Thin-page risk on /tools, /quiz, /playground; E-E-A-T gaps |
| `geo.md` | seo-geo | llms.txt structure + AI citability; knowledge.gemmology.dev has no robots.txt |
| `schema.md` | seo-schema | Missing Course/Quiz/SoftwareApplication JSON-LD |
| `keywords.md` | seo-dataforseo | 35 seed terms — heuristic (no DataForSEO MCP) |
| `sitemap.md` | seo-sitemap | 442 of 910 sitemap URLs are OG image templates |

## Root cause of the indexing gap (convergent finding)

Three converging issues — not a content quality problem:

1. **Sitemap pollution.** 442 of 910 sitemap URLs are `/og/...` image-template pages (1200×630 social cards). Google sees a sitemap where about 49% of entries (442/910) are thin near-duplicate pages, so nearly half of the crawl budget the sitemap attracts is spent on them. knowledge.gemmology.dev has no such junk.
2. **Empty shells for the high-value surfaces.** `/tools/`, `/quiz/`, `/playground/`, `/tools/optical`, `/tools/lab`, etc. ship as `client:load` React islands. First-byte HTML for these pages is `<h1>+<p>` at best, often nothing on the hub. knowledge.gemmology.dev serves Markdown→HTML with 300–800 words of indexable prose per page.
3. **Knowledge-subdomain advantage is purely structural.** Same authors, same domain authority, but the docs subdomain ships content as plain HTML and gemmology.dev ships it as JavaScript. Content quality is not the gap. Crawlability is.

Secondary: knowledge.gemmology.dev itself has **no robots.txt and no llms.txt**. It indexes well in classical search but gives AI crawlers no explicit crawl permissions and no content map. The fastest GEO win is adding those two files to the docs subdomain.

## Prioritised fix list

### P0 — Indexing gap closers (1–2 days of work)

1. **`astro.config.mjs` sitemap filter.** Exclude `/og/`, `/og-image/`, `/study/review`, `/study/settings`. Exact diff in `sitemap.md`. Effect: sitemap drops from 910 → ~466 clean URLs.
2. **Add `noindex` to OG templates.** `src/pages/og/learn/[...slug].astro` and `src/pages/og/minerals/[slug].astro` need `<meta name="robots" content="noindex, nofollow">`.
3. **Add SSG intro prose to tool pages.** Five example paragraphs are in `content.md`, one for each category page except `conversions.astro`. Each tool category page (`measurement.astro`, `optical.astro`, `lab.astro`, `identification.astro`, `advanced.astro`, `conversions.astro`) gets 300–500 server-rendered words before the React island. Also add SSG intro + `<h1>` to `/tools/` hub (currently fully client-rendered).
4. **Fix LearnSchema BreadcrumbList fragment URL.** `LearnSchema.astro` line 73 emits `learn#fundamentals` as a BreadcrumbList item — invalid. Change to canonical `/learn`. (Carried over from prior audit, still open.)
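
The sitemap filter in item 1 can be sketched as a predicate over page URLs. This is a minimal sketch rather than the exact diff from `sitemap.md`: the helper name `keepInSitemap` is hypothetical, and it assumes the predicate is passed to the `filter` option of the `@astrojs/sitemap` integration, which is called with each page's absolute URL.

```typescript
// Path prefixes that item 1 says to drop from the sitemap.
const EXCLUDED_PREFIXES: string[] = [
  "/og/",
  "/og-image/",
  "/study/review",
  "/study/settings",
];

// Hypothetical helper: returns true when a page should stay in the sitemap.
function keepInSitemap(pageUrl: string): boolean {
  // Strip the scheme and host so only the path is compared.
  const path = pageUrl.replace(/^https?:\/\/[^/]+/, "");
  return !EXCLUDED_PREFIXES.some((prefix) => path.startsWith(prefix));
}

// Assumed wiring in astro.config.mjs:
//   integrations: [sitemap({ filter: keepInSitemap })]
```

Keeping the prefixes in one array means later exclusions are a one-line change and the predicate can be unit-tested without running a build.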

### P1 — Schema and structure (2–3 days)

5. **`Course` + `EducationalOccupationalCredential` schema on `/learn/index`.** Tells Google the 139 articles form an FGA-prep curriculum. Highest single AI-visibility lever per the schema audit.
6. **`Quiz` + `hasPart` on learn articles (F-04 unblock).** Now that `LearnQuizWidget` is wired, `LearnSchema.astro` can declare per-article quizzes. Boilerplate JSON-LD in `schema.md`.
7. **`SoftwareApplication` JSON-LD on `/tools/*`.** Single shared `ToolsSchema.astro` covers all 6 category pages.
8. **Tools hub h1 + intro.** Server-render an h1 and 200-word intro on `/tools/index.astro` (currently 100% client-rendered).
9. **`/quiz/` `Course` schema** with the 8 categories declared as `hasCourseInstance`.
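
As an illustration of item 7, a shared schema component could build one `SoftwareApplication` node per category page. This is a sketch under assumptions, not the boilerplate from `schema.md`: the function names are hypothetical, and the `applicationCategory` and `operatingSystem` values are placeholders to be confirmed against the schema audit.

```typescript
// Minimal shape of the JSON-LD node; only a few schema.org properties are used.
interface SoftwareApplicationNode {
  "@context": "https://schema.org";
  "@type": "SoftwareApplication";
  name: string;
  description: string;
  url: string;
  applicationCategory: string;
  operatingSystem: string;
}

// Hypothetical builder that a shared ToolsSchema.astro could call per page.
function buildToolsJsonLd(
  name: string,
  description: string,
  path: string,
): SoftwareApplicationNode {
  return {
    "@context": "https://schema.org",
    "@type": "SoftwareApplication",
    name,
    description,
    url: `https://gemmology.dev${path}`,
    applicationCategory: "EducationalApplication", // assumed value
    operatingSystem: "Web", // assumed: the tools run in a browser
  };
}

// Serialise for embedding in a <script type="application/ld+json"> tag.
function toJsonLdScript(node: SoftwareApplicationNode): string {
  // Escape "<" so the payload cannot close the script tag early.
  return JSON.stringify(node).replace(/</g, "\\u003c");
}
```

Because the node is emitted at build time, it is present in the first-byte HTML regardless of when the React island hydrates.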

### P2 — GEO / AI-search readiness (3–5 days)

10. **Restructure `llms.txt`** — currently omits all `/tools/` pages and lists mineral URLs as bare slugs without descriptions. Add `## Tools` section with one line per category. Convert mineral bare URLs to titled entries. Source: `src/pages/llms.txt.ts`.
11. **Named AI-crawler stanzas in robots.txt.** Add an explicit `User-agent` stanza with an `Allow` rule for each of GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, and Google-Extended, so that crawl permission for each bot is stated rather than inferred from the wildcard rule.
12. **knowledge.gemmology.dev gets robots.txt + llms.txt.** Two files, two hours. Unlocks AI-crawler access to the better-structured docs content.
13. **Expand `/learn/[slug]` Introduction blocks to 130–165 words** in YAML source — the citation-optimum length for AI engines. Per-passage examples in `geo.md`.
14. **Add named author + credentials** to `/about` and to `LearnSchema` Person node. Currently every page bylines an Organisation, not a credentialed individual — weak E-E-A-T signal.
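
The named stanzas in item 11 are repetitive enough to generate. The sketch below is one hedged way to do it; `buildAiCrawlerStanzas` is a hypothetical helper, and where its output is written (a static file or a generated endpoint) is left open because the source does not say how robots.txt is produced.

```typescript
// Crawler names taken from item 11.
const AI_CRAWLERS: string[] = [
  "GPTBot",
  "OAI-SearchBot",
  "ClaudeBot",
  "PerplexityBot",
  "Google-Extended",
];

// Hypothetical generator: one explicit Allow stanza per named crawler,
// with a blank line between stanzas as robots.txt groups conventionally use.
function buildAiCrawlerStanzas(userAgents: string[]): string {
  return userAgents
    .map((agent) => `User-agent: ${agent}\nAllow: /\n`)
    .join("\n");
}
```

The same generated block could be reused for the knowledge.gemmology.dev robots.txt in item 12, so both properties stay in step.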

### P3 — Keyword & content programme (ongoing)

15. **Build a static crawlable mineral-properties reference table** (the mineral DB is currently sql.js, JS-only, invisible to crawlers).
16. **Consolidate fragmented identification pages into one `/identify` hub.** High-volume terms like "gemstone identification chart" map to nothing today.
17. **Add explicit FGA/Gem-A positioning statement** to `/learn` hub and About — clarifies which exam syllabus the curriculum maps to.
18. **Question-format H2s** in learn articles ("What is birefringence?") — AI Overviews match heading text against natural-language query form.
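
For item 15, the crawlable reference table could be rendered to plain HTML at build time. The following is a sketch only: the `MineralRow` fields are hypothetical and do not reflect the real mineral database schema, which this audit does not describe.

```typescript
// Hypothetical row shape; the real database columns may differ.
interface MineralRow {
  name: string;
  refractiveIndex: string;
  specificGravity: string;
}

// Escape characters that would otherwise be parsed as markup.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render a static table so the values are present in first-byte HTML.
function renderMineralTable(rows: MineralRow[]): string {
  const body = rows
    .map(
      (r) =>
        `<tr><td>${escapeHtml(r.name)}</td>` +
        `<td>${escapeHtml(r.refractiveIndex)}</td>` +
        `<td>${escapeHtml(r.specificGravity)}</td></tr>`,
    )
    .join("\n");
  return (
    "<table>\n<thead><tr><th>Mineral</th><th>RI</th><th>SG</th></tr></thead>\n" +
    `<tbody>\n${body}\n</tbody>\n</table>`
  );
}
```

The interactive sql.js view can stay as it is; the static table only needs to expose the same values to crawlers that do not execute JavaScript.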

## Quick wins (today, under 1 hour total)

- One-line filter change in `astro.config.mjs` (removes 442 junk URLs from sitemap).
- Add `<meta name="robots" content="noindex">` to two OG template files.
- Add `<h1>` and a 50-word intro paragraph to `/tools/index.astro`.

These three changes alone should noticeably lift gemmology.dev's classical-search visibility within 1–2 crawl cycles, because the share of sitemap entries that point at real content rises from about 51% (468 of 910) to nearly 100%.
86 changes: 86 additions & 0 deletions audits/seo-2026-05/content.md
@@ -0,0 +1,86 @@
# Content & E-E-A-T Audit — gemmology.dev vs knowledge.gemmology.dev
**Audited:** 2026-05-11 | **Auditor role:** Content Quality / Google QRG Sept 2025

---

## TL;DR

knowledge.gemmology.dev serves flat Markdown files that Googlebot reads in a single HTTP request. gemmology.dev serves the same information better — but buries it inside client:load React islands that Googlebot either skips or defers. The ranking gap is not an E-E-A-T gap; it is a crawlability gap masquerading as one.

- **Content quality score (gemmology.dev /learn/):** 78/100
- **Content quality score (gemmology.dev /tools/):** 31/100
- **E-E-A-T overall (gemmology.dev):** 61/100 — credible but anonymous

---

## Why knowledge.gemmology.dev wins on content

1. **Full prose at first byte.** Every Markdown file delivers 300–800 words of body text, structured tables, and callouts in the initial HTML response. No JavaScript required.
2. **Consistent heading hierarchy.** Each doc has one H1, logical H2 sections (RI, Birefringence, Pleochroism, Dispersion), and inline code examples. This is exactly what Google's citation pipeline extracts for featured snippets and AI Overviews.
3. **Quotable facts in plain text.** Sentences like "High birefringence causes visible doubling of back facet edges — diagnostic for zircon, sphene, and peridot" are extractable verbatim. Google rewards prose that answers a question in one sentence.
4. **No duplicate-content risk.** Each doc covers one topic; canonical structure is implicit.

---

## gemmology.dev content gaps

### P0: Thin client-rendered pages

Every page under `/tools/` pre-renders exactly two elements: one H1 and one `<p>` description sentence (confirmed in `src/pages/tools/measurement.astro`). The React island (`<MeasurementTools client:load />`) loads after JavaScript executes. Googlebot's first-wave crawl sees 22 words of indexable content on a page that actually delivers 8 calculators and a reference table.

The same pattern applies to `/tools/optical`, `/tools/lab`, `/tools/identification`, `/tools/advanced`, `/tools/conversions`, `/quiz`, and `/playground`. Combined these are roughly 50 URLs with sub-50-word pre-rendered bodies.

**Impact:** Google's quality systems classify these as thin pages. Because they share the same domain as the /learn/ content, they dilute the site's topical authority signal across the whole property.
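
The pre-rendered word counts quoted in this audit can be approximated by counting visible words in the raw HTML response, before any JavaScript runs. The function below is a rough sketch under that assumption; `countPrerenderedWords` is a hypothetical name, and regex-based tag stripping is a simplification that a real HTML parser would handle more robustly.

```typescript
// Hypothetical helper: counts visible words in server-delivered HTML.
function countPrerenderedWords(html: string): number {
  const text = html
    // Drop script and style blocks, whose contents are never visible prose.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ")
    // Drop all remaining tags, including island wrapper elements.
    .replace(/<[^>]+>/g, " ")
    // Drop character entities such as &nbsp; so they are not counted as words.
    .replace(/&[a-z#0-9]+;/gi, " ")
    .trim();
  return text === "" ? 0 : text.split(/\s+/).length;
}
```

Running this over the built HTML for each route gives a cheap regression check: a tool page whose count falls back below its target has lost its server-rendered intro.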

### P1: Missing E-E-A-T signals

The `/about` page covers editorial standards and source citations well (FGA alignment, Anderson/Webster, GIA journal, Mindat.org). However:

- **No named author or reviewer.** Every learn article and mineral page omits a byline. The about page refers to "the gemmology-dev open-source project", which is a GitHub org, not a person. Google's QRG treats clear information about who is responsible for the content, and what their credentials are, as a trust signal for YMYL-adjacent educational content.
- **No author schema.** `MineralSchema.astro` and `StructuredData.astro` emit `WebPage` and `BreadcrumbList` but no `author` or `Person` node.
- **`dateModified` is present on /learn/ pages** (confirmed in project notes). This one is not a gap: it is a genuine positive signal that knowledge.gemmology.dev lacks.
- **No "last reviewed" visible text on tool pages.** It exists on learn articles; it must also appear on tool description sections.

E-E-A-T factor scores (gemmology.dev):

| Factor | Score | Weight | Weighted |
|--------|-------|--------|---------|
| Experience | 50/100 | 20% | 10 |
| Expertise | 65/100 | 25% | 16 |
| Authoritativeness | 55/100 | 25% | 14 |
| Trustworthiness | 70/100 | 30% | 21 |
| **Total** | | | **61/100** |
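
The total in the table is a weighted sum of the four factor scores. As a quick check of the arithmetic, here is a small sketch; the type and function names are illustrative, and weights are held as whole percentages to avoid floating-point drift.

```typescript
interface Factor {
  name: string;
  score: number; // out of 100
  weightPct: number; // weight as a whole percentage
}

// Scores and weights copied from the table above.
const FACTORS: Factor[] = [
  { name: "Experience", score: 50, weightPct: 20 },
  { name: "Expertise", score: 65, weightPct: 25 },
  { name: "Authoritativeness", score: 55, weightPct: 25 },
  { name: "Trustworthiness", score: 70, weightPct: 30 },
];

// Weighted total out of 100.
function weightedTotal(factors: Factor[]): number {
  const sum = factors.reduce((acc, f) => acc + f.score * f.weightPct, 0);
  return sum / 100;
}
```

`weightedTotal(FACTORS)` returns 61. The unrounded contributions are 10, 16.25, 13.75, and 21, so the Expertise and Authoritativeness rows in the table are shown rounded to 16 and 14.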

### P2: Missing topical hubs

`/learn/index.astro` renders a card grid of all 139 articles grouped by the 8 study categories. This is a strong internal hub. The tools section has no equivalent: `/tools/` links to 6 category pages but contains no prose explaining what gemmological measurement is or why each category matters. A 400-word introduction on each category page, plus a short server-rendered intro on the hub itself, would fix this.

---

## Page-by-page recommendations

| Page | Current pre-rendered words | Min for page type | Action |
|------|---------------------------|------------------|--------|
| `/tools/measurement` | ~22 | 500 (service) | Add 500-word SSR intro: what SG and RI measure, when to use each tool |
| `/tools/optical` | ~20 | 500 | Add prose: polariscope vs dichroscope workflow, when optic sign matters |
| `/tools/lab` | ~18 | 500 | Add prose: how spectroscope absorption bands are read, UV safety note |
| `/tools/identification` | ~15 | 500 | Add prose: systematic identification sequence, decision logic |
| `/tools/advanced` | ~20 | 500 | Add prose: treatment detection methodology, GIA proportion grading |
| `/tools/conversions` | ~12 | 300 | Add 1-paragraph context: metric/troy/decimal carat system history |
| `/quiz` | ~30 | 300 | Add description of FGA exam alignment and question categories |
| `/about` | ~350 | 500 | Add named contributor(s) with credentials; add `Person` schema |
| `/minerals/[slug]` | SSG, rich | meets threshold | Add `author`/`reviewer` schema node per mineral page |
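
The table above can also be treated as data to rank pages by how far they fall short of their target. This is an illustrative sketch: the word counts are the approximate figures from the table, `/minerals/[slug]` is omitted because no numeric count is given for it, and the names are hypothetical.

```typescript
interface PageAudit {
  path: string;
  words: number; // approximate pre-rendered word count from the table
  minWords: number; // minimum for the page type
}

const AUDITED_PAGES: PageAudit[] = [
  { path: "/tools/measurement", words: 22, minWords: 500 },
  { path: "/tools/optical", words: 20, minWords: 500 },
  { path: "/tools/lab", words: 18, minWords: 500 },
  { path: "/tools/identification", words: 15, minWords: 500 },
  { path: "/tools/advanced", words: 20, minWords: 500 },
  { path: "/tools/conversions", words: 12, minWords: 300 },
  { path: "/quiz", words: 30, minWords: 300 },
  { path: "/about", words: 350, minWords: 500 },
];

// Pages below their minimum, largest shortfall first.
function shortfalls(pages: PageAudit[]): { path: string; missing: number }[] {
  return pages
    .filter((p) => p.words < p.minWords)
    .map((p) => ({ path: p.path, missing: p.minWords - p.words }))
    .sort((a, b) => b.missing - a.missing);
}
```

With the figures above, every listed page is below its minimum; `/tools/identification` has the largest shortfall (485 words) and `/about` the smallest (150 words).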

---

## 5 example passages to add to tool pages

**measurement.astro intro:** Specific gravity and refractive index are the two primary quantitative tests in coloured-gemstone identification. SG is determined hydrostatically using Archimedes' principle: the gem is weighed in air and in water, and the ratio reveals density to two decimal places, enough to separate natural spinel (about 3.60) from flame-fusion synthetic spinel (about 3.64). RI is read directly from a critical-angle refractometer and narrows identification to a handful of species within seconds.

**optical.astro intro:** The polariscope and dichroscope answer different questions. The polariscope tests whether a stone is singly or doubly refractive, which separates isotropic species (diamond, spinel, garnet) from all others at a glance. The dichroscope reveals how many distinct body colours a stone shows in different vibration directions — a tanzanite shows blue, violet, and bronze, while a synthetic blue spinel shows only one colour. Neither instrument requires a prepared surface, making them ideal first-pass tests.

**lab.astro intro:** The spectroscope shows which wavelengths of visible light a gem absorbs. Strong selective absorption bands are diagnostic: the 693 nm doublet in ruby, the 450 nm iron band in natural blue sapphire, the 415 nm line in cape diamonds. UV fluorescence adds a complementary data point: a strong blue LWUV reaction in diamond, absent under SWUV, points away from moissanite or CZ without any contact with the stone.

**identification.astro intro:** Systematic gem identification follows a fixed sequence to avoid confirmation bias. Start with non-destructive observations — colour, transparency, lustre — then move to RI, then SG, then spectroscope if the RI falls in an ambiguous range. Only after these quantitative steps should qualitative tests (Chelsea filter, fluorescence) be applied to confirm or refute a working hypothesis. This sequence matches the FGA Diploma practical examination protocol.

**advanced.astro intro:** Treatment detection requires correlating multiple lines of evidence. Heat treatment in corundum leaves characteristic stress fractures around rutile inclusions and healed fingerprints, but not all heated stones show these features. The treatment wizard on this page assigns positive or negative evidence weights to each observable clue and surfaces a confidence-banded conclusion — high confidence when three or more independent indicators align, low confidence when evidence is mixed or a single clue stands alone.