fix: Make heading IDs CSS-safe while preserving Unicode text #92

Merged: dereuromark merged 2 commits into master from fix-improve-normalizeid-sanitization on Mar 10, 2026

@josbeir (Contributor) commented Mar 10, 2026

Summary

This PR hardens heading ID normalization to avoid invalid CSS selectors, while preserving Unicode heading text (for example Japanese/Cyrillic headings).

Changes

  1. Updated HeadingIdTracker::normalizeId() in HeadingIdTracker.php
  • Strip #
  • Trim whitespace
  • Convert whitespace runs to -
  • Replace invalid selector characters with - using Unicode-aware regex:
    • /[^\p{L}\p{N}_-]+/u
  • Collapse repeated dashes
  • Trim leading/trailing dashes
  • Fall back to heading when the result is empty
  2. Updated tests
  • HeadingIdTrackerTest.php
    • Added coverage for:
      • special-character input from inline-code-like heading text
      • dash collapsing
      • Unicode preservation (日本語の見出し)
      • empty-result fallback (### -> heading)
  • TableOfContentsExtensionTest.php
    • Updated inline math TOC ID expectation:
      • Equation-E=mc^2 -> Equation-E-mc-2
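
For reviewers who want to trace the steps listed under change 1, here is a minimal sketch of the pipeline. It is not the merged code: the function name normalizeHeadingId is hypothetical, and the whitespace and dash-collapsing patterns are assumptions. Only the Unicode-aware character class is quoted from this PR.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for HeadingIdTracker::normalizeId(); it follows the
 * steps listed in the PR description, not the actual merged source.
 */
function normalizeHeadingId(string $text): string
{
    // Strip '#' characters, then trim surrounding whitespace.
    $id = trim(str_replace('#', '', $text));

    // Convert whitespace runs to a single dash (pattern is an assumption).
    // preg_replace() returns null on error, e.g. invalid UTF-8 input.
    $id = preg_replace('/\s+/u', '-', $id) ?? '';

    // Replace anything that is not a Unicode letter, a Unicode number,
    // an underscore, or a dash (regex quoted from the PR description).
    $id = preg_replace('/[^\p{L}\p{N}_-]+/u', '-', $id) ?? '';

    // Collapse repeated dashes, then trim leading/trailing dashes.
    $id = trim(preg_replace('/-+/', '-', $id) ?? '', '-');

    // Fall back to a fixed ID when nothing usable is left.
    return $id === '' ? 'heading' : $id;
}

echo normalizeHeadingId('Equation-E=mc^2'), "\n";  // Equation-E-mc-2
echo normalizeHeadingId('日本語の見出し'), "\n";      // 日本語の見出し
echo normalizeHeadingId('###'), "\n";              // heading
echo normalizeHeadingId('a -- b'), "\n";           // a-b
```

One caveat with this sketch: an ID that starts with a digit (for example 2024-roadmap) still needs escaping when used as a #id selector. The PR description does not say whether the merged code handles that case.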

Why

Some consumers (for example HTMX scroll restoration via querySelector) require selector-safe IDs. The previous logic could emit unsafe IDs for punctuation-heavy headings, and the first revision over-sanitized non-ASCII headings. This change keeps IDs safe and internationalization-friendly.

Copilot AI review requested due to automatic review settings March 10, 2026 12:28
@josbeir josbeir force-pushed the fix-improve-normalizeid-sanitization branch from 172aec2 to a01b086 Compare March 10, 2026 12:30
codecov bot commented Mar 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.55%. Comparing base (2dde660) to head (08f0a3f).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##             master      #92   +/-   ##
=========================================
  Coverage     93.54%   93.55%           
- Complexity     2312     2315    +3     
=========================================
  Files            79       79           
  Lines          6107     6113    +6     
=========================================
+ Hits           5713     5719    +6     
  Misses          394      394           


Copilot AI left a comment

Pull request overview

This PR updates heading ID normalization to produce CSS-selector-safe IDs while preserving non-ASCII/Unicode heading text, and aligns related TOC expectations in tests.

Changes:

  • Harden HeadingIdTracker::normalizeId() by trimming, whitespace-to-dash normalization, Unicode-aware character filtering, dash collapsing, and empty-result fallback.
  • Add/adjust PHPUnit coverage for punctuation-heavy headings, dash collapsing, Unicode headings, and empty-result fallback.
  • Update TOC test expectation for inline-math-derived heading IDs.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/Renderer/HeadingIdTracker.php | Implements stricter, Unicode-aware heading ID normalization intended to be safe for CSS selectors / querySelector. |
| tests/TestCase/Renderer/HeadingIdTrackerTest.php | Adds new normalization assertions (special chars, dash collapsing, Unicode preservation, empty fallback). |
| tests/TestCase/Extension/TableOfContentsExtensionTest.php | Updates TOC ID expectation for inline math to match new normalization rules. |

@dereuromark dereuromark merged commit f8f1a4c into master Mar 10, 2026
6 checks passed
@dereuromark dereuromark deleted the fix-improve-normalizeid-sanitization branch March 10, 2026 13:14