Default Latin table glyph-faithful to EN 300 468 Figure A.1 #2

Merged: fishloa merged 1 commit into main from text-figure-a1 on Jun 4, 2026

@fishloa (Owner) commented on Jun 4, 2026

Resolves the deferred audit finding on the 0xA8 mapping in `text/mod.rs` by reading the vendored PDF directly (Annex A, Figure A.1, p. 159, V1.19.1; the 2025 edition includes Unicode equivalents in the figure).

Verdict on the disputed byte: 0xA8 = ¤ U+00A4, so the existing code was right; the auditor conflated it with the combining diaeresis prefix 0xC8. The real bug was 0xA4, which should decode to € U+20AC (a DVB superset addition).
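To make the disputed pair concrete, here is a minimal Rust sketch contrasting a Latin-1 `as char` fallback with the two Figure A.1 positions stated above. `latin1_fallback` and `figure_a1_disputed` are hypothetical names for illustration, not the crate's API.

```rust
/// The old behaviour: treat an unmapped GR byte as Latin-1 (ISO/IEC 8859-1),
/// where the byte value equals the Unicode scalar value.
fn latin1_fallback(byte: u8) -> char {
    byte as char
}

/// Hypothetical two-entry slice of Figure A.1 covering only the disputed
/// pair; every other byte is deliberately left out (`None`).
fn figure_a1_disputed(byte: u8) -> Option<char> {
    match byte {
        0xA4 => Some('\u{20AC}'), // € EURO SIGN (DVB superset addition)
        0xA8 => Some('\u{00A4}'), // ¤ CURRENCY SIGN
        _ => None,
    }
}
```

Under the Latin-1 fallback, 0xA4 comes out as ¤ U+00A4 and 0xA8 as ¨ U+00A8, whereas Figure A.1 puts € at 0xA4 and ¤ at 0xA8. An explicit 0xA8 entry is therefore correct, and 0xA4 needs one too.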

Scope: a full GR-area rewrite, because the old `other as char` Latin-1 fallback was wrong across the A/B/D/E/F rows. The non-spacing diacritic row is now complete, with precomposed forms and a base-plus-combining-mark fallback. The `figure_a1_*` tests pin every defined position, and Figure A.1 is hand-transcribed into `docs/en_300_468.md` with a proper citation.

🤖 Generated with Claude Code

…e A.1

Verified against the vendored PDF (specs/etsi_en_300_468_v01.19.01_dvb_si.pdf,
Annex A Figure A.1, p. 159: 'Character code table 00 - Latin alphabet with
Unicode equivalents'), which resolves the audit dispute:

- 0xA8 = U+00A4 currency sign: existing mapping CONFIRMED correct (the
  auditor's 'diaeresis' claim conflated 0xA8 with combining prefix 0xC8)
- 0xA4 = U+20AC €: the actual bug in that pair (DVB superset addition;
  previously decoded as ¤)

Full GR-area rewrite (the old Latin-1 fallback was wrong across the
A/B/D/E/F rows, including quotes, arrows, ×/÷, ™/♪, fractions and
Ø/Œ/Þ/ŧ/ŋ/SHY…):

- iso_6937_single: exhaustive per-byte table with Unicode codepoints,
  undefined (grey) positions → U+FFFD
- combining_mark + extended combine(): full non-spacing row (grave, acute,
  circumflex, tilde, macron, breve, dot, diaeresis, ring, cedilla, double
  acute, ogonek, caron) with precomposed forms; unmatched pairs emit
  base + Unicode combining mark (canonically equivalent); undefined
  prefixes 0xC0/0xC9/0xCC and dangling prefixes → U+FFFD
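The decode path described in the list above can be sketched roughly as follows. This is a hedged Rust sketch, not the crate's code. `single`, `combining_mark`, `precomposed` and `decode` are hypothetical stand-ins. The single-byte and precomposition tables are deliberately partial: they hold a few entries and assume the ISO/IEC 6937 positions that Figure A.1 follows. Control codes are simply skipped.

```rust
/// Hypothetical, deliberately partial stand-in for the single-byte GR table.
/// Positions are assumed from ISO/IEC 6937 as adopted by Figure A.1; any byte
/// not listed returns U+FFFD here, whereas a real table lists every defined
/// position and reserves U+FFFD for the undefined (grey) ones.
fn single(byte: u8) -> char {
    match byte {
        0xA4 => '\u{20AC}', // € euro sign
        0xA8 => '\u{00A4}', // ¤ currency sign
        0xA9 => '\u{2018}', // ‘ left single quotation mark
        0xAC => '\u{2190}', // ← leftwards arrow
        0xB4 => '\u{00D7}', // × multiplication sign
        0xB8 => '\u{00F7}', // ÷ division sign
        0xD4 => '\u{2122}', // ™ trade mark sign
        0xD5 => '\u{266A}', // ♪ eighth note
        0xE9 => '\u{00D8}', // Ø
        0xEA => '\u{0152}', // Œ
        0xFF => '\u{00AD}', // soft hyphen (SHY)
        _ => '\u{FFFD}',
    }
}

/// Unicode combining mark for a non-spacing prefix in 0xC1..=0xCF.
/// 0xC0, 0xC9 and 0xCC are undefined and yield None.
fn combining_mark(prefix: u8) -> Option<char> {
    let mark = match prefix {
        0xC1 => '\u{0300}', // grave
        0xC2 => '\u{0301}', // acute
        0xC3 => '\u{0302}', // circumflex
        0xC4 => '\u{0303}', // tilde
        0xC5 => '\u{0304}', // macron
        0xC6 => '\u{0306}', // breve
        0xC7 => '\u{0307}', // dot above
        0xC8 => '\u{0308}', // diaeresis
        0xCA => '\u{030A}', // ring above
        0xCB => '\u{0327}', // cedilla
        0xCD => '\u{030B}', // double acute
        0xCE => '\u{0328}', // ogonek
        0xCF => '\u{030C}', // caron
        _ => return None,
    };
    Some(mark)
}

/// Hypothetical, partial precomposition table (a real one covers the row).
fn precomposed(prefix: u8, base: char) -> Option<char> {
    match (prefix, base) {
        (0xC2, 'e') => Some('\u{00E9}'), // é
        (0xC8, 'u') => Some('\u{00FC}'), // ü
        (0xCB, 'c') => Some('\u{00E7}'), // ç
        (0xCF, 's') => Some('\u{0161}'), // š
        _ => None,
    }
}

/// Decodes a byte string using the sketch tables above.
fn decode(bytes: &[u8]) -> String {
    let mut out = String::new();
    let mut i = 0;
    while i < bytes.len() {
        let b = bytes[i];
        let mut step = 1;
        match b {
            0x20..=0x7E => out.push(b as char),
            0xC0..=0xCF => {
                let next = bytes.get(i + 1).copied();
                match (combining_mark(b), next) {
                    // Assumption: only printable, non-space ASCII counts as a base.
                    (Some(mark), Some(base)) if base.is_ascii_graphic() => {
                        let base = base as char;
                        match precomposed(b, base) {
                            Some(c) => out.push(c),
                            // Canonically equivalent fallback: base, then mark.
                            None => {
                                out.push(base);
                                out.push(mark);
                            }
                        }
                        step = 2;
                    }
                    // Undefined prefix, or dangling prefix (end of input / no base).
                    _ => out.push('\u{FFFD}'),
                }
            }
            0xA0..=0xFF => out.push(single(b)),
            // C0 controls, DEL and the 0x80..=0x9F control area are skipped here.
            _ => {}
        }
        i += step;
    }
    out
}
```

For example, `decode(&[0xC2, b'e'])` yields "é" from the precomposition table. `decode(&[0xC8, b'q'])` has no precomposed form and yields "q" followed by U+0308. Emitting base plus mark keeps the output canonically equivalent, so a later NFC pass can recompose any pair Unicode has a precomposed character for. Treating a prefix followed by a non-graphic byte as dangling is an assumption of this sketch.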

TDD: figure_a1_* tests pin every defined GR position to its Figure A.1
codepoint (written first, RED, then implemented).
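A table-driven pin in the spirit of the `figure_a1_*` tests might look like the sketch below. It rests on stated assumptions: `FIGURE_A1_ROW_D`, `row_d_single` and `mismatches` are hypothetical, and the row D entries are transcribed from the ISO/IEC 6937 layout, so each should be checked against Figure A.1 itself before being trusted. Undefined (grey) positions are pinned to U+FFFD, matching the table behaviour described above.

```rust
/// Expected Figure A.1 row D as (byte, char) pairs. Assumption: transcribed
/// from the ISO/IEC 6937 layout that Figure A.1 follows; verify against the
/// PDF. 0xD8..=0xDB are undefined (grey) and pinned to U+FFFD.
const FIGURE_A1_ROW_D: [(u8, char); 16] = [
    (0xD0, '\u{2015}'), // horizontal bar
    (0xD1, '\u{00B9}'), // superscript one
    (0xD2, '\u{00AE}'), // registered sign
    (0xD3, '\u{00A9}'), // copyright sign
    (0xD4, '\u{2122}'), // trade mark sign
    (0xD5, '\u{266A}'), // eighth note
    (0xD6, '\u{00AC}'), // not sign
    (0xD7, '\u{00A6}'), // broken bar
    (0xD8, '\u{FFFD}'),
    (0xD9, '\u{FFFD}'),
    (0xDA, '\u{FFFD}'),
    (0xDB, '\u{FFFD}'),
    (0xDC, '\u{215B}'), // 1/8
    (0xDD, '\u{215C}'), // 3/8
    (0xDE, '\u{215D}'), // 5/8
    (0xDF, '\u{215E}'), // 7/8
];

/// Hypothetical row-D lookup under test (stands in for the real table).
fn row_d_single(byte: u8) -> char {
    match byte {
        0xD0 => '\u{2015}',
        0xD1 => '\u{00B9}',
        0xD2 => '\u{00AE}',
        0xD3 => '\u{00A9}',
        0xD4 => '\u{2122}',
        0xD5 => '\u{266A}',
        0xD6 => '\u{00AC}',
        0xD7 => '\u{00A6}',
        0xDC => '\u{215B}',
        0xDD => '\u{215C}',
        0xDE => '\u{215D}',
        0xDF => '\u{215E}',
        _ => '\u{FFFD}', // undefined (grey) positions
    }
}

/// Every (byte, expected, actual) triple where `decode` disagrees with the
/// pinned table; an empty Vec means every position matches.
fn mismatches(table: &[(u8, char)], decode: fn(u8) -> char) -> Vec<(u8, char, char)> {
    table
        .iter()
        .map(|&(byte, expected)| (byte, expected, decode(byte)))
        .filter(|&(_, expected, actual)| expected != actual)
        .collect()
}

#[test]
fn figure_a1_row_d_sketch() {
    let bad = mismatches(&FIGURE_A1_ROW_D, row_d_single);
    assert!(bad.is_empty(), "Figure A.1 mismatches: {:?}", bad);
}
```

Keeping the expected table separate from the lookup means a transcription slip has to be made twice, in the same way, to go unnoticed. Collecting all mismatches rather than stopping at the first makes a RED run list every wrong position at once.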

Docs: Figure A.1 hand-transcribed into dvb-si/docs/en_300_468.md with PDF
page cite + verbatim superset note; README + CHANGELOG updated.

All tests pass (stable + MSRV 1.75), clippy -D warnings clean.
@fishloa merged commit 9f1474f into main on Jun 4, 2026
4 checks passed
@fishloa deleted the text-figure-a1 branch on June 4, 2026, 06:19