Default Latin table glyph-faithful to EN 300 468 Figure A.1 #2
Merged
…e A.1

Verified against the vendored PDF (specs/etsi_en_300_468_v01.19.01_dvb_si.pdf, Annex A Figure A.1, p. 159, 'Character code table 00 - Latin alphabet with Unicode equivalents'), which resolves the audit dispute:

- 0xA8 = U+00A4 currency sign: existing mapping CONFIRMED correct (the auditor's 'diaeresis' claim conflated 0xA8 with combining prefix 0xC8)
- 0xA4 = U+20AC €: the actual bug in that pair (DVB superset addition; was decoding as ¤)

Full GR-area rewrite (the old Latin-1 fallback was wrong across the A/B/D/E/F rows: quotes, arrows, ×/÷, ™/♪, fractions, Ø/Œ/Þ/ŧ/ŋ/SHY…):

- iso_6937_single: exhaustive per-byte table with Unicode codepoints, undefined (grey) positions → U+FFFD
- combining_mark + extended combine(): full non-spacing row (grave, acute, circumflex, tilde, macron, breve, dot, diaeresis, ring, cedilla, double acute, ogonek, caron) with precomposed forms; unmatched pairs emit base + Unicode combining mark (canonically equivalent); undefined prefixes 0xC0/0xC9/0xCC and dangling prefixes → U+FFFD

TDD: figure_a1_* tests pin every defined GR position to its Figure A.1 codepoint (written first, RED, then implemented).

Docs: Figure A.1 hand-transcribed into dvb-si/docs/en_300_468.md with PDF page cite + verbatim superset note; README + CHANGELOG updated.

All tests pass (stable + MSRV 1.75), clippy -D warnings clean.
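The combining-prefix behaviour described in the commit message can be sketched roughly as follows. This is an illustration, not the code from the PR: the names `combining_mark` and `combine` come from the message, but the signatures, the `precomposed` helper, and its three-entry table are assumptions. The prefix byte values follow the order in which the message lists the marks, with 0xC0, 0xC9 and 0xCC left undefined.

```rust
/// Map a non-spacing prefix byte to its Unicode combining mark.
/// Returns None for the undefined prefixes (0xC0, 0xC9, 0xCC) and for
/// any byte outside the 0xC0..=0xCF row.
fn combining_mark(prefix: u8) -> Option<char> {
    Some(match prefix {
        0xC1 => '\u{0300}', // grave
        0xC2 => '\u{0301}', // acute
        0xC3 => '\u{0302}', // circumflex
        0xC4 => '\u{0303}', // tilde
        0xC5 => '\u{0304}', // macron
        0xC6 => '\u{0306}', // breve
        0xC7 => '\u{0307}', // dot above
        0xC8 => '\u{0308}', // diaeresis
        0xCA => '\u{030A}', // ring above
        0xCB => '\u{0327}', // cedilla
        0xCD => '\u{030B}', // double acute
        0xCE => '\u{0328}', // ogonek
        0xCF => '\u{030C}', // caron
        _ => return None,
    })
}

/// Hypothetical helper: a tiny illustrative subset of precomposed forms.
/// A real table would cover many more (prefix, base) pairs.
fn precomposed(prefix: u8, base: char) -> Option<char> {
    match (prefix, base) {
        (0xC2, 'e') => Some('\u{00E9}'), // é
        (0xC8, 'u') => Some('\u{00FC}'), // ü
        (0xCF, 'c') => Some('\u{010D}'), // č
        _ => None,
    }
}

/// Decode one prefix + optional base character.
/// `base` is None when the prefix is the last byte (a dangling prefix).
fn combine(prefix: u8, base: Option<char>) -> String {
    match (combining_mark(prefix), base) {
        (Some(mark), Some(b)) => match precomposed(prefix, b) {
            // Known pair: emit the single precomposed codepoint.
            Some(c) => c.to_string(),
            // Unmatched pair: base followed by the combining mark.
            None => [b, mark].iter().collect(),
        },
        // Undefined or dangling prefix: replacement character.
        _ => '\u{FFFD}'.to_string(),
    }
}
```

In this sketch an undefined prefix swallows the base character it is given; whether the real implementation does the same is not stated in the message.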
Resolves the deferred audit finding on `text/mod.rs` 0xA8 by reading the vendored PDF directly (Annex A Figure A.1, p. 159, V1.19.1; the 2025 edition includes Unicode equivalents in the figure).

Verdict on the disputed byte: 0xA8 = ¤ U+00A4 (existing code was right; auditor conflated it with combining 0xC8 diaeresis). The real bug was 0xA4 → € U+20AC (DVB's superset addition).
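A minimal sketch of the disputed pair, for illustration only. The function name `figure_a1_excerpt` is hypothetical and covers just the two bytes discussed here; it is not the table from the PR. It also shows why a plain Latin-1 cast decodes 0xA4 as the currency sign rather than the euro sign.

```rust
/// Hypothetical excerpt of the Figure A.1 mapping: only the two bytes
/// from the audit dispute. None means "not covered by this excerpt".
fn figure_a1_excerpt(byte: u8) -> Option<char> {
    match byte {
        0xA4 => Some('\u{20AC}'), // euro sign (DVB superset addition)
        0xA8 => Some('\u{00A4}'), // currency sign
        _ => None,
    }
}

/// Latin-1 style fallback: a u8-to-char cast yields the codepoint with
/// the same numeric value as the byte.
fn latin1_fallback(byte: u8) -> char {
    byte as char
}
```

With these definitions, `latin1_fallback(0xA4)` yields U+00A4 (¤), whereas `figure_a1_excerpt(0xA4)` yields U+20AC (€).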
Scope: full GR-area rewrite. The old `other as char` Latin-1 fallback was wrong across the A/B/D/E/F rows. Full non-spacing diacritic row with precomposed forms + base-plus-combining-mark fallback. `figure_a1_*` tests pin every defined position. Figure A.1 hand-transcribed into `docs/en_300_468.md` with proper citation.

🤖 Generated with Claude Code