feat(markdown): font-independent Unicode glyph mapping #24
Merged
Replace the single hard-coded euro fix with a data-driven table mapping
non-renderable Unicode characters to font-independent TeX nodes, so they
no longer tofu under the DIN text font (tectonic/XeTeX does no font
fallback).
- Mapping table (pytex_markdown/glyphs.py): € -> eurosym \euro{};
→ ↔ ≤ ≥ · -> inline-math \rightarrow \leftrightarrow \leq \geq \cdot.
The arrow targets match the existing ASCII-arrow rewrites; · maps to
the math \cdot rather than the font-dependent \textperiodcentered.
- _prose() now splits prose generically over the table (the euro is the
first entry, no longer a special case); code spans stay verbatim.
- Genuine missing glyph (unmapped AND absent from any bundled DIN weight)
-> \texttt{[missing glyph]} placeholder + MissingGlyphWarning naming the
char and U+XXXX, instead of silent tofu. DIN coverage is read by a
zero-dependency cmap parser over the bundled fonts; the rule is
conservative (renderable only if present in every weight).
- Allowlist eurosym for UNTRUSTED/SANDBOXED so untrusted € renders
instead of raising TrustError.
Tests: per-char mapping, untrusted+sandboxed render of all mapped chars,
missing-glyph placeholder+warning, euro regression, code-span untouched.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Replaces the single hard-coded euro fix in the Markdown converter with a
data-driven table mapping non-renderable Unicode characters to
font-independent TeX nodes. tectonic (XeTeX) does no font fallback, so a code
point the bundled DIN text font lacks would otherwise render as a blank "tofu"
box.
Mapping table (pytex_markdown/glyphs.py)

| Character | TeX replacement | Package |
| --- | --- | --- |
| € | \euro{} | eurosym |
| → | $\rightarrow$ | (none) |
| ↔ | $\leftrightarrow$ | (none) |
| ≤ | $\leq$ | (none) |
| ≥ | $\geq$ | (none) |
| · | $\cdot$ | (none) |

→ and -> typeset identically.
· maps to the math \cdot (multiplication dot) rather than the font-dependent
\textperiodcentered, which would itself tofu under DIN.
_prose() now splits prose generically over the table (functional, genexp
style); the euro is the first entry, no longer a special case. Code
spans/blocks stay verbatim.
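
To illustrate the table-driven split described above, here is a minimal sketch. It is not the code from pytex_markdown/glyphs.py: the names GLYPHS and split_prose, the tuple layout, and the ("text" | "tex", value) output shape are all assumptions made for illustration. Only the character-to-TeX pairs come from the table.

```python
from __future__ import annotations

import re
from typing import Iterator

# Hypothetical table layout: character -> (TeX replacement, package or None).
# The pairs mirror the PR's mapping table; the structure itself is assumed.
GLYPHS: dict[str, tuple[str, str | None]] = {
    "€": (r"\euro{}", "eurosym"),
    "→": (r"$\rightarrow$", None),
    "↔": (r"$\leftrightarrow$", None),
    "≤": (r"$\leq$", None),
    "≥": (r"$\geq$", None),
    "·": (r"$\cdot$", None),
}

# One character class built from the table keys; the capture group makes
# re.split keep the matched characters in its result.
_SPLIT = re.compile("([" + re.escape("".join(GLYPHS)) + "])")


def split_prose(text: str) -> Iterator[tuple[str, str]]:
    """Yield ("text", run) and ("tex", replacement) pieces in source order."""
    return (
        ("tex", GLYPHS[part][0]) if part in GLYPHS else ("text", part)
        for part in _SPLIT.split(text)
        if part  # re.split emits empty strings around adjacent matches
    )


pieces = list(split_prose("5 € → x"))
```

Adding a character then only needs a new table row; the splitter itself does not change. TeX escaping of the plain-text runs is deliberately left out of this sketch.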
Trust levels
eurosym added to the UNTRUSTED/SANDBOXED package allowlist (_policy.py), so a
€ in untrusted Markdown renders instead of being rejected with a TrustError.
The math targets pull no package.
Missing-glyph handling
A character that is neither mapped nor present in every bundled DIN
weight is genuinely unrenderable → replaced by a
\texttt{[missing glyph]} placeholder and a MissingGlyphWarning naming the
char + U+XXXX, instead of silent tofu. DIN coverage is read by a
zero-dependency cmap parser over the bundled fonts; the rule is conservative
(renderable only if present in every weight).
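
The conservative rule can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: each weight's coverage is modeled as a frozenset of code points (standing in for whatever the cmap parser returns), and the helper names renderable and guard, the warning text, and the whitespace pass-through are hypothetical.

```python
from __future__ import annotations

import warnings


class MissingGlyphWarning(UserWarning):
    """Warns that a character is unmapped and absent from some font weight."""


PLACEHOLDER = r"\texttt{[missing glyph]}"


def renderable(ch: str, weights: list[frozenset[int]]) -> bool:
    # Conservative rule: the code point must be present in EVERY weight.
    return all(ord(ch) in coverage for coverage in weights)


def guard(text: str, mapped: set[str], weights: list[frozenset[int]]) -> str:
    """Replace unmapped, uncovered characters with the placeholder."""
    out: list[str] = []
    for ch in text:
        # Assumption: whitespace is always passed through unchanged.
        if ch in mapped or ch.isspace() or renderable(ch, weights):
            out.append(ch)
        else:
            warnings.warn(
                f"no glyph for {ch!r} (U+{ord(ch):04X})",
                MissingGlyphWarning,
                stacklevel=2,
            )
            out.append(PLACEHOLDER)
    return "".join(out)
```

Requiring presence in every weight means a character covered only by, say, the regular weight is still treated as missing, so bold or italic runs cannot silently tofu.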
Tests
- Per-char mapping (€ → ↔ ≤ ≥ ·).
- UNTRUSTED and SANDBOXED render of all mapped chars: no TrustError,
  no network/compile.
- Missing glyph: [missing glyph] placeholder + warning.
- Euro regression (test_euro.py stays green).
Status
- basedpyright src: 0 errors, 0 warnings from changed files.
- ruff format --check + ruff check: clean.
- Wiki updated (Markdown-to-PDF "Unicode glyph handling" section, Blob-API
  allowlist note).
🤖 Generated with Claude Code