feat(markdown): font-independent Unicode glyph mapping #24

Merged
frederikbeimgraben merged 1 commit into main from feat/unicode-glyph-mapping on Jun 4, 2026

@frederikbeimgraben
Owner

Summary

Replaces the single hard-coded euro fix in the Markdown converter with a
data-driven table mapping non-renderable Unicode characters to
font-independent TeX nodes. tectonic (XeTeX) does no font fallback, so a code
point the bundled DIN text font lacks would otherwise render as a blank "tofu"
box.

Mapping table (pytex_markdown/glyphs.py)

Char  Target              Package
€     \euro{}             eurosym
→     $\rightarrow$       none (base TeX)
↔     $\leftrightarrow$   none (base TeX)
≤     $\leq$              none (base TeX)
≥     $\geq$              none (base TeX)
·     $\cdot$             none (base TeX)

  • The arrow targets match the existing ASCII-arrow rewrites, so → and ->
    typeset identically.
  • · maps to the math \cdot (multiplication dot) rather than the
    font-dependent \textperiodcentered, which would itself tofu under DIN.
  • _prose() now splits prose generically over the table (functional, genexp
    style); the euro is the first entry, no longer a special case. Code
    spans/blocks stay verbatim.
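
A minimal sketch of what a table-driven split could look like, assuming a plain dict and tuple-shaped pieces. GLYPH_MAP, split_prose, and the ("text" | "tex", value) pairs are hypothetical names for illustration; the real table lives in pytex_markdown/glyphs.py and emits TeX nodes rather than tuples.

```python
import re
from typing import Iterator

# Hypothetical stand-in for the table in pytex_markdown/glyphs.py.
# Keys are the Unicode characters, values the font-independent TeX targets.
GLYPH_MAP: dict[str, str] = {
    "\u20ac": r"\euro{}",            # euro sign, needs the eurosym package
    "\u2192": r"$\rightarrow$",      # rightwards arrow
    "\u2194": r"$\leftrightarrow$",  # left right arrow
    "\u2264": r"$\leq$",             # less-than or equal to
    "\u2265": r"$\geq$",             # greater-than or equal to
    "\u00b7": r"$\cdot$",            # middle dot
}

# One alternation over every mapped character. The capturing group makes
# re.split keep the matched characters in its result list.
_MAPPED = re.compile("(" + "|".join(re.escape(c) for c in GLYPH_MAP) + ")")


def split_prose(text: str) -> Iterator[tuple[str, str]]:
    """Yield ("text", chunk) or ("tex", target) pieces in source order."""
    return (
        ("tex", GLYPH_MAP[part]) if part in GLYPH_MAP else ("text", part)
        for part in _MAPPED.split(text)
        if part  # drop the empty strings re.split emits at the edges
    )
```

With this shape, supporting another character is a one-line change to the table; the splitting code never names a specific character.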

Trust levels

eurosym is added to the UNTRUSTED/SANDBOXED package allowlist (_policy.py),
so a € in untrusted Markdown renders instead of being rejected with a
TrustError. The math targets pull in no package.
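
As an illustration only, an allowlist check of this kind might be sketched as follows. The Trust enum, ALLOWED_PACKAGES, and check_package are hypothetical names, and treating a level without an entry as unrestricted is an assumption; only TrustError, the UNTRUSTED/SANDBOXED levels, and the eurosym entry come from this PR.

```python
from enum import Enum


class Trust(Enum):
    """Hypothetical trust levels; only UNTRUSTED and SANDBOXED are named in the PR."""
    TRUSTED = "trusted"
    UNTRUSTED = "untrusted"
    SANDBOXED = "sandboxed"


class TrustError(Exception):
    """Raised when a document needs a package its trust level does not allow."""


# Assumed shape of the allowlist: one frozen set of package names per
# restricted level. Levels without an entry are treated as unrestricted.
ALLOWED_PACKAGES: dict[Trust, frozenset[str]] = {
    Trust.UNTRUSTED: frozenset({"eurosym"}),
    Trust.SANDBOXED: frozenset({"eurosym"}),
}


def check_package(package: str, trust: Trust) -> None:
    """Raise TrustError if `package` is not allowed at `trust`."""
    allowed = ALLOWED_PACKAGES.get(trust)
    if allowed is not None and package not in allowed:
        raise TrustError(f"package {package!r} is not allowed at {trust.name}")
```

The math targets never reach a check like this, since they need no package at all.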

Missing-glyph handling

A character that is neither mapped nor present in every bundled DIN
weight is genuinely unrenderable → replaced by a \texttt{[missing glyph]}
placeholder and a MissingGlyphWarning naming the char + U+XXXX, instead
of silent tofu. DIN coverage is read by a zero-dependency cmap parser over the
bundled fonts; the rule is conservative (renderable only if present in every
weight).
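
The conservative rule and the placeholder fallback can be sketched roughly as below. This assumes the per-weight coverage has already been extracted as sets of code points (the cmap parsing itself is not shown); renderable_codepoints and replace_missing are hypothetical helpers, and making MissingGlyphWarning a UserWarning subclass is an assumption.

```python
import warnings
from typing import Iterable

PLACEHOLDER = r"\texttt{[missing glyph]}"


class MissingGlyphWarning(UserWarning):
    """Assumed base class; the PR only names the warning itself."""


def renderable_codepoints(per_weight: Iterable[Iterable[int]]) -> frozenset[int]:
    """Keep only code points that every weight covers (set intersection)."""
    sets = [frozenset(s) for s in per_weight]
    return frozenset.intersection(*sets) if sets else frozenset()


def replace_missing(text: str, mapped: set[str], covered: frozenset[int]) -> str:
    """Swap each unmapped, uncovered character for the placeholder and warn."""
    out: list[str] = []
    for ch in text:
        if ch in mapped or ord(ch) in covered:
            out.append(ch)  # mapped elsewhere, or safe to render as-is
        else:
            warnings.warn(
                f"no glyph for {ch!r} (U+{ord(ch):04X})",
                MissingGlyphWarning,
                stacklevel=2,
            )
            out.append(PLACEHOLDER)
    return "".join(out)
```

Intersecting the weights means a character present in the regular weight but absent from bold counts as missing, so a bolded occurrence cannot silently turn into tofu.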

Tests

  • Per-char mapping (€ → ↔ ≤ ≥ ·).
  • UNTRUSTED and SANDBOXED render of all mapped chars — no TrustError,
    no network/compile.
  • Missing glyph → [missing glyph] placeholder + warning.
  • Euro regression (existing test_euro.py stays green).
  • Code spans untouched; renderable chars (umlauts, dashes) pass through.

Status

  • Full suite: 859 passed, 2 skipped.
  • basedpyright src: 0 errors, 0 warnings from changed files.
  • ruff format --check + ruff check: clean.

Wiki updated (Markdown-to-PDF "Unicode glyph handling" section, Blob-API
allowlist note).

🤖 Generated with Claude Code

Replace the single hard-coded euro fix with a data-driven table mapping
non-renderable Unicode characters to font-independent TeX nodes, so they
no longer tofu under the DIN text font (tectonic/XeTeX does no font
fallback).

- Mapping table (pytex_markdown/glyphs.py): € -> eurosym \euro{};
  → ↔ ≤ ≥ · -> inline-math \rightarrow \leftrightarrow \leq \geq \cdot.
  The arrow targets match the existing ASCII-arrow rewrites; · maps to
  the math \cdot rather than the font-dependent \textperiodcentered.
- _prose() now splits prose generically over the table (the euro is the
  first entry, no longer a special case); code spans stay verbatim.
- Genuine missing glyph (unmapped AND absent from any bundled DIN weight)
  -> \texttt{[missing glyph]} placeholder + MissingGlyphWarning naming the
  char and U+XXXX, instead of silent tofu. DIN coverage is read by a
  zero-dependency cmap parser over the bundled fonts; the rule is
  conservative (renderable only if present in every weight).
- Allowlist eurosym for UNTRUSTED/SANDBOXED so untrusted € renders
  instead of raising TrustError.

Tests: per-char mapping, untrusted+sandboxed render of all mapped chars,
missing-glyph placeholder+warning, euro regression, code-span untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@frederikbeimgraben frederikbeimgraben merged commit 6c2a005 into main Jun 4, 2026
1 check passed
@frederikbeimgraben frederikbeimgraben deleted the feat/unicode-glyph-mapping branch June 4, 2026 14:26