Fix caption used as fallback alt text in PDF/UA #14142

Merged
gordonwoodhull merged 2 commits into main from bugfix/14107 on Mar 3, 2026

@gordonwoodhull
Contributor

Summary

Removes the anti-pattern where figure captions are copied into image alt text for PDF/UA compliance. Captions describe a figure's significance in context; alt text describes what the image looks like. Using one as the other is an accessibility anti-pattern that merely silences validators without helping screen reader users.

This was introduced in the Jan 2026 PDF/UA work (commits a867c3c24 and ba75b374f) to satisfy PDF/UA validators, which require every <Figure> structure element to have an /Alt string.

Fixes #14107

What changed

LaTeX (latex.lua): Removed 3 caption-as-alt blocks that copied image.caption into image.attributes["alt"] before clearing the caption for separate rendering.

Typst (pandoc3_figure.lua, floatreftarget.lua, typst.lua): Used a marker attribute (_quarto_no_caption_alt) to prevent the caption-as-alt fallback for figure images, while preserving it for inline images where image.caption IS the standard markdown alt text (![alt text](img.svg) in running text).

The distinction matters because of how Pandoc 3 represents alt text in its AST.

Pandoc 3 AST analysis

In Pandoc 3, {alt="text"} on an image does not set image.attributes["alt"] — it replaces image.caption with the alt value. The ![visible caption] text moves to figure.caption.long, while image.caption becomes the alt text. This means image.caption serves double duty:

  • Inline images: image.caption IS the alt text (Markdown renders the ![...] content as the alt attribute in HTML output)
  • Figure images: image.caption is a copy of the visible caption (unless overridden by {alt="..."})

To distinguish these cases, we compare image.caption to the figure's visible caption (figure.caption.long or float.caption_long). If they match, no explicit alt was provided — we set _quarto_no_caption_alt to suppress the fallback. If they differ, {alt="..."} was used and image.caption IS the explicit alt text.

This is the same heuristic Pandoc's own Markdown writer uses (confirmed via Pandoc source — it compares image alt to figure caption to decide whether to emit {alt="..."}). Pandoc itself acknowledges the ambiguity: the AST cannot distinguish "caption that happens to match alt" from "no explicit alt provided." The recommended workaround is Quarto's fig-alt attribute, which flows through a separate, unambiguous path.
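The comparison described above can be sketched as follows. This is a simplified TypeScript model, not the Lua filter code from the PR: the ImageNode and FigureNode types, the function names, the whitespace normalization, and the marker value "true" are all assumptions made for illustration, and captions are flattened to plain strings instead of Pandoc inline lists.

```typescript
// Hypothetical, simplified stand-ins for the Pandoc AST nodes discussed above.
// Real captions are lists of inlines; plain strings keep the sketch short.
interface ImageNode {
  caption: string;
  attributes: Record<string, string>;
}

interface FigureNode {
  captionLong: string;
  image: ImageNode;
}

// Assumption: collapse whitespace so incidental formatting does not matter.
function normalize(text: string): string {
  return text.trim().replace(/\s+/g, " ");
}

// If image.caption matches the visible caption, no explicit alt was given:
// set the marker so a later stage skips the caption-as-alt fallback.
function markFigureImage(figure: FigureNode): ImageNode {
  const image = figure.image;
  const sameAsCaption =
    normalize(image.caption) === normalize(figure.captionLong);
  if (!sameAsCaption) {
    // {alt="..."} was used, so image.caption is the explicit alt text.
    return image;
  }
  return {
    ...image,
    attributes: { ...image.attributes, _quarto_no_caption_alt: "true" },
  };
}

// Alt text a writer would emit, or undefined when none should be emitted.
function fallbackAlt(image: ImageNode): string | undefined {
  if (image.attributes["_quarto_no_caption_alt"] === "true") return undefined;
  return image.caption === "" ? undefined : image.caption;
}
```

An inline image never passes through markFigureImage in this model, so its image.caption is still returned as alt text. A figure whose caption and alt are intentionally identical is treated as having no alt, which is the ambiguity noted above.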

How fig-alt vs alt work

  • {alt="text"}: Pandoc replaces image.caption, no separate attribute. Works but ambiguous when caption and alt are intentionally identical.
  • {fig-alt="text"}: Quarto stores as image.attributes["fig-alt"], propagated explicitly to \includegraphics[alt=...] (LaTeX) or image(alt: "...") (Typst). Always unambiguous. This is the recommended approach.

Test plan

  • caption-not-alt-ua.qmd — verifies caption is NOT copied to \includegraphics[alt=] and that Quarto warns about missing alt text
  • typst-image-alt-text.qmd — TC1 moved to must-not-match; TC8 added for fig-alt
  • ua-image-alt-text.qmd — now uses explicit fig-alt
  • All 39 pdf-standard tests pass

Remove the caption-as-alt fallback introduced in the PDF/UA compliance
work (a867c3c, ba75b37). Using captions as alt text is an
accessibility anti-pattern — captions describe a figure's significance
in context while alt text describes what the image looks like.

LaTeX: remove 3 caption-as-alt blocks in latex.lua, and add fig-alt to
alt conversion in pandoc3_figure.lua for Pandoc 3 Figures without
cross-ref labels.

Typst: mark figure images with _quarto_no_caption_alt so that the
caption-as-alt fallback in typst.lua only fires for inline images
(where image.caption IS the standard markdown alt text).

Key insight: In Pandoc 3, {alt="text"} replaces the Image's caption
content rather than populating image.attributes["alt"]. So
image.caption serves double duty as both visible caption and alt text
override. We distinguish the two cases by comparing image.caption to
figure.caption — when they match, the caption was NOT overridden (the
bug case we suppress); when they differ, an explicit {alt="..."} was
provided (which we preserve). This is the same heuristic Pandoc's own
Markdown writer uses when round-tripping Figures.

Explicit fig-alt (Quarto's dedicated attribute) flows through a
completely separate path and always works unambiguously.

Fixes #14107
- Fix regex in parse-error.ts to handle tagpdf line continuations when
  filenames are long (the warning wraps across multiple (tagpdf) lines)
- Update warning message to recommend fig-alt instead of caption-as-alt
- Add printsMessage check to caption-not-alt-ua test to verify the
  missing alt text warning is surfaced
- Use labeled figure in caption-not-alt-ua test to avoid unrelated UA-2
  structural nesting issue with unlabeled figures
- Add ua2-unlabeled-figure-caption test documenting known LaTeX/tagpdf
  limitation where unlabeled captioned figures produce <Caption> directly
  under <Document> instead of inside a grouping element
@gordonwoodhull
Contributor Author

Second commit: Fix tagpdf warning parsing and document UA-2 structural issue

The second commit (32d1172ab) addresses two things discovered while testing:

1. tagpdf warning regex fix (parse-error.ts)

The regex that parses tagpdf's "Alternative text for graphic is missing" warning assumed that the filename and the closing "instead." appeared on a single (tagpdf) continuation line. When filenames are long, tagpdf wraps the message across multiple lines:

Package tagpdf Warning: Alternative text for graphic is missing.
(tagpdf)                Using 'caption-not-alt-ua_files/mediabag/penrose.pdf'
(tagpdf)                instead.

Fixed the regex to allow an optional (tagpdf) line break before "instead.". Also updated the warning message to recommend fig-alt instead of the old ![alt text](image.png) advice (which was itself the caption-as-alt anti-pattern).
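For illustration, a pattern that tolerates the wrapped form might look like the sketch below. It is not the expression used in parse-error.ts; the names missingAltPattern and findMissingAltGraphic are hypothetical, and the pattern is written only against the log excerpt shown above.

```typescript
// Hypothetical pattern, not the one in parse-error.ts.
// The optional group lets "instead." sit on its own (tagpdf) continuation line.
const missingAltPattern =
  /Package tagpdf Warning: Alternative text for graphic is missing\.\s*\n\(tagpdf\)\s+Using '([^']+)'(?:\s*\n\(tagpdf\))?\s+instead\./;

// Returns the graphic's path from the warning, or undefined if no warning is found.
function findMissingAltGraphic(log: string): string | undefined {
  const match = missingAltPattern.exec(log);
  return match ? match[1] : undefined;
}
```

The same pattern still matches the unwrapped form, where the filename and "instead." share one continuation line, because the extra (tagpdf) line break is optional.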

2. UA-2 structural issue with unlabeled captioned figures (ua2-unlabeled-figure-caption.qmd)

While testing, we found that unlabeled captioned figures (![caption](img.svg) without {#fig-label}) produce invalid UA-2 structure. These go through latexImageFigure() in pandoc3_figure.lua, which emits a bare \begin{figure}[H], and tagpdf places <Caption> as a sibling of <Figure> directly under <Document>:

/Document
  /Caption    ← UA-2 violation: must be inside a grouping element
  /Figure     ← orphaned from its caption

Labeled figures ({#fig-label}) go through FloatRefTarget, which wraps in a \Div that provides the grouping context:

/Document
  /Div
    /Caption  ← properly nested ✓
    /Figure

This is a pre-existing LaTeX/tagpdf limitation, not caused by our changes. Added ua2-unlabeled-figure-caption.qmd to document it with the expected verapdf validation warning, and updated caption-not-alt-ua.qmd to use a labeled figure to avoid conflating the two issues.
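The difference between the two trees can be modeled with a small check. This is only a sketch of the single nesting rule described above, using a hypothetical StructNode type and function name; it is not how veraPDF validates, and it does not read real PDF structure trees.

```typescript
// Hypothetical, minimal model of a tagged-PDF structure tree.
interface StructNode {
  tag: string;
  children?: StructNode[];
}

// Collect slash-separated paths of Caption elements whose parent is Document,
// which is the placement flagged as a UA-2 violation above.
function orphanedCaptions(node: StructNode, path = ""): string[] {
  const here = `${path}/${node.tag}`;
  const found: string[] = [];
  for (const child of node.children ?? []) {
    if (node.tag === "Document" && child.tag === "Caption") {
      found.push(`${here}/${child.tag}`);
    }
    found.push(...orphanedCaptions(child, here));
  }
  return found;
}
```

Applied to the unlabeled-figure tree, the function reports the caption directly under the document root; applied to the labeled-figure tree, where the caption sits inside the Div, it reports nothing.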

gordonwoodhull added this to the v1.9 milestone on Mar 2, 2026

posit-snyk-bot commented Mar 2, 2026

Snyk checks have passed. No issues have been found so far.

gordonwoodhull merged commit 4127c26 into main on Mar 3, 2026
89 of 93 checks passed
gordonwoodhull deleted the bugfix/14107 branch on March 3, 2026 03:49

Development

Successfully merging this pull request may close these issues.

Caption should not be used as fallback alt text for images in PDF/Typst