fix: tolerate startxref offset near xref keyword #810

Closed
vitormattos wants to merge 2 commits into smalot:master from vitormattos:fix/metadata-stream-invalid-object-reference

@vitormattos

Summary

Tolerate startxref offsets that land on whitespace immediately before the xref keyword, or one byte into xref.

Why

Some malformed PDFs do not point startxref exactly at the beginning of xref, even though the cross-reference table is still recoverable.

Without this tolerance, parsing can fail with the error "Invalid object reference for $obj." on files whose cross-reference table is otherwise intact.
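
The tolerance described in the Summary can be sketched as follows. This is a minimal illustration in Python, not the code from this PR (the library itself is PHP). The function name `resolve_xref_offset` is hypothetical, and skipping a whole run of whitespace rather than a single byte is an assumption.

```python
from typing import Optional

# PDF whitespace bytes: NUL, TAB, LF, FF, CR, SPACE.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def resolve_xref_offset(data: bytes, offset: int) -> Optional[int]:
    """Return the position of the `xref` keyword at or near `offset`.

    Hypothetical helper: returns None when the keyword cannot be
    located within the tolerated range.
    """
    if offset < 0 or offset >= len(data):
        return None

    # Well-formed case: startxref points exactly at the keyword.
    if data.startswith(b"xref", offset):
        return offset

    # Tolerated case 1: the offset lands on whitespace just before `xref`.
    # Assumption: any run of whitespace is skipped, not only one byte.
    pos = offset
    while pos < len(data) and data[pos] in PDF_WHITESPACE:
        pos += 1
    if pos > offset and data.startswith(b"xref", pos):
        return pos

    # Tolerated case 2: the offset lands one byte into `xref`.
    if offset >= 1 and data.startswith(b"xref", offset - 1):
        return offset - 1

    return None
```

Returning `None` for anything outside these two cases keeps the tolerance narrow, so an offset that is wrong by more than this still surfaces as an error instead of matching some unrelated `xref` elsewhere in the file.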

Validation

Validated against representative regressions from the corpora used in the larger PDF.js integration work:

  • PDF.js: outlines_for_editor.pdf
  • VeraPDF: veraPDF test suite 6-6-2-3-2-t01-pass-c.pdf
  • VeraPDF: veraPDF test suite 6-1-2-t01-fail-a.pdf

Notes

This is a focused fix and can be reviewed independently from the aggregate integration PR.

Signed-off-by: Vitor Mattos <1079143+vitormattos@users.noreply.github.com>
@vitormattos
Author

Superseded by the consolidation chain in the fork: vitormattos#36

This includes the PR810 parser fix and regression coverage, integrated on top of the PR808 stack with fixture/test consolidation.

@vitormattos deleted the fix/metadata-stream-invalid-object-reference branch on April 27, 2026 17:10