fix(deps): resolve pdfjs-dist version mismatch breaking OCR fallback #17

Merged
willgriffin merged 1 commit into main from fix/pdfjs-dist-version-mismatch on Dec 13, 2025

@willgriffin
Contributor

Summary

Fixes the pdfjs-dist version mismatch that broke OCR fallback for scanned PDFs.

  • Root cause: pdf-to-png-converter used pdfjs-dist 5.4.449, but unpdf bundles 5.4.296 internally, so the API version (5.4.449) did not match the worker version (5.4.296) and OCR fallback failed
  • Fix: Pin pdfjs-dist to 5.4.296 via pnpm override to match unpdf's bundled version
  • Tests: Fixed tests that previously passed even while OCR was broken; they now assert meaningful text extraction
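
To make the root cause concrete, here is a minimal sketch of the kind of guard that rejects a mismatched API and worker pair. `assertSameVersion` is a hypothetical helper written for illustration, not code from this repository or from pdfjs-dist, and the error wording is only modelled on the message pdf.js reports, so treat it as an assumption.

```typescript
// Illustrative only: a stand-in for the API/worker version guard.
// The function name and message wording are assumptions, not pdfjs-dist code.
function assertSameVersion(apiVersion: string, workerVersion: string): void {
  if (apiVersion !== workerVersion) {
    throw new Error(
      `The API version "${apiVersion}" does not match the Worker version "${workerVersion}".`
    );
  }
}

// Versions taken from this PR: API from pdf-to-png-converter, worker from unpdf.
try {
  assertSameVersion("5.4.449", "5.4.296");
} catch (err) {
  console.log(err instanceof Error ? err.message : String(err));
}

// After the pin both sides resolve to the same version, so the guard passes.
assertSameVersion("5.4.296", "5.4.296");
```

The comparison is an exact string match, which is why even a patch-level difference between the two copies is enough to break the fallback.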

Changes

  • Pin pdfjs-dist to 5.4.296 via pnpm override
  • Downgrade pdf-to-png-converter to ^3.10.0 (compatible with 5.4.296)
  • Fix extraction tests to assert OCR actually extracts >100 chars containing "bentley"
  • Add Renovate config to group PDF deps and require approval for updates
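
The pin described above could be guarded with a small check like the one below. This is a sketch under stated assumptions: the `Manifest` type, the `manifest` constant, and `isPinned` are hypothetical names, and the object only mirrors the two entries this PR mentions (the pnpm override and the pdf-to-png-converter range), not the project's full package.json.

```typescript
// Minimal shape of the package.json fields this sketch inspects.
interface Manifest {
  dependencies?: Record<string, string>;
  pnpm?: { overrides?: Record<string, string> };
}

// Hypothetical fragment mirroring the changes listed in this PR.
// The real manifest contains other fields that are omitted here.
const manifest: Manifest = {
  dependencies: {
    "pdf-to-png-converter": "^3.10.0",
  },
  pnpm: {
    overrides: {
      "pdfjs-dist": "5.4.296",
    },
  },
};

// Returns true when a pnpm override forces `name` to exactly `expected`.
function isPinned(pkg: Manifest, name: string, expected: string): boolean {
  return pkg.pnpm?.overrides?.[name] === expected;
}

console.log(isPinned(manifest, "pdfjs-dist", "5.4.296")); // prints true
```

A check of this kind could run in CI so that a dependency update which drops or changes the override is caught before OCR silently breaks again.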

Test plan

  • pnpm test passes locally (41 tests)
  • OCR now extracts 2631 chars from scanned PDF (was 4 chars before)
  • CI tests pass
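
The strengthened test condition (more than 100 characters containing "bentley") can be sketched as a plain function. `checkExtraction` is a hypothetical helper, not the repository's test code; the case-insensitive match is an assumption, and the sample strings are synthetic stand-ins rather than real OCR output.

```typescript
// Hypothetical helper: returns a list of problems, empty when the text looks meaningful.
// Thresholds follow this PR: more than 100 chars, and the keyword "bentley".
function checkExtraction(text: string, minChars = 100, keyword = "bentley"): string[] {
  const problems: string[] = [];
  if (text.length <= minChars) {
    problems.push(`expected more than ${minChars} chars, got ${text.length}`);
  }
  // Case-insensitive match is an assumption made for this sketch.
  if (!text.toLowerCase().includes(keyword)) {
    problems.push(`expected text to contain "${keyword}"`);
  }
  return problems;
}

// Synthetic stand-in for the broken state: only 4 characters came back.
const before = "x".repeat(4);
// Synthetic stand-in for a healthy result: long text that mentions the keyword.
const after = "Bentley " + "x".repeat(200);

console.log(checkExtraction(before).length); // prints 2
console.log(checkExtraction(after).length);  // prints 0
```

Asserting on both length and a known keyword is what stops a near-empty result, such as the 4 characters seen before the fix, from passing.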

Closes #16

The API version from pdf-to-png-converter (5.4.449) did not match
the worker version bundled in unpdf (5.4.296), causing OCR fallback
to fail for scanned PDFs.

Changes:
- Pin pdfjs-dist to 5.4.296 via pnpm override to match unpdf bundle
- Downgrade pdf-to-png-converter to ^3.10.0 (compatible with 5.4.296)
- Fix tests to actually assert OCR extracts meaningful text
- Group PDF deps in Renovate and require approval for updates

Closes #16
@willgriffin willgriffin merged commit 079647a into main Dec 13, 2025
2 checks passed
@willgriffin willgriffin deleted the fix/pdfjs-dist-version-mismatch branch December 13, 2025 21:42
Linked issue: pdfjs-dist version mismatch breaks OCR fallback for scanned PDFs