feat: add --extract support for local PDF files #203
Merged
steipete merged 2 commits into steipete:main on Apr 26, 2026
Extend the --extract flag to accept local PDF files, routing them through the existing markitdown extraction path (same as remote PDF URLs). Previously, --extract on a local file was only supported for audio/video media files.
- Update validation in runner-plan.ts to allow .pdf extensions
- Add PDF extract path in runner-execution.ts before handleFileInput
- Add tests covering happy path, non-PDF rejection, and missing uvx
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add isPdfExtension() helper to input.ts (replaces duplicated .endsWith checks)
- Add MAX_PDF_EXTRACT_BYTES constant (500 MB) to raise the 50 MB limit
- Add progress spinner around loadLocalAsset + extractAssetContent
- Wrap test temp dirs in try/finally for cleanup
- Track execFileMock calls to verify local PDF branch is exercised
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
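As a rough illustration of the helper and size limit named in this commit, the following is a minimal sketch rather than the code in input.ts. It assumes the check is a case-insensitive suffix match and that "MB" means 1024 * 1024 bytes; DEFAULT_MAX_ASSET_BYTES and maxBytesForLocalFile are hypothetical names introduced only for this example.

```typescript
// Sketch only: the real helper in input.ts may differ in detail.
// Assumption: 1 MB = 1024 * 1024 bytes.
const MAX_PDF_EXTRACT_BYTES = 500 * 1024 * 1024;

// Hypothetical name for the pre-existing 50 MB default limit.
const DEFAULT_MAX_ASSET_BYTES = 50 * 1024 * 1024;

// Assumption: the extension check ignores case, so "Report.PDF" also matches.
function isPdfExtension(filePath: string): boolean {
  return filePath.toLowerCase().endsWith(".pdf");
}

// Hypothetical helper: pick the byte limit that applies to a local file.
function maxBytesForLocalFile(filePath: string): number {
  return isPdfExtension(filePath) ? MAX_PDF_EXTRACT_BYTES : DEFAULT_MAX_ASSET_BYTES;
}
```

Centralizing the check in one function is what lets the duplicated .endsWith calls be removed; callers only need to ask whether a path is a PDF.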
Summary
- Extend --extract to accept local PDF files, routing them through the existing markitdown extraction path (same as remote PDF URLs)
- Update validation in runner-plan.ts to allow .pdf extensions alongside audio/video
- Add isPdfExtension() helper to src/run/flows/asset/input.ts alongside isTranscribableExtension
- Add MAX_PDF_EXTRACT_BYTES constant (500 MB) to raise the default 50 MB limit for PDF extraction
Behaviour
Note: image-based (scanned) PDFs with no embedded text layer will produce empty output from markitdown — this is a markitdown limitation, not a bug in this change. OCR plugin support would be a separate feature.
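The validation rule described in the summary (local --extract accepts audio/video and now PDFs, and rejects everything else) could be sketched roughly as below. This is an illustrative sketch, not the logic in runner-plan.ts: planLocalExtract, LocalExtractRoute, hasExtension, and the sample MEDIA_EXTENSIONS list are all hypothetical, and the project's real media check sits behind isTranscribableExtension.

```typescript
// Hypothetical route labels for a local file passed with --extract.
type LocalExtractRoute = "transcribe" | "pdf-markitdown";

// Assumed sample of media extensions, for illustration only.
const MEDIA_EXTENSIONS: string[] = [".mp3", ".wav", ".m4a", ".mp4", ".mov"];

// Case-insensitive suffix match against a list of extensions.
function hasExtension(filePath: string, extensions: string[]): boolean {
  const lower = filePath.toLowerCase();
  return extensions.some((ext) => lower.endsWith(ext));
}

// Decide how a local file should be handled, or reject it.
function planLocalExtract(filePath: string): LocalExtractRoute {
  if (hasExtension(filePath, [".pdf"])) {
    // PDFs reuse the markitdown path already used for remote PDF URLs.
    return "pdf-markitdown";
  }
  if (hasExtension(filePath, MEDIA_EXTENSIONS)) {
    // Audio/video keeps the behaviour that existed before this change.
    return "transcribe";
  }
  throw new Error(`--extract does not support this local file type: ${filePath}`);
}
```

Under these assumptions, planLocalExtract("paper.pdf") selects the markitdown route, while a path such as "notes.txt" is rejected before any extraction work starts.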
Test plan
- pnpm test tests/cli.asset.local-pdf-extract.test.ts: 3 new tests pass
- pnpm test: no regressions in existing tests
🤖 Generated with Claude Code