
Enhance and optimize image extraction functionality #47

Merged
sumitsahoo merged 2 commits into main from dev
Apr 12, 2026


@sumitsahoo
Owner

This pull request significantly refactors and enhances the PDF image extraction tool in src/tools/ExtractImages.tsx. The main improvements include more robust image extraction logic, better memory management, and a clearer user interface for selecting images. The extraction process now supports additional PDF.js image types and handles image data more reliably.

PDF image extraction improvements:

  • Added PdfjsImageData interface and helper functions (fetchNamedImage, paintImageToCanvas) to robustly handle various image representations from PDF.js, including support for both named and inline image objects, and correct handling of different pixel formats (RGBA, RGB, ImageBitmap, and src URLs). [1] [2]
  • Reworked the extraction loop to use a single reusable canvas for all images, improving performance and memory usage. Added explicit canvas cleanup after extraction, plus per-page cleanup via page.cleanup(). [1] [2] [3]
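
To illustrate the pixel-format handling mentioned above, here is a minimal sketch of normalizing raw RGB or RGBA bytes to RGBA before painting. The diff itself is not shown here, so `RawImage` and `toRgba` are hypothetical names, not the PR's actual `PdfjsImageData` or `paintImageToCanvas`. The numeric tags are a local copy of the values PDF.js uses in its `ImageKind` enum.

```typescript
// Local mirror of the pixel-format tags from PDF.js's ImageKind enum.
const ImageKind = { GRAYSCALE_1BPP: 1, RGB_24BPP: 2, RGBA_32BPP: 3 } as const;

// Hypothetical shape for illustration; the PR's real interface may differ.
interface RawImage {
  width: number;
  height: number;
  kind: number;
  data: Uint8Array | Uint8ClampedArray;
}

// Returns a fresh RGBA buffer (4 bytes per pixel) for RGB or RGBA input.
function toRgba(img: RawImage): Uint8ClampedArray {
  const pixels = img.width * img.height;

  if (img.kind === ImageKind.RGBA_32BPP) {
    if (img.data.length !== pixels * 4) {
      throw new Error("unexpected RGBA buffer length");
    }
    return new Uint8ClampedArray(img.data); // copies the bytes
  }

  if (img.kind === ImageKind.RGB_24BPP) {
    if (img.data.length !== pixels * 3) {
      throw new Error("unexpected RGB buffer length");
    }
    const out = new Uint8ClampedArray(pixels * 4);
    for (let src = 0, dst = 0; src < img.data.length; src += 3, dst += 4) {
      out[dst] = img.data[src];         // R
      out[dst + 1] = img.data[src + 1]; // G
      out[dst + 2] = img.data[src + 2]; // B
      out[dst + 3] = 255;               // opaque alpha
    }
    return out;
  }

  throw new Error(`unsupported image kind: ${img.kind}`);
}
```

In a browser, the returned buffer could be wrapped in `new ImageData(rgba, width, height)` and drawn with `putImageData`. ImageBitmap and src-URL inputs would presumably go through `drawImage` instead and are not covered by this sketch.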

User interface and usability:

  • Changed the default selection behavior so that no images are selected after extraction, requiring users to explicitly select images to download.
  • Replaced the previous toggle-all selection button with separate "Select all" and "Clear" buttons, each with appropriate icons, for more intuitive multi-selection control. [1] [2]
  • Optimized the calculation of the total selected image size by memoizing the computation with useMemo.
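
The selection model described above can be sketched with plain functions. This is an illustrative outline only: `ExtractedImage`, `selectAll`, `clearSelection`, and `totalSelectedSize` are hypothetical names, and the component's real state shape is not shown in this conversation.

```typescript
// Hypothetical record for one extracted image.
interface ExtractedImage {
  id: number;
  sizeBytes: number;
}

// "Select all": every image id ends up in the selection.
function selectAll(images: readonly ExtractedImage[]): Set<number> {
  return new Set<number>(images.map((image) => image.id));
}

// "Clear" (and the default after extraction): nothing is selected.
function clearSelection(): Set<number> {
  return new Set<number>();
}

// Sum of sizes for the selected images only.
function totalSelectedSize(
  images: readonly ExtractedImage[],
  selected: ReadonlySet<number>,
): number {
  let total = 0;
  for (const image of images) {
    if (selected.has(image.id)) total += image.sizeBytes;
  }
  return total;
}
```

Inside a React component, the total could then be memoized with something like `useMemo(() => totalSelectedSize(images, selected), [images, selected])`, so it is recomputed only when the image list or the selection changes.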

These changes make the tool more reliable, efficient, and user-friendly.

Copilot AI review requested due to automatic review settings April 12, 2026 15:08
@sumitsahoo sumitsahoo merged commit af0f176 into main Apr 12, 2026
6 checks passed

Copilot AI left a comment


Pull request overview

This PR refactors the PDF image extraction tool to improve robustness (supporting more PDF.js image representations), reduce memory churn during extraction, and make image selection/download UX clearer.

Changes:

  • Adds PDF.js image-handling helpers to extract both named and inline images and paint them reliably to a canvas (RGBA/RGB/ImageBitmap/src).
  • Reworks extraction to reuse canvases across images/pages and performs explicit cleanup after each page and at the end.
  • Updates the UI selection model (default: none selected; new “Select all” / “Clear” controls) and memoizes total selected size.
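
The canvas reuse and cleanup pattern from the second bullet might look roughly like the sketch below. It uses minimal stand-in interfaces so it runs outside a browser. `CanvasLike`, `ImageLike`, `PageLike`, and `extractAll` are all hypothetical; in particular, real PDF.js pages do not expose an `images` array, and the actual loop in the PR may be structured differently.

```typescript
// Stand-ins for an HTMLCanvasElement, a decoded image, and a PDF.js page proxy.
interface CanvasLike { width: number; height: number }
interface ImageLike { width: number; height: number }
interface PageLike { images: ImageLike[]; cleanup(): void }

// Paints every image through one shared canvas and collects the results.
function extractAll<T>(
  pages: readonly PageLike[],
  canvas: CanvasLike,
  paint: (canvas: CanvasLike, image: ImageLike) => T,
): T[] {
  const results: T[] = [];
  try {
    for (const page of pages) {
      try {
        for (const image of page.images) {
          // Resize the shared canvas instead of allocating a new one;
          // in a browser, assigning width/height also clears it.
          canvas.width = image.width;
          canvas.height = image.height;
          results.push(paint(canvas, image));
        }
      } finally {
        page.cleanup(); // per-page cleanup, even if painting failed
      }
    }
  } finally {
    // Shrink the canvas to zero so its backing store can be released.
    canvas.width = 0;
    canvas.height = 0;
  }
  return results;
}
```

The nested `try`/`finally` blocks are a design choice of this sketch: they keep both cleanup steps running when a single image fails to paint.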



2 participants