ENH: add decode_as_image() to ContentStreams #2615
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

| | main | #2615 | +/- |
|---|---|---|---|
| Coverage | 95.13% | 95.14% | |
| Files | 51 | 51 | |
| Lines | 8538 | 8547 | +9 |
| Branches | 1702 | 1703 | +1 |
| Hits | 8123 | 8132 | +9 |
| Misses | 261 | 261 | |
| Partials | 154 | 154 | |
Should we really expect the users to basically call [...]? Additionally, what happens when it is not an image? We log a warning, but is there an exception as well due to invalid image data? If yes, why both?
Why strange? This offers a way to get an image from a stream where images are present but are not part of the images (for example, images used in a pattern, as in B2.pdf, but also in annotations).
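A minimal sketch of that use case, assuming the image sits as an XObject inside a tiling pattern's own resources (`/Resources` → `/Pattern` → pattern → `/Resources` → `/XObject`). The helper name `decode_pattern_images` and the file name in the usage comment are hypothetical; only `decode_as_image()` comes from this PR. Other layouts, such as annotations, would need a different walk.

```python
from typing import Any, Dict


def decode_pattern_images(page: Any) -> Dict[str, Any]:
    """Decode image XObjects referenced by the page's patterns.

    Returns a dict keyed by "<pattern name><xobject name>", e.g. "/P1/X1".
    Hypothetical helper: it assumes the resource layout described above.
    """
    found: Dict[str, Any] = {}
    if "/Resources" not in page or "/Pattern" not in page["/Resources"]:
        return found
    patterns = page["/Resources"]["/Pattern"]
    for pattern_name in patterns:
        pattern = patterns[pattern_name]
        # Shading patterns carry no resources of their own; skip them.
        if "/Resources" not in pattern or "/XObject" not in pattern["/Resources"]:
            continue
        xobjects = pattern["/Resources"]["/XObject"]
        for xobject_name in xobjects:
            xobject = xobjects[xobject_name]
            # Only try objects that declare themselves as images, so that
            # form XObjects do not trigger the "not an image" warning.
            if "/Subtype" not in xobject or xobject["/Subtype"] != "/Image":
                continue
            try:
                image = xobject.decode_as_image()
            except Exception:
                # Decoding failed on invalid image data; skip this object.
                continue
            if image is not None:
                found[f"{pattern_name}{xobject_name}"] = image
    return found


# Usage (needs pypdf with decode_as_image() and Pillow; the path is a placeholder):
#   from pypdf import PdfReader
#   reader = PdfReader("example.pdf")
#   images = decode_pattern_images(reader.pages[0])
```

Skipping failed objects silently is a choice made for brevity here; a caller who wants to surface broken image data could log or re-raise instead.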
I thought about this and my concern is that this may hide some actual issues. I've completed the annotation.
I am still not sure whether we can really expect the user to examine every content stream for a possible image. Personally, I would prefer a clean solution, thus I am going to leave this PR open for further discussion.
I've quickly reviewed the PDF 1.7 spec, and there are many objects that are not part of the current [...]. At least providing a function to ease the extraction of images for other developers should be an improvement.
In this case, could you please fix the merge conflicts and add a basic example to the docs?
Co-authored-by: Stefan <96178532+stefan6419846@users.noreply.github.com>
test doc for example in documentation:
## What's new

### New Features (ENH)
- Accept ETen-B5 and UniCNS-UTF16 encodings (#2721) by @pubpub-zz
- Add decode_as_image() to ContentStreams (#2615) by @pubpub-zz
- context manager for PdfReader (#2666) by @tibor-reiss
- Add capability to set font and size in fields (#2636) by @pubpub-zz
- Allow to pass input file without named argument (#2576) by @pubpub-zz

### Bug Fixes (BUG)
- Fix deprecation for Ressources when using old constants (#2705) by @stefan6419846
- Fix images issue 4 bits encoding and LUT starting with UTF16_BOM (#2675) by @pubpub-zz
- Reading large compressed images takes huge time to process (#2644) by @snanda85
- Highlighted Text Cannot Be Printed (#2604) by @Nifury
- Fix UnboundLocalError on malformed pdf (#2619) by @farjasju

### Documentation (DOC)
- Various improvements on docstrings and examples by @j-t-1

### Robustness (ROB)
- Cope with missing Standard 14 fonts in fields (#2677) by @pubpub-zz
- Improve inline image extraction (#2622) by @pubpub-zz
- Cope with loops in Fields tree (#2656) by @pubpub-zz
- Discard /I in choice fields for compatibility with Acrobat (#2614) by @pubpub-zz
- Cope with some issues in pillow (#2595) by @pubpub-zz
- Cope with some image extraction issues (#2591) by @pubpub-zz

### Maintenance (MAINT)
- Deprecate interiour_color with replacement interior_color (#2706) by @j-t-1
- Add deprecate_with_replacement to PdfWriter.find_bookmark (#2674) by @j-t-1

### Code Style (STY)
- Change Link to be a non-markup annotation (#2714) by @j-t-1

[Full Changelog](4.2.0...4.3.0)
closes #2613