ENH: add incremental capability to PdfWriter #2811
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2811      +/-   ##
==========================================
+ Coverage   95.88%   95.91%   +0.02%
==========================================
  Files          51       51
  Lines        8576     8735     +159
  Branches     1696     1744      +48
==========================================
+ Hits         8223     8378     +155
  Misses        210      210
- Partials      143      147       +4
Are we able to fix the partial coverage? At least the
I left the remaining ones because I would like to fix them in small follow-up changes that I would rather handle separately:
to be merged after py-pdf#2811
Hey, I was following this PR and tried it with the PDF I mentioned in #2780. I modified my test code to use the new incremental feature, and the created PDF has some issues. When opening the created PDF in Acrobat, two things happen:
Normally I would say screw Acrobat, but I have to use Acrobat. Could this be happening because I made an error in my code?

import pypdf
from pypdf import PdfReader, PdfWriter
from pathlib import Path
import urllib.request

irs_form = Path("f5471sm.pdf")
if not irs_form.is_file():
    urllib.request.urlretrieve("https://www.irs.gov/pub/irs-pdf/f5471sm.pdf", "f5471sm.pdf")

# Read the text field names once, then close the reader
form = PdfReader("f5471sm.pdf")
fields = form.get_form_text_fields()
form.close()

# Open the same file in incremental mode and fill every text field with its own name
writer = PdfWriter("f5471sm.pdf", incremental=True)
for key in fields:
    fields[key] = key
writer.update_page_form_field_values(None, fields)  # None = all pages
with open("f5471sm-" + pypdf.__version__ + ".pdf", "wb") as file:
    writer.write(file)
writer.close()
@ljbergmann I have not been able to identify why Acrobat Reader says the file is damaged. In other tests I ran, no such message was reported.
@pubpub-zz I'm doing my best to identify these discrepancies and contribute to the PR/project, but I have to admit Python and PDF are not my strong suit. I've had a quick look at the mentioned PDF and can confirm that an XFA structure exists. Is there any documentation available on interacting with XFA?
A few hints: the simpler your document, the better.
To investigate the error message a bit more, I reduced my test script further and removed the field updates:

import pypdf
from pypdf import PdfWriter
from pathlib import Path
import urllib.request

irs_form = Path("f5471sm.pdf")
if not irs_form.is_file():
    urllib.request.urlretrieve("https://www.irs.gov/pub/irs-pdf/f5471sm.pdf", "f5471sm.pdf")

# Incremental write without any modification
writer = PdfWriter("f5471sm.pdf", incremental=True)
with open("f5471sm-" + pypdf.__version__ + ".pdf", "wb") as file:
    writer.write(file)
writer.close()

The error message occurs even in this case. If you compare the original and the created PDF, the files differ only in the last 12 lines. The "new PDF" contains the following lines:
If I remove them manually, the error message is gone. I hope this helps.
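The comparison described in this comment can be scripted. Below is a minimal stdlib-only sketch, assuming (as the PDF incremental-update mechanism requires) that an incremental output must begin with the original file's bytes unchanged; `appended_tail` and `print_tail` are hypothetical helper names, not pypdf API.

```python
from pathlib import Path


def appended_tail(original: bytes, updated: bytes) -> bytes:
    """Return the bytes appended to `original` by an incremental update.

    Hypothetical helper. Raises ValueError if `updated` does not start with
    `original` byte-for-byte, i.e. the original content was rewritten.
    """
    if not updated.startswith(original):
        # Report the first differing offset to make the failure actionable.
        mismatch = next(
            (i for i, (a, b) in enumerate(zip(original, updated)) if a != b),
            min(len(original), len(updated)),
        )
        raise ValueError(f"original bytes not preserved (first difference at offset {mismatch})")
    return updated[len(original):]


def print_tail(original_path: str, updated_path: str) -> None:
    """Print the appended section line by line for inspection."""
    tail = appended_tail(Path(original_path).read_bytes(), Path(updated_path).read_bytes())
    for line in tail.splitlines():
        # latin-1 maps every byte to a character, so binary stream data never fails to decode
        print(line.decode("latin-1"))
```

For the reduced script above, something like `print_tail("f5471sm.pdf", "f5471sm-" + pypdf.__version__ + ".pdf")` would show exactly the lines that were appended, and a ValueError would reveal that the original bytes were modified rather than extended.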
Co-authored-by: Stefan <96178532+stefan6419846@users.noreply.github.com>
@pubpub-zz Could you please have a look at the remaining remarks so we can continue with this PR and the PR built on top of it?
@stefan6419846
@pubpub-zz Thanks for your patience. Could you please rebase your other PRs accordingly and point out which changes are ready for the next review?
## Version 5.0.0, 2024-09-15

This version drops support for Python 3.7 (not maintained since July 2023), PdfMerger (use PdfWriter instead) and AnnotationBuilder (use annotations instead).

### Deprecations (DEP)
- Remove the deprecated PdfMerger and AnnotationBuilder classes and other deprecations cleanup (#2813)
- Drop Python 3.7 support (#2793)

### New Features (ENH)
- Add capability to remove /Info from PDF (#2820)
- Add incremental capability to PdfWriter (#2811)
- Add UniGB-UTF16 encodings (#2819)
- Accept utf strings for metadata (#2802)
- Report PdfReadError instead of RecursionError (#2800)
- Compress PDF files merging identical objects (#2795)

### Bug Fixes (BUG)
- Fix sheared image (#2801)

### Robustness (ROB)
- Robustify .set_data() (#2821)
- Raise PdfReadError when missing /Root in trailer (#2808)
- Fix extract_text() issues on damaged PDFs (#2760)
- Handle images with empty data when processing an image from bytes (#2786)

### Developer Experience (DEV)
- Fix coverage uploads (#2832)
- Test against Python 3.13 (#2776)

[Full Changelog](4.3.1...5.0.0)
This PR introduces a capability I had been meaning to propose for a while: you can now build a PDF as an incremental update of an existing PDF. Because the original content is kept intact, existing signatures on forms/documents remain valid.
Closes #2780 (partially: XFA forms still have to be modified manually)
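To illustrate why appending preserves signatures, here is a minimal stdlib sketch of what an incremental update adds, following the layout in the PDF specification: new object bodies, a new cross-reference section, and a trailer whose /Prev points at the previous xref. This is not pypdf's implementation; `build_incremental_update` and its parameters are hypothetical, and the sketch assumes generation-0 objects, an original file ending with a classic xref table, and that the caller already knows the original /Root, /Size and startxref offset.

```python
from __future__ import annotations


def build_incremental_update(
    original: bytes,
    objects: dict[int, bytes],
    root_ref: str,
    size: int,
    prev_xref: int,
) -> bytes:
    """Return `original` followed by an incremental-update section.

    Hypothetical helper, not part of pypdf.
    objects:   object number -> serialized body, e.g. {5: b"<< /Type /Annot >>"}
    root_ref:  catalog reference from the original trailer, e.g. "1 0 R"
    size:      /Size from the original trailer
    prev_xref: offset of the original xref (the number after its last `startxref`)
    """
    if not objects:
        raise ValueError("an incremental update needs at least one object")
    out = bytearray(original)  # the original bytes are copied, never modified
    if not out.endswith(b"\n"):
        out += b"\n"  # start the first new object on its own line
    # 1. Append new or replacement object bodies, recording their byte offsets.
    offsets: dict[int, int] = {}
    for num in sorted(objects):
        offsets[num] = len(out)
        out += b"%d 0 obj\n" % num + objects[num] + b"\nendobj\n"
    # 2. Append a cross-reference section listing only the appended objects.
    xref_offset = len(out)
    out += b"xref\n"
    for num in sorted(offsets):
        out += b"%d 1\n" % num  # subsection header: first object number, count
        out += b"%010d 00000 n\r\n" % offsets[num]  # each entry is exactly 20 bytes
    # 3. Append a trailer that chains back to the previous xref via /Prev.
    new_size = max(size, max(objects) + 1)
    out += b"trailer\n<< /Size %d /Root %s /Prev %d >>\n" % (
        new_size,
        root_ref.encode("ascii"),
        prev_xref,
    )
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)
```

A reader starts from the last `startxref`, finds the appended objects, and follows `/Prev` back into the original section for everything else; when an object number appears in both sections, the newer copy wins. Since the original bytes are never rewritten, a signature whose `/ByteRange` covers them can still be verified.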