ENH: Enable merging forms with overlapping names #1553

Merged
merged 24 commits into from
Jan 30, 2023

Conversation

pubpub-zz
Collaborator

@pubpub-zz pubpub-zz commented Jan 15, 2023

Add functions to add a top-level grouping form field.
Also introduce functions to rename a top-level field.
PdfWriter.merge/PdfWriter.append are extended to merge sets of fields.

Closes #1538
Closes #1585
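
The idea behind the grouping can be sketched without pypdf. In a PDF form, a field's fully qualified name is its ancestors' partial names joined by periods, so placing each document's fields under its own top-level parent keeps identically named fields apart. The sketch below models that with plain dictionaries; `group_under` and `qualified_names` are hypothetical helpers for illustration, not the functions introduced by this PR.

```python
# Conceptual sketch only: plain dicts stand in for PDF field dictionaries.
# group_under and qualified_names are hypothetical helper names.

def group_under(top_name, fields):
    """Wrap a list of top-level fields in a single parent field."""
    return {"/T": top_name, "/Kids": list(fields)}


def qualified_names(field, prefix=""):
    """Return the fully qualified names of all terminal fields below `field`."""
    name = f"{prefix}.{field['/T']}" if prefix else field["/T"]
    kids = field.get("/Kids", [])
    if not kids:
        return [name]
    names = []
    for kid in kids:
        names.extend(qualified_names(kid, name))
    return names


# Two forms that both define a field called "name".
form_a = [{"/T": "name"}, {"/T": "email"}]
form_b = [{"/T": "name"}]

# Give each form its own top-level group before combining them.
merged = [group_under("form1", form_a), group_under("form2", form_b)]
all_names = [n for top in merged for n in qualified_names(top)]
print(all_names)
```

With the grouping in place, the two "name" fields no longer share a fully qualified name, which is the property the merge relies on.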

pubpub-zz and others added 4 commits January 15, 2023 13:46
Co-authored-by: Martin Thoma <info@martin-thoma.de>

codecov bot commented Jan 15, 2023

Codecov Report

Base: 91.84% // Head: 91.89% // Increases project coverage by +0.04% 🎉

Coverage data is based on head (cb28971) compared to base (98511ac).
Patch coverage: 97.95% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1553      +/-   ##
==========================================
+ Coverage   91.84%   91.89%   +0.04%     
==========================================
  Files          33       33              
  Lines        6196     6244      +48     
  Branches     1229     1242      +13     
==========================================
+ Hits         5691     5738      +47     
  Misses        326      326              
- Partials      179      180       +1     
Impacted Files        Coverage Δ
pypdf/_writer.py      84.54% <94.73%> (+0.17%) ⬆️
pypdf/_reader.py      90.41% <100.00%> (+0.32%) ⬆️
pypdf/xmp.py          92.10% <0.00%> (ø)
pypdf/_cmap.py        95.25% <0.00%> (ø)
pypdf/_page.py        89.85% <0.00%> (ø)
pypdf/_utils.py       97.43% <0.00%> (ø)
pypdf/filters.py      97.31% <0.00%> (ø)
pypdf/pagerange.py    100.00% <0.00%> (ø)
... and 8 more


@pubpub-zz
Collaborator Author

Test to be improved 😉

@MartinThoma
Member

Looks good from my side. If you think it's ready, I could merge it :-)

@pubpub-zz
Collaborator Author

I will first try to improve the test slightly tonight to prevent coverage degradation.

pubpub-zz and others added 4 commits January 17, 2023 19:04
Co-authored-by: Martin Thoma <info@martin-thoma.de>
@pubpub-zz
Collaborator Author

pubpub-zz commented Jan 17, 2023

@MartinThoma
That's the best I can do...
Ready for merging, as far as I'm concerned.

@MartinThoma MartinThoma added the workflow-forms From a users perspective, forms is the affected feature/workflow label Jan 22, 2023
pypdf/_reader.py Outdated
    if "/AcroForm" not in catalog or not isinstance(
        catalog["/AcroForm"], DictionaryObject
    ):
        return None
Member

What do you think about raising a MissingElementException("/AcroForm not in catalog") instead of returning None?

Collaborator Author

This covers the case where the form is not a form. I considered that the function should not raise an error.

Member

What is a form that is not a form? Did you want to write "AcroForm"?

Does that mean that this grouping only makes sense for AcroForm? (I don't understand the AcroForm parts yet)

@MartinThoma MartinThoma changed the title ENH : Merge Forms ENH: Enable merging forms with overlapping names Jan 22, 2023
@pubpub-zz
Collaborator Author

@MartinThoma, can you provide an update on my comments? Should I mark them as resolved?

@MartinThoma
Member

@pubpub-zz I've added a comment:

This covers the case where the form is not a form. I considered that the function should not raise an error.

What do you mean by "the form is not a form"?

@pubpub-zz
Collaborator Author

What do you mean by "the form is not a form"?

I mixed things up: I meant that the document passed is not actually a form, for example when documents are passed in batches. For /Fields, I would also prefer not to raise an exception, at least for the moment: XFA forms may be a case where /Fields does not exist. (If you have some examples, they would be interesting to look at.)
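
A minimal sketch of the behaviour described above, assuming a batch in which only some documents carry a form. Plain dicts stand in for the catalog and for `DictionaryObject`, and `add_top_name` is a hypothetical name rather than the function under review.

```python
# Sketch under stated assumptions: dict stands in for pypdf's DictionaryObject,
# and add_top_name is a hypothetical stand-in for the function under review.

def add_top_name(catalog, name):
    """Group existing fields under `name`; return None when there is no usable form."""
    acro_form = catalog.get("/AcroForm")
    if not isinstance(acro_form, dict):
        # Not a form document: skip quietly instead of raising.
        return None
    if "/Fields" not in acro_form:
        # /Fields may be absent (possibly in some XFA forms, per the discussion):
        # skip as well rather than raise.
        return None
    group = {"/T": name, "/Kids": acro_form["/Fields"]}
    acro_form["/Fields"] = [group]
    return group


batch = [
    {"/AcroForm": {"/Fields": [{"/T": "name"}]}},  # a real form
    {},                                            # no /AcroForm at all
    {"/AcroForm": {}},                             # /AcroForm without /Fields
]

results = [add_top_name(doc, f"form{i}") for i, doc in enumerate(batch, start=1)]
grouped = [r["/T"] if r else None for r in results]
print(grouped)
```

Returning `None` lets the caller run the whole batch in one loop and decide afterwards what to do with documents that had no form; raising would force a `try`/`except` around every call.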

@MartinThoma
Member

For /Fields, I would also prefer not to raise an exception, at least for the moment: XFA forms may be a case where /Fields does not exist

So you don't want to add an Exception because the exception could be wrong for XFA. Did I understand that right?

Could you add this as a comment in the code? It might make my life easier in future when somebody edits that part :-)

pubpub-zz and others added 4 commits January 29, 2023 23:22
Co-authored-by: Martin Thoma <info@martin-thoma.de>
@MartinThoma
Member

Looks good to me! If you want, I can merge today :-)

@MartinThoma
Member

@pubpub-zz I took the "thumbs up" as "go ahead and merge" :-) I'll make the release by Sunday at the latest, depending on my workload.

@pubpub-zz pubpub-zz deleted the merge_forms branch January 30, 2023 22:39
MartinThoma added a commit that referenced this pull request Feb 5, 2023
NOTICE: pypdf changed the way it represents numbers parsed from PDF files.
  pypdf<3.4.0 represented numbers as Decimal, pypdf>=3.4.0 represents them as
  floats. Several other PDF libraries do this, as well as many PDF viewers.
  We hope this fixes issues caused by excessive precision and gives a speed boost.
  In case your PDF documents rely on more than 18 decimals of precision you
  should check if it still works as expected.
  To clarify: This does not affect the text shown in PDF documents. It affects
  numbers, e.g. when graphics are drawn on the PDF or very exact positions are
  used. Typically, 5 decimals should be enough.
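
The difference can be illustrated with the standard library alone. This is a sketch of Decimal versus float behaviour in general, not of pypdf's parser; the 25-decimal value is made up for the example.

```python
from decimal import Decimal

# A made-up number with 25 decimals, as it might be written in a PDF.
raw = "0.1234567890123456789012345"

as_decimal = Decimal(raw)  # keeps every digit that was written
as_float = float(raw)      # rounded to the nearest double-precision value

decimal_digits = len(str(as_decimal).split(".")[1])
float_digits = len(repr(as_float).split(".")[1])

# The float no longer carries all of the original digits ...
lost_precision = Decimal(repr(as_float)) != as_decimal
# ... but both agree once rounded to 5 decimals.
same_to_5 = f"{as_float:.5f}" == f"{as_decimal:.5f}"

print(decimal_digits, float_digits, lost_precision, same_to_5)
```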

New Features (ENH)
-  Enable merging forms with overlapping names (#1553)
-  Add 'over' parameter to merge_transformed_page & co (#1567)

Bug Fixes (BUG)
-  Fix getter of the PageObject.rotation property with an indirect object (#1602)
-  Restore merge_transformed_page & co (#1567)
-  Replace decimal by float (#1563)

Robustness (ROB)
-  PdfWriter.remove_images: /Contents might not be in page_ref (#1598)

Developer Experience (DEV)
-  Introduce ruff (#1586, #1609)

Maintenance (MAINT)
-  Remove decimal (#1608)

[Full Changelog](3.3.0...3.4.0)
Labels
workflow-forms From a users perspective, forms is the affected feature/workflow
Projects
None yet
2 participants