ROB: Fix errors/warnings on no /Resources within extract_text #1276

Merged
merged 5 commits into from Aug 28, 2022

Conversation

pubpub-zz
Collaborator

Fixes #1272 (in text) and #1269 (in Xform).

@codecov

codecov bot commented Aug 25, 2022

Codecov Report

Merging #1276 (a4feaba) into main (ceb997d) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #1276   +/-   ##
=======================================
  Coverage   95.02%   95.03%           
=======================================
  Files          30       30           
  Lines        4968     4974    +6     
  Branches     1023     1024    +1     
=======================================
+ Hits         4721     4727    +6     
  Misses        140      140           
  Partials      107      107           
Impacted Files Coverage Δ
PyPDF2/_page.py 93.99% <100.00%> (+0.06%) ⬆️

pubpub-zz and others added 3 commits August 26, 2022 08:02
Co-authored-by: Matthew Peveler <matt.peveler@gmail.com>
Co-authored-by: Matthew Peveler <matt.peveler@gmail.com>
@MartinThoma MartinThoma changed the title ROB : fix errors/warnings on no /resources with extract_text ROB: Fix errors/warnings on no /Resources within extract_text Aug 27, 2022
@MartinThoma
Member

Thank you for the PR @pubpub-zz 🤗

@MartinThoma
Member

According to Table 3.27 (“Entries in a page object”) in the PDF 1.7 specification, /Resources is required:

(Required; inheritable) A dictionary containing any resources required by
the page (see Section 3.7.2, “Resource Dictionaries”). If the page requires
no resources, the value of this entry should be an empty dictionary. Omitting
the entry entirely indicates that the resources are to be inherited from
an ancestor node in the page tree.

Since it is required, I would raise an exception in strict mode and emit a logger_warning otherwise.

Do you know if we handle the inheritance case? From what can it be inherited?
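
For reference, the spec text quoted above says the entry is inherited from an ancestor node in the page tree. A minimal sketch of such a lookup follows, assuming pages and page-tree nodes are plain Python dicts linked through /Parent. The helper name `find_inherited` is hypothetical and this is not the code merged in this PR.

```python
from typing import Any, Dict, Optional


def find_inherited(node: Dict[str, Any], key: str = "/Resources") -> Optional[Any]:
    """Return the nearest value of `key`, checking the page itself first and
    then each ancestor reached through /Parent. Returns None if it is absent."""
    seen = set()  # guard against malformed files whose /Parent chain loops
    current: Optional[Dict[str, Any]] = node
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        if key in current:
            return current[key]
        current = current.get("/Parent")
    return None


# Toy page tree (illustrative data): the page omits /Resources,
# and the root /Pages node two levels up provides it.
root = {"/Type": "/Pages", "/Resources": {"/Font": {"/F1": "Helvetica"}}}
kids = {"/Type": "/Pages", "/Parent": root}
page = {"/Type": "/Page", "/Parent": kids}
orphan = {"/Type": "/Page"}  # no /Resources anywhere in its chain

inherited = find_inherited(page)
missing = find_inherited(orphan)
```

An explicit empty dictionary on the page is returned as-is and stops the walk, which matches the spec's distinction between "no resources" and "inherit from an ancestor".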

@pubpub-zz
Collaborator Author

I've added some code to look for /Resources in the parents. If it is not present there either, I don't think it is worth checking whether Tj or TJ is present in the content: it would complicate the code for very limited benefit.
If somebody provides a file showing this case, we can add the test at that time.
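
The fallback suggested earlier in this thread (an exception in strict mode, a warning otherwise) could be sketched roughly as below for the case where neither the page nor its parents provide /Resources. This thread does not show what the merged code does in that case, so the function name, the exception type, and the message text are all assumptions made for illustration.

```python
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger(__name__)


class MissingResourcesError(Exception):
    """Hypothetical exception type, used only in this illustration."""


def resources_or_fallback(
    resources: Optional[Dict[str, Any]], strict: bool
) -> Dict[str, Any]:
    """Return `resources` if a dictionary was found (even an empty one).
    Otherwise raise in strict mode, or warn and fall back to an empty dict."""
    if resources is not None:
        return resources
    msg = "/Resources is missing from the page and from its ancestors"
    if strict:
        raise MissingResourcesError(msg)
    logger.warning(msg)
    return {}
```

With `strict=False`, text extraction could then continue with no fonts defined instead of failing with `KeyError: '/Resources'`.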

@MartinThoma MartinThoma merged commit af9c01b into py-pdf:main Aug 28, 2022
MartinThoma added a commit that referenced this pull request Aug 28, 2022
Robustness (ROB):
-  Fix errors/warnings on no /Resources within extract_text (#1276)
-  Add required line separators in ContentStream ArrayObjects (#1281)

Maintenance (MAINT):
-  Use NameObject idempotency (#1290)

Testing (TST):
-  Rectangle deletion (#1289)
-  Add workflow tests (#1287)
-  Remove files after tests ran (#1286)

Packaging (PKG):
-  Add minimum version for typing_extensions requirement (#1277)

Full Changelog: 2.10.3...2.10.4
@pubpub-zz pubpub-zz deleted the iss_ress_extract branch September 3, 2022 19:53
Successfully merging this pull request may close these issues.

KeyError: '/Resources'
3 participants