TimeOut / stuck in loop (?) - 1.24.1 PDF-hul for attached file #646

Closed
asciim0 opened this issue Sep 10, 2020 · 2 comments · Fixed by #719
Labels
bug A product defect that needs fixing P2 Medium priority issues to be scheduled in a future release

Comments

@asciim0
Contributor

asciim0 commented Sep 10, 2020

Using both the UI and the CLI (in a Windows environment, tested on different machines), validation of the attached file never seems to produce a result.

897714407.pdf

@rosetta-development
Collaborator

We encounter the same issue with our PDF file.
The thread is stuck at:
edu.harvard.hul.ois.jhove.Property.getByName(Property.java:159)

carlwilson added the bug and P2 labels on Mar 4, 2021
@karenhanson
Contributor

I think I have the same issue, so I wanted to share what I found while troubleshooting in case it adds useful context. It looks like #645 may be the same thing too. In my case, the following lines in the PDF cause an infinite loop (using v1.24):

464 0 obj
<</Dest(þÿ s e c S 0 0 4)/Next 463 0 R/Parent 399 0 R/Prev 465 0 R/Title(þÿ P a r t A\t & \t .)>>
endobj
465 0 obj
<</Dest(þÿ s e c S 0 0 3)/Next 464 0 R/Parent 399 0 R/Prev 466 0 R/Title(þÿ P a r t B)>>
endobj

It seems that the backslash in the Title sends the Tokenizer into Literal.readBackslashSequence(), where it fails to see the >> that should end the entry. It then reads the following row as part of the same object, which sets the Next value back to 464 and causes an infinite loop in which it keeps reloading the same garbled object. In my case, validation never gets out of this loop.
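
To illustrate what correct handling would look like, here is a minimal sketch of reading a PDF literal string with backslash escapes, following the escape rules for literal strings in the PDF specification. This is not JHOVE's code: the class `LiteralStringSketch`, the `Result` type and the method `readLiteral` are hypothetical names made up for this example, and it works on a plain Java string rather than on the UTF-16 bytes (the þÿ prefix) that the real Title uses.

```java
// Hypothetical sketch, not JHOVE's Tokenizer or Literal class.
final class LiteralStringSketch {

    /** Decoded text plus the index just past the closing ')'. */
    static final class Result {
        final String text;
        final int end;

        Result(String text, int end) {
            this.text = text;
            this.end = end;
        }
    }

    /** Reads a literal string that starts with '(' at index start. */
    static Result readLiteral(String src, int start) {
        if (start >= src.length() || src.charAt(start) != '(') {
            throw new IllegalArgumentException("expected '(' at " + start);
        }
        StringBuilder out = new StringBuilder();
        int depth = 1;       // unescaped parentheses may nest if balanced
        int i = start + 1;
        while (i < src.length()) {
            char c = src.charAt(i++);
            if (c == '\\') {
                if (i >= src.length()) {
                    break;   // dangling backslash: report as unterminated
                }
                char e = src.charAt(i++);
                switch (e) {
                    case 'n': out.append('\n'); break;
                    case 'r': out.append('\r'); break;
                    case 't': out.append('\t'); break;
                    case 'b': out.append('\b'); break;
                    case 'f': out.append('\f'); break;
                    case '\r':
                        // Backslash + end of line is a line continuation.
                        if (i < src.length() && src.charAt(i) == '\n') {
                            i++;
                        }
                        break;
                    case '\n':
                        break;
                    default:
                        if (e >= '0' && e <= '7') {
                            // Octal escape with one to three digits.
                            int value = e - '0';
                            int digits = 1;
                            while (digits < 3 && i < src.length()
                                    && src.charAt(i) >= '0' && src.charAt(i) <= '7') {
                                value = value * 8 + (src.charAt(i++) - '0');
                                digits++;
                            }
                            out.append((char) (value & 0xFF));
                        } else {
                            // Covers \( \) \\ ; for any other character the
                            // backslash is simply dropped.
                            out.append(e);
                        }
                }
            } else if (c == '(') {
                depth++;
                out.append(c);
            } else if (c == ')') {
                if (--depth == 0) {
                    return new Result(out.toString(), i);
                }
                out.append(c);
            } else {
                out.append(c);
            }
        }
        // Fail instead of reading on into the next object.
        throw new IllegalStateException("unterminated literal string");
    }

    public static void main(String[] args) {
        String entry = "(Part A\\t & \\t .)>>";
        Result r = readLiteral(entry, 0);
        System.out.println(entry.startsWith(">>", r.end));
    }
}
```

The point of the sketch is that an escape consumes exactly its own characters, so the reader stops at the closing parenthesis and the >> that ends the dictionary is still there for the caller to see.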

I note that the PDF attached to this issue and the one attached to #645 both have backslashes in the Title property of a dictionary entry. I tested the fix in PR #652 and it works for my issue too.
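
Separately from the tokenizer fix, a walker that follows Next references could defend itself against a chain that points back at an object it has already visited. The following is only a sketch of that idea under my own assumptions, not what PR #652 or JHOVE does: the class `OutlineWalkSketch` and the method `walk` are hypothetical, and the outline is reduced to a map from an object number to the object number in its Next entry.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not JHOVE code: follow Next links with a visited set.
final class OutlineWalkSketch {

    /**
     * Returns the object numbers in visiting order, starting at first.
     * A missing map entry means the item has no Next reference.
     */
    static List<Integer> walk(Map<Integer, Integer> next, int first) {
        List<Integer> order = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();
        Integer current = first;
        while (current != null) {
            // Set.add returns false if the object was already visited.
            if (!seen.add(current)) {
                throw new IllegalStateException(
                        "cyclic Next chain at object " + current);
            }
            order.add(current);
            current = next.get(current);
        }
        return order;
    }
}
```

With the garbled object described above, where 464 ends up with a Next value of 464, such a walker would stop with an error on the second visit to 464 instead of reloading the object forever.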

carlwilson added a commit that referenced this issue Apr 7, 2022
Patch integration tests and added regression test files for #652:
- patched the result of pdf-hul-76-372051162.pdf; and
- added regression tests for #645 and #646.

5 participants