gofpdi fails to correctly parse streams on some pdfs #53

Open
napalu opened this issue Jun 5, 2022 · 0 comments · May be fixed by #54

napalu commented Jun 5, 2022

When reading some PDFs (typically scanned-in ones), gofpdi fails to detect the 'endstream' keyword and panics with: panic: Failed to get content: Failed to get page content: Failed to resolve object: Expected next token to be: endstream, got: dstream.

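When reading a PDF stream, the reader should start reading the stream data immediately after the first end-of-line sequence that follows the 'stream' keyword (CRLF, or a lone LF). Instead, it skips all leading whitespace. If the stream data itself begins with whitespace bytes, those bytes are skipped too, so the read starts too late and can run past the 'endstream' token.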
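
To illustrate the difference, here is a minimal, self-contained sketch. It is not gofpdi's actual code: `readStream`, `skipStreamEOL`, and `skipAllWhitespace` are hypothetical names made up for this example, and the input is a hand-built byte slice rather than a real PDF. It assumes the stream length is already known (as it would be from the stream dictionary's /Length entry) and that the payload happens to begin with three whitespace bytes.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
)

// skipStreamEOL consumes exactly one end-of-line marker (CRLF or a lone LF)
// after the 'stream' keyword. Hypothetical helper, not part of gofpdi.
func skipStreamEOL(r *bufio.Reader) error {
	b, err := r.ReadByte()
	if err != nil {
		return err
	}
	if b == '\r' {
		if b, err = r.ReadByte(); err != nil {
			return err
		}
	}
	if b != '\n' {
		return fmt.Errorf("expected end-of-line after stream keyword, got %q", b)
	}
	return nil
}

// skipAllWhitespace mimics the behaviour described in this issue: it skips
// every PDF whitespace byte, including ones that belong to the stream data.
func skipAllWhitespace(r *bufio.Reader) error {
	for {
		b, err := r.ReadByte()
		if err != nil {
			return err
		}
		switch b {
		case 0x00, '\t', '\n', '\f', '\r', ' ':
			continue
		}
		return r.UnreadByte()
	}
}

// readStream reads the 'stream' keyword, applies the given skip strategy,
// reads length bytes of data, and returns the next token after the data.
func readStream(raw []byte, length int, skip func(*bufio.Reader) error) ([]byte, string, error) {
	r := bufio.NewReader(bytes.NewReader(raw))
	kw := make([]byte, len("stream"))
	if _, err := io.ReadFull(r, kw); err != nil || string(kw) != "stream" {
		return nil, "", fmt.Errorf("missing stream keyword")
	}
	if err := skip(r); err != nil {
		return nil, "", err
	}
	data := make([]byte, length)
	if _, err := io.ReadFull(r, data); err != nil {
		return nil, "", err
	}
	rest, err := io.ReadAll(r)
	if err != nil {
		return nil, "", err
	}
	next := ""
	if fields := bytes.Fields(rest); len(fields) > 0 {
		next = string(fields[0])
	}
	return data, next, nil
}

func main() {
	// Payload that starts with whitespace bytes, as binary image data can.
	payload := []byte{'\n', ' ', '\n', 'A', 'B', 'C'}
	raw := append([]byte("stream\r\n"), payload...)
	raw = append(raw, []byte("\nendstream")...)

	_, next, _ := readStream(raw, len(payload), skipStreamEOL)
	fmt.Printf("skip one EOL:        next token = %q\n", next)

	_, next, _ = readStream(raw, len(payload), skipAllWhitespace)
	fmt.Printf("skip all whitespace: next token = %q\n", next)
}
```

With this input, skipping a single end-of-line leaves 'endstream' as the next token, while skipping all whitespace shifts the read by three bytes and leaves 'dstream', the same symptom as in the panic message above.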

Here's a test PDF that exhibits the described behaviour:
BRW2C6FC94B5488_000827.pdf

napalu added a commit to napalu/gofpdi that referenced this issue Jun 5, 2022
use bytes.Buffer instead of string concatenation in readToken
closes phpdave11#53
@napalu napalu linked a pull request Jun 5, 2022 that will close this issue
syamcode added a commit to syamcode/gofpdi that referenced this issue Aug 9, 2022