BUG: Fix Parsing of Inline Images #332

Closed
wants to merge 12 commits into from

Conversation

speedplane

The inline image parser does not check for whitespace before the EI keyword, as it should. As a result, a content stream like the following would crash the parser:

BI [inline image dictionary]
ID
asfASF213ad>]asf
213lkasdf9as12EI
QsdkfjasdfkjfdiI
EI
Q

Notice that an EI on one line followed by a Q on the next line occurs in two places. Only the second EI ends the image; the first is part of the image data. To tell them apart, we need to make sure the EI is preceded by whitespace.
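
For illustration, here is a minimal Python sketch of that check, run against the sample stream above. It is not the diff from this PR and not PyPDF2's actual code: `find_inline_image_end` and `PDF_WHITESPACE` are hypothetical names, and also requiring whitespace (or end of data) after the EI is an extra assumption of this sketch.

```python
# Hypothetical helper for illustration only; not PyPDF2's implementation.
# The six PDF whitespace bytes: NUL, tab, line feed, form feed, CR, space.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def find_inline_image_end(data: bytes, start: int = 0) -> int:
    """Return the index of the first whitespace-delimited EI, or -1 if none."""
    pos = data.find(b"EI", start)
    while pos != -1:
        # The EI must be preceded by whitespace ...
        preceded = pos > start and data[pos - 1] in PDF_WHITESPACE
        # ... and (assumption of this sketch) followed by whitespace or EOF.
        after = pos + 2
        followed = after == len(data) or data[after] in PDF_WHITESPACE
        if preceded and followed:
            return pos
        # Otherwise this EI is just image data; keep searching.
        pos = data.find(b"EI", pos + 1)
    return -1


# The bytes after ID from the example above.
sample = b"asfASF213ad>]asf\n213lkasdf9as12EI\nQsdkfjasdfkjfdiI\nEI\nQ\n"
end = find_inline_image_end(sample)
print(sample[end:])
```

The first EI (right after `12`) is skipped because the byte before it is not whitespace. This is still a heuristic: binary image data can contain a whitespace-delimited EI by chance.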

This PR also adds protection against infinite loops in case the PDF is corrupt and the inline image never ends.
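
One way such a guard could look, as a sketch under assumptions rather than the code in this PR: read the stream in chunks and raise as soon as a read returns no bytes, instead of looping until an EI shows up. `read_inline_image_data` and `InlineImageError` are hypothetical names, and the stream is assumed to be seekable (for example a `BytesIO`).

```python
from io import BytesIO

# Hypothetical names for illustration only; not PyPDF2's implementation.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


class InlineImageError(Exception):
    """Raised when the stream ends before the inline image is terminated."""


def read_inline_image_data(stream, chunk_size: int = 4096) -> bytes:
    """Return the bytes before the first EI that is preceded by whitespace."""
    data = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            # Guard: at end of stream, stop instead of looping forever.
            raise InlineImageError("end of stream reached before EI")
        # Re-scan the last two bytes so an EI split across chunks is found.
        scan_from = max(len(data) - 2, 0)
        data += chunk
        pos = data.find(b"EI", scan_from)
        while pos != -1:
            if pos > 0 and data[pos - 1] in PDF_WHITESPACE:
                # Leave the stream positioned just after the EI keyword.
                stream.seek(pos + 2 - len(data), 1)
                return bytes(data[:pos])
            pos = data.find(b"EI", pos + 1)


ok = BytesIO(b"abc12EI\nQxyz\nEI\nQ\n")
image = read_inline_image_data(ok, chunk_size=4)
rest = ok.read()

try:
    read_inline_image_data(BytesIO(b"abc12EI\nQxyz"))
    truncated_raised = False
except InlineImageError:
    truncated_raised = True
```

With the truncated input the function raises `InlineImageError` rather than hanging, which is the behavior the comment above describes for corrupt PDFs.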

@vstoykov
Contributor

vstoykov commented Jul 20, 2017

#331 also implements protection against incorrect images. It also makes parsing of inline images a lot faster.

@MartinThoma added the is-bug (From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF) and workflow-images (From a users perspective, image handling is the affected feature/workflow) labels on Apr 6, 2022
PyPDF2/filters.py (outdated review thread, resolved)
@MartinThoma
Member

The current solution is not compatible with the recent BytesIO implementation. Would you mind adjusting your PR?

@MartinThoma added the needs-change (The PR/issue cannot be handled as issue and needs to be improved) label on Apr 16, 2022
@speedplane
Author

I fixed the merge conflict, I'm not sure what you're referring to re BytesIO.

@MartinThoma
Member

> I fixed the merge conflict, I'm not sure what you're referring to re BytesIO.

CI is failing:

[image not preserved]

@MartinThoma
Member

@speedplane We made some pretty heavy changes to PyPDF2 recently. If you search for `if tok2 == b"I":` in `generic.py`, you can see the section that you adjusted. Do you want to adjust the PR or open a new PR?

Do you have an example PDF where this adjustment is necessary? Does it close one of the open issues?

@MartinThoma changed the title from "Fix Parsing of Inline Images" to "BUG: Fix Parsing of Inline Images" on Jun 25, 2022
@MartinThoma added the needs-rebase (This PR cannot be merged as the main branch is too different. You need to rebase or merge main.) label and removed the needs-change (The PR/issue cannot be handled as issue and needs to be improved) label on Jun 25, 2022
@MartinThoma added the needs-test (A test should be added before this PR is merged.) label on Jul 24, 2022
@MartinThoma
Member

It would help me a lot if we had an image that shows the described issue.

@speedplane
Author

Sorry, this is all I have. I can't remember what this fixed or how it fixes it.

@MartinThoma
Member

@speedplane The issue you addressed was fixed via #1327.

May I add you to https://pypdf2.readthedocs.io/en/latest/meta/CONTRIBUTORS.html ? Your PR was not merged, but you did make a valuable contribution with this PR. It was just me not being able to understand it at the time.

@MartinThoma closed this on Sep 6, 2022
Labels
is-bug: From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF
needs-rebase: This PR cannot be merged as the main branch is too different. You need to rebase or merge main.
needs-test: A test should be added before this PR is merged.
workflow-images: From a users perspective, image handling is the affected feature/workflow

3 participants