BUG: Fix Parsing of Inline Images #332

speedplane · 2017-02-28T05:29:39Z

The inline image parser does not look for whitespace before the EI keyword as it should. As a result, the parser crashes on a content stream like the following:

BI [inline image dictionary]
ID
asfASF213ad>]asf
213lkasdf9as12EI
QsdkfjasdfkjfdiI
EI
Q

Notice that an EI at the end of one line followed by a Q at the start of the next occurs in two places, and only the second EI actually ends the image. To check properly, we need to make sure the EI is also preceded by whitespace.
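A minimal sketch of the check described above, not the code from this PR: the hypothetical helper `find_inline_image_end` scans the bytes that follow `ID` and accepts an `EI` only when it has whitespace on both sides (or sits at the very end of the data). The whitespace set is the one the PDF specification defines; the helper name and the `-1` return convention are assumptions made for illustration.

```python
# PDF whitespace characters: NUL, tab, line feed, form feed, carriage return, space.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def find_inline_image_end(data: bytes, start: int = 0) -> int:
    """Return the index of the first whitespace-delimited EI at or after start.

    Hypothetical helper for illustration. Returns -1 if no such marker exists.
    """
    pos = data.find(b"EI", start)
    while pos != -1:
        # The byte before EI must be whitespace.
        before_ok = pos > 0 and data[pos - 1] in PDF_WHITESPACE
        # The byte after EI must be whitespace, or EI must end the data.
        after = pos + 2
        after_ok = after >= len(data) or data[after] in PDF_WHITESPACE
        if before_ok and after_ok:
            return pos
        pos = data.find(b"EI", pos + 1)
    return -1


# The image data from the example above (everything after the ID line).
stream = b"asfASF213ad>]asf\n213lkasdf9as12EI\nQsdkfjasdfkjfdiI\nEI\nQ\n"
end = find_inline_image_end(stream)
image_data = stream[:end]
```

Here the first `EI` (directly after `12`) is skipped because it is not preceded by whitespace, and the search stops at the `EI` that sits on its own line. Binary image data can still contain a whitespace-delimited `EI` by coincidence, so this check reduces false matches rather than ruling them out.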

This PR also adds protection against infinite loops in case the PDF is corrupt and the inline image never ends.
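One way such a guard could look, as a sketch rather than the code in this PR: when reading from a stream, `read()` returns an empty bytes object at end of file, so a scanning loop has to treat that as an error instead of continuing. The function name `read_inline_image_data` and the use of `ValueError` are assumptions for illustration; PyPDF2 has its own exception types.

```python
import io

# PDF whitespace characters: NUL, tab, line feed, form feed, carriage return, space.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def read_inline_image_data(stream: io.BufferedIOBase) -> bytes:
    """Read bytes up to a whitespace-delimited EI, raising if the stream ends first.

    Hypothetical helper for illustration. The returned data excludes the
    whitespace byte immediately before EI.
    """
    data = bytearray()
    while True:
        byte = stream.read(1)
        if not byte:
            # End of stream without a terminator: stop instead of looping forever.
            raise ValueError("inline image has no terminating EI")
        data += byte
        # Terminator pattern: <whitespace> E I <whitespace>
        if (
            len(data) >= 4
            and data[-4] in PDF_WHITESPACE
            and data[-3:-1] == b"EI"
            and data[-1] in PDF_WHITESPACE
        ):
            return bytes(data[:-4])
```

This sketch requires whitespace after `EI`, so an `EI` that is the very last token in the stream would also raise; a real parser would need to decide how to handle that case.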

vstoykov · 2017-07-20T14:20:40Z

#331 also implements protection against incorrect images. It also makes parsing of inline images a lot faster.

PyPDF2/filters.py

MartinThoma · 2022-04-16T13:42:42Z

The current solution is not compatible with the recent BytesIO implementation. Would you mind adjusting your PR?

speedplane · 2022-04-16T19:49:20Z

I fixed the merge conflict. I'm not sure what you're referring to re BytesIO.

MartinThoma · 2022-04-16T20:50:32Z

> I fixed the merge conflict, I'm not sure what you're referring to re BytesIO.

CI is failing:

MartinThoma · 2022-06-19T11:10:01Z

@speedplane We made some pretty heavy changes to PyPDF2 recently. If you search for `if tok2 == b"I":` in generic.py, you can see the section that you adjusted. Do you want to adjust the PR or open a new PR?

Do you have an example PDF where this adjustment is necessary? Does it close one of the open issues?

MartinThoma · 2022-07-24T07:31:07Z

It would help me a lot if we had an image that shows the described issue.

speedplane · 2022-08-04T03:22:04Z

Sorry, this is all I have. I can't remember what this fixed or how it fixes it.

MartinThoma · 2022-09-06T19:25:10Z

@speedplane The issue you addressed was fixed via #1327.

May I add you to https://pypdf2.readthedocs.io/en/latest/meta/CONTRIBUTORS.html ? Your PR was not merged, but you did make a valuable contribution with this PR. It was just me not being able to understand it at the time.

speedplane added 2 commits February 28, 2017 00:25

Fix an issue with parsing inline images. We must look for an EI with whitespace before and after it, not just after it.

46f735a

Finish fix-inline-image-bug

9681531

speedplane mentioned this pull request Mar 7, 2017

Feature/ascii85 off by one #333

Closed

speedplane and others added 7 commits November 7, 2018 22:05

Fix an off by one error here, this value can represent the full 32 bit range.

b847fb9

Add a bit more logging on this type of error.

8b35ab5

Fix an issue with parsing inline images. We must look for an EI with whitespace before and after it, not just after it.

ba1799d

Finish fix-inline-image-bug

68f50af

Fix an off by one error here, this value can represent the full 32 bit range.

cae961d

Add a bit more logging on this type of error.

d6ed4c7

Merge remote-tracking branch 'origin/master'

dfe632c

MartinThoma added the is-bug and workflow-images labels Apr 6, 2022

Merge branch 'main' into master

25a5c6c

MartinThoma reviewed Apr 16, 2022

PyPDF2/filters.py (outdated)

Update PyPDF2/filters.py

a53f72e

MartinThoma added the needs-change label Apr 16, 2022

Merge branch 'main' into master

80c5ac2

MartinThoma changed the title ~~Fix Parsing of Inline Images~~ BUG: Fix Parsing of Inline Images Jun 25, 2022

MartinThoma added the needs-rebase label and removed the needs-change label Jun 25, 2022

MartinThoma mentioned this pull request Jul 24, 2022

MAINT: Add diagnostic output to exception in read_from_stream #1159

Merged

MartinThoma added the needs-test label Jul 24, 2022

MartinThoma mentioned this pull request Sep 6, 2022

ROB : fix image extraction #1327

Merged

MartinThoma mentioned this pull request Sep 6, 2022

PdfReadError: Unexpected end of stream #1090

Closed

MartinThoma closed this Sep 6, 2022
