Trailing spaces and NUL characters in PDF cause failure identifying EOF #20

freakboy3742 · 2011-02-16T14:51:04Z

I have a collection of PDFs that contain a line of NUL and space characters on the line after the %%EOF marker. The current technique for identifying the %%EOF fails on these PDFs because the 'while not line' check on line 704 of pdf.py (the start of the read() method on PdfFileReader) isn't sufficient to identify this line of NUL and spaces as something worth ignoring.
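To illustrate the problem described above, here is a minimal sketch of one way to locate the marker while ignoring trailing padding. It is not the code from this pull request or from pdf.py; the helper name `find_eof_marker`, the `PADDING` set, and the `chunk_size` parameter are hypothetical and chosen only for this example. It assumes a seekable binary stream and treats NUL bytes and ASCII whitespace after `%%EOF` as ignorable.

```python
import io

# Bytes treated as ignorable padding after the %%EOF marker.
# Assumption for this sketch: NUL plus ASCII whitespace.
PADDING = b"\x00 \t\r\n\x0c"
MARKER = b"%%EOF"


def find_eof_marker(stream, chunk_size=1024):
    """Return the byte offset of a trailing %%EOF marker.

    Hypothetical helper for illustration only; not part of pyPdf.
    Raises ValueError if the last non-padding bytes are not %%EOF.
    """
    stream.seek(0, io.SEEK_END)
    end = stream.tell()

    # Walk backwards in chunks until a non-padding byte is found.
    while end > 0:
        start = max(0, end - chunk_size)
        stream.seek(start)
        chunk = stream.read(end - start)
        stripped = chunk.rstrip(PADDING)
        if stripped:
            # 'end' now points just past the last meaningful byte.
            end = start + len(stripped)
            break
        end = start

    if end < len(MARKER):
        raise ValueError("EOF marker not found")

    # The bytes immediately before 'end' must be the marker itself.
    stream.seek(end - len(MARKER))
    if stream.read(len(MARKER)) != MARKER:
        raise ValueError("EOF marker not found")
    return end - len(MARKER)
```

A check that only skips empty lines would stop at a line such as `b"\x00 \x00 \n"`, because that line is not empty; stripping a defined set of padding bytes avoids that case.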


jimr · 2012-12-06T16:24:46Z

Works for me, would be great to see this merged.

jobo3208 · 2012-12-31T14:55:08Z

I agree. This fixes a major shortcoming of the library, IMO. I can't tell you how many PDFs I've encountered with this problem.

freakboy3742 added 2 commits February 16, 2011 22:40

Ensure that nulls and spaces after the %%EOF are ignored.

ab4ed6b

Modified the approach for discovering EOF -- new technique is faster and less error prone.

cff6cef

This pull request was closed.
