Added pdf support #40

mgartner · 2015-02-23T02:31:18Z

I added support for finding the size of a PDF (in PostScript points) by writing a rudimentary parser that looks for a page's CropBox or MediaBox attribute.

I'm happy to explain in more detail how this all works, but the general structure is:

1. Find the XREF offset, looking from the end of the file.
2. Go to the XREF table and find the offsets of each object in the file.
3. Find the first object that is a dictionary, has a "/Type" of "/Page", and contains either "/CropBox" or "/MediaBox".
4. Return the value contained by the box attribute.
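
The steps above can be sketched roughly as follows. This is a minimal illustration, not the parser from this PR: `findPdfPageBox` is a hypothetical name, and the sketch assumes a classic (non-stream) xref table, uncompressed objects, a box written directly on the page as a literal four-number array, and a preference for CropBox over MediaBox when both are present.

```javascript
// Hypothetical helper, not the parser from this PR.
// Assumes a classic xref table, uncompressed objects, and a box written
// directly on the page as a literal array of four numbers.
function findPdfPageBox(buffer) {
  // latin1 maps one byte to one character, so string indexes equal byte offsets.
  const text = buffer.toString('latin1');

  // Step 1: the xref offset is the number after the last "startxref".
  const marker = text.lastIndexOf('startxref');
  if (marker === -1) throw new TypeError('startxref not found');
  const xrefOffset = parseInt(text.slice(marker + 'startxref'.length), 10);
  if (Number.isNaN(xrefOffset) || !text.startsWith('xref', xrefOffset)) {
    throw new TypeError('xref table not found');
  }

  // Step 2: collect the byte offset of every in-use ("n") entry.
  const trailer = text.indexOf('trailer', xrefOffset);
  const table = text.slice(xrefOffset, trailer === -1 ? undefined : trailer);
  const offsets = [];
  const entry = /(\d{10}) \d{5} n/g;
  let match;
  while ((match = entry.exec(table)) !== null) {
    offsets.push(parseInt(match[1], 10));
  }

  // Steps 3 and 4: return the box of the first matching page object.
  const isPage = /\/Type\s*\/Page(?![A-Za-z])/; // rejects "/Pages"
  const num = '(-?\\d*\\.?\\d+)';
  for (const offset of offsets) {
    const end = text.indexOf('endobj', offset);
    const body = text.slice(offset, end === -1 ? undefined : end);
    if (!isPage.test(body)) continue;
    for (const name of ['CropBox', 'MediaBox']) {
      const box = new RegExp(
        `/${name}\\s*\\[\\s*${num}\\s+${num}\\s+${num}\\s+${num}\\s*\\]`
      ).exec(body);
      if (box) {
        const [x1, y1, x2, y2] = box.slice(1).map(Number);
        return { box: [x1, y1, x2, y2], width: x2 - x1, height: y2 - y1 };
      }
    }
  }
  throw new TypeError('no page with a CropBox or MediaBox found');
}
```

The function takes a Node.js `Buffer` holding the whole file (for example, the result of `fs.readFileSync`) and returns the raw box plus a width and height in points. Files that use xref streams, or pages that inherit their box from a parent node, would need more work than this sketch does.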

I've tested mainly with the PDF included in this PR, but I'm going to be doing some more thorough ad-hoc testing now. I welcome any feedback you have.

coveralls · 2015-02-23T03:37:35Z

Coverage decreased (-1.52%) to 93.51% when pulling 59030d9 on mgartner:pdf into ed0c56c on netroy:master.

mgartner · 2015-02-23T03:49:29Z

When doing some profiling, I noticed that most of the time was being spent in isSVG, probably because a regex is pretty inefficient here. Since this was a major bottleneck (over 10x the rest of the execution time), I moved SVG to the end of the list so it is the last file type checked. Other file types now short-circuit, so isSVG isn't called for them.
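
To illustrate the ordering change, here is a small sketch of an ordered detector list in which the expensive check runs last. The detector names, the checks themselves, and the `svgChecks` counter are all assumptions made for illustration; they are not the library's actual detection code.

```javascript
// Illustrative only: these detectors are hypothetical stand-ins.
let svgChecks = 0; // counts how often the expensive SVG check runs

const detectors = [
  // Cheap checks on a few leading bytes come first.
  ['png', (buf) => buf.length >= 8 && buf.toString('latin1', 1, 4) === 'PNG'],
  ['pdf', (buf) => buf.toString('latin1', 0, 5) === '%PDF-'],
  // The regex scan over the whole buffer goes last.
  ['svg', (buf) => {
    svgChecks += 1;
    return /<svg[^>]*>/i.test(buf.toString('utf8'));
  }],
];

// Returns the first matching type; later detectors are never called.
function detectType(buffer) {
  for (const [type, test] of detectors) {
    if (test(buffer)) return type;
  }
  return undefined;
}
```

Because the loop returns on the first match, a PDF or PNG buffer never reaches the SVG regex; only actual SVGs and unrecognized files pay for it.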

Marcus Gartner added 2 commits February 23, 2015 22:36

added pdf support

fca2775

checks if file is SVG last

a847358

mgartner closed this Mar 1, 2015
