Added pdf support #40

Closed
wants to merge 2 commits into from

mgartner

I added support for finding the size of a PDF (in PostScript points) by writing a rudimentary parser that looks for a page's CropBox or MediaBox attribute.

I'm happy to explain in more detail how this all works but the general structure is:

  1. Find the XREF offset by searching backward from the end of the file.
  2. Go to the XREF table and find the offsets of each object in the file.
  3. Find the first object that is a dictionary, has a "/Type" of "/Page", and contains either "/CropBox" or "/MediaBox".
  4. Return the value contained by the box attribute.
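
The four steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the JavaScript code from this PR. The function names are hypothetical. It assumes a classic, uncompressed xref table (no xref streams), a box stored as a literal array directly on the page object (not inherited from a parent), and it prefers CropBox over MediaBox when both are present.

```python
import re

# Matches "/Type /Page" but not "/Type /Pages".
PAGE_TYPE = re.compile(rb"/Type\s*/Page(?![A-Za-z])")
# One classic xref entry: 10-digit offset, 5-digit generation, n (in use) or f (free).
XREF_ENTRY = re.compile(rb"(\d{10}) (\d{5}) ([nf])")


def find_xref_offset(data: bytes) -> int:
    """Step 1: locate the last 'startxref' and read the offset after it."""
    idx = data.rfind(b"startxref")
    if idx == -1:
        raise ValueError("no startxref marker found")
    match = re.match(rb"startxref\s+(\d+)", data[idx:])
    if match is None:
        raise ValueError("startxref is not followed by an offset")
    return int(match.group(1))


def read_object_offsets(data: bytes, xref_offset: int) -> list:
    """Step 2: collect the byte offsets of all in-use objects."""
    if not data.startswith(b"xref", xref_offset):
        raise ValueError("expected a classic xref table (assumption)")
    end = data.find(b"trailer", xref_offset)
    table = data[xref_offset:end if end != -1 else len(data)]
    return [int(m.group(1)) for m in XREF_ENTRY.finditer(table)
            if m.group(3) == b"n"]


def find_page_box(data: bytes, offsets: list):
    """Steps 3 and 4: return the box of the first page object that has one."""
    for offset in offsets:
        end = data.find(b"endobj", offset)
        if end == -1:
            continue
        body = data[offset:end]
        if not PAGE_TYPE.search(body):
            continue
        for name in (b"CropBox", b"MediaBox"):
            match = re.search(rb"/" + name + rb"\s*\[([^\]]*)\]", body)
            if match:
                values = match.group(1).decode("ascii").split()
                if len(values) == 4:
                    return [float(v) for v in values]
    return None


def pdf_size(data: bytes):
    """Return (width, height) in points, or None if no box was found."""
    offsets = read_object_offsets(data, find_xref_offset(data))
    box = find_page_box(data, offsets)
    if box is None:
        return None
    x0, y0, x1, y1 = box
    return (x1 - x0, y1 - y0)
```

Calling `pdf_size(open(path, "rb").read())` on a file that meets those assumptions would return the page size as a `(width, height)` tuple. PDFs that use xref streams or inherit MediaBox from a parent Pages node would need more work than this sketch does.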

I've tested mainly with the PDF included in this PR, but I'm going to be doing some more thorough ad-hoc testing now. I welcome any feedback you have.

@coveralls

Coverage Status

Coverage decreased (-1.52%) to 93.51% when pulling 59030d9 on mgartner:pdf into ed0c56c on netroy:master.

@mgartner
Author

While profiling, I noticed that most of the time was being spent in isSVG, probably because a regex is pretty inefficient here. Since this was a major bottleneck (over 10x the rest of the execution time), I moved SVG to the end of the list so it is the last file type checked. Other file types now short-circuit, so isSVG is never called for them.
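
The reordering described above can be illustrated with a small sketch. It is written in Python rather than the library's JavaScript, and the detector names, the type list, and the SVG pattern are all hypothetical stand-ins. The idea is just that cheap magic-byte checks run first and the regex-based SVG check runs only if everything else fails.

```python
import re

# Hypothetical pattern for illustration; not the regex used by the library.
SVG_PATTERN = re.compile(rb"<svg[\s>]", re.IGNORECASE)


def is_png(buf: bytes) -> bool:
    return buf.startswith(b"\x89PNG\r\n\x1a\n")


def is_gif(buf: bytes) -> bool:
    return buf[:6] in (b"GIF87a", b"GIF89a")


def is_pdf(buf: bytes) -> bool:
    return buf.startswith(b"%PDF-")


def is_svg(buf: bytes) -> bool:
    # Scans the buffer with a regex, so it is the most expensive check here.
    return SVG_PATTERN.search(buf) is not None


# Cheap prefix checks first, the regex-based SVG check last.
DETECTORS = [
    ("png", is_png),
    ("gif", is_gif),
    ("pdf", is_pdf),
    ("svg", is_svg),
]


def detect_type(buf: bytes):
    """Return the first matching type name; later detectors are skipped."""
    for name, check in DETECTORS:
        if check(buf):
            return name
    return None
```

With this ordering, a PNG or PDF buffer returns from `detect_type` before `is_svg` is reached, so the regex cost is only paid for SVGs and unrecognized input.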

@mgartner mgartner closed this Mar 1, 2015