pdfmunge - Process PDFs to make them more legible on eBook readers.
Copyright Felix Crux (www.felixcrux.com)
Summary:
This little script processes PDFs to make them easier to read on an eBook
reader. Its primary functions are stripping away large margins and imitating
landscape mode on devices that don't support it, by cutting each page in half
and rotating each half 90 degrees counter-clockwise. It can also remove
individual pages from the file.
Usage:
  pdfmunge [options]... input_file output_file
  -r --rotate     Slice pages in half and rotate each half 90 degrees
                  counter-clockwise, creating a pseudo-landscape mode on
                  devices that don't do this automatically. Be warned that
                  this will double the size of your output file.
  -m --margin     If using rotation/slicing, have each page overlap with the
                  previous one by this amount (helps with lines getting cut
                  off in the middle).
  -b --bounds     Boundaries of visible area on each PDF page. Useful for
                  cropping off large margins. If this is given, cropping is
                  done automatically; otherwise it is not done. Boundaries
                  should be given as four comma-separated numbers, all
                  enclosed in quotation marks, like so: "10,20,100,120". Any
                  whitespace inside the quotation marks is ignored.
  -o --oddbounds  For PDFs that have different margins on even and odd pages,
                  use these boundary values for odd numbered pages, with
                  --bounds applying to even numbered ones. If this is given,
                  --bounds is required.
  -e --exclude    Numbers or ranges of pages to not include in the output PDF.
                  These should be given as a series of numbers or ranges
                  surrounded by quotation marks, and separated by commas. Any
                  whitespace is ignored. Ranges are given as two numbers
                  separated by a hyphen/minus sign (-), where the first number
                  must be smaller than the second. Example: "1,2,4-8,40".
                  This option takes precedence over --intact.
  -i --intact     Leave these pages completely unchanged, ignoring
                  cropping, rotating, or anything else. Requires a set of
                  numbers or ranges like --exclude. Excluded pages are
                  ignored even if listed here.
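As a rough illustration of the option formats described above, here is a
minimal Python sketch. It is not the script's actual code: the function names
(parse_bounds, parse_pages, plan_page) are hypothetical, and it assumes that
pages are numbered from 1.

```python
def parse_bounds(spec):
    """Parse a string like "10,20,100,120" into a tuple of four numbers."""
    # Whitespace anywhere inside the string is ignored.
    parts = "".join(spec.split()).split(",")
    if len(parts) != 4:
        raise ValueError("expected four comma-separated numbers: %r" % spec)
    return tuple(float(p) for p in parts)


def parse_pages(spec):
    """Parse a string like "1,2,4-8,40" into a set of page numbers."""
    pages = set()
    for item in "".join(spec.split()).split(","):
        if not item:
            continue
        if "-" in item:
            first, last = (int(n) for n in item.split("-"))
            # The first number of a range must be smaller than the second.
            if first >= last:
                raise ValueError("range must be ascending: %r" % item)
            pages.update(range(first, last + 1))
        else:
            pages.add(int(item))
    return pages


def plan_page(number, exclude, intact, bounds=None, oddbounds=None):
    """Decide what to do with one page (assumed to be numbered from 1).

    Returns ("drop", None), ("keep", None), or ("process", bounds), where
    bounds is None when no cropping was requested.
    """
    if number in exclude:  # --exclude takes precedence over --intact
        return ("drop", None)
    if number in intact:
        return ("keep", None)
    if oddbounds is not None and number % 2 == 1:
        return ("process", oddbounds)  # odd pages use --oddbounds
    return ("process", bounds)  # even pages, or no --oddbounds given
```

For instance, parse_pages("1,2,4-8,40") yields the pages 1, 2, 4, 5, 6, 7, 8
and 40, and a page listed in both sets is dropped rather than kept intact.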
Examples:
For a simple example, we can use the PDF version of John Hughes' paper "Why
Functional Programming Matters", which is available from his website. We want
to strip away the large margins, and exclude the last page of references:
    ./pdfmunge.py --exclude "23" --bounds "125,110,485,670" WhyFP.pdf Test.pdf
For a complicated example that really shows off all the features, let's look at
the (free) PDF edition of Mark Dominus' "Higher-Order Perl", which is available
on his website (please consider purchasing it in paper form, too!). This book
needs to have the large margins and printers' guides stripped off, but uses
margins of alternating size on alternating pages. Let's also get rid of some of
the "extraneous" material like the dedication page and index (another reason
you should buy the paper copy!). To make it more readable, we'll slice each
page in half and rotate the halves, creating an imitation of landscape mode on
the eBook reader. We'll also want to refrain from processing page 5, which is
the title page:
    ./pdfmunge.py --intact "5" --exclude "1-4,6-8,582-592" \
        --bounds "215,240,557,778" --oddbounds "153,240,495,778" \
        --rotate --margin 3 HoP.pdf Test.pdf
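To make the slicing arithmetic concrete, here is a small Python sketch of how
a crop box might be divided into two overlapping halves. It only computes
rectangles and does not touch a PDF. The function name slice_halves is
hypothetical, and the sketch assumes the four bounds are given in the order
left, bottom, right, top, which this README does not state.

```python
def slice_halves(bounds, margin=0):
    """Split a crop box into a top and a bottom half, in reading order.

    Assumes bounds = (left, bottom, right, top); this ordering is an
    assumption made for the sketch.
    """
    left, bottom, right, top = bounds
    middle = bottom + (top - bottom) / 2.0
    top_half = (left, middle, right, top)
    # The second slice reaches `margin` units back up into the first one,
    # so a line of text cut at the midpoint appears whole on one of them.
    bottom_half = (left, bottom, right, middle + margin)
    return [top_half, bottom_half]


# Even-page bounds and margin from the example above.
halves = slice_halves((215, 240, 557, 778), margin=3)
# top half:    (215, 509.0, 557, 778)
# bottom half: (215, 240, 557, 512.0)
```

Each half would then still need to be rotated 90 degrees counter-clockwise,
which is left out here.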
Development:
I threw this script together one evening to scratch an itch I was having with
my eBook reader. It wasn't really intended to be released, and despite some
cleanup, this heritage shows. I'll gladly accept patches.
In particular, open issues are:
- Rotating/slicing should not need to double the filesize. The way it's done
now is tremendously hacky, but I haven't found a way to do it better in the
pyPdf library. As they say, storage is cheap; programmer time and patience
are not.
- Two-column PDFs would benefit from a column-splitting function. This might
be useful even if it didn't work with rotation, since the columns would be
rather narrow.
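For anyone interested in the column-splitting idea, the geometry could start
from something like the following Python sketch. Nothing like this exists in
the script; split_columns and its gutter parameter are hypothetical, the
bounds are assumed to be ordered left, bottom, right, top, and the columns are
assumed to be of equal width and centred in the visible area.

```python
def split_columns(bounds, gutter=0):
    """Split a crop box into left and right column boxes, in reading order.

    Assumes bounds = (left, bottom, right, top) and two equal-width columns
    separated by `gutter` units of white space centred in the box.
    """
    left, bottom, right, top = bounds
    centre = left + (right - left) / 2.0
    half_gutter = gutter / 2.0
    left_column = (left, bottom, centre - half_gutter, top)
    right_column = (centre + half_gutter, bottom, right, top)
    return [left_column, right_column]
```

Each column box could then be emitted as its own page, left column first.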
License:
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.