Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.Sign up
Improve the legibility of PDFs on e-book readers https://felixcrux.com/library/pdfmunge
Fetching latest commit…
Cannot retrieve the latest commit at this time.
|Failed to load latest commit information.|
pdfmunge - Process PDFs to make them more legible on eBook readers. Copyright Felix Crux (www.felixcrux.com) Summary: This little script performs some processing tasks on PDFs in order to make them easier to read on an eBook reader. The primary functions are stripping away large margins, and creating an imitation of landscape-mode on devices that don't support it by cutting each page in half, and rotating everything 90 degrees counter-clockwise. It can also remove individual pages from the file. Usage: pdfmunge [options]... input_file output_file -r --rotate Slice pages in half and rotate each half 90 degrees counter-clockwise, creating a pseudo-landscape mode on devices that don't do this automatically. Be warned that this will double the size of your output file. -m --margin If using rotation/slicing, have each page overlap with the previous one by this amount (helps with lines getting cut off in the middle). -b --bounds Boundaries of visible area on each PDF page. Useful for cropping off large margins. If this is given, cropping is done automatically; otherwise it is not done. Boundaries should be given as four comma-separated numbers, all enclosed in quotation marks, like so: "10,20,100,120". Any whitespace inside the quotation marks is ignored. -o --oddbounds For PDFs that have different margins on even and odd pages, use these boundary values for odd numbered pages, with --bounds applying to even numbered ones. If this is given, --bounds is required. -e --exclude Numbers or ranges of pages to not include in the output PDF. These should be given as a series of numbers or ranges surrounded by quotation marks, and separated by commas. Any whitespace is ignored. Ranges are given as two numbers separated by a hyphen/minus sign (-), where the first number must be smaller than the second. Example: "1,2,4-8,40". This option takes precedence over --intact. -i --intact Leave these pages completely unchanged, ignoring cropping, rotating, or anything else. Requires a set of numbers or ranges like --exclude. Excluded pages are ignored even if listed here. Examples: For a simple example, we can use the PDF version of John Hughes' paper "Why Functional Programming Matters", which is available from his website. We want to strip away the large margins, and exclude the last page of references: ./pdfmunge.py --exclude "23" --bounds "125,110,485,670" WhyFP.pdf Test.pdf For a complicated example that really shows off all the features, let's look at the (free) PDF edition of Mark Dominus' "Higher Order Perl"; which is available on his website (please consider purchasing it in paper form, too!). This book needs to have the large margins and printers' guides stripped off, but uses margins of alternating size on alternating pages. Let's also get rid of some of the "extraneous" material like the dedication page and index (another reason you should buy the paper copy!). To make it more readable, we'll slice each page in half and rotate them, creating an imitation of landscape mode on the eBook reader. We'll also want to refrain from processing page 5, which is the title page: ./pdfmunge.py --intact "5" --exclude "1-4,6-8,582-592" \ --bounds "215,240,557,778" --oddbounds "153,240,495,778" \ --rotate --margin 3 HoP.pdf Test.pdf Development: I threw this script together one evening to scratch an itch I was having with my eBook reader. It wasn't really intended to be released, and despite some cleanup, this heritage shows. I'll gladly accept patches. In particular, open issues are: - Rotating/slicing should not need to double the filesize. The way it's done now is tremendously hacky, but I haven't found a way to do it better in the pyPdf library. As they say, storage is cheap; programmer time and patience are not. - Two-column PDFs would benefit from a column-splitting function. This might be useful even if it didn't work with rotation, since the columns would be rather narrow. License: Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.