PDF to JPEG images + HTML with <img> alt text converter
Latest commit 585c166 May 28, 2014

Introduction

This is a Python script that converts a PDF into a series of HTML <img> tags with alt text. This makes a presentation suitable for embedding in a blog post and for reading on a mobile device.

Example Workflow:

  • Export the presentation from Apple Keynote to a PDF file. In the Export dialog, untick "include date" and "add borders around slides".
  • Run the script against the generated PDF file to convert it to a series of JPEG files and an HTML snippet with <img> tags.
  • Optionally, the script adds a full URL prefix to <img src>, so you don't need to manually point each image at the absolute URL of your hosting service.
  • Copy-paste the generated HTML into your blog post.
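
The HTML step of the workflow above can be sketched roughly as follows. This is a minimal illustration and not the code in pdf2html.py: the function name `build_img_tags`, the slide file names, and the alt texts are all hypothetical, and the sketch assumes Python 3 for `html.escape`.

```python
from html import escape


def build_img_tags(filenames, alt_texts, url_prefix=""):
    """Return one <img> tag per slide image, joined by newlines.

    Hypothetical helper for illustration; pdf2html.py may work differently.
    """
    if len(filenames) != len(alt_texts):
        raise ValueError("need exactly one alt text per image")
    # Make sure a non-empty prefix ends with a slash before appending names.
    if url_prefix and not url_prefix.endswith("/"):
        url_prefix += "/"
    tags = []
    for name, alt in zip(filenames, alt_texts):
        # Escape both attributes so quotes and ampersands cannot break the tag.
        src = escape(url_prefix + name, quote=True)
        tags.append('<img src="%s" alt="%s">' % (src, escape(alt, quote=True)))
    return "\n".join(tags)


if __name__ == "__main__":
    print(build_img_tags(
        ["slide1.jpg", "slide2.jpg"],
        ["Title slide", "Agenda & goals"],
        "http://example.com/uploads/",
    ))
```

With an empty `url_prefix` the tags use relative file names, which mirrors running the script without the optional URL argument.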

Tested with Apple Keynote exported PDFs, but the approach should work for any PDF content.

See example blog post and presentation.

Installation

Dependencies (OSX):

sudo port install ghostscript

Please note that Ghostscript 9.06 crashed for me during the export. If you hit the same crash, upgrade to 9.07.
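
If you want to check for the affected version before running the conversion, something like the following might help. It is only a sketch: the helper names are made up for this example, and it assumes that `gs --version` prints a plain version number such as 9.07.

```python
import re
import subprocess


def parse_version(text):
    """Turn a version string such as '9.07' into a tuple of ints, (9, 7)."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version number in %r" % text)
    return int(match.group(1)), int(match.group(2))


def is_known_bad(version_text, bad=(9, 6)):
    """True only for the release reported above as crashing (9.06)."""
    return parse_version(version_text) == bad


def ghostscript_version(gs="gs"):
    """Ask the Ghostscript binary for its version (assumes it is on PATH)."""
    return subprocess.check_output([gs, "--version"]).decode().strip()


if __name__ == "__main__":
    version = ghostscript_version()
    if is_known_bad(version):
        print("Ghostscript %s is known to crash here; please upgrade." % version)
```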

Setting up virtualenv and installing the code:

git clone xxx
cd pdf-presentation-to-html
curl -L -o virtualenv.py https://raw.github.com/pypa/virtualenv/master/virtualenv.py
python virtualenv.py venv
. venv/bin/activate
pip install pyPdf

Usage

Example:

. venv/bin/activate
python pdf2html.py test.pdf output

Advanced example that sets the GHOSTSCRIPT environment variable and passes a hardcoded URL prefix:

GHOSTSCRIPT=/usr/local/bin/gs python pdf2html.py test.pdf output http://opensourcehacker.com/wp-content/uploads/wpd2013/
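
For readers curious how a GHOSTSCRIPT override like the one above could be honoured, here is a hedged sketch that builds a Ghostscript command line for rendering each page to a JPEG. The function name, the `slide%d.jpg` naming, and the 150 dpi resolution are assumptions for illustration; the flags are standard Ghostscript options, but pdf2html.py may use different ones.

```python
import os


def ghostscript_command(pdf_path, output_dir, resolution=150, env=None):
    """Build an argument list that renders every PDF page to a JPEG file.

    Hypothetical helper: the binary is taken from the GHOSTSCRIPT
    environment variable when set, otherwise plain "gs" is used.
    """
    env = os.environ if env is None else env
    gs = env.get("GHOSTSCRIPT", "gs")
    return [
        gs,
        "-dNOPAUSE",            # do not pause between pages
        "-dBATCH",              # exit when the input has been processed
        "-sDEVICE=jpeg",        # write JPEG images
        "-r%d" % resolution,    # rendering resolution in dpi (assumed value)
        # %d is expanded by Ghostscript to the page number
        "-sOutputFile=" + os.path.join(output_dir, "slide%d.jpg"),
        pdf_path,
    ]


if __name__ == "__main__":
    print(" ".join(ghostscript_command("test.pdf", "output")))
```

Passing the returned list to `subprocess.check_call` would run the conversion, provided the output folder already exists.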

Then upload the generated files to the server so that WordPress can serve them:

rsync -av pycon2014 yourserver.example.com:/srv/yoursite/wordpress/wp-content/uploads

Author

Mikko Ohtamaa (blog, Facebook, Twitter, Google+)