Ruby wrapper around the pdftohtml command line utility (around xpdf)

pdftohtmlr

Wrapper around the command-line tool pdftohtml, which converts PDF to HTML (go figure).

This gem was inspired by the MiniMagick gem – which does the same thing for ImageMagick (thanks Corey).

Requirements

Just pdftohtml and Ruby (1.8.6+ as far as I know).

On Mac:

brew install pdftohtml

On Ubuntu:

pdftohtml is part of the ‘poppler-utils’ package, which should be installed by default. If it is missing, install that package.
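
Since the gem only wraps the external tool, it can help to confirm that pdftohtml is actually on your PATH before using it. The following is a minimal sketch of such a check; `find_executable` is a hypothetical helper written for this example and is not part of pdftohtmlr.

```ruby
# Hypothetical helper (not part of pdftohtmlr): search the directories
# listed in PATH for an executable file with the given name.
def find_executable(name, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil # nothing matching was found in any PATH directory
end

location = find_executable("pdftohtml")
if location
  puts "pdftohtml found at #{location}"
else
  puts "pdftohtml not found; install it before using pdftohtmlr"
end
```

If the check reports that the tool is missing, install it with one of the commands above and run the check again.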

Install

http://gemcutter.org/gems/pdftohtmlr

gem install pdftohtmlr

Using

gist examples

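require 'pdftohtmlr'
require 'nokogiri'
include PDFToHTMLR

# Replace the placeholder with the path to your source PDF
file = PdfFilePath.new("path/to/source.pdf")
string = file.convert            # converted output as a string
doc = file.convert_to_document   # converted output as a parsed document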

See included test cases for more usage examples, including passwords and URL fetching.

License

MIT (See included MIT-LICENSE)