A Ruby wrapper around the command-line tool pdftohtml, which converts PDF to HTML, go figure.
This gem was inspired by the MiniMagick gem – which does the same thing for ImageMagick (thanks Corey).
Just pdftohtml and Ruby (1.8.6+ as far as I know).
With Homebrew: brew install pdftohtml
On Linux, pdftohtml ships in the ‘poppler-utils’ package, which should be installed by default.
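Since the gem depends on the pdftohtml binary being installed, it can help to confirm that the binary is on your PATH before using it. The following is a minimal sketch using only the Ruby standard library. The helper name find_executable is hypothetical and is not part of pdftohtmlr, and the lookup assumes a Unix-like system (no .exe handling).

```ruby
# Hypothetical helper (not part of pdftohtmlr): search the directories
# listed in a PATH-style string for an executable file with this name.
def find_executable(name, path_env = ENV["PATH"])
  path_env.to_s.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil # not found in any directory
end

if (path = find_executable("pdftohtml"))
  puts "pdftohtml found at #{path}"
else
  warn "pdftohtml not found on PATH; install it before using the gem"
end
```

Passing the search path as a second argument keeps the helper easy to exercise against a temporary directory.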
gem install pdftohtmlr
require 'pdftohtmlr'
require 'nokogiri'
include PDFToHTMLR
file = PdfFilePath.new([Path to Source PDF])
string = file.convert
doc = file.convert_to_document()
See included test cases for more usage examples, including passwords and URL fetching.
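To give a rough idea of what wrapping pdftohtml involves, including password-protected files, here is a sketch that shells out to the tool directly. It is an illustration only and not necessarily how pdftohtmlr is implemented. The method names pdftohtml_command and pdf_to_html are hypothetical. The flags -stdout, -noframes, -upw and -opw are options of poppler's pdftohtml. The keyword arguments require Ruby 2.0 or later.

```ruby
require "open3"

# Hypothetical helper: build the argument list for a pdftohtml call.
# -stdout writes the HTML to standard output, -noframes produces a single
# document, and -upw / -opw supply the user and owner passwords.
def pdftohtml_command(pdf_path, user_pwd: nil, owner_pwd: nil)
  cmd = ["pdftohtml", "-stdout", "-noframes"]
  cmd += ["-upw", user_pwd] if user_pwd
  cmd += ["-opw", owner_pwd] if owner_pwd
  cmd << pdf_path
  cmd
end

# Hypothetical helper: run pdftohtml and return the generated HTML string.
# Passing the command as an argument list avoids shell quoting problems
# with paths or passwords that contain spaces.
def pdf_to_html(pdf_path, **passwords)
  stdout, stderr, status = Open3.capture3(*pdftohtml_command(pdf_path, **passwords))
  raise "pdftohtml failed: #{stderr.strip}" unless status.success?
  stdout
end
```

With pdftohtml installed, a call such as pdf_to_html("protected.pdf", user_pwd: "secret") would return the HTML as a string, where the file name and password are placeholders.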
MIT (See included MIT-LICENSE)