Tool for extracting pages from PDFs as images and their text as strings.

Latest commit 4e82746992, authored by Jonathan Hoyt: "I think it is time for version 1.0.0. This gem has been converting decks for speakerdeck.com for a few years now, I think it deserves a 1.0.0 release."
lib: I think it is time for version 1.0.0 (March 11, 2014)
spec: Fix failing spec. (March 11, 2014)
.gitignore: Ignore .rvmrc (September 08, 2011)
Gemfile
LICENSE: Trying to adhere to TomDoc and also un-nested specs since we don't ha… (September 05, 2011)
README.textile: Add contributors section. (March 11, 2014)
Rakefile: Add Rake task to run the specs and default to that. (September 08, 2011)
grim.gemspec: Reimplemented being able to pass in convert/gs paths. (October 04, 2011)
README.textile
                    ,____
                    |---.\
            ___     |    `
           / .-\  ./=)
          |  |"|_/\/|
          ;  |-;| /_|
         / \_| |/ \ |
        /      \/\( |
        |   /  |` ) |
        /   \ _/    |
       /--._/  \    |
       `/|)    |    /
         /     |   |
       .'      |   |
      /         \  |
     (_.-.__.__./  /

Grim

Grim is a simple gem for extracting (reaping) a page from a PDF, converting it to an image, and extracting the page's text as a string. It gives you an easy-to-use API over Ghostscript, ImageMagick, and pdftotext, tailored to this use case.
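To illustrate what wrapping these tools involves, the sketch below builds the kind of command lines such a wrapper might run for a single page. It is a sketch under stated assumptions, not Grim's implementation: the helpers `image_command` and `text_command` are hypothetical, and the 150 dpi density is an arbitrary example value. ImageMagick's `file.pdf[n]` suffix selects a zero-based page (ImageMagick hands PDF rasterization to Ghostscript), while pdftotext's `-f`/`-l` options take one-based first and last page numbers, and a trailing `-` writes the text to standard output.

```ruby
require "shellwords"

# Hypothetical helper (not Grim's code): builds an ImageMagick command that
# renders one PDF page to an image. The [index] suffix is zero-based, and
# -density is a read setting, so it must come before the input file.
def image_command(pdf_path, index, output_path, density: 150)
  ["convert", "-density", density.to_s, "#{pdf_path}[#{index}]", output_path]
end

# Hypothetical helper (not Grim's code): builds a pdftotext command for the
# same page. pdftotext counts pages from 1, so the zero-based index is
# shifted by one; "-" sends the extracted text to standard output.
def text_command(pdf_path, index)
  page = (index + 1).to_s
  ["pdftotext", "-f", page, "-l", page, pdf_path, "-"]
end

# Print shell-escaped versions of both commands for page index 3.
puts image_command("/path/to/pdf", 3, "/path/to/image.png").shelljoin
puts text_command("/path/to/pdf", 3).shelljoin
```

The arrays could be passed to `Open3.capture2(*cmd)` (from Ruby's `open3` standard library) to run each tool without going through a shell; `shelljoin` is used here only to print a copy-pasteable command.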

Prerequisites

You will need Ghostscript, ImageMagick, and Poppler (which provides pdftotext) installed. On a Mac (OS X), I highly recommend installing them with Homebrew:


  brew install ghostscript imagemagick poppler
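Before calling Grim, it can help to confirm that the tools are actually installed. The sketch below is a hypothetical helper, not part of Grim, that searches `PATH` for the executables it assumes these packages provide: `gs` (Ghostscript), `convert` (ImageMagick 6; ImageMagick 7 installs `magick` and may or may not also provide `convert`), and `pdftotext` (Poppler). It assumes a Unix-like system and does not handle Windows `.exe` suffixes.

```ruby
# Hypothetical helper (not part of Grim): the executables this sketch
# assumes the three packages install.
REQUIRED_TOOLS = %w[gs convert pdftotext].freeze

# Returns the full path of an executable found on the given PATH string,
# or nil if no directory contains an executable file with that name.
# Assumes a Unix-like system (no .exe suffix handling).
def find_executable(name, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Returns the names of required tools that cannot be found on PATH.
def missing_tools(path = ENV.fetch("PATH", ""))
  REQUIRED_TOOLS.reject { |tool| find_executable(tool, path) }
end

missing = missing_tools
if missing.empty?
  puts "All prerequisites found."
else
  puts "Missing: #{missing.join(', ')}"
end
```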

Installation


  gem install grim

Usage


  pdf   = Grim.reap("/path/to/pdf")         # returns Grim::Pdf instance for pdf
  count = pdf.count                         # returns the number of pages in the pdf
  png   = pdf[3].save('/path/to/image.png') # will return true if page was saved or false if not
  text  = pdf[3].text                       # returns text as a String

  pdf.each do |page|
    puts page.text
  end

We also support using other processors. By default, Grim uses whichever versions of ImageMagick and Ghostscript are on your PATH.


  # specifying one processor with specific ImageMagick and GhostScript paths
  Grim.processor = Grim::ImageMagickProcessor.new({:imagemagick_path => "/path/to/convert", :ghostscript_path => "/path/to/gs"})

  # multiple processors with fallback if first fails, useful if you need multiple versions of convert/gs
  Grim.processor = Grim::MultiProcessor.new([
    Grim::ImageMagickProcessor.new({:imagemagick_path => "/path/to/6.7/convert", :ghostscript_path => "/path/to/9.04/gs"}),
    Grim::ImageMagickProcessor.new({:imagemagick_path => "/path/to/6.6/convert", :ghostscript_path => "/path/to/9.02/gs"})
  ])

  pdf = Grim.reap('/path/to/pdf')
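The "fallback if first fails" behaviour can be sketched in plain Ruby. This illustrates the pattern only and is not `Grim::MultiProcessor`'s source: the `FallbackProcessor` and `StubProcessor` classes and the `save(pdf, index, path)` signature are hypothetical. Each processor is tried in order, and the first one that reports success wins; later processors are never called.

```ruby
# Hypothetical stand-in for a processor wrapping one convert/gs pair.
# It always succeeds or always fails, and counts how often it was called.
class StubProcessor
  attr_reader :name, :calls

  def initialize(name, succeeds:)
    @name = name
    @succeeds = succeeds
    @calls = 0
  end

  # Returns true on success, false on failure, mirroring the boolean
  # that page.save returns in the usage example.
  def save(_pdf, _index, _path)
    @calls += 1
    @succeeds
  end
end

# Hypothetical fallback wrapper: tries each processor in order and stops
# at the first success. Exceptions count as failures, so the next
# processor still gets a chance.
class FallbackProcessor
  def initialize(processors)
    @processors = processors
  end

  def save(pdf, index, path)
    # any? short-circuits on the first truthy result.
    @processors.any? do |processor|
      begin
        processor.save(pdf, index, path)
      rescue StandardError
        false
      end
    end
  end
end

primary  = StubProcessor.new("6.7/convert", succeeds: false)
fallback = StubProcessor.new("6.6/convert", succeeds: true)
multi    = FallbackProcessor.new([primary, fallback])

puts multi.save("/path/to/pdf", 3, "/path/to/image.png") # prints true
```

Treating exceptions as failures is a design choice of this sketch; a real implementation might prefer to let unexpected errors propagate so they are not silently masked by the fallback.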

Reference

Contributors

License

See LICENSE for details.
