A convenient way of reading PDFs and images using Tesseract



Simple-OCR provides a more convenient way of reading PDFs and images using the Tesseract engine.

Installation Instructions

  1. Install Tesseract.
  2. Install ImageMagick.
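
Before using the library, it can help to confirm that both tools are reachable from Ruby. The following is a minimal sketch and is not part of Simple-OCR: `executable_on_path?` is a hypothetical helper, and the executable names `tesseract`, `magick`, and `convert` are assumptions about a typical installation (ImageMagick 7 ships `magick`, older releases ship `convert`; on Windows the files carry an `.exe` suffix that this sketch does not handle).

```ruby
# Hypothetical helper (not part of Simple-OCR): reports whether a command
# with the given name exists as an executable file in a PATH directory.
def executable_on_path?(name)
  ENV.fetch("PATH", "").split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end

# Assumed executable names; adjust them for your platform.
tesseract_found   = executable_on_path?("tesseract")
imagemagick_found = %w[magick convert].any? { |cmd| executable_on_path?(cmd) }

puts "Tesseract:   #{tesseract_found ? 'found' : 'missing'}"
puts "ImageMagick: #{imagemagick_found ? 'found' : 'missing'}"
```

If either line reports "missing", install the tool or add its directory to `PATH` before running the examples below.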

Example Usage

It's very simple to use Simple-OCR:

# Specify the path of your source image or PDF.
img = OCR::Image.new("source.png")

# Specify the output file name, called "destination" here.
img.scan("destination", "-l eng", :pdf)

You can also pass custom Tesseract command-line options in the second argument.

img.scan("destination", "-l eng -psm 1...", :pdf)

You can also specify the output file type in the third argument. It can be one of:

  • pdf
  • txt
  • hocr

# Write plain-text output.
img.scan("destination", "-l eng", :txt)

# Write hOCR output.
img.scan("destination", "-l eng", :hocr)
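
Because the output type is passed as a plain symbol, a small guard can catch a mistyped type before anything is scanned. The following is a sketch under stated assumptions: `scan_arguments` and `SUPPORTED_OUTPUT_TYPES` are hypothetical names that are not part of Simple-OCR, and the helper only assembles the three arguments in the order the `img.scan` calls above use them.

```ruby
# Hypothetical helper (not part of Simple-OCR): validates the output type
# and returns the three arguments in the order img.scan takes them above.
SUPPORTED_OUTPUT_TYPES = %i[pdf txt hocr].freeze

def scan_arguments(destination, options: "-l eng", type: :pdf)
  unless SUPPORTED_OUTPUT_TYPES.include?(type)
    raise ArgumentError, "unsupported output type: #{type.inspect}"
  end

  [destination, options, type]
end

args = scan_arguments("destination", type: :hocr)
p args

# With Simple-OCR loaded and img created as shown earlier:
#   img.scan(*args)
```

Keeping the validation separate from the scan call means the check can run without Tesseract installed, for example in a unit test.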



Simple-OCR is maintained and funded by Skcript. The names and logos for Skcript are the property of Skcript.

We love open source and have contributed quite a bit to the community. Take a look at our contributions here. We also encourage the people around us to get involved in community operations. Join us if you'd like to see the world change from our HQ.