This repository has been archived by the owner on Dec 16, 2023. It is now read-only.

A convenient way of reading PDFs and images using Tesseract

skcript/simple-ocr

Simple-OCR

Simple-OCR provides a more convenient way of reading PDFs and images using the Tesseract engine.

Installation Instructions

  1. Install Tesseract.
  2. Install ImageMagick.
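Before running a scan, it can help to confirm that both tools are reachable from Ruby. The following is a minimal sketch, not part of Simple-OCR: the helper names `which` and `missing_dependencies` are hypothetical, and the executable names (`tesseract`, plus `magick` or `convert` for ImageMagick) are assumptions about a typical install.

```ruby
# Hypothetical helper: return the full path of an executable on PATH, or nil.
def which(cmd)
  # On Windows, PATHEXT lists executable extensions; elsewhere use none.
  exts = ENV["PATHEXT"] ? ENV["PATHEXT"].split(";") : [""]
  ENV.fetch("PATH", "").split(File::PATH_SEPARATOR).each do |dir|
    exts.each do |ext|
      candidate = File.join(dir, "#{cmd}#{ext}")
      return candidate if File.executable?(candidate) && !File.directory?(candidate)
    end
  end
  nil
end

# Hypothetical helper: list the tools for which no candidate executable was found.
# The executable names below are assumptions, not taken from the README.
def missing_dependencies(checks = { "Tesseract" => %w[tesseract],
                                    "ImageMagick" => %w[magick convert] })
  checks.reject { |_name, cmds| cmds.any? { |cmd| which(cmd) } }.keys
end

missing = missing_dependencies
puts missing.empty? ? "All dependencies found." : "Missing: #{missing.join(', ')}"
```

If either tool is reported missing, install it with your platform's package manager and re-run the check.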

Example Usage

It's very simple to use Simple-OCR:

# Specify the path of your source image or PDF.
img = OCR::Image.new("source.png")

# Specify the output file name, called "destination" here.
img.scan("destination", "-l eng", :pdf)

You can also give custom command line options.

img.scan("destination", "-l eng -psm 1...", :pdf)
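If you build the options string in several places, a small helper keeps the flags consistent. This is a sketch under assumptions: `tesseract_options` is a hypothetical name, not part of Simple-OCR, and it only covers the language and page segmentation mode flags. Note that Tesseract 4 and later spell the second flag `--psm` rather than `-psm`, so the flag spelling is left configurable.

```ruby
# Hypothetical helper (not part of Simple-OCR): build a Tesseract options string.
# psm_flag defaults to "-psm" as written in this README; pass "--psm" for
# Tesseract 4 and later.
def tesseract_options(lang: "eng", psm: nil, psm_flag: "-psm")
  parts = ["-l", lang]
  parts += [psm_flag, psm.to_s] unless psm.nil?
  parts.join(" ")
end

puts tesseract_options                              # prints "-l eng"
puts tesseract_options(psm: 1)                      # prints "-l eng -psm 1"
puts tesseract_options(psm: 1, psm_flag: "--psm")   # prints "-l eng --psm 1"
```

The resulting string can be passed as the second argument to `img.scan`, in place of the literal strings shown above.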

You can also specify the output file type, which must be one of:

  • pdf
  • txt
  • hocr
img.scan("destination", "-l eng", :txt)
img.scan("destination", "-l eng", :hocr)
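To produce every output type in one pass, the calls above can be wrapped in a loop. The sketch below is not part of Simple-OCR: `scan_all_formats` and `SUPPORTED_FORMATS` are hypothetical names, and the helper only assumes that the image object responds to `scan(destination, options, format)` as in the examples above.

```ruby
# Output types listed in this README.
SUPPORTED_FORMATS = %i[pdf txt hocr].freeze

# Hypothetical helper (not part of Simple-OCR): run one scan per output type.
# `image` is any object that responds to #scan(destination, options, format),
# such as the OCR::Image instance created in the usage example.
def scan_all_formats(image, destination, options = "-l eng", formats = SUPPORTED_FORMATS)
  unsupported = formats - SUPPORTED_FORMATS
  unless unsupported.empty?
    raise ArgumentError, "unsupported format(s): #{unsupported.join(', ')}"
  end

  formats.each { |format| image.scan(destination, options, format) }
end

# Usage, once Simple-OCR is loaded:
#   img = OCR::Image.new("source.png")
#   scan_all_formats(img, "destination")
```

Rejecting unknown formats up front avoids starting a batch of scans that would fail partway through.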

About

Skcript

Simple-OCR is maintained and funded by Skcript. The names and logos for Skcript are properties of Skcript.

We love open source, and we have contributed quite a bit to the community. Take a look at our contributions here. We also encourage the people around us to get involved in community operations. Join us if you'd like to see the world change from our HQ.
