readbot

A delightful Python OCR module

This is a simple module for adding quick and easy OCR processing to your Python program. In addition to wrapping an OCR engine (currently only Tesseract), it handles file input quite liberally. Valid input may be a string containing a file's path, a file object, or even a URL to a file on the web. As an added bonus, if you need to perform OCR on a PDF, this module will use Ghostscript to convert the file to a PNG for Tesseract.
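To illustrate what "liberal" input handling can look like, here is a minimal sketch that turns any of the three input kinds into a local file path. It is not readbot's actual implementation: `resolve_input` and `needs_pdf_conversion` are hypothetical names, and the sketch assumes file objects are opened in binary mode.

```python
import os
import shutil
import tempfile
from urllib.parse import urlparse
from urllib.request import urlopen


def resolve_input(source):
    """Return a local file path for a path string, a URL, or a file object.

    Hypothetical helper for illustration; not part of readbot's API.
    """
    if isinstance(source, str):
        parsed = urlparse(source)
        if parsed.scheme in ("http", "https"):
            # Remote file: download it into a temporary file,
            # keeping the extension so later steps can detect PDFs.
            suffix = os.path.splitext(parsed.path)[1]
            with urlopen(source) as response, tempfile.NamedTemporaryFile(
                suffix=suffix, delete=False
            ) as tmp:
                shutil.copyfileobj(response, tmp)
            return tmp.name
        # Any other string is treated as a path on disk.
        if not os.path.isfile(source):
            raise FileNotFoundError(source)
        return source
    if hasattr(source, "read"):
        # File object (assumed binary mode): copy its bytes to a temporary file.
        name = getattr(source, "name", "")
        suffix = os.path.splitext(name)[1] if isinstance(name, str) else ""
        with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
            shutil.copyfileobj(source, tmp)
        return tmp.name
    raise TypeError("expected a file path, a URL, or a file object")


def needs_pdf_conversion(path):
    """Decide by extension whether a file must be converted before OCR."""
    return path.lower().endswith(".pdf")
```

Reducing every input to a path on disk keeps the OCR step simple, since a command-line engine only needs a file name. The caller is responsible for deleting any temporary files this sketch creates.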

This module is intended for very basic OCR use and is in no way comprehensive. Right now, it is fewer than 100 lines of code and simply calls Tesseract as a subprocess.
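For readers curious what calling Tesseract as a subprocess involves, the sketch below shows one plausible shape. It is an assumption-laden illustration rather than readbot's code: the function names are hypothetical, the Ghostscript command assumes a single-page PDF, and `ocr_image` requires the `tesseract` binary to be on your `PATH`.

```python
import os
import subprocess
import tempfile


def tesseract_command(image_path, output_base):
    # Tesseract's CLI takes an input image and an output base name,
    # then writes the recognized text to "<output_base>.txt".
    return ["tesseract", image_path, output_base]


def ghostscript_command(pdf_path, png_path, dpi=300):
    # Quiet, non-interactive Ghostscript render of a PDF to a 24-bit PNG.
    # With a fixed output name, this is only suitable for single-page PDFs.
    return [
        "gs", "-q", "-dNOPAUSE", "-dBATCH",
        "-sDEVICE=png16m", "-r%d" % dpi,
        "-sOutputFile=" + png_path, pdf_path,
    ]


def ocr_image(image_path):
    """Run Tesseract on an image file and return the recognized text."""
    with tempfile.TemporaryDirectory() as workdir:
        output_base = os.path.join(workdir, "out")
        subprocess.run(
            tesseract_command(image_path, output_base),
            check=True,  # raise CalledProcessError if Tesseract fails
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        with open(output_base + ".txt", encoding="utf-8") as handle:
            return handle.read()
```

Building the argument lists in separate functions keeps the commands easy to inspect without having either tool installed.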

Installation

It's on PyPI!

$ pip install readbot

Tesseract

You need Tesseract to use this module.

(mac)

$ brew install tesseract

(linux)

$ sudo apt-get install tesseract-ocr

Usage

from readbot import ReadBot

rb = ReadBot()

print(rb.interpret('/path/to/file/Hello_World.png'))