ocR

R bindings to Tesseract. Tesseract is probably the most accurate open source OCR engine available. Combined with the Leptonica Image Processing Library it can read a wide variety of image formats and convert them to text in over 60 languages. It was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but since then it has been improved extensively by Google. It is released under the Apache License 2.0.

Introduction

ocR is an open source packages to interact with the OCR Tesseract engine.

Install ocR

Installing directly from Github requires some helper packages. The easiest way to setup ocR is to source the following «init.R» script and then install and run the package with the packagesGithub function. Furthermore, it also utilizes some functions from the «systemR» package. The package is not on CRAN and has to be installed directly from Bioconductor. Running the following lines of code installs and loads everything for captchaSolveR to work:

Windows:

source("https://rawgit.com/greenore/initR/master/init.R")
packagesGithub(c("systemR", "ocR"), repo_name="greenore")

Linux:

source(pipe(paste("wget -O -", "https://rawgit.com/greenore/initR/master/init.R")))
packagesGithub(c("systemR", "ocR"), repo_name="greenore")

Install Tesseract

Linux:

Tesseract is available directly from many Linux distributions. The package is generally called 'tesseract' or 'tesseract-ocr' - search your distribution's repositories to find it. Packages are also generally available for language training data (search the repositories,) but if not you will need to download the appropriate training data, unpack it, and copy the .traineddata file into the 'tessdata' directory, probably /usr/share/tesseract-ocr/tessdata or /usr/share/tessdata.

# On Ubuntu the following command will install tesseract
sudo apt-get update
sudo apt-get install tesseract-ocr

Mac OS X

The easiest way to install Tesseract is with MacPorts. Once it is installed, you can install Tesseract by running the command sudo port install tesseract, and any language with sudo port install tesseract-. List of available langcodes can be found on MacPorts tesseract page.

Windows

An installer is available on the Tesseract-OCR project page. This includes the English training data.

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
R		R
man		man
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
NAMESPACE		NAMESPACE
README.md		README.md
ocR.Rproj		ocR.Rproj

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ocR

Introduction

Install ocR

Windows:

Linux:

Install Tesseract

Linux:

Mac OS X

Windows

About

Releases

Packages

Languages

greenore/ocR

Folders and files

Latest commit

History

Repository files navigation

ocR

Introduction

Install ocR

Windows:

Linux:

Install Tesseract

Linux:

Mac OS X

Windows

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages