sequence-sh/TesseractConnector

Sequence Tesseract OCR Connector

Sequence® is a collection of libraries for automation of cross-application e-discovery and forensic workflows.

This connector contains steps to perform optical character recognition (OCR) on image files. It uses the Tesseract open source library as the OCR engine.

Prerequisites

The following needs to be installed:

Examples

OCR a bitmap image

- <path> = 'MyImage.bmp'
- <imageData> = FileRead <path>
- <imageFormat> = GetImageFormat <path>
- <imageText> = TesseractOCR <imageData> <imageFormat>
- Print <imageText>
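
The example above determines the image format from the file path before passing it to TesseractOCR. As a rough illustration of that step, here is a minimal Python sketch that assumes the format is inferred from the file extension. This is an assumption, not the connector's documented behaviour: the function name `get_image_format`, the format names, and the extension table are all hypothetical.

```python
from pathlib import Path

# Hypothetical extension table; the formats the connector actually
# supports, and the names it gives them, may differ.
_FORMATS_BY_EXTENSION = {
    ".bmp": "Bmp",
    ".png": "Png",
    ".jpg": "Jpeg",
    ".jpeg": "Jpeg",
    ".tif": "Tiff",
    ".tiff": "Tiff",
    ".gif": "Gif",
}


def get_image_format(path: str) -> str:
    """Infer an image format name from a file extension (case-insensitive)."""
    extension = Path(path).suffix.lower()
    try:
        return _FORMATS_BY_EXTENSION[extension]
    except KeyError:
        # Fail early rather than hand an unknown format to the OCR step.
        raise ValueError(f"Unrecognised image extension: {extension!r}") from None


if __name__ == "__main__":
    print(get_image_format("MyImage.bmp"))
```

Resolving the format before reading the image means an unsupported file is rejected before any OCR work begins.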

Documentation

https://sequence.sh

Download

https://sequence.sh/download

Try SCL and Core

https://sequence.sh/playground

Package Releases

Packages can be downloaded from the Releases page.

NuGet Packages

Release NuGet packages are available from nuget.org.