shelfio/aws-lambda-tesseract

aws-lambda-tesseract

6 MB Tesseract 5.1 (with English training data) to fit inside AWS Lambda

Inspired by chrome-aws-lambda & lambda-scanner-ocr

Install

$ yarn add @shelf/aws-lambda-tesseract

1.x versions of this library were compiled for Node 8.10.

2.x was compiled for the Node 10.x runtime.

3.x works with the Node 12.x runtime.

4.x works with the Node 16.x runtime and is compiled with Tesseract 5.1.0. For now, it supports x86_64 CPUs only.

How does it work?

This package contains an archive with Tesseract 5.1 compiled for use in the AWS Lambda environment.

When a Lambda starts, the package unpacks the archive containing the binary into the /tmp folder, and it makes sure this happens only once per Lambda cold start.
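The "only once per cold start" behavior can be sketched as follows. This is a minimal illustration, not the package's actual code: `ensureUnpacked` and the `unpack` callback are hypothetical names, and the sketch assumes that module-level state persists between invocations of a warm Lambda container.

```javascript
const fs = require('fs');

// Module-level state survives across invocations in a warm container,
// so this promise is created at most once per cold start.
let unpackPromise = null;

// Hypothetical helper (not part of @shelf/aws-lambda-tesseract).
// `unpack` is any function that extracts the archive into `targetDir`
// and may return a promise.
function ensureUnpacked(unpack, targetDir) {
  if (!unpackPromise) {
    unpackPromise = (async () => {
      // Skip extraction if the directory is already there.
      if (!fs.existsSync(targetDir)) {
        await unpack(targetDir);
      }
      return targetDir;
    })().catch(err => {
      // Forget a failed attempt so the next invocation can retry.
      unpackPromise = null;
      throw err;
    });
  }
  return unpackPromise;
}
```

Caching the promise rather than a boolean flag means concurrent callers within one invocation wait on the same extraction instead of starting a second one.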

Usage

const {getTextFromImage, isSupportedFile} = require('@shelf/aws-lambda-tesseract');

module.exports.handler = async event => {
  // assuming there is a photo.jpg inside /tmp dir
  // original file will be deleted afterwards

  if (!isSupportedFile('/tmp/photo.jpg')) {
    return false;
  }

  return getTextFromImage('/tmp/photo.jpg');
};

isSupportedFile checks that the file has an image-like extension and that the extension is not on the list of file extensions Tesseract does not support.
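A check of this kind could look roughly like the sketch below. The function name `looksSupported` and both extension lists are assumptions made for illustration; the lists the package actually uses may differ.

```javascript
const path = require('path');

// Assumed lists, for illustration only.
const IMAGE_EXTENSIONS = new Set([
  '.jpg', '.jpeg', '.png', '.tif', '.tiff', '.bmp', '.gif', '.svg', '.psd'
]);
const UNSUPPORTED_BY_TESSERACT = new Set(['.svg', '.psd']);

// Hypothetical stand-in for isSupportedFile: the extension must look like
// an image and must not be on the unsupported list.
function looksSupported(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  return IMAGE_EXTENSIONS.has(ext) && !UNSUPPORTED_BY_TESSERACT.has(ext);
}
```

Note that a check like this looks only at the file name, so a mislabeled file would still pass it.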

Compile It Yourself

See compile-tesseract.sh

Smoke-test the build by running the test.sh script.

See Also

Publish

$ git checkout master
$ yarn version
$ yarn publish
$ git push origin master --tags

License

MIT © Shelf