
Automated solution for parsing PDF files using Amazon Textract. Complete solution with CloudFormation template, Step Function State Machine, Lambda functions, etc.


netesenz/amazon-textract-cloudformation

 
 


Textract with Step Functions and CloudFormation

This is a complete setup for automatic text extraction from PDF / JPEG / PNG files using Amazon Textract.

Deployment

Check out this repository and run the included deploy.sh script.

It will create a new S3 bucket and then use the CloudFormation template to build the required resources.

$ ./deploy.sh
[*] Verifying deployment settings...
[x] Stack name: textract-demo
[x] Region: us-west-2
[x] Account ID: 123456789012
[x] Deployment bucket: textract-demo-123456789012-us-west-2

Press [Enter] to continue or Ctrl-C to abort.
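
The deployment bucket name in the sample output above appears to combine the stack name, the account ID and the region. A minimal Python sketch of that pattern follows, assuming this is how deploy.sh composes the name. The script itself is not shown here, and `deployment_bucket_name` is a hypothetical helper used only for illustration.

```python
def deployment_bucket_name(stack_name: str, account_id: str, region: str) -> str:
    """Compose a bucket name as <stack>-<account>-<region>.

    Assumption: the pattern is inferred from the sample deploy.sh output,
    not taken from the script itself.
    """
    return f"{stack_name}-{account_id}-{region}"


# Values copied from the sample output above.
print(deployment_bucket_name("textract-demo", "123456789012", "us-west-2"))
```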

When the deployment has finished, follow these steps to test that it works:

  1. Upload your test PDF to the /upload folder in the newly created S3 bucket.

  2. Open the Step Functions page in the AWS console to follow the progress.

  3. When the execution has finished, download the results from the /output folder in the bucket.
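
The upload and download steps above can also be scripted. The following is a rough sketch using boto3, assuming the /upload and /output folders correspond to the key prefixes `upload/` and `output/`. The helper names `upload_key` and `upload_and_list` are hypothetical and not part of this repository.

```python
import os


def upload_key(local_path: str, prefix: str = "upload/") -> str:
    """Build the S3 key for a local file under the (assumed) upload prefix."""
    return prefix + os.path.basename(local_path)


def upload_and_list(bucket: str, local_path: str) -> list:
    """Upload a file to upload/ and return the keys currently under output/.

    Assumption: results are written under the output/ prefix of the same
    bucket; the exact key layout is not documented here.
    """
    import boto3  # imported lazily so upload_key() works without boto3

    s3 = boto3.client("s3")
    s3.upload_file(local_path, bucket, upload_key(local_path))
    response = s3.list_objects_v2(Bucket=bucket, Prefix="output/")
    # "Contents" is absent from the response when no keys match the prefix.
    return [obj["Key"] for obj in response.get("Contents", [])]
```

Calling `upload_and_list("textract-demo-123456789012-us-west-2", "test.pdf")` would upload the file to the bucket named in the sample output. The returned list will probably be empty right after the upload, because results only appear once the Step Functions execution has finished, so the listing may need to be repeated later.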

Author

Michael Ludvig
