joshmenden/bartleby

bartleby

A simple AWS Lambda function that extracts Key-Value pairs from an image using AWS Textract for people who would "prefer not to."

Setup

This Lambda function takes an S3 object reference (a bucket and an object name) and returns all of the key-value pairs that AWS Textract can find in that object.
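For readers curious about what "finding key-value pairs" involves: a Textract AnalyzeDocument response is a flat list of Blocks, where a KEY_VALUE_SET block tagged KEY points to its VALUE block and to its child WORD blocks through Relationships. The sketch below shows one way to walk that structure. It is not necessarily how bartleby does it; `extractPairs` and the miniature `sampleResponse` are hypothetical and exist only for illustration.

```javascript
// Sketch: pair up KEY and VALUE blocks from a Textract AnalyzeDocument response.
// `extractPairs` is a hypothetical helper, not taken from bartleby's source.
function extractPairs(response) {
  // Index every block by Id so relationships can be resolved quickly.
  const byId = new Map(response.Blocks.map((b) => [b.Id, b]));

  // Collect the Ids listed under one relationship type ("CHILD" or "VALUE").
  const related = (block, type) =>
    (block.Relationships || [])
      .filter((r) => r.Type === type)
      .flatMap((r) => r.Ids);

  // Turn a KEY or VALUE block into its words, each with a confidence score.
  const words = (block) =>
    related(block, "CHILD")
      .map((id) => byId.get(id))
      .filter((b) => b && b.BlockType === "WORD")
      .map((b) => ({ text: b.Text, confidence: b.Confidence }));

  return response.Blocks
    .filter(
      (b) =>
        b.BlockType === "KEY_VALUE_SET" &&
        (b.EntityTypes || []).includes("KEY")
    )
    .map((keyBlock) => ({
      key: words(keyBlock),
      value: related(keyBlock, "VALUE")
        .map((id) => byId.get(id))
        .filter(Boolean)
        .flatMap(words),
    }));
}

// Made-up miniature response; real responses contain many more fields.
const sampleResponse = {
  Blocks: [
    {
      Id: "k1",
      BlockType: "KEY_VALUE_SET",
      EntityTypes: ["KEY"],
      Relationships: [
        { Type: "VALUE", Ids: ["v1"] },
        { Type: "CHILD", Ids: ["w1"] },
      ],
    },
    {
      Id: "v1",
      BlockType: "KEY_VALUE_SET",
      EntityTypes: ["VALUE"],
      Relationships: [{ Type: "CHILD", Ids: ["w2", "w3"] }],
    },
    { Id: "w1", BlockType: "WORD", Text: "Occupation:", Confidence: 96.9 },
    { Id: "w2", BlockType: "WORD", Text: "the", Confidence: 99.9 },
    { Id: "w3", BlockType: "WORD", Text: "Scrivener", Confidence: 99.8 },
  ],
};

console.log(JSON.stringify(extractPairs(sampleResponse), null, 2));
```

The result has the same per-word shape as the uncondensed output described under Usage.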

None of this works without the AWS CLI installed and configured. Follow the AWS CLI installation instructions to set that up.

To start, run

npm install # install dependencies for the publish and update helpers
touch .env # create a file for environment variables

In your .env file, you need 2 keys: FUNCTION_NAME and ROLE_ARN. FUNCTION_NAME is what you'd like to name your deployment of this Lambda function. The ROLE_ARN is the AWS role that you want your deployment of this Lambda function to assume. In order for this to work, your role needs both S3 and Textract access.

So, an example .env file:

FUNCTION_NAME=bartlebyExtract
ROLE_ARN=arn:aws:iam::999999999999:role/foo_bar_baz
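A missing key is an easy mistake to make here, so a quick check before deploying can save a confusing error. The sketch below parses the text of a .env file and reports which of the two required keys are absent or empty. `missingEnvKeys` is a hypothetical helper, not something shipped with bartleby, and its parsing is deliberately simple (no quoting or multi-line values).

```javascript
// Sketch: report which required keys a .env file is missing.
// `missingEnvKeys` is a hypothetical helper for illustration only.
const REQUIRED_KEYS = ["FUNCTION_NAME", "ROLE_ARN"];

function missingEnvKeys(envText) {
  const defined = new Set();
  for (const line of envText.split(/\r?\n/)) {
    const trimmed = line.trim();
    // Skip blank lines and comment lines.
    if (trimmed === "" || trimmed.startsWith("#")) continue;
    const eq = trimmed.indexOf("=");
    // Only count keys that have a non-empty value after the "=".
    if (eq > 0 && trimmed.slice(eq + 1).trim() !== "") {
      defined.add(trimmed.slice(0, eq).trim());
    }
  }
  return REQUIRED_KEYS.filter((key) => !defined.has(key));
}

// Example: a .env file that forgot ROLE_ARN.
console.log(missingEnvKeys("FUNCTION_NAME=bartlebyExtract\n"));
```

An empty array means both keys are present.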

With all of this in place, you're ready to deploy your Lambda function!

Deployment

There are 2 basic scripts to help with development:

npm run publish_lambda publishes the function, and npm run update_lambda updates the function after you make changes to the code.
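The scripts themselves are not reproduced in this README. One common way to implement helpers like these is to wrap two AWS CLI commands, `aws lambda create-function` and `aws lambda update-function-code`. The sketch below only builds the argument lists for those commands. The zip path, handler, and runtime are placeholder assumptions rather than values taken from bartleby, and `publishArgs` and `updateArgs` are hypothetical names.

```javascript
// Sketch: argument lists for the two AWS CLI commands a publish/update
// helper might run. All option values below are illustrative assumptions.
function publishArgs({ functionName, roleArn, zipPath, handler, runtime }) {
  return [
    "lambda", "create-function",
    "--function-name", functionName,
    "--runtime", runtime,
    "--role", roleArn,
    "--handler", handler,
    "--zip-file", `fileb://${zipPath}`,
  ];
}

function updateArgs({ functionName, zipPath }) {
  return [
    "lambda", "update-function-code",
    "--function-name", functionName,
    "--zip-file", `fileb://${zipPath}`,
  ];
}

// Placeholder values; in practice the name and role would come from .env.
const exampleSettings = {
  functionName: "bartlebyExtract",
  roleArn: "arn:aws:iam::999999999999:role/foo_bar_baz",
  zipPath: "function.zip",   // assumed archive name
  handler: "index.handler",  // assumed handler
  runtime: "nodejs18.x",     // assumed runtime
};

console.log("aws " + publishArgs(exampleSettings).join(" "));
console.log("aws " + updateArgs(exampleSettings).join(" "));
```

Keeping the arguments as an array means they can be passed to a process spawner without shell quoting issues.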

Usage

The Lambda event accepts 3 arguments: 2 are required and 1 is optional.

{
  "Bucket": "bartleby", // bucket name, required
  "Name": "fake-w2-forms.jpg", // s3 object name, required
  "condense": false // explained below, optional
}
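To make the required and optional fields concrete, here is a small sketch of a check a caller could run before invoking the function. `validateEvent` is a hypothetical helper, and the rule that condense must be a boolean is an assumption based on the example above, not documented behavior.

```javascript
// Sketch: validate an event object against the shape described above.
// `validateEvent` is a hypothetical helper, not part of bartleby.
function validateEvent(event) {
  if (event === null || typeof event !== "object") {
    return ["event must be an object"];
  }
  const errors = [];
  // Bucket and Name are the two required arguments.
  for (const field of ["Bucket", "Name"]) {
    if (typeof event[field] !== "string" || event[field] === "") {
      errors.push(`${field} is required and must be a non-empty string`);
    }
  }
  // condense is optional; assumed to be a boolean when present.
  if ("condense" in event && typeof event.condense !== "boolean") {
    errors.push("condense, when present, must be a boolean");
  }
  return errors;
}

console.log(validateEvent({ Bucket: "bartleby", Name: "fake-w2-forms.jpg" }));
console.log(validateEvent({ Bucket: "bartleby", condense: "yes" }));
```

An empty array means the event has the expected shape.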

If condense is true, you can expect a return value that looks like this:

[
  {
    "key": "Name:",
    "value": "Bartleby"
  },
  {
    "key": "Occupation:",
    "value": "the Scrivener"
  },
  ...
]

Otherwise, if condense is false or not included, each word of both the key and the value is returned as its own object, which includes AWS's confidence score for that individual word.

[
  {
    "key": [
      {
        "text": "Name:",
        "confidence": 99.93144989013672
      }
    ],
    "value": [
      {
        "text": "Bartleby",
        "confidence": 97.45654345654
      }
    ]
  },
  {
    "key": [
      {
        "text": "Occupation:",
        "confidence": 96.93144989013672
      }
    ],
    "value": [
      {
        "text": "the",
        "confidence": 99.9874289879
      },
      {
        "text": "Scrivener",
        "confidence": 99.8758549483
      }
    ]
  }
]
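The two formats carry the same text, so the detailed form can be reduced to the condensed one on the client side. The sketch below does that by joining each word's text with a single space, which is an assumption about how condense behaves rather than something this README confirms. `condensePairs` and `lowestConfidence` are hypothetical helpers.

```javascript
// Sketch: convert the detailed (per-word) output into the condensed form.
// Joining words with one space is an assumption, not confirmed behavior.
function condensePairs(pairs) {
  const joinWords = (words) => words.map((w) => w.text).join(" ");
  return pairs.map(({ key, value }) => ({
    key: joinWords(key),
    value: joinWords(value),
  }));
}

// Lowest word-level confidence in a pair, useful for flagging shaky results.
// Returns Infinity for a pair with no words at all.
function lowestConfidence(pair) {
  return Math.min(...[...pair.key, ...pair.value].map((w) => w.confidence));
}

const detailed = [
  {
    key: [{ text: "Occupation:", confidence: 96.93144989013672 }],
    value: [
      { text: "the", confidence: 99.9874289879 },
      { text: "Scrivener", confidence: 99.8758549483 },
    ],
  },
];

console.log(condensePairs(detailed));
console.log(lowestConfidence(detailed[0]));
```

Requesting the detailed form and condensing locally keeps the confidence scores available for filtering, at the cost of a larger response.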

Eventually...

Would love to include a "Tables" event param that could also return table data in CSV form if possible.
