Serenata OCR

A Serverless API for OCRing Serenata de Amor's documents (currently limited to Chamber of Deputies receipts). Powered by Claudia.JS and Google Cloud Vision.

From zero to an OCR API in minutes


Initial setup

In terms of tooling, this is what you'll need (a Docker environment is in the works):

  • git clone
  • cp config.json{.example,}
  • Node.js 6.10 (⚠️ this is important: it is the version AWS Lambda executes).
  • yarn install or npm install
  • Claudia.JS CLI (npm install -g claudia)
  • AWS credentials configured for claudia as outlined in this tutorial

For OCRing with Google Cloud Vision you'll need:


As mentioned above, make sure your AWS credentials are configured as outlined in this tutorial. Once that is done, proceed to your first deployment of the API:

claudia create --region us-east-1 \
               --api-module app \
               --timeout 60 \
               --memory 512 \
               --set-env-from-json config.json

At the end of claudia create you'll get a URL. To test it, run:

# One-liner, if you have `jq` installed
API="https://$(jq -r '' claudia.json)"

# OCR a receipt and get the full text of the PDF
curl "${API}/1789/2015/5631380" > 5631380.json

# Play with the data
jq '.config + .extra' 5631380.json
jq '.ocrResponse.fullTextAnnotation.text' 5631380.json


🚧 Proper documentation is in the works 🚧

  • At a high level, this is what happens under the hood:
    • The receipt PDF associated with the reimbursement is downloaded from the Chamber of Deputies website.
    • ImageMagick converts the PDF into a single PNG image (each page is rasterized at the given density and deskewed, then -append stacks the pages vertically): convert -density <density> receipt.pdf -quality 100 -deskew 40% -append receipt.png
    • The PNG is uploaded to Google Cloud Vision and the results are sent back to the client.
  • For custom parameters supported by the API, see app.js for now.
  • For local execution, see local.js for now (run with node local.js).
  • Example responses are in examples/
  • Some useful utilities are in the Deskfile
  • More info? For now, please read the code; it is super small.
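The conversion and OCR-request steps described above can be sketched in Node as follows. This only reproduces the convert invocation quoted in the list and the public request shape of Google Cloud Vision's images:annotate REST method; it is not necessarily how app.js does it. The helper names (buildConvertArgs, pdfToPng, buildVisionRequest) are hypothetical, the density is left to the caller because the default is not stated here, and defaulting to the DOCUMENT_TEXT_DETECTION feature is an assumption. It sticks to callbacks, since async/await is not available in Node 6.10.

```javascript
'use strict';
const execFile = require('child_process').execFile;

// Hypothetical helper: argument list for the ImageMagick command quoted above,
//   convert -density <density> receipt.pdf -quality 100 -deskew 40% -append receipt.png
// -density comes before the PDF so each page is rasterized at that resolution,
// -deskew 40% straightens each page and -append stacks the pages top to bottom.
function buildConvertArgs(density, pdfPath, pngPath) {
  if (!Number.isFinite(density) || density <= 0) {
    throw new RangeError('density must be a positive number, got ' + density);
  }
  return [
    '-density', String(density), pdfPath,
    '-quality', '100',
    '-deskew', '40%',
    '-append', pngPath,
  ];
}

// Hypothetical helper: runs ImageMagick's `convert` binary directly (execFile
// does not go through a shell), then calls back with the PNG path or the error.
function pdfToPng(density, pdfPath, pngPath, callback) {
  execFile('convert', buildConvertArgs(density, pdfPath, pngPath), (err) => {
    if (err) return callback(err);
    callback(null, pngPath);
  });
}

// Hypothetical helper: JSON body for Google Cloud Vision's REST method
//   POST https://vision.googleapis.com/v1/images:annotate
// The image bytes are sent inline as base64. DOCUMENT_TEXT_DETECTION as the
// default feature is an assumption; it is the feature aimed at dense text.
function buildVisionRequest(png, featureType) {
  return {
    requests: [{
      image: { content: png.toString('base64') },
      features: [{ type: featureType || 'DOCUMENT_TEXT_DETECTION' }],
    }],
  };
}
```

For example, pdfToPng(300, '/tmp/receipt.pdf', '/tmp/receipt.png', cb) would rasterize at 300 dpi (an illustrative value only); /tmp is used because it is the writable scratch directory in AWS Lambda. The PNG's bytes would then go into buildVisionRequest, and Vision's per-image reply carries the fullTextAnnotation that the example responses presumably expose as ocrResponse.fullTextAnnotation.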

Wanna help?

See the issue tracker for inspiration.


Feel free to create an issue.

Function times out

The document may be too big for your function to handle, so give it more 💪 (time and memory):

claudia update --timeout 90 --memory 1024


