DynamoDB to ElasticSearch

This package allows you to easily ZIP a Lambda function and start processing DynamoDB Streams in order to index your DynamoDB objects in ElasticSearch.

It processes the following events:

INSERT
REMOVE
MODIFY

We force an index refresh after each event, so indexing is close to real time.
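The three event types can be handled along these lines. This is only a minimal sketch, not the package's actual handler: `apply_record` is a hypothetical name, a plain dict stands in for the ElasticSearch index, the document ID scheme is made up for illustration, and it assumes the stream is configured to include the new image of each item.

```python
def apply_record(record, index):
    """Apply one DynamoDB Streams record to `index`, a dict standing in for ES."""
    event = record["eventName"]
    data = record["dynamodb"]
    # Hypothetical ID scheme: join the key values in the order they arrive.
    doc_id = "|".join(str(next(iter(v.values()))) for v in data["Keys"].values())
    if event in ("INSERT", "MODIFY"):
        # Both events carry the full new item, so both (re)index the document.
        index[doc_id] = data["NewImage"]
    elif event == "REMOVE":
        # Deleting a document that is already gone is not treated as an error.
        index.pop(doc_id, None)
    else:
        raise ValueError(f"unexpected eventName: {event}")
    return doc_id
```

With a real cluster, the two branches would become index and delete requests followed by the refresh mentioned above.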

The DynamoDB JSON objects are unmarshaled and their types are correctly converted. (Binary types have never been tested, though.)

Lists of numbers and strings are converted to lists of strings, because ES doesn't allow arrays of mixed data types.
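To illustrate the conversion, here is a small pure-Python sketch. It is not the package's code: `unmarshal` is a hypothetical helper, it covers only the common type tags, and it rejects binary types since those are untested anyway.

```python
from decimal import Decimal


def unmarshal(attr):
    """Convert one DynamoDB-typed attribute, e.g. {"N": "3"}, to a plain value."""
    ((tag, value),) = attr.items()
    if tag == "S":
        return value
    if tag == "N":
        number = Decimal(value)
        # Whole numbers become int, everything else becomes float.
        return int(number) if number == number.to_integral_value() else float(number)
    if tag == "BOOL":
        return value
    if tag == "NULL":
        return None
    if tag == "M":
        return {k: unmarshal(v) for k, v in value.items()}
    if tag == "L":
        items = [unmarshal(v) for v in value]
        scalars = all(
            isinstance(i, (str, int, float)) and not isinstance(i, bool) for i in items
        )
        # Lists of numbers and strings become lists of strings for ES.
        return [str(i) for i in items] if scalars else items
    if tag in ("SS", "NS"):
        return [str(v) for v in value]
    raise TypeError(f"unsupported DynamoDB type tag: {tag}")
```

For example, `unmarshal({"L": [{"N": "1"}, {"S": "a"}]})` gives a list containing only strings.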

Blog reference explaining what we're doing here: https://aws.amazon.com/blogs/compute/indexing-amazon-dynamodb-content-with-amazon-elasticsearch-service-using-aws-lambda/

Unfortunately, AWS removed the Lambda Blueprint that did what we are doing here, so we rebuilt it ourselves.

Get started

The deployment process is done through the Makefile.

You need to declare the following environment variables to get started:

AWS_BUCKET_CODE: Where your Lambda function will go in AWS S3
IAM_ROLE: The IAM role you created for the Lambda function to run with
ENV: The environment you're running: DEV, QA, PROD, STAGE, ... This is used to pull the correct config file from S3
PROFILE (optional): The AWS profile to use to deploy. Defaults to the default profile.

Obviously, your AWS environment needs to be set up correctly.
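If you want to check the variables before running make, a quick pre-flight check could look like the sketch below. `missing_variables` is a hypothetical helper and is not part of the Makefile; PROFILE is left out because it is optional.

```python
import os

# The three variables listed above as required.
REQUIRED = ("AWS_BUCKET_CODE", "IAM_ROLE", "ENV")


def missing_variables(environ=os.environ):
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_variables()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
```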

Create a config file

Create a simple file named ${ENV}_es_creds and put the following in it:

ES_ENDPOINT='https://search-esclust-esclus-xxxxx-xxxxxx.{region}.es.amazonaws.com'

This file just contains the endpoint to your cluster. You should have one file per environment.

Upload it to your bucket where the Lambda function will also be uploaded.

Before zipping the function, we download that file locally and inject it into the Lambda function as the file lib/env.py. The function depends on this file and imports it.

That allows you to NOT hardcode your endpoint. This way you can deploy several functions for different clusters and environments without touching the code.
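This works because the config line is itself a valid Python assignment, so the file can be imported as a module. The sketch below shows the idea on a temporary stand-in for lib/env.py. The `load_env_module` helper and the placeholder endpoint are assumptions for illustration; the deployed function presumably uses a plain import instead.

```python
import importlib.util
import os
import tempfile


def load_env_module(path):
    """Import the file at `path` as a module named `env` and return it."""
    spec = importlib.util.spec_from_file_location("env", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Stand-in for lib/env.py; the endpoint is a placeholder, not a real cluster.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "env.py")
    with open(path, "w") as fh:
        fh.write("ES_ENDPOINT='https://search-example.invalid'\n")
    env = load_env_module(path)

print(env.ES_ENDPOINT)
```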

Just set your ENV environment variable correctly, name your file accordingly, upload it, and that's it.

Create the function

The first time, you need to create the function in AWS Lambda:

make create/DynamoToES DESC="Process DynamoDB stream to ES"

This will download your config file from S3, install all the Python packages in the build folder, ZIP the whole thing, upload the ZIP file to S3 and create your Lambda function. The function is created with 128 MB of memory by default; you can change this in the Makefile or in the console.
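The Makefile does the packaging for you, but the ZIP step is roughly equivalent to the sketch below. It is an illustration only: `zip_folder` is a hypothetical helper, and the demo folder layout and archive name are assumptions, not what the Makefile actually produces.

```python
import os
import tempfile
import zipfile


def zip_folder(folder, zip_path):
    """Write every file under `folder` into `zip_path`, with paths relative to `folder`."""
    # `zip_path` should live outside `folder`, or the archive would include itself.
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(folder):
            for name in sorted(files):
                full = os.path.join(root, name)
                archive.write(full, os.path.relpath(full, folder))
    return zip_path


# Demo on a throwaway build folder that only contains the injected config file.
with tempfile.TemporaryDirectory() as tmp:
    build = os.path.join(tmp, "build")
    os.makedirs(os.path.join(build, "lib"))
    with open(os.path.join(build, "lib", "env.py"), "w") as fh:
        fh.write("ES_ENDPOINT='https://search-example.invalid'\n")
    archive_path = zip_folder(build, os.path.join(tmp, "DynamoToES.zip"))
    with zipfile.ZipFile(archive_path) as archive:
        names = archive.namelist()

print(names)
```

Storing paths relative to the build folder keeps lib/env.py at the top level of the archive, where the function can import it.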

Update the function

Let's say you make some changes to the code.

make deploy/DynamoToES DESC="Process DynamoDB stream to ES"

That will update the ZIP and refresh your Lambda function.

Create and update the mapping

Why do we need a mapping?

There is an issue with the Lambda function: DynamoDB Streams doesn't guarantee that your keys arrive in the 'right' order. Most of the time, a stream record lists the primary key first and then the secondary key, but not 100% of the time. That's why we added a mapping to fix the order.

What does the mapping script do?

It goes through all your DynamoDB tables, looks for enabled DynamoDB streams, lists them, gets the mapping of each table and stores it in a JSON file. This JSON is then used to check whether the keys were sent in the right order. If not, the function uses the right keys according to the mapping.
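As a rough sketch of what such a mapping could contain, the code below turns the KeySchema of each table (the shape DynamoDB returns when describing a table) into an ordered list of key names. The JSON layout and the names `key_names` and `build_mapping` are assumptions; the real script may store something different. It also assumes that "primary" and "secondary" mean the HASH and RANGE keys.

```python
import json


def key_names(key_schema):
    """Return the key attribute names with the HASH key first, then the RANGE key."""
    order = {"HASH": 0, "RANGE": 1}
    ranked = sorted(key_schema, key=lambda key: order[key["KeyType"]])
    return [key["AttributeName"] for key in ranked]


def build_mapping(tables):
    """Map each table name to its ordered key names; `tables` maps name to KeySchema."""
    return {name: key_names(schema) for name, schema in tables.items()}


# Hypothetical table whose schema lists the RANGE key before the HASH key.
example = {
    "users": [
        {"AttributeName": "created", "KeyType": "RANGE"},
        {"AttributeName": "id", "KeyType": "HASH"},
    ]
}
print(json.dumps(build_mapping(example), indent=2))
```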

Is the mapping mandatory?

No. Even without the mapping, the function works like before, trusting DynamoDB Streams for the key order.
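The fallback can be pictured like this. It is a sketch under assumptions: `ordered_key_values` is a hypothetical helper, the keys are already unmarshaled, and the mapping maps a table name to its key names with the primary key first.

```python
def ordered_key_values(table, keys, mapping):
    """Return the key values of a stream record, in mapping order when available.

    `keys` is the already-unmarshaled "Keys" dict of the record and
    `mapping` maps a table name to its key names, primary key first.
    """
    names = mapping.get(table)
    if names is None:
        # No mapping for this table: trust the order the stream gave us.
        return list(keys.values())
    return [keys[name] for name in names if name in keys]


mapping = {"users": ["id", "created"]}
keys = {"created": "2020-01-01", "id": "42"}  # arrived in the 'wrong' order
print(ordered_key_values("users", keys, mapping))
print(ordered_key_values("orders", keys, mapping))
```

The first call reorders the values according to the mapping; the second has no mapping for the table and keeps the stream order.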

How do you update the mapping?

Use the script:

./update_mapping.py

This script gets all DynamoDB streams linked to the function DynamoToES. If your function is named differently, change the name in the script.

It creates the file lib/table_mappping.json.
