Fuzzy Movie Search - Search application with Lambda, Kinesis, Firehose, ElasticSearch, S3

Key	Value
Environment
Services	Lambda, Kinesis, Firehose, ElasticSearch, S3
Integrations	Terraform, AWS CLI
Categories	Serverless; Event-Driven architecture
Level	Intermediate
GitHub	Repository link

Introduction

This Fuzzy Search application demonstrates how to set up an S3-hosted website that lets you fuzzy-search a movie database. The sample application implements the following integrations among various AWS services:

A data ingestion pipeline, which adds movie data to an ElasticSearch index as follows:
- An AWS Lambda function, exposed via a function URL, receives the movie data.
- The Lambda function sends the JSON payload to a Kinesis Data Stream.
- A Kinesis Firehose delivery stream forwards the data to an ElasticSearch domain.
A frontend website, where:
- A simple search interface lets you search for movies in the database.
- The HTML page uses a plain JavaScript script to query the data through a second Lambda function.
- This Lambda function runs a fuzzy query on the movie index in the ElasticSearch cluster.

Architecture Diagram

The following diagram shows the architecture that this sample application builds and deploys:

The application uses the following services:
- S3, which hosts the static website.
- Lambda (https://docs.localstack.cloud/user-guide/aws/lambda/), which feeds the Kinesis stream and performs the fuzzy search.
- Kinesis, which receives the ingested movie records and passes them on to Firehose.
- Firehose, which forwards the data into Elasticsearch.
- Elasticsearch, which stores and indexes the data.

Prerequisites

LocalStack Pro with the localstack CLI.
Terraform with the tflocal wrapper installed.
AWS CLI with the awslocal wrapper.
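Before continuing, it can help to confirm that these tools are available on your PATH. The following is a minimal sketch; the helper name check_prereqs is hypothetical and not part of the sample repository.

```shell
# Hypothetical helper: report which of the given commands are missing from PATH.
# Returns 0 if every command is found, 1 otherwise.
check_prereqs() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" > /dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_prereqs localstack terraform tflocal aws awslocal docker prints each missing tool to standard error and returns a non-zero status if any are absent.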

Start LocalStack Pro with the LOCALSTACK_API_KEY pre-configured:

export LOCALSTACK_API_KEY=<your-api-key>
docker compose up -d
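Because docker compose up -d returns as soon as the container is started, LocalStack may not be ready to accept requests yet. A minimal sketch of a retry helper follows; the function name wait_for and the WAIT_FOR_DELAY variable are hypothetical, not part of the repository.

```shell
# Hypothetical helper: retry a command until it succeeds or the attempt limit is reached.
# Usage: wait_for <attempts> <command> [args...]
# The pause between attempts defaults to 2 seconds (override via WAIT_FOR_DELAY).
wait_for() {
  local attempts=$1 i=1
  shift
  while [ "$i" -le "$attempts" ]; do
    if "$@" > /dev/null 2>&1; then
      return 0
    fi
    # Only pause if another attempt follows.
    if [ "$i" -lt "$attempts" ]; then
      sleep "${WAIT_FOR_DELAY:-2}"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, wait_for 30 curl -sf http://localhost:4566/_localstack/health polls LocalStack's health endpoint on port 4566 (the same port the website URL below uses) until it responds successfully.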

Instructions

You can build and deploy the sample application on LocalStack by running ./run.sh. Alternatively, follow the steps below to deploy and test it manually.

Build the application

To build the Terraform application, run the following commands:

terraform init && terraform plan && terraform apply --auto-approve

This creates all resources specified in main.tf, which can take a couple of minutes. Once it is done, save the following Terraform outputs into shell variables:

ingest_function_url=$(terraform output --raw ingest_lambda_url)
elasticsearch_endpoint=$(terraform output --raw elasticsearch_endpoint)
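If terraform apply did not finish, these variables may end up empty, and the later steps then fail with confusing curl errors. A minimal guard is sketched below; the helper name require_nonempty is hypothetical.

```shell
# Hypothetical helper: fail early if a value captured from 'terraform output' is empty.
# $1 = variable name (used only in the message), $2 = its value
require_nonempty() {
  if [ -z "$2" ]; then
    echo "error: $1 is empty; did 'terraform apply' finish successfully?" >&2
    return 1
  fi
}
```

For example: require_nonempty ingest_function_url "$ingest_function_url" && require_nonempty elasticsearch_endpoint "$elasticsearch_endpoint".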

Download the dataset

The dataset for this application is a selection of movies with typical metadata such as title, directors, and genres. Run the following commands to download and extract it:

temp_dir=$(mktemp -d)
movie_dataset_url="https://docs.aws.amazon.com/opensearch-service/latest/developerguide/samples/sample-movies.zip"
curl -L "$movie_dataset_url" -o "$temp_dir/sample-movies.zip"
unzip "$temp_dir/sample-movies.zip" -d "$temp_dir/"
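To check that the archive extracted correctly, you can count the documents in the bulk file. The sketch below assumes the bulk layout that the pre-processing step relies on, where each document is preceded by one action line starting with { "index". The helper name count_bulk_documents is hypothetical.

```shell
# Hypothetical helper: verify the bulk file exists and is non-empty, then print
# how many documents it contains by counting the '{ "index"' action lines.
count_bulk_documents() {
  local file=$1
  if [ ! -s "$file" ]; then
    echo "error: $file is missing or empty" >&2
    return 1
  fi
  grep -c '^{ "index"' "$file"
}
```

For example: count_bulk_documents "$temp_dir/sample-movies.bulk".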

Pre-processing the data

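The file uses the Elasticsearch bulk format, which places an action line (starting with { "index") before each document. Because the streaming pipeline ingests one document at a time, remove these action lines so that only the document lines remain: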

grep -v '^{ "index"' "$temp_dir/sample-movies.bulk" > "$temp_dir/sample-movies-processed.bulk"
mv "$temp_dir/sample-movies-processed.bulk" "$temp_dir/sample-movies.bulk"
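To see what the filter does, the sketch below applies the same grep pattern to a small excerpt in the bulk layout. The function name strip_bulk_actions and the sample records are illustrative only and do not come from the dataset.

```shell
# Remove Elasticsearch bulk action lines (those starting with '{ "index"'),
# keeping only the document lines. Reads stdin, writes stdout.
strip_bulk_actions() {
  grep -v '^{ "index"'
}

# Illustrative excerpt; these records are made up, not taken from the dataset.
sample='{ "index" : { "_index": "movies", "_id" : "1" } }
{"title": "Example Movie A", "directors": ["Jane Doe"]}
{ "index" : { "_index": "movies", "_id" : "2" } }
{"title": "Example Movie B", "directors": ["John Roe"]}'

printf '%s\n' "$sample" | strip_bulk_actions
# {"title": "Example Movie A", "directors": ["Jane Doe"]}
# {"title": "Example Movie B", "directors": ["John Roe"]}
```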

Populating the database

Now populate the database with the actual entries via the ingest Lambda function. The following loop sends the entries one line at a time; it can take quite a while to finish, since each movie is a separate HTTP request.

# IFS= and -r keep each JSON line intact (no trimming, no backslash processing)
while IFS= read -r line
do
   printf '.'
   printf '%s' "$line" | curl -s -X POST "$ingest_function_url" \
        -H 'Content-Type: application/json' \
        -d @- > /dev/null
done < "$temp_dir/sample-movies.bulk"
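The loop above discards the HTTP responses, so failed requests go unnoticed. The following sketch is a variant that counts successful and failed requests; it uses curl's --fail option so that HTTP error statuses also count as failures. The function name ingest_lines is hypothetical.

```shell
# Hypothetical helper: POST each non-blank line of a file to a URL as JSON,
# then print how many requests succeeded and failed.
# Returns 0 only if no request failed.
ingest_lines() {
  local url=$1 file=$2 ok=0 failed=0 line
  # '|| [ -n "$line" ]' also handles a final line without a trailing newline
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue
    if printf '%s' "$line" | curl -s --fail -X POST "$url" \
         -H 'Content-Type: application/json' -d @- > /dev/null; then
      ok=$((ok + 1))
    else
      failed=$((failed + 1))
    fi
  done < "$file"
  echo "sent: $ok, failed: $failed"
  [ "$failed" -eq 0 ]
}
```

For example: ingest_lines "$ingest_function_url" "$temp_dir/sample-movies.bulk". A non-zero failure count may correspond to the Firehose errors mentioned under Known limitations, or to problems earlier in the pipeline.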

Querying the database

Now you can open the website at http://movie-search.s3-website.localhost.localstack.cloud:4566/ and search the entries. For example, if you search for "Quentis", a misspelling of "Quentin", you should see entries related to the director "Quentin Tarantino", similar to the following screenshot.
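You can also query Elasticsearch directly to inspect the index. The sketch below builds a fuzzy multi_match query; it is not necessarily the query the search Lambda uses. It assumes the documents are in an index named movies (the name used in the bulk file's action lines; the actual Firehose destination index is configured in main.tf and may differ) and uses the title, directors, and actors fields from the dataset. The helper name fuzzy_query_body is hypothetical, and the search term is inserted without JSON escaping, so it must not contain double quotes.

```shell
# Hypothetical helper: print an Elasticsearch query body that fuzzy-matches
# the given term against the title, directors, and actors fields.
# The term is NOT JSON-escaped; avoid double quotes and backslashes in it.
fuzzy_query_body() {
  printf '{"query": {"multi_match": {"query": "%s", "fields": ["title", "directors", "actors"], "fuzziness": "AUTO"}}}' "$1"
}
```

For example: fuzzy_query_body "Quentis" | curl -s -X POST "$elasticsearch_endpoint/movies/_search" -H 'Content-Type: application/json' -d @-. If the Terraform output has no scheme, curl defaults to HTTP.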

Known limitations

The LocalStack logs sometimes show error messages related to Firehose delivery. As a result, some records may not reach the index, but the resulting database is still sufficient for demonstration purposes.

Contributing

We appreciate your interest in contributing to our project and are always looking for new ways to improve the developer experience. We welcome feedback, bug reports, and even feature ideas from the community. Please refer to the contributing file for more details on how to get started.

localstack-samples/sample-fuzzy-movie-search-lambda-kinesis-elasticsearch

Folders and files

Latest commit

History

Repository files navigation

Fuzzy Movie Search - Search application with Lambda, Kinesis, Firehose, ElasticSearch, S3

Introduction

Architecture Diagram

Prerequisites

Instructions

Build the application

Download the dataset

Pre-processing the data

Populating the database

Querying the database

Known limitations

Contributing

About

Topics

Resources

License

Stars

Watchers

Forks

Languages