From 5300f99fe5d3a639018a0be32b07d847abbdc5e0 Mon Sep 17 00:00:00 2001
From: fmassot
Date: Thu, 18 Jan 2024 11:31:09 +0100
Subject: [PATCH 1/3] Add simple tutorial for Quickwit on Lambdas

---
 .../tutorials/tutorial-aws-lambda-simple.md | 132 ++++++++++++++++++
 1 file changed, 132 insertions(+)
 create mode 100644 docs/get-started/tutorials/tutorial-aws-lambda-simple.md

diff --git a/docs/get-started/tutorials/tutorial-aws-lambda-simple.md b/docs/get-started/tutorials/tutorial-aws-lambda-simple.md
new file mode 100644
index 00000000000..7f07d652170
--- /dev/null
+++ b/docs/get-started/tutorials/tutorial-aws-lambda-simple.md
@@ -0,0 +1,132 @@
+---
+title: Serverless Search on AWS Lambda
+description: Index and search using AWS Lambda based on an end to end usecase.
+tags: [aws, integration]
+icon_url: /img/tutorials/aws-logo.png
+sidebar_position: 3
+---
+
+In this tutorial, we will index and search about 20 million log entries (7 GB decompressed) located on AWS S3 with Quickwit Lambda.
+
+Concretely, we will deploy an AWS CloudFormation stack with the Quickwit Lambdas and two buckets: a staging bucket for hosting gzipped newline-delimited JSON files to be indexed, and a bucket for hosting the index data. The staging bucket is optional, as the Quickwit indexer can read data from any S3 file it has access to.
+
+![Tutorial stack overview](../../assets/images/quickwit-lambda-service.svg)
+
+Let's go!
+
+## Install
+
+### Install AWS CDK
+
+We will use [AWS CDK](https://aws.amazon.com/cdk/) for our infrastructure automation script. Install it using [npm](https://docs.npmjs.com/downloading-and-installing-node-js-and-npm):
+```bash
+npm install -g aws-cdk
+```
+We also use the `curl` and `make` commands. For instance on Debian based distributions:
+```bash
+sudo apt update && sudo apt install curl make
+```
+
+You also need AWS credentials to be properly configured in your shell. One way is using the [credentials file](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html).
+
+Finally, clone the Quickwit tutorials repository:
+```bash
+git clone https://github.com/quickwit-oss/tutorials.git
+cd tutorials/simple-lambda-stack
+```
+
+### Setup python environment
+
+We use python to define the AWS CloudFormation stack we need to deploy, and a python CLI to invoke Lambdas.
+Let's install those few packages (boto3, aws-cdk-lib, click, pyyaml).
+
+```bash
+# Install pipenv if needed.
+pip install --user pipenv
+pipenv shell
+pipenv install
+```
+
+### Bootstrap and deploy
+
+Configure the AWS region and [account id](https://docs.aws.amazon.com/IAM/latest/UserGuide/console_account-alias.html) where you want to deploy the stack:
+
+```bash
+export CDK_ACCOUNT=123456789
+export CDK_REGION=us-east-1
+```
+
+If this region/account pair has not been bootstrapped by CDK yet, run:
+```bash
+cdk bootstrap aws://$CDK_ACCOUNT/$CDK_REGION
+```
+
+This initializes some basic resources to host artifacts such as Lambda packages. You can then deploy the stack with `cdk deploy -a cdk/app.py`.
+
+## Index the HDFS logs dataset
+
+Here is an example log entry from the dataset:
+```json
+{
+  "timestamp": 1460530013,
+  "severity_text": "INFO",
+  "body": "PacketResponder: BP-108841162-10.10.34.11-1440074360971:blk_1074072698_331874, type=HAS_DOWNSTREAM_IN_PIPELINE terminating",
+  "resource": {
+    "service": "datanode/01"
+  },
+  "attributes": {
+    "class": "org.apache.hadoop.hdfs.server.datanode.DataNode"
+  },
+  "tenant_id": 58
+}
+```
+
+If you have a few minutes to spare, you can index the whole dataset, which is available in our public S3 bucket:
+
+```bash
+python cli.py index s3://quickwit-datasets-public/hdfs-logs-multitenants.json.gz
+```
+
+If not, index only the 10,000-document dataset:
+
+```bash
+python cli.py index s3://quickwit-datasets-public/hdfs-logs-multitenants-10000.json
+```
+
+## Execute search queries
+
+Let's start by looking for errors with a query on the `severity_text` field, `severity_text:ERROR`:
+
+```bash
+python cli.py search '{"query":"severity_text:ERROR"}'
+```
+
+It should respond in under 1 second and return 10 hits out of 345 if you indexed the whole dataset. If you indexed only the first 10,000 documents, there won't be any hits; try querying `INFO` logs instead.
+
+
+Let's now run a more advanced query: a date histogram with a terms aggregation on the `severity_text` field:
+
+```bash
+python cli.py search '{ "query": "*", "max_hits": 0, "aggs": { "events": { "date_histogram": { "field": "timestamp", "fixed_interval": "30d" }, "aggs": { "log_level": { "terms": { "size": 10, "field": "severity_text", "order": { "_count": "desc" } } } } } } }'
+```
+
+It should respond in under 2 seconds and return the top log levels per 30-day interval.
+
+
+## Cleaning up
+
+First, delete the files created in your S3 buckets.
+Once done, you can delete the stack:
+
+```bash
+cdk destroy -a cdk/app.py
+rm -rf cdk.out
+```
+
+Congrats! You finished this tutorial! You can level up with the following tutorials to discover more Quickwit features.
+
+## Next steps
+
+- [Advanced Lambda tutorial](tutorial-aws-lambda.md), which covers an end-to-end use case
+- [Search REST API](/docs/reference/rest-api)
+- [Query language](/docs/reference/query-language)

From fe1f9d7582aeb26429b25908d7dcbc2c152d04a6 Mon Sep 17 00:00:00 2001
From: fmassot
Date: Thu, 18 Jan 2024 11:32:59 +0100
Subject: [PATCH 2/3] Fix title and description of lambda tutorial.

---
 docs/get-started/tutorials/tutorial-aws-lambda-simple.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/get-started/tutorials/tutorial-aws-lambda-simple.md b/docs/get-started/tutorials/tutorial-aws-lambda-simple.md
index 7f07d652170..6f0bd9fafaf 100644
--- a/docs/get-started/tutorials/tutorial-aws-lambda-simple.md
+++ b/docs/get-started/tutorials/tutorial-aws-lambda-simple.md
@@ -1,6 +1,6 @@
 ---
-title: Serverless Search on AWS Lambda
-description: Index and search using AWS Lambda based on an end to end usecase.
+title: Search with AWS Lambda
+description: Index and search using AWS Lambda on 20 million log entries
 tags: [aws, integration]
 icon_url: /img/tutorials/aws-logo.png
 sidebar_position: 3

From e899037672830e0e1416872399cdcbf98b7490ad Mon Sep 17 00:00:00 2001
From: fmassot
Date: Thu, 18 Jan 2024 13:40:35 +0100
Subject: [PATCH 3/3] Update AWS tutorials.

---
 .../tutorials/tutorial-aws-lambda-simple.md | 18 ++++++++++--------
 .../tutorials/tutorial-aws-lambda.md | 4 ++--
 ...ial-hdfs-logs-distributed-search-aws-s3.md | 2 +-
 3 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/docs/get-started/tutorials/tutorial-aws-lambda-simple.md b/docs/get-started/tutorials/tutorial-aws-lambda-simple.md
index 6f0bd9fafaf..7f61edec4c2 100644
--- a/docs/get-started/tutorials/tutorial-aws-lambda-simple.md
+++ b/docs/get-started/tutorials/tutorial-aws-lambda-simple.md
@@ -3,7 +3,7 @@ title: Search with AWS Lambda
 description: Index and search using AWS Lambda on 20 million log entries
 tags: [aws, integration]
 icon_url: /img/tutorials/aws-logo.png
-sidebar_position: 3
+sidebar_position: 4
 ---
 
 In this tutorial, we will index and search about 20 million log entries (7 GB decompressed) located on AWS S3 with Quickwit Lambda.
@@ -12,8 +12,6 @@ Concretely, we will deploy an AWS CloudFormation stack with the Quickwit Lambdas
 
 ![Tutorial stack overview](../../assets/images/quickwit-lambda-service.svg)
 
-Let's go!
-
 ## Install
 
 ### Install AWS CDK
@@ -21,11 +19,7 @@ Let's go!
 We will use [AWS CDK](https://aws.amazon.com/cdk/) for our infrastructure automation script. Install it using [npm](https://docs.npmjs.com/downloading-and-installing-node-js-and-npm):
 ```bash
 npm install -g aws-cdk
 ```
-We also use the `curl` and `make` commands. For instance on Debian based distributions:
-```bash
-sudo apt update && sudo apt install curl make
-```
 
 You also need AWS credentials to be properly configured in your shell. One way is using the [credentials file](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html).
 
@@ -37,7 +31,7 @@ cd tutorials/simple-lambda-stack
 
 ### Setup python environment
 
-We use python to define the AWS CloudFormation stack we need to deploy, and a python CLI to invoke Lambdas.
+We use Python 3.10 to define the AWS CloudFormation stack we need to deploy, and a Python CLI to invoke Lambdas.
 Let's install those few packages (boto3, aws-cdk-lib, click, pyyaml).
 
 ```bash
@@ -47,6 +41,14 @@ pipenv shell
 pipenv install
 ```
 
+### Download Quickwit Lambdas
+
+```bash
+mkdir -p cdk.out
+wget -P cdk.out https://github.com/quickwit-oss/quickwit/releases/download/aws-lambda-beta-01/quickwit-lambda-indexer-beta-01-x86_64.zip
+wget -P cdk.out https://github.com/quickwit-oss/quickwit/releases/download/aws-lambda-beta-01/quickwit-lambda-searcher-beta-01-x86_64.zip
+```
+
 ### Bootstrap and deploy
 
 Configure the AWS region and [account id](https://docs.aws.amazon.com/IAM/latest/UserGuide/console_account-alias.html) where you want to deploy the stack:
diff --git a/docs/get-started/tutorials/tutorial-aws-lambda.md b/docs/get-started/tutorials/tutorial-aws-lambda.md
index d6fadb1f61b..1584acdee33 100644
--- a/docs/get-started/tutorials/tutorial-aws-lambda.md
+++ b/docs/get-started/tutorials/tutorial-aws-lambda.md
@@ -1,9 +1,9 @@
 ---
-title: Serverless Search on AWS Lambda
+title: Serverless E2E with Lambda
 description: Index and search using AWS Lambda based on an end to end usecase.
 tags: [aws, integration]
 icon_url: /img/tutorials/aws-logo.png
-sidebar_position: 3
+sidebar_position: 5
 ---
 
 In this tutorial, we’ll show you how to run Quickwit on Lambda on a complete use case. We’ll present you the associated cloud resources, a cost estimate and how to deploy the whole stack using AWS CDK.
diff --git a/docs/get-started/tutorials/tutorial-hdfs-logs-distributed-search-aws-s3.md b/docs/get-started/tutorials/tutorial-hdfs-logs-distributed-search-aws-s3.md
index df4674f2055..ea90f77cc1e 100644
--- a/docs/get-started/tutorials/tutorial-hdfs-logs-distributed-search-aws-s3.md
+++ b/docs/get-started/tutorials/tutorial-hdfs-logs-distributed-search-aws-s3.md
@@ -3,7 +3,7 @@ title: Distributed search on AWS S3
 description: Index log entries on AWS S3 using an EC2 instance and launch a distributed cluster.
 tags: [aws, integration]
 icon_url: /img/tutorials/aws-logo.png
-sidebar_position: 4
+sidebar_position: 6
 ---
 
 In this guide, we will index about 40 million log entries (13 GB decompressed) on AWS S3 using an EC2 instance and launch a three-node distributed search cluster.
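
The aggregation request in the tutorial's `python cli.py search` example is a long single-quoted JSON string, which is easy to break when editing it by hand in a shell. The following is a minimal sketch, assuming only the Python standard library and the request body shown in the tutorial. The helpers `build_histogram_query` and `cli_command` are hypothetical and not part of the tutorials repository. They build the same request as a Python dict and render it as a correctly quoted `cli.py search` command.

```python
import json
import shlex


def build_histogram_query(interval: str = "30d", top_levels: int = 10) -> dict:
    """Date histogram on `timestamp` with a nested terms aggregation on `severity_text`.

    Mirrors the request body used in the tutorial (hypothetical helper).
    """
    return {
        "query": "*",
        "max_hits": 0,  # no documents returned, only aggregation results
        "aggs": {
            "events": {
                "date_histogram": {"field": "timestamp", "fixed_interval": interval},
                "aggs": {
                    "log_level": {
                        "terms": {
                            "size": top_levels,
                            "field": "severity_text",
                            "order": {"_count": "desc"},
                        }
                    }
                },
            }
        },
    }


def cli_command(request: dict) -> str:
    """Serialize the request compactly and quote it as a single shell argument."""
    body = json.dumps(request, separators=(",", ":"))
    return "python cli.py search " + shlex.quote(body)


if __name__ == "__main__":
    print(cli_command(build_histogram_query()))
```

Running the script prints a command equivalent to the one in the tutorial. `shlex.quote` wraps the compact JSON in single quotes, so the `*` wildcard and the double quotes reach `cli.py` unchanged. Changing `interval` (for example to `"7d"`) is then a one-argument edit instead of a manual rewrite of the JSON string.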