Hello! This is a high-quality containerized Python API → managed Kafka cluster → AWS Lambda consumer system, provisioned with Terraform. (CloudFormation is used indirectly, for the Kafka consumer stack.) I hope you will be able to adapt it for your own projects, under the terms of the license.
- Small image
- Secure container
- Secure private network
- Low-code
- Low-cost
- Ready for continuous-integration/continuous-deployment
<details>
  <summary>Table of innovations and best practices...</summary>

| ✓ Quality | My work | Advantage |
|---|---|---|
| ✓ Small image | | |
| Package and module caches | Docker cache mounts | No bloat, and no slow re-downloading on image re-build |
| Temporary Python modules | Uninstalled | Same discipline as for operating system packages |
| Temporary software installation, usage, and removal | Same layer | Fewer, smaller layers, without Docker multi-stage build complexity |
| ✓ Secure container | | |
| Base image | Amazon Linux | Fewer vulnerabilities; frequent updates, from AWS staff; deterministic OS package versions |
| Image build platform | AWS CloudShell or EC2 | Controlled, auditable environment; low malware risk |
| Non-root user | Yes | Less access if main process is compromised |
| ✓ Secure private network | | |
| Internet from private subnets | No | Lower data exfiltration risk |
| AWS service endpoints | Private | Traffic never leaves private network |
| Security group rule scope | Named security groups | Only known pairs of resources can communicate |
| ✓ Low-code | | |
| API specification | OpenAPI document | Standard and self-documenting; declarative input validation |
| Serverless compute | ECS Fargate | Fewer, simpler resource definitions; no platform-level patching |
| Serverless Kafka consumer | AWS Lambda | AWS event source mapping handles Kafka; code receives JSON input (I re-used an SQS consumer CloudFormation template from my other projects!) |
| ✓ Low-cost | | |
| Compute pricing | Spot discount | No commitment; EC2 Spot discounts are higher than Savings Plan discounts, and Fargate Spot pricing works similarly |
| CPU architecture | ARM (AWS Graviton) | Better price/performance ratio; same CPU off-load |
| Expensive resources | Conditional | Develop and test at the lowest AWS cost |
| ✓ CI/CD-ready | | |
| Image build properties | Terraform variables | Multiple versions |
| Image build software platform | Amazon Linux | Ready for centralized building |
| Private address allocation | Flexible | Instead of specifying multiple interdependent address ranges, specify one address space for AWS IP Address Manager (IPAM) to divide |
| Lambda function tests | Central, shared registry | Realistic, centrally-executed tests (see shareable Lambda test) |

</details>
Jump to: Recommendations • Licenses
1. Choose AWS CloudShell or an EC2 instance for building the Docker image and running Terraform.

   <details>
     <summary>CloudShell (Easy ✓)</summary>

   1. Authenticate to the AWS Console. Use a non-production AWS account and a privileged role.

   2. Open an AWS CloudShell terminal.

   3. Prepare for a cross-platform container image build. CloudShell seems to provide Intel CPUs. The following instructions are from "Multi-platform builds" in the Docker Build manual.

      ```bash
      sudo docker buildx create --name 'container-builder' --driver 'docker-container' --bootstrap --use
      sudo docker run --privileged --rm 'tonistiigi/binfmt' --install all
      ```

   4. Review the Terraform S3 backend documentation and create an S3 bucket to store Terraform state, as in the sketch below.
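      A minimal sketch of bucket creation, assuming a hypothetical bucket name and region; substitute your own, and omit `--create-bucket-configuration` in us-east-1:

      ```bash
      # Choose a globally unique bucket name
      aws s3api create-bucket --bucket 'example-terraform-state' \
        --region 'us-west-2' \
        --create-bucket-configuration LocationConstraint='us-west-2'

      # Versioning protects the Terraform state file from accidental loss
      aws s3api put-bucket-versioning --bucket 'example-terraform-state' \
        --versioning-configuration Status=Enabled
      ```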
   5. If at any time you find that your previous CloudShell session has expired, repeat any necessary software installation steps. Your home directory is preserved between sessions, subject to CloudShell persistent storage limitations.

   </details>
   <details>
     <summary>EC2 instance</summary>

   1. Create and/or connect to an EC2 instance. I recommend:

      - An `arm64` `t4g.micro` instance ⚠ The ARM-based AWS Graviton architecture avoids multi-platform build complexity.
      - Amazon Linux 2023
      - A 30 GiB EBS volume, with default encryption (supports hibernation)
      - No key pair; connect through Session Manager
      - A custom security group with no ingress rules (yay for Session Manager!)
      - A `sched-stop` tag with the value `d=_ H:M=07:00`, for automatic nightly shutdown with sqlxpert/lights-off-aws (this example corresponds to midnight Pacific Daylight Time); see the tagging sketch below
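      A minimal tagging sketch, assuming a hypothetical instance ID; the tag key and value are the ones recommended above:

      ```bash
      # Tag the instance so that sqlxpert/lights-off-aws stops it nightly at 07:00 UTC
      aws ec2 create-tags --resources 'i-0123456789abcdef0' \
        --tags "Key=sched-stop,Value=d=_ H:M=07:00"
      ```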
   2. During the instance creation workflow (Advanced details → IAM instance profile → Create new IAM profile) or afterward, give your EC2 instance a custom role. Terraform must be able to list/describe, get tags for, create, tag, untag, update, and delete all of the AWS resource types included in this project's `.tf` files.
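      To check which role the instance's credentials resolve to, you can run the following (plain AWS CLI; nothing project-specific is assumed):

      ```bash
      # Print the ARN of the identity Terraform will run as
      aws sts get-caller-identity --query 'Arn' --output text
      ```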
   3. Update operating system packages (thanks to AWS's deterministic upgrade philosophy, there shouldn't be any updates if you chose the latest Amazon Linux 2023 image), install Docker, and start it.

      ```bash
      sudo dnf check-update
      sudo dnf --releasever=latest update
      sudo dnf install docker
      sudo systemctl start docker
      ```

   </details>
2. Install Terraform. I'm standardizing on Terraform v1.10.0 (2024-11-27) as the minimum supported version for my open-source projects.

   ```bash
   sudo dnf --assumeyes install 'dnf-command(config-manager)'
   sudo dnf config-manager --add-repo 'https://rpm.releases.hashicorp.com/AmazonLinux/hashicorp.repo'
   # sudo dnf --assumeyes install terraform-1.10.0-1
   sudo dnf --assumeyes install terraform
   ```
3. Clone this repository and create `terraform.tfvars` to customize variables.

   ```bash
   git clone 'https://github.com/sqlxpert/docker-python-openapi-kafka-terraform-cloudformation-aws.git' ~/docker-python-openapi-kafka
   cd ~/docker-python-openapi-kafka/terraform
   touch terraform.tfvars
   ```

   <details>
     <summary>Generate a terraform.tfvars skeleton...</summary>

   ```bash
   # Requires an up-to-date GNU sed (not the macOS default!)
   sed --regexp-extended --silent \
     --expression='s/^variable "(.+)" \{$/\n\n# \1 =/p' \
     --expression='s/^ description = "(.+)"$/#\n# \1/p' \
     --expression='s/^ default = (.+)$/#\n# Default: \1/p' \
     variables.tf
   ```

   </details>

   Optional: To save money while building the Docker container image, set `hello_api_aws_ecs_service_desired_count_tasks = 0` and `create_vpc_endpoints_and_load_balancer = false`, as in the sketch below.
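   A minimal sketch of that optional change; the variable names come from this project's `variables.tf`:

   ```bash
   # Cost-saving settings while the image isn't built yet; append to terraform.tfvars
   cat >> terraform.tfvars << 'EOF'
   hello_api_aws_ecs_service_desired_count_tasks = 0
   create_vpc_endpoints_and_load_balancer        = false
   EOF
   ```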
4. In CloudShell (optional if you chose EC2), create an override file to configure your Terraform S3 backend.

   ```bash
   cat > terraform_override.tf << 'EOF'
   terraform {
     backend "s3" {
       insecure     = false
       region       = "RegionCodeForYourS3Bucket"
       bucket       = "NameOfYourS3Bucket"
       key          = "DesiredTerraformStateFileName"
       use_lockfile = true  # No more DynamoDB; now S3-native!
     }
   }
   EOF
   ```
5. Initialize Terraform and create the AWS infrastructure. There's no need for a separate `terraform plan` step: `terraform apply` outputs the plan and gives you a chance to approve before anything is done. If you don't like the plan, don't type `yes`!

   ```bash
   terraform init
   terraform apply -target='aws_vpc_ipam_pool_cidr_allocation.hello_api_vpc_private_subnets' -target='aws_vpc_ipam_pool_cidr_allocation.hello_api_vpc_public_subnets'
   ```

   <details>
     <summary>About this two-stage process...</summary>

   CloudPosse's otherwise excellent dynamic-subnets module isn't dynamic enough to co-operate with AWS IP Address Manager (IPAM), so you have to let IPAM finalize subnet IP address range allocations beforehand.

   </details>

   ```bash
   terraform apply
   ```
   <details>
     <summary>In case of an "already exists" error...</summary>

   If you receive a "Registry with name `lambda-testevent-schemas` already exists" error, set `create_lambda_testevent_schema_registry = false`, then run `terraform apply` again.

   </details>
6. Set environment variables needed for building, tagging, and pushing the Docker container image, then build it.

   ```bash
   AMAZON_LINUX_BASE_VERSION=$(terraform output -raw 'amazon_linux_base_version')
   AMAZON_LINUX_BASE_DIGEST=$(terraform output -raw 'amazon_linux_base_digest')
   AWS_ECR_REGISTRY_REGION=$(terraform output -raw 'hello_api_aws_ecr_registry_region')
   AWS_ECR_REGISTRY_URI=$(terraform output -raw 'hello_api_aws_ecr_registry_uri')
   AWS_ECR_REPOSITORY_URL=$(terraform output -raw 'hello_api_aws_ecr_repository_url')
   HELLO_API_AWS_ECR_IMAGE_TAG=$(terraform output -raw 'hello_api_aws_ecr_image_tag')
   HELLO_API_DOMAIN_NAME=$(terraform output -raw 'hello_api_load_balander_domain_name')  # For later

   aws ecr get-login-password --region "${AWS_ECR_REGISTRY_REGION}" | sudo docker login --username 'AWS' --password-stdin "${AWS_ECR_REGISTRY_URI}"

   cd ../python_docker

   sudo docker buildx build \
     --build-arg AMAZON_LINUX_BASE_VERSION="${AMAZON_LINUX_BASE_VERSION}" \
     --build-arg AMAZON_LINUX_BASE_DIGEST="${AMAZON_LINUX_BASE_DIGEST}" \
     --platform='linux/arm64' \
     --tag "${AWS_ECR_REPOSITORY_URL}:${HELLO_API_AWS_ECR_IMAGE_TAG}" \
     --output 'type=docker' .

   sudo docker push "${AWS_ECR_REPOSITORY_URL}:${HELLO_API_AWS_ECR_IMAGE_TAG}"
   ```

   <details>
     <summary>Updating the container image...</summary>
   You can select a newer Amazon Linux release by setting the `amazon_linux_base_version` and `amazon_linux_base_digest` variables in Terraform, running `terraform apply`, and re-setting the environment variables.

   Then, to re-build the image, run `HELLO_API_AWS_ECR_IMAGE_TAG='1.0.1'` (choose an appropriate new version number, taking semantic versioning into account) in the shell, repeat the build and push commands, set `hello_api_aws_ecr_image_tag = "1.0.1"` (for example) in Terraform, and run `terraform apply` one more time.
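   A condensed sketch of that re-build sequence; the version number is illustrative:

   ```bash
   # Choose the next semantic version for the new image
   HELLO_API_AWS_ECR_IMAGE_TAG='1.0.1'

   # Repeat the build and push commands from Step 6
   sudo docker buildx build \
     --build-arg AMAZON_LINUX_BASE_VERSION="${AMAZON_LINUX_BASE_VERSION}" \
     --build-arg AMAZON_LINUX_BASE_DIGEST="${AMAZON_LINUX_BASE_DIGEST}" \
     --platform='linux/arm64' \
     --tag "${AWS_ECR_REPOSITORY_URL}:${HELLO_API_AWS_ECR_IMAGE_TAG}" \
     --output 'type=docker' .
   sudo docker push "${AWS_ECR_REPOSITORY_URL}:${HELLO_API_AWS_ECR_IMAGE_TAG}"
   ```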
   </details>

7. If you changed Terraform variables at the end of Step 3, revert the changes and run `terraform apply`.
8. In the Amazon Elastic Container Service section of the AWS Console, check the `hello_api` cluster. Eventually, you should see 2 tasks running.

   <details>
     <summary>Container deployment delay...</summary>

   It will take a few minutes for ECS to notice, and then deploy, the container image. Relax, and let it happen. If you are impatient, or if there is a problem, you can navigate to the `hello_api` service, open the orange "Update service" pop-up menu, and select "Force new deployment".

   </details>
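   If you prefer the command line, here is a minimal sketch; it assumes the ECS service shares the `hello_api` name with its cluster:

   ```bash
   # Compare desired and running task counts
   aws ecs describe-services --cluster 'hello_api' --services 'hello_api' \
     --query 'services[0].{desired: desiredCount, running: runningCount}'

   # Command-line equivalent of the console's "Force new deployment"
   aws ecs update-service --cluster 'hello_api' --service 'hello_api' \
     --force-new-deployment
   ```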
9. Generate the URLs and then test your API.

   ```bash
   echo -e "curl --location --insecure 'http://${HELLO_API_DOMAIN_NAME}/"{'healthcheck','hello','current_time?name=Paul','current_time?name=;echo','error'}"'\n"
   ```

   Try the different URLs using your Web browser or `curl --location --insecure` (these options allow redirection and self-signed TLS certificates).

   | Method | Expected Result |
   |---|---|
   | `/healthcheck` | Empty response |
   | `/hello` | Fixed greeting, in a JSON object |
   | `/current_time?name=Paul` | Reflected greeting and timestamp, in a JSON object |
   | `/current_time?name=;echo` | HTTP 400 "bad request" error; demonstrates protection from command injection |
   | `/error` | HTTP 404 "not found" error |

   <details>
     <summary>About redirection to HTTPS, and certificates...</summary>

   Your Web browser should redirect you from `http:` to `https:` and (let's hope!) warn you about the untrusted, self-signed TLS certificate used in this system (which of course is not tied to a pre-determined domain name). Proceed to view the responses from your new API.

   If your Web browser configuration does not allow accessing Web sites with untrusted certificates, change the `enable_https` variable in Terraform, run `terraform apply` twice (don't ask!), and `http:` links will work without redirection. After you have used `https:` with a particular site, your browser might no longer allow `http:` for that site. Try an alternate Web browser if necessary.

   </details>
10. Access the `hello_api_ecs_task` CloudWatch log group in the AWS Console. (`hello_api_ecs_cluster` is reserved for future use.) Periodic internal health checks, plus your occasional Web requests, should appear.

    <details>
      <summary>API access log limitations...</summary>

    The Python connexion module, which I chose because it serves an API from a precise OpenAPI-format specification, uses uvicorn workers. Unfortunately, uvicorn has lousy log format customization support.

    </details>
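    To follow the log from a terminal instead, a minimal sketch, assuming the log group name matches the console label above:

    ```bash
    # Stream new log events from the ECS task log group as they arrive
    aws logs tail 'hello_api_ecs_task' --follow
    ```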
11. If you don't wish to use Kafka, skip to Step 13.

    If you wish to enable Kafka, set `enable_kafka = true` and run `terraform apply`. AWS MSK is expensive, so enable Kafka only after confirming that the rest of the system is working for you.

    <details>
      <summary>In case HelloApiKafkaConsumer CloudFormation stack creation fails...</summary>
    Creation of the Kafka consumer might fail for various reasons. Once the `HelloApiKafkaConsumer` CloudFormation stack is in `ROLLBACK_COMPLETE` status, delete it, then run `terraform apply` again; see the sketch below.

    </details>
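    A minimal sketch of that recovery; the stack name comes from this project:

    ```bash
    # Remove the failed stack, wait for deletion to finish, then re-create it
    aws cloudformation delete-stack --stack-name 'HelloApiKafkaConsumer'
    aws cloudformation wait stack-delete-complete --stack-name 'HelloApiKafkaConsumer'
    terraform apply
    ```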
12. Access the `/current_time?name=Paul` method several times (adjust the name as you wish); a loop sketch follows below. The first use of this method prompts creation of the `events` Kafka topic. From now on, use of this method (not the others) will send a message to the `events` Kafka topic.

    The AWS MSK event source mapping reads from the Kafka topic and triggers the consumer Lambda function, which logs decoded Kafka messages to the HelloApiKafkaConsumer CloudWatch log group.
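    A quick way to generate several messages, re-using the `HELLO_API_DOMAIN_NAME` environment variable from Step 6 (the names are arbitrary):

    ```bash
    # Each request publishes one message to the 'events' Kafka topic
    for NAME in 'Paul' 'Ada' 'Grace'; do
      curl --location --insecure "http://${HELLO_API_DOMAIN_NAME}/current_time?name=${NAME}"
    done
    ```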
13. Set the `enable_kafka`, `hello_api_aws_ecs_service_desired_count_tasks` and `create_vpc_endpoints_and_load_balancer` variables to their cost-saving values if you'd like to continue experimenting. When you are done, delete all resources; even the minimum configuration carries a cost.

    ```bash
    cd ../terraform
    terraform state rm 'aws_schemas_registry.lambda_testevent'
    terraform apply -destroy
    ```
    <details>
      <summary>Deletion delays and errors...</summary>

    - Deleting a VPC Lambda function takes a long time because of the network association; expect 30 minutes if `enable_kafka` was `true`.
    - Expect an error message about retiring KMS encryption key grants (harmless, in this case).
    - If you cancel and re-run `terraform apply -destroy`, a bug in CloudPosse's `dynamic-subnets` module might cause a "value depends on resource attributes that cannot be determined until apply" error. For a work-around, edit the cached module file indicated in the error message: comment out the indicated line and force `count = 0`. Be sure to revert this temporary patch later.

    </details>
## Recommendations

This is my own original work, produced without the use of artificial intelligence (AI) or large language model (LLM) code generation. Code from other sources is acknowledged.
I write long option names in my instructions so that other people don't have to look up unfamiliar single-letter options — assuming they can find them!

Here's an example that shows why I go to the trouble, even at the expense of being laughed at by macho Linux users. (I started using UNICOS in 1991, so it's not for lack of experience.)

Search for the literal text `-t` in docs.docker.com/reference/cli/docker/buildx/build, using Command-F, Control-F, `/`, or `grep`. Only 2 of 41 occurrences of `-t` are relevant!

Where available, full-text (that is, not strictly literal) search engines can't make sense of a 1-letter search term, and they are also likely to ignore a 2-character term as a "stop-word" that's too short to search for.
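A minimal sketch of that literal search with GNU grep, assuming you've saved the page as a hypothetical `build.html`:

```bash
# Count every literal occurrence of "-t" (including inside longer options);
# --regexp keeps grep from mistaking the pattern for an option of its own
grep --only-matching --fixed-strings --regexp='-t' build.html | wc --lines
```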
My professional and ethical commitment is simple: Only as much technology as a business...
- needs,
- can afford,
- understands (or can learn), and
- can maintain.
Having worked for startups since 2013, I always recommend focusing software engineering effort. It is not possible to do everything, let alone to be good at everything. Managed services, serverless technology, and low-code architecture free software engineers to focus on the core product, that is, on what the company actually sells. Avoid complex infrastructure and tooling unless it offers a unique, tangible, and substantial benefit. Simplicity pays!
Security is easier and cheaper to incorporate at the start than to graft on after the architecture has been finalized, the infrastructure has been templated and created, and the executable code has been written and deployed.
Specialized knowledge of the chosen cloud provider is indispensable. I call it "idiomatic" knowledge, a good part of which is awareness of the range of options supported by your cloud provider. Building generically would mean giving up some performance, some security, and some cloud cost savings. Optimizing later is difficult. "Learn to steer the ship you're on."
## Licenses

| Scope | Link | Included Copy |
|---|---|---|
| Source code, and source code in documentation | GNU General Public License (GPL) 3.0 | LICENSE_CODE.md |
| Documentation, including this ReadMe file | GNU Free Documentation License (FDL) 1.3 | LICENSE_DOC.md |
Copyright Paul Marcelin
Contact: marcelin at cmu.edu (replace "at" with @)