Containerized Python API, Kafka, AWS Lambda Consumer

Hello! This is a high-quality containerized Python API → managed Kafka cluster → AWS Lambda consumer system, provisioned with Terraform. (CloudFormation is used indirectly, for the Kafka consumer stack.) I hope you will be able to adapt it for your own projects, under the terms of the license.

Innovations and Best Practices

  • Small image
  • Secure container
  • Secure private network
  • Low-code
  • Low-cost
  • Ready for continuous-integration/continuous-deployment
Table of innovations and best practices...

| ✓ Small image | Typical approach | My work | Advantage |
|---|---|---|---|
| Package and module caches | Cleared or disabled | Docker cache mounts | No bloat, and no slow re-downloading on image re-build |
| Temporary Python modules | Retained | Uninstalled | Same discipline as for operating system packages |
| Temporary software installation, usage, and removal | Separate layers; maybe stages? | Same layer | Fewer, smaller layers, without Docker multi-stage build complexity |

| ✓ Secure container | Typical approach | My work | Advantage |
|---|---|---|---|
| Base image | Docker Community Python | Amazon Linux | Fewer vulnerabilities; frequent updates, from AWS staff; deterministic OS package versions |
| Image build platform | Local computer | AWS CloudShell or EC2 | Controlled, auditable environment; low malware risk |
| Non-root user | Maybe? | Yes | Less access if main process is compromised |

| ✓ Secure private network | Typical approach | My work | Advantage |
|---|---|---|---|
| Internet from private subnets | NAT Gateway | No | Lower data exfiltration risk |
| AWS service endpoints | Public | Private | Traffic never leaves private network |
| Security group rule scope | Ranges of numbered addresses | Named security groups | Only known pairs of resources can communicate |

| ✓ Low-code | Typical approach | My work | Advantage |
|---|---|---|---|
| API specification | In program code | OpenAPI document | Standard and self-documenting; declarative input validation |
| Serverless compute | No | ECS Fargate | Fewer, simpler resource definitions; no platform-level patching |
| Serverless Kafka consumer | No | AWS Lambda | AWS event source mapping handles Kafka; code receives JSON input (I re-used an SQS consumer CloudFormation template from my other projects!) |

| ✓ Low-cost | Typical approach | My work | Advantage |
|---|---|---|---|
| Compute pricing | On-demand; maybe Savings Plan? | Spot discount | No commitment; EC2 Spot discounts are higher than Savings Plan discounts, and Fargate Spot pricing works similarly |
| CPU architecture | Intel x86 | ARM (AWS Graviton) | Better price/performance ratio; same CPU off-load |
| Expensive resources | Always on | Conditional | Develop and test at the lowest AWS cost |

| ✓ CI/CD-ready | Typical approach | My work | Advantage |
|---|---|---|---|
| Image build properties | Hard-coded | Terraform variables | Multiple versions |
| Image build software platform | MacOS | Amazon Linux | Ready for centralized building |
| Private address allocation | Fixed | Flexible | Instead of specifying multiple interdependent address ranges, specify one address space for AWS IP Address Manager (IPAM) to divide |
| Lambda function tests | In files | Central, shared registry | Realistic, centrally-executed tests (see shareable Lambda test) |
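To make the "Small image" and "Secure container" rows concrete, here is a hypothetical Dockerfile fragment, written to a scratch file. It is not this repository's actual Dockerfile, and the module name is a placeholder; it only sketches the pattern: BuildKit cache mounts keep package caches out of the image layers, a build-time-only Python module is installed, used, and uninstalled within a single layer, and the main process runs as a non-root user.

    cat > Dockerfile.example << 'EOF'
    # syntax=docker/dockerfile:1
    # Hypothetical sketch, not this repository's Dockerfile
    FROM amazonlinux:2023

    # Cache mounts persist between builds but never enter the image layers;
    # the temporary module is removed in the same layer that installed it
    RUN --mount=type=cache,target=/var/cache/dnf,sharing=locked \
        --mount=type=cache,target=/root/.cache/pip \
        dnf --assumeyes --setopt=keepcache=True install python3 python3-pip && \
        python3 -m pip install build-time-only-module-placeholder && \
        echo "...use the build-time-only module here..." && \
        python3 -m pip uninstall --yes build-time-only-module-placeholder

    # Run the main process as a non-root user
    USER 1001:1001
    EOF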

Jump to: Recommendations · Licenses

Installation

  1. Choose between AWS CloudShell or an EC2 instance for building the Docker image and running Terraform.

    • CloudShell (easy)

      • Authenticate to the AWS Console. Use a non-production AWS account and a privileged role.

      • Open an AWS CloudShell terminal.

      • Prepare for a cross-platform container image build. CloudShell seems to provide Intel CPUs. The following instructions are from "Multi-platform builds" in the Docker Build manual.

        sudo docker buildx create --name 'container-builder' --driver 'docker-container' --bootstrap --use
        
        sudo docker run --privileged --rm 'tonistiigi/binfmt' --install all
        
      • Review the Terraform S3 backend documentation and create an S3 bucket to store Terraform state.
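
        For example (a minimal sketch; the bucket name and region are placeholders for your own values), a versioned state bucket can be created with:

        aws s3 mb 's3://your-terraform-state-bucket-example' --region 'us-east-1'

        aws s3api put-bucket-versioning --bucket 'your-terraform-state-bucket-example' --versioning-configuration 'Status=Enabled'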

      • If at any time you find that your previous CloudShell session has expired, repeat any necessary software installation steps. Your home directory is preserved between sessions, subject to CloudShell persistent storage limitations.

    • EC2 instance

      EC2 instructions...
      • Create and/or connect to an EC2 instance. I recommend:

        • arm64
        • t4g.micro ⚠ The ARM-based AWS Graviton architecture (the "g" in t4g) avoids multi-platform build complexity.
        • Amazon Linux 2023
        • A 30 GiB EBS volume, with default encryption (supports hibernation)
        • No key pair; connect through Session Manager
        • A custom security group with no ingress rules (yay for Session Manager!)
        • A sched-stop = d=_ H:M=07:00 tag for automatic nightly shutdown (this example corresponds to midnight Pacific Daylight Time) with sqlxpert/lights-off-aws
      • During the instance creation workflow (Advanced details → IAM instance profile → Create new IAM profile) or afterward, give your EC2 instance a custom role. Terraform must be able to list/describe, get tags for, create, tag, untag, update, and delete all of the AWS resource types included in this project's .tf files.

      • Update operating system packages (thanks to AWS's deterministic upgrade philosophy, there shouldn't be any updates if you chose the latest Amazon Linux 2023 image), install Docker, and start it.

        sudo dnf check-update
        sudo dnf --releasever=latest update
        sudo dnf install docker
        sudo systemctl start docker
  2. Install Terraform. I'm standardizing on Terraform v1.10.0 (2024-11-27) as the minimum supported version for my open-source projects.

    sudo dnf --assumeyes install 'dnf-command(config-manager)'
    
    sudo dnf config-manager --add-repo 'https://rpm.releases.hashicorp.com/AmazonLinux/hashicorp.repo'
    # sudo dnf --assumeyes install terraform-1.10.0-1
    sudo dnf --assumeyes install terraform
    
  3. Clone this repository and create terraform.tfvars to customize variables.

    git clone 'https://github.com/sqlxpert/docker-python-openapi-kafka-terraform-cloudformation-aws.git' ~/docker-python-openapi-kafka
    cd ~/docker-python-openapi-kafka/terraform
    touch terraform.tfvars
    
    Generate a terraform.tfvars skeleton...
    # Requires an up-to-date GNU sed (not the MacOS default!)
    sed --regexp-extended --silent  \
        --expression='s/^variable "(.+)" \{$/\n\n# \1 =/p' \
        --expression='s/^  description = "(.+)"$/#\n# \1/p' \
        --expression='s/^  default = (.+)$/#\n# Default: \1/p' variables.tf
    

    Optional: To save money while building the Docker container image, set hello_api_aws_ecs_service_desired_count_tasks = 0 and create_vpc_endpoints_and_load_balancer = false.
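
    For example, a minimal terraform.tfvars containing only those cost-saving settings (both variable names come from this project's variables.tf) could be created like this:

    cat > terraform.tfvars << 'EOF'
    # Temporary cost-saving settings while the container image is built;
    # revert them in Step 7
    hello_api_aws_ecs_service_desired_count_tasks = 0
    create_vpc_endpoints_and_load_balancer        = false
    EOF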

  4. In CloudShell (optional if you chose EC2), create an override file to configure your Terraform S3 backend.

    cat > terraform_override.tf << 'EOF'
    terraform {
      backend "s3" {
        insecure = false
    
        region = "RegionCodeForYourS3Bucket"
        bucket = "NameOfYourS3Bucket"
        key    = "DesiredTerraformStateFileName"
    
        use_lockfile = true # No more DynamoDB; now S3-native!
      }
    }
    EOF
    
  5. Initialize Terraform and create the AWS infrastructure. There's no need for a separate terraform plan step. terraform apply outputs the plan and gives you a chance to approve before anything is done. If you don't like the plan, don't type yes!

    terraform init
    
    terraform apply -target='aws_vpc_ipam_pool_cidr_allocation.hello_api_vpc_private_subnets' -target='aws_vpc_ipam_pool_cidr_allocation.hello_api_vpc_public_subnets'
    
    About this two-stage process...

    CloudPosse's otherwise excellent dynamic-subnets module isn't dynamic enough to co-operate with AWS IP Address Manager (IPAM), so you have to let IPAM finalize subnet IP address range allocations beforehand.

    terraform apply
    
    In case of an "already exists" error...
    • If you receive a "Registry with name lambda-testevent-schemas already exists" error, set create_lambda_testevent_schema_registry = false, then run terraform apply again.
  6. Set environment variables needed for building, tagging and pushing up the Docker container image, then build it.

    AMAZON_LINUX_BASE_VERSION=$(terraform output -raw 'amazon_linux_base_version')
    AMAZON_LINUX_BASE_DIGEST=$(terraform output -raw 'amazon_linux_base_digest')
    AWS_ECR_REGISTRY_REGION=$(terraform output -raw 'hello_api_aws_ecr_registry_region')
    AWS_ECR_REGISTRY_URI=$(terraform output -raw 'hello_api_aws_ecr_registry_uri')
    AWS_ECR_REPOSITORY_URL=$(terraform output -raw 'hello_api_aws_ecr_repository_url')
    HELLO_API_AWS_ECR_IMAGE_TAG=$(terraform output -raw 'hello_api_aws_ecr_image_tag')
    
    HELLO_API_DOMAIN_NAME=$(terraform output -raw 'hello_api_load_balancer_domain_name') # For later
    
    aws ecr get-login-password --region "${AWS_ECR_REGISTRY_REGION}" | sudo docker login --username 'AWS' --password-stdin "${AWS_ECR_REGISTRY_URI}"
    
    cd ../python_docker
    
    sudo docker buildx build --build-arg AMAZON_LINUX_BASE_VERSION="${AMAZON_LINUX_BASE_VERSION}" --build-arg AMAZON_LINUX_BASE_DIGEST="${AMAZON_LINUX_BASE_DIGEST}" --platform='linux/arm64' --tag "${AWS_ECR_REPOSITORY_URL}:${HELLO_API_AWS_ECR_IMAGE_TAG}" --output 'type=docker' .
    
    sudo docker push "${AWS_ECR_REPOSITORY_URL}:${HELLO_API_AWS_ECR_IMAGE_TAG}"
    
    Updating the container image...
    • You can select a newer Amazon Linux release by setting the amazon_linux_base_version and amazon_linux_base_digest variables in Terraform, running terraform apply, and re-setting the environment variables.

      Then, to re-build the image, run HELLO_API_AWS_ECR_IMAGE_TAG='1.0.1' (choose an appropriate new version number, taking semantic versioning into account) in the shell, repeat the build and push commands, set hello_api_aws_ecr_image_tag = "1.0.1" (for example) in Terraform, and run terraform apply one more time.
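
      Putting that together, a re-build and re-push for a hypothetical version 1.0.1 repeats the commands above with only the tag changed:

      HELLO_API_AWS_ECR_IMAGE_TAG='1.0.1'

      sudo docker buildx build --build-arg AMAZON_LINUX_BASE_VERSION="${AMAZON_LINUX_BASE_VERSION}" --build-arg AMAZON_LINUX_BASE_DIGEST="${AMAZON_LINUX_BASE_DIGEST}" --platform='linux/arm64' --tag "${AWS_ECR_REPOSITORY_URL}:${HELLO_API_AWS_ECR_IMAGE_TAG}" --output 'type=docker' .

      sudo docker push "${AWS_ECR_REPOSITORY_URL}:${HELLO_API_AWS_ECR_IMAGE_TAG}"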

  7. If you changed Terraform variables at the end of Step 3, revert the changes and run terraform apply .

  8. In the Amazon Elastic Container Service section of the AWS Console, check the hello_api cluster. Eventually, you should see 2 tasks running.

    Container deployment delay...
    • It will take a few minutes for ECS to notice, and then deploy, the container image. Relax, and let it happen. If you are impatient, or if there is a problem, you can navigate to the hello_api service, open the orange "Update service" pop-up menu, and select "Force new deployment".
  9. Generate the URLs and then test your API.

    echo -e "curl --location --insecure 'http://${HELLO_API_DOMAIN_NAME}/"{'healthcheck','hello','current_time?name=Paul','current_time?name=;echo','error'}"'\n"
    

    Try the different URLs using your Web browser or curl --location --insecure (these options make curl follow redirects and accept the self-signed TLS certificate).

    | Method | Expected result |
    |---|---|
    | /healthcheck | Empty response |
    | /hello | Fixed greeting, in a JSON object |
    | /current_time?name=Paul | Reflected greeting and timestamp, in a JSON object |
    | /current_time?name=;echo | HTTP 400 "bad request" error; demonstrates protection from command injection |
    | /error | HTTP 404 "not found" error |
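
    For example, you can confirm the command-injection protection from the shell; this variant prints only the final HTTP status code (it should print 400):

    curl --location --insecure --silent --output /dev/null --write-out '%{http_code}\n' "http://${HELLO_API_DOMAIN_NAME}/current_time?name=;echo"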
    About redirection to HTTPS, and certificates...

    Your Web browser should redirect you from http: to https: and (let's hope!) warn you about the untrusted, self-signed TLS certificate used in this system (which of course is not tied to a pre-determined domain name). Proceed to view the responses from your new API...

    If your Web browser configuration does not allow accessing Web sites with untrusted certificates, change the enable_https variable in Terraform, run terraform apply twice (don't ask!), and http: links will work without redirection. After you have used https: with a particular site, your browser might no longer allow http: for that site. Try an alternate Web browser if necessary.

  10. Access the hello_api_ecs_task CloudWatch log group in the AWS Console. (hello_api_ecs_cluster is reserved for future use.)

    Periodic internal health checks, plus your occasional Web requests, should appear.

    API access log limitations...

    The Python connexion module, which I chose because it serves an API from a precise OpenAPI-format specification, uses uvicorn workers. Unfortunately, uvicorn has lousy log format customization support.
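
    To follow the same logs from the command line instead of the console (assuming the log group is literally named hello_api_ecs_task, as above):

    aws logs tail 'hello_api_ecs_task' --since '30m' --follow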

  11. If you don't wish to use Kafka, skip to Step 13.

    If you wish to enable Kafka, set enable_kafka = true and run terraform apply. AWS MSK is expensive, so enable Kafka only after confirming that the rest of the system is working for you.

    In case HelloApiKafkaConsumer CloudFormation stack creation fails...

    Creation of the Kafka consumer might fail for various reasons. Once the HelloApiKafkaConsumer CloudFormation stack is in ROLLBACK_COMPLETE status, delete it, then run terraform apply again.
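
    The stack's status can also be checked, and a failed stack deleted, from the command line (the stack name is the one given above):

    aws cloudformation describe-stacks --stack-name 'HelloApiKafkaConsumer' --query 'Stacks[0].StackStatus' --output text

    aws cloudformation delete-stack --stack-name 'HelloApiKafkaConsumer'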

  12. Access the /current_time?name=Paul method several times (adjust the name as you wish). The first use of this method prompts creation of the events Kafka topic. From now on, use of this method (not the others) will send a message to the events Kafka topic.

    The AWS MSK event source mapping reads from the Kafka topic and triggers the consumer Lambda function, which logs decoded Kafka messages to the HelloApiKafkaConsumer CloudWatch log group.
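
    To generate several Kafka messages quickly, a small shell loop over the same method works (the names are arbitrary examples):

    for NAME in 'Paul' 'Dana' 'Kai'; do
      curl --location --insecure "http://${HELLO_API_DOMAIN_NAME}/current_time?name=${NAME}"
    done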

  13. Set the enable_kafka, hello_api_aws_ecs_service_desired_count_tasks, and create_vpc_endpoints_and_load_balancer variables to their cost-saving values if you'd like to continue experimenting. When you are done, delete all resources; even the minimum configuration carries a cost.

    cd ../terraform
    terraform state rm 'aws_schemas_registry.lambda_testevent'
    terraform apply -destroy
    Deletion delays and errors...
    • Deleting a VPC Lambda function takes a long time because of the network association; expect 30 minutes if enable_kafka was true.

    • Expect an error message about retiring KMS encryption key grants (harmless, in this case).

    • If you cancel and re-run terraform apply -destroy , a bug in CloudPosse's dynamic-subnets module might cause a "value depends on resource attributes that cannot be determined until apply" error. For a work-around, edit the cached module file indicated in the error message. Comment out the indicated line and force count = 0 . Be sure to revert this temporary patch later.

Comments

Artificial Intelligence and Large Language Models (LLMs)

This is my own original work, produced without the use of artificial intelligence (AI) and large language model (LLM) code generation. Code from other sources is acknowledged.

Long Option Names

I write long option names in my instructions so that other people don't have to look up unfamiliar single-letter options — assuming they can find them!

Here's an example that shows why I go to the trouble, even at the expense of being laughed at by macho Linux users. I started using UNICOS in 1991, so it's not for lack of experience.

Search for the literal text -t in docs.docker.com/reference/cli/docker/buildx/build, using Command-F, Control-F, /, or grep. Only 2 of 41 occurrences of -t are relevant!

Where available, full-text search engines (that is, ones that do not match literal strings exactly) can't make sense of a 1-letter search term, and are also likely to ignore a 2-character term as a "stop word" that's too short to search for.
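
For instance, a literal search with long options might look like this (GNU grep assumed; the file name is a placeholder for a saved copy of the reference page, and -- stops option parsing so that the pattern isn't read as a flag):

    grep --count --fixed-strings -- '-t' buildx-build-reference.txt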

Recommendations

My professional and ethical commitment is simple: Only as much technology as a business...

  • needs,
  • can afford,
  • understands (or can learn), and
  • can maintain.

Having worked for startups since 2013, I always recommend focusing software engineering effort. It is not possible to do everything, let alone to be good at everything. Managed services, serverless technology, and low-code architecture free software engineers to focus on the core product, that is, on what the company actually sells. Avoid complex infrastructure and tooling unless it offers a unique, tangible, and substantial benefit. Simplicity pays!

Security is easier and cheaper to incorporate at the start than to graft on after the architecture has been finalized, the infrastructure has been templated and created, and the executable code has been written and deployed.

Specialized knowledge of the chosen cloud provider is indispensable. I call it "idiomatic" knowledge, a good part of which is awareness of the range of options supported by your cloud provider. Building generically would mean giving up some performance, some security, and some cloud cost savings. Optimizing later is difficult. "Learn to steer the ship you're on."

Licenses

| Scope | Link | Included copy |
|---|---|---|
| Source code, and source code in documentation | GNU General Public License (GPL) 3.0 | LICENSE_CODE.md |
| Documentation, including this ReadMe file | GNU Free Documentation License (FDL) 1.3 | LICENSE_DOC.md |

Copyright Paul Marcelin

Contact: marcelin at cmu.edu (replace "at" with @)
