This repository contains components for a logging service using FastAPI and AWS services.
The FastAPI application receives log messages and forwards them to an SQS queue. It is designed to handle incoming requests efficiently.
The worker component consumes log messages from the SQS queue and stores them in a PostgreSQL database. It ensures that messages are processed and stored reliably.
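The hand-off between the two components can be sketched as below. This is a minimal illustration, not the repository's actual code: `enqueue_log`, `drain_once`, and the `store` callback are hypothetical names, and `sqs_client` is assumed to be a boto3 SQS client (only `send_message`, `receive_message`, and `delete_message` are used).

```python
import json


def enqueue_log(sqs_client, queue_url, log):
    """API side: serialize one log entry and push it onto the queue."""
    sqs_client.send_message(QueueUrl=queue_url, MessageBody=json.dumps(log))


def drain_once(sqs_client, queue_url, store):
    """Worker side: receive one batch, store each log, then delete it.

    A message is deleted only after store() returns, so a failed write
    leaves it on the queue to be redelivered after the visibility timeout.
    """
    response = sqs_client.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # SQS returns at most 10 messages per call
        WaitTimeSeconds=20,      # long polling; 20 seconds is the maximum
    )
    handled = 0
    for message in response.get("Messages", []):
        # Hypothetical callback, e.g. an INSERT into PostgreSQL.
        store(json.loads(message["Body"]))
        sqs_client.delete_message(
            QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
        )
        handled += 1
    return handled
```

Deleting after a successful write gives at-least-once delivery, so the storage step may see the same log twice and would ideally be idempotent.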
A stress testing script using k6 is provided to simulate traffic to the FastAPI application:
- Ramp-up: Increases Virtual Users (VUs) to a specified amount over 1 minute.
- Steady-state: Maintains a constant load (specified VUs) for 3 minutes.
- Ramp-down: Decreases VUs to 0 over 1 minute.
Each VU sends logs back to back, issuing the next request as soon as it receives a response to the previous one.
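To make the load profile concrete, the sketch below computes the target VU count at a given moment of the test. It assumes the ramps are linear between stage targets; `target_vus` and its parameters are illustrative names and not part of the k6 script.

```python
def target_vus(elapsed_s, peak_vus, ramp_s=60, steady_s=180):
    """Return the target VU count elapsed_s seconds into the test.

    Assumes linear ramps: 1 minute up, 3 minutes steady, 1 minute down.
    """
    total_s = ramp_s + steady_s + ramp_s
    if elapsed_s < 0 or elapsed_s >= total_s:
        return 0.0
    if elapsed_s < ramp_s:
        # Ramp-up: grow linearly from 0 to peak_vus.
        return peak_vus * elapsed_s / ramp_s
    if elapsed_s < ramp_s + steady_s:
        # Steady state: hold the peak.
        return float(peak_vus)
    # Ramp-down: shrink linearly from peak_vus to 0.
    return peak_vus * (total_s - elapsed_s) / ramp_s
```

With a peak of 100 VUs, for example, the target is 50 VUs half a minute into the test and again half a minute before it ends.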
The CDK script automates the deployment of the entire system. It provisions the FastAPI application, the worker, the SQS queue, and the PostgreSQL database, along with a CloudWatch dashboard for monitoring key metrics.
Ensure you have the following installed on your local machine:
- Clone the Repository:
    git clone <repository-url>
    cd <repository-directory>
- Run docker-compose:
  Navigate to the root directory of your project, where docker-compose.yml is located, and execute the following commands:
    docker-compose build
    docker-compose up
These commands build and start all the required services. The API will be available at http://localhost:8000/
Ensure you have the following installed and configured:
- Set AWS Credentials:
  Export your AWS credentials as environment variables:
    export AWS_ACCESS_KEY_ID=<your-access-key-id>
    export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
    export AWS_DEFAULT_REGION=<your-aws-region>
- Navigate to CDK Folder:
  Go to the directory where the CDK project is located:
    cd cdk
- Bootstrap your AWS environment:
    cdk bootstrap
- Set Docker Platform (for Mac M1/M2):
  If you are using a Mac M1 or M2 with Docker, export the Docker default platform:
    export DOCKER_DEFAULT_PLATFORM="linux/amd64"
- Create and Activate Virtual Environment:
  Create a virtual environment and activate it:
    python3 -m venv .env
    source .env/bin/activate
- Install Requirements:
  Install the required Python dependencies:
    pip install -r requirements.txt
- Deploy Using CDK:
  Deploy your application using CDK:
    cdk deploy
- Find the IP of the FastAPI application:
  Run the following command:
    aws elbv2 describe-load-balancers --query "LoadBalancers[*].[LoadBalancerName,DNSName]" --output table
  Look for the load balancer associated with your CDK stack and resolve its DNS name with:
    nslookup <your-load-balancer-dns-name>
Refer to the stress-test README
Currently, the CDK setup includes only one public subnet. For better security and architecture design:
- The Load Balancer for the FastAPI application should be in a public subnet.
- All other resources, including the ECS tasks and the database, should be placed in private subnets.
To ensure the reliability and maintainability of the application:
- Unit Tests: Write unit tests for individual functions and components to verify their behavior in isolation.
- Integration Tests: Write integration tests to ensure that different parts of the application work together as expected. This includes testing the FastAPI endpoints, the SQS integration, and the database interactions.
To maintain performance and reliability under varying loads:
- Define thresholds for CPU usage, memory usage, and the number of messages in the SQS queue to trigger auto-scaling actions for the FastAPI and worker services.
- Set up CloudWatch alarms to monitor these metrics and trigger scaling policies.
- Configure notifications to alert when thresholds are breached or scaling actions are performed.
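As an illustration of a queue-depth threshold, the sketch below derives a worker count from the queue backlog. The per-worker target and the bounds are assumptions chosen for the example; in a real deployment the backlog would typically come from the SQS `ApproximateNumberOfMessagesVisible` CloudWatch metric, and `desired_worker_count` is a hypothetical helper.

```python
import math


def desired_worker_count(visible_messages, target_per_worker,
                         min_workers=1, max_workers=10):
    """Scale workers so each handles about target_per_worker queued messages."""
    if target_per_worker <= 0:
        raise ValueError("target_per_worker must be positive")
    wanted = math.ceil(visible_messages / target_per_worker)
    # Clamp to the configured bounds so scaling never drops to zero
    # or grows without limit.
    return max(min_workers, min(max_workers, wanted))
```

With a target of 100 messages per worker, a backlog of 250 messages would call for 3 workers.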
The current project lacks an endpoint to retrieve logs after they have been processed and stored. Implementing a read endpoint for logs with Redis as a cache can provide fast access to recent logs, reducing the load on the PostgreSQL database.
The simplest read endpoint would allow fetching a log by its ID. However, this approach may not be very practical since the endpoint to create logs currently does not return an ID.
A more practical solution would be to retrieve the last N logs sorted by timestamp in descending order. Additionally, the endpoint could support an offset parameter for pagination.
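The query behind such an endpoint could look like the sketch below. It uses an in-memory SQLite database purely as a stand-in for PostgreSQL so the example is self-contained; the `logs` table, its columns, and the `last_logs` function are assumptions, not the project's actual schema.

```python
import sqlite3


def last_logs(conn, limit, offset=0):
    """Return `limit` logs, newest first, skipping the first `offset`."""
    rows = conn.execute(
        "SELECT id, timestamp, message FROM logs "
        "ORDER BY timestamp DESC, id DESC LIMIT ? OFFSET ?",
        (limit, offset),
    )
    return [{"id": r[0], "timestamp": r[1], "message": r[2]} for r in rows]


# Demo data: five logs with increasing timestamps.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logs (id INTEGER PRIMARY KEY, timestamp REAL, message TEXT)"
)
conn.executemany(
    "INSERT INTO logs (timestamp, message) VALUES (?, ?)",
    [(float(i), f"log {i}") for i in range(1, 6)],
)

first_page = last_logs(conn, limit=2)             # the two newest logs
second_page = last_logs(conn, limit=2, offset=2)  # the next two
```

Sorting on `id` as a tie-breaker keeps pages stable when several logs share a timestamp. Offset pagination can still shift while new logs arrive between requests.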
One possible solution is to cache recent logs in Redis to optimize performance:
- Store the last N minutes of logs in Redis, with each key set to expire once its log's timestamp is more than N minutes old.
- Use Redis to fetch logs based on timestamp ranges.
- If Redis does not contain the complete dataset, fall back to querying PostgreSQL.
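The expiry and fallback rules above can be sketched as follows. The sketch assumes timestamps are epoch seconds, that every incoming log is written to the cache, and that `client` behaves like a redis-py client (only `set(name, value, ex=...)` is used). The `log:<id>` key layout and the `read_cache` / `read_db` callbacks are hypothetical.

```python
def cache_log(client, log_id, payload, log_ts, window_s, now):
    """Cache one log until its timestamp is window_s seconds old."""
    ttl = int(window_s - (now - log_ts))
    if ttl <= 0:
        return False  # already outside the window, not worth caching
    client.set(f"log:{log_id}", payload, ex=ttl)  # SET key value EX ttl
    return True


def logs_since(since_ts, now, window_s, read_cache, read_db):
    """Serve a time-range read from the cache when the cache can cover it."""
    if since_ts >= now - window_s:
        # The whole range lies inside the cached window.
        return read_cache(since_ts)
    # Part of the range is older than the window: fall back to PostgreSQL.
    return read_db(since_ts)
```

Deriving the TTL from the log's own timestamp, rather than from the time of insertion, keeps late-arriving logs from outliving the window.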
Another approach is to maintain a sorted set in Redis where each log entry is scored by its timestamp. This allows efficient retrieval of the last M logs without needing to query PostgreSQL unless necessary.
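A minimal sketch of the sorted-set variant follows, assuming `client` is a redis-py client (the calls used are `zadd`, `zremrangebyrank`, and `zrevrange`). The function names, the key, and the `query_db` fallback callback are hypothetical.

```python
def remember_log(client, key, payload, ts, max_len):
    """Add a log scored by its timestamp and keep only the newest max_len."""
    client.zadd(key, {payload: ts})
    # Ranks run from lowest score (oldest) to highest, so removing ranks
    # 0 .. -(max_len + 1) drops everything but the max_len newest members.
    client.zremrangebyrank(key, 0, -(max_len + 1))


def last_logs_cached(client, key, m, query_db):
    """Return the last m logs from Redis, or from the database if short."""
    cached = client.zrevrange(key, 0, m - 1)  # highest score (newest) first
    if len(cached) >= m:
        return cached
    # The cache holds fewer than m entries: fall back to PostgreSQL.
    return query_db(m)
```

Sorted-set members are unique, so two logs with identical payloads would collapse into one entry; including a unique ID in the payload avoids that. Note also that redis-py returns members as bytes unless the client is created with `decode_responses=True`.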