This project implements an AWS Lambda function that processes a CSV file uploaded to an S3 bucket and updates a customer master database table. The implementation uses Java 11, Spring Boot, AWS SDK v2, and OpenCSV for parsing.
- src/main/java/com/example/ - Main application code
- config/ - AWS and application configuration
- function/ - Lambda function handler for S3 events
- model/ - JPA entity classes
- repository/ - Spring Data JPA repositories
- service/ - Business logic for CSV processing
- template.yaml - AWS SAM template for deployment
- Streaming CSV processing for memory efficiency with large files (300,000+ rows)
- Batch inserts for performance (configurable batch size of 10,000)
- Transaction management for atomicity
- Comprehensive error handling and logging
- Custom Lambda handler implementation for AWS Lambda compatibility
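The streaming-plus-batching approach listed above can be sketched without Spring or OpenCSV. The following simplified illustration (class and method names are illustrative, not the project's actual code) reads a CSV stream line by line and hands fixed-size batches to a sink, which in the real application would be a JDBC batch insert:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Simplified sketch: stream a CSV and flush fixed-size batches to a sink,
// so memory use stays bounded regardless of file size.
public class StreamingCsvBatcher {

    public static int process(Reader csv, int batchSize,
                              Consumer<List<String[]>> batchSink) throws IOException {
        List<String[]> batch = new ArrayList<>(batchSize);
        int total = 0;
        try (BufferedReader reader = new BufferedReader(csv)) {
            String line;
            while ((line = reader.readLine()) != null) {
                batch.add(line.split(",", -1)); // naive split; OpenCSV handles quoting
                total++;
                if (batch.size() == batchSize) {
                    batchSink.accept(batch);    // flush one full batch
                    batch = new ArrayList<>(batchSize);
                }
            }
        }
        if (!batch.isEmpty()) {
            batchSink.accept(batch);            // flush the remainder
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 25; i++) sb.append("C").append(i).append(",Name").append(i).append("\n");
        int[] flushes = {0};
        int rows = process(new StringReader(sb.toString()), 10, b -> flushes[0]++);
        System.out.println(rows + " rows in " + flushes[0] + " batches");
    }
}
```

With a batch size of 10, 25 rows produce three flushes (10, 10, 5); the production code uses the same shape with a batch size of 10,000.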
- Java 11
- Maven
- AWS SAM CLI
- AWS CLI
- AWS Account with appropriate permissions
Run `aws configure` to set up your AWS credentials and default region:

```
aws configure
AWS Access Key ID [None]: YOUR_ACCESS_KEY
AWS Secret Access Key [None]: YOUR_SECRET_KEY
Default region name [None]: ap-northeast-1
Default output format [None]: json
```
```
mvn clean package
```

This creates a fat JAR at `target/lambda-csv-processor-1.0.0.jar`.
```
sam deploy --guided
```
During the guided deployment, you'll be prompted for parameters including:
- Stack Name - Choose a name for your CloudFormation stack
- AWS Region - The AWS region to deploy to
- Parameter BucketName - Name of the S3 bucket to use (existing or to be created)
- Parameter DatabaseUrl - JDBC URL for your PostgreSQL database
- Parameter DatabaseUsername - Username for database access
- Parameter DatabasePassword - Password for database access (this will be securely handled and not displayed)
- Confirm changes before deploy - Recommended to set to "Y" to review changes
- Allow SAM CLI IAM role creation - Required permissions for Lambda deployment
- Disable rollback - Whether to disable rollback if errors occur
This approach keeps your database credentials secure by avoiding hardcoded values in your source code or template files.
The SAM template automatically configures the Lambda function with the parameters you provide during deployment. The environment variables are set as follows:
```yaml
Environment:
  Variables:
    SPRING_PROFILES_ACTIVE: default
    SPRING_DATASOURCE_URL: !Ref DatabaseUrl
    SPRING_DATASOURCE_USERNAME: !Ref DatabaseUsername
    SPRING_DATASOURCE_PASSWORD: !Ref DatabasePassword
    CUSTOM_REGION: !Ref AwsRegion
    SPRING_CLOUD_FUNCTION_DEFINITION: processCsvFile
```
The parameter values are masked in CloudFormation output, but note that Lambda environment variables remain visible to anyone with read access to the function's configuration; for stricter secrecy, consider AWS Secrets Manager or SSM Parameter Store.
If you're using an existing bucket, CloudFormation cannot directly configure S3 event notifications on it. After deploying the stack, you need to manually configure the bucket to trigger the Lambda function when CSV files are uploaded. Note that S3 must also be allowed to invoke the function; if the notification call below fails with a permissions error, grant the invoke permission first with `aws lambda add-permission`.
Using AWS CLI:
```
aws s3api put-bucket-notification-configuration \
  --bucket YOUR_BUCKET_NAME \
  --notification-configuration '{
    "LambdaFunctionConfigurations": [
      {
        "LambdaFunctionArn": "YOUR_LAMBDA_FUNCTION_ARN",
        "Events": ["s3:ObjectCreated:*"],
        "Filter": {
          "Key": {
            "FilterRules": [
              { "Name": "suffix", "Value": ".csv" }
            ]
          }
        }
      }
    ]
  }'
```
Example:

```
aws s3api put-bucket-notification-configuration \
  --bucket my-csv-bucket-6-6 \
  --notification-configuration '{"LambdaFunctionConfigurations":[{"LambdaFunctionArn":"arn:aws:lambda:ap-northeast-1:446556758604:function:sam-csv-test-CsvProcessorFunction-QeEHqTHO9oIn","Events":["s3:ObjectCreated:*"],"Filter":{"Key":{"FilterRules":[{"Name":"suffix","Value":".csv"}]}}}]}'
```
Replace:
- `YOUR_BUCKET_NAME` with the name of your existing S3 bucket
- `YOUR_LAMBDA_FUNCTION_ARN` with the ARN of the deployed Lambda function (available in the CloudFormation stack outputs)
Using AWS Management Console:
- Navigate to Amazon S3 in the AWS Console
- Select your bucket and go to "Properties"
- Scroll down to "Event notifications" and click "Create event notification"
- Configure the event:
- Enter a name for the event
- Select "All object create events" under "Event types"
- Under "Destination", select "Lambda function"
- Choose your deployed Lambda function
- Under "Filter", enter ".csv" as a suffix
- Click "Save changes"
After deployment and notification configuration, you can test the function by uploading a CSV file to the configured S3 bucket. The Lambda function will be triggered automatically.
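To generate a quick test file, a small Java helper can write sample rows. The two-column layout (`customer_code,customer_name`) is an assumption matching the `customer_master` schema described later; adjust it if your actual file format differs, and note the class itself is illustrative, not part of the project:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Writes a small sample CSV for testing the upload trigger.
// Assumed layout: customer_code,customer_name (no header row).
public class SampleCsvGenerator {

    public static Path generate(Path target, int rows) throws IOException {
        List<String> lines = new ArrayList<>(rows);
        for (int i = 0; i < rows; i++) {
            String code = String.format("C%03d", i); // 4-char code, per VARCHAR(4)
            lines.add(code + ",Customer " + i);
        }
        return Files.write(target, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = generate(Path.of("customers.csv"), 5);
        System.out.println(Files.readAllLines(file).size() + " lines written to " + file);
    }
}
```

Upload the resulting file with `aws s3 cp customers.csv s3://YOUR_BUCKET_NAME/` and then check the function's CloudWatch logs for processing output.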
- The application processes CSV files in streaming mode to handle large files
- Batch inserts (10,000 records per batch) are used, with Hibernate configured for JDBC batching, for better database performance
- For very large files, the Lambda function's memory allocation can be increased in the SAM template
Important configurations in the Lambda function:
- Custom Handler: the custom handler class `com.example.function.LambdaHandler` processes S3 events
- Memory Allocation: defaults to 1024 MB, but may need to be increased for large CSV files
- Timeout: set to 15 minutes (900 seconds) to ensure enough time for processing large files
- IAM Roles: the Lambda function needs permissions to:
  - Read from S3
  - Write to CloudWatch Logs
  - Connect to your RDS database
To monitor your Lambda function's performance and debug issues:
```
# using sam cli
sam logs -n CsvProcessorFunction --stack-name sam-customers-csv --tail
```
- IAM Roles: Use the principle of least privilege for Lambda IAM roles
- Database Connection: Use connection pooling and consider database proxies for high-volume scenarios
- Error Handling: Implement comprehensive error handling and notifications
- Monitoring: Set up CloudWatch alarms for errors and performance thresholds
- Security: Ensure all sensitive data in environment variables is encrypted
- Cost Optimization: Review Lambda execution time and memory usage to optimize costs
The Lambda function expects a PostgreSQL database with the following schema:
```sql
CREATE TABLE customer_master (
    customer_code VARCHAR(4) PRIMARY KEY,
    customer_name VARCHAR(50) NOT NULL
);
```
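Rows that violate these column constraints (codes longer than four characters, missing or over-long names) will fail insertion, so validating parsed rows before batching is worthwhile. A hypothetical helper, not part of the project code, might look like:

```java
// Validates a parsed CSV row against the customer_master column
// constraints: customer_code VARCHAR(4) PRIMARY KEY, customer_name
// VARCHAR(50) NOT NULL. Hypothetical helper for illustration only.
public class RowValidator {

    public static boolean isValid(String[] row) {
        if (row == null || row.length != 2) return false;
        String code = row[0];
        String name = row[1];
        if (code == null || code.isEmpty() || code.length() > 4) return false;
        return name != null && !name.isEmpty() && name.length() <= 50;
    }

    public static void main(String[] args) {
        System.out.println(isValid(new String[] {"C001", "Acme Corp"})); // valid row
        System.out.println(isValid(new String[] {"TOOLONG", "Acme"}));   // code exceeds 4 chars
    }
}
```

Invalid rows can then be logged and skipped, or the batch rejected, depending on how strictly the all-or-nothing transaction semantics should apply.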
The Lambda function uses these AWS managed policies in the SAM template:
- AmazonS3ReadOnlyAccess - provides read access to S3 buckets for CSV file processing
- AmazonRDSFullAccess - grants access to the RDS management API; note that this is broader than the function needs (database connections themselves are authorized by the JDBC credentials and network configuration), so consider scoping it down in line with the least-privilege recommendation above
These managed policies are applied in the template.yaml file.
- The application handles large CSV files (300,000+ rows) by streaming and batch processing
- Batch processing occurs every 10,000 records for optimal performance
- Transactions ensure all-or-nothing updates to maintain data integrity
- Error handling captures and logs issues with CSV parsing or database operations
- The custom Lambda handler initializes the Spring context once per container lifecycle
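The once-per-container initialization mentioned above follows a common Lambda pattern: expensive setup runs in a static initializer so that warm invocations reuse it. A simplified sketch without Spring or AWS classes (all names here are illustrative; a counter stands in for the Spring context):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the container-lifecycle pattern used by custom Lambda
// handlers: heavy setup runs once per JVM (i.e. once per Lambda
// container), not once per event.
public class LambdaHandlerSketch {

    static final AtomicInteger CONTEXT_INITS = new AtomicInteger();

    // Stand-in for building the Spring application context; the static
    // initializer runs exactly once when the class is loaded.
    private static final String CONTEXT = initContext();

    private static String initContext() {
        CONTEXT_INITS.incrementAndGet();
        return "spring-context";
    }

    // Stand-in for handleRequest(S3Event, Context).
    public String handleRequest(String s3Key) {
        return CONTEXT + " processed " + s3Key;
    }

    public static void main(String[] args) {
        LambdaHandlerSketch handler = new LambdaHandlerSketch();
        handler.handleRequest("a.csv"); // cold invocation triggers the init
        handler.handleRequest("b.csv"); // warm invocation reuses the context
        System.out.println("context initialized " + CONTEXT_INITS.get() + " time(s)");
    }
}
```

This is why the first (cold-start) invocation is noticeably slower than subsequent ones: the Spring context build is paid once, and every later event on the same container skips it.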