Lambda CSV Processor

This project implements an AWS Lambda function that processes a CSV file uploaded to an S3 bucket and updates a customer master database table. The implementation uses Java 11, Spring Boot, AWS SDK v2, and OpenCSV for parsing.

Project Structure

  • src/main/java/com/example/ - Main application code
    • config/ - AWS and application configuration
    • function/ - Lambda function handler for S3 events
    • model/ - JPA entity classes
    • repository/ - Spring Data JPA repositories
    • service/ - Business logic for CSV processing
  • template.yaml - AWS SAM template for deployment

Features

  • Streaming CSV processing for memory efficiency with large files (300,000+ rows); see the sketch after this list
  • Batch inserts for performance (configurable batch size, 10,000 by default)
  • Transaction management for atomicity
  • Comprehensive error handling and logging
  • Custom Lambda handler implementation for AWS Lambda compatibility
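The streaming-plus-batching loop lives in the service/ package. A minimal sketch of its shape, assuming hypothetical Customer and CustomerMasterRepository names (the project's actual classes may differ):

import com.opencsv.CSVReader;
import org.springframework.stereotype.Service;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;

import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

@Service
public class CsvProcessingService {

    private static final int BATCH_SIZE = 10_000;

    private final S3Client s3Client = S3Client.create();
    private final CustomerMasterRepository repository; // assumed name

    public CsvProcessingService(CustomerMasterRepository repository) {
        this.repository = repository;
    }

    public void processCsvFile(String bucket, String key) throws Exception {
        GetObjectRequest request = GetObjectRequest.builder()
                .bucket(bucket)
                .key(key)
                .build();
        // Stream the object body so a 300,000-row file never has to
        // fit in memory all at once.
        try (CSVReader reader = new CSVReader(new InputStreamReader(
                s3Client.getObject(request), StandardCharsets.UTF_8))) {
            List<Customer> batch = new ArrayList<>();
            String[] row;
            while ((row = reader.readNext()) != null) {
                batch.add(new Customer(row[0], row[1]));
                if (batch.size() >= BATCH_SIZE) {
                    repository.saveAll(batch); // one batched round trip
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) {
                repository.saveAll(batch); // remaining partial batch
            }
        }
    }
}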

Prerequisites

  • Java 11
  • Maven
  • AWS SAM CLI
  • AWS CLI
  • AWS Account with appropriate permissions

Set Up the AWS CLI

Run aws configure and enter your access key, secret key, default region, and output format when prompted. Example:
aws configure
AWS Access Key ID [None]: YOUR_ACCESS_KEY
AWS Secret Access Key [None]: YOUR_SECRET_KEY
Default region name [None]: ap-northeast-1
Default output format [None]: json

Build & Deploy

1. Build the Lambda Function

mvn clean package

This creates a fat JAR at target/lambda-csv-processor-1.0.0.jar.

2. Deploy to AWS using SAM

sam deploy --guided

During the guided deployment, you'll be prompted for parameters including:

  • Stack Name - Choose a name for your CloudFormation stack
  • AWS Region - The AWS region to deploy to
  • Parameter BucketName - Name of the S3 bucket to use (existing or to be created)
  • Parameter DatabaseUrl - JDBC URL for your PostgreSQL database
  • Parameter DatabaseUsername - Username for database access
  • Parameter DatabasePassword - Password for database access (this will be securely handled and not displayed)
  • Confirm changes before deploy - Recommended to set to "Y" to review changes
  • Allow SAM CLI IAM role creation - Required so SAM can create the IAM roles the Lambda function needs
  • Disable rollback - Whether to disable rollback if errors occur

This approach keeps your database credentials secure by avoiding hardcoded values in your source code or template files.
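
At the end of the guided deploy, SAM offers to save your answers to samconfig.toml so later deployments can run as plain sam deploy. The file looks roughly like this (all values are illustrative; keep secrets such as the database password out of any file you commit):

version = 0.1
[default.deploy.parameters]
stack_name = "sam-customers-csv"
region = "ap-northeast-1"
confirm_changeset = true
capabilities = "CAPABILITY_IAM"
parameter_overrides = "BucketName=\"my-csv-bucket\" DatabaseUrl=\"jdbc:postgresql://HOST:5432/DB\" DatabaseUsername=\"USER\""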

3. Lambda Configuration

The SAM template automatically configures the Lambda function with the parameters you provide during deployment. The environment variables are set as follows:

Environment:
  Variables:
    SPRING_PROFILES_ACTIVE: default
    SPRING_DATASOURCE_URL: !Ref DatabaseUrl
    SPRING_DATASOURCE_USERNAME: !Ref DatabaseUsername
    SPRING_DATASOURCE_PASSWORD: !Ref DatabasePassword
    CUSTOM_REGION: !Ref AwsRegion
    SPRING_CLOUD_FUNCTION_DEFINITION: processCsvFile

The parameter values are not echoed during deployment. Note, however, that Lambda environment variables are visible in the function's configuration to anyone with read access to it, so restrict that access accordingly.
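
Spring Boot's relaxed binding maps these environment variables onto the usual configuration keys, so no extra wiring is needed; SPRING_DATASOURCE_URL, for example, overrides spring.datasource.url. The equivalent application.properties entries would be (HOST, DB, USER, and SECRET are placeholders):

spring.datasource.url=jdbc:postgresql://HOST:5432/DB
spring.datasource.username=USER
spring.datasource.password=SECRET
spring.cloud.function.definition=processCsvFile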

4. Configure S3 Bucket Notifications

For Existing Buckets

If you're using an existing bucket, CloudFormation cannot directly configure S3 event notifications on it. After deploying the stack, you need to manually configure the bucket to trigger the Lambda function when CSV files are uploaded:

Using AWS CLI:

aws s3api put-bucket-notification-configuration \
  --bucket YOUR_BUCKET_NAME \
  --notification-configuration '{
    "LambdaFunctionConfigurations": [
      {
        "LambdaFunctionArn": "YOUR_LAMBDA_FUNCTION_ARN",
        "Events": ["s3:ObjectCreated:*"],
        "Filter": {
          "Key": {
            "FilterRules": [
              {
                "Name": "suffix",
                "Value": ".csv"
              }
            ]
          }
        }
      }
    ]
  }'

Example:

aws s3api put-bucket-notification-configuration \
  --bucket my-csv-bucket-6-6 \
  --notification-configuration '{"LambdaFunctionConfigurations":[{"LambdaFunctionArn":"arn:aws:lambda:ap-northeast-1:446556758604:function:sam-csv-test-CsvProcessorFunction-QeEHqTHO9oIn","Events":["s3:ObjectCreated:*"],"Filter":{"Key":{"FilterRules":[{"Name":"suffix","Value":".csv"}]}}}]}'

Replace:

  • YOUR_BUCKET_NAME with the name of your existing S3 bucket
  • YOUR_LAMBDA_FUNCTION_ARN with the ARN of the deployed Lambda function (available in the CloudFormation stack outputs)
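
Note that S3 can only invoke a Lambda function that has granted it permission. SAM usually creates this resource-based permission automatically for buckets declared in the template; for a pre-existing bucket, you may need to add it yourself before the notification configuration takes effect:

aws lambda add-permission \
  --function-name YOUR_LAMBDA_FUNCTION_ARN \
  --statement-id s3-invoke \
  --action lambda:InvokeFunction \
  --principal s3.amazonaws.com \
  --source-arn arn:aws:s3:::YOUR_BUCKET_NAME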

Using AWS Management Console:

  1. Navigate to Amazon S3 in the AWS Console
  2. Select your bucket and go to "Properties"
  3. Scroll down to "Event notifications" and click "Create event notification"
  4. Configure the event:
    • Enter a name for the event
    • Select "All object create events" under "Event types"
    • Under "Destination", select "Lambda function"
    • Choose your deployed Lambda function
    • Under "Filter", enter ".csv" as a suffix
  5. Click "Save changes"

5. Test the Lambda Function in AWS

After deployment and notification configuration, you can test the function by uploading a CSV file to the configured S3 bucket. The Lambda function will be triggered automatically.
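
For example, assuming a local file named sample.csv:

aws s3 cp sample.csv s3://YOUR_BUCKET_NAME/

Processing output then appears in the function's CloudWatch logs (see Monitoring and Logs below).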

Performance Considerations

  • The application processes CSV files in streaming mode to handle large files
  • Batch inserts (10,000 records per batch) reduce database round trips
  • Hibernate is configured for JDBC batching; see the example after this list
  • For very large files, the Lambda function's memory allocation can be increased in the SAM template
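
The Hibernate batching mentioned above is typically enabled with properties along these lines (these are the standard Spring/Hibernate keys; the exact values in this project may differ):

spring.jpa.properties.hibernate.jdbc.batch_size=10000
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true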

AWS Lambda Configuration

Important configurations in the Lambda function:

  1. Custom Handler: A custom handler class, com.example.function.LambdaHandler, processes S3 events (see the sketch after this list)

  2. Memory Allocation: Set to 1024 MB by default; this may need to be increased for large CSV files

  3. Timeout: Set to 15 minutes (900 seconds) to ensure enough time for processing large files

  4. IAM Roles: The Lambda function needs permissions to:

    • Read from S3
    • Write to CloudWatch logs
    • Connect to your RDS database
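
A minimal sketch of such a handler, assuming a hypothetical Application entry class and the CsvProcessingService bean sketched earlier (names are illustrative, not the project's actual ones):

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.S3Event;
import org.springframework.boot.SpringApplication;
import org.springframework.context.ConfigurableApplicationContext;

public class LambdaHandler implements RequestHandler<S3Event, String> {

    // Static initialization runs once per Lambda container, so warm
    // invocations reuse the same Spring context.
    private static final ConfigurableApplicationContext CONTEXT =
            SpringApplication.run(Application.class);

    @Override
    public String handleRequest(S3Event event, Context context) {
        CsvProcessingService service = CONTEXT.getBean(CsvProcessingService.class);
        event.getRecords().forEach(record -> {
            // Note: object keys arrive URL-encoded in S3 events.
            String bucket = record.getS3().getBucket().getName();
            String key = record.getS3().getObject().getKey();
            try {
                service.processCsvFile(bucket, key);
            } catch (Exception e) {
                // Rethrow so the invocation fails visibly in CloudWatch.
                throw new RuntimeException(
                        "Failed to process s3://" + bucket + "/" + key, e);
            }
        });
        return "OK";
    }
}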

Monitoring and Logs

To monitor your Lambda function's performance and debug issues:

# Tail the function's logs using the SAM CLI (substitute your stack name)
sam logs -n CsvProcessorFunction --stack-name YOUR_STACK_NAME --tail

Best Practices

  • IAM Roles: Use the principle of least privilege for Lambda IAM roles
  • Database Connection: Use connection pooling and consider database proxies for high-volume scenarios
  • Error Handling: Implement comprehensive error handling and notifications
  • Monitoring: Set up CloudWatch alarms for errors and performance thresholds
  • Security: Ensure all sensitive data in environment variables is encrypted
  • Cost Optimization: Review Lambda execution time and memory usage to optimize costs

Additional Configuration

Database Schema

The Lambda function expects a PostgreSQL database with the following schema:

CREATE TABLE customer_master (
  customer_code VARCHAR(4) PRIMARY KEY,
  customer_name VARCHAR(50) NOT NULL
);
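
The JPA entity in model/ presumably maps this table along these lines (a sketch; class and field names are assumptions):

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity
@Table(name = "customer_master")
public class Customer {

    @Id
    @Column(name = "customer_code", length = 4)
    private String customerCode;

    @Column(name = "customer_name", length = 50, nullable = false)
    private String customerName;

    protected Customer() {
        // No-arg constructor required by JPA
    }

    public Customer(String customerCode, String customerName) {
        this.customerCode = customerCode;
        this.customerName = customerName;
    }

    // Getters and setters omitted for brevity
}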

IAM Permissions

The Lambda function uses these AWS managed policies in the SAM template:

  • AmazonS3ReadOnlyAccess - Provides read access to S3 buckets for CSV file processing
  • AmazonRDSFullAccess - Grants broad RDS access for the customer data updates; per the least-privilege practice above, consider scoping this down

These managed policies are applied in the template.yaml file.

Additional Notes

  • The application handles large CSV files (300,000+ rows) by streaming and batch processing
  • Batch processing occurs every 10,000 records for optimal performance
  • Transactions ensure all-or-nothing updates to maintain data integrity
  • Error handling captures and logs issues with CSV parsing or database operations
  • The custom Lambda handler initializes the Spring context once per container lifecycle
