Serverless CSV-Processor is a serverless application that processes large CSV files stored in an Amazon S3 bucket, maps the headers according to a configuration file, and pushes the records to an Amazon SQS queue. The project is built on AWS Lambda and written in Go.
- Triggered automatically when a new CSV file is uploaded to the S3 bucket
- Processes large CSV files without loading the entire file into memory
- Efficient processing using concurrent workers
- Maps CSV headers to desired output format using a configuration file
- Pushes processed records to an Amazon SQS queue for further processing
- Moves processed CSV files to an archive folder specified in the configuration file
- Go 1.11 or later
- AWS CLI
- AWS account with access to Lambda, S3, and SQS services
- aws-lambda-go package
The project is organized into the following directory structure and files:
```
serverless-csv-processor/
├── cmd/
│   └── lambda/
│       └── main.go
├── internal/
│   ├── config/
│   │   └── config.go
│   ├── csv/
│   │   └── csv.go
│   ├── handler/
│   │   └── handler.go
│   └── sqs/
│       └── sqs.go
└── go.mod
```
- `cmd/lambda/main.go`: The entry point for the AWS Lambda function. It imports the `handler` package and starts the Lambda function with the `HandleS3Event` function.
- `internal/`: Contains the internal packages that implement the core functionality of the Lambda function.
- `internal/config/`: Handles parsing and loading of the configuration file from the S3 bucket. Contains the `config.go` file, which defines the `Config` struct and provides the `LoadConfig` function.
- `internal/csv/`: Responsible for processing CSV files. Contains the `csv.go` file, which provides functions for reading CSV files from S3, parsing them, and applying the header mapping from the configuration file.
- `internal/handler/`: Contains the `handler.go` file, which implements the main `HandleS3Event` function that is triggered by the S3 event. It uses the other internal packages to download and parse the configuration file, process the CSV file, and send the data to an SQS queue.
- `internal/sqs/`: Responsible for sending data to an SQS queue. Contains the `sqs.go` file, which provides the `ProcessRows` function that sends each row of the processed CSV file to the specified SQS queue.
- `go.mod`: Defines the Go module and its dependencies.
- Clone the repository to your local machine.
```bash
git clone https://github.com/username/ServerlessCSVProcessor.git
cd ServerlessCSVProcessor
```
- Initialize a new Go module by choosing a module name, which is usually the import path for your project. The module name should be unique to avoid conflicts with other projects; a common convention is to use your repository URL, like `github.com/username/project-name`. Replace `<module-name>` with a suitable name for your module, such as `github.com/johndoe/ServerlessCSVProcessor`.

```bash
go mod init <module-name>
```
- Download and install the required dependencies, including the `aws-lambda-go` package.

```bash
go get -u github.com/aws/aws-lambda-go
go get -u github.com/aws/aws-sdk-go/aws
go get -u github.com/aws/aws-sdk-go/aws/session
go get -u github.com/aws/aws-sdk-go/service/s3
go get -u github.com/aws/aws-sdk-go/service/s3/s3manager
go get -u github.com/aws/aws-sdk-go/service/sqs
```
- Compile the Lambda entry point (`cmd/lambda/main.go`) to create the Lambda binary.

```bash
GOOS=linux GOARCH=amd64 go build -o main ./cmd/lambda
```
- Follow the deployment instructions in the Deployment section to set up your serverless CSV processor on AWS.
- Upload your CSV file to the configured S3 bucket.
- Add a `config.json` file to the same S3 bucket, containing the header mappings and SQS queue URL. For example:
```json
{
  "header_mapping": {
    "OriginalHeader1": "MappedHeader1",
    "OriginalHeader2": "MappedHeader2"
  },
  "sqs_queue_url": "https://sqs.region.amazonaws.com/your-account-id/your-queue-name",
  "archive_folder": "archive/2023"
}
```
- Once the CSV file is uploaded, the Lambda function will be triggered automatically, processing the CSV and pushing the records to the specified SQS queue.
AWS Management Console: You can manually create and configure the necessary AWS resources through the console:
- Create an S3 bucket to store your CSV files and the configuration file.
- Create an SQS queue to receive the parsed CSV records.
- Compile your `main.go` file to create the Lambda binary.
- Create a Lambda function, set the runtime to "Go", and upload the compiled binary.
- In the Lambda function configuration, add an S3 trigger with the appropriate event type (e.g., "ObjectCreated"), and specify the S3 bucket you created earlier.
- Configure the necessary IAM roles and permissions for Lambda, S3, and SQS.
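As a starting point for the last step, the Lambda execution role needs roughly the permissions below: reading CSV and config objects, writing and deleting objects for the archive move, sending to the queue, and writing CloudWatch logs. This is an illustrative policy, not the project's canonical one; replace the bucket name and queue ARN with your own and tighten the resources as needed.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::your-bucket-name/*"
    },
    {
      "Effect": "Allow",
      "Action": "sqs:SendMessage",
      "Resource": "arn:aws:sqs:region:your-account-id:your-queue-name"
    },
    {
      "Effect": "Allow",
      "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
      "Resource": "*"
    }
  ]
}
```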
There are multiple ways to deploy the serverless CSV processor to AWS. Refer to the AWS Lambda Deployment documentation for a detailed guide on deployment methods.
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
The project is available as open source under the terms of the MIT License.