Many industries need reports generated on a schedule. Producing them by hand is repetitive, time-consuming and prone to human error.
This is the problem I set out to solve with this project.
Automating report generation saves employees time and ensures that every report is produced in the same tested way, which reduces human error and keeps data and report quality consistent. Employees can then focus their time and attention on more important work.
The project has two core components:
- An AWS Lambda function that runs on a schedule, fetches data from an external API (CoinGecko) and uploads it to an S3 bucket.
- An AWS Lambda function that is triggered when new data lands in the S3 bucket and starts an AWS Glue crawler, so the data becomes queryable from AWS Athena.
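To make the first component more concrete, here is a minimal sketch of a scheduled ingestion handler in Python (presumably the functions' language, given the Python 3.9 setup). It is not the project's actual code: the CoinGecko /coins/markets endpoint, the selected columns, the hypothetical BUCKET_NAME environment variable and the dt=YYYY-MM-DD key layout are all assumptions. The S3 client and the fetch function are injectable so the logic can be exercised without AWS or network access.

```python
import csv
import io
import json
import os
import urllib.parse
import urllib.request
from datetime import datetime, timezone

# Assumed endpoint: CoinGecko's public market-data listing.
COINGECKO_MARKETS_URL = "https://api.coingecko.com/api/v3/coins/markets"
# Hypothetical column selection; the API returns many more fields per coin.
COLUMNS = ["id", "symbol", "name", "current_price", "market_cap", "total_volume", "last_updated"]


def fetch_markets(vs_currency="usd", timeout=10):
    """Download one page of market data from CoinGecko as a list of dicts."""
    query = urllib.parse.urlencode({"vs_currency": vs_currency})
    with urllib.request.urlopen(f"{COINGECKO_MARKETS_URL}?{query}", timeout=timeout) as resp:
        return json.load(resp)


def to_csv(records, columns=COLUMNS):
    """Serialise records to CSV text, keeping only the selected columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def object_key(run_time):
    """Hypothetical layout: one dt=YYYY-MM-DD prefix per daily run."""
    return f"coingecko/dt={run_time:%Y-%m-%d}/markets.csv"


def lambda_handler(event, context, s3_client=None, fetch=fetch_markets):
    """Scheduled entry point: fetch market data, convert it to CSV, upload it to S3."""
    if s3_client is None:
        import boto3  # bundled with the AWS Lambda Python runtime

        s3_client = boto3.client("s3")
    key = object_key(datetime.now(timezone.utc))
    s3_client.put_object(
        Bucket=os.environ["BUCKET_NAME"],  # hypothetical variable name
        Key=key,
        Body=to_csv(fetch()).encode("utf-8"),
        ContentType="text/csv",
    )
    return {"key": key}
```

Writing each day's file under its own dt= prefix is one common choice: a Glue crawler generally registers such key=value folders as partitions, which lets Athena filter by date without scanning every file.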
Data from the CoinGecko API arrives daily in an AWS S3 bucket in CSV format.
As soon as new data lands in the bucket, it triggers an AWS Glue crawler.
The crawler scans the data and creates or updates a Glue database and table.
Once the data is catalogued by AWS Glue, it can be queried from AWS Athena.
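The event-driven step can be sketched the same way. This hypothetical handler reads the bucket and key from each S3 notification record (S3 URL-encodes keys in these records, hence unquote_plus) and calls the Glue client's start_crawler. The CRAWLER_NAME environment variable and the .csv filter are assumptions, not taken from the project's template.

```python
import os
import urllib.parse


def new_objects(event):
    """Return (bucket, key) pairs from an S3 event notification.

    S3 URL-encodes object keys in notification records, so they are decoded here.
    """
    return [
        (
            record["s3"]["bucket"]["name"],
            urllib.parse.unquote_plus(record["s3"]["object"]["key"]),
        )
        for record in event.get("Records", [])
    ]


def lambda_handler(event, context, glue_client=None):
    """Start the Glue crawler once if the event contains at least one CSV object."""
    csv_objects = [(bucket, key) for bucket, key in new_objects(event) if key.endswith(".csv")]
    if not csv_objects:
        return {"started": False, "reason": "no csv objects"}

    if glue_client is None:
        import boto3  # bundled with the AWS Lambda Python runtime

        glue_client = boto3.client("glue")

    try:
        # Hypothetical variable name; the real template may wire the crawler name differently.
        glue_client.start_crawler(Name=os.environ["CRAWLER_NAME"])
    except glue_client.exceptions.CrawlerRunningException:
        # Another upload already started a crawl; do not fail this invocation.
        return {"started": False, "reason": "crawler already running"}
    return {"started": True, "objects": len(csv_objects)}
```

Catching CrawlerRunningException matters because several files landing close together would otherwise make later invocations fail while the first crawl is still in progress.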
- Create the Glue crawler in the template so that it runs every time new data arrives in S3 and creates/updates the Glue database (done)
- Set up Athena to read the data from S3 using the database created by the crawler (done)
- Create a dashboard that reads the data through Athena (to do)
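For the Athena item, a query against the crawler's table can also be run from code with boto3's Athena client: start the query, poll until it reaches a terminal state, then read the rows. A minimal sketch, assuming you pass in a boto3 Athena client; it reads only the first page of results, since get_query_results is paginated.

```python
import time

# States after which an Athena query will not change any more.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_query(athena, sql, database, output_location, poll_seconds=1.0, max_polls=60):
    """Run sql on Athena, wait for completion and return rows as lists of strings."""
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    for _ in range(max_polls):
        state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"Athena query {query_id} still running after {max_polls} polls")

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    # Only the first page is read here; larger results need NextToken pagination.
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    # NULL cells come back without a VarCharValue key, so .get() maps them to None.
    return [[cell.get("VarCharValue") for cell in row["Data"]] for row in rows]
```

With a real client this would be called as run_query(boto3.client("athena"), sql, database, output_location), where the database is the one created by the crawler and output_location is an S3 path for Athena's result files (both placeholders here). For a SELECT, the first returned row holds the column names, which a dashboard layer would typically strip off.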
This project was made using the AWS SAM CLI.
To reproduce it, you need to:
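- Create a Python 3.9 virtual environment (Python 3.9 is required). On Windows, using the py launcher:
  py -3.9 -m venv venv
  On macOS or Linux, the equivalent is python3.9 -m venv venv. Activate it with venv\Scripts\activate (Windows) or source venv/bin/activate (macOS/Linux).
- Validate the template.yaml file:
  sam validate
  or, for a stricter check with cfn-lint:
  sam validate --lint
- Build the application:
  sam build
- Deploy the application:
  sam deploy
  or, to be prompted for the deployment settings and template parameters (this is how the env vars are passed) and optionally save the answers to samconfig.toml:
  sam deploy --guided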
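The guided deploy prompts for the template's parameters; if the template maps them into the functions' environment variables, the handlers read them at runtime. A minimal sketch of that pattern that fails fast when a value is missing, assuming hypothetical variables named BUCKET_NAME and CRAWLER_NAME:

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Deploy-time settings for the Lambda functions (hypothetical fields)."""

    bucket_name: str
    crawler_name: str


# Maps each Settings field to the (assumed) environment variable that feeds it.
REQUIRED_VARS = {"bucket_name": "BUCKET_NAME", "crawler_name": "CRAWLER_NAME"}


def load_settings(environ=None):
    """Collect required settings, raising a clear error if any variable is unset or empty."""
    environ = os.environ if environ is None else environ
    missing = [var for var in REQUIRED_VARS.values() if not environ.get(var)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return Settings(**{field: environ[var] for field, var in REQUIRED_VARS.items()})
```

Loading the settings once at module import time, outside the handler, would surface a misconfigured deployment on the first invocation instead of partway through a run.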
More details on the SAM CLI commands are available in the AWS SAM CLI command reference.
Thanks to the CoinGecko API for providing the data.