Architecture

Tech Stack

Terraform
GitHub Actions (CI/CD)
AWS Glue Data Catalog
AWS Glue Crawler
AWS Glue Trigger
AWS Glue Classifier
AWS Glue ETL Job
AWS Lambda
Amazon EventBridge
Amazon S3
Amazon Athena
SQL
Python

Overview

In this project, I created an ETL job on AWS using Terraform. The project extracts real estate data from an API (Zillow) and processes it with an AWS Glue ETL job running Spark. The data is extracted from the API by a Lambda function that is scheduled to run every day. The extracted data is then stored in an S3 bucket in JSON format.
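The extraction step described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual Lambda code: the environment variables `API_URL`, `API_KEY` and `BUCKET_NAME`, the `X-Api-Key` header, and the `raw/zillow/...` key layout are all assumptions made for the example.

```python
import json
from datetime import date, datetime, timezone


def build_object_key(run_date: date, prefix: str = "raw/zillow") -> str:
    # Hypothetical key layout: one JSON object per daily run, partitioned by date.
    return f"{prefix}/{run_date:%Y/%m/%d}/listings.json"


def extract_and_store(fetch, put_object, bucket: str, run_date: date) -> str:
    # fetch: callable returning JSON-serializable data from the API.
    # put_object: callable with the keyword interface of boto3's S3 put_object.
    records = fetch()
    key = build_object_key(run_date)
    put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(records).encode("utf-8"),
        ContentType="application/json",
    )
    return key


def lambda_handler(event, context):
    # Imports are kept local so the helpers above can be used without AWS access.
    import os
    import urllib.request

    import boto3

    def fetch():
        # API_URL and API_KEY are assumed environment variables, as is the header name.
        request = urllib.request.Request(
            os.environ["API_URL"],
            headers={"X-Api-Key": os.environ.get("API_KEY", "")},
        )
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.load(response)

    s3 = boto3.client("s3")
    key = extract_and_store(
        fetch,
        s3.put_object,
        os.environ["BUCKET_NAME"],
        datetime.now(timezone.utc).date(),
    )
    return {"statusCode": 200, "key": key}
```

Passing the fetch and upload functions in as arguments keeps the core logic testable without network or AWS credentials. The daily schedule itself would be configured outside this code.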

An AWS Glue crawler then crawls the data and creates a table in the Glue Data Catalog. An AWS Glue ETL job with Spark then processes the real estate data and builds a report that shows the price per sqft for each state and country.
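To make the report logic concrete, here is a small plain-Python sketch of a price-per-sqft aggregation. It is not the Glue job itself, which runs on Spark. The field names `state`, `price` and `living_area` are assumptions about the data, and averaging the per-listing ratios is only one possible definition of the metric.

```python
from collections import defaultdict


def price_per_sqft_by_group(records, group_field="state"):
    # Average of per-listing price / living area, grouped by group_field.
    # Listings with a missing or zero price, area, or group value are skipped.
    ratios = defaultdict(list)
    for record in records:
        group = record.get(group_field)
        price = record.get("price")
        area = record.get("living_area")
        if not group or not price or not area:
            continue
        ratios[group].append(price / area)
    return {
        group: round(sum(values) / len(values), 2)
        for group, values in ratios.items()
    }


# Made-up listings, for illustration only.
sample = [
    {"state": "CA", "price": 500000, "living_area": 1000},
    {"state": "CA", "price": 300000, "living_area": 1500},
    {"state": "TX", "price": 200000, "living_area": 2000},
    {"state": "TX", "price": 150000, "living_area": 0},
]
report = price_per_sqft_by_group(sample)
```

In a Spark job the same idea would typically be expressed as a group-by followed by an average over a computed price-per-sqft column.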

For more information, see this Medium article:

How I build an ETL pipeline with AWS Glue, Lambda and Terraform


g-lorena/aws_etl_pipeline

Folders and files

Latest commit

History

Repository files navigation

Architecture

Tech Stack

Overwiew

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages