An Airflow pipeline that reads, cleans, aggregates, and saves store transaction data, then emails the results!
- Read the `raw_store_transactions.csv` file
- Clean the data:
  - Remove special characters in the `STORE_LOCATION` column
  - Remove special characters in the `PRODUCT_ID` column
- Create a MySQL table with a proper schema for the cleaned store transaction data
- Insert the cleaned store transaction data into the created MySQL table
- Calculate store-wise and location-wise profit for yesterday and save the results in CSV files
- Rename the output files by adding the target date
- Send an email with the two output files attached
- Rename the `raw_store_transactions.csv` file by adding yesterday's date
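
For orientation, the sketch below shows roughly how such a DAG could be wired up. It is a minimal sketch, not this repo's actual code: the task ids, file paths, receiver address, and cleaning regexes are assumptions, the MySQL create/load/aggregate tasks are elided, and the imports follow Airflow 1.10 conventions common in docker-compose Airflow setups (2.x moves the operators to different modules).

```python
# A minimal sketch, not this repo's actual store_dag: task ids, file paths,
# the receiver address, and the cleaning regexes are assumptions, and the
# MySQL create/load/aggregate tasks are elided.
from datetime import datetime, timedelta

import pandas as pd
from airflow import DAG
from airflow.operators.python_operator import PythonOperator  # Airflow 1.10-style paths
from airflow.operators.email_operator import EmailOperator
from airflow.operators.bash_operator import BashOperator


def clean_store_data():
    """Strip special characters from the two problem columns and save a clean copy."""
    df = pd.read_csv("/usr/local/airflow/store_files/raw_store_transactions.csv")
    # The exact regexes are assumptions: keep word characters, spaces, and hyphens.
    df["STORE_LOCATION"] = df["STORE_LOCATION"].astype(str).str.replace(r"[^\w\s-]", "", regex=True)
    df["PRODUCT_ID"] = df["PRODUCT_ID"].astype(str).str.replace(r"[^\w]", "", regex=True)
    df.to_csv("/usr/local/airflow/store_files/clean_store_transactions.csv", index=False)


default_args = {
    "owner": "airflow",
    "start_date": datetime(2020, 1, 1),
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG("store_dag", default_args=default_args,
         schedule_interval="@daily", catchup=False) as dag:

    clean = PythonOperator(task_id="clean_raw_csv", python_callable=clean_store_data)

    # ... the table-creation, load, and profit-aggregation tasks would sit here,
    # using the MySQL connection configured in the Airflow UI (see setup below).

    send_report = EmailOperator(
        task_id="send_email",
        to="receiver@example.com",  # hypothetical address
        subject="Store profit report for {{ yesterday_ds }}",
        html_content="Yesterday's store-wise and location-wise profit reports are attached.",
        files=[  # assumed output paths
            "/usr/local/airflow/store_files/store_wise_profit.csv",
            "/usr/local/airflow/store_files/location_wise_profit.csv",
        ],
    )

    rename_raw = BashOperator(
        task_id="rename_raw_csv",
        # Tag the raw file with yesterday's date, mirroring the last pipeline step.
        bash_command=(
            "mv /usr/local/airflow/store_files/raw_store_transactions.csv "
            "/usr/local/airflow/store_files/raw_store_transactions_{{ yesterday_ds }}.csv"
        ),
    )

    clean >> send_report >> rename_raw
```

The real DAG presumably chains the MySQL tasks between the cleaning and email steps; the ordering above only mirrors the step list.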
Requirements:
- Python 3.7+
- Docker
Setup:
- Download and install Docker if you haven't yet (Docker Download).
- Clone the repo and go to the root directory.
- Open `docker-compose-localExecutor.yml` and update `AIRFLOW__SMTP__SMTP_USER` and `AIRFLOW__SMTP__SMTP_PASSWORD` with the sender email address and its password.
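  (These `AIRFLOW__SMTP__*` variables are Airflow's standard environment-variable configuration overrides; they map to the `[smtp]` section of `airflow.cfg` and in a compose file normally sit under a service's `environment:` key, though this repo's exact layout may differ.)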
- Build the image and start the necessary containers by running the compose file in a terminal: `docker-compose -f ./docker-compose-localExecutor.yml up -d`
- Make sure the proper containers are running: `docker ps`
- Open the Airflow web server by visiting http://localhost:8080.
- Create a MySQL connection by going to Admin > Connections and entering the connection details. The connection has to be recreated whenever the web server is restarted.
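  The exact values depend on the DAG code, but with this compose setup they would look something like: Conn Id `mysql_conn` (hypothetical; it must match the connection id used in the DAG), Conn Type `MySQL`, Host `mysql` (assumed to be the compose service name), Login `root`, Password as set in the compose file, Port `3306`.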
- To check the tables in MySQL, grab the MySQL container ID from `docker ps`, run `docker exec -it {container_id} bash`, then `mysql -u root -p`, `use mysql;`, and `show tables;`.
- Turn on the `store_dag` DAG and trigger it.
- Get an email with the processed dataset attached!
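
If the email does not arrive, the DAG can also be triggered from inside the webserver container with the Airflow CLI (`airflow trigger_dag store_dag` on Airflow 1.10.x, or `airflow dags trigger store_dag` on 2.x), and the task logs can be inspected in the web UI.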