This project, titled "Snowflake Internal Stage Data Pipeline," focuses on developing a robust data ingestion pipeline from a MySQL database to Snowflake utilizing Snowflake's Internal Stage and orchestrated with Airflow. The pipeline is designed to perform incremental data loads, enhancing efficiency and reducing data transfer volumes. A metadata-driven approach is employed to streamline and automate the data ingestion process.
- Incremental Data Loading: Efficiently transfers only new or updated records from MySQL to Snowflake, minimizing data transfer and processing time.
- Metadata-Driven Pipeline: Leverages metadata to dynamically configure and execute data ingestion tasks, reducing the need for hard-coded configurations.
- Airflow Orchestration: Utilizes Apache Airflow to manage workflow orchestration, scheduling, and monitoring, ensuring reliable execution of data loading processes.
- Snowflake Integration: Employs Snowflake's Internal Stage for secure and scalable data staging before ingestion into the target tables.
- MySQL: Source database for extracting data.
- Snowflake: Target cloud data warehouse for analytics.
- Apache Airflow: Workflow orchestration tool to manage the data pipeline.
- Python: The primary programming language for scripting and automation.
![image](https://private-user-images.githubusercontent.com/25386607/312123798-ccfbea03-5f73-41b0-a9ed-5b6eceb2a5c8.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTgzOTkyMDMsIm5iZiI6MTcxODM5ODkwMywicGF0aCI6Ii8yNTM4NjYwNy8zMTIxMjM3OTgtY2NmYmVhMDMtNWY3My00MWIwLWE5ZWQtNWI2ZWNlYjJhNWM4LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MTQlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjE0VDIxMDE0M1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTAxZmVkMjY3OWYxMTZmMjlhZmY1NDNkNzA5M2FmMTkwOGE3MmM2NzA2NzQ1NmI2MzUzMjQwYzc5NmU0ZGQ2ZGUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.1_NUPdDSHbAWEeBul2QJPgQ1cV8NxpLOK9RYZ9U8pUY)
There are 2 tables which are getting ingested to Snowflake.
Dedicated Dags are developed for each Table
- amazone_books
- amazonebook_reviews
CREATE TABLE amazone_books ( book_id INT NOT NULL AUTO_INCREMENT ,book_title TEXT ,book_amount FLOAT ,book_author TEXT ,book_rating FLOAT ,book_link TEXT ,business_date DATE DEFAULT(CURRENT_DATE) ,PRIMARY KEY (book_id) );CREATE TABLE amazonebook_reviews ( book_id INT NOT NULL ,reviewer_name TEXT ,rating FLOAT ,review_title TEXT ,review_content TEXT ,reviewed_on DATE ,business_date DATE DEFAULT(CURRENT_DATE) );
For the Incremental load. Primary Keys are required in the Tables. Respective Primary key for the Table are
- amazone_books
- book_id
- amazonebook_reviews
- book_id
- reviewer_name
- business_date
Note: This Source Data is from another Project. To know more about how source data is generated please refer AmazonBooks_DataPipeline
The Airflow Dag Ids for respective Tables are
- amazone_books
- Snowflake_InternalStage_amazone_books_Dag
- amazonebook_reviews
- Snowflake_InternalStage_amazonebook_review_Dag
- Workflow Design: A DAG (Directed Acyclic Graph) in Airflow defines the sequence of tasks for data extraction, staging, and loading.
- Incremental Loading: The pipeline identifies new or updated records in MySQL using timestamps or sequence IDs.
- Metadata Management: Metadata definitions (i.e Config) guide the extraction and loading processes, allowing for flexibility and scalability.
- Monitoring and Logging: Airflow provides comprehensive monitoring and logging capabilities, facilitating troubleshooting and performance optimization.
- Data Quality Checks: Implement additional data validation and quality checks within the pipeline.
- Advanced Scheduling: Enhance scheduling capabilities to support more complex dependencies and triggers.
![image](https://private-user-images.githubusercontent.com/25386607/310148365-f14d9492-fdc6-4670-b56b-75303751392e.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTgzOTkyMDMsIm5iZiI6MTcxODM5ODkwMywicGF0aCI6Ii8yNTM4NjYwNy8zMTAxNDgzNjUtZjE0ZDk0OTItZmRjNi00NjcwLWI1NmItNzUzMDM3NTEzOTJlLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MTQlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjE0VDIxMDE0M1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWJjNWI2MDkxZTZiYjQyZGU5MjE2NDczNzg3ZWJkMGY3MzIxMzFiZTBiZTVkNDdhZDk1ZmQwMzE0YTk3Y2M2MjgmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.ptyrjPD_AHcZ3XN3e47nPS9PxGsgnJs07auIOwkaZgw)
![image](https://private-user-images.githubusercontent.com/25386607/310152368-88cf3b74-3387-4e1e-9309-c550ee13d487.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTgzOTkyMDMsIm5iZiI6MTcxODM5ODkwMywicGF0aCI6Ii8yNTM4NjYwNy8zMTAxNTIzNjgtODhjZjNiNzQtMzM4Ny00ZTFlLTkzMDktYzU1MGVlMTNkNDg3LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MTQlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjE0VDIxMDE0M1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTc3ZWFkY2ZkMjI1N2E2ZDRhMGQ0NjkyN2RmYTVmYzdhYzBhYjQ0NjVlMWQ0MGVkOGU0MzljMjVjZGRjMzAwNWEmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.v7h76MidWlJaDLjomTn_N9siZlX0ddBRyxvZhcq0qV4)
![image](https://private-user-images.githubusercontent.com/25386607/310148802-51e9538e-0cd2-4c24-b5a2-45337524460d.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTgzOTkyMDMsIm5iZiI6MTcxODM5ODkwMywicGF0aCI6Ii8yNTM4NjYwNy8zMTAxNDg4MDItNTFlOTUzOGUtMGNkMi00YzI0LWI1YTItNDUzMzc1MjQ0NjBkLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MTQlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjE0VDIxMDE0M1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTdmNDkxOGU4YTgwNDAyZDkxZTcyZWJmZTEyNzhkMjZhZjI1NDZjYmJiNDkwMDk1NmM3ZTA0NTA0ZWFjYjgwOWUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.CyVfZ89zOt9phOUH4LozZadi3cAc9I9ZyVmKEHOGdT8)
![image](https://private-user-images.githubusercontent.com/25386607/310149245-241fe857-f986-487b-98f1-e9a378b5b426.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTgzOTkyMDMsIm5iZiI6MTcxODM5ODkwMywicGF0aCI6Ii8yNTM4NjYwNy8zMTAxNDkyNDUtMjQxZmU4NTctZjk4Ni00ODdiLTk4ZjEtZTlhMzc4YjViNDI2LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MTQlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjE0VDIxMDE0M1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTczOWE2Y2QzZTE2Mjc3MTA3NmY2YjllZDY3ZmRiZTdlMDQwNWEyNmI4OTVlNGU5YWFmMjFiMWE5ZmNlYjQ3YmImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0._5Y8ce6uWXdKk72tAjpLDBU82UuLlaaSsiFrQEGoaHs)
![image](https://private-user-images.githubusercontent.com/25386607/310157843-71688582-4fc7-47ce-b646-c25e0c373463.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTgzOTkyMDMsIm5iZiI6MTcxODM5ODkwMywicGF0aCI6Ii8yNTM4NjYwNy8zMTAxNTc4NDMtNzE2ODg1ODItNGZjNy00N2NlLWI2NDYtYzI1ZTBjMzczNDYzLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MTQlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjE0VDIxMDE0M1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTZmM2RiZmNmOWYzZjg1MDQwNWE4MjFjZGQwYmNmNmI1ZTJjZTBiODM0MjA1ZmFjMzI4NjU0ZjBmYjU4NDg0NmEmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.aZr4jH8J_0oTtuepEWSpfeOQou-S4sr24bZIpE79_9s)
![image](https://private-user-images.githubusercontent.com/25386607/310149690-9db6ff6a-e755-4c33-97c2-bf9854b7b8ba.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTgzOTkyMDMsIm5iZiI6MTcxODM5ODkwMywicGF0aCI6Ii8yNTM4NjYwNy8zMTAxNDk2OTAtOWRiNmZmNmEtZTc1NS00YzMzLTk3YzItYmY5ODU0YjdiOGJhLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MTQlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjE0VDIxMDE0M1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTcwNjEwY2YyNGY1M2M5ODY1Njc4Y2E5OTMwMTU0NWJhMTYyYjc1ZDQ3NWEyZDJkOWQ3YjJiNDY2OTMyMGUyN2YmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.AKjeucrYV_-177wpICvxftWRvhr-3Kh_smcrfgbEjXQ)
![image](https://private-user-images.githubusercontent.com/25386607/310379576-b60ce74f-6f63-45d7-8711-f7356dbdbb2f.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTgzOTkyMDMsIm5iZiI6MTcxODM5ODkwMywicGF0aCI6Ii8yNTM4NjYwNy8zMTAzNzk1NzYtYjYwY2U3NGYtNmY2My00NWQ3LTg3MTEtZjczNTZkYmRiYjJmLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MTQlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjE0VDIxMDE0M1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTU4ZTQ2ZDExZGNkZGI4NTUwMTZjY2FjMjc0NTRmY2NlZGJiMTk5ZDJlODEwMDg3MDE1OGU2ODhmZjNiMzlmMjgmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.Gach_I7Ok8nA3JTfLVbRtME5Sy9AtbLqvgdt6-Hestg)
![image](https://private-user-images.githubusercontent.com/25386607/310380210-a6972e16-6177-4c8e-8e17-edcdd2a92ad1.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTgzOTkyMDMsIm5iZiI6MTcxODM5ODkwMywicGF0aCI6Ii8yNTM4NjYwNy8zMTAzODAyMTAtYTY5NzJlMTYtNjE3Ny00YzhlLThlMTctZWRjZGQyYTkyYWQxLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MTQlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjE0VDIxMDE0M1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTgxNzM3ZTRkMTIyNTk4ZWNhZWQxOTU3YjJjZjA1ZjE1NDAzZWNlN2Q0YjkwM2JkYTdiYjU1M2JmYzZmZWJjMzcmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.K1TuEZdDAPggvz4vm-JQN9umySbjvpf4RYyj2toXM-s)
![image](https://private-user-images.githubusercontent.com/25386607/310380439-e11cb6ee-9d73-45c6-91d0-1cebc2f91c5d.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTgzOTkyMDMsIm5iZiI6MTcxODM5ODkwMywicGF0aCI6Ii8yNTM4NjYwNy8zMTAzODA0MzktZTExY2I2ZWUtOWQ3My00NWM2LTkxZDAtMWNlYmMyZjkxYzVkLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MTQlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjE0VDIxMDE0M1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTEwNzAzNDkxYjhmYmVkZTBiYmI4MzU2MTE2ZTQyMDdlZThjZGRkNzEyMTczZmQ1M2Y2ZmRkOGU4OTllNWZkN2YmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.QzRX8Hi-n9Vfptprjwxcbqvt0Hi7TtvS9l2D-ygzUSs)
![image](https://private-user-images.githubusercontent.com/25386607/310380781-7c4060f9-1291-4e50-baee-dee350d2b979.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTgzOTkyMDMsIm5iZiI6MTcxODM5ODkwMywicGF0aCI6Ii8yNTM4NjYwNy8zMTAzODA3ODEtN2M0MDYwZjktMTI5MS00ZTUwLWJhZWUtZGVlMzUwZDJiOTc5LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MTQlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjE0VDIxMDE0M1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTUyZWI5MDQxNjVkNmFlYjkwMmUwNTgxOThjN2M4N2UzODNkNjJkYjlhZWJkNzE5ZDRjMDllMGQ1ZmE0NGNhMGYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.3D63iCi3AgD1x1Gizx7RkiYyEAQztgf-Eat7-Lk673o)
![image](https://private-user-images.githubusercontent.com/25386607/310381167-d4c6c876-b711-4712-b2b2-22f3b5c38ab3.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTgzOTkyMDMsIm5iZiI6MTcxODM5ODkwMywicGF0aCI6Ii8yNTM4NjYwNy8zMTAzODExNjctZDRjNmM4NzYtYjcxMS00NzEyLWIyYjItMjJmM2I1YzM4YWIzLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MTQlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjE0VDIxMDE0M1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTg1YWZjN2NlNGRhMWY0NmM4YmMxNTM5ZTYyYWU5MTZmMjA0OTk1ZWEzZDc2MTg1OTM4ZDM3Y2M0YTJhYjRjNDkmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.HmHpGQzb5K8lA8XMw955wJy5igl0k7oV998Dkj8zzMs)
- Incremental Load: Medium
- Internal stage:Snowflake Docs
- Power of Metadata driven ETL Frameworks:Linkedin