# 🧩 Team_3_CLOUD_CODERS

## 📘 Project Title
**Music Stream**

## 🎯 Objective
Design and implement a distributed ETL pipeline that processes and transforms music streaming data using Amazon Managed Workflows for Apache Airflow (MWAA), AWS Glue, and Amazon DynamoDB, enabling real-time listener and playback insights for business analytics.

## 🧠 Tech Stack
Amazon MWAA (Apache Airflow) · AWS Glue · Amazon DynamoDB · Amazon S3 · Amazon Athena · AWS IAM

## 👥 Team Members
- Pavan Shetti
- Thrishala N P
- Sanketh Shinde
- Megha Singha Roy
- Sudeep S

## 🧩 Project Modules & Backlogs
### Module 1: Data Ingestion
- Collect streaming data from CSV files (songs, streams, users).
- Upload files to S3 and ensure correct folder structure.
- Set up access permissions for Glue and Airflow jobs.
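
A minimal sketch of one possible S3 folder layout for the ingestion step. The `raw/` and `processed/` prefixes, the dataset folder names, and the `s3_key` helper are assumptions for illustration, not a confirmed project structure.

```python
from pathlib import PurePosixPath

# Dataset names follow the CSV files listed above (songs, streams, users).
DATASETS = ("songs", "streams", "users")
# Hypothetical top-level prefixes for raw inputs and Glue outputs.
STAGES = ("raw", "processed")


def s3_key(dataset: str, filename: str, stage: str = "raw") -> str:
    """Build an S3 object key such as 'raw/songs/songs.csv'."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset}")
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    return str(PurePosixPath(stage) / dataset / filename)
```

Building keys through one helper keeps uploads and Glue job inputs pointing at the same paths.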

### Module 2: ETL Development
- Create Glue jobs for data transformation and aggregation.
- Use Airflow DAGs to schedule Glue jobs.
- Implement validation before loading data to DynamoDB.
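
The validation step before the DynamoDB load could look roughly like the sketch below. The column names in `REQUIRED_FIELDS` are assumed for illustration; the real stream schema may differ.

```python
import csv
import io

# Hypothetical required columns for a stream record.
REQUIRED_FIELDS = ("user_id", "track_id", "listen_time")


def split_valid_rows(csv_text: str):
    """Return (valid_rows, rejected_rows) based on required, non-empty fields."""
    valid, rejected = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # A missing column yields None, so fall back to "" before stripping.
        if all((row.get(field) or "").strip() for field in REQUIRED_FIELDS):
            valid.append(row)
        else:
            rejected.append(row)
    return valid, rejected
```

Keeping the rejected rows, rather than dropping them silently, makes it possible to report how many records failed validation in each run.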

### Module 3: Data Storage
- Design DynamoDB tables for user activity and song metadata.
- Load transformed data from Glue job outputs.
- Validate DynamoDB updates using Airflow logs.
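
As a sketch of the table design, the helper below builds the keyword arguments that boto3's `create_table` call expects. The table names, key attribute names, string key types, and on-demand billing mode are all assumptions chosen for illustration.

```python
from typing import Optional


def table_definition(name: str, partition_key: str, sort_key: Optional[str] = None) -> dict:
    """Build create_table arguments for a table with string-typed keys."""
    key_schema = [{"AttributeName": partition_key, "KeyType": "HASH"}]
    attributes = [{"AttributeName": partition_key, "AttributeType": "S"}]
    if sort_key is not None:
        key_schema.append({"AttributeName": sort_key, "KeyType": "RANGE"})
        attributes.append({"AttributeName": sort_key, "AttributeType": "S"})
    return {
        "TableName": name,
        "KeySchema": key_schema,
        "AttributeDefinitions": attributes,
        "BillingMode": "PAY_PER_REQUEST",  # assumed: on-demand capacity
    }


# Hypothetical tables: activity keyed by user and time, metadata keyed by track.
USER_ACTIVITY = table_definition("user_activity", "user_id", "listen_time")
SONG_METADATA = table_definition("song_metadata", "track_id")
```

With AWS credentials configured, a definition could be applied with `boto3.client("dynamodb").create_table(**USER_ACTIVITY)`.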

### Module 4: Access & Roles
- Attach IAM policies for Airflow and Glue services.
- Test permissions to confirm job execution and DynamoDB writes.
- Restrict S3 access to specific roles only.
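
A role-scoped S3 policy might be generated as in the sketch below. The bucket name and prefixes are placeholders, and the action list is an assumed minimum for reading inputs and writing outputs.

```python
import json


def s3_access_policy(bucket: str, prefixes: list) -> str:
    """Return an IAM policy document (JSON) allowing object reads and writes under the given prefixes."""
    statement = {
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:PutObject"],
        "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*" for prefix in prefixes],
    }
    return json.dumps({"Version": "2012-10-17", "Statement": [statement]}, indent=2)


# Hypothetical bucket name, for illustration only.
print(s3_access_policy("example-music-stream-bucket", ["raw", "processed"]))
```

An allow policy like this only grants access to the role it is attached to. Keeping other principals out additionally depends on not granting them S3 permissions elsewhere, or on a bucket policy.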

### Module 5: Reporting & Analysis
- Analyze user activity trends using Athena queries on S3 outputs.
- Generate summary tables for playback counts per region.
- Prepare charts for popular artists and genres.
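
The playback-count summary is a plain group-and-count. The sketch below reproduces it locally so that Athena results can be spot-checked against a small sample; the `region` column name is an assumption.

```python
from collections import Counter


def playback_counts_by_region(streams) -> Counter:
    """Count stream records per region, like GROUP BY region with COUNT(*)."""
    return Counter(row["region"] for row in streams)


# Tiny illustrative sample, not project data.
sample = [{"region": "IN"}, {"region": "US"}, {"region": "IN"}]
counts = playback_counts_by_region(sample)
```

`counts.most_common()` then lists regions from most to fewest plays, which maps directly onto a bar chart.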

### Module 6: Validation & Testing
- Run Airflow DAG end-to-end to confirm workflow accuracy.
- Validate DynamoDB tables and record counts.
- Document Airflow DAG structure and job flow.
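
One lightweight way to document the job flow is to record task dependencies as data and derive the run order from them. The task names below are hypothetical, and the sketch uses the standard-library `graphlib` module (Python 3.9+) rather than Airflow itself.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "upload_csv_to_s3": set(),
    "run_glue_transform": {"upload_csv_to_s3"},
    "validate_output": {"run_glue_transform"},
    "load_to_dynamodb": {"validate_output"},
}


def execution_order(dependencies: dict) -> list:
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())
```

For a strictly sequential flow like this one, the derived order is unique and can be compared with the task order shown in the Airflow UI.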



## 🚀 Sprint 1 – AWS Setup & Initialization
**Dates:** 16–18 Oct 2025 (3 days)

**Goal:** Set up S3, Airflow environment, and DynamoDB tables for the ETL pipeline.

**Tasks:**
- Create S3 bucket for raw and processed data.
- Set up an Amazon MWAA (managed Airflow) environment and configure its connections.
- Create DynamoDB tables for user and playback information.
- Upload input CSV data to S3 and test access permissions.

**Deliverable:**
AWS resources created and verified for ETL pipeline setup.

## 🚀 Sprint 2 – Pipeline Development & Execution
**Dates:** 23–26 Oct 2025 (4 days)

**Goal:** Develop Glue jobs and Airflow DAGs for ETL workflow execution.

**Tasks:**
- Write Glue scripts for transforming music stream data.
- Build Airflow DAG to trigger Glue jobs sequentially.
- Run test DAGs to validate end-to-end execution.
- Check Glue output data consistency and DynamoDB updates.

**Deliverable:**
Working ETL workflow connecting Glue, Airflow, and DynamoDB.

## 🚀 Sprint 3 – Integration, Testing & Demo
**Dates:** 27–31 Oct 2025 (5 days)

**Goal:** Integrate components, validate data accuracy, and prepare project demo.

**Tasks:**
- Run full DAG execution and verify results in DynamoDB.
- Execute Athena queries for aggregated playback metrics.
- Document workflow design and key metrics.
- Prepare project demo and presentation slides.

**Deliverable:**
Tested and validated music stream pipeline ready for demo.