# 🧩 Team_4_DATA_FORGE

## 📘 Project Title
**Signal Stream**

## 🎯 Objective
To design and implement a real-time data streaming pipeline that captures, processes, and visualizes mobile network logs using AWS Glue (PySpark), Kinesis, Athena, and ECS, enabling continuous monitoring and performance analysis.

## 🧠 Tech Stack
AWS Glue (PySpark) · Amazon Kinesis · Amazon S3 · Amazon Athena · Amazon ECS · Streamlit · AWS IAM

## 👥 Team Members
- Rakesh Kumar
- Kavya Bhatta S
- Ijaz Ahmad S
- Sahana J Upadhyaya

## 🧩 Project Modules & Backlogs
### Module 1: Data Streaming Setup
- Create Kinesis data stream for mobile log ingestion.
- Develop a data producer script to push logs to Kinesis.
- Test Kinesis stream with sample data.
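
The producer task above could be sketched as follows. This is a minimal illustration, not the team's actual script: the event fields (`device_id`, `region`, `latency_ms`, `status`) and the helper names are assumptions. The only AWS call involved is the boto3 Kinesis client's `put_records`, which accepts up to 500 records per request.

```python
import json
import time
import uuid


def make_log_event(device_id, region, latency_ms, status):
    """Build one mobile network log event (field names are assumptions)."""
    return {
        "event_id": str(uuid.uuid4()),
        "device_id": device_id,
        "region": region,
        "latency_ms": latency_ms,
        "status": status,
        "event_time": int(time.time()),  # epoch seconds, UTC
    }


def to_kinesis_entries(events):
    """Convert events into the entry format that PutRecords expects."""
    return [
        {
            "Data": json.dumps(event).encode("utf-8"),
            # Keying by device keeps one device's logs ordered within a shard.
            "PartitionKey": event["device_id"],
        }
        for event in events
    ]


def send_batch(kinesis_client, stream_name, events):
    """Send one batch (at most 500 events) and return the failed-record count."""
    response = kinesis_client.put_records(
        StreamName=stream_name,
        Records=to_kinesis_entries(events),
    )
    return response["FailedRecordCount"]
```

With boto3 installed and credentials configured, calling `send_batch(boto3.client("kinesis"), "signal-stream-logs", events)` would push a batch; the stream name here is a placeholder. A non-zero return value means some records were rejected and should be retried.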

### Module 2: Real-Time Processing
- Develop Glue PySpark job to process streaming data.
- Configure data sink to write results into S3.
- Implement partitioning for Athena queries.
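
To make the partitioning task concrete, here is a small sketch of the Hive-style key layout (`year=/month=/day=/hour=`) that Athena can use for partition pruning. The choice of hourly partitions and the `event_time` epoch field are assumptions; the team may settle on a different granularity.

```python
from datetime import datetime, timezone


def partition_prefix(base_prefix, event_time_epoch):
    """Return a Hive-style S3 prefix for an event timestamp (epoch seconds, UTC)."""
    ts = datetime.fromtimestamp(event_time_epoch, tz=timezone.utc)
    return (
        f"{base_prefix.rstrip('/')}/"
        f"year={ts.year:04d}/month={ts.month:02d}/"
        f"day={ts.day:02d}/hour={ts.hour:02d}/"
    )
```

In the Glue PySpark job itself, the same layout would typically come from deriving `year`, `month`, `day`, and `hour` columns and writing with `DataFrameWriter.partitionBy(...)` rather than building paths by hand. Queries that filter on those columns then scan only the matching prefixes.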

### Module 3: Visualization
- Create a dashboard using Streamlit and deploy on ECS.
- Fetch processed data from S3 for visualization.
- Add refresh logic for near real-time updates.
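
One possible shape for the refresh logic is a time-to-live (TTL) cache: the dashboard reloads data from S3 only when its cached copy is older than a chosen interval. The sketch below is plain Python and every name in it (`RefreshingCache`, `loader`, `ttl_seconds`) is hypothetical; in a Streamlit app, the built-in `st.cache_data(ttl=...)` decorator gives comparable behavior.

```python
import time


class RefreshingCache:
    """Reload data through `loader` once the cached value is older than the TTL."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._value = None
        self._loaded_at = None

    def get(self):
        now = self._clock()
        stale = self._loaded_at is None or now - self._loaded_at >= self._ttl
        if stale:
            self._value = self._loader()
            self._loaded_at = now
        return self._value
```

A short TTL (for example 30 seconds) keeps the dashboard near real time while avoiding an S3 read on every page rerun.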

### Module 4: Security & Access
- Configure IAM roles for Glue, Kinesis, and ECS.
- Verify permissions for S3 and Athena queries.
- Ensure network logs are securely handled.
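
As an illustration of scoping the Glue role, the helper below builds an IAM policy document that only allows reading one Kinesis stream and reading or writing one S3 prefix. It is a sketch under assumptions: the ARN, bucket, and prefix values are placeholders, and the action list is probably incomplete for a real job, which usually also needs the AWS-managed Glue service permissions and CloudWatch Logs access.

```python
import json


def glue_job_policy(stream_arn, bucket_name, prefix):
    """Return a least-privilege policy document as a dict (illustrative only)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadFromKinesis",
                "Effect": "Allow",
                "Action": [
                    "kinesis:DescribeStream",
                    "kinesis:DescribeStreamSummary",
                    "kinesis:GetShardIterator",
                    "kinesis:GetRecords",
                    "kinesis:ListShards",
                ],
                "Resource": stream_arn,
            },
            {
                "Sid": "ReadWriteProcessedData",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                # Restrict access to objects under one prefix of one bucket.
                "Resource": f"arn:aws:s3:::{bucket_name}/{prefix.strip('/')}/*",
            },
        ],
    }


def policy_json(policy):
    """Serialize the policy for pasting into the console or an IaC template."""
    return json.dumps(policy, indent=2)
```

Keeping each statement tied to a single resource makes the later permission checks for S3 and Athena easier to reason about.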

### Module 5: Reporting & Metrics
- Create Athena queries to compute network performance KPIs.
- Build tables for region-wise latency and error metrics.
- Summarize results for visualization in the dashboard.
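
A region-wise KPI query might look like the sketch below, wrapped in a Python helper so the dashboard can request one day at a time. The table name, the column names (`region`, `latency_ms`, `status`), and the string-typed `year`/`month`/`day` partition columns are all assumptions about a schema the team has not yet fixed. `approx_percentile` and `count_if` are standard functions in Athena's SQL engine.

```python
from datetime import date


def region_kpi_query(table: str, day: date) -> str:
    """Build an Athena SQL query for per-region latency and error KPIs.

    `table` is interpolated directly, so pass only trusted, fixed names.
    """
    return f"""
SELECT region,
       COUNT(*) AS total_events,
       AVG(latency_ms) AS avg_latency_ms,
       approx_percentile(latency_ms, 0.95) AS p95_latency_ms,
       CAST(count_if(status <> 'OK') AS DOUBLE) / COUNT(*) AS error_rate
FROM {table}
WHERE year = '{day:%Y}' AND month = '{day:%m}' AND day = '{day:%d}'
GROUP BY region
ORDER BY avg_latency_ms DESC
""".strip()
```

Filtering on the partition columns in the `WHERE` clause limits the scan to a single day's data, which keeps query cost and latency low.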

### Module 6: Validation & Testing
- Test Kinesis-to-Glue data flow using live sample logs.
- Validate ECS container performance and dashboard uptime.
- Document setup and monitoring details.



## 🚀 Sprint 1 – AWS Setup & Initialization
**Dates:** 16–18 Oct 2025 (3 days)

**Goal:** Prepare AWS resources (Kinesis, S3, IAM) and set up basic stream ingestion.

**Tasks:**
- Create Kinesis stream for live mobile log data.
- Upload sample log files to S3 for testing.
- Assign IAM roles and permissions for Glue and ECS.
- Test end-to-end connectivity between Kinesis and S3.

**Deliverable:**
AWS streaming environment ready for development.

## 🚀 Sprint 2 – Pipeline Development & Execution
**Dates:** 23–26 Oct 2025 (4 days)

**Goal:** Implement real-time data processing and Glue job integration.

**Tasks:**
- Write PySpark script for real-time stream transformation.
- Connect Glue job to consume data from Kinesis stream.
- Save processed data to S3 for Athena queries.
- Test job execution and verify partitioned output.

**Deliverable:**
Real-time processing job developed and tested for accuracy.

## 🚀 Sprint 3 – Integration, Testing & Demo
**Dates:** 27–31 Oct 2025 (5 days)

**Goal:** Integrate dashboard, test full stream pipeline, and finalize project demo.

**Tasks:**
- Deploy Streamlit dashboard on ECS and connect to processed data.
- Execute Athena queries for live metrics validation.
- Run full streaming test and validate data refresh on dashboard.
- Prepare final demo and documentation.

**Deliverable:**
Fully functional streaming and visualization pipeline ready for presentation.