# Assignment 4: Health & Fitness Tracking Analysis

## 📌 Overview
In this assignment, you will analyze a health and fitness dataset using only **Pandas** and **NumPy** (no other libraries).
The dataset `health_tracking.csv` contains daily activity and wellness metrics.

Your goals are to practice:
- Loading and inspecting data
- Handling missing values
- Creating new features
- Detecting outliers (z-score)
- Aggregating and ranking results
- (ML Extension) Building simple NumPy-only models

---

## 📊 Dataset Description
The dataset contains the following columns:

- **Date** → Day of the record (YYYY-MM-DD)
- **DayOfWeek** → Day name (e.g., Monday)
- **Steps** → Number of steps taken
- **Distance_km** → Distance covered in kilometers
- **Active_Minutes** → Number of active minutes
- **Calories** → Calories burned
- **HeartRate_Avg** → Average heart rate (bpm)
- **Sleep_Hours** → Hours of sleep
- **Water_Liters** → Water intake in liters

---

## 📝 Tasks

### Task 1: Load and Inspect Data
- Load `health_tracking.csv` into Pandas.
- Display the first 5 rows.
- Show the **shape** and **data types**.
- Count **missing values** in each column.
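
The inspection steps above can be sketched as follows. This is a minimal illustration, not a reference solution: the three inline rows are made-up values standing in for `health_tracking.csv`, so that the snippet runs without the file.

```python
import io
import pandas as pd

# Tiny made-up sample standing in for health_tracking.csv (illustrative values only).
sample_csv = io.StringIO(
    "Date,DayOfWeek,Steps,Distance_km,Active_Minutes,Calories,HeartRate_Avg,Sleep_Hours,Water_Liters\n"
    "2024-01-01,Monday,8200,6.1,45,2100,72,7.5,2.1\n"
    "2024-01-02,Tuesday,,4.0,30,1900,70,,1.8\n"
    "2024-01-03,Wednesday,12050,9.2,80,2600,78,6.0,\n"
)

# With the real file this would be: df = pd.read_csv("health_tracking.csv")
df = pd.read_csv(sample_csv)

print(df.head())                 # first 5 rows (only 3 exist in this sample)
print(df.shape)                  # (rows, columns)
print(df.dtypes)                 # data type of each column

missing_counts = df.isna().sum() # number of missing values per column
print(missing_counts)
```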

---

### Task 2: Handle Missing Data
- Using **NumPy**, replace missing values in each **numeric column** with that column's **mean**, computed over the non-missing values.
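
One possible way to do the mean imputation is sketched below. It assumes missing values are stored as `NaN` and uses a made-up three-row frame; `np.nanmean` computes the mean while ignoring `NaN`, and `np.where` fills only the missing positions.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration; the real columns come from health_tracking.csv.
df = pd.DataFrame({
    "DayOfWeek": ["Monday", "Tuesday", "Wednesday"],
    "Steps": [8000.0, np.nan, 12000.0],
    "Sleep_Hours": [7.0, 6.0, np.nan],
})

# Only numeric columns are imputed; text columns such as DayOfWeek are left alone.
numeric_cols = df.select_dtypes(include="number").columns

for col in numeric_cols:
    values = df[col].to_numpy(dtype=float)
    col_mean = np.nanmean(values)  # mean over the non-missing entries
    df[col] = np.where(np.isnan(values), col_mean, values)

print(df)
```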

---

### Task 3: Feature Engineering
- Create `Cals_per_Min` = `Calories / Active_Minutes`. Handle division by zero safely, for example by assigning `NaN` when `Active_Minutes` is 0.
- Create `Intense_Day` = `True` if `Steps ≥ 10,000`, else `False`.
- Create `Healthy_Day` = `True` if `Sleep_Hours ≥ 7` **and** `Water_Liters ≥ 2`, else `False`.
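
The three features above could be built as in the sketch below. The data is made up, and returning `NaN` when `Active_Minutes` is 0 is an assumption; the task only asks that the division be handled safely.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration; row 2 has zero active minutes.
df = pd.DataFrame({
    "Calories": [2000.0, 1800.0, 2500.0],
    "Active_Minutes": [40.0, 0.0, 50.0],
    "Steps": [9500, 10000, 12000],
    "Sleep_Hours": [7.0, 8.0, 6.5],
    "Water_Liters": [2.0, 1.5, 2.5],
})

calories = df["Calories"].to_numpy(dtype=float)
minutes = df["Active_Minutes"].to_numpy(dtype=float)

# `where` skips positions with zero minutes; `out` pre-fills those positions with NaN
# (assumption: NaN is the chosen "safe" value).
df["Cals_per_Min"] = np.divide(
    calories,
    minutes,
    out=np.full_like(calories, np.nan),
    where=minutes != 0,
)

df["Intense_Day"] = df["Steps"] >= 10_000
df["Healthy_Day"] = (df["Sleep_Hours"] >= 7) & (df["Water_Liters"] >= 2)

print(df[["Cals_per_Min", "Intense_Day", "Healthy_Day"]])
```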

---

### Task 4: Outlier Detection
- For `Steps`, `Calories`, and `Sleep_Hours`, compute **z-scores** using NumPy:
  $$
  z = \frac{x - \text{mean}}{\text{std}}
  $$
- Create boolean columns to flag outliers where **|z| > 2**.
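
A minimal sketch of the z-score flagging, shown for `Steps` only on made-up data; the same loop would cover `Calories` and `Sleep_Hours`. The output column names `Steps_z` and `Steps_Outlier` are hypothetical, and `np.std` is used with its default population standard deviation (`ddof=0`), which is an assumption since the task does not specify one.

```python
import numpy as np
import pandas as pd

# Made-up data: seven ordinary days and one unusually active day.
df = pd.DataFrame({"Steps": [8000, 8200, 7900, 8100, 8000, 8050, 7950, 20000]})

for col in ["Steps"]:  # extend with "Calories" and "Sleep_Hours" on the real data
    values = df[col].to_numpy(dtype=float)
    z = (values - np.mean(values)) / np.std(values)  # population std (ddof=0)
    df[f"{col}_z"] = z
    df[f"{col}_Outlier"] = np.abs(z) > 2             # flag where |z| > 2

print(df)
```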

---

### Task 5: Aggregation
- Group by `DayOfWeek` and compute the **average** of:
  - `Steps`
  - `Calories`
  - `Sleep_Hours`
  - `Water_Liters`
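
The grouping step can be sketched with `groupby` and `mean`. The four rows are made up, and `weekday_avg` is just an illustrative variable name.

```python
import pandas as pd

# Made-up data for illustration: two Mondays and two Tuesdays.
df = pd.DataFrame({
    "DayOfWeek": ["Monday", "Tuesday", "Monday", "Tuesday"],
    "Steps": [8000, 10000, 9000, 12000],
    "Calories": [2000, 2400, 2200, 2600],
    "Sleep_Hours": [7.0, 6.0, 8.0, 7.0],
    "Water_Liters": [2.0, 1.5, 2.5, 2.5],
})

metrics = ["Steps", "Calories", "Sleep_Hours", "Water_Liters"]

# One row per day name, one column per averaged metric.
weekday_avg = df.groupby("DayOfWeek")[metrics].mean()
print(weekday_avg)
```

Note that `groupby` sorts the day names alphabetically by default, so the result is not in calendar order unless it is reordered afterwards.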

---

### Task 6: Ranking
- Find the **top 3 dates** with the highest `Steps`.
- Find the **top 3 dates** with the highest `Steps` among days where `Healthy_Day` is `True`.
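
Both rankings could be produced with `DataFrame.nlargest`, as in this sketch on made-up data. It assumes `Healthy_Day` already exists as a boolean column (created in Task 3).

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    "Date": ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"],
    "Steps": [8000, 12000, 9500, 11000, 7000],
    "Healthy_Day": [True, False, True, True, True],
})

# Top 3 dates overall by step count.
top3_steps = df.nlargest(3, "Steps")[["Date", "Steps"]]

# Filter to healthy days first, then rank by step count.
top3_healthy = df[df["Healthy_Day"]].nlargest(3, "Steps")[["Date", "Steps"]]

print(top3_steps)
print(top3_healthy)
```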

---

### Task 7: Correlation
- Using **NumPy**, compute the Pearson correlation coefficient between:
  - `Steps` and `Calories`
  - `Sleep_Hours` and `HeartRate_Avg`
- Briefly interpret each correlation (positive/negative/weak/strong).
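
As a rough sketch, `np.corrcoef` returns a 2×2 correlation matrix whose off-diagonal entry is Pearson's r for the two inputs. The arrays below are made-up values, so the coefficients say nothing about the real dataset.

```python
import numpy as np

# Made-up values for illustration; on the real data these would be DataFrame columns,
# e.g. df["Steps"].to_numpy().
steps = np.array([5000, 7000, 9000, 11000, 13000], dtype=float)
calories = np.array([1800, 2000, 2250, 2400, 2700], dtype=float)
sleep_hours = np.array([8.0, 7.5, 7.0, 6.0, 5.5])
heart_rate = np.array([62, 64, 70, 72, 75], dtype=float)

# Entry [0, 1] of the matrix is the correlation between the two inputs.
r_steps_cal = np.corrcoef(steps, calories)[0, 1]
r_sleep_hr = np.corrcoef(sleep_hours, heart_rate)[0, 1]

print(f"Steps vs Calories:            r = {r_steps_cal:.3f}")
print(f"Sleep_Hours vs HeartRate_Avg: r = {r_sleep_hr:.3f}")
```

A value near +1 or -1 indicates a strong linear relationship, and a value near 0 indicates a weak one; the sign gives the direction.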

---

## 🤖 ML Extension (Optional but Recommended)

### Task 8: Regression (NumPy Only)
- Predict `Calories` from features (`Steps`, `Distance_km`, `Active_Minutes`, `HeartRate_Avg`, `Sleep_Hours`, `Water_Liters`) using **linear regression** (normal equation with pseudo-inverse).
- Report **MAE**, **RMSE**, and **R²** on a validation split.
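
The regression workflow can be sketched as below. To stay self-contained it uses synthetic data with two features instead of the six listed above. The 80/20 split, the random seed, and the helper `add_bias` are illustrative assumptions, not requirements of the task.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the feature matrix and the Calories target.
X = rng.normal(size=(200, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Shuffle the row indices, then hold out the last 20% for validation.
idx = rng.permutation(len(X))
split = int(0.8 * len(X))
train, val = idx[:split], idx[split:]

def add_bias(A):
    """Prepend a column of ones so the model learns an intercept."""
    return np.hstack([np.ones((A.shape[0], 1)), A])

# Least-squares solution via the pseudo-inverse: w = pinv(X_train) @ y_train
w = np.linalg.pinv(add_bias(X[train])) @ y[train]

# Evaluate on the held-out rows only.
pred = add_bias(X[val]) @ w
err = y[val] - pred
mae = np.mean(np.abs(err))
rmse = np.sqrt(np.mean(err ** 2))
r2 = 1.0 - np.sum(err ** 2) / np.sum((y[val] - y[val].mean()) ** 2)

print(f"MAE={mae:.3f}  RMSE={rmse:.3f}  R2={r2:.3f}")
```

On the real data, the features have very different scales (steps in the thousands, water in single digits). The pseudo-inverse tolerates this, though standardizing the features first can make the coefficients easier to compare.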

---

### Task 9: Classification (NumPy Only)
- Predict `Healthy_Day` (True/False) using **logistic regression** trained with **gradient descent** (include an L2 regularization term).
- Report **Accuracy**, **Precision**, **Recall**, **F1**, and show a **confusion matrix**.
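
A minimal sketch of L2-regularized logistic regression trained with batch gradient descent follows. The data is synthetic, the learning rate, penalty strength, and iteration count are hypothetical values chosen for illustration, and the metrics are computed on the training data for brevity (a real evaluation would use a held-out split).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: two standardized features; the label is 1 when their sum is positive.
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

Xb = np.hstack([np.ones((len(X), 1)), X])  # bias column followed by the features
w = np.zeros(Xb.shape[1])
lr, lam, n_iter = 0.1, 0.01, 2000          # hypothetical hyperparameters

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(n_iter):
    p = sigmoid(Xb @ w)
    grad = Xb.T @ (p - y) / len(y)         # gradient of the mean log-loss
    grad[1:] += lam * w[1:]                # L2 penalty; the bias is left unregularized
    w -= lr * grad

pred = sigmoid(Xb @ w) >= 0.5
actual = y == 1

tp = int(np.sum(pred & actual))
fp = int(np.sum(pred & ~actual))
fn = int(np.sum(~pred & actual))
tn = int(np.sum(~pred & ~actual))

# Rows are actual classes (False, True); columns are predicted classes (False, True).
confusion = np.array([[tn, fp], [fn, tp]])

accuracy = (tp + tn) / len(y)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(confusion)
print(f"Accuracy={accuracy:.3f} Precision={precision:.3f} Recall={recall:.3f} F1={f1:.3f}")
```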

---
