# Assignment 4: Health & Fitness Tracking Analysis

## 📌 Overview
In this assignment, you will analyze a fitness & health dataset using **Pandas** and **NumPy** (no other libraries).
The dataset `health_tracking.csv` contains daily activity and wellness metrics.

Your goals are to practice:
- Loading and inspecting data
- Handling missing values
- Creating new features
- Detecting outliers (z-score)
- Aggregating and ranking results
- (ML Extension) Building simple NumPy-only models

---

## 📊 Dataset Description
The dataset contains the following columns:

- **Date** → Day of the record (YYYY-MM-DD)
- **DayOfWeek** → Day name (e.g., Monday)
- **Steps** → Number of steps taken
- **Distance_km** → Distance covered in kilometers
- **Active_Minutes** → Number of active minutes
- **Calories** → Calories burned
- **HeartRate_Avg** → Average heart rate (bpm)
- **Sleep_Hours** → Hours of sleep
- **Water_Liters** → Water intake in liters

---

## 📝 Tasks

### Task 1: Load and Inspect Data
- Load `health_tracking.csv` into Pandas.
- Display the first 5 rows.
- Show the **shape** and **data types**.
- Count **missing values** in each column.
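
A minimal sketch of the inspection steps above, offered as one possible approach rather than the reference solution. To stay self-contained it reads five rows (copied from the notebook output further down) from an in-memory string; with the real file you would call `pd.read_csv("health_tracking.csv")` instead. The names `csv_text` and `missing_counts` are illustrative.

```python
import io

import pandas as pd

# Stand-in for health_tracking.csv: five rows copied from the notebook output.
csv_text = """Date,DayOfWeek,Steps,Distance_km,Active_Minutes,Calories,HeartRate_Avg,Sleep_Hours,Water_Liters
2025-08-01,Friday,11270,8.45,81,214,88,6.1,3.18
2025-08-02,Saturday,4860,3.64,81,385,65,8.1,3.01
2025-08-03,Sunday,,7.04,66,260,76,7.7,1.47
2025-08-04,Monday,9191,6.89,81,229,72,8.9,3.23
2025-08-05,Tuesday,9734,7.3,70,,88,6.9,2.35
"""

# With the real dataset: df = pd.read_csv("health_tracking.csv")
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first 5 rows
print(df.shape)    # (rows, columns)
print(df.dtypes)   # one dtype per column

# isna() marks missing cells; summing the booleans counts them per column.
missing_counts = df.isna().sum()
print(missing_counts)
```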

---

### Task 2: Handle Missing Data
- Using **NumPy**, replace missing values in **numeric columns** with the **column mean**.
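
One way to do the imputation, sketched on a small hand-built frame (the `NaN` positions are chosen for illustration). It assumes the mean should be computed over the non-missing entries only, which is what `np.nanmean` does; a plain `np.mean` would return `NaN` for any column that contains a gap.

```python
import numpy as np
import pandas as pd

# Small illustrative frame with one gap in each numeric column.
df = pd.DataFrame({
    "DayOfWeek": ["Friday", "Saturday", "Sunday", "Monday"],
    "Steps": [11270.0, 4860.0, np.nan, 9191.0],
    "Calories": [214.0, 385.0, 260.0, np.nan],
})

# Only numeric columns are imputed; text columns such as DayOfWeek are skipped.
numeric_cols = df.select_dtypes(include="number").columns

for col in numeric_cols:
    values = df[col].to_numpy(dtype=float)
    col_mean = np.nanmean(values)  # mean of the non-missing entries
    df[col] = np.where(np.isnan(values), col_mean, values)

print(df)
```

Note that integer columns become `float64` after this loop, because the values pass through a float array.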

---

### Task 3: Feature Engineering
- Create `Cals_per_Min` = `Calories / Active_Minutes` (handle divide-by-zero safely).
- Create `Intense_Day` = `True` if `Steps ≥ 10,000`, else `False`.
- Create `Healthy_Day` = `True` if `Sleep_Hours ≥ 7` **and** `Water_Liters ≥ 2`, else `False`.
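
The three features could be built roughly as follows. The rows are made-up illustrative values, including a day with zero active minutes, so that every branch is exercised. The sketch assumes that a day with `Active_Minutes == 0` should get `NaN` for `Cals_per_Min`; the assignment only asks for "safe" handling, so another sentinel such as `0` would also be defensible.

```python
import numpy as np
import pandas as pd

# Made-up illustrative rows; the third has zero active minutes.
df = pd.DataFrame({
    "Steps": [11270.0, 4860.0, 9191.0, 10000.0],
    "Calories": [214.0, 385.0, 229.0, 300.0],
    "Active_Minutes": [81, 81, 0, 60],
    "Sleep_Hours": [6.1, 8.1, 8.9, 7.0],
    "Water_Liters": [3.18, 3.01, 1.5, 2.0],
})

calories = df["Calories"].to_numpy(dtype=float)
minutes = df["Active_Minutes"].to_numpy(dtype=float)

# Pre-fill with NaN, then divide only where the denominator is non-zero.
cals_per_min = np.full(len(df), np.nan)
np.divide(calories, minutes, out=cals_per_min, where=minutes != 0)
df["Cals_per_Min"] = cals_per_min

# Boolean features come straight from vectorised comparisons.
df["Intense_Day"] = df["Steps"] >= 10_000
df["Healthy_Day"] = (df["Sleep_Hours"] >= 7) & (df["Water_Liters"] >= 2)

print(df[["Cals_per_Min", "Intense_Day", "Healthy_Day"]])
```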

---

### Task 4: Outlier Detection
- For `Steps`, `Calories`, and `Sleep_Hours`, compute **z-scores** using NumPy:
  \[
  z = \frac{x - \text{mean}}{\text{std}}
  \]
- Create boolean columns to flag outliers where **|z| > 2**.
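
A NumPy-only sketch of the z-score rule above, run on made-up step counts with one deliberately extreme day. The helper name `zscore` is illustrative, and the sketch assumes the population standard deviation (`np.std` with its default `ddof=0`); using the sample standard deviation (`ddof=1`) would give slightly smaller z-scores.

```python
import numpy as np

def zscore(x):
    """Return (x - mean) / std using the population standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - np.mean(x)) / np.std(x)

# Made-up step counts; the last day is far above the rest.
steps = np.array([8000, 8200, 7900, 8100, 8000, 7800, 8000, 20000], dtype=float)

z = zscore(steps)
is_outlier = np.abs(z) > 2  # True where the value is more than 2 std from the mean

print(np.round(z, 2))
print(is_outlier)
```

On a DataFrame, the same array can be stored as a flag column, for example `df["Steps_Outlier"] = np.abs(zscore(df["Steps"])) > 2` (the column name is only a suggestion), once missing values have been filled.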

---

### Task 5: Aggregation
- Group by `DayOfWeek` and compute the **average** of:
  - `Steps`
  - `Calories`
  - `Sleep_Hours`
  - `Water_Liters`
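
The aggregation can be sketched with `groupby` followed by `mean`. The five rows below are complete records taken from the notebook output further down; the variable names `metrics` and `daily_avg` are illustrative.

```python
import pandas as pd

# Five complete rows taken from the notebook output (no missing values).
df = pd.DataFrame({
    "DayOfWeek": ["Friday", "Saturday", "Monday", "Thursday", "Friday"],
    "Steps": [11270.0, 4860.0, 9191.0, 4466.0, 8426.0],
    "Calories": [214.0, 385.0, 229.0, 285.0, 439.0],
    "Sleep_Hours": [6.1, 8.1, 8.9, 8.1, 8.3],
    "Water_Liters": [3.18, 3.01, 3.23, 3.24, 1.8],
})

metrics = ["Steps", "Calories", "Sleep_Hours", "Water_Liters"]

# One row per weekday, one column per averaged metric.
daily_avg = df.groupby("DayOfWeek")[metrics].mean()
print(daily_avg)
```

By default `groupby` sorts the group keys alphabetically, so the result lists Friday before Monday; reindex with an explicit weekday list if calendar order matters.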

---

### Task 6: Ranking
- Find the **top 3 dates** with the highest `Steps`.
- Find the **top 3 Healthy_Day dates** (where `Healthy_Day = True`) ranked by `Steps`.
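
Both rankings can be sketched with `DataFrame.nlargest`. The rows are taken from the notebook output further down, and the `Healthy_Day` values follow from the Task 3 rule applied to those rows' sleep and water figures.

```python
import pandas as pd

# Six rows from the notebook output; Healthy_Day follows the Task 3 rule.
df = pd.DataFrame({
    "Date": ["2025-08-01", "2025-08-02", "2025-08-04",
             "2025-08-05", "2025-08-07", "2025-08-08"],
    "Steps": [11270.0, 4860.0, 9191.0, 9734.0, 4466.0, 8426.0],
    "Healthy_Day": [False, True, True, False, True, False],
})

# Top 3 dates overall by step count.
top3 = df.nlargest(3, "Steps")[["Date", "Steps"]]

# Filter to healthy days first, then rank the remainder by steps.
top3_healthy = df[df["Healthy_Day"]].nlargest(3, "Steps")[["Date", "Steps"]]

print(top3)
print(top3_healthy)
```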

---

### Task 7: Correlation
- Using **NumPy**, compute correlation coefficients between:
  - `Steps` and `Calories`
  - `Sleep_Hours` and `HeartRate_Avg`
- Briefly interpret each correlation (positive/negative/weak/strong).
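
A possible NumPy sketch for the two correlations, using the ten rows visible in the notebook output further down. Because those rows still contain `NaN`, the illustrative helper `pearson_r` drops any pair with a missing value before calling `np.corrcoef`; if you have already imputed the gaps in Task 2, the mask is unnecessary. Ten rows are only a third of the dataset, so the numbers printed here will not match the full-data answer.

```python
import numpy as np

# First ten rows as shown in the notebook output (NaN = missing).
steps = np.array([11270.0, 4860.0, np.nan, 9191.0, 9734.0,
                  10265.0, 4466.0, 8426.0, np.nan, np.nan])
calories = np.array([214.0, 385.0, 260.0, 229.0, np.nan,
                     233.0, 285.0, 439.0, 370.0, 397.0])
sleep_hours = np.array([6.1, 8.1, 7.7, 8.9, 6.9, 5.1, 8.1, 8.3, 7.3, 8.4])
heart_rate = np.array([88, 65, 76, 72, 88, 95, 75, 83, 81, 72], dtype=float)

def pearson_r(x, y):
    """Pearson correlation over the pairs where both values are present."""
    keep = ~(np.isnan(x) | np.isnan(y))
    # corrcoef returns a 2x2 matrix; the off-diagonal entry is r.
    return np.corrcoef(x[keep], y[keep])[0, 1]

r_steps_cal = pearson_r(steps, calories)
r_sleep_hr = pearson_r(sleep_hours, heart_rate)

print(f"Steps vs Calories:            r = {r_steps_cal:.2f}")
print(f"Sleep_Hours vs HeartRate_Avg: r = {r_sleep_hr:.2f}")
```

The sign of `r` gives the direction of the relationship and its magnitude gives the strength; values near 0 are weak, values near ±1 are strong.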

---

## 🤖 ML Extension (Optional but Recommended)

### Task 8: Regression (NumPy Only)
- Predict `Calories` from features (`Steps`, `Distance_km`, `Active_Minutes`, `HeartRate_Avg`, `Sleep_Hours`, `Water_Liters`) using **linear regression** (normal equation with pseudo-inverse).
- Report **MAE**, **RMSE**, and **R²** on a validation split.
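
The sketch below shows the normal-equation fit and the three metrics on synthetic data, not on `health_tracking.csv`. The three random features, the 80/20 split, and the feature standardization are assumptions made for illustration; on the real data, `X` would hold the six feature columns and `y` the `Calories` column after imputation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: y is a noisy linear function of three features.
n_samples = 200
X = rng.normal(size=(n_samples, 3))
true_weights = np.array([2.0, -1.0, 0.5])
y = 3.0 + X @ true_weights + rng.normal(scale=0.1, size=n_samples)

# Shuffle, then hold out 20% of the rows for validation.
order = rng.permutation(n_samples)
n_train = int(0.8 * n_samples)
train_idx, val_idx = order[:n_train], order[n_train:]

# Standardize with statistics from the training rows only.
mu = X[train_idx].mean(axis=0)
sigma = X[train_idx].std(axis=0)

def design_matrix(features):
    """Scale the features and prepend a column of ones for the intercept."""
    scaled = (features - mu) / sigma
    return np.column_stack([np.ones(len(scaled)), scaled])

# Normal equation with the pseudo-inverse: w = pinv(A^T A) A^T y
A_train = design_matrix(X[train_idx])
weights = np.linalg.pinv(A_train.T @ A_train) @ A_train.T @ y[train_idx]

# Evaluate on the held-out rows.
y_val = y[val_idx]
residuals = y_val - design_matrix(X[val_idx]) @ weights

mae = np.mean(np.abs(residuals))
rmse = np.sqrt(np.mean(residuals ** 2))
r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y_val - y_val.mean()) ** 2)

print(f"MAE={mae:.3f}  RMSE={rmse:.3f}  R2={r2:.3f}")
```

The pseudo-inverse is used instead of a plain inverse so the solve still works when features are strongly collinear, as `Steps` and `Distance_km` are likely to be.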

---

### Task 9: Classification (NumPy Only)
- Predict `Healthy_Day` (True/False) using **logistic regression** trained with **gradient descent** (include an L2 regularization term).
- Report **Accuracy**, **Precision**, **Recall**, **F1**, and show a **confusion matrix**.
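
A compact sketch of L2-regularized logistic regression trained by batch gradient descent, again on synthetic data rather than the real dataset. The two random features, the label rule, the 80/20 split, and the hyperparameters (`learning_rate`, `l2_strength`, `n_epochs`) are all illustrative assumptions and would need tuning on `health_tracking.csv`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: two features, label is 1 when their sum is positive.
n_samples = 300
X = rng.normal(size=(n_samples, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

n_train = int(0.8 * n_samples)
X_train, y_train = X[:n_train], y[:n_train]
X_val, y_val = X[n_train:], y[n_train:]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

weights = np.zeros(X.shape[1])
bias = 0.0
learning_rate, l2_strength, n_epochs = 0.1, 0.01, 2000

for _ in range(n_epochs):
    probs = sigmoid(X_train @ weights + bias)
    error = probs - y_train
    # Gradient of the mean log-loss plus the L2 penalty (bias is not penalized).
    grad_w = X_train.T @ error / n_train + l2_strength * weights
    grad_b = error.mean()
    weights -= learning_rate * grad_w
    bias -= learning_rate * grad_b

# Threshold the predicted probabilities at 0.5.
pred = (sigmoid(X_val @ weights + bias) >= 0.5).astype(int)
actual = y_val.astype(int)

tp = int(np.sum((pred == 1) & (actual == 1)))
tn = int(np.sum((pred == 0) & (actual == 0)))
fp = int(np.sum((pred == 1) & (actual == 0)))
fn = int(np.sum((pred == 0) & (actual == 1)))

accuracy = (tp + tn) / len(actual)
precision = tp / max(tp + fp, 1)  # max(..., 1) avoids division by zero
recall = tp / max(tp + fn, 1)
f1 = 2 * precision * recall / max(precision + recall, 1e-12)

# Rows are actual classes (0, 1); columns are predicted classes (0, 1).
confusion = np.array([[tn, fp], [fn, tp]])

print(f"Accuracy={accuracy:.3f} Precision={precision:.3f} "
      f"Recall={recall:.3f} F1={f1:.3f}")
print(confusion)
```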

---


# Assignment 4 — Health & Fitness Tracking Analysis

In [2]:
import pandas as pd
import numpy as np

In [4]:
df = pd.read_csv("health_tracking.csv")

## Task 1

In [5]:
df

|   | Date | DayOfWeek | Steps | Distance_km | Active_Minutes | Calories | HeartRate_Avg | Sleep_Hours | Water_Liters |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2025-08-01 | Friday | 11270.0 | 8.45 | 81 | 214.0 | 88 | 6.1 | 3.18 |
| 1 | 2025-08-02 | Saturday | 4860.0 | 3.64 | 81 | 385.0 | 65 | 8.1 | 3.01 |
| 2 | 2025-08-03 | Sunday | NaN | 7.04 | 66 | 260.0 | 76 | 7.7 | 1.47 |
| 3 | 2025-08-04 | Monday | 9191.0 | 6.89 | 81 | 229.0 | 72 | 8.9 | 3.23 |
| 4 | 2025-08-05 | Tuesday | 9734.0 | 7.3 | 70 | NaN | 88 | 6.9 | 2.35 |
| 5 | 2025-08-06 | Wednesday | 10265.0 | 7.7 | 74 | 233.0 | 95 | 5.1 | NaN |
| 6 | 2025-08-07 | Thursday | 4466.0 | 3.35 | 83 | 285.0 | 75 | 8.1 | 3.24 |
| 7 | 2025-08-08 | Friday | 8426.0 | 6.32 | 22 | 439.0 | 83 | 8.3 | 1.8 |
| 8 | 2025-08-09 | Saturday | NaN | 7.18 | 120 | 370.0 | 81 | 7.3 | 1.28 |
| 9 | 2025-08-10 | Sunday | NaN | 9.24 | 70 | 397.0 | 72 | 8.4 | 1.57 |


In [6]:
# Display the first 5 rows
df.head()

|   | Date | DayOfWeek | Steps | Distance_km | Active_Minutes | Calories | HeartRate_Avg | Sleep_Hours | Water_Liters |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2025-08-01 | Friday | 11270.0 | 8.45 | 81 | 214.0 | 88 | 6.1 | 3.18 |
| 1 | 2025-08-02 | Saturday | 4860.0 | 3.64 | 81 | 385.0 | 65 | 8.1 | 3.01 |
| 2 | 2025-08-03 | Sunday | NaN | 7.04 | 66 | 260.0 | 76 | 7.7 | 1.47 |
| 3 | 2025-08-04 | Monday | 9191.0 | 6.89 | 81 | 229.0 | 72 | 8.9 | 3.23 |
| 4 | 2025-08-05 | Tuesday | 9734.0 | 7.3 | 70 | NaN | 88 | 6.9 | 2.35 |


In [9]:
# Print the shape and the data type of each column
print(df.shape)
print(df.dtypes)

(30, 9)
Date               object
DayOfWeek          object
Steps             float64
Distance_km       float64
Active_Minutes      int64
Calories          float64
HeartRate_Avg       int64
Sleep_Hours       float64
Water_Liters      float64
dtype: object
