# Project Title: Mind-Body Sleep Sync
# Subtitle: Bridging the "Sleep Perception Gap" via Multimodal Wearable Data Analysis

## Project Overview

**Objective**: In today's wearable technology market, users are flooded with objective health metrics (e.g., "You slept 8 hours"), yet many experience "Orthosomnia": anxiety about their sleep scores, or a disconnect between their tracked data and how they actually feel.

This project aims to solve this business problem by developing a **"Cognitive Sleep Coach"**. By analyzing the gap between **Objective Recovery** (Fitbit data) and **Subjective Fatigue** (Survey data), we will propose a solution that provides personalized cognitive reframing to improve user retention and daily vitality.

## Problem Statement: The "Orthosomnia" Paradox

- **Context**: Wearable devices provide detailed (though estimated) sleep architecture data, yet many users experience "Paradoxical Insomnia" (feeling tired despite good data) or "Orthosomnia" (obsessing over sleep scores).

- **Pain Point**: Current health apps fail to explain why a user feels tired when their data looks "green." This can erode trust in the device and contribute to churn.

- **Opportunity**: Leveraging the discrepancy between **Objective Metrics** (HRV, Sleep Efficiency, REM) and **Subjective Metrics** (Stress, Mood, Perceived Quality) to provide actionable psychological and physiological insights.
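Because the objective and subjective metrics live on different scales (percentages vs. Likert points), one way to express the discrepancy as a single number is to standardize both sides and subtract. The sketch below assumes a simple z-score pooled across all rows and a hypothetical `perception_gap` helper; it is one possible definition, not the project's final metric, and the toy values are invented for illustration.

```python
import pandas as pd

def zscore(s: pd.Series) -> pd.Series:
    """Standardize to mean 0 and sample SD 1 (pandas .std() uses ddof=1)."""
    return (s - s.mean()) / s.std()

def perception_gap(df: pd.DataFrame, objective: str, subjective: str) -> pd.Series:
    """Hypothetical gap score: z(subjective) - z(objective).
    Positive: the user feels better than the device suggests.
    Negative: the user feels worse than the device suggests."""
    return zscore(df[subjective]) - zscore(df[objective])

# Illustrative toy values only (not from the study data)
toy = pd.DataFrame({
    "main_sleep_efficiency": [92, 85, 95, 80],
    "selfreport_sleep_quality": [2, 3, 5, 4],
})
toy["quality_gap"] = perception_gap(toy, "main_sleep_efficiency", "selfreport_sleep_quality")
print(toy)
```

A negative score marks the "feels tired despite green data" case (row 0 above: high efficiency, low rated quality). With repeated nights per participant, standardizing within each participant (for example with `df.groupby("participantID1")[col].transform(...)`) may be more appropriate than pooling everyone.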

## Data Overview

This project uses a **multi-modal dataset** that pairs **Objective Biometric Data** (wearable) with **Subjective Self-Report Data** (survey) to analyze and resolve the "Sleep Perception Gap." The key data columns are as follows:

**A. Objective Sleep Metrics (Wearable-based Ground Truth)**
Physiological data collected via wearable devices, serving as the "Ground Truth" for physical recovery.

- `main_sleep_efficiency` **(Sleep Efficiency)**: The ratio of total time asleep to total time in bed, expressed as a percentage. A key indicator of how consolidated (uninterrupted) sleep is.

- `main_sleep_minutes_asleep` **(Total Sleep Time)**: The actual duration of sleep in minutes.

- `main_sleep_rem_minutes` **/** `main_sleep_deep_minutes`: Duration of specific sleep stages responsible for mental (REM) and physical (Deep) recovery.

- `HRV` **(Heart Rate Variability)**: A physiological marker for stress recovery during sleep.
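The sleep efficiency definition above reduces to a one-line formula. A minimal sketch, assuming a hypothetical `main_sleep_time_in_bed` column (not among the columns listed here) that holds time in bed in minutes, and a hypothetical `sleep_efficiency` helper:

```python
import numpy as np
import pandas as pd

def sleep_efficiency(minutes_asleep, minutes_in_bed) -> pd.Series:
    """Sleep efficiency (%) = time asleep / time in bed * 100.
    Returns NaN where time in bed is missing or zero."""
    asleep = pd.Series(minutes_asleep, dtype="float64")
    in_bed = pd.Series(minutes_in_bed, dtype="float64").replace(0, np.nan)  # avoid division by zero
    return asleep / in_bed * 100

# 'main_sleep_time_in_bed' is a hypothetical column name; values are invented.
toy = pd.DataFrame({
    "main_sleep_minutes_asleep": [420, 360, 300],
    "main_sleep_time_in_bed": [480, 400, 0],
})
toy["efficiency_pct"] = sleep_efficiency(toy["main_sleep_minutes_asleep"], toy["main_sleep_time_in_bed"])
print(toy)
```

The device's own `main_sleep_efficiency` may follow vendor-specific rules, so small mismatches against this formula are a prompt to check definitions rather than proof of an error.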

**B. Subjective Sleep Metrics (Survey-based Perception)**
"Feelings" reported by users every morning. These serve as the comparative baseline to calculate the "Gap" against objective metrics.

- `selfreport_sleep_quality` **(Subjective Sleep Quality)**: The user's perceived overall satisfaction with their sleep (Likert Scale).

- `selfreport_hours_slept`: The sleep duration as perceived by the user, in hours. By comparing this with `main_sleep_minutes_asleep` (recorded in minutes, so the units must be aligned first), we quantify **"Time Perception Distortion"**.

- `selfreport_SOL` **(Sleep Onset Latency)**: Subjective estimation of time taken to fall asleep.
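The "Time Perception Distortion" comparison needs a unit conversion, since one column is in hours and the other in minutes. A minimal sketch with a hypothetical `time_perception_distortion` helper and invented toy values:

```python
import pandas as pd

def time_perception_distortion(df: pd.DataFrame) -> pd.Series:
    """Perceived minus measured sleep time, in minutes.
    Positive: the user reports more sleep than the wearable recorded.
    Negative: the user reports less sleep than the wearable recorded."""
    perceived_min = df["selfreport_hours_slept"] * 60  # hours -> minutes
    return perceived_min - df["main_sleep_minutes_asleep"]

# Illustrative toy values only (not from the study data)
toy = pd.DataFrame({
    "selfreport_hours_slept": [6.0, 8.0, 7.5],
    "main_sleep_minutes_asleep": [420, 400, 450],
})
toy["tpd_minutes"] = time_perception_distortion(toy)
print(toy)
```

Negative values flag nights where the user underestimated their sleep relative to the wearable, the pattern typically associated with paradoxical insomnia.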

**C. Target Variables (Morning Readiness & Mood)**
The final prediction targets (Y) representing the User Experience (UX) outcomes we aim to improve.

- `energy_morning`: The primary target variable (Y). We aim to predict and improve this score.

- `stress_morning`: Used to analyze the downstream effects of the perception gap.
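To link a gap measure to the primary target, a rank correlation is a reasonable first pass if, as seems likely, `energy_morning` is an ordinal survey rating. This sketch uses `scipy.stats.spearmanr` and assumes a hypothetical gap column named `tpd_minutes` and a hypothetical `gap_vs_target` helper; the toy data are invented.

```python
import pandas as pd
from scipy.stats import spearmanr

def gap_vs_target(df: pd.DataFrame, gap_col: str, target_col: str = "energy_morning"):
    """Spearman rank correlation between a gap score and a morning outcome.
    Rows with a missing value in either column are dropped first."""
    pair = df[[gap_col, target_col]].dropna()
    rho, p = spearmanr(pair[gap_col], pair[target_col])
    return rho, p, len(pair)

# Illustrative toy values only; 'tpd_minutes' is a hypothetical gap column.
toy = pd.DataFrame({
    "tpd_minutes": [-60, -30, 0, 20, 45, None],
    "energy_morning": [1, 2, 3, 3, 5, 4],
})
rho, p, n = gap_vs_target(toy, "tpd_minutes")
print(f"rho={rho:.2f}, p={p:.3f}, n={n}")
```

Pooling all nights treats repeated measurements from the same participant as independent; a per-participant or mixed-effects analysis would be the more careful follow-up.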

**D. Contextual Features (Lifestyle Factors)**
External factors influencing sleep quality and perception, used for **Root Cause Analysis**.

- `sedentary_minutes`: Total daily sedentary time. Used to analyze the impact of physical inactivity on sleep depth (Deep Sleep) and sleep onset.

- `steps` **/** `vigorousPA_min_in_range`: General activity levels and high-intensity exercise markers.
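For the root cause analysis, a simple starting point is to bin sedentary time into tertiles and compare deep sleep across bins. This sketch assumes each row pairs a day's `sedentary_minutes` with the following night's `main_sleep_deep_minutes`, which the column descriptions here do not confirm; the `deep_sleep_by_sedentary_tertile` helper and the toy values are hypothetical.

```python
import pandas as pd

def deep_sleep_by_sedentary_tertile(df: pd.DataFrame) -> pd.DataFrame:
    """Mean deep-sleep minutes (and row count) within low/mid/high tertiles of sedentary time."""
    data = df[["sedentary_minutes", "main_sleep_deep_minutes"]].dropna()
    data = data.assign(
        sedentary_tertile=pd.qcut(data["sedentary_minutes"], q=3, labels=["low", "mid", "high"])
    )
    return (
        data.groupby("sedentary_tertile", observed=True)["main_sleep_deep_minutes"]
        .agg(["mean", "count"])
    )

# Illustrative toy values only (not from the study data)
toy = pd.DataFrame({
    "sedentary_minutes": [300, 420, 540, 600, 700, 800],
    "main_sleep_deep_minutes": [95, 90, 80, 78, 60, 62],
})
print(deep_sleep_by_sedentary_tertile(toy))
```

Tertile means are descriptive only; differences between bins can reflect between-person traits rather than a causal effect of inactivity.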

## 1. Environment Setup

In [18]:
# Import libraries
# Third-party packages; install once before launching Python/Jupyter:
#   python3 -m pip install pandas numpy plotly matplotlib seaborn scipy
import pandas as pd
import numpy as np
import plotly.express as px         # interactive charts
import matplotlib.pyplot as plt     # static charts
import seaborn as sns               # statistical plots built on matplotlib
import scipy.stats                  # statistical tests (e.g., correlations)

# Set plot style (sns.set_theme is the current name for sns.set)
sns.set_theme(style="whitegrid")

In [19]:
# Import student Fitbit + self-report data
DATA_PATH = '/Users/kk/Desktop/Project/MBSC/UTX_student.csv'  # machine-specific path; adjust to your local copy
df = pd.read_csv(DATA_PATH)
print(df.head(3))

   participantID1 participantID2  pre_or_post date_of_activity    HRV  \
0              76       z7trat1j            1        13-Feb-20  13.94   
1              13       6z6k8qzx            1        23-Jan-20  10.60   
2              54       4i7679py            2        21-Jul-20  12.33   

   MVPA_min_in_range  lightPA_min_in_range  lightPA_grps  lightPA_grps2_30min  \
0              10.00                 16.00           1.0                  1.0   
1               2.00                 11.00           1.0                  1.0   
2               4.25                 18.18           1.0                  1.0   

   main_sleep_nrem_minutes  ...  Zmain_sleep_light_minutes  \
0                    139.0  ...                      -2.64   
1                    141.0  ...                      -2.05   
2                    149.0  ...                      -1.01   

   Zmain_sleep_rem_minutes  Znap_minutes_asleep  Zselfreport_hours_slept  \
0                    -1.32                  NaN          