# Reaction Time Prediction Model

## Objective
Predict human reaction time, as measured by the Psychomotor Vigilance Task (PVT), from physiological and lifestyle factors.

**Dataset**: 64 samples, 7 original features (Gender, Sleep, Heart rate, Caffeine, O2 saturation, Stress, Age)

---

## 1. Import Libraries

In [None]:
import pandas as pd
import numpy as np
import pickle
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error, mean_absolute_error
import warnings
warnings.filterwarnings('ignore')
np.random.seed(42)

## 2. Load and Prepare Data

In [None]:
# Load dataset
df = pd.read_csv('Expérience 2 - Feuil1.csv')

# Remove unnecessary columns
df = df.drop(['GLUCOSE (mg/dL)', 'PARTICIPANT'], axis=1)

# Encode gender: LabelEncoder sorts labels alphabetically, so F=0 and H=1
le = LabelEncoder()
df['SEXE'] = le.fit_transform(df['SEXE'])
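
The encoding above relies on `LabelEncoder` assigning codes in sorted label order. A minimal sketch to confirm the mapping, using a small hypothetical sample of the `SEXE` column rather than the real data:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample of the SEXE column (not the real dataset)
sexe = pd.Series(['H', 'F', 'F', 'H'])

le_demo = LabelEncoder()
encoded = le_demo.fit_transform(sexe)

# classes_ is stored in sorted order, so position 0 is 'F' and position 1 is 'H'
mapping = {str(label): code for code, label in enumerate(le_demo.classes_)}

print(mapping)           # {'F': 0, 'H': 1}
print(encoded.tolist())  # [1, 0, 0, 1]
```

Inspecting `classes_` this way is a cheap guard against the mapping silently changing if a new label appears in the data.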

## 3. Feature Engineering

Create new features by combining existing ones to capture interactions and non-linear relationships.

### Why These Features?

**1. SLEEP_STRESS = Sleep × Stress**
- **Combined effect**: Low sleep together with high stress is expected to slow RT more than either factor alone
- **Expected**: Under high stress, more sleep = better RT; less sleep = worse RT

**2. HEART_STRESS = Heart Rate × Stress**
- **Expected**: High heart rate + high stress could improve RT (alertness) or worsen it (anxiety)

**3. AGE_SLEEP = Age × Sleep**
- **Age-dependent sleep effect**: Older participants may be more sensitive to sleep loss than younger ones
- **Expected**: Young + low sleep = less impact; Old + low sleep = bigger impact

**4. CAFFEINE_SLEEP = Caffeine ÷ (Sleep + 1)**
- **Why +1**: Prevents division by zero if sleep hours = 0
- **Compensatory behavior**: People use caffeine to compensate for poor sleep
- **Relative dependency**: 200mg caffeine after 4h sleep ≠ 200mg after 8h sleep
- **Expected**: High ratio = trying to compensate for tiredness
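
As a worked example of the ratio, the sketch below evaluates the same formula for the 200 mg case mentioned above. The helper name `caffeine_sleep_ratio` is hypothetical and exists only for illustration:

```python
def caffeine_sleep_ratio(caffeine_mg: float, sleep_h: float) -> float:
    """Caffeine divided by (sleep hours + 1); the +1 avoids division by zero."""
    return caffeine_mg / (sleep_h + 1)

short_sleep = caffeine_sleep_ratio(200, 4)  # 200 / 5
long_sleep = caffeine_sleep_ratio(200, 8)   # 200 / 9
no_sleep = caffeine_sleep_ratio(200, 0)     # 200 / 1, no ZeroDivisionError

print(short_sleep)           # 40.0
print(round(long_sleep, 2))  # 22.22
print(no_sleep)              # 200.0
```

The same caffeine dose yields almost twice the ratio after 4 hours of sleep as after 8 hours, which is the "relative dependency" the feature is meant to encode.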

**5. SLEEP_SQ = Sleep²**
- **Diminishing returns**: 8h→9h sleep improvement ≠ 4h→5h improvement
- **Non-linear relationship**: Sleep doesn't affect RT linearly

**6. AGE_SQ = Age²**
- **Non-linear aging**: RT decline accelerates with age (not constant)
- **Expected**: 20→30 age change ≠ 40→50 age change in RT impact

**7. STRESS_SQ = Stress²**
- **Quadratic impact**: High stress (60%) is not simply 2× the impact of moderate stress (30%); squaring lets the effect grow faster than linearly
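
To make the squared-stress reasoning concrete, this short sketch compares how the raw and squared features grow between the two stress levels used in the example (30% and 60%):

```python
moderate, high = 30, 60  # stress levels in %

linear_ratio = high / moderate         # growth of the raw STRESS feature
squared_ratio = high**2 / moderate**2  # growth of STRESS_SQ: 3600 / 900

print(linear_ratio)   # 2.0
print(squared_ratio)  # 4.0
```

Doubling stress quadruples the squared term, so a linear model that includes both the raw and the squared feature can fit a curved (parabolic) stress effect.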

**8. SLEEP_DEFICIT = 1 if Sleep < 7, else 0**
- **Clinical threshold**: 7 hours is the commonly recommended minimum for adults
- **Binary switch**: Lets the model capture a possible "cliff" effect below 7h
- **Expected**: People with <7h sleep are in a different performance category

**9. HIGH_STRESS = 1 if Stress > 40%, else 0**
- **Threshold effect**: Moderate stress (20-40%) ≠ high stress (>40%)
- **Clinical relevance**: >40% biological stress is treated as elevated in this analysis
- **Categorization**: Separates "stressed" from "not stressed" individuals

In [None]:
# Create interaction and polynomial features
df['SLEEP_STRESS'] = df['SOMMEIL (H)'] * df['STRESS BIOLOGIQUE %']
df['HEART_STRESS'] = df['HEART RATE (BPM)'] * df['STRESS BIOLOGIQUE %']
df['AGE_SLEEP'] = df['AGE'] * df['SOMMEIL (H)']
df['CAFFEINE_SLEEP'] = df['CAFFEINE (mg)'] / (df['SOMMEIL (H)'] + 1)
df['SLEEP_SQ'] = df['SOMMEIL (H)'] ** 2
df['AGE_SQ'] = df['AGE'] ** 2
df['STRESS_SQ'] = df['STRESS BIOLOGIQUE %'] ** 2
df['SLEEP_DEFICIT'] = np.where(df['SOMMEIL (H)'] < 7, 1, 0)
df['HIGH_STRESS'] = np.where(df['STRESS BIOLOGIQUE %'] > 40, 1, 0)

## 4. Prepare Features and Target

In [None]:
# Separate X and y
X = df.drop('PVT : TEMPS DE REACTION (ms)', axis=1)
y = df['PVT : TEMPS DE REACTION (ms)']

## 5. Train Model with Leave-One-Out Cross-Validation

### Model: ElasticNet
- Combines L1 (Lasso) and L2 (Ridge) regularization
- Well suited to small datasets with many correlated features
- Reduces overfitting by shrinking coefficients toward zero
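
For reference, the objective that scikit-learn's `ElasticNet` minimizes can be written as follows, where $w$ is the coefficient vector, $n$ the number of samples, $\alpha$ the `alpha` parameter and $\rho$ the `l1_ratio` parameter (the symbol names are chosen here for illustration):

$$\min_{w}\; \frac{1}{2n}\lVert y - Xw \rVert_2^2 + \alpha\,\rho\,\lVert w \rVert_1 + \frac{\alpha\,(1-\rho)}{2}\,\lVert w \rVert_2^2$$

With `alpha=1` and `l1_ratio=0.5`, the penalty part becomes $0.5\,\lVert w \rVert_1 + 0.25\,\lVert w \rVert_2^2$. The L1 term can set some coefficients exactly to zero, while the L2 term shrinks the rest smoothly.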

### Validation: Leave-One-Out CV
- Uses maximum data (trains on 63, tests on 1, repeats 64 times)
- Gives a nearly unbiased error estimate on small datasets, at the cost of fitting the model 64 times
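
The splitting scheme described above can be checked with a short sketch. The array `X_demo` is a placeholder of zeros with 64 rows; the column count of 16 assumes 7 original plus 9 engineered features and does not affect the splits:

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut

# Placeholder matrix: 64 samples, 16 columns (assumed feature count)
X_demo = np.zeros((64, 16))

loo_demo = LeaveOneOut()
splits = list(loo_demo.split(X_demo))

n_splits = len(splits)
train_idx, test_idx = splits[0]

print(n_splits, len(train_idx), len(test_idx))  # 64 63 1
```

Every sample is held out exactly once, so each of the 64 predictions comes from a model that never saw that sample during fitting.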

In [None]:
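# Scale features (mean 0, standard deviation 1 per column)
# Note: the scaler is fitted on all 64 rows before CV, so each held-out
# sample influences its own scaling (a mild form of data leakage)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Create the model and the leave-one-out splitter
model = ElasticNet(alpha=1, l1_ratio=0.5, max_iter=10000, random_state=42)
loo = LeaveOneOut()

# Out-of-fold predictions: each sample is predicted by a model trained on the other 63
y_pred = cross_val_predict(model, X_scaled, y, cv=loo)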

# Train final model on all data
model.fit(X_scaled, y)
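
One possible refinement, shown here as a sketch rather than as the notebook's method: wrapping the scaler and the model in a scikit-learn pipeline makes `cross_val_predict` refit the scaler inside every fold, so the held-out sample never contributes to the scaling statistics. The data below (`X_demo`, `y_demo`) is synthetic and only stands in for the real features and target:

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 64 samples, 16 features (not the real dataset)
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(64, 16))
y_demo = 300 + 20 * X_demo[:, 0] + rng.normal(scale=10, size=64)

# Scaler and model are fitted together on the training part of each fold
pipe = make_pipeline(
    StandardScaler(),
    ElasticNet(alpha=1, l1_ratio=0.5, max_iter=10000, random_state=42),
)

y_pred_demo = cross_val_predict(pipe, X_demo, y_demo, cv=LeaveOneOut())
print(y_pred_demo.shape)  # (64,)
```

With leave-one-out on 64 rows, the difference from scaling up front is likely small, but the pipeline form keeps the evaluation strictly out-of-sample.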

## Before vs After Scaling

### BEFORE (Original Values):
| Feature       | Min  | Max  | Range Type      |
|---------------|------|------|----------------|
| SEXE          | 0    | 1    | tiny range     |
| SLEEP         | 2    | 11   | small range    |
| HEART_RATE    | 55   | 111  | medium range   |
| CAFFEINE      | 0    | 236  | large range    |
| AGE_SQ        | 484  | 2500 | HUGE range     |
| STRESS_SQ     | 121  | 5184 | HUGE range     |

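> **Problem without scaling:** ElasticNet penalizes coefficient size, so features on different scales are not treated equally. A feature in the thousands such as AGE_SQ needs only a tiny coefficient and is barely penalized, while a 0/1 feature such as SEXE needs a large coefficient and is penalized heavily.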

---

### AFTER (Scaled Values using StandardScaler):
| Feature          | Min    | Max    |
|------------------|--------|--------|
| SEXE_scaled       | -1.5   | +1.5   |
| SLEEP_scaled      | -1.8   | +2.0   |
| HEART_RATE_scaled | -2.1   | +2.5   |
| CAFFEINE_scaled   | -1.0   | +2.8   |
| AGE_SQ_scaled     | -1.5   | +2.0   |
| STRESS_SQ_scaled  | -1.2   | +2.3   |

> ✅ Now all features are on the **same scale** (roughly -3 to +3).  
> The model can fairly compare features and determine:  
> **"Which feature actually matters more?"**
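
The effect of `StandardScaler` can be verified on a small example. The two columns below are hypothetical values chosen to resemble sleep hours and `AGE_SQ`; they are not taken from the dataset:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical columns on very different scales: sleep hours and AGE_SQ
X_demo = np.array([
    [2.0, 484.0],
    [6.0, 900.0],
    [8.0, 1600.0],
    [11.0, 2500.0],
])

scaler_demo = StandardScaler()
X_scaled_demo = scaler_demo.fit_transform(X_demo)

# Each column is transformed as (x - mean) / std, using the population std
manual = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

col_means = X_scaled_demo.mean(axis=0)
col_stds = X_scaled_demo.std(axis=0)

print(np.allclose(col_means, 0.0), np.allclose(col_stds, 1.0))  # True True
print(np.allclose(X_scaled_demo, manual))                       # True
```

After scaling, both columns have mean 0 and standard deviation 1, regardless of their original units.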


## 6. Evaluation Metrics

### Metric Definitions:



In the formulas below, $y_i$ is the actual reaction time of sample $i$, $\hat{y}_i$ is the predicted value, and $n$ is the number of samples.

**MSE (Mean Squared Error)**
- Formula: $MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$
- Range: 0 to ∞ (0=perfect)
- Units: ms²

**RMSE (Root Mean Squared Error)**
- Formula: $RMSE = \sqrt{MSE}$
- Range: 0 to ∞ (0=perfect)
- Units: ms (same as target)
- Meaning: Typical prediction error, with large errors weighted more heavily

**MAE (Mean Absolute Error)**
- Formula: $MAE = \frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i|$
- Range: 0 to ∞ (0=perfect)
- Units: ms
- Meaning: Average absolute error (less sensitive to outliers than RMSE)
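
The three definitions can be worked through by hand on a toy example. The reaction times below are hypothetical values in ms, not results from the model:

```python
import numpy as np

# Hypothetical actual and predicted reaction times (ms)
y_true = np.array([300.0, 320.0, 280.0, 350.0])
y_hat = np.array([310.0, 315.0, 290.0, 330.0])

errors = y_true - y_hat            # [-10, 5, -10, 20]

mse_demo = np.mean(errors ** 2)    # (100 + 25 + 100 + 400) / 4
rmse_demo = np.sqrt(mse_demo)
mae_demo = np.mean(np.abs(errors)) # (10 + 5 + 10 + 20) / 4

print(mse_demo)   # 156.25
print(rmse_demo)  # 12.5
print(mae_demo)   # 11.25
```

RMSE (12.5 ms) exceeds MAE (11.25 ms) here because the single 20 ms error is weighted more heavily once squared.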

---

In [None]:
# Calculate metrics
mse = mean_squared_error(y, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y, y_pred)

# Display results
print("="*70)
print("MODEL PERFORMANCE METRICS")
print("="*70)
print(f"MSE:        {mse:.2f} ms²")
print(f"RMSE:       {rmse:.2f} ms")
print(f"MAE:        {mae:.2f} ms")
print("\n" + "="*70)

## 7. Save Model and Scaler for Deployment

In [None]:
# Save model and scaler using pickle
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)

with open('scaler.pkl', 'wb') as f:
    pickle.dump(scaler, f)

print("✓ Model saved to 'model.pkl'")
print("✓ Scaler saved to 'scaler.pkl'")
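
At deployment time, the saved objects would be loaded and applied in the same order as in training: scale first, then predict. The sketch below illustrates that round trip on synthetic data with a temporary directory, so it runs on its own; the names `model_demo`, `scaler_demo`, and the 3-column data are assumptions for illustration only:

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the training data (not the real dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))
y_demo = 300 + 15 * X_demo[:, 0] + rng.normal(scale=5, size=20)

scaler_demo = StandardScaler().fit(X_demo)
model_demo = ElasticNet(alpha=1, l1_ratio=0.5).fit(scaler_demo.transform(X_demo), y_demo)

with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, 'model.pkl')
    scaler_path = os.path.join(tmp, 'scaler.pkl')

    # Save both objects, as in the cell above
    with open(model_path, 'wb') as f:
        pickle.dump(model_demo, f)
    with open(scaler_path, 'wb') as f:
        pickle.dump(scaler_demo, f)

    # Load them back, as a deployment script would
    with open(model_path, 'rb') as f:
        loaded_model = pickle.load(f)
    with open(scaler_path, 'rb') as f:
        loaded_scaler = pickle.load(f)

# New samples must be scaled with the loaded scaler before prediction
new_sample = X_demo[:1]
prediction = loaded_model.predict(loaded_scaler.transform(new_sample))
print(prediction.shape)  # (1,)
```

For the real model, a new sample would also need the same engineered features, in the same column order as `X`, before scaling. Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.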