# Machine Learning Techniques
In this notebook, we apply a logistic regression model to test whether a given habit (e.g., 'Morning sunlight') helps predict a sleep-quality outcome (e.g., 'REM sleep above average').

The example below illustrates the approach with a single feature; the same steps can be applied to each feature in the project:


In [7]:
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Load the datasets
sleep_data = pd.read_csv('./processed_data/sleep_cycles.csv')
habits_data = pd.read_csv('./processed_data/processed_habits_data.csv')

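# Convert 'date' columns to datetime format; unparseable values become NaT
sleep_data['date'] = pd.to_datetime(sleep_data['date'], errors='coerce')
habits_data['Date'] = pd.to_datetime(habits_data['Date'], errors='coerce')

# Drop rows whose date could not be parsed; merge would otherwise match NaT keys to each other
sleep_data = sleep_data.dropna(subset=['date'])
habits_data = habits_data.dropna(subset=['Date'])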

# Merge the datasets on the date columns
merged_data = sleep_data.merge(habits_data, left_on='date', right_on='Date', how='inner')

# Feature: Morning sunlight, Target: REM sleep above average
merged_data['REM_above_avg'] = (merged_data['REM'] > merged_data['REM'].mean()).astype(int)
X = merged_data[['Morning sunlight']]
y = merged_data['REM_above_avg']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [8]:
# Train a logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Evaluate the model
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)

print(f"Accuracy: {accuracy:.2f}")
print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(classification_report(y_test, y_pred))

Accuracy: 0.20
Confusion Matrix:
[[1 0]
 [4 0]]
Classification Report:
              precision    recall  f1-score   support

           0       0.20      1.00      0.33         1
           1       0.00      0.00      0.00         4

    accuracy                           0.20         5
   macro avg       0.10      0.50      0.17         5
weighted avg       0.04      0.20      0.07         5



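The figures in the classification report follow directly from the confusion matrix. As a quick check, the sketch below recomputes them in plain Python from the printed matrix, assuming scikit-learn's convention that rows are true classes and columns are predicted classes.

```python
# Confusion matrix as printed above: rows = true class, columns = predicted class.
conf_matrix = [[1, 0],
               [4, 0]]

(tn, fp), (fn, tp) = conf_matrix
total = tn + fp + fn + tp

# Accuracy: correct predictions over all test samples.
accuracy = (tn + tp) / total

# Class 0: everything was predicted as 0, so recall is perfect but precision is low.
precision_0 = tn / (tn + fn)
recall_0 = tn / (tn + fp)

# Class 1: never predicted, so precision is 0/0 and is treated as 0.0 here.
predicted_1 = fp + tp
precision_1 = tp / predicted_1 if predicted_1 else 0.0
recall_1 = tp / (tp + fn)

print(f"accuracy={accuracy:.2f}")
print(f"class 0: precision={precision_0:.2f}, recall={recall_0:.2f}")
print(f"class 1: precision={precision_1:.2f}, recall={recall_1:.2f}")
```

The model predicted class 0 for all 5 test samples, so only the single true class-0 sample is counted as correct. Because class 1 is never predicted, its precision is undefined; scikit-learn reports 0.0 in that case by default and emits a warning.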


In [9]:
# Perform cross-validation
cv_scores = cross_val_score(model, X, y, cv=5)
print(f"Cross-Validation Accuracy: {cv_scores.mean():.2f} ± {cv_scores.std():.2f}")

Cross-Validation Accuracy: 0.45 ± 0.12
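A cross-validation accuracy is easier to judge next to a trivial baseline. The sketch below compares a logistic regression with a majority-class `DummyClassifier` using stratified folds. The data is a synthetic stand-in (25 rows, a binary habit flag, a 10/15 class split); these are assumptions made for illustration, not the project's data, and `X_demo` and `y_demo` are hypothetical names.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data (assumption): one binary feature, imbalanced binary target.
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 2, size=(25, 1))
y_demo = np.array([0] * 10 + [1] * 15)

# Stratified folds keep the class ratio the same in every train/test split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Baseline that always predicts the most frequent training class.
baseline_scores = cross_val_score(DummyClassifier(strategy="most_frequent"),
                                  X_demo, y_demo, cv=cv)
model_scores = cross_val_score(LogisticRegression(), X_demo, y_demo, cv=cv)

print(f"Baseline accuracy: {baseline_scores.mean():.2f} ± {baseline_scores.std():.2f}")
print(f"Model accuracy:    {model_scores.mean():.2f} ± {model_scores.std():.2f}")
```

If the model does not beat the baseline, the feature is adding no predictive information at this sample size.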


### Results:
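We built a model that predicts one of the metrics in our experiment: whether REM sleep on a given day is above average, based on the morning sunlight on that day.

Because the dataset is small, the model is not accurate. The test set contains only 5 samples, and the model predicted class 0 for all of them, giving a test accuracy of 0.20; the 5-fold cross-validation accuracy is 0.45 ± 0.12. The example nonetheless demonstrates the overall workflow: merging the data, defining a binary target, training the model, and evaluating it.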
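As noted in the introduction, the same steps can be repeated for each feature. One possible way to do that is a small helper that runs the workflow for one habit column at a time. In this sketch, `evaluate_habit`, the `demo` DataFrame, and the `Exercise` column are hypothetical and exist only for illustration; in the notebook, `merged_data` and the project's real column names would take their place.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score


def evaluate_habit(df, feature, metric, n_splits=5):
    """Hypothetical helper: cross-validated accuracy of a one-feature model."""
    data = df[[feature, metric]].dropna()
    X = data[[feature]]
    # Binary target: is the sleep metric above its own average?
    y = (data[metric] > data[metric].mean()).astype(int)
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
    scores = cross_val_score(LogisticRegression(), X, y, cv=cv)
    return scores.mean(), scores.std()


# Synthetic stand-in for merged_data (assumption, not the project's data).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Morning sunlight": rng.integers(0, 2, size=40),
    "Exercise": rng.integers(0, 2, size=40),          # hypothetical habit column
    "REM": rng.permutation(np.linspace(60, 120, 40)),  # made-up REM values
})

habits = ["Morning sunlight", "Exercise"]
results = {habit: evaluate_habit(demo, habit, "REM") for habit in habits}

for habit, (mean_acc, std_acc) in results.items():
    print(f"{habit}: {mean_acc:.2f} ± {std_acc:.2f}")
```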