
In [None]:

# 📌 Step 1: Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.metrics import (
    mean_absolute_error, mean_squared_error, r2_score,
    accuracy_score, classification_report, confusion_matrix
)
import lightgbm as lgb  # LightGBM library

# 📂 Step 2: Load dataset
df = pd.read_csv('/content/sleep_pattern_analysis_datasett.csv')  # Colab path: upload the file first, or change this to a local path

# 🧹 Step 3: Data cleaning
df.drop('Person_ID', axis=1, inplace=True)  # Drop the ID column (an identifier, not a feature)
df['Work Hours (hrs/day)'] = pd.to_numeric(df['Work Hours (hrs/day)'], errors='coerce')  # Non-numeric entries become NaN
df.dropna(inplace=True)  # Drop rows with missing values (including coerced NaNs) before encoding
df['Gender'] = LabelEncoder().fit_transform(df['Gender'])  # Encode the gender labels as integers

# 🎯 Step 4: Feature and target split
X = df.drop('Sleep Quality', axis=1)  # Features
y = df['Sleep Quality']  # Target

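# ⚖️ Step 5: Feature scaling (tree-based LightGBM does not need it, but kept for consistency)
# Note: the scaler is fit on all rows before the split, so test-set statistics leak into it.
# This is harmless for trees; fit on the training split only if reused with scale-sensitive models.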
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# --------------------------
# 🧠 Step 6A: Regression model using LightGBM
# --------------------------
X_train_r, X_test_r, y_train_r, y_test_r = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

regressor = lgb.LGBMRegressor(random_state=42)
regressor.fit(X_train_r, y_train_r)
y_pred_r = regressor.predict(X_test_r)

print("📊 Regression Evaluation (LightGBM)")
print("MAE:", mean_absolute_error(y_test_r, y_pred_r))
print("MSE:", mean_squared_error(y_test_r, y_pred_r))
print("R²:", r2_score(y_test_r, y_pred_r))

# --------------------------
# 🧠 Step 6B: Classification model using LightGBM
# --------------------------
# Convert sleep quality into categories: 0 = Poor (score <= 4), 1 = Average (4 < score <= 7), 2 = Good (score > 7)
y_class = y.apply(lambda x: 0 if x <= 4 else (1 if x <= 7 else 2))

X_train_c, X_test_c, y_train_c, y_test_c = train_test_split(X_scaled, y_class, test_size=0.2, random_state=42)

classifier = lgb.LGBMClassifier(random_state=42)
classifier.fit(X_train_c, y_train_c)
y_pred_c = classifier.predict(X_test_c)

print("\n📊 Classification Evaluation (LightGBM)")
print("Accuracy:", accuracy_score(y_test_c, y_pred_c))
print("Classification Report:\n", classification_report(y_test_c, y_pred_c, target_names=["Poor", "Average", "Good"]))
print("Confusion Matrix:\n", confusion_matrix(y_test_c, y_pred_c))
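
The binning rule in Step 6B can also be written as a named function, which makes the boundary cases easy to check. The sketch below mirrors the lambda above; the function name `to_quality_class` and the sample scores are hypothetical and are not taken from the dataset. It also computes the accuracy of always predicting the most frequent class, a reference point that may help when reading the classifier's accuracy.

```python
from collections import Counter


def to_quality_class(score):
    """Map a numeric sleep-quality score to 0 (Poor), 1 (Average), or 2 (Good)."""
    if score <= 4:
        return 0
    if score <= 7:
        return 1
    return 2


# Hypothetical scores, for illustration only (not taken from the dataset)
sample_scores = [1, 4, 5, 7, 8, 10]
sample_classes = [to_quality_class(s) for s in sample_scores]

# Count how many samples fall into each class
class_counts = Counter(sample_classes)

# Accuracy obtained by always predicting the most frequent class
majority_baseline = max(class_counts.values()) / len(sample_classes)

print("Classes:", sample_classes)
print("Class counts:", dict(class_counts))
print("Majority-class baseline accuracy:", majority_baseline)
```

On the real data, the same counts could be taken from `y_class`. A classifier accuracy close to the majority-class baseline would suggest the model has learned little beyond the class frequencies.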





[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000403 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1600
[LightGBM] [Info] Number of data points in the train set: 3999, number of used features: 12
[LightGBM] [Info] Start training from score 5.528632
📊 Regression Evaluation (LightGBM)
MAE: 2.5349062671590645
MSE: 8.873807311270003
R²: -0.09311611516155693
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000175 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1600
[LightGBM] [Info] Number of data points in the train set: 3999, number of used features: 12
[LightGBM] [Info] Start training from score -0.924199
[LightGBM] [Info] Start training from score -1.190478
[LightGBM] [Info] Start trai
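
The negative R² in the regression output means the model's squared error on the test set is larger than that of a constant prediction equal to the test-set mean. The sketch below illustrates this in plain Python with hypothetical values, then uses the identity R² = 1 − MSE / Var(y_test) to recover the mean-baseline MSE from the two numbers printed above. The helper `r2` is written here for illustration and is not part of the notebook.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical targets, for illustration only
y_true = [3.0, 5.0, 7.0, 9.0]

# Predicting the mean of y_true everywhere gives R² = 0
mean_pred = [6.0] * len(y_true)
r2_mean = r2(y_true, mean_pred)

# Predictions worse than the mean give a negative R²
worse_pred = [9.0, 7.0, 5.0, 3.0]
r2_worse = r2(y_true, worse_pred)

# Values copied from the regression output above
logged_mse = 8.873807311270003
logged_r2 = -0.09311611516155693

# Rearranging R² = 1 - MSE / Var(y_test) gives the MSE of the mean baseline
baseline_mse = logged_mse / (1 - logged_r2)

print("R² of mean predictor:", r2_mean)
print("R² of worse predictor:", r2_worse)
print("Implied mean-baseline MSE:", round(baseline_mse, 3))
```

By this calculation, predicting the test mean would give an MSE of about 8.12, lower than the model's 8.87. With default settings, the regressor therefore appears to find little usable signal in these features.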

