# Phase 3: Apply Unsupervised Learning
In this phase, unsupervised learning algorithms were applied to the preprocessed dataset to discover hidden patterns and to group students with similar sleep-related behaviors. The aim was to cluster students without using any predefined labels and to identify meaningful relationships between academic and lifestyle factors.

## 1. Algorithm Application

### 1.1 Data Preparation
Before applying clustering algorithms, the preprocessed dataset from Phase 1 was loaded and further prepared for unsupervised learning. The target variable (Sleep_Quality) was removed so that the clustering process remains fully unsupervised.
Numerical features were standardized with StandardScaler and categorical features were one-hot encoded with OneHotEncoder, so that every feature is numeric and on a comparable scale before distances between students are computed.
Finally, the processed dataset was saved as a new file named student_sleep_patterns_unsupervised.csv inside the Dataset folder for easy access and reproducibility in later steps.

In [2]:
# ==========================================
# 1.1 Data Preparation
# ==========================================

import pandas as pd
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer

# Load the preprocessed dataset
df = pd.read_csv("../Dataset/student_sleep_patterns_preprocessed.csv")

# ------------------------------------------
# Remove the target variable (Sleep_Quality)
# ------------------------------------------
# errors="ignore" leaves the frame unchanged if the column is already absent;
# drop() returns a new DataFrame, so df itself is not modified.
df_unsupervised = df.drop(columns=["Sleep_Quality"], errors="ignore")

# ------------------------------------------
# Identify column types
# ------------------------------------------
numeric_cols = [col for col in df_unsupervised.columns if pd.api.types.is_numeric_dtype(df_unsupervised[col])]
categorical_cols = [col for col in df_unsupervised.columns if not pd.api.types.is_numeric_dtype(df_unsupervised[col])]

# ------------------------------------------
# Apply preprocessing (scaling + one-hot encoding)
# ------------------------------------------
preprocessor = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore", sparse_output=False), categorical_cols),
    ],
    remainder="drop"
)

X_transformed = preprocessor.fit_transform(df_unsupervised)

# ------------------------------------------
# Save the new dataset for clustering
# ------------------------------------------
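# Keep the generated feature names as column headers instead of 0..n-1,
# so the saved features can still be traced back to the original columns.
unsupervised_df = pd.DataFrame(X_transformed, columns=preprocessor.get_feature_names_out())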
unsupervised_path = "../Dataset/student_sleep_patterns_unsupervised.csv"
unsupervised_df.to_csv(unsupervised_path, index=False)

# ------------------------------------------
# Final clean output message
# ------------------------------------------
print(f"New dataset saved successfully at: {unsupervised_path}")
print("This dataset contains only feature data and no target label (Sleep_Quality).")


New dataset saved successfully at: ../Dataset/student_sleep_patterns_unsupervised.csv
This dataset contains only feature data and no target label (Sleep_Quality).


### 1.2 K-Means Clustering

## 2. Evaluation & Visualization

### 2.1 Evaluation Metrics

### 2.2 Visualization of Clusters

## 3. Integration & Insight

### 3.1 Using Clusters to Improve the Supervised Model

### 3.2 Analysis of Cluster Profiles

### 3.3 Integration Results and Justification