# Training & Evaluation
* Training and evaluation are split across multiple notebooks, one for each algorithm that we train and evaluate.
* In this first notebook, we create baseline models whose predictions follow the `stratified` strategy (random predictions that respect the training class distribution) and the `most frequent` strategy (always predicting the majority class).
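
To make the two baseline strategies concrete, here is a minimal plain-Python sketch of what each one does. The helper names (`most_frequent_baseline`, `stratified_baseline`) and the toy labels are made up for illustration and are not part of this project; scikit-learn's `DummyClassifier` offers strategies with matching names (`most_frequent`, `stratified`), and this sketch only mirrors the idea, not its implementation.

```python
import random
from collections import Counter


def most_frequent_baseline(y_train, n_samples):
    """Predict the majority class of y_train for every sample."""
    majority_label, _ = Counter(y_train).most_common(1)[0]
    return [majority_label] * n_samples


def stratified_baseline(y_train, n_samples, seed=0):
    """Draw predictions at random, weighted by the training class frequencies."""
    counts = Counter(y_train)
    labels = list(counts)
    weights = [counts[label] for label in labels]
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return rng.choices(labels, weights=weights, k=n_samples)


# Toy labels (illustrative only): 70% class 0, 30% class 1
y_toy = [0] * 70 + [1] * 30

most_frequent_preds = most_frequent_baseline(y_toy, n_samples=10)
stratified_preds = stratified_baseline(y_toy, n_samples=1000)

print(most_frequent_preds)  # every prediction is the majority class
# Share of class 1 among the stratified predictions; roughly the training share
print(sum(stratified_preds) / len(stratified_preds))
```

Neither strategy looks at the features, which is why they serve as a floor that any real model should beat.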

## Install Libraries

In [None]:
%pip install scikit-learn

Note: you may need to restart the kernel to use updated packages.


## Import Libraries

In [None]:
import os
import sys
import pandas as pd
import numpy as np
from sklearn.compose import ColumnTransformer
from pathlib import Path

# Resolve the absolute path of the parent of the notebook's working directory
module_path = os.path.abspath(os.path.join('..'))

# Add to sys.path if not already present
if module_path not in sys.path:
    sys.path.append(module_path)
    
from src.utils import preprocessing

## Initialize Directories

In [None]:
data_root_dir = Path("..", "data/")
models_root_dir = Path("..", "models/")
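
If the `models` directory might not exist yet when a later notebook saves a trained model, it can be created up front. The sketch below is one possible way to do that, assuming a hypothetical helper `ensure_dirs` (not part of this project); it runs against a temporary directory so it does not touch the real `../data` or `../models` paths.

```python
import tempfile
from pathlib import Path


def ensure_dirs(*dirs):
    """Create each directory (and any missing parents) if it does not exist yet."""
    created = []
    for d in dirs:
        path = Path(d)
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created


# Demonstrated in a temporary location so nothing is written to the project tree
with tempfile.TemporaryDirectory() as tmp:
    data_dir, models_dir = ensure_dirs(Path(tmp, "data"), Path(tmp, "models"))
    dirs_exist = data_dir.is_dir() and models_dir.is_dir()

print(dirs_exist)
```

In the notebook itself, the equivalent call would be `ensure_dirs(data_root_dir, models_root_dir)`.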

## Read Data

In [None]:
# Load the training features and labels
X_train = pd.read_csv(data_root_dir / "X_train.csv")
y_train = pd.read_csv(data_root_dir / "y_train.csv")

In [None]:
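# Fit the preprocessing pipeline on the training data, then label the
# transformed columns with the feature names the pipeline generates
preprocessed_data = preprocessing.pipeline.fit_transform(X_train, y_train)
preprocessed_data_df = pd.DataFrame(
    preprocessed_data, columns=preprocessing.pipeline.get_feature_names_out()
)
preprocessed_data_df.head()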

      sleep_duration
0             bt_7_8
1             bt_7_8
2             bt_7_8
3             bt_7_8
4             bt_5_6
...              ...
22315         bt_7_8
22316           lt_5
22317         bt_7_8
22318         bt_5_6
22319         bt_5_6

[22320 rows x 1 columns]
      dietary_habits
0           moderate
1           moderate
2          unhealthy
3           moderate
4          unhealthy
...              ...
22315      unhealthy
22316      unhealthy
22317        healthy
22318      unhealthy
22319        healthy

[22320 rows x 1 columns]
      degree_level
0      high_school
1         bachelor
2           master
3         bachelor
4         bachelor
...            ...
22315     bachelor
22316       master
22317     bachelor
22318       master
22319     bachelor

[22320 rows x 1 columns]
      age_range
0      18_to_23
1      23_to_28
2      28_to_33
3        gte_33
4      23_to_28
...         ...
22315  23_to_28
22316    gte_33
22317    gte_33
22318  28_to_33
22319  28_to_3

|  | preprocess_gender__gender_female | preprocess_gender__gender_male | preprocess_profession__profession_student | preprocess_profession__profession_working | sleep_duration_pipeline__sleep_duration_bt_5_6 | sleep_duration_pipeline__sleep_duration_bt_7_8 | sleep_duration_pipeline__sleep_duration_gt_8 | sleep_duration_pipeline__sleep_duration_lt_5 | dietary_habits_pipeline__dietary_habits_healthy | dietary_habits_pipeline__dietary_habits_moderate | ... | age_pipeline__encode_age_range__age_range_gte_33 | cgpa_pipeline__encode_cgpa_range__cgpa_range_4_to_7 | cgpa_pipeline__encode_cgpa_range__cgpa_range_gte_7 | cgpa_pipeline__encode_cgpa_range__cgpa_range_lt_4 | hours_pipeline__encode_hours_range__hours_range_4_to_8 | hours_pipeline__encode_hours_range__hours_range_gte_8 | hours_pipeline__encode_hours_range__hours_range_lt_4 | ratings_column_pipeline__academic_pressure | ratings_column_pipeline__study_satisfaction | ratings_column_pipeline__financial_stress |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | ... | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 4.0 | 1.0 | 5.0 |
| 1 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | ... | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 3.0 | 2.0 | 1.0 |
| 2 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 3.0 | 2.0 | 5.0 |
| 3 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 3.0 | 5.0 | 3.0 |
| 4 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 3.0 | 4.0 | 5.0 |
