# Performance Metrics Hands-On Exercise

This hands-on exercise practices applying different evaluation metrics to classification and regression models, using the Student Alcohol Consumption dataset: https://www.kaggle.com/datasets/uciml/student-alcohol-consumption

Instructions:
 1. Fork the given repository
 2. Rename 'Exercise.ipynb' to '( lastname ).ipynb'
 3. Load the given dataset
 4. For sections 2-4, preprocess the dataset, then construct, train, and evaluate a **classification** model (you may experiment in this step)
 5. For sections 5-7, preprocess the dataset, then construct, train, and evaluate a **regression** model (you may experiment in this step)

In [12]:
import pandas as pd
import numpy as np
import tensorflow as tf

import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import classification_report, mean_squared_error, accuracy_score
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import LinearRegression


### 1. Load the Data

In [4]:
df = pd.read_csv('student-mat.csv')
df.head()

Unnamed: 0,school,sex,age,address,famsize,Pstatus,Medu,Fedu,Mjob,Fjob,...,famrel,freetime,goout,Dalc,Walc,health,absences,G1,G2,G3
0,GP,F,18,U,GT3,A,4,4,at_home,teacher,...,4,3,4,1,1,3,6,5,6,6
1,GP,F,17,U,GT3,T,1,1,at_home,other,...,5,3,3,1,1,3,4,5,5,6
2,GP,F,15,U,LE3,T,1,1,at_home,other,...,4,3,2,2,3,3,10,7,8,10
3,GP,F,15,U,GT3,T,4,2,health,services,...,3,2,2,1,1,5,2,15,14,15
4,GP,F,16,U,GT3,T,3,3,other,other,...,4,3,2,1,2,5,4,6,10,10


### 2. Preprocess the Data (Classification)

In [5]:
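# Encode categorical variables as integer codes.
# 'school' is left unencoded here because it is the classification target in section 3.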
label_encoder = LabelEncoder()
categorical_data = ['address', 'sex', 'famsize', 'Pstatus', 'Mjob', 'Fjob', 
                    'reason', 'guardian', 'schoolsup', 'famsup', 'paid', 'activities', 
                    'nursery', 'higher', 'internet', 'romantic']

for col in categorical_data:
    df[col] = label_encoder.fit_transform(df[col])
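
The loop above refits a single `LabelEncoder` on each column, so after the loop the encoder only remembers the classes of the last column. If the integer codes ever need to be mapped back to their original labels, one option is to keep one encoder per column. The sketch below illustrates the idea on a made-up two-column frame (`toy` and `encoders` are hypothetical names, not part of the exercise):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame for illustration only; the values are not taken from the dataset.
toy = pd.DataFrame({"sex": ["F", "M", "F"], "address": ["U", "R", "U"]})

# Keep one fitted encoder per column so each mapping can be inverted later.
encoders = {}
for col in ["sex", "address"]:
    encoders[col] = LabelEncoder()
    toy[col] = encoders[col].fit_transform(toy[col])

print(toy)
print(list(encoders["sex"].classes_))

# Recover the original labels of one column from its integer codes.
decoded = encoders["address"].inverse_transform(toy["address"])
print(list(decoded))
```

`LabelEncoder` assigns codes in sorted order of the labels, so the codes carry no meaning beyond identity.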

### 3. Construct and Train the Model (Classification)

In [6]:
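# Features: every column except the target, 'school'
X1 = df.drop(['school'], axis=1)
y1 = df['school']
# Map each school name to an integer label, in order of first appearance
label_mapping = {label: idx for idx, label in enumerate(y1.unique())}
y1 = y1.map(label_mapping)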

# Split the data into training and testing sets
X_train1, X_test1, y_train1, y_test1 = train_test_split(X1, y1, test_size=0.2, random_state=42)
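
If one class has far fewer rows than the other, a purely random split can give the test set a different class ratio from the training set. The sketch below uses made-up labels rather than the dataset to show how the `stratify` argument of `train_test_split` keeps the ratio the same in both splits:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data for illustration only: 16 samples of class 0 and 4 of class 1.
X_toy = np.arange(20).reshape(-1, 1)
y_toy = np.array([0] * 16 + [1] * 4)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=42, stratify=y_toy
)

# Both splits keep the original 4:1 class ratio.
print(np.bincount(y_tr))
print(np.bincount(y_te))
```

In this notebook, the equivalent change would be passing `stratify=y1` to the classification split. Whether it matters depends on how balanced the `school` column is.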

In [7]:
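# Random forest with 100 trees; random_state fixes the seed for reproducibility
clf = RandomForestClassifier(n_estimators=100, random_state=42)

clf.fit(X_train1, y_train1)

# Predict school labels for the held-out test set
y_pred = clf.predict(X_test1)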

### 4. Evaluate the Model (Classification)

In [16]:
accuracy = accuracy_score(y_test1, y_pred)
print(f"Accuracy: {accuracy}")

Accuracy: 0.8481012658227848
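
Accuracy alone can look strong when one class dominates, so it may help to report per-class metrics as well. The sketch below uses made-up labels (not the notebook's predictions) and a hypothetical helper, `summarize_classification`, to show `balanced_accuracy_score`, `confusion_matrix`, and `classification_report` alongside accuracy:

```python
from sklearn.metrics import (
    accuracy_score,
    balanced_accuracy_score,
    classification_report,
    confusion_matrix,
)


def summarize_classification(y_true, y_pred):
    """Return several classification metrics for one set of predictions."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        # Mean of per-class recall; less flattering when one class is ignored.
        "balanced_accuracy": balanced_accuracy_score(y_true, y_pred),
        # Rows are true classes, columns are predicted classes.
        "confusion_matrix": confusion_matrix(y_true, y_pred),
        # Per-class precision, recall, and F1 as formatted text.
        "report": classification_report(y_true, y_pred, zero_division=0),
    }


# Toy labels for illustration only: the "model" always predicts class 0.
toy_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
toy_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

summary = summarize_classification(toy_true, toy_pred)
print(f"Accuracy: {summary['accuracy']}")
print(f"Balanced accuracy: {summary['balanced_accuracy']}")
print(summary["confusion_matrix"])
print(summary["report"])
```

On the toy labels, accuracy is 0.8 even though no class 1 sample is ever identified, while balanced accuracy drops to 0.5. In this notebook, the helper could be called as `summarize_classification(y_test1, y_pred)`.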


### 5. Preprocess the Data (Regression)

In [17]:
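# Encode 'school' so it can be used as a numeric feature
df['school'] = label_encoder.fit_transform(df['school'])
# Target: the final grade, G3
X2 = df.drop(['G3'], axis=1)
y2 = df['G3']

X_train2, X_test2, y_train2, y_test2 = train_test_split(X2, y2, test_size=0.2, random_state=1)

# Fit the scaler on the training split only, then apply it to the test split
scaler = StandardScaler()
X_train_reg = scaler.fit_transform(X_train2)
X_test_reg = scaler.transform(X_test2)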

### 6. Construct and Train the Model (Regression)

In [18]:
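# Ordinary least squares on the standardized features
reg_model = LinearRegression()

reg_model.fit(X_train_reg, y_train2)

# Predict final grades for the held-out test set
y_pred_reg = reg_model.predict(X_test_reg)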

### 7. Evaluate the Model (Regression)

In [22]:
mse = mean_squared_error(y_test2, y_pred_reg)
print(f"Mean Squared Error: {mse}")

Mean Squared Error: 3.600790709306769
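
MSE is expressed in squared grade points, which can be hard to interpret on its own. The sketch below uses made-up values (not the notebook's predictions) and a hypothetical helper, `summarize_regression`, to add RMSE, MAE, and R² next to it:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def summarize_regression(y_true, y_pred):
    """Return several regression metrics for one set of predictions."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "mse": mse,
        # Square root of MSE, in the same units as the target.
        "rmse": float(np.sqrt(mse)),
        # Average absolute error; less sensitive to large misses than MSE.
        "mae": mean_absolute_error(y_true, y_pred),
        # Fraction of the target's variance explained by the predictions.
        "r2": r2_score(y_true, y_pred),
    }


# Toy grades for illustration only.
toy_true = [10, 12, 8, 15]
toy_pred = [11, 12, 6, 15]

for name, value in summarize_regression(toy_true, toy_pred).items():
    print(f"{name}: {value:.4f}")
```

In this notebook, the helper could be called as `summarize_regression(y_test2, y_pred_reg)`. RMSE would then read directly in grade points on the G3 scale.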
