# Performance Metrics Hands-On Exercise

This hands-on exercise practices applying different evaluation metrics to classification and regression models, using the Student Alcohol Consumption dataset: https://www.kaggle.com/datasets/uciml/student-alcohol-consumption

Instructions:
 1. Fork the given repository.
 2. Rename 'Exercise.ipynb' to '(lastname).ipynb', using your own last name.
 3. Load the given dataset.
 4. For steps 2-4, preprocess the dataset, then construct, train, and evaluate a **classification** model (you may experiment in this step).
 5. For steps 5-7, preprocess the dataset, then construct, train, and evaluate a **regression** model (you may experiment in this step).

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Models: a classifier for steps 2-4 and a regressor for steps 5-7
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression

# Preprocessing, splitting, and evaluation utilities
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, mean_absolute_error, mean_squared_error, r2_score

### 1. Load the Data

In [2]:
df = pd.read_csv('16-student-mat.csv')
df.head()

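| Unnamed: 0 | school | sex | age | address | famsize | Pstatus | Medu | Fedu | Mjob | Fjob | ... | famrel | freetime | goout | Dalc | Walc | health | absences | G1 | G2 | G3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GP | F | 18 | U | GT3 | A | 4 | 4 | at_home | teacher | ... | 4 | 3 | 4 | 1 | 1 | 3 | 6 | 5 | 6 | 6 |
| 1 | GP | F | 17 | U | GT3 | T | 1 | 1 | at_home | other | ... | 5 | 3 | 3 | 1 | 1 | 3 | 4 | 5 | 5 | 6 |
| 2 | GP | F | 15 | U | LE3 | T | 1 | 1 | at_home | other | ... | 4 | 3 | 2 | 2 | 3 | 3 | 10 | 7 | 8 | 10 |
| 3 | GP | F | 15 | U | GT3 | T | 4 | 2 | health | services | ... | 3 | 2 | 2 | 1 | 1 | 5 | 2 | 15 | 14 | 15 |
| 4 | GP | F | 16 | U | GT3 | T | 3 | 3 | other | other | ... | 4 | 3 | 2 | 1 | 2 | 5 | 4 | 6 | 10 | 10 |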
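The `Unnamed: 0` column above looks like a row index that was written into the CSV when the file was saved; pandas gives an empty header that name. Because the later cells drop only `G1`, `G2`, and `G3`, this row number ends up as a model feature (the results printed in this notebook were produced with it included). A minimal sketch of two ways to exclude it, using a small made-up CSV string in place of `16-student-mat.csv` and assuming the first column really is just a saved index:

```python
import pandas as pd
from io import StringIO

# Made-up CSV that mimics a file saved with df.to_csv() without index=False:
# the first, unnamed column only holds the old row numbers.
csv_text = """,school,sex,age,G3
0,GP,F,18,6
1,GP,F,17,6
2,GP,F,15,10
"""

# Option A: use the first column as the DataFrame index while reading.
df_a = pd.read_csv(StringIO(csv_text), index_col=0)

# Option B: read normally, then drop the spurious column by its generated name.
df_b = pd.read_csv(StringIO(csv_text)).drop(columns=['Unnamed: 0'])

print(df_a.columns.tolist())
print(df_b.columns.tolist())
```

In the notebook, option A would be `pd.read_csv('16-student-mat.csv', index_col=0)`; the metric values further down would then change, since the feature set changes.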


### 2. Preprocess the Data (Classification)

In [3]:
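# Encode categorical variables as integers.
# LabelEncoder assigns codes in alphabetical order of the category labels
# (e.g. Mjob: at_home=0, health=1, other=2, ...), refit separately for each column.
encoder = LabelEncoder()
columns = ['school', 'sex', 'address', 'famsize', 'Pstatus', 'Mjob', 'Fjob',
           'reason', 'guardian', 'schoolsup', 'famsup', 'paid', 'activities',
           'nursery', 'higher', 'internet', 'romantic']

for col in columns:
    df[col] = encoder.fit_transform(df[col])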

In [4]:
df.head()

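| Unnamed: 0 | school | sex | age | address | famsize | Pstatus | Medu | Fedu | Mjob | Fjob | ... | famrel | freetime | goout | Dalc | Walc | health | absences | G1 | G2 | G3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 18 | 1 | 0 | 0 | 4 | 4 | 0 | 4 | ... | 4 | 3 | 4 | 1 | 1 | 3 | 6 | 5 | 6 | 6 |
| 1 | 0 | 0 | 17 | 1 | 0 | 1 | 1 | 1 | 0 | 2 | ... | 5 | 3 | 3 | 1 | 1 | 3 | 4 | 5 | 5 | 6 |
| 2 | 0 | 0 | 15 | 1 | 1 | 1 | 1 | 1 | 0 | 2 | ... | 4 | 3 | 2 | 2 | 3 | 3 | 10 | 7 | 8 | 10 |
| 3 | 0 | 0 | 15 | 1 | 0 | 1 | 4 | 2 | 1 | 3 | ... | 3 | 2 | 2 | 1 | 1 | 5 | 2 | 15 | 14 | 15 |
| 4 | 0 | 0 | 16 | 1 | 0 | 1 | 3 | 3 | 2 | 2 | ... | 4 | 3 | 2 | 1 | 2 | 5 | 4 | 6 | 10 | 10 |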
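The integer codes above impose an order on nominal columns that has no meaning (for `Mjob`, `at_home` < `health` < `other`), and scikit-learn documents `LabelEncoder` as intended for target labels rather than input features. For two-valued columns a single 0/1 column loses nothing; for columns with more categories, one-hot encoding is a common alternative. A minimal sketch on a small hypothetical frame, using `pandas.get_dummies` (swapped in here; it is not what the notebook uses):

```python
import pandas as pd

# Small hypothetical frame with one binary and one multi-category column.
df = pd.DataFrame({
    'sex': ['F', 'M', 'F'],
    'Mjob': ['at_home', 'health', 'teacher'],
    'age': [18, 17, 15],
})

# Binary column: a single 0/1 indicator keeps all the information.
df['sex'] = (df['sex'] == 'M').astype(int)

# Multi-category nominal column: one indicator column per category, so the
# model never sees an artificial ordering between job types.
df = pd.get_dummies(df, columns=['Mjob'], dtype=int)

print(df.columns.tolist())
print(df)
```

Tree models such as the random forest usually cope with integer codes, so the difference tends to matter more for the linear regression in steps 5-7.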


In [5]:
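# Features: all columns except the grades. G1 and G2 (first- and second-period
# grades) are dropped as well, since they are closely tied to the final grade G3.
X = df.drop(['G1', 'G2', 'G3'], axis=1)
# Target: the final grade G3 (0-20); each distinct grade is treated as its own class.
y = df['G3']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=32)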
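Using the raw grade as the label turns this into a problem with up to 21 classes, and "correct" means predicting the exact grade. One option, if a coarser question is acceptable, is to binarize the target into pass/fail. A minimal sketch, assuming a pass mark of 10 out of 20 (an assumption, adjust to your grading rule) and using random grades in place of `df['G3']`:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for df['G3']: 200 random final grades on the 0-20 scale.
rng = np.random.default_rng(0)
g3 = pd.Series(rng.integers(0, 21, size=200), name='G3')
X = pd.DataFrame({'feature': rng.normal(size=200)})  # placeholder feature

# Assumed pass mark of 10 out of 20.
PASS_MARK = 10
y_pass = (g3 >= PASS_MARK).astype(int)  # 1 = pass, 0 = fail

# stratify=y_pass keeps the pass/fail ratio (nearly) equal in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y_pass, test_size=0.2, random_state=32, stratify=y_pass)

print(f"{g3.nunique()} distinct grades -> {y_pass.nunique()} classes")
print(f"pass rate: train {y_train.mean():.2f}, test {y_test.mean():.2f}")
```

In the notebook this would amount to `y = (df['G3'] >= 10).astype(int)` before the split, with the rest of the classification cells unchanged.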

### 3. Construct and Train the Model (Classification)

In [6]:
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=32)
rf_classifier.fit(X_train, y_train)
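A single 80/20 split of a few hundred rows gives a noisy estimate of any metric. k-fold cross-validation trains and scores the model on several splits and reports the spread. A minimal sketch with `StratifiedKFold` and `cross_val_score`, on synthetic data from `make_classification` (an assumption standing in for the encoded student features):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the encoded features and a class label.
X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           random_state=32)

clf = RandomForestClassifier(n_estimators=100, random_state=32)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=32)

# One accuracy score per fold; the spread shows how much a single split can vary.
scores = cross_val_score(clf, X, y, cv=cv, scoring='accuracy')
print(f"accuracy per fold: {np.round(scores, 2)}")
print(f"mean {scores.mean():.2f} +/- {scores.std():.2f}")
```

With the raw 21-grade target, scikit-learn warns when the least populated class has fewer members than `n_splits`, which is likely if some grades occur only a handful of times.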

### 4. Evaluate the Model (Classification)

In [7]:
# Evaluate the model
y_pred = rf_classifier.predict(X_test)

# Accuracy: fraction of test students whose exact final grade was predicted
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of Random Forest Classifier: {accuracy:.2f}")

Accuracy of Random Forest Classifier: 0.19
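Accuracy only counts exact matches, so 0.19 means roughly one in five test students had their exact final grade predicted. Since the exercise is about comparing metrics, here is a sketch of other scikit-learn classification metrics alongside a majority-class baseline from `DummyClassifier`. It runs on synthetic, imbalanced data from `make_classification` (an assumption standing in for the student data) so that it is self-contained:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced three-class problem (60% / 30% / 10%).
X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           n_classes=3, weights=[0.6, 0.3, 0.1], random_state=32)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=32, stratify=y)

model = RandomForestClassifier(n_estimators=100, random_state=32).fit(X_train, y_train)
baseline = DummyClassifier(strategy='most_frequent').fit(X_train, y_train)

y_pred = model.predict(X_test)
y_base = baseline.predict(X_test)

# Accuracy can look fine while rare classes are ignored, so compare it with the
# majority-class baseline and inspect per-class precision and recall.
acc_model = accuracy_score(y_test, y_pred)
acc_base = accuracy_score(y_test, y_base)
f1_macro = f1_score(y_test, y_pred, average='macro')  # every class weighted equally
cm = confusion_matrix(y_test, y_pred)                  # rows: true, columns: predicted

print(f"model accuracy:    {acc_model:.2f}")
print(f"baseline accuracy: {acc_base:.2f}")
print(f"macro F1 (model):  {f1_macro:.2f}")
print(cm)
print(classification_report(y_test, y_pred, zero_division=0))
```

With the original grade labels, `average='weighted'` would instead weight each grade by how often it occurs, and `zero_division=0` suppresses warnings for grades the model never predicts.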


### 5. Preprocess the Data (Regression)

In [8]:
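# Same features, target, and split as the classification task;
# G3 is now treated as a continuous value rather than a set of classes.
X = df.drop(['G1', 'G2', 'G3'], axis=1)
y = df['G3']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=32)

# Standardize features: fit the scaler on the training set only, then apply it to the test set
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)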
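Fitting the scaler on `X_train` only and reusing it for `X_test` keeps test-set statistics out of training. A `Pipeline` packages that rule so it also holds inside cross-validation. The sketch below, on synthetic data from `make_regression` (an assumption standing in for the student features), also checks a property worth knowing here: for ordinary least squares with an intercept, standardizing the features rescales the coefficients but leaves the predictions unchanged, so scaling matters more for regularized models such as `Ridge` or `Lasso`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data standing in for the student features and G3.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=32)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=32)

# The pipeline fits the scaler during .fit() and only applies it during .predict().
pipe = make_pipeline(StandardScaler(), LinearRegression()).fit(X_train, y_train)

# Plain least squares without scaling, for comparison.
plain = LinearRegression().fit(X_train, y_train)
same = np.allclose(pipe.predict(X_test), plain.predict(X_test))
print("same predictions with and without scaling:", same)

# In cross-validation the scaler is refit on each training fold automatically.
cv_r2 = cross_val_score(make_pipeline(StandardScaler(), LinearRegression()),
                        X, y, cv=5, scoring='r2')
print("R^2 per fold:", np.round(cv_r2, 3))
```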

### 6. Construct and Train the Model (Regression)

In [9]:
lr = LinearRegression()
lr.fit(X_train, y_train)

### 7. Evaluate the Model (Regression)
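For reference, the regression metrics printed below are defined as follows, where $y_i$ are the true final grades, $\hat{y}_i$ the predictions, $\bar{y}$ the mean of the true test grades, and $n$ the number of test rows:

$$
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2,
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}},
\qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|,
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
$$

$R^2 = 0$ corresponds to always predicting $\bar{y}$, and a negative value means the model does worse than that constant. MAE and RMSE are in grade points, while MSE is in squared grade points.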

In [10]:
# Evaluate the model
y_pred = lr.predict(X_test)

# R squared (coefficient of determination)
r2 = r2_score(y_test, y_pred)
print('R squared: ', r2)

# Mean Squared Error
mse = mean_squared_error(y_test, y_pred)
print('MSE: ', mse)

# Mean Absolute Error
mae = mean_absolute_error(y_test, y_pred)
print('MAE: ', mae)

R squared:  -0.003645030577715147
MSE:  19.587239981471832
MAE:  3.543978306672522
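Reading these numbers: an R squared of about -0.004 means that on this split the linear model predicted G3 no better than a constant equal to the mean test grade, and the MSE of about 19.59 corresponds to an RMSE of roughly 4.43 grade points on the 0-20 scale. The sketch below recomputes the three metrics by hand on a few made-up values (not data from this notebook) and checks them against scikit-learn, along with a constant-mean baseline. For a baseline fitted on the training data instead, scikit-learn's `DummyRegressor(strategy="mean")` serves the same purpose.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up true final grades and predictions (not taken from the notebook).
y_true = np.array([6, 6, 10, 15, 10], dtype=float)
y_pred = np.array([8, 5, 9, 12, 11], dtype=float)

errors = y_true - y_pred                        # [-2, 1, 1, 3, -1]
mse = np.mean(errors ** 2)                      # (4 + 1 + 1 + 9 + 1) / 5 = 3.2
mae = np.mean(np.abs(errors))                   # (2 + 1 + 1 + 3 + 1) / 5 = 1.6
rmse = np.sqrt(mse)                             # back in grade units, about 1.79
ss_res = np.sum(errors ** 2)                    # 16
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # 55.2
r2 = 1 - ss_res / ss_tot                        # about 0.71

# The hand-written formulas match scikit-learn's implementations.
assert np.isclose(mse, mean_squared_error(y_true, y_pred))
assert np.isclose(mae, mean_absolute_error(y_true, y_pred))
assert np.isclose(r2, r2_score(y_true, y_pred))

# Constant baseline: always predict the mean grade. Its R^2 is 0 by construction,
# so a negative R^2 means the model is worse than this guess.
baseline = np.full_like(y_true, y_true.mean())
print(f"MSE={mse:.2f}  MAE={mae:.2f}  RMSE={rmse:.2f}  R^2={r2:.2f}")
print(f"baseline R^2 = {r2_score(y_true, baseline):.2f}")
```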
