# 4 Hands On: Evaluation for Supervised-learning Models

Evaluating supervised learning models is essential for understanding their performance and for selecting the most appropriate algorithm for a given task.
The following exercises walk through the evaluation of a supervised learning model step by step:

0. **Loading data:** We load the data from the Iris flower dataset.

In [3]:
import pandas as pd

df = pd.read_csv(filepath_or_buffer='data/04_modeleval/IRIS.csv',sep=',')
df

| Unnamed: 0 | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| ... | ... | ... | ... | ... | ... |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 | Iris-virginica |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 | Iris-virginica |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 | Iris-virginica |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |


1. **Splitting the data:** Divide the available labeled dataset into training and testing sets. The training set is used to train the model, while the testing set serves as an independent sample for evaluating its performance. A common practice is to use a 70-30 or 80-20 split, but the choice may depend on the dataset size.

In [4]:
from sklearn.model_selection import train_test_split
X = df[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']]
y = df['species']
# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
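A plain random split does not guarantee that every class is equally represented in both subsets. The sketch below shows one possible variant, a stratified split, which keeps the class proportions identical in the training and testing sets. It assumes scikit-learn's bundled `load_iris` copy of the dataset as a stand-in for the CSV file above, and the `_s` variable names are illustrative only.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: the bundled Iris copy stands in for the CSV file loaded earlier.
X_s, y_s = load_iris(return_X_y=True)

# stratify=y_s keeps the class proportions the same in both subsets.
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X_s, y_s, test_size=0.3, random_state=42, stratify=y_s
)

# Count how many samples of each class ended up in each subset.
train_counts = Counter(y_train_s.tolist())
test_counts = Counter(y_test_s.tolist())
print("Training samples per class:", dict(train_counts))
print("Testing samples per class:", dict(test_counts))
```

Stratification matters most for small or imbalanced datasets, where a purely random split can leave a class under-represented in the testing set.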

2. **Training and testing:** Train the chosen supervised learning algorithms using the training set. After training, apply the models to the testing set to generate predictions or class labels.

In [5]:
from sklearn.svm import SVC

# Example for training and testing a Support Vector Classifier
model = SVC()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
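Since the goal is often to choose between several algorithms, the same train-and-predict pattern can be repeated for more than one model on an identical split. The following is a minimal sketch under stated assumptions: it uses scikit-learn's bundled `load_iris` data instead of the CSV file, and the two additional classifiers (`LogisticRegression`, `KNeighborsClassifier`) and their settings are arbitrary illustrative choices, not part of the original exercise.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Assumption: the bundled Iris copy stands in for the CSV file loaded earlier.
X_c, y_c = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_c, y_c, test_size=0.3, random_state=42)

# Hypothetical set of candidate models to compare on the same split.
candidates = {
    "SVC": SVC(),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "KNeighborsClassifier": KNeighborsClassifier(n_neighbors=5),
}

# Fit each candidate on the training set and record its test accuracy.
test_accuracy = {}
for name, clf in candidates.items():
    clf.fit(X_tr, y_tr)
    test_accuracy[name] = clf.score(X_te, y_te)
    print(f"{name}: {test_accuracy[name]:.3f}")
```

Using the same split for every candidate keeps the comparison fair, although a single split is still a noisy basis for choosing a model.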

3. **Selecting and computing evaluation metrics:** Choose evaluation metrics that suit the problem at hand. Common metrics for classification tasks include accuracy, precision, recall, F1 score, and the area under the receiver operating characteristic curve (AUC-ROC). For regression tasks, mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), and R-squared are commonly used.
Then calculate the selected metrics from the model's predictions and the ground-truth labels or values of the testing set to assess the model's performance.

In [9]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

accuracy = accuracy_score(y_test, y_pred)
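# With average='micro' on a single-label multiclass problem, precision, recall,
# and F1 all equal accuracy; average='macro' would weight each class equally.
precision = precision_score(y_test, y_pred, average='micro')
recall = recall_score(y_test, y_pred, average='micro')
f1 = f1_score(y_test, y_pred, average='micro')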

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)

Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0
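The cell above covers only the classification metrics. As a complement, the regression metrics named in this step can be computed in the same way. The sketch below uses a small set of hypothetical target values and predictions (not taken from the Iris data) purely to illustrate the calls; RMSE is derived here as the square root of MSE.

```python
import math

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical ground-truth values and predictions, for illustration only.
y_true_reg = [3.0, -0.5, 2.0, 7.0]
y_pred_reg = [2.5, 0.0, 2.0, 8.0]

mse = mean_squared_error(y_true_reg, y_pred_reg)   # mean of squared errors
rmse = math.sqrt(mse)                              # same units as the target
mae = mean_absolute_error(y_true_reg, y_pred_reg)  # mean of absolute errors
r2 = r2_score(y_true_reg, y_pred_reg)              # share of variance explained

print("MSE:", mse)
print("RMSE:", rmse)
print("MAE:", mae)
print("R-squared:", r2)
```

For these four points the errors are 0.5, -0.5, 0, and -1, so MSE is 1.5 / 4 = 0.375 and MAE is 2 / 4 = 0.5.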


4. **Cross-validation:** To gain more robust performance estimates, employ techniques like k-fold cross-validation. Instead of a single train-test split, cross-validation divides the data into k subsets, or "folds," and iteratively trains and tests the model on different combinations of these subsets. This helps assess the model's average performance across multiple splits, reducing the impact of a single random split.

In [10]:
from sklearn.model_selection import cross_val_score

# Example of 5-fold cross-validation on the full dataset
# (for a classifier, an integer cv uses stratified folds)
scores = cross_val_score(model, X, y, cv=5)
print("Cross-Validation Scores:", scores)
print("Average Accuracy:", scores.mean())


Cross-Validation Scores: [0.96666667 0.96666667 0.96666667 0.93333333 1.        ]
Average Accuracy: 0.9666666666666666
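To make the mechanics of k-fold cross-validation more explicit, the loop that `cross_val_score` performs can be written out by hand. This is only a sketch of the idea, not how the library is implemented internally: it assumes scikit-learn's bundled `load_iris` data in place of the CSV file, and it shuffles the folds with a fixed seed, so its scores need not match the output above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# Assumption: the bundled Iris copy stands in for the CSV file loaded earlier.
X_k, y_k = load_iris(return_X_y=True)

# Five stratified folds; shuffling is an illustrative choice here.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

fold_scores = []
for train_idx, test_idx in skf.split(X_k, y_k):
    # A fresh model is trained on four folds and scored on the remaining one.
    fold_model = SVC()
    fold_model.fit(X_k[train_idx], y_k[train_idx])
    fold_scores.append(fold_model.score(X_k[test_idx], y_k[test_idx]))

fold_scores = np.array(fold_scores)
print("Per-fold accuracy:", fold_scores)
print("Mean accuracy:", fold_scores.mean())
print("Standard deviation:", fold_scores.std())
```

Reporting the standard deviation alongside the mean gives a rough sense of how much the estimate depends on the particular split.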


5. **Hyperparameter tuning:** Improve a model's performance by tuning its hyperparameters. Hyperparameters are settings that are fixed before training and influence the learning process, such as the learning rate, the regularization strength, or the number of hidden layers in a neural network. Techniques like grid search or random search explore different hyperparameter combinations and select the best one based on cross-validation results. For the support vector classifier used here, the grid varies the regularization parameter `C` and the `kernel`.

In [12]:
from sklearn.model_selection import GridSearchCV

# Example of hyperparameter tuning using grid search
param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}
grid_search = GridSearchCV(model, param_grid, cv=5)
grid_search.fit(X_train, y_train)

best_params = grid_search.best_params_
best_model = grid_search.best_estimator_
print('Best parameters', best_params)

Best parameters {'C': 1, 'kernel': 'linear'}
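Random search, mentioned above as an alternative to grid search, samples a fixed number of combinations instead of trying all of them. The sketch below is one possible setup under stated assumptions: it uses scikit-learn's bundled `load_iris` data in place of the CSV file, and the log-uniform range for `C`, the number of iterations, and the variable names are illustrative choices. It also scores the tuned model on the held-out testing set, which was not used during the search.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.svm import SVC

# Assumption: the bundled Iris copy stands in for the CSV file loaded earlier.
X_r, y_r = load_iris(return_X_y=True)
X_tr_r, X_te_r, y_tr_r, y_te_r = train_test_split(
    X_r, y_r, test_size=0.3, random_state=42
)

# Hypothetical search space: C is sampled on a log scale, kernel from a list.
param_distributions = {
    "C": loguniform(1e-2, 1e2),
    "kernel": ["linear", "rbf"],
}

# Sample 10 combinations and evaluate each with 5-fold cross-validation.
random_search = RandomizedSearchCV(
    SVC(), param_distributions, n_iter=10, cv=5, random_state=42
)
random_search.fit(X_tr_r, y_tr_r)

# score() uses the best estimator, refit on the whole training set.
held_out_accuracy = random_search.score(X_te_r, y_te_r)
print("Best parameters:", random_search.best_params_)
print("Best cross-validation accuracy:", random_search.best_score_)
print("Held-out test accuracy:", held_out_accuracy)
```

Keeping the testing set out of the search means the final accuracy is not biased by the hyperparameter selection itself.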
