# Machine Learning with Python

In [None]:
import matplotlib.pyplot as plt
import numpy as np

## 2.3 Evaluation

There are many metrics we can use to evaluate the performance of a supervised learning model.

### [Evaluating Classifiers](https://scikit-learn.org/stable/modules/model_evaluation.html#classification-metrics)

`sklearn.metrics` provides most of the commonly-used metrics.

Some of these are restricted to binary classifiers, but others are also defined for multiclass (more than two possible values for `y`) and/or multilabel (several labels can apply to the same sample at once) problems.
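
As a rough illustration of how a binary metric extends to the multiclass case, the sketch below computes the F1 score on a small set of made-up three-class labels (the arrays are hypothetical, not taken from any dataset used here). The `average` argument of `f1_score` controls how the per-class scores are combined.

```python
import numpy as np
from sklearn.metrics import f1_score

# Hypothetical three-class labels, invented purely for illustration
y_true_demo = np.array([0, 0, 0, 0, 1, 1, 2, 2])
y_hat_demo = np.array([0, 0, 0, 1, 1, 1, 2, 0])

# One F1 score per class
per_class = f1_score(y_true_demo, y_hat_demo, average=None)
# Unweighted mean of the per-class scores
macro = f1_score(y_true_demo, y_hat_demo, average="macro")
# Computed from the pooled true/false positive and negative counts
micro = f1_score(y_true_demo, y_hat_demo, average="micro")
# Mean of the per-class scores, weighted by the number of samples in each class
weighted = f1_score(y_true_demo, y_hat_demo, average="weighted")

print("per class:", per_class)
print("macro = %.3f, micro = %.3f, weighted = %.3f" % (macro, micro, weighted))
```

For single-label multiclass problems the micro-averaged F1 coincides with plain accuracy, whereas the macro average gives every class equal weight regardless of its size.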

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import RocCurveDisplay
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, random_state=42)


In [None]:

svc = SVC(probability=True, random_state=42)
svc.fit(X_train, y_train)

In [None]:
y_pred = svc.predict(X_test)
y_pred

In [None]:
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))
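
The precision and recall values in the report come from the confusion matrix. The following is a minimal sketch on made-up binary labels (the arrays are hypothetical) showing how the two quantities are derived from the four counts, with the corresponding `sklearn.metrics` functions as a cross-check.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical binary labels, invented purely for illustration
y_true_demo = np.array([0, 0, 0, 1, 1, 1, 1, 1])
y_hat_demo = np.array([0, 0, 1, 1, 1, 1, 0, 1])

# For binary labels the matrix is [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_hat_demo).ravel()

precision_manual = tp / (tp + fp)  # fraction of predicted positives that are correct
recall_manual = tp / (tp + fn)     # fraction of actual positives that are found

print("TN, FP, FN, TP =", tn, fp, fn, tp)
print("precision =", precision_manual, "(sklearn:", precision_score(y_true_demo, y_hat_demo), ")")
print("recall    =", recall_manual, "(sklearn:", recall_score(y_true_demo, y_hat_demo), ")")
```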

The receiver operating characteristic (ROC) curve plots the true positive rate against the false positive rate as the decision threshold varies. It gives a useful visual evaluation for any method that can return probabilities or prediction scores. The following code works for binary classification (there is an alternative for multiclass classification):

In [None]:
from sklearn.metrics import RocCurveDisplay
svc_disp = RocCurveDisplay.from_estimator(svc, X_test, y_test)
plt.show()

We can also get the area under the curve (AUC) as a metric:

In [None]:
from sklearn.metrics import roc_auc_score
probs = svc.predict_proba(X_test)
auc = roc_auc_score(y_test,probs[:,1])
print("AUC =",auc)

The best possible classifier would have AUC = 1. A binary classifier with random guessing has AUC = 0.5.
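
These two reference values can be checked on synthetic data. The sketch below assumes nothing about the models above: it generates random binary labels, then scores them once with scores that rank every positive above every negative and once with scores that are unrelated to the labels.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic binary labels, generated purely for illustration
labels_demo = rng.integers(0, 2, size=10_000)

# Scores that rank every positive above every negative
perfect_scores = labels_demo.astype(float)
# Scores drawn independently of the labels, i.e. random guessing
random_scores = rng.random(labels_demo.size)

auc_perfect = roc_auc_score(labels_demo, perfect_scores)
auc_random = roc_auc_score(labels_demo, random_scores)

print("AUC with perfect ranking =", auc_perfect)
print("AUC with random scores   = %.3f" % auc_random)
```

With random scores the AUC is only approximately 0.5, and it fluctuates more for smaller samples.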

We can compare classifiers visually by plotting multiple ROC curves on the same axes.

In [None]:
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=2, metric='minkowski', p=2)
knn.fit(X_train, y_train)

# Draw the k-NN curve first, then add the SVC curve to the same axes
knn_disp = RocCurveDisplay.from_estimator(knn, X_test, y_test)
ax = plt.gca()
svc_disp.plot(ax=ax)
plt.show()


The precision-recall (PR) curve plots precision against recall as the decision threshold varies. It is a useful evaluation for tasks where the positive class is rare and we are most interested in limiting false positives, e.g. screening a population for a disease.

In [None]:
from sklearn.metrics import PrecisionRecallDisplay

display = PrecisionRecallDisplay.from_estimator(svc, X_test, y_test)
plt.show()

# The raw curve values are available from sklearn.metrics.precision_recall_curve:
# pre, rec, thresholds = precision_recall_curve(y_test, probs[:, 1])

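The [average precision](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.average_precision_score.html#sklearn.metrics.average_precision_score) (AP) summarises this curve as a weighted mean of the precision at each threshold, where each weight is the increase in recall from the previous threshold. It is often quoted as a metric: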

In [None]:
from sklearn.metrics import average_precision_score
avg_pre = average_precision_score(y_test, probs[:,1])
print("Average precision =",avg_pre)
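
To make the weighting explicit, average precision can be written as

$$\text{AP} = \sum_n (R_n - R_{n-1})\, P_n$$

where $P_n$ and $R_n$ are the precision and recall at the $n$-th threshold. The sketch below reproduces this sum from the output of `precision_recall_curve` on a tiny set of made-up labels and scores (both arrays are hypothetical), and compares it with `average_precision_score`.

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

# Hypothetical labels and scores, invented purely for illustration
y_true_demo = np.array([0, 0, 1, 1])
scores_demo = np.array([0.1, 0.4, 0.35, 0.8])

precision, recall, _ = precision_recall_curve(y_true_demo, scores_demo)

# recall is returned in decreasing order, so np.diff(recall) is <= 0;
# the leading minus sign turns each step into a positive recall increment
ap_manual = -np.sum(np.diff(recall) * precision[:-1])
ap_sklearn = average_precision_score(y_true_demo, scores_demo)

print("Manual AP  = %.4f" % ap_manual)
print("sklearn AP = %.4f" % ap_sklearn)
```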

### [Evaluating Regressors](https://scikit-learn.org/stable/modules/model_evaluation.html#regression-metrics)

Once again, there are several metrics for evaluating regression models; the user guide has full details for each one.

In [None]:
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
diabetes = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(diabetes.data, diabetes.target, random_state=0)

In [None]:
from sklearn.neural_network import MLPRegressor
# (100,) is a one-element tuple: a single hidden layer of 100 units
nn = MLPRegressor(hidden_layer_sizes=(100,), max_iter=10000)
nn.fit(X_train, y_train)

In [None]:
y_pred = nn.predict(X_test)

In [None]:
from sklearn.metrics import mean_absolute_error, mean_squared_error, max_error, r2_score

print("Mean absolute error, MAE = %.2f" % mean_absolute_error(y_test, y_pred))
print("Mean squared error, MSE = %.2f" % mean_squared_error(y_test, y_pred))
print("Max error = %.2f" % max_error(y_test, y_pred))
print("Coefficient of determination, R^2 = %.2f" % r2_score(y_test, y_pred))
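
For reference, each of these metrics is a simple function of the residuals. The following sketch computes them directly with NumPy on a small set of made-up targets and predictions (the arrays are hypothetical), and checks the results against the `sklearn.metrics` functions.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, max_error, r2_score

# Hypothetical targets and predictions, invented purely for illustration
t_demo = np.array([3.0, -0.5, 2.0, 7.0])
p_demo = np.array([2.5, 0.0, 2.0, 8.0])

resid = t_demo - p_demo

mae_manual = np.mean(np.abs(resid))     # mean absolute error
mse_manual = np.mean(resid ** 2)        # mean squared error
max_err_manual = np.max(np.abs(resid))  # largest absolute residual
# R^2 = 1 - (residual sum of squares) / (total sum of squares about the mean)
r2_manual = 1 - np.sum(resid ** 2) / np.sum((t_demo - t_demo.mean()) ** 2)

print("MAE = %.3f (sklearn: %.3f)" % (mae_manual, mean_absolute_error(t_demo, p_demo)))
print("MSE = %.3f (sklearn: %.3f)" % (mse_manual, mean_squared_error(t_demo, p_demo)))
print("Max error = %.3f (sklearn: %.3f)" % (max_err_manual, max_error(t_demo, p_demo)))
print("R^2 = %.3f (sklearn: %.3f)" % (r2_manual, r2_score(t_demo, p_demo)))
```

Note that MSE is in squared units of the target, so its square root (RMSE) is often easier to compare with MAE.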


## Exercise

Use ROC curves to compare the performance of a decision tree and a logistic regression classifier on the `breast_cancer` dataset.

Does a random forest outperform a simple linear model for the `wine_quality_white` dataset?