# Performance Metrics

After you’ve trained your model using the training set, you want to test it with the test set. But which metrics should you use with your test set?

There are several metrics for evaluating machine learning models, depending on whether you are working with a regression model or a classification model:
- For **regression models**, you want to look at mean squared error and R2. `Mean squared error` is calculated by squaring each prediction error and averaging the squares over all observations. The lower this number is, the more accurate your predictions were. `R2 (pronounced R-Squared)` is the proportion of the observed variance around the mean that is explained (that is, predicted) by your model. R2 is at most 1, and a higher number is better. A value of 0 means the model does no better than always predicting the mean, and on a test set R2 can even be negative when the model does worse than that.
- For **classification models**, the simplest metric for evaluating a model is accuracy. `Accuracy` is a common word, but in this case we have a very specific way of calculating it. Accuracy is the percentage of observations that were correctly predicted by the model. Accuracy is simple to understand, but should be interpreted with caution, in particular when the classes to predict are unbalanced: a model that always predicts the majority class can score a high accuracy without ever detecting the minority class.

Another classification metric you might come across is the **ROC AUC**. AUC stands for “area under the curve”, and the curve in question is the ROC curve, which plots the true positive rate against the false positive rate as the decision threshold varies. The ROC AUC measures how well the model's predicted scores rank positive observations above negative ones: 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so a higher ROC AUC generally means you have a better model. **Logarithmic loss**, or log loss, is a metric often used in competitions like those run by Kaggle, and it is applied when your classification model outputs not strict classifications (e.g., true and false) but class membership probabilities (e.g., a 10 percent chance of being true, a 75 percent chance of being true, etc.). Log loss applies heavier penalties to incorrect predictions that your model made with high confidence, and a lower log loss is better.
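To make these definitions concrete, here is a minimal plain-Python sketch of each metric on tiny made-up data. The helper names (`mse`, `r_squared`, `accuracy`, `log_loss_binary`, `roc_auc_pairwise`) and all the numbers are illustrative assumptions, not part of this notebook; in practice you would more likely call the equivalents in `sklearn.metrics` (`mean_squared_error`, `r2_score`, `accuracy_score`, `log_loss`, `roc_auc_score`).

```python
import math

def mse(y_true, y_pred):
    # Average of the squared errors over all observations.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    # 1 - (residual sum of squares / total sum of squares around the mean).
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def accuracy(y_true, y_pred):
    # Fraction of observations whose predicted class equals the true class.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def log_loss_binary(y_true, p_pred, eps=1e-15):
    # Mean negative log-likelihood; p_pred holds P(class == 1).
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)

def roc_auc_pairwise(y_true, scores):
    # Fraction of (positive, negative) pairs ranked correctly; ties count half.
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Regression: made-up targets and predictions.
reg_true = [3.0, 5.0, 7.0, 9.0]
reg_pred = [2.5, 5.0, 7.5, 9.0]
print(f"MSE: {mse(reg_true, reg_pred):.3f}")
print(f"R2:  {r_squared(reg_true, reg_pred):.3f}")

# Unbalanced classes: always predicting class 0 still looks accurate.
unbalanced_true = [0] * 9 + [1]
always_zero = [0] * 10
print(f"Accuracy of 'always 0': {accuracy(unbalanced_true, always_zero):.2f}")

# Log loss: a confident wrong prediction costs more than a hesitant one.
confident_wrong = log_loss_binary([1], [0.01])
hesitant_wrong = log_loss_binary([1], [0.40])
print(f"Log loss, confident vs hesitant: {confident_wrong:.3f} vs {hesitant_wrong:.3f}")

# ROC AUC: how often a positive is scored above a negative.
auc_true = [0, 0, 1, 1]
auc_scores = [0.1, 0.4, 0.35, 0.8]
print(f"ROC AUC: {roc_auc_pairwise(auc_true, auc_scores):.2f}")
```

The unbalanced example illustrates the caveat about accuracy: the constant predictor gets nine out of ten observations right while never identifying the single class 1 observation.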

In [None]:
# See the spark/625_4_0... notebook for an explanation of ROC and other metrics.

- Data exploration techniques can help improve a model's performance.

An interesting way to evaluate the results is by means of a confusion matrix, which shows the correct and incorrect predictions for each class. Each row corresponds to an expected (true) class and each column to a predicted class. In the first row, the first column indicates how many class 0 observations were predicted correctly, and the second column how many class 0 observations were predicted as class 1. In the second row, we note that all class 1 observations were erroneously predicted as class 0.

Therefore, the higher the values on the diagonal of the confusion matrix, the better, since the diagonal counts the correct predictions.

In [None]:
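from sklearn.metrics import confusion_matrix
from matplotlib import pyplot as plt

# y_test and y_pred are assumed to be defined earlier in the notebook.
# Rows are expected (true) classes, columns are predicted classes.
conf_mat = confusion_matrix(y_true=y_test, y_pred=y_pred)
print('Confusion matrix:\n', conf_mat)

labels = ['Class 0', 'Class 1']
fig, ax = plt.subplots()
cax = ax.matshow(conf_mat, cmap=plt.cm.Blues)
fig.colorbar(cax)
# Set tick positions explicitly so each label lines up with its row/column.
ax.set_xticks(range(len(labels)))
ax.set_yticks(range(len(labels)))
ax.set_xticklabels(labels)
ax.set_yticklabels(labels)
ax.set_xlabel('Predicted')
ax.set_ylabel('Expected')
plt.show()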
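
The cell above depends on `y_test` and `y_pred` from earlier in the notebook. As a self-contained check of the row and column convention, the following sketch builds a 2x2 confusion matrix by hand. The helper `confusion_counts` and the label lists are made up for illustration, chosen so that every class 1 observation is misclassified as class 0, as in the description above.

```python
def confusion_counts(y_true, y_pred, n_classes=2):
    # matrix[i][j] counts observations of true class i predicted as class j.
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

# Hypothetical labels: four class 0 observations, two class 1 observations.
demo_true = [0, 0, 0, 0, 1, 1]
demo_pred = [0, 0, 0, 1, 0, 0]

demo_matrix = confusion_counts(demo_true, demo_pred)
for row in demo_matrix:
    print(row)

# Accuracy is the sum of the diagonal divided by the number of observations.
correct = sum(demo_matrix[i][i] for i in range(len(demo_matrix)))
demo_accuracy = correct / len(demo_true)
print(f"Accuracy from the diagonal: {demo_accuracy:.2f}")
```

The first row reads 3 correct and 1 wrong for class 0, while the second row has a zero on the diagonal because no class 1 observation was predicted correctly. This is the kind of failure that a single accuracy figure can hide.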