# Metrics

## Classification error metrics

### Specificity

$$\frac{\text{TN}}{\text{TN}+\text{FP}}$$

* proportion of actual negatives that are correctly predicted as negative (also called the true negative rate)
* use it when false positives are costly
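As a minimal sketch, specificity can be computed by counting the four confusion-matrix cells by hand. The labels below are made up for illustration, `confusion_counts` is a hypothetical helper, and 1 is assumed to be the positive class.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


# Made-up labels: 3 actual positives, 5 actual negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
specificity = tn / (tn + fp)  # 3 of the 5 actual negatives are predicted negative
print(specificity)
```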

### Accuracy 

$$\frac{\text{TP}+\text{TN}}{\text{Total}}$$

* answers the question: how often am I right overall?
* do not use it when there is a large class imbalance: always predicting the majority class already gives a high score
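A small sketch of why accuracy misleads on imbalanced data, assuming a made-up dataset with 95 negatives and 5 positives and a "model" that always predicts the negative class:

```python
# 95 negatives and 5 positives: a heavily imbalanced toy dataset.
y_true = [0] * 95 + [1] * 5
# A useless "model" that always predicts the majority class.
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Share of the actual positives that the model finds.
true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall = true_positives / sum(y_true)

print(accuracy)  # 0.95
print(recall)    # 0.0
```

The accuracy looks good even though not a single positive is detected.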

### Recall / sensitivity / true positive rate (TPR) 

$$\frac{\text{TP}}{\text{TP}+\text{FN}}$$

* proportion of actual positives that are correctly identified
* the metric to use when false negatives are costly and false positives are acceptable
* the closer the value is to 1, the fewer FN we have
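The sketch below shows how recall reacts to the decision threshold. The scores and the two thresholds are invented for illustration, and `recall` is a hand-written helper rather than a library call (scikit-learn offers `sklearn.metrics.recall_score` for the same purpose).

```python
def recall(y_true, y_pred):
    """TP / (TP + FN) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]  # made-up model scores

strict = [int(s >= 0.8) for s in scores]    # few positives predicted
lenient = [int(s >= 0.35) for s in scores]  # many positives predicted

recall_strict = recall(y_true, strict)    # 1 of the 3 positives is found
recall_lenient = recall(y_true, lenient)  # all 3 positives are found, at the cost of 1 FP
print(recall_strict, recall_lenient)
```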

### Precision 

$$\frac{\text{TP}}{\text{TP}+\text{FP}}$$

* proportion of predicted positives that are actually positive
* answers the question: when I predict positive, how often am I right?
* use it when false positives are costly
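A minimal sketch of precision on made-up scores, with two illustrative thresholds. `precision` is a hand-written helper (scikit-learn offers `sklearn.metrics.precision_score`), and returning 0.0 when nothing is predicted positive is a convention chosen for this example.

```python
def precision(y_true, y_pred):
    """TP / (TP + FP) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fp == 0:
        return 0.0  # convention chosen here: no positive prediction at all
    return tp / (tp + fp)


y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]  # made-up model scores

strict = [int(s >= 0.8) for s in scores]    # only the most confident prediction
lenient = [int(s >= 0.35) for s in scores]  # lets one false positive through

precision_strict = precision(y_true, strict)    # 1 TP, 0 FP
precision_lenient = precision(y_true, lenient)  # 3 TP, 1 FP
print(precision_strict, precision_lenient)
```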

### F1-score

$$\frac{2}{\frac{1}{\text{Recall}} + \frac{1}{\text{Precision}}} = 2 \cdot \frac{\text{Recall} \cdot \text{Precision}}{\text{Recall} + \text{Precision}}$$

* harmonic mean of recall and precision, so it takes both FP and FN into account
* should be used when the class distribution is uneven
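To see what the harmonic mean does, the sketch below compares it with a plain average for a hypothetical model with perfect precision but very low recall. The two input values are invented, and returning 0.0 when both are zero is a convention chosen for this example.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention chosen here to avoid dividing by zero
    return 2 * precision * recall / (precision + recall)


precision, recall = 1.0, 0.1  # made-up values: precise, but misses most positives

arithmetic_mean = (precision + recall) / 2  # 0.55, looks acceptable
f1_value = f1(precision, recall)            # about 0.18, dominated by the low recall
print(arithmetic_mean, f1_value)
```

The F1-score stays low as long as either of the two metrics is low.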

### Area under the receiver operating characteristic curve (ROC/AUC)

Gives a value between 0 and 1: 0.5 corresponds to random predictions and 1 to a perfect ranking. A value below 0.5 means the model ranks worse than random.
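One way to read the ROC AUC is as the probability that a randomly chosen positive gets a higher score than a randomly chosen negative, with ties counted as one half. The sketch below computes it that way by brute force on made-up scores; `roc_auc` is a hand-written helper, not the scikit-learn function (`sklearn.metrics.roc_auc_score`).

```python
def roc_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]  # made-up model scores

auc = roc_auc(y_true, scores)                         # 7 of 9 pairs ranked correctly
auc_inverted = roc_auc(y_true, [-s for s in scores])  # reversed ranking: 2 of 9 pairs
print(auc, auc_inverted)
```

Reversing the ranking gives a value below 0.5, which is why a score under 0.5 is worse than random.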

### Precision-Recall curve (PRC)

* more informative than the ROC curve when evaluating binary classifiers on **unbalanced** datasets
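A precision-recall curve is built by sweeping the decision threshold and recording precision and recall at each step. The sketch below does this by hand on made-up scores; `pr_points` is a hypothetical helper (scikit-learn provides `sklearn.metrics.precision_recall_curve` for real use).

```python
def pr_points(y_true, scores):
    """Return (threshold, precision, recall) for each distinct score, highest first."""
    total_pos = sum(y_true)
    points = []
    for thr in sorted(set(scores), reverse=True):
        # Everything scoring at least `thr` is predicted positive.
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((thr, tp / (tp + fp), tp / total_pos))
    return points


y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]  # made-up model scores

points = pr_points(y_true, scores)
for thr, prec, rec in points:
    print(f"threshold={thr:.1f} precision={prec:.2f} recall={rec:.2f}")
```

Neither quantity involves the true negatives, so a large number of easy negatives does not inflate the curve.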

## Regression error metrics

The values we get need to be put into context with the values we are trying to predict. For example, if we predict prices and are off by 4 euros, that error can be good when the prices are high with a wide min-max range (700 euros mean) and terrible when the prices are low (1 euro mean).
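One simple way to put an error into context is to divide it by the mean of the target. The sketch below uses the two price levels from the example above; dividing by the mean is just one possible normalization, chosen here for illustration.

```python
absolute_error = 4.0  # euros, the same error in both scenarios

mean_high_price = 700.0  # expensive items
mean_low_price = 1.0     # cheap items

relative_error_high = absolute_error / mean_high_price  # under 1% of a typical price
relative_error_low = absolute_error / mean_low_price    # 400% of a typical price

print(f"{relative_error_high:.2%}")
print(f"{relative_error_low:.2%}")
```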

### Mean absolute error

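The mean absolute error (MAE) is the average of the absolute differences between the predicted and actual values. It uses the same scale as the data being measured, so we can't use it for comparisons between series using different scales.

Commonly used for forecast error in time series analysis.

**does not punish large errors more than small ones**: every error contributes in proportion to its size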
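A minimal sketch of the mean absolute error on made-up values; `mae` is a hand-written helper (scikit-learn provides `sklearn.metrics.mean_absolute_error`).

```python
def mae(y_true, y_pred):
    """Average of the absolute differences between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [3.0, 5.0, 2.5, 7.0]  # made-up targets
y_pred = [2.5, 5.0, 4.0, 8.0]  # made-up predictions

# Absolute errors are 0.5, 0.0, 1.5 and 1.0, so the average is 0.75.
error = mae(y_true, y_pred)
print(error)  # 0.75
```

The result is expressed in the same unit as the target.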

### Mean squared error

The mean squared error (MSE) is the average of the squares of the errors, i.e. the average squared difference between the estimated values and the actual values.

This value is always non-negative, and the closer it is to 0, the better.

**punishes large errors**: squaring gives large errors a disproportionate weight
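The sketch below contrasts two made-up sets of predictions that have the same mean absolute error but very different mean squared errors, because one of them concentrates all of its error in a single point. Both helpers are hand-written for illustration.

```python
def mae(y_true, y_pred):
    """Average absolute difference."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Average squared difference."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [10.0, 10.0, 10.0, 10.0]
evenly_off = [12.0, 12.0, 12.0, 12.0]  # off by 2 everywhere
one_big_miss = [10.0, 10.0, 10.0, 18.0]  # perfect except one error of 8

print(mae(y_true, evenly_off), mae(y_true, one_big_miss))  # 2.0 2.0
print(mse(y_true, evenly_off), mse(y_true, one_big_miss))  # 4.0 16.0
```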

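### Root mean squared error

The root mean squared error (RMSE) is the square root of the mean squared error, which brings the value back to the same unit as the target. The smaller the RMSE, the better the predictive accuracy of the model.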
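A minimal sketch of the RMSE as the square root of the mean squared error, on made-up values; `rmse` is a hand-written helper.

```python
import math


def rmse(y_true, y_pred):
    """Square root of the average squared difference."""
    mean_squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mean_squared)


y_true = [3.0, 5.0, 2.5, 7.0]  # made-up targets
y_pred = [2.5, 5.0, 4.0, 8.0]  # made-up predictions

# Squared errors are 0.25, 0.0, 2.25 and 1.0, so the MSE is 0.875.
error = rmse(y_true, y_pred)
print(round(error, 3))
```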

## Analysis on the scores

`sklearn.metrics.explained_variance_score` measures the proportion of the variance of the ground truth that our predictions account for, which helps us figure out how much room there is to improve.

The best possible score is 1.0; lower values are worse.

To make sense of our results, we can also draw a scatter plot of `y_test` against itself, which gives the line of perfect predictions, and overlay `y_test` against the predictions to see how far they deviate. The plot can reveal problems such as outliers.
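The explained variance score is defined as one minus the variance of the residuals divided by the variance of the targets. The sketch below reimplements that definition with the standard library on made-up values; `explained_variance` is a hand-written helper, not the scikit-learn function itself.

```python
from statistics import pvariance


def explained_variance(y_true, y_pred):
    """1 - Var(y_true - y_pred) / Var(y_true), using population variances."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1 - pvariance(residuals) / pvariance(y_true)


y_true = [1.0, 2.0, 3.0, 4.0]       # made-up targets
shifted = [2.0, 3.0, 4.0, 5.0]      # every prediction is off by exactly +1
one_outlier = [1.0, 2.0, 3.0, 6.0]  # perfect except for one large miss

ev_shifted = explained_variance(y_true, shifted)      # residuals are constant
ev_outlier = explained_variance(y_true, one_outlier)  # residuals vary
print(ev_shifted, round(ev_outlier, 2))
```

A constant offset leaves the score at 1.0, so this metric does not reveal a systematic bias and is best read alongside an error metric such as the MAE.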

## Underfit / overfit

* overfit: the model learned the noise in the training data rather than the general pattern => too many features, needs more regularization, ... (high variance)
* underfit: the model fails to capture the pattern in the data => model not complex enough, too few informative features, ... (high bias)
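A common way to tell the two apart is to compare the error on the training data with the error on held-out data. The sketch below uses made-up data following `y = 2x` with alternating noise of plus or minus 1 on the training targets, and two deliberately extreme toy models chosen for illustration: a 1-nearest-neighbour lookup that memorizes the training set, and a constant predictor that always returns the training mean.

```python
def mse(y_true, y_pred):
    """Average squared difference."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Training targets follow y = 2x with alternating noise of +1 and -1.
x_train = [0, 1, 2, 3, 4, 5]
y_train = [1, 1, 5, 5, 9, 9]
# Held-out points follow the noise-free pattern y = 2x.
x_test = [0.4, 1.4, 2.4, 3.4, 4.4]
y_test = [2 * x for x in x_test]


def predict_1nn(x):
    """Memorizing model: return the target of the closest training point."""
    nearest = min(range(len(x_train)), key=lambda i: abs(x_train[i] - x))
    return y_train[nearest]


mean_y = sum(y_train) / len(y_train)


def predict_mean(x):
    """Overly simple model: ignore x and always return the training mean."""
    return mean_y


nn_train = mse(y_train, [predict_1nn(x) for x in x_train])
nn_test = mse(y_test, [predict_1nn(x) for x in x_test])
mean_train = mse(y_train, [predict_mean(x) for x in x_train])
mean_test = mse(y_test, [predict_mean(x) for x in x_test])

print("1-NN train/test MSE:", nn_train, round(nn_test, 2))
print("mean train/test MSE:", round(mean_train, 2), round(mean_test, 2))
```

The memorizing model has zero training error but a clearly larger test error because it reproduces the noise (overfit), while the constant model has a high error on both sets (underfit).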