There are many metrics available for evaluating a machine learning model, and the right choice depends on the data you have and the output you expect.
In this notebook we explore the metrics commonly used for classification and regression tasks, and what each one is good for.

For the classification examples we will use the **Pima Indians diabetes** dataset, and for regression the **Boston housing** dataset. Both are loaded by URL in the cells below.

For classification we will use the following metrics:

    Classification Accuracy.
    Logarithmic Loss.
    Area Under ROC Curve.
    
We will also look at two tools that summarize how well a classification model performs:

    Confusion Matrix.
    Classification Report.
    
For regression we will use the following metrics:

    Mean Absolute Error.
    Mean Squared Error.
    R^2.

# Classification imports, data loading and preprocessing

In [15]:
import pandas
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression

url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pandas.read_csv(url, names=names)
array = dataframe.values

X = array[:,0:8]
Y = array[:,8]

seed = 7
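# random_state only has an effect when shuffle=True, and recent scikit-learn
# versions raise an error if it is set without shuffling, so it is omitted here.
kfold = model_selection.KFold(n_splits=10)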

model = LogisticRegression()



## 1. Classification accuracy
Classification accuracy is the number of correct predictions made as a ratio of all predictions made.

This is the most common evaluation metric for classification problems, and also the most misused. It is really only suitable when there is a roughly equal number of observations in each class (which is rarely the case) and when all predictions and prediction errors are equally important (which is often not the case).
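
To see why accuracy can mislead on imbalanced data, here is a minimal plain-Python sketch. The labels are made up for illustration (they are not taken from the Pima dataset), and `accuracy` is a hypothetical helper, not a scikit-learn function.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical, heavily imbalanced labels: 95 negatives and 5 positives.
y_true = [0] * 95 + [1] * 5

# A "model" that ignores its input and always predicts the majority class.
always_negative = [0] * 100

majority_accuracy = accuracy(y_true, always_negative)
print("Accuracy of always predicting 0: %.2f" % majority_accuracy)
```

The score looks high even though this "model" never identifies a single positive case, which is why accuracy alone is a weak guide when classes are unbalanced.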

In [14]:
scoring = 'accuracy'
results = model_selection.cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
msg = "Accuracy: %.2f (%f)" % ( results.mean(), results.std())
print(msg)

Accuracy: 0.77 (0.048411)


So the model is about 77% accurate on average across the 10 folds, with a standard deviation of roughly 5 percentage points.

## 2. Logarithmic Loss
Logarithmic loss (or logloss) is a performance metric for evaluating the predictions of probabilities of membership to a given class.

The scalar probability between 0 and 1 can be seen as a measure of confidence for a prediction by an algorithm. Predictions that are correct or incorrect are rewarded or punished proportionally to the confidence of the prediction.
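
The following sketch shows how confidence is rewarded or punished. It computes binary log loss directly from its usual definition in plain Python; the probabilities are made up, and `binary_log_loss` is a hypothetical helper (scikit-learn offers `sklearn.metrics.log_loss` for real use).

```python
import math

def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Average negative log-likelihood of the true labels.

    p_pred holds the predicted probability of class 1 for each instance.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

y_true = [1, 0, 1, 0]
confident_right = [0.9, 0.1, 0.9, 0.1]  # confident and correct
unsure = [0.6, 0.4, 0.6, 0.4]           # correct side, low confidence
confident_wrong = [0.1, 0.9, 0.1, 0.9]  # confident and wrong

loss_right = binary_log_loss(y_true, confident_right)
loss_unsure = binary_log_loss(y_true, unsure)
loss_wrong = binary_log_loss(y_true, confident_wrong)

print("confident and right: %.3f" % loss_right)
print("unsure:              %.3f" % loss_unsure)
print("confident and wrong: %.3f" % loss_wrong)
```

Confident correct predictions give the lowest loss, and confident wrong predictions are penalized far more than hesitant ones.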

In [16]:
scoring = 'neg_log_loss'
results = model_selection.cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
msg = "Log Loss: %.2f (%f)" % ( results.mean(), results.std())
print(msg)

Log Loss: -0.49 (0.046890)


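A smaller log loss is better, with 0 representing a perfect score. Note that scikit-learn's 'neg_log_loss' scorer reports the negated value so that higher is always better, so the -0.49 above corresponds to a log loss of 0.49.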

## 3. Area Under ROC Curve
Area under ROC Curve (or AUC for short) is a performance metric for **binary classification**  problems.

The AUC represents a model’s ability to discriminate between positive and negative classes. **An area of 1.0 represents a model that made all predictions perfectly. An area of 0.5 represents a model as good as random.**
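
One way to build intuition for AUC: it equals the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it that way in plain Python on made-up scores. `pairwise_auc` is a hypothetical helper; in practice scikit-learn's `roc_auc_score` would be used instead.

```python
def pairwise_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(positives) * len(negatives))

y_true = [0, 0, 1, 1]

auc_perfect = pairwise_auc(y_true, [0.1, 0.2, 0.8, 0.9])   # every positive outranks every negative
auc_mixed = pairwise_auc(y_true, [0.1, 0.8, 0.2, 0.9])     # one pair is ranked the wrong way
auc_constant = pairwise_auc(y_true, [0.5, 0.5, 0.5, 0.5])  # scores carry no information

print(auc_perfect, auc_mixed, auc_constant)
```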

The ROC curve plots sensitivity against 1 - specificity as the decision threshold varies, so a binary classification problem is really a trade-off between sensitivity and specificity.

    Sensitivity is the true positive rate, also called recall. It is the proportion of instances from the positive class that were predicted correctly.

    Specificity is also called the true negative rate. It is the proportion of instances from the negative class that were predicted correctly.
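
The trade-off can be sketched by sweeping a decision threshold over some made-up scores and watching the two rates move in opposite directions. The data and the helper `sensitivity_specificity` are hypothetical and only meant to illustrate the definitions above.

```python
def sensitivity_specificity(y_true, scores, threshold):
    """Return (sensitivity, specificity) when predicting 1 for scores >= threshold."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical labels and predicted probabilities of class 1.
y_true = [0, 0, 0, 1, 1, 1]
scores = [0.1, 0.4, 0.6, 0.3, 0.7, 0.9]

rates = {}
for threshold in (0.2, 0.5, 0.8):
    rates[threshold] = sensitivity_specificity(y_true, scores, threshold)
    print("threshold=%.1f sensitivity=%.2f specificity=%.2f"
          % ((threshold,) + rates[threshold]))
```

Raising the threshold makes the model more reluctant to predict the positive class, which increases specificity at the cost of sensitivity.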

In [17]:
scoring = 'roc_auc'
results = model_selection.cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
msg = "AUC: %.2f (%f)" % ( results.mean(), results.std())
print(msg)

AUC: 0.82 (0.040709)


## 4. Confusion Matrix

The confusion matrix is a handy presentation of the accuracy of a model with two or more classes.

The table presents predictions on the x-axis (columns) and actual outcomes on the y-axis (rows). Each cell holds the number of predictions made by the model for that combination of predicted and actual class.

For example, a machine learning algorithm can predict 0 or 1 and each prediction may actually have been a 0 or 1. Predictions for 0 that were actually 0 appear in the cell for prediction=0 and actual=0, whereas predictions for 0 that were actually 1 appear in the cell for prediction = 0 and actual=1. And so on.

In [18]:
from sklearn.metrics import confusion_matrix

test_size = 0.33
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(X, Y, test_size=test_size, random_state=seed)

model = LogisticRegression()
model.fit(X_train, Y_train)

predicted = model.predict(X_test)

matrix = confusion_matrix(Y_test, predicted)
print(matrix)

[[141  21]
 [ 41  51]]


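Predictions on the diagonal (141 and 51) are correct. The two off-diagonal cells are errors: 21 instances that were actually 0 but predicted as 1, and 41 instances that were actually 1 but predicted as 0.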
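
As a rough sketch of how the cells are counted, the snippet below builds a 2x2 confusion matrix in plain Python with the same layout as the output above (rows are actual classes, columns are predicted classes). The labels are made up for illustration, and `confusion_counts` is a hypothetical helper rather than a scikit-learn function. The last lines read the overall accuracy off the diagonal of the matrix printed in this notebook.

```python
def confusion_counts(actual_labels, predicted_labels, n_classes=2):
    """Count (actual, predicted) pairs; rows are actual, columns are predicted."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for actual, predicted in zip(actual_labels, predicted_labels):
        counts[actual][predicted] += 1
    return counts

# Hypothetical labels, for illustration only.
actual_labels = [0, 0, 0, 1, 1, 1, 1, 0]
predicted_labels = [0, 1, 0, 1, 0, 1, 1, 0]
toy_matrix = confusion_counts(actual_labels, predicted_labels)
print(toy_matrix)

# The matrix printed above: correct predictions sit on the diagonal.
notebook_matrix = [[141, 21], [41, 51]]
total = sum(sum(row) for row in notebook_matrix)
diagonal = notebook_matrix[0][0] + notebook_matrix[1][1]
notebook_accuracy = diagonal / total
print("Accuracy from the matrix: %.3f" % notebook_accuracy)
```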

## 5. Classification Report
Scikit-learn provides a convenience report for classification problems that gives a quick idea of a model's performance using several measures.

The classification_report() function displays the precision, recall, f1-score and support for each class.

In [20]:

from sklearn.metrics import classification_report

X_train, X_test, Y_train, Y_test = model_selection.train_test_split(X, Y, test_size=test_size, random_state=seed)
model = LogisticRegression()
model.fit(X_train, Y_train)
predicted = model.predict(X_test)
report = classification_report(Y_test, predicted)
print(report)

             precision    recall  f1-score   support

        0.0       0.77      0.87      0.82       162
        1.0       0.71      0.55      0.62        92

avg / total       0.75      0.76      0.75       254
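
The per-class numbers in this report can be recovered by hand from the confusion matrix in the previous section. The sketch below assumes that matrix (rows are actual classes, columns are predicted classes) and uses plain Python; the variable names are illustrative.

```python
# Cells of the confusion matrix printed in the previous section,
# treating class 1 as the positive class.
tn, fp = 141, 21   # actual class 0: predicted 0, predicted 1
fn, tp = 41, 51    # actual class 1: predicted 0, predicted 1

def f1_score_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Class 1: of everything predicted as 1, how much was right (precision),
# and of everything that really was 1, how much was found (recall).
precision_1 = tp / (tp + fp)
recall_1 = tp / (tp + fn)
f1_1 = f1_score_from(precision_1, recall_1)

# Class 0: the same quantities with the roles of the classes swapped.
precision_0 = tn / (tn + fn)
recall_0 = tn / (tn + fp)
f1_0 = f1_score_from(precision_0, recall_0)

# Support is the number of actual instances of each class.
support_0 = tn + fp
support_1 = fn + tp

print("class 0: %.2f %.2f %.2f %d" % (precision_0, recall_0, f1_0, support_0))
print("class 1: %.2f %.2f %.2f %d" % (precision_1, recall_1, f1_1, support_1))
```

Rounded to two decimals, these values match the two class rows of the report above.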



# Regression imports, data loading and preprocessing

In [21]:
import pandas
from sklearn import model_selection
from sklearn.linear_model import LinearRegression


url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.data"
names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
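# sep=r'\s+' splits on runs of whitespace; it replaces delim_whitespace=True,
# which is deprecated in recent pandas versions.
dataframe = pandas.read_csv(url, sep=r'\s+', names=names)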
array = dataframe.values


X = array[:,0:13]
Y = array[:,13]


seed = 7
# random_state is omitted because it only has an effect when shuffle=True.
kfold = model_selection.KFold(n_splits=10)
model = LinearRegression()

## 1. Mean Absolute Error
The Mean Absolute Error (or MAE) is the average of the absolute differences between predictions and actual values. It gives an idea of how wrong the predictions were.

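The measure gives an idea of the magnitude of the error, but no idea of the direction (e.g. over or under predicting). A value of 0 indicates perfect predictions. The 'neg_mean_absolute_error' scorer negates the metric so that higher is better, which is why the score is reported as a negative number.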
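
A small plain-Python sketch makes the "magnitude but not direction" point concrete. The values are made up, and `mae` and `mean_signed_error` are hypothetical helpers (scikit-learn provides `sklearn.metrics.mean_absolute_error` for real use).

```python
def mae(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_signed_error(y_true, y_pred):
    """Average of (predicted - actual); positive means over-predicting."""
    return sum(p - t for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical target values and predictions.
y_true = [20.0, 25.0, 30.0]
y_pred = [22.0, 23.0, 30.0]  # one over-prediction, one under-prediction, one exact

mae_value = mae(y_true, y_pred)
signed_value = mean_signed_error(y_true, y_pred)

print("MAE: %.3f" % mae_value)
print("Mean signed error: %.3f" % signed_value)
```

The over- and under-prediction cancel in the signed error but both count toward the MAE, so the MAE reports how large the errors are without saying which way they lean.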

In [23]:
scoring = 'neg_mean_absolute_error'
results = model_selection.cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
msg = "MAE: %.2f (%f)" % ( results.mean(), results.std())
print(msg)

MAE: -4.00 (2.083599)


## 2. Mean Squared Error
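The Mean Squared Error (or MSE) is much like the mean absolute error in that it provides a gross idea of the magnitude of error. Because the errors are squared before averaging, large errors are penalized more heavily than small ones. The 'neg_mean_squared_error' scorer reports the negated value, so scores are negative and values closer to 0 are better.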

Taking the square root of the mean squared error converts the units back to the original units of the output variable and can be meaningful for description and presentation. This is called the Root Mean Squared Error (or RMSE).
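
The sketch below computes MSE and RMSE in plain Python on made-up values, then converts the mean MSE reported in this notebook (34.71) back to the units of the target. The helper `mse` is hypothetical. Note that the square root of the mean fold MSE is not the same thing as the average of the per-fold RMSE values, so treat the last number as a rough figure.

```python
import math

def mse(y_true, y_pred):
    """Average squared difference between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical values: three small errors and one large one.
y_true = [20.0, 25.0, 30.0, 35.0]
y_pred = [21.0, 24.0, 31.0, 25.0]

mse_value = mse(y_true, y_pred)
rmse_value = math.sqrt(mse_value)
print("MSE: %.2f, RMSE: %.2f" % (mse_value, rmse_value))

# Mean MSE from the cross-validation run in this notebook
# (the sign is flipped back from the negated score).
notebook_mse = 34.71
notebook_rmse = math.sqrt(notebook_mse)
print("RMSE from the notebook's mean MSE: %.2f" % notebook_rmse)
```

In the toy data the single error of 10 contributes 100 of the 103 total squared error, which shows how strongly MSE weights large mistakes.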

In [25]:
scoring = 'neg_mean_squared_error'
results = model_selection.cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
msg = "MSE: %.2f (%f)" % ( results.mean(), results.std())
print(msg)

MSE: -34.71 (45.573999)


## 3. R^2 Metric
The R^2 (or R Squared) metric provides an indication of the goodness of fit of a set of predictions to the actual values. In statistical literature, this measure is called the coefficient of determination.

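A value of 1 indicates a perfect fit, and 0 indicates a fit no better than always predicting the mean of the target. The value can also be negative when the model does worse than that baseline, which is consistent with the large spread across folds in the output below.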
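
The sketch below computes R^2 from its usual definition, 1 - SS_res / SS_tot, in plain Python for three made-up prediction sets, including one that scores below 0. `r_squared` is a hypothetical helper (scikit-learn provides `sklearn.metrics.r2_score` for real use).

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    return 1 - ss_res / ss_tot

# Hypothetical target values.
y_true = [10.0, 20.0, 30.0, 40.0]

r2_perfect = r_squared(y_true, [10.0, 20.0, 30.0, 40.0])   # predictions equal the targets
r2_mean = r_squared(y_true, [25.0, 25.0, 25.0, 25.0])      # always predict the mean
r2_reversed = r_squared(y_true, [40.0, 30.0, 20.0, 10.0])  # worse than predicting the mean

print(r2_perfect, r2_mean, r2_reversed)
```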

In [26]:
scoring = 'r2'
results = model_selection.cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
msg = "R^2: %.2f (%f)" % ( results.mean(), results.std())
print(msg)

R^2: 0.20 (0.595296)
