# Algorithm Evaluation Metrics

- This lesson demonstrates several algorithm evaluation metrics for both classification and regression machine learning problems.
     - For classification metrics, the Pima Indians onset of diabetes dataset is used for demonstration. This is a binary classification problem where all of the input variables are numeric.
     - For regression metrics, the Boston House Price dataset is used for demonstration. This is a regression problem where all of the input variables are also numeric.

## Classification Metrics

- Classification Accuracy. 
- Logarithmic Loss.
- Area Under ROC Curve. 
- Confusion Matrix.
- Classification Report.

### Classification Accuracy
- Classification accuracy is the number of correct predictions made as a ratio of all predictions made.
- It is really only suitable when there are an equal number of observations in each class, which is rare.

In [4]:
import warnings
warnings.filterwarnings('ignore')

# Cross Validation Classification Accuracy
from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

filename = 'pima-indians-diabetes.csv'
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class'] 
dataframe = read_csv(filename, names=names)
array = dataframe.values
X = array[:,0:8]
Y = array[:,8]

kfold = KFold(n_splits=10)  # folds are taken in order; random_state only applies when shuffle=True
model = LogisticRegression()
scoring = 'accuracy'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring) 
print("Accuracy: %.3f (%.3f)" % (results.mean(), results.std()))

Accuracy: 0.770 (0.048)
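
To make the caveat about unequal class sizes concrete, here is a minimal sketch in plain Python. The label lists are hypothetical and are not taken from the Pima dataset; the `accuracy` helper is written only for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical imbalanced labels: 90 negatives and 10 positives.
y_true = [0] * 90 + [1] * 10
# A "model" that ignores its input and always predicts the majority class.
y_pred = [0] * 100

print("Accuracy: %.2f" % accuracy(y_true, y_pred))
```

The score looks high even though this model never identifies a single positive case, which is why accuracy alone can mislead on imbalanced data.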


### Logarithmic Loss
- Logarithmic loss (logloss) evaluates predicted probabilities of membership in a given class.
- The scalar probability between 0 and 1 can be seen as a measure of confidence for a prediction by an algorithm.
- Predictions that are correct or incorrect are rewarded or punished proportionally to the confidence of the prediction.
- Smaller logloss is better, with 0 representing a perfect logloss. The `neg_log_loss` scorer in scikit-learn reports the negated value so that larger is better, which is why the printed result is negative.

In [6]:
kfold = KFold(n_splits=10)  # folds are taken in order; random_state only applies when shuffle=True
model = LogisticRegression()
scoring = 'neg_log_loss'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring) 
print("Logloss: %.3f (%.3f)" % (results.mean(), results.std()))

Logloss: -0.493 (0.047)
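
The way confidence is rewarded or punished can be sketched with a hand-written binary logloss. This is an illustrative helper under simple assumptions (labels are 0 or 1, probabilities refer to class 1), not the scikit-learn implementation; the probability lists are hypothetical.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood for binary labels (0 or 1)."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += t * math.log(p) + (1 - t) * math.log(1 - p)
    return -total / len(y_true)

y_true = [1, 0, 1, 0]
confident_right = [0.9, 0.1, 0.9, 0.1]  # confident and correct
unsure = [0.5, 0.5, 0.5, 0.5]           # no confidence either way
confident_wrong = [0.1, 0.9, 0.1, 0.9]  # confident and incorrect

for name, probs in [("confident, right", confident_right),
                    ("unsure", unsure),
                    ("confident, wrong", confident_wrong)]:
    print("%-17s logloss = %.3f" % (name, log_loss(y_true, probs)))
```

The loss grows from the confident correct predictions, through the unsure ones, to the confident wrong ones.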


### Area Under ROC Curve
- Performance metric for binary classification problems.
- Ability to discriminate between positive and negative classes.
- An area of 1.0 represents a model that made all predictions perfectly.
- An area of 0.5 represents a model that is as good as random.
- A binary classification problem is really a trade-off between sensitivity and specificity.
    - Sensitivity is the true positive rate, also called recall. It is the proportion of instances from the positive class that were predicted correctly.
    - Specificity is also called the true negative rate. It is the proportion of instances from the negative class that were predicted correctly.

In [7]:
kfold = KFold(n_splits=10)  # folds are taken in order; random_state only applies when shuffle=True
model = LogisticRegression()
scoring = 'roc_auc'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print("AUC: %.3f (%.3f)" % (results.mean(), results.std()))

AUC: 0.824 (0.041)
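
One way to read the area under the ROC curve is as the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one. The sketch below computes it by comparing every positive/negative pair, with ties counted as half. The `roc_auc` helper and the score lists are illustrative assumptions; this pairwise loop is a simple equivalent formulation rather than how a library would compute the curve.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
print("Mixed ranking:   %.2f" % roc_auc(y_true, [0.1, 0.4, 0.35, 0.8]))
print("Perfect ranking: %.2f" % roc_auc(y_true, [0.1, 0.2, 0.8, 0.9]))
print("Constant scores: %.2f" % roc_auc(y_true, [0.5, 0.5, 0.5, 0.5]))
```

A perfect ranking gives 1.0, while scores that carry no information give 0.5, matching the two reference points listed above.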


### Confusion Matrix
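- A table of cells, where each cell represents the number of test instances with a given combination of actual class (row) and predicted class (column).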

In [10]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

test_size = 0.33
seed = 7
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=test_size,
    random_state=seed)
model = LogisticRegression()
model.fit(X_train, Y_train)
predicted = model.predict(X_test)
matrix = confusion_matrix(Y_test, predicted)
print(matrix)

[[141  21]
 [ 41  51]]


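- Example of the confusion matrix (suppose we are doing binary classification with Yes/No labels, taking class 0 as No and class 1 as Yes). The four cells count:
    - How many are Actual No and Predicted No (141 in the matrix above)
    - How many are Actual Yes and Predicted Yes (51)
    - How many are Actual No and Predicted Yes (21)
    - How many are Actual Yes and Predicted No (41)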
<br/> <img alt="" src="images/confmat.png" width="500px" height="500px" />
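
The four counts can be tallied by hand to see how the table is laid out. This sketch assumes labels are encoded as 0 (No) and 1 (Yes) and follows the same row/column convention as `confusion_matrix` (rows are actual classes, columns are predicted classes). The `confusion_counts` helper and the label lists are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """2x2 table: rows are actual classes, columns are predicted classes."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

# Hypothetical labels: 0 = No, 1 = Yes.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 0, 1, 1, 0]

matrix = confusion_counts(y_true, y_pred)
print("Actual No,  Predicted No :", matrix[0][0])
print("Actual No,  Predicted Yes:", matrix[0][1])
print("Actual Yes, Predicted No :", matrix[1][0])
print("Actual Yes, Predicted Yes:", matrix[1][1])
```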

### Classification Report
- The `classification_report()` function displays the precision, recall, F1-score and support for each class
- The example below demonstrates the report on the binary classification problem

In [12]:
from sklearn.metrics import classification_report

test_size = 0.33
seed = 7
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=test_size,
    random_state=seed)
model = LogisticRegression()
model.fit(X_train, Y_train)
predicted = model.predict(X_test)
report = classification_report(Y_test, predicted)
print(report)

              precision    recall  f1-score   support

         0.0       0.77      0.87      0.82       162
         1.0       0.71      0.55      0.62        92

   micro avg       0.76      0.76      0.76       254
   macro avg       0.74      0.71      0.72       254
weighted avg       0.75      0.76      0.75       254



<img alt="" src="images/metrics.png" />

- Precision is the percentage of instances predicted as a class that actually belong to that class (how many of your results are relevant)
- Recall is the percentage of instances that actually belong to a class that your algorithm correctly identified
- A full description of these measures:
https://towardsdatascience.com/precision-vs-recall-386cf9f89488


<img alt="" src="images/f1-score.png" />

- F1 Score: The harmonic mean of precision and recall
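
As a cross-check, the precision, recall and F1-score reported for class 1.0 can be recomputed from the confusion matrix printed earlier in this lesson. The sketch below hard-codes those four counts and assumes class 1.0 is treated as the positive class; the variable names are chosen for illustration.

```python
# Counts from the confusion matrix printed earlier:
# [[141  21]
#  [ 41  51]]   (rows: actual class, columns: predicted class)
tn, fp = 141, 21
fn, tp = 41, 51

precision = tp / (tp + fp)  # of everything predicted positive, how much was right
recall = tp / (tp + fn)     # of everything actually positive, how much was found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print("precision: %.2f" % precision)
print("recall:    %.2f" % recall)
print("f1-score:  %.2f" % f1)
```

The rounded values agree with the 1.0 row of the classification report (0.71, 0.55 and 0.62).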

## Regression Metrics

- Mean Absolute Error
- Mean Squared Error 
- R2

### Mean Absolute Error
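- The Mean Absolute Error (MAE) is the average of the absolute differences between predictions and actual values
- It gives an idea of how wrong the predictions were: the magnitude of the error, but not its direction (over- or under-prediction)
- A value of 0 indicates no error or perfect predictions. The `neg_mean_absolute_error` scorer negates the metric, so the printed value is negative.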

In [15]:
# Cross Validation Regression MAE
from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression

filename = 'housing.csv'
names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 
         'B', 'LSTAT', 'MEDV']
dataframe = read_csv(filename, sep=r'\s+', names=names)  # whitespace-delimited file
array = dataframe.values
X = array[:,0:13]
Y = array[:,13]

kfold = KFold(n_splits=10)  # folds are taken in order; random_state only applies when shuffle=True
model = LinearRegression()
scoring = 'neg_mean_absolute_error'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring) 
print("MAE: %.3f (%.3f)" % (results.mean(), results.std()))

MAE: -4.005 (2.084)
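
The definition can be sketched in a few lines of plain Python. The values are hypothetical and unrelated to the housing data; the signed mean error is included only to show the direction information that the absolute value discards.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |actual - predicted| over all instances."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical actual and predicted values.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

mae = mean_absolute_error(y_true, y_pred)
# Signed mean error keeps the direction (positive means over-prediction on average).
signed = sum(p - t for t, p in zip(y_true, y_pred)) / len(y_true)

print("MAE:               %.3f" % mae)
print("Signed mean error: %.3f" % signed)
```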


### Mean Squared Error
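- The Mean Squared Error (MSE), like the mean absolute error, provides a gross idea of the magnitude of error
- Taking the square root of the mean squared error converts the units back to the original units of the output variable
- This is called the Root Mean Squared Error (or RMSE)
- The `neg_mean_squared_error` scorer negates the metric, so the printed value is negative; take its absolute value before the square root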

In [16]:
num_folds = 10
kfold = KFold(n_splits=num_folds)  # folds are taken in order; random_state only applies when shuffle=True
model = LinearRegression()
scoring = 'neg_mean_squared_error'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring) 
print("MSE: %.3f (%.3f)" % (results.mean(), results.std()))

MSE: -34.705 (45.574)
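
A short sketch of MSE and RMSE in plain Python, using hypothetical values. The last line applies the square root to the mean score printed above as a rough indication only: the square root of the averaged MSE is not the same as averaging the per-fold RMSE values.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average of (actual - predicted)^2 over all instances."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical actual and predicted values.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

mse = mean_squared_error(y_true, y_pred)
rmse = math.sqrt(mse)  # back in the units of the output variable
print("MSE:  %.3f" % mse)
print("RMSE: %.3f" % rmse)

# The scorer reports a negated MSE, so take the absolute value first.
reported = -34.705
print("Approximate RMSE from the reported score: %.2f" % math.sqrt(abs(reported)))
```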


### R2 Metric
- An indication of the goodness of fit of a set of predictions to the actual values
- This is called the coefficient of determination
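- A value of 1 indicates a perfect fit and 0 indicates a fit no better than always predicting the mean; the score can be negative when the model does worse than that
- With a mean value of about 0.2, well below 0.5, the predictions here fit the actual values poorly, and the large standard deviation shows that the fit varies widely across folds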

In [17]:
kfold = KFold(n_splits=10)  # folds are taken in order; random_state only applies when shuffle=True
model = LinearRegression()
scoring = 'r2'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring) 
print("R^2: %.3f (%.3f)" % (results.mean(), results.std()))

R^2: 0.203 (0.595)
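
The coefficient of determination compares the model's squared error with that of a baseline that always predicts the mean of the actual values. The sketch below uses hypothetical values and an illustrative `r2_score` helper written in plain Python; it shows the perfect, baseline and worse-than-baseline cases.

```python
def r2_score(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0, 5.0]

print("Perfect predictions:  %.1f" % r2_score(y_true, [1.0, 2.0, 3.0, 4.0, 5.0]))
print("Always predict mean:  %.1f" % r2_score(y_true, [3.0] * 5))
print("Reversed predictions: %.1f" % r2_score(y_true, [5.0, 4.0, 3.0, 2.0, 1.0]))
```

A negative score on some folds is one way a cross-validated mean can end up low with a large standard deviation.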
