<center>
  <a href="MLSD-06-MachineLearningErrors-A.ipynb" target="_self">Machine Learning Errors A</a> | <a href="./">Content Page</a> | <a href="MLSD-07-ModelEvaluation-Ex-1.ipynb">Model Evaluation Exercise 1</a>
</center>

# <center>MODEL EVALUATION A</center>

<center><b>Copyright &copy; 2023 by DR DANNY POO</b><br> e:dannypoo@nus.edu.sg<br> w:drdannypoo.com</center><br>

# Why?
Machine learning models are evaluated to determine whether they perform at an acceptable level, so the metrics chosen for that evaluation matter a great deal.
The choice of metrics determines how the performance of machine learning algorithms is measured and compared. Ultimately, it influences how you weigh the importance of different characteristics in the results and which algorithm you select.

# Metrics
The metrics include:
- <b>Classification metrics</b>: Accuracy, Logistic Loss (Log Loss), Area Under ROC Curve (AUC).
- <b>Classification Prediction Results Reporting</b>: Confusion Matrix and Classification Report.
- <b>Regression Metrics</b>: Mean Absolute Error, Mean Squared Error and R Squared.

# Dataset Used
- Pima Indians Diabetes (classification metrics)
- Boston Housing (regression metrics)

# Read in and Explore Data Set

In [None]:
# Import libraries
import pandas as pd
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

In [None]:
# Read in data
df = pd.read_csv('./data/diabetes/diabetes.csv')
df.head()

In [None]:
# Create features and target
array = df.values
X = array[:,0:8]
y = array[:,8]

In [None]:
# Print X shape and first 5 rows
print(X.shape, "\n", X[0:5])

In [None]:
# Print y shape and first 5 rows
print(y.shape, "\n", y[0:5])

# Classification Metrics

## Classification Accuracy
- It is the number of correct predictions made as a ratio of all predictions made.
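
As a minimal sketch of this definition, accuracy can be computed by hand on a few made-up labels and compared with scikit-learn's `accuracy_score`. The arrays `y_true` and `y_pred` are illustrative assumptions, not values from the diabetes data.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Illustrative labels only (not taken from the diabetes dataset)
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# Accuracy = number of correct predictions / number of all predictions
manual_accuracy = (y_true == y_pred).sum() / len(y_true)
sklearn_accuracy = accuracy_score(y_true, y_pred)

print(manual_accuracy, sklearn_accuracy)
```

Six of the eight predictions match, so both values are 0.75.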

In [None]:
# Define a K-Folds cross-validator.
# KFold does not shuffle by default; shuffle=True with random_state=7 gives a reproducible shuffled split into k folds.
# Each fold is then used once as the validation set while the k - 1 remaining folds form the training set.
kfold = model_selection.KFold(n_splits=10, random_state=7, shuffle=True)

In [None]:
# Instantiate a Logistic Regression classifier using 'liblinear' library with regularization applied.
model = LogisticRegression(solver='liblinear')

In [None]:
# Evaluate a score by cross-validation. Scoring used is 'accuracy'
scoring = 'accuracy'
results = model_selection.cross_val_score(model, X, y, cv=kfold, scoring=scoring)
print("Accuracy: \nMean value = %.3f \nStandard Deviation = %.3f" % (results.mean(), results.std()))

**Observations**:
- The mean cross-validated accuracy is 77.1%.

## Logistic Loss (Log Loss)
- It is a performance metric for evaluating the predictions of probabilities of membership to a given class.
- The scalar probability between 0 and 1 can be seen as a measure of confidence for a prediction by an algorithm. 
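
The sketch below computes binary log loss by hand and checks it against scikit-learn's `log_loss`. The labels and probabilities are made-up values chosen for illustration, and `p_pos` is assumed to hold the predicted probability of class 1.

```python
import numpy as np
from sklearn.metrics import log_loss

# Illustrative values only: true labels and predicted P(class = 1)
y_true = np.array([1, 0, 1, 0])
p_pos = np.array([0.9, 0.2, 0.6, 0.4])

# Binary log loss = -mean( y*log(p) + (1 - y)*log(1 - p) )
manual_log_loss = -np.mean(
    y_true * np.log(p_pos) + (1 - y_true) * np.log(1 - p_pos)
)
sklearn_log_loss = log_loss(y_true, p_pos)

print("manual = %.4f, sklearn = %.4f" % (manual_log_loss, sklearn_log_loss))
```

Confident and correct predictions (such as 0.9 for a true positive) contribute little to the loss, while hesitant ones (such as 0.6) contribute more.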

In [None]:
# Define a K-Folds cross-validator.
# KFold does not shuffle by default; shuffle=True with random_state=7 gives a reproducible shuffled split into k folds.
# Each fold is then used once as the validation set while the k - 1 remaining folds form the training set.
kfold = model_selection.KFold(n_splits=10, random_state=7, shuffle=True)

In [None]:
# Instantiate a Logistic Regression classifier using 'liblinear' library with regularization applied.
model = LogisticRegression(solver='liblinear')

In [None]:
# Evaluate a score by cross-validation. Scoring used is 'negative log loss'
scoring = 'neg_log_loss'
results = model_selection.cross_val_score(model, X, y, cv=kfold, scoring=scoring)
print("Logloss: \nMean value = %.3f \nStandard Deviation = %.3f" % (results.mean(), results.std()))

**Observations**:
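- With scoring='neg_log_loss', the cross_val_score() function reports the negated log loss so that larger values are better. Results are therefore negative, and values closer to 0 indicate better predictions.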
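
To recover a conventional (positive) log loss from these scores, the sign can simply be flipped. The sketch below assumes a synthetic dataset from `make_classification` as a stand-in for the diabetes data; the names `X_demo`, `y_demo` and `kfold_demo` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (assumption), not the diabetes dataset
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=7)

kfold_demo = KFold(n_splits=10, shuffle=True, random_state=7)
neg_scores = cross_val_score(
    LogisticRegression(solver='liblinear'),
    X_demo, y_demo, cv=kfold_demo, scoring='neg_log_loss',
)

# Flip the sign to get the usual log loss, where smaller is better
log_loss_per_fold = -neg_scores
print("Log loss: mean = %.3f, std = %.3f" % (log_loss_per_fold.mean(), log_loss_per_fold.std()))
```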

## Area Under ROC Curve
- It is a performance metric for binary classification problems.
- The AUC represents a model’s ability to discriminate between positive and negative classes. 
- An area of 1.0 represents a model that made all predictions perfectly. 
- An area of 0.5 represents a model as good as random.
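
One way to read the AUC is as the fraction of (positive, negative) pairs in which the positive example receives the higher score. The sketch below checks this reading against `roc_auc_score` on made-up scores; the data are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Illustrative labels and predicted scores only
y_true = np.array([0, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.3, 0.7])

pos = scores[y_true == 1]
neg = scores[y_true == 0]

# Compare every positive score with every negative score; ties count as half
diff = pos[:, None] - neg[None, :]
pairwise_auc = ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

sklearn_auc = roc_auc_score(y_true, scores)
print("pairwise = %.4f, sklearn = %.4f" % (pairwise_auc, sklearn_auc))
```

Here 8 of the 9 pairs are ordered correctly, so both values are about 0.889.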

In [None]:
# Define a K-Folds cross-validator.
# KFold does not shuffle by default; shuffle=True with random_state=7 gives a reproducible shuffled split into k folds.
# Each fold is then used once as the validation set while the k - 1 remaining folds form the training set.
kfold = model_selection.KFold(n_splits=10, random_state=7, shuffle=True)

In [None]:
# Instantiate a Logistic Regression classifier using 'liblinear' library with regularization applied.
model = LogisticRegression(solver='liblinear')

In [None]:
# Evaluate a score by cross-validation. Scoring used is 'roc auc'
scoring = 'roc_auc'
results = model_selection.cross_val_score(model, X, y, cv=kfold, scoring=scoring)
print("AUC: \nMean value = %.3f \nStandard Deviation = %.3f" % (results.mean(), results.std()))

**Observations**:
-  The AUC is relatively close to 1 and greater than 0.5, suggesting good predictions.

# Classification Prediction Results Reporting

## Confusion Matrix
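- It summarizes the predictions of a model with two or more classes by comparing predicted classes with actual classes.
- The table presents predicted classes on the x-axis (columns) and actual classes on the y-axis (rows), which is the layout returned by confusion_matrix().
- Each cell holds the number of predictions made by the machine learning algorithm for that combination of actual and predicted class; the diagonal cells are the correct predictions.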

![image.png](attachment:image.png)

In [None]:
# Split data into training and test sets
test_size = 0.3
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=test_size, random_state=7)

In [None]:
# Instantiate a Logistic Regression classifier using 'liblinear' library with regularization applied.
model = LogisticRegression(solver='liblinear')

In [None]:
# Construct Confusion Matrix
model.fit(X_train, y_train)
predicted = model.predict(X_test)
matrix = confusion_matrix(y_test, predicted)
print(matrix)

**Observations**:
- The accuracy is 76.2%: 176 (130+46) of the 231 (130+17+38+46) test samples were predicted correctly.
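
The arithmetic in this observation can be reproduced from the four counts. In the sketch below, the placement of the two off-diagonal counts (17 and 38) is an assumption, since only the totals are quoted; the accuracy does not depend on that placement.

```python
import numpy as np

# Counts quoted in the observation; off-diagonal placement is assumed
matrix = np.array([[130, 17],
                   [38, 46]])

correct = np.trace(matrix)   # diagonal cells = correct predictions
total = matrix.sum()         # all cells = all predictions
accuracy = correct / total

print(f"{correct} / {total} = {accuracy:.3f}")
```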

## Classification Report
- The classification_report() function displays the precision, recall, f1-score and support for each class.
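
As a rough illustration of what these columns mean, the sketch below derives precision, recall, F1 and support for the positive class from counts of true positives, false positives and false negatives, then compares them with scikit-learn. The labels are made up for illustration.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

# Illustrative labels only (not taken from the diabetes dataset)
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

tp = np.sum((y_true == 1) & (y_pred == 1))  # true positives
fp = np.sum((y_true == 0) & (y_pred == 1))  # false positives
fn = np.sum((y_true == 1) & (y_pred == 0))  # false negatives

precision = tp / (tp + fp)   # of the predicted positives, how many are right
recall = tp / (tp + fn)      # of the actual positives, how many are found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
support = np.sum(y_true == 1)  # number of actual positives

print(precision, recall, f1, support)
```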

In [None]:
# Importing libraries
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import f1_score
from sklearn.metrics import classification_report

In [None]:
# Split data into training and test sets
test_size = 0.3
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=test_size, random_state=7)

In [None]:
# Instantiate a Logistic Regression classifier using 'liblinear' library with regularization applied.
model = LogisticRegression(solver='liblinear')

In [None]:
# Fit model
model.fit(X_train, y_train)
predicted = model.predict(X_test)

In [None]:
# Print Precision score of classifier
print(f"Precision Score of the classifier is: {precision_score(y_test, predicted)}")

In [None]:
# Print Recall score of classifier
print(f"Recall Score of the classifier is: {recall_score(y_test, predicted)}")

In [None]:
# Print F1 Score of classifier
print(f"F1 Score of the classifier is: {f1_score(y_test, predicted)}")

In [None]:
# Print Classification Report
report = classification_report(y_test, predicted)
print(report)

**Observations**:
- The report shows good precision and recall for the classifier.

# Regression Metrics

## Read in and Explore Data Set

In [None]:
# Import libraries
from sklearn.linear_model import LinearRegression

In [None]:
# Read in data
columns = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
# sep=r'\s+' splits on runs of whitespace (delim_whitespace is deprecated in recent pandas versions)
df = pd.read_csv('./data/boston/housing.txt', sep=r'\s+', names=columns)
df.head()

In [None]:
# Create features and target
array = df.values
X = array[:,0:13]
y = array[:,13]

In [None]:
# Print X shape and first 5 rows
print(X.shape, "\n", X[0:5])

In [None]:
# Print y shape and first 5 rows
print(y.shape, "\n", y[0:5])

## Mean Absolute Error (MAE)
- It is the average of the absolute differences between predictions and actual values. 
- It gives an idea of how wrong the predictions were.
- The measure gives an idea of the magnitude of the error, but no idea of the direction (e.g. over or under predicting).
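
A minimal sketch of the calculation, using made-up target values rather than the housing data. The last two lines illustrate that over-predicting and under-predicting by the same amount produce the same MAE.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Illustrative values only (not taken from the housing dataset)
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# MAE = mean of |prediction - actual| = (0.5 + 0 + 1.5 + 1.0) / 4
manual_mae = np.mean(np.abs(y_pred - y_true))
sklearn_mae = mean_absolute_error(y_true, y_pred)

# The direction of the error is lost: both cases below give the same MAE
mae_over = mean_absolute_error(y_true, y_true + 1.0)
mae_under = mean_absolute_error(y_true, y_true - 1.0)

print(manual_mae, sklearn_mae, mae_over, mae_under)
```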

In [None]:
# Define a K-Folds cross-validator.
# KFold does not shuffle by default; shuffle=True with random_state=7 gives a reproducible shuffled split into k folds.
# Each fold is then used once as the validation set while the k - 1 remaining folds form the training set.
kfold = model_selection.KFold(n_splits=10, random_state=7, shuffle=True)

In [None]:
# Instantiate a Linear Regression model.
model = LinearRegression()

In [None]:
# Evaluate a score by cross-validation. Scoring used is 'negative mean absolute error'
scoring = 'neg_mean_absolute_error'
results = model_selection.cross_val_score(model, X, y, cv=kfold, scoring=scoring)
print("MAE: \nMean value = %.3f \nStandard Deviation = %.3f" % (results.mean(), results.std()))

**Observations**:
- A value of 0 indicates no error or perfect predictions.
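- This metric is negated by the cross_val_score() function (scoring='neg_mean_absolute_error') so that larger values are better; the printed mean is therefore negative.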

## Mean Squared Error (MSE)
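- The Mean Squared Error (or MSE) is much like the mean absolute error in that it gives a general idea of the magnitude of error. Because the errors are squared before averaging, large errors are penalized more heavily than small ones.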
- Taking the square root of the mean squared error converts the units back to the original units of the output variable and can be meaningful for description and presentation. This is called the <b>Root Mean Squared Error (or RMSE)</b>.
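
The relationship between MSE and RMSE can be sketched with NumPy on a few made-up values (illustrative only, not from the housing data):

```python
import numpy as np

# Illustrative values only (not taken from the housing dataset)
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# MSE = mean of squared errors = (0.25 + 0 + 2.25 + 1.0) / 4
mse = np.mean((y_pred - y_true) ** 2)

# RMSE = square root of MSE, expressed in the units of the target
rmse = np.sqrt(mse)

print("MSE = %.3f, RMSE = %.3f" % (mse, rmse))
```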


In [None]:
# Define a K-Folds cross-validator.
# KFold does not shuffle by default; shuffle=True with random_state=7 gives a reproducible shuffled split into k folds.
# Each fold is then used once as the validation set while the k - 1 remaining folds form the training set.
kfold = model_selection.KFold(n_splits=10, random_state=7, shuffle=True)

In [None]:
# Instantiate a Linear Regression model.
model = LinearRegression()

In [None]:
# Evaluate a score by cross-validation. Scoring used is 'negative mean squared error'
scoring = 'neg_mean_squared_error'
results = model_selection.cross_val_score(model, X, y, cv=kfold, scoring=scoring)
print("MSE: \nMean value = %.3f \nStandard Deviation = %.3f" % (results.mean(), results.std()))

**Observations**:
- This metric is negated by the cross_val_score() function (scoring='neg_mean_squared_error'), so the printed values are negative.
- RMSE is calculated by taking the square root of the absolute value of the score.
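
A sketch of that conversion is shown below. It assumes a synthetic dataset from `make_regression` in place of the housing data, and the names `X_demo`, `y_demo` and `kfold_demo` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (assumption), not the housing dataset
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=7)

kfold_demo = KFold(n_splits=10, shuffle=True, random_state=7)
neg_mse = cross_val_score(
    LinearRegression(), X_demo, y_demo,
    cv=kfold_demo, scoring='neg_mean_squared_error',
)

mse_per_fold = -neg_mse               # flip the sign back to a conventional MSE
rmse_per_fold = np.sqrt(mse_per_fold) # RMSE for each fold, in target units

print("RMSE: mean = %.3f, std = %.3f" % (rmse_per_fold.mean(), rmse_per_fold.std()))
```

Note that the mean of the per-fold RMSE values is generally not the same as the square root of the mean MSE; either can be reported as long as it is stated which one is used.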

## R Squared Metric
- The R^2 (or R Squared) metric provides an indication of the goodness of fit of a set of predictions to the actual values. Also known as the <b>coefficient of determination</b>.
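- A value of 1 indicates a perfect fit, and a value of 0 indicates a model that does no better than always predicting the mean of the target. The value can be negative when the model fits worse than that.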
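
The sketch below computes R^2 by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares, and compares it with scikit-learn's `r2_score`. The values are made up for illustration; the last two calls show the score for a mean-only prediction and for a deliberately poor prediction.

```python
import numpy as np
from sklearn.metrics import r2_score

# Illustrative values only (not taken from the housing dataset)
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
manual_r2 = 1 - ss_res / ss_tot

# Always predicting the mean scores 0
r2_mean = r2_score(y_true, np.full_like(y_true, y_true.mean()))

# A prediction worse than the mean (here: the targets reversed) scores below 0
r2_bad = r2_score(y_true, y_true[::-1])

print("R^2 = %.3f, mean-only = %.3f, poor = %.3f" % (manual_r2, r2_mean, r2_bad))
```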

In [None]:
# Define a K-Folds cross-validator.
# KFold does not shuffle by default; shuffle=True with random_state=7 gives a reproducible shuffled split into k folds.
# Each fold is then used once as the validation set while the k - 1 remaining folds form the training set.
kfold = model_selection.KFold(n_splits=10, random_state=7, shuffle=True)

In [None]:
# Instantiate a Linear Regression model.
model = LinearRegression()

In [None]:
# Evaluate a score by cross-validation. Scoring used is 'r squared'
scoring = 'r2'
results = model_selection.cross_val_score(model, X, y, cv=kfold, scoring=scoring)
print("R^2: \nMean value = %.3f \nStandard Deviation = %.3f" % (results.mean(), results.std()))

**Observations**:
- With a mean R^2 of 0.718, the predictions fit the actual values reasonably well, although there is still a clear gap from a perfect fit of 1.
