## Performance Metrics
We are using two classic datasets:

* Pima Indians Diabetes Dataset – for binary classification.
* Boston Housing Dataset – for regression.

Both datasets are loaded with `pandas.read_csv()` and converted to NumPy arrays through the DataFrame's `.values` attribute. Each array is then split into input features (X) and a target output (Y) for training and evaluation.

In [1]:
from pandas import read_csv

# Pima Indians Diabetes Dataset
url = 'https://raw.githubusercontent.com/erojaso/MLMasteryEndToEnd/master/data/pima-indians-diabetes.data.csv'
column_names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = read_csv(url, names=column_names)
X1 = data.values[:, 0:8]
Y1 = data.values[:, 8]
print("Shape of X1:", X1.shape)
print("Shape of Y1:", Y1.shape)

# Boston House Price Dataset
url2 = 'https://raw.githubusercontent.com/erojaso/MLMasteryEndToEnd/master/data/housing.NAN.adjust.csv'
names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD',
         'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
df2 = read_csv(url2, names=names)
X2 = df2.values[:, 0:13]
Y2 = df2.values[:, 13]
print("Shape of X2:", X2.shape)
print("Shape of Y2:", Y2.shape)

Shape of X1: (768, 8)
Shape of Y1: (768,)
Shape of X2: (506, 13)
Shape of Y2: (506,)


## Classification Metrics (Pima Dataset)

### Classification Accuracy
Accuracy is the most intuitive performance measure: the ratio of correctly predicted observations to the total number of observations. It is most informative when the classes are roughly balanced.
We evaluate accuracy using 10-fold cross-validation with logistic regression.

In [2]:
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(solver='liblinear')
results = cross_val_score(model, X1, Y1, cv=KFold(n_splits=10), scoring='accuracy')
print("Accuracy: %.3f%%, Standard Deviation: (%.3f%%)" % (results.mean()*100.0, results.std()*100.0))

Accuracy: 76.951%, Standard Deviation: (4.841%)
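
To make the ratio concrete, here is a minimal sketch that computes accuracy by hand. The labels and predictions are hypothetical values chosen for illustration; they are not taken from the Pima dataset.

```python
# Hypothetical labels and predictions, for illustration only
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Accuracy = number of correct predictions / total number of predictions
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)
print("Accuracy: %.3f%%" % (accuracy * 100.0))
```

Six of the eight predictions match the labels, so the sketch reports 75%.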


### Logarithmic Loss (Log Loss)
Log loss measures the performance of a classifier whose predictions are probabilities between 0 and 1. It penalizes confident but wrong predictions most heavily.
Lower log loss indicates better performance. scikit-learn scorers follow the convention that higher is better, so `cross_val_score` reports the negated value (`neg_log_loss`): a score of -0.493 corresponds to a log loss of 0.493.



In [3]:
results = cross_val_score(model, X1, Y1, cv=KFold(n_splits=10), scoring='neg_log_loss')
print("Log Loss: %.3f, Standard Deviation: (%.3f)" % (results.mean(), results.std()))

Log Loss: -0.493, Standard Deviation: (0.047)
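
The calculation behind the score can be sketched in plain Python. The labels, the probabilities, and the helper name `log_loss_manual` are assumptions made for this example; in practice `sklearn.metrics.log_loss` is the usual choice.

```python
import math

# Hypothetical true labels and predicted probabilities of class 1
y_true = [1, 0, 1, 0]
p_pred = [0.9, 0.2, 0.6, 0.4]

def log_loss_manual(y_true, p_pred):
    """Average negative log-likelihood of the true labels (binary case)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

loss = log_loss_manual(y_true, p_pred)
neg_loss = -loss  # sign convention of the 'neg_log_loss' scorer
print("Log Loss: %.3f (as neg_log_loss: %.3f)" % (loss, neg_loss))
```

Flipping the sign of the `neg_log_loss` result recovers the conventional, non-negative log loss.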


### Area Under the ROC Curve (AUC)
The AUC metric summarizes the performance of a binary classifier over all classification thresholds. It measures the model's ability to distinguish between classes.
* AUC = 1: Perfect classifier
* AUC = 0.5: No discrimination (random guessing)

In [4]:
results = cross_val_score(model, X1, Y1, cv=KFold(n_splits=10), scoring='roc_auc')
print("AUC: %.3f, Standard Deviation: (%.3f)" % (results.mean(), results.std()))

AUC: 0.824, Standard Deviation: (0.041)
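
One way to read AUC is as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it that way on hypothetical scores; `auc_pairwise` is an illustrative helper, not a scikit-learn function, and its pairwise loop is only practical for small inputs.

```python
# Hypothetical labels and classifier scores, for illustration only
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

def auc_pairwise(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

auc = auc_pairwise(y_true, scores)
print("AUC: %.3f" % auc)
```

Here 8 of the 9 positive/negative pairs are ordered correctly, so the AUC is about 0.889.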


### Confusion Matrix
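A confusion matrix summarizes the prediction results of a classification model by counting how often each actual class was predicted as each class. In scikit-learn's output, rows correspond to actual classes and columns to predicted classes, so the diagonal holds the correct predictions and the off-diagonal cells show where the model is confused.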

In [5]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

X_train, X_test, Y_train, Y_test = train_test_split(X1, Y1, test_size=0.33, random_state=7)
model.fit(X_train, Y_train)
predicted = model.predict(X_test)
print(confusion_matrix(Y_test, predicted))

[[141  21]
 [ 41  51]]
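
As a quick check on how to read the matrix, the sketch below copies the four counts printed above and recovers the overall accuracy from them. The variable names are illustrative.

```python
# Counts copied from the confusion matrix output
# (rows = actual class, columns = predicted class)
matrix = [[141, 21],
          [41, 51]]

(tn, fp), (fn, tp) = matrix  # true neg, false pos, false neg, true pos
total = tn + fp + fn + tp

# Correct predictions sit on the diagonal of the matrix
accuracy = (tn + tp) / total
print("Test samples: %d, Accuracy: %.3f" % (total, accuracy))
```

The 254 test samples correspond to the 33% hold-out split of the 768 rows.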


### Classification Report
This report includes precision, recall, F1-score, and support for each class.

* Precision: Correct positive predictions out of total predicted positives

* Recall (Sensitivity): Correct positive predictions out of total actual positives

* F1-Score: Harmonic mean of precision and recall



In [6]:
from sklearn.metrics import classification_report
print(classification_report(Y_test, predicted))

              precision    recall  f1-score   support

         0.0       0.77      0.87      0.82       162
         1.0       0.71      0.55      0.62        92

    accuracy                           0.76       254
   macro avg       0.74      0.71      0.72       254
weighted avg       0.75      0.76      0.75       254
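
The figures for class 1.0 can be reproduced from the confusion matrix counts shown earlier (51 true positives, 21 false positives, 41 false negatives). This is a minimal sketch of the definitions, not a replacement for `classification_report`.

```python
# Counts for the positive class (1.0), taken from the confusion matrix
tp, fp, fn = 51, 21, 41

precision = tp / (tp + fp)  # correct positives / all predicted positives
recall = tp / (tp + fn)     # correct positives / all actual positives
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print("precision: %.2f, recall: %.2f, f1-score: %.2f" % (precision, recall, f1))
```

The rounded values match the 1.0 row of the report: 0.71, 0.55, and 0.62.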



## Regression Metrics (Boston Dataset)

### Mean Absolute Error (MAE)
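MAE measures the average magnitude of the errors in a set of predictions, without considering their direction. Because scikit-learn scorers treat higher values as better, `neg_mean_absolute_error` returns the negated MAE; the magnitude of the reported value is the error.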

In [7]:
from sklearn.linear_model import LinearRegression

model = LinearRegression()
results = cross_val_score(model, X2, Y2, cv=KFold(n_splits=10), scoring='neg_mean_absolute_error')
print("MAE: %.3f, Standard Deviation: (%.3f)" % (results.mean(), results.std()))

MAE: -4.034, Standard Deviation: (2.114)
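
For reference, MAE can be computed by hand in a few lines. The targets and predictions below are hypothetical values, not Boston housing data.

```python
# Hypothetical targets and predictions, for illustration only
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

# Average of the absolute differences between prediction and target
mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
print("MAE: %.3f" % mae)
```

The absolute errors are 0.5, 0, 1.5, and 1, giving an MAE of 0.75 in the same units as the target.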


### Mean Squared Error (MSE)
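MSE is the average of the squared differences between predicted and actual values. Squaring gives more weight to large errors and puts the result in squared units of the target. As with MAE, the `neg_mean_squared_error` scorer returns the negated value.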

In [8]:
results = cross_val_score(model, X2, Y2, cv=KFold(n_splits=10), scoring='neg_mean_squared_error')
print("MSE: %.3f, Standard Deviation: (%.3f)" % (results.mean(), results.std()))

MSE: -35.099, Standard Deviation: (45.493)
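
The sketch below computes MSE by hand on hypothetical values and takes its square root (RMSE) to return to the units of the target. The data are made up for illustration.

```python
import math

# Hypothetical targets and predictions, for illustration only
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

# Average of the squared differences between prediction and target
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
rmse = math.sqrt(mse)  # back in the original units of the target
print("MSE: %.3f, RMSE: %.3f" % (mse, rmse))
```

The largest error (1.5) contributes 2.25 of the 3.5 total squared error, which shows how MSE emphasizes large misses.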


### R² Score (Coefficient of Determination)
R² is the proportion of the variance in the dependent variable that is predictable from the independent variables.

* R² = 1: Perfect fit

* R² = 0: The model does no better than always predicting the mean of the target

* R² < 0: The model does worse than always predicting the mean

In [9]:
results = cross_val_score(model, X2, Y2, cv=KFold(n_splits=10), scoring='r2')
print("R² Score: %.3f, Standard Deviation: (%.3f)" % (results.mean(), results.std()))

R² Score: 0.190, Standard Deviation: (0.594)
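
The three cases listed for R² can be illustrated with a small hand computation of 1 - SS_res / SS_tot. The data and the helper name `r2_manual` are assumptions made for this sketch; `sklearn.metrics.r2_score` is the library equivalent.

```python
# Hypothetical targets, for illustration only
y_true = [3.0, 5.0, 2.5, 7.0]

def r2_manual(y_true, y_pred):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

mean_value = sum(y_true) / len(y_true)
r2_good = r2_manual(y_true, [2.5, 5.0, 4.0, 8.0])  # reasonable predictions
r2_mean = r2_manual(y_true, [mean_value] * 4)      # always predict the mean
r2_bad = r2_manual(y_true, [7.0, 2.5, 5.0, 3.0])   # worse than the mean

print("good: %.3f, mean: %.3f, bad: %.3f" % (r2_good, r2_mean, r2_bad))
```

The mean predictor scores exactly 0, and the deliberately poor predictions give a negative R², which helps explain how individual folds can pull a cross-validated average down.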
