In [25]:
# Top level imports
from sklearn.datasets import fetch_openml

import numpy as np

# Classification Metrics

Here I work through several classification metrics, using the equivalents in scikit-learn's `metrics` module as a benchmark to check that my implementations behave as expected.

First, I'll load MNIST as a default classification problem and fit an `SGDClassifier` to get some baseline predictions, then compare my hand-coded metrics against scikit-learn's.

In [2]:
mnist = fetch_openml('mnist_784', version=1)
X, y = mnist['data'], mnist['target']

In [3]:
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]

In [12]:
# make it a binary classification problem: is the digit a 2? (labels are strings)
y_train_2 = (y_train == '2')
y_test_2 = (y_test == '2')

In [13]:
from sklearn.linear_model import SGDClassifier

sgd_clf = SGDClassifier()
sgd_clf.fit(X_train, y_train_2)

SGDClassifier()

In [14]:
y_pred_2 = sgd_clf.predict(X_test)

## Accuracy

Accuracy is defined as the number of correct predictions divided by the total number of predictions made. Below I check scikit-learn's version against my own to see how my implementation performs.
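To make the definition concrete, here is a minimal NumPy sketch of the calculation. The name `accuracy_sketch` is hypothetical and used only for illustration; it is not necessarily how `machine_learning.metrics.accuracy` is written.

```python
import numpy as np


def accuracy_sketch(y_true, y_pred):
    """Fraction of predictions that match the true labels (hypothetical helper)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    # Element-wise comparison gives a boolean array; its mean is the accuracy.
    return float(np.mean(y_true == y_pred))


# 3 of the 4 predictions match the labels
print(accuracy_sketch([True, False, True, True], [True, False, False, True]))  # 0.75
```

Taking the mean of the boolean comparison is equivalent to counting the correct predictions and dividing by the total number of predictions.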



In [20]:
# sklearn
from sklearn.metrics import accuracy_score

accuracy_score(y_test_2, y_pred_2)

0.9677

In [21]:
# my version
from machine_learning.metrics import accuracy

accuracy(y_test_2, y_pred_2)

0.9677

It works, which is good. However, accuracy is widely considered a flawed metric for evaluating classifiers, because it copes poorly with datasets whose target classes are imbalanced. Imagine a dataset where 99% of the samples have a target of `0` and 1% have `1`. You can build a 99% accurate classifier by predicting `0` for every single instance. A practical example of this is shown below:

In [23]:
# fraction of positive (digit 2) samples in the train and test sets
print(f'Train set = {y_train_2.sum()/len(y_train_2)}')
print(f'Test set = {y_test_2.sum()/len(y_test_2)}')

Train set = 0.0993
Test set = 0.1032


In [27]:
# create an array of always false predictions
y_pred_never_2 = np.zeros(len(y_test_2), dtype=bool)

In [28]:
# evaluate this with our accuracy metric
accuracy(y_test_2, y_pred_never_2)

0.8968

As shown, a poor classifier can still score high accuracy: the always-false predictor reaches 0.8968 without ever identifying a single 2. More nuanced metrics should therefore be sought for a proper evaluation of a classifier.
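The hypothetical 99%/1% scenario described earlier can be reproduced in the same way. This is a small sketch using synthetic labels (invented for illustration, not taken from MNIST), showing that an always-`0` predictor reaches 99% accuracy while finding none of the positive samples.

```python
import numpy as np

# Synthetic labels (an assumption for illustration): 990 negatives, 10 positives.
y_true = np.concatenate([np.zeros(990, dtype=int), np.ones(10, dtype=int)])

# A "classifier" that predicts 0 for every instance.
y_pred_always_0 = np.zeros_like(y_true)

# Accuracy: share of predictions that match the labels.
baseline_accuracy = float(np.mean(y_true == y_pred_always_0))

# Number of positive samples the predictor actually identified.
positives_found = int(np.sum((y_true == 1) & (y_pred_always_0 == 1)))

print(baseline_accuracy)  # 0.99
print(positives_found)    # 0
```

The accuracy figure alone hides the fact that every positive sample was missed, which is the kind of failure a more nuanced metric needs to expose.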