# Machine Learning Metrics

## Determining Model Accuracy

It's very common to train a variety of models, apply each to a held-out sample,
and score the results. Sometimes a third held-out set is used to test the chosen
model on _completely_ new data. Typically, these two held-out sets are called the
_validation_ and _test_ sets, respectively. To evaluate a model, you first need to
select a metric that suits the dataset.

Datasets in which some labels occur only rarely (imbalanced classes) can make a
model look far better than it is if a less nuanced metric, such as plain
accuracy, is used to measure it.
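To make the imbalance problem concrete, here is a minimal sketch on hypothetical data (95 negatives, 5 positives). It uses scikit-learn's `DummyClassifier`, which ignores the features and always predicts the most frequent class. For brevity it is scored on the data it was fitted on. The baseline reaches high accuracy while finding none of the rare positives, that is, its recall is zero:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical imbalanced labels: 95 negatives (0) and 5 positives (1)
y = np.array([0] * 95 + [1] * 5)
X = np.zeros((len(y), 1))  # features are irrelevant to this baseline

# A baseline that always predicts the most frequent class (0)
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X, y)
y_pred = baseline.predict(X)

print("Accuracy:", accuracy_score(y, y_pred))          # 0.95
print("Recall (class 1):", recall_score(y, y_pred))    # 0.0
```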

### Accuracy

Simple accuracy is the fraction of a classifier's discrete predicted labels that
match the true labels. It applies to classification; continuous regression
outputs are scored with other metrics.
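Written out, for $n$ examples with true labels $y_i$ and predicted labels $\hat{y}_i$ (notation introduced here for illustration), accuracy is the mean of an indicator that is 1 when a prediction matches its label and 0 otherwise:

```latex
\text{accuracy} = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\left[\hat{y}_i = y_i\right]
```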

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Simple synthetic data: two clusters labelled 1 and 2
training_points = [[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]]
training_labels = [1, 1, 1, 2, 2, 2]
X = np.array(training_points)
Y = np.array(training_labels)

# Gaussian Naive Bayes classifier
gnb = GaussianNB()
gnb.fit(X, Y)

# Classify test data with the classifier
test_points = [[1, 1], [2, 2], [3, 3], [4, 3]]
test_labels = [2, 2, 2, 1]
predicts = gnb.predict(test_points)

# Count the predictions that match the true labels
count = sum(1 for label, pred in zip(test_labels, predicts) if label == pred)
print("Accuracy Rate (manually calculated): %f" % (count / len(test_labels)))
print("Accuracy Rate (via accuracy_score()): %f" % accuracy_score(test_labels, predicts))
```

```
Accuracy Rate (manually calculated): 0.750000
Accuracy Rate (via accuracy_score()): 0.750000
```


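### What about accuracy and cross validation?

A single train/test split gives only one, possibly noisy, estimate. _k_-fold
cross validation splits the data into _k_ folds, trains on _k_-1 of them, scores
on the remaining fold, and repeats until every fold has been used once for
testing. The _k_ scores are then averaged.
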
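```python
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score

# 5-fold cross validation of the Gaussian Naive Bayes model from above
kf = KFold(n_splits=5)
result = cross_val_score(gnb, X, Y, cv=kf)

print("Avg accuracy: {}".format(result.mean()))
```

```
Avg accuracy: 0.4
```

With only six samples, `KFold(n_splits=5)` puts two points in the first test
fold and one in each of the others. Because `KFold` does not shuffle by default
and the data are sorted by label, some training folds contain only a single
example of a class. The 0.4 average is therefore a very noisy estimate.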
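To show what `cross_val_score` does under the hood, here is a minimal sketch that runs the same 5-fold loop by hand on the same synthetic data. It fits a fresh `GaussianNB` on each fold's training indices, scores it with `accuracy_score` on the held-out indices, and compares the per-fold scores with `cross_val_score`. That function clones the estimator for each fold and, for a classifier, defaults to accuracy scoring:

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

# Same synthetic data as the accuracy example
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Y = np.array([1, 1, 1, 2, 2, 2])

kf = KFold(n_splits=5)
fold_scores = []
for train_idx, test_idx in kf.split(X):
    # Train a fresh, unfitted model on this fold's training indices only
    model = GaussianNB()
    model.fit(X[train_idx], Y[train_idx])
    # Score it on the held-out fold
    fold_scores.append(accuracy_score(Y[test_idx], model.predict(X[test_idx])))

reference = cross_val_score(GaussianNB(), X, Y, cv=kf)
print("Per-fold (manual):         ", fold_scores)
print("Per-fold (cross_val_score):", reference)
print("Avg accuracy:", np.mean(fold_scores))
```

Passing `KFold(n_splits=5, shuffle=True, random_state=0)` or a `StratifiedKFold` instead would mix the labels across folds. With only six points, though, any estimate stays rough.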


### Recall, Precision and F-Measure, Oh My!

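Recall, precision and F-measure give a deeper view of your model's
effectiveness than accuracy alone, particularly when classes are imbalanced.
Each is computed for a chosen positive class. Here TP, FP and FN count the true
positives, false positives and false negatives.

_Recall_ is the fraction of the truly positive examples that your model labelled
as positive: TP / (TP + FN). It answers "how many of the real positives did we find?"

_Precision_ is the fraction of the examples your model labelled as positive that
really are positive: TP / (TP + FP). It answers "how many of our positive
predictions were right?"

_F-measure_ (F1) is the harmonic mean of the two: 2 · precision · recall / (precision + recall).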
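The formulas above can be checked by hand. A minimal sketch on ten hypothetical binary labels (1 is the positive class) counts TP, FP and FN directly, applies the formulas, and compares the results with scikit-learn's `precision_score`, `recall_score` and `f1_score`:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical binary labels (1 = positive class)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# Count true positives, false positives and false negatives by hand
pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # 3
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # 2
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # 1

precision = tp / (tp + fp)                          # 3 / 5 = 0.6
recall = tp / (tp + fn)                             # 3 / 4 = 0.75
f1 = 2 * precision * recall / (precision + recall)  # 2 / 3

print("Manual:  P=%.3f R=%.3f F1=%.3f" % (precision, recall, f1))
print("sklearn: P=%.3f R=%.3f F1=%.3f" % (
    precision_score(y_true, y_pred),
    recall_score(y_true, y_pred),
    f1_score(y_true, y_pred),
))
```

Both lines print `P=0.600 R=0.750 F1=0.667`. Predicting two extra positives costs precision, and missing one real positive costs recall.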

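```python
from sklearn import datasets
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Load the breast cancer dataset (target 0 = malignant, 1 = benign)
bc = datasets.load_breast_cancer()
X = bc.data
Y = bc.target

# Hold out 30% of the data, preserving the class ratio in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.30, random_state=1, stratify=Y)

# Standardize features using statistics from the training set only
sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)

# Fit the SVC model on the standardized training data
svc = SVC(kernel='linear', C=10.0, random_state=1)
svc.fit(X_train_std, y_train)

# Get the predictions for the standardized test data
y_pred = svc.predict(X_test_std)

# Score the predictions; by default the positive class is label 1 (benign).
# Pass pos_label=0 to the first three scorers to treat malignant as positive.
print("Precision: %.3f" % precision_score(y_test, y_pred))
print("Recall:    %.3f" % recall_score(y_test, y_pred))
print("F1:        %.3f" % f1_score(y_test, y_pred))
print("Accuracy:  %.3f" % accuracy_score(y_test, y_pred))
```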


### Confusion Matrix
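A confusion matrix tabulates predictions against true labels, and the counts behind precision and recall can be read straight off it. Here is a minimal sketch with scikit-learn's `confusion_matrix` on hypothetical toy labels, not taken from the models above. For binary labels 0/1, rows are true classes and columns are predicted classes, so `ravel()` yields TN, FP, FN, TP in that order:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical binary labels (1 = positive class)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[4 2]
#  [1 3]]

# Unpack the four cells and derive precision and recall from them
tn, fp, fn, tp = cm.ravel()
precision = tp / (tp + fp)  # 3 / 5
recall = tp / (tp + fn)     # 3 / 4
print("Precision:", precision, "Recall:", recall)
```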


### ROC & AUC
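An ROC curve plots the true positive rate against the false positive rate as the decision threshold on a classifier's score is varied, and AUC is the area under that curve. Here is a minimal sketch using scikit-learn's `roc_curve` and `roc_auc_score` on four hypothetical scored examples. The final lines cross-check AUC against its pairwise interpretation: the probability that a randomly chosen positive is scored higher than a randomly chosen negative.

```python
from itertools import product

from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical true labels and classifier scores (higher = more likely positive)
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# Points on the ROC curve: false positive rate vs true positive rate per threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("FPR:", fpr)
print("TPR:", tpr)

auc = roc_auc_score(y_true, y_score)
print("AUC:", auc)  # 0.75

# Cross-check: AUC equals the fraction of (positive, negative) pairs in which
# the positive example gets the higher score (a tie would count as 1/2)
pos = [s for s, y in zip(y_score, y_true) if y == 1]
neg = [s for s, y in zip(y_score, y_true) if y == 0]
pairs = list(product(pos, neg))
wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs)
print("Pairwise estimate:", wins / len(pairs))  # 0.75
```

Here three of the four positive/negative pairs are ranked correctly. Only the positive scored 0.35 falls below the negative scored 0.4.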

## Problem-Specific Metrics

### ROUGE & BLEU

### Clustering

### Question Answering