# Machine Learning Metrics

## Determining Model Accuracy

It's very common to train a variety of models, apply each to a held-out sample,
and score the results. Sometimes a third set, also held out, is used to test a
model on _completely_ new data. Typically, these two held-out sets are called
the _validation_ and _test_ sets, respectively. To evaluate a model, a metric
suited to the dataset needs to be selected.
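
As a minimal sketch of the split described above, the following carves a test set and a validation set out of one dataset using scikit-learn's `train_test_split`. The data, the names `features` and `targets`, and the 60/20/20 sizes are assumptions chosen purely for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 samples with 2 features and a binary label.
features = np.arange(200).reshape(100, 2)
targets = np.array([0, 1] * 50)

# First carve off the test set (20 samples), then split the remainder
# into training (60) and validation (20) sets.
X_rest, X_test, y_rest, y_test = train_test_split(
    features, targets, test_size=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=20, random_state=0)

print(len(X_train), len(X_val), len(X_test))
```

Models would be fitted on the training set and compared on the validation set, with the test set touched only once, for the final check.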

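Datasets in which some labels occur only rarely can make a model look better
than it is if the chosen metric is too coarse. For example, a model that always
predicts the majority label scores high accuracy while never identifying the
rare label.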
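
To make the risk concrete, here is a small sketch with a made-up dataset of 95 negatives and 5 positives, scored against a hypothetical "model" that always predicts the majority label. The numbers are invented for illustration only.

```python
# Hypothetical imbalanced dataset: 95 negatives (0) and 5 positives (1).
labels = [0] * 95 + [1] * 5

# A "model" that ignores its input and always predicts the majority label.
predictions = [0] * len(labels)

# Accuracy: share of all predictions that match the true label.
correct = sum(y == p for y, p in zip(labels, predictions))
accuracy = correct / len(labels)

# Share of the rare positives that the model actually found.
positives_found = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
positive_recall = positives_found / labels.count(1)

print("Accuracy: %.2f" % accuracy)
print("Share of rare label found: %.2f" % positive_recall)
```

The accuracy looks strong even though not a single rare example is detected, which is why the metrics in the following sections are often preferred for skewed data.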

### Accuracy

In [None]:
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Simple synthetic data
training_points = [[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]]
training_labels = [1, 1, 1, 2, 2, 2]
X = np.array(training_points)
Y = np.array(training_labels)

# Gaussian Naive Bayes classifier
gnb = GaussianNB()
gnb.fit(X, Y)

# Classify test data with the classifier
test_points = [[1, 1], [2, 2], [3, 3], [4, 3]]
test_labels = [2, 2, 2, 1]
predicts = gnb.predict(test_points)

# Accuracy = number of correct predictions / total number of predictions
count = sum(label == predicted for label, predicted in zip(test_labels, predicts))
print("Accuracy Rate (manually calculated): %f" % (count / len(test_labels)))
print("Accuracy Rate (via accuracy_score()): %f" % accuracy_score(test_labels, predicts))

### Recall & Precision
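
As a hedged sketch for this section: precision is the share of predicted positives that are truly positive, and recall is the share of true positives that the model finds. The labels below are invented for illustration. The manual counts are compared against scikit-learn's `precision_score` and `recall_score`, which treat label `1` as the positive class by default.

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical binary labels (1 = positive, 0 = negative).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

# Count true positives, false positives and false negatives by hand.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

precision = tp / (tp + fp)  # TP / (TP + FP)
recall = tp / (tp + fn)     # TP / (TP + FN)

print("Precision (manual): %f" % precision)
print("Precision (via precision_score()): %f" % precision_score(y_true, y_pred))
print("Recall (manual): %f" % recall)
print("Recall (via recall_score()): %f" % recall_score(y_true, y_pred))
```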


### Confusion Matrix
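
A minimal sketch, using invented labels: scikit-learn's `confusion_matrix` tabulates true labels (rows) against predicted labels (columns). For binary labels `0` and `1`, the four cells are the true negatives, false positives, false negatives and true positives.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical binary labels (1 = positive, 0 = negative).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

# Rows are true labels, columns are predicted labels, both sorted (0, then 1).
cm = confusion_matrix(y_true, y_pred)
print(cm)

# For the binary case the matrix flattens to TN, FP, FN, TP.
tn, fp, fn, tp = cm.ravel()
print("TN=%d FP=%d FN=%d TP=%d" % (tn, fp, fn, tp))
```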


### ROC & AUC
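
As a sketch under stated assumptions: the ROC curve plots the true positive rate against the false positive rate as the decision threshold over the model's scores is varied, and the AUC is the area under that curve. The labels and scores below are made up for illustration. `roc_curve`, `roc_auc_score` and `auc` come from `sklearn.metrics`.

```python
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Hypothetical true labels and classifier scores (higher = more likely positive).
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# False positive rate and true positive rate at each score threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Area under the ROC curve, computed two equivalent ways.
area_from_curve = auc(fpr, tpr)
area_direct = roc_auc_score(y_true, y_score)

print("AUC (via auc() on the curve): %f" % area_from_curve)
print("AUC (via roc_auc_score()): %f" % area_direct)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of positives above negatives.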

## Problem-Specific Metrics

### ROUGE & BLEU

### Clustering

### Question Answering