# Training and Testing

Benefits of testing: 
- Gives estimate of performance on an independent dataset
- Serves as a check on overfitting

## Train/Test Split in sklearn

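In sklearn, train_test_split sits with the cross-validation tools. Older releases expose it in sklearn.cross_validation; newer releases moved it to sklearn.model_selection.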

In [1]:
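import numpy as np
# train_test_split now lives in model_selection (sklearn.cross_validation was removed)
from sklearn import model_selection
from sklearn import datasets
from sklearn import svm

# Load the iris dataset: 150 samples with 4 features each
iris = datasets.load_iris()
iris.data.shape, iris.target.shape

((150, 4), (150,))

In [2]:
# Hold out 40% of the samples for testing; random_state makes the shuffle repeatable
X_train, X_test, y_train, y_test \
    = model_selection.train_test_split(iris.data, iris.target, test_size=0.4, random_state=0)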

In [3]:
X_train.shape, y_train.shape

((90, 4), (90,))

In [4]:
X_test.shape, y_test.shape

((60, 4), (60,))

In [5]:
# Fit a linear SVM on the training set, then report mean accuracy on the held-out test set
clf = svm.SVC(kernel='linear', C=1).fit(X_train, y_train)
clf.score(X_test, y_test)

0.96666666666666667
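
As a check on overfitting, the test score can be compared with the score on the training data itself. The following is a minimal sketch rather than part of the original notebook: the choice of an unpruned decision tree and the names train_acc and test_acc are assumptions made for illustration.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=0)

# An unpruned decision tree is flexible enough to memorise its training set
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# train_acc and test_acc are illustrative names, not from the original notes
train_acc = clf.score(X_train, y_train)  # accuracy on data the model has already seen
test_acc = clf.score(X_test, y_test)     # estimate of accuracy on independent data

print("train accuracy:", train_acc)
print("test accuracy: ", test_acc)
```

A training score well above the test score suggests the model has fit quirks of the training data that do not generalise.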

# Evaluation Metrics


### 1. Accuracy 
Accuracy = (no. of items labelled correctly / total no. of items)

Shortcomings:
- Not ideal for skewed classes (e.g. very few Persons of Interest): a classifier that labels every item as the majority class still scores high accuracy without finding a single member of the rare class.
- Ignores asymmetric costs: you may want to err on the side of guessing innocent (or guilty, depending on the consequences of the label), but accuracy weighs both types of error equally.
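
The skewed-class shortcoming can be illustrated with made-up labels. This is a sketch under assumed numbers (95 innocent people, 5 Persons of Interest), with accuracy computed by sklearn's accuracy_score, i.e. correct labels over all items; none of the data comes from the original notes.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical skewed dataset: 95 innocent (0) and 5 Persons of Interest (1)
y_true = np.array([0] * 95 + [1] * 5)

# A "classifier" that always guesses the majority class
y_pred = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_pred)                      # fraction of all items labelled correctly
pois_found = int(np.sum((y_true == 1) & (y_pred == 1)))   # POIs the classifier actually flagged

print(acc)         # 0.95
print(pois_found)  # 0
```

The accuracy looks strong even though the classifier never identifies a Person of Interest.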

## Confusion Matrix

[Confusion Matrix](images/14-01.png)

Note: Tuning the classifier's parameters moves the decision boundary, which shifts items between the cells of the confusion matrix.

[Decision Tree Confusion Matrix](images/14-02.png)

[7x7 Confusion Matrix](images/14-03.png)
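
A confusion matrix can also be computed directly in sklearn. The sketch below assumes the iris data and a linear SVM purely for illustration, and uses sklearn.metrics.confusion_matrix, which places true classes in rows and predicted classes in columns.

```python
from sklearn import datasets, svm
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=0)

clf = svm.SVC(kernel='linear', C=1).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# cm[i, j] = number of test items whose true class is i and predicted class is j
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Correct predictions sit on the diagonal, so diagonal / total is the accuracy
diag_accuracy = cm.trace() / cm.sum()
print(diag_accuracy)
```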

### Recall: P(alg identifies as A | is A)
- Computed along a row of the confusion matrix (true classes in rows, predictions in columns).
- Mnemonic: 'recall' backwards reads roughly like 'lacer', which sounds like 'liar'; the opposite of a lie is the truth, so the denominator is the true values.
- Recall is about finding X, i.e. P(finding X | ...)
- Recall = TP/(TP + FN)
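
The row-wise reading of recall can be sketched with numpy. The confusion matrix below is hypothetical, and the variable names are chosen for illustration only.

```python
import numpy as np

# Hypothetical confusion matrix: true classes in rows, predictions in columns
cm = np.array([[8, 2, 0],
               [1, 6, 3],
               [0, 1, 9]])

true_positives = np.diag(cm)      # items of each class labelled correctly
items_in_class = cm.sum(axis=1)   # row sums = TP + FN for each class

recall = true_positives / items_in_class
print(recall)  # [0.8 0.6 0.9]
```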

### Precision: P(is A | alg identifies as A)
- Computed down a column of the confusion matrix (true classes in rows, predictions in columns).
- Mnemonic: starts with 'pre', so the denominator is the predicted values.
- Precision = TP/(TP + FP)
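
The column-wise reading of precision can be checked against sklearn. This is a sketch on made-up labels; by_hand and by_sklearn are illustrative names, and average=None asks precision_score for one value per class.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score

# Hypothetical true and predicted labels for a two-class problem
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y_pred = np.array([1, 1, 0, 1, 0, 0, 0, 1])

cm = confusion_matrix(y_true, y_pred)   # true classes in rows, predictions in columns
predicted_as_class = cm.sum(axis=0)     # column sums = TP + FP for each class
by_hand = np.diag(cm) / predicted_as_class

by_sklearn = precision_score(y_true, y_pred, average=None)  # one precision per class
print(by_hand)
print(by_sklearn)
```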

### True positives, false positives, false negatives
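
For a chosen positive class, a true positive (TP) is an item predicted positive that really is positive, a false positive (FP) is predicted positive but is actually negative, and a false negative (FN) is predicted negative but is actually positive. The counts can be sketched with numpy on hypothetical labels, where 1 is assumed to mark a Person of Interest.

```python
import numpy as np

# Hypothetical labels: 1 = Person of Interest (positive class), 0 = not
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])

tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # predicted positive, actually positive
fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # predicted positive, actually negative
fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # predicted negative, actually positive

print(tp, fp, fn)  # 3 1 1
```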

## F1 Score
The harmonic mean of precision and recall.

$$F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$
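
The formula can be checked against sklearn's f1_score. The labels below are made up for illustration, and f1_by_hand is a hypothetical name.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical labels: 3 true positives, 2 false positives, 1 false negative
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]

precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3 / 5
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 3 / 4

# Harmonic mean of precision and recall
f1_by_hand = 2 * precision * recall / (precision + recall)
print(f1_by_hand)
print(f1_score(y_true, y_pred))
```

Because it is a harmonic mean, the result (about 0.667 here) sits below the arithmetic mean of 0.6 and 0.75, pulled towards the weaker of the two scores.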
