### Intro

* Every estimator has weaknesses: its error comes from bias, variance, and noise in the data
* The tools below help locate where a model is weak

* Demos:
[underfit/overfit](plot_underfitting_overfitting.ipynb) | [validation curve](plot_validation_curve.ipynb) | [learning curve](plot_learning_curve.ipynb)
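As a rough illustration of the bias/variance trade-off, the sketch below fits polynomials of increasing degree to noisy samples of a cosine and compares training error with error on held-out points. The target function, noise level, sample counts, and degrees are all assumptions chosen for illustration; they are not taken from the linked demos.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def true_fn(x):
    # Assumed ground-truth function for this sketch
    return np.cos(1.5 * np.pi * x)

# Noisy training and held-out samples drawn from the same distribution
x_train = np.sort(rng.uniform(0, 1, 30))
y_train = true_fn(x_train) + rng.normal(0, 0.1, 30)
x_valid = np.sort(rng.uniform(0, 1, 30))
y_valid = true_fn(x_valid) + rng.normal(0, 0.1, 30)

train_mse, valid_mse = {}, {}
for degree in (1, 4, 15):
    # Least-squares polynomial fit on the training samples only
    model = Polynomial.fit(x_train, y_train, degree)
    train_mse[degree] = float(np.mean((model(x_train) - y_train) ** 2))
    valid_mse[degree] = float(np.mean((model(x_valid) - y_valid) ** 2))
    print(f"degree={degree:2d}  train MSE={train_mse[degree]:.4f}  "
          f"valid MSE={valid_mse[degree]:.4f}")
```

A high error on both sets (typically the lowest degree) suggests underfitting, while a low training error paired with a much higher held-out error suggests overfitting.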


### [Validation curve](http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.validation_curve.html#sklearn.model_selection.validation_curve) | [demo](plot_validation_curve.ipynb)

* use case: plot the influence of a single hyperparameter on the training score and the validation score
* if train score = low  and valid score = low ==> underfit
* if train score = high and valid score = low ==> overfit
* if train score = high and valid score = high ==> the estimator is working well

In [1]:
# example: score Ridge on the iris data for three values of alpha

import numpy as np
from sklearn.model_selection import validation_curve
from sklearn.datasets import load_iris
from sklearn.linear_model import Ridge

# shuffle the samples with a fixed seed (iris is stored sorted by class)
np.random.seed(0)
iris = load_iris()
X, y = iris.data, iris.target
indices = np.arange(y.shape[0])
np.random.shuffle(indices)
X, y = X[indices], y[indices]

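# param_name and param_range must be passed by keyword in recent scikit-learn.
# cv=3 matches the three score columns printed below (newer versions default to 5 folds).
train_scores, valid_scores = validation_curve(Ridge(), X, y,
                                              param_name="alpha",
                                              param_range=np.logspace(-7, 3, 3),
                                              cv=3)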

print(train_scores)
print(valid_scores)

[[ 0.94141575  0.92944161  0.92267644]
 [ 0.94141563  0.92944153  0.92267633]
 [ 0.47253778  0.45601093  0.42887489]]
[[ 0.90335825  0.92525985  0.94159336]
 [ 0.90338529  0.92523396  0.94159078]
 [ 0.44639995  0.39639757  0.4567671 ]]
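Each row of the arrays above corresponds to one value of `alpha` and each column to one cross-validation fold. A minimal sketch of how the rules from the bullet list could be applied to these scores follows. The `diagnose` helper and its `low` and `gap` thresholds are hypothetical and chosen only for illustration; the score arrays are copied from the output above.

```python
import numpy as np

# Scores copied from the printed output (rows: alpha values, columns: folds)
train_scores = np.array([[0.94141575, 0.92944161, 0.92267644],
                         [0.94141563, 0.92944153, 0.92267633],
                         [0.47253778, 0.45601093, 0.42887489]])
valid_scores = np.array([[0.90335825, 0.92525985, 0.94159336],
                         [0.90338529, 0.92523396, 0.94159078],
                         [0.44639995, 0.39639757, 0.4567671]])
alphas = np.logspace(-7, 3, 3)  # same grid as in the example above

def diagnose(train_row, valid_row, low=0.6, gap=0.1):
    """Hypothetical helper: label one parameter setting from its fold scores.

    `low` and `gap` are arbitrary illustrative thresholds, not scikit-learn values.
    """
    train_mean, valid_mean = np.mean(train_row), np.mean(valid_row)
    if train_mean < low and valid_mean < low:
        return "underfit"
    if train_mean - valid_mean > gap:
        return "overfit"
    return "ok"

labels = [diagnose(t, v) for t, v in zip(train_scores, valid_scores)]
for alpha, t, v, label in zip(alphas, train_scores, valid_scores, labels):
    print(f"alpha={alpha:g}: train={t.mean():.3f} valid={v.mean():.3f} -> {label}")
```

With these thresholds, the largest `alpha` is flagged as underfitting because both mean scores drop well below the other two settings.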


### [Learning curve](http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.learning_curve.html#sklearn.model_selection.learning_curve) | [demo](plot_learning_curve.ipynb)

* returns validation and training scores for an estimator as the number of training samples varies (how much does the model benefit from more training data?)

In [3]:
# learning curve

from sklearn.model_selection import learning_curve
from sklearn.svm import SVC

train_sizes, train_scores, valid_scores = learning_curve(
    SVC(kernel='linear'),
    X, y,
    train_sizes=[50, 80, 110],
    cv=5)

print("train sizes:\n", train_sizes)
print("train scores:\n", train_scores)
print("valid scores:\n", valid_scores)

train sizes:
 [ 50  80 110]
train scores:
 [[ 0.98        0.98        0.98        0.98        0.98      ]
 [ 0.9875      1.          0.9875      0.9875      0.9875    ]
 [ 0.98181818  1.          0.98181818  0.98181818  0.99090909]]
valid scores:
 [[ 1.          0.93333333  1.          1.          0.96666667]
 [ 1.          0.96666667  1.          1.          0.96666667]
 [ 1.          0.96666667  1.          1.          0.96666667]]
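To answer the "how much benefit from additional training?" question, the per-fold scores can be averaged for each training-set size and the change in validation score compared between sizes. The sketch below does this with the arrays copied from the output above; it is one possible way to summarize them, not part of the `learning_curve` API.

```python
import numpy as np

# Values copied from the printed output (rows: training sizes, columns: folds)
train_sizes = np.array([50, 80, 110])
train_scores = np.array([[0.98, 0.98, 0.98, 0.98, 0.98],
                         [0.9875, 1.0, 0.9875, 0.9875, 0.9875],
                         [0.98181818, 1.0, 0.98181818, 0.98181818, 0.99090909]])
valid_scores = np.array([[1.0, 0.93333333, 1.0, 1.0, 0.96666667],
                         [1.0, 0.96666667, 1.0, 1.0, 0.96666667],
                         [1.0, 0.96666667, 1.0, 1.0, 0.96666667]])

# Average over folds for each training-set size
train_mean = train_scores.mean(axis=1)
valid_mean = valid_scores.mean(axis=1)
valid_std = valid_scores.std(axis=1)

# Change in mean validation score between consecutive training sizes
gain = np.diff(valid_mean)

for n, tr, va, sd in zip(train_sizes, train_mean, valid_mean, valid_std):
    print(f"{int(n):>4d} samples: train={tr:.3f}  valid={va:.3f} +/- {sd:.3f}")
for n_from, n_to, g in zip(train_sizes[:-1], train_sizes[1:], gain):
    print(f"{int(n_from)} -> {int(n_to)} samples: validation gain {g:+.4f}")
```

A gain close to zero between the two largest sizes suggests the validation score has plateaued, so more samples of the same kind are unlikely to help much.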
