# Notes on Chapter 6: Best Practices for Model Evaluation and Hyperparameter Tuning

So far, Raschka has introduced us to a world of different models for classifying data and compressing features. But how do we know how well any given model is performing, and how can we figure out how to improve performance when it's bad?

In Chapter 6, we explore many different techniques for improving model performance. Broadly, our focus includes:

- Estimating model performance
- Diagnosing common problems
- Fine-tuning models by adjusting hyperparameters
- Getting familiar with different performance metrics

The specific techniques we'll cover are:

1. [Data-processing pipelines](#Data-processing-pipelines): chaining algorithms together
2. Cross-validation: robust measures of performance
3. Learning and validation curves: measuring bias and variance
4. Grid search: tuning hyperparameters
5. Nested cross-validation: selecting good algorithms
6. Performance metrics: different ways of judging "good" and "bad" models

## Data-processing pipelines

Many preprocessing techniques (like PCA) learn parameters from the training data, and those same parameters must be reused when transforming validation and test data in order to produce sensible results. To help standardize this procedure, we can build **pipelines** that record our transformation steps and allow us to reuse them across training, validation, and test sets (or even on new samples from the same population). Pipelines also encourage an object-oriented approach to model-building.
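To see what "reusing parameters" means without a pipeline, here is a minimal sketch that does the bookkeeping by hand with `StandardScaler`. The toy arrays `X_train` and `X_test` are made up for illustration and are not from the chapter.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data (one feature), invented for this illustration.
X_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[4.0]])

scaler = StandardScaler()

# fit_transform learns the mean and standard deviation from the training data
# and immediately applies them to it.
X_train_std = scaler.fit_transform(X_train)

# transform reuses the training statistics; it does not refit on the test data.
X_test_std = scaler.transform(X_test)

print(scaler.mean_)   # mean learned from X_train only
print(X_test_std)     # test sample expressed in training-set units
```

Calling `fit_transform` on the test set instead would silently rescale it with its own statistics, which is exactly the kind of mistake a pipeline is meant to prevent.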

In scikit-learn, we can use the [`Pipeline`](http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html) class to construct a pipeline:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Each step is a (name, estimator) pair; the estimators must be instances, not classes.
pipeline = Pipeline([('scaler', StandardScaler()),
                     ('clf', LogisticRegression())])
```

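Note that a `Pipeline` exposes the usual estimator methods such as `fit()` and `predict()` (`fit_transform()` is available only when the final step is itself a transformer). For the pipeline to work correctly, every step before the final one must be a transformer that implements `fit` and `transform`: calling `fit()` on the pipeline runs `fit_transform` on each of these steps in turn and then fits the final estimator on the transformed data, while `predict()` applies only `transform` at each intermediate step before the final estimator makes its prediction.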
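As a rough illustration of how the pipeline is used end to end, the sketch below fits it on a training split and evaluates it on a held-out split. The synthetic dataset from `make_classification` and the split settings are assumptions chosen only to make the example self-contained; they are not the data used in the chapter.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic two-class data, assumed here purely for demonstration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

pipeline = Pipeline([('scaler', StandardScaler()),
                     ('clf', LogisticRegression())])

# fit: the scaler is fit and applied to X_train, then the classifier is fit
# on the scaled training data.
pipeline.fit(X_train, y_train)

# predict / score: the already-fitted scaler transforms X_test (no refitting)
# before the classifier predicts.
y_pred = pipeline.predict(X_test)
accuracy = pipeline.score(X_test, y_test)
print('Test accuracy: %.3f' % accuracy)

# Individual fitted steps remain accessible by name.
print(pipeline.named_steps['scaler'].mean_)
```

Because the scaling parameters live inside the pipeline object, the same `pipeline` can later be applied to new samples without repeating any preprocessing code.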