# 2.2.1 Measuring the Quality of Fit

In order to evaluate the performance of a statistical learning method on
a given data set, we need some way to measure how well its predictions
actually match the observed data. That is, we need to quantify the extent
to which the predicted response value for a given observation is close to
the true response value for that observation. In the regression setting, the
most commonly-used measure is the mean squared error (MSE), given by 

$$ \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{f}(x_i)\right)^2 \tag{2.5} $$

where $\hat{f}(x_i)$ is the prediction that $\hat{f}$ gives for the $i$th observation. The MSE
will be small if the predicted responses are very close to the true responses,
and will be large if for some of the observations, the predicted and true
responses differ substantially.
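
As a minimal sketch of the formula above, the MSE can be computed directly from paired observed and predicted responses. The function name `mean_squared_error` and the sample values are hypothetical, chosen only for illustration; they do not come from the text.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between observed and predicted responses."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one observation is required")
    n = len(y_true)
    # Sum (y_i - f_hat(x_i))^2 over all n observations, then divide by n.
    return sum((y - f) ** 2 for y, f in zip(y_true, y_pred)) / n


# Hypothetical observed responses and predictions, made up for illustration.
y_observed = [3.0, 5.0, 7.0, 9.0]
y_predicted = [2.5, 5.0, 8.0, 9.5]

mse = mean_squared_error(y_observed, y_predicted)
print(mse)  # squared errors 0.25, 0, 1, 0.25 average to 0.375
```

The third observation contributes most of the total here, because squaring penalizes large deviations more heavily than small ones.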

The MSE in (2.5) is computed using the training data that was used to
fit the model, and so should more accurately be referred to as the training
MSE. 

But in general, we do not really care how well the method works 
on the training data. 

Rather, *we are interested in the accuracy of the predictions that we obtain when we apply our method to previously unseen
test data.*

After a model is generated, we are generally not interested in whether $\hat{f}(x_i) \approx y_i$;

> instead, we want to know whether $\hat{f}(x_0)$ is approximately equal to $y_0$, where $(x_0, y_0)$ is a *previously unseen test observation not used to train the statistical learning method*.

We want to choose the method that gives
the lowest test MSE, as opposed to the lowest training MSE. In other words, if we had a large number of test observations, we could compute

$$ \text{Ave}\left(y_0 - \hat{f}(x_0)\right)^2 $$

which is the average squared prediction error for these test observations $(x_0, y_0)$.

When a given method yields a small training MSE but a large test MSE, we are said to be *overfitting* the data.
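
The gap between training and test MSE can be illustrated with a small simulation. This is a sketch under stated assumptions, not an example from the text: the data are simulated from a hypothetical linear truth $f(x) = 2x$ with Gaussian noise, and two methods are compared, a least squares line and a one-nearest-neighbor predictor. The helper names (`fit_line`, `fit_nearest_neighbor`, `simulate`) are made up for illustration.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted responses."""
    return sum((y - f) ** 2 for y, f in zip(y_true, y_pred)) / len(y_true)


def fit_line(xs, ys):
    """Simple least squares line; returns a prediction function f_hat."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def fit_nearest_neighbor(xs, ys):
    """One-nearest-neighbor: predict the response of the closest training x."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = min(pairs, key=lambda p: abs(p[0] - x))
        return nearest[1]

    return predict


def simulate(n, rng):
    """Assumed data-generating process: y = 2x + noise, noise ~ N(0, 1)."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


rng = random.Random(0)  # fixed seed so the run is reproducible
x_train, y_train = simulate(30, rng)
x_test, y_test = simulate(30, rng)  # unseen observations, not used for fitting

results = {}
for name, fit in [("line", fit_line), ("1-NN", fit_nearest_neighbor)]:
    f_hat = fit(x_train, y_train)
    train_mse = mse(y_train, [f_hat(x) for x in x_train])
    test_mse = mse(y_test, [f_hat(x) for x in x_test])
    results[name] = (train_mse, test_mse)
    print(f"{name}: training MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")
```

The nearest-neighbor predictor reproduces every training response exactly, so its training MSE is zero, yet its test MSE is not. That is the overfitting pattern described above. Under these assumptions one would usually expect the line to have the lower test MSE, but the exact numbers depend on the simulated sample.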