## Evaluating Regression Models with Residual Plots

In Notebook 2-1, we used OLS regression to model a linear relationship between two variables. A good way to check whether a linear model is appropriate for a given relationship is to examine the __residuals__. Recall that a residual is the prediction error for a single observation: the vertical distance from a point to the regression line, or the observed value minus the predicted value. Let's look at an OLS plot again, and then make a residual plot of the same data to compare.

In [None]:
import numpy as np
from numpy.random import randn
import pandas as pd

from scipy import stats

import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

In [None]:
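# Load the example restaurant tips dataset that ships with seaborn
tips = sns.load_dataset("tips")

# Scatter plot of tip vs. total bill with the fitted OLS line (no confidence band)
sns.regplot(x="total_bill", y="tip", data=tips, ci=None)
plt.title("Tips Regression")
plt.show()

# Residuals of the same OLS fit plotted against total_bill, with a horizontal line at zero
sns.residplot(x="total_bill", y="tip", data=tips)
plt.title("Tips Residuals")
plt.show()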

Take a minute to compare the two plots and see if you can identify the relationship between them.
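
One way to see the connection is to compute the residuals by hand: each point on the residual plot is the observed value minus the value the fitted line predicts at the same x. The sketch below uses synthetic stand-in data rather than the tips dataset, and the variable names (`fitted`, `residuals`) are illustrative assumptions, not part of the notebook.

```python
import numpy as np

# Synthetic stand-in data (an assumption; this is not the tips dataset)
rng = np.random.default_rng(0)
x = rng.uniform(5, 50, size=200)
y = 1.0 + 0.1 * x + rng.normal(0, 1.0, size=200)

# Least-squares straight line; for deg=1, polyfit returns [slope, intercept]
slope, intercept = np.polyfit(x, y, deg=1)

# Residual = observed value minus the value predicted by the line
fitted = slope * x + intercept
residuals = y - fitted

print(residuals[:5])      # first few prediction errors
print(residuals.mean())   # very close to zero for an OLS fit with an intercept
```

Plotting `residuals` against `x` would give the same kind of picture as the residual plot: the regression line is flattened onto the horizontal axis, and only the vertical distances remain.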

To evaluate a model using the residual plot, look for any clear pattern. A pattern in the residuals means the model failed to account for some structure in the data and could be modified to capture it. In a good residual plot, the points are scattered randomly around zero with no visible trend.

If the residual plot shows a clear curve, the model hasn't accounted for that curvature. If the residuals are horn-shaped, fanning out as the predictor increases, the variance of the errors is increasing (this is called heteroscedasticity) and should be taken into consideration.
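
The horn shape can also be checked numerically. The sketch below is a rough illustration on synthetic data, assuming noise whose standard deviation grows with x. It fits a line and compares the spread of the residuals in the lower and upper halves of the x range. This is an informal check, not a formal statistical test.

```python
import numpy as np

# Synthetic data (an assumption): noise standard deviation grows with x
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(1, 10, size=400))
y = 2.0 + 0.5 * x + rng.normal(0, 0.3 * x)

# Fit a straight line and compute the residuals
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

# x is sorted, so the first half of the array holds the smaller x values
half = len(x) // 2
spread_low = residuals[:half].std()
spread_high = residuals[half:].std()

print(spread_low, spread_high)  # a much larger second value suggests a horn shape
```

If the two spreads are similar, the residuals have roughly constant variance. A clearly larger spread at high x matches the fanning pattern on the plot.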

Possible explanations for patterns on the residual plot include:
* A missing variable
* A missing higher-order term of a variable in the model to explain the curvature
* A missing interaction between terms already in the model
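
As an illustration of the second explanation, the sketch below generates synthetic data with a quadratic component (an assumption made for the example), fits a straight line and then a second-degree polynomial, and measures how strongly each set of residuals still tracks `x**2`. The names `lin_resid` and `quad_resid` are illustrative only.

```python
import numpy as np

# Synthetic data (an assumption): y depends on both x and x squared
rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, size=300)
y = 1.0 + 0.5 * x + 0.8 * x**2 + rng.normal(0, 0.5, size=300)

# Straight-line fit: the quadratic part is left behind in the residuals
lin_coefs = np.polyfit(x, y, deg=1)
lin_resid = y - np.polyval(lin_coefs, x)

# Adding the higher-order term lets the model absorb the curvature
quad_coefs = np.polyfit(x, y, deg=2)
quad_resid = y - np.polyval(quad_coefs, x)

# Correlation between the residuals and x**2 as a rough curvature measure
curve_lin = np.corrcoef(x**2, lin_resid)[0, 1]
curve_quad = np.corrcoef(x**2, quad_resid)[0, 1]

print(curve_lin, curve_quad)
print(lin_resid.std(), quad_resid.std())
```

With seaborn, a similar comparison can be made visually by passing `order=2` to `sns.regplot` or `sns.residplot`, which fits a second-degree polynomial instead of a straight line.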

The following article has a lot of good examples of poor residual vs. fitted plots and how to correct your model:
http://docs.statwing.com/interpreting-residual-plots-to-improve-your-regression/#x-unbalanced-header