### Example Multiple Linear Regression 4.1

For the **Advertising** data we had the following multiple linear regression model
\begin{equation}
sales
=\beta_{0}+\beta_{1}\cdot TV+\beta_{2}\cdot radio +\beta_{3}\cdot newspaper + \epsilon
\end{equation}
As discussed earlier, the p-values associated with this model indicate that **TV** and 
**radio** are related to **sales**, but that there is no evidence that 
**newspaper** is associated with **sales** in the presence of these two.

We now compare the **large** model $ \mathcal{M}_{2} $ 
defined by the equation above with the **small** model 
$ \mathcal{M}_{1} $ (without **newspaper**)
\begin{equation*}
sales
=\beta_{0}+\beta_{1}\cdot TV+\beta_{2}\cdot radio + \epsilon
\end{equation*} 

We use the **anova\_lm()** function, which performs an *analysis of variance* (ANOVA, using an F-test) to test the null hypothesis that the small model $ \mathcal{M}_{1} $ is sufficient to explain the data against the alternative hypothesis that the more complex model $ \mathcal{M}_{2} $ is required. Here this corresponds to the null hypothesis $ \beta_{3}=0 $, that is, that there is no relationship between **newspaper** and **sales** once **TV** and **radio** are in the model. To compare two models with **anova\_lm()**, $ \mathcal{M}_{1} $ and $ \mathcal{M}_{2} $ must be *nested* models: the predictors in $ \mathcal{M}_{1} $ must be a subset of the predictors in $ \mathcal{M}_{2} $.


The **Python** output below (column **ssr**) shows that the residual sum of squares (**RSS**) of the **small** model $ \mathcal{M}_{1} $ is
\begin{equation*}
\text{RSS}_{0}= 556.91
\end{equation*}

whereas the residual sum of squares for the **large** model $ \mathcal{M}_{2} $ 
is
\begin{equation*}
\text{RSS}= 556.83
\end{equation*}


In [2]:
import pandas as pd
import statsmodels.api as sm

# Load data
df = pd.read_csv('./data/Advertising.csv')
x1 = df[['TV', 'radio']]
x2 = df[['TV', 'radio', 'newspaper']]
y = df['sales']

# Fit model
x1_sm = sm.add_constant(x1)
x2_sm = sm.add_constant(x2)
model1 = sm.OLS(y, x1_sm).fit()
model2 = sm.OLS(y, x2_sm).fit()

# Table and print results
table = sm.stats.anova_lm(model1, model2)
print(table)

   df_resid         ssr  df_diff   ss_diff         F    Pr(>F)
0     197.0  556.913980      0.0       NaN       NaN       NaN
1     196.0  556.825263      1.0  0.088717  0.031228  0.859915


The difference $\text{RSS}_{0}-\text{RSS}$ can be found in the **Python** output under **ss\_diff** and is $0.088717$. The value of $q$, the number of coefficients set to zero under the null hypothesis, is displayed under **df\_diff** and is $1$ here. For the **large** model, we have
\begin{equation*}
n-p-1=200-3-1=196
\end{equation*}
degrees of freedom (**df\_resid**), whereas the **small** model has 
\begin{equation*}
n-p-1=200-2-1=197
\end{equation*}
degrees of freedom. Thus, the value of the F-statistic (column **F**) is
\begin{align*}
F
&=\dfrac{(\text{RSS}_{0}-\text{RSS})/q}{\text{RSS}/(n-p-1)}\\
&=\frac{(556.913980-556.825263)/1}{556.825263/(200-3-1)}\\
&=\frac{0.088717}{556.825263/196}\\
&\approx 0.0312
\end{align*}
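
The arithmetic above can be checked with a few lines of plain Python. This is only a sketch: the variable names are chosen here for illustration, and the numbers are copied from the **ssr** and **df\_resid** columns of the output rather than read from the fitted models.

```python
# Values copied from the anova_lm output above (illustrative variable names)
rss_small = 556.913980   # RSS_0, small model (TV, radio)
rss_large = 556.825263   # RSS, large model (TV, radio, newspaper)
n, p, q = 200, 3, 1      # observations, predictors in large model, tested coefficients

# F = ((RSS_0 - RSS) / q) / (RSS / (n - p - 1))
f_stat = ((rss_small - rss_large) / q) / (rss_large / (n - p - 1))
print(round(f_stat, 4))
```

Using the unrounded sums of squares matters here: with the two-decimal values 556.91 and 556.83 the numerator would be 0.08 instead of 0.088717.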

The p-value of the $ F $-statistic, that is, the probability under the null hypothesis $\beta_3=0$ of observing an F value at least as large as the one computed (the upper tail of the F-distribution), is displayed in the **Python** output under **Pr($>$F)**: 0.8599. 
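
As a cross-check, the upper-tail probability can be computed directly from the F-distribution. The sketch below assumes that SciPy is installed (the original example only uses pandas and statsmodels) and takes the F value and the degrees of freedom from the output above; it should agree with the **Pr($>$F)** column up to rounding.

```python
from scipy import stats

f_stat = 0.031228   # F value from the anova_lm output
q = 1               # numerator degrees of freedom (df_diff)
df_resid = 196      # denominator degrees of freedom of the large model

# Survival function = P(F >= f_stat) under the null hypothesis
p_value = stats.f.sf(f_stat, q, df_resid)
print(round(p_value, 4))
```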

Since this p-value is far larger than the significance level $\alpha=0.05$, we cannot reject the null hypothesis. The data give no evidence that **newspaper** improves the model once **TV** and **radio** are included, so we can omit it. 

### Example Multiple Linear Regression 4.3

If we instead compare the **large** model $\mathcal{M}_{2}$ with a **small** model $ \mathcal{M}_{1} $ in which **TV** is omitted,
\begin{equation*}
sales
=\beta_{0}+\beta_{1}\cdot radio +\beta_{2}\cdot newspaper+\epsilon
\end{equation*}
then we come to a very different conclusion:


In [3]:
# Select predictors for the small model (TV omitted)
x3 = df[['radio', 'newspaper']]

# Fit model
x3_sm = sm.add_constant(x3)
model3  = sm.OLS(y, x3_sm).fit()

# Table and print results
table = sm.stats.anova_lm(model3, model2)
print(table)


   df_resid          ssr  df_diff      ss_diff            F        Pr(>F)
0     197.0  3614.835279      0.0          NaN          NaN           NaN
1     196.0   556.825263      1.0  3058.010016  1076.405837  1.509960e-81


In this case the p-value is approximately zero, hence we reject the null hypothesis that the coefficient of **TV** is zero ($ \beta_{1}=0 $ in the notation of the large model $ \mathcal{M}_{2} $). The two models $ \mathcal{M}_{1} $ and $ \mathcal{M}_{2} $ differ significantly in how well they fit the data: omitting **TV** leads to a significantly worse fit.
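
When only one coefficient is tested ($q=1$), the F-statistic of the model comparison equals the square of the t-statistic of that coefficient in the large model, which is why the F-test and the coefficient's t-test give the same p-value. The following sketch illustrates this with NumPy on synthetic data; the data, the helper `fit_rss` and all variable names are assumptions made for illustration and are not part of the Advertising example.

```python
import numpy as np

# Synthetic data (illustrative only, not the Advertising data)
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(size=n)

X_large = np.column_stack([np.ones(n), x1, x2])  # intercept, x1, x2
X_small = np.column_stack([np.ones(n), x2])      # x1 omitted

def fit_rss(X, y):
    """Least-squares fit; returns coefficients and residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)

beta_large, rss_large = fit_rss(X_large, y)
_, rss_small = fit_rss(X_small, y)

# F-statistic for dropping x1 (q = 1)
df_resid = n - X_large.shape[1]                  # n - p - 1 with p = 2
f_stat = (rss_small - rss_large) / (rss_large / df_resid)

# t-statistic of the x1 coefficient in the large model
sigma2 = rss_large / df_resid
cov = sigma2 * np.linalg.inv(X_large.T @ X_large)
t_x1 = beta_large[1] / np.sqrt(cov[1, 1])

print(f_stat, t_x1 ** 2)  # the two values agree up to floating-point error
```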
 
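To get an overview of how the quality of a model changes when a single predictor variable is omitted, we can apply **anova\_lm()** to one model only. However, this only works when the model is defined using a formula instead of columns of data. With **typ=2**, each row of the resulting table tests one predictor in the presence of all the others, so the rows for **newspaper** and **TV** reproduce the two model comparisons above.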

In [4]:
import statsmodels.formula.api as smf

# The formula refers to the column names of df directly,
# so no separate variables are needed for the predictors

# Fit model using formula:
model_f = smf.ols(formula='sales ~ TV + radio + newspaper', data=df).fit()

# Table and print results
table_f = sm.stats.anova_lm(model_f, typ=2)
print(table_f)

                sum_sq     df            F        PR(>F)
TV         3058.010016    1.0  1076.405837  1.509960e-81
radio      1361.736549    1.0   479.325170  1.505339e-54
newspaper     0.088717    1.0     0.031228  8.599151e-01
Residual    556.825263  196.0          NaN           NaN
