### Example Multiple Linear Regression 4.11
We now return to the **Advertising** example. A linear model that uses **radio**, **TV**, and an interaction between the two to predict **sales** takes the form
\begin{align*}
\text{ sales }
&=\beta_{0}+\beta_{1}\cdot\text{ TV }+\beta_{2}\cdot\text{ radio }+\beta_{3}\cdot(\text{ TV }\cdot\text{ radio })+\epsilon\\
&=\beta_{0}+(\beta_{1}+\beta_{3}\cdot\text{ radio })\cdot\text{ TV }+\beta_{2}\cdot\text{ radio }+\epsilon
\end{align*}
We can interpret $ \beta_{3} $ as the increase in the effectiveness of **TV** advertising for a one-unit increase in **radio** advertising (or vice versa). The coefficients that result from fitting this model can be found in the following **Python** output:

In [1]:
import pandas as pd
import statsmodels.api as sm

# Load data
df = pd.read_csv('./data/Advertising.csv')

# Define the linear model:
x = pd.DataFrame({
    'TV' : df['TV'],
    'radio' : df['radio'],
    'TV*radio' : df['TV'] * df['radio']})
y = df['sales']

# Fit model
x_sm = sm.add_constant(x)
model = sm.OLS(y, x_sm).fit()

# Print summary:
print(model.summary())

                            OLS Regression Results                            
Dep. Variable:                  sales   R-squared:                       0.968
Model:                            OLS   Adj. R-squared:                  0.967
Method:                 Least Squares   F-statistic:                     1963.
Date:                Fri, 11 Mar 2022   Prob (F-statistic):          6.68e-146
Time:                        18:27:43   Log-Likelihood:                -270.14
No. Observations:                 200   AIC:                             548.3
Df Residuals:                     196   BIC:                             561.5
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          6.7502      0.248     27.233      0.0

The results strongly suggest that the model that includes the interaction term is superior to the model that contains only *main effects*. The p-value for the interaction term, $ \text{TV}\cdot\text{radio} $, is extremely low, which is strong evidence for $ H_{A}:\;\beta_{3}\neq 0 $. In other words, it is clear that the true relationship is not additive.

The $ R^{2} $ for the model that includes the interaction term $\text{TV}\cdot\text{radio}$ in addition to the predictors **TV** and **radio** is $0.968$, compared to only
$0.897$ for the model that predicts **sales** using **TV** and **radio** without an interaction term. This means that
\begin{equation*}
\dfrac{0.968-0.897}{1-0.897}
\approx 0.69
=69\%
\end{equation*} 
of the variability in **sales** that remains after fitting the additive model has been explained by the interaction term. 
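
The arithmetic behind this figure can be reproduced in a few lines. This is a minimal sketch that only reuses the two $ R^{2} $ values quoted in the text; the variable names are chosen here for illustration and do not come from the fitted model object.

```python
# R^2 values quoted in the text (not recomputed from the data)
r2_additive = 0.897      # model with TV and radio only
r2_interaction = 0.968   # model with TV, radio and TV*radio

# Share of variability left unexplained by the additive model
unexplained_additive = 1 - r2_additive

# Additional variability explained once the interaction term is added
gain = r2_interaction - r2_additive

# Fraction of the remaining variability captured by the interaction term
share_explained = gain / unexplained_additive
print(f"{share_explained:.2f}")  # prints 0.69
```
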
The **Python** output suggests that an increase in **TV** advertising of CHF 1000 is associated with increased
**sales** of
\begin{equation*}
(\hat{\beta}_{1}+\hat{\beta}_{3}\cdot\text{radio})\cdot 1000
=19+1.1\cdot\text{radio}
\end{equation*}
units. Similarly, an increase in **radio** advertising of CHF 1000 is
associated with an increase in **sales** of
\begin{equation*}
(\hat{\beta}_{2}+\hat{\beta}_{3}\cdot\text{TV})\cdot 1000
=29+1.1\cdot\text{TV}
\end{equation*}
units.
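
To see how the effect of one medium depends on the budget of the other, the two expressions above can be wrapped in small helper functions. This is a sketch under stated assumptions: the constants are the rounded values $19$, $29$ and $1.1$ quoted in the text (not re-estimated from the data), the function names `tv_effect` and `radio_effect` are hypothetical, and the budgets in the loop are illustrative values assumed to be in the same units as the data set.

```python
# Rounded effects per CHF 1000, taken from the text (assumed, not re-estimated)
TV_BASE = 19.0       # 1000 * beta_1 hat
RADIO_BASE = 29.0    # 1000 * beta_2 hat
INTERACTION = 1.1    # 1000 * beta_3 hat


def tv_effect(radio: float) -> float:
    """Change in sales for CHF 1000 more TV advertising at a given radio budget."""
    return TV_BASE + INTERACTION * radio


def radio_effect(tv: float) -> float:
    """Change in sales for CHF 1000 more radio advertising at a given TV budget."""
    return RADIO_BASE + INTERACTION * tv


# Illustrative budgets only: the TV effect grows as the radio budget grows
for radio in (0, 10, 50):
    print(f"radio = {radio:>2}: TV effect = {tv_effect(radio):.1f} units")
```

With a radio budget of zero the TV effect reduces to the main effect alone; every additional unit of radio budget raises it by $1.1$ units, which is exactly the non-additive behaviour the interaction term describes.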