# 1. Describe the data
- This example comes from an experiment to determine the effects of temperature, gas/liquid ratio and packing height on reducing the unpleasant odor of a chemical product sold for household use.
- It has 15 observations.
- There are 4 variables:
    - `odor`: the odor score (the response).
    - `temp`: temperature, coded as `temp = (Fahrenheit - 80)/40`, so the coded values -1, 0 and 1 correspond to original temperatures of 40, 80 and 120 °F.
    - `gas`: gas/liquid ratio, coded from its original scale of measurement to -1, 0 and 1.
    - `pack`: packing height, coded from its original scale of measurement to -1, 0 and 1.
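The temperature coding can be inverted as Fahrenheit = 40 · temp + 80. Below is a minimal sketch of both directions. The helper names `temp_to_fahrenheit` and `fahrenheit_to_temp` are hypothetical and are not part of the faraway package.

```python
import numpy as np

def temp_to_fahrenheit(temp):
    """Hypothetical helper: undo the coding temp = (Fahrenheit - 80) / 40."""
    return 40 * np.asarray(temp) + 80

def fahrenheit_to_temp(fahrenheit):
    """Hypothetical helper: apply the coding used for temp in the odor data."""
    return (np.asarray(fahrenheit) - 80) / 40

print(temp_to_fahrenheit([-1, 0, 1]))     # [ 40  80 120]
print(fahrenheit_to_temp([40, 80, 120]))  # [-1.  0.  1.]
```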

# 2. Load packages and data

```python
%%capture
%pip install faraway
```

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy as sp
import statsmodels.api as sm
import statsmodels.formula.api as smf
import seaborn as sns
```

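```python
# Load the odor data shipped with the faraway package
import faraway.datasets.odor
odor = faraway.datasets.odor.load()
odor
```

|   | odor | temp | gas | pack |
|---:|---:|---:|---:|---:|
| 0 | 66 | -1 | -1 | 0 |
| 1 | 39 | 1 | -1 | 0 |
| 2 | 43 | -1 | 1 | 0 |
| 3 | 49 | 1 | 1 | 0 |
| 4 | 58 | -1 | 0 | -1 |
| 5 | 17 | 1 | 0 | -1 |
| 6 | -5 | -1 | 0 | 1 |
| 7 | -40 | 1 | 0 | 1 |
| 8 | 65 | 0 | -1 | -1 |
| 9 | 7 | 0 | 1 | -1 |

Only the first 10 of the 15 rows are reproduced here.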


# 3. Orthogonality
## 3.1. Theory
- Suppose we can partition the columns of $X$ into two blocks, $X=[X_1|X_2]$, such that $X_1'X_2=0$. Then $y=X\beta + \epsilon = X_1\beta_1 + X_2\beta_2 + \epsilon$.
- Since $X_2'X_1 = (X_1'X_2)' = 0$, the cross-product matrix is block diagonal:
  $$
  X'X = \begin{bmatrix}
  X_1'X_1 & X_1'X_2 \\
  X_2'X_1 & X_2'X_2
  \end{bmatrix}
  = \begin{bmatrix}
  X_1'X_1 & 0 \\
  0 & X_2'X_2
  \end{bmatrix}
  $$
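- The inverse of a block-diagonal matrix is block diagonal, so $\hat{\beta} = (X'X)^{-1}X'y$ splits into $\hat{\beta}_1 = (X_1'X_1)^{-1}X_1'y$ and $\hat{\beta}_2 = (X_2'X_2)^{-1}X_2'y$.
- Hence $\hat{\beta}_1$ is the same whether or not $X_2$ is in the model.
- However, $\hat{\sigma}^2 = RSS/df$ does depend on whether $X_2$ is in the model, so the standard errors and tests for $\hat{\beta}_1$ can change even though the estimate itself does not.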
- For two single predictor columns $x_1$ and $x_2$, zero sample covariance means $\sum_j(x_{j1}-\bar{x}_1)(x_{j2}-\bar{x}_2)=0$. If the predictors are centered ($\bar{x}_1 = \bar{x}_2 = 0$), this sum is exactly $x_1'x_2$, so zero covariance implies orthogonality.
- Orthogonality is a desirable property, but it usually occurs only when $X$ is chosen by the experimenter. It is a feature of a good design.
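A small numerical check of these claims follows. It assumes a hypothetical replicated 2×2 factorial design coded ±1 and a simulated response, and nothing in it comes from the odor data. Because $X_1'X_2 = 0$, the estimate of $\beta_1$ is the same with or without $X_2$, while $\hat{\sigma}^2 = RSS/df$ is not.

```python
import numpy as np

# Hypothetical replicated 2x2 factorial design, coded -1/+1.
x_a = np.array([-1, 1, -1, 1, -1, 1, -1, 1], dtype=float)
x_b = np.array([-1, -1, 1, 1, -1, -1, 1, 1], dtype=float)

X1 = np.column_stack([np.ones(8), x_a])  # intercept and first factor
X2 = x_b[:, None]                        # second factor as a one-column block
X = np.hstack([X1, X2])

# The partition is orthogonal: X1'X2 is a zero block, so X'X is block diagonal.
assert np.allclose(X1.T @ X2, 0)

# Simulated response (assumed coefficients 3, 2, -1.5 plus N(0, 1) noise).
rng = np.random.default_rng(0)
y = 3 + 2 * x_a - 1.5 * x_b + rng.normal(size=8)

def fit(design, response):
    """Least-squares estimates and sigma^2 hat = RSS / (n - p)."""
    beta, *_ = np.linalg.lstsq(design, response, rcond=None)
    rss = np.sum((response - design @ beta) ** 2)
    n, p = design.shape
    return beta, rss / (n - p)

beta_full, sigma2_full = fit(X, y)
beta_reduced, sigma2_reduced = fit(X1, y)

print(beta_full[:2], beta_reduced)  # identical estimates of beta_1
print(sigma2_full, sigma2_reduced)  # different estimates of sigma^2
```

Dropping the orthogonal column moves its fitted contribution, $\hat{\beta}_2^2 \, X_2'X_2$, into the residual sum of squares. That is why $\hat{\sigma}^2$ changes while $\hat{\beta}_1$ does not.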

## 3.2. Application

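```python
# Covariance matrix of the three coded predictors (temp, gas, pack)
odor.iloc[:, 1:].cov().round(3)
```

|   | temp | gas | pack |
|---|---:|---:|---:|
| temp | 0.571 | 0.0 | 0.0 |
| gas | 0.0 | 0.571 | 0.0 |
| pack | 0.0 | 0.0 | 0.571 |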


```python
# Correlation matrix of the three coded predictors
odor.iloc[:, 1:].corr().round(3)
```

|   | temp | gas | pack |
|---|---:|---:|---:|
| temp | 1.0 | -0.0 | -0.0 |
| gas | -0.0 | 1.0 | 0.0 |
| pack | -0.0 | 0.0 | 1.0 |
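Each coded predictor has mean zero, so zero covariance here also means the columns are orthogonal to each other and to the intercept. The sketch below rebuilds the design columns under an assumption the source does not state: that the 15 runs follow the standard three-factor Box-Behnken layout, with every pair of factors at ±1, the third factor at 0, and three centre runs. This layout is consistent with the rows printed earlier. The helper `box_behnken_3` is hypothetical.

```python
import itertools
import numpy as np

def box_behnken_3(n_center=3):
    """Hypothetical helper: coded 3-factor Box-Behnken design (temp, gas, pack)."""
    rows = []
    # For each pair of factors, run all four (+/-1, +/-1) combinations, third factor at 0.
    for i, j in itertools.combinations(range(3), 2):
        for b, a in itertools.product([-1, 1], repeat=2):
            row = [0, 0, 0]
            row[i], row[j] = a, b
            rows.append(row)
    rows += [[0, 0, 0]] * n_center  # centre runs
    return np.array(rows, dtype=float)

design = box_behnken_3()                             # columns: temp, gas, pack
cov = np.cov(design, rowvar=False)                   # sample covariance, divisor n - 1 = 14
X = np.column_stack([np.ones(len(design)), design])  # add an intercept column
XtX = X.T @ X

print(cov.round(3))  # 0.571 on the diagonal, 0 elsewhere
print(XtX)           # diagonal matrix
```

Under this assumed layout, $X'X = \mathrm{diag}(15, 8, 8, 8)$. Its inverse is therefore also diagonal, and each coefficient is estimated independently of the others.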


```python
# Linear model 1: all three predictors
lmod = smf.ols('odor ~ temp + gas + pack', odor).fit()
lmod.params
```

```
Intercept    15.200
temp        -12.125
gas         -17.000
pack        -21.375
dtype: float64
```

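```python
# Estimated covariance matrix of the coefficient estimates
lmod.cov_params().round(3)
```

|   | Intercept | temp | gas | pack |
|---|---:|---:|---:|---:|
| Intercept | 86.455 | -0.0 | -0.0 | 0.0 |
| temp | -0.0 | 162.104 | 0.0 | 0.0 |
| gas | -0.0 | 0.0 | 162.104 | 0.0 |
| pack | 0.0 | 0.0 | 0.0 | 162.104 |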


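The coefficient estimates are uncorrelated (all off-diagonal covariances are zero) because the predictors are orthogonal. The three slope coefficients also have equal variances (162.104), and hence equal standard errors, because the design is balanced: each coded predictor has the same sum of squares, so $(X'X)^{-1}$ has equal diagonal entries for `temp`, `gas` and `pack`.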
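Since $\widehat{\mathrm{var}}(\hat{\beta}) = \hat{\sigma}^2 (X'X)^{-1}$, the printed covariance matrix can be checked by hand. The sketch below assumes $X'X = \mathrm{diag}(15, 8, 8, 8)$, meaning 15 runs with eight nonzero ±1 entries per coded predictor. The notebook does not print this matrix. The sketch backs out $\hat{\sigma}^2$ from the printed slope variance 162.104 and then predicts the intercept variance.

```python
import numpy as np

# Assumed X'X for the odor design, in the order (intercept, temp, gas, pack).
xtx = np.diag([15.0, 8.0, 8.0, 8.0])

slope_var = 162.104         # printed variance of the temp, gas and pack estimates
sigma2_hat = slope_var * 8  # because var(beta_j) = sigma2_hat / 8 for each slope
cov_beta = sigma2_hat * np.linalg.inv(xtx)

print(round(sigma2_hat, 3))                 # 1296.832
print(np.diag(cov_beta).round(3))           # intercept variance is about 86.455
print(np.sqrt(np.diag(cov_beta)).round(3))  # standard errors
```

The predicted intercept variance, $\hat{\sigma}^2/15$, agrees with the printed 86.455. This is consistent with the assumed $X'X$.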

```python
# Linear model 2: drop temp
lmod = smf.ols('odor ~ gas + pack', odor).fit()
lmod.params
```

```
Intercept    15.200
gas         -17.000
pack        -21.375
dtype: float64
```

The intercept and the coefficients of `gas` and `pack` are identical to those in model 1, as the orthogonality theory in Section 3.1 predicts. Dropping `temp` changes only $\hat{\sigma}^2$ and, with it, the standard errors.