# ISLP - Chapter 3 - Exercise 11
### Author: pzuehlke

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
```

__11 (a):__

```python
# Simulate x ~ N(0, 1) and y = 2x + noise, with noise ~ N(0, 1)
rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2 * x + rng.normal(size=100)
```

```python
# Regress y on x; sm.OLS adds no intercept unless a constant column is supplied
model = sm.OLS(y, x).fit()
model.summary()
```

| Item | Value | Item | Value |
|---|---|---|---|
| Dep. Variable: | y | R-squared (uncentered): | 0.743 |
| Model: | OLS | Adj. R-squared (uncentered): | 0.74 |
| Method: | Least Squares | F-statistic: | 285.6 |
| Date: | Tue, 04 Feb 2025 | Prob (F-statistic): | 6.23e-31 |
| Time: | 22:29:26 | Log-Likelihood: | -141.35 |
| No. Observations: | 100 | AIC: | 284.7 |
| Df Residuals: | 99 | BIC: | 287.3 |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| x1 | 1.9762 | 0.117 | 16.898 | 0.000 | 1.744 | 2.208 |

| Item | Value | Item | Value |
|---|---|---|---|
| Omnibus: | 1.376 | Durbin-Watson: | 2.184 |
| Prob(Omnibus): | 0.503 | Jarque-Bera (JB): | 0.847 |
| Skew: | 0.121 | Prob(JB): | 0.655 |
| Kurtosis: | 3.381 | Cond. No. | 1.0 |

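The coefficient estimate is $ \hat\beta = 1.9762 $ (the true value is $ 2 $),
with standard error $ 0.117 $. The $ t $-statistic is $ 16.898 $, and the
$ p $-value associated with the null hypothesis $ H_0 : \beta = 0 $ is
essentially zero, so we can confidently reject the null hypothesis. Note that
`sm.OLS` does not add an intercept unless the design matrix contains a constant
column (for example via `sm.add_constant`), so this is a regression through the
origin, as the exercise requires; this is also why statsmodels reports the
*uncentered* $ R^2 $.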

__11 (b):__ 

```python
# Regress x on y, again without an intercept
inverted_model = sm.OLS(x, y).fit()
inverted_model.summary()
```

| Item | Value | Item | Value |
|---|---|---|---|
| Dep. Variable: | y | R-squared (uncentered): | 0.743 |
| Model: | OLS | Adj. R-squared (uncentered): | 0.74 |
| Method: | Least Squares | F-statistic: | 285.6 |
| Date: | Tue, 04 Feb 2025 | Prob (F-statistic): | 6.23e-31 |
| Time: | 22:26:27 | Log-Likelihood: | -58.349 |
| No. Observations: | 100 | AIC: | 118.7 |
| Df Residuals: | 99 | BIC: | 121.3 |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| x1 | 0.3757 | 0.022 | 16.898 | 0.000 | 0.332 | 0.420 |

| Item | Value | Item | Value |
|---|---|---|---|
| Omnibus: | 13.156 | Durbin-Watson: | 2.034 |
| Prob(Omnibus): | 0.001 | Jarque-Bera (JB): | 22.596 |
| Skew: | -0.528 | Prob(JB): | 1.24e-05 |
| Kurtosis: | 5.075 | Cond. No. | 1.0 |

Call this coefficient $ \gamma $ to distinguish it from the previous one. (Since
we passed plain NumPy arrays, statsmodels labels the response `y` and the
regressor `x1` by default, so in this table `y` actually refers to our $ x $.)
The least squares estimate is $ \hat\gamma = 0.3757 $, with standard error
$ 0.022 $. The $ t $-statistic is $ 16.898 $ (the same as before), and the
$ p $-value associated with the null hypothesis $ H_0 : \gamma = 0 $ is
essentially zero, so again we can confidently reject the null hypothesis.

Note, however, that the slope estimate for the regression of $ x $ on $ y $ is
not simply the reciprocal of the slope estimate for the regression of $ y $ on
$ x $:

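```python
gamma = 0.3757  # slope estimate from regressing x on y, part (b)
beta = 1.9762   # slope estimate from regressing y on x, part (a)
print(f"{1 / gamma:.4f}", beta)
```

```
2.6617 1.9762
```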


Here is the explanation. Suppose we regress $ y $ on $ x $ without an intercept
term, that is, we choose $ \beta $ to minimize
$$
    \Vert \mathbf{y} - \beta \mathbf{x} \Vert^2\,,
$$
where $ \mathbf x $ and $ \mathbf y $ are the vectors containing the data values
for $ x $ and $ y $, respectively (in the same order). From linear algebra, the
minimizing $ \beta $ is the one that makes the residual vector
$ \mathbf{y} - \beta \mathbf{x} $ orthogonal to $ \mathbf x $:
$$
    \hat{\beta} = \frac{\mathbf{x} \cdot \mathbf{y}}{\mathbf{x} \cdot \mathbf{x}}
                = \frac{\mathbf{x} \cdot \mathbf{y}}{\Vert \mathbf x \Vert^2}
$$
By symmetry, for the opposite regression of $ x $ on $ y $, the
associated coefficient $ \gamma $ is estimated to be
$$
    \hat{\gamma} = \frac{\mathbf{y} \cdot \mathbf{x}}{\mathbf{y} \cdot \mathbf{y}}
                 = \frac{\mathbf{y} \cdot \mathbf{x}}{\Vert \mathbf y \Vert^2}
$$
Finally,
$$
\hat{\beta}\, \hat{\gamma} = \frac{\big(\mathbf{x} \cdot \mathbf{y}\big)^2}
                                  {\Vert \mathbf{x} \Vert^2 \Vert \mathbf{y} \Vert^2}
                           = \cos^2 \theta\,,
$$
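where $ \theta \in [0, \pi] $ is the angle between $ \mathbf{x} $ and $ \mathbf{y} $.
By the Cauchy-Schwarz inequality, this product is always at most $ 1 $, with
equality if and only if $ \mathbf x $ and $ \mathbf y $ are linearly dependent
(one is a multiple of the other). Since that is clearly not the case for our
noisy data, we should not expect $ \hat{\gamma} $ to equal $ 1/\hat{\beta} $.
In fact, both estimates have the sign of $ \mathbf{x} \cdot \mathbf{y} $, so
$ |\hat{\gamma}| = \cos^2\theta \,/\, |\hat{\beta}| < 1/|\hat{\beta}| $, which is
what we observed ($ 0.3757 < 1/1.9762 \approx 0.506 $).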

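__11 (c):__ The relationship was established in the preceding item: the product
of the two slope estimates equals the squared cosine of the angle between
$ \mathbf x $ and $ \mathbf y $. Numerically, $ 1.9762 \times 0.3757 \approx 0.742 $,
which agrees (up to the rounding of the displayed coefficients) with the
uncentered $ R^2 = 0.743 $ reported in both summaries; indeed, for a regression
through the origin the uncentered $ R^2 $ is exactly
$ (\mathbf{x} \cdot \mathbf{y})^2 / (\Vert \mathbf{x} \Vert^2 \Vert \mathbf{y} \Vert^2) = \cos^2\theta $.
This is the analogue of the squared sample correlation $ r^2 $ (which equals
$ R^2 $ in simple linear regression with an intercept), except that $ \mathbf x $
and $ \mathbf y $ are not centered. The two fits also share the same
$ t $-statistic and $ F $-statistic.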

__11 (d):__ This is a straightforward calculation. We established in item (b) that
$$
    \hat{\beta} = \frac{\mathbf{x} \cdot \mathbf{y}}{\Vert \mathbf x \Vert^2}
$$
And from the formula given in the statement,
$$
\text{SE}(\hat{\beta}) = \sqrt{\frac{\Vert\mathbf{y} - \hat{\beta} \mathbf{x}\Vert^2}
{(n-1) \Vert \mathbf{x} \Vert^2}}
$$
Therefore the $ t $-statistic is
$$
\frac{\hat{\beta}}{\text{SE}(\hat{\beta})} = \frac{\sqrt{n-1} \,(\mathbf{x} \cdot \mathbf{y})}
     {\Vert\mathbf{x}\Vert\,\Vert\mathbf{y} - \hat{\beta} \mathbf{x}\Vert}
$$
Expanding the squared norm and substituting the formula for $ \hat{\beta} $
obtained in item (b), we get
$$
\begin{aligned}
\Vert \mathbf{x} \Vert^2\Vert\mathbf{y} - \hat{\beta} \mathbf{x}\Vert^2 &=
    \Vert\mathbf{x}\Vert^2\Big(\Vert\mathbf{y}\Vert^2 +
    \hat{\beta}^2\Vert \mathbf{x}\Vert^2 -
    2 \,\hat{\beta}\,\mathbf{x} \cdot \mathbf{y}\Big) \\
    &= \Vert \mathbf{x}\Vert^2 \Vert \mathbf{y}\Vert^2 + (\mathbf{x} \cdot \mathbf{y})^2
       - 2\,(\mathbf{x} \cdot \mathbf{y})^2 \\
    &= \Vert \mathbf{x}\Vert^2 \Vert \mathbf{y}\Vert^2 - (\mathbf{x} \cdot \mathbf{y})^2\,.
\end{aligned}
$$
Substituting this into the formula for the $ t $-statistic, we finally get
$$
t = \frac{\hat{\beta}}{\text{SE}(\hat{\beta})} = \frac{\sqrt{n-1} \,(\mathbf{x} \cdot \mathbf{y})}
     {\sqrt{\Vert \mathbf{x}\Vert^2 \Vert \mathbf{y}\Vert^2 - (\mathbf{x} \cdot \mathbf{y})^2}}
$$

__11 (e):__ The final expression for the $ t $-statistic in (d) is symmetric in
$ \mathbf{x} $ and $ \mathbf{y} $, since swapping them leaves both
$ \mathbf{x} \cdot \mathbf{y} $ and $ \Vert \mathbf{x}\Vert^2 \Vert \mathbf{y}\Vert^2 $
unchanged. Hence the $ t $-statistic for $ \hat{\gamma} $ is the same as that for
$ \hat{\beta} $, consistent with the value $ 16.898 $ reported by statsmodels
for both fits in items (a) and (b).

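__11 (f):__ With an intercept, the slope estimate and its standard error depend
on the data only through the centered vectors
$ \mathbf{x}_0 = \mathbf{x} - \bar{x}\, \mathbf{u} $ and
$ \mathbf{y}_0 = \mathbf{y} - \bar{y}\, \mathbf{u} $, where
$ \mathbf{u} = (1, 1, \ldots, 1) $ ($ n $ coordinates). The only other change is
that the residual variance is now estimated with $ n - 2 $ degrees of freedom
instead of $ n - 1 $, because the intercept is an additional parameter.
Repeating the calculation of (d) therefore gives
$$
t = \frac{\sqrt{n-2} \,(\mathbf{x}_0 \cdot \mathbf{y}_0)}
     {\sqrt{\Vert \mathbf{x}_0\Vert^2 \Vert \mathbf{y}_0\Vert^2 - (\mathbf{x}_0 \cdot \mathbf{y}_0)^2}}\,,
$$
which is still symmetric in $ \mathbf{x} $ and $ \mathbf{y} $. Therefore, the
$ t $-statistic for the slope coefficient when regressing $ y $ onto $ x $ is the
same as that for the regression of $ x $ onto $ y $.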
