## 11.

In this problem we will investigate the $t$-statistic for the null hypothesis $H_0 : \beta = 0$ in simple linear regression without an intercept. To begin, we generate a predictor `x` and a response `y` as follows.

In [1]:
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Data generation (given in the question)

In [2]:
rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2 * x + rng.normal(size=100)

n = len(x)

## (a) Regress y onto x (NO intercept)

In [3]:
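# Regression of y on x without an intercept (sm.OLS adds no constant by default)
model_a = sm.OLS(y, x).fit()

# t-statistic for H0: beta = 0, reused in parts (c) and (d)
t_a = model_a.tvalues[0]

# Compact coefficient table
model_a.summary2().tables[1]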


|    | Coef. | Std.Err. | t | P>\|t\| | [0.025 | 0.975] |
|----|-------|----------|---|---------|--------|--------|
| x1 | 1.976242 | 0.116948 | 16.898417 | 6.231546e-31 | 1.744191 | 2.208293 |


**Conclusion:**  
The estimated coefficient, $\hat\beta \approx 1.976$, is very close to the true value $\beta = 2$.  
The $t$-statistic ($\approx 16.90$) is very large and the $p$-value ($\approx 6.2 \times 10^{-31}$) is essentially zero, so we
**strongly reject** the null hypothesis $H_0:\beta=0$.  
There is strong evidence of a linear association between $x$ and $y$.
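The entries of the coefficient table can also be reproduced by hand. The following is a minimal sketch, assuming the no-intercept least-squares estimate $\hat\beta = \sum x_i y_i / \sum x_i^2$ and a residual variance estimated with $n-1$ degrees of freedom. The names `beta_hat_manual`, `se_manual`, and `t_manual` are introduced here for illustration and are not part of the original notebook.

```python
import numpy as np

# Same data-generating code as in the question
rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2 * x + rng.normal(size=100)
n = len(x)

# Least-squares slope for a regression through the origin
beta_hat_manual = np.sum(x * y) / np.sum(x**2)

# Residuals and standard error (n - 1 degrees of freedom: one parameter, no intercept)
resid = y - beta_hat_manual * x
se_manual = np.sqrt(np.sum(resid**2) / ((n - 1) * np.sum(x**2)))

# t-statistic for H0: beta = 0
t_manual = beta_hat_manual / se_manual

print(f"beta_hat = {beta_hat_manual:.6f}")
print(f"SE       = {se_manual:.6f}")
print(f"t        = {t_manual:.6f}")
```

If the assumptions hold, these three numbers should agree with the `Coef.`, `Std.Err.`, and `t` columns reported by `statsmodels`.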


## (b) Regress x onto y (NO intercept)

In [44]:
from IPython.display import display, Markdown

# Regression of x on y without an intercept
model_b = sm.OLS(x, y).fit()

beta_hat_b = model_b.params[0]
se_b = model_b.bse[0]
t_b = model_b.tvalues[0]
p_b = model_b.pvalues[0]

display(Markdown(
    rf"""
$\hat{{\beta}} = {beta_hat_b:.4f}$  

$SE(\hat{{\beta}}) = {se_b:.4f}$  

$t = {t_b:.4f}$  

$p = {p_b:.2e}$
"""
))



$\hat{\beta} = 0.3757$  

$SE(\hat{\beta}) = 0.0222$  

$t = 16.8984$  

$p = 6.23e-31$


**Conclusion:**  
Although the coefficient estimate and its standard error differ from part (a),
the $t$-statistic and $p$-value are identical.  
Again, we **strongly reject** $H_0:\beta=0$.

## (c) Relationship between (a) and (b)

In [19]:
display(Markdown(
    rf"""
$t$-statistic ( $y$ on $x$ ) $= {t_a:.4f}$  

$t$-statistic ( $x$ on $y$ ) $= {t_b:.4f}$  

Are they equal? **{np.isclose(t_a, t_b)}**

**Conclusion:**  
Although the estimated coefficients differ, the $t$-statistics (and hence the
$p$-values) are identical. Therefore, testing $H_0:\beta=0$ yields the same result
regardless of whether $y$ is regressed on $x$ or $x$ is regressed on $y$.
"""
))




$t$-statistic ( $y$ on $x$ ) $= 16.8984$  

$t$-statistic ( $x$ on $y$ ) $= 16.8984$  

Are they equal? **True**

**Conclusion:**  
Although the estimated coefficients differ, the $t$-statistics (and hence the
$p$-values) are identical. Therefore, testing $H_0:\beta=0$ yields the same result
regardless of whether $y$ is regressed on $x$ or $x$ is regressed on $y$.


## (d) Compute t-statistic using the algebraic formula

#### Formula (no intercept):

- (d) For the regression of $Y$ onto $X$ without an intercept, the $t$-statistic
for $H_0 : \beta = 0$ takes the form $\hat \beta/SE(\hat \beta)$, where $\hat \beta$ is
given by (3.38), and where $$ SE(\hat \beta) = \sqrt{ \frac {\sum_{i=1}^n (y_i - x_i \hat\beta)^2} {(n-1) \sum_{i'=1}^n x_{i'}^2} } $$

Show algebraically, and confirm numerically
in Python, that the $t$-statistic can be written as
$$ \frac{(\sqrt{n-1})\sum_{i=1}^n x_i y_i}{\sqrt{(\sum_{i=1}^n x_i^2)(\sum_{i'=1}^n y_{i'}^2)-(\sum_{i'=1}^n x_{i'} y_{i'})^2}}$$
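The algebraic part can be sketched as follows, writing $\sum$ for $\sum_{i=1}^n$ and using $\hat\beta = \sum x_i y_i / \sum x_i^2$ from (3.38). First expand the residual sum of squares:

$$
\begin{aligned}
\sum (y_i - x_i\hat\beta)^2
&= \sum y_i^2 - 2\hat\beta \sum x_i y_i + \hat\beta^2 \sum x_i^2 \\
&= \sum y_i^2 - \frac{\left(\sum x_i y_i\right)^2}{\sum x_i^2}
 = \frac{\left(\sum x_i^2\right)\left(\sum y_i^2\right) - \left(\sum x_i y_i\right)^2}{\sum x_i^2}.
\end{aligned}
$$

Substituting this into the standard error gives

$$
SE(\hat\beta) = \frac{\sqrt{\left(\sum x_i^2\right)\left(\sum y_i^2\right) - \left(\sum x_i y_i\right)^2}}{\sqrt{n-1}\,\sum x_i^2},
$$

and dividing $\hat\beta$ by it cancels the factor $\sum x_i^2$:

$$
t = \frac{\hat\beta}{SE(\hat\beta)}
  = \frac{\sum x_i y_i}{\sum x_i^2} \cdot \frac{\sqrt{n-1}\,\sum x_i^2}{\sqrt{\left(\sum x_i^2\right)\left(\sum y_i^2\right) - \left(\sum x_i y_i\right)^2}}
  = \frac{\sqrt{n-1}\,\sum x_i y_i}{\sqrt{\left(\sum x_i^2\right)\left(\sum y_i^2\right) - \left(\sum x_i y_i\right)^2}}.
$$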


In [22]:
numerator = np.sqrt(n - 1) * np.sum(x * y)
denominator = np.sqrt(
    np.sum(x**2) * np.sum(y**2) - (np.sum(x * y))**2
)

t_formula = numerator / denominator
t_formula


16.898417063035094

In [31]:
from IPython.display import display, Markdown

display(Markdown(
    f"""
$t$ (from algebraic formula) $= {t_formula:.4f}$  

$t$ (from regression) $= {t_a:.4f}$  

**Match:** {np.isclose(t_formula, t_a)}
"""
))



$t$ (from algebraic formula) $= 16.8984$  

$t$ (from regression) $= 16.8984$  

**Match:** True


**Conclusion:**  
The algebraic expression for the $t$-statistic exactly matches the value obtained
from the regression output, confirming the theoretical result.


## (e) Show symmetry of the t-statistic


- (e) Using the results from (d), argue that the t-statistic for the regression
of y onto x is the same as the t-statistic for the regression
of x onto y.


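The expression for the $t$-statistic in (d) is symmetric in $x$ and $y$: swapping the two variables leaves both its numerator and its denominator unchanged. The $t$-statistic for the regression of $x$ onto $y$ must therefore equal the one for the regression of $y$ onto $x$.

We confirm this numerically by evaluating the formula with the roles of `x` and `y` swapped, using the same `x` and `y` as in the earlier parts.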

In [37]:
# Swap x and y
numerator_swapped = np.sqrt(n - 1) * np.sum(y * x)
denominator_swapped = np.sqrt(
    np.sum(y**2) * np.sum(x**2) - (np.sum(y * x))**2
)

t_swapped = numerator_swapped / denominator_swapped
t_swapped


16.898417063035094

In [45]:
from IPython.display import display, Markdown

display(Markdown(
    f"""
$t$ (from $y$ onto $x$) $= {t_formula:.4f}$  

$t$ (from $x$ onto $y$) $= {t_swapped:.4f}$  

Are they equal? **{np.isclose(t_formula, t_swapped)}**

**Conclusion:**  
Since the algebraic expression for the $t$-statistic is symmetric in $x$ and $y$,
swapping the roles of the predictor and response does not change the value of the
$t$-statistic.
"""
))



$t$ (from $y$ onto $x$) $= 16.8984$  

$t$ (from $x$ onto $y$) $= 16.8984$  

Are they equal? **True**

**Conclusion:**  
Since the algebraic expression for the $t$-statistic is symmetric in $x$ and $y$,
swapping the roles of the predictor and response does not change the value of the
$t$-statistic.


## (f) Regression WITH intercept

- (f) In Python, show that when regression is performed with an intercept,
the t-statistic for $H_0 : \beta_1 = 0$ is the same for the regression of y
onto x as it is for the regression of x onto y.

In [14]:
# y on x with intercept
model_yx = sm.OLS(y, sm.add_constant(x)).fit()
t_yx = model_yx.tvalues[1]

# x on y with intercept
model_xy = sm.OLS(x, sm.add_constant(y)).fit()
t_xy = model_xy.tvalues[1]

t_yx, t_xy


(16.734055202403045, 16.73405520240304)

In [41]:
from IPython.display import display, Markdown

display(Markdown(
    f"""
$t$-statistic for testing $H_0: \\beta_1 = 0$:

$t$ (from regression of $y$ onto $x$) $= {t_yx:.4f}$  

$t$ (from regression of $x$ onto $y$) $= {t_xy:.4f}$  

Are they equal? **{np.isclose(t_yx, t_xy)}**

**Conclusion:**  
When an intercept is included, the $t$-statistic for testing the slope depends only
on the sample correlation between $x$ and $y$ and on the sample size $n$. Since correlation is symmetric,
the $t$-statistic is the same regardless of which variable is treated as the response.
"""
))



$t$-statistic for testing $H_0: \beta_1 = 0$:

$t$ (from regression of $y$ onto $x$) $= 16.7341$  

$t$ (from regression of $x$ onto $y$) $= 16.7341$  

Are they equal? **True**

**Conclusion:**  
When an intercept is included, the $t$-statistic for testing the slope depends only
on the sample correlation between $x$ and $y$ and on the sample size $n$. Since correlation is symmetric,
the $t$-statistic is the same regardless of which variable is treated as the response.
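The link to the correlation can be checked directly. For simple regression with an intercept, the slope $t$-statistic satisfies $t = r\sqrt{n-2}/\sqrt{1-r^2}$, where $r$ is the sample correlation. The sketch below is an illustration only: it uses NumPy alone, and `slope_t_stat` is a hypothetical helper written for this check, not a `statsmodels` function.

```python
import numpy as np

def slope_t_stat(response, predictor):
    """t-statistic for the slope in a simple regression with an intercept."""
    n_obs = len(response)
    # Design matrix with a column of ones for the intercept
    X = np.column_stack([np.ones(n_obs), predictor])
    coef, *_ = np.linalg.lstsq(X, response, rcond=None)
    resid = response - X @ coef
    # Residual variance with n - 2 degrees of freedom (two fitted parameters)
    sigma2 = resid @ resid / (n_obs - 2)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef[1] / np.sqrt(cov[1, 1])

# Same data-generating code as in the question
rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2 * x + rng.normal(size=100)
n = len(x)

# t-statistic written purely in terms of the sample correlation and n
r = np.corrcoef(x, y)[0, 1]
t_from_r = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)

t_yx_manual = slope_t_stat(y, x)  # y regressed onto x
t_xy_manual = slope_t_stat(x, y)  # x regressed onto y

print(t_from_r, t_yx_manual, t_xy_manual)
```

All three printed values should coincide up to floating-point error, because $r$ and $n$ do not change when the roles of `x` and `y` are exchanged.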
