# ISLR-Python: Ch3 -- Applied Question 11

In [None]:
# perform standard imports
import numpy as np
import statsmodels.formula.api as smf
import pandas as pd

from matplotlib import pyplot as plt

%matplotlib inline

In this problem we will investigate the t-statistic for the null hypothesis $H_0 : \beta = 0$ in simple linear regression without an intercept. To begin, we generate a predictor x and a response y as follows.

In [None]:
np.random.seed(1)
x = np.random.normal(size=100)
y = 2*x + np.random.normal(size=100)
df = pd.DataFrame({'x':x, 'y':y})

fig, ax = plt.subplots(figsize=(8,6))
ax.scatter(df.x, df.y);

## (a) Perform a simple linear regression of y onto x, without an intercept.

Report the coefficient estimate $\hat{\beta}$, the standard error of this coefficient estimate, and the t-statistic and p-value associated with the null hypothesis $H_0 : \beta = 0$. Comment on these results. (You can perform regression without an intercept using the command `ols('y ~ x - 1')`.)

In [None]:
lm_fit = smf.ols('y ~ x - 1', df).fit()
lm_fit.summary()

The slope coefficient for the predictor x, fitted without an intercept, is 2.12. It has a high t-statistic and a low p-value (i.e. it is statistically significant), which is expected since we know the data were generated as ${Y=2X+\epsilon}$.

## (b) Now perform a simple linear regression of x onto y without an intercept.

Report the coefficient estimate, its standard error, and the corresponding t-statistic and p-value associated with the null hypothesis $H_0 : \beta = 0$. Comment on these results.
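
A minimal sketch of this fit, mirroring the call used in (a) with the roles of x and y swapped. It regenerates the data with the same seed so that it runs on its own; the name `lm_fit_xy` is introduced here for illustration and is not from the original notebook.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Recreate the data from the start of the notebook so this cell is self-contained
np.random.seed(1)
x = np.random.normal(size=100)
y = 2*x + np.random.normal(size=100)
df = pd.DataFrame({'x': x, 'y': y})

# Regress x onto y; the "- 1" term removes the intercept
lm_fit_xy = smf.ols('x ~ y - 1', df).fit()

# Report the quantities the question asks for
print('coefficient:', lm_fit_xy.params['y'])
print('std. error: ', lm_fit_xy.bse['y'])
print('t-statistic:', lm_fit_xy.tvalues['y'])
print('p-value:    ', lm_fit_xy.pvalues['y'])
```

Calling `lm_fit_xy.summary()` instead of the `print` lines would give the same table layout as in (a).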

## (c) What is the relationship between the results obtained in (a) and (b)?

## (d)

For the regression of Y onto X without an intercept, the t-statistic for $H_0 : \beta = 0$ takes the form $\hat{\beta}/SE(\hat{\beta})$, where $\hat{\beta}$ is given by (3.38), and where
$$
{SE\left(\hat{\beta}\right)}
= \sqrt{
    \frac{\sum_{i=1}^{n}{\left(y_i-x_i\hat{\beta}\right)^2}}{\left(n-1\right)\sum_{i=1}^{n}{x_i^2}}
}
$$
(These formulas are slightly different from those given in Sections 3.1.1 and 3.1.2, since here we are performing regression without an intercept.) Show algebraically, and confirm numerically, that the t-statistic can be written as
$$
t = \frac{(\sqrt{n-1})\sum_{i=1}^{n}x_i y_i}{\sqrt{(\sum_{i=1}^{n}x_i^2)(\sum_{i=1}^{n}y_i^2) - (\sum_{i=1}^{n}x_i y_i)^2}}
$$
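
One way to carry out the algebra, sketched here under the assumption that (3.38) is the no-intercept estimate $\hat{\beta} = \sum_{i=1}^{n}x_i y_i / \sum_{i=1}^{n}x_i^2$. First expand the residual sum of squares and substitute $\hat{\beta}$:
$$
\sum_{i=1}^{n}\left(y_i-x_i\hat{\beta}\right)^2
= \sum_{i=1}^{n}y_i^2 - 2\hat{\beta}\sum_{i=1}^{n}x_i y_i + \hat{\beta}^2\sum_{i=1}^{n}x_i^2
= \sum_{i=1}^{n}y_i^2 - \frac{\left(\sum_{i=1}^{n}x_i y_i\right)^2}{\sum_{i=1}^{n}x_i^2}
$$
Then divide $\hat{\beta}$ by $SE(\hat{\beta})$ and multiply the numerator and denominator by $\sqrt{\sum_{i=1}^{n}x_i^2}$:
$$
t = \frac{\hat{\beta}}{SE\left(\hat{\beta}\right)}
= \frac{\sum_{i=1}^{n}x_i y_i}{\sum_{i=1}^{n}x_i^2}
\cdot \frac{\sqrt{(n-1)\sum_{i=1}^{n}x_i^2}}{\sqrt{\sum_{i=1}^{n}y_i^2 - \frac{\left(\sum_{i=1}^{n}x_i y_i\right)^2}{\sum_{i=1}^{n}x_i^2}}}
= \frac{(\sqrt{n-1})\sum_{i=1}^{n}x_i y_i}{\sqrt{(\sum_{i=1}^{n}x_i^2)(\sum_{i=1}^{n}y_i^2) - (\sum_{i=1}^{n}x_i y_i)^2}}
$$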

Numerically:
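
A minimal sketch of the numerical check, using only NumPy and the same seed as above. It computes the t-statistic two ways: as $\hat{\beta}/SE(\hat{\beta})$ and from the closed-form expression. The variable names (`t_ratio`, `t_closed`) are illustrative choices.

```python
import numpy as np

# Same data as at the start of the notebook
np.random.seed(1)
x = np.random.normal(size=100)
y = 2*x + np.random.normal(size=100)
n = len(x)

# Route 1: beta_hat / SE(beta_hat), using the no-intercept formulas
beta_hat = np.sum(x*y) / np.sum(x**2)
se_beta = np.sqrt(np.sum((y - x*beta_hat)**2) / ((n - 1)*np.sum(x**2)))
t_ratio = beta_hat / se_beta

# Route 2: the closed-form expression for t given in (d)
t_closed = (np.sqrt(n - 1)*np.sum(x*y)
            / np.sqrt(np.sum(x**2)*np.sum(y**2) - np.sum(x*y)**2))

print('t from beta/SE:    ', t_ratio)
print('t from closed form:', t_closed)
print('agree:', np.isclose(t_ratio, t_closed))
```

Both values should also match the t-statistic reported by `lm_fit.summary()` in (a), up to display rounding.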

## (e) Using the results from (d), argue that the t-statistic for the regression of y onto x is the same as the t-statistic for the regression of x onto y.
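
The expression for t in (d) is unchanged when $x_i$ and $y_i$ are swapped: the numerator contains only $\sum x_i y_i$, and the denominator contains only the product $(\sum x_i^2)(\sum y_i^2)$ and $(\sum x_i y_i)^2$. The sketch below illustrates this symmetry; `t_no_intercept` is a hypothetical helper written for this note, not a library function.

```python
import numpy as np

def t_no_intercept(a, b):
    """t-statistic for regressing b onto a without an intercept (formula from (d))."""
    n = len(a)
    s_ab = np.sum(a*b)
    return np.sqrt(n - 1)*s_ab / np.sqrt(np.sum(a**2)*np.sum(b**2) - s_ab**2)

# Same data as at the start of the notebook
np.random.seed(1)
x = np.random.normal(size=100)
y = 2*x + np.random.normal(size=100)

t_y_on_x = t_no_intercept(x, y)  # y regressed onto x
t_x_on_y = t_no_intercept(y, x)  # x regressed onto y

print('t (y onto x):', t_y_on_x)
print('t (x onto y):', t_x_on_y)
print('agree:', np.isclose(t_y_on_x, t_x_on_y))
```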

## (f) In Python, show that when regression is performed with an intercept, the t-statistic for $H_0 : \beta_1 = 0$ is the same for the regression of y onto x as it is for the regression of x onto y.
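
A minimal sketch of this comparison, assuming the same data as above. Omitting the `- 1` term from the formula lets statsmodels include an intercept by default; the names `fit_yx` and `fit_xy` are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Same data as at the start of the notebook
np.random.seed(1)
x = np.random.normal(size=100)
y = 2*x + np.random.normal(size=100)
df = pd.DataFrame({'x': x, 'y': y})

# Fit both directions with an intercept (the formula default)
fit_yx = smf.ols('y ~ x', df).fit()
fit_xy = smf.ols('x ~ y', df).fit()

# Compare the t-statistics of the slope terms
print('t (y onto x):', fit_yx.tvalues['x'])
print('t (x onto y):', fit_xy.tvalues['y'])
print('agree:', np.isclose(fit_yx.tvalues['x'], fit_xy.tvalues['y']))
```

The intercept t-statistics generally differ between the two fits; only the slope t-statistics are compared here.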