# Exercises
## Conceptual

### 1.
Describe the null hypotheses to which the p-values given in Table 3.4
correspond. Explain what conclusions you can draw based on these
p-values. Your explanation should be phrased in terms of sales, TV,
radio, and newspaper, rather than in terms of the coefficients of the
linear model.

#### Answer:
Each null hypothesis states that the corresponding advertising budget (`TV`, `radio` or `newspaper`) does *not* have an effect on sales when the other two budgets are held fixed, i.e.

$H_0^{(1)} : \beta_1 = 0 \qquad H_0^{(2)}: \beta_2 = 0 \qquad H_0^{(3)}: \beta_3 = 0 $

The p-values associated with $\beta_1, \beta_2$  and $\beta_3$ are $<0.0001, <0.0001$ and $0.8599$.

The p-value is the probability of observing an association between the predictor and the response at least as strong as the one in the data purely by chance, in the absence of any real association (i.e. under the null hypothesis).

The p-values for $\beta_1 $ and $\beta_2 $ are very small, meaning that we can reject the corresponding null hypotheses and say that there *is* a statistically significant relationship between `TV` and `Sales`, and between `radio` and `Sales`.

The p-value for $\beta_3 $ is large ($0.8599$), so we cannot reject the null hypothesis: there is no evidence of a relationship between `newspaper` and `Sales` once `TV` and `radio` are accounted for.
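The way these p-values arise can be sketched on simulated data. The block below is only an illustration: the budgets, coefficients and noise level are made up (this is not the Advertising data set), `newspaper` is given a true coefficient of zero, and the p-values use a normal approximation to the t distribution.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic budgets (NOT the Advertising data); newspaper has no true effect.
tv = rng.uniform(0, 300, n)
radio = rng.uniform(0, 50, n)
newspaper = rng.uniform(0, 100, n)
sales = 3.0 + 0.05 * tv + 0.2 * radio + 0.0 * newspaper + rng.normal(0, 1.5, n)

# Least squares fit with an intercept column.
X = np.column_stack([np.ones(n), tv, radio, newspaper])
beta_hat, *_ = np.linalg.lstsq(X, sales, rcond=None)

# Standard errors from the residual variance and (X^T X)^{-1}.
residuals = sales - X @ beta_hat
sigma2_hat = residuals @ residuals / (n - X.shape[1])
std_err = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))

# t-statistics and two-sided p-values (normal approximation, assumed adequate for n = 200).
t_stats = beta_hat / std_err
p_values = np.array([math.erfc(abs(t) / math.sqrt(2)) for t in t_stats])

for name, p in zip(["intercept", "TV", "radio", "newspaper"], p_values):
    print(f"{name:>10}: p = {p:.4g}")
```

In this setup the p-values for `TV` and `radio` come out very small, while the p-value for `newspaper` is not systematically small because its true coefficient is zero; its exact value depends on the random draw.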

-----

### 2.
Carefully explain the differences between the KNN classifier and KNN
regression methods.

#### Answer:
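The KNN classifier is used when the response is qualitative. It estimates the conditional probability $P(Y=j\,|\,X=x_0)$ for class $j$ as the fraction of points in the neighbourhood of $x_0$ which have class $j$, and then assigns $x_0$ to the class with the largest estimated probability.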

KNN regression is used when the response is quantitative, to predict a value for $f(x_0)$. A neighbourhood of the $K$ nearest training points around $x_0$ is identified and the average of all responses in the neighbourhood is taken as the estimate of $f(x_0)$.
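The two methods share the neighbour search and differ only in how the neighbours' responses are combined. A minimal sketch with NumPy, using toy data and hypothetical helper names (`k_nearest`, `knn_classify`, `knn_regress`) that are not from the text:

```python
import numpy as np

def k_nearest(x_train, x0, k):
    """Indices of the k training points closest to x0 (Euclidean distance)."""
    dists = np.linalg.norm(x_train - x0, axis=1)
    return np.argsort(dists)[:k]

def knn_classify(x_train, y_train, x0, k):
    """Estimate P(Y=j | X=x0) as the fraction of neighbours in class j,
    then return the most probable class together with the estimates."""
    idx = k_nearest(x_train, x0, k)
    classes, counts = np.unique(y_train[idx], return_counts=True)
    probs = dict(zip(classes.tolist(), (counts / k).tolist()))
    return max(probs, key=probs.get), probs

def knn_regress(x_train, y_train, x0, k):
    """Estimate f(x0) as the average response of the k nearest neighbours."""
    idx = k_nearest(x_train, x0, k)
    return float(np.mean(y_train[idx]))

# Toy data: one predictor, a class label and a numeric response per point.
x_train = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])
labels = np.array([0, 0, 1, 1, 1])
values = np.array([1.0, 2.0, 3.0, 4.0, 20.0])
x0 = np.array([1.2])

label, probs = knn_classify(x_train, labels, x0, k=3)
estimate = knn_regress(x_train, values, x0, k=3)
print(label, probs, estimate)
```

With $K = 3$ the neighbours of $x_0 = 1.2$ are the points at 1, 2 and 0, so the classifier returns class 0 with estimated probability 2/3, and the regression estimate is the mean of their responses, 2.0.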

----

### 3.
Suppose we have a data set with five predictors, $X_1$ = GPA, $X_2$ = IQ,
$X_3$ = Gender (1 for Female and 0 for Male), $X_4$ = Interaction between
GPA and IQ, and $X_5$ = Interaction between GPA and Gender. The
response is starting salary after graduation (in thousands of dollars).
Suppose we use least squares to fit the model, and get $\hat\beta_0 = 50$, $\hat\beta_1 = 20$, $\hat\beta_2 = 0.07$, $\hat\beta_3 = 35$, $\hat\beta_4 = 0.01$, $\hat\beta_5 = -10$.

1. Which answer is correct, and why?
  1. For a fixed value of IQ and GPA, males earn more on average than females.
  2. For a fixed value of IQ and GPA, females earn more on average than males.
  3. For a fixed value of IQ and GPA, males earn more on average than females provided that the GPA is high enough.
  4. For a fixed value of IQ and GPA, females earn more on average than males provided that the GPA is high enough.
2. Predict the salary of a female with IQ of 110 and a GPA of 4.0.
3. True or false: Since the coefficient for the GPA/IQ interaction term is very small, there is very little evidence of an interaction effect. Justify your answer.

#### Answer
The model is:

Salary = $50$ + $20$ x GPA + $0.07$ x IQ + $35$ x Female + $0.01$ x (GPA x IQ) - $10$ x (GPA x Female)

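1. Option 3 is correct.  
For a fixed IQ and GPA, the model predicts that females earn $35 - 10 \times \text{GPA}$ more than males. This difference is negative when GPA is above 3.5, so the negative coefficient on GPA x Female outweighs the positive coefficient on Female and males earn more on average, provided that the GPA is high enough.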

2. IQ = 110, GPA = 4.0  
Salary = 50 + 20x4.0 + 0.07x110 + 35 + 0.01 x 4.0 x 110 - 40  
Salary = $137,100

3. False  
A small coefficient does not by itself mean weak evidence of an interaction. The evidence is measured by the p-value, which depends on the size of the coefficient relative to its standard error, and we are not given the standard error.  
The size of the coefficient also depends on the scale of the predictor: GPA x IQ takes values in the hundreds, so a coefficient of 0.01 still changes the predicted salary by several thousand dollars (4.4 thousand in part 2).

Check of the part 2 prediction (in thousands of dollars):

    50 + (20*4.0) + (0.07*110) + 35 + (0.01*4.0*110) - 40

    137.1
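The answer to part 1 can be checked in the same way. The sketch below wraps the fitted model in a function and evaluates the female minus male difference at a few GPA values; the function names and the default IQ of 100 are illustrative choices, not part of the exercise.

```python
def predicted_salary(gpa, iq, female):
    """Fitted model from the exercise; salary in thousands of dollars."""
    return (50 + 20 * gpa + 0.07 * iq + 35 * female
            + 0.01 * gpa * iq - 10 * gpa * female)

def gender_gap(gpa, iq=100):
    """Female minus male prediction at the same GPA and IQ (equals 35 - 10 * GPA)."""
    return predicted_salary(gpa, iq, 1) - predicted_salary(gpa, iq, 0)

for gpa in (3.0, 3.5, 4.0):
    print(gpa, round(gender_gap(gpa), 6))
```

The gap is positive below a GPA of 3.5, zero at 3.5 (up to floating point rounding) and negative above it, and it does not depend on IQ.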

### 4.
I collect a set of data (n = 100 observations) containing a single
predictor and a quantitative response. I then fit a linear regression
model to the data, as well as a separate cubic regression,  
i.e. Y = $\beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \epsilon$

1. Suppose that the true relationship between X and Y is linear,
i.e. $Y = \beta_0 + \beta_1 X + \epsilon$ . Consider the training residual sum of
squares (RSS) for the linear regression, and also the training
RSS for the cubic regression. Would we expect one to be lower
than the other, would we expect them to be the same, or is there
not enough information to tell? Justify your answer.
2. Answer 1. using test rather than training RSS.
3. Suppose that the true relationship between X and Y is not linear,
but we don’t know how far it is from linear. Consider the training
RSS for the linear regression, and also the training RSS for the
cubic regression. Would we expect one to be lower than the
other, would we expect them to be the same, or is there not
enough information to tell? Justify your answer.
4. Answer 3. using test rather than training RSS.

#### Answer
1. The training RSS would be lower for the cubic model (or at worst equal). The linear model is a special case of the cubic model, so the extra terms can only reduce the training RSS, which they do by fitting some of the noise in the training sample.
2. We would expect the test RSS to be lower for the linear model. Since the true relationship is linear, the cubic model overfits the training data, which leads to a larger RSS on the test data.
3. We would still expect the training RSS to be lower for the cubic model. Adding higher-order terms to the model allows it to fit the training data better, whether the apparent non-linearity is caused by noise in the training data or by non-linearity in the true relationship.
4. There is not enough information to tell. If the true relationship is close to cubic, then the cubic model will fit the test data more accurately than the linear model. If not, both models may fit the test data equally badly, and if the relationship is close to linear the linear model may even have the lower test RSS.
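Parts 1 and 2 can be illustrated with a small simulation. This is a sketch under assumed settings: a true linear relationship with made-up coefficients, standard normal noise, and a fixed random seed. The training comparison holds for any sample, while the test comparison is only expected on average and can go either way for a single draw.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n=100, sigma=1.0):
    """Draw n observations from an assumed true linear model y = 1 + 2x + noise."""
    x = rng.uniform(-2, 2, size=n)
    y = 1.0 + 2.0 * x + rng.normal(0, sigma, size=n)
    return x, y

def rss(coefs, x, y):
    """Residual sum of squares of a fitted polynomial on the data (x, y)."""
    return float(np.sum((y - np.polyval(coefs, x)) ** 2))

x_train, y_train = simulate()
x_test, y_test = simulate()

# Least squares fits of degree 1 (linear) and degree 3 (cubic).
linear = np.polyfit(x_train, y_train, deg=1)
cubic = np.polyfit(x_train, y_train, deg=3)

train_rss = {"linear": rss(linear, x_train, y_train),
             "cubic": rss(cubic, x_train, y_train)}
test_rss = {"linear": rss(linear, x_test, y_test),
            "cubic": rss(cubic, x_test, y_test)}

print("training RSS:", train_rss)
print("test RSS:    ", test_rss)
```

Because the linear fit is a cubic fit with the two highest-order coefficients set to zero, the cubic training RSS can never exceed the linear training RSS.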

### 5.
Consider the fitted values that result from performing linear regression
without an intercept. In this setting, the $i$th fitted value takes
the form  
$$\hat{y}_i = x_i\hat\beta$$
where
$$ \hat\beta = \left(\sum_{i=1}^n x_iy_i\right) / \left(\sum_{i'=1}^n x_{i'}^2\right)$$
Show that we can write:
$$ \hat{y}_i = \sum_{i'=1}^n a_{i'}y_{i'} $$

#### Answer:
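$$\hat{y}_i = x_i\hat\beta = x_i\frac{\sum_j x_j y_j}{\sum_k x_k^2}  = \sum_j \frac{x_i x_j}{\sum_k x_k^2}y_j = \sum_j a_j y_j$$
where $a_j = x_i x_j / \sum_k x_k^2$, so each fitted value is a linear combination of the responses.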
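A quick numerical check of this identity, as a sketch on arbitrary random data (the sample size and seed are illustrative). The matrix `A` holds the weights, with entry $(i, j)$ equal to $x_i x_j / \sum_k x_k^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=8)
y = rng.normal(size=8)

# Fitted values from regression through the origin.
beta_hat = np.sum(x * y) / np.sum(x ** 2)
fitted = x * beta_hat

# The same fitted values written as a weighted sum of the responses:
# A[i, j] = x_i * x_j / sum_k x_k^2, so fitted_i = sum_j A[i, j] * y_j.
A = np.outer(x, x) / np.sum(x ** 2)
fitted_from_weights = A @ y

print(np.allclose(fitted, fitted_from_weights))
```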

### 6.
Using (3.4), argue that in the case of simple linear regression, the
least squares line always passes through the point ($\bar{x}, \bar{y}$).

#### Answer
The least squares line is
$$ \hat{y} = \hat\beta_0 + \hat\beta_1 x $$
where, from (3.4), $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$.

Therefore
$$ \hat{y} = \bar{y} - \hat\beta_1 \bar{x} + \hat\beta_1 x $$

So if $x = \bar{x}$,
$$ \hat{y} = \bar{y} - \hat\beta_1 \bar{x} + \hat\beta_1 \bar{x} = \bar{y} $$

Therefore we conclude that the line passes through the point $(\bar{x}, \bar{y})$.
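The same conclusion can be checked numerically. The sketch below computes the least squares estimates from the formulas in (3.4) on simulated data (the data-generating values are arbitrary) and evaluates the fitted line at $\bar{x}$.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=50)
y = 3.0 - 0.5 * x + rng.normal(size=50)

x_bar, y_bar = x.mean(), y.mean()

# Least squares estimates for simple linear regression, as in (3.4).
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0_hat = y_bar - beta1_hat * x_bar

# The fitted line evaluated at the mean of x.
prediction_at_mean = beta0_hat + beta1_hat * x_bar

print(np.isclose(prediction_at_mean, y_bar))
```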

### 7.
It is claimed in the text that in the case of simple linear regression
of Y onto X, the $R^2$ statistic (3.17) is equal to the square of the
correlation between X and Y (3.18). Prove that this is the case. For
simplicity, you may assume that $\bar{x} = \bar{y} = 0$.

#### Answer
$$ R^2 = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum_i(y_i - \hat{y}_i)^2}{\sum_i(y_i - \bar{y})^2}$$
Using $\bar{x} = \bar{y} = 0$, we have:
$$ R^2 = 1 - \frac{\sum_i(y_i - \hat{y}_i)^2}{\sum_i(y_i)^2} $$
and
$$ \hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i = \bar{y} - \hat\beta_1 \bar{x} + \hat\beta_1 x_i = \hat\beta_1 x_i = x_i\frac{\sum_j x_j y_j}{\sum_k x_k^2} $$


Substituting into $R^2$ and expanding:

$$ R^2 = 1 - \frac{\sum_i(y_i - [x_i\sum_j x_j y_j / \sum_k x_k^2])^2}{\sum_i y_i^2} $$

$$ R^2 = \frac{\sum_i y_i^2 - \left(\sum_i y_i^2 - 2 \sum_i y_i x_i \sum_j x_j y_j / \sum_k x_k^2 + \sum_i x_i^2 (\sum_j x_j y_j)^2 / (\sum_k x_k^2)^2\right)}{\sum_i y_i^2} $$

$$ R^2 = \frac{2 (\sum_i x_i y_i)^2 / \sum_k x_k^2 - (\sum_j x_j y_j)^2 / \sum_k x_k^2}{\sum_i y_i^2} $$

$$ R^2 = \frac{(\sum_i x_i y_i)^2}{\sum_i x_i^2 \sum_i y_i^2}  \equiv Corr(X,Y)^2$$
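
The result can be verified numerically. The sketch below uses simulated data (arbitrary slope, sample size and seed), centres both variables so that $\bar{x} = \bar{y} = 0$ as assumed in the proof, and compares $R^2$ with the squared sample correlation from `np.corrcoef`.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=200)
y = 0.8 * x + rng.normal(size=200)

# Centre both variables so that x_bar = y_bar = 0.
x = x - x.mean()
y = y - y.mean()

# With centred data the intercept is zero and the slope is sum(xy) / sum(x^2).
beta1_hat = np.sum(x * y) / np.sum(x ** 2)

rss = np.sum((y - beta1_hat * x) ** 2)
tss = np.sum(y ** 2)
r_squared = 1 - rss / tss

corr = np.corrcoef(x, y)[0, 1]

print(np.isclose(r_squared, corr ** 2))
```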