In [ ]:
options(repr.plot.width  = 6,
        repr.plot.height = 6)

In [ ]:
library(gamair)
data(hubble)
ls()

In [ ]:
?lm

In [ ]:
hub.mod = lm(y~x-1, data=hubble)
summary(hub.mod)

In [ ]:
print(fitted(hub.mod))
plot(fitted(hub.mod),residuals(hub.mod),xlab="fitted values",
ylab="residuals")

Two points, observations 3 and 15, have unusually large residuals. Let's remove them and refit to see how much the estimate changes.

In [ ]:
hub.mod1 <- lm(y~x-1,data=hubble[-c(3,15),])
summary(hub.mod1)

The Hubble constant estimates have units of $(km)s^{-1} (Mpc)^{-1}$.  
A mega-parsec is $3.09 \times 10^{19}$ km, so we need to divide $\hat{\beta}$ by this amount in order to obtain Hubble's constant with units of $s^{-1}$.  
The approximate age of the universe, in seconds, is then given by $\hat{\beta}^{-1}$, the reciprocal of the rescaled estimate.

Here are the two possible estimates expressed in years:

In [ ]:
print(c(coef(hub.mod),coef(hub.mod1)))
hubble.const <- c(coef(hub.mod),coef(hub.mod1))/3.09e19
age <- 1/hubble.const
age/(60^2*24*365)
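
The unit conversion can also be wrapped in a small function. This is a minimal sketch: the name `hubble_age_years` is hypothetical, the input value 100 is purely illustrative and not an estimate from the data, and a 365-day year is assumed, as in the cell above.

```r
# Hypothetical helper: convert a Hubble constant in km s^-1 Mpc^-1
# into an approximate age of the universe in years.
km_per_mpc   <- 3.09e19            # kilometres in one mega-parsec (value from the text)
sec_per_year <- 60^2 * 24 * 365    # seconds in a 365-day year

hubble_age_years <- function(beta_hat) {
  h_per_sec <- beta_hat / km_per_mpc   # Hubble constant in s^-1
  age_sec   <- 1 / h_per_sec           # age in seconds
  age_sec / sec_per_year               # age in years
}

# Illustrative input only, not a fitted value
hubble_age_years(100)
```

With the fitted models, `hubble_age_years(coef(hub.mod))` should reproduce the first entry printed above. A larger Hubble constant gives a younger universe, since the age is its reciprocal.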

Let's add a distributional assumption.  
Let $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$ independently for all $i$.  
That is, $Y_i \sim \mathcal{N}(x_i \beta, \sigma^2)$.

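Also, the least squares estimator for this no-intercept model, $\hat{\beta} = \sum x_i Y_i / \sum x_i^2$, is a linear combination of the $Y_i$.

Hence
$$
\hat{\beta} \sim \mathcal{N}\left( \beta, \left(\sum x_i^2\right)^{-1} \sigma^2 \right)
$$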
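
The sampling distribution of $\hat{\beta}$ can be checked by simulation. The sketch below is not based on the Hubble data: the values of `x`, `beta` and `sigma` are arbitrary assumptions chosen for illustration. It repeatedly simulates $Y_i \sim \mathcal{N}(x_i \beta, \sigma^2)$, computes the no-intercept least squares estimate $\sum x_i y_i / \sum x_i^2$, and compares the empirical variance of the estimates with $\sigma^2 / \sum x_i^2$.

```r
set.seed(1)

# Assumed toy settings (not taken from the hubble data)
x     <- c(1, 2, 3, 4, 5)
beta  <- 2
sigma <- 0.5

# Simulate many datasets and compute the no-intercept estimate for each
beta_hat <- replicate(5000, {
  y <- x * beta + rnorm(length(x), sd = sigma)
  sum(x * y) / sum(x^2)
})

sim_mean   <- mean(beta_hat)        # should be close to beta
sim_var    <- var(beta_hat)         # should be close to theory_var
theory_var <- sigma^2 / sum(x^2)    # variance implied by the normal model

c(mean = sim_mean, var = sim_var, theory = theory_var)
```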

# Practical linear models

In [ ]:
data(sperm.comp1)

In [ ]:
pairs(sperm.comp1[,-1])


Following Baker & Bellis, a reasonable model would be
$$
y_i = \beta_0 + t_i \beta_1 + p_i \beta_2 + \epsilon_i
$$
where
* $y_i$: sperm count (count)
* $t_i$: time spent since last copulation (time.ipc)
* $p_i$: proportion of time, since last copulation, that the pair have spent together (prop.partner)
* $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$

In [ ]:
sc.mod1 <- lm(count ~ time.ipc+prop.partner,sperm.comp1)
summary(sc.mod1)

In [ ]:
model.matrix(sc.mod1)
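
The model matrix $X$ is what `lm` actually fits against. As a sketch on simulated data (the data frame `toy` and its columns `time`, `prop` and `y` are assumptions made up for illustration, not the sperm competition data), the coefficients reported by `lm` should agree with the solution of the normal equations $X^T X \beta = X^T y$:

```r
set.seed(2)

# Simulated stand-in data with two predictors
n   <- 20
toy <- data.frame(time = runif(n), prop = runif(n))
toy$y <- 1 + 2 * toy$time - 3 * toy$prop + rnorm(n, sd = 0.1)

fit <- lm(y ~ time + prop, data = toy)
X   <- model.matrix(fit)   # columns: (Intercept), time, prop

# Solve the normal equations X'X b = X'y directly
beta_normal <- solve(crossprod(X), crossprod(X, toy$y))

cbind(lm = coef(fit), normal_eq = drop(beta_normal))
```

`lm` itself uses a QR decomposition rather than the normal equations, which is numerically more stable. The direct solve is shown only to make the role of $X$ explicit.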

Standardized residuals: the residuals scaled by dividing each of them by its estimated standard deviation.

In [ ]:
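res1 <- residuals(sc.mod1)
# Crude scale: one common standard deviation for all raw residuals
std_dev <- sd(res1)
print(std_dev)
# Standardized residuals: each residual divided by its own estimated
# standard deviation, sigma_hat * sqrt(1 - h_ii)
res_std <- rstandard(sc.mod1)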
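
Strictly, each residual has its own standard deviation, $\hat{\sigma}\sqrt{1 - h_{ii}}$, where $h_{ii}$ is the leverage of observation $i$. The sketch below uses simulated data (the data frame `toy` is an assumption for illustration) to show this scaling done by hand and compared with R's built-in `rstandard`:

```r
set.seed(3)

# Simulated stand-in data
toy   <- data.frame(x = 1:12)
toy$y <- 3 + 0.5 * toy$x + rnorm(12)

fit <- lm(y ~ x, data = toy)

sigma_hat <- summary(fit)$sigma   # residual standard error
h         <- hatvalues(fit)       # leverages h_ii

# Each residual divided by its estimated standard deviation
manual <- residuals(fit) / (sigma_hat * sqrt(1 - h))

all.equal(manual, rstandard(fit))
```

Dividing by a single overall standard deviation ignores the leverage term, so it understates the standardized residual at high-leverage points.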

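Cook's distance

$$
d_k = \frac{1}{(p+1)\hat{\sigma}^2} \sum_{i=1}^n
\left(
  \hat{\mu}_i^{[k]} - \hat{\mu}_i
\right)^2
$$
where $\hat{\mu}_i^{[k]}$ is the $i$-th fitted value when the $k$-th observation is omitted from the fit, $p+1$ is the number of model coefficients and $\hat{\sigma}^2$ is the estimated residual variance.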

In [ ]:
fitted.values(sc.mod1)

In [ ]:
fitted.values(sc.mod1)[-5]

In [ ]:
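p <- 2  # number of predictors; the model has p + 1 coefficients
fv_all <- fitted.values(sc.mod1)
sigma2_hat <- summary(sc.mod1)$sigma^2  # estimated residual variance
cook_number1 <- function(k) {
    # refit without observation k, then predict for ALL observations
    mod_k <- lm(count ~ time.ipc + prop.partner, sperm.comp1[-k, ])
    fv_k <- predict(mod_k, newdata = sperm.comp1)
    sum((fv_k - fv_all)^2) / ((p + 1) * sigma2_hat)
}
cook_distances <- sapply(seq_len(nrow(sperm.comp1)), cook_number1)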
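
As a cross-check of the definition, the leave-one-out computation can be compared with R's built-in `cooks.distance`. This is a sketch on simulated data (the data frame `toy` and its columns are assumptions for illustration). The sum runs over all $n$ observations, including the omitted one, and the denominator uses the residual variance estimate from the full fit.

```r
set.seed(4)

# Simulated stand-in data with two predictors
toy   <- data.frame(x1 = runif(15), x2 = runif(15))
toy$y <- 1 + 2 * toy$x1 - toy$x2 + rnorm(15, sd = 0.3)

fit        <- lm(y ~ x1 + x2, data = toy)
n_coef     <- length(coef(fit))       # p + 1 in the formula
sigma2_hat <- summary(fit)$sigma^2    # residual variance, full fit
mu_hat     <- fitted(fit)

cook_manual <- sapply(seq_len(nrow(toy)), function(k) {
  fit_k    <- lm(y ~ x1 + x2, data = toy[-k, ])   # refit without row k
  mu_hat_k <- predict(fit_k, newdata = toy)       # predict all n rows
  sum((mu_hat_k - mu_hat)^2) / (n_coef * sigma2_hat)
})

all.equal(cook_manual, unname(cooks.distance(fit)))
```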

In [ ]:
par(mfcol=c(2,2))
plot(fitted.values(sc.mod1), residuals.lm(sc.mod1))
plot(fitted.values(sc.mod1), sqrt(abs(res_std)))
plot(sort(res_std))
plot(cook_distances)