# How to Calculate Standardized Residuals in R

Let $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ be the prediction for $Y$ based on the _i_th value of $X$. Then $e_i = y_i - \hat{y}_i$ is the _i_th _residual_: the difference between the _i_th observed response value and the _i_th response value predicted by our linear model. We define the _residual sum of squares_ (RSS) as
\begin{align}
\text{RSS} = e^2_1 + e^2_2 + \cdots + e^2_n\text{,}
\end{align}
or equivalently as
\begin{align}\tag{3.3}
\text{RSS} = (y_1 - \hat{\beta}_0 - \hat{\beta}_1 x_1)^2 + (y_2 - \hat{\beta}_0 - \hat{\beta}_1 x_2)^2 + \cdots + (y_n - \hat{\beta}_0 - \hat{\beta}_1 x_n)^2\text{.}
\end{align}
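
As a quick numerical check, the sketch below computes the RSS both ways: from the residuals and from equation (3.3) written out with the fitted coefficients. It assumes the same twelve (x, y) pairs used in this notebook's example; the names `fit`, `rss_resid`, and `rss_coef` are illustrative only.

```r
# Example data (same values as this notebook's data frame)
x <- c(8, 12, 12, 13, 14, 16, 17, 22, 24, 26, 29, 30)
y <- c(41, 42, 39, 37, 35, 39, 45, 46, 39, 49, 55, 57)

# Fit the simple linear model y = b0 + b1 * x
fit <- lm(y ~ x)
b0 <- coef(fit)[["(Intercept)"]]
b1 <- coef(fit)[["x"]]

# Form 1: sum of squared residuals e_i = y_i - y_hat_i
rss_resid <- sum(residuals(fit)^2)

# Form 2: equation (3.3), using the fitted coefficients directly
rss_coef <- sum((y - b0 - b1 * x)^2)

# The two forms should agree up to floating-point error
all.equal(rss_resid, rss_coef)
```

For a fitted `lm` object, `deviance(fit)` should return the same quantity, which is a convenient cross-check.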

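The standardized residual of the _i_th observation is the residual divided by an estimate of its standard deviation, where $\text{RSE} = \sqrt{\text{RSS}/(n-2)}$ is the residual standard error and $h_{ii}$ is the leverage of the _i_th observation: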
\begin{align}
r_i = \frac{e_i}{s(e_i)} = \frac{e_i}{\text{RSE}\sqrt{1-h_{ii}}}
\end{align}
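
The formula can be checked by hand against R's built-in `rstandard()`. This is a minimal sketch, assuming the same twelve (x, y) pairs used in this notebook's example, with the RSE computed as $\sqrt{\text{RSS}/(n-2)}$ and the leverages $h_{ii}$ taken from `hatvalues()`; the variable names are illustrative.

```r
# Example data (same values as this notebook's data frame)
x <- c(8, 12, 12, 13, 14, 16, 17, 22, 24, 26, 29, 30)
y <- c(41, 42, 39, 37, 35, 39, 45, 46, 39, 49, 55, 57)
fit <- lm(y ~ x)

e   <- residuals(fit)                      # raw residuals e_i
h   <- hatvalues(fit)                      # leverages h_ii
rse <- sqrt(sum(e^2) / df.residual(fit))   # RSE = sqrt(RSS / (n - 2))

# Standardized residuals computed directly from the formula
r_manual <- e / (rse * sqrt(1 - h))

# Should match the built-in result up to floating-point error
all.equal(r_manual, rstandard(fit))
```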

As a rule of thumb, an observation whose standardized residual has an absolute value greater than $3$ is flagged as a potential outlier.
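
One way to apply this cutoff in practice is sketched below. `flag_outliers()` is a hypothetical helper written for illustration, not a base R function, and the data are assumed to be the same twelve (x, y) pairs used in this notebook's example.

```r
# Hypothetical helper (not part of base R): return the standardized
# residuals whose absolute value exceeds the chosen threshold
flag_outliers <- function(fit, threshold = 3) {
  r <- rstandard(fit)
  r[abs(r) > threshold]
}

# Example data (same values as this notebook's data frame)
x <- c(8, 12, 12, 13, 14, 16, 17, 22, 24, 26, 29, 30)
y <- c(41, 42, 39, 37, 35, 39, 45, 46, 39, 49, 55, 57)
fit <- lm(y ~ x)

flag_outliers(fit)                 # default cutoff of 3
flag_outliers(fit, threshold = 2)  # a stricter cutoff, for comparison
```

The result is a named vector, so the names identify the rows of the data that were flagged. An empty result means no observation exceeds the cutoff.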

In [3]:
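# Example data: 12 observations of a predictor x and a response y
data <- data.frame(x = c(8, 12, 12, 13, 14, 16, 17, 22, 24, 26, 29, 30),
                   y = c(41, 42, 39, 37, 35, 39, 45, 46, 39, 49, 55, 57))

# Fit a simple linear regression of y on x
lm.fit <- lm(y ~ x, data = data)

# Standardized residuals: e_i / (RSE * sqrt(1 - h_ii))
standard_res <- rstandard(lm.fit)

# Attach the residuals to the data and sort from largest to smallest
rs_data <- cbind(data, standard_res)
rs_data[order(-rs_data$standard_res), ]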

| Row | x | y | standard_res |
|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<dbl>` |
| 1 | 8 | 41 | 1.40517322 |
| 12 | 30 | 57 | 1.26973888 |
| 11 | 29 | 55 | 0.91057211 |
| 2 | 12 | 42 | 0.81017562 |
| 7 | 17 | 45 | 0.59610905 |
| 3 | 12 | 39 | 0.07491009 |
| 8 | 22 | 46 | -0.05876884 |
| 10 | 26 | 49 | -0.066556 |
| 4 | 13 | 37 | -0.59323342 |
| 6 | 16 | 39 | -0.64248883 |


In [None]:
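plot(rs_data$x, rs_data$standard_res, xlab = "x", ylab = "Standardized residual")
abline(h = 0, lty = 2)  # dashed reference line at zero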