---
title: "Simple Linear Regression, Sum of Squares, and R Squared"
author: "Forest Kingfisher"
date: "`r Sys.Date()`"
output: html_document
---
## Exercise 2.12, taken from Montgomery, Douglas C., et al. <u>Introduction to Linear Regression Analysis</u>, John Wiley & Sons, Incorporated, 2012.
The number of pounds of steam used per month at a plant is thought to be related to the average monthly ambient temperature. The past year’s usages and temperatures follow.
```{r, echo=TRUE}
#Import steam data by month
dat212 <- read.csv("https://raw.githubusercontent.com/forestwhite/Regression/main/data/212Table.csv")
# A byte-order mark at the start of the file can end up in the first column name; rename it if so
names(dat212)[names(dat212) == "ï..Month"] <- "Month"
dat212
```
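The rename of `ï..Month` suggests that the CSV begins with a UTF-8 byte-order mark (BOM), which some locales fold into the first column name. This is an assumption about the file, not something confirmed here. As a hedged alternative, the sketch below writes a tiny BOM-prefixed file with hypothetical contents (not the steam data) and reads it with `fileEncoding = "UTF-8-BOM"`, which asks R to drop the BOM on input.

```{r}
# Hypothetical two-row CSV written with a leading UTF-8 BOM (bytes EF BB BF)
tmp <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("Month,Temperature\nJan,21\nFeb,24\n")), tmp)

# "UTF-8-BOM" tells R to strip the BOM while reading
bom_demo <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
names(bom_demo)
unlink(tmp)
```

If the real file does carry a BOM, passing the same `fileEncoding` argument to the `read.csv()` call above would likely make the rename unnecessary.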
Generate a scatter plot of the data.
```{r}
#Generate a scatter plot of the data
plot(x = dat212$Temperature,
     y = dat212$Usage.l000,
     main = "Steam Usage vs. Ambient Temperature",
     xlab = "Ambient Temperature, Fahrenheit",
     ylab = "Steam Usage, thousands of pounds",
     xlim = c(20,80),
     ylim = c(100,700))
```
Fit a simple linear regression model to the data and report the least-squares estimates of the intercept and slope.
```{r, echo=TRUE}
#fit the linear model and report the least squares coefficients
lm212 <- lm(Usage.l000 ~ Temperature, data = dat212)
summary(lm212)
```
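To make the connection between `lm()` and the least-squares formulas explicit, the sketch below computes the slope as $S_{xy}/S_{xx}$ and the intercept as $\bar{y} - \hat{\beta}_1 \bar{x}$ by hand and compares them with `coef()`. The data are a small toy set chosen for illustration only, not the steam data, and the variable names are hypothetical.

```{r}
# Toy data for illustration only (not the steam data)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Closed-form least-squares estimates
S_xy <- sum((x - mean(x)) * (y - mean(y)))
S_xx <- sum((x - mean(x))^2)
b1 <- S_xy / S_xx              # slope
b0 <- mean(y) - b1 * mean(x)   # intercept

# Compare with the estimates returned by lm()
toy_fit <- lm(y ~ x)
rbind(by_hand = c(b0, b1), lm = unname(coef(toy_fit)))
```

The two rows should agree to floating-point precision, since `lm()` solves the same least-squares problem.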
Calculate the total sum of squares (SST), the regression sum of squares (SSR), and the error (residual) sum of squares (SSE), and use them to find the coefficient of determination, $R^2 = SSR/SST$.
```{r}
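# fitted values y_hat from the model (rounded fit: y_hat = -6.33209 + 9.20847 * Temperature)
# taken from the model object so that rounding in the coefficients does not distort the sums
yhat_i <- fitted(lm212)
# sample average of the response
y_bar <- mean(dat212$Usage.l000)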
#Calculate the sum of squares values
SS_total<-sum((dat212$Usage.l000-y_bar)^2)
SS_total
SS_error<-sum((dat212$Usage.l000-yhat_i)^2)
SS_error
SS_regression<-sum((yhat_i-y_bar)^2)
SS_regression
# calculate coefficient of determination, R^2
r_squared<-SS_regression/SS_total
r_squared
```
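The three sums above are tied together by the identity $SST = SSR + SSE$, and $SSR/SST$ should match the $R^2$ reported by `summary()`. The sketch below checks both on a small toy data set (illustrative values, not the steam data). The helper `ss_decomposition()` is a hypothetical name introduced here, not part of base R.

```{r}
# Hypothetical helper: sums of squares from a fitted lm object
ss_decomposition <- function(fit) {
  y     <- model.response(model.frame(fit))  # observed response
  y_hat <- fitted(fit)                       # fitted values
  y_bar <- mean(y)                           # sample mean of the response
  c(SST = sum((y - y_bar)^2),
    SSR = sum((y_hat - y_bar)^2),
    SSE = sum((y - y_hat)^2))
}

# Toy data for illustration only (not the steam data)
toy <- data.frame(x = c(1, 2, 3, 4, 5), y = c(2, 4, 5, 4, 5))
toy_lm <- lm(y ~ x, data = toy)
ss <- ss_decomposition(toy_lm)
ss

# The decomposition SST = SSR + SSE
all.equal(unname(ss["SST"]), unname(ss["SSR"] + ss["SSE"]))
# SSR / SST agrees with the R^2 from summary()
all.equal(unname(ss["SSR"] / ss["SST"]), summary(toy_lm)$r.squared)
```

The same helper can be applied to `lm212`. Because it works from the unrounded fitted values, the identity should hold to floating-point precision.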
### Complete Code
The complete R code used in this analysis is as follows:
```{r, eval=FALSE}
#Import steam data by month
dat212 <- read.csv("https://raw.githubusercontent.com/forestwhite/Regression/main/data/212Table.csv")
# A byte-order mark at the start of the file can end up in the first column name; rename it if so
names(dat212)[names(dat212) == "ï..Month"] <- "Month"
dat212
#Generate a scatter plot of the data
plot(x = dat212$Temperature,
     y = dat212$Usage.l000,
     main = "Steam Usage vs. Ambient Temperature",
     xlab = "Ambient Temperature, Fahrenheit",
     ylab = "Steam Usage, thousands of pounds",
     xlim = c(20,80),
     ylim = c(100,700))
#fit the linear model and report the least squares coefficients
lm212 <- lm(Usage.l000 ~ Temperature, data = dat212)
summary(lm212)
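# fitted values y_hat from the model (rounded fit: y_hat = -6.33209 + 9.20847 * Temperature)
# taken from the model object so that rounding in the coefficients does not distort the sums
yhat_i <- fitted(lm212)
# sample average of the response
y_bar <- mean(dat212$Usage.l000)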
#Calculate the sum of squares values
SS_total<-sum((dat212$Usage.l000-y_bar)^2)
SS_total
SS_error<-sum((dat212$Usage.l000-yhat_i)^2)
SS_error
SS_regression<-sum((yhat_i-y_bar)^2)
SS_regression
# calculate coefficient of determination, R^2
r_squared<-SS_regression/SS_total
r_squared
```