/
05-regression-c.Rmd
148 lines (96 loc) · 5.78 KB
/
05-regression-c.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
# Multiple Linear Regression
Let's say we want to make a prediction about the following four people who have these scores on the five OCEAN scales:
| |Openness|Conscientousness|Extraversion|Agreeableness|Neuroticism|
|---|:---:|:---:|:---:|:---:|:---:|
|Participant 1| 9 | 8 | 6 | 6 | 5 |
|Participant 2| 8 | 9 | 5 | 5 | 6 |
|Participant 3| 5 | 6 | 8 | 8 | 9 |
|Participant 4| 5 | 6 | 9 | 9 | 8 |
And let's say we have this output from the multiple linear regression model we created using all of the five OCEAN scales as predictors:
```{r, echo = FALSE}
int <- 197.65
ope <- -5.28
con <- -5.68
ext <- -0.4
agr <- 0.64
neu <- 1.89
p1o <- 9
p1c <- 8
p1e <- 6
p1a <- 6
p1n <- 5
p2o <- 8
p2c <- 9
p2e <- 5
p2a <- 5
p2n <- 6
p3o <- 5
p3c <- 6
p3e <- 8
p3a <- 8
p3n <- 9
p4o <- 5
p4c <- 6
p4e <- 9
p4a <- 9
p4n <- 8
hy1 <- int + (ope * p1o) + (con * p1c) + (ext * p1e) + (agr * p1a) + (neu * p1n)
hy2 <- int + (ope * p2o) + (con * p2c) + (ext * p2e) + (agr * p2a) + (neu * p2n)
hy3 <- round(int + (ope * p3o) + (con * p3c) + (ext * p3e) + (agr * p3a) + (neu * p3n),2)
hy4 <- int + (ope * p4o) + (con * p4c) + (ext * p4e) + (agr * p4a) + (neu * p4n)
```
| |Unstandardised Coefficient|t-value|p-value|
|---|:---:|:---:|:---:|
|Intercept|197.65|22.43|.0001|
|Openness|-5.28|-7.68|.0001|
|Conscientiousness|-5.68|-7.44|.0001|
|Extraversion|-0.4|-0.65|.522|
|Agreeableness|0.64|0.69|.495|
|Neuroticism|1.89|2.56|.014|
|---|---|---|---|
|Multiple R^2^| .7619| Adjusted R^2^|.7348|
|F-statistic|28.16 on 5 and 44 DF| p-value| .001|
## Making a Prediction
Now based on the information we have gained above we can actually use that to make predictions about a person we have not measured based on the formula:
$$\hat{Y} = b_{0} + b_{1}X_{1} + b_{2}X_{2} + b_{3}X_{3} + b_{4}X_{4} + b_{5}X_{5}$$
Well technically it is
$$\hat{Y} = b_{0} + b_{1}X_{1} + b_{2}X_{2} + b_{3}X_{3} + b_{4}X_{4} + b_{5}X_{5} + error$$
But we will disregard the error for now. Also you might have noticed the small hat about the $Y$ making it $\hat{Y}$ (pronounced Y-hat). This means we are making a prediction of Y ($\hat{Y}$) as opposed to an actually measured (i.e. observed) value ($Y$).
And to break that formula down a bit we can say:
* $b_{0}$ is the intercept of the model. In this case $b_{0}$ = `r int`
* $b_{1}X_{1}$, for example, is read at the non-standardised coeffecient of the first predictor $b_{1}$ multiplied by the measured value of the first predictor $X_{1}$
* $b_{1}$, $b_{2}$, $b_{3}$, $b_{4}$, $b_{5}$ are the non-standardised coefficient values of the different predictors in the model. There is one non-standardised coefficient for each predictor. For example, let's say Openness is our first predictor and as such $b_{1}$ = `r ope`
* $X_{1}$, $X_{2}$, $X_{3}$, $X_{4}$, $X_{5}$ are the measured values of the different predictors for a participant. For example, for Participant 1, again assuming Openness is our first predictor, then $X_{1}$ = `r p1o`. If Conscientousness is our second predictor, Extraversion our third predictor, Agreeableness our fourth predictor, and Neuroticism our fifth predictor, then $X_{2}$ = `r p1c`, $X_{3}$ = `r p1e`, $X_{4}$ = `r p1a` and $X_{5}$ = `r p1n`.
Then, using the information above, we know can start to fill in the information for **Participant 1** as follows:
$$\hat{Y} = `r int` + (`r ope` \times `r p1o`) + (`r con` \times `r p1c`) + (`r ext` \times `r p1e`) + (`r agr` \times `r p1a`) + (`r neu` \times `r p1n`)$$
And if we start to work that through, dealing with the multiplications first, we see:
$$\hat{Y} = `r int` + (`r ope * p1o`) + (`r con * p1c`) + (`r ext * p1e`) + (`r agr * p1a`) + (`r neu * p1n`)$$
Which then becomes:
$$\hat{Y} = `r int + (ope * p1o) + (con * p1c) + (ext * p1e) + (agr * p1a) + (neu * p1n)`$$
Giving a predicted value of $\hat{Y}$ = `r hy1`, to two decimal places.
Likewise for **Participant 2** we would have:
$$\hat{Y} = `r int` + (`r ope` \times `r p2o`) + (`r con` \times `r p2c`) + (`r ext` \times `r p2e`) + (`r agr` \times `r p2a`) + (`r neu` \times `r p2n`)$$
And if we start to work that through, dealing with the multiplications first, we see:
$$\hat{Y} = `r int` + (`r ope * p2o`) + (`r con * p2c`) + (`r ext * p2e`) + (`r agr * p2a`) + (`r neu * p2n`)$$
Which then becomes:
$$\hat{Y} = `r int + (ope * p2o) + (con * p2c) + (ext * p2e) + (agr * p2a) + (neu * p2n)`$$
Giving a predicted value of $\hat{Y}$ = `r hy2`, to two decimal places.
Likewise for **Participant 3** we would have:
$$\hat{Y} = `r int` + (`r ope` \times `r p3o`) + (`r con` \times `r p3c`) + (`r ext` \times `r p3e`) + (`r agr` \times `r p3a`) + (`r neu` \times `r p3n`)$$
And if we start to work that through, dealing with the multiplications first, we see:
$$\hat{Y} = `r int` + (`r ope * p3o`) + (`r con * p3c`) + (`r ext * p3e`) + (`r agr * p3a`) + (`r neu * p3n`)$$
Which then becomes:
$$\hat{Y} = `r int + (ope * p3o) + (con * p3c) + (ext * p3e) + (agr * p3a) + (neu * p3n)`$$
Giving a predicted value of $\hat{Y}$ = `r hy3`, to two decimal places.
And finally for **Participant 4** we would have:
$$\hat{Y} = `r int` + (`r ope` \times `r p4o`) + (`r con` \times `r p4c`) + (`r ext` \times `r p4e`) + (`r agr` \times `r p4a`) + (`r neu` \times `r p4n`)$$
And if we start to work that through, dealing with the multiplications first, we see:
$$\hat{Y} = `r int` + (`r ope * p4o`) + (`r con * p4c`) + (`r ext * p4e`) + (`r agr * p4a`) + (`r neu * p4n`)$$
Which then becomes:
$$\hat{Y} = `r int + (ope * p4o) + (con * p4c) + (ext * p4e) + (agr * p4a) + (neu * p4n)`$$
Giving a predicted value of $\hat{Y}$ = `r hy4`, to two decimal places.
And so in summary, based on our above model, and the above measured values, we would predict the following values:
* Participant 1, $\hat{Y}$ = `r hy1`
* Participant 2, $\hat{Y}$ = `r hy2`
* Participant 3, $\hat{Y}$ = `r hy3`
* Participant 4, $\hat{Y}$ = `r hy4`