/
01-descriptives-d.Rmd
141 lines (77 loc) · 7.35 KB
/
01-descriptives-d.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
# A full descriptive
To look at a full descriptive example including the mean, standard deviation, standard error, 95% Confidence Intervals, and z-scores, we will use the following data from 10 participants:
```{r we, message=FALSE, warning=FALSE, echo=FALSE}
dat_o <- tibble(Participants = 1:10,
PreTest = c(60, 64, 56, 82, 74, 79, 63, 59, 72, 66),
PostTest = c(68, 75, 62, 85, 73, 85, 64, 59, 73, 70))
dat <- dat_o %>% mutate(X = abs(PostTest - PreTest)) %>% select(Participants, X)
meanx <- dat %>% pull(X) %>% mean()
varx <- dat %>% pull(X) %>% var()
sdx <- dat %>% pull(X) %>% sd()
nx <- dim(dat)[1]
semx <- sdx/sqrt(nx)
zci <- 1.96
knitr::kable(dat, align = "c")
```
### The Mean
We know the formula for the mean is:
$$\overline{X} = \frac{\sum_i^n{x_i}}{n}$$
If we start to fill in the information from above we get:
$$\overline{X} = \frac{`r dat %>% filter(Participants == 1) %>% pull(X)` + `r dat %>% filter(Participants == 2) %>% pull(X)` + `r dat %>% filter(Participants == 3) %>% pull(X)` +`r dat %>% filter(Participants == 4) %>% pull(X)` +`r dat %>% filter(Participants == 5) %>% pull(X)` +`r dat %>% filter(Participants == 6) %>% pull(X)` +`r dat %>% filter(Participants == 7) %>% pull(X)` +`r dat %>% filter(Participants == 8) %>% pull(X)` +`r dat %>% filter(Participants == 9) %>% pull(X)` +`r dat %>% filter(Participants == 10) %>% pull(X)`}{`r dim(dat)[1]`}$$
And if we sum all the values on the top half together we get:
$$\overline{X} = \frac{`r dat %>% pull(X) %>% sum()`}{`r dim(dat)[1]`}$$
And finally divide the top half by the bottom half, leaving:
$$\overline{X} = `r (dat %>% pull(X) %>% sum())/(dim(dat)[1])`$$
We find that the mean, rounded to two decimal places, is **$\overline{X} = `r meanx`$**
### Standard Deviation
Next we want an interval estimate, most commonly the standard deviation. The formula for the standard deviation is:
$$s = \sqrt\frac{\sum_i^n(x_{i} - \overline{x})^2}{n-1}$$
So let's again start by filling in the numbers to the formula:
$$s = \sqrt\frac{(`r dat %>% filter(Participants == 1) %>% pull(X)` - `r meanx`)^2 + (`r dat %>% filter(Participants == 2) %>% pull(X)` - `r meanx`)^2 + (`r dat %>% filter(Participants == 3) %>% pull(X)` - `r meanx`)^2 + (`r dat %>% filter(Participants == 4) %>% pull(X)` - `r meanx`)^2 + (`r dat %>% filter(Participants == 5) %>% pull(X)` - `r meanx`)^2 + \\(`r dat %>% filter(Participants == 6) %>% pull(X)` - `r meanx`)^2 + (`r dat %>% filter(Participants == 7) %>% pull(X)` - `r meanx`)^2 + (`r dat %>% filter(Participants == 8) %>% pull(X)` - `r meanx`)^2 + (`r dat %>% filter(Participants == 9) %>% pull(X)` - `r meanx`)^2 + (`r dat %>% filter(Participants == 10) %>% pull(X)` - `r meanx`)^2}{`r dim(dat)[1]` - 1}$$
And we do the subtractions in all those brackets of the top half and the one on the bottom:
$$s = \sqrt\frac{(`r dat %>% filter(Participants == 1) %>% pull(X) - meanx`)^2 + (`r dat %>% filter(Participants == 2) %>% pull(X) - meanx`)^2 + (`r dat %>% filter(Participants == 3) %>% pull(X) - meanx`)^2 + (`r dat %>% filter(Participants == 4) %>% pull(X) - meanx`)^2 + (`r dat %>% filter(Participants == 5) %>% pull(X) - meanx`)^2 + \\(`r dat %>% filter(Participants == 6) %>% pull(X) - meanx`)^2 + (`r dat %>% filter(Participants == 7) %>% pull(X) - meanx`)^2 + (`r dat %>% filter(Participants == 8) %>% pull(X) - meanx`)^2 + (`r dat %>% filter(Participants == 9) %>% pull(X) - meanx`)^2 + (`r dat %>% filter(Participants == 10) %>% pull(X) - meanx`)^2}{`r dim(dat)[1]-1`}$$
Then we square all those values on the top half:
$$s = \sqrt\frac{`r (dat %>% filter(Participants == 1) %>% pull(X) - meanx)^2` + `r (dat %>% filter(Participants == 2) %>% pull(X) - meanx)^2`+ `r (dat %>% filter(Participants == 3) %>% pull(X) - meanx)^2`+ `r (dat %>% filter(Participants == 4) %>% pull(X) - meanx)^2`+ `r (dat %>% filter(Participants == 5) %>% pull(X) - meanx)^2`+ `r (dat %>% filter(Participants == 6) %>% pull(X) - meanx)^2`+ `r (dat %>% filter(Participants == 7) %>% pull(X) - meanx)^2`+ `r (dat %>% filter(Participants == 8) %>% pull(X) - meanx)^2`+ `r (dat %>% filter(Participants == 9) %>% pull(X) - meanx)^2`+ `r (dat %>% filter(Participants == 10) %>% pull(X) - meanx)^2`}{`r dim(dat)[1]-1`}$$
Sum those values together:
$$s = \sqrt\frac{`r ((dat %>% filter(Participants == 1) %>% pull(X) - meanx)^2) + ((dat %>% filter(Participants == 2) %>% pull(X) - meanx)^2) + ((dat %>% filter(Participants == 3) %>% pull(X) - meanx)^2) + ((dat %>% filter(Participants == 4) %>% pull(X) - meanx)^2) + ((dat %>% filter(Participants == 5) %>% pull(X) - meanx)^2) + ((dat %>% filter(Participants == 6) %>% pull(X) - meanx)^2) + ((dat %>% filter(Participants == 7) %>% pull(X) - meanx)^2) + ((dat %>% filter(Participants == 8) %>% pull(X) - meanx)^2) + ((dat %>% filter(Participants == 9) %>% pull(X) - meanx)^2) + ((dat %>% filter(Participants == 10) %>% pull(X) - meanx)^2)`}{`r dim(dat)[1]-1`}$$
Divide the top half by the bottom half:
$$s = \sqrt{`r varx`}$$
And finally take the square root:
$$s = `r sdx`$$
We find that the standard deviation, rounded to two decimal places, is **$s = `r sdx %>% round2(2)`$**
### Standard Error of the Mean
We also want to include a measure of how our sample relates to the population of interest and to do that we can use the Standard Error of the Mean and 95% Confidence Intervals. The formula for the standard error can be written as:
$$SE = \frac{SD}{\sqrt{n}}$$
We know that our $SD = `r sdx`$ and that we have $n = `r nx`$ participants, so:
$$SE = \frac{`r sdx`}{\sqrt{`r nx`}}$$
And if we do the square root of the bottom half (denominator):
$$SE = \frac{`r sdx`}{`r sqrt(nx)`}$$
And divide the top by the bottom:
$$SE = `r semx`$$
We find that the standard error, rounded to two decimal places, is $SE = `r semx %>% round2(2)`$
### Confidence Intervals
And we can use the standard error not to calculate the Upper and Lower 95% Confidence Intervals using the cut-off value (assuming $\alpha = .05$ and two-tailed) of $z = 1.96$. The key formulas are:
$$Upper \space 95\%CI = \overline{x} + (z \times SE)$$
And
$$Lower \space 95\%CI = \overline{X} - (z \times SE)$$
We know that:
* $\overline{x} = `r meanx`$
* $SE = `r semx`$
* $z = `r zci`$
**Upper 95% CI**
Dealing with the **Upper 95% CI** we get:
$$Upper \space 95\%CI = `r meanx` + (`r zci` \times `r semx`)$$
And if we sort out the bracket first:
$$Upper \space 95\%CI = `r meanx` + `r zci * semx`$$
Which leaves us with:
$$Upper \space 95\%CI = `r meanx + (zci * semx)`$$
**Lower 95% CI**
And now the **Lower 95% CI** we get:
$$Lower \space 95\%CI = `r meanx` - (`r zci` \times `r semx`)$$
And if we sort out the bracket first:
$$Lower \space 95\%CI = `r meanx` - `r zci * semx`$$
Which leaves us with:
$$Lower \space 95\%CI = `r meanx - (zci * semx)`$$
And if we round both those values to two decimal places, then you get **Lower 95%CI = `r (meanx - (zci * semx)) %>% round2(2)`** and **Upper 95%CI = `r (meanx + (zci * semx)) %>% round2(2)`**
### The Write-Up
And if we were to do a decent presentation of this data, including a measure of central tendency (the mean), a measure of spread relating to the sample (the standard deviation), and a measure of spread relating to the population (the 95% CI), then it would start as something like: **"We ran `r nx` participants (M = `r meanx %>% round2(2)`, SD = `r sdx %>% round2(2)`, 95%CI = [`r (meanx - (zci * semx)) %>% round2(2)`, `r (meanx + (zci * semx)) %>% round2(2)`])....."**