# Comparing numerical data
Statistical testing for numerical data differs from testing for categorical data because we are no longer limited to frequencies. Instead, we can test differences in parameters, ranks and other measures. However, the nonparametric tests explained at the end of this section can also be used for ordinal variables.
This section covers only comparisons with one or two samples, i.e. when we compare values of one or two groups. Comparing more than two mean values is done using *Analysis of variance* and is considered in the next section. Also, this section only outlines some classical tests and there are actually many more.
## How to decide which test to use?
The particular test that we should choose to compare numeric data depends on
1. how many samples we have,
2. if the values are normally distributed,
3. if samples have equal variances, and
4. if the samples are paired.
This section describes the tests that we can apply in each of these cases. The list below can be used as a guide when deciding which test you should use.
- One sample
    - Unpaired samples
        - Normally distributed
            - *One sample T-test*
        - Not normally distributed
            - *Wilcoxon signed-rank test*
- Two samples
    - Unpaired samples
        - Normally distributed
            - Equal variance
                - *Independent samples T-test assuming equal variances (Student test)*
            - Unequal variance
                - *Independent samples T-test not assuming equal variances (Welch test)*
        - Not normally distributed
            - *Mann-Whitney U test*
    - Paired samples
        - Normally distributed
            - *Paired samples T-test*
        - Not normally distributed
            - *Wilcoxon signed-rank test*
- Three or more samples
    - Unpaired samples
        - Normally distributed
            - *Analysis of Variance*
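
The guide above can also be written down as a small function. This is only a sketch: `choose_test()` and its arguments are hypothetical names introduced here for illustration, and the function covers only the cases listed in the guide.

```r
# Hypothetical helper that mirrors the decision guide above.
# n_samples: number of samples; normal, paired, equal_var: TRUE/FALSE flags.
choose_test <- function(n_samples, normal, paired = FALSE, equal_var = TRUE) {
  if (n_samples == 1) {
    return(if (normal) "One sample T-test" else "Wilcoxon signed-rank test")
  }
  if (n_samples == 2 && paired) {
    return(if (normal) "Paired samples T-test" else "Wilcoxon signed-rank test")
  }
  if (n_samples == 2) {
    if (!normal) return("Mann-Whitney U test")
    return(if (equal_var) "Student test" else "Welch test")
  }
  # Three or more samples: the guide lists only the normal, unpaired case
  "Analysis of Variance"
}

choose_test(2, normal = TRUE, equal_var = FALSE)
```

The checks behind the flags (normality, equality of variances) are described later in this section.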
## One or two samples
### One sample
#### One sample T-test
See @navarro_learning_2018 section 11.2.
This test is used to determine **if our sample is taken from a population with a given mean**. So we test if the sample mean $\bar{x}$ is equal to a hypothetical population mean $\mu$.
Hypotheses:
$H_0: \bar{x} = \mu$
$H_1: \bar{x} \neq \mu$
Test statistic $t$ is calculated as
$$t = \frac{\bar{x} - \mu}{s \div \sqrt{n}},$$
where $\bar{x}$ is the sample mean, $\mu$ is the hypothetical population mean that it is tested against, $s$ is the sample standard deviation and $n$ is the sample size.
The test statistic is evaluated on a t-distribution with $n-1$ degrees of freedom ( $df$ ).
Test assumes **normality** and **independence** of data. See @navarro_learning_2018 section 11.2.3.
> In Jamovi: `T-tests > One Sample T-test`.
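> In R: `t.test(x, mu = mu0)`, where `mu0` is the hypothetical population mean. The argument must be named, because the second positional argument of `t.test` is a second sample.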
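
To see how the formula relates to the built-in function, the sketch below computes $t$ by hand and compares it with `t.test`. The sample `x` and the value `mu0` are made up for illustration.

```r
# Made-up sample and hypothetical population mean (illustrative values only)
x   <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.5, 4.7)
mu0 <- 5

n      <- length(x)
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))  # t = (mean - mu) / (s / sqrt(n))
p_val  <- 2 * pt(-abs(t_stat), df = n - 1)     # two-sided p-value, n - 1 df

res <- t.test(x, mu = mu0)  # mu is passed by name
c(manual = t_stat, builtin = unname(res$statistic))
c(manual = p_val, builtin = res$p.value)
```

Both pairs of numbers should agree, which confirms that `t.test` implements the same calculation.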
### Two samples
#### Independent samples T-test assuming equal variances (Student test)
See @navarro_learning_2018 section 11.3.
This test **compares mean values of two samples to determine if these are equal**. Equal means are consistent with the **samples coming from the same population**.
Hypotheses:
$H_0: \bar{x}_1 = \bar{x}_2$
$H_1: \bar{x}_1 \neq \bar{x}_2$
Test statistic $t$ is calculated as
$$t = \frac{\bar{x}_1 - \bar{x}_2}{se_{\bar{x}_1 - \bar{x}_2}},$$
where $\bar{x}_1$ and $\bar{x}_2$ are the means of samples and $se_{\bar{x}_1 - \bar{x}_2}$ is the standard error of the difference of means that is calculated as follows:
$$se_{\bar{x}_1 - \bar{x}_2} = s_p\sqrt{\frac{1}{n_1} +{\frac{1}{n_2}}},$$
where $n_1$ and $n_2$ are sample sizes and $s_p$ is the *pooled standard deviation* calculated as
$$s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}},$$
where $s_1$ and $s_2$ are standard deviations of the samples.
Test statistic is evaluated on t-distribution with $n_1+n_2-2$ $df$.
Test assumes **normality** and **independence** of data and **homogeneity of variance**. The latter means that the variances of the populations from which the samples are drawn need to be equal. As a rule of thumb, this can be assumed if $\frac{1}{2} < \frac{s_1}{s_2}<2$. See @navarro_learning_2018 section 11.3.7.
> In Jamovi: `T-tests > Independent Samples T-test`.
> In R: `t.test(x, y, var.equal = TRUE)`. Note that `t.test(x, y)` without `var.equal = TRUE` runs the Welch test.
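
The following sketch walks through the pooled standard deviation, the standard error and the $t$ statistic step by step, and then checks the result against `t.test` with `var.equal = TRUE`. The two samples are made up for illustration.

```r
# Made-up samples (illustrative values only)
x <- c(20.1, 22.4, 19.8, 21.5, 23.0, 20.7)
y <- c(18.2, 19.9, 17.5, 20.3, 18.8, 19.1, 17.9)

n1 <- length(x)
n2 <- length(y)

sd_ratio <- sd(x) / sd(y)  # rule of thumb: should lie between 1/2 and 2

# Pooled standard deviation and standard error of the difference of means
s_p <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))
se  <- s_p * sqrt(1 / n1 + 1 / n2)

t_stat <- (mean(x) - mean(y)) / se

res <- t.test(x, y, var.equal = TRUE)  # Student test
c(manual = t_stat, builtin = unname(res$statistic))
```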
#### Independent samples T-test not assuming equal variances (Welch test)
See @navarro_learning_2018 section 11.4.
This test is similar to the previously described test but now we **don't assume equal variances**. As a rule of thumb, variances are treated as unequal if $s_1 > 2s_2$ or $s_2 > 2s_1$.
Hypotheses are also the same as in case of the Student test.
$H_0: \bar{x}_1 = \bar{x}_2$
$H_1: \bar{x}_1 \neq \bar{x}_2$
The test statistic has the same form as for the Student test, but the standard error is calculated differently:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{se_{\bar{x}_1 - \bar{x}_2}},$$
where $\bar{x}_1$ and $\bar{x}_2$ are the means of samples and $se_{\bar{x}_1 - \bar{x}_2}$ is the standard error of the difference of means that is calculated as follows:
$$se_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s^2_1}{n_1} + \frac{s^2_2}{n_2}},$$
where $s_1$ and $s_2$ are unbiased standard deviations of the samples.
Test statistic is evaluated on t-distribution where $df$ is calculated as
$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1-1} + \frac{\left(s_2^2/n_2\right)^2}{n_2-1}}.$$
Test assumes **normality** and **independence** of data. See @navarro_learning_2018 section 11.4.2.
> In Jamovi: `T-tests > Independent Samples T-test | Welch's`.
> In R: `t.test(x, y, var.equal = FALSE)`
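
As a sketch of how the Welch degrees of freedom arise, the code below computes the standard error, $t$ and $df$ by hand and compares them with `t.test`. The samples are made up, and `y` is deliberately much more variable than `x`.

```r
# Made-up samples with clearly different spread (illustrative values only)
x <- c(20.1, 22.4, 19.8, 21.5, 23.0, 20.7)
y <- c(15.2, 24.9, 11.5, 27.3, 18.8, 22.1, 13.9)

v1 <- var(x) / length(x)  # s_1^2 / n_1
v2 <- var(y) / length(y)  # s_2^2 / n_2

t_stat   <- (mean(x) - mean(y)) / sqrt(v1 + v2)
df_welch <- (v1 + v2)^2 /
  (v1^2 / (length(x) - 1) + v2^2 / (length(y) - 1))

res <- t.test(x, y, var.equal = FALSE)
c(t_manual = t_stat, t_builtin = unname(res$statistic))
c(df_manual = df_welch, df_builtin = unname(res$parameter))
```

Unlike in the Student test, the resulting $df$ is usually not a whole number.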
## Unpaired or paired samples
The previous tests assumed independence of samples. This does not hold if we have paired values. For instance, if the samples contain measurements of the same observations at two different points in time, then the values representing the same observation are paired. In tests for such paired data we test **if the differences between each pair of values are large** enough to be also present in the population.
#### Paired samples T-test
See @navarro_learning_2018 section 11.5.
Hypotheses:
$H_0: \bar{x}_1 = \bar{x}_2$
$H_1: \bar{x}_1 \neq \bar{x}_2$
Test statistic $t$ is calculated as
$$t = \frac{\hat{D}}{s_\Delta\div \sqrt{n}},$$
where $s_\Delta$ is the standard deviation of the differences between paired values and $\hat{D}$ is the mean of these differences calculated as
$$\hat{D} = \frac{1}{n} \sum_{i=1}^n{(X_{i1} - X_{i2})}.$$
Test statistic is evaluated on t-distribution with $n-1$ $df$ where $n$ is the number of pairs.
Test assumes normality of the differences between paired values.
> In Jamovi: `T-tests > Paired Samples T-test`.
> In R: `t.test(x, y, paired = TRUE)`
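
The paired test works on the pairwise differences only. The sketch below computes the statistic from the mean and standard deviation of the differences and the number of pairs, and shows that the result matches both the paired `t.test` and a one sample T-test of the differences against zero. The `before` and `after` values are made up.

```r
# Made-up paired measurements (illustrative values only)
before <- c(72, 68, 75, 80, 66, 71, 77)
after  <- c(70, 65, 76, 74, 64, 69, 73)

d <- before - after  # one difference per pair
n <- length(d)       # number of pairs

t_stat <- mean(d) / (sd(d) / sqrt(n))

res_paired <- t.test(before, after, paired = TRUE)
res_one    <- t.test(d, mu = 0)  # same test expressed on the differences

c(manual = t_stat,
  paired = unname(res_paired$statistic),
  one_sample = unname(res_one$statistic))
```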
## One-tailed or two-tailed tests
See @navarro_learning_2018 section 11.6.
All the tests described above were presented as two-tailed tests. This means that we were not interested in whether the differences are positive or negative. One-tailed versions of these tests allow us to test whether one mean is greater or smaller than another.
If we wish to test if **mean of sample ($\bar{x}$) is greater than hypothetical population mean ($\mu$)**, then our hypotheses are the following:
$H_0: \bar{x} \le \mu$
$H_1: \bar{x} \gt \mu$
If we wish to test if **mean of one sample ($\bar{x}_1$) is greater than mean of another sample ($\bar{x}_2$)**, then our hypotheses are the following:
$H_0: \bar{x}_1 \le \bar{x}_2$
$H_1: \bar{x}_1 \gt \bar{x}_2$
> In Jamovi: `T-tests > ... T-test | Group 1 > Group 2 or Group 1 < Group 2`.
> In R: `t.test(x, y, alternative = 'less')` or `t.test(x, y, alternative = 'greater')`
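
The sketch below illustrates how the p-values of the directional tests relate to the p-value of the non-directional test. The samples are made up, and `x` has the larger mean.

```r
# Made-up samples (illustrative values only); mean(x) > mean(y)
x <- c(20.1, 22.4, 19.8, 21.5, 23.0, 20.7)
y <- c(18.2, 19.9, 17.5, 20.3, 18.8, 19.1, 17.9)

p_two     <- t.test(x, y)$p.value                           # H1: means differ
p_greater <- t.test(x, y, alternative = "greater")$p.value  # H1: mean(x) > mean(y)
p_less    <- t.test(x, y, alternative = "less")$p.value     # H1: mean(x) < mean(y)

c(two_sided = p_two, greater = p_greater, less = p_less)
```

The p-value in the direction of the observed difference is half of the non-directional p-value, and the two directional p-values sum to 1.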
## Parametric or nonparametric tests
Population parameters (e.g. mean, standard deviation) can be estimated from sample statistics only if we can assume that the **distribution of values in population (and thus in sample) follows normal distribution**. If we cannot make assumptions about the distribution or parameters of the underlying population values, we need to use nonparametric tests.
### Normality
There are various ways to determine whether values are normally distributed. Here we look at QQ plots and the Shapiro-Wilk test.
#### QQ plot
See @navarro_learning_2018 section 11.8.1.
On such plots, quantiles of the data are plotted against theoretical quantiles of a normal distribution. **If the quantiles of the data are highly correlated with these theoretical quantiles (the points follow a straight line), then the data is approximately normally distributed**. Because this is judged visually, the interpretation of a QQ plot is not precise.
> In Jamovi: `Exploration > Descriptives > Plots | Q-Q`.
> In R: `qqnorm(y)`
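
A minimal sketch with simulated data is given below. The sample `y` is generated only for illustration, `qqline` adds a reference line to the plot, and the correlation between the two sets of quantiles is an informal summary of how straight the points are, not a formal test.

```r
set.seed(1)                          # simulated data, for illustration only
y <- rnorm(100, mean = 10, sd = 2)

qqnorm(y)  # sample quantiles against theoretical normal quantiles
qqline(y)  # reference line through the first and third quartiles

# The same quantiles without drawing, to summarise straightness numerically
qq <- qqnorm(y, plot.it = FALSE)
r  <- cor(qq$x, qq$y)
r
```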
#### Shapiro-Wilk test
See @navarro_learning_2018 section 11.8.2.
This test determines if data is normally distributed or not. In other words, it tests **if sample comes from a normally distributed population**.
Test statistic $W$ is calculated as
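$$W = \frac{\left(\sum^n_{i=1}{a_ix_{(i)}}\right)^2}{\sum^n_{i=1}(x_i - \bar{x})^2},$$
where $x_{(i)}$ is the $i$-th smallest value in the sample and $a_i$ are constants derived from the expected order statistics of a normal distribution. The exact derivation of $a_i$ is complicated, but the interpretation is simple: $W$ lies between 0 and 1, and the further it falls below 1, the stronger the evidence of non-normality. If the test is statistically significant, then we conclude that the data is not normally distributed: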
$H_0$: Data is normally distributed
$H_1$: Data is not normally distributed
Note that the Shapiro-Wilk test is sensitive to even small deviations from normality if the sample size is large (thousands of observations). Also, `shapiro.test` in R accepts only between 3 and 5000 observations. For large samples, consider a QQ plot instead.
> In Jamovi: `Exploration > Descriptives > Statistics | Shapiro-Wilk`.
> In R: `shapiro.test(x)`
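
The sketch below applies the test to two simulated samples, one drawn from a normal and one from an exponential (strongly skewed) distribution. The samples and their names are made up for illustration.

```r
set.seed(42)                    # simulated data, for illustration only
normal_sample <- rnorm(200)     # drawn from a normal distribution
skewed_sample <- rexp(200)      # drawn from a skewed distribution

sw_normal <- shapiro.test(normal_sample)
sw_skewed <- shapiro.test(skewed_sample)

# W is close to 1 for normal-looking data and lower for the skewed sample
c(W_normal = unname(sw_normal$statistic),
  W_skewed = unname(sw_skewed$statistic))
c(p_normal = sw_normal$p.value, p_skewed = sw_skewed$p.value)
```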
### Nonparametric tests
#### Mann-Whitney U test
See @navarro_learning_2018 section 11.9.1.
This is a test to compare **distributions (and medians)** of two **unpaired** samples if we **can't assume normality**. The test is also known as Wilcoxon rank-sum test.
Hypotheses:
$H_{0}$: Distributions (medians) of both samples are the same.
$H_{1}$: Distributions (medians) of samples are different.
Formal definition of $H_{0}$ is as follows: a randomly selected value from one sample is equally likely to be less than or greater than a randomly selected value from a second sample.
Test statistic $U$ is calculated as
$$U = \sum^n_{i=1} \sum^m_{j=1} S(X_i, Y_j),$$
where $X_1, \dots, X_n$ are the values of the first sample, $Y_1, \dots, Y_m$ are the values of the second sample, and $S(X, Y)$ scores each pair of values as follows:
$$S(X, Y) = \begin{cases}
1 & \text{if } Y < X\\
\frac12 & \text{if } Y = X\\
0 & \text{if } Y > X
\end{cases}$$
Basically, we compare all values and **count the times when values from one sample are higher than values from another sample**. The $U$ is just the count of these differences.
For $n \ge 20$, p-value for $U$ is calculated on a normal distribution.
Test assumes independence of samples.
> In Jamovi: `T-tests > Independent Samples T-test | Mann-Whitney U`.
> In R: `wilcox.test(x, y)`
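
The counting logic can be sketched with `outer`, which compares every value of one sample with every value of the other. The samples are made up and contain no ties. Note that `wilcox.test` reports this statistic under the name `W`.

```r
# Made-up samples without ties (illustrative values only)
x <- c(12.1, 15.3, 11.8, 14.2, 16.0, 13.4)
y <- c(10.5, 12.9, 9.8, 11.2, 13.9, 10.1, 12.4)

# Score every (x, y) pair: 1 if the y value is below the x value,
# 1/2 if they are tied, 0 otherwise
S <- outer(x, y, function(xi, yj) (yj < xi) + 0.5 * (yj == xi))
U <- sum(S)

res <- wilcox.test(x, y)
c(manual = U, builtin = unname(res$statistic))
```

Here $U = 35$ out of $6 \times 7 = 42$ possible pairs, so values of `x` are higher than values of `y` in most comparisons.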
#### Wilcoxon signed-rank test
See @navarro_learning_2018 section 11.9.2.
This is similar to Mann-Whitney U test but used for **paired** samples. The test is also known as One sample Wilcoxon test.
Hypotheses:
$H_{0}$: Distributions (medians) of both samples are the same.
$H_{1}$: Distributions (medians) of samples are different.
The W statistic is calculated as:
$$W = \sum^n_{i = 1}(\operatorname{sgn}(x_{1i} - x_{2i}) \times R_i),$$
where $\operatorname{sgn}(x_{1i} - x_{2i})$ is the sign function (1 for a positive difference, -1 for a negative one) and $R_i$ is the rank of the absolute difference $|x_{1i} - x_{2i}|$.
Basically, we rank the absolute differences between paired values and check **whether positive or negative differences dominate the ranking**. The $W$ is just the sum of the signed ranks.
For $n \ge 20$, p-value for $W$ is calculated on normal distribution.
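Test assumes that the pairs are independent of each other and that the differences are distributed symmetrically around their median.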
> In Jamovi: `T-tests > Paired Samples T-test | Wilcoxon rank`.
> In R: `wilcox.test(x, y, paired = TRUE)`. Without `paired = TRUE` the function runs the Mann-Whitney U test.
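
The sketch below computes the signed-rank sum by hand for made-up paired data without ties or zero differences. R reports a slightly different statistic, `V`, which is the sum of the ranks of the positive differences only. The two are related by $W = 2V - n(n+1)/2$.

```r
# Made-up paired measurements (illustrative values only)
before <- c(72, 68, 75, 80, 66, 71, 77)
after  <- c(70.5, 65, 76, 74, 61, 73.5, 73)

d <- before - after     # differences: 1.5, 3, -1, 6, 5, -2.5, 4
n <- length(d)

R <- rank(abs(d))       # ranks of the absolute differences
W <- sum(sign(d) * R)   # sum of signed ranks

res <- wilcox.test(before, after, paired = TRUE)
V   <- unname(res$statistic)  # sum of ranks of positive differences

c(W = W, V = V, W_from_V = 2 * V - n * (n + 1) / 2)
```

For these data $W = 20$ and $V = 24$.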
#### Kolmogorov-Smirnov test
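This test serves a similar purpose to the Mann-Whitney U test, although the calculation is very different and the test is sensitive to any difference between distributions, not only to a shift in location. This test **compares the overall shape of two distributions using their empirical cumulative distribution functions**.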
Hypotheses:
$H_{0}$: Distributions of both samples are the same.
$H_{1}$: Distributions of samples are different.
Test statistic $D$ is simply the maximum absolute difference between two cumulative distribution functions.
P-value is determined by the extremity of the test statistic $D$ on Kolmogorov distribution.
> In R: `ks.test(x, y)`
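
The statistic $D$ can be sketched with `ecdf`, which returns the empirical cumulative distribution function of a sample. The samples are made up for illustration. It is enough to evaluate both functions at the observed values, because they only change there.

```r
# Made-up samples without ties (illustrative values only)
x <- c(12.1, 15.3, 11.8, 14.2, 16.0, 13.4)
y <- c(10.5, 12.9, 9.8, 11.2, 13.9, 10.1, 12.4)

Fx <- ecdf(x)  # empirical cumulative distribution function of x
Fy <- ecdf(y)  # empirical cumulative distribution function of y

grid <- sort(c(x, y))               # all observed values
D <- max(abs(Fx(grid) - Fy(grid)))  # largest vertical gap

res <- ks.test(x, y)
c(manual = D, builtin = unname(res$statistic))
```

Here the largest gap is $4/7$: four of the seven values in `y` lie below the smallest value in `x`.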
### Interpretation of test results
Suppose that one of the tests above indicates that p-value is $\le \alpha$. There are several ways in which we could interpret this result:
- the means/medians of the two samples are different,
- there is a statistically significant relationship between the two variables,
- one variable has a statistically significant effect on another, or
- the samples come from different populations.