---
title: "GRIM"
output:
  rmarkdown::html_vignette:
    # fig_width: 6
    # fig_height: 5.5
vignette: >
  %\VignetteIndexEntry{GRIM}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: references.bib
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

pkgload::load_all()
```
```{r setup, message=FALSE}
library(scrutiny)
```
Granularity-related inconsistency of means, or GRIM, is a test for the mathematical consistency of reported means or proportions with the corresponding sample size [@brown_grim_2017]. It can be applied to summary statistics of discrete numerical distributions. GRIM answers a simple question: Is it possible that a granular distribution has both the reported mean or percentage and the reported sample size?
This vignette covers scrutiny's implementation of the GRIM test. It has the following sections --- to get started, though, you only need the first one:
1. The basic `grim()` function and a specialized mapping function, `grim_map()`.
2. Testing sequences of means or proportions.
3. The `audit()` method for summarizing `grim_map()`'s results.
4. The visualization function `grim_plot()`.
5. Statistical benchmarks, such as granularity and the GRIM ratio.
## Basic GRIM testing
### Few cases: `grim()`
To test if a reported mean of 5.27 on a granular scale is GRIM-consistent with a sample size of 43, run this:
```{r}
grim(x = "5.27", n = 43)
```
Note that `x`, the reported mean, needs to be a string. The reason is that strings preserve trailing zeros, which can be crucial for GRIM-testing. Numeric values don't, and even converting them to strings won't help. A workaround for larger numbers of such values, `restore_zeros()`, is discussed in `vignette("wrangling")`.
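Here is a quick demonstration of the problem. The `count_decimals()` helper below is hypothetical, written just for this illustration; it is not a scrutiny function:

```r
# Numeric literals silently drop trailing zeros:
as.character(5.20)  # "5.2"

# Hypothetical helper: count decimal places from the string form of `x`
count_decimals <- function(x) nchar(sub("^[^.]*\\.?", "", as.character(x)))

count_decimals("5.20")  # 2 -- the string keeps both decimal places
count_decimals(5.20)    # 1 -- the zero was already lost
```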
`grim()` has some further parameters, all of which can also be used from within `grim_map()`. They will be discussed in that context because `grim_map()` is often the more useful function in practice. Moreover, although `grim()` is vectorized, `grim_map()` is safer and more convenient for testing multiple combinations of means or proportions and sample sizes.
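Before moving on, the core idea of the test can be sketched in a few lines of base R. This is only an illustration, not scrutiny's implementation (which handles rounding procedures, scale items, percentages, and trailing zeros much more carefully):

```r
# Sketch of the GRIM idea: a mean `x` reported to `digits` decimal places
# is consistent with sample size `n` if some integer sum, divided by `n`,
# rounds back to `x`.
grim_sketch <- function(x, n, digits = 2) {
  sums <- round(x * n) + (-1:1)  # candidate integer sums near x * n
  any(abs(round(sums / n, digits) - x) < 1e-9)
}

grim_sketch(5.27, n = 43)  # FALSE: no sum of 43 integers rounds to this mean
grim_sketch(5.25, n = 40)  # TRUE:  210 / 40 == 5.25
```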
### Many cases: `grim_map()`
If you want to GRIM-test more than a handful of cases, the recommended way is to enter them into a data frame and to run `grim_map()` on the data frame. Two different ways to do that are discussed in `vignette("wrangling")`, but here, I will only describe an easily accessible solution for a single table.
Copy summary data from a PDF file and paste them into `tibble::tribble()`, which is available via scrutiny:
```{r}
flying_pigs1 <- tibble::tribble(
~x,
"8.97",
"2.61",
"7.26",
"3.64",
"9.26",
"10.46",
"7.39"
) %>%
dplyr::mutate(n = 28)
```
Use RStudio's multiple cursors to draw quotation marks around all the `x` values, and to set commas at the end. See `vignette("wrangling")`, section *With copy and paste*, if you are not sure how to do that.
Now, simply run `grim_map()` on that data frame:
```{r, error=TRUE}
grim_map(flying_pigs1)
```
The `x` and `n` columns are the same as in the input. By default, the number of scale `items` composing the mean is assumed to be 1. The main result, `consistency`, reports whether each combination of these three columns is GRIM-consistent. On the `ratio` column, see the section *The GRIM ratio* below.
### Scale items
If a mean is composed of multiple items, set the `items` parameter to that number. Below are hypothetical means of a three-item scale. With the single-item default, half of these are wrongly flagged as inconsistent:
```{r, error=TRUE}
jpap_1 <- tibble::tribble(
~x,
"5.90",
"5.71",
"3.50",
"3.82",
"4.61",
"5.24",
) %>%
dplyr::mutate(n = 40)
jpap_1 %>%
grim_map() # default is wrong here!
```
Yet, all of them are consistent if the correct number of items is stated:
```{r, error=TRUE}
jpap_1 %>%
grim_map(items = 3)
```
It is also possible to include an `items` column in the data frame instead:
```{r, error=TRUE}
jpap_2 <- tibble::tribble(
~x, ~items,
"6.92", 1,
"3.48", 1,
"1.59", 2,
"2.61", 2,
"4.04", 3,
"4.50", 3,
) %>%
dplyr::mutate(n = 30)
jpap_2 %>%
grim_map()
```
### Percentage conversion
An underappreciated strength of GRIM is testing percentages. Since these are actually decimal numbers inflated by a factor of 100, percentages come with two "free" decimal places. However, percentages are often reported with decimal places beyond those two, which makes the test all the more powerful: unless the true values were correctly reported, such percentages are quite likely to be GRIM-inconsistent.
Both `grim()` and `grim_map()` have a `percent` parameter which, if set to `TRUE`, divides the `x` values by 100 and increases the decimal count by two, so that percentages can be tested just like means:
```{r}
jpap_3 <- tibble::tribble(
~x, ~n,
"32.5", 438,
"35.6", 455,
"21.7", 501,
"39.3", 516,
)
jpap_3 %>%
grim_map(percent = TRUE)
```
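To see why the extra decimal places bite, here is the arithmetic behind the first row, spelled out in base R. This is just an illustration of the idea, not scrutiny's algorithm:

```r
x <- 32.5 / 100  # percent = TRUE divides by 100...
digits <- 1 + 2  # ...and adds two to the decimal count

# Candidate integer counts near 0.325 * 438:
counts <- round(x * 438) + (-1:1)
round(counts / 438, digits)  # none of these rounds to 0.325
```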
### Reconstructed values
Set `show_rec` to `TRUE` if you want the values that were reconstructed during GRIM-testing to be displayed in the output. They will be columns prefixed with `rec_`:
```{r}
pigs1 %>%
grim_map(show_rec = TRUE) %>%
dplyr::select(4:8) # output cut down for printing
```
The additional columns are ---
- `rec_sum`: the sum total from which the mean or proportion was ostensibly derived.
- `rec_x_upper`: the upper reconstructed `x` value.
- `rec_x_lower`: the lower reconstructed `x` value.
- `rec_x_upper_rounded_up`: the `rec_x_upper` value rounded up.
- `rec_x_upper_rounded_down`: the `rec_x_upper` value rounded down.
- `rec_x_lower_rounded_up`: the `rec_x_lower` value rounded up.
- `rec_x_lower_rounded_down`: the `rec_x_lower` value rounded down.
The last four columns depend on `rounding`. Here, they follow the default `"up_or_down"`, leading to two columns for each of `rec_x_upper` and `rec_x_lower`. With a single `rounding` procedure, such as `"up"`, there would only be one column each, and thus two in total. The difference between these numbers rarely matters, however, because rounding up and down mostly delivers the same results.
Internally, GRIM-consistency is determined by whether or not the reported `x` value is near-identical to at least one of the rounded reconstructed values. This algorithm follows the charitable and conservative protocol outlined by @brown_grim_2017. The `rec_*` columns were inspired by Table 1 in @bauer_expression_2021 but present values in slightly different ways.
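To see why rounding up and down mostly agree, compare them with base R's `round()`, which rounds half to even. The two helpers below are hypothetical sketches written only for this illustration, not scrutiny's rounding functions:

```r
# Hypothetical sketches of rounding up / down from 5:
round_up   <- function(x, digits = 2) floor(x * 10^digits + 0.5) / 10^digits
round_down <- function(x, digits = 2) ceiling(x * 10^digits - 0.5) / 10^digits

# They only differ when the digit to be cut off is exactly 5:
round_up(0.125)    # 0.13
round_down(0.125)  # 0.12
round(0.125, 2)    # 0.12 (base R rounds half to even)

round_up(0.126)    # 0.13
round_down(0.126)  # 0.13
```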
## Summarizing results with `audit()`
Following up on a call to `grim_map()`, the generic function `audit()` summarizes GRIM test results:
```{r}
flying_pigs1 %>%
grim_map() %>%
audit() %>%
dplyr::select(1:5) # output cut down for printing
```
These columns are ---
1. `incons_cases`: number of GRIM-inconsistent value sets.
2. `all_cases`: total number of value sets.
3. `incons_rate`: proportion of GRIM-inconsistent value sets.
4. `mean_grim_ratio`: average of GRIM ratios.
5. `incons_to_ratio`: ratio of `incons_rate` to `mean_grim_ratio`.
6. `testable_cases`: number of GRIM-testable value sets (i.e., those with a positive ratio).
7. `testable_rate`: proportion of GRIM-testable value sets.
## Visualizing results with `grim_plot()`
There is a specialized visualization function for GRIM test results, `grim_plot()`:
```{r, error=TRUE, fig.width=6, fig.height=5.5}
jpap_5 <- tibble::tribble(
~x, ~n,
"7.19", 28,
"4.56", 34,
"0.42", 27,
"1.31", 25,
"3.48", 34,
"4.27", 29,
"6.21", 30,
"3.11", 18,
"5.39", 36,
"5.66", 18,
)
jpap_5 %>%
grim_map() %>%
grim_plot()
```
`grim_plot()` can only be called on `grim_map()`'s output. It will fail otherwise:
```{r, error=TRUE}
grim_plot(mtcars)
```
With its unusual look, this plot will probably not fit everyone's taste. The results of `grim_plot()` are like those of error detection in general: they are not pretty, but they put the unvarnished truth on display.
The plot is strictly based on the laws governing GRIM. Its background raster shows all consistent (light) and inconsistent (dark) value pairs for two decimal places. Empirical values are shown in blue if consistent and red if inconsistent. Color settings and other ggplot2-typical options are available via arguments. Read about them at `grim_plot()`'s documentation.
You might notice the light vertical lines at $N = 40$ and $N = 80$: Few values are flagged as inconsistent here. This reflects `grim_map()`'s charitable default of accepting values rounded either up *or* down from 5. If a different `rounding` specification is chosen in the `grim_map()` call, the plot raster will adjust automatically (although it will often be the same as before):
```{r, fig.width=6, fig.height=5.5}
jpap_5 %>%
grim_map(rounding = "ceiling") %>%
grim_plot()
```
All `rounding` values are supported except `"up_from"`, `"down_from"`, and `"up_from_or_down_from"`.
Speed is not much of a concern here because all the rasters are based on data already stored within the package (in [R/sysdata.rda](https://github.com/lhdjung/scrutiny/blob/main/R/sysdata.rda)), so they don't need to be generated on the spot every time the function is called. See R/data-gen.R for the way they were generated.
## Testing numeric sequences with `grim_map_seq()`
GRIM analysts might be interested in a mean or percentage value's numeric neighborhood. Suppose you found multiple GRIM inconsistencies, as in our example `pigs1` data. You might wonder whether they are due to small reporting or computing errors.
Use `grim_map_seq()` to GRIM-test the values surrounding the reported means and sample sizes:
```{r}
out_seq1 <- grim_map_seq(pigs1)
out_seq1
```
### Summaries with `audit_seq()`
As this output is a little unwieldy, run `audit_seq()` on the results:
```{r}
audit_seq(out_seq1)
```
Here is what the output columns mean:
- `x` and `n` are the original inputs, reconstructed and tested for `consistency` here.
- `hits` is the number of GRIM-consistent value combinations found within the specified `dispersion` range.
- `diff_x` reports the absolute difference between `x` and the next consistent dispersed value (in dispersion steps, not the actual numeric difference). `diff_x_up` and `diff_x_down` report the difference to the next higher or lower consistent value, respectively.
- `diff_n`, `diff_n_up`, and `diff_n_down` do the same for `n`.
The default for `dispersion` is `1:5`, for five steps up and down. When the `dispersion` sequence gets longer, the number of hits tends to increase:
```{r}
out_seq2 <- grim_map_seq(pigs1, dispersion = 1:10)
audit_seq(out_seq2)
```
### Visualizing GRIM-tested sequences
Plotting the output of `grim_map_seq()` leads to curious results. Like regular GRIM plots, however, these do give us a sense of how many tested values are consistent:
```{r, fig.width=6, fig.height=5.5}
grim_plot(out_seq1)
```
The crosses appear because `grim_map_seq()` creates sequences around both `x` and `n`. Restrict this process to any one of these with the `var` argument:
```{r, fig.width=6, fig.height=5.5}
out_seq1_only_x <- grim_map_seq(pigs1, var = "x")
out_seq1_only_n <- grim_map_seq(pigs1, var = "n")
grim_plot(out_seq1_only_x)
grim_plot(out_seq1_only_n)
```
## Handling unknown group sizes with `grim_map_total_n()`
### Problems from underreporting
Unfortunately, some studies that report group averages don't report the corresponding group sizes --- only a total sample size. This makes any direct GRIM-testing impossible because only `x` values are known, not `n` values. All that is feasible here in terms of GRIM is to take a number around half the total sample size, go up and down from it, and check which *hypothetical* group sizes are consistent with the reported group means. `grim_map_total_n()` semi-automates this process, motivated by a recent GRIM analysis [@bauer_expression_2021].
The study examined by @bauer_expression_2021 reported means of 5.3 and 4.71 for a total sample size of 40. With equal group sizes (i.e., 20 in each group), only 5.3 is GRIM-consistent, not 4.71. However, Bauer and Francis looked for a plausible scenario in which both means were consistent: they checked whether such a scenario would arise if the 40 participants were distributed across the two groups in slightly different ways.
More precisely, they went from a 20/20 group split to a 19/21 split, then to an 18/22 split, and finally to a 17/23 split. In this last scenario, both means are consistent if 17 is paired with 4.71 and 23 with 5.3.
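The manual search can be sketched in base R. The `consistent()` helper below is a minimal stand-in for a proper GRIM check, included only so that the example is self-contained; it is not scrutiny's implementation:

```r
# Minimal GRIM check (sketch): does some integer sum over `n` round to `x`?
consistent <- function(x, n, digits = 2) {
  sums <- round(x * n) + (-1:1)
  any(abs(round(sums / n, digits) - x) < 1e-9)
}

# Walk the splits 20/20, 19/21, ..., 15/25, pairing 4.71 with the
# smaller group and 5.3 (one decimal place) with the larger one:
for (d in 0:5) {
  n1 <- 20 - d
  n2 <- 20 + d
  hit <- consistent(4.71, n1) && consistent(5.3, n2, digits = 1)
  cat(sprintf("%2d/%2d: %s\n", n1, n2, hit))
}
```

Only the 17/23 split yields a hit here, matching Bauer and Francis' result.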
### Semi-automated solution
Instead of going about this manually, call `grim_map_total_n()`, followed by `audit_total_n()` for summarizing the results. It will find two further plausible scenarios in which both means are consistent; more on that below.
```{r}
df <- tibble::tibble(x1 = "4.71", x2 = "5.3", n = 40)
# Detailed results:
df_tested <- grim_map_total_n(df)
df_tested
# Summary:
audit_total_n(df_tested)
```
Each "hit" is a scenario in which both `x1` and `x2` are GRIM-consistent with one of the two hypothetical group sizes. By default (`dispersion = 0:5`), the function goes five steps up and down from `n`.
### Testing both ways
@bauer_expression_2021 only took those scenarios into account in which 4.71 was combined with the respective smaller group and 5.3 with the larger one, so 17/23 is the only "hit" they found (my term). However, the converse way of assigning hypothetical group sizes to the reported means would be equally justified. `grim_map_total_n()`, therefore, conducts two sets of GRIM tests: one for each way of pairing means and group sizes.
It thus finds the group sizes 19/21 and 16/24 to be GRIM-consistent if 5.3 is combined with the smaller group and 4.71 with the larger one (i.e., with pairing reversed from Bauer and Francis' analysis). In the `audit_total_n()` summary of the function's output, results from original pairing are named `hits_forth`, those from reversed pairing are named `hits_back`, and their sum is named `hits_total`.
This example only features one case --- the `df` tibble has just a single row. It could have any number of rows, though; and `grim_map_total_n()` would determine and count the "hits" for each of them. See the *Examples* section of `grim_map_total_n()`'s documentation.
## GRIM statistics
### The GRIM ratio
#### Formula
The `ratio` column in a tibble returned by `grim_map()` is the "GRIM ratio", i.e.:
$$
\frac{10^D - NL}{10^D}
$$
where $D$ is the number of decimal places in `x` (the mean or proportion), $N$ is the sample size, and $L$ is the number of scale items. Because $N, L \geq 1$, the GRIM ratio ranges from $-\infty$ to $1 - \frac{1}{10^D}$, asymptotically approaching 1. Its upper bound will be 0.9 if $D = 1$ and 0.99 if $D = 2$, etc.
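The formula is simple enough to compute directly. The `ratio_sketch()` helper below is hypothetical, written for illustration; unlike scrutiny's `grim_ratio()`, it takes the decimal count as an explicit argument rather than deriving it from `x`:

```r
# Direct computation of the GRIM ratio: (10^D - N * L) / 10^D
ratio_sketch <- function(digits, n, items = 1) {
  (10^digits - n * items) / 10^digits
}

ratio_sketch(digits = 2, n = 72)             # 0.28
ratio_sketch(digits = 2, n = 80, items = 3)  # -1.4
```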
#### Functions
`grim_ratio()` takes the arguments `x`, `n`, `items`, and `percent` as in `grim()` and `grim_map()`:
```{r}
grim_ratio(x = 1.42, n = 72)
grim_ratio(x = 5.93, n = 80, items = 3)
# Enter `x` as a string to preserve trailing zeros:
grim_ratio(x = "84.20", n = 40, percent = TRUE)
# Upper bounds:
grim_ratio_upper(x = 1.42)
grim_ratio_upper(x = "84.20", percent = TRUE)
```
In addition, `grim_total()` takes the same arguments but returns only the numerator of the above formula:
```{r}
grim_total(x = 1.42, n = 72)
grim_total(x = 5.93, n = 80, items = 3)
grim_total(x = "84.20", n = 40, percent = TRUE) # Enter `x` as string to preserve trailing zero
```
If `grim_map()`'s `prob` argument is set to `TRUE`, the output gains a `prob` column that shows the probability of GRIM inconsistency. `prob` is the `ratio` column left-censored at 0: it equals `ratio` whenever `ratio` is non-negative, and it is 0 otherwise. (The GRIM ratio can never be 1 or greater.)
#### Interpretation
If the GRIM ratio is non-negative, it can be interpreted as the proportion of inconsistent value sets corresponding to a given set of parameters. This is also the probability that a randomly chosen mean is GRIM-inconsistent. If the ratio is negative, the probability is 0.
Similarly, if the `grim_total()` value is non-negative, it can be interpreted as the total number of GRIM inconsistencies corresponding to a given set of parameters. If it is negative, that total is 0.
#### Origins
Although the term "GRIM ratio" is new, the formula is arguably implicit in the original GRIM paper [@brown_grim_2017]. The numerator is a transformation of the formula presented on p. 364, and the authors discuss a common special case of the ratio (interpreted as a proportion) on p. 367:
> With reporting to two decimal places, for a sample size $N < 100$ [and a single item], a random mean value will be consistent in approximately $N$% of cases.
Assuming $N = 70$ and inserting all of these values into the above formula returns
$$
\frac{10^2 - 70 \times 1}{10^2} = 0.3
$$
so that a random mean will be inconsistent in about 30% of cases and, conversely, consistent in about 70%.
Here is the same in code (assuming an arbitrary mean with two decimal places):
```{r}
grim_ratio(x = 0.99, n = 70)
```
Thus, all I did regarding the GRIM ratio was to make the general formula explicit and give it a name. Researchers may judge for themselves how useful it is for further analyses.
### Granularity and scale items
The granularity of a non-continuous distribution is the minimal amount by which two means or proportions of the distribution can differ. It is derived from the sample size and the number of scale items. The number of items, in turn, naturally follows from the distribution's sample size and granularity.
#### Formulas
The granularity ($G$) formula is
$$
G = \frac{1}{NL}
$$
where $N$ is the sample size and $L$ is the number of items.
The scale items formula is the converse:
$$
L = \frac{1}{NG}
$$
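Both formulas are one-liners. Here is a quick sanity check in base R; the two helpers are hypothetical, mirroring the formulas above rather than scrutiny's functions:

```r
# G = 1 / (N * L) and L = 1 / (N * G):
granularity_sketch <- function(n, items = 1) 1 / (n * items)
items_sketch <- function(n, gran) 1 / (n * gran)

granularity_sketch(n = 80, items = 4)  # 0.003125
items_sketch(n = 50, gran = 0.01)      # 2
```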
#### Functions
Suppose you have an ordinal distribution with 80 observations and four items. To get its granularity, run this:
```{r}
grim_granularity(n = 80, items = 4)
```
Now, imagine a distribution with 50 observations and a granularity of 0.01. To get the number of its items (actual or effective), use this code:
```{r}
grim_items(n = 50, gran = 0.01)
```
As the number of items itself has a granularity of 1, a call to `grim_items()` that doesn't return whole numbers indicates a problem in earlier computations. A warning to that effect will be displayed:
```{r}
grim_items(n = c(50, 65, 93), gran = 0.02)
```
# References