-
Notifications
You must be signed in to change notification settings - Fork 56
/
INFO-2.Rmd
424 lines (333 loc) · 25.8 KB
/
INFO-2.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
---
title: "Advanced Information Management"
output: html_document
---
```{r setup, message=FALSE, warning=FALSE, include=FALSE}
library(pointblank)
```
In the [*Intro to Information Management*](../articles/INFO-1.html) article, we learned all about how to synthesize information on a table, giving us a useful report that can be published and widely shared. We used a **pointblank** *informant* with a set of information functions to generate *info text* and put that text into the appropriate report sections. We're going to take this a few steps further and look into some more functionality makes *info text* more dynamic and also include a finalizing step in this workflow that accounts for evolving data.
## Getting Snippets of Useful Text With the `info_snippet()` Function
A great source of information about the table can be the table itself. Suppose you want to show some categorical values from a particular column. Maybe you'd like to display the range of values in an important numeric column. Perhaps show some KPI values that can be calculated using data in the table? This can all be done with the `info_snippet()` function. You give the snippet a name and you give it a function call. Let's do this for the `small_table` dataset available in **pointblank**. This is what that table looks like:
```{r paged.print=FALSE}
small_table
```
If you wanted the mean value of data in column `d` rounded to one decimal place, one such way we could do it is with this expression:
```{r}
small_table %>% .$d %>% mean() %>% round(1)
```
Inside of an `info_snippet()` call, which is used after creating the *informant* object, the expression would look like this:
```{r eval=FALSE}
informant <-
create_informant(
tbl = small_table,
tbl_name = "small_table",
label = "Example No. 2"
) %>%
info_snippet(
snippet_name = "mean_d",
fn = ~ . %>% .$d %>% mean() %>% round(1)
)
```
The `small_table` dataset is associated with the `informant` as the target table, so, it's represented as the leading `.` in the functional sequence given to `fn` inside of `info_snippet()`. It's important to note that there's a leading `~`, making this expression a RHS formula (i.e., we don't want to execute anything here, at this time). Lastly, the snippet has been given the name `"mean_d"`. We know that this snippet will produce the value `2304.7` so what can we do with that? We should put that value into some *info text* and use the `snippet_name` as the key. It works similarly to how the **glue** package does text interpolation, and here's the continuation of the above example:
```{r eval=FALSE}
informant <-
informant %>%
info_columns(
columns = d,
info = "This column contains fairly large numbers (much larger than
those numbers in column `a`. The mean value is {mean_d}, which is
far greater than any number in that other column."
)
```
Within the text, there's the use of curly braces and the name of the snippet (`{mean_d}`). That's where the `2304.7` value will be inserted. This methodology for inserting the computed values of snippets can be performed wherever *info text* is provided (in either of the `info_tabular()`, `info_columns()`, and `info_section()` functions). Let's take a look at the report by printing the `informant` object
```{r eval=FALSE}
informant
```
<div style="text-align: center;"><img src="https://silly-jackson-b3dec8.netlify.app/informant_report_7.png"
alt="This is a tabular report entitled 'Pointblank Information'. The header contains information about the input table (identifying it as a tibble called 'small_table') and includes the number of rows (13) and columns (8). In the table body, the section entitled 'Columns' has a row for each column that has been identified in 'small_table'. Each row consists of the column name (enclosed in a box) and the column type (for example, 'Date', 'integer', etc.). There are 8 columns in the input table so there are 8 rows in this reporting table ('date_time', 'date', and 'a' through to 'f'). The row concerning column 'd' contains the text: 'This column contains fairly large numbers (much larger than those numbers in column 'a'. The mean value is {mean_d}, which is far greater than any number in that other column.'"
width=80%></div>
<br>
Hmm. There is `"... {mean_d} ..."` text in the report that should have been replaced with the mean value of column `d`. What gives? Well, there's one finalizing step that needs to be done and should always be done to wrap up the *Information Management* workflow and that is the use of the `incorporate()` function. Let's write the whole thing again and finish it off with a call to `incorporate()`.
```{r eval=FALSE}
informant <-
create_informant(
tbl = small_table,
tbl_name = "small_table",
label = "Example No. 2"
) %>%
info_snippet(
snippet_name = "mean_d",
fn = ~ . %>% .$d %>% mean() %>% round(1)
) %>%
info_columns(
columns = d,
info = "This column contains fairly large numbers (much larger than
those numbers in column `a`. The mean value is {mean_d}, which is
far greater than any number in that other column."
) %>%
incorporate()
informant
```
<div style="text-align: center;"><img src="https://silly-jackson-b3dec8.netlify.app/informant_report_8.png"
alt="This is a tabular report entitled 'Pointblank Information'. Which is similar to the table presented just before. The key difference is that the row concerning column 'd' now contains the text: 'This column contains fairly large numbers (much larger than those numbers in column 'a'. The mean value is 2304.70, which is far greater than any number in that other column.' This revised text now contains a numeric figure instead of the placeholder for it."
width=80%></div>
<br>
This time, sweet success. The value appears and the overall formatting looks great! This is a very useful thing, so long as we remember to use the `incorporate()` function to make it happen (more on that in the next section).
## Ensuring That Snippets (and Other Table Metadata Element) Are Up-to-Date
Tables can change with time. Whether that data source is a public dataset, an organization's data table, or a continually-updated Excel file (😱), we should be ready for change. In the previous example, we used the `incorporate()` function to finalize the report. Without it, our snippet didn't work. There are two major things that `incorporate()` does for you in the *Information Management* workflow.
1. Evaluation of text snippets in all `info_snippet()` calls, and, insertion of snippets in *info text* within `"{<snippet_name>}"`.
2. Updating of table row and column counts in the header of the report.
We really are incorporating aspects of the table into the report with `incorporate()` but might also think of it as regenerating, refreshing, or renewing the table. It gives **pointblank** license to access the table the same way that `interrogate()` does in the [**VALID-I**]((../articles/VALID-I.html)) validation workflow. On the first use of `incorporate()`, all text snippets will be put in their places; subsequent uses of `incorporate()` will replace the appropriate text as necessary. Every use of `incorporate()` will update the row and column counts in the header.
Here's a short demo of the header changing, because it's pretty instructive. Let's use our `small_table` object as `target_table`. With `dim()` we can be totally sure of the table dimensions.
```{r}
target_table <- small_table
dim(target_table)
```
Let's allow an informant to access the `target_table` through the `tbl` argument but, in this case, the expression is `~ target_table` (it simply gets the table from the global workspace). After using `incorporate()` and printing the `informant_tt` object, let's just examine the header.
```{r eval=FALSE}
informant_tt <-
create_informant(
tbl = ~ target_table,
tbl_name = "target_table",
label = "Example No. 3"
) %>%
incorporate()
informant_tt
```
<div style="text-align: center;"><img src="https://silly-jackson-b3dec8.netlify.app/informant_report_9.png"
alt="This is a tabular report entitled 'Pointblank Information'. This excerpt presents only the header of the tabular report. The key information about the input table identifies it as a tibble called 'target_table'). The header includes the number of rows (13) and columns (8)."
width=80%></div>
<p align="center">This is an excerpt of the complete report, showing just the header.</p>
<br>
The number of rows and columns reported in the header checks out: 13 rows and 8 columns.
Now, let's manually enlarge the `target_table` and print the new row and column counts.
```{r}
target_table <-
dplyr::bind_rows(small_table, small_table) %>%
dplyr::mutate(g = a + c)
dim(target_table)
```
We've got our informant object, let's see how `incorporate()` keeps pace with the change.
```{r eval=FALSE}
informant_tt %>% incorporate()
```
<div style="text-align: center;"><img src="https://silly-jackson-b3dec8.netlify.app/informant_report_10.png"
alt="This is a tabular report entitled 'Pointblank Information'. This excerpt presents only the header of the tabular report. The key information about the input table identifies it as a tibble called 'target_table'). Compared to the last figure, the header contains an updated row count of 26 (compared to 13 previously)."
width=80%></div>
<p align="center">This is an excerpt of the complete report, showing just the header.</p>
<br>
Great! Using `incorporate()` has accurately updated the reporting of row and column counts in the header. And it's also very much worth noting that the use of `tbl = ~ target_table` rather than `tbl = target_table` is important here. Had the latter been used in `create_informant()`, that table would be bound to the informant in its initial state (with 13 rows and 8 columns) and any updates to the table wouldn't be reflected in the reporting upon using `incorporate()`. The table-prep formula used here is meant for re-obtaining the table each and every time the table is needed.
In short, unless you have no uses of `info_snippet()` and the table isn't expected to change, it's recommended to use `incorporate()` as the final call in this workflow.
## Helpful **pointblank** Functions that Work Exceedingly Well with `info_snippet()`
There are a few functions available in **pointblank** that make it much easier to get commonly-used text snippets. All of them begin with the `snip_` prefix and they are:
- `snip_list()`: Gets a list of column categories
- `snip_lowest()`: Gets the lowest value from a column
- `snip_highest()`: Gets the highest value from a column
Each of these functions can be used directly as a `fn` value and we don't have to specify the table since its assumed that the target table is where we'll be snipping data from. Let's have a look at each of these in action.
### The `snip_list()` Function
When describing some aspect of the target table, we may want to extract some values from a column and include them as a piece of info text. We'd want the values to be nicely formatted as a list (with commas) and we'd probably prefer that this be constrained to a certain size (so as to not potentially generate massive amounts of text). This can be efficiently done with `snip_list()`. Let's experiment with the combination of `snip_list()` and `info_snippet()`, extending the **palmerpenguins** example from the [*Intro to Information Management*](../articles/INFO-I.html) article.
```{r eval=FALSE}
informant_pp <-
create_informant(
tbl = ~ palmerpenguins::penguins,
tbl_name = "penguins",
label = "The `penguins` dataset from the **palmerpenguins** 📦."
) %>%
info_columns(
columns = species,
`ℹ️` = "A factor denoting penguin species ({species_snippet})."
) %>%
info_columns(
columns = island,
`ℹ️` = "A factor denoting island in Palmer Archipelago, Antarctica
({island_snippet})."
) %>%
info_snippet(
snippet_name = "species_snippet",
fn = snip_list(column = "species")
) %>%
info_snippet(
snippet_name = "island_snippet",
fn = snip_list(column = "island")
) %>%
incorporate()
informant_pp
```
<div style="text-align: center;"><img src="https://silly-jackson-b3dec8.netlify.app/informant_report_11.png"
alt="This is a tabular report entitled 'Pointblank Information'. The header contains information about the input table, which is the 'penguins' dataset). Information on the row and column count is available (344 rows and 8 columns). In the table body, the section entitled 'Columns' has a row for each column that has been identified in 'penguins'. Presented are only the first two rows, corresponding to the first two columns in the penguins dataset: 'species' and 'island'. The 'species' row contains the text: 'A factor denoting penguin species (Adelie, Gentoo, and Chinstrap)'. The 'island' row contains this text: 'A factor denoting island in Palmer Archipelago, Antarctica (Torgersen, Biscoe, and Dream)'."
width=80%></div>
<p align="center">This is an excerpt of the complete report, showing just the header and part of the <strong>COLUMNS</strong> section.</p>
<br>
This seemed to work out quite well. No need for determining what these strings are and then hardcoding them to the *info text*, `snip_list()` did all the work here.
This also works for numeric values. Let's use `snip_list()` to provide a text snippet based on values in the `year` column (which is an `integer` column):
```{r eval=FALSE}
informant_pp <-
informant_pp %>%
info_columns(
columns = year,
`ℹ️` = "The study year ({year_snippet})."
) %>%
info_snippet(
snippet_name = "year_snippet",
fn = snip_list(column = "year")
) %>%
incorporate()
informant_pp
```
<div style="text-align: center;"><img src="https://silly-jackson-b3dec8.netlify.app/informant_report_12.png"
alt="This is an except of the tabular report entitled 'Pointblank Information', based on the 'penguins' dataset. Presented is only the row that corresponds to the 'year' column in the penguins dataset. The 'year' row contains the text: 'The study year (2007, 2008, and 2009)'."
width=80%></div>
<p align="center">This is an excerpt of the complete report, showing just the bottom of the <strong>COLUMNS</strong> section and the footer.</p>
<br>
Again, no issues with the formatting and display of values. We got the *info text* `"The study year ("2007", "2008", and "2009" )."` for our efforts here and it saved us from having to determine this, plus, should the data be updated with new `year` values, that will be reflected in this info text upon using `incorporate()`. Refreshed *info text* really provides huge benefits, especially when the data changes a lot (e.g., database tables).
### The `snip_lowest()` and `snip_highest()` Functions
We can get the lowest and highest values from a column and inject those formatted values into some *info_text*. Let's do that for some of the measured values in the `penguins` dataset with `snip_lowest()` and `snip_highest()`.
```{r eval=FALSE}
informant_pp <-
informant_pp %>%
info_columns(
columns = bill_length_mm,
`ℹ️` = "A number denoting bill length"
) %>%
info_columns(
columns = bill_depth_mm,
`ℹ️` = "A number denoting bill depth (in the range of
{min_depth} to {max_depth} millimeters)."
) %>%
info_columns(
columns = flipper_length_mm,
`ℹ️` = "An integer denoting flipper length"
) %>%
info_columns(
columns = matches("length"),
`ℹ️` = "(in units of millimeters)."
) %>%
info_columns(
columns = flipper_length_mm,
`ℹ️` = "Largest observed is {largest_flipper_length} mm."
) %>%
info_snippet(
snippet_name = "min_depth",
fn = snip_lowest(column = "bill_depth_mm")
) %>%
info_snippet(
snippet_name = "max_depth",
fn = snip_highest(column = "bill_depth_mm")
) %>%
info_snippet(
snippet_name = "largest_flipper_length",
fn = snip_highest(column = "flipper_length_mm")
) %>%
incorporate()
informant_pp
```
<div style="text-align: center;"><img src="https://silly-jackson-b3dec8.netlify.app/informant_report_13.png"
alt="This is a tabular report entitled 'Pointblank Information'. The header contains information about the input table (identifying it as a tibble called 'penguins') and includes the number of rows (344) and columns (8). In the table body, the section entitled 'Columns' has a row for each column that has been identified in 'penguins'. Each row consists of the column name (enclosed in a box) and the column type (for example, 'Date', 'integer', etc.). Descriptive information for each column in the 'penguins' dataset is available in each row. Here is a listing for each: (1)
species, A factor denoting penguin species (Adélie, Chinstrap, and Gentoo); (2) island, A factor denoting island in Palmer Archipelago, Antarctica (Biscoe, Dream, or Torgersen); (3) bill_length_mm, A number denoting bill length (in units of millimeters); (4) bill_depth_mm, A number denoting bill depth (in the range of 13.1 to 21.5 millimeters); (5) flipper_length_mm, An integer denoting flipper length (in units of millimeters), largest observed is 231 mm.; (6) body_mass_g has no descriptive text and neither does the row (7), which corresponds to the sex column; however, (8) year, has the text of The study year (e.g., 2007, 2008, 2009)."
width=80%></div>
<br>
We can see from the report output that we can creatively use the lowest and highest values obtained by `snip_lowest()` and `snip_highest()` to specify a range or simply show some maximum value. While the ordering of the `info_columns()` calls in the example affects the overall layout of the text (through the text appending behavior), the placement of `info_snippet()` calls *does not* matter. And, again, we must use `incorporate()` to update all of the text snippets and render them in their appropriate locations (inside each `{<snippet_name>}`).
## *Text Tricks*
While your *info text* can be jazzed up with Markdown, there are a few extra tricks that make authoring the text a bit more pleasurable. Once you know about these *text tricks* you'll be able to express information in many more interesting ways.
### Links and Dates
If you have links in your text, **pointblank** will try to identify them and style them nicely. This amounts to using a pleasing, light-blue color and underlines that appear on hover. It doesn't take much to style links but it does require *something*. So, Markdown links written as `< link url >` or `[ link text ]( link url )` will both get the transformation treatment.
Sometimes you want dates to stand out from text. Try enclosing a date expressed in the ISO-8601 standard with parentheses, like this: `(2004-12-01)`. What will happen is that date will be set in a monospaced variation of the reporting font, and, it will be underlined in a striking shade of purple.
Here's how we might use these features while otherwise adding more information to the **palmerpenguins** reporting:
```{r eval=FALSE}
informant_pp <-
informant_pp %>%
info_tabular(
`R dataset` = "The goal of `palmerpenguins` is to provide a great dataset
for data exploration & visualization, as an alternative to `iris`. The
latest CRAN release was published on (2020-07-25).",
`data collection` = "Data were collected and made available by Dr. Kristen
Gorman and the [Palmer Station, Antarctica LTER](https://pal.lternet.edu),
a member of the [Long Term Ecological Research Network](https://lternet.edu).",
citation = "Horst AM, Hill AP, Gorman KB (2020). palmerpenguins: Palmer
Archipelago (Antarctica) penguin data. R package version 0.1.0.
<https://allisonhorst.github.io/palmerpenguins/>.
doi: 10.5281/zenodo.3960218."
) %>%
incorporate()
informant_pp
```
<div style="text-align: center;"><img src="https://silly-jackson-b3dec8.netlify.app/informant_report_14.png"
alt="This is a tabular report entitled 'Pointblank Information'. The header contains information about the input table (identifying it as a tibble called 'penguins') and includes the number of rows (344) and columns (8). In the table body, there is a section called 'Table' and three subsections: (1) 'R Dataset', (2) 'Data Collection', and (3) 'Citation'. The text for each of these subsections is available in (and taken verbatim from) the previous code listing. Notably, the date presented in the first subsection (2020-07-25) is prominently underlined and the text is styled with a monospaced variant of the report font."
width=80%></div>
<p align="center">This is an excerpt of the complete report, showing just the <strong>TABLE</strong> section and the header.</p>
<br>
### Labels
We can take portions of text and present them as labels. These will help you call out important attributes in short form and may eliminate the need for oft-repeated statements. You might apply to labels to signify priority, category, or any other information you find useful. To do this we have two options,
1. Use double parentheses around text to capture it in a rectangular label: `((label text))`
2. Use triple parentheses to capture text into a rounded-rectangular label: `(((label text)))`
```{r eval=FALSE}
informant_pp <-
informant_pp %>%
info_columns(
columns = body_mass_g,
`ℹ️` = "An integer denoting body mass."
) %>%
info_columns(
columns = c(ends_with("mm"), ends_with("g")),
`ℹ️` = "((measured))"
) %>%
info_section(
section_name = "additional notes",
`data types` = "(((factor))) (((numeric))) (((integer)))"
) %>%
incorporate()
informant_pp
```
<div style="text-align: center;"><img src="https://silly-jackson-b3dec8.netlify.app/informant_report_15.png"
alt="This is an except of a tabular report entitled 'Pointblank Information'. The header is not presented but, instead, there are two sections in the report: 'Columns' and 'Additional Notes'. The first section contains information on each of the columns from the penguins dataset. Notably, some of the information entries are adorned with a 'Measured' label (they appear at the end of the descriptive text entries, enclosed in a box). These labels occur in the rows describing the 'bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', and 'body_mass_g' columns. In the 'Additional Notes' section of the report table (at the bottom), there is a subsection called 'Data Types'. Within that there are three labels enclosed in rounded boxes; these labels are 'factor', 'numeric', and 'integer'."
width=80%></div>
<p align="center">This is an excerpt of the complete report, showing just the <strong>COLUMNS</strong> and <strong>ADDITIONAL NOTES</strong> sections.</p>
<br>
### Get Stylin'
If you want to use CSS styles on spans of *info text*, it's possible with the following construction:
`[[ info text ]]<< CSS style rules >>`
It's important to ensure that each CSS rule is concluded with a `;` character in this syntax. Styling the word `factor` inside a piece of *info text* might look like this:
`This is a [[factor]]<<color: red; font-weight: 300;>> value.`
Where the result looks something like this:
<div style="text-align: center;"><img src="https://silly-jackson-b3dec8.netlify.app/info_text_styled.png"
alt="This is a snippet of text that reads 'This is a factor value.' Moreover, the word 'factor' is written in red whereas the rest of the text is in a dark gray."
width=40%></div>
<br>
There are many CSS style rules that can be used. Here's a sample of a few useful ones:
- `color: <a color value>;` (text color)
- `background-color: <a color value>;` (the text's background color)
- `text-decoration: (overline | line-through | underline);`
- `text-transform: (uppercase | lowercase | capitalize);`
- `letter-spacing: <a +/- length value>;`
- `word-spacing: <a +/- length value>;`
- `font-style: (normal | italic | oblique);`
- `font-weight: (normal | bold | 100-900);`
- `font-variant: (normal | bold | 100-900);`
- `border: <a color value> <a length value> (solid | dashed | dotted);`
Continuing with our **palmerpenguins** reporting, we'll add some more *info text* and take the opportunity to add CSS style rules using the `[[ ]]<< >>` syntax.
```{r eval=FALSE}
informant_pp <-
informant_pp %>%
info_columns(
columns = sex,
`ℹ️` = "A [[factor]]<<text-decoration: underline;>>
denoting penguin sex (female or male)."
) %>%
info_section(
section_name = "additional notes",
keywords = "
[[((penguins))]]<<border-color: platinum; background-color: #F0F8FF;>>
[[((Antarctica))]]<<border-color: #800080; background-color: #F2F2F2;>>
[[((measurements))]]<<border-color: #FFB3B3; background-color: #FFFEF4;>>
"
) %>%
incorporate()
informant_pp
```
<div style="text-align: center;"><img src="https://silly-jackson-b3dec8.netlify.app/informant_report_16.png"
alt="This is an except of a tabular report entitled 'Pointblank Information'. The header is not presented but, instead, there is one section displayed from the full report: 'Additional Notes'. There are two subsections in that: 'Data Types' and 'Keywords'. Within the 'Data Types' subsection there are three labels enclosed in rounded boxes; these labels are 'factor', 'numeric', and 'integer'. Within the 'Keyword' subsection there are three different labels enclosed in square boxes; these labels are 'penguins', 'Antarctica', and 'measurements'. These last three labels have customs colors for their box borders and their background fill."
width=80%></div>
<p align="center">This is an excerpt of the complete report, showing just the bottom of the <strong>COLUMNS</strong> section, the <strong>ADDITIONAL NOTES</strong> section, and the footer.</p>
<br>
With the above `info_columns()` and `info_section()` function calls, we are able to style a single word (with an underline) and even style labels (changing the border and background colors). The syntax here is somewhat forgiving, allowing you to put line breaks between `]]` and `<<` and between style rules so that lines of markup don't have to be overly long.
So, what do you think of all these *text tricks*? You got to admit they can spice up the proceedings. More of them will inevitably be added as development on **pointblank** proceeds. But that's it for now. *Don't you think you've had enough?*