# Data {#sec-data}
```{r}
#| eval: true
#| echo: false
#| include: false
source("_common.R")
library(dplyr)
```
```{r}
#| label: co_box_tldr
#| echo: false
#| results: asis
#| eval: true
co_box(
color = "b",
look = "default", hsize = "1.15", size = "1.10",
header = "TLDR Data",
fold = TRUE,
contents = "
**Data files in your app-package:**
- **`data/`**: data files that will be used in your application (i.e., become part of your app-package namespace and accessed via `pkg::data`) should be stored in `data/`\n
- Add data to the data/ directory using `usethis::use_data()`\n
- Include the `LazyData: true` field in the `DESCRIPTION`\n
- **`data-raw/`**: scripts used to prepare data files can be created with `usethis::use_data_raw()`\n
- Store intermediate data files in `data-raw/`\n
- Store output files in `data/` (or `inst/extdata/`)\n
- **`inst/extdata/`**: 'External' data files (i.e., non-R formatted data files) can be stored in `inst/extdata` and accessed using `system.file()`. \n
**Workflow:** start with the script that creates/downloads/wrangles your data (created with `usethis::use_data_raw()`), keep any intermediate or non-R formatted files in `inst/extdata/`, then export the final object to `data/` with `usethis::use_data()` \n
"
)
```
---
We've documented the functions in `moviesApp` and successfully managed the dependencies with the `NAMESPACE` and `DESCRIPTION` files. In this chapter, we're going to cover how to make sure the `movies.RData` file becomes part of `moviesApp`, and other locations for data files in app-packages. For information on how to store and retrieve data inside your application, see the chapter on [app Data](app_data.qmd).
:::: {.callout-tip collapse='true' appearance='default'}
## [Accessing the code examples]{style='font-weight: bold; font-size: 1.15em;'}
::: {style='font-size: 0.95em; color: #282b2d;'}
I've created the [`shinypak` R package](https://mjfrigaard.github.io/shinypak/) in an effort to make each section accessible and easy to follow.
Install `shinypak` using `pak` (or `remotes`):
```{r}
#| code-fold: false
#| message: false
#| warning: false
#| eval: false
# install.packages('pak')
pak::pak('mjfrigaard/shinypak')
```
Review the chapters in each section:
```{r}
#| code-fold: false
#| message: false
#| warning: false
#| collapse: true
library(shinypak)
list_apps(regex = '^07')
```
Launch an app:
```{r}
#| code-fold: false
#| eval: false
launch(app = "07_data")
```
:::
::::
## App-package data
Data in R packages are typically stored in one of three folders: `data/`, `data-raw/`, and `inst/extdata/`. The folder you use will depend on the format, accessibility, and intended purpose of the data file.[^data-pkgs-1]
[^data-pkgs-1]: Read more about the data folder in the ['Data in packages' section of Writing R Extensions](https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Data-in-packages) and the ['Data' chapter of R Packages, 2ed](https://r-pkgs.org/data.html).
## [`data/`]{style="font-size: 1.05em; font-weight: bold;"} {#sec-data-data}
The primary location for data is the `data/` folder. Objects in the `data/` folder are available in your package namespace when it's installed and loaded, and can be accessed with the `::` syntax. See the example below of the `storms` data from `dplyr`:
```{r}
#| eval: true
#| code-fold: false
#| collapse: true
library(dplyr)
head(dplyr::storms)
```
### [`LazyData`]{style="font-size: 1.05em"}
Data files become part of a package when they're added to the `data/` folder and `LazyData: true` is added to the `DESCRIPTION` file.
- `LazyData: true`: this means the data are only loaded into memory if they are explicitly accessed by the user or a function in the package. Until then, only the dataset names are loaded. This user-friendly practice is the default for most R packages.
- `LazyData: false` (or omitted): accessing a data file from the package requires explicitly loading it using the `data()` function.
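As a minimal sketch of the second case, the code below loads a dataset explicitly with `data()`. It uses `mtcars` from the base `datasets` package as a stand-in (an assumption for illustration; with `moviesApp` the call would name `movies` and `moviesApp` instead), and loads it into a scratch environment so nothing is added to the global workspace:

```r
# Illustrative only: 'mtcars' from the 'datasets' package stands in for a
# package dataset; 'scratch' is a hypothetical name for a throwaway environment
scratch <- new.env()

# data() copies the named dataset from the package into 'scratch'
data("mtcars", package = "datasets", envir = scratch)

# the dataset is now a regular object in that environment
loaded_names <- ls(scratch)
dim(scratch$mtcars)
```

Loading into a separate environment is a design choice for the example; calling `data()` without `envir` places the object in the calling environment.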
Files in `data/` should be in the `.rda` or `.RData` format. Below are the steps for adding `movies` to `moviesApp`:
1. Move the `movies.RData` file into a newly created `data/` folder:
``` sh
moviesApp/
└── data/
    └── movies.RData
```
2. Include `LazyData: true` in the `DESCRIPTION` file (I've added it above `Imports:`):
``` sh
Package: moviesApp
Version: 0.0.0.9000
Type: Package
Title: movies app
Description: A movies data shiny application.
Author: John Smith [aut, cre]
Maintainer: John Smith <John.Smith@email.io>
License: GPL-3
RoxygenNote: 7.2.3
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
LazyData: true
Imports:
    shiny,
    ggplot2,
    rlang,
    stringr,
    tools
```
3. Load, document, and install.
[<kbd>Ctrl/Cmd</kbd> + <kbd>Shift</kbd> + <kbd>L</kbd>]{style="font-style: italic; font-weight: bold; font-size: 1.10em"}
``` sh
ℹ Loading moviesApp
```
[<kbd>Ctrl/Cmd</kbd> + <kbd>Shift</kbd> + <kbd>D</kbd>]{style="font-style: italic; font-weight: bold; font-size: 1.10em"}
``` sh
==> devtools::document(roclets = c('rd', 'collate', 'namespace'))
ℹ Updating moviesApp documentation
ℹ Loading moviesApp
Documentation completed
```
[<kbd>Ctrl/Cmd</kbd> + <kbd>Shift</kbd> + <kbd>B</kbd>]{style="font-style: italic; font-weight: bold; font-size: 1.10em"}
In the **Build** pane, you'll notice a few new `** data` lines of output after adding data:
``` sh
** data
*** moving datasets to lazyload DB
** byte-compile and prepare package for lazy loading
```
We can check to see if `movies` has been included in `moviesApp` using the `package::data` syntax:
![`movies` is now part of `moviesApp`](images/data_movies_namespace.png){#fig-data_movies_namespace width='100%' fig-align='center'}
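Outside of the IDE, `data(package = )` lists the datasets an installed package provides, which is another way to confirm a dataset was included. The sketch below uses the base `datasets` package as a stand-in because `moviesApp` isn't assumed to be installed here; swapping in `package = "moviesApp"` should list `movies` once the package is installed:

```r
# list the datasets shipped with an installed package
# ('datasets' is a stand-in here for 'moviesApp')
pkg_data <- data(package = "datasets")

# the results matrix has one row per dataset; 'Item' holds the names
data_names <- pkg_data$results[, "Item"]
"mtcars" %in% data_names
```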
### [`usethis::use_data()`]{style="font-size: 0.95em"}
If you'd prefer to store data using the `.rda` format, the `usethis` package has the [`use_data()` function](https://usethis.r-lib.org/reference/use_data.html) that will automatically store an object in `data/` in the `.rda` format.
To use `usethis::use_data()`, we can load the `movies` data into the global environment with `load("movies.RData")`, then run `usethis::use_data(movies)`:
```{r}
#| eval: false
#| code-fold: false
usethis::use_data(movies)
```
```{verbatim}
#| eval: false
#| code-fold: false
✔ Setting active project to '/path/to/moviesApp'
✔ Adding 'R' to Depends field in DESCRIPTION
✔ Creating 'data/'
✔ Saving 'movies' to 'data/movies.rda'
• Document your data (see 'https://r-pkgs.org/data.html')
```
The `Depends:` field is added to the `DESCRIPTION` file with a minimum R version (this ensures the data files will be loaded):
```{verbatim}
#| eval: false
#| code-fold: false
Depends:
    R (>= 2.10)
```
*`use_data()` will also add `LazyData: true` to the `DESCRIPTION`*
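For intuition, storing an object with `use_data()` is roughly comparable to calling base R's `save()` on a file in `data/`. The sketch below is an approximation under stated assumptions rather than what `usethis` runs internally: it writes to a temporary folder instead of a real package, and `toy_movies` is a made-up stand-in for `movies`:

```r
# hypothetical stand-in for the movies data
toy_movies <- data.frame(
  title = c("A", "B", "C"),
  runtime = c(80, 101, 84)
)

# a temporary folder plays the role of the package's data/ directory
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)
rda_path <- file.path(data_dir, "toy_movies.rda")

# save the object under its own name in the .rda format
save(toy_movies, file = rda_path)

# load() restores the object by name; use a fresh environment to check it
check_env <- new.env()
load(rda_path, envir = check_env)
identical(check_env$toy_movies, toy_movies)
```

Unlike this sketch, `use_data()` also handles the `DESCRIPTION` updates described above, which is why it is the more convenient choice inside a package.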
## Documenting data {#sec-document-data}
Documenting data can be tedious, but it's worth the effort if you'll be sharing your package with collaborators. There are multiple ways to store dataset documentation. A common approach is to create a `data.R` file in the `R/` folder.[^data-ggplot2-example]
[^data-ggplot2-example]: The `ggplot2` package has a great example of documenting datasets in the [R/data.R](https://github.com/tidyverse/ggplot2/blob/main/R/data.R) file
```{r}
#| eval: false
#| code-fold: false
fs::file_create("R/data.R")
```
In `data.R`, we provide a `@title`, `@description`, and `@details` for the data (with or without the tags), followed by `@format`:
```{r}
#| eval: false
#| code-fold: false
#' @title IMDB movies data
#'
#' @description
#' Movie review data. Note: these data come from the [Building Web Applications with Shiny course](https://rstudio-education.github.io/shiny-course/).
#'
#' @details
#' Read more about acquiring these data in the ['Web Scraping and programming' section of Data science in a box](https://datasciencebox.org/02-exploring-data#web-scraping-and-programming)
#'
#' @format
```
### [`@format`]{style="font-size: 0.90em;"}
The text following `@format` is a one-sentence description of the data (with its dimensions).
```{r}
#| eval: false
#| code-fold: false
#' @title IMDB movies data
#'
#' @description
#' Movie review data. Note: these data come from the [Building Web Applications with Shiny course](https://rstudio-education.github.io/shiny-course/).
#'
#' @details
#' Read more about acquiring these data in the ['Web Scraping and programming' section of Data science in a box](https://datasciencebox.org/02-exploring-data#web-scraping-and-programming)
#'
#' @format A data frame with [] rows and [] variables:
```
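The row and column counts for the `@format` line can be pulled from the data rather than typed by hand. Here is a small sketch; `format_line()` is a hypothetical helper (not part of `roxygen2` or `usethis`), and `mtcars` is used as a stand-in for `movies`:

```r
# hypothetical helper: build the @format sentence from a data frame
format_line <- function(df) {
  sprintf(
    "#' @format A data frame with %d rows and %d variables:",
    nrow(df), ncol(df)
  )
}

# mtcars stands in for movies
fmt <- format_line(mtcars)
cat(fmt, "\n")
```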
### [`\describe` & `\item`]{style="font-size: 0.90em;"}
Each variable (column) in the data is documented with a combination of `\describe` and `\item` (**pay close attention to the curly brackets**):
```{r}
#| eval: false
#| code-fold: false
#' \describe{
#' \item{variable}{description}
#' }
```
After closing the curly brackets in `\describe`, place the name of the data in quotes (`"movies"`) on the following line.
Below is the documentation for the first five columns in the `movies` dataset:
```{r}
#| eval: false
#| code-fold: false
#' @title IMDB movies data
#'
#' @description
#' Movie review data. Note: these data come from the [Building Web Applications with Shiny course](https://rstudio-education.github.io/shiny-course/).
#'
#' @details
#' Read more about acquiring these data in the ['Web Scraping and programming' section of Data science in a box](https://datasciencebox.org/02-exploring-data#web-scraping-and-programming)
#'
#' @format A data frame with 651 rows and 34 variables:
#' \describe{
#' \item{title}{movie title}
#' \item{title_type}{type, fct (Documentary, Feature Film, TV Movie)}
#' \item{genre}{movie genre, fct (Action & Adventure, Animation, etc.)}
#' \item{runtime}{movie length in minutes, num, avg = 106, sd = 19.4}
#' \item{mpaa_rating}{movie rating, fct (G, NC-17, PG, PG-13, R, Unrated)}
#' }
#'
"movies"
```
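Typing an `\item` entry for every column gets tedious for wide datasets. As a hedged sketch, the hypothetical helper below (`item_lines()`, not a `roxygen2` function) drafts one `\item{}` line per column, using the column's class as a placeholder description to edit by hand. `iris` stands in for `movies`:

```r
# hypothetical helper: draft one \item{name}{class} line per column
item_lines <- function(df) {
  # first class of each column (e.g., 'numeric', 'factor')
  col_classes <- vapply(
    df, function(col) class(col)[1], character(1),
    USE.NAMES = FALSE
  )
  sprintf("#'   \\item{%s}{%s}", names(df), col_classes)
}

# iris stands in for movies
skeleton <- c("#' \\describe{", item_lines(iris), "#' }")
writeLines(skeleton)
```

The output is only a starting point: the class placeholders still need to be replaced with real descriptions before the result is pasted into `R/data.R`.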
If we load and document `moviesApp`, we can see a preview of the help file:
[<kbd>Ctrl/Cmd</kbd> + <kbd>Shift</kbd> + <kbd>L</kbd>]{style="font-style: italic; font-weight: bold; font-size: 1.10em"}
```{verbatim}
#| eval: false
#| code-fold: false
ℹ Loading moviesApp
```
[<kbd>Ctrl/Cmd</kbd> + <kbd>Shift</kbd> + <kbd>D</kbd>]{style="font-style: italic; font-weight: bold; font-size: 1.10em"}
```{verbatim}
#| eval: false
#| code-fold: false
==> devtools::document(roclets = c('rd', 'collate', 'namespace'))
ℹ Updating moviesApp documentation
ℹ Loading moviesApp
Writing movies.Rd
Documentation completed
```
```{r}
#| eval: false
#| code-fold: false
?movies
```
::: {#fig-data_data_help}
![The `movies` help file](images/data_data_help.png){#fig-09_data_rd width='100%' fig-align='center'}
Documentation for the `movies` dataset
:::
I've provided documentation for the full `movies` dataset below.
```{r}
#| eval: false
#| code-fold: true
#| code-summary: 'show/hide full movies data documentation'
#' @title IMDB movies data
#'
#' @description
#' Movie review data. Note: these data come from the [Building Web Applications with Shiny course](https://rstudio-education.github.io/shiny-course/).
#'
#' @details
#' Read more about acquiring these data in the ['Web Scraping and programming' section of Data science in a box](https://datasciencebox.org/02-exploring-data#web-scraping-and-programming)
#'
#' @format A data frame with 651 rows and 34 variables:
#' \describe{
#' \item{title}{movie title}
#' \item{title_type}{type, fct (Documentary, Feature Film, TV Movie)}
#' \item{genre}{movie genre, fct (Action & Adventure, Animation, etc.)}
#' \item{runtime}{movie length in minutes, num, avg = 106, sd = 19.4}
#' \item{mpaa_rating}{movie rating, fct (G, NC-17, PG, PG-13, R, Unrated)}
#' \item{studio}{name of studio, chr}
#' \item{thtr_rel_date}{Theatre release date, POSIXct, min = 1970-05-19 21:00:00, max = 2014-12-24 21:00:00}
#' \item{thtr_rel_year}{Theatre release year, num, min = 1970, max = 2014}
#' \item{thtr_rel_month}{Theatre release month, num, min = 1, max = 12}
#' \item{thtr_rel_day}{Theatre release day, num, min = 1, max = 31}
#' \item{dvd_rel_date}{DVD release date, POSIXct, min = 1991-03-27 21:00:00, max = 2015-03-02 21:00:00}
#' \item{dvd_rel_year}{DVD release year, num, min = 1991, max = 2015}
#' \item{dvd_rel_month}{DVD release month, num, min = 1, max = 12}
#' \item{dvd_rel_day}{DVD release day, num, min = 1, max = 31}
#' \item{imdb_rating}{Internet movie database rating, avg = 6.49, sd = 1.08}
#' \item{imdb_num_votes}{Internet movie database votes, avg = 57533, sd = 112124}
#' \item{critics_rating}{Rotten tomatoes rating, fct (Certified Fresh, Fresh, Rotten)}
#' \item{critics_score}{Rotten tomatoes score, avg = 57.7, sd = 28.4}
#' \item{audience_rating}{Audience rating, fct (Spilled, Upright)}
#' \item{audience_score}{Audience score, avg = 62.4, sd = 20.2}
#' \item{best_pic_nom}{Best picture nomination, fct (no, yes)}
#' \item{best_pic_win}{Best picture win, fct (no, yes)}
#' \item{best_actor_win}{Best actor win, fct (no, yes)}
#' \item{best_actress_win}{Best actress win, fct (no, yes)}
#' \item{best_dir_win}{Best director win, fct (no, yes)}
#' \item{top200_box}{Top 200 box-office, fct (no, yes)}
#' \item{director}{Name of director, chr}
#' \item{actor1}{Name of leading actor, chr}
#' \item{actor2}{Name of supporting actor, chr}
#' \item{actor3}{Name of #3 actor, chr}
#' \item{actor4}{Name of #4 actor, chr}
#' \item{actor5}{Name of #5 actor, chr}
#' \item{imdb_url}{IMDB URL}
#' \item{rt_url}{Rotten tomatoes URL}
#' }
#'
"movies"
```
### Using `movies`
After documenting the movies data in `data.R`, we'll remove the call to `load()` in the `mod_scatter_display_server()` function and replace it with a direct call to the dataset:
```{r}
#| eval: false
#| code-fold: false
mod_scatter_display_server <- function(id, var_inputs) {
  shiny::moduleServer(id, function(input, output, session) {
    inputs <- reactive({
      plot_title <- tools::toTitleCase(var_inputs()$plot_title)
      list(
        x = var_inputs()$x,
        y = var_inputs()$y,
        z = var_inputs()$z,
        alpha = var_inputs()$alpha,
        size = var_inputs()$size,
        plot_title = plot_title
      )
    })
    output$scatterplot <- renderPlot({
      plot <- scatter_plot(
        df = movies, # <1>
        x_var = inputs()$x,
        y_var = inputs()$y,
        col_var = inputs()$z,
        alpha_var = inputs()$alpha,
        size_var = inputs()$size
      )
      plot +
        ggplot2::labs(
          title = inputs()$plot_title,
          x = stringr::str_replace_all(tools::toTitleCase(inputs()$x), "_", " "),
          y = stringr::str_replace_all(tools::toTitleCase(inputs()$y), "_", " ")
        ) +
        ggplot2::theme_minimal() +
        ggplot2::theme(legend.position = "bottom")
    })
  })
}
```
1. The `movies` data from our package namespace
After loading, documenting, and installing the package, we see the following application:
:::: {.column-body-outset-right}
![`movies_app()` with `movies` data file](images/data_movies_app.png){#fig-data_movies_app width='100%' fig-align='center'}
::::
### More examples
To illustrate other options for data documentation, we'll use the [`dplyr` package](https://github.com/tidyverse/dplyr). `dplyr` stores its data in the `data/` folder:
```{verbatim}
#| eval: false
#| code-fold: false
data/
├── band_instruments.rda
├── band_instruments2.rda
├── band_members.rda
├── starwars.rda
└── storms.rda
```
The documentation for the datasets in `dplyr` is stored in `R/` using a `data-` prefix:
```{verbatim}
#| eval: false
#| code-fold: false
R/
├── data-bands.R
├── data-starwars.R
└── data-storms.R
```
The three `band_` datasets are documented in a single file, [`data-bands.R`](https://github.com/tidyverse/dplyr/blob/main/R/data-bands.R):
```{r}
#| code-summary: 'show/hide documentation for dplyr::band_ datasets'
#| code-fold: true
#| eval: false
# from the dplyr github repo:
# https://github.com/tidyverse/dplyr/blob/main/R/data-bands.R
#
#' Band membership
#'
#' These data sets describe band members of the Beatles and Rolling Stones. They
#' are toy data sets that can be displayed in their entirety on a slide (e.g. to
#' demonstrate a join).
#'
#' `band_instruments` and `band_instruments2` contain the same data but use
#' different column names for the first column of the data set.
#' `band_instruments` uses `name`, which matches the name of the key column of
#' `band_members`; `band_instruments2` uses `artist`, which does not.
#'
#' @format Each is a tibble with two variables and three observations
#' @examples
#' band_members
#' band_instruments
#' band_instruments2
"band_members"
#' @rdname band_members
#' @format NULL
"band_instruments"
#' @rdname band_members
#' @format NULL
"band_instruments2"
```
In the example above, note that two of the datasets (`band_instruments` and `band_instruments2`) have `@format` set to `NULL` and use `@rdname band_members`, which documents them in the same help file as `band_members`. The `@examples` tag lets users view the datasets when they click '**Run Examples**'.
Either method works--what's important is that each dataset in your package *has* documentation.
```{r}
#| label: co_box_data_data
#| echo: false
#| results: asis
#| eval: true
co_box(
color = "g", fold = TRUE, look = "default",
hsize = "1.15", size = "1.10",
header = "Documenting data in `data/`",
contents = "
Documenting data requires the following `roxygen2` structure:
\`\`\`r
#'
#' @title single-sentence describing [data]
#'
#' @description
#' Single-paragraph describing [data]
#'
#' @format [data] number of rows and columns:
#' \\describe{
#' \\item{variable}{description}
#' \\item{variable}{description}
#' }
#'
\"[data]\"
\`\`\`
Replace `[data]` with the name of your dataset.")
```
## [`data-raw/`]{style="font-size: 1.05em; font-weight: bold;"} {#sec-data-data-raw}
The `data-raw` folder is not an official directory in the standard R package structure, but it's a common location for any data processing or cleaning scripts, and the raw data file for datasets stored in `data/`.[^data-raw-2]
[^data-raw-2]: Read more about the `data-raw` folder in [R Packages, 2ed](https://r-pkgs.org/data.html#sec-data-data-raw)
```{r}
#| label: co_box_data_raw
#| echo: false
#| results: asis
#| eval: true
co_box(
color = "o", fold = TRUE, look = "default",
hsize = "1.15", size = "1.10",
header = "Scripts for creating `movies` data",
contents = "
The code used to produce the `movies` dataset in the `data/` directory might* come from [this GitHub repo](https://github.com/mine-cetinkaya-rundel/rotten). If so, the `data-raw` folder is where [the data processing and preparation scripts](https://github.com/mine-cetinkaya-rundel/rotten/tree/master/working) would be stored (along with a copy of the data in `.csv` format) before saving a copy in the `data/` folder.
*I say 'might' because it's not clear if the `movies.RData` is the output from these `.R` files (although many of the column names match).
"
)
```
### More examples
If we look at the data in the [`dplyr` package](https://github.com/tidyverse/dplyr) again, we can see the [`data-raw/` folder](https://github.com/tidyverse/dplyr/tree/main/data-raw) contains a combination of `.R` and `.csv` files:
```{verbatim}
#| eval: false
#| code-fold: true
data-raw/
├── band_members.R
├── starwars.R
├── starwars.csv
└── storms.R
1 directory, 4 files
```
In this example, the [`starwars.R` script](https://github.com/tidyverse/dplyr/blob/main/data-raw/starwars.R) downloads & prepares `starwars`, then saves a `.csv` copy of the data [in `data-raw`](https://github.com/tidyverse/dplyr/blob/main/data-raw/starwars.csv).
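For `moviesApp`, a script in `data-raw/` would follow the same pattern: read the raw file, clean it, and export the result. The sketch below is hypothetical throughout (the raw file, its columns, and `prep_movies()` are invented for illustration), and it writes a tiny `.csv` to a temporary folder so it runs outside a package. In a real `data-raw/` script, the final step would be the commented-out `usethis::use_data()` call:

```r
# hypothetical cleaning step: read the raw .csv and fix column types
prep_movies <- function(raw_path) {
  raw <- read.csv(raw_path, stringsAsFactors = FALSE)
  raw$title_type <- factor(raw$title_type)
  raw$runtime <- as.numeric(raw$runtime)
  raw
}

# a made-up raw file stands in for data-raw/movies.csv
raw_path <- file.path(tempdir(), "movies_raw.csv")
writeLines(
  c(
    "title,title_type,runtime",
    "Movie A,Feature Film,80",
    "Movie B,Documentary,78"
  ),
  raw_path
)

movies_clean <- prep_movies(raw_path)
str(movies_clean)

# in a package, export the prepared object to data/:
# movies <- movies_clean
# usethis::use_data(movies, overwrite = TRUE)
```

Keeping the cleaning logic in a function is a choice made for this example; a plain top-to-bottom script works just as well in `data-raw/`.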
## [`inst/extdata/`]{style="font-size: 1.05em; font-weight: bold;"} {#sec-data-inst-extdata}
The `extdata` folder (inside `inst/`) can be used for external datasets in other file formats (`.csv`, `.tsv`, `.txt`, `.xlsx`, etc).[^inst-extdata-3] The data files in `inst/extdata/` aren't directly loadable using the `package::data` syntax or the `data()` function like with the `data/` directory. These files can be imported using the file path accessor function, `system.file()`.
[^inst-extdata-3]: Read more about the `inst/extdata/` folder in [R Packages, 2ed](https://r-pkgs.org/data.html#sec-data-extdata)
For example, we can create the `inst/extdata/` folder and save a copy of `movies` as a [`.fst` file](https://www.fstpackage.org/fst/):
```{r}
#| eval: false
#| code-fold: false
library(fs)
library(tibble)
library(fst)
```
``` sh
fst package v0.9.8
```
```{r}
#| eval: false
#| code-fold: false
fs::dir_create("inst/extdata/")
fst::write_fst(
  x = movies,
  path = "inst/extdata/movies.fst",
  compress = 75)
```
``` sh
fstcore package v0.9.14
(OpenMP was not detected, using single threaded mode)
```
Then load, document, and install `moviesApp`:
[<kbd>Ctrl/Cmd</kbd> + <kbd>Shift</kbd> + <kbd>L</kbd> / <kbd>D</kbd> / <kbd>B</kbd>]{style="font-style: italic; font-weight: bold; font-size: 1.10em"}
We can import `movies.fst` using `system.file()` to create a path to the file:
```{r}
#| eval: false
#| code-fold: false
tibble::as_tibble(
  fst::read_fst(
    # system.file() builds the path to the installed copy of the file
    path = system.file("extdata", "movies.fst", package = "moviesApp")
  )
)
```
```{r}
## A tibble: 651 × 34
# title title_type genre runtime mpaa_rating studio thtr_rel_date
# <chr> <fct> <fct> <dbl> <fct> <fct> <dttm>
# 1 Filly… Feature F… Drama 80 R Indom… 2013-04-18 21:00:00
# 2 The D… Feature F… Drama 101 PG-13 Warne… 2001-03-13 21:00:00
# 3 Waiti… Feature F… Come… 84 R Sony … 1996-08-20 21:00:00
# 4 The A… Feature F… Drama 139 PG Colum… 1993-09-30 21:00:00
# 5 Malev… Feature F… Horr… 90 R Ancho… 2004-09-09 21:00:00
# 6 Old P… Documenta… Docu… 78 Unrated Shcal… 2009-01-14 21:00:00
# 7 Lady … Feature F… Drama 142 PG-13 Param… 1985-12-31 21:00:00
# 8 Mad D… Feature F… Drama 93 R MGM/U… 1996-11-07 21:00:00
# 9 Beaut… Documenta… Docu… 88 Unrated Indep… 2012-09-06 21:00:00
# 10 The S… Feature F… Drama 119 Unrated IFC F… 2012-03-01 21:00:00
## ℹ 641 more rows
## ℹ 27 more variables: thtr_rel_year <dbl>, thtr_rel_month <dbl>,
## thtr_rel_day <dbl>, dvd_rel_date <dttm>, dvd_rel_year <dbl>,
## dvd_rel_month <dbl>, dvd_rel_day <dbl>, imdb_rating <dbl>,
## imdb_num_votes <int>, critics_rating <fct>, critics_score <dbl>,
## audience_rating <fct>, audience_score <dbl>, best_pic_nom <fct>,
## best_pic_win <fct>, best_actor_win <fct>, best_actress_win <fct>, …
## ℹ Use `print(n = ...)` to see more rows
```
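One detail worth knowing: `system.file()` returns an empty string (`""`) when it can't find the file, which can produce confusing errors further downstream. A minimal sketch of checking the path first is below. It uses files from the base `stats` package as stand-ins (the `DESCRIPTION` file every installed package has, and a file name that is assumed not to exist):

```r
# a file that exists in every installed package
found <- system.file("DESCRIPTION", package = "stats")

# a made-up file name: system.file() returns "" instead of an error
missing_path <- system.file("extdata", "no-such-file.csv", package = "stats")

nzchar(found)        # is the path non-empty?
nzchar(missing_path) # empty string when nothing was found

# mustWork = TRUE turns a missing file into an error instead
res <- tryCatch(
  system.file("extdata", "no-such-file.csv", package = "stats",
              mustWork = TRUE),
  error = function(e) "not found"
)
```

Using `mustWork = TRUE` inside an app-package makes a missing data file fail at the point where the path is built rather than later, when the file is read.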
We'll cover `inst/` and `system.file()` in more detail in the next chapter.
```{r}
#| label: git_box_pkgApp_06_data
#| echo: false
#| results: asis
#| eval: true
git_margin_box(
contents = "standard",
fig_pw = '75%',
branch = "07_data",
repo = 'moviesApp')
```
## Recap {.unnumbered}
It's common for Shiny apps to require data, so knowing how to store and access these files in your app-package will make your app easier to load and more reproducible in other environments. Here are a few other things to consider when including data in your app-package:
```{r}
#| label: co_box_data_recap
#| echo: false
#| results: asis
#| eval: true
co_box(
color = "g", fold = FALSE,
look = "default", hsize = "1.15", size = "1.10",
header = "Recap: Package data files",
contents = "
- `data/`: All data files stored in `data/` will be 'lazy loaded' (see below) when the package is installed and loaded.
- **Loading**: include the `LazyData: true` field in the `DESCRIPTION` file so the data is only loaded when it's used (and it increases package loading speed).
- **Size**: large data files can inflate the size of your app-package, making it harder for users to download and install. CRAN also has a size limit for packages (if you plan on submitting your app-package).
- **Format**: data files in `data/` should be in either the `.rda` or `.RData` format.
- **Documentation**: document the data files in either a single `R/data.R` file or individual `.R` files. Documentation should include the following `roxygen2` format:
\`\`\`r
#'
#' @title
#'
#' @description
#'
#' @format
#' \\describe{
#' \\item{variable}{description}
#' }
#'
'data'
\`\`\`
- `inst/extdata/`: Store external data in the `inst/extdata/` directory and access it using `system.file()`. This can be helpful if your app-package needs access to data files that are not R objects. For faster loading, consider the [`fst`](https://www.fstpackage.org/fst/) or [`feather`](https://github.com/wesm/feather) formats.
"
)
```
```{r}
#| label: git_contrib_box
#| echo: false
#| results: asis
#| eval: true
git_contrib_box()
```