data.qmd

# Data {#sec-data}

```{r}
#| eval: true 
#| echo: false 
#| include: false
source("_common.R")
library(dplyr)
```

```{r}
#| label: co_box_tldr
#| echo: false
#| results: asis
#| eval: true
co_box(
  color = "b", 
  look = "default", hsize = "1.15", size = "1.10",
  header = "TLDR Data", 
  fold = TRUE,
  contents = "
**Data files in your app-package:** 
  
- **`data/`**: data files that will be used in your application (i.e., become part of your app-package namespace and accessed via `pkg::data`) should in stored in `data/`\n
    - Add data to the data/ directory using `usethis::use_data()`\n
    - Include the `LazyData: true` field in the `DESCRIPTION`\n
  
- **`data-raw/`**: scripts used to prepare data files can be created with `usethis::use_data_raw()`\n
    - Store intermediate data files in `data-raw/`\n
    - Store output files in `data/` (or inst/extdata/`)\n
  
- **`inst/extdata/`**: 'External' data files (i.e., non-R formatted data files) can be stored in `inst/extdata` and accessed using `system.file()`. \n
    
**Workflow:** start with the script that creates/downloads/wrangles your data using `usethis::use_data_raw()`, keep any intermediate or non-R formatted files in `inst/extdata/`, then export the final object to `data/` with `usethis::use_data()` \n
  
  "
)
```

---

We've documented the functions in `moviesApp` and successfully managed the dependencies with the `NAMESPACE` and `DESCRIPTION` files. In this chapter, we're going to cover how make sure the `movie.RData` file becomes part of `moviesApp`, and other locations for data files in app-packages. For information on how to store and retrieve inside your application, see the chapter on [app Data](app_data.qmd).


:::: {.callout-tip collapse='true' appearance='default'}

## [Accessing the code examples]{style='font-weight: bold; font-size: 1.15em;'}

::: {style='font-size: 0.95em; color: #282b2d;'}

I've created the [`shinypak` R package](https://mjfrigaard.github.io/shinypak/) In an effort to make each section accessible and easy to follow:
  
Install `shinypak` using `pak` (or `remotes`):

```{r}
#| code-fold: false 
#| message: false
#| warning: false
#| eval: false
# install.packages('pak')
pak::pak('mjfrigaard/shinypak')
```

Review the chapters in each section:
  
```{r}
#| code-fold: false 
#| message: false
#| warning: false
#| collapse: true
library(shinypak)
list_apps(regex = '^07')
```

Launch an app: 

```{r}
#| code-fold: false 
#| eval: false
launch(app = "07_data")
```

::: 

::::


## App-package data 

Data in R packages are typically stored in one of three folders: `data/`, `data-raw/`, and `inst/extdata/`. The folder you use will depend on the format, accessibility, and intended purpose of the data file.[^data-pkgs-1]

[^data-pkgs-1]: Read more about the data folder in the ['Data in packages' section of Writing R Extenstions](https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Data-in-packages) and the ['Data' chapter of R Packages, 2ed](https://r-pkgs.org/data.html).

## [`data/`]{style="font-size: 1.05em; font-weight: bold;"} {#sec-data-data}

The primary location for data is the `data/` folder. Objects in `data/` folder are available in your package namespace when it's installed and loaded, and can be accessed with the `::` syntax. See the example below of the `storms` data from `dplyr`:

```{r}
#| eval: true
#| code-fold: false
#| collapse: true 
library(dplyr)
head(dplyr::storms)
```

### [`LazyData`]{style="font-size: 1.05em"}

Data files become part of a package when they're added to the `data/` folder and `LazyData: true` is added to the `DESCRIPTION` file. 

-   `LazyData: true`: this means the data are only loaded into memory if they are explicitly accessed by the user or a function in the package. Until then, only the dataset names is loaded. This user-friendly practice is the default for most R  packages.

-   `LazyData: false` (or omitted): accessing a data file from the package requires explicitly loading it using the `data()` function.

Files in `data/` should be in the `.rda` or `.RData` format. Below are the steps for adding `movies` to `moviesApp`:

1. Move the `movies.RData` file into a newly created the `data/` folder:

``` sh
moviesApp/
    └──data/
        └── movies.RData
```

2. Include `LazyData: true` in the `DESCRIPTION` file (I've added it above `Imports:`):

``` sh
Package: moviesApp
Version: 0.0.0.9000
Type: Package
Title: movies app
Description: A movies data shiny application.
Author: John Smith [aut, cre]
Maintainer: John Smith <John.Smith@email.io>
License: GPL-3
RoxygenNote: 7.2.3
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
LazyData: true
Imports:
  shiny,
  ggplot2,
  rlang,
  stringr,
  tools
```

3. Load, document, and install.

[<kbd>Ctrl/Cmd</kbd> + <kbd>Shift</kbd> + <kbd>L</kbd>]{style="font-style: italic; font-weight: bold; font-size: 1.10em"}

``` sh
ℹ Loading moviesApp
```

[<kbd>Ctrl/Cmd</kbd> + <kbd>Shift</kbd> + <kbd>D</kbd>]{style="font-style: italic; font-weight: bold; font-size: 1.10em"}

``` sh
==> devtools::document(roclets = c('rd', 'collate', 'namespace'))

ℹ Updating moviesApp documentation
ℹ Loading moviesApp
Documentation completed
```

[<kbd>Ctrl/Cmd</kbd> + <kbd>Shift</kbd> + <kbd>B</kbd>]{style="font-style: italic; font-weight: bold; font-size: 1.10em"}

In the **Build** pane, you'll notice a few new `** data` lines of output after adding data:

``` sh
** data
*** moving datasets to lazyload DB
** byte-compile and prepare package for lazy loading
```

We can check to see if movies has been included in `moviesApp` using the `package::data` syntax: 


![`movies` is now part of `moviesApp`](images/data_movies_namespace.png){#fig-data_movies_namespace width='100%' fig-align='center'}

### [`usethis::use_data()`]{style="font-size: 0.95em"}

If you'd prefer to store data using the `.rda` format, the `usethis` package has the [`use_data()` function](https://usethis.r-lib.org/reference/use_data.html) that will automatically store an object in `data/` in the `.rda` format.

To use `usethis::use_data()`, we can load the `movies` data into the global environment with `load("movies.RData")`, then run  `usethis::use_data(movies)`: 

```{r}
#| eval: false 
#| code-fold: false
usethis::use_data(movies)
```

```{verbatim}
#| eval: false 
#| code-fold: false
✔ Setting active project to '/path/to/moviesApp'
✔ Adding 'R' to Depends field in DESCRIPTION
✔ Creating 'data/'
✔ Saving 'movies' to 'data/movies.rda'
• Document your data (see 'https://r-pkgs.org/data.html')
```

The `Depends:` field is added to the `DESCRIPTION` file with an R version (this ensures the data files will be loaded)

```{verbatim}
#| eval: false 
#| code-fold: false
Depends: 
    R (>= 2.10)
```

*`use_data()` will also add `LazyData: true` to the `DESCRIPTION`*

## Documenting data {#sec-document-data}

Documenting data can be tedious, but it's worth the effort if you'll be sharing your package with collaborators. There are multiple ways to store dataset documentation. A common approach is to create a `data.R` file in the `R/` folder.[^data-ggplot2-example]

[^data-ggplot2-example]: The `ggplot2` package has a great example of documenting datasets in the [R/data.R](https://github.com/tidyverse/ggplot2/blob/main/R/data.R) file

```{r}
#| eval: false 
#| code-fold: false
fs::file_create("R/data.R")
```

In `data.R`, we provide a `@title`, `@description`, and `@details` for the data (with or without the tags), followed by `@format`:

```{r}
#| eval: false 
#| code-fold: false
#' @title IMDB movies data 
#'
#' @description
#' Movie review data. Note: these data come from the [Building Web Applications with Shiny course](https://rstudio-education.github.io/shiny-course/). 
#' 
#' @details
#' Read more about acquiring these data in the ['Web Scraping and programming' section of Data science in a box](https://datasciencebox.org/02-exploring-data#web-scraping-and-programming)   
#'
#' @format

```

### [`@format`]{style="font-size: 0.90em;"}

The text following `@format` is a one-sentence description of the data (with it's dimensions).

```{r}
#| eval: false 
#| code-fold: false
#' @title IMDB movies data 
#'
#' @description
#' Movie review data. Note: these data come from the [Building Web Applications with Shiny course](https://rstudio-education.github.io/shiny-course/). 
#' 
#' @details
#' Read more about acquiring these data in the ['Web Scraping and programming' section of Data science in a box](https://datasciencebox.org/02-exploring-data#web-scraping-and-programming) 
#' 
#' @format A data frame with [] rows and [] variables:

```


### [`\describe` & `\item`]{style="font-size: 0.90em;"}

Each variable (column) in the data is documented with a combination of `\describe` and `\item` (**pay close attention to the curly brackets**):

```{r}
#| eval: false 
#| code-fold: false
#' \describe{
#'  \item{variable}{description}
#' }
```

After closing the curly brackets in `\describe`, place the name of the data in quotes (`"movies"`) on the following line. 

Below is the documentation for the first five columns in the `movies` dataset:

```{r}
#| eval: false 
#| code-fold: false
#' @title IMDB movies data 
#'
#' @description
#' Movie review data. Note: these data come from the [Building Web Applications with shiny course](https://rstudio-education.github.io/shiny-course/). 
#' 
#' @details
#' Read more about acquiring these data in the ['Web Scraping and programming' section of Data science in a box](https://datasciencebox.org/02-exploring-data#web-scraping-and-programming) 
#'
#' @format A data frame with 651 rows and 34 variables:
#' \describe{
#'  \item{title}{movie title}
#'  \item{title_type}{type, fct (Documentary, Feature Film, TV Movie)}
#'  \item{genre}{movie genre, fct (Action & Adventure, Animation, etc.}
#'  \item{runtime}{movie length in minutes, num, avg = 106, sd = 19.4}
#'  \item{mpaa_rating}{movie rating, fct (G, NC-17, PG, PG-13, R, Unrated)}
#' }
#'
"movies"
```

If we load and document `moviesApp`, we can see a preview of the help file:

[<kbd>Ctrl/Cmd</kbd> + <kbd>Shift</kbd> + <kbd>L</kbd>]{style="font-style: italic; font-weight: bold; font-size: 1.10em"}

```{verbatim}
#| eval: false
#| code-fold: false
ℹ Loading moviesApp
```

[<kbd>Ctrl/Cmd</kbd> + <kbd>Shift</kbd> + <kbd>D</kbd>]{style="font-style: italic; font-weight: bold; font-size: 1.10em"}

```{verbatim}
#| eval: false 
#| code-fold: false
==> devtools::document(roclets = c('rd', 'collate', 'namespace'))

ℹ Updating moviesApp documentation
ℹ Loading moviesApp
Writing movies.Rd
Documentation completed
```


```{r}
#| eval: false
#| code-fold: false
?movies
```


::: {#fig-data_data_help}
![The `movies` help file](images/data_data_help.png){#fig-09_data_rd width='100%' fig-align='center'}

Documentation for the `movies` dataset
:::

I've provided documentation for the full `movies` dataset below. 

```{r}
#| eval: false 
#| code-fold: true 
#| code-summary: 'show/hide full movies data documenation'
#' @title IMDB movies data 
#'
#' @description
#' Movie review data. Note: these data come from the [Building Web Applications with Shiny course](https://rstudio-education.github.io/shiny-course/). 
#' 
#' @details
#' Read more about acquiring these data in the ['Web Scraping and programming' section of Data science in a box](https://datasciencebox.org/02-exploring-data#web-scraping-and-programming) 
#'
#' @format A data frame with 651 rows and 34 variables:
#' \describe{
#'  \item{title}{movie title}
#'  \item{title_type}{type, fct (Documentary, Feature Film, TV Movie)}
#'  \item{genre}{movie genre, fct (Action & Adventure, Animation, etc.}
#'  \item{runtime}{movie length in minutes, num, avg = 106, sd = 19.4}
#'  \item{mpaa_rating}{movie rating, fct (G, NC-17, PG, PG-13, R, Unrated)}
#'  \item{studio}{name of studio, chr}
#'  \item{thtr_rel_date}{Theatre release date, POSIXct, min = 1970-05-19 21:00:00, max = 2014-12-24 21:00:00}
#'  \item{thtr_rel_year}{Theatre release year, num, min = 1970, max = 2014}
#'  \item{thtr_rel_month}{Theatre release month, num, min = 1, max =12}
#'  \item{thtr_rel_day}{Theatre release day, num, min = 1, max =31}
#'  \item{dvd_rel_date}{DVD release date, POSIXct, min = 1991-03-27 21:00:00, max = 2015-03-02 21:00:00}
#'  \item{dvd_rel_year}{DVD release year, num, min = 1991, max = 2015}
#'  \item{dvd_rel_month}{DVD release month, num, min = 1, max = 12}
#'  \item{dvd_rel_day}{DVD release day, num, min = 1, max = 31}
#'  \item{imdb_rating}{Internet movie database rating, avg = 6.49, sd = 1.08}
#'  \item{imdb_num_votes}{Internet movie database votes, avg = 57533, sd = 112124}
#'  \item{critics_rating}{Rotten tomatoes rating, fct (Certified Fresh, Fresh, Rotten)}
#'  \item{critics_score}{Rotten tomatoes score, avg = 57.7, sd = 28.4}
#'  \item{audience_rating}{Audience rating, fct (Spilled, Upright)}
#'  \item{audience_score}{Audience score, avg = 62.4, sd = 20.2}
#'  \item{best_pic_nom}{Best picture nomination, fct (no, yes)}
#'  \item{best_pic_win}{Best picture win, fct (no, yes)}
#'  \item{best_actor_win}{Best actor win, fct (no, yes)}
#'  \item{best_actress_win}{Best actress win, fct (no, yes)}
#'  \item{best_dir_win}{Best director win, fct (no, yes)}
#'  \item{top200_box}{Top 20 box-office, fct (no, yes)}
#'  \item{director}{Name of director, chr}
#'  \item{actor1}{Name of leading actor, chr}
#'  \item{actor2}{Name of supporting actor, chr}
#'  \item{actor3}{Name of #3 actor, chr}
#'  \item{actor4}{Name of #4 actor, chr}
#'  \item{actor5}{Name of #5 actor, chr}
#'  \item{imdb_url}{IMDB URL}
#'  \item{rt_url}{Rotten tomatoes URL}
#' }
#'
"movies"
```

### Using `movies`

After documenting the movies data in `data.R`, we'll remove the call to `load()` in the `mod_scatter_display_server()` function and replace it with a direct call to the dataset:

```{r}
#| eval: false 
#| code-fold: false 
mod_scatter_display_server <- function(id, var_inputs) {
  shiny::moduleServer(id, function(input, output, session) {

  inputs <- reactive({
    plot_title <- tools::toTitleCase(var_inputs()$plot_title)
      list(
        x = var_inputs()$x,
        y = var_inputs()$y,
        z = var_inputs()$z,
        alpha = var_inputs()$alpha,
        size = var_inputs()$size,
        plot_title = plot_title
      )
  })
  output$scatterplot <- renderPlot({
    plot <- scatter_plot(
      df = movies, # <1>
      x_var = inputs()$x,
      y_var = inputs()$y,
      col_var = inputs()$z,
      alpha_var = inputs()$alpha,
      size_var = inputs()$size
    )
    plot +
      ggplot2::labs(
        title = inputs()$plot_title,
        x = stringr::str_replace_all(tools::toTitleCase(inputs()$x), "_", " "),
        y = stringr::str_replace_all(tools::toTitleCase(inputs()$y), "_", " ")
      ) +
      ggplot2::theme_minimal() +
      ggplot2::theme(legend.position = "bottom")
  })
})
}
```
1. The `movies` data from our package namespace

After loading, documenting, and installing the package, we see the following application: 

:::: {.column-body-outset-right}

![`movies_app()` with `movies` data file](images/data_movies_app.png){#fig-data_movies_app width='100%' fig-align='center'}

::::

### More examples

To illustrate other options for data documentation, we'll use the [`dplyr` package.](https://github.com/tidyverse/dplyr) `dplyr` stores its data in the `data/` folder:  

```{verbatim}
#| eval: false 
#| code-fold: false 
data/
├── band_instruments.rda
├── band_instruments2.rda
├── band_members.rda
├── starwars.rda
└── storms.rda

```

The documentation for the datasets in `dplyr` are stored in `R/` using a `data-` prefix:

```{verbatim}
#| eval: false 
#| code-fold: false 
R/
├── data-bands.R
├── data-starwars.R
└── data-storms.R

```

The three `band_` datasets have documented in a single file, [`data-bands.R`](https://github.com/tidyverse/dplyr/blob/main/R/data-bands.R):  

```{r}
#| code-summary: 'show/hide documentation for dplyr::band_ datasets'
#| code-fold: true
#| eval: false
# from the dplyr github repo: 
# https://github.com/tidyverse/dplyr/blob/main/R/data-bands.R
# 
#' Band membership
#'
#' These data sets describe band members of the Beatles and Rolling Stones. They
#' are toy data sets that can be displayed in their entirety on a slide (e.g. to
#' demonstrate a join).
#'
#' `band_instruments` and `band_instruments2` contain the same data but use
#' different column names for the first column of the data set.
#' `band_instruments` uses `name`, which matches the name of the key column of
#' `band_members`; `band_instruments2` uses `artist`, which does not.
#'
#' @format Each is a tibble with two variables and three observations
#' @examples
#' band_members
#' band_instruments
#' band_instruments2
"band_members"

#' @rdname band_members
#' @format NULL
"band_instruments"

#' @rdname band_members
#' @format NULL
"band_instruments2"
```

In the example above, note that two of the datasets (`band_instruments` and `band_instruments2`) have the `@format` set to `NULL`, and define the help search name with `@rdname`. The `@examples` tag can be used to view the dataset when users click '**Run Examples**' 

Either method works--what's important is that each dataset in your package *has* documentation.

```{r}
#| label: co_box_data_data
#| echo: false
#| results: asis
#| eval: true
co_box(
  color = "g", fold = TRUE, look = "default",
  hsize = "1.15", size = "1.10",
  header = "Documenting data in `data/`",
  contents = "

Documenting data requires the following `roxygen2` structure:

\`\`\`r
#' 
#' @title single-sentence describing [data]
#' 
#' @description
#' Single-paragraph describing [data]
#' 
#' @format [data] number of rows and columns:
#' \\describe{
#'  \\item{variable}{description}
#'  \\item{variable}{description}
#' }
#'
\"[data]\"
\`\`\`


Replace `[data]` with the name of your dataset.")
```


## [`data-raw/`]{style="font-size: 1.05em; font-weight: bold;"} {#sec-data-data-raw}

The `data-raw` folder is not an official directory in the standard R package structure, but it's a common location for any data processing or cleaning scripts, and the raw data file for datasets stored in `data/`.[^data-raw-2]

[^data-raw-2]: Read more about the `data-raw` folder in [R Packages, 2ed](https://r-pkgs.org/data.html#sec-data-data-raw)

```{r}
#| label: co_box_data_raw
#| echo: false
#| results: asis
#| eval: true
co_box(
  color = "o", fold = TRUE, look = "default",
  hsize = "1.15", size = "1.10",
  header = "Scripts for creating `movies` data",
  contents = "
The code used to produce the `movies` dataset in the `data/` directory might* come from [this GitHub repo](https://github.com/mine-cetinkaya-rundel/rotten). If so, the `data-raw` folder is where [the data processing and preparation scritps](https://github.com/mine-cetinkaya-rundel/rotten/tree/master/working) would be stored (along with a copy of the data in `.csv` format) before saving a copy in the `data/` folder. 
  
*I say 'might' because it's not clear if the `movies.RData` is the output from these `.R` files (although many of the column names match).

  "
)
```


### More examples

If we look at the data in the [`dplyr` package](https://github.com/tidyverse/dplyr) again, we can see the [`data-raw/` folder](https://github.com/tidyverse/dplyr/tree/main/data-raw) contains a combination of `.R` and `.csv` files: 

```{verbatim}
#| eval: false 
#| code-fold: true 
data-raw/
├── band_members.R
├── starwars.R
├── starwars.csv
└── storms.R

1 directory, 4 files
```

In this example, the [`starwars.R` script](https://github.com/tidyverse/dplyr/blob/main/data-raw/starwars.R) downloads & prepares `starwars`, then saves a `.csv` copy of the data [in `data-raw`]((https://github.com/tidyverse/dplyr/blob/main/data-raw/starwars.csv)).

## [`inst/extdata/`]{style="font-size: 1.05em; font-weight: bold;"} {#sec-data-inst-extdata}

The `extdata` folder (inside `inst/`) can be used for external datasets in other file formats (`.csv`, `.tsv`, `.txt`, `.xlsx`, etc).[^inst-extdata-3] The data files in `inst/extdata/` aren't directly loadable using the `package::data` syntax or the `data()` function like with the `data/` directory. These files can be imported using the file path accessor function,  `system.file()`.

[^inst-extdata-3]: Read more about the `inst/extdata/` folder in [R Packages, 2ed](https://r-pkgs.org/data.html#sec-data-extdata)

For example, if we create the `inst/extdata/` and save a copy of `movies` as a [`.fst` file](https://www.fstpackage.org/fst/): 

```{r}
#| eval: false 
#| code-fold: false 
library(fs)
library(tibble)
library(fst)
```

``` sh
fst package v0.9.8
```

```{r}
#| eval: false 
#| code-fold: false
fs::dir_create("inst/extdata/")
fst::write_fst(
  x = movies, 
  path = "inst/extdata/movies.fst", 
  compress = 75)
```

``` sh
fstcore package v0.9.14
(OpenMP was not detected, using single threaded mode)
```

Then load, document, and install `moviesApp`:

[<kbd>Ctrl/Cmd</kbd> + <kbd>Shift</kbd> + <kbd>L</kbd> / <kbd>D</kbd> / <kbd>B</kbd>]{style="font-style: italic; font-weight: bold; font-size: 1.10em"}

We can import `movies.fst` using `system.file()` to create a path to the file:

```{r}
#| eval: false 
#| code-fold: false 
tibble::as_tibble(
  fst::read_fst(path = 
      system.file("extdata/", "movies.fst", package = "moviesApp")
    )
  )
```

```{r}
## A tibble: 651 × 34
#    title  title_type genre runtime mpaa_rating studio thtr_rel_date      
#    <chr>  <fct>      <fct>   <dbl> <fct>       <fct>  <dttm>             
#  1 Filly… Feature F… Drama      80 R           Indom… 2013-04-18 21:00:00
#  2 The D… Feature F… Drama     101 PG-13       Warne… 2001-03-13 21:00:00
#  3 Waiti… Feature F… Come…      84 R           Sony … 1996-08-20 21:00:00
#  4 The A… Feature F… Drama     139 PG          Colum… 1993-09-30 21:00:00
#  5 Malev… Feature F… Horr…      90 R           Ancho… 2004-09-09 21:00:00
#  6 Old P… Documenta… Docu…      78 Unrated     Shcal… 2009-01-14 21:00:00
#  7 Lady … Feature F… Drama     142 PG-13       Param… 1985-12-31 21:00:00
#  8 Mad D… Feature F… Drama      93 R           MGM/U… 1996-11-07 21:00:00
#  9 Beaut… Documenta… Docu…      88 Unrated     Indep… 2012-09-06 21:00:00
# 10 The S… Feature F… Drama     119 Unrated     IFC F… 2012-03-01 21:00:00
## ℹ 641 more rows
## ℹ 27 more variables: thtr_rel_year <dbl>, thtr_rel_month <dbl>,
##   thtr_rel_day <dbl>, dvd_rel_date <dttm>, dvd_rel_year <dbl>,
##   dvd_rel_month <dbl>, dvd_rel_day <dbl>, imdb_rating <dbl>,
##   imdb_num_votes <int>, critics_rating <fct>, critics_score <dbl>,
##   audience_rating <fct>, audience_score <dbl>, best_pic_nom <fct>,
##   best_pic_win <fct>, best_actor_win <fct>, best_actress_win <fct>, …
## ℹ Use `print(n = ...)` to see more rows
```

We'll cover `inst/` and `system.file()` in more detail in the next chapter.

```{r}
#| label: git_box_pkgApp_06_data
#| echo: false
#| results: asis
#| eval: true
git_margin_box(
  contents = "standard",
  fig_pw = '75%', 
  branch = "07_data", 
  repo = 'moviesApp')
```

## Recap {.unnumbered}

It's common for Shiny apps to require data, so knowing how to store and access these files in your app-package will make it easier to load and reproducible in other environments. Here are a few other things to consider when including data in your app-package: 

```{r}
#| label: co_box_data_recap
#| echo: false
#| results: asis
#| eval: true
co_box(
  color = "g", fold = FALSE,
  look = "default", hsize = "1.15", size = "1.10",
  header = "Recap: Package data files",
  contents = "
- `data/`: All data files stored in `data/` will be 'lazy loaded' (see below) when the package is installed and loaded.  
  
- **Loading**: include the `LazyData: true` field in the `DESCRIPTION` file so the data is only loaded when it's used (and it increases package loading speed).
  
- **Size**: large data files can inflate the size of your app-package, making it harder for users to download and install. CRAN also has a size limit for packages (if you plan on submitting your app-package).
  
- **Format**: data files in `data/` must be either `.rda` or `.RData` format. 
  
- **Documentation**: document the data files in either a single `R/data.R` file or individual `.R` files. Documentation should include the following `roxygen2` format:
  
    \`\`\`r
    #' 
    #' @title 
    #' 
    #' @description
    #' 
    #' @format 
    #' \\describe{
    #'  \\item{variable}{description}
    #' }
    #'
    'data'
    \`\`\`
  
- `inst/extdata/`: Store external data in the `inst/extdata/` directory and access it using `system.file()`. This can be helpful if your app-package needs access to data files that are not R objects. For faster loading, consider the [`fst`](https://www.fstpackage.org/fst/) or [`feather`](https://github.com/wesm/feather) formats.
  
  "
)
```


```{r}
#| label: git_contrib_box
#| echo: false
#| results: asis
#| eval: true
git_contrib_box()
```