pharmaverse · kaz462 · Aug 9, 2023 · Jun 27, 2023
diff --git a/_pkgdown.yml b/_pkgdown.yml
@@ -114,3 +114,5 @@ navbar:
       href: articles/pr_review_guidance.html
     - text: Release Strategy
       href: articles/release_strategy.html
+    - text: Test Data Guidance
+      href: articles/test_data_guidance.html
diff --git a/inst/WORDLIST b/inst/WORDLIST
@@ -41,6 +41,7 @@ adex
 adlb
 admiralci
 advs
+anonymized
 codebase
 cyclomatic
 datatable
@@ -56,10 +57,9 @@ flexibilities
 functions’
 funder
 github
-hotfixes
 hotfix
+hotfixes
 insightsengineering
-lifecycle
 linter
 lintr
 lockfile

diff --git a/vignettes/test_data_guidance.Rmd b/vignettes/test_data_guidance.Rmd
@@ -0,0 +1,58 @@
+---
+title: "Test Data Guidance"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Test Data Guidance}
+  %\VignetteEngine{knitr::rmarkdown}
+---
+
+```{r setup, include = FALSE}
+knitr::opts_chunk$set(
+  collapse = TRUE,
+  comment = "#>"
+)
+```
+
+# Introduction
+
+[`admiraldata`](https://github.com/pharmaverse/admiraldata) provides a one-stop shop for test data in the [`admiral`](https://pharmaverse.github.io/admiral/cran-release/) family of packages. This includes datasets that are therapeutic area (TA)-agnostic (`DM`, `VS`, `EG`, etc.) as well as TA-specific ones (`RS`, `TR`, `OE`, etc.).
+
+# Data Sources
+
+Some of the test datasets have been sourced from the [CDISC pilot project](https://github.com/cdisc-org/sdtm-adam-pilot-project), while other datasets have been constructed ad hoc by the admiral team. Please check the [GitHub repository](https://github.com/pharmaverse/admiral.test/tree/main/data) for detailed information regarding the source of specific datasets.
+
+# Naming Conventions {#Name}
+
+-   Datasets/programs that are TA-agnostic are prefixed with `admiral_` (e.g. `admiral_dm`).
+-   Datasets/programs that are TA-specific are prefixed with the `admiral` extension package from which they derive (e.g. `admiralonco_rs`, `admiralophtha_oe`).
+-   The name of each dataset should match the name of the program that generates it.
+
+**Note**: *If a domain is used by multiple TAs, [`admiraldata`](https://github.com/pharmaverse/admiraldata) may provide multiple versions of the corresponding test dataset. For instance, the package contains `admiral_ex` and `admiralophtha_ex` as the latter contains ophthalmology-specific variables such as `EXLAT` and `EXLOC`, and `EXROUTE` is exchanged for a plausible ophthalmology value.*
+
+# How To Update
+
+First, open a GitHub issue in this repo describing the planned updates and tag `@pharmaverse/admiral` so that a member of the core development team can sanity-check the request. There are then two main ways to extend the test data: adding new datasets, or extending existing datasets with new records/variables. Whichever method you choose, it is worth noting the following:
+
+-   Programs that generate test data are stored in the `dev/` folder.
+-   Each of these programs is written as a standalone R script: if any packages need to be loaded for a given program, then call `library()` at the start of the program (but please do **not** call `library(admiraldata)`).
+-   Most of the packages that you are likely to need will already be specified in the `renv.lock` file, so they will already be installed if you have been keeping in sync--you can check this by entering `renv::status()` in the Console. However, you may also wish to install [`metatools`](https://pharmaverse.github.io/metatools/) and [`ggplot2`](https://ggplot2.tidyverse.org/), which are currently not specified in the `renv.lock` file. If you feel that you need to install any other packages in addition to those just mentioned, then please tag `@pharmaverse/admiral` to discuss with the development core team.
+-   When you have created a program in the `dev/` folder, you need to run it as a standalone R script, in order to generate a test dataset that will become part of the [`admiraldata`](https://github.com/pharmaverse/admiraldata) package, but you do not need to build the package.
+-   Following [best practice](https://r-pkgs.org/data.html#sec-data-data), each dataset is stored as a `.rda` file whose name is consistent with the name of the dataset: for example, the dataset `dm` should be renamed to `raw_dm` before saving it as `raw_dm.rda`; if you save `dm` as `raw_dm.rda` and subsequently load the `.rda` file, then `dm` (not `raw_dm`) will be loaded into the global environment.
+-   The programs in `dev/` are stored within the [`admiraldata`](https://github.com/pharmaverse/admiraldata) GitHub repository, but they are **not** part of the [`admiraldata`](https://github.com/pharmaverse/admiraldata) package--the `dev/` folder is specified in `.Rbuildignore`.
+-   When you run a program that is in the `dev/` folder, you generate a dataset that is written to the `data/` folder, which will become part of the [`admiraldata`](https://github.com/pharmaverse/admiraldata) package.
+-   The names of test datasets are specified in `R/data.R`, for the purpose of generating documentation in the `man/` folder.
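+
+The `.rda` naming point above can be checked with a minimal sketch. It assumes a hypothetical two-row `dm` data frame and writes only to a temporary directory, not to the package's `data/` folder. `load()` returns the names of the objects it restores, which shows that the object name stored in the file, not the file name, decides what is loaded.
+
+```r
+# Hypothetical toy data standing in for a real DM domain.
+dm <- data.frame(
+  USUBJID = c("01-001", "01-002"),
+  AGE = c(34, 57),
+  stringsAsFactors = FALSE
+)
+
+# Mismatch: the object is still called `dm`, only the file is called raw_dm.
+mismatched <- file.path(tempdir(), "raw_dm_mismatched.rda")
+save(dm, file = mismatched)
+restored_mismatched <- load(mismatched, envir = new.env())
+
+# Match: rename the object first, then save it under the same name.
+raw_dm <- dm
+matched <- file.path(tempdir(), "raw_dm.rda")
+save(raw_dm, file = matched)
+restored_matched <- load(matched, envir = new.env())
+
+# Names of the objects each file restores.
+print(restored_mismatched)
+print(restored_matched)
+```
+
+Loading into `new.env()` keeps the check from overwriting objects in the global environment.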
+
+## Adding New Datasets
+
+-   Create a program in the `dev/` folder named `<name>.R`, where `<name>` follows the [naming conventions](#Name) and matches the dataset name. The program should generate the test data and write it (e.g., as `<name>.rda`) to the `data/` folder. Use CDISC pilot data such as `admiral_dm` as input in this program in order to create realistic synthetic data that remains consistent with other domains. Note that **no personal data should be used** as part of this package, even if anonymized.
+-   Run the program.
+-   Reflect this update by specifying `<name>` in `R/data.R`.
+-   Run `devtools::document()` in order to update `NAMESPACE` and update the `.Rd` files in `man/`.
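+
+As a rough illustration of the first two steps, a program for a new dataset might look like the sketch below. Everything named here is an assumption made for the example: `xx` is a placeholder domain, `make_admiral_xx()` and `save_admiral_xx()` are hypothetical helpers, and `toy_dm` stands in for `admiral_dm`. The sketch writes to a temporary directory, whereas a real program would write to `data/`.
+
+```r
+# Hypothetical dev/admiral_xx.R; "xx" is a placeholder, not a real domain.
+
+# Build one synthetic record per subject from the DM input so that
+# subject identifiers stay consistent with the other domains.
+make_admiral_xx <- function(dm) {
+  data.frame(
+    STUDYID = dm$STUDYID,
+    DOMAIN = "XX",
+    USUBJID = dm$USUBJID,
+    XXSEQ = 1L,
+    stringsAsFactors = FALSE
+  )
+}
+
+# Save the dataset under an object name that matches the file name.
+save_admiral_xx <- function(admiral_xx, data_dir = "data") {
+  path <- file.path(data_dir, "admiral_xx.rda")
+  save(admiral_xx, file = path)
+  path
+}
+
+# Toy stand-in for `admiral_dm` (assumption: the real input has these columns).
+toy_dm <- data.frame(
+  STUDYID = "PILOT01",
+  USUBJID = c("01-001", "01-002"),
+  stringsAsFactors = FALSE
+)
+
+admiral_xx <- make_admiral_xx(toy_dm)
+out_path <- save_admiral_xx(admiral_xx, data_dir = tempdir())
+```
+
+Deriving the new records from the DM input, rather than inventing subject identifiers, is what keeps the synthetic data consistent across domains.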
+
+## Updating Existing Datasets
+
+-   Rename the source dataset as `raw_<name>`, where `<name>` is the domain name (e.g., rename `ds` to `raw_ds`), and then save it to the `data/` folder as `raw_<name>.rda` (e.g., `save(raw_ds, file = "data/raw_ds.rda")`).
+-   Create a program in the `dev/` folder, named `update_<name>.R`, to load `raw_<name>.rda`, make the updates, and output `admiral_<name>.rda` to the `data/` folder.
+-   Run the program.
+-   Reflect this update by specifying both `raw_<name>` and `admiral_<name>` in `R/data.R`.
+-   Run `devtools::document()` in order to update `NAMESPACE` and update the `.Rd` files in `man/`.
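+
+The first three steps might be sketched as follows for the `ds` example. This is an illustration only: the toy `ds` data frame, the added `DSSEQ` variable, and the use of a temporary directory in place of `data/` are all assumptions made so that the sketch runs anywhere.
+
+```r
+# Stand-in for the package's data/ folder so the sketch runs anywhere.
+data_dir <- file.path(tempdir(), "data")
+dir.create(data_dir, showWarnings = FALSE)
+
+# Step 1: rename the source dataset and save it as raw_ds.rda.
+ds <- data.frame(
+  USUBJID = c("01-001", "01-002"),
+  DSDECOD = c("COMPLETED", "ADVERSE EVENT"),
+  stringsAsFactors = FALSE
+)
+raw_ds <- ds
+save(raw_ds, file = file.path(data_dir, "raw_ds.rda"))
+
+# Steps 2 and 3: what a hypothetical dev/update_ds.R might do when run.
+load(file.path(data_dir, "raw_ds.rda")) # restores `raw_ds`
+admiral_ds <- raw_ds
+admiral_ds$DSSEQ <- seq_len(nrow(admiral_ds)) # illustrative update only
+save(admiral_ds, file = file.path(data_dir, "admiral_ds.rda"))
+```
+
+Keeping `raw_ds.rda` alongside `admiral_ds.rda` means the update can be re-run from the unmodified source at any time.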