vignettes/a-Glossary-and-templates.Rmd

---
title: "Glossary and templates"
output: 
  rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Glossary and templates}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  eval = FALSE,
  comment = "#>",
  warning = FALSE,
  message = FALSE,
  echo = TRUE)

```


```{r, eval = TRUE, echo = FALSE}
source('datatables.R')

```


The glossary describes the main objects used in Rmonize functions. The main 
components of each object are listed and must have the names as presented to be 
used in functions, except where indicated in square brackets ([…]). 
An asterisk (<strong>*</strong>) indicates a required object or component. You 
can download templates or find additional documentation where available using the 
links provided.

## DataSchema

List of core variables to generate across datasets and related metadata.

<button><a 
  href="https://maelstrom-research.github.io/Rmonize-documentation/templates/dataschema%20-%20template.xlsx"
  download>Download an Excel template</a>
</button>

### Variables *

Metadata table containing the list of core variables to generate across datasets 
and related metadata about the variables. This table is required and uses the 
following columns.

<div style="padding-left: 10px; padding-right: 10px; background-color: #eee;">
<table style="display: table">
<thead>
<tr class="header">
<th>Name</th>
<th width="75%">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>index</td>
<td>Index to order variables in the table.</td>
</tr>
<tr>
<td>name *</td>
<td>Name of the DataSchema variable. Each entry must be unique. The first entry 
must be the primary identifier variable (e.g., participant unique ID).</td>
</tr>
<tr>
<td>label</td>
<td>Short description of the DataSchema variable. A language can be specified 
using a language code, such as 'label:en' for english or 'label:fr' for french.
</td>
</tr>
<tr>
<td>valueType</td>
<td>
Value type of the input dataset variable (e.g., text, integer, decimal, 
boolean, date, datetime). See 
<a href="https://opaldoc.obiba.org/en/dev/magma-user-guide/value/type.html">
additional details</a>.</td>
</tr>
</tbody>
</table>
</div>

### Categories

Metadata table containing the list of categories and related metadata (coding 
and description of the response options) defined for categorical variables 
(if any). If there are categorical variables defined, this table is required 
and uses the following columns. 

<div style="padding-left: 10px; padding-right: 10px; background-color: #eee;">
<table style="display: table">
<thead>
<tr class="header">
<th>Name</th>
<th width="75%">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>variable *</td>
<td>Name of the DataSchema variable to which the category belongs. This column 
is required if the Categories table is present. The value must also be present 
in the column 'name' in the Variables table. </td>
</tr>
<tr>
<td>name *</td>
<td>Category code value. This column is required if the table Categories is 
present. The combination of 'variable' and 'name' within the Categories table 
(i.e., the combination of DataSchema variable and category code value) must be 
unique.</td>
</tr>
<tr>
<td>label</td>
<td>Short description of the category code value. A language can be specified 
using a language code, such as 'label:en' for english or 'label:fr' for french.
</td>
</tr>
<tr>
<td>missing</td>
<td>Boolean value (TRUE/FALSE or 1/0) indicating if the value in 'name' is 
interpreted as a missing value (e.g., question skipped by design in a 
questionnaire or a response option <em>"Prefer not to answer"</em>).</td>
</tr>
</tbody>
</table>
</div>

## Input Dataset

Data table containing a collection of variables to process under the 
DataSchema format.

<div style="padding-left: 10px; padding-right: 10px; background-color: #eee;">
<table style="display: table">
<thead>
<tr class="header">
<th>Name</th>
<th width="75%">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>[col_1] *</td>
<td>First variable in the input dataset, typically the identifier or index. 
A dataset must have at least one variable.</td>
</tr>
<tr>
<td>[col_2] ...</td>
<td>Additional variable(s) in the input dataset.</td>
</tr>
</tbody>
</table>
</div>

## Input Data Dictionary


List of variables in an input dataset and related metadata.

<button><a 
  href="https://maelstrom-research.github.io/Rmonize-documentation/templates/data_dictionary%20-%20template.xlsx"
  download>Download an Excel template</a>
</button>

### Variables *

Metadata table containing the list of variables in an input dataset and metadata 
about the input variables. If a data dictionary is defined for an input dataset, 
this table is required and uses the following columns.

<div style="padding-left: 10px; padding-right: 10px; background-color: #eee;">
<table style="display: table">
<thead>
<tr class="header">
<th>Name</th>
<th width="75%">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>index</td>
<td>Index to order variables in the table. </td>
</tr>
<tr>
<td>name *</td>
<td>Name of the input dataset variable. 
Each entry must be unique. The first entry is typically the primary identifier 
variable (e.g., participant unique ID).</td>
</tr>
<tr>
<td>label</td>
<td>Short description of the input dataset variable. A language can be specified 
using a language code, such as 'label:en' for english or 'label:fr' for french.
</td>
</tr>
<tr>
<td>valueType</td>
<td>
Value type of the input dataset variable (e.g., text, integer, decimal, 
boolean, date, datetime). See 
<a href="https://opaldoc.obiba.org/en/dev/magma-user-guide/value/type.html">
additional details</a>.</td>
</tr>
</tbody>
</table>
</div>

### Categories

Metadata table containing the list of categories and related metadata (coding 
and description of the response options) defined for categorical variables 
(if any). If there are categorical variables defined, this table is required 
and uses the following columns. 

<div style="padding-left: 10px; padding-right: 10px; background-color: #eee;">
<table style="display: table">
<thead>
<tr class="header">
<th>Name</th>
<th width="75%">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>variable *</td>
<td>Name of the input dataset variable to which the category belongs. This column 
is required if the Categories table is present. The value must also be present 
in the column 'name' in the Variables table. </td>
</tr>
<tr>
<td>name *</td>
<td>Category code value. This column is required if the table Categories is 
present. The combination of 'variable' and 'name' within the Categories table 
(i.e., the combination of DataSchema variable and category code value) must be 
unique.</td>
</tr>
<tr>
<td>label</td>
<td>Short description of the category code value. A language can be specified 
using a language code, such as 'label:en' for english or 'label:fr' for french.
</td>
</tr>
<tr>
<td>missing</td>
<td>Boolean value (TRUE/FALSE or 1/0) indicating if the value in 'name' is 
interpreted as a missing value (e.g., question skipped by design in a 
questionnaire or a response option <em>"Prefer not to answer"</em>).</td>
</tr>
</tbody>
</table>
</div>

## Dossier

Set of one or more input dataset(s) and their associated input 
data dictionary(ies).

<div style="padding-left: 10px; padding-right: 10px; background-color: #eee;">
<table style="display: table">
<thead>
<tr class="header">
<th>Name</th>
<th width="75%">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>[input_dataset_1] *</td>
<td>Data table containing a collection of variables to process under the 
DataSchema formats and its associated input data dictionary. At least one 
input dataset is required. The input dataset name is defined by the user and is 
indicated in the Data Processing Elements column 'input_dataset'. This name 
identifies the source of input variables for data processing.</td>
</tr>
<tr>
<td>[input_dataset_2] ...</td>
<td>Additional input dataset and associated data dictionary.</td>
</tr>
</tbody>
</table>
</div>

## Data Processing Elements

Metadata table containing the elements defining the possibility for each 
input dataset to generate each DataSchema variable and, where applicable, the 
algorithms to generate harmonized variables in the DataSchema formats.

<button><a 
  href="https://maelstrom-research.github.io/Rmonize-documentation/templates/data_processing_elements%20-%20template.csv"
  download>Download an csv template</a>
</button>

See 
[additional documentation for Data Processing Elements](https://maelstrom-research.github.io/Rmonize-documentation/articles/b-Data-processing-elements.html).

<div style="padding-left: 10px; padding-right: 10px; background-color: #eee;">
<table style="display: table">
<thead>
<tr class="header">
<th>Name</th>
<th width="75%">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>index</td>
<td>Index to order algorithms in the table. </td>
</tr>
<tr>
<td>dataschema_variable *</td>
<td>Name of the DataSchema variable being generated (must match a variable in 
the DataSchema).The first entry must be the primary identifier variable (e.g., 
participant unique ID).</td>
</tr>
<tr>
<td>label</td>
<td>Short description of the DataSchema variable (as in the DataSchema).</td>
</tr>
<tr>
<td>valueType</td>
<td>Value type of the DataSchema variable (as in the DataSchema).</td>
</tr>
<tr>
<td>input_dataset *</td>
<td>Name of the Input Dataset used to generate the DataSchema variable (as named 
in the Dossier).</td>
</tr>
<tr>
<td>input_variables *</td>
<td>Name of the variable(s) in the 'input_dataset' used to generate 
the DataSchema variable.</td>
</tr>
<tr>
<td>Mlstr_harmo:rule_category *</td>
<td>Type of algorithm used to generate the DataSchema variable from the input 
variables. The first entry must be the creation of a harmonized primary 
identifier variable (e.g., participant unique ID).</td>
</tr>
<tr>
<td>Mlstr_harmo:algorithm *</td>
<td>Algorithm used to generate the DataSchema variable from the 
input variables.</td>
</tr>
<tr>
<td>Mlstr_harmo:status</td>
<td>Possibility to generate the DataSchema variable from the input dataset. 
This is considered "complete" if the DataSchema variable can be generated from 
the input dataset or "impossible" if not.</td>
</tr>
<tr>
<td>Mlstr_harmo:status_detail</td>
<td>Additional information about the possibility to generate the DataSchema 
variable from the input dataset. If 'Mlstr_harmo:status' is "complete", the 
information could be considered "identical" or "compatible" with the DataSchema 
variable. If 'Mlstr_harmo:status' is "impossible", the information could be 
considered "incompatible" or "unavailable" for harmonization.</td>
</tr>
<tr>
<td>Mlstr_harmo:comment</td>
<td>Additional information about the inputs or algorithms to document with the 
harmonized variable.</td>
</tr>
</tbody>
</table>
</div>

## Harmonized Dataset

Data table containing a collection of harmonized variables processed under
the DataSchema formats. 

<div style="padding-left: 10px; padding-right: 10px; background-color: #eee;">
<table style="display: table">
<thead>
<tr class="header">
<th>Name</th>
<th width="75%">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>[harmonized_variable_1] *</td>
<td>First harmonized variable. This is the primary identifier variable (e.g., 
participant unique ID). Variables in the harmonized dataset are generated in 
the order defined in the DataSchema.</td>
</tr>
<tr>
<td>[harmonized_variable_2] ...</td>
<td>Additional harmonized variable.</td>
</tr>
</tbody>
</table>
</div>

## Harmonized Data Dictionary

List of variables in a harmonized dataset and related metadata.

### Variables *

Metadata table containing the list of variables in a harmonized dataset 
and metadata about the harmonized variables (taken from the DataSchema and
Data Processing Elements).

<div style="padding-left: 10px; padding-right: 10px; background-color: #eee;">
<table style="display: table">
<thead>
<tr class="header">
<th>Name</th>
<th width="75%">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>index</td>
<td>Index to order variables in the table (taken from the DataSchema).</td>
</tr>
<tr>
<td>name *</td>
<td>Name of the harmonized variable (taken from the DataSchema).</td>
</tr>
<tr>
<td>label</td>
<td>Short description of the harmonized variable (taken from the DataSchema).</td>
</tr>
<tr>
<td>valueType</td>
<td>Value type of the harmonized variable (taken from the DataSchema).</td>
</tr>
<tr>
<td>Mlstr_harmo:rule_category</td>
<td>Type of algorithm used to generate the DataSchema variable from the 
input variables (taken from the Data Processing Elements).</td> 
</tr>
<tr>
<td>Mlstr_harmo:algorithm</td>
<td>Algorithm used to generate the harmonized variable from the 
input variables (taken from the Data Processing Elements).</td> 
</tr>
<tr>
<td>Mlstr_harmo:status</td>
<td>Possibility to generate the DataSchema variable from the 
input dataset (taken from the Data Processing Elements).</td> 
</tr>
<tr>
<td>Mlstr_harmo:status_detail</td>
<td>Additional information about the possibility to generate the DataSchema 
variable from the input dataset (taken from the Data Processing Elements).</td> 
</tr>
<tr>
<td>Mlstr_harmo:comment</td>
<td>Additional information about the inputs or algorithms to document with the 
harmonized variable (taken from the Data Processing Elements).</td> 
</tr>
</tbody>
</table>
</div>

### Categories

Metadata table containing the list of categories and related metadata (taken 
from the DataSchema).

<div style="padding-left: 10px; padding-right: 10px; background-color: #eee;">
<table style="display: table">
<thead>
<tr class="header">
<th>Name</th>
<th width="75%">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>variable *</td>
<td>Name of the harmonized variable to which the category belong (taken from 
the DataSchema).</td>
</tr>
<tr>
<td>name *</td>
<td>Category code value (taken from the DataSchema).</td>
</tr>
<tr>
<td>label</td>
<td>Short description of the category code value (taken from the DataSchema).
</td>
</tr>
<tr>
<td>missing</td>
<td>Boolean value (TRUE/FALSE or 1/0) indicating if the value in 'name' is 
interpreted as a missing value (taken from the DataSchema).</td>
</tr>
</tbody>
</table>
</div>

## Harmonized Dossier

Set of one or more harmonized dataset(s) and their associated data 
dictionary(ies).

<div style="padding-left: 10px; padding-right: 10px; background-color: #eee;">
<table style="display: table">
<thead>
<tr class="header">
<th>Name</th>
<th width="75%">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>[harmonized_dataset_1] *</td>
<td>Data table containing a collection of harmonized variables processed under 
the DataSchema format and its associated data dictionary. There is one 
harmonized dataset per input dataset.</td>
</tr>
<tr>
<td>[harmonized_dataset_2] ...</td>
<td>Additional harmonized dataset and its associated data dictionary.</td>
</tr>
</tbody>
</table>
</div>

## Pooled Harmonized Dataset

Combined data table containing multiple harmonized datasets processed under the 
same DataSchema formats.

<div style="padding-left: 10px; padding-right: 10px; background-color: #eee;">
<table style="display: table">
<thead>
<tr class="header">
<th>Name</th>
<th width="75%">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>[harmonized_dataset_1] *</td>
<td>First harmonized variable. This is the primary unique identifier variable. 
Variables in the harmonized dataset are generated in the order defined in 
the DataSchema.</td>
</tr>
<tr>
<td>[harmonized_dataset_2] ...</td>
<td>Additional harmonized variable.</td>
</tr>
</tbody>
</table>
</div>

## Pooled Harmonized Data Dictionary

List of variables in a pooled harmonized dataset and related metadata.

### Variables *

Metadata table containing the list of variables in a pooled harmonized dataset 
and metadata about the harmonized variables (taken from the DataSchema).

<div style="padding-left: 10px; padding-right: 10px; background-color: #eee;">
<table style="display: table">
<thead>
<tr class="header">
<th>Name</th>
<th width="75%">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>index</td>
<td>Index to order variables in the table (taken from the DataSchema).</td>
</tr>
<tr>
<td>name *</td>
<td>Name of the harmonized variable (taken from the DataSchema).</td>
</tr>
<tr>
<td>label</td>
<td>Short description of the harmonized variable (taken from the DataSchema).</td>
</tr>
<tr>
<td>valueType</td>
<td>Value type of the harmonized variable (taken from the DataSchema).</td>
</tr>
</tbody>
</table>
</div>

### Categories

Metadata table containing the list of categories and related metadata (taken 
from the DataSchema).

<div style="padding-left: 10px; padding-right: 10px; background-color: #eee;">
<table style="display: table">
<thead>
<tr class="header">
<th>Name</th>
<th width="75%">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>variable *</td>
<td>Name of the harmonized variable to which the category belong (taken from 
the DataSchema).</td>
</tr>
<tr>
<td>name *</td>
<td>Category code value (taken from the DataSchema).</td>
</tr>
<tr>
<td>label</td>
<td>Short description of the category code value (taken from the DataSchema).
</td>
</tr>
<tr>
<td>missing</td>
<td>Boolean value (TRUE/FALSE or 1/0) indicating if the value in 'name' is 
interpreted as a missing value (taken from the DataSchema).</td>
</tr>
</tbody>
</table>
</div>