---
title: "Preparing data for finalfit"
author: "Ewen Harrison"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Preparing data for finalfit}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
This vignette shows how to read in and prepare any dataset for use with finalfit. The demonstration uses the `boot::melanoma` dataset; see `?boot::melanoma` for the help page and a description of the data. I will use `library(tidyverse)` methods. First I'll `write_csv()` the data just to demonstrate reading it back in.
## Read data
Note the various options in `read_csv()`, including those for providing column names, variable types, and missing data identifiers.
```{r}
library(readr)

# Save example
write_csv(boot::melanoma, "boot.csv")

# Read data
melanoma = read_csv("boot.csv")
```
## Column types
Note the output shows how the columns/variables have been parsed. For full details see `?readr::cols()`.
### Continuous data
* Integer (whole numbers) - `col_integer()`
* Double or numeric (real numbers; the name comes from "double-precision floating point") - `col_double()`
### Categorical data
* Factor (a fixed set of names/strings or numbers) - `col_factor()`
* Character (sequences of letters, numbers, and symbols) - `col_character()`
* Logical (containing only TRUE or FALSE) - `col_logical()`
### Dates and times
* Date - `col_date()`
* Time - `col_time()`
* Date-time - `col_datetime()`
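
The column types listed above can also be stated explicitly rather than left to the parser. The following is a minimal sketch, assuming the seven columns of `boot::melanoma` and a temporary file path chosen purely for illustration; the choice of integer versus double for each column is my own reading of the data, not something the data description mandates.

```{r}
library(readr)

# Write the example data to a temporary file (illustrative path only)
path = tempfile(fileext = ".csv")
write_csv(boot::melanoma, path)

# Declare each column type up front instead of relying on guessing
melanoma_typed = read_csv(
  path,
  col_types = cols(
    time      = col_integer(),
    status    = col_integer(),
    sex       = col_integer(),
    age       = col_integer(),
    year      = col_integer(),
    thickness = col_double(),
    ulcer     = col_integer()
  ),
  na = c("", "NA")  # strings to treat as missing
)
```

Declaring types this way means a value that does not fit the declared type is reported as a parsing problem, rather than silently changing how the whole column is read.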
## Check data
`ff_glimpse()` provides a convenient overview of all data in a tibble or data frame. It is particularly important that factors are correctly specified. Hence, `ff_glimpse()` separates variables into continuous and categorical. As expected, no factors are yet specified in the melanoma dataset.
If you wish to see the variables in the order in which they appear in the data frame or tibble, `missing_glimpse()` or `tibble::glimpse()` are useful.
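
As a brief sketch of the two checks described above, the calls might look like the following. It loads `boot::melanoma` directly so that it runs on its own; in the vignette's flow, the `melanoma` object read from file would be used instead.

```{r}
library(finalfit)

# Stand-in for the data read from file earlier
melanoma = boot::melanoma

# Overview split into continuous and categorical variables
ff_glimpse(melanoma)

# Overview in the original column order, including missing data counts
missing_glimpse(melanoma)
```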
## Specify factors
Use an original description of the data (often called a data dictionary) to correctly assign and label any factor variables. This can be done in a single pipe.
```{r}
library(dplyr)

# Recode the numeric codes as labelled factors, following the data dictionary
melanoma %>%
  mutate(
    status.factor = factor(status, levels = c(1, 2, 3),
      labels = c("Died from melanoma", "Alive", "Died from other causes")),
    sex.factor = factor(sex, levels = c(1, 0),
      labels = c("Male", "Female")),
    ulcer.factor = factor(ulcer, levels = c(1, 0),
      labels = c("Present", "Absent"))
  ) -> melanoma
```
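
One way to confirm that a recoding like this behaved as intended is to cross-tabulate the original codes against the new labels. This is a sketch only: it rebuilds a single factor from `boot::melanoma` so that it runs independently, and the object name `check` is hypothetical.

```{r}
library(dplyr)

# Rebuild one factor from the raw data so the sketch is self-contained
check = boot::melanoma %>%
  mutate(
    sex.factor = factor(sex, levels = c(1, 0), labels = c("Male", "Female"))
  )

# Each original code should pair with exactly one label
check %>% count(sex, sex.factor)

# A code missing from `levels` would have been turned into NA
anyNA(check$sex.factor)
```

If a code in the data is not listed in `levels`, `factor()` converts it to `NA` without warning, so a check of this kind is worth running for each recoded variable.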
Everything looks good and you are ready to start analysis.