This repository contains materials used to train Public Health Scotland analysts in R using Scottish Morbidity Record (SMR) data. Although the materials were originally written for analysts working in LIST, they should also be relevant to analysts in other teams who use SMR data.
Please note that this GitHub repository contains the main copy of this training material. Any local copies which exist on the network will not be maintained.
To download this repository, click the green 'Clone or download' button and then click 'Download ZIP'. Unzip the folder to a location on the network that is accessible from RStudio Server.
To open the project in RStudio Server, click File -> Open Project, navigate to the folder where the project is saved, and open the `smr-training.Rproj` file.
This code uses RStudio Projects, which bundle together related files and scripts. Each RStudio Project has an `.Rproj` file, and RStudio sets the working directory to the folder where this file is saved. Other file paths can then be defined relative to this folder using the `here` package. A new project that follows the recommended PHS structure can be created using the `phstemplates` package.
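As a minimal sketch of how `here` builds paths relative to the project folder, the snippet below constructs a path to a file inside the project. The folder name `data` and file name `smr01_extract.csv` are hypothetical and used only for illustration; they are not part of this repository.

```r
library(here)

# here() joins its arguments onto the project root. When the project is open in
# RStudio, the root is the folder containing the .Rproj file.
# "data" and "smr01_extract.csv" are hypothetical names for illustration only.
extract_path <- here("data", "smr01_extract.csv")
print(extract_path)
```

Because the path is built from the project root rather than typed out in full, the same script should still work if the whole project folder is moved to a different location on the network.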
Type `getwd()` into the RStudio console to see the working directory for this project.
The table below gives an approximate, non-exhaustive list of equivalent R and SPSS functions commonly used to analyse SMR data. The R functions come from the `dplyr`, `tidyr` and `magrittr` packages, which are part of the tidyverse collection of packages.
Please note that, unless stated otherwise, the R code in the table below assumes that the data have first been piped (using `%>%` or `%<>%`) to the function, for example:
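- Using `%>%` and assigning the result to a new data frame:

  ```r
  new_df <- old_df %>%
    arrange(x) %>%
    filter(x == first(x))
  ```

- Using `%<>%`, which pipes the data frame in and assigns the result back to it:

  ```r
  df %<>%
    select(x, y) %>%
    mutate(z = x + y)
  ```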
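The two pipe examples above can be tried on a small toy data frame. This is a sketch only: the data frame and its columns `x` and `y` are made-up placeholders, not SMR variables. It shows that `%>%` leaves `old_df` unchanged, while `%<>%` overwrites `df` with the result of the chain.

```r
library(dplyr)
library(magrittr)

# Hypothetical toy data; x and y are placeholder column names
old_df <- data.frame(x = c(3, 1, 1, 2), y = c(10, 20, 30, 40))

# %>% passes old_df through the chain; the result is assigned to a new object
new_df <- old_df %>%
  arrange(x) %>%
  filter(x == first(x))  # keep only rows where x equals the smallest value

# %<>% pipes df into the chain and assigns the result back to df
df <- old_df
df %<>%
  select(x, y) %>%
  mutate(z = x + y)

print(new_df)  # the two rows where x is 1
print(df)      # old_df with an extra column z
```

Note that `filter()` needs the comparison operator `==`; writing `x = first(x)` passes a named argument and `dplyr` stops with an error.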
| R | SPSS |
|---|---|
| `arrange(x)` | `SORT CASES BY X (A)` |
| `arrange(desc(x))` | `SORT CASES BY X (D)` |
| `first(x)` | `FIRST(X)` |
| `last(x)` | `LAST(X)` |
| `filter(x == 2)` | `SELECT IF X = 2` |
| `filter(x != 2)` | `SELECT IF NOT (X = 2)` |
| `select(x)` | `/KEEP X` |
| `select(-x)` | `/DROP X` |
| `mutate(x = 2)` | `COMPUTE X = 2` |
| `drop_na(x)` | `SELECT IF NOT (SYSMIS(X))` |
| `df %<>% left_join(lookup, by = "common_variable")` | `MATCH FILES /FILE = * /TABLE = "/PATH/TO/LOOKUP" /BY COMMON_VARIABLE` |
| `df %<>% group_by(x) %>% summarise(y = sum(y)) %>% ungroup()` | `AGGREGATE OUTFILE = * /BREAK X /Y = SUM(Y)` |
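Several rows of the table above can be run end to end on made-up data. The sketch below assumes a hypothetical data frame `df` and lookup table `lookup` sharing a key column `common_variable`; none of these names or values come from real SMR extracts. It removes missing values, attaches lookup columns, and then aggregates by `x`.

```r
library(dplyr)
library(magrittr)
library(tidyr)

# Hypothetical episode-level data; all names and values are placeholders
df <- tibble(
  common_variable = c("A", "A", "B", "C"),
  x = c(2, 2, 1, NA),
  y = c(5, 7, 1, 4)
)

# Hypothetical lookup table keyed on common_variable
lookup <- tibble(
  common_variable = c("A", "B", "C"),
  label = c("Alpha", "Beta", "Gamma")
)

# Like SELECT IF NOT (SYSMIS(X)): drop rows where x is missing
df %<>% drop_na(x)

# Like MATCH FILES /TABLE: add lookup columns, keeping every row of df
df %<>% left_join(lookup, by = "common_variable")

# Like AGGREGATE /BREAK X: one row per value of x, with y summed.
# ungroup() removes the grouping so later steps act on the whole data frame.
totals <- df %>%
  group_by(x) %>%
  summarise(y = sum(y)) %>%
  ungroup()

print(df)
print(totals)
```

Here the aggregated result is kept in a separate object, `totals`, so that the joined data can still be inspected; using `df %<>%` as in the table would instead overwrite `df`, much as `OUTFILE = *` replaces the active dataset in SPSS.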