# Aim

The aim of this practical session is to learn how to compute Mantel-Haenszel summary rate ratios (RRs) and decide whether a summary measure is appropriate.

Throughout this session we will be analysing the Whitehall dataset. To read in the dataset, type:

In [None]:
library(tidyverse)

In [None]:
library(haven)

In [None]:
whitehall_df <- read_dta("Data_files-20211113/WHITEHALL.dta")

# Defining follow-up information


Recall from Practical 1 that before we can analyse follow-up data we must first define the dates of entry to and exit from the study and the outcome (or ‘failure’) variable.

The outcome is overall mortality, `all`, and the time variables are `timein` and `timeout`. The difference between them is measured in days, so we divide by 365.25 to express follow-up time in years and produce analyses in terms of person-years. Type:

In [None]:
whitehall_df_2 <- whitehall_df %>%
    # Follow-up time in years: days between entry and exit, divided by 365.25
    mutate(followup_time = as.numeric(difftime(timeout,
                                               timein,
                                               units = "days")) / 365.25)
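
The conversion from dates to years can be checked on a small example. This is a minimal sketch in base R, assuming `timein` and `timeout` are stored as `Date` values; the three participants and their dates are made up for illustration and are not taken from the Whitehall data.

```r
# Hypothetical entry and exit dates for three participants (illustrative only)
toy <- data.frame(
  timein  = as.Date(c("1970-01-01", "1970-06-15", "1971-03-01")),
  timeout = as.Date(c("1980-01-01", "1975-06-15", "1971-09-01"))
)

# Days between entry and exit
toy$followup_days <- as.numeric(difftime(toy$timeout, toy$timein, units = "days"))

# Convert days to years; 365.25 allows for leap years on average
toy$followup_time <- toy$followup_days / 365.25

toy
```

Summing `followup_time` over participants gives the total person-years at risk, which is the denominator of every rate in this practical.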

# Stratum-specific rates

To investigate how overall mortality varies according to age at entry to the study, we will recode age at entry into suitable groups:

In [None]:
whitehall_df_3 <- whitehall_df_2 %>%
    mutate(agecat = as.factor(case_when(agein < 45 ~ 0,
                              agein < 50 ~ 1,
                              agein < 55 ~ 2,
                              agein < 60 ~ 3,
                              agein < 65 ~ 4,
                              agein < 70 ~ 5,
                              TRUE ~ 999)))
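
The same 5-year grouping can also be sketched with base R's `cut`. The ages below are made up for illustration. One difference from the `case_when` version is an assumption to keep in mind: ages of 70 or above become `NA` here rather than being coded 999.

```r
# Hypothetical ages at entry (illustrative only)
agein <- c(41.2, 45.0, 49.9, 52.3, 58.7, 61.0, 68.5)

# Left-closed intervals: [-Inf, 45), [45, 50), ..., [65, 70)
agecat <- cut(agein,
              breaks = c(-Inf, 45, 50, 55, 60, 65, 70),
              right  = FALSE,
              labels = as.character(0:5))

# Number of participants in each age group
table(agecat)
```

Setting `right = FALSE` matters: it places someone aged exactly 45 in group 1, which matches the `agein < 45 ~ 0` rule.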

We can use the `summarise` command to check the distribution of agecat. Type:

In [None]:
whitehall_df_3 %>%
    group_by(agecat) %>%
    summarise(n = n()) %>%
    mutate(percent = n / sum(n) * 100) %>%
    mutate(cum = cumsum(percent))

Now, to examine how mortality rates change with age at entry we can use `Surv` from the `survival` package and `survRate` from the `biostat3` package. Dividing `followup_time` by 1000 gives rates per 1,000 person-years. Type:

In [None]:
library(survival)
library(biostat3)


In [None]:
survRate(Surv(followup_time/1000, all) ~ agecat, 
         data=whitehall_df_3)
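
To see what the rate is, it helps to compute one by hand: the number of deaths divided by the person-years at risk, within each age group. The following is a minimal base R sketch on made-up data (six hypothetical participants). It reproduces only the point estimate, not the confidence limits that `survRate` reports.

```r
# Hypothetical follow-up data (illustrative only)
toy <- data.frame(
  agecat        = factor(c(0, 0, 0, 1, 1, 1)),
  followup_time = c(10, 8, 12, 5, 9, 6),   # years at risk
  all           = c(0, 1, 0, 1, 1, 0)      # 1 = died, 0 = censored
)

# Deaths and person-years within each age group
deaths <- tapply(toy$all, toy$agecat, sum)
pyears <- tapply(toy$followup_time, toy$agecat, sum)

# Rate per 1,000 person-years
rate_per_1000 <- deaths / (pyears / 1000)
rate_per_1000
```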

The rates increase quite dramatically with each age category. This is best displayed in a graph, which we can produce with `ggplot`. If the rate ratios between successive categories are similar, the differences between successive log(rates) should be roughly constant, so plotting the rates on a log scale should show an approximately linear relationship with age. Type:

In [None]:
survRate(Surv(followup_time/1000, all) ~ agecat, 
         data=whitehall_df_3) %>%
    #Put the data into ggplot
    ggplot(aes(x = agecat, y = rate)) +
    #Plot the points
    geom_point() +
    #Add in error bars
    geom_errorbar(aes(ymin=lower, ymax=upper)) +
    #Transform the y axis into logarithmic
    scale_y_continuous(trans='log2') +
    #Specify x-axis label names
    scale_x_discrete(labels=seq(40,65,5))

We can see that the trend in rates across age is approximately linear on the log scale. This indicates that the rate ratio from one age group to the next is similar.
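
The link between a constant rate ratio and a straight line on the log scale can be written out explicitly. The notation is introduced here for illustration: $\lambda_k$ is the rate in age category $k$ and $\theta$ is the rate ratio between successive categories, assumed to be the same at every step.

```latex
\lambda_k = \lambda_0\,\theta^{k}
\quad\Longrightarrow\quad
\log \lambda_k = \log \lambda_0 + k \log \theta
```

The log rate is therefore linear in $k$ with slope $\log\theta$, so equally spaced points on the log-scale plot correspond to a common rate ratio between neighbouring age groups.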

## Comparison between groups: rate ratios

To obtain rate ratios for the effect of age at entry, with the youngest age group as the baseline, we will again use `Surv` and `survRate` and divide the rate in each age group by the rate in the baseline group.

In [None]:
survRate(Surv(followup_time/1000, all) ~ agecat, data=whitehall_df_3) %>%
    filter(agecat == 0)

In [None]:
#Get the rate for agecat of 0
agecat_0 <- survRate(Surv(followup_time/1000, all) ~ agecat, data=whitehall_df_3) %>%
    filter(agecat == 0) %>%
    purrr::pluck("rate")

#Get the rate for agecat of 1
agecat_1 <- survRate(Surv(followup_time/1000, all) ~ agecat, data=whitehall_df_3) %>%
    filter(agecat == 1) %>%
    purrr::pluck("rate")

#Get the ratio
agecat_1 / agecat_0

In [None]:
#Get the rate for agecat of 0
agecat_0 <- survRate(Surv(followup_time/1000, all) ~ agecat, data=whitehall_df_3) %>%
    filter(agecat == 0) %>%
    purrr::pluck("rate")

#Get the rate for agecat of 2
agecat_2 <- survRate(Surv(followup_time/1000, all) ~ agecat, data=whitehall_df_3) %>%
    filter(agecat == 2) %>%
    purrr::pluck("rate")

#Get the ratio
agecat_2 / agecat_0

In [None]:
#Get the rate for agecat of 0
agecat_0 <- survRate(Surv(followup_time/1000, all) ~ agecat, data=whitehall_df_3) %>%
    filter(agecat == 0) %>%
    purrr::pluck("rate")

#Get the rate for agecat of 3
agecat_3 <- survRate(Surv(followup_time/1000, all) ~ agecat, data=whitehall_df_3) %>%
    filter(agecat == 3) %>%
    purrr::pluck("rate")

#Get the ratio
agecat_3 / agecat_0

In [None]:
#Get the rate for agecat of 0
agecat_0 <- survRate(Surv(followup_time/1000, all) ~ agecat, data=whitehall_df_3) %>%
    filter(agecat == 0) %>%
    purrr::pluck("rate")

#Get the rate for agecat of 4
agecat_4 <- survRate(Surv(followup_time/1000, all) ~ agecat, data=whitehall_df_3) %>%
    filter(agecat == 4) %>%
    purrr::pluck("rate")

#Get the ratio
agecat_4 / agecat_0

In [None]:
#Get the rate for agecat of 0
agecat_0 <- survRate(Surv(followup_time/1000, all) ~ agecat, data=whitehall_df_3) %>%
    filter(agecat == 0) %>%
    purrr::pluck("rate")

#Get the rate for agecat of 5
agecat_5 <- survRate(Surv(followup_time/1000, all) ~ agecat, data=whitehall_df_3) %>%
    filter(agecat == 5) %>%
    purrr::pluck("rate")

#Get the ratio
agecat_5 / agecat_0
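
The five ratios above can also be obtained in a single step by dividing the whole `rate` column by the baseline rate. This is a minimal sketch on a made-up table with the same `agecat` and `rate` columns used above. The rates are illustrative values only, not the Whitehall results.

```r
# Hypothetical rates per 1,000 person-years by age group (illustrative only)
rates <- data.frame(
  agecat = factor(0:5),
  rate   = c(2, 3, 6, 9, 15, 24)
)

# Rate in the baseline (youngest) group
baseline <- rates$rate[rates$agecat == "0"]

# Rate ratio of every group versus baseline; the baseline itself is 1
rates$rate_ratio <- rates$rate / baseline
rates
```

The same two lines should work on the data frame returned by `survRate`, so the baseline rate need only be computed once.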

It is clear that the rate ratios for consecutive age-groups versus the youngest age-group (baseline) increase with age.

The rate for the 45-49-year age-group is 1.17 times that of the 40-44 year age-group; 

The rate for the 50-54-year age-group is 2.77 times that of the 40-44 year age-group; 

The rate for the 55-59-year age-group is 4.58 times that of the 40-44 year age-group; and so on.

## Stratified estimates for the exposure of interest

Let’s say the main exposure of interest is grade of employment (coded: 1 = high grade, 2 = low grade). We will examine the all-cause mortality rates for low and high grades of employment, then estimate the rate ratio for low-grade employees versus high-grade employees.

In [None]:
survRate(Surv(followup_time/1000, all) ~ grade, 
         data=whitehall_df_3)

In [None]:
#Get the rate for grade of 1
grade_1 <- survRate(Surv(followup_time/1000, all) ~ grade, 
                    data=whitehall_df_3) %>%
    filter(grade == 1) %>%
    purrr::pluck("rate")

#Get the rate for grade of 2
grade_2 <- survRate(Surv(followup_time/1000, all) ~ grade, 
                    data=whitehall_df_3) %>%
    filter(grade == 2) %>%
    purrr::pluck("rate")

#Get the ratio
grade_2 / grade_1

To assess potential confounding or effect modification by age at entry on the effect of employment grade, we compute the rate ratio for `grade` within each stratum of `agecat` (in Stata this is done with `stmh`). Type:

In [None]:
#Get the rate for grade of 1
grade_1 <- survRate(Surv(followup_time/1000, all) ~ grade + agecat, data=whitehall_df_3) %>%
    filter(grade == 1) %>%
    purrr::pluck("rate")

#Get the rate for grade of 2
grade_2 <- survRate(Surv(followup_time/1000, all) ~ grade + agecat, data=whitehall_df_3) %>%
    filter(grade == 2) %>%
    purrr::pluck("rate")

#Get the ratios
tibble(survRate(Surv(followup_time/1000, all) ~ grade + agecat, data=whitehall_df_3) %>%
       filter(grade == 1) %>% 
       dplyr::select("agecat"), 
       "rate ratio" = grade_2 / grade_1)

Here we use odds ratios rather than rate ratios: `epiDisplay`'s `cc` gives the crude odds ratio and `mhor` gives the Mantel-Haenszel odds ratio adjusted for age group.

In [None]:
library(epiDisplay)

In [None]:
cc(whitehall_df_3$all, whitehall_df_3$grade,
              graph = FALSE)

In [None]:
mhor(whitehall_df_3$all, 
     whitehall_df_3$grade,
     whitehall_df_3$agecat, design = "cohort",
     graph = FALSE)
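
To make the Mantel-Haenszel odds ratio less of a black box, the following base R sketch computes it by hand from two made-up 2x2 tables (one per stratum) and compares it with `mantelhaen.test` from the `stats` package. The counts are illustrative only and the object names are hypothetical.

```r
# Hypothetical 2 x 2 x 2 table (illustrative only)
# rows: exposed / unexposed; columns: died / survived; third dimension: stratum
tab <- array(c(10,  5, 90, 95,    # stratum 1: a, c, b, d (column-major order)
               40, 25, 60, 75),   # stratum 2: a, c, b, d
             dim = c(2, 2, 2))

n_i <- apply(tab, 3, sum)         # total number of people in each stratum

# OR_MH = sum(a_i * d_i / n_i) / sum(b_i * c_i / n_i)
or_mh <- sum(tab[1, 1, ] * tab[2, 2, ] / n_i) /
         sum(tab[1, 2, ] * tab[2, 1, ] / n_i)
or_mh

# The same estimate from base R
mantelhaen.test(tab)$estimate
```

Each stratum contributes to the numerator and denominator in proportion to its size, so large strata dominate the summary estimate.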

To examine whether the crude OR estimate for low grade, OR=2.66, is confounded by age at entry, we examine the age-specific estimates. First we assess whether there is any effect modification (also called ‘interaction’). The null hypothesis is ‘no effect modification’, and the large p-value (P=0.71) provides no evidence against it. The age-specific ORs are also reasonably similar (they range from 1.2 to 3.57 with no evidence of a trend), so it is appropriate to summarise them using the Mantel-Haenszel estimate, OR=1.56 (95% CI 1.20 to 2.03).

However, the results do show that the crude estimate of the effect of grade is strongly confounded by age at entry (crude OR = 2.66, compared to Mantel-Haenszel OR = 1.56).
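
The Mantel-Haenszel summary rate ratio named in the aim can also be computed by hand from stratum-level deaths and person-years. The sketch below uses base R and two made-up strata. The column names (`d1`, `t1` for the exposed group, `d0`, `t0` for the unexposed group) are hypothetical and the numbers are illustrative only.

```r
# Hypothetical deaths (d) and person-years (t) by stratum (illustrative only)
# suffix 1 = exposed (e.g. low grade), suffix 0 = unexposed (e.g. high grade)
strata <- data.frame(
  stratum = c("younger", "older"),
  d1 = c(10, 20), t1 = c(1000,  500),
  d0 = c( 5, 30), t0 = c(1000, 1500)
)

t_total <- strata$t1 + strata$t0

# Stratum-specific rate ratios
rr_stratum <- (strata$d1 / strata$t1) / (strata$d0 / strata$t0)

# Crude rate ratio, ignoring the strata
rr_crude <- (sum(strata$d1) / sum(strata$t1)) /
            (sum(strata$d0) / sum(strata$t0))

# RR_MH = sum(d1_i * t0_i / t_i) / sum(d0_i * t1_i / t_i)
rr_mh <- sum(strata$d1 * strata$t0 / t_total) /
         sum(strata$d0 * strata$t1 / t_total)

c(crude = rr_crude, mantel_haenszel = rr_mh)
```

In this toy example both stratum-specific rate ratios equal 2 and so does the Mantel-Haenszel estimate, while the crude ratio is about 1.43. A gap of this kind between the crude and the summary estimate is the sign of confounding by the stratifying variable.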

# Review exercise

### 1) In order to analyse CHD mortality, set the time and the CHD mortality outcome variables with stset (remember that the time variables are timein and timeout, the outcome is chd, the identifier is id, and the scale should be set to years). Use the tab command on the chd variable to check that the number of deaths from CHD corresponds with the number given in the stset output.

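No equivalent of `stset` is needed in R: the follow-up time (`followup_time`, already in years) and the outcome (`chd`) are passed directly to `Surv`.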

### 2) Recode age at entry into 5-year age-groups and investigate how CHD mortality varies according to age at entry to the study using strate.

In [None]:
survRate(Surv(followup_time/1000, chd) ~ agecat, 
         data=whitehall_df_3)

CHD mortality rates increase dramatically in the older age groups.

### 3) Examine the CHD mortality rates for low and high grades of employment. Use the stmh command to estimate the rate ratio for low-grade employees versus high-grade employees.

In [None]:
survRate(Surv(followup_time/1000, chd) ~ grade, 
         data=whitehall_df_3)

The rate for low-grade employees is almost double that for high-grade employees.

### 4) Use stmh to examine the effect of grade, stratified by age at entry:

Use `epiDisplay`'s `cc` for the unadjusted odds ratio and `mhor` for the age-adjusted (Mantel-Haenszel) odds ratio.

In [None]:
cc(whitehall_df_3$chd, whitehall_df_3$grade,
   graph = FALSE)

In [None]:
mhor(whitehall_df_3$chd, 
     whitehall_df_3$grade,
     whitehall_df_3$agecat, design = "cohort",
     graph = FALSE)

#### Is there any evidence of interaction between employment grade and age at entry? 

The crude OR of 1.87 is higher than the adjusted OR of 1.39, which suggests some confounding by age at entry.

#### Examine the result of the test for interaction. Is the effect of grade confounded by age at entry?

There is no clear trend in the age-specific ORs, and there is no strong evidence of confounding by age.