# Practical 7

## Aim

To learn how to carry out a simple logistic regression analysis.

In [None]:
library(tidyverse)

## Reading in the dataset and identifying relevant variables

In this practical session we will use a dataset from the study of helminths in Uganda.

To read in the dataset, type:

In [None]:
library(haven)

In [None]:
helminths_df <- read_dta("Data_files-20211113/helminths.dta")

In [None]:
head(helminths_df)

In this analysis we will work with the variable representing hookworm infection. It is currently called `hk_bin`. To make its meaning clearer, we will rename it. Type:

In [None]:
helminths_df_2 <- helminths_df %>%
    rename(hookworm = hk_bin)   # rename(new_name = old_name)

The hookworm infection variable is now called `hookworm`.
 
In this analysis we will look at the association between severe anaemia and exposure to hookworm infection.  We will also consider how (if at all) the association changes with age and malaria infection status. 
 
**anaemic_sev** is the variable name for severe anaemia  
    coded: 0=no, 1=yes

**hookworm** is the variable name for hookworm infection status  
    coded: 0=uninfected, 1=infected

**agegrp** is the variable name for age group (years)  
    coded: 0=<20, 1=20-24, 2=25-29, 3=30+

**malaria** is the variable name for malaria infection status  
    coded: 0=uninfected, 1=infected

To produce frequency distributions for `anaemic_sev`, `hookworm` and `agegrp`, use `CrossTable` from the `gmodels` package. Type:

In [None]:
library(gmodels)

In [None]:
CrossTable(helminths_df_2$anaemic_sev)

There were 275 women with severe anaemia.

In [None]:
CrossTable(helminths_df_2$hookworm)

1,022 women were hookworm infected and 1,395 were not infected with hookworm. 

In [None]:
CrossTable(helminths_df_2$agegrp)

There were 607 women aged <20 years; 906 women aged 20 to 24 years; 545 women aged 25 to 29 years; and 359 women aged 30+ years.

## Testing for an association

For an initial examination of the association between severe anaemia and hookworm infection, use `CrossTable` with column percentages and a chi-squared test. Type:

In [None]:
# rows: severe anaemia; columns: hookworm; column percentages and chi-squared test
CrossTable(helminths_df_2$anaemic_sev, helminths_df_2$hookworm,
           prop.r = FALSE, prop.c = TRUE, chisq = TRUE)

From the table we can see that 17.7% of women infected with hookworm had severe anaemia, compared with 6.7% of uninfected women. This is very strong evidence (P<0.001) against the null hypothesis of *no association between severe anaemia and hookworm infection.*
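
As a check on these figures, the percentages and the Pearson chi-squared statistic can be reproduced from the 2×2 counts. The counts below (181 of 1,022 infected and 94 of 1,395 uninfected women with severe anaemia) are not printed in this practical; they are reconstructed from the percentages and totals quoted above, so treat them as an assumption and compare them with your own `CrossTable` output.

```r
# 2x2 counts reconstructed from the percentages quoted in the text (an assumption)
# rows = hookworm status, columns = severe anaemia
counts <- matrix(c(1301,  94,
                    841, 181),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(hookworm = c("0", "1"),
                                 anaemic_sev = c("0", "1")))

# Percentage with severe anaemia within each hookworm group
pct_anaemic <- 100 * counts[, "1"] / rowSums(counts)
round(pct_anaemic, 1)

# Pearson chi-squared by hand: sum of (observed - expected)^2 / expected,
# where expected = row total * column total / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)
chi2 <- sum((counts - expected)^2 / expected)
p_value <- pchisq(chi2, df = 1, lower.tail = FALSE)
c(chi2 = chi2, p = p_value)

# The same test with chisq.test(); correct = FALSE turns off Yates' continuity correction
chisq.test(counts, correct = FALSE)
```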

To examine the odds of severe anaemia by hookworm infection status, we need an equivalent of Stata's `tabodds` command. R has no direct replacement, so we will calculate the odds and their confidence intervals by hand.

Create a two-way table of hookworm infection (rows) by severe anaemia (columns):

In [None]:
hookworm_anaemia_table <- 
    table(helminths_df_2$hookworm, helminths_df_2$anaemic_sev)

Calculate the odds of severe anaemia in each hookworm group (cases divided by non-cases):

In [None]:
hookworm_anaemia_odds <- 
    hookworm_anaemia_table[, 2] / hookworm_anaemia_table[, 1]

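Calculate the standard error of the log odds in each hookworm group, and the corresponding error factor:

In [None]:
# s.e. of the log odds, one per row: sqrt(1/cases + 1/controls)
hookworm_anaemia_se <- sqrt(1 / hookworm_anaemia_table[, 2] +
    1 / hookworm_anaemia_table[, 1])
# error factor used to build the 95% CI on the odds scale
hookworm_anaemia_ef <- exp(1.96 * hookworm_anaemia_se)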


Calculate the lower and upper bounds of the 95% confidence interval:

In [None]:
hookworm_anaemia_lower <- hookworm_anaemia_odds / hookworm_anaemia_ef
hookworm_anaemia_upper <- hookworm_anaemia_odds * hookworm_anaemia_ef
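
For reference, with $d$ women with severe anaemia (cases) and $h$ women without it (controls) in a given hookworm group, the odds, the standard error of the log odds, the error factor (EF) and the 95% confidence interval are:

$$\text{odds} = \frac{d}{h}, \qquad \text{s.e.}(\log \text{odds}) = \sqrt{\frac{1}{d} + \frac{1}{h}}$$

$$\text{EF} = \exp\left(1.96 \times \text{s.e.}(\log \text{odds})\right), \qquad 95\%\ \text{CI} = \left(\frac{\text{odds}}{\text{EF}},\ \text{odds} \times \text{EF}\right)$$

Each group therefore has its own standard error. As a worked example, assuming the counts $d = 181$ and $h = 841$ for hookworm-infected women (reconstructed from the percentages quoted earlier), the odds are $181/841 \approx 0.215$, the standard error is $\sqrt{1/181 + 1/841} \approx 0.082$, $\text{EF} \approx 1.174$, and the 95% CI is approximately 0.183 to 0.253.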

Bind them together into a data frame and give the columns readable names:

In [None]:
hookworm_anaemia_df <- data.frame(
    controls = hookworm_anaemia_table[, 1],  # women without severe anaemia
    cases    = hookworm_anaemia_table[, 2],  # women with severe anaemia
    odds     = hookworm_anaemia_odds,
    lower_95 = hookworm_anaemia_lower,       # lower 95% confidence limit
    upper_95 = hookworm_anaemia_upper        # upper 95% confidence limit
)

Now view the output:

In [None]:
hookworm_anaemia_df
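
The same calculation can be wrapped into a small reusable function. `odds_by_group()` below is a hypothetical helper written for illustration, not part of any package: it takes a two-way table with exposure groups in rows and the outcome (0, 1) in columns, and returns the odds and 95% CI for each row, in the spirit of Stata's `tabodds`. The example counts are reconstructed from the percentages quoted earlier and are an assumption.

```r
# Hypothetical helper (not from any package): odds of the outcome in each
# exposure group, with 95% CIs based on the standard error of the log odds
odds_by_group <- function(tab, z = qnorm(0.975)) {
  controls <- tab[, 1]                        # outcome = 0 (non-cases)
  cases    <- tab[, 2]                        # outcome = 1 (cases)
  odds     <- cases / controls
  se_log   <- sqrt(1 / cases + 1 / controls)  # one s.e. per row
  ef       <- exp(z * se_log)                 # error factor
  data.frame(controls = controls, cases = cases, odds = odds,
             lower_95 = odds / ef, upper_95 = odds * ef)
}

# Counts reconstructed from the percentages quoted in the text (an assumption)
counts <- matrix(c(1301, 94, 841, 181), nrow = 2, byrow = TRUE,
                 dimnames = list(hookworm = c("0", "1"),
                                 anaemic_sev = c("0", "1")))
res <- odds_by_group(counts)
print(res, digits = 3)
```

With real data, `odds_by_group(hookworm_anaemia_table)` should give the same result as the step-by-step cells, apart from the tiny difference between `qnorm(0.975)` and 1.96.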

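Finally, test for homogeneity of the odds (equal odds in both groups). Setting `correct = FALSE` gives the Pearson chi-squared test without Yates' continuity correction:

In [None]:
table(helminths_df_2$hookworm, helminths_df_2$anaemic_sev) %>%
    chisq.test(correct = FALSE)   # uncorrected Pearson chi-squared test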

We can see that the odds of severe anaemia are greater among hookworm-infected women, and the P-value (P<0.001) provides very strong evidence against the null hypothesis of no difference in the odds of severe anaemia by hookworm infection status. We can therefore conclude that the underlying ‘true’ odds of severe anaemia are greater in hookworm-infected women than in uninfected women.

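Use `epi.2by2` from the `epiR` package to obtain an odds ratio estimate. Both variables are converted to factors with level 1 first, because `epi.2by2` expects the exposed group in the first row and the outcome-positive group in the first column. Type: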

In [None]:
library("epiR")

In [None]:
epi.2by2(table(factor(helminths_df_2$hookworm, levels = c(1, 0)),
         factor(helminths_df_2$anaemic_sev, levels = c(1, 0))),
         method = "cross.sectional", digits = 2)

Therefore, the odds of severe anaemia in hookworm-infected women are 2.98 times those in hookworm-uninfected women (95% CI 2.29 to 3.88, P<0.001).
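
The odds ratio is simply the ratio of the two odds, and a 95% CI can be obtained from the standard error of the log odds ratio (Woolf's method). The sketch below assumes the 2×2 counts reconstructed from the percentages quoted earlier; with these counts it should reproduce the estimate and interval quoted above to two decimal places.

```r
# Counts reconstructed from the percentages quoted in the text (an assumption)
inf_case    <- 181   # hookworm infected, severe anaemia
inf_ctrl    <- 841   # hookworm infected, no severe anaemia
uninf_case  <- 94    # uninfected, severe anaemia
uninf_ctrl  <- 1301  # uninfected, no severe anaemia

# Odds ratio = odds in infected / odds in uninfected
or <- (inf_case / inf_ctrl) / (uninf_case / uninf_ctrl)

# Woolf standard error of log(OR): square root of the sum of reciprocal cell counts
se_log_or <- sqrt(1 / inf_case + 1 / inf_ctrl + 1 / uninf_case + 1 / uninf_ctrl)

# 95% CI: back-transform log(OR) +/- 1.96 * s.e.
ci <- exp(log(or) + c(-1, 1) * 1.96 * se_log_or)
round(c(OR = or, lower = ci[1], upper = ci[2]), 2)
```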

## Logistic regression with one binary exposure 