
# Question 1

Self-reported injuries among left- and right-handed people were compared in a
survey of 1896 college students in British Columbia, Canada. Of the 180
left-handed students, 93 reported at least one injury, and 619 of the 1716
right-handed students reported at least one injury in the same period. Arrange the
data in a 2 × 2 table and calculate the proportion of people with at least one
injury during the period of observation for each group.

In [None]:
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors


In [None]:
# Define the data
data <- tibble(
  Handedness = c("Left-handed", "Right-handed"),
  Total = c(180, 1716),
  Injuries = c(93, 619),
  No_Injuries = c(180 - 93, 1716 - 619)
)

# Add proportions
data <- data |>
  mutate(
    Proportion_Injuries = Injuries / Total
  )

data


Handedness,Total,Injuries,No_Injuries,Proportion_Injuries
<chr>,<dbl>,<dbl>,<dbl>,<dbl>
Left-handed,180,93,87,0.5166667
Right-handed,1716,619,1097,0.3607226


In [None]:
# Display the proportion of injuries for each group
data |>
  select(Handedness, Proportion_Injuries)


Handedness,Proportion_Injuries
<chr>,<dbl>
Left-handed,0.5166667
Right-handed,0.3607226
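
The question asks for a 2 × 2 layout, which the tibble above only approximates (it also carries the totals). As a minimal base-R sketch, the same counts can be held in a matrix with one row per handedness group and one column per injury status. The object names `injury_table` and `injury_props` are illustrative and not part of the original notebook.

```r
# Counts taken from the question: rows are handedness, columns are injury status
injury_table <- matrix(
  c(93, 180 - 93,
    619, 1716 - 619),
  nrow = 2,
  byrow = TRUE,
  dimnames = list(
    Handedness = c("Left-handed", "Right-handed"),
    Injury = c("At least one", "None")
  )
)

# Row proportions: each row sums to 1
injury_props <- prop.table(injury_table, margin = 1)

injury_table
round(injury_props, 3)
```

The first column of `injury_props` holds the proportions requested in the question, and the grand total of `injury_table` gives the number of students surveyed.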


# Question 2

A study was conducted to evaluate the hypothesis that tea consumption and
premenstrual syndrome are associated. A group of 188 nursing students and
64 tea factory workers were given questionnaires. The prevalence of
premenstrual syndrome was 39% among the nursing students and 77% among
the tea factory workers. How many people in each group have premenstrual
syndrome? Arrange the data in a 2 × 2 table.

In [None]:
# Define the data
data <- tibble(
  Group = c("Nursing Students", "Tea Factory Workers"),
  Total = c(188, 64),
  Prevalence = c(0.39, 0.77)
)

# Calculate the number with and without PMS
data <- data |>
  mutate(
    With_PMS = round(Prevalence * Total),
    Without_PMS = Total - With_PMS
  )

# Arrange data in a 2x2 table
table_2x2 <- data |>
  select(Group, With_PMS, Without_PMS)

# Print the 2x2 table
table_2x2

Group,With_PMS,Without_PMS
<chr>,<dbl>,<dbl>
Nursing Students,73,115
Tea Factory Workers,49,15
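
Because the prevalences are reported as whole percentages, the counts have to be rounded to whole people. One way to sanity-check the rounding, sketched here with illustrative vector names, is to convert the counts back to percentages and confirm that they reproduce the reported 39% and 77%.

```r
# Group sizes from the question and the rounded counts from the table above
total <- c(nursing_students = 188, tea_factory_workers = 64)
with_pms <- c(nursing_students = 73, tea_factory_workers = 49)

# Prevalence implied by the whole-number counts, as a rounded percentage
implied_percent <- round(100 * with_pms / total)
implied_percent
```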


# Question 3

The relationship between prior condom use and tubal pregnancy was assessed in a population-based case-control study at Group Health Cooperative of Puget Sound during 1981–1986. The results are shown in Table E1.3.

**Question:** Compute the group size and the proportion of subjects in each group who never used condoms.

**Table E1.3**

| Condom Use | Cases | Controls |
|------------|-------|----------|
| Never      | 176   | 488      |
| Always     | 51    | 186      |

In [None]:
# Load the necessary library
library(tidyverse)

# Define the data
data <- tibble(
  Condom_Use = c("Never", "Always"),
  Cases = c(176, 51),
  Controls = c(488, 186)
)

# Compute the group totals
group_totals <- data |>
  summarise(
    Total_Cases = sum(Cases),
    Total_Controls = sum(Controls)
  )

group_totals

Total_Cases,Total_Controls
<dbl>,<dbl>
227,674


In [None]:
# Add proportions for "Never" condom use
data <- data |>
  filter(Condom_Use == "Never") |>
  mutate(
    Proportion_Cases = Cases / group_totals$Total_Cases,
    Proportion_Controls = Controls / group_totals$Total_Controls
  )

data


Condom_Use,Cases,Controls,Proportion_Cases,Proportion_Controls
<chr>,<dbl>,<dbl>,<dbl>,<dbl>
Never,176,488,0.7753304,0.7240356


# Question 4


Epidemic keratoconjunctivitis (EKC) or “shipyard eye” is an acute infectious disease of the eye. A case of EKC is defined as an illness:
- Consisting of redness, tearing, and pain in one or both eyes for more than three days’ duration;
- Diagnosed as EKC by an ophthalmologist.

In late October 1977, physician A (one of the two ophthalmologists providing the majority of specialized eye care to the residents of a central Georgia county; population 45,000) saw a nurse who had returned from a vacation in Korea with severe EKC. She received symptomatic therapy and was warned that her eye infection could spread to others; nevertheless, numerous cases of illness similar to hers soon occurred in the patients and staff of the nursing home (nursing home A) where she worked. Table E1.4 provides the exposure history of 22 persons with EKC between 27 October 1977 and 13 January 1978 (when the outbreak stopped after proper control techniques were initiated). Nursing home B, included in this table, is the only other area chronic-care facility.

**Question:** Compute and compare the proportions of cases from the two nursing homes. What would be your conclusion?

**Table E1.4**

| Exposure Cohort | Number Exposed | Number of Cases |
|-----------------|----------------|-----------------|
| Nursing Home A  | 64             | 16              |
| Nursing Home B  | 238            | 6               |

---

In [None]:
data <- tibble(
  exposure_cohort = c("Nursing home A", "Nursing home B"),
  number_exposed = c(64,238),
  number_of_cases = c(16,6)
)
data

exposure_cohort,number_exposed,number_of_cases
<chr>,<dbl>,<dbl>
Nursing home A,64,16
Nursing home B,238,6


In [None]:
data |>
  mutate(
    prop_cases = number_of_cases/number_exposed
  )

exposure_cohort,number_exposed,number_of_cases,prop_cases
<chr>,<dbl>,<dbl>,<dbl>
Nursing home A,64,16,0.25
Nursing home B,238,6,0.02521008


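The proportion of cases in nursing home A (25%) is about ten times that in nursing home B (2.5%), which points to nursing home A, where the nurse with the initial infection worked, as the focus of the outbreak.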
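
The size of the difference can be made explicit by dividing one proportion by the other. The following is a minimal sketch using the counts from Table E1.4; the vector names are illustrative.

```r
# Counts from Table E1.4
cases <- c(home_a = 16, home_b = 6)
exposed <- c(home_a = 64, home_b = 238)

# Proportion of exposed persons who became cases in each home
prop_cases <- cases / exposed

# Ratio of the two proportions (nursing home A relative to nursing home B)
ratio_a_to_b <- unname(prop_cases["home_a"] / prop_cases["home_b"])
round(ratio_a_to_b, 2)
```

The ratio comes out at roughly 9.9, so the proportion in nursing home A is close to, but slightly under, ten times the proportion in nursing home B.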


In [None]:
prop_test <- prop.test(
  x = data$number_of_cases,
  n = data$number_exposed,
  alternative = "two.sided"
)
prop_test

“Chi-squared approximation may be incorrect”



	2-sample test for equality of proportions with continuity correction

data:  data$number_of_cases out of data$number_exposed
X-squared = 34.48, df = 1, p-value = 4.308e-09
alternative hypothesis: two.sided
95 percent confidence interval:
 0.1069371 0.3426427
sample estimates:
    prop 1     prop 2 
0.25000000 0.02521008 


In [None]:
# Define the contingency table
contingency_table <- matrix(
  c(16, 64 - 16, 6, 238 - 6),  # Each row: cases, non-cases for one nursing home
  nrow = 2,
  byrow = TRUE
)
colnames(contingency_table) <- c("Cases", "Non-Cases")
rownames(contingency_table) <- c("Nursing Home A", "Nursing Home B")

# Perform Fisher's Exact Test
fisher_test <- fisher.test(contingency_table, alternative = "two.sided")

# Print the test results
print(fisher_test)



	Fisher's Exact Test for Count Data

data:  contingency_table
p-value = 8.445e-08
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
  4.452899 41.850736
sample estimates:
odds ratio 
   12.7318 



In [None]:
contingency_table

,Cases,Non-Cases
Nursing Home A,16,48
Nursing Home B,6,232


# Question 5


In August 1976, tuberculosis was diagnosed in a high school student (index case) in Corinth, Mississippi. Subsequently, laboratory studies revealed that the student’s disease was caused by drug-resistant tubercle bacilli. An epidemiologic investigation was conducted at the high school. Table E1.5 gives the rate of positive tuberculin reactions, determined for various groups of students according to the degree of exposure to the index case.

**Questions:**
1. Compute and compare the proportions of positive cases for the two exposure levels. What would be your conclusion?
2. Calculate the odds ratio associated with high exposure. Does this result support your conclusion in part (1)?

**Table E1.5**

| Exposure Level | Number Tested | Number Positive |
|----------------|---------------|-----------------|
| High           | 129           | 63              |
| Low            | 325           | 36              |

---


In [None]:
# Define the data
data <- tibble(
  exposure_level = c("High", "Low"),
  number_tested = c(129, 325),
  number_positive = c(63, 36)
)

# Calculate proportions
data <- data |>
  mutate(
    proportion_positive = number_positive / number_tested
  )
# Print the data with proportions
data

exposure_level,number_tested,number_positive,proportion_positive
<chr>,<dbl>,<dbl>,<dbl>
High,129,63,0.4883721
Low,325,36,0.1107692


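Students with high exposure were far more likely to have a positive tuberculin reaction than those with low exposure (48.8% versus 11.1%), which suggests that the risk of infection increases with the degree of exposure to the index case.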

In [None]:
# Calculate the odds ratio using a contingency table
contingency_table <- matrix(
  c(63, 129 - 63, 36, 325 - 36),  # Positive and negative cases for each group
  nrow = 2,
  byrow = TRUE
)
rownames(contingency_table) <- c("High Exposure", "Low Exposure")
colnames(contingency_table) <- c("Positive", "Negative")

# Perform Fisher's Exact Test; the odds ratio it reports is a conditional maximum likelihood estimate
fisher_test <- fisher.test(contingency_table, alternative = "two.sided")

# Print results
print(fisher_test)



	Fisher's Exact Test for Count Data

data:  contingency_table
p-value < 2.2e-16
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
  4.566232 12.884197
sample estimates:
odds ratio 
  7.616828 



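Yes. The estimated odds ratio of about 7.6 (95% confidence interval 4.6 to 12.9) is well above 1, so the odds of a positive reaction are much higher with high exposure, which supports the conclusion in part (1).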
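
`fisher.test()` reports a conditional maximum likelihood estimate of the odds ratio, which is usually close to, but not identical to, the sample odds ratio that the exercise asks for. As a minimal sketch with illustrative vector names, the sample odds ratio can be computed directly from the counts in Table E1.5.

```r
# Counts from Table E1.5
positive <- c(high = 63, low = 36)
tested <- c(high = 129, low = 325)
negative <- tested - positive

# Odds of a positive reaction within each exposure level
odds <- positive / negative

# Sample (cross-product) odds ratio for high versus low exposure
or_high <- unname(odds["high"] / odds["low"])
round(or_high, 2)
```

The hand-computed value is about 7.66, slightly different from the 7.62 shown in the Fisher output, and it leads to the same conclusion.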

# Question 6


Consider the data taken from a study that attempts to determine whether the use of electronic fetal monitoring (EFM) during labor affects the frequency of cesarean section deliveries. Of the 5,824 infants included in the study, 2,850 were electronically monitored and 2,974 were not. The outcomes are listed in Table E1.6.

**Questions:**
1. Compute and compare the proportions of cesarean delivery for the two exposure groups. What would be your conclusion?
2. Calculate the odds ratio associated with EFM exposure. Does this result support your conclusion in part (1)?

**Table E1.6**

| Cesarean Delivery | EFM Exposure Yes | EFM Exposure No | Total |
|-------------------|------------------|------------------|-------|
| Yes               | 358              | 229              | 587   |
| No                | 2,492            | 2,745            | 5,237 |
| **Total**         | 2,850            | 2,974            | 5,824 |

---



In [None]:
# Data for EFM Exposure
efm_data <- tibble(
  cesarean_delivery = c("Yes", "No"),
  efm_yes = c(358, 2492),
  efm_no = c(229, 2745)
)

# Add totals
efm_data <- efm_data |>
  mutate(
    total_yes = sum(efm_yes),
    total_no = sum(efm_no)
  )

# Compute proportions
efm_data <- efm_data |>
  mutate(
    prop_efm_yes = efm_yes / total_yes,
    prop_efm_no = efm_no / total_no
  )
efm_data



cesarean_delivery,efm_yes,efm_no,total_yes,total_no,prop_efm_yes,prop_efm_no
<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
Yes,358,229,2850,2974,0.125614,0.07700067
No,2492,2745,2850,2974,0.874386,0.92299933


In [None]:
# Calculate odds for each group
odds_efm_yes <- efm_data$efm_yes[1] / efm_data$efm_yes[2]
odds_efm_no <- efm_data$efm_no[1] / efm_data$efm_no[2]

# Calculate odds ratio
odds_ratio_efm <- odds_efm_yes / odds_efm_no
odds_ratio_efm
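
The ratio of the two odds is algebraically the same as the cross-product ratio of the 2 × 2 table. The sketch below, with illustrative vector names, recomputes the odds ratio that way from the counts in Table E1.6 as a cross-check.

```r
# Counts from Table E1.6
cesarean <- c(efm_yes = 358, efm_no = 229)
no_cesarean <- c(efm_yes = 2492, efm_no = 2745)

# Cross-product form of the odds ratio: (a * d) / (b * c)
or_efm <- (cesarean[["efm_yes"]] * no_cesarean[["efm_no"]]) /
  (no_cesarean[["efm_yes"]] * cesarean[["efm_no"]])
round(or_efm, 2)
```

The odds ratio is about 1.72. Because it is above 1, the odds of cesarean delivery are higher in the monitored group, which agrees with the comparison of proportions in part (1) (about 12.6% with EFM versus 7.7% without).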

# Question 7

A study was conducted to investigate the effectiveness of bicycle safety helmets in preventing head injury. The data consist of a random sample of 793 persons who were involved in bicycle accidents during a one-year period (Table E1.7).

**Questions:**
1. Compute and compare the proportions of head injury for the group with helmets versus the group without helmets. What would be your conclusion?
2. Calculate the odds ratio associated with not using helmets. Does this result support your conclusion in part (1)?

**Table E1.7**

| Head Injury | Wearing Helmet Yes | Wearing Helmet No | Total |
|-------------|---------------------|-------------------|-------|
| Yes         | 17                  | 130               | 147   |
| No          | 218                 | 428               | 646   |

In [None]:
# Data for Helmet Use
helmet_data <- tibble(
  head_injury = c("Yes", "No"),
  helmet_yes = c(17, 218),
  helmet_no = c(130, 428)
)

# Add totals
helmet_data <- helmet_data |>
  mutate(
    total_yes = sum(helmet_yes),
    total_no = sum(helmet_no)
  )

# Compute proportions
helmet_data <- helmet_data |>
  mutate(
    prop_helmet_yes = helmet_yes / total_yes,
    prop_helmet_no = helmet_no / total_no
  )

helmet_data



head_injury,helmet_yes,helmet_no,total_yes,total_no,prop_helmet_yes,prop_helmet_no
<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
Yes,17,130,235,558,0.07234043,0.2329749
No,218,428,235,558,0.92765957,0.7670251


In [None]:
# Calculate odds for each group
odds_helmet_yes <- helmet_data$helmet_yes[1] / helmet_data$helmet_yes[2]
odds_helmet_no <- helmet_data$helmet_no[1] / helmet_data$helmet_no[2]

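# Calculate the odds ratio associated with not wearing a helmet
# (odds of head injury without a helmet relative to odds with a helmet)
odds_ratio_no_helmet <- odds_helmet_no / odds_helmet_yes
odds_ratio_no_helmet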
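
The direction of an odds ratio matters: the odds ratio for not wearing a helmet is the reciprocal of the odds ratio for wearing one. The following minimal sketch, with illustrative vector names, computes both directions from the counts in Table E1.7.

```r
# Counts from Table E1.7
injured <- c(helmet = 17, no_helmet = 130)
not_injured <- c(helmet = 218, no_helmet = 428)

# Odds of head injury within each group
odds <- injured / not_injured

# Odds ratio for head injury: no helmet relative to helmet
or_no_helmet <- odds[["no_helmet"]] / odds[["helmet"]]

# Odds ratio in the opposite direction: helmet relative to no helmet
or_helmet <- odds[["helmet"]] / odds[["no_helmet"]]

c(no_helmet = or_no_helmet, helmet = or_helmet)
```

The odds ratio associated with not using a helmet is about 3.9, and its reciprocal, about 0.26, is the odds ratio associated with helmet use. Both say the same thing as the proportions in part (1): head injury was more common among riders without helmets (about 23.3% versus 7.2%).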