# Chi-Square & Fisher's Exact Tests in R

## Overview

These tests assess relationships between **categorical variables**. Rather than comparing means, they work with cell counts: the chi-square tests compare observed frequencies to the frequencies expected under the null hypothesis (no association, or a specified distribution for goodness-of-fit).

| Test | Use Case |
|---|---|
| Chi-square test of independence | Are two categorical variables associated? (2+ levels each) |
| Chi-square goodness-of-fit | Does an observed frequency distribution match an expected one? |
| Fisher's exact test | Independence test for small samples where chi-square assumptions are violated |
| McNemar's test | Paired/repeated categorical data — change in proportions for matched pairs |

## Applications by Sector

| Sector | Example |
|---|---|
| **Ecology** | Is species presence/absence associated with habitat type? Do substrate categories differ in frequency across sites? Is the observed ratio of life stages consistent with expected population structure? |
| **Healthcare** | Is disease incidence associated with exposure category? Does treatment assignment differ by patient demographic? Is the observed proportion of adverse events consistent with background rates? |
| **Finance** | Is loan default associated with credit score category? Are fraud flags distributed equally across transaction types? |
| **Insurance** | Is claim filing rate associated with policy tier? Does coverage type differ between age groups? |

---

## Assumptions Checklist

**Chi-square test of independence:**
- [ ] Both variables are categorical (nominal or ordinal)
- [ ] Observations are independent
- [ ] Expected frequency ≥ 5 in at least 80% of cells — check before running
- [ ] Expected frequency ≥ 1 in all cells
- [ ] No single observation contributes to more than one cell

> **If expected frequencies are too low:** Use Fisher's exact test (works for any sample size but computationally intensive for large tables), or collapse categories if conceptually justified.

**Chi-square goodness-of-fit:**
- [ ] One categorical variable with 2+ levels
- [ ] Expected frequencies specified a priori (not derived from the data)
- [ ] Expected frequency ≥ 5 per cell

**Fisher's exact test:**
- [ ] Categorical variables (typically 2×2, but R handles larger tables)
- [ ] Independent observations
- [ ] Use when any expected cell frequency < 5
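
For larger tables that are too sparse for the chi-square approximation and too big for a fast exact test, base R offers a Monte Carlo p-value through the `simulate.p.value` argument of `chisq.test()` and `fisher.test()`. The sketch below is only an illustration: `sparse_tab` and its counts are made up for this example and do not come from any dataset used in this notebook.

```r
# Hypothetical sparse 3x4 table of counts (illustrative values only)
sparse_tab <- matrix(
  c(2, 0, 1, 4,
    1, 3, 0, 2,
    0, 1, 5, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    Site      = c("A", "B", "C"),
    Substrate = c("Sand", "Silt", "Gravel", "Rock")
  )
)

set.seed(42)  # simulated p-values vary from run to run without a seed

# Monte Carlo chi-square: reference distribution from B random tables
# that share the observed row and column totals
mc_chisq <- chisq.test(sparse_tab, simulate.p.value = TRUE, B = 10000)

# Monte Carlo version of Fisher's exact test for tables larger than 2x2
mc_fisher <- fisher.test(sparse_tab, simulate.p.value = TRUE, B = 10000)

cat(sprintf("Simulated chi-square p = %.3f\n", mc_chisq$p.value))
cat(sprintf("Simulated Fisher p     = %.3f\n", mc_fisher$p.value))
```

A larger `B` gives a more stable p-value at the cost of run time; the value 10000 here is an arbitrary choice.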

---

## Setup

In [None]:
# ── Libraries ────────────────────────────────────────────────────────────────
library(tidyverse)    # data manipulation and visualization (loads ggplot2, dplyr, tidyr)
library(rstatix)      # tidy-friendly effect sizes
library(vcd)          # mosaic plots and association measures

# ── Reproducibility ──────────────────────────────────────────────────────────
set.seed(42)

## Data

We use one built-in dataset and one hand-built table:
- `HairEyeColor`: hair color × eye color × sex frequencies, used for the independence and goodness-of-fit examples
- A small illustrative 2×2 table with small expected counts, used for the Fisher's exact test example

In [None]:
# ── HairEyeColor: collapse to 2D table (summed over both sexes) ──────────────
hair_eye <- margin.table(HairEyeColor, margin = c(1, 2))
print(hair_eye)

# ── Convert to data frame for ggplot ─────────────────────────────────────────
hair_eye_df <- as.data.frame(hair_eye)
head(hair_eye_df)

# ── Small 2x2 table: species detection at two sites ──────────────────────────
# Simulates: was a species detected (yes/no) at two habitat types
detection_table <- matrix(
  c(3, 1,   # disturbed: detected, not detected
    8, 2),  # undisturbed: detected, not detected
  nrow = 2, byrow = TRUE,   # fill row by row so the counts match the comments above
  dimnames = list(
    Habitat  = c("Disturbed", "Undisturbed"),
    Detected = c("Yes", "No")
  )
)
print(detection_table)
# Small counts — Fisher's exact test is more appropriate than chi-square here

---

## Assumptions Testing

### Check Expected Cell Frequencies

This is the key assumption to verify *before* choosing between chi-square and Fisher's exact test.

In [None]:
# ── Expected frequencies for hair × eye table ────────────────────────────────
chisq_test <- chisq.test(hair_eye)
round(chisq_test$expected, 1)
# Rule of thumb: at least 80% of expected values >= 5 and none < 1
# (for a 2x2 table, all four expected values should be >= 5)
# If that fails: consider Fisher's exact or collapsing categories

# ── Proportion of cells with expected >= 5 ───────────────────────────────────
expected <- chisq_test$expected
cat(sprintf("Cells with expected >= 5: %.0f%%\n",
            mean(expected >= 5) * 100))
# Target: >= 80% of cells have expected frequency >= 5

# ── Check detection table (small n) ──────────────────────────────────────────
chisq.test(detection_table)$expected
# R also warns here that the chi-squared approximation may be incorrect
# Expected values < 5 → use Fisher's exact test instead
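
The expected counts that `chisq.test()` reports come straight from the marginal totals: each cell's expected count is its row total times its column total, divided by the grand total. A minimal sketch that reproduces them by hand (the object names `tab`, `expected_manual`, and `expected_r` are chosen for this example only):

```r
# Collapse HairEyeColor over sex to get the hair x eye table
tab <- margin.table(HairEyeColor, margin = c(1, 2))

# Expected count under independence: (row total * column total) / grand total
expected_manual <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Same values that chisq.test() reports
expected_r <- chisq.test(tab)$expected
all.equal(unname(expected_manual), unname(expected_r))

# Summary against the rule-of-thumb thresholds from the checklist
c(min_expected    = min(expected_manual),
  share_at_least5 = mean(expected_manual >= 5))
```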

---

## Chi-Square Test of Independence

**Question:** Is hair color associated with eye color?  
**H₀:** Hair color and eye color are independent  
**H₁:** Hair color and eye color are associated

In [None]:
# ── Chi-square test ───────────────────────────────────────────────────────────
chisq_result <- chisq.test(hair_eye)
print(chisq_result)

# ── Observed vs. expected frequencies ────────────────────────────────────────
cat("\nObserved:\n"); print(chisq_result$observed)
cat("\nExpected:\n"); print(round(chisq_result$expected, 1))

# ── Standardized residuals ────────────────────────────────────────────────────
# |residual| > 2 flags the cells that contribute most to the association
cat("\nStandardized residuals:\n")
print(round(chisq_result$stdres, 2))

# ── Effect size: Cramér's V ───────────────────────────────────────────────────
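# Ranges 0-1; comparable across tables of different sizes
# Rough benchmarks when the smaller dimension has 2 levels: small 0.1 | medium 0.3 | large 0.5
# (benchmarks shrink as min(rows, cols) grows, so treat them as a loose guide)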
rstatix::cramer_v(hair_eye)

# ── Mosaic plot: visualize cell contributions ────────────────────────────────
vcd::mosaic(hair_eye,
            shade = TRUE,        # color by Pearson residuals
            legend = TRUE,
            main = "Hair Color × Eye Color Association")
# Blue cells: observed > expected; Red cells: observed < expected
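
Cramér's V rescales the chi-square statistic by sample size and table dimensions: V = sqrt(X² / (n × (min(rows, cols) − 1))). The sketch below computes it by hand and then inflates the table tenfold to show why the effect size matters alongside the p-value. `cramers_v()` is a helper written for this example, not the `rstatix` function, and it skips the continuity correction (which only affects 2×2 tables).

```r
tab <- margin.table(HairEyeColor, margin = c(1, 2))

# Cramér's V = sqrt( X² / (n * (min(rows, cols) - 1)) )
cramers_v <- function(tbl) {
  x2 <- unname(chisq.test(tbl, correct = FALSE)$statistic)
  sqrt(x2 / (sum(tbl) * (min(dim(tbl)) - 1)))
}

v_original <- cramers_v(tab)

# Same cell proportions, ten times the sample size:
# X² grows tenfold, V stays where it was
tab_x10 <- tab * 10
v_x10   <- cramers_v(tab_x10)

c(X2_original = unname(chisq.test(tab)$statistic),
  X2_x10      = unname(chisq.test(tab_x10)$statistic),
  V_original  = v_original,
  V_x10       = v_x10)
```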

---

## Chi-Square Goodness-of-Fit Test

**Question:** Is the observed distribution of hair colors consistent with an expected distribution?  
**H₀:** Observed frequencies match expected proportions  
**H₁:** Observed frequencies deviate from expected proportions

> The expected proportions must come from theory, prior data, or a reference population — **not** from the data being tested.

In [None]:
# ── Observed hair color frequencies ──────────────────────────────────────────
hair_counts <- margin.table(HairEyeColor, margin = 1)
print(hair_counts)

# ── Test against equal expected proportions ───────────────────────────────────
chisq.test(hair_counts)
# Default: tests against equal proportions across all categories

# ── Test against specified expected proportions ───────────────────────────────
# Hypothetical expected proportions from a reference population
# chisq.test() matches p by position, not by name: keep the order of names(hair_counts)
expected_props <- c(Black = 0.10, Brown = 0.50, Red = 0.10, Blond = 0.30)
chisq.test(hair_counts, p = expected_props)

# ── Visualization ─────────────────────────────────────────────────────────────
obs_df <- data.frame(
  Hair     = names(hair_counts),
  Observed = as.numeric(hair_counts),
  Expected = as.numeric(sum(hair_counts) * expected_props)
) %>%
  pivot_longer(cols = c(Observed, Expected), names_to = "Type", values_to = "Count")

ggplot(obs_df, aes(x = Hair, y = Count, fill = Type)) +
  geom_col(position = "dodge") +
  scale_fill_manual(values = c("#4a8fff", "#ff6b6b")) +
  labs(title = "Observed vs. Expected Hair Color Frequencies",
       subtitle = "Goodness-of-fit test against reference proportions",
       y = "Frequency", x = "Hair Color") +
  theme_minimal()
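
The goodness-of-fit statistic is the sum of (observed − expected)² / expected over categories, with k − 1 degrees of freedom. The sketch below computes it by hand and also guards against a common slip: `chisq.test()` pairs `p` with the counts by position, so a reference vector listed in another order needs aligning by name first. `ref_props` reuses the hypothetical proportions from the example above in a shuffled order.

```r
hair_counts <- margin.table(HairEyeColor, margin = 1)

# Hypothetical reference proportions, deliberately listed in a different order
ref_props <- c(Brown = 0.50, Blond = 0.30, Black = 0.10, Red = 0.10)

# Align by name so each proportion lines up with the right category
p_aligned <- ref_props[names(hair_counts)]
stopifnot(!anyNA(p_aligned), isTRUE(all.equal(sum(p_aligned), 1)))

# Goodness-of-fit statistic by hand: sum((O - E)^2 / E), df = k - 1
observed  <- as.numeric(hair_counts)
expected  <- sum(observed) * as.numeric(p_aligned)
x2_manual <- sum((observed - expected)^2 / expected)
p_manual  <- pchisq(x2_manual, df = length(observed) - 1, lower.tail = FALSE)

# Matches the built-in test
gof <- chisq.test(hair_counts, p = p_aligned)
all.equal(x2_manual, unname(gof$statistic))
```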

---

## Fisher's Exact Test

**Question:** Is species detection associated with habitat type, given small sample sizes?  

Fisher's exact test computes the exact probability of the observed table (and all more extreme tables) under H₀, with the row and column totals held fixed. It does not rely on the chi-square approximation and is valid for any sample size, including very small n.

> Use Fisher's exact test when any expected cell frequency < 5 in a 2×2 table, or when n < 20.

In [None]:
# ── Fisher's exact test ───────────────────────────────────────────────────────
fisher_result <- fisher.test(detection_table)
print(fisher_result)
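# Reports: p-value, odds ratio (conditional MLE), and 95% CI for the odds ratio

# ── Interpretation of odds ratio ─────────────────────────────────────────────
# Here: odds of the first column (Yes) in the first row (Disturbed),
#       relative to the second row (Undisturbed)
# OR > 1: event more likely in first group
# OR < 1: event less likely in first group
# OR = 1: no association (H0)
# 95% CI excluding 1 usually agrees with p < 0.05, but the two are computed
# differently and can disagree near the boundary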

# ── Two-sided vs. one-sided ───────────────────────────────────────────────────
# Two-sided (default): tests for any association
# One-sided: only justified with directional a priori hypothesis
fisher.test(detection_table, alternative = "greater")  # first group has higher odds

# ── Visualization for 2x2 table ───────────────────────────────────────────────
as.data.frame(as.table(detection_table)) %>%   # as.table() yields Habitat / Detected / Freq columns
  ggplot(aes(x = Habitat, y = Freq, fill = Detected)) +
  geom_col(position = "fill") +
  scale_fill_manual(values = c("#4fffb0", "#ff6b6b")) +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Species Detection by Habitat Type",
       subtitle = "Fisher's exact test (small sample)",
       y = "Proportion", x = "Habitat") +
  theme_minimal()
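
To make "the exact probability of the observed table and all more extreme tables" concrete: once the margins are fixed, the top-left cell of a 2×2 table follows a hypergeometric distribution, and the two-sided p-value sums the probabilities of every feasible table that is no more likely than the observed one. A sketch using the same counts as the detection example (the variable names are chosen for this illustration):

```r
# Same counts as the detection example, filled row by row
tab <- matrix(c(3, 1,
                8, 2),
              nrow = 2, byrow = TRUE,
              dimnames = list(Habitat  = c("Disturbed", "Undisturbed"),
                              Detected = c("Yes", "No")))

m     <- sum(tab[, 1])   # total "Yes"
n     <- sum(tab[, 2])   # total "No"
k     <- sum(tab[1, ])   # total in the first row
x_obs <- tab[1, 1]       # observed top-left cell

support <- max(0, k - n):min(k, m)     # feasible values of the top-left cell
probs   <- dhyper(support, m, n, k)    # probability of each feasible table
p_obs   <- dhyper(x_obs, m, n, k)      # probability of the observed table

# Two-sided p-value: total probability of tables no more likely than the
# observed one (the small tolerance guards against floating-point ties)
p_manual <- sum(probs[probs <= p_obs * (1 + 1e-7)])

all.equal(p_manual, fisher.test(tab)$p.value)
```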

---

## McNemar's Test (Paired Categorical Data)

**Question:** Did the proportion of individuals classified in each category change between two time points or conditions?  

Use when the same subjects are classified under two conditions — the categorical analog to the paired t-test.

In [None]:
# ── Simulated example: site classification before and after restoration ────────
# 'Healthy' vs 'Degraded' at same sites, two time points
before_after <- matrix(
  c(20, 10,   # before healthy: remained healthy, became degraded
    5,  15),  # before degraded: became healthy, remained degraded
  nrow = 2, byrow = TRUE,   # fill row by row so the counts match the comments above
  dimnames = list(
    Before = c("Healthy", "Degraded"),
    After  = c("Healthy", "Degraded")
  )
)
print(before_after)

# ── McNemar's test ────────────────────────────────────────────────────────────
mcnemar.test(before_after)
# Tests whether off-diagonal cells (discordant pairs) are symmetric
# p < 0.05: significant change in classification between conditions
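
McNemar's test uses only the discordant pairs. With R's default continuity correction the statistic is (|b − c| − 1)² / (b + c) on 1 degree of freedom, where b and c are the two off-diagonal counts. The sketch below reproduces it by hand and adds an exact binomial test on the discordant pairs, a common alternative when those counts are small. Object names such as `n_hd` and `n_dh` are chosen for this example.

```r
# Same counts as the restoration example, filled row by row
tab <- matrix(c(20, 10,
                 5, 15),
              nrow = 2, byrow = TRUE,
              dimnames = list(Before = c("Healthy", "Degraded"),
                              After  = c("Healthy", "Degraded")))

n_hd <- tab["Healthy", "Degraded"]   # healthy before, degraded after
n_dh <- tab["Degraded", "Healthy"]   # degraded before, healthy after

# McNemar statistic with continuity correction, df = 1
x2_manual <- (abs(n_hd - n_dh) - 1)^2 / (n_hd + n_dh)
p_manual  <- pchisq(x2_manual, df = 1, lower.tail = FALSE)

# Matches the built-in test
mc <- mcnemar.test(tab)
all.equal(x2_manual, unname(mc$statistic))

# Exact alternative: under H0 each discordant pair is equally likely
# to fall in either off-diagonal cell
binom.test(n_hd, n_hd + n_dh, p = 0.5)$p.value
```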

---

## Reporting Results

In [None]:
# ── Chi-square reporting format ───────────────────────────────────────────────
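# Tiny p-values print as "p < .001" rather than a misleading "p = 0.000"
fmt_p <- function(p) if (p < 0.001) "p < .001" else sprintf("p = %.3f", p)

cat(sprintf(
  "Chi-square test of independence: X²(%d) = %.2f, %s, V = %.2f\n",
  as.integer(chisq_result$parameter),
  chisq_result$statistic,
  fmt_p(chisq_result$p.value),
  as.numeric(rstatix::cramer_v(hair_eye))
))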
# Report as: X²(df) = X.XX, p = .XXX, Cramér's V = .XX

# ── Fisher's exact reporting format ──────────────────────────────────────────
cat(sprintf(
  "Fisher's exact test: OR = %.2f, p = %.3f, 95%% CI [%.2f, %.2f]\n",
  fisher_result$estimate,
  fisher_result$p.value,
  fisher_result$conf.int[1],
  fisher_result$conf.int[2]
))
# Report as: OR = X.XX, p = .XXX, 95% CI [X.XX, X.XX]

---

## Common Pitfalls

**1. Using chi-square with small expected frequencies**  
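Always check expected frequencies before running. For a 2×2 table, switch to Fisher's exact test if any expected count is below 5; for larger tables, do so if more than 20% of cells fall below 5 or any cell falls below 1. Running `chisq.test()` on sparse tables gives unreliable p-values.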

**2. Confusing observed and expected frequencies**  
The test compares observed counts to what would be expected *if the variables were independent*. Expected frequencies are calculated from marginal totals — they are not predetermined proportions unless running a goodness-of-fit test.

**3. Using chi-square for paired or repeated categorical data**  
If the same subjects are measured under two conditions, use McNemar's test. Standard chi-square assumes independent observations.

**4. Ignoring effect size**  
A significant chi-square with large n can reflect a trivially small association. Always report Cramér's V alongside the p-value.

**5. Collapsing categories to fix expected frequency violations**  
Only collapse categories if it is conceptually defensible, not just to make the test work. Document the decision.

**6. Using proportions instead of counts as input**  
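`chisq.test()` requires raw counts, not proportions or percentages. It has no argument for the total n, so proportions are treated as if they were counts and the statistic and p-value come out wrong. Convert back to counts (proportion × n) before testing.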
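
A short sketch of how far off the result can be, using the hair × eye table: feeding cell proportions to `chisq.test()` shrinks the statistic by a factor of n, while multiplying the proportions back by n recovers the correct answer. The object names here are chosen for this illustration.

```r
counts <- margin.table(HairEyeColor, margin = c(1, 2))
props  <- prop.table(counts)   # cell proportions summing to 1

x2_counts <- unname(chisq.test(counts)$statistic)

# Proportions are treated as counts with a total of 1; R warns about the
# tiny expected values, and the statistic shrinks by a factor of n
x2_props <- unname(suppressWarnings(chisq.test(props))$statistic)

c(counts = x2_counts, props = x2_props, ratio = x2_counts / x2_props)

# Recover counts from proportions when the total n is known
recovered <- round(props * sum(counts))
all.equal(unname(chisq.test(recovered)$statistic), x2_counts)
```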

---
*r_methods_library · Samantha McGarrigle · [github.com/samantha-mcgarrigle](https://github.com/samantha-mcgarrigle)*