# T-Tests in R

## Overview

T-tests compare means between one or two groups to determine whether observed differences are likely to reflect a true population-level difference or are attributable to random chance.

| Test | Use Case |
|---|---|
| One-sample t-test | Compare a sample mean to a known or hypothesized value |
| Independent two-sample t-test | Compare means from two unrelated groups |
| Welch's t-test | Two independent groups with unequal variances (preferred default) |
| Paired t-test | Compare means from the same subjects under two conditions |

## Applications by Sector

| Sector | Example |
|---|---|
| **Ecology** | Is mean shell length of *Mya arenaria* at an acidified site different from a reference population? Do two tidal marsh sites differ in mean bird abundance? |
| **Healthcare** | Is mean recovery time for patients on treatment A different from the population average? Are post-op outcomes different between two surgical approaches? |
| **Finance** | Is the mean return of a portfolio significantly different from a benchmark? Did a policy change shift average transaction values? |
| **Insurance** | Is the mean claim amount different between two policyholder segments? Did an intervention change average time-to-settlement? |

---

## Assumptions Checklist

Review these before running any t-test. Tests for each are run inline below.

**All t-tests:**
- [ ] Data are continuous (interval or ratio scale)
- [ ] Observations are independent (no repeated measures unless using paired test)
- [ ] No extreme outliers that would distort the mean

**One-sample & independent two-sample:**
- [ ] Data are approximately normally distributed *within each group* (less critical with n > 30 due to CLT)

**Independent two-sample (Student's):**
- [ ] Equal variances between groups (use Levene's or F-test to check; if violated, use Welch's instead)

**Paired t-test:**
- [ ] The *differences* between pairs are approximately normally distributed
- [ ] Pairs are meaningfully matched (same subject, same site, etc.)

> **Tip:** Welch's t-test does not assume equal variances and performs nearly as well as Student's when variances *are* equal. It is the safer default for two-sample comparisons.

---

## Setup

In [None]:
# ── Libraries ────────────────────────────────────────────────────────────────
library(tidyverse)    # data manipulation and ggplot2 visualization
library(ggpubr)       # publication-ready plots with stat annotations
library(car)          # Levene's test for equality of variances
library(effectsize)   # Cohen's d and other effect size measures
library(rstatix)      # tidy-friendly statistical tests

# ── Reproducibility ──────────────────────────────────────────────────────────
set.seed(42)

## Data

We use two built-in datasets:
- `sleep`: paired data — extra hours of sleep gained by patients on two drugs (n=10 per group)
- `iris`: independent groups — petal lengths for two *Iris* species

These are small, well-understood datasets. The same code structure applies directly to larger industry datasets.

In [None]:
# ── sleep dataset ─────────────────────────────────────────────────────────────
# extra: additional hours of sleep vs. control
# group: drug 1 or drug 2
# ID: patient identifier (paired)
head(sleep)

# ── iris subset: two species for independent samples example ──────────────────
iris_two <- iris %>%
  filter(Species %in% c("setosa", "versicolor")) %>%
  droplevels()   # drop the unused "virginica" level so Species has exactly two levels

# Quick summaries
sleep %>% group_by(group) %>%
  summarise(n = n(), mean = mean(extra), sd = sd(extra))

iris_two %>% group_by(Species) %>%
  summarise(n = n(), mean = mean(Petal.Length), sd = sd(Petal.Length))

---

## Assumptions Testing

### Normality

We use the **Shapiro-Wilk test** (best for n < 50) and **Q-Q plots** (visual confirmation).  
- H₀: data are normally distributed  
- p > 0.05 → no evidence against normality  
- p ≤ 0.05 → evidence of non-normality → consider nonparametric alternative (see `nonparametric_tests.ipynb`)

> With n > 30, mild non-normality is generally acceptable due to the Central Limit Theorem. With small samples, normality matters more.

In [None]:
# ── Shapiro-Wilk: sleep data, by group ───────────────────────────────────────
sleep %>%
  group_by(group) %>%
  summarise(shapiro_p = shapiro.test(extra)$p.value)
# Interpretation: if p > 0.05 in both groups, there is no evidence against normality

# ── Shapiro-Wilk: iris petal lengths, by species ─────────────────────────────
iris_two %>%
  group_by(Species) %>%
  summarise(shapiro_p = shapiro.test(Petal.Length)$p.value)

# ── Q-Q plots: visual check ───────────────────────────────────────────────────
par(mfrow = c(1, 2))
qqnorm(sleep$extra[sleep$group == 1], main = "Q-Q: Drug 1")
qqline(sleep$extra[sleep$group == 1], col = "steelblue")
qqnorm(sleep$extra[sleep$group == 2], main = "Q-Q: Drug 2")
qqline(sleep$extra[sleep$group == 2], col = "steelblue")
par(mfrow = c(1, 1))
# Points should fall close to the diagonal line → normality supported

### Equality of Variances (for independent two-sample test only)

We use **Levene's test** (robust to non-normality, preferred over Bartlett's):  
- H₀: variances are equal across groups  
- p > 0.05 → no evidence of unequal variances → Student's t-test is defensible (Welch's remains a safe choice)  
- p ≤ 0.05 → variances are unequal → use Welch's t-test (`var.equal = FALSE`)

In [None]:
# ── Levene's test: iris petal lengths ────────────────────────────────────────
car::leveneTest(Petal.Length ~ Species, data = iris_two)
# If p <= 0.05: use Welch's (var.equal = FALSE) in the t-test below
# If p > 0.05: no evidence of unequal variances; Student's t-test is defensible

# ── Visual check: boxplot with variance comparison ───────────────────────────
ggplot(iris_two, aes(x = Species, y = Petal.Length, fill = Species)) +
  geom_boxplot(alpha = 0.7) +
  geom_jitter(width = 0.1, alpha = 0.4) +
  labs(title = "Petal Length by Species",
       subtitle = "Visual check: are spreads (IQRs) similar?",
       y = "Petal Length (cm)") +
  theme_minimal() +
  theme(legend.position = "none")

---

## One-Sample T-Test

**Question:** Is the mean extra sleep in drug group 1 significantly different from zero?  
(i.e., does drug 1 have any effect at all?)

**H₀:** μ = 0 (drug has no effect)  
**H₁:** μ ≠ 0 (drug has an effect)

In [None]:
# ── One-sample t-test ─────────────────────────────────────────────────────────
drug1 <- sleep$extra[sleep$group == 1]

t.test(drug1, mu = 0)  # mu = hypothesized population mean

# ── Effect size: Cohen's d ────────────────────────────────────────────────────
# d = (sample mean - hypothesized mean) / SD
# Small: 0.2 | Medium: 0.5 | Large: 0.8
effectsize::cohens_d(drug1, mu = 0)

# ── Interpretation guide ──────────────────────────────────────────────────────
# t:  test statistic — larger absolute value = more evidence against H0
# df: degrees of freedom (n - 1)
# p-value: probability of a result at least this extreme if H0 were true
# 95% CI: plausible range for the true mean (or mean difference, for two-sample and paired tests)
# Report as: t(df) = X.XX, p = X.XX, d = X.XX, 95% CI [X.XX, X.XX]
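
The quantities in the interpretation guide can be reproduced from their definitions. The following is a minimal sketch in base R; the names `mu0`, `se`, `t_stat`, `p_val`, `ci`, and `d_hand` are illustrative and do not appear elsewhere in this notebook.

```r
# One-sample t-test on drug group 1, computed by hand
drug1 <- sleep$extra[sleep$group == 1]
mu0   <- 0                                   # hypothesized mean under H0

n      <- length(drug1)
df     <- n - 1                              # degrees of freedom
se     <- sd(drug1) / sqrt(n)                # standard error of the mean
t_stat <- (mean(drug1) - mu0) / se           # t = (sample mean - mu0) / SE
p_val  <- 2 * pt(-abs(t_stat), df)           # two-sided p-value
ci     <- mean(drug1) + c(-1, 1) * qt(0.975, df) * se   # 95% CI for the mean
d_hand <- (mean(drug1) - mu0) / sd(drug1)    # Cohen's d for one sample

# Compare against t.test(); both comparisons should return TRUE
res <- t.test(drug1, mu = mu0)
all.equal(unname(res$statistic), t_stat)
all.equal(res$p.value, p_val)
```

Seeing the standard error in the denominator makes it clear why the same mean difference yields a larger |t| as n grows, while Cohen's d (which divides by the SD, not the SE) does not.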

---

## Independent Two-Sample T-Test (Welch's — recommended default)

**Question:** Is mean petal length different between *Iris setosa* and *I. versicolor*?  

**H₀:** μ₁ = μ₂  
**H₁:** μ₁ ≠ μ₂

> `var.equal = FALSE` is the default in R's `t.test()` and gives Welch's test.  
> Set `var.equal = TRUE` only if Levene's test gave no evidence of unequal variances *and* you have theoretical reason to assume equal population variances.

In [None]:
# ── Welch's two-sample t-test (default) ──────────────────────────────────────
t.test(Petal.Length ~ Species, data = iris_two, var.equal = FALSE)

# ── Student's t-test (equal variances assumed) ────────────────────────────────
# Only use if Levene's test was non-significant
t.test(Petal.Length ~ Species, data = iris_two, var.equal = TRUE)

# ── Effect size: Cohen's d ────────────────────────────────────────────────────
effectsize::cohens_d(Petal.Length ~ Species, data = iris_two)

# ── Visualization ─────────────────────────────────────────────────────────────
ggboxplot(iris_two, x = "Species", y = "Petal.Length",
          fill = "Species", palette = c("#4a8fff", "#4fffb0"),
          add = "jitter") +
  stat_compare_means(method = "t.test",
                     label = "p.format",
                     label.x = 1.4) +
  labs(title = "Petal Length by Species",
       subtitle = "Welch's two-sample t-test",
       y = "Petal Length (cm)") +
  theme(legend.position = "none")
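
Welch's test differs from Student's in two places: the standard error is built from each group's own variance, and the degrees of freedom come from the Welch-Satterthwaite approximation, which is why the reported df is usually fractional. The sketch below reproduces both by hand in base R; the names `v1`, `v2`, `t_welch`, and `df_welch` are illustrative.

```r
# Welch's t statistic and degrees of freedom, computed by hand
setosa     <- iris$Petal.Length[iris$Species == "setosa"]
versicolor <- iris$Petal.Length[iris$Species == "versicolor"]

v1 <- var(setosa) / length(setosa)            # squared standard error, group 1
v2 <- var(versicolor) / length(versicolor)    # squared standard error, group 2

t_welch  <- (mean(setosa) - mean(versicolor)) / sqrt(v1 + v2)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v1 + v2)^2 /
  (v1^2 / (length(setosa) - 1) + v2^2 / (length(versicolor) - 1))

# Compare against t.test() (var.equal = FALSE is the default)
res <- t.test(setosa, versicolor)
c(t_by_hand = t_welch, t_test = unname(res$statistic))
c(df_by_hand = df_welch, df_test = unname(res$parameter))
```

When the two groups have similar variances and sizes, `df_welch` approaches n1 + n2 - 2 and the result is close to Student's test.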

---

## Paired T-Test

**Question:** Does extra sleep differ between drug 2 and drug 1 in the *same patients*?

**H₀:** mean difference = 0  
**H₁:** mean difference ≠ 0  

> The paired test is more powerful than the independent test when data are matched, because it removes between-subject variability. Use it whenever measurements come from the same subject, site, or unit under two conditions.

In [None]:
# ── Reshape to wide format for clarity ───────────────────────────────────────
sleep_wide <- sleep %>%
  pivot_wider(names_from = group,
              values_from = extra,
              names_prefix = "drug_")
head(sleep_wide)

# ── Check normality of DIFFERENCES (not raw values) ──────────────────────────
differences <- sleep_wide$drug_2 - sleep_wide$drug_1
shapiro.test(differences)

# ── Paired t-test ─────────────────────────────────────────────────────────────
t.test(sleep_wide$drug_2, sleep_wide$drug_1, paired = TRUE)
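# Equivalent formula interface (R >= 4.0.0). Note that `extra ~ group` combined
# with paired = TRUE is rejected by recent R versions (4.4.0 and later):
# t.test(Pair(drug_2, drug_1) ~ 1, data = sleep_wide)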

# ── Effect size ───────────────────────────────────────────────────────────────
effectsize::cohens_d(sleep_wide$drug_2, sleep_wide$drug_1, paired = TRUE)

# ── Visualization: paired lines plot ─────────────────────────────────────────
ggpaired(sleep, x = "group", y = "extra",
         id = "ID",
         fill = "group",
         palette = c("#4a8fff", "#4fffb0"),
         line.color = "gray60",
         line.size = 0.5) +
  stat_compare_means(method = "t.test", paired = TRUE,   # without method, the label shows a Wilcoxon p-value
                     label = "p.format",
                     label.x = 1.4) +
  labs(title = "Extra Sleep by Drug",
       subtitle = "Paired t-test — same patients, two conditions",
       y = "Extra Sleep (hours)",
       x = "Drug") +
  theme(legend.position = "none")
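
A paired t-test is a one-sample t-test on the within-pair differences, which is why only the differences need to be approximately normal. The sketch below checks that equivalence in base R and contrasts it with a Welch test that ignores the pairing; the object names are illustrative, and it relies on the `sleep` rows being sorted by patient ID within each group.

```r
# Sort explicitly so that element i of each vector belongs to the same patient
s      <- sleep[order(sleep$group, sleep$ID), ]
drug_1 <- s$extra[s$group == 1]
drug_2 <- s$extra[s$group == 2]

paired_res <- t.test(drug_2, drug_1, paired = TRUE)   # paired t-test
diff_res   <- t.test(drug_2 - drug_1, mu = 0)         # one-sample test on differences
welch_res  <- t.test(drug_2, drug_1)                  # treats groups as independent

# The first two are the same test, so this should return TRUE
all.equal(unname(paired_res$statistic), unname(diff_res$statistic))

# Ignoring the pairing leaves between-patient variability in the error term
c(paired_p = paired_res$p.value, welch_p = welch_res$p.value)
```

On these data the paired test gives a much smaller p-value than the unpaired one, because patients who respond strongly to one drug tend to respond strongly to the other.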

---

## Reporting Results

Standard APA-style reporting format:

In [None]:
# ── Extract and format results cleanly ────────────────────────────────────────
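# x/y interface: the formula interface with paired = TRUE fails in recent R versions,
# and this ordering (drug 2 - drug 1) matches the sign of d below. Requires sleep_wide from above.
result <- t.test(sleep_wide$drug_2, sleep_wide$drug_1, paired = TRUE)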
d      <- effectsize::cohens_d(sleep_wide$drug_2, sleep_wide$drug_1, paired = TRUE)

cat(sprintf(
  "Paired t-test: t(%d) = %.2f, p = %.3f, d = %.2f, 95%% CI [%.2f, %.2f]\n",
  result$parameter,
  result$statistic,
  result$p.value,
  d$Cohens_d,
  result$conf.int[1],
  result$conf.int[2]
))

# ── Effect size interpretation ────────────────────────────────────────────────
# Conventional benchmarks (Cohen, 1988); interpret relative to your field
# |d| < 0.2         : negligible
# 0.2 <= |d| < 0.5  : small
# 0.5 <= |d| < 0.8  : medium
# |d| >= 0.8        : large
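
The `sprintf()` call above uses `%d` for the degrees of freedom, which works for paired and one-sample tests but not for Welch's fractional df. One possible way to handle both is sketched below; `format_t_report()` is a hypothetical helper written for this example, not a function from any package, and the effect size is computed by hand as the mean difference divided by the SD of the differences.

```r
# Hypothetical helper: format an htest object returned by t.test()
format_t_report <- function(res, d = NA_real_) {
  df <- unname(res$parameter)
  # Print whole-number df as an integer, fractional (Welch) df with two decimals
  df_txt <- if (abs(df - round(df)) < 1e-8) {
    sprintf("%d", as.integer(round(df)))
  } else {
    sprintf("%.2f", df)
  }
  # Very small p-values are reported as an inequality rather than "0.000"
  p_txt <- if (res$p.value < 0.001) "p < .001" else sprintf("p = %.3f", res$p.value)
  d_txt <- if (is.na(d)) "" else sprintf(", d = %.2f", d)
  sprintf("t(%s) = %.2f, %s%s, 95%% CI [%.2f, %.2f]",
          df_txt, unname(res$statistic), p_txt, d_txt,
          res$conf.int[1], res$conf.int[2])
}

# Paired example (drug 2 - drug 1)
drug_1 <- sleep$extra[sleep$group == 1]
drug_2 <- sleep$extra[sleep$group == 2]
d_z    <- mean(drug_2 - drug_1) / sd(drug_2 - drug_1)
out_paired <- format_t_report(t.test(drug_2, drug_1, paired = TRUE), d = d_z)

# Welch example with fractional df and no effect size supplied
iris_sub  <- droplevels(subset(iris, Species != "virginica"))
out_welch <- format_t_report(t.test(Petal.Length ~ Species, data = iris_sub))

cat(out_paired, out_welch, sep = "\n")
```

Keeping the formatting in one function makes it harder for the sign of t, the sign of d, and the direction of the confidence interval to drift apart between reports.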

---

## Common Pitfalls

**1. Using Student's t-test without checking equal variances**  
Default to Welch's (`var.equal = FALSE`). It is robust when variances are equal and correct when they are not.

**2. Reporting only p-values without effect sizes**  
A p-value indicates how compatible the data are with no difference; Cohen's d indicates how large the difference is. With large n, trivially small differences will be statistically significant. Always report both.

**3. Forgetting that the paired test requires normality of *differences*, not raw values**  
Run `shapiro.test()` on the difference vector, not on each group separately.

**4. Using a two-sample test on paired data**  
This discards the pairing structure and loses statistical power. If data are matched (same subject pre/post, same patient on two treatments), always use the paired test.

**5. Multiple comparisons**  
If running t-tests across many groups, type I error accumulates. Use ANOVA with post-hoc correction (see `anova.ipynb`) or apply Bonferroni/FDR correction.

**6. Confusing one-tailed vs. two-tailed tests**  
Two-tailed is the default and appropriate for most cases. One-tailed is only justified if you had a directional hypothesis *before* seeing the data. In industry contexts, two-tailed is almost always the right choice.
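
To make pitfall 5 concrete, the sketch below runs all pairwise Welch t-tests across the three *Iris* species and compares unadjusted p-values with Holm- and Bonferroni-adjusted ones, using base R's `pairwise.t.test()`. The object names are illustrative, and the family-wise error formula assumes independent tests, which pairwise comparisons are not, so treat it as a rough guide.

```r
# Number of pairwise comparisons among 3 groups
k <- choose(nlevels(iris$Species), 2)

# Approximate chance of at least one false positive at alpha = 0.05
# (exact only for independent tests)
fwer <- 1 - (1 - 0.05)^k

# Pairwise Welch t-tests (pool.sd = FALSE), with and without correction
raw  <- pairwise.t.test(iris$Petal.Length, iris$Species,
                        p.adjust.method = "none", pool.sd = FALSE)
holm <- pairwise.t.test(iris$Petal.Length, iris$Species,
                        p.adjust.method = "holm", pool.sd = FALSE)
bonf <- pairwise.t.test(iris$Petal.Length, iris$Species,
                        p.adjust.method = "bonferroni", pool.sd = FALSE)

fwer
raw$p.value    # lower-triangular matrix of unadjusted p-values
holm$p.value   # Holm: never larger than Bonferroni
bonf$p.value   # Bonferroni: each p-value multiplied by k, capped at 1
```

With three comparisons the uncorrected family-wise error rate is already about 14%, and it grows quickly as groups are added.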

---

## Nonparametric Alternatives

If normality assumptions are violated and n is small:

| T-Test | Nonparametric Equivalent | R Function |
|---|---|---|
| One-sample | Wilcoxon signed-rank | `wilcox.test(x, mu = 0)` |
| Independent two-sample | Mann-Whitney U | `wilcox.test(y ~ group)` |
| Paired | Wilcoxon signed-rank | `wilcox.test(x, y, paired = TRUE)` |

See `nonparametric_tests.ipynb` for full implementations.

---
*r_methods_library · Samantha McGarrigle · [github.com/samantha-mcgarrigle](https://github.com/samantha-mcgarrigle)*