# Class 1 Group Exercise: Analyzing the Effects of Vitamin C on Tooth Growth

## Background
The `ToothGrowth` dataset records an experiment on the effect of vitamin C on tooth growth in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, or 2 mg/day) by one of two delivery methods: orange juice (coded `OJ`) or ascorbic acid (coded `VC`). This gives 10 animals in each dose-by-method group.

## Your Task
Working in groups, perform a comprehensive statistical analysis to answer: **Does delivery method affect tooth growth, and what sample size would we need for a follow-up study?**

This exercise combines skills from:
- Data manipulation and visualization (GettingStarted)
- Hypothesis testing (HypothesisTesting)
- Correlation analysis (Correlation)
- Power analysis (PowerAnalysis)

## Part 1: Data Exploration (15 minutes)

1. Load the `ToothGrowth` dataset and examine its structure
2. Calculate summary statistics (mean, sd) for tooth length by supplement type AND dose
3. Create a boxplot showing tooth length by supplement type, faceted by dose
4. Create a scatter plot of dose vs length, colored by supplement type
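
One possible starting point for the steps above, offered as a sketch rather than the expected answer. It assumes `dplyr` 1.0 or later (for the `.groups` argument); the object names `summary_stats`, `p_box`, and `p_scatter` are illustrative and not part of the exercise.

```r
library(dplyr)
library(ggplot2)

data("ToothGrowth")

# Mean and standard deviation of tooth length for each supplement x dose group
summary_stats <- ToothGrowth %>%
  group_by(supp, dose) %>%
  summarise(n = n(), mean_len = mean(len), sd_len = sd(len), .groups = "drop")
print(summary_stats)

# Boxplot of tooth length by supplement type, one panel per dose level
p_box <- ggplot(ToothGrowth, aes(x = supp, y = len, fill = supp)) +
  geom_boxplot() +
  facet_wrap(~ dose, labeller = label_both) +
  labs(x = "Supplement type", y = "Tooth length")

# Scatter plot of dose vs length; horizontal jitter separates overlapping points
p_scatter <- ggplot(ToothGrowth, aes(x = dose, y = len, color = supp)) +
  geom_jitter(width = 0.05, height = 0) +
  labs(x = "Dose (mg/day)", y = "Tooth length", color = "Supplement")
```

Printing `p_box` or `p_scatter` in a notebook cell renders the plot. Because dose takes only three values, the jitter is purely cosmetic and can be dropped in favor of `geom_point()`.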

In [None]:
library(ggplot2)
library(dplyr)

# Load data
data("ToothGrowth")
head(ToothGrowth)

# Your code here: Summary statistics by group


# Your code here: Boxplot


# Your code here: Scatter plot

## Part 2: Normality Assessment (10 minutes)

Before performing t-tests, check whether tooth length is approximately normally distributed within each group being compared.

1. Create histograms of tooth length for each supplement type
2. Create Q-Q plots for each supplement type
3. Perform Shapiro-Wilk tests for normality on each group
4. Based on your findings, is it appropriate to use a t-test?
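
A minimal base-R sketch of these checks, assuming the two supplement groups are pooled across dose as in the code cell that follows. The names `len_by_supp`, `sw_tests`, and `sw_p` are illustrative.

```r
data("ToothGrowth")

# Split tooth length into one vector per supplement type (OJ, VC)
len_by_supp <- split(ToothGrowth$len, ToothGrowth$supp)

# Shapiro-Wilk test per group; a small p-value suggests departure from normality
sw_tests <- lapply(len_by_supp, shapiro.test)
sw_p <- sapply(sw_tests, function(res) res$p.value)
print(sw_p)

# Histograms (top row) and Q-Q plots (bottom row) for each group
op <- par(mfrow = c(2, 2))
for (g in names(len_by_supp)) {
  hist(len_by_supp[[g]], main = paste("Histogram:", g), xlab = "Tooth length")
}
for (g in names(len_by_supp)) {
  qqnorm(len_by_supp[[g]], main = paste("Q-Q plot:", g))
  qqline(len_by_supp[[g]])
}
par(op)
```

Each supplement group here mixes three dose levels, so an apparent departure from normality may partly reflect the dose effect rather than the shape of the within-dose distribution. Repeating the checks within each dose level is one way to probe that, although each test would then rest on only 10 observations.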

In [None]:
# Separate data by supplement type
oj_data <- ToothGrowth[ToothGrowth$supp == "OJ", "len"]
vc_data <- ToothGrowth[ToothGrowth$supp == "VC", "len"]

# Your code here: Histograms


# Your code here: Q-Q plots


# Your code here: Shapiro-Wilk tests


## Part 3: Hypothesis Testing (15 minutes)

Test whether there is a significant difference in tooth length between delivery methods.

1. State your null and alternative hypotheses
2. Perform a two-sample t-test comparing OJ vs VC (overall, ignoring dose)
3. Perform t-tests for each dose level separately (0.5, 1.0, and 2.0 mg/day)
4. Interpret the results: at which dose levels is there a significant difference?
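
The per-dose comparisons can be run in a loop instead of three separate calls. The sketch below is one way to do it, using the same Welch `t.test()` call as the overall test. The Holm adjustment at the end is an optional addition not required by the exercise; it is included only because running three tests raises the multiple comparisons issue. The names `per_dose`, `p_values`, and `p_holm` are illustrative.

```r
data("ToothGrowth")

dose_levels <- sort(unique(ToothGrowth$dose))

# One Welch two-sample t-test (OJ vs VC) per dose level
per_dose <- lapply(dose_levels, function(d) {
  t.test(len ~ supp, data = ToothGrowth[ToothGrowth$dose == d, ])
})
names(per_dose) <- dose_levels

# Collect the raw p-values, then apply a Holm correction across the three tests
p_values <- sapply(per_dose, function(res) res$p.value)
p_holm <- p.adjust(p_values, method = "holm")

print(round(cbind(raw = p_values, holm = p_holm), 4))
```

Inspecting `per_dose[["1"]]` (or any other dose) shows the full test output, including the confidence interval for the difference in means.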

In [None]:
# Overall t-test: OJ vs VC (t.test() defaults to Welch's unequal-variance test)
t.test(len ~ supp, data = ToothGrowth)

# Your code here: T-test for dose = 0.5


# Your code here: T-test for dose = 1.0


# Your code here: T-test for dose = 2.0


## Part 4: Correlation Analysis (10 minutes)

Examine the relationship between dose and tooth length.

1. Calculate the Pearson correlation between dose and length for each supplement type
2. Is the correlation stronger for OJ or VC?
3. Create a scatter plot with regression lines for each supplement type
4. What does this tell you about the dose-response relationship?
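
As a sketch of how both correlations and the fitted lines might be produced together, assuming a straight-line fit is an adequate first look at the dose-response relationship. The names `cor_by_supp` and `p_fit` are illustrative.

```r
library(ggplot2)

data("ToothGrowth")

# Pearson correlation between dose and length, computed within each supplement type
cor_by_supp <- sapply(split(ToothGrowth, ToothGrowth$supp), function(df) {
  cor(df$dose, df$len, method = "pearson")
})
print(cor_by_supp)

# Scatter plot with a separate least-squares line for each supplement type
p_fit <- ggplot(ToothGrowth, aes(x = dose, y = len, color = supp)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Dose (mg/day)", y = "Tooth length", color = "Supplement")
```

`cor()` returns only the coefficient; `cor.test()`, used in the code cell for OJ, additionally reports a p-value and confidence interval.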

In [None]:
# Correlation for OJ
oj_subset <- ToothGrowth[ToothGrowth$supp == "OJ", ]
cor.test(oj_subset$dose, oj_subset$len)

# Your code here: Correlation for VC


# Your code here: Scatter plot with regression lines


## Part 5: Power Analysis (10 minutes)

Plan a follow-up study to confirm the most interesting finding.

1. Using the dose level with the largest difference between OJ and VC, calculate Cohen's d effect size
2. What is the statistical power of the original study (n=10 per group) to detect this effect?
3. How many guinea pigs per group would you need for 80% power? For 90% power?
4. If budget limits you to 15 animals per group, what power would you achieve?
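
The four questions above can be sketched as follows. This version computes Cohen's d by hand from the pooled standard deviation so the formula is visible, then passes it to `pwr.t.test()` from the `pwr` package. It assumes a two-sided test at a significance level of 0.05, which the exercise does not specify. The variable names are illustrative.

```r
library(pwr)

data("ToothGrowth")

oj <- ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == 1]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == 1]

# Cohen's d = (mean difference) / (pooled standard deviation)
n1 <- length(oj)
n2 <- length(vc)
pooled_sd <- sqrt(((n1 - 1) * var(oj) + (n2 - 1) * var(vc)) / (n1 + n2 - 2))
d <- (mean(oj) - mean(vc)) / pooled_sd

# Power of the original design (10 animals per group) for this effect size
power_orig <- pwr.t.test(n = 10, d = d, sig.level = 0.05,
                         type = "two.sample")$power

# Animals per group needed for 80% and 90% power, rounded up
n_80 <- ceiling(pwr.t.test(d = d, power = 0.80, sig.level = 0.05,
                           type = "two.sample")$n)
n_90 <- ceiling(pwr.t.test(d = d, power = 0.90, sig.level = 0.05,
                           type = "two.sample")$n)

# Power achieved with 15 animals per group
power_15 <- pwr.t.test(n = 15, d = d, sig.level = 0.05,
                       type = "two.sample")$power

print(c(d = d, power_orig = power_orig, n_80 = n_80, n_90 = n_90,
        power_15 = power_15))
```

An effect size estimated from the same small sample tends to be optimistic, so sample sizes derived from it are best treated as a lower bound when planning the follow-up.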

In [None]:
library(effsize)
library(pwr)

# Filter to dose with largest difference (hint: try dose = 1.0)
dose1_oj <- ToothGrowth[ToothGrowth$supp == "OJ" & ToothGrowth$dose == 1.0, "len"]
dose1_vc <- ToothGrowth[ToothGrowth$supp == "VC" & ToothGrowth$dose == 1.0, "len"]

# Your code here: Calculate Cohen's d


# Your code here: Power of original study


# Your code here: Sample size for 80% and 90% power


# Your code here: Power with n=15


## Group Discussion Questions

1. **Main Finding:** Based on your analysis, does delivery method (OJ vs VC) affect tooth growth? Does the answer depend on dose?

2. **Clinical Significance:** Even if the difference in tooth length is statistically significant, is it clinically meaningful? What additional information would you need to decide?

3. **Study Design:** If you were designing a follow-up study:
   - What sample size would you recommend and why?
   - Would you test all dose levels or focus on specific ones?
   - What potential confounding variables should be controlled?

4. **Limitations:** What are the limitations of this analysis? Consider:
   - The multiple comparisons problem (several t-tests were run on the same data)
   - Generalizability from guinea pigs to humans
   - Other statistical approaches that might be more appropriate

In [None]:
# Space for your group's notes and conclusions
