# Springboard Data Analytics Assignment

**Author:** Marcella Morgan
**Image Credit:** Images generated with the help of ChatGPT (OpenAI).
![Lady tasting tea + normal curve](images/lady_drinking_tea.png)


## Introduction  

In this assignment I’m working through four problems that all use simulation to explore different ideas in statistics.  

1. **Lady Tasting Tea (Extended):**  
   I’ll extend the classic experiment to 12 cups instead of 8, simulate the chance of guessing correctly, and compare the results to the original setup.  

2. **Normal Distribution:**  
   I’ll generate lots of small samples from a normal distribution and compare the sample standard deviation (ddof=1) with the population version (ddof=0). The goal is to see the difference on histograms and think about what happens with bigger samples.  

3. **t-Tests and Type II Errors:**  
   Here I’ll run simulations of t-tests while changing the difference in means. I’ll measure how often the test fails to reject the null when the null is actually false (type II errors) and see how that changes as the effect gets stronger.  

4. **ANOVA vs t-Tests:**  
   I’ll generate three groups with different means and compare running one ANOVA versus doing three separate t-tests. The point is to see why ANOVA is better when looking at more than two groups.  

Overall, the assignment is about practicing simulation, interpreting results, and understanding why we choose one statistical test over another.  


## Problem 1: Lady Tasting Tea

In this section, we replicate and extend the classic Lady Tasting Tea experiment. The original design had 8 cups (4 tea-first, 4 milk-first), so the probability of identifying every cup correctly by pure guessing was small but non-negligible. Here, we increase the challenge by preparing 12 cups (8 tea-first and 4 milk-first). We simulate this setup repeatedly by shuffling the cups and recording how often a participant who is purely guessing identifies all cups correctly. We then compare the estimated probability with the original experiment, interpret its implications for statistical power, and reflect on whether one might reasonably tighten or relax the significance threshold (alpha) when moving from the original 8-cup design to this 12-cup version.



### Running the experiment

We'll start with the original experiment: 8 cups in total, 4 tea-first and 4 milk-first.

In [8]:
import numpy as np
import math
import itertools
import random
import matplotlib.pyplot as plt


# Number of cups of tea in total.
no_cups = 8

# Number of cups of tea with milk in first.
no_cups_milk_first = 4

# Number of cups of tea with tea in first.
no_cups_tea_first = 4

# Number of ways of selecting four cups from eight.
ways = math.comb(no_cups, no_cups_milk_first)

# Show.
ways

70

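So by guessing alone, the lady has a 1 in 70 chance (about 1.4%) of picking out all four milk-first cups. Next we repeat the calculation for the extended design with 12 cups (8 tea-first, 4 milk-first).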

In [9]:
# Number of cups of tea in total.
no_cups = 12

# Number of cups of tea with milk in first.
no_cups_milk_first = 4

# Number of cups of tea with tea in first.
no_cups_tea_first = 8

# Number of ways of selecting four cups from twelve.
ways = math.comb(no_cups, no_cups_milk_first)

# Show.
ways

495

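So the lady has only a 1 in 495 chance (about 0.2%) of picking out all the milk-first cups by guessing. If she gets every cup right in this 12-cup design, that is much stronger evidence than in the 8-cup design that she really can tell the difference.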
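The counting argument above can be checked by simulation. The following is a minimal sketch, not the assignment's official solution: the function name `estimate_all_correct`, the trial count, and the seed are assumptions chosen for illustration. It shuffles the cups many times, lets a guesser pick cups at random, and compares the share of perfect guesses with the exact value from `math.comb`.

```python
import math
import numpy as np


def estimate_all_correct(n_cups, n_milk_first, n_trials=200_000, seed=0):
    """Estimate the chance of picking every milk-first cup by pure guessing."""
    rng = np.random.default_rng(seed)

    # Cups 0..n_milk_first-1 are milk-first (1); the rest are tea-first (0).
    is_milk_first = np.zeros(n_cups, dtype=int)
    is_milk_first[:n_milk_first] = 1

    # Each row is one trial: argsort of random numbers gives a random shuffle
    # of the cup indices, and the guesser picks the first n_milk_first of them.
    shuffled = rng.random((n_trials, n_cups)).argsort(axis=1)
    picked = shuffled[:, :n_milk_first]

    # A trial counts as a success only if every picked cup is milk-first.
    n_correct = is_milk_first[picked].sum(axis=1)
    return float(np.mean(n_correct == n_milk_first))


for n_cups, n_milk_first in [(8, 4), (12, 4)]:
    exact = 1 / math.comb(n_cups, n_milk_first)
    estimate = estimate_all_correct(n_cups, n_milk_first)
    print(f"{n_cups} cups: simulated {estimate:.5f}, exact {exact:.5f}")
```

Because a perfect guess is rare in the 12-cup design, a large number of trials is needed before the simulated estimate settles near the exact value.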

## Problem 2: Normal Distribution

This task demonstrates the distinction between the sample standard deviation (ddof=1, which applies Bessel's correction) and the population standard deviation formula (ddof=0, which is biased low when applied to a sample but has lower variance). We generate a very large number of small samples (100,000 samples of size 10) from a standard normal distribution. For each, we calculate the standard deviation using two different definitions:

ddof=0: divides by n, appropriate when the data are the full population.

ddof=1: divides by n-1 (Bessel's correction), which makes the variance estimate unbiased when working from a sample.
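Written out for a sample $x_1, \dots, x_n$ with mean $\bar{x}$ (symbols introduced here for illustration), the two definitions are:

$$
s_{\text{ddof}=0} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2},
\qquad
s_{\text{ddof}=1} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}
$$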

We then plot histograms of these results on the same axes, using transparency to highlight differences. The visualization should reveal a small but systematic shift between the two: for any given sample, the ddof=1 value is larger by the constant factor sqrt(n/(n-1)), which is about 5.4% for n = 10. Because this factor tends to 1 as n grows, increasing the sample size shrinks the difference, illustrating the consistency of both estimators.
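A minimal sketch of this comparison follows. The seed, variable names, bin range, and the helper `plot_sd_histograms` are assumptions made for illustration; only the sample count (100,000) and sample size (10) come from the task description.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n = 100_000, 10

# One row per sample, drawn from the standard normal distribution.
samples = rng.standard_normal((n_samples, n))

sd_pop = samples.std(axis=1, ddof=0)     # divide by n
sd_sample = samples.std(axis=1, ddof=1)  # divide by n - 1

print(f"mean SD, ddof=0: {sd_pop.mean():.4f}")
print(f"mean SD, ddof=1: {sd_sample.mean():.4f}")


def plot_sd_histograms(sd_pop, sd_sample):
    """Overlay the two histograms on shared axes with transparency."""
    # Imported here so the numeric part above runs even without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    bins = np.linspace(0, 2.5, 101)
    ax.hist(sd_pop, bins=bins, alpha=0.5, label="ddof=0")
    ax.hist(sd_sample, bins=bins, alpha=0.5, label="ddof=1")
    ax.axvline(1.0, color="black", linestyle="--", label="true SD = 1")
    ax.set_xlabel("standard deviation of sample")
    ax.set_ylabel("count")
    ax.legend()
    return fig
```

Calling `plot_sd_histograms(sd_pop, sd_sample)` in a notebook cell draws the overlaid histograms. Changing `n` to a larger value such as 100 is a quick way to watch the two distributions move together.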

## Problem 3: t-Tests and Type II Errors

In this section, we investigate type II errors, which occur when a test fails to reject the null hypothesis despite there being a true difference. We systematically vary the mean difference (d) between two normal distributions, from 0 to 1.0 in steps of 0.1. For each value of d:

1. Draw two samples of size 100 (one from N(0,1), the other from N(d,1)).
2. Perform an independent two-sample t-test, using a 5% significance level.
3. Repeat the process 1,000 times and record the proportion of times the null hypothesis was not rejected.

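We then plot the type II error rate against effect size, showing the classic trade-off: small differences are harder to detect, while large differences result in a low type II error rate. Note that at d = 0 the null hypothesis is true, so failing to reject there is a correct decision rather than a type II error; that point should sit near 95%, the complement of the 5% significance level. This simulation reinforces the importance of effect size and sample size in determining test power.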
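The procedure described in this section can be sketched as follows, assuming SciPy is available. The seed and variable names are illustrative choices, and the 1,000 repetitions for each value of d are run in one vectorized call to `scipy.stats.ttest_ind` rather than in a Python loop.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, n_trials, alpha = 100, 1_000, 0.05
effect_sizes = np.round(np.linspace(0.0, 1.0, 11), 1)

non_rejection_rates = []
for d in effect_sizes:
    # Each row is one repetition of the experiment.
    a = rng.normal(0.0, 1.0, size=(n_trials, n))
    b = rng.normal(d, 1.0, size=(n_trials, n))

    # One independent two-sample t-test per row.
    p_values = stats.ttest_ind(a, b, axis=1).pvalue

    # Share of repetitions in which the null hypothesis was not rejected.
    non_rejection_rates.append(float(np.mean(p_values >= alpha)))

for d, rate in zip(effect_sizes, non_rejection_rates):
    print(f"d = {d:.1f}: null not rejected in {rate:.1%} of trials")
```

The resulting curve can be drawn with `plt.plot(effect_sizes, non_rejection_rates, marker="o")`, with effect size on the x-axis and the non-rejection rate on the y-axis.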

## Problem 4: ANOVA vs t-Tests

In this task, we simulate a classic comparison of statistical approaches for multiple group testing. We generate three independent samples (each size 30) from normal distributions with means 0, 0.5, and 1 (standard deviation fixed at 1). Two approaches are then applied:

One-way ANOVA: tests the null hypothesis that all three group means are equal in a single, global test.

Multiple independent t-tests: three pairwise comparisons (1 vs 2, 1 vs 3, 2 vs 3).

We compare the results and discuss why ANOVA is typically preferred: it tests all three means at once, so the type I error rate stays at the chosen 5%. Running three separate t-tests at 5% each inflates the chance of at least one false positive (the familywise error rate) to roughly 14% if the tests were independent (1 - 0.95^3), making a single ANOVA the more robust and interpretable first step for multi-group scenarios.
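A minimal sketch of the two approaches side by side, assuming SciPy is available. The group labels, seed, and variable names are assumptions for illustration. The familywise figure at the end uses the simplifying assumption that the three tests are independent, which the pairwise comparisons here are not exactly, since they share data.

```python
import itertools
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
group_means = {"group 1": 0.0, "group 2": 0.5, "group 3": 1.0}

# Three independent samples of size 30 with standard deviation 1.
groups = {name: rng.normal(mean, 1.0, size=30) for name, mean in group_means.items()}

# Approach 1: one global test of "all three means are equal".
anova = stats.f_oneway(*groups.values())
print(f"ANOVA: F = {anova.statistic:.3f}, p = {anova.pvalue:.4f}")

# Approach 2: three pairwise independent t-tests.
pairwise_p = {}
for (name_a, a), (name_b, b) in itertools.combinations(groups.items(), 2):
    pairwise_p[(name_a, name_b)] = float(stats.ttest_ind(a, b).pvalue)
    print(f"{name_a} vs {name_b}: p = {pairwise_p[(name_a, name_b)]:.4f}")

# Chance of at least one false positive across the t-tests,
# under the simplifying assumption that the tests are independent.
familywise = 1 - (1 - alpha) ** len(pairwise_p)
print(f"approximate familywise error rate: {familywise:.3f}")
```

If the ANOVA rejects the null, the pairwise comparisons are still useful as a follow-up to see which groups differ, usually with a multiple-comparison correction applied.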

# END