In [None]:
# Adapted from: https://www.statology.org/central-limit-theorem-in-r/

#make this example reproducible
set.seed(0)

# generate 1,000 values that are uniformly distributed on [2, 6]
data <- runif(n=1000, min=2, max=6)

#create histogram to visualize distribution of turtle shell widths
hist(data, col='steelblue', main='Histogram of Turtle Shell Widths')
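Because the data come from a known distribution, the sample can be checked against theory. A continuous uniform distribution on [min, max] has mean (min + max)/2 and standard deviation (max - min)/√12, so for [2, 6] we expect a mean of 4 and a standard deviation of about 1.155. The sketch below regenerates the same data (same seed) and compares; `a`, `b`, `theory_mean` and `theory_sd` are illustrative names, not part of the original example.

```r
# Theoretical moments of a Uniform(a, b) distribution:
#   mean = (a + b) / 2,  sd = (b - a) / sqrt(12)
a <- 2
b <- 6
theory_mean <- (a + b) / 2          # 4
theory_sd   <- (b - a) / sqrt(12)   # about 1.155

# Same seed and call as above, so `data` is identical
set.seed(0)
data <- runif(n = 1000, min = a, max = b)

# Compare the sample moments with the theoretical ones
c(sample_mean = mean(data), theory_mean = theory_mean)
c(sample_sd = sd(data), theory_sd = theory_sd)
```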

In [2]:
# preallocate a vector to hold the sample means
n <- 1000
sample5 <- numeric(n)

# take 1,000 random samples of size 5 (with replacement) and store the mean of each
for (i in 1:n) {
  sample5[i] <- mean(sample(data, 5, replace = TRUE))
}

In [None]:
#calculate mean and standard deviation of sample means
mean(sample5)

sd(sample5)
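The CLT also predicts the spread of the sampling distribution: the standard deviation of the sample means (the standard error) should be close to the population standard deviation divided by √n. A minimal check for n = 5, assuming the same seed and sampling scheme as the cells above (here the "population" is the 1,000 generated values); `predicted_se` and `observed_se` are illustrative names.

```r
# Reproduce the population and the n = 5 sampling distribution
set.seed(0)
data <- runif(n = 1000, min = 2, max = 6)

reps <- 1000
sample5 <- numeric(reps)
for (i in seq_len(reps)) {
  sample5[i] <- mean(sample(data, 5, replace = TRUE))
}

# CLT prediction: standard error = population sd / sqrt(sample size)
predicted_se <- sd(data) / sqrt(5)
observed_se  <- sd(sample5)
c(observed = observed_se, predicted = predicted_se)
```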

In [None]:
#create histogram to visualize sampling distribution of sample means
hist(sample5, col = 'steelblue', xlab = 'Mean Turtle Shell Width', main = 'Sample size = 5')

In [None]:
# preallocate a vector to hold the sample means
n <- 1000
sample30 <- numeric(n)

# take 1,000 random samples of size 30 (with replacement) and store the mean of each
for (i in 1:n) {
  sample30[i] <- mean(sample(data, 30, replace = TRUE))
}

# create histogram to visualize sampling distribution of sample means
hist(sample30, col = 'steelblue', xlab = 'Mean Turtle Shell Width', main = 'Sample size = 30')

In [None]:
#calculate mean and standard deviation of sample means
mean(sample30)

sd(sample30)
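To see the 1/√n shrinkage across more than two sample sizes, the sampling loop can be wrapped in a function. `sample_means` is a hypothetical helper introduced here; it uses base R's `replicate()` in place of the explicit loop, and the extra size n = 100 is an illustrative choice.

```r
# Hypothetical helper: draw `reps` samples of size `size` from `x`
# (with replacement) and return the mean of each sample
sample_means <- function(x, size, reps = 1000) {
  replicate(reps, mean(sample(x, size, replace = TRUE)))
}

set.seed(0)
data <- runif(n = 1000, min = 2, max = 6)

sizes <- c(5, 30, 100)
# observed standard error = sd of the simulated sample means
observed  <- sapply(sizes, function(s) sd(sample_means(data, s)))
# CLT prediction: population sd / sqrt(n)
predicted <- sd(data) / sqrt(sizes)

data.frame(size = sizes, observed_se = observed, predicted_se = predicted)
```

Going from n = 5 to n = 30 should shrink the standard error by roughly a factor of √6 ≈ 2.45.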

In [None]:
library(ggplot2)

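set.seed(22032022)

# generate 1,000 draws from a normal distribution with mean 10 and sd 10
randomNumbers <- rnorm(n=1000, mean=10, sd=10)

# sample mean and standard deviation, used to split the values into two groups
randomNumbers_mean <- mean(randomNumbers)
randomNumbers_sd <- sd(randomNumbers)

# histogram colored by whether each value lies within one sample sd of the sample mean,
# overlaid with a kernel density estimate (black) and the true N(10, 10^2) density (blue);
# both curves are scaled from density to counts (x 1,000 draws), which matches bar heights for binwidth = 1
ggplot(data.frame(randomNumbers = randomNumbers), aes(randomNumbers)) +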
  geom_histogram(aes(
    fill = ifelse(
      randomNumbers > randomNumbers_mean + randomNumbers_sd |
        randomNumbers < randomNumbers_mean - randomNumbers_sd,
      "Outside range",
      "Within range"
    )
  ), 
  binwidth = 1, col = "gray") +
  geom_density(aes(y = after_stat(count))) + 
  stat_function(fun = function(x) dnorm(x, mean = 10, sd = 10) * 1000,
                color = "blue") +
  labs(fill = "Data")
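The plot colors observations by whether they lie within one standard deviation of the mean. For a normal distribution about 68% of values should fall in that range, since P(-1 ≤ Z ≤ 1) ≈ 0.683. A short sketch, reusing the same seed, that counts the observed share; `within_one_sd` and `theory_within` are illustrative names.

```r
set.seed(22032022)
randomNumbers <- rnorm(n = 1000, mean = 10, sd = 10)

randomNumbers_mean <- mean(randomNumbers)
randomNumbers_sd   <- sd(randomNumbers)

# Share of observations within one sample sd of the sample mean
# (the "Within range" group in the plot)
within_one_sd <- mean(abs(randomNumbers - randomNumbers_mean) <= randomNumbers_sd)

# Theoretical share for a normal distribution: P(-1 <= Z <= 1)
theory_within <- pnorm(1) - pnorm(-1)   # about 0.683

c(observed = within_one_sd, theoretical = theory_within)
```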

## In-Class Exercise: Central Limit Theorem with Exponential Distribution

The examples above drew samples from a uniform distribution. Now demonstrate the central limit theorem (CLT) with an **exponential distribution**, which is strongly right-skewed.

1. Generate 1000 random values from an exponential distribution with rate = 0.5 using `rexp(n=1000, rate=0.5)`
2. Create a histogram of the raw data; notice that it is right-skewed
3. Take 1000 random samples of size n=5 and calculate the mean of each sample
4. Plot the distribution of these sample means
5. Repeat steps 3 and 4 with sample size n=30
6. Compare the two distributions - how does increasing sample size affect the shape?

**Questions to answer:**
- What is the theoretical mean of an exponential distribution with rate=0.5? (Hint: mean = 1/rate)
- How close are your sample means to this theoretical value?
- At what sample size does the distribution of means start looking approximately normal?
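For reference when answering these questions, the theoretical targets can be computed directly. An exponential distribution with rate λ has mean and standard deviation both equal to 1/λ and skewness 2; the mean of n independent draws keeps the same mean, has standard error (1/λ)/√n, and has skewness 2/√n (a skewness near 0 indicates a symmetric, more normal-looking shape). This sketch assumes independent draws from the exponential distribution itself; resampling from the 1,000 generated values in `exp_data` should land close to, but not exactly on, these numbers. `targets` and the listed sizes are illustrative choices.

```r
rate  <- 0.5
sizes <- c(1, 5, 30, 100)

# Exponential(rate): mean = sd = 1 / rate, skewness = 2
pop_mean <- 1 / rate
pop_sd   <- 1 / rate

# Mean of n independent draws: same mean,
# standard error = sd / sqrt(n), skewness = 2 / sqrt(n)
targets <- data.frame(
  n        = sizes,
  mean     = pop_mean,
  se       = pop_sd / sqrt(sizes),
  skewness = 2 / sqrt(sizes)
)
print(targets)
```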

In [None]:
set.seed(42)  # For reproducibility

# Step 1: Generate exponential data
exp_data <- rexp(n=1000, rate=0.5)

# Step 2: Histogram of raw data
hist(exp_data, col='coral', main='Exponential Distribution (rate=0.5)')

# Your code here: Steps 3-6