#Top

[![Open in GitHub](https://img.shields.io/badge/Open%20Folder%20in-GitHub-181717?logo=github&logoColor=white)](https://github.com/lindsayalexandra14/ds_portfolio/tree/main/1_projects/statistical_analysis/a_b_testing/large_sample)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/14z04b4n0SpjGhuju6nHQ2R0i2_6OdII8#scrollTo=REWlVp6lIpYJ)

![Alt text](https://github.com/lindsayalexandra14/ds_portfolio/raw/main/2_images/templates/notebook/headers/beige/AB%20Testing%20Large%20Sample%20Size.png)

**test**
*   This hypothetical experiment compares two landing pages (control vs. treatment)
*   The sample size is about 32,000 users per group
*   I use a Z-test for two proportions, which is appropriate for large sample sizes
*   The test is one-sided: the alternative hypothesis is that the treatment converts better than the control, because the team is interested in moving forward with the treatment
  
**result**  
*  The treatment performed significantly better than the control (alpha = 0.05). Its conversion rate is higher by at least 1.71 percentage points (the lower bound of the one-sided 95% confidence interval for the difference). The standardized effect is very small (Cohen's h = 0.06), but the cost of implementing the treatment is low and, given the high volume, the business impact is expected to be high, with revenue outweighing the cost. The test exceeded the desired statistical power (80%)
  
**recommendation**
*  Due to the significance, power, and high business impact, I recommend moving forward with implementing the treatment

**tl;dr for results**

*   Skip to "Results Summary" at the end





#Setup

![Alt text](https://github.com/lindsayalexandra14/ds_portfolio/raw/main/2_images/templates/notebook/headers/beige/Setup.png)

In [None]:
install.packages('pwr')
install.packages('glue')

In [3]:
library(pwr)
library(glue)

#Test Design

![Alt text](https://github.com/lindsayalexandra14/ds_portfolio/raw/main/2_images/templates/notebook/headers/beige/Test%20Design.png)

In [9]:
#Parameters
alpha <- 0.05            # Significance level
power <- 0.80            # Statistical power (Probability of detecting an effect when it exists; 0.8 is standard)
control <- 0.14          # Baseline conversion rate
effect <- 0.05           # Desired relative effect (e.g., 5% lift over baseline)
mde <- control * effect  # Minimum Detectable Effect (MDE): smallest absolute difference to detect
treatment <- control + mde  # Treatment rate (baseline plus MDE)
cat(sprintf("Test Design:\n"))
cat("\n")
cat(paste('Control:', control, "\n"))
cat(paste('Treatment:', treatment, "\n"))

#Hypothesis
p_1=treatment
p_2=control
p1_label = "Treatment"
p2_label = "Control"

alternative = "greater" # in reference to p1:
# A one-sided test is used when you want to test if one group performs
# specifically better or worse than the other (a directional hypothesis).
# "greater":   p1 is greater than p2
# "less":      p1 is less than p2
# "two.sided": p1 is different from p2

hypothesis <- switch(alternative,
  greater = sprintf("%s (%.4f) is greater than %s (%.4f)", p1_label, p_1, p2_label, p_2),
  less = sprintf("%s (%.4f) is less than %s (%.4f)", p1_label, p_1, p2_label, p_2),
  two.sided = sprintf("%s (%.4f) is different from %s (%.4f)", p1_label, p_1, p2_label, p_2),
)

cat("Hypothesis:",hypothesis)

# Effect size (Cohen's h: standardized effect size for proportions)

effect_size = ES.h(treatment, control)

interpret_effect_size <- function(effect_size) {
  if (abs(effect_size) < 0.2) return("negligible")
  if (abs(effect_size) < 0.5) return("small")
  if (abs(effect_size) < 0.8) return("medium")
  return("large")
}


cat(sprintf("\nMinimum Detectable Effect (MDE): %.3f %s\n", mde,interpret_effect_size(effect_size)))
cat(sprintf("Effect Size (Cohen's h): %.3f\n", effect_size))

#Sample Size

# Determine the minimum number of samples for each group

# pwr.2p.test takes the effect size as Cohen's h and solves for n (per group)
result1 <- pwr.2p.test(h=effect_size, sig.level=alpha, power=power,alternative=alternative)

# Print the required sample size per group
cat(paste("Sample Size: (alternative)", alternative, ": n =", round(result1$n)), "\n")

Test Design:

Control: 0.14 
Treatment: 0.147 
Hypothesis: Treatment (0.1470) is greater than Control (0.1400)
Minimum Detectable Effect (MDE): 0.007 negligible
Effect Size (Cohen's h): 0.020
Sample Size: (alternative) greater : n = 31011 


Cohen's h benchmarks:

0.2 = small effect

0.5 = medium effect

0.8 = large effect

If the effect is tiny, it will require a very large sample size to detect.
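
To make the link between effect size and sample size concrete, here is a minimal base-R sketch that recomputes Cohen's h and the per-group sample size without `pwr`. It assumes the one-sided normal approximation on the arcsine scale, n = 2 * ((z_(1-alpha) + z_power) / h)^2, which should agree closely with the `pwr.2p.test` result above. The helper names `cohens_h` and `n_per_group` are illustrative and not part of any package.

```r
# Cohen's h: difference of arcsine-transformed proportions
cohens_h <- function(p1, p2) 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))

# Per-group sample size for a one-sided test (normal approximation; an assumption of this sketch)
n_per_group <- function(h, alpha = 0.05, power = 0.80) {
  2 * ((qnorm(1 - alpha) + qnorm(power)) / h)^2
}

h_design <- cohens_h(0.147, 0.14)   # treatment and control rates from the Test Design cell
n_design <- n_per_group(h_design)

# Required n scales with 1 / h^2: halving h quadruples the sample size
n_half <- n_per_group(h_design / 2)

cat(sprintf("h = %.4f, n per group = %.0f\n", h_design, n_design))
cat(sprintf("n per group at h/2 = %.0f\n", n_half))
```

Because n depends on 1 / h^2, small changes in a tiny effect size move the required sample size a lot, which is why a 5% relative lift on a 14% baseline needs tens of thousands of users per group.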

#Results

![Alt text](https://github.com/lindsayalexandra14/ds_portfolio/raw/main/2_images/templates/notebook/headers/beige/Results.png)

In [23]:
cat("Results:")
cat("\n")

# Data
n_observations_control <- 32000
n_observations_treatment <- 32050

conversions_control <- 4300
conversions_treatment <- 5000


# print(p1_label) # set above in test design
# print(p2_label)

# Conversion Rates
conv_rate_control = (conversions_control / n_observations_control)
conv_rate_treatment = (conversions_treatment / n_observations_treatment)

p1=conv_rate_treatment #assign p1 vs. p2, test alternative references p1
p2=conv_rate_control

c1=conversions_treatment
c2=conversions_control

n1=n_observations_treatment
n2=n_observations_control

cat("\n")
print(glue("p1: ","{p1_label} Conversion Rate: {round(p1 * 100, 2)}%"))
print(glue("p2: ","{p2_label} Conversion Rate: {round(p2 * 100, 2)}%"))

# Result Hypothesis
result_hypothesis <- switch(alternative,
  greater = sprintf("%s (%.4f) is greater than %s (%.4f)", p1_label, p1, p2_label, p2),
  less = sprintf("%s (%.4f) is less than %s (%.4f)", p1_label, p1, p2_label, p2),
  two.sided = sprintf("%s (%.4f) is different from %s (%.4f)", p1_label, p1, p2_label, p2),
)
cat("\n")
cat("Result Hypothesis:",result_hypothesis)
cat("\n")
cat("\n")

# Uplift
uplift = (p1 - p2) / p2

# Absolute Difference
abs_diff = abs(p1 - p2)

# Cohen's h function
proportion_effectsize <- function(p1, p2) {
  2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))
}

h <- proportion_effectsize(p1, p2)

# Interpret effect size
interpret_h <- function(h) {
  if (abs(h) < 0.2) return("negligible")
  if (abs(h) < 0.5) return("small")
  if (abs(h) < 0.8) return("medium")
  return("large")
}
cat(sprintf("Absolute difference: %.3f (%.1f%%)\n", abs_diff, abs_diff * 100))
print(glue("Uplift: {round(uplift * 100, 2)}%"))
cat(sprintf("Cohen's h: %.3f\n", h))
cat(sprintf("Effect size interpretation: %s\n", interpret_h(h)))

#z-test
x <- c(c1, c2)  # successes
n <- c(n1, n2)  # totals

# Run two-proportion test (x[1]/n[1] is p1, so "greater" means p1 > p2)
# Continuity correction is not needed with a large sample size
test_result <- prop.test(x = x, n = n, alternative = alternative, correct = FALSE)

print(test_result)

# Extract p-value from result
p_value <- test_result$p.value

print(sprintf("p-value: %.4f", p_value))

# Extract confidence interval (for the difference in proportions):
conf_int <- test_result$conf.int

print(paste((1-alpha)*100,"%","confidence interval: [",conf_int[1],",",conf_int[2],']'))
cat("\n")

# Confidence interval for conversion rate:
se_p1 <- sqrt(p1 * (1 - p1) / n1)
lower_ci_p1 <- p1 - 1.96 * se_p1
upper_ci_p1 <- p1 + 1.96 * se_p1

cat(sprintf("%s 95%% CI: %.4f to %.4f\n",p1_label, lower_ci_p1, upper_ci_p1))

se_p2 <- sqrt(p2 * (1 - p2) / n2)
lower_ci_p2 <- p2 - 1.96 * se_p2
upper_ci_p2 <- p2 + 1.96 * se_p2

cat(sprintf("%s 95%% CI: %.4f to %.4f\n",p2_label, lower_ci_p2, upper_ci_p2))

# Effective sample size (harmonic mean for unequal n)
n_effective <- (2 * n1 * n2) / (n1 + n2)

# Calculate achieved (post-hoc) power from the observed effect size
power_result <- pwr.2p.test(h = h, n = n_effective, sig.level = alpha, alternative = alternative)
print(power_result)

# Extract the power
power_pct <- round(power_result$power * 100, 1)

cat("Result Power:", power_pct, "%\n\n")



Results:

p1: Treatment Conversion Rate: 15.6%
p2: Control Conversion Rate: 13.44%

Result Hypothesis: Treatment (0.1560) is greater than Control (0.1344)

Absolute difference: 0.022 (2.2%)
Uplift: 16.1%
Cohen's h: 0.061
Effect size interpretation: negligible

	2-sample test for equality of proportions without continuity correction

data:  x out of n
X-squared = 60.366, df = 1, p-value = 3.938e-15
alternative hypothesis: greater
95 percent confidence interval:
 0.01705418 1.00000000
sample estimates:
   prop 1    prop 2 
0.1560062 0.1343750 

[1] "p-value: 0.0000"
[1] "95 % confidence interval: [ 0.0170541815700391 , 1 ]"

Treatment 95% CI: 0.1520 to 0.1600
Control 95% CI: 0.1306 to 0.1381

     Difference of proportion power calculation for binomial distribution (arcsine transformation) 

              h = 0.06144038
              n = 32024.98
      sig.level = 0.05
          power = 1
    alternative = greater

NOTE: same sample sizes

Result Power: 100 %



# Results Summary

![Alt text](https://github.com/lindsayalexandra14/ds_portfolio/raw/main/2_images/templates/notebook/headers/beige/Results%20Summary.png)

***performance:***
With 95% confidence, the treatment's conversion rate is higher than the control's by at least 1.71 percentage points (based on the lower bound of the one-sided confidence interval for the difference, 0.0171). This supports the hypothesis that the treatment is better than the control.
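
As a cross-check, the lower bound can be reproduced by hand. This is a sketch that assumes the unpooled (Wald) standard error and a one-sided critical value of `qnorm(0.95)`, which is how I understand `prop.test` to build its interval when `correct = FALSE` and `alternative = "greater"`. The counts are copied from the Results cell.

```r
# Observed data from the Results cell
c1 <- 5000; n1 <- 32050   # treatment conversions, observations
c2 <- 4300; n2 <- 32000   # control conversions, observations
p1 <- c1 / n1
p2 <- c2 / n2

# Unpooled standard error of the difference in proportions
se_diff <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# One-sided 95% lower bound: the full 5% error sits in one tail, so use qnorm(0.95)
lower_bound <- (p1 - p2) - qnorm(0.95) * se_diff

cat(sprintf("Lower bound: %.4f (%.2f percentage points)\n", lower_bound, lower_bound * 100))
```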

***significance:***
Because the p-value (about 3.9e-15, displayed as 0.000) is less than alpha (0.050), and the 95% confidence interval for the difference does not contain 0, the result is statistically significant at the 5% significance level. The practical significance is low (Cohen's h = 0.06), but the business impact is high based on domain knowledge.
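
The `prop.test` output reports a chi-squared statistic, but without continuity correction it is the square of the two-proportion z-statistic. The sketch below computes z directly with a pooled standard error and takes the one-sided p-value from the upper tail of the normal distribution. Variable names such as `p_pool` and `se_pool` are my own.

```r
# Observed data from the Results cell
c1 <- 5000; n1 <- 32050   # treatment
c2 <- 4300; n2 <- 32000   # control
p1 <- c1 / n1
p2 <- c2 / n2

# Pooled proportion under the null hypothesis p1 == p2
p_pool <- (c1 + c2) / (n1 + n2)
se_pool <- sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

# z-statistic and one-sided p-value for H1: p1 > p2
z <- (p1 - p2) / se_pool
p_value <- pnorm(z, lower.tail = FALSE)

cat(sprintf("z = %.3f, z^2 = %.3f, one-sided p = %.3e\n", z, z^2, p_value))
```

Printing the p-value with `%.3e` rather than `%.4f` avoids the misleading "0.0000" display for very small values.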

***power:***
The test was adequately powered (~100% power at the observed effect size vs. 80% desired), meaning it had a strong chance of detecting a true difference of that size if one existed.
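
Power computed from the observed effect tends to look high whenever the result is significant, so it can help to also check power at the design-stage MDE. The sketch below does both in base R, assuming the same one-sided arcsine approximation used in the sample size step. `power_one_sided` is a hypothetical helper, not a `pwr` function.

```r
# Cohen's h and one-sided power under the arcsine normal approximation (assumed here)
cohens_h <- function(p1, p2) 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))
power_one_sided <- function(h, n, alpha = 0.05) {
  pnorm(h * sqrt(n / 2) - qnorm(1 - alpha))
}

n1 <- 32050; n2 <- 32000
n_eff <- (2 * n1 * n2) / (n1 + n2)   # harmonic mean, as in the Results cell

# Power at the observed effect (post-hoc)
h_obs <- cohens_h(5000 / n1, 4300 / n2)
power_obs <- power_one_sided(h_obs, n_eff)

# Power at the design-stage effect (14% baseline, 5% relative lift)
h_design <- cohens_h(0.147, 0.14)
power_design <- power_one_sided(h_design, n_eff)

cat(sprintf("Observed h = %.4f, power = %.4f\n", h_obs, power_obs))
cat(sprintf("Design h = %.4f, power = %.4f\n", h_design, power_design))
```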

A note on interpreting confidence intervals: if the experiment were repeated many times under the same conditions, about 95% of the confidence intervals calculated this way would contain the true population value (here, the conversion rate or the difference in conversion rates).
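
This repeated-sampling interpretation can be illustrated with a small simulation. The sketch below assumes a true conversion rate of 0.14 (the baseline from Test Design) and 32,000 users per experiment, draws many simulated experiments, and counts how often the 95% Wald interval covers the true rate. The seed and the number of simulations are arbitrary choices.

```r
set.seed(42)            # arbitrary seed, for reproducibility
true_rate <- 0.14       # assumed true conversion rate
n <- 32000              # users per simulated experiment
n_sims <- 5000          # number of simulated experiments

# Simulate conversions and build a 95% Wald interval for each experiment
conversions <- rbinom(n_sims, size = n, prob = true_rate)
p_hat <- conversions / n
se <- sqrt(p_hat * (1 - p_hat) / n)
lower <- p_hat - 1.96 * se
upper <- p_hat + 1.96 * se

# Share of intervals that contain the true rate
coverage <- mean(lower <= true_rate & true_rate <= upper)
cat(sprintf("Share of intervals containing the true rate: %.3f\n", coverage))
```

The share should land close to 0.95. Any single interval either contains the true rate or it does not, so the 95% describes the procedure rather than one particular interval.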

#Recommendation

![Alt text](https://github.com/lindsayalexandra14/ds_portfolio/raw/main/2_images/templates/notebook/headers/beige/Recommendation.png)

Due to the significance, power, and high business impact, I recommend moving forward with implementing the treatment.

