josephwang-ds/AB-Testing

A/B Test Analysis — Cookie Cats Gate Placement & Player Retention

Can moving a game gate from level 30 to level 40 improve retention? The answer depends on who your players are.

This project goes beyond a standard A/B test. Instead of stopping at a single global significance test, it examines heterogeneous treatment effects across player engagement segments, then simulates a segment-based rollout policy that beats either global choice on expected Day 7 retention.


Business Question

The Cookie Cats mobile game team moved a progression gate from level 30 → level 40. The hypothesis: a later gate reduces friction, keeping players engaged longer.

But global metrics can hide the real story. This analysis asks:

  • Does the gate move actually improve Day 7 retention overall?
  • Who benefits — and who doesn't?
  • Can a smarter rollout policy beat both options?

Key Findings

| Metric | gate_30 | gate_40 | Δ (gate_40 − gate_30) | Significant? |
|---|---|---|---|---|
| D1 Retention | 44.8% | 44.2% | −0.6pp | No (p = 0.07) |
| D7 Retention | 19.0% | 18.2% | −0.8pp | Yes (p = 0.016) |

Headline: Moving the gate to level 40 slightly hurts Day 7 retention overall. But this global result masks a clear segmentation pattern.
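
The "Significant?" column relies on a two-proportion z-test. The sketch below shows the pooled-variance form of that test using only the standard library. The function name `two_proportion_ztest` and the counts passed to it are hypothetical; the README does not list per-arm sample sizes, so this will not reproduce the p-values in the table.

```python
import math

def two_proportion_ztest(success_a, n_a, success_b, n_b):
    """Two-sided two-proportion z-test using the pooled standard error."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value under the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts, NOT the experiment's actual sample sizes.
z, p = two_proportion_ztest(success_a=200, n_a=1000, success_b=170, n_b=1000)
print(f"z = {z:.3f}, p = {p:.4f}")
```

With real data, `success_*` would be the number of Day 7 retained players in each arm and `n_*` the number of installs in that arm.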

Heterogeneous Effects by Engagement Band

| Engagement | Gate 30 D7 | Gate 40 D7 | Winner |
|---|---|---|---|
| Light (Q1) | 8.1% | 7.6% | gate_30 |
| Casual (Q2) | 15.3% | 14.9% | gate_30 |
| Engaged (Q3) | 24.7% | 25.1% | gate_40 |
| Power (Q4) | 38.2% | 39.6% | gate_40 |

Light and casual players retain better with the earlier gate. Engaged and power players benefit from the later one.
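
A breakdown like this can be sketched as follows. The README does not say how the bands are defined, so this example assumes they are quartiles of total game rounds played (the `sum_gamerounds` column in the Kaggle dataset, which is an assumption here). The helper names and the toy player list are hypothetical.

```python
from collections import defaultdict
from statistics import quantiles

BANDS = ["Light (Q1)", "Casual (Q2)", "Engaged (Q3)", "Power (Q4)"]

def engagement_band(rounds, cuts):
    """Map a round count to a quartile band given the three quartile cut points."""
    for band, cut in zip(BANDS, cuts):
        if rounds <= cut:
            return band
    return BANDS[-1]  # above the third cut point

def d7_by_band(players):
    """players: iterable of (version, sum_gamerounds, retained_d7) tuples."""
    players = list(players)
    cuts = quantiles([rounds for _, rounds, _ in players], n=4)
    totals = defaultdict(lambda: [0, 0])  # (band, version) -> [retained, count]
    for version, rounds, retained in players:
        key = (engagement_band(rounds, cuts), version)
        totals[key][0] += int(retained)
        totals[key][1] += 1
    return {key: kept / count for key, (kept, count) in totals.items()}

# Toy data for illustration only: 8 players per arm with made-up outcomes.
players = (
    [("gate_30", r, r > 4) for r in range(1, 9)]
    + [("gate_40", r, r > 6) for r in range(1, 9)]
)
rates = d7_by_band(players)
for (band, version), rate in sorted(rates.items()):
    print(f"{band:13s} {version}: {rate:.0%}")
```

Round counts in real play data are heavily tied at low values, so quartile cut points may produce bands of unequal size; that is worth checking before comparing segments.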

Policy Simulation (per 100,000 installs)

| Policy | Expected D7 Retained |
|---|---|
| Global gate_30 | 18,980 |
| Global gate_40 | 18,190 |
| Segment-based (Q1-Q2 → gate_30, Q3-Q4 → gate_40) | 19,640 |

A segment-aware policy retains 660 more players per 100k installs than the best single gate (19,640 vs 18,980 under global gate_30), a relative lift of about 3.5% at no additional cost.
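
The policy logic can be sketched in a few lines: pick the better gate per segment, then weight each segment's retention rate by its share of installs. The function names below are hypothetical. The D7 rates come from the engagement-band table above, and the lift is recomputed from the published totals.

```python
def expected_retained(shares, rates, policy, installs=100_000):
    """Expected D7-retained players when each segment gets the gate in `policy`.

    shares: segment -> fraction of installs (should sum to 1)
    rates:  segment -> {gate: D7 retention rate}
    policy: segment -> gate assigned to that segment
    """
    return installs * sum(shares[s] * rates[s][policy[s]] for s in shares)

def best_policy(rates):
    """Pick, per segment, the gate with the higher observed D7 retention."""
    return {s: max(by_gate, key=by_gate.get) for s, by_gate in rates.items()}

# D7 rates from the engagement-band table.
rates = {
    "Q1": {"gate_30": 0.081, "gate_40": 0.076},
    "Q2": {"gate_30": 0.153, "gate_40": 0.149},
    "Q3": {"gate_30": 0.247, "gate_40": 0.251},
    "Q4": {"gate_30": 0.382, "gate_40": 0.396},
}
policy = best_policy(rates)
print(policy)

# Lift recomputed from the totals in the policy table.
segment_based, best_global = 19_640, 18_980
extra = segment_based - best_global
print(extra, round(100 * extra / best_global, 2))
```

Calling `expected_retained` requires each segment's share of installs, which the README does not report, so the example stops at the per-segment policy and the lift arithmetic.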


Analysis Structure

notebook/
├── 01_eda_retention.ipynb      # Distributions, engagement features, retention patterns
├── 02_abtest_core.ipynb        # z-tests + bootstrap CI for D1 and D7 retention
└── 03_advanced_segments.ipynb  # Heterogeneous effects, logistic regression, policy simulation

Methods Used

  • Two-proportion z-test for global significance
  • Bootstrap confidence intervals for robustness check
  • Logistic regression with interaction terms (version × log_rounds)
  • Policy simulation comparing global vs segment-based rollout
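
As an illustration of the bootstrap robustness check, here is a minimal percentile-bootstrap sketch for the difference in retention between two arms. The README does not specify which bootstrap variant the notebooks use, so the percentile method, the function name `bootstrap_diff_ci`, and the 1,000-per-arm toy samples are all assumptions.

```python
import random

def bootstrap_diff_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(a) - mean(b), where a and b are 0/1 outcomes."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each arm independently, with replacement, at its original size.
        resample_a = rng.choices(a, k=len(a))
        resample_b = rng.choices(b, k=len(b))
        diffs.append(sum(resample_a) / len(a) - sum(resample_b) / len(b))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Toy arms with hypothetical sizes; rates chosen to mirror the reported 19.0% and 18.2%.
gate_30 = [1] * 190 + [0] * 810
gate_40 = [1] * 182 + [0] * 818
lo, hi = bootstrap_diff_ci(gate_30, gate_40)
print(f"95% CI for D7 difference: [{lo:.4f}, {hi:.4f}]")
```

If the interval excludes zero, the bootstrap agrees with the z-test that the arms differ; with samples this small it usually will not, which is why the experiment's full sample matters.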

Why This Matters

Most A/B test readouts stop at "significant vs not significant." This analysis shows:

  1. Global metrics can be misleading — the same treatment hurts light users and helps power users
  2. A segment-based policy can outperform either global decision using data from the same experiment, with no additional tests required
  3. The right question isn't "which gate is better?" but "which gate is better for whom?"

This framework applies directly to pricing decisions, feature rollouts, and recommendation systems.


Quick Start

# Using conda
conda env create -f environment.yml
conda activate abtest-cookiecats

# Or pip
pip install -r requirements.txt

Run notebooks in order: 01 → 02 → 03

Dataset: Cookie Cats A/B Test — Kaggle


Author: Joseph Wang · josephjwang.com · GitHub
