# Experiment Pattern 0 - Simple A/A

This is the first and simplest experiment pattern: a successful A/A experiment, in which the variants do not differ and no effect is expected. Beyond a balance check, it does not demonstrate any analysis capabilities.

In addition, each visitor shows up exactly once; there are no repeat visitors.

In [2]:
from datetime import datetime
import itertools

from growthbook import GrowthBook
import pandas as pd
from scipy import stats

We use three main datasets:
* The `visit_log` records the basics of each visit: who visited and when.
* The `exposure_log` records each explicit **exposure**: which user saw which variant, and when.
* The `metrics_log` records measurements of what happens as users visit. It is declared here but not populated in this pattern.

In [3]:
visit_log    = []
exposure_log = []
metrics_log  = []

def add_to_visit_log(user_id):
  visit_log.append((user_id, datetime.now()))

def add_to_exposure_log(user_id, variant_name):
  exposure_log.append((user_id, variant_name, datetime.now()))

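This is mostly boilerplate covering the core usage of [GrowthBook](https://growthbook.io): create a client, register a callback that writes each exposure to the `exposure_log`, and load the feature definitions.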

In [4]:
def on_experiment_viewed(experiment, result):
  add_to_exposure_log(result.hashValue, result.name)

gb = GrowthBook(
  api_host = "https://cdn.growthbook.io",
  client_key = "sdk-dznJ5SXyna94Omdi",
  on_experiment_viewed = on_experiment_viewed
)

gb.load_features()

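This is our simulation code. Each simulated visit sets the visitor's `id` attribute, logs the visit, and evaluates the `variant-1` feature.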

In [5]:
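def simulate_visit(user_id, **kwargs):
  # Identify the visitor (assumed to be the attribute the experiment buckets on).
  gb.set_attributes({
    "id": user_id
  })

  add_to_visit_log(user_id)

  # Evaluate the feature. This is an A/A test, so the returned value
  # is not used to change behavior.
  feature_variant_1 = gb.is_on("variant-1")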

def experiment_000_basic_aa(n_trials):
  for i in range(n_trials):
    simulate_visit(f"user_{i}")

In [7]:
experiment_000_basic_aa(100000)
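
To confirm the assumption that each visitor appears only once, the logs can be summarized after a run. The following is a minimal sketch and not part of the original notebook: `summarize_logs` is a hypothetical helper, and the small stand-in logs only mimic the tuple layout of `visit_log` and `exposure_log` so that the block runs on its own.

```python
from datetime import datetime

# Stand-in logs with the same tuple layout as the notebook's logs (illustrative data only).
now = datetime.now()
sample_visit_log = [(f"user_{i}", now) for i in range(5)]
sample_exposure_log = [
    ("user_0", "Control", now),
    ("user_3", "Variation 1", now),
]

def summarize_logs(visit_log, exposure_log):
    """Hypothetical helper: basic sanity counts over the two logs."""
    visit_ids = [row[0] for row in visit_log]
    exposed_ids = {row[0] for row in exposure_log}
    return {
        "visits": len(visit_ids),
        "unique_visitors": len(set(visit_ids)),
        "exposed_users": len(exposed_ids),
        # Exposures for users that never appear in the visit log would indicate a logging bug.
        "exposed_without_visit": len(exposed_ids - set(visit_ids)),
    }

summary = summarize_logs(sample_visit_log, sample_exposure_log)
print(summary)
```

Applied to the real `visit_log` and `exposure_log`, `visits` and `unique_visitors` should be equal if the simulation cell has been run once, since there are no repeat visitors.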

## Our Main Insight - Balance Check

One of the most important health checks of an experiment is the **Sample Ratio Mismatch** (SRM) check, which verifies that the observed split of users across variants matches the configured split.

This is a classic use case for the [Chi-Squared Test](https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test) of goodness of fit: the observed user counts per variant are compared with the counts expected from the configured percentages.
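
To make the arithmetic behind the test visible, here is a small worked sketch using only the standard library. The statistic sums `(observed - expected)**2 / expected` over the variants. The counts are hypothetical, `chi_square_two_groups` is an illustrative name, and the p-value shortcut `erfc(sqrt(statistic / 2))` holds only for two variants (1 degree of freedom).

```python
import math

def chi_square_two_groups(observed, expected):
    """Return (statistic, p_value) for a two-category goodness-of-fit test."""
    if len(observed) != 2 or len(expected) != 2:
        raise ValueError("this sketch only handles exactly two categories")
    # Pearson's statistic: sum of squared deviations scaled by the expected counts.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom, the chi-squared survival function
    # reduces to erfc(sqrt(statistic / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Hypothetical counts: 10,000 exposed users configured for an even split.
observed = [5050, 4950]
expected = [5000, 5000]

statistic, p_value = chi_square_two_groups(observed, expected)
verdict = "PASS" if p_value > 0.05 else "FAIL"
print(statistic, p_value, verdict)
```

A deviation of 50 users per arm out of 5,000 expected is well within chance, so this hypothetical split passes at the 0.05 threshold.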

In [14]:
def balance_check(df, variant_percentages):
    # Count distinct users per variant and attach each variant's configured share.
    df_agg = df.groupby("variant_name").agg(
        user_count = pd.NamedAgg(column = "user_id", aggfunc = "nunique")).reset_index()
    df_agg["variant_percentage"] = [variant_percentages[x] for x in df_agg["variant_name"]]

    pairwise_checks = []
    for pair in itertools.combinations(df_agg["variant_name"], 2):
        # Copy the two rows so that adding a column does not write to a slice of df_agg
        # (avoids pandas' SettingWithCopyWarning).
        df_tmp = df_agg[df_agg["variant_name"].isin(pair)].copy()

        # Rescale the configured shares so the expected counts sum to the pair's observed total.
        total_user_count         = df_tmp["user_count"].sum()
        total_variant_percentage = df_tmp["variant_percentage"].sum()
        df_tmp["expected_count"] = total_user_count * df_tmp["variant_percentage"] / total_variant_percentage

        chisq_results = stats.chisquare(f_obs = df_tmp["user_count"], f_exp = df_tmp["expected_count"])
        pairwise_checks.append((pair[0], pair[1], chisq_results.pvalue, "PASS" if chisq_results.pvalue > 0.05 else "FAIL"))

    return pd.DataFrame(data = pairwise_checks, columns = ("variant_name_1", "variant_name_2", "p_value", "pass_fail"))

In [15]:
df_exposure_log = pd.DataFrame(data = exposure_log, columns = ("user_id", "variant_name", "exposure_timestamp"))
balance_check(df_exposure_log, {"Control": 0.1, "Variation 1": 0.1})

| | variant_name_1 | variant_name_2 | p_value | pass_fail |
|---|---|---|---|---|
| 0 | Control | Variation 1 | 0.815904 | PASS |
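
For contrast, the sketch below shows what a failing check could look like when the configured split is uneven. It mirrors the rescaling step in `balance_check` (expected count = pair total × share ÷ sum of shares) using only the standard library. The counts, the weights, and the name `srm_check` are hypothetical, and the p-value formula applies to two variants only.

```python
import math

def srm_check(observed_counts, weights, alpha=0.05):
    """Hypothetical two-variant SRM check with weights rescaled to the observed total."""
    if len(observed_counts) != 2:
        raise ValueError("this sketch only handles exactly two variants")
    total_count = sum(observed_counts.values())
    total_weight = sum(weights[name] for name in observed_counts)
    # Rescale the configured weights so the expected counts sum to the observed total.
    expected = {
        name: total_count * weights[name] / total_weight
        for name in observed_counts
    }
    statistic = sum(
        (observed_counts[name] - expected[name]) ** 2 / expected[name]
        for name in observed_counts
    )
    # 1 degree of freedom: survival function is erfc(sqrt(statistic / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return {
        "expected": expected,
        "statistic": statistic,
        "p_value": p_value,
        "pass_fail": "PASS" if p_value > alpha else "FAIL",
    }

# Hypothetical experiment: 10% of traffic to Control, 30% to Variation 1,
# but Control received noticeably more users than its share implies.
result = srm_check(
    observed_counts={"Control": 2700, "Variation 1": 7300},
    weights={"Control": 0.1, "Variation 1": 0.3},
)
print(result)
```

Here the expected counts are 2,500 and 7,500, so a surplus of 200 users in Control is large enough to fail the check, which would usually prompt a look at assignment or exposure logging before trusting any metric results.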
