<a href="https://colab.research.google.com/github/sheniabosch/sql_business_exploration/blob/main/SB_ENIAC_case_study_CHI_SQUARE.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Libraries

In [None]:
import pandas as pd
from scipy.stats import chi2_contingency

# 1. Data for experiment

## 1.1 Importing Data

In [None]:
url = 'https://drive.google.com/file/d/18mPEn1pbX8wxwjlSHgJbV9p5ZAnXC7t9/view?usp=sharing'
path = 'https://drive.google.com/uc?export=download&id='+url.split('/')[-2]
eniac_a = pd.read_csv(path)

url = 'https://drive.google.com/file/d/1WCzwDDjATFsR_ZY_y3wwBy9T_PtWB6l6/view?usp=sharing'
path = 'https://drive.google.com/uc?export=download&id='+url.split('/')[-2]
eniac_b = pd.read_csv(path)

url = 'https://drive.google.com/file/d/1s_XE9YLhuSN9phcukmFPjS5QxqRC5IFM/view?usp=sharing'
path = 'https://drive.google.com/uc?export=download&id='+url.split('/')[-2]
eniac_c = pd.read_csv(path)

url = 'https://drive.google.com/file/d/1eOPqJBKT0AHyBpPgEGCJszlDf9ZUrk_0/view?usp=sharing'
path = 'https://drive.google.com/uc?export=download&id='+url.split('/')[-2]
eniac_d = pd.read_csv(path)

# 2. Experiment

Since we have 4 samples and the data is categorical ("click" or "no click"), we will use a chi-squared test to check whether the differences between the versions are statistically significant.

## 2.1 Stating the Null and the Alternative Hypotheses.

$H_0$: The 4 versions of the button are equally likely to receive clicks, and the observed differences are due to chance.

$H_1$: The observed differences are not due to chance: at least one version received so many more or so many fewer clicks than the others that chance alone can hardly explain it.
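
To make the null hypothesis concrete, here is a minimal sketch of what "equally likely to receive clicks" means numerically. It uses a small made-up 2x2 table (the `toy_*` names and counts are illustrative assumptions, not the Eniac data), computes the counts expected under $H_0$ by hand, and compares them with what `chi2_contingency` reports.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts, NOT the Eniac data:
# rows are click / no click, columns are two button versions.
toy_observed = np.array([[30, 10],
                         [70, 90]])

row_totals = toy_observed.sum(axis=1, keepdims=True)  # shape (2, 1)
col_totals = toy_observed.sum(axis=0, keepdims=True)  # shape (1, 2)
grand_total = toy_observed.sum()

# Expected count under H0 = row total * column total / grand total
toy_expected = row_totals * col_totals / grand_total

# Pearson chi-squared statistic, without continuity correction
toy_chisq = ((toy_observed - toy_expected) ** 2 / toy_expected).sum()

# correction=False turns off the Yates correction that scipy applies
# by default to tables with one degree of freedom (2x2 tables).
scipy_chisq, scipy_p, scipy_dof, scipy_expected = chi2_contingency(
    toy_observed, correction=False
)

print(toy_expected)
print(toy_chisq, scipy_chisq)
```

The larger the gap between observed and expected counts, the larger the statistic and the smaller the p-value.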

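## 2.2 Defining the alpha.

Since making a Type I error is less costly than making a Type II error in our case, a wider threshold of 90% confidence (alpha = 0.10) would be defensible. Note, however, that the cell below sets alpha to 0.05 (95% confidence), and that is the value all of the following results are based on.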

In [None]:
alpha = 0.05

## 2.3 Collecting Data

In [None]:
visit_a = 25326
click_a = eniac_a['No. clicks'].loc[eniac_a['Name'] == 'SHOP NOW'].squeeze()
no_click_a = visit_a - click_a

visit_b = 24747
click_b = eniac_b['No. clicks'].loc[eniac_b['Name'] == 'SHOP NOW'].squeeze()
no_click_b = visit_b - click_b

visit_c = 24876
click_c = eniac_c['No. clicks'].loc[eniac_c['Name'] == 'SEE DEALS'].squeeze()
no_click_c = visit_c - click_c

visit_d = 25233
click_d = eniac_d['No. clicks'].loc[eniac_d['Name'] == 'SEE DEALS'].squeeze()
no_click_d = visit_d - click_d


observed_data = pd.DataFrame([[click_a, click_b, click_c, click_d],[no_click_a, no_click_b, no_click_c, no_click_d]], columns=['Ver_A', 'Ver_B', 'Ver_C', 'Ver_D'], index=['Click', 'No_Click'])

## 2.4 Calculating Test Result

In [None]:
# Returns the test statistic, the p-value, the degrees of freedom and the expected counts under H0
chisq, pvalue, df, expected = chi2_contingency(observed_data)

## 2.5 Interpreting the test result

In [None]:
if pvalue < alpha:
    print('We reject the null hypothesis: the CTRs of the 4 versions are not equal, and the differences are not due to chance.')
else:
    print("We don't have enough evidence to reject the null hypothesis.")

We reject the null hypothesis: the CTRs of the 4 versions are not equal, and the differences are not due to chance.


## 2.6 Post-hoc

Since we concluded that the CTR differences between the 4 versions did not occur by chance and that the versions do perform differently, we need a post-hoc test to find out which pairs of versions differ.
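
Running one test per pair of versions inflates the chance of a false positive, so the significance threshold is divided by the number of pairwise tests (a Bonferroni correction). A minimal sketch of that arithmetic, using the version labels and the alpha of 0.05 from this notebook; `pairs` and `alpha_per_test` are names introduced here for illustration:

```python
from itertools import combinations

versions = ['Ver_A', 'Ver_B', 'Ver_C', 'Ver_D']
alpha = 0.05

# Every unordered pair of versions gets its own 2x2 test
pairs = list(combinations(versions, 2))

# Bonferroni correction: divide alpha by the number of tests
alpha_per_test = alpha / len(pairs)

print(len(pairs))                # 6
print(round(alpha_per_test, 5))  # 0.00833
```

A pair therefore only counts as significantly different when its p-value is below roughly 0.0083 rather than 0.05.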

In [None]:
# Calculate the CTR (in percent) for each version
ctr_dict = {}

for vers in observed_data.columns:
  clicks = observed_data.loc['Click', vers]
  no_click = observed_data.loc['No_Click', vers]
  ctr = round(clicks/(clicks+no_click)*100, 2)
  ctr_dict[vers] = ctr

In [None]:
# Bonferroni correction: split alpha across all pairwise comparisons
ver_q = observed_data.shape[1]
number_of_tests = int(ver_q*(ver_q-1)/2)
alpha_posthoc = alpha/number_of_tests

ver_names = observed_data.columns.tolist()

significant_results = []

# Compare every pair of versions exactly once
for i in range(ver_q):
    ver1_name = ver_names[i]
    for j in range(i+1, ver_q):
        ver2_name = ver_names[j]

        click1 = observed_data.loc['Click', ver1_name]
        no_click1 = observed_data.loc['No_Click', ver1_name]

        click2 = observed_data.loc['Click', ver2_name]
        no_click2 = observed_data.loc['No_Click', ver2_name]

        # 2x2 contingency table for this pair
        paired_vers = [[click1, no_click1],[click2, no_click2]]

        # For 2x2 tables chi2_contingency applies the Yates continuity correction by default
        chisq, pvalue, df, expected = chi2_contingency(paired_vers)

        if pvalue < alpha_posthoc:

            ctr1 = ctr_dict[ver1_name]
            ctr2 = ctr_dict[ver2_name]

            if ctr1 > ctr2:
                significant_results.append(f"{ver1_name} (CTR: {ctr1:.2f}%) > {ver2_name} (CTR: {ctr2:.2f}%)")
                print(f"  CONCLUSION: {ver1_name} (CTR: {ctr1:.2f}%) is significantly BETTER than {ver2_name} (CTR: {ctr2:.2f}%)")
            else:
                significant_results.append(f"{ver2_name} (CTR: {ctr2:.2f}%) > {ver1_name} (CTR: {ctr1:.2f}%)")
                print(f"  CONCLUSION: {ver2_name} (CTR: {ctr2:.2f}%) is significantly BETTER than {ver1_name} (CTR: {ctr1:.2f}%)")
        else:
            print(f"  CONCLUSION: No significant difference between {ver1_name} and {ver2_name}")

# If any pair differs significantly, list all versions from highest to lowest CTR
if significant_results:
    for v, c in sorted(ctr_dict.items(), key=lambda item: item[1], reverse=True):
        print(f" {v}: {c: .2f}%")
else:
    print("No statistically significant differences were found between any pairs.")

  CONCLUSION: Ver_A (CTR: 2.02%) is significantly BETTER than Ver_B (CTR: 1.14%)
  CONCLUSION: No significant difference between Ver_A and Ver_C
  CONCLUSION: Ver_A (CTR: 2.02%) is significantly BETTER than Ver_D (CTR: 0.76%)
  CONCLUSION: Ver_C (CTR: 2.12%) is significantly BETTER than Ver_B (CTR: 1.14%)
  CONCLUSION: Ver_B (CTR: 1.14%) is significantly BETTER than Ver_D (CTR: 0.76%)
  CONCLUSION: Ver_C (CTR: 2.12%) is significantly BETTER than Ver_D (CTR: 0.76%)
 Ver_C:  2.12%
 Ver_A:  2.02%
 Ver_B:  1.14%
 Ver_D:  0.76%


# Interpretation with HPR and DOR

In [None]:
# drop-off rate (DOR)
dor_a = 0.09
dor_b = 0
dor_c = 0.13
dor_d = 0.125


In [None]:
# homepage return rate (HPR)
hpr_a = 0.051
hpr_b = 0
hpr_c = 0.048
hpr_d = 0.26

In [None]:
# CTR values (higher is better)
ctrs = {
    'A': 2.02,
    'B': 1.14,
    'C': 2.12,
    'D': 0.76
}

# Drop-Off Rates (lower is better)
dors = {
    'A': 0.09,
    'B': 0.00,
    'C': 0.13,
    'D': 0.125
}

# Homepage Return Rates (lower is better)
hprs = {
    'A': 0.051,
    'B': 0.00,
    'C': 0.048,
    'D': 0.26
}

# Normalize function
def normalize(metric_dict, reverse=False):
    values = list(metric_dict.values())
    min_val = min(values)
    max_val = max(values)
    if max_val == min_val:
        return {k: 0.5 for k in metric_dict}  # handle division by zero
    return {
        k: (1 - (v - min_val) / (max_val - min_val)) if reverse else ((v - min_val) / (max_val - min_val))
        for k, v in metric_dict.items()
    }

# Normalize each metric
norm_ctr = normalize(ctrs)          # higher is better
norm_dor = normalize(dors, True)    # lower is better
norm_hpr = normalize(hprs, True)    # lower is better

# Weight for each metric (adjust as needed)
weights = {
    'ctr': 0.5,
    'dor': 0.25,
    'hpr': 0.25
}

# Compute final score
scores = {}
for version in ctrs.keys():
    scores[version] = (
        norm_ctr[version] * weights['ctr'] +
        norm_dor[version] * weights['dor'] +
        norm_hpr[version] * weights['hpr']
    )

# Rank versions
ranked_versions = sorted(scores.items(), key=lambda x: x[1], reverse=True)

# Output
print("Overall Ranking:")
for i, (version, score) in enumerate(ranked_versions, 1):
    print(f"{i}. Version_{version} - Score: {score:.4f}")


Overall Ranking:
1. Version_A - Score: 0.7411
2. Version_C - Score: 0.7038
3. Version_B - Score: 0.6397
4. Version_D - Score: 0.0096


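The final conclusion is to reject the null hypothesis. In the post-hoc tests, versions A and C each have a significantly higher CTR than versions B and D, while the difference between A and C is not significant. On the weighted score that combines CTR, DOR, and HPR, version A ranks first (0.7411, ahead of version C at 0.7038).
Eniac should continue to use the white "SHOP NOW" button.
The data collection errors that occurred in the HPR and DOR figures for version B should also be investigated: both are recorded as 0, which the normalization treats as the best possible value.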
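
To see how much the suspicious zeros matter, the following sketch recomputes only the DOR and HPR parts of version B's score, using the rates and the 0.25 weights from the cells above. The helper `normalize_lower_is_better` is a simplified stand-in for the `normalize(..., True)` call, introduced here for illustration.

```python
# Rates copied from the notebook cells above
dors = {'A': 0.09, 'B': 0.00, 'C': 0.13, 'D': 0.125}
hprs = {'A': 0.051, 'B': 0.00, 'C': 0.048, 'D': 0.26}

def normalize_lower_is_better(metric_dict):
    """Min-max normalize so the lowest value maps to 1 and the highest to 0."""
    lo = min(metric_dict.values())
    hi = max(metric_dict.values())
    return {k: 1 - (v - lo) / (hi - lo) for k, v in metric_dict.items()}

norm_dor = normalize_lower_is_better(dors)
norm_hpr = normalize_lower_is_better(hprs)

# Share of version B's total score that comes from the two zero-valued metrics
b_points_from_zeros = 0.25 * norm_dor['B'] + 0.25 * norm_hpr['B']
print(b_points_from_zeros)  # 0.5
```

Out of version B's total score of 0.6397, 0.5 comes from the two metrics whose values are suspected to be collection errors, so its third place should be read with caution.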

## **note**
pd.set_option("display.float_format", lambda x: "%.2f" % x)

Displays floats with two decimal places without changing the stored values; use it instead of rounding floats.
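
A minimal sketch of the effect, using a small made-up DataFrame (`demo` and its values are assumptions for illustration): the display is shortened while the stored values keep their full precision.

```python
import pandas as pd

# Show floats with two decimals in printed output only
pd.set_option("display.float_format", lambda x: "%.2f" % x)

# Hypothetical values, not taken from the Eniac data
demo = pd.DataFrame({"ctr": [2.123456, 0.7649]})

text = demo.to_string()
print(text)

# The underlying data is not rounded
print(demo["ctr"].iloc[0] == 2.123456)  # True

# Restore the default display behaviour
pd.reset_option("display.float_format")
```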


Check the bounce rate too; it is in the code above.

Create a visualization to round out the project.