<a href="https://colab.research.google.com/github/hannamakarova/AB_Test_Analysis/blob/main/P1_AB_testing_results_full.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# A/B Test Analysis
Goal: Analyze A/B test results to determine whether differences in key metrics between the experimental and control groups are statistically significant.

Metrics Analyzed:
* add_payment_info/session
* add_shipping_info/session
* begin_checkout/session
* new account/session

Segments: Results will be analyzed by device (e.g., mobile, desktop) and channel (e.g., Organic Search, Paid Search).

Steps in the Project

Load Data:

* Import the dataset and check its structure.
* Clean missing or incorrect data.

Define Metrics:

Calculate conversion rates for each metric using:
* Numerator: Successful events (e.g., add_payment_info).
* Denominator: Total number of sessions.
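As a minimal sketch of this definition, assuming a long-format table with `event_name` and `value` columns (the layout the analysis code in this notebook filters on), the rate for one event could be computed as follows. The helper name `conversion_rate` and the sample rows are hypothetical.

```python
import pandas as pd


def conversion_rate(df: pd.DataFrame, event_name: str) -> float:
    """Return events per session, or NaN when the frame has no sessions."""
    numerator = df.loc[df["event_name"] == event_name, "value"].sum()
    denominator = df.loc[df["event_name"] == "session", "value"].sum()
    return numerator / denominator if denominator > 0 else float("nan")


# Hypothetical sample rows, for illustration only
sample = pd.DataFrame({
    "event_name": ["session", "add_payment_info", "session", "add_payment_info"],
    "value": [100, 10, 300, 30],
})

rate = conversion_rate(sample, "add_payment_info")
print(rate)
```

Returning NaN for a segment with no sessions is a choice made for this sketch; the analysis code in this notebook reports 0 in that case.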

Segment Data:

Group results by test, device, and channel for detailed insights.
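One possible way to sketch this grouping is with `groupby` followed by `unstack`, assuming the columns `test`, `device`, `channel`, `test_group`, `event_name`, and `value` that the analysis code reads. The sample rows below are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows in the assumed long format
sample = pd.DataFrame({
    "test": [2, 2, 2, 2],
    "device": ["desktop", "desktop", "mobile", "mobile"],
    "channel": ["Direct", "Direct", "Direct", "Direct"],
    "test_group": [1, 1, 1, 1],
    "event_name": ["session", "begin_checkout", "session", "begin_checkout"],
    "value": [200, 20, 100, 5],
})

# Sum event counts per segment and group, then spread event names into columns
counts = (
    sample
    .groupby(["test", "device", "channel", "test_group", "event_name"])["value"]
    .sum()
    .unstack("event_name", fill_value=0)
)

# One rate column per metric; only begin_checkout is shown here
counts["begin_checkout/session"] = counts["begin_checkout"] / counts["session"]
print(counts)
```

Unlike nested loops over every unique value, this approach yields only the test, device, and channel combinations that actually occur in the data.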

Statistical Testing:

Compare experimental and control groups using:

* A pooled two-proportion z-statistic to measure the difference in conversion rates.
* A two-sided p-value to check significance.
* A flag marking each result as "significant" (p-value < 0.05) or "not significant."
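The comparison can be sketched as a pooled two-proportion z-test. This is an illustrative helper, not a definitive implementation: the function name `two_proportion_z_test` and the example counts are hypothetical, and the test assumes each session contributes at most one event, so that the rates lie between 0 and 1.

```python
import math

from scipy.stats import norm


def two_proportion_z_test(x_exp, n_exp, x_ctl, n_ctl):
    """Pooled two-sided z-test for two proportions; returns (z, p_value)."""
    if n_exp <= 0 or n_ctl <= 0:
        return float("nan"), float("nan")
    p_exp, p_ctl = x_exp / n_exp, x_ctl / n_ctl
    # Pooled rate under the null hypothesis that both groups share one rate
    p_pool = (x_exp + x_ctl) / (n_exp + n_ctl)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_exp + 1 / n_ctl))
    if se == 0:
        return float("nan"), float("nan")
    z = (p_exp - p_ctl) / se
    # norm.sf is the survival function, 1 - cdf, computed without cancellation
    return z, 2 * norm.sf(abs(z))


# Hypothetical counts: 120 of 1000 sessions (experiment) vs 100 of 1000 (control)
z, p = two_proportion_z_test(120, 1000, 100, 1000)
significant = p < 0.05
print(z, p, significant)
```

With these hypothetical counts the difference is not significant at the 0.05 level, even though the experimental rate is 20% higher in relative terms.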

Export Results:

* Save the analysis in a CSV file.
* Highlight significant results for easy visualization.
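A small sketch of the export step, assuming a results table with `p_value` and `significant` columns like the one this notebook builds. The rows are hypothetical, and sorting significant rows to the top is just one way to make them easy to spot before the file goes to a visualization tool.

```python
import pandas as pd

# Hypothetical results rows, for illustration only
results = pd.DataFrame({
    "metric": ["begin_checkout/session", "add_payment_info/session", "add_shipping_info/session"],
    "p_value": [0.01, 0.20, 0.04],
})
results["significant"] = results["p_value"] < 0.05

# Put significant rows first, smallest p-value on top
ordered = results.sort_values(["significant", "p_value"], ascending=[False, True])

# With no path argument, to_csv returns the CSV text instead of writing a file
csv_text = ordered.to_csv(index=False)
print(csv_text)
```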


Tableau: https://public.tableau.com/app/profile/hanna.makarova5369/viz/ABTest_17371305971180/Story1?publish=yes

Drive: https://drive.google.com/file/d/1qRmKA5LQZmJ5JvtVIoGtGjBpGI7yfAsQ/view?usp=sharing

In [None]:
import pandas as pd
from scipy.stats import norm
import numpy as np

# Mount Google Drive
from google.colab import drive
drive.mount("/content/drive")
%cd /content/drive/MyDrive/Python_data

# Load the dataset
data = pd.read_csv("ab_testing.csv")

# Define the metrics to analyze and their corresponding event names
metrics = {
    "add_payment_info/session": "add_payment_info",
    "add_shipping_info/session": "add_shipping_info",
    "begin_checkout/session": "begin_checkout",
    "new account/session": "new account",
}

# Initialize a list to store aggregated results
aggregated_data = []

# Iterate through all unique combinations of test, device, and channel
for test in data['test'].unique():
    for device in data['device'].unique():
        for channel in data['channel'].unique():
            # Filter data for the current combination of test, device, and channel
            test_device_channel_data = data[
                (data['test'] == test) &
                (data['device'] == device) &
                (data['channel'] == channel)
            ]

            # Calculate metrics for each defined metric/event
            for metric, event_name in metrics.items():
                # Calculate numerator and denominator for experiment group (test_group == 2)
                numerator_ev = test_device_channel_data[
                    (test_device_channel_data['test_group'] == 2) &
                    (test_device_channel_data['event_name'] == event_name)
                ]['value'].sum()

                denominator_ev = test_device_channel_data[
                    (test_device_channel_data['test_group'] == 2) &
                    (test_device_channel_data['event_name'] == "session")
                ]['value'].sum()

                # Calculate numerator and denominator for control group (test_group == 1)
                numerator_co = test_device_channel_data[
                    (test_device_channel_data['test_group'] == 1) &
                    (test_device_channel_data['event_name'] == event_name)
                ]['value'].sum()

                denominator_co = test_device_channel_data[
                    (test_device_channel_data['test_group'] == 1) &
                    (test_device_channel_data['event_name'] == "session")
                ]['value'].sum()

                # Calculate conversion rates
                conversion_rate_ev = numerator_ev / denominator_ev if denominator_ev != 0 else 0
                conversion_rate_co = numerator_co / denominator_co if denominator_co != 0 else 0

                # Calculate percentage change in the metric
                metric_change = (conversion_rate_ev / conversion_rate_co * 100 - 100) if conversion_rate_co != 0 else 0

                # Calculate statistical significance (z-statistic and p-value)
                p_pool = (numerator_ev + numerator_co) / (denominator_ev + denominator_co) if denominator_ev + denominator_co > 0 else 0
                se_pool = np.sqrt(p_pool * (1 - p_pool) * (1 / denominator_ev + 1 / denominator_co)) if denominator_ev > 0 and denominator_co > 0 else 0
                z_stat = (conversion_rate_ev - conversion_rate_co) / se_pool if se_pool > 0 else 0
                # norm.sf (survival function) avoids the precision loss of 1 - norm.cdf for large |z|
                p_value = 2 * norm.sf(abs(z_stat))

                # Append the results for this metric
                aggregated_data.append({
                    "test_number": test,
                    "device": device,
                    "channel": channel,
                    "metric": metric,
                    "numerator_ev": numerator_ev,
                    "denominator_ev": denominator_ev,
                    "conversion_rate_ev": conversion_rate_ev,
                    "numerator_co": numerator_co,
                    "denominator_co": denominator_co,
                    "conversion_rate_co": conversion_rate_co,
                    "metric_change_%": metric_change,
                    "z_stat": z_stat,
                    "p_value": p_value,
                    "significant": p_value < 0.05  # True if p-value < 0.05
                })

# Convert the aggregated results into a DataFrame
results = pd.DataFrame(aggregated_data)

# Save the results to a CSV file
results.to_csv('aggregated_results.csv', index=False)

# Display the results
print(results)

# Download the results file to local machine
from google.colab import files
files.download('aggregated_results.csv')


Mounted at /content/drive
/content/drive/MyDrive/Python_data
     test_number   device         channel                     metric  \
0              2  desktop  Organic Search   add_payment_info/session   
1              2  desktop  Organic Search  add_shipping_info/session   
2              2  desktop  Organic Search     begin_checkout/session   
3              2  desktop  Organic Search        new account/session   
4              2  desktop     Paid Search   add_payment_info/session   
..           ...      ...             ...                        ...   
235            3   tablet       Undefined        new account/session   
236            3   tablet          Direct   add_payment_info/session   
237            3   tablet          Direct  add_shipping_info/session   
238            3   tablet          Direct     begin_checkout/session   
239            3   tablet          Direct        new account/session   

     numerator_ev  denominator_ev  conversion_rate_ev  numerator_co  \
0  
