### Evaluating the Influence of Color Alteration on Webpage Click Rates through A/B Testing

A fictitious company called "Quantum Dynamics Innovations, Inc." has made a change to a prominent call-to-action (CTA) button that takes visitors to the contact page.

__Original Design (A):__

CTA button color: Blue

__Variation (B):__

CTA button color: Green

**The goal is to determine if the color change significantly influences user behavior and whether it improves or hinders the clicks to the contact page.**

A Bayesian approach will be used.

***
#### Importing and cleaning the test results

In [19]:
import pandas as pd
from datetime import datetime, timedelta
from scipy.stats import beta
import numpy as np

In [20]:
print("First 5 rows of dataset")
a_b_test = pd.read_csv("data/ab_data.csv")
print(a_b_test.head())
print("\n","shape of dataset (rows, columns)")
print(a_b_test.shape)
print("\n","column names and datatypes")
print(a_b_test.dtypes)
print("\n","NaN values")
print(a_b_test.isnull().sum())

First 5 rows of dataset
   user_id                   timestamp      group landing_page  converted
0   851104  2017-01-21 22:11:48.556739    control     old_page          0
1   804228  2017-01-12 08:01:45.159739    control     old_page          0
2   661590  2017-01-11 16:55:06.154213  treatment     new_page          0
3   853541  2017-01-08 18:28:03.143765  treatment     new_page          0
4   864975  2017-01-21 01:52:26.210827    control     old_page          1

 shape of dataset (rows, columns)
(294478, 5)

 column names and datatypes
user_id          int64
timestamp       object
group           object
landing_page    object
converted        int64
dtype: object

 NaN values
user_id         0
timestamp       0
group           0
landing_page    0
converted       0
dtype: int64


The data has no missing values, and the column names and datatypes are as expected (timestamp is stored as a string and is parsed where needed). The columns are defined as follows:

- __user_id__: This should be unique and record the first time a given user visited the website, identified via the user's cookies. __Each user should belong to only one group.__

- __timestamp__: The time at which the user visited the page.

- __group__: Whether the user is part of the control group (old page) or the treatment group (new page).

- __landing_page__: Which page the user was shown, the old page (no change to button) or the new page (button changed to green).

- __converted__: Whether the user clicked on the button to be taken to the contact details page. 1 for click, 0 for no-click.


The user_id field will now be validated for uniqueness.

In [21]:
print("Duplicate values in user_id")
duplicate_user_id = a_b_test[a_b_test.duplicated(subset='user_id', keep=False)].sort_values(by='user_id')
print(duplicate_user_id.head())

print("\n","How many duplicate values in user_id")
print(duplicate_user_id.shape)

print("\n","% of entries that are for user_id's that have duplicates")
print(round((duplicate_user_id.shape[0]/a_b_test.shape[0]),4)*100)

Duplicate values in user_id
        user_id                   timestamp      group landing_page  converted
230259   630052  2017-01-17 01:16:05.208766  treatment     new_page          0
213114   630052  2017-01-07 12:25:54.089486  treatment     old_page          1
22513    630126  2017-01-14 13:35:54.778695  treatment     old_page          0
251762   630126  2017-01-19 17:16:00.280440  treatment     new_page          0
183371   630137  2017-01-20 02:08:49.893878    control     old_page          0

 How many duplicate values in user_id
(7788, 5)

 % of entries that are for user_id's that have duplicates
2.64


These non-unique user_ids have entries where the user is assigned to one group but has been shown both versions of the page. This breaks the controls of the experiment and must have happened in error. In practice, guaranteeing that each user only ever sees one version of a webpage is difficult, and such errors do occur.

The first visit to the webpage for each user_id will be taken as the ground truth.

In [22]:
a_b_test = a_b_test.drop_duplicates(subset='user_id', keep=False)

user_ids_first_value = duplicate_user_id.loc[duplicate_user_id.groupby('user_id')['timestamp'].idxmin()]

a_b_test = pd.concat([a_b_test,user_ids_first_value],axis=0)

print("duplicate user_id values in cleaned dataset")
print(a_b_test[a_b_test.duplicated(subset='user_id', keep=False)].sort_values(by='user_id').shape[0])


print("\n","how many weeks the experiment ran for?")
print((datetime.strptime(a_b_test["timestamp"].max(), '%Y-%m-%d %H:%M:%S.%f') - datetime.strptime(a_b_test["timestamp"].min(), '%Y-%m-%d %H:%M:%S.%f'))/ timedelta(weeks=1))

duplicate user_id values in cleaned dataset
0

 how many weeks the experiment ran for?
3.1428390904877643
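
The same "keep the earliest visit per user" rule can also be written in a single step. The sketch below is an alternative, not the method used in the cell above. It uses a small made-up frame (the `toy` name and its rows are illustrative, not taken from the dataset) and assumes the timestamps are ISO-formatted strings, so that sorting them as text matches chronological order.

```python
import pandas as pd

# Toy data (hypothetical rows): user 1 appears twice, user 2 once.
toy = pd.DataFrame({
    "user_id": [1, 2, 1],
    "timestamp": [
        "2017-01-17 01:16:05.208766",
        "2017-01-12 08:01:45.159739",
        "2017-01-07 12:25:54.089486",
    ],
    "landing_page": ["new_page", "old_page", "old_page"],
})

# Sort chronologically, then keep the first row seen for each user_id.
first_visits = (
    toy.sort_values("timestamp")
       .drop_duplicates(subset="user_id", keep="first")
       .reset_index(drop=True)
)
print(first_visits)
```

Applied to the real frame, this would replace the drop, `idxmin` and `concat` steps with one chained expression, at the cost of sorting the whole dataset.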


The timestamp field will now be used to create a new field, "week", containing the ISO calendar week of each visit.

In [23]:
a_b_test['week'] = a_b_test['timestamp'].apply(lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S.%f').isocalendar()[1])

print("number of users that visited per week")
print(a_b_test['week'].value_counts())

number of users that visited per week
week
2    92734
3    91588
1    85735
4    20527
Name: count, dtype: int64
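
The ISO week number works as an experiment week here because all visits fall in ISO weeks 1 to 4. For an experiment that starts mid-year, a week counted from the start of the experiment may be more robust. The sketch below shows one way to do that with the standard library; `week_of_experiment` is a hypothetical helper and the start timestamp is an illustrative value, not read from the dataset.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

def week_of_experiment(timestamp: str, start: str) -> int:
    """Return 1 for the first 7 days after `start`, 2 for the next 7, and so on."""
    elapsed = datetime.strptime(timestamp, FMT) - datetime.strptime(start, FMT)
    return elapsed.days // 7 + 1

# Hypothetical start time, chosen only for illustration.
start = "2017-01-02 13:42:05.378582"
print(week_of_experiment("2017-01-08 18:28:03.143765", start))
print(week_of_experiment("2017-01-21 22:11:48.556739", start))
```

Note that this counts 7-day blocks from the first visit rather than Monday-to-Sunday calendar weeks, so the weekly totals would differ slightly from the ISO-based counts above.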


In [24]:
a_b_test['converted_category'] = a_b_test['converted'].apply(lambda x: 'Clicked' if x==1 else 'Did not click')

data_control = a_b_test[(a_b_test['week']!=1) & (a_b_test['group']=='control')][['converted_category','converted']]
data_treatment = a_b_test[(a_b_test['week']!=1) & (a_b_test['group']=='treatment')][['converted_category','converted']]

print("Control group: overall % split of click vs non click\n\n", round(a_b_test[(a_b_test['week']!=1) & (a_b_test['group']=='control')]['converted_category'].value_counts(normalize=True)*100,2))

print("\n\nTreatment group: overall % split of click vs non click\n\n", round(a_b_test[(a_b_test['week']!=1) & (a_b_test['group']=='treatment')]['converted_category'].value_counts(normalize=True)*100,2))

print("\n\nBefore properly analysing it could be hypothesised that the experiment did not have a positive effect on proportion of users clicking the button, or that it even had a detrimental effect.")

Control group: overall % split of click vs non click

 converted_category
Did not click    87.92
Clicked          12.08
Name: proportion, dtype: float64


Treatment group: overall % split of click vs non click

 converted_category
Did not click    88.08
Clicked          11.92
Name: proportion, dtype: float64


Before properly analysing it could be hypothesised that the experiment did not have a positive effect on proportion of users clicking the button, or that it even had a detrimental effect.


***
#### Analysing the Test Results

A Bayesian approach to this analysis requires a _prior distribution_. For the purposes of this report, the control group's data from the first week of the experiment will be used to estimate the prior distribution.

_In practice, this data would be available before the experiment begins._

The first week's data will be sampled randomly 10,000 times, with 2,000 rows drawn per sample and the mean conversion calculated. Each mean will be appended to a list.


In [25]:
prior = a_b_test[(a_b_test['week'] == 1) & (a_b_test['group']=='control')]

prior_means = []

for i in range(10000):
    prior_means.append(prior.sample(2000)['converted'].mean())

Using the beta.fit function, the values for 'prior_alpha' and 'prior_beta' will be estimated.

floc=0 and fscale=1 are used to fix the location and the scale respectively. This is because the beta distribution is defined on the interval [0, 1], which matches mean conversion rates (conversion cannot exceed 100%, or 1.0).
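
As a rough cross-check on the fitted parameters, a beta distribution on [0, 1] can also be matched to a sample mean and variance by the method of moments. This is a swapped-in technique for illustration, not what beta.fit does (it uses maximum likelihood), and `beta_from_moments` is a hypothetical helper.

```python
def beta_from_moments(mean: float, variance: float) -> tuple[float, float]:
    """Method-of-moments estimates (alpha, beta) for a beta distribution on [0, 1]."""
    if not 0 < mean < 1:
        raise ValueError("mean must lie strictly between 0 and 1")
    if not 0 < variance < mean * (1 - mean):
        raise ValueError("variance must be positive and below mean * (1 - mean)")
    common = mean * (1 - mean) / variance - 1
    return mean * common, (1 - mean) * common

# Beta(2, 3) has mean 0.4 and variance 0.04, so the helper should recover roughly (2, 3).
alpha_hat, beta_hat = beta_from_moments(0.4, 0.04)
print(alpha_hat, beta_hat)
```

With the notebook's data, the inputs would be the mean and variance of the list of sampled means; values far from the fitted 'prior_alpha' and 'prior_beta' would be a hint that the fit deserves a closer look.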

For each week of the experiment, the results will be calculated to show how each additional week affects the confidence in, and validity of, the result.

In [26]:
prior_alpha, prior_beta, _, _ = beta.fit(prior_means, floc=0, fscale=1)

for i,week in enumerate(sorted(a_b_test['week'].unique()[a_b_test['week'].unique()>1])):
    experiment_data = a_b_test[(a_b_test['week'] > 1) & (a_b_test['week'] <= week)]
    control_data = experiment_data[experiment_data['group']=='control']['converted']
    treatment_data = experiment_data[experiment_data['group']=='treatment']['converted']

    # calculate conversion counts, conversion rates (%) and overall lift (the relative difference in conversion rate between treatment and control)
    control_converted = control_data.sum()
    treatment_converted = treatment_data.sum()
    control_non_converted = len(control_data) - control_converted
    treatment_non_converted = len(treatment_data) - treatment_converted
    control_conversion = round(control_data.sum() * 100/ len(control_data), 3)
    treatment_conversion = round(treatment_data.sum() * 100/ len(treatment_data), 3)
    lift = round((treatment_conversion - control_conversion) / control_conversion , 3)

    # calculate posterior parameters with conversion rates
    posterior_control = beta(prior_alpha + control_converted, prior_beta + control_non_converted)
    posterior_treatment = beta(prior_alpha + treatment_converted, prior_beta + treatment_non_converted)

    # sample from posteriors
    control_samples = posterior_control.rvs(2000)
    treatment_samples = posterior_treatment.rvs(2000)
    probability = np.mean(treatment_samples > control_samples)

    # calculate mean and variance of the posterior for control and treatment groups
    (control_mu), (control_var) = posterior_control.stats()
    (treatment_mu), (treatment_var) = posterior_treatment.stats()
    lift_percentage = (treatment_samples - control_samples) / control_samples

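    # label the last, partial week with its actual elapsed duration: total run time in weeks minus the week used for the prior
    # (i == 2 assumes exactly three experiment weeks after the prior week)
    if i == 2:
        i = round((datetime.strptime(a_b_test["timestamp"].max(), '%Y-%m-%d %H:%M:%S.%f') - datetime.strptime(a_b_test["timestamp"].min(), '%Y-%m-%d %H:%M:%S.%f'))/ timedelta(weeks=1),2) - 1
    else:
        i = i+1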

    print(f"\nRunning the experiment for {i} weeks","\n", f"Probability that the treatment > control: {probability*100:.2f}%","\n", f"by this much lift: {lift*100:.2f}%")
    print(f"Control Posterior: Mean: {control_mu:.3f}, Variance: {control_var:.8f}") 
    print(f"Treatment Posterior: Mean: {treatment_mu:.3f}, Variance: {treatment_var:.8f}") 



Running the experiment for 1 weeks 
 Probability that the treatment > control: 25.60% 
 by this much lift: -1.20%
Control Posterior: Mean: 0.120, Variance: 0.00000217
Treatment Posterior: Mean: 0.118, Variance: 0.00000215

Running the experiment for 2 weeks 
 Probability that the treatment > control: 18.00% 
 by this much lift: -1.20%
Control Posterior: Mean: 0.121, Variance: 0.00000112
Treatment Posterior: Mean: 0.119, Variance: 0.00000111

Running the experiment for 2.14 weeks 
 Probability that the treatment > control: 12.80% 
 by this much lift: -1.40%
Control Posterior: Mean: 0.121, Variance: 0.00000102
Treatment Posterior: Mean: 0.119, Variance: 0.00000101
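
The loop above relies on the beta prior being conjugate to binary click data: adding the clicks to alpha and the non-clicks to beta gives the posterior directly. Below is a minimal standard-library sketch of that update and of the Monte Carlo comparison. The prior and the counts are made up for illustration (none of the numbers come from the experiment), and the helper names are hypothetical.

```python
import random

def posterior_params(prior_alpha, prior_beta, clicks, visitors):
    """Beta-Binomial update: clicks add to alpha, non-clicks add to beta."""
    return prior_alpha + clicks, prior_beta + (visitors - clicks)

def beta_mean_var(a, b):
    """Closed-form mean and variance of a Beta(a, b) distribution."""
    total = a + b
    return a / total, a * b / (total ** 2 * (total + 1))

def prob_treatment_beats_control(control, treatment, draws=2000, seed=0):
    """Monte Carlo estimate of P(treatment rate > control rate)."""
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(*treatment) > rng.betavariate(*control)
        for _ in range(draws)
    )
    return wins / draws

# Hypothetical prior and counts, for illustration only.
control = posterior_params(20.0, 150.0, clicks=1200, visitors=10000)
treatment = posterior_params(20.0, 150.0, clicks=1150, visitors=10000)
print(beta_mean_var(*control))
print(prob_treatment_beats_control(control, treatment))
```

Because the estimate comes from a finite number of random draws, repeated runs without a fixed seed will give slightly different probabilities; increasing `draws` reduces that noise.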


__Treatment__: button changed colour from blue to green. No other changes to the webpage or its content.

__Control__: no change to the webpage.

__Conversion__: user clicked and proceeded to the Contact section of the webpage.

We can conclude that __after 2.14 weeks of the experiment__ the treatment conversion was greater than the control in only 256 out of 2,000 posterior samples. __This gives an estimated 12.80% probability that the treatment has a greater conversion rate than the control.__

__The control had a greater conversion than the treatment in 1,744 out of 2,000 posterior samples.__ Furthermore, __the observed lift in conversion is -1.4%.__ The evidence suggests that __the change in button colour__ had a __detrimental impact on users clicking it__ and proceeding to the Contact section of the webpage.

__The variances of the posterior distributions__ were very small at 0.00000102 and 0.00000101 for the control and treatment respectively. This means each posterior is __concentrated around its mean__, which gives __greater confidence in the result.__
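
To make "concentrated around the mean" more concrete, a posterior mean and variance can be turned into an approximate 95% interval. The sketch below assumes a normal approximation to the beta posterior (reasonable when the counts are large, but an assumption nonetheless) and uses the rounded means and variances printed above, so the intervals inherit that rounding. `approx_credible_interval` is a hypothetical helper.

```python
from math import sqrt

def approx_credible_interval(mean, variance, z=1.96):
    """Approximate central 95% interval, treating the posterior as normal."""
    half_width = z * sqrt(variance)
    return mean - half_width, mean + half_width

# Rounded posterior means and variances reported for the final 2.14-week run.
control_ci = approx_credible_interval(0.121, 0.00000102)
treatment_ci = approx_credible_interval(0.119, 0.00000101)
print(control_ci)
print(treatment_ci)
```

For exact intervals, the quantiles of the fitted beta posteriors could be used instead of the normal approximation.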

__The advice would be to not proceed with the change and to stop the experiment.__

__In practice this conclusion could have been reached earlier.__ This is one of the benefits of Bayesian testing, as __you can continuously monitor your results.__
