# s4ch10
## A/B Testing: Improving the User Experience

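### Explore the Data from the Experiment

The library homepage's original category title, **Interact**, was tested against four alternative titles:
- Connect
- Learn
- Help
- Services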

The metrics deemed most relevant to track were the following:
- **Click-through rate for the homepage.** Selected as a measure of the initial ability of the category title to attract users.
- **Drop-off rate for the category pages.** The percentage of users who leave the site from a given page, selected as a measure of the ability of the category page to meet user expectations.
- **Homepage-return rate for the category pages.** The percentage of users who navigated from the library homepage to the category page and then returned to the homepage. This sequence of actions hints at whether users found what they wanted on the category page; if not, they would likely return to the homepage to continue navigating. Homepage-return rate was therefore also selected as a measure of the ability of the category page to meet user expectations.
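The three metrics can be pinned down with a minimal sketch. It assumes a hypothetical session log in which each session is the ordered list of pages a visitor viewed (the Crazy Egg exports analysed below only contain click counts), and it counts sessions rather than users. The page names, the sample sessions, and the helpers `click_through_rate`, `drop_off_rate`, and `homepage_return_rate` are illustrative, not part of the case study's tooling.

```python
from typing import List

# Hypothetical log format: one list of page names per visitor session, in viewing order.
Session = List[str]


def click_through_rate(sessions: List[Session], home: str, category: str) -> float:
    """Percentage of homepage views immediately followed by a move to `category`."""
    views = clicks = 0
    for pages in sessions:
        for i, page in enumerate(pages):
            if page == home:
                views += 1
                if i + 1 < len(pages) and pages[i + 1] == category:
                    clicks += 1
    return 100 * clicks / views if views else 0.0


def drop_off_rate(sessions: List[Session], page: str) -> float:
    """Percentage of sessions that included `page` and ended (left the site) on it."""
    visited = [pages for pages in sessions if page in pages]
    exits = sum(1 for pages in visited if pages[-1] == page)
    return 100 * exits / len(visited) if visited else 0.0


def homepage_return_rate(sessions: List[Session], home: str, category: str) -> float:
    """Percentage of home -> category moves followed directly by a return to home."""
    arrivals = returns = 0
    for pages in sessions:
        for i in range(len(pages) - 1):
            if pages[i] == home and pages[i + 1] == category:
                arrivals += 1
                if i + 2 < len(pages) and pages[i + 2] == home:
                    returns += 1
    return 100 * returns / arrivals if arrivals else 0.0


# Made-up sessions, for illustration only.
sessions = [
    ["home", "interact", "home", "services"],
    ["home", "interact"],
    ["home", "search"],
    ["home", "interact", "room-booking"],
]
print(click_through_rate(sessions, "home", "interact"))    # 60.0 (3 of 5 homepage views)
print(drop_off_rate(sessions, "interact"))                 # 1 of 3 sessions, about 33.3
print(homepage_return_rate(sessions, "home", "interact"))  # 1 of 3 arrivals, about 33.3
```

Walking through the page sequence makes the difference between the two category-page metrics visible: drop-off looks at where a session ends, while homepage-return looks at the page viewed right after the category page.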

It was decided that for a version to be considered superior, a minimum increase in click-through rate of 40% had to be detected.
When the variations from the original website are very stark, a conservative approach is to expose only a fraction of visitors to the test. Because the test variables in this case study were largely non-disruptive, 100% of website visitors were included.
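
As a worked example of the 40% threshold, assuming it is a relative increase (not percentage points) and taking the roughly 2% current CTR for Interact quoted below as the baseline:

$$
\text{lift} = \frac{\text{CTR}_{\text{variant}} - \text{CTR}_{\text{Interact}}}{\text{CTR}_{\text{Interact}}} \ge 0.40
\quad\Longrightarrow\quad
\text{CTR}_{\text{variant}} \ge 1.40 \times 2\% = 2.8\%
$$
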

Currently, the CTR for Interact sits at around 2%, and the page has around 1,650 visitors every day. The desired statistical significance was 90%. With these numbers, a power calculator such as AB Tasty's can be used to decide on the length of the test. The length of the experiment was set at 21 days:
```
[from AB Tasty]

How many users do you need?
- conversion rate: 2%
- minimum detectable effect: 30%
- statistical significance: 90%
- (statistical power: 80%)
- required number of tested visitors per variation: 7,057

How long should your test run?
- average daily visitors: 1,650
- number of variations: 5
- required duration: 21 days
```
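The calculator's output can be cross-checked with a short sketch. The sample-size part uses the textbook normal-approximation formula for a two-sided, two-proportion test; this is an assumption, since AB Tasty's exact method is not stated here, and with these inputs the textbook formula gives roughly 7,700 visitors per variation rather than 7,057. The duration part is plain arithmetic: 5 variations × 7,057 visitors ÷ 1,650 visitors per day ≈ 21.4 days. Both helper names are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_variation(p_base: float, rel_mde: float,
                              alpha: float = 0.10, power: float = 0.80) -> int:
    """Visitors per variation for a two-sided two-proportion z-test
    (textbook normal approximation; not necessarily AB Tasty's method)."""
    p_new = p_base * (1 + rel_mde)            # CTR the test should be able to detect
    p_bar = (p_base + p_new) / 2              # average of the two rates
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    spread = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
              + z_beta * math.sqrt(p_base * (1 - p_base) + p_new * (1 - p_new)))
    return math.ceil(spread ** 2 / (p_new - p_base) ** 2)


def duration_days(visitors_per_variation: int, n_variations: int,
                  daily_visitors: float) -> float:
    """Days until every variation reaches its sample size, with traffic split evenly."""
    return visitors_per_variation * n_variations / daily_visitors


print(sample_size_per_variation(0.02, 0.30))        # textbook estimate for these inputs
print(round(duration_days(7_057, 5, 1_650), 1))     # 21.4
```

Note how strongly the required sample depends on the minimum detectable effect: raising it from 30% to 40% shrinks the per-variation sample considerably, which is why the effect size has to be fixed before the test starts.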
The test ran from 29 May 2013 to 18 June 2013.

Attached at the bottom of this lesson you will find the data extracted from Crazy Egg, a service that tracks website traffic and provides insights and well-structured data.

Explore the data and tackle these questions:
- What was the click-through rate for each version?
- Which version was the winner?
- Do the results seem conclusive?

In the next lesson we will perform the significance test that tells us whether the results are statistically significant.

### Preparation

```python
# import libraries
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats
import math
from scipy.stats import ttest_1samp

# pd.options.display.float_format = '{:.2f}'.format
```

```python
# load the Crazy Egg CSV exports, one per homepage version
path = '/Users/riasnazary/data/CrazyEgg/'
interact = pd.read_csv(path + 'hp_v1_interact/v1-interact.csv')
connect = pd.read_csv(path + 'hp_v2_connect/v2_connect.csv')
learn = pd.read_csv(path + 'hp_v3_learn/v3_learn.csv')
help1 = pd.read_csv(path + 'hp_v4_help/v4_help.csv')  # 'help1' avoids shadowing the built-in help()
services = pd.read_csv(path + 'hp_v5_services/v5_services.csv')
```

### What was the click-through rate for each version?

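The click-through rate (CTR) of a version is 100 × clicks / impressions. Here the number of homepage visits reported by Crazy Egg stands in for impressions, and only clicks on the tested category title are counted.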

```python
# rename the 'No. clicks' column to 'clicks' in every export
for df in (interact, connect, learn, help1, services):
    df.rename(columns={'No. clicks': 'clicks'}, inplace=True)
```

```python
# row 1, column 5 of each export holds the Crazy Egg summary line
# (creation date, recording time, total visits and total clicks on the page)
pd.set_option('display.max_colwidth', None)
(
    interact.iat[1, 5],
    connect.iat[1, 5],
    learn.iat[1, 5],
    help1.iat[1, 5],
    services.iat[1, 5],
)
```

```
('created 5-29-2013   •   20 days 4 hours 21 mins   •   10283 visits, 3714 clicks',
 'created 5-29-2013   •   20 days 7 hours 34 mins   •   2742 visits, 1587 clicks',
 'created 5-29-2013   •   20 days 12 hours 21 mins   •   2747 visits, 1652 clicks',
 'created 5-29-2013   •   20 days 4 hours 59 mins   •   3180 visits, 1717 clicks',
 'created 5-29-2013   •   20 days 4 hours 59 mins   •   2064 visits, 1348 clicks')
```
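Instead of typing the visit totals into the CTR calculation by hand, they could be read from these summary strings. A minimal sketch, assuming every export keeps the `N visits, M clicks` wording shown above; `parse_visits_clicks` is a hypothetical helper. The clicks figure in the summary counts clicks anywhere on the page (3,714 for Interact), not just clicks on the tested title, so only the visits figure is useful as the CTR denominator.

```python
import re
from typing import Tuple


def parse_visits_clicks(summary: str) -> Tuple[int, int]:
    """Return (visits, clicks) from a Crazy Egg summary line such as
    'created 5-29-2013 • 20 days 4 hours 21 mins • 10283 visits, 3714 clicks'."""
    match = re.search(r"(\d+)\s+visits,\s+(\d+)\s+clicks", summary)
    if match is None:
        raise ValueError(f"no visit/click counts in {summary!r}")
    return int(match.group(1)), int(match.group(2))


print(parse_visits_clicks(
    'created 5-29-2013   •   20 days 4 hours 21 mins   •   10283 visits, 3714 clicks'))
# (10283, 3714)
```

In the notebook, `parse_visits_clicks(interact.iat[1, 5])[0]` would then return the 10,283 visits used as Interact's denominator.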

```python
# total clicks on the CONNECT title and its CTR (%)
c = connect.query('Name == "CONNECT"')['clicks'].sum()
print("{:.2f}".format(c * 100 / 2742))
```

```python
# CTR (%) = 100 * clicks on the tested category title / homepage visits
visits = {'INTERACT': 10283, 'CONNECT': 2742, 'LEARN': 2747,
          'HELP': 3180, 'SERVICES': 2064}
versions = {'INTERACT': interact, 'CONNECT': connect, 'LEARN': learn,
            'HELP': help1, 'SERVICES': services}

ctr = {}
for name, df in versions.items():
    # sum the clicks recorded on the element whose Name is the tested title
    clicks = df.loc[df['Name'] == name, 'clicks'].sum()
    ctr[name] = clicks * 100 / visits[name]
    print(f"{name.lower()} {ctr[name]:.2f} %")
```

```
interact 0.41 %
connect 1.93 %
learn 0.76 %
help 1.19 %
services 2.18 %
```


### Which version was the winner?

Version 5, **Services**, with the highest CTR (2.18%), ahead of Connect (1.93%).
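One way to relate these CTRs to the 40% minimum increase set before the test is to compute each alternative's relative lift over Interact. A quick sketch using the rounded CTRs printed above, so the lifts are approximate; `rounded_ctr` and `lift` are illustrative names chosen so as not to overwrite the notebook's `ctr`.

```python
# Rounded CTRs (%) from the output above; Interact is the original title.
rounded_ctr = {"interact": 0.41, "connect": 1.93, "learn": 0.76,
               "help": 1.19, "services": 2.18}

baseline = rounded_ctr["interact"]
# relative lift (%) of each alternative over the original title
lift = {name: (value - baseline) / baseline * 100
        for name, value in rounded_ctr.items() if name != "interact"}

for name, value in sorted(lift.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<9} {value:6.1f} % lift, above 40% threshold: {value >= 40}")
```

A lift above the threshold only says the observed difference is large; whether it is reliable is the question the significance test answers.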

### Do the results seem conclusive?

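To some extent. Every alternative title beat **Interact** (0.41%), which suggests that "Interact" was the least helpful label to click on. The gap between the two best versions, **Services** (2.18%) and **Connect** (1.93%), is much smaller, so whether Services is truly the winner still needs to be checked with the significance test in the next lesson.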
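As a first, hedged check, the Services vs Connect gap can be run through a pooled two-proportion z-test. This is an assumed choice of test, not necessarily the one the next lesson uses, and `two_proportion_z_test` is a hypothetical helper. The click counts (45 and 53) are back-calculated from the printed CTRs and visit totals (2.18% of 2,064 and 1.93% of 2,742), since the raw title-click counts are not printed above.

```python
import math
from statistics import NormalDist
from typing import Tuple


def two_proportion_z_test(clicks_a: int, n_a: int,
                          clicks_b: int, n_b: int) -> Tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)   # common CTR if there is no difference
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Clicks back-calculated from the rounded CTRs: 45 / 2064 ≈ 2.18 %, 53 / 2742 ≈ 1.93 %.
z, p = two_proportion_z_test(45, 2064, 53, 2742)
print(f"SERVICES vs CONNECT: z = {z:.2f}, p = {p:.2f}")
# With the 90% significance level chosen for this test, p must fall below 0.10.
print("significant at 90%:", p < 0.10)
```

With these counts the p-value comes out well above 0.10, so at the chosen 90% level this test would not separate Services from Connect, whereas the same test applied to Services vs Interact gives a very small p-value.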