# A/B Testing

* A/B testing relies on **classical statistical tests of significance**.
* Often, we want to know whether the difference in click-through rate (CTR) between two web designs is due entirely to random chance or is statistically significant.
* We therefore compute the p-value: the probability of observing a difference at least as extreme as the one actually observed, assuming the difference is due entirely to chance.
* **A/B Testing:**
    * is often used by organizations to see whether a change in strategy gives different, hopefully better, results.
    * The null hypothesis of an A/B test is usually that the test statistic is unaffected by the change. A low p-value suggests that the change in strategy led to a change in performance.
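
The idea in the bullets above can be sketched as a small permutation test on click data. The counts below (45 clicks out of 500 views for design A, 70 out of 500 for design B) are made up for illustration, and `permutation_p_value` is a hypothetical helper, not a library function. The sketch uses only the standard library's `random` module and a fixed seed so that it is repeatable.

```python
import random


def permutation_p_value(clicks_a, clicks_b, n_perm=2000, seed=0):
    """One-sided permutation test for 'CTR of B exceeds CTR of A'.

    Returns the observed CTR difference (B minus A) and the fraction of
    label shuffles whose difference is at least as large.
    """
    rng = random.Random(seed)
    n_a, n_b = len(clicks_a), len(clicks_b)
    observed = sum(clicks_b) / n_b - sum(clicks_a) / n_a

    # Under the null hypothesis the design labels are exchangeable,
    # so pool all visitors and reassign the labels at random.
    pooled = list(clicks_a) + list(clicks_b)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = sum(pooled[n_a:]) / n_b - sum(pooled[:n_a]) / n_a
        if diff >= observed:
            at_least_as_extreme += 1

    return observed, at_least_as_extreme / n_perm


# Hypothetical click logs: 1 = click, 0 = no click
design_a = [1] * 45 + [0] * 455
design_b = [1] * 70 + [0] * 430

observed_diff, p_value = permutation_p_value(design_a, design_b)
print(f"observed CTR difference = {observed_diff:.3f}, p-value = {p_value:.4f}")
```

A small p-value here would mean that a CTR gap this large rarely appears when the design labels are assigned at random.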

* **Example:** `The vote for the Civil Rights Act in 1964`

The Civil Rights Act of 1964 was one of the most important pieces of legislation ever passed in the USA. Excluding `present` and `abstain` votes, **153 House Democrats** and **136 Republicans** voted yea. 

However, **91 Democrats** and **35 Republicans** voted nay. Did party affiliation make a difference in the vote?
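
Before testing anything, it helps to see the observed yea fractions for each party. A minimal sketch using only the vote counts quoted above (the variable names are illustrative):

```python
# Vote counts quoted in the text
dem_yea, dem_nay = 153, 91
rep_yea, rep_nay = 136, 35

dem_total = dem_yea + dem_nay  # Democrats who voted yea or nay
rep_total = rep_yea + rep_nay  # Republicans who voted yea or nay

frac_dem_yea = dem_yea / dem_total
frac_rep_yea = rep_yea / rep_total

print(f"Democrats:   {dem_yea}/{dem_total} = {frac_dem_yea:.3f}")
print(f"Republicans: {rep_yea}/{rep_total} = {frac_rep_yea:.3f}")
```

Roughly 63% of Democrats and 80% of Republicans voted yea, so the question is whether a gap of this size could plausibly arise by chance.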





In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
# Construct arrays of data: dems, reps
dems = np.array([True] * 153 + [False] * 91)
reps = np.array([True] * 136 + [False] * 35)

In [None]:
# Write a function to resample the data with permutation
def permutation_sample(data1, data2):
    """Generate a permutation sample from two data sets."""

    # Concatenate the data sets: data
    data = np.concatenate((data1, data2)) #needs to be put in a tuple

    # Permute the concatenated array: permuted_data
    permuted_data = np.random.permutation(data)

    # Split the permuted array into two: perm_sample_1, perm_sample_2
    perm_sample_1 = permuted_data[:len(data1)]
    perm_sample_2 = permuted_data[len(data1):]

    return perm_sample_1, perm_sample_2

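def ecdf(data):
    """Return sorted x values and cumulative fractions y for a 1-D array.

    Assumed helper: the original cell called ecdf() without defining it.
    """
    x = np.sort(data)
    y = np.arange(1, len(data) + 1) / len(data)
    return x, y


def plot_permutation_ecdfs(data_1, data_2, n_perm=50,
                           xlabel='monthly rainfall (mm)'):
    """Overlay ECDFs of permutation samples on the ECDFs of the original data."""
    for _ in range(n_perm):
        # Generate permutation samples
        perm_sample_1, perm_sample_2 = permutation_sample(data_1, data_2)

        # Compute ECDFs
        x_1, y_1 = ecdf(perm_sample_1)
        x_2, y_2 = ecdf(perm_sample_2)

        # Plot ECDFs of permutation sample
        _ = plt.plot(x_1, y_1, marker='.', linestyle='none',
                     color='red', alpha=0.02)
        _ = plt.plot(x_2, y_2, marker='.', linestyle='none',
                     color='blue', alpha=0.02)

    # Create and plot ECDFs from original data
    x_1, y_1 = ecdf(data_1)
    x_2, y_2 = ecdf(data_2)
    _ = plt.plot(x_1, y_1, marker='.', linestyle='none', color='red')
    _ = plt.plot(x_2, y_2, marker='.', linestyle='none', color='blue')

    # Label axes, set margin, and show plot
    plt.margins(0.02)
    _ = plt.xlabel(xlabel)
    _ = plt.ylabel('ECDF')
    plt.show()


# The rainfall arrays this plot was written for (rain_june, rain_november) are not
# defined in this notebook, so the call is commented out to avoid a NameError:
# plot_permutation_ecdfs(rain_june, rain_november)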

In [None]:
def draw_perm_reps(data_1, data_2, func, size=1):
    """Generate multiple permutation replicates."""

    # Initialize array of replicates: perm_replicates
    perm_replicates = np.empty(size)

    for i in range(size):
        # Generate permutation sample
        perm_sample_1, perm_sample_2 = permutation_sample(data_1, data_2)

        # Compute the test statistic
        perm_replicates[i] = func(perm_sample_1, perm_sample_2)

    return perm_replicates

In [None]:
# draw_perm_reps() passes two arrays to the test-statistic function, so this
# function accepts two arguments even though the second one is not used.
def frac_yea_dems(dems, reps):
    """Compute the fraction of Democrats who voted yea."""
    frac = np.sum(dems) / len(dems)
    return frac

In [None]:
# Acquire permutation samples: perm_replicates
perm_replicates = draw_perm_reps(dems, reps, frac_yea_dems, 10000)

In [None]:
# p-value: fraction of replicates at least as small as the observed fraction 153/244
p_values = np.sum(perm_replicates <= 153/244) / len(perm_replicates)

In [None]:
print('p-value =', p_values)

p-value = 0.0001


To answer this question, you will evaluate the hypothesis that the party of a House member has no bearing on his or her vote. You will use the fraction of Democrats voting in favor as your test statistic and evaluate the probability of observing a fraction of Democrats voting in favor at least as small as the observed fraction of 153/244. (That’s right, at least as small as. In 1964, it was the Democrats who were less progressive on civil rights issues.) To do this, permute the party labels of the House voters and then arbitrarily divide them into “Democrats” and “Republicans” and compute the fraction of Democrats voting yea.
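
As a cross-check on the simulation, the permutation null described above has a closed form: when party labels are shuffled, the number of yea votes that land among the 244 "Democrats" follows a hypergeometric distribution. The sketch below sums that distribution's lower tail using only `math.comb` and the vote counts from the text; the variable names are illustrative.

```python
from math import comb

# Vote counts from the text above
dem_yea, dem_nay = 153, 91
rep_yea, rep_nay = 136, 35

n_dems = dem_yea + dem_nay        # 244 members labeled "Democrat"
total_yea = dem_yea + rep_yea     # 289 yea votes overall
total_nay = dem_nay + rep_nay     # 126 nay votes overall
n_total = total_yea + total_nay   # 415 votes in all

# Fewest yeas the "Democrats" can receive: every nay vote lands in that group
k_min = max(0, n_dems - total_nay)

# Count the label assignments that give the "Democrats" at most 153 yea votes
favorable = sum(
    comb(total_yea, k) * comb(total_nay, n_dems - k)
    for k in range(k_min, dem_yea + 1)
)

# Divide by the total number of ways to choose which 244 votes are "Democrat"
p_exact = favorable / comb(n_total, n_dems)
print(f"exact one-sided p-value = {p_exact:.2e}")
```

With enough permutation replicates, the simulated p-value should land close to this exact value, up to Monte Carlo noise.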
