# Problem 1

Suppose a director of marketing with many years of experience tells you he believes very strongly that the variant without images (B) won’t perform any differently than the original variant. How could you account for this in our model? Implement this change and see how your final conclusions change as well.

# Solution 1

In [None]:
import pandas as pd
import numpy as np

from scipy.stats import beta

# Get number of tests to run and samples in the test
TEST_SAMPLES = 100
NUMBER_OF_TESTS = 1000
# Setup priors for A & B
CLICKED_PRIORS = [5,0]
UNCLICKED_PRIORS = [0,0]

# Name variants so we have less typing
a = 'Variant A'
b = 'Variant B'

# Build a dataframe to hold all the relevant info
df = pd.DataFrame(index = [a,b], data = {'Clicked':[36,50], 'Not Clicked': [114, 110]})
df['Priors Clicked'] = CLICKED_PRIORS
df['Priors Not Clicked'] = UNCLICKED_PRIORS
df['Click Through With Priors'] = (df['Clicked']+df['Priors Clicked'])/(df['Clicked']+df['Priors Clicked']+df['Not Clicked']+df['Priors Not Clicked'])
# Convert the rate to a whole-number percentage (0-100); it sizes the sample lists below
df['Click Through With Priors'] = (df['Click Through With Priors']*100).astype(int)



# Create a dataframe with clicks as 1s and not clicked items as 0 for the simulation later
sample_df = pd.DataFrame( data ={
    a: df.loc[a]['Click Through With Priors']*[1]+(100-df.loc[a]['Click Through With Priors'])*[0],
    b: df.loc[b]['Click Through With Priors']*[1]+(100-df.loc[b]['Click Through With Priors'])*[0]
    }
)

# Create counters and get length of the range for later
wins_for_A = 0
wins_for_B= 0
ties = 0
range_high = sample_df.index.size  # randint's upper bound is exclusive, so no -1 is needed
# Run tests
for test in range(NUMBER_OF_TESTS):
    # Randomly selects clicked or non clicked variables from the dataframe
    resultsA =sample_df[a].iloc[np.random.randint(low = 0,high = range_high, size = TEST_SAMPLES)].sum()
    resultsB =sample_df[b].iloc[np.random.randint(low = 0,high = range_high, size = TEST_SAMPLES)].sum()
    # Record results
    if resultsA>resultsB:
        wins_for_A+=1
    elif resultsA<resultsB:
        wins_for_B+=1
    elif resultsA==resultsB:
        # print(f"Test number {test} Total Clicks A: {resultsA} Total Clicks B: {resultsB}")
        ties+=1
# Display Results
print(f"Out of {NUMBER_OF_TESTS} tests, A had the higher click-through rate {wins_for_A} times and B had it {wins_for_B} times, with {ties} ties")
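
Another way to express the director's skepticism, sketched here under stated assumptions, is to give both variants the same strong Beta prior and compare draws from the two posteriors. The prior strengths (`Beta(3, 7)` and `Beta(300, 700)`) and the helper name `prob_b_beats_a` are illustrative choices, not part of the solution above; the observed counts are the ones in the dataframe.

```python
import numpy as np

def prob_b_beats_a(clicks_a, misses_a, clicks_b, misses_b,
                   prior_alpha, prior_beta, n_draws=100_000, seed=0):
    """Estimate P(rate_B > rate_A) by sampling both Beta posteriors.

    Hypothetical helper: the same Beta(prior_alpha, prior_beta) prior
    is applied to both variants.
    """
    rng = np.random.default_rng(seed)
    samples_a = rng.beta(clicks_a + prior_alpha, misses_a + prior_beta, n_draws)
    samples_b = rng.beta(clicks_b + prior_alpha, misses_b + prior_beta, n_draws)
    return float(np.mean(samples_b > samples_a))

# Observed counts from the dataframe: A = 36 / 114, B = 50 / 110
weak = prob_b_beats_a(36, 114, 50, 110, prior_alpha=3, prior_beta=7)
strong = prob_b_beats_a(36, 114, 50, 110, prior_alpha=300, prior_beta=700)

print(f"P(B > A) with a weak prior:   {weak:.3f}")
print(f"P(B > A) with a strong prior: {strong:.3f}")
```

The stronger the shared prior, the more the two posteriors overlap, so the same data yields a less decisive result.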
        


# Problem 2

The lead designer sees your results and insists that there’s no way that variant B should perform better with no images. She feels that you should assume the conversion rate for variant B is closer to 20 percent than 30 percent. Implement a solution for this and again review the results of our analysis.

# Solution 2

In [None]:
import pandas as pd
import numpy as np
import plotly.express as px
from scipy.stats import beta

# Get number of tests to run and samples in the test
TEST_SAMPLES = 100
NUMBER_OF_TESTS = 1000
# Setup priors for A & B
CLICKED_PRIORS = [5,0]
UNCLICKED_PRIORS = [0,90]

# Name variants so we have less typing
a = 'Variant A'
b = 'Variant B'

# Build a dataframe to hold all the relevant info
df = pd.DataFrame(index = [a,b], data = {'Clicked':[36,50], 'Not Clicked': [114, 110]})
df['Priors Clicked'] = CLICKED_PRIORS
df['Priors Not Clicked'] = UNCLICKED_PRIORS
df['Click Through With Priors'] = (df['Clicked']+df['Priors Clicked'])/(df['Clicked']+df['Priors Clicked']+df['Not Clicked']+df['Priors Not Clicked'])
df['Click Through With Priors'] = (df['Click Through With Priors']*100).astype(int)
display(df)


# Create a dataframe with clicks as 1s and not clicked items as 0 for the simulation later
sample_df = pd.DataFrame( data ={
    a: df.loc[a]['Click Through With Priors']*[1]+(100-df.loc[a]['Click Through With Priors'])*[0],
    b: df.loc[b]['Click Through With Priors']*[1]+(100-df.loc[b]['Click Through With Priors'])*[0]
    }
)

# Create counters and get length of the range for later
wins_for_A = 0
wins_for_B= 0
ties = 0
range_high = sample_df.index.size  # randint's upper bound is exclusive, so no -1 is needed
# Run tests
for test in range(NUMBER_OF_TESTS):
    # Randomly selects clicked or non clicked variables from the dataframe
    resultsA =sample_df[a].iloc[np.random.randint(low = 0,high = range_high, size = TEST_SAMPLES)].sum()
    resultsB =sample_df[b].iloc[np.random.randint(low = 0,high = range_high, size = TEST_SAMPLES)].sum()
    # Record results
    if resultsA>resultsB:
        wins_for_A+=1
    elif resultsA<resultsB:
        wins_for_B+=1
    elif resultsA==resultsB:
        # print(f"Test number {test} Total Clicks A: {resultsA} Total Clicks B: {resultsB}")
        ties+=1
# Display Results
print(f"Out of {NUMBER_OF_TESTS} tests, A had the higher click-through rate {wins_for_A} times and B had it {wins_for_B} times, with {ties} ties")
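
The effect of the prior pseudo-counts on variant B can be checked by hand. The short sketch below repeats the `Click Through With Priors` arithmetic for B with and without the 90 extra "not clicked" pseudo-counts; the function name `rate_with_priors` is a hypothetical helper introduced only for this illustration.

```python
def rate_with_priors(clicked, not_clicked, prior_clicked=0, prior_not_clicked=0):
    """Click-through rate after adding prior pseudo-counts to the observed counts."""
    total = clicked + prior_clicked + not_clicked + prior_not_clicked
    return (clicked + prior_clicked) / total

# Variant B as observed: 50 clicked, 110 not clicked
rate_b_raw = rate_with_priors(50, 110)
# Variant B with the designer's prior: 90 additional "not clicked" pseudo-counts
rate_b_designer = rate_with_priors(50, 110, prior_not_clicked=90)

print(rate_b_raw)       # 0.3125
print(rate_b_designer)  # 0.2
```

With these counts the prior pulls B from roughly 31 percent down to exactly 20 percent, which is the rate the designer argued for.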
        


# Problem 3
Assume that being 95 percent certain means that you’re more or less “convinced” of a hypothesis. Also assume that there’s no longer any limit to the number of emails you can send in your test. If the true conversion for A is 0.25 and for B is 0.3, explore how many samples it would take to convince the director of marketing that B was in fact superior. 

Explore the same for the lead designer. You can generate samples of conversions with the following snippet of R:

true.rate <- 0.25 

number.of.samples <- 100

results <- runif(number.of.samples) <= true.rate
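
Since the solutions here are written in Python, a rough NumPy equivalent of the R snippet may be useful. This is a minimal sketch: the seed is an arbitrary choice made for reproducibility and is not part of the original exercise.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed, for reproducibility only

true_rate = 0.25
number_of_samples = 100

# Uniform draws in [0, 1); a draw at or below true_rate counts as a conversion
results = rng.random(number_of_samples) <= true_rate

print(results.sum(), "conversions out of", number_of_samples)
```
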
# Solution 3

In [None]:
import pandas as pd
import numpy as np
import plotly.express as px
from scipy.stats import beta

confidence_int = .95
upper_confidence_bound = 1-((1-confidence_int)/2)
lower_confidence_bound = (1-confidence_int)/2

A_conversion = .25
B_conversion = .3

A_ActRatio = [A_conversion,1-A_conversion]
B_ActRatio = [B_conversion,1-B_conversion]
# Variant names, repeated here so this cell runs on its own
a = 'Variant A'
b = 'Variant B'
priors = {"clicked":{a:300,b:0},"unclicked":{a:0,b:700}}

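# Grow the sample size until the lower bound of B's interval clears the upper bound of A's
i=0
b_min = -1
a_max = 0
while b_min < a_max:
    i+=1
    # Posteriors built from the expected counts at sample size i plus the prior counts
    distA = beta(A_ActRatio[0]*i+priors["clicked"][a],A_ActRatio[1]*i+priors["unclicked"][a])
    distB = beta(B_ActRatio[0]*i+priors["clicked"][b],B_ActRatio[1]*i+priors["unclicked"][b])
    b_min = distB.ppf(lower_confidence_bound)
    a_max = distA.ppf(upper_confidence_bound)
# i is now the first sample size at which the two intervals no longer overlap
print(i,b_min, a_max)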

x=np.arange(0,1,.001)

# Plot the posterior densities
fig=px.line(x=x,y=[distA.pdf(x),distB.pdf(x)])
fig.update_layout(title="Results of A B test PDF")
fig.show()

# Plot the cumulative distributions as a separate figure
fig=px.line(x=x,y=[distA.cdf(x),distB.cdf(x)])
fig.update_layout(title="Results of A B test CDF")
fig.show()
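
A different stopping rule is also worth exploring: instead of waiting for the two intervals to separate, estimate P(B > A) from posterior draws and stop once it reaches 0.95. The sketch below does this under several assumptions that are not in the solution above: the helper name `samples_needed`, a `Beta(300, 700)` prior on both variants for the director, a `Beta(200, 800)` prior on B for the designer, and expected rather than simulated counts at each sample size.

```python
import numpy as np

def samples_needed(rate_a, rate_b, prior_a, prior_b, threshold=0.95,
                   step=100, max_samples=200_000, n_draws=20_000, seed=0):
    """Smallest per-variant sample size (in multiples of `step`) at which
    the estimated P(B > A) reaches `threshold`, or None if never reached.

    Hypothetical helper: priors are (alpha, beta) tuples, and the observed
    counts are taken to be the expected counts rate * n.
    """
    rng = np.random.default_rng(seed)
    for n in range(step, max_samples + 1, step):
        post_a = rng.beta(prior_a[0] + rate_a * n,
                          prior_a[1] + (1 - rate_a) * n, n_draws)
        post_b = rng.beta(prior_b[0] + rate_b * n,
                          prior_b[1] + (1 - rate_b) * n, n_draws)
        if np.mean(post_b > post_a) >= threshold:
            return n
    return None

# Director: same strong prior on both variants (assumed Beta(300, 700))
director_n = samples_needed(0.25, 0.30, prior_a=(300, 700), prior_b=(300, 700))
# Designer: B's prior centered near 20 percent (assumed Beta(200, 800))
designer_n = samples_needed(0.25, 0.30, prior_a=(300, 700), prior_b=(200, 800))

print("Director convinced after about", director_n, "samples per variant")
print("Designer convinced after about", designer_n, "samples per variant")
```

Because the designer's prior starts B below A, it takes noticeably more data to overturn than the director's prior does.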