## Analysis Methods

For this analysis we use the N-1 chi-square test, a nonparametric method that makes fewer assumptions about the underlying data than z- or t-tests. [1][2]

\begin{align}
{\chi}^2 = \frac{(ad-bc)^2(N-1)}{mnrs}
\end{align}

Where
<br>
<table style="border: 1px solid black;">
  <tr>
    <th> </th>
    <th>Pass (Yes)</th> 
    <th>Fail (No)</th>
    <th>Total</th>
  </tr>
  <tr>
    <td>Design A</td>
    <td><i>a</i></td>
    <td><i>b</i></td>
    <td><i>m</i></td>
  </tr>
  <tr>
    <td>Design B</td>
    <td><i>c</i></td>
    <td><i>d</i></td>
    <td><i>n</i></td>
  </tr>
  <tr>
    <td>Total</td>
    <td><i>r</i></td>
    <td><i>s</i></td>
    <td><i>N</i></td>
  </tr>
</table>
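As a minimal sketch of the formula above, the statistic can be computed directly from the four cell counts. The function name `n1_chi_square` and the counts in the usage line are hypothetical and chosen only for illustration; the p-value assumes one degree of freedom and uses SciPy's chi-square survival function.

```python
from scipy import stats


def n1_chi_square(a, b, c, d):
    """Return the N-1 chi-square statistic and p-value for a 2x2 table.

    a, b are the pass/fail counts for Design A; c, d for Design B.
    """
    m, n = a + b, c + d      # row totals
    r, s = a + c, b + d      # column totals
    N = m + n                # grand total
    chi2 = (a * d - b * c) ** 2 * (N - 1) / (m * n * r * s)
    # upper-tail probability of the chi-square distribution, 1 degree of freedom
    p = stats.chi2.sf(chi2, 1)
    return chi2, p


# hypothetical counts, for illustration only
chi2, p = n1_chi_square(40, 10, 30, 20)
print(chi2, p)
```

The only difference from the Pearson chi-square statistic is the factor N-1 in place of N in the numerator.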

An alternative way of analyzing this A/B test is to compare the difference in proportions with the N-1 two-proportion test (using the normal z distribution); the N-1 two-proportion test is mathematically equivalent to the N-1 chi-square test. [1][3]
<br><br>
When a test has more variations than the standard A/B test (A/B/C, etc.), it is best practice to use the chi-square test [1]. For consistency's sake, we use the N-1 chi-square test in this A/B test as well.
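The equivalence between the two tests can be checked numerically: the square of the N-1 two-proportion z statistic equals the N-1 chi-square statistic. The sketch below assumes the pooled-proportion form of the z statistic; the function names and the counts are hypothetical and serve only as an illustration.

```python
import math


def n1_two_proportion_z(a, b, c, d):
    """z statistic of the N-1 two-proportion test for a 2x2 table."""
    m, n = a + b, c + d
    N = m + n
    p1, p2 = a / m, c / n            # pass rates of the two designs
    P = (a + c) / N                  # pooled pass rate
    Q = 1 - P
    se = math.sqrt(P * Q * (1 / m + 1 / n))
    return (p1 - p2) * math.sqrt((N - 1) / N) / se


def n1_chi_square_stat(a, b, c, d):
    """N-1 chi-square statistic for the same 2x2 table."""
    m, n, r, s = a + b, c + d, a + c, b + d
    N = m + n
    return (a * d - b * c) ** 2 * (N - 1) / (m * n * r * s)


# hypothetical counts, for illustration only
z = n1_two_proportion_z(40, 10, 30, 20)
chi2 = n1_chi_square_stat(40, 10, 30, 20)
print(z ** 2, chi2, math.isclose(z ** 2, chi2))
```

Because the two statistics agree, the two-sided p-value from the z test matches the p-value from the chi-square test with one degree of freedom.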

<b>Power</b>
<br>
The power of the N-1 chi-square independence test is given by:

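\begin{align}
1 - \beta = 1 - F_{df,\lambda}(x_{crit})
\end{align}

where,
<br>
&emsp;<i>$F_{df,\lambda}$</i> is the cumulative distribution function of the noncentral chi-square distribution with <i>df</i> degrees of freedom and noncentrality parameter <i>&lambda;</i>
<br>
&emsp;<i>$x_{crit}$</i> is the critical value of the chi-square distribution with <i>df</i> degrees of freedom at the chosen significance level
<br>
&emsp;<i>&lambda;</i> = <i>$w^{2}N$</i> is the noncentrality parameter, where <i>w</i> is the &phi; effect size and <i>N</i> is the total sample size
<br>
and, with ${\chi}^2$ the N-1 chi-square statistic defined above,

\begin{align}
\varphi = \sqrt{\frac{ {\chi}^2}{N-1} }
\end{align}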

## Hypothesis

If search result rankings are based on the new 5-Tier Ranking Model, then users will be more likely to click on a listed search result, because the search listing will return more accurate results.

## Analysis

The test ran for a total of 22 days. At the end of that period we had collected the following samples:

<table style="border: 1px solid black;">
  <tr>
    <th>Version</th>
    <th>Conversion</th>
    <th>No Conversion</th> 
    <th>Total</th>
  </tr>
  <tr>
    <td>Control (A)</td>
    <td>19398</td>
    <td>1377</td>
    <td>20775</td>
  </tr>
  <tr>
    <td>Variation (B)</td>
    <td>19046</td>
    <td>1807</td>
    <td>20853</td>
  </tr>
  <tr>
    <td>Total</td>
    <td>38444</td>
    <td>3184</td>
    <td>41628</td>
  </tr>
</table>

In [3]:
# load in necessary libraries
import numpy as np
import scipy.stats as stats
import statsmodels.api as sm
import matplotlib.pyplot as plt
import matplotlib.path as mpath
import pandas as pd

In [4]:
# using the standard chi-square test
# (SciPy applies Yates' continuity correction to 2x2 tables by default)
results = [ [19398, 19046], [1377, 1807] ]
chi2, p, ddof, expected = stats.chi2_contingency( results )
msg = "Test Statistic: {}\np-value: {}\nDegrees of Freedom: {}\n"
print( msg.format( chi2, p, ddof ) )
print( expected )

Test Statistic: 60.86057099196304
p-value: 6.126365201099651e-15
Degrees of Freedom: 1

[[19185.98299222 19258.01700778]
 [ 1589.01700778  1594.98299222]]


In [5]:
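# total users in test
N = 41628

# Pearson chi-square without Yates' continuity correction,
# which the N-1 test does not use
oldchisq = stats.chi2_contingency(results, correction=False)[0]
# N-1 correction: scale the Pearson statistic by (N-1)/N
newchisq = oldchisq * (N - 1) / N
# p-value from the upper tail; sf avoids the rounding error of 1 - cdf
newp = stats.chi2.sf(newchisq, 1)
print(f"{newp:.1e}")

5.3e-15

#### Was it statistically significant?

<i>Yes, the test result was statistically significant.</i>

The p-value is approximately 0.000000000000005 ($5.3 \times 10^{-15}$). In other words, if there were no real difference between the two versions, we would expect to see a difference at least this large only about $5.3 \times 10^{-13}$% of the time. The difference favors the control: 93.4% of Control (A) visits converted, compared with 91.3% of Variation (B) visits.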

#### Did we account for Type II errors?

In [None]:
# conduct power analysis
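
A minimal sketch of the power calculation described in the Analysis Methods section, using the observed counts from the sample table. It assumes a significance level of 0.05 and takes the observed &phi; as the effect size, so the result is an observed (post hoc) power rather than a planned one; the variable names are illustrative.

```python
from scipy import stats

# observed counts from the sample table above
a, b = 19398, 1377    # Control (A): conversion, no conversion
c, d = 19046, 1807    # Variation (B): conversion, no conversion
m, n, r, s = a + b, c + d, a + c, b + d
N = m + n

alpha = 0.05          # assumed significance level
df = 1                # degrees of freedom of a 2x2 table

# N-1 chi-square statistic and the phi effect size derived from it
chi2_n1 = (a * d - b * c) ** 2 * (N - 1) / (m * n * r * s)
w = (chi2_n1 / (N - 1)) ** 0.5

# noncentrality parameter and critical value
lam = w ** 2 * N
x_crit = stats.chi2.ppf(1 - alpha, df)

# power: probability that a noncentral chi-square variable exceeds x_crit
power = stats.ncx2.sf(x_crit, df, lam)
print(f"effect size w = {w:.4f}, power = {power:.4f}")
```

With a sample this large relative to the observed effect, the computed power is very close to 1, so a Type II error is unlikely to explain the result.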

### Over time, was the test statistically significant?

In [6]:
# upload conversion data over time into dataframe
daily_results = pd.read_csv('data.csv')

In [7]:
# check structure of dataframe
daily_results.head(3)

Unnamed: 0,Day,Conversions_A,No_Conversions_A,Visits_A,Conversions_B,No_Conversions_B,Visits_B
0,"May 21, 2018",778,-128,650,606,-4,602
1,"May 22, 2018",2115,-38,2077,2049,38,2087
2,"May 23, 2018",3361,-15,3346,3183,110,3293


In [8]:
# insert column for total users in test per day
# this is used in later calculations.
daily_results['Total_Visits'] = daily_results['Visits_A'] + daily_results['Visits_B']

In [9]:
# check that new column was created successfully
daily_results.head(3)

Unnamed: 0,Day,Conversions_A,No_Conversions_A,Visits_A,Conversions_B,No_Conversions_B,Visits_B,Total_Visits
0,"May 21, 2018",778,-128,650,606,-4,602,1252
1,"May 22, 2018",2115,-38,2077,2049,38,2087,4164
2,"May 23, 2018",3361,-15,3346,3183,110,3293,6639


In [10]:
# Now we will need to perform some data restructuring so that
# we can use it with stats library and iterate over dates.


# create dataframe for conversion values only
Conversions = daily_results[['Conversions_A', 'Conversions_B']].copy()

In [11]:
# convert conversion dataframe into list
Conversions = Conversions.values.tolist()

In [12]:
# create dataframe from non-conversion values only
NonConversions = daily_results[['No_Conversions_A', 'No_Conversions_B']].copy()

In [13]:
# convert nonconversion dataframe into list
NonConversions = NonConversions.values.tolist()

In [14]:
# create a list from our conversion and nonconversion lists
new_list = [*zip(Conversions, NonConversions)]
# check structure of list
new_list

[([778, 606], [-128, -4]),
 ([2115, 2049], [-38, 38]),
 ([3361, 3183], [-15, 110]),
 ([4512, 4179], [-105, 185]),
 ([5326, 5079], [66, 257]),
 ([5578, 5293], [51, 286]),
 ([5902, 5574], [105, 358]),
 ([6872, 6450], [237, 512]),
 ([8248, 7736], [290, 623]),
 ([9515, 9048], [425, 727]),
 ([10813, 10227], [461, 887]),
 ([11697, 11313], [581, 881]),
 ([11884, 11478], [628, 959]),
 ([12241, 11791], [661, 1044]),
 ([13518, 13142], [748, 1099]),
 ([14838, 14464], [925, 1240]),
 ([16283, 15695], [1005, 1463]),
 ([17488, 17041], [1205, 1590]),
 ([18409, 18004], [1316, 1693]),
 ([18585, 18277], [1339, 1660]),
 ([18893, 18563], [1375, 1730]),
 ([19695, 19354], [1379, 1846])]

In [36]:
new_list_copy = new_list.copy()

In [None]:
for row in new_list_copy:
    print(row)

In [21]:
x = [1, 2, 3]
y = [4, 5, 6]
zipped = zip(x, y)
list(zipped)

x2, y2 = zip(*zip(x, y))
x == list(x2) and y == list(y2)

True

In [39]:
# keep only the days on which every count is non-negative
new_list_copy = [
    row for row in new_list
    if all(number >= 0 for group in row for number in group)
]
new_list_copy

[([5326, 5079], [66, 257]),
 ([5578, 5293], [51, 286]),
 ([5902, 5574], [105, 358]),
 ([6872, 6450], [237, 512]),
 ([8248, 7736], [290, 623]),
 ([9515, 9048], [425, 727]),
 ([10813, 10227], [461, 887]),
 ([11697, 11313], [581, 881]),
 ([11884, 11478], [628, 959]),
 ([12241, 11791], [661, 1044]),
 ([13518, 13142], [748, 1099]),
 ([14838, 14464], [925, 1240]),
 ([16283, 15695], [1005, 1463]),
 ([17488, 17041], [1205, 1590]),
 ([18409, 18004], [1316, 1693]),
 ([18585, 18277], [1339, 1660]),
 ([18893, 18563], [1375, 1730]),
 ([19695, 19354], [1379, 1846])]

In [None]:
# list the days that contain at least one negative count
for row in new_list:
    if any(number < 0 for group in row for number in group):
        print(row)

In [None]:
# create list for total users in each test

# extract the column we want
Total_Visits = daily_results['Total_Visits']
# convert column into list
Total_Visits = Total_Visits.values.tolist()

In [None]:
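# compute the N-1 chi-square p-value for each cumulative daily snapshot
historical_p = []
for table, n_total in zip(new_list, Total_Visits):
    # skip snapshots with negative counts; the test is undefined for them
    if any(count < 0 for group in table for count in group):
        continue
    # Pearson chi-square without Yates' continuity correction
    chi, p, ddof, expected = stats.chi2_contingency(table, correction=False)
    # N-1 correction: scale the statistic by (N-1)/N
    newchisq = chi * (n_total - 1) / n_total
    # upper-tail p-value with one degree of freedom
    updated_p = stats.chi2.sf(newchisq, 1)
    historical_p.append(updated_p)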

In [None]:
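# plot the p-value of each cumulative daily snapshot on a log scale
plt.plot(historical_p, marker='o')
plt.yscale('log')
plt.xlabel('Snapshot')
plt.ylabel('p-value (N-1 chi-square)')
plt.show()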

### Test Results

### Recommendations

<b><u>References</u></b>
<br>
[1] Sauro, Jeff, and James R. Lewis. <i>Quantifying the User Experience: Practical Statistics for User Research</i>. Morgan Kaufmann, 2016.
<br>
<br>
[2] Campbell, I. (2007), <i>Chi‐squared and Fisher–Irwin tests of two‐by‐two tables with small sample recommendations</i>. Statist. Med., 26: 3661-3675. doi:10.1002/sim.2832
<br>
<br>
[3] Wallis, Sean. <i>z-Squared: The Origin and Application of χ2</i>. 2010, www.ucl.ac.uk/english-usage/staff/sean/resources/z-squared.pdf.