### Analysis Methods

For this analysis we use the N-1 chi-square test, a nonparametric method that makes fewer assumptions about the underlying data than z- or t-tests. [1][2]

\begin{align}
{\chi}^2 = \frac{(ad-bc)^2(N-1)}{mnrs}
\end{align}

Where
<br>
<table style="border: 1px solid black;">
  <tr>
    <th> </th>
    <th>Pass (Yes)</th> 
    <th>Fail (No)</th>
    <th>Total</th>
  </tr>
  <tr>
    <td>Design A</td>
    <td><i>a</i></td>
    <td><i>b</i></td>
    <td><i>m</i></td>
  </tr>
  <tr>
    <td>Design B</td>
    <td><i>c</i></td>
    <td><i>d</i></td>
    <td><i>n</i></td>
  </tr>
  <tr>
    <td>Total</td>
    <td><i>r</i></td>
    <td><i>s</i></td>
    <td><i>N</i></td>
  </tr>
</table>
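
As a minimal sketch, the statistic can be computed directly from the four cell counts in the table above. The function name `n1_chi_square` and the example counts are hypothetical and serve only to illustrate the formula.

```python
def n1_chi_square(a, b, c, d):
    """N-1 chi-square statistic for a 2x2 table laid out as in the table above."""
    m, n = a + b, c + d      # row totals (Design A, Design B)
    r, s = a + c, b + d      # column totals (Pass, Fail)
    N = m + n                # grand total
    return (a * d - b * c) ** 2 * (N - 1) / (m * n * r * s)


# hypothetical counts, for illustration only
statistic = n1_chi_square(40, 10, 30, 20)
print(round(statistic, 4))
```

The only difference from the uncorrected Pearson statistic is the factor N-1 in place of N, so the two values converge as the sample grows.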

An alternative way of analyzing this A/B test is to compare the difference in proportions with the N-1 two-proportion test (using the normal z distribution); the N-1 two-proportion test is mathematically equivalent to the N-1 chi-square test. [3]
<br><br>
When a test has more variations than the standard A/B test (A/B/C, etc.), the chi-square test is the recommended approach [1]; for consistency's sake we use the N-1 chi-square test in this A/B test as well.
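
The equivalence of the two tests can be checked numerically: the square of the N-1 two-proportion z statistic equals the N-1 chi-square statistic. The sketch below assumes a pooled-variance z statistic scaled by sqrt((N-1)/N); the function names and the counts are hypothetical.

```python
import math


def n1_chi_square(a, b, c, d):
    """N-1 chi-square statistic for a 2x2 table (rows: designs, columns: pass/fail)."""
    m, n = a + b, c + d
    r, s = a + c, b + d
    N = m + n
    return (a * d - b * c) ** 2 * (N - 1) / (m * n * r * s)


def n1_two_proportion_z(x1, n1, x2, n2):
    """N-1 two-proportion z statistic: pooled z scaled by sqrt((N-1)/N)."""
    N = n1 + n2
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / N
    standard_error = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error * math.sqrt((N - 1) / N)


# hypothetical counts, for illustration only
a, b, c, d = 40, 10, 30, 20
z = n1_two_proportion_z(a, a + b, c, c + d)
chi_sq = n1_chi_square(a, b, c, d)
print(z ** 2, chi_sq)  # the two values agree up to floating-point rounding
```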

<b>Power</b>
<br>
The power of the N-1 chi-square independence test is given by:

\begin{align}
1 - \beta = 1 - F_{df,\lambda}(x_{crit})
\end{align}

where,
<br>
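&emsp;<i>$F_{df,\lambda}$</i> is the cumulative distribution function of the noncentral chi-square distribution with <i>df</i> degrees of freedom and noncentrality parameter <i>&lambda;</i>
<br>
&emsp;<i>$x_{crit}$</i> is the critical value of the chi-square distribution with <i>df</i> degrees of freedom at the chosen significance level <i>&alpha;</i>
<br>
&emsp;<i>&lambda;</i> = <i>$w^{2}n$</i> is the noncentrality parameter, where <i>w</i> is the &phi; effect size and <i>n</i> is the total sample size (<i>N</i> in the table above)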
<br>
and

\begin{align}
\varphi = \sqrt{\frac{ {\chi}^2}{n-1} }
\end{align}
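
For a 2x2 table (df = 1), the power can be sketched with the standard library alone, because a noncentral chi-square variable with one degree of freedom is the square of a normal variable with mean sqrt(&lambda;). The function name `chi_square_power_df1`, the default &alpha; of 0.05, and the example effect size and sample size are assumptions made for illustration.

```python
from statistics import NormalDist


def chi_square_power_df1(w, n, alpha=0.05):
    """Power of a chi-square test with 1 degree of freedom.

    w: phi effect size, n: total sample size, alpha: significance level.
    """
    std_normal = NormalDist()
    root_lambda = w * n ** 0.5                   # sqrt of lambda = w^2 * n
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # sqrt of the chi-square critical value
    # power = P(|Z + sqrt(lambda)| > z_crit) for a standard normal Z
    return std_normal.cdf(root_lambda - z_crit) + std_normal.cdf(-root_lambda - z_crit)


# illustrative values only: a small effect size and a sample of 785 users
power = chi_square_power_df1(w=0.1, n=785)
print(round(power, 3))
```

With an effect size of zero the function returns &alpha;, as expected for a test run under the null hypothesis. For tables with more than one degree of freedom, a noncentral chi-square survival function (for example `scipy.stats.ncx2.sf`) would be needed instead.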

### Hypothesis

If search result rankings are based on the new 5 Tier Ranking Model, then users will be more likely to click on a listed search result, because the search will return more accurate results.

### Analysis

The test ran for a total of 22 days. At the end of that period we had collected the following samples:

<table style="border: 1px solid black;">
  <tr>
    <th>Version</th>
    <th>Conversion</th>
    <th>No Conversion</th> 
    <th>Total</th>
  </tr>
  <tr>
    <td>Control (A)</td>
    <td>19398</td>
    <td>1377</td>
    <td>20775</td>
  </tr>
  <tr>
    <td>Variation (B)</td>
    <td>19046</td>
    <td>1807</td>
    <td>20853</td>
  </tr>
  <tr>
    <td>Total</td>
    <td>38444</td>
    <td>3184</td>
    <td>41628</td>
  </tr>
</table>
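
Before running any test, it helps to read the conversion rate of each version off the table. The short sketch below does that arithmetic; the variable names are chosen here for illustration.

```python
# counts taken from the table above
control_conversions, control_total = 19398, 20775
variation_conversions, variation_total = 19046, 20853

control_rate = control_conversions / control_total
variation_rate = variation_conversions / variation_total
difference = variation_rate - control_rate   # negative means the variation converted less

print(f"Control (A):   {control_rate:.2%}")
print(f"Variation (B): {variation_rate:.2%}")
print(f"Difference:    {difference * 100:+.2f} percentage points")
```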

In [11]:
# load in necessary libraries
import numpy as np
import pandas as pd
import scipy.stats as stats
import statsmodels.api as sm


In [4]:
# create dataframe for end results
d = {'con': [19398, 19046], 'non-con': [1377, 1807]}
end_results = pd.DataFrame(data=d)

In [5]:
#print dataframe
end_results

     con  non-con
0  19398     1377
1  19046     1807


In [7]:
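# using the standard chi-square test of independence
# (for a 2x2 table, chi2_contingency applies Yates' continuity correction by default)
# rows: conversions, non-conversions; columns: control (A), variation (B)
observed = [ [19398, 19046], [1377, 1807] ]
chi2, p, dof, expected = stats.chi2_contingency( observed )
msg = "Test Statistic: {}\np-value: {}\nDegrees of Freedom: {}\n"
print( msg.format( chi2, p, dof ) )
print( expected )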

Test Statistic: 60.86057099196304
p-value: 6.126365201099651e-15
Degrees of Freedom: 1

[[19185.98299222 19258.01700778]
 [ 1589.01700778  1594.98299222]]


In [8]:
chi2

60.86057099196304

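In [10]:
# observed counts: a, b = control (conversions, non-conversions); c, d = variation
a, b, c, d = 19398, 1377, 19046, 1807
m, n = a + b, c + d   # row totals
r, s = a + c, b + d   # column totals
# total users in test
N = m + n

# N-1 chi-square from the formula in Analysis Methods; it is built on the uncorrected
# Pearson statistic, not on the Yates-corrected value returned by chi2_contingency
newchisq = (a*d - b*c)**2 * (N - 1) / (m*n*r*s)
# p value with N-1 correction; sf avoids the rounding error of 1 - cdf for very small p values
newp = stats.chi2.sf(newchisq, 1)
newp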

In [22]:
# try it out with z test of proportions
# proportions_ztest expects the success counts first, then the number of observations per group
z_score, p_value = sm.stats.proportions_ztest([19398, 19046], [20775, 20853]) # conversions and visits for A and then B
z_score

In [23]:
p_value

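The p-value is approximately $6 \times 10^{-15}$. In other words, if conversion were unrelated to the version shown, a difference at least as large as the one observed would arise by chance only about 6 times in $10^{15}$ such tests. The difference is therefore statistically significant, but it runs against the hypothesis: the control converted at about 93.4% (19398 of 20775) and the variation at about 91.3% (19046 of 20853).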

### Test Results

### Recommendations

<b><u>References</u></b>
<br>
[1] Sauro, Jeff, and James R. Lewis. <i>Quantifying the User Experience: Practical Statistics for User Research</i>. Morgan Kaufmann, 2016.
<br>
<br>
[2] Campbell, I. (2007), <i>Chi‐squared and Fisher–Irwin tests of two‐by‐two tables with small sample recommendations</i>. Statist. Med., 26: 3661-3675. doi:10.1002/sim.2832
<br>
<br>
[3] Wallis, Sean. <i>z-Squared: The Origin and Application of χ2</i>. 2010, www.ucl.ac.uk/english-usage/staff/sean/resources/z-squared.pdf.