# Confidence intervals and A/B testing

Ryan Reece     
2018-02-26

## Introduction

## Following the blitzresults.com example

This section works through the [blitzresults.com A/B-test example](http://www.blitzresults.com/en/ab-tests/) using the $\chi^2$ method.
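
For reference, here is a sketch of the statistic that the cells below compute, written in the notebook's own notation. Reading the first index as the variant ($A$ or $B$) and the second as the outcome ($1$ or $2$) is an assumption inferred from how `rho` is defined, not something stated on the blitzresults page.

$$
\rho = \frac{N_2}{N_1 + N_2}, \qquad
\chi^2 = \sum_{i \in \{A, B\}} \left[ \frac{\left(N_{i1} - N_i (1 - \rho)\right)^2}{N_i (1 - \rho)} + \frac{\left(N_{i2} - N_i \rho\right)^2}{N_i \rho} \right]
$$

Under the null hypothesis that both variants share the same rate $\rho$, this quantity is approximately $\chi^2$-distributed with one degree of freedom.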

In [1]:
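# Counts N_<variant><outcome>: variants A and B, outcomes 1 and 2.
N_A1 = 960.
N_A2 = 40.
N_B1 = 1120.
N_B2 = 80.
# Row totals (per variant) and column totals (per outcome).
N_A = N_A1 + N_A2
N_B = N_B1 + N_B2
N_1 = N_A1 + N_B1
N_2 = N_A2 + N_B2
# Pooled fraction of outcome 2 across both variants.
rho = N_2 / (N_1 + N_2)
print(rho)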

0.0545454545455


In [4]:
# Sum over the four cells of (observed - expected)**2 / expected.
chi2 = ((N_A1 - N_A*(1-rho))**2/(N_A*(1-rho)) + (N_A2 - N_A*rho)**2/(N_A*rho)
        + (N_B1 - N_B*(1-rho))**2/(N_B*(1-rho)) + (N_B2 - N_B*rho)**2/(N_B*rho))
print(chi2)

7.52136752137
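
The statistic by itself does not say whether the difference is significant. The following is a minimal sketch of the missing step, assuming one degree of freedom (a 2×2 table) and a 5% significance level, neither of which is stated in the notebook. It uses only the standard library: for one degree of freedom the $\chi^2$ tail probability reduces to $\mathrm{erfc}(\sqrt{\chi^2/2})$.

```python
import math

chi2 = 7.52136752137  # value printed by the cell above

# For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2.0))

# Standard table value for dof = 1 at the (assumed) 5% significance level.
critical_value = 3.841

print("p-value:", p_value)
print("reject the null hypothesis at 5%:", chi2 > critical_value)
```

If SciPy is available, `scipy.stats.chi2.sf(chi2, 1)` should give the same tail probability.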


Checking the $\chi^2$ term-by-term:

In [7]:
# One term per cell, in the order A1, A2, B1, B2.
print((N_A1 - N_A*(1-rho))**2 / (N_A*(1-rho)))
print((N_A2 - N_A*rho)**2 / (N_A*rho))
print((N_B1 - N_B*(1-rho))**2 / (N_B*(1-rho)))
print((N_B2 - N_B*rho)**2 / (N_B*rho))

0.223776223776
3.87878787879
0.18648018648
3.23232323232
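
In a 2×2 table every cell deviates from its expected count by the same amount up to sign, so the four terms collapse into a closed form. As a cross-check on the sum, here is the standard 2×2 identity (not taken from the blitzresults page) evaluated with the counts above:

$$
\chi^2 = \frac{(N_1 + N_2)\,\left(N_{A1} N_{B2} - N_{A2} N_{B1}\right)^2}{N_A \, N_B \, N_1 \, N_2}
= \frac{2200 \times (960 \times 80 - 40 \times 1120)^2}{1000 \times 1200 \times 2080 \times 120}
\approx 7.5214
$$

This agrees with the sum of the four terms.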


## See also

References:

-   [Wikipedia: Lady tasting tea](https://en.wikipedia.org/wiki/Lady_tasting_tea)
-   [Wikipedia: Student's $t$-test](https://en.wikipedia.org/wiki/Student%27s_t-test)
-   [Wikipedia: Welch's $t$-test](https://en.wikipedia.org/wiki/Welch%27s_t-test)
-   [Tables of critical values](https://home.ubalt.edu/ntsbarsh/Business-stat/StatistialTables.pdf)

Illuminating discussions:

-   [KhanAcademy: hypothesis testing videos](https://www.khanacademy.org/math/statistics-probability/significance-tests-one-sample/more-significance-testing-videos/v/large-sample-proportion-hypothesis-testing)
-   [Quora: What is an intuitive explanation for how the $t$, $z$, and $\chi^2$-distributions are related?](https://www.quora.com/What-is-an-intuitive-explanation-for-how-the-t-distribution-normal-distribution-F-distribution-and-Chi-square-distribution-relate-to-each-other-Why-do-all-these-different-distributions-exist-and-when-do-we-use-each-in-statistical-testing/answer/Chandrima-Das)
-   [Quora: How can I do an A/B test in Python?](https://www.quora.com/How-can-I-do-an-A-B-test-in-Python)
-   [The Satterthwaite Formula for Degrees of Freedom in the Two-Sample $t$-Test](https://secure-media.collegeboard.org/apc/ap05_stats_allwood_fin4prod.pdf)
-   [A/B Testing Statistics: An Intuitive Guide For Non-Mathematicians](https://conversionsciences.com/blog/ab-testing-statistics/)

Some worked examples:

-   [Stats notes at The Seashore](http://www.theseashore.org.uk/theseashore/Stats%20for%20twits/T%20Test.html)
-   [$\chi^2$-test for independence (biology example)](https://www.biologyforlife.com/x2-test-for-independence.html)
-   [$\chi^2$-test for A/B-tests at blitzresults.com](http://www.blitzresults.com/en/ab-tests)
-   [$\chi^2$-test for A/B-tests at elegantthemes.com](https://www.elegantthemes.com/blog/resources/how-to-determine-statistical-significance-when-ab-testing-with-divi-leads)
-   [2-sample $t$-test for A/B-tests in R](https://rexplorations.wordpress.com/2015/08/13/hypothesis-tests-2-sample-tests-ab-tests/)
-   [A/B Testing with Hierarchical Models in Python (Bayesian)](https://blog.dominodatalab.com/ab-testing-with-hierarchical-models-in-python/)

Toolkits:

-   [scipy.stats.ttest_ind](https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.ttest_ind.html)

Possibly relevant Stack Exchange threads:

-   [Are different p-values for chi-squared and z-test expected?](https://stats.stackexchange.com/questions/141547/are-different-p-values-for-chi-squared-and-z-test-expected-for-testing-differenc)
-   [A/B tests: z-test vs t-test vs chi-square vs Fisher exact test](https://stats.stackexchange.com/questions/178854/a-b-tests-z-test-vs-t-test-vs-chi-square-vs-fisher-exact-test)



## Taken from Mark

In [5]:
#import numpy as np
#from statsmodels.stats.proportion import proportions_ztest
#
#count = np.array([4031, 1777])
#nobs = np.array([202672, 114128])
#proportions_ztest(count,nobs,value=None, alternative='two-sided')
## stat, pval =proportions_ztest(count, nobs, value)
## print('%.80f' % pval)
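
The commented-out cell above calls `proportions_ztest` from statsmodels. As a rough stand-in that needs no extra packages, here is a sketch of a pooled two-proportion $z$-test on the same counts. The helper name `two_proportion_ztest` is hypothetical, and the pooled standard error is an assumption about which variant of the test is intended, so it may not match statsmodels for every choice of options.

```python
import math

def two_proportion_ztest(count, nobs):
    """Pooled, two-sided z-test for equal proportions (hypothetical helper)."""
    p1 = count[0] / nobs[0]
    p2 = count[1] / nobs[1]
    # Pooled rate under the null hypothesis that both proportions are equal.
    pooled = (count[0] + count[1]) / (nobs[0] + nobs[1])
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / nobs[0] + 1.0 / nobs[1]))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Same counts and sample sizes as in the commented-out cell.
z, p_value = two_proportion_ztest([4031, 1777], [202672, 114128])
print(z, p_value)
```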

In [6]:
#import statsmodels.stats.api as sms
#es = sms.proportion_effectsize(0.019889, 0.015570)
#sms.NormalIndPower().solve_power(es, power=0.8, alpha=0.05, ratio=1)
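
The commented-out cell above asks statsmodels for the per-group sample size at 80% power. Below is a sketch of the usual normal-approximation formula, assuming Cohen's $h$ as the effect size for two proportions and equal group sizes. Both helper names are hypothetical. The formula ignores the far tail of the two-sided test, so a numerical solver may return a slightly different number.

```python
import math
from statistics import NormalDist  # available in Python 3.8+

def cohens_h(p1, p2):
    """Cohen's h effect size for two proportions (hypothetical helper)."""
    return 2.0 * math.asin(math.sqrt(p1)) - 2.0 * math.asin(math.sqrt(p2))

def sample_size_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate per-group n for a two-sided, equal-size, two-sample z-test."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    return 2.0 * ((z_alpha + z_power) / effect_size) ** 2

# Same proportions, alpha, and power as in the commented-out cell.
es = cohens_h(0.019889, 0.015570)
n_per_group = sample_size_per_group(es)
print(es, math.ceil(n_per_group))
```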