# More on A/B Testing 

One popular use of data from A/B tests is web optimization: designing user experiments to understand how users engage with new online services, products, or features.

Another popular use of data from A/B tests is in advertising, to test the effectiveness of campaigns: designing experiments to test whether certain actions promote a positive event or prevent an adverse one.

Many roles work with data from A/B tests, because these statistical tools let businesses improve user experiences as a way to increase revenue and reduce risk.



###  A Business Problem


A company decides to create a new version of its webpage in order to increase the number of people signing up for its services through its website. The change can be a simple alternative, such as a different look for a single button, or a different layout or headline.

To determine which version of the webpage is more effective, they use A/B testing.

The experimental design: the company shows the old webpage to 1,000 people (group A) and the new webpage to another 1,000 people (group B). It uses randomized selection and assignment into the two groups. For example, when someone visits the website, the site sends them to one of the two (A/B) webpages, chosen at random.
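The random assignment described above can be sketched as follows. This is only an illustration: the function name `assign_variant` and the fixed seed are assumptions, not part of the original experiment.

```python
import random
from collections import Counter

def assign_variant(rng: random.Random) -> str:
    """Send a visitor to page A or page B with equal probability."""
    return rng.choice(["A", "B"])

rng = random.Random(42)  # fixed seed only so the sketch is repeatable
assignments = Counter(assign_variant(rng) for _ in range(2000))
print(assignments)
```

Independent coin flips like this give groups of roughly, not exactly, 1,000 each. Getting exactly 1,000 per group would need a different scheme, such as shuffling the 2,000 visitors and splitting the list in half.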

The observed difference between the two groups:


In [3]:
df

| index | group | total | converted |
|-------|-------|-------|-----------|
| 0     | A     | 1000  | 490       |
| 1     | B     | 1000  | 510       |


suggests that the new version (B) converts better than the current version (A).
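As a quick check on that reading, the sign-up rates can be computed straight from the counts in the table. A minimal sketch; the dictionary layout and variable names are illustrative assumptions:

```python
# Counts taken from the table above
results = {
    "A": {"total": 1000, "converted": 490},
    "B": {"total": 1000, "converted": 510},
}

# Sign-up rate per group and the observed difference (B minus A)
rates = {group: r["converted"] / r["total"] for group, r in results.items()}
diff = rates["B"] - rates["A"]

print(rates)
print(round(diff, 4))
```

The rates are 49% for A and 51% for B, a difference of two percentage points.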



Determining whether the difference in sign-up rates between A and B is statistically significant requires a statistical test. I used a simple controlled experiment. See the code: https://github.com/kyramichel/Statistical_Testing/blob/master/A_B%20Testing.ipynb


## In other words

A/B testing refers to the task of determining which of two versions, A or B, performs better. Here, that means deciding whether the new version of the webpage (B) increases the sign-up rate and should be launched.


We have a sample of 2,000 users, of whom 50% were exposed to version A and 50% to version B. The split was random, in this case 50/50.

The data suggests that the new version (B) converts better than the current version (A). This is an empirical claim that we want to test.


## Statistical testing

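The question is: if the two versions actually convert equally well, how likely is it that a difference in sign-up rate like the one observed in this experiment arises by chance alone? The assumption that there is no real difference between A and B is the null hypothesis **H0**.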


### The test statistic

The test statistic reduces to simple counting.

The number of ways to choose a sample of 1000 from a set of 2000, where order doesn't matter and replacement isn't allowed, is: comb(2000,1000)

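The observed pattern in sign-ups corresponds to comb(1000, 490) * comb(1000, 510) of those comb(2000, 1000) ways. The 2,000 users contain 490 + 510 = 1,000 sign-ups and 1,000 non-sign-ups, and group A must contain 490 of the former and 510 of the latter: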

In [12]:
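from math import comb
d = comb(2000, 1000)   # ways to pick group A (1,000 users) from all 2,000
d1 = comb(1000, 490)   # ways to pick 490 of the 1,000 sign-ups
d2 = comb(1000, 510)   # ways to pick 510 of the 1,000 non-sign-ups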

In [15]:
import decimal
d = decimal.Decimal(d)  # d is too large for a float, so convert it to Decimal
format(d, '.6e')

'2.048152e+600'

In [16]:
format(d1, '.6e')


'2.213346e+299'

In [17]:
format(d2, '.6e')

'2.213346e+299'



Under H0, the probability of exactly this split is p = Pr(data | H0) = comb(1000, 490) * comb(1000, 510) / comb(2000, 1000):

In [20]:
p = d1*d2/d
format(p, '.6e')

'2.391864e-2'
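The value above is the probability of one exact split. A significance test compares a tail probability with the significance level instead: the total probability, under H0, of every split at least as extreme as the observed one. The sketch below sums those terms with exact integer arithmetic. It assumes a two-sided test and treats the total number of sign-ups as fixed; the names `ways`, `point`, `mode` and `p_value` are illustrative and do not come from the linked notebook.

```python
from fractions import Fraction
from math import comb

N, n = 2000, 1000      # all users, users in group A
K = 490 + 510          # total sign-ups across both groups
observed = 490         # sign-ups that landed in group A
center = n * K // N    # sign-ups expected in group A under H0 (500)

def ways(k: int) -> int:
    """Number of ways group A can contain exactly k of the K sign-ups."""
    return comb(K, k) * comb(N - K, n - k)

total = comb(N, n)

point = Fraction(ways(observed), total)  # probability of the exact observed split
mode = Fraction(ways(center), total)     # probability of the most likely split

# Two-sided tail: every split at least as far from the centre as the observed one
gap = abs(observed - center)
p_value = Fraction(
    sum(ways(k) for k in range(n + 1) if abs(k - center) >= gap),
    total,
)

print(f"{float(point):.4f} {float(mode):.4f} {float(p_value):.4f}")
```

On these counts, even the most likely split (500 sign-ups per group) has a probability below 5%, so a single-outcome probability cannot be compared with the significance level. The two-sided tail sum comes out at roughly 0.4.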

## Summary

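The significance level chosen is 5%.

The probability p computed above is the chance of exactly the observed split under H0. It cannot be compared with 5% directly: with 1,001 possible splits, even the most likely one (500 sign-ups in each group) has a probability of only about 3.6%. A p-value is a tail probability: the total probability, under H0, of every split at least as extreme as the observed one (490 or fewer, or 510 or more, sign-ups in group A). For a difference of 20 sign-ups between two groups of 1,000, that tail probability is roughly 0.4, far above 5%. We therefore cannot reject H0: the observed difference between the two groups is not statistically significant, and this experiment alone does not justify launching the new design.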


Note:
    
If we code sign-up "successes" as 1 and "failures" as 0, the parameter of interest is a population proportion, which is also the population mean of that 0/1 variable.
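A small sketch of that equivalence, using group A's counts from the table. The list `outcomes_a` is an illustrative reconstruction, since the per-user data is not shown here:

```python
# 0/1 outcomes for group A: 1 = signed up, 0 = did not (490 and 510 users)
outcomes_a = [1] * 490 + [0] * 510

proportion = outcomes_a.count(1) / len(outcomes_a)  # share of sign-ups
mean = sum(outcomes_a) / len(outcomes_a)            # arithmetic mean of the 0/1 values

print(proportion, mean)
```

Both expressions divide the same count of ones by the same sample size, so they always agree.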

If simple random sampling is done without replacement, the covariance between the two samples is non-zero.
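One way to see this is to read the statement as being about two individual draws (an assumption about the intended meaning) and to enumerate a toy population exactly. The four-element population below is made up for illustration and is not the experiment's data.

```python
from fractions import Fraction
from itertools import permutations

population = [1, 1, 0, 0]  # toy population: two sign-ups, two non-sign-ups

# Every ordered pair of distinct units = two draws without replacement
pairs = [(population[i], population[j])
         for i, j in permutations(range(len(population)), 2)]

def mean(values):
    """Exact mean of a list of integers."""
    return Fraction(sum(values), len(values))

e_first = mean([x for x, _ in pairs])
e_second = mean([y for _, y in pairs])
cov = mean([x * y for x, y in pairs]) - e_first * e_second

print(cov)  # -1/12
```

The covariance is negative because drawing a sign-up first leaves fewer sign-ups for the second draw. With replacement, the two draws would be independent and the covariance would be zero.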

See Analysis of variance (ANOVA): https://github.com/kyramichel/Statistical_Testing/blob/master/A_B%20Testing.ipynb
