Bayesian A/B Testing Analysis
------

> Every time I make an A/B testing joke, I feel the urge to make a second joke with slightly different wording to see which is funnier.

By The End Of This Session You Should Be Able To:
----

- Explain the difference between Frequentist and Bayesian analysis for A/B testing
- Conduct Bayesian A/B testing

<center><img src="images/freq.png" width="500"/></center>

Great for crop yields, awful for A/B testing
-----

- Deduction from P(data | H_0), by setting α in advance 
- Reject H_0 if P(data | H_0) < α
- The sample size needs to be fixed in advance
- Point estimates and standard errors or 95% confidence intervals 


<center><img src="images/bay.png" width="500"/></center>

aka, The Better Way
-----

- Induction from P(θ | data), starting with P(θ)
- Update results as data is collected
- Estimate posterior distributions and credible intervals

### Bayes Theorem  

$$ P(\theta \ | \ x) = \frac{P(x \ | \ \theta) P(\theta)}{P(x)} $$

* $P(\theta)$: **prior**, initial belief  
* $P(x \ | \ \theta)$: **likelihood** of data  
* $P(\theta \ | \ x)$: **posterior**, updated belief
* $P(x)$: normalizing constant

$$ \text{posterior} \propto \text{prior} \times \text{likelihood} $$
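
A minimal sketch of $\text{posterior} \propto \text{prior} \times \text{likelihood}$, evaluated on a grid of candidate $\theta$ values. The binomial likelihood, the flat prior, the data (7 successes in 10 trials), and the helper name `grid_posterior` are illustrative assumptions, not part of the original session.

```python
from math import comb

def grid_posterior(successes, trials, grid_size=101):
    """Approximate P(theta | x) on an evenly spaced grid over [0, 1]."""
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    # Assumption: flat prior, every candidate theta is equally plausible.
    prior = [1.0 for _ in thetas]
    # Binomial likelihood of the observed data at each candidate theta.
    likelihood = [
        comb(trials, successes) * t**successes * (1 - t) ** (trials - successes)
        for t in thetas
    ]
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    evidence = sum(unnormalized)  # plays the role of the normalizing constant P(x)
    posterior = [u / evidence for u in unnormalized]
    return thetas, posterior

# Hypothetical data: 7 successes in 10 trials.
thetas, posterior = grid_posterior(successes=7, trials=10)
best = max(zip(posterior, thetas))[1]  # theta with the highest posterior weight
print(f"posterior mode is near {best:.2f}")
```

Dividing by `evidence` is what turns the proportionality into an equality: the grid weights sum to 1.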

Why Bayesian A/B Testing?
------

There are three major advantages the Bayesian A/B test has over the frequentist method:

1. Easier to interpret
2. Peeking is okay
3. Adjust testing in the middle

1. Easier to interpret
------

For example, you can directly compute the probability that version B is better than version A. A frequentist test cannot produce this number: it only reports how surprising the data would be if A and B performed identically.
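
One way to estimate that probability is by simulation. The sketch below assumes each variant's conversion rate has a Beta posterior under a flat Beta(1, 1) prior (the Beta-Binomial model from the Conjugate Prior section). The counts and the function name `prob_b_beats_a` are hypothetical.

```python
import random

def prob_b_beats_a(a_conv, a_total, b_conv, b_total, draws=50_000, seed=0):
    """Monte Carlo estimate of P(p_B > p_A) from two independent Beta posteriors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Assumption: flat Beta(1, 1) prior, so the posterior is
        # Beta(1 + conversions, 1 + non-conversions).
        p_a = rng.betavariate(1 + a_conv, 1 + a_total - a_conv)
        p_b = rng.betavariate(1 + b_conv, 1 + b_total - b_conv)
        wins += p_b > p_a
    return wins / draws

# Hypothetical counts: A converts 120 of 1000 visitors, B converts 150 of 1000.
p_b_better = prob_b_beats_a(a_conv=120, a_total=1000, b_conv=150, b_total=1000)
print(f"P(B > A) is roughly {p_b_better:.3f}")
```

The result is an estimate whose precision depends on `draws`. Fixing `seed` only makes it reproducible.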

2. Peeking is okay
------

The Bayesian test is measuring the probability at time t that B is better than A (or vice versa). 

You can look at the data, check if the test is finished, and stop the test early if the result is highly conclusive. 

This philosophical advantage is also a practical business advantage: a standard fixed-sample test may not deliver its results until after the point where you ABSOLUTELY NEED to make a decision.

3. Adjust testing in the middle
------

You can alter your test material in the middle of the test.  

If you believe your test is not going anywhere, you can use your current posteriors as new priors for what is essentially the start of a new test without any major interruptions in your development flow. 
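
The mechanics of "posterior becomes the new prior" can be sketched with the Beta-Binomial update derived in the Conjugate Prior section. The batch counts and the helper name `update` are hypothetical. Whether an old posterior is still a sensible prior after you change the test material is a modeling judgment that the code does not check.

```python
def update(prior, conversions, trials):
    """Return Beta posterior parameters (alpha, beta) after observing binomial data."""
    alpha, beta = prior
    return (alpha + conversions, beta + trials - conversions)

prior = (1, 1)  # assumption: flat Beta(1, 1) starting prior

# Hypothetical batches of data: (conversions, visitors).
batches = [(12, 100), (9, 100), (15, 100)]

posterior = prior
for conversions, trials in batches:
    # The posterior from the previous batch serves as the prior for this one.
    posterior = update(posterior, conversions, trials)

# Updating once with all the data pooled gives the same parameters.
all_at_once = update(
    prior,
    sum(c for c, _ in batches),
    sum(n for _, n in batches),
)
print(posterior, all_at_once)  # (37, 265) (37, 265)
```

Because the update only adds counts, the order and grouping of the batches does not matter when the underlying rate is unchanged.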

Standard tests do __not__ allow for this because of the “no peeking” problem. This makes a Bayesian approach extremely useful to anyone involved in Agile Development or Agile Marketing, since the testing schedule can be made to fit your development lifecycle.  

Conjugate Prior  
-----

If you want to update your posterior continuously, choose a prior whose posterior belongs to the same distribution family as the prior. Such a prior is called a **conjugate prior** for the likelihood.

<center><img src="https://www.johndcook.com/conjugate_prior_diagram.png" width="500"/></center>

How are Coin Flips related to CTR?
-----

Conjugate Prior for CTR
-----

A Beta prior has the following PDF:  

$$ P(p) = \frac{p^{\alpha - 1} (1 - p)^{\beta - 1}}{B(\alpha, \beta)} $$  

A Binomial random variable has the following PMF (it is discrete, so it has a probability mass function rather than a density):  

$$ P(x \ | \ p) = \binom{n}{x} p^x (1 - p)^{n - x} $$  



If we multiply them, we get a Beta distribution again:  

$$ 
\begin{align*} 
\text{posterior} &\propto \text{prior } \times \text{likelihood}   \\ 
&=  \frac{p^{\alpha - 1} (1 - p)^{\beta - 1}}{B(\alpha, \beta)} \times \binom{n}{x} p^x (1 - p)^{n - x} \\
&\propto p^{\alpha - 1} (1 - p)^{\beta - 1} \times p^x (1 - p)^{n - x} \\
&\propto p^{\alpha + x - 1} (1 - p)^{\beta + n - x - 1}
\end{align*} $$

$\Rightarrow \text{posterior } \sim Beta(\alpha + x, \beta + n - x) $  
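
As a worked instance of this update rule, assume (hypothetically) a flat prior $Beta(1, 1)$ and $x = 12$ clicks in $n = 100$ impressions:

$$
\begin{align*}
\alpha' &= \alpha + x = 1 + 12 = 13 \\
\beta' &= \beta + n - x = 1 + 100 - 12 = 89 \\
E[p \ | \ x] &= \frac{\alpha'}{\alpha' + \beta'} = \frac{13}{102} \approx 0.127
\end{align*}
$$

The posterior mean sits slightly above the raw rate $12/100 = 0.12$ because the flat prior contributes one pseudo-click and one pseudo-non-click, pulling the estimate toward $0.5$.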

Credible Interval  
------
<br>
<center><img src="http://jov.arvojournals.org/data/Journals/JOV/935592/m_i1534-7362-16-10-25-f05.png" width="500"/></center>

The interval estimate we obtain from the posterior distribution is called the **credible interval**.

Unlike the frequentist confidence interval, the credible interval has a direct probability interpretation: given the data and the prior, the parameter lies inside a 95% credible interval with probability 0.95.
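
A minimal sketch of an equal-tailed credible interval, read off the quantiles of posterior samples. The posterior parameters $Beta(13, 89)$ are hypothetical (for instance, a flat prior plus 12 clicks in 100 impressions), and `credible_interval` is an illustrative helper, not a library function.

```python
import random

def credible_interval(alpha, beta, mass=0.95, draws=50_000, seed=0):
    """Equal-tailed credible interval for a Beta(alpha, beta) posterior via sampling."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    tail = (1 - mass) / 2  # probability left out on each side
    lower = samples[int(tail * draws)]
    upper = samples[int((1 - tail) * draws) - 1]
    return lower, upper

# Hypothetical posterior: Beta(13, 89).
low, high = credible_interval(13, 89)
print(f"95% credible interval for p: ({low:.3f}, {high:.3f})")
```

An equal-tailed interval is one of several conventions. A highest-density interval would be an alternative for skewed posteriors.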

<br>

Summary
----

- 

----
Bonus Material
-----

<center><img src="images/stat sig.jpg" width="300"/></center>

The Boss's Complaint
----

<center><img src="http://vignette2.wikia.nocookie.net/dilbert/images/9/95/Dilbert_PHB.jpg/revision/latest?cb=20090512154211" height="500"/></center>

You've been running A/B tests, evaluating them, and making useful business decisions for months. You're confident that you're actually making correct decisions. Life seems good. But then in a meeting…

Your boss asks, *"So when can we wrap up that button test?"*

*"It doesn't look like it will stop any time soon,"* you reply.

*"I know,"* says your boss, *"we've been running it for three months, it has over 10k conversions. We have other ideas to try, we need to kill that test."*

*"But,"* you say, *"it is nowhere near significance. Stopping it is completely wrong under the procedure we're using, we'll have no idea whether we're making the right decision!"*

Your boss snaps, *"Doing 'the right decision' could easily take a million more conversions! If we knew how to get to a million conversions, we'd all be rich and wouldn't care about the color of the button! We'll declare the red button the winner, and you need to figure out a better way to stop the next test."*

Clearly your boss has a point. Businesses need to be able to make decisions, and have a limited ability to collect data. But how can we say that in math? And what are the consequences if we do?

Why are hypothesis tests under fire?  

There are many reasons, and a crucial one is that hypothesis tests and p-values are hard to understand and hard to explain. The thresholds are arbitrary (0.05?) and the results are binary: you either reject the null hypothesis or fail to reject it. And is that what you really care about?

Which of these two statements is more appealing:

1. "We rejected the null hypothesis that A = B with a p-value of 0.043."

2. "There is an 85% chance that A has a 5% lift over B."

Bayesian modeling can answer questions like #2 directly.

**Exercise**:  

* Find $P(p_A > p_B)$ and $P(p_A > p_B + 0.1)$
* Give a 95% credible interval for the difference between $p_A$ and $p_B$

Check for understanding
------

If I say µ is a random variable, am I being more Frequentist or Bayesian?

- In frequentist statistics:
    * µ is fixed and unknown
    * x̄  is a random variable

- In Bayesian statistics:
    * µ is a random variable
    * x̄  is observed and fixed