Bayesian A/B Testing (Conjugate Priors)

* Objectives:
    * Frequentist vs Bayesian
    * Bayes' Theorem
    * Likelihood
    * Posterior
    * Conjugate Prior

1) Frequentist vs Bayesian
* Frequentist (Hypothesis Testing) Steps:
    1. Define a Metric - declare null and alternative hypotheses
    2. Set Parameters - significance, number of observations, etc.
    3. Run Experiment - follow the experiment rigidly
    4. Compute Test Statistic - check that it is the appropriate one 
    5. Calculate p-value
    6. Draw Conclusions - reject $H_0$ in favor of $H_A$, or fail to reject $H_0$
* Frequentist A/B Testing **Limitations**:
    * (-) If one of the pages you're testing appears to be obviously better, can you scrap the experiment and declare it the winner?
    * (-) At the end of an experiment, can you quantify how much better the winning page is than the losing pages?
        * Example: "It's 95% likely that page A is better than page B"
    * (-) If, after you've begun your test, your boss comes to you with another version of the page and asks you to test it too, can you update the test?
* Bayesian Steps:
    1. Define a Metric - to quantify what "better" means
    2. Run Test - collect data
    3. Continually Monitor Results - can decide to stop at any time
    4. Suggest Next Step - based on probabilities calculated
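
A minimal sketch of the Bayesian steps above. It assumes the metric is conversion rate and that traffic is simulated from hypothetical true rates (`true_rate_a`, `true_rate_b`). The function name `monitor_ab_test`, the batch size, and the 0.95 stopping threshold are illustrative choices, not a standard procedure. Each page gets a uniform $Beta(1,1)$ prior, updated with the Beta-Binomial rule derived in section 2. The posterior probability that A beats B is recomputed after every batch.

```python
import numpy as np

def monitor_ab_test(true_rate_a, true_rate_b, batch_size=500, max_batches=40,
                    threshold=0.95, num_samples=10_000, seed=0):
    """Simulate a Bayesian A/B test that is checked after every batch.

    true_rate_a / true_rate_b are hypothetical conversion rates used only to
    generate fake traffic. Returns (batches_run, P(A > B) when stopped).
    """
    rng = np.random.default_rng(seed)
    conv_a = conv_b = views_a = views_b = 0
    batch, prob_a_better = 0, 0.5
    for batch in range(1, max_batches + 1):
        # Step 2: collect one more batch of visitors for each page
        conv_a += rng.binomial(batch_size, true_rate_a)
        conv_b += rng.binomial(batch_size, true_rate_b)
        views_a += batch_size
        views_b += batch_size
        # Step 3: posterior of each page is Beta(1 + conversions, 1 + non-conversions)
        post_a = rng.beta(1 + conv_a, 1 + views_a - conv_a, size=num_samples)
        post_b = rng.beta(1 + conv_b, 1 + views_b - conv_b, size=num_samples)
        prob_a_better = float(np.mean(post_a > post_b))
        # Step 4: stop as soon as either page is convincingly better
        if prob_a_better > threshold or prob_a_better < 1 - threshold:
            break
    return batch, prob_a_better

batches, p = monitor_ab_test(true_rate_a=0.06, true_rate_b=0.04)
print(f"stopped after {batches} batches, P(A > B) = {p:.3f}")
```

Raising `threshold` requires more convincing evidence before stopping, so the test tends to run longer.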

2) Bayesian A/B Testing
* **Bayes' Theorem** - captures the essence of Bayesian modeling: uncertainty about an unknown parameter $\theta$ is quantified with a probability distribution, which is updated as data are observed
    * Bayes' Equation: $$P(\theta\mid y)=\frac{P(y\mid\theta)P(\theta)}{P(y)}$$
        * $P(y\mid\theta)$ - Likelihood
        * $P(\theta)$ - Prior
        * $P(y)$ - Normalizing Constant
        * $P(y)=\int P(y,\theta)d\theta$
    * Bayes' Theorem Distilled: $$Posterior \propto Likelihood \times Prior$$
        * Try identifying the "likelihood" and the "posterior" in the click-through rate example below
    * Application of Bayes' Theorem to A/B Testing:
        * Trying to determine which of options A & B is "better"
        * Example: trying to determine which page layout has a higher click-through rate
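
To make the theorem concrete, here is a minimal grid-approximation sketch. Grid approximation is a numerical technique, not the conjugate method this section builds toward. It discretizes the click-through rate $\theta$ on a grid and multiplies a flat prior by the binomial likelihood of hypothetical data (`k` clicks out of `n` views). It then divides by the sum, which stands in for the normalizing constant $P(y)$. The helper `grid_posterior` is hypothetical.

```python
import numpy as np

def grid_posterior(k, n, prior, theta):
    """Posterior P(theta | y) on a grid of theta values via Bayes' theorem.

    prior: (possibly unnormalized) prior weights P(theta) at each grid point.
    """
    # Proportional to P(y | theta); the binomial coefficient C(n, k) does not
    # depend on theta, so it cancels when we normalize below
    likelihood = theta**k * (1 - theta)**(n - k)
    unnormalized = likelihood * prior        # P(y | theta) * P(theta)
    evidence = unnormalized.sum()            # grid stand-in for P(y)
    return unnormalized / evidence           # P(theta | y)

theta = np.linspace(0, 1, 1001)
prior = np.ones_like(theta)                  # flat prior over [0, 1]
posterior = grid_posterior(k=12, n=100, prior=prior, theta=theta)
print(theta[np.argmax(posterior)])           # posterior mode, near 12/100
```
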
* **Likelihood** - distribution for the data (as a function of the parameter)
    * Click-through Rate Example: What are the possible outcomes for each observation in our data for a single page?
        * Answer: A click-through occurs or not (binary result)
        * How are our data distributed? (a.k.a. what is the **likelihood**?) 
            * Binomial distribution ($k$ successes in $n$ trials): $$P(X=k)=\binom{n}{k}p^k(1-p)^{n-k}$$
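
A small standard-library sketch of the binomial likelihood, using hypothetical counts. It holds the data ($k$, $n$) fixed and evaluates the same formula as a function of $p$. Over a grid of candidate values, the likelihood is largest at $p = k/n$.

```python
from math import comb

def binomial_likelihood(p, k, n):
    """P(X = k | p): k successes (click-throughs) in n trials (views)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

k, n = 3, 10  # hypothetical: 3 click-throughs out of 10 page views
candidates = [i / 100 for i in range(101)]
values = [binomial_likelihood(p, k, n) for p in candidates]
best_p = candidates[values.index(max(values))]
print(best_p)  # 0.3, i.e. k / n
```
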
* **Posterior** - updated knowledge about $\theta$ after seeing the data
    * Click-through Rate Example: With respect to the data from a single page, what are the possible values that a click-through rate can take on?
        * Answer: Any real value in the closed interval $[0,1]$
        * What distribution should we use to model this click-through rate parameter? (a.k.a. what is the **posterior**?)
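            * Beta distribution (support over the continuous interval $[0,1]$): $$X\sim Beta(\alpha,\beta),\qquad f(x)=\frac{1}{B(\alpha,\beta)}x^{\alpha-1}(1-x)^{\beta-1},$$ where $B(\alpha,\beta)=\frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$ is the Beta function (the normalizing constant)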
            ![beta_dist](beta_dist.png)
            * $E[X]=\frac{\alpha}{\alpha+\beta}$
            * $\text{Mode}=\frac{\alpha-1}{\alpha+\beta-2}$ (for $\alpha>1$ and $\beta>1$)
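
A standard-library sketch of the Beta density together with the mean and mode formulas above. It evaluates the normalizing constant $B(\alpha,\beta)=\Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)$ in log space with `math.lgamma` to avoid overflow. The parameter values are arbitrary.

```python
import math

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at a point x strictly inside (0, 1)."""
    # log B(a, b) = log Gamma(a) + log Gamma(b) - log Gamma(a + b)
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_norm)

def beta_mean(a, b):
    return a / (a + b)

def beta_mode(a, b):
    # The closed-form mode only applies when a > 1 and b > 1
    if a <= 1 or b <= 1:
        raise ValueError("closed-form mode requires a > 1 and b > 1")
    return (a - 1) / (a + b - 2)

a, b = 3, 7
print(beta_mean(a, b), beta_mode(a, b))  # 0.3 0.25
```
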
* Combining Likelihood & Posterior Distributions Together:
    * The two distributions from above:
        * Likelihood: $\binom{n}{k}p^k(1-p)^{n-k}$
        * Posterior: $\frac{1}{B(\alpha,\beta)}p^{\alpha-1}(1-p)^{\beta-1}$
    * As functions of $p$, the two have the **same** form: a constant times a power of $p$ times a power of $(1-p)$!
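
One way to see the shared form numerically is to normalize the binomial likelihood over $p$. This sketch does that with a simple midpoint-rule integral and hypothetical counts. It then compares the result with the Beta density whose exponents match, $Beta(k+1, n-k+1)$. The two agree because, as functions of $p$, both are a constant times $p^{k}(1-p)^{n-k}$.

```python
import math

def binomial_likelihood(p, k, n):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def beta_pdf(x, a, b):
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_norm)

k, n = 4, 20          # hypothetical: 4 conversions out of 20 views
num_points = 10_000
grid = [(i + 0.5) / num_points for i in range(num_points)]  # midpoints of (0, 1)

# Area under the likelihood viewed as a function of p (midpoint rule)
area = sum(binomial_likelihood(p, k, n) for p in grid) / num_points

# Largest pointwise gap between the normalized likelihood and Beta(k+1, n-k+1)
max_gap = max(abs(binomial_likelihood(p, k, n) / area - beta_pdf(p, k + 1, n - k + 1))
              for p in grid)
print(max_gap)  # tiny: the normalized likelihood is the Beta(k+1, n-k+1) density
```
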
* **Prior** - a distribution that encodes our beliefs about the possible values that the parameter in question, $\theta$, can take on before we collect any data
    * Click-through Rate Example: How do we choose our prior? Technically, the only requirement is that it is a valid distribution, i.e. $\int P(\theta)\,d\theta=1$.
    * More specifically, we can ask: which distribution would we like to use to encode our prior beliefs about $\theta$?
    * The functional forms of the likelihood and posterior are already known, so: $$Posterior \propto Likelihood \times Prior$$ $$Beta \propto Binomial \times Prior$$ $$p^{\alpha-1}(1-p)^{\beta-1} \propto p^k(1-p)^{n-k} \times Prior$$
    * In this case, the natural choice for the prior $P(\theta)$ is the $Beta$ distribution, because multiplying it by the Binomial likelihood yields a function of the same form: $$Posterior \propto Likelihood \times Prior$$ $$Posterior \propto Binomial \times Beta$$ $$Posterior \propto p^k(1-p)^{n-k} \times p^{\alpha-1}(1-p)^{\beta-1}$$ $$Posterior \propto p^{k+\alpha-1}(1-p)^{n-k+\beta-1}$$
        * **Beta Conjugate Prior** - the unnormalized posterior $p^{k+\alpha-1}(1-p)^{n-k+\beta-1}$ is again a $Beta$ kernel, so the $Beta$ prior is *conjugate* to the Binomial likelihood (the posterior stays in the same family as the prior)
        * Parameters of this distribution: $$\alpha_1=k+\alpha$$ $$\beta_1=n-k+\beta$$ $$Posterior \sim Beta(k+\alpha,n-k+\beta)$$
        * Now, take the number of conversions, $k$, and the total number of views, $n$, for each page and plug them into that page's posterior distribution
    * The values $\alpha$ and $\beta$ are the parameters of the $Beta$ prior
        * They encode our beliefs about the values $\theta$ could take on before any data are collected
        * There are several ways to choose $\alpha$ and $\beta$:
            * Choose an uninformative prior: $\alpha=1$ and $\beta=1$, which is the uniform distribution on $[0,1]$
            * Act as if you had already observed some data that your experiment's data must overcome: starting from $Beta(1,1)$, a $Beta(\alpha,\beta)$ prior corresponds to $\alpha-1$ prior conversions and $\beta-1$ prior non-conversions
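
A minimal standard-library sketch of the conjugate update. The helper `update_beta` is hypothetical, written only for this example, and the counts are made up. The update just adds conversions to $\alpha$ and non-conversions to $\beta$. The `informed` prior $Beta(11, 91)$ shows the pseudo-count idea: it acts as if 10 conversions and 90 non-conversions had already been seen. The last lines check the conjugacy property that updating in two batches gives the same posterior as updating once with all the data.

```python
def update_beta(alpha, beta, conversions, views):
    """Beta(alpha, beta) prior + binomial data -> Beta posterior parameters."""
    return alpha + conversions, beta + (views - conversions)

def beta_mean(alpha, beta):
    return alpha / (alpha + beta)

# Hypothetical data: 30 conversions out of 200 views
conversions, views = 30, 200

# Uninformative prior Beta(1, 1)
flat = update_beta(1, 1, conversions, views)
# Pseudo-data prior: as if 10 conversions and 90 non-conversions had already
# been observed (starting from Beta(1, 1)), which the new data must overcome
informed = update_beta(11, 91, conversions, views)

print(flat, beta_mean(*flat))          # (31, 171) and its mean
print(informed, beta_mean(*informed))  # (41, 261) and its mean

# Conjugacy: updating batch by batch equals one update with all the data
step1 = update_beta(1, 1, 10, 80)
step2 = update_beta(*step1, 20, 120)
assert step2 == flat
```
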
* Simulating Bayesian A/B Testing:
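    ```python
    import numpy as np

    # Hypothetical example counts; replace with your own page data
    num_views_a, num_conv_a = 1000, 120
    num_views_b, num_conv_b = 1000, 100

    num_samples = 10000
    # Uniform Beta(1, 1) prior; named prior_* so they don't shadow a `beta` sampler
    prior_alpha = prior_beta = 1
    rng = np.random.default_rng()
    # Sample each page's posterior, Beta(k + alpha, n - k + beta)
    site_a_simulation = rng.beta(num_conv_a + prior_alpha, num_views_a - num_conv_a + prior_beta, size=num_samples)
    site_b_simulation = rng.beta(num_conv_b + prior_alpha, num_views_b - num_conv_b + prior_beta, size=num_samples)
    ```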
    * What's the probability that site A has a higher conversion rate than site B?
    ```python
    np.mean(site_a_simulation > site_b_simulation)
    ```
    * What's the probability that site A's conversion rate is more than 5 percentage points higher than site B's (an absolute difference of 0.05)?
    ```python
    np.mean(site_a_simulation > (site_b_simulation + 0.05))
    ```
    * Visualizing the simulation:
    ![site_a_vs_b](site_a_vs_b.png)
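
The simulation steps above can be wrapped into one reusable function. This is a minimal sketch: the name `prob_a_beats_b`, its `margin` parameter, and the counts in the usage lines are hypothetical. `margin=0.0` answers the first question, and `margin=0.05` answers the second (an absolute lift of 5 percentage points). A fixed `seed` makes the Monte Carlo estimate reproducible.

```python
import numpy as np

def prob_a_beats_b(conv_a, views_a, conv_b, views_b, margin=0.0,
                   prior_alpha=1, prior_beta=1, num_samples=100_000, seed=None):
    """Monte Carlo estimate of P(rate_A > rate_B + margin) under Beta posteriors."""
    rng = np.random.default_rng(seed)
    # Posterior for each page: Beta(k + alpha, n - k + beta)
    samples_a = rng.beta(conv_a + prior_alpha, views_a - conv_a + prior_beta, size=num_samples)
    samples_b = rng.beta(conv_b + prior_alpha, views_b - conv_b + prior_beta, size=num_samples)
    return float(np.mean(samples_a > samples_b + margin))

# Hypothetical counts for illustration
p_better = prob_a_beats_b(120, 1000, 100, 1000, seed=42)
p_lift = prob_a_beats_b(120, 1000, 100, 1000, margin=0.05, seed=42)
print(p_better, p_lift)
```

Because both calls use the same seed, they compare the same posterior draws. The estimate for the larger margin can therefore never exceed the one for `margin=0.0`.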