# Homework #1: AB-testing<br>and the Multi-Armed Bayesian Bandit

### You have three choices... choose wisely, my friends...


|Option|The "Red" one|The "Blue" one|The "Other" one|
|-|-|-|-|
|Unknown Probability of Success|$\theta_A$ | $\theta_B$ | $\theta_C$ |

$$p(\theta_j|x_j,n_j) \propto \theta^{x_j+\alpha_j-1}(1-\theta_j)^{n-x+\beta_j-1}  \Rightarrow \; \text{What distribution?}$$

- Try one out, and collect that data update...
    - What's the data?
    - What's the update for the posterior in question?
- Which one of the three choices will you try out? How will you choose? 


- Hints: <u>You can use *simulation* to find out the *relative belief* (i.e., probability) that each of the choices is the best.</u> Posterior distributions characterize your beliefs about the parameters $\theta_A, \theta_B$ and $\theta_C$. What can you learn by repeatedly sampling values from the posterior distribution while comparing the values of each triplet? If you know the chances that A, B, and C are the best choice, how could you balance ***exploration versus exploitation*** when choosing which of the possible options to collect the next data point on next?



We begin by defining the parameters $\theta_A = 0.5, \theta_B = 0.4, \theta_C = 0.1$.
As defined in the problem, we begin with an understanding that the probability of success for the "Red", "Blue" and "Other" are given by the Beta distribution. 
To select the game with the highest probability of success, I begin by selecting one of the options assuming the highest probability of success is the same initially across all games, I would then play that game 10 times and then compute the posterior distribution for all games with this updated data. Afterwards, the probability of each game has changed, and I would choose the game with the highest probability and repeat this process 100 times until the best game is found. 


The data we collect will be after playing the game selected, and will be the number of success and losses. Based on this data, we update the posterior for each game, in particular, adding 1 to the alpha prior if the game is a success and 1 to the beta prior if the game is a loss. Initially, all three choices can be chosen randomly since their probabilities of success follow a beta distribution. 

By repeatedly sampling values from the posterior distribution while comparing the values of each triplet, this updating process helps in making informed decisions based on the latest available information. Explorations versus exploitation is balanced when choosing the possible options to collect the next data point as there is a tradeoff between exploring games to learn more about their success and exploiting the information we already have to maxmize the expected success of a game. If we know a particular game has a higher probability of success, we "exploit" that knowledge by using it as the next game in the simulation. 

In [15]:
from scipy import stats
import random
import numpy as np
import matplotlib.pyplot as plt

def update_beta_prior(alpha_prior, beta_prior, successes, trials):
    alpha_posterior = alpha_prior + successes
    beta_posterior = beta_prior + trials - successes
    return (alpha_posterior, beta_posterior)

# Initial parameters for each option
options = ["Red", "Blue", "Other"]
alpha_priors = {"Red": 1, "Blue": 1, "Other": 1}
beta_priors = {"Red": 1, "Blue": 1, "Other": 1}
posteriors = {"Red": [], "Blue": [] , "Other": []}

initial_parameters = {"Red": 0.5, "Blue": 0.4, "Other": 0.1}
current_probabilities = initial_parameters.copy()

# Arbitrarily choosing number of iterations of updating 
num_iterations = 100

for i in range(num_iterations):
    # Initially, all games have same probability so the first game is chosen
    # After initial run, the game with the largest probability is chosen
    chosen_option = max(current_probabilities, key=current_probabilities.get)
    
    # Simulate playing the chosen option
    success = stats.binom(n=10, p=current_probabilities[chosen_option]).rvs(size=1)[0] 
        
    # Update posteriors
    alpha_priors[chosen_option], beta_priors[chosen_option] = update_beta_prior(
        alpha_priors[chosen_option], beta_priors[chosen_option], success, 10
    )
    
    # Update current probabilities of each game 
    for option in options:
        current_probabilities[option] = stats.binom.pmf(success, 10, initial_parameters[option])

    # Store posteriors for analysis
    for option in options:
        posteriors[option].append((alpha_priors[option], beta_priors[option]))
    
print("Final Probabilities:")
for option in options:
    final_posterior = stats.beta(alpha_priors[option], beta_priors[option])
    probability = final_posterior.mean()
    print(f"{option}: Probability={probability}")


Final Probabilities:
Red: Probability=0.22321428571428573
Blue: Probability=0.26327433628318586
Other: Probability=0.2895927601809955
