In [11]:
import numpy as np
from scipy import stats

# Initialize the bandits with alpha and beta values for Beta(1,1) prior
def initialize_bandits(num_bandits=3):
    alphas = np.ones(num_bandits)  # Alpha values
    betas = np.ones(num_bandits)   # Beta values
    return alphas, betas

# Update the alpha and beta values of the chosen bandit based on the outcome
def update_posterior(chosen_bandit, success, alphas, betas):
    if success:
        alphas[chosen_bandit] += 1
    else:
        betas[chosen_bandit] += 1
    return alphas, betas

# Run the simulation for a specified number of rounds
def run_simulation(num_rounds=100, true_success_rates=(0.3, 0.5, 0.7)):
    alphas, betas = initialize_bandits(len(true_success_rates))
    
    for _ in range(num_rounds):
        # Sample from the posterior of each bandit and pick the one with the highest sample
        samples = [stats.beta(a, b).rvs() for a, b in zip(alphas, betas)]
        chosen_bandit = np.argmax(samples)

        # Simulate one Bernoulli trial with the chosen bandit's true success rate;
        # omitting size=1 makes `success` a scalar boolean rather than a 1-element array
        success = stats.binom(n=1, p=true_success_rates[chosen_bandit]).rvs() == 1
        alphas, betas = update_posterior(chosen_bandit, success, alphas, betas)

    return alphas, betas

# Running the modified simulation
alphas_modified, betas_modified = run_simulation()

# Display the final beliefs (posterior Beta parameters) for each bandit
for i, (a, b) in enumerate(zip(alphas_modified, betas_modified)):
    print(f"Bandit {i+1}: Beta({a}, {b})")


Bandit 1: Beta(4.0, 7.0)
Bandit 2: Beta(8.0, 7.0)
Bandit 3: Beta(61.0, 19.0)
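
The posteriors printed above can be summarized by their means and central 95% intervals. The following is a minimal sketch rather than part of the original notebook: the parameters are copied by hand from that output, and the names `posteriors` and `summarize` are hypothetical helpers introduced only for illustration.

```python
from scipy import stats

# Posterior parameters copied by hand from the printed output (assumption:
# they correspond to the single run shown; another run would differ).
posteriors = {
    "Bandit 1": (4.0, 7.0),
    "Bandit 2": (8.0, 7.0),
    "Bandit 3": (61.0, 19.0),
}

def summarize(a, b, level=0.95):
    """Return the posterior mean and the central interval of Beta(a, b)."""
    dist = stats.beta(a, b)
    low, high = dist.interval(level)
    return dist.mean(), low, high

for name, (a, b) in posteriors.items():
    mean, low, high = summarize(a, b)
    # Each bandit started at Beta(1, 1), so a + b - 2 is its number of plays.
    plays = int(a + b - 2)
    print(f"{name}: plays={plays}, mean={mean:.3f}, 95% interval=({low:.3f}, {high:.3f})")
```

A bandit that was played only a few times has a wide interval, so its mean alone says little about how it compares with the others.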


Each bandit starts with a uniform belief about its success rate, a Beta(1,1) distribution. I then set "The 'Red' one" to Bandit 1, "The 'Blue' one" to Bandit 2, and "The 'Other' one" to Bandit 3. In each round, one value is sampled from every bandit's Beta distribution and the bandit with the highest sample is played, which balances trying uncertain options against choosing the best-known option (this selection rule is known as Thompson sampling). The outcome of that play is simulated from the bandit's true success rate: a success adds 1 to its alpha and a failure adds 1 to its beta. The simulation returns the final Beta distribution for each bandit, which represents our belief about its success rate after all 100 rounds. In the run shown above, Bandit 3 was played 78 times and ended at Beta(61, 19), a posterior mean of 61/80 ≈ 0.76, so we can infer that Bandit 3 is most likely the best choice. Bandits 1 and 2 were played only 9 and 13 times, so their posteriors remain wide and their success rates are still uncertain.
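
The claim that Bandit 3 is most likely the best can be made quantitative by estimating, for each bandit, the posterior probability that its success rate is the highest. The sketch below does this by Monte Carlo. It is an illustration under stated assumptions, not part of the original notebook: the Beta parameters are copied from the run shown above, and `prob_best`, the seed, and the number of draws are hypothetical choices.

```python
import numpy as np

# Posterior parameters copied from the run shown above (assumption: other
# runs will end with different values).
alphas = np.array([4.0, 8.0, 61.0])
betas = np.array([7.0, 7.0, 19.0])

def prob_best(alphas, betas, num_draws=100_000, seed=0):
    """Estimate P(bandit i has the highest success rate) from Beta posteriors."""
    rng = np.random.default_rng(seed)
    # One row per joint draw, one column per bandit.
    draws = rng.beta(alphas, betas, size=(num_draws, len(alphas)))
    winners = draws.argmax(axis=1)
    # Fraction of joint draws in which each bandit had the largest sample.
    return np.bincount(winners, minlength=len(alphas)) / num_draws

p_best = prob_best(alphas, betas)
for i, p in enumerate(p_best):
    print(f"Bandit {i+1}: estimated P(best) = {p:.3f}")
```

This mirrors the selection step of the simulation: the estimated probability for a bandit is the long-run share of rounds in which that bandit would be chosen if the posteriors were frozen at their final values.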