# Bayesian Bandits - Interactive Demo

**DS-122: Bayesian Testing & Decision Making**

This notebook contains the multi-armed bandit demo we run together during the live class session.

## Setup

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## Define Helper Functions

In [None]:
def update(distribution, likelihood):
    '''Standard Bayesian update function'''
    distribution['probs'] = distribution['probs'] * likelihood
    prob_data = distribution['probs'].sum()
    distribution['probs'] = distribution['probs'] / prob_data
    return distribution

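def update_bandit(belief, outcome):
    '''Update belief about a machine based on win/loss outcome'''
    if outcome == 'W':
        # Likelihood of a win at each candidate p is p itself
        update(belief, belief.index)
    elif outcome == 'L':
        # Likelihood of a loss at each candidate p is 1 - p
        update(belief, 1 - belief.index)
    else:
        raise ValueError(f"outcome must be 'W' or 'L', got {outcome!r}")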
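
To see what one update does, here is a small standalone sketch of the same grid update applied to a made-up sequence of outcomes (one win, then two losses). The outcome sequence and the variable names are assumptions chosen for illustration, not part of the demo. With a uniform prior, the exact posterior is Beta(wins + 1, losses + 1), so the grid mean should land close to (wins + 1) / (plays + 2).

```python
import numpy as np
import pandas as pd

# Same grid and uniform prior the notebook uses: p = 0.00, 0.01, ..., 1.00
grid = np.arange(101) / 100
belief = pd.DataFrame({'probs': np.full(101, 1 / 101)}, index=grid)

# Hypothetical outcome sequence, chosen only for illustration
outcomes = ['W', 'L', 'L']

for outcome in outcomes:
    # Likelihood of this outcome at each candidate p
    likelihood = grid if outcome == 'W' else 1 - grid
    # Multiply prior by likelihood, then renormalize so the probabilities sum to 1
    belief['probs'] = belief['probs'] * likelihood
    belief['probs'] = belief['probs'] / belief['probs'].sum()

# Posterior mean on the grid
grid_mean = float(np.sum(grid * belief['probs'].to_numpy()))

# Exact posterior mean under a uniform prior: (wins + 1) / (plays + 2)
wins = outcomes.count('W')
exact_mean = (wins + 1) / (len(outcomes) + 2)

print(f"grid mean = {grid_mean:.3f}, exact Beta mean = {exact_mean:.3f}")
```

The two numbers should agree closely; the small gap comes from approximating a continuous distribution with 101 grid points.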

## Initialize the Bandits

We have 4 slot machines, each with an unknown probability of winning.

In [None]:
# Create uniform prior
p_prior = pd.DataFrame(index=np.arange(101)/100)
p_prior['probs'] = 1/101

# Initialize beliefs for 4 machines
beliefs = [p_prior.copy() for i in range(4)]
play_history = []

# True probabilities (hidden from us!)
actual_probs = [0.10, 0.20, 0.30, 0.40]

print("✅ 4 machines initialized with uniform priors")
print("🎰 Ready to play!")

## Visualization Functions

In [None]:
def show_beliefs():
    '''Display current beliefs for all 4 machines'''
    fig, axes = plt.subplots(1, 4, figsize=(14, 3))
    for i, pmf in enumerate(beliefs):
        pmf.plot(ax=axes[i], lw=3, legend=False, color='blue')
        mean_p = np.sum(pmf.index * pmf['probs'])
        axes[i].set_title(f'Machine {i} (mean={mean_p:.2f})', size=12)
        axes[i].set_xlabel('Win Probability (p)', size=10)
        axes[i].set_ylabel('Probability', size=10)
    plt.tight_layout()
    plt.show()

def play_machine(machine_num):
    '''Play a specific machine and update beliefs'''
    # Play the machine
    p = actual_probs[machine_num]
    outcome = 'W' if np.random.random() < p else 'L'
    
    # Update belief
    update_bandit(beliefs[machine_num], outcome)
    
    # Record history
    play_history.append((machine_num, outcome))
    
    # Show result
    print(f"\n{'='*60}")
    print(f"🎰 Played Machine {machine_num}: {outcome}!")
    wins = len([h for h in play_history if h[0]==machine_num and h[1]=='W'])
    plays = len([h for h in play_history if h[0]==machine_num])
    print(f"   History: {plays} plays, {wins} wins ({wins/plays*100:.1f}%)")
    print(f"{'='*60}\n")
    
    # Show updated beliefs
    show_beliefs()

def show_summary():
    '''Show summary of all plays'''
    print("\n" + "="*60)
    print("SUMMARY OF ALL PLAYS")
    print("="*60)
    for i in range(4):
        plays = len([h for h in play_history if h[0]==i])
        wins = len([h for h in play_history if h[0]==i and h[1]=='W'])
        win_rate = wins/plays*100 if plays > 0 else 0
        print(f"Machine {i}: {plays:2d} plays, {wins:2d} wins ({win_rate:5.1f}%) | True p = {actual_probs[i]:.2f}")
    print("="*60 + "\n")

## Current State - Before We Start

All machines start with uniform beliefs (we know nothing about them).

In [None]:
show_beliefs()

---

## 🎮 LET'S PLAY!

### Instructions:
1. Students call out a machine number (0, 1, 2, or 3)
2. Run: `play_machine(X)` where X is the chosen machine
3. Watch the beliefs update!
4. Repeat 5-10 times

**Try to maximize your winnings!**

---

In [None]:
# Play a machine - change the number based on student suggestions!
play_machine(0)  # <-- Change this number (0-3)

In [None]:
# Play another machine
play_machine(1)  # <-- Change this number (0-3)

In [None]:
# Keep playing...
play_machine(2)  # <-- Change this number (0-3)

In [None]:
# And more...
play_machine(3)  # <-- Change this number (0-3)

In [None]:
# Continue as needed
play_machine(0)  # <-- Change this number (0-3)

## Summary of Our Strategy

In [None]:
show_summary()

---

## Discussion Questions

1. **What strategy did we use?**
   - Did we explore all machines?
   - Did we favor machines that seemed better?
   - Did we ever check "worse" machines again?

2. **Could we do better?**
   - Is there an optimal strategy?
   - How do we balance exploration vs exploitation?

3. **Thompson Sampling** (return to slides)
   - Automatically balances exploration and exploitation
   - Uses posterior distributions to guide decisions
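
As a preview of the idea in question 3, here is a minimal sketch of Thompson sampling on the same grid posteriors, assuming the rule "draw one candidate p from each machine's posterior, then play the machine with the largest draw". The names `thompson_choose`, `ts_beliefs`, `ts_actual_probs`, and `ts_counts` are hypothetical, and the sketch keeps its own state so it does not disturb the class demo. The version in the slides may differ in detail.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # seeded so reruns are repeatable

# Same grid as the notebook: p = 0.00, 0.01, ..., 1.00
grid = np.arange(101) / 100

# Separate state for this sketch: one uniform prior per machine
ts_beliefs = [pd.DataFrame({'probs': np.full(101, 1 / 101)}, index=grid)
              for _ in range(4)]
ts_actual_probs = [0.10, 0.20, 0.30, 0.40]  # same hidden values as the demo
ts_counts = [0, 0, 0, 0]                    # plays per machine

def thompson_choose(beliefs, rng):
    '''Draw one p per machine from its posterior and pick the largest draw'''
    draws = [rng.choice(b.index.to_numpy(), p=b['probs'].to_numpy())
             for b in beliefs]
    return int(np.argmax(draws))

for _ in range(200):
    m = thompson_choose(ts_beliefs, rng)
    won = rng.random() < ts_actual_probs[m]

    # Same grid update as update_bandit: likelihood is p for a win, 1 - p for a loss
    likelihood = grid if won else 1 - grid
    probs = ts_beliefs[m]['probs'] * likelihood
    ts_beliefs[m]['probs'] = probs / probs.sum()
    ts_counts[m] += 1

print("Plays per machine:", ts_counts)
```

Machines with wide posteriors still produce a large draw now and then, so they keep getting tried (exploration). As a posterior narrows around a low value, that machine wins the comparison less often (exploitation). No separate exploration parameter is needed.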

---

## Reset (if needed)

Run this cell to reset and start over:

In [None]:
# Reset everything
beliefs = [p_prior.copy() for i in range(4)]
play_history = []
print("✅ Reset complete! Ready to play again.")
show_beliefs()