UPPER CONFIDENCE BOUND

BENEFITS

- Try each machine once at the beginning
- After that, pick the machine with the best average score plus an exploration bonus
- If two machines seem close, give more chances to the one that has been tested less
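
The rules above can be sketched as a single score per machine: the average reward plus a bonus that shrinks as the machine is played more. This is a minimal sketch, assuming the same bonus term used in the loop further down; the helper name `ucb_score` and the example numbers are hypothetical and only for illustration.

```python
import math

def ucb_score(total_reward, plays, t):
    """Average reward plus an exploration bonus (hypothetical helper)."""
    average = total_reward / plays
    bonus = math.sqrt(2 * math.log(t) / plays)
    return average + bonus

# Two machines with the same average (0.5) at round t = 10:
# one was played only 2 times, the other 8 times (made-up numbers).
less_tested = ucb_score(total_reward=1, plays=2, t=10)
well_tested = ucb_score(total_reward=4, plays=8, t=10)

print(f"less tested: {less_tested:.3f}")
print(f"well tested: {well_tested:.3f}")
```

Because the averages are equal, the difference between the two scores comes entirely from the bonus, which is why the machine with fewer plays gets the next chance.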

PROBLEM
- If one machine was lucky in the first few rounds, UCB might keep picking it, even if another machine is actually better
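
To make the problem concrete, here is a small sketch of a lucky start, assuming the same UCB formula as in the loop below. The state (`count`, `rewards`) is invented for illustration: machine 1 won both of its plays, while the other two machines each lost their single play.

```python
import math

def ucb_values(rewards, count, t):
    """UCB value for every machine (hypothetical helper for this example)."""
    return [
        rewards[i] / count[i] + math.sqrt(2 * math.log(t) / count[i])
        for i in range(len(count))
    ]

# Made-up state after 4 plays: machine 1 (index 0) got lucky twice,
# machines 2 and 3 were each played once and lost.
count = [2, 1, 1]
rewards = [2, 0, 0]

values = ucb_values(rewards, count, t=5)
arm = values.index(max(values))

print([round(v, 3) for v in values])
print("Next machine to play:", arm + 1)
```

In this state the lucky machine still has the highest value, so it is picked again. The exploration bonus of the other machines keeps growing with `t`, so they are revisited eventually, but with only a handful of trials that may not happen in time.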

In [33]:
import math
import random

In [34]:
n_machines = 3
n_trials = 10

- count[i] stores how many times machine i has been played
- rewards[i] stores the total reward received from machine i
- initially, all values are 0 because we have not played yet

In [35]:
count = [0] * n_machines
rewards = [0] * n_machines

In [36]:
true_rewards = [0.2, 0.5, 0.8]  # True probabilities of success for each machine

In [37]:
# This loop runs n_trials times (10 here).
# Each time, the agent chooses a slot machine to play based on the UCB algorithm.
for t in range(1, n_trials + 1):
    # If any machine has not been played yet (a 0 in count), choose it first.
    # This ensures each machine is played at least once before the UCB formula
    # is applied (and avoids dividing by zero). This is the initial exploration.
    if 0 in count:
        arm = count.index(0)
    else:
        # Calculate UCB for each machine:
        # average reward + exploration bonus
        ucb_values = [
            rewards[i] / count[i] + math.sqrt(2 * math.log(t) / count[i])
            for i in range(n_machines)
        ]
        arm = ucb_values.index(max(ucb_values))  # pick the machine with the highest UCB value

    # Simulate the play: win (1) with the machine's true probability, else lose (0)
    reward = 1 if random.random() < true_rewards[arm] else 0
    count[arm] += 1
    rewards[arm] += reward

In [38]:
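# Show results
win_rates = [rewards[i] / count[i] for i in range(n_machines)]
for i in range(n_machines):
    print(f"Machine {i + 1}: Played {count[i]} ;Estimated win rate: {win_rates[i]:.2f}")

# Choose the best machine by estimated win rate, not by total reward,
# so a machine is not favoured just because it was played more often
print("Best Machine to play:", win_rates.index(max(win_rates)) + 1)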


Machine 1: Played 4 ;Estimated win rate: 0.25
Machine 2: Played 2 ;Estimated win rate: 0.00
Machine 3: Played 4 ;Estimated win rate: 0.50
Best Machine to play: 3
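
With only 10 trials the estimated win rates above are still far from the true probabilities (for example 0.50 estimated against 0.8 true for machine 3). One way to explore this is to wrap the loop in a function and rerun it with more trials. The sketch below assumes a hypothetical helper `run_ucb` with a `seed` parameter for repeatable runs; neither is part of the cells above.

```python
import math
import random

def run_ucb(true_rewards, n_trials, seed=None):
    """Run the UCB loop and return (count, rewards). Hypothetical helper."""
    rng = random.Random(seed)  # local generator so a seed gives repeatable runs
    n_machines = len(true_rewards)
    count = [0] * n_machines
    rewards = [0] * n_machines

    for t in range(1, n_trials + 1):
        if 0 in count:
            # Play every machine once before using the UCB formula
            arm = count.index(0)
        else:
            ucb_values = [
                rewards[i] / count[i] + math.sqrt(2 * math.log(t) / count[i])
                for i in range(n_machines)
            ]
            arm = ucb_values.index(max(ucb_values))

        reward = 1 if rng.random() < true_rewards[arm] else 0
        count[arm] += 1
        rewards[arm] += reward

    return count, rewards

# Same machines as above, but 1000 trials instead of 10
count, rewards = run_ucb([0.2, 0.5, 0.8], n_trials=1000, seed=0)
for i in range(len(count)):
    print(f"Machine {i + 1}: Played {count[i]} ;Estimated win rate: {rewards[i] / count[i]:.2f}")
```

Changing `n_trials` or `seed` shows how much the play counts and estimates vary between short and long runs.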
