# Upper Confidence Bound (UCB)
We will be using UCB to solve the Multi-Armed Bandit Problem.

The confidence bound is an interval around a machine's current (observed) return value that is expected to contain the machine's true expected return value.
- For each iteration (round), the UCB algorithm picks the machine with the greatest upper confidence bound.
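
The selection rule can be sketched as follows. This is a minimal illustration that assumes the common UCB1 form of the bound (mean reward plus `sqrt(2 * ln(round) / plays)`); the text above does not fix a formula, so the constant and the function names `upper_confidence_bound` and `pick_machine` are assumptions for illustration.

```python
import math

def upper_confidence_bound(mean_reward, times_played, round_number):
    """Mean reward plus an exploration radius (assumed UCB1 form)."""
    if times_played == 0:
        return math.inf  # an unplayed machine is always worth trying
    radius = math.sqrt(2.0 * math.log(round_number) / times_played)
    return mean_reward + radius

def pick_machine(mean_rewards, play_counts, round_number):
    """Return the index of the machine with the greatest upper confidence bound."""
    bounds = [
        upper_confidence_bound(m, n, round_number)
        for m, n in zip(mean_rewards, play_counts)
    ]
    return max(range(len(bounds)), key=bounds.__getitem__)

# Hypothetical state of three machines at round 10.
choice = pick_machine([0.50, 0.60, 0.40], [5, 4, 1], round_number=10)
print(choice)  # 2: the least-played machine has the widest bound
```

Note that the third machine is picked even though it has the lowest mean, because it has been played only once and its bound is still wide.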

### How Upper Confidence Bound Changes
If the player wins a round, then the confidence bound and current return value of the machine shift up.  
If the player loses a round, then the confidence bound and current return value of the machine shift down.
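
One way to see the shift is to treat the current return value as a running average of the rewards observed so far. The sketch below assumes rewards of 1 (win) and 0 (loss); `update_mean` is a hypothetical helper, not something defined elsewhere in this notebook.

```python
def update_mean(current_mean, times_played, reward):
    """Return the new running mean and play count after one more reward."""
    new_count = times_played + 1
    new_mean = current_mean + (reward - current_mean) / new_count
    return new_mean, new_count

# Hypothetical state: 2 wins in 4 plays, so the current return value is 0.5.
mean, count = 0.5, 4
mean_after_win, _ = update_mean(mean, count, reward=1)   # moves up, to about 0.6
mean_after_loss, _ = update_mean(mean, count, reward=0)  # moves down, to about 0.4
```

Because the confidence bound is centered on this mean, the whole interval moves with it.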

As each round is played, the confidence bound of the machine played shrinks because the program is more confident that the current return value is close to the expected return value.
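
The shrinking can be illustrated numerically. The sketch below again assumes the UCB1-style radius `sqrt(2 * ln(round) / plays)` and holds the round number fixed so that only the play count changes; `exploration_radius` is a hypothetical name.

```python
import math

def exploration_radius(times_played, round_number):
    """Half-width of the confidence bound (assumed UCB1 form)."""
    return math.sqrt(2.0 * math.log(round_number) / times_played)

# The same machine after 1, 10, 100, and 1000 plays, evaluated at round 1000.
radii = [exploration_radius(n, round_number=1000) for n in (1, 10, 100, 1000)]
print([round(r, 2) for r in radii])  # roughly [3.72, 1.18, 0.37, 0.12]
```

Each tenfold increase in plays narrows the bound by a factor of about 3.16 (the square root of 10).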

After enough iterations, the algorithm converges to the optimal machine because that machine's upper confidence bound will almost always be the greatest of all the machines' upper confidence bounds.
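
A small simulation can make the convergence concrete. This is a sketch under stated assumptions: the five win probabilities are made up for illustration, each machine pays 1 or 0, every machine is played once before the bound is used, and the bound takes the UCB1 form `mean + sqrt(2 * ln(round) / plays)`.

```python
import math
import random

def run_ucb(win_probabilities, rounds, seed=0):
    """Simulate UCB on Bernoulli machines and return how often each was played."""
    rng = random.Random(seed)
    n_machines = len(win_probabilities)
    counts = [0] * n_machines
    means = [0.0] * n_machines
    for t in range(1, rounds + 1):
        if t <= n_machines:
            machine = t - 1  # play every machine once to initialize
        else:
            machine = max(
                range(n_machines),
                key=lambda i: means[i] + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1 if rng.random() < win_probabilities[machine] else 0
        counts[machine] += 1
        means[machine] += (reward - means[machine]) / counts[machine]
    return counts

# Hypothetical machines; the last one has the highest expected return.
counts = run_ucb([0.10, 0.20, 0.30, 0.40, 0.70], rounds=5000)
print(counts)
```

The last machine should receive the large majority of the plays, while the others are still sampled occasionally as their bounds slowly widen.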

# UCB Algorithm Visualization
Here are the probability density function (pdf) graphs for each machine.

<img src="images/ucb/distributions_graphs.png" height="75%" width="75%"></img>
- D5 has the best distribution because its curve lies farthest to the right (the highest return values).

Let's plot each distribution's expected return (denoted by the dotted white line) on a diagram.
- The program will NOT know these expected return values, but we show them on the diagram for the sake of visualization.

<img src="images/ucb/distribution_diagram.png" height="75%" width="75%"></img>

Initialize the confidence bounds and current return value for each machine.

<img src="images/ucb/initialize_distributions.png" height="75%" width="75%"></img>

### First Round
The program will pick any machine because all of the upper confidence bounds are equal to each other.
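
Random tie-breaking among equal upper bounds might look like the sketch below. The starting bound of 1.0 is a placeholder, since the figure does not state a numeric value, and `pick_with_random_ties` is a hypothetical helper.

```python
import random

def pick_with_random_ties(upper_bounds, rng=random):
    """Pick uniformly at random among the machines sharing the greatest bound."""
    best = max(upper_bounds)
    candidates = [i for i, bound in enumerate(upper_bounds) if bound == best]
    return rng.choice(candidates)

# Hypothetical equal starting bounds for D1..D5 (indices 0..4).
initial_bounds = [1.0, 1.0, 1.0, 1.0, 1.0]
first_pick = pick_with_random_ties(initial_bounds, random.Random(0))
```

The same helper covers later rounds: once one machine's bound drops, it simply falls out of the candidate list.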

The program picks the D3 machine, and we lose this round.

<img src="images/ucb/round_1.png" height="75%" width="75%"></img>
- The confidence bound and the current return value shift down because we lost this round.
- The confidence bound shrinks a little because we're more confident that the current return value is converging to the expected return value.

### Second Round
The program will pick at random among D1, D2, D4, and D5 because they share the greatest upper confidence bound.

The program picks the D4 machine, and we win this round.

<img src="images/ucb/round_2.png" height="75%" width="75%"></img>
- The confidence bound and the current return value shift up because we won this round.
- The confidence bound shrinks a little because we're more confident that the current return value is converging to the expected return value.

### Third Round
The program will pick at random among D1, D2, and D5 because they share the greatest upper confidence bound.

The program picks the D1 machine, and we win this round.

<img src="images/ucb/round_3.png" height="75%" width="75%"></img>
- The confidence bound and the current return value shift up because we won this round.
- The confidence bound shrinks a little because we're more confident that the current return value is converging to the expected return value.

### After a Few Rounds
After a few more rounds, the program will almost always choose D5 because D5 will almost always have the greatest upper confidence bound. We can confirm this choice is correct because D5's expected return value is the largest.

<img src="images/ucb/few_rounds.png" height="75%" width="75%"></img>

In [1]:
# import the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [3]:
# import the data set
ads_opt_df = pd.read_csv("datasets/ads_ctr_optimization.csv")

"""
Each row represents a user and records whether they clicked each ad (1) or did not (0).

In the real world, we wouldn't have this data set in advance. Instead, the UCB algorithm
would explore and exploit simultaneously to determine the optimal ad while users are clicking,
so the data would be generated as the algorithm runs.

For testing purposes, we're going to use this data set to simulate the "real world" in the UCB algorithm.
"""
ads_opt_df.head()

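|   | Ad 1 | Ad 2 | Ad 3 | Ad 4 | Ad 5 | Ad 6 | Ad 7 | Ad 8 | Ad 9 | Ad 10 |
|---|------|------|------|------|------|------|------|------|------|-------|
| 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |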
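
Simulating the "real world" with this data could look like the sketch below: in each round the algorithm chooses one ad and then looks up only that ad's value in the current row. This is an illustrative sketch, not the notebook's own implementation; it assumes the UCB1-style bound `mean + sqrt(2 * ln(round) / plays)`, and `ucb_on_logged_clicks` is a hypothetical name. The five rows are copied from the `head()` output above.

```python
import math

def ucb_on_logged_clicks(rows):
    """Run UCB over logged 0/1 clicks; rows holds one list of ad clicks per user."""
    n_ads = len(rows[0])
    counts = [0] * n_ads   # times each ad was shown
    clicks = [0] * n_ads   # clicks each ad collected when shown
    selected = []
    for t, row in enumerate(rows, start=1):
        best_ad, best_bound = 0, -1.0
        for ad in range(n_ads):
            if counts[ad] == 0:
                bound = math.inf  # show every ad at least once
            else:
                mean = clicks[ad] / counts[ad]
                bound = mean + math.sqrt(2.0 * math.log(t) / counts[ad])
            if bound > best_bound:
                best_ad, best_bound = ad, bound
        selected.append(best_ad)
        counts[best_ad] += 1
        clicks[best_ad] += row[best_ad]  # only the shown ad's outcome is revealed
    return selected, sum(clicks)

first_rows = [
    [1, 0, 0, 0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
selected, total_clicks = ucb_on_logged_clicks(first_rows)
print(selected, total_clicks)  # [0, 1, 2, 3, 4] 1
```

With only five rows the algorithm is still trying each ad for the first time. For the full data set, one option is to pass `ads_opt_df.values.tolist()`, assuming the DataFrame holds only the ten ad columns.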
