In [1]:
import numpy as np
import plotly.graph_objects as go

In [2]:
class MultiArmedBandit:
    def __init__(self, n_ads, epsilon):
        self.n_ads = n_ads  # Number of ad slots
        self.epsilon = epsilon  # Exploration rate
        self.ads_count = np.zeros(n_ads)  # Counts of each ad selected
        self.ads_rewards = np.zeros(n_ads)  # Total rewards (clicks) for each ad

    def select_ad(self):
        if np.random.rand() < self.epsilon:
            # Explore: Select a random ad
            return np.random.randint(self.n_ads)
        else:
            # Exploit: Select the ad with the highest average reward
            return np.argmax(self.ads_rewards / (self.ads_count + 1e-5))  # Add small value to avoid division by zero

    def update_ad(self, ad_selected, reward):
        self.ads_count[ad_selected] += 1  # Increment the count for the selected ad
        self.ads_rewards[ad_selected] += reward  # Increment the total rewards for the selected ad

    def simulate(self, n_impressions):
        results = np.zeros((self.n_ads, n_impressions))  # Array to track average CTRs
        actual_ctrs = np.array([0.1, 0.12, 0.08, 0.15])  # True CTR per ad slot (hard-coded, so this assumes n_ads == 4)

        for i in range(n_impressions):
            ad_selected = self.select_ad()  # Select an ad following Epsilon-Greedy strategy
            reward = 1 if np.random.rand() < actual_ctrs[ad_selected] else 0  # Simulate a click
            self.update_ad(ad_selected, reward)  # Update the ad's statistics
            results[:, i] = self.ads_rewards / (self.ads_count + 1e-5)  # Update averaged results

        return results

Class Initialization:

The MultiArmedBandit class stores the number of ad slots (n_ads) and the exploration rate (epsilon).
It keeps two per-slot arrays: how many times each ad was selected (ads_count) and the cumulative rewards (clicks) each one earned (ads_rewards).

Ad Selection (select_ad method):

With probability epsilon the method explores (selects a random ad); otherwise it exploits (selects the ad with the highest estimated CTR).
A small constant (1e-5) is added to the denominator to avoid division by zero, so an ad that has never been shown gets an estimate of 0.
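The division-by-zero guard can also be written without a smoothing constant. The sketch below is one alternative, not the notebook's method: it uses `np.divide` with the `where` argument so that ads with zero impressions keep an estimate of exactly 0. The function name `estimated_ctrs` and the sample numbers are hypothetical.

```python
import numpy as np

def estimated_ctrs(rewards, counts):
    """Per-ad average reward; ads that were never shown get 0.0."""
    rewards = np.asarray(rewards, dtype=float)
    counts = np.asarray(counts, dtype=float)
    out = np.zeros_like(rewards)  # default estimate for unseen ads
    # Divide only where counts > 0; other entries keep the 0.0 default.
    np.divide(rewards, counts, out=out, where=counts > 0)
    return out

# Hypothetical totals: ad 1 has never been shown.
est = estimated_ctrs([3, 0, 1, 0], [20, 0, 10, 5])
best_ad = int(np.argmax(est))
```

Unlike the `+ 1e-5` version, the averages here are exact (3 clicks in 20 impressions gives 0.15 rather than a value slightly below it), at the cost of a slightly longer expression.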

Updating Ad Performance (update_ad method):

Increments the selection count for the chosen ad and adds the reward (1 for a click, 0 otherwise) to its cumulative total.
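Storing a total and a count, as update_ad does, is equivalent to keeping a running mean that is updated incrementally. The sketch below checks that equivalence on a hypothetical click stream for a single ad. The incremental form is a common alternative and is not what the class above uses.

```python
import numpy as np

rng = np.random.default_rng(0)
rewards = rng.integers(0, 2, size=50)  # hypothetical 0/1 click stream for one ad

total, count = 0.0, 0  # what update_ad stores per ad
q = 0.0                # incremental estimate of the average reward

for r in rewards:
    count += 1
    total += r
    q += (r - q) / count  # running-mean update: Q <- Q + (r - Q) / n

# Both bookkeeping styles give the same average.
assert abs(q - total / count) < 1e-12
```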

Simulation (simulate method):

Simulates a sequence of ad impressions (1,000 in this case) and records every slot's estimated CTR after each impression.
The actual click-through rates are hard-coded for each ad slot (0.10, 0.12, 0.08, 0.15), and each click is drawn at random with the selected slot's rate as its probability.
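The click draw inside simulate is a Bernoulli trial: a uniform random number is compared against the slot's CTR. The sketch below isolates that step, assuming a seeded `np.random.default_rng` generator for repeatability (the notebook itself uses the unseeded `np.random.rand`). The helper name `simulate_clicks` is hypothetical.

```python
import numpy as np

def simulate_clicks(ctr, n_impressions, seed=0):
    """Draw n_impressions Bernoulli(ctr) outcomes: 1 = click, 0 = no click."""
    rng = np.random.default_rng(seed)  # seeded so the run is repeatable
    return (rng.random(n_impressions) < ctr).astype(int)

clicks = simulate_clicks(ctr=0.15, n_impressions=100_000)
observed_ctr = clicks.mean()  # approaches 0.15 as n_impressions grows
```

With only a handful of impressions the observed rate can be far from the true one, which matters when reading the early part of the plot.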

Epsilon-Greedy Exploration and Exploitation Strategy
The core mechanism of the algorithm is the Epsilon-Greedy strategy, which strikes a balance between exploration and exploitation:

Exploration: With probability \( \epsilon \) (0.1 here), the algorithm selects an ad slot uniformly at random. On average, 10% of impressions are therefore spent on a random slot regardless of current performance (the random draw can also land on the current best slot). This exploration keeps refining the estimates of rarely shown slots, which allows the model to adapt if a previously underperforming ad begins to show better performance.

Exploitation: When the algorithm chooses to exploit, it selects the ad with the highest average reward based on past performance. However, the addition of the exploration component ensures it doesn't solely rely on current trends.
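The two branches combine into a simple per-impression selection probability: every slot gets epsilon / n_ads from exploration, and the current best slot gets an additional 1 - epsilon from exploitation. A minimal sketch of that arithmetic follows. The function name and the choice of index 3 as the current best slot are illustrative assumptions.

```python
def selection_probabilities(n_ads, epsilon, best_index):
    """Chance that each ad is shown on one impression under epsilon-greedy."""
    probs = [epsilon / n_ads] * n_ads  # exploration: uniform over all ads
    probs[best_index] += 1 - epsilon   # exploitation: all mass on the current best
    return probs

# Notebook parameters: 4 slots, epsilon = 0.1; index 3 assumed to be the current best.
probs = selection_probabilities(n_ads=4, epsilon=0.1, best_index=3)
```

With these parameters the current best slot is shown with probability 0.925 and each of the other three with probability 0.025, so a non-best slot collects only about 25 impressions per 1,000 once the ranking has settled.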

In [3]:
# Parameters
n_ads = 4  # Number of ad slots (top banner, sidebar, footer, pop-up)
epsilon = 0.1  # Exploration rate: 10% of the time we explore
n_impressions = 1000  # Total ad impressions

In [5]:
# Initialize the Multi-Armed Bandit
bandit = MultiArmedBandit(n_ads, epsilon)

# Run the simulation
results = bandit.simulate(n_impressions)

# Visualize the results using Plotly
fig = go.Figure()

# Define names for each ad slot
ad_slot_names = ["Top Banner", "Sidebar", "Footer", "Pop-Up"]

# Add traces for each ad slot with their names
for i in range(n_ads):
    fig.add_trace(go.Scatter(
        x=np.arange(n_impressions),
        y=results[i],
        mode='lines',
        name=ad_slot_names[i],  # Use the ad slot name
        line=dict(width=2)
    ))

# Update layout of the figure
fig.update_layout(
    title='Ad Slot Performance Over Time',
    xaxis_title='Impressions',
    yaxis_title='Average CTR',
    legend_title='Ad Slots',
    hovermode="x unified",
    template='plotly_dark'
)

# Show the figure
fig.show()

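Ad Slot 1 exhibits a CTR estimate of nearly 1.0 at the very beginning before stabilizing. This is most likely a small-sample artifact rather than a sign of better initial performance: while a slot has only one or two impressions, a single click pushes its running average close to 1.0, and the estimate falls toward the slot's actual rate as impressions accumulate.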
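A value near 1.0 early in the curve is exactly what the running average produces after a single early click. The sketch below reproduces the notebook's estimate formula for one slot. The helper `running_ctr` and the click counts are hypothetical, chosen only to show the effect.

```python
EPS = 1e-5  # same smoothing constant as in the class above

def running_ctr(clicks, impressions):
    """Average reward as computed in simulate()."""
    return clicks / (impressions + EPS)

# One click on the very first impression: the estimate is almost 1.0.
after_first_click = running_ctr(clicks=1, impressions=1)

# Nine further impressions without a click pull it down to about 0.1.
after_ten = running_ctr(clicks=1, impressions=10)
```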

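The ad positions stabilize at different CTR levels:

    Pop-up: Stabilizes at the highest CTR, indicating it is the most effective position overall. This matches its configured rate (0.15), the highest of the four.
    Footer: Shows the second-best estimated CTR in this run. Its configured rate (0.08) is the lowest of the four, so this ranking likely reflects sampling noise from the small number of impressions it received.
    Top Banner and Sidebar: Both perform similarly and exhibit lower estimated CTRs compared to Pop-up and Footer.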

The algorithm used in the simulation (epsilon-greedy) balances exploration (trying all ad slots) and exploitation (focusing on better-performing slots). Over time, it identifies and prioritizes Pop-up, the slot with the highest configured CTR. The estimates for the other slots remain noisier because they receive far fewer impressions.

This analysis helps optimize ad placement by identifying which ad slots perform best. By focusing more on the higher-performing slots, it is possible to increase overall CTR and improve ad revenue.
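To see how much traffic each slot actually receives, the selection counts can be turned into impression shares alongside the overall CTR. The sketch below is a compact, self-contained re-implementation of the same epsilon-greedy loop with a fixed seed, written as an assumption-laden illustration rather than the notebook's own code. `run_epsilon_greedy` and its `seed` parameter are hypothetical names.

```python
import numpy as np

def run_epsilon_greedy(true_ctrs, epsilon, n_impressions, seed=0):
    """Return per-ad impression counts and click totals for one simulated run."""
    rng = np.random.default_rng(seed)
    true_ctrs = np.asarray(true_ctrs, dtype=float)
    n_ads = len(true_ctrs)
    counts = np.zeros(n_ads)
    clicks = np.zeros(n_ads)
    for _ in range(n_impressions):
        if rng.random() < epsilon:
            ad = int(rng.integers(n_ads))                   # explore
        else:
            ad = int(np.argmax(clicks / (counts + 1e-5)))   # exploit
        counts[ad] += 1
        clicks[ad] += float(rng.random() < true_ctrs[ad])   # Bernoulli click
    return counts, clicks

counts, clicks = run_epsilon_greedy([0.1, 0.12, 0.08, 0.15], epsilon=0.1, n_impressions=1000)
share = counts / counts.sum()                # fraction of impressions per slot
overall_ctr = clicks.sum() / counts.sum()    # clicks per impression across all slots

for name, s in zip(["Top Banner", "Sidebar", "Footer", "Pop-Up"], share):
    print(f"{name}: {s:.1%} of impressions")
print(f"Overall CTR: {overall_ctr:.3f}")
```

Comparing `overall_ctr` with the best slot's configured rate gives a rough sense of how much was spent on exploration and on early misjudgments.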