# sim-1

First simulation. Something basic to get started.

This is about programmatic display ads.
The main focus is on variable ad impact. The impact of each ad will vary.
If there is enough impact, there is a chance of a purchase.
We won't start with variations in activity by hour of day or by user,
and assume that each user has an equal chance of receiving each ad.

## Notes about Programmatic Display simulations

- There are about 200M adults in the US between 18 and 65.
We assume that all ads are going to this group and that almost all
of this group is active online.

- We'll use a sample of the population for simulations. For example,
a 1% sample would be about 2M people which would be similar to a 
large market area.

- We'll use an "addressable Market" % in some simulations to 
reduce the sample size further. We'll assume that ads get proportionally 
split between the addressable market and the rest of the population but 
purchases will only be made by users in the addressable market.

- Ad budgets with sonstant CPM and limited targeting are often between
\\$1 and \\$5 CPM with total budgets from \\$10K to \\$1M or more per month. 
This roughly translates to between 0.01 and 2.0 ad per user in a month,
or 0.0025 to 0.5 ad per user per week.
If simulations have a fixed budget and CPM, we can specify the number of 
ad impressions as a proportion of the total number of users.



In [1]:
# Constants for detrmine the number of users

US_POPULATION = 200_000_000

MARKET_PROPORTION = 0.01
MARKET_SIZE = round(US_POPULATION * MARKET_PROPORTION)

ADDRESSABLE_PROPORTION = 0.1
ADDRESSABLE_SIZE = round(MARKET_SIZE * ADDRESSABLE_PROPORTION)

SIMULATION_USER_COUNT = ADDRESSABLE_SIZE

print(SIMULATION_USER_COUNT)

200000


### Ad impact

In this simulation, we'll have the "impact" of each ad vary randomly 
with a mean impact of 0.2 and all impact values between 0 and 1. 
Ad impact will decay over time. When the total impact for a user reaches 
a level specific to the user, we'll say that the user makes a purchase.


In [2]:
from scipy.stats import beta
import numpy as np


In [3]:
HOURS_IN_DAY = 24
DAYS_IN_WEEK = 24
HOURS_IN_WEEK = DAYS_IN_WEEK * HOURS_IN_DAY

DAILY_IMPACT_DECAY = 0.9
HOURLY_IMPACT_DECAY = DAILY_IMPACT_DECAY ** (1.0 / HOURS_IN_DAY)

def week_hour_diff(first_week, first_hour, second_week, second_hour):
    """
    Find the number of hours between two week-hour tuples.
    Assume that the first week-hour is not after the second week-hour.
    """
    return (second_week - first_week) * HOURS_IN_WEEK \
        + (second_hour - first_hour)

class AdViewer:
    """
    Object representing a person who views ads.
    """
    
    def __init__(self, purchase_impact_thrshold):
        self.purchase_impact_thrshold = purchase_impact_thrshold
        self.last_ad_week_hour = None
        self.ad_view_count = 0
        self.ad_impact = 0
        self.last_purchase_week_hour = None
        self.purchase_count = 0
        
    def view_ad(self, week, hour, impact):
        if self.last_ad_week_hour is None:
            self.ad_impact = impact
        else:
            last_week, last_hour = self.last_ad_week_hour 
            hours_since_last = week_hour_diff(last_week, last_hour, 
                                              week, hour)
            self.ad_impact = impact \
                + self.ad_impact * HOURLY_IMPACT_DECAY ** hours_since_last

        self.last_ad_week_hour = week, hour
        self.ad_view_count += 1

        if (self.ad_impact >= self.purchase_impact_thrshold
                and self.purchase_count == 0):
            self.last_purchase_week_hour = week, hour
            self.purchase_count += 1

In [4]:
# We'll create beta distributions for the simulation using the mean
# and variance ratio. We need this function to convert to the parameters
# of the beta distribution.

def beta_params_from_mean_and_variance_ratio(mean, variance_ratio):
    """
    mean and variance_ratio between 0 and 1.
    variance = variance_ratio * mean * (1 - mean)
    
    Returns the alpha and beta paramters of the specified beta distribution.
    """
    if variance_ratio in (0, 1) or mean in (0, 1):
        return 1, 1  # Should we throw an exception if mean is 0 or 1?
    nu = (1.0 - variance_ratio) / variance_ratio
    a = nu * mean
    b = nu * (1.0 - mean)
    return a, b


In [5]:
# settings for the distributions used in the simulation
ad_impact_mean, ad_impact_var_ratio = 0.1, 0.05
ad_impact_threshold_mean, ad_impact_threshold_var_ratio = 0.8, 0.1


In [6]:
# Helper functions to get samples from the distributions.
# (We can easily change the distributions by changing these functions.)

def get_ad_impact_thresholds(user_count):
    """
    Get a list of ad impacts sampled from the appropriate
    distribution for use in a simulation.
    """
    a, b = beta_params_from_mean_and_variance_ratio(
        ad_impact_threshold_mean, ad_impact_threshold_var_ratio
    )
    return beta.rvs(a, b, size=user_count)

def get_ad_impacts(ad_count):
    """
    Get a list of ad impacts sampled from the appropriate
    distribution for use in a simulation.
    """
    a, b = beta_params_from_mean_and_variance_ratio(
        ad_impact_mean, ad_impact_var_ratio
    )
    return beta.rvs(a, b, size=ad_count)

def get_ad_viewers(user_count):
    """
    Get a list of AdViewer objects for use in a simulation.
    """
    ad_impact_thresholds = get_ad_impact_thresholds(user_count)
    return [AdViewer(t) for t in ad_impact_thresholds]


In [7]:
# Find the weekly ad counts using the ratio of ad coutn to user count.

WEEKLY_AD_TO_USER_RATIO = 0.05
WEEKLY_AD_COUNT = round(SIMULATION_USER_COUNT * WEEKLY_AD_TO_USER_RATIO)


In [8]:
# try a few weeks

SIM_WEEK_COUNT = 30

ad_viewers = get_ad_viewers(SIMULATION_USER_COUNT)

# These won't change from week to week
weekly_ad_hours = [
    round(i * HOURS_IN_WEEK / WEEKLY_AD_COUNT)
    for i in range(WEEKLY_AD_COUNT)
]

for sim_week in range(1, SIM_WEEK_COUNT + 1):
    weekly_ad_viewer_indices = np.random.randint(
        low=0, high=SIMULATION_USER_COUNT, size=WEEKLY_AD_COUNT
    )
    weekly_ad_impacts = get_ad_impacts(WEEKLY_AD_COUNT)

    for viewer_index, ad_impact, ad_hour in zip(
        weekly_ad_viewer_indices, weekly_ad_impacts, weekly_ad_hours):
        ad_viewers[viewer_index].view_ad(sim_week, ad_hour, ad_impact)

    purchaser_count = len([v for v in ad_viewers if v.purchase_count > 0])

    print(sim_week, purchaser_count, max([v.ad_view_count for v in ad_viewers]),
         max([v.ad_impact for v in ad_viewers]))

1 0 4 0.4602065805213626
2 1 4 0.486582859382134
3 1 4 0.5049113011817962
4 1 4 0.5049113011817962
5 1 4 0.5376460976467836
6 2 4 0.5376460976467836
7 2 5 0.5376460976467836
8 2 6 0.5376460976467836
9 3 6 0.5376460976467836
10 4 6 0.6214294782195965
11 4 6 0.6214294782195965
12 5 6 0.6214294782195965
13 6 7 0.6214294782195965
14 6 7 0.6214294782195965
15 6 7 0.5376460976467836
16 7 7 0.5376460976467836
17 8 7 0.5376460976467836
18 8 7 0.5376460976467836
19 9 7 0.5376460976467836
20 9 8 0.5551984008010239
21 10 8 0.5551984008010239
22 10 8 0.5551984008010239
23 10 9 0.5551984008010239
24 10 9 0.5664760560851883
25 10 9 0.5733882906142266
26 10 9 0.5733882906142266
27 10 9 0.5733882906142266
28 12 9 0.5733882906142266
29 14 9 0.5733882906142266
30 14 10 0.656420680523817
