# sim-2

Second simulation. This version moves the simulation code from the previous file into a class.

This simulation is about programmatic display ads.
The main focus is on variable ad impact: the impact of each ad will vary.
If a user accumulates enough impact, there is a chance of a purchase.
We won't start with variations in activity by hour of day or by user.
Instead, we assume that each user has an equal chance of receiving each ad.

In this simulation, we'll have the "impact" of each ad vary randomly.
For example, the mean impact of an ad could be set to 0.2.
All ads will have impact values between 0 and 1. 
Ad impact will decay over time. When the total impact for a user reaches 
a level specific to the user, we'll say that the user makes a purchase.
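
The decay-and-threshold rule above can be sketched for a single user as follows. This is only an illustration, not the simulation code: the function name `accumulate_impact`, the choice of hours as the time unit, and the example numbers (a daily decay factor of 0.9 and a threshold of 0.8) are assumptions made for the sketch.

```python
def accumulate_impact(ad_events, hourly_decay, threshold):
    """
    ad_events: list of (hour, impact) pairs sorted by hour.
    Returns the hour of the first ad that brings the decayed total
    impact up to the threshold, or None if that never happens.
    """
    total = 0.0
    last_hour = None
    for hour, impact in ad_events:
        if last_hour is not None:
            # Earlier impact fades by a constant factor per elapsed hour.
            total *= hourly_decay ** (hour - last_hour)
        total += impact
        last_hour = hour
        if total >= threshold:
            return hour
    return None


# Three ads of impact 0.3, one per day (illustrative values).
events = [(0, 0.3), (24, 0.3), (48, 0.3)]
hourly_decay = 0.9 ** (1 / 24)  # keeps 90% of the impact after 24 hours
purchase_hour = accumulate_impact(events, hourly_decay, threshold=0.8)
no_purchase = accumulate_impact(events, hourly_decay, threshold=0.9)
```

With these numbers the running totals are roughly 0.3, 0.57, and 0.813, so the third ad crosses the 0.8 threshold, while a threshold of 0.9 is never reached.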

My plan is to make the impact of ads depend on attributes of the 
user, publisher, and advertiser in later simulations.

## Notes about Programmatic Display simulations

- There are about 200M adults in the US between 18 and 65.
We assume that all ads are going to this group and that almost all
of this group is active online.

- We'll use a sample of the population for simulations. For example,
a 1% sample would be about 2M people which would be similar to a 
large market area.

- We'll use an "addressable market" % in some simulations to 
reduce the sample size further. We'll assume that ads get proportionally 
split between the addressable market and the rest of the population but 
purchases will only be made by users in the addressable market.

- Ad budgets with constant CPM (cost per thousand ads) 
and limited targeting are often between
\\$1 and \\$5 CPM with total budgets from \\$10K to \\$1M or more per month. 
This roughly translates to between 0.01 and 2.0 ads per user in a month,
or 0.0025 to 0.5 ads per user per week.
If simulations have a fixed budget and CPM, we can specify the number of 
ad impressions as a proportion of the total number of users.

- A typical conversion rate (the number of purchases divided by the number
of ads shown) for programmatic display advertising is
around 0.00001 to 0.0001. This will depend on the value of the goods 
being advertised.  For example, a conversion rate of 0.0001 with a 
CPM of \\$2 gives a cost of \\$20 / purchase. This could be reasonable 
when the average order values are around \\$60 or more. If the average 
order values are much larger, a lower conversion rate or higher CPM
could be okay.
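
The budget and conversion arithmetic in the notes above can be checked with a few lines of Python. The helper names `ads_per_user` and `cost_per_purchase` are made up for this sketch, and the 2.50 CPM in the second example is just an illustrative value inside the quoted range.

```python
def ads_per_user(budget_usd, cpm_usd, n_users):
    """Average number of ad impressions per user bought by a budget."""
    impressions = budget_usd / cpm_usd * 1000  # CPM is the cost per 1000 ads
    return impressions / n_users


def cost_per_purchase(cpm_usd, conversion_rate):
    """Ad spend per purchase; conversion_rate is purchases per ad shown."""
    cost_per_ad = cpm_usd / 1000
    return cost_per_ad / conversion_rate


US_ADULTS = 200_000_000

low_monthly = ads_per_user(10_000, 5.0, US_ADULTS)      # about 0.01
high_monthly = ads_per_user(1_000_000, 2.5, US_ADULTS)  # about 2.0
example_cost = cost_per_purchase(2.0, 0.0001)           # about 20 dollars
```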


In [1]:
from scipy.stats import beta
import numpy as np
import pandas as pd

from typing import Union
from typing import ClassVar


In [2]:
US_POPULATION = 200_000_000

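# NOTE: the value is 60, so one simulation time unit is effectively a
# minute (60 units per hour), despite the "SECONDS" in these names.
SECONDS_IN_HOUR = 60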
HOURS_IN_DAY = 24
DAYS_IN_WEEK = 7

SECONDS_IN_DAY = SECONDS_IN_HOUR * HOURS_IN_DAY
SECONDS_IN_WEEK = SECONDS_IN_DAY * DAYS_IN_WEEK
HOURS_IN_WEEK = DAYS_IN_WEEK * HOURS_IN_DAY


In [3]:
# We'll create beta distributions for the simulation using the mean
# and variance ratio. We need this function to convert to the parameters
# of the beta distribution.

def beta_params_from_mean_and_variance_ratio(
        mean: float, variance_ratio: float
    ) -> tuple[float, float]:
    """
    mean and variance_ratio between 0 and 1.
    variance = variance_ratio * mean * (1 - mean)
    
    Returns the alpha and beta parameters of the specified beta distribution.
    """
    if variance_ratio in (0, 1) or mean in (0, 1):
        return 1, 1  # Should we throw an exception if mean is 0 or 1?
    nu = (1.0 - variance_ratio) / variance_ratio
    a = nu * mean
    b = nu * (1.0 - mean)
    return a, b
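
As a quick sanity check of the conversion, the sketch below recomputes the mean and variance from the resulting parameters using the standard closed-form moments of a beta distribution. The helper `beta_params` repeats the core of the function above so the snippet runs on its own, and the input values are the default ad impact parameters used later in this notebook.

```python
def beta_params(mean, variance_ratio):
    # Same conversion as above, repeated so this snippet is self-contained.
    nu = (1.0 - variance_ratio) / variance_ratio
    return nu * mean, nu * (1.0 - mean)


mean, variance_ratio = 0.1, 0.05
a, b = beta_params(mean, variance_ratio)

# Closed-form moments of Beta(a, b).
dist_mean = a / (a + b)
dist_var = a * b / ((a + b) ** 2 * (a + b + 1))

expected_var = variance_ratio * mean * (1 - mean)
```

Here `a + b` equals `nu`, so the variance reduces to `mean * (1 - mean) / (nu + 1)`, which is `variance_ratio * mean * (1 - mean)`.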


In [4]:
# We'll track time with a single integer timestamp in this simulation, 
# not tuples of week and hour like we did in the first simulation.
class AdViewer:
    """
    Object representing a person who views ads.
    """
    
    def __init__(self, purchase_impact_thrshold: float,
                 hourly_impact_decay: float) -> None:
        self.purchase_impact_thrshold = purchase_impact_thrshold
        self.hourly_impact_decay = hourly_impact_decay
        self.last_ad_time = None
        self.ad_view_count = 0
        self.ad_impact = 0
        self.last_purchase_ad_time = None
        self.last_purchase_ad_view_count = None
        self.purchase_count = 0
        
    def view_ad(self, ad_time: int, impact: float) -> bool:
        """
        Show an ad to the ad viewer. This may result in a purchase.
        Ad time is in simulation time units (SECONDS_IN_HOUR units per
        hour). Only 1 purchase is allowed per viewer.
        Return True if this resulted in a purchase, False otherwise.
        (In a future simulation the logic for purchases may become more
        complex and move out of this method.)
        """
        if self.last_ad_time is None:
            self.ad_impact = impact
        else:
            hours_since_last = (ad_time - self.last_ad_time) / SECONDS_IN_HOUR
            self.ad_impact = impact \
                + self.ad_impact \
                    * self.hourly_impact_decay ** hours_since_last

        self.last_ad_time = ad_time
        self.ad_view_count += 1
        
        new_purchase = (self.ad_impact >= self.purchase_impact_thrshold
                and self.purchase_count == 0)

        if new_purchase:
            self.last_purchase_ad_time = ad_time
            self.last_purchase_ad_view_count = self.ad_view_count
            self.purchase_count += 1
        
        return new_purchase

## Simulation Class Notes

I'd like to have a class to run the simulations and keep the data used for 
reporting. This is partly because ad systems serve ads and report the results 
and partly because it seems a bit more efficient to keep the results in a 
few large lists in the simulation class than in many small lists in the 
viewer classes. But we could keep history in the viewer classes if there is 
a good reason to do so.

I expect that we will use the viewer classes to determine some of the ad 
serving in the future. For example, we might make a list of the viewers 
that are active during each hour of the simulation to determine which 
viewers are available to receive ads and their probabilities of 
receiving ads.


In [5]:
class AdSimulation():
    """
    Class to simulate the impact of an advertising campaign on ad viewers.
    This class logs the results of the ad serving to support reporting.
    """
    
    default_params: ClassVar[dict[str, Union[int, float]]] = {
        'population' : US_POPULATION,
        'market_proportion' : 0.1,
        'addressable_proportion' : 0.1,
        'weekly_ad_to_viewer_ratio' : 0.1,
        'ad_impact_threshold_mean' : 0.8,
        'ad_impact_threshold_var_ratio' : 0.1,
        'ad_impact_mean' : 0.1,
        'ad_impact_var_ratio' : 0.05,
        'daily_impact_decay' : 0.9,
        'n_weeks' : 13,
    }
    
    def __init__(self, params: Union[dict[str, Union[int, float]], None] = None) -> None:
        if params is None:
            self.params = AdSimulation.default_params.copy()
        else:
            self.params = params.copy()
            for param_name, default_value in AdSimulation.default_params.items():
                self.params[param_name] = self.params.get(
                    param_name, default_value
                )
        self.next_week_index = 0
        self.next_imp_id = 0
        self.ad_viewers = None
        self.df_ad_log = None

    def get_ad_impact_thresholds(self, user_count: int) -> 'np.ndarray[np.float64]':
        """
        Get an array of purchase impact thresholds (one per viewer)
        sampled from the appropriate distribution for use in a simulation.
        """
        a, b = beta_params_from_mean_and_variance_ratio(
            self.params['ad_impact_threshold_mean'],
            self.params['ad_impact_threshold_var_ratio']
        )
        return beta.rvs(a, b, size=user_count)

    def get_ad_viewers(self, user_count: int) -> 'list[AdViewer]':
        """
        Get a list of AdViewer objects for use in a simulation.
        """
        hourly_impact_decay = self.params['daily_impact_decay'] ** (1.0 / 24)
        ad_impact_thresholds = self.get_ad_impact_thresholds(user_count)
        return [
            AdViewer(t, hourly_impact_decay) for t in ad_impact_thresholds
        ]

    def get_ad_impacts(self, ad_count: int) -> 'np.ndarray[np.float64]':
        """
        Get an array of ad impacts sampled from the appropriate
        distribution for use in a simulation.
        """
        a, b = beta_params_from_mean_and_variance_ratio(
            self.params['ad_impact_mean'], self.params['ad_impact_var_ratio']
        )
        return beta.rvs(a, b, size=ad_count)

    def deliver_weekly_ads(self, n_weeks: Union[int, None] = None) -> None:
        """
        Simulate ad delivery for the specified number of weeks.
        The 'n_weeks' value in self.params will be used if n_weeks
        is not given. This can be used to add additional weeks to 
        earlier results. Create a new object to start a new simulation.
        """
        if n_weeks is None:
            n_weeks = self.params['n_weeks']
        if self.ad_viewers is None:
            n_viewers = round(self.params['population']
                              * self.params['market_proportion']
                              * self.params['addressable_proportion'])
            self.ad_viewers = self.get_ad_viewers(n_viewers)
        else:
            n_viewers = len(self.ad_viewers)
        n_ads = round(n_viewers * n_weeks
                      * self.params['weekly_ad_to_viewer_ratio'])

        sim_start_time = self.next_week_index * SECONDS_IN_WEEK
        sim_end_time = sim_start_time + n_weeks * SECONDS_IN_WEEK
        self.next_week_index += n_weeks

        ad_times = np.sort(
            np.random.randint(
                low=sim_start_time, high=sim_end_time, size=n_ads
            )
        )
        viewer_ids = np.random.randint(
            low=0, high=n_viewers, size=n_ads
        )
        # 'imp' is short for 'impression'.
        # An ad delivered to a viewer is commonly called an impression.
        end_imp_id = self.next_imp_id + n_ads
        imp_ids = range(self.next_imp_id, end_imp_id)
        self.next_imp_id = end_imp_id
        ad_impacts = self.get_ad_impacts(n_ads)
        had_purchase = [
            self.ad_viewers[v_id].view_ad(ad_time, ad_impact)
            for v_id, ad_time, ad_impact
            in zip(viewer_ids, ad_times, ad_impacts)
        ]
        
        df_ad_log_cur = pd.DataFrame({
            'imp_id' : imp_ids,
            'ad_time' : ad_times,
            'viewer_id' : pd.Series(viewer_ids, dtype=np.int32),
            'ad_impact' : pd.Series(ad_impacts, dtype=np.float32),
            'had_purchase' : had_purchase,
        }).set_index('imp_id')
        
        # Save the results (combined with any earlier results)
        self.df_ad_log = pd.concat([self.df_ad_log, df_ad_log_cur])


In [6]:
ad_sim = AdSimulation()

In [7]:
ad_sim.deliver_weekly_ads(2)

In [8]:
ad_sim.df_ad_log.head()

| imp_id | ad_time | viewer_id | ad_impact | had_purchase |
|---|---|---|---|---|
| 0 | 0 | 81476 | 0.217125 | False |
| 1 | 0 | 1137127 | 0.045393 | False |
| 2 | 0 | 1786122 | 0.032607 | False |
| 3 | 0 | 1178788 | 0.114233 | False |
| 4 | 0 | 1538162 | 0.102115 | False |


In [9]:
ad_sim.df_ad_log.tail()

| imp_id | ad_time | viewer_id | ad_impact | had_purchase |
|---|---|---|---|---|
| 399995 | 20159 | 1452090 | 0.128733 | False |
| 399996 | 20159 | 1819393 | 0.05048 | False |
| 399997 | 20159 | 360348 | 0.072866 | False |
| 399998 | 20159 | 590992 | 0.048105 | False |
| 399999 | 20159 | 1640487 | 0.012987 | False |
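
The last `imp_id` and `ad_time` shown here follow from the default parameters. The sketch below redoes that arithmetic outside the class, with the values copied by hand from `default_params` and the time constants; the variable names are only for illustration.

```python
# Values copied from AdSimulation.default_params and the constants above.
population = 200_000_000
market_proportion = 0.1
addressable_proportion = 0.1
weekly_ad_to_viewer_ratio = 0.1
time_units_per_week = 60 * 24 * 7  # same value as SECONDS_IN_WEEK above
n_weeks = 2

n_viewers = round(population * market_proportion * addressable_proportion)
n_ads = round(n_viewers * n_weeks * weekly_ad_to_viewer_ratio)

last_imp_id = n_ads - 1                            # impression ids start at 0
max_ad_time = n_weeks * time_units_per_week - 1    # randint's high is exclusive
```

This gives 2,000,000 viewers and 400,000 ads for the first two weeks, so the last impression id is 399999 and the largest possible ad time is 20159.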


In [10]:
ad_sim.deliver_weekly_ads(2)

In [11]:
ad_sim.df_ad_log.tail()

| imp_id | ad_time | viewer_id | ad_impact | had_purchase |
|---|---|---|---|---|
| 799995 | 40319 | 992459 | 0.114113 | False |
| 799996 | 40319 | 869573 | 0.072551 | False |
| 799997 | 40319 | 1571688 | 0.040029 | False |
| 799998 | 40319 | 427090 | 0.22967 | False |
| 799999 | 40319 | 576654 | 0.154999 | False |


In [12]:
sum(ad_sim.df_ad_log['had_purchase']), sum(ad_sim.df_ad_log['had_purchase']) / ad_sim.df_ad_log.shape[0]

(65, 8.125e-05)
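
One way to read this result against the notes at the top is sketched below. The simulation only creates viewers in the addressable market, so the rate above is per ad shown to that group. If we apply the earlier assumption that ads are split proportionally between the addressable market and everyone else, the campaign-level rate would be lower by the addressable proportion. That interpretation, and the variable names, are my assumptions rather than something the simulation computes.

```python
purchases = 65
ads_logged = 800_000
addressable_proportion = 0.1  # from default_params

# Rate per ad delivered to the addressable market (what the log measures).
rate_in_addressable = purchases / ads_logged

# Assumption from the notes: the logged ads are only the addressable
# share of all ads served, so scale up to the whole campaign.
total_ads_served = ads_logged / addressable_proportion
campaign_rate = purchases / total_ads_served
```

Under that assumption the campaign-level rate is about 8.1e-06, a little below the typical range of 0.00001 to 0.0001 quoted in the notes.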