## Simulation Overview

This notebook contains **Python code** that simulates how users move through a marketing funnel when exposed to different types of advertisements. The generated data is analyzed in `analysis.ipynb`.

---

### Funnel Stages
The funnel consists of four distinct stages:

1. **Not Aware**  
2. **Aware**  
3. **Consider**  
4. **Purchase**

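Users move through these stages in order: at each visit a user either stays in the current stage or advances to the next one with some probability, and never skips a stage or moves backward.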
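
The sequential, probabilistic progression can be viewed as a simple chain over the four stages. The sketch below is illustrative only: `step_distribution` is a hypothetical helper, and the advance probability of 0.1 and the starting shares are the baseline values that appear in this notebook's simulation code. It deliberately ignores the fact that buyers stop visiting, so it describes funnel position only, not purchases.

```python
STAGES = ["not aware", "aware", "consider", "purchase"]

def step_distribution(shares, p_advance):
    """Return the expected share of users in each stage after one visit.

    A user in any stage but the last advances exactly one stage with
    probability p_advance and otherwise stays put; "purchase" is final.
    """
    new_shares = [0.0] * len(shares)
    last = len(shares) - 1
    for i, share in enumerate(shares):
        if i == last:
            new_shares[i] += share
        else:
            new_shares[i] += share * (1 - p_advance)
            new_shares[i + 1] += share * p_advance
    return new_shares

# Assumed starting distribution and baseline advance probability.
shares = [0.7, 0.1, 0.1, 0.1]
for visit in range(1, 5):
    shares = step_distribution(shares, p_advance=0.1)
    print(visit, dict(zip(STAGES, (round(s, 4) for s in shares))))
```

Because a user can advance at most one stage per visit, someone who starts in **Not Aware** needs at least three visits to reach **Purchase**.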

---

### Advertisement Types
Two types of advertisements are presented to users:

- **Branding Ads**  
- **Performance Ads**  

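Each ad type affects a different part of the funnel. In this simulation, branding ads raise the probability of moving from **Not Aware** to **Aware** (0.4 versus a 0.1 baseline), while performance ads raise the probability of moving from **Aware** to **Consider** and from **Consider** to **Purchase** (0.3 versus 0.1) and the probability of completing a purchase (0.2 versus 0.1).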
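
One way to make the effect of each ad type explicit is a lookup table keyed by stage and ad type. This is a minimal sketch, not the notebook's implementation: the probabilities are copied from the `next_funnel_stage` function in this notebook, while `ADVANCE_PROB` and `ad_lift` are hypothetical names introduced for illustration.

```python
# (current stage, ad type) -> probability of advancing to the next stage.
# Values mirror the constants hard-coded in next_funnel_stage.
ADVANCE_PROB = {
    ("not aware", "branding"): 0.4,
    ("not aware", "performance"): 0.1,
    ("not aware", "none"): 0.1,
    ("aware", "branding"): 0.1,
    ("aware", "performance"): 0.3,
    ("aware", "none"): 0.1,
    ("consider", "branding"): 0.1,
    ("consider", "performance"): 0.3,
    ("consider", "none"): 0.1,
}

def ad_lift(stage, ad_type):
    """Increase in advance probability relative to showing no ad."""
    return ADVANCE_PROB[(stage, ad_type)] - ADVANCE_PROB[(stage, "none")]

for stage in ("not aware", "aware", "consider"):
    for ad_type in ("branding", "performance"):
        print(f"{stage:>9} + {ad_type:<11} lift = {ad_lift(stage, ad_type):.1f}")
```

A table like this keeps all tunable probabilities in one place, which may be easier to adjust than nested `if`/`elif` branches.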

---

### User Behavior 
- Each user can make up to **four visits**.
- In each visit, the user is **exposed to one advertisement** (or to none, in the control group).
- Between visits, users may **progress to the next stage** in the funnel.
- Users are only eligible to **make a purchase** when they reach the **Purchase** stage.
- Once a user completes a purchase, they **no longer continue** to make visits.
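
The visit rules above can be sketched for a single user as follows. This is a simplified illustration under stated assumptions: it models a user who sees no ads, uses the baseline probabilities of 0.1 for both advancing and purchasing, and `simulate_user` and its `draw` parameter are hypothetical names, not part of the notebook's code.

```python
import random

STAGES = ["not aware", "aware", "consider", "purchase"]

def simulate_user(initial_stage, draw=random.random,
                  p_advance=0.1, p_purchase=0.1, max_visits=4):
    """Simulate one user who sees no ads; return one record per visit."""
    records = []
    stage = initial_stage
    for visit in range(1, max_visits + 1):
        # A purchase is only possible if the user is already in the purchase stage.
        purchased = stage == "purchase" and draw() < p_purchase
        # Before the next visit, the user may advance by exactly one stage.
        next_stage = stage
        if stage != "purchase" and draw() < p_advance:
            next_stage = STAGES[STAGES.index(stage) + 1]
        records.append({"visit": visit, "stage": stage, "purchase": int(purchased)})
        if purchased:
            break  # buyers make no further visits
        stage = next_stage
    return records

for record in simulate_user("consider"):
    print(record)
```

Passing a deterministic `draw` (for example `lambda: 0.0`) makes the control flow easy to inspect: a user starting in **Consider** then advances on the first visit and buys on the second.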

---

### Experimental Conditions

The simulation includes a **randomized experiment** with five distinct conditions. Each condition determines the type and sequence of ads shown to users during their visits.

#### 1. Control Group
- **No advertisement** is shown at any stage.
- Serves as a baseline for comparison.

#### 2. Branding Group
- Users are shown a **branding ad** during **every visit**, regardless of their funnel stage.

#### 3. Performance Group
- Users are shown a **performance ad** during **every visit**, regardless of their funnel stage.

#### 4. Brand-Plus-Performance Group
- Users are shown:
  - A **branding ad** during the **first two visits** (if applicable).
  - A **performance ad** during the **third and fourth visits** (if applicable).
- Ad exposure is based on the **visit sequence**, not the user's funnel stage.

#### 5. Full-Funnel Group
- Ad type is determined by the **user's current funnel stage**:
  - **Branding ad** if the user is in the **Not Aware** stage.
  - **Performance ad** if the user is in the **Aware**, **Consider**, or **Purchase** stages.
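
The five targeting rules can be condensed into a single function. The sketch below is one possible way to express them; `select_ad` is a hypothetical helper, and the condition names follow the `campaign_type` strings used in this notebook.

```python
def select_ad(condition, visit, stage):
    """Return the ad type shown for a condition, visit number (1-4) and true stage."""
    if condition == "control":
        return "none"
    if condition in ("branding", "performance"):
        return condition  # same ad at every visit
    if condition == "brand_plus_performance":
        # Depends only on the visit sequence, not on the funnel stage.
        return "branding" if visit <= 2 else "performance"
    if condition == "full_funnel":
        # Depends only on the funnel stage, not on the visit number.
        return "branding" if stage == "not aware" else "performance"
    raise ValueError(f"unknown condition: {condition!r}")

for visit, stage in [(1, "not aware"), (2, "aware"), (3, "not aware"), (4, "consider")]:
    print(visit, stage,
          select_ad("brand_plus_performance", visit, stage),
          select_ad("full_funnel", visit, stage))
```

The two mixed conditions differ only in what drives the switch: a user who is still **Not Aware** at the third visit gets a performance ad under brand-plus-performance but a branding ad under full-funnel.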

In [1]:
import pandas as pd
import numpy as np
import random
from datetime import datetime, timedelta
from collections import Counter

In [2]:
# logic for funnel transition:
# without any ad, users progress through the funnel with a baseline probability (0.1)
# a branding ad raises the not aware -> aware probability (0.4)
# a performance ad raises the aware -> consider and consider -> purchase probabilities (0.3)
# a branding ad has no effect beyond the baseline for users in the aware or consider stage,
# and a performance ad has no effect beyond the baseline for users in the not aware stage
# the transition probabilities can be changed
def next_funnel_stage(current_stage, ad_type):
  if current_stage == "not aware":
      if ad_type == "branding":
          return "aware" if random.random() < 0.4 else "not aware"
      elif ad_type == "performance":
          return "aware" if random.random() < 0.1 else "not aware"
      else: # no ad
          return "aware" if random.random() < 0.1 else "not aware"
  elif current_stage == "aware":
      if ad_type == "branding":
          return "consider" if random.random() < 0.1 else "aware"
      elif ad_type == "performance":
          return "consider" if random.random() < 0.3 else "aware"
      else: # no ad
          return "consider" if random.random() < 0.1 else "aware"
  elif current_stage == "consider":
      if ad_type == "branding":
          return "purchase" if random.random() < 0.1 else "consider"
      elif ad_type == "performance":
          return "purchase" if random.random() < 0.3 else "consider"
      else: # no ad
          return "purchase" if random.random() < 0.1 else "consider"
  else:  
      return "purchase"

# logic for purchase decision
# users can purchase with some probability only when they are in purchase stage
# performance ad increases that probability, branding ad does not 
def purchase_decision(current_stage, ad_type, price):
  if current_stage == "purchase":
      if ad_type == "branding":
          return (1,price) if random.random() < 0.1 else (0,0)
      elif ad_type == "performance":
          return (1,price) if random.random() < 0.2 else (0,0)
      else: # no ad 
          return (1,price) if random.random() < 0.1 else (0,0)
  else:
      return (0,0)

In [3]:
def simulate(campaign_type,
             initial_user_id,
             num_users, # number of users in each group
             initial_weights = [0.7, 0.1, 0.1, 0.1], # the distribution of initial funnel stages
             price = 100):
  initial_funnel_stages = ["not aware", "aware", "consider", "purchase"]
  # note: dates are anchored to the day the notebook is run, so they are not fixed by random.seed
  start_date = datetime.now().date() - timedelta(days=30)

  # funnel progression and purchase for each campaign type
  if campaign_type == "control":
      ad_type = "none"
      user_data = []
      for user_id in range(initial_user_id, (initial_user_id+num_users)):
          initial_stage = random.choices(initial_funnel_stages, weights = initial_weights)[0]
          initial_date = start_date + timedelta(days=random.randint(0, 30))
          user_data.append({
              "user_id": user_id,
              "next_funnel_stage": initial_stage,
              "ad_type": ad_type,
              "purchase": 0,
              "sales": 0,
              "date": initial_date
          })
      active_users = user_data[:]  # copy initial user data
      all_users = []
      for stage in range(1, 5):
          new_data = []
          for user in active_users:
              current_stage = user["next_funnel_stage"]
              new_stage = next_funnel_stage(current_stage, user["ad_type"])
              (new_purchase, new_sales) = purchase_decision(current_stage, "none", price)
              new_date = user["date"] + timedelta(days=random.randint(1, 7))

              # update user data with new information
              updated_user_info = {
                "user_id": user["user_id"],
                "current_funnel_stage": current_stage,
                "next_funnel_stage": new_stage,
                "ad_type": ad_type,
                "purchase": new_purchase,
                "sales": new_sales,
                "date": new_date,
                "campaign_type": campaign_type,
                "visit": stage
            }
              all_users.append(updated_user_info)

              # continue only if no purchase was made
              if new_purchase == 0:
                  new_data.append(updated_user_info)
          active_users = new_data  # update active users to only those who didn't make a purchase


  elif campaign_type in ["branding","performance"]:
      ad_type = campaign_type
      user_data = []
      for user_id in range(initial_user_id, (initial_user_id+num_users)):
          initial_stage = random.choices(initial_funnel_stages, weights = initial_weights)[0]
          initial_date = start_date + timedelta(days=random.randint(0, 30))
          user_data.append({
              "user_id": user_id,
              "next_funnel_stage": initial_stage,
              "ad_type": ad_type,
              "purchase": 0,
              "sales": 0,
              "date": initial_date
          })
      active_users = user_data[:]  # copy initial user data
      all_users = []
      for stage in range(1, 5):
          new_data = []
          for user in active_users:
              current_stage = user["next_funnel_stage"]
              new_stage = next_funnel_stage(current_stage, user["ad_type"])
              (new_purchase, new_sales) = purchase_decision(current_stage, user["ad_type"], price)
              new_date = user["date"] + timedelta(days=random.randint(1, 7))

              # update user data with new information
              updated_user_info = {
                  "user_id": user["user_id"],
                  "current_funnel_stage": current_stage,
                  "next_funnel_stage": new_stage,
                  "ad_type": ad_type,
                  "purchase": new_purchase,
                  "sales": new_sales,
                  "date": new_date,
                  "campaign_type": campaign_type,
                  "visit": stage
              }
              all_users.append(updated_user_info)

              # continue only if no purchase was made
              if new_purchase == 0:
                  new_data.append(updated_user_info)
          active_users = new_data  # update active users to only those who didn't make a purchase
  elif campaign_type == "brand_plus_performance":
      ad_type = []
      user_data = []
      for user_id in range(initial_user_id, (initial_user_id+num_users)):
          initial_stage = random.choices(initial_funnel_stages, weights = initial_weights)[0]
          initial_date = start_date + timedelta(days=random.randint(0, 30))
          user_data.append({
              "user_id": user_id,
              "next_funnel_stage": initial_stage,
              "ad_type": ad_type,
              "purchase": 0,
              "sales": 0,
              "date": initial_date
          })
      active_users = user_data[:]  # copy initial user data
      all_users = []
      for stage in range(1, 5):
          new_data = []
          if stage in [1,2]:
            for user in active_users:
                current_stage = user["next_funnel_stage"]
                new_stage = next_funnel_stage(current_stage, 'branding')
                (new_purchase, new_sales) = purchase_decision(current_stage, 'branding', price)
                new_date = user["date"] + timedelta(days=random.randint(1, 7))

                # update user data with new information
                updated_user_info = {
                  "user_id": user["user_id"],
                  "current_funnel_stage": current_stage,
                  "next_funnel_stage": new_stage,
                  "ad_type": "branding",
                  "purchase": new_purchase,
                  "sales": new_sales,
                  "date": new_date,
                  "campaign_type": campaign_type,
                  "visit": stage
                }
                all_users.append(updated_user_info)

                # continue only if no purchase was made
                if new_purchase == 0:
                    new_data.append(updated_user_info)
          else:
            for user in active_users:
                current_stage = user["next_funnel_stage"]
                new_stage = next_funnel_stage(current_stage, 'performance')
                (new_purchase, new_sales) = purchase_decision(current_stage, 'performance', price)
                new_date = user["date"] + timedelta(days=random.randint(1, 7))

                # update user data with new information
                updated_user_info = {
                  "user_id": user["user_id"],
                  "current_funnel_stage": current_stage,
                  "next_funnel_stage": new_stage,
                  "ad_type": "performance",
                  "purchase": new_purchase,
                  "sales": new_sales,
                  "date": new_date,
                  "campaign_type": campaign_type,
                  "visit": stage
                }
                all_users.append(updated_user_info)

                # continue only if no purchase was made
                if new_purchase == 0:
                    new_data.append(updated_user_info)
          active_users = new_data  # update active users to only those who didn't make a purchase
  elif campaign_type == "full_funnel":
      ad_type = []
      user_data = []
      for user_id in range(initial_user_id, (initial_user_id+num_users)):
          initial_stage = random.choices(initial_funnel_stages, weights = initial_weights)[0]
          initial_date = start_date + timedelta(days=random.randint(0, 30))
          user_data.append({
              "user_id": user_id,
              "next_funnel_stage": initial_stage,
              "ad_type": ad_type,
              "purchase": 0,
              "sales": 0,
              "date": initial_date
          })
      active_users = user_data[:]  # copy initial user data
      all_users = []
      for stage in range(1, 5):
          new_data = []
          for user in active_users:
              current_stage = user["next_funnel_stage"]
              # the ad is chosen from the user's true funnel stage
              ad_type = "branding" if current_stage == "not aware" else "performance"
              new_stage = next_funnel_stage(current_stage, ad_type)
              (new_purchase, new_sales) = purchase_decision(current_stage, ad_type, price)
              new_date = user["date"] + timedelta(days=random.randint(1, 7))

              # update user data with new information
              updated_user_info = {
                  "user_id": user["user_id"],
                  "current_funnel_stage": current_stage,
                  "next_funnel_stage": new_stage,
                  "ad_type": ad_type,
                  "purchase": new_purchase,
                  "sales": new_sales,
                  "date": new_date,
                  "campaign_type": campaign_type,
                  "visit": stage
                }
              all_users.append(updated_user_info)

              # continue only if no purchase was made
              if new_purchase == 0:
                  new_data.append(updated_user_info)
          active_users = new_data  # update active users to only those who didn't make a purchase
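  else:
      # fail loudly instead of silently returning None for an unrecognized campaign
      raise ValueError(f"unknown campaign_type: {campaign_type!r}")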

  df_output = pd.DataFrame(all_users)
  return df_output

In [4]:
random.seed(10)

# number of users in each group
n = 10000

df_control = simulate(campaign_type = 'control', initial_user_id = 1, num_users = n)
df_brand = simulate(campaign_type = 'branding', initial_user_id = 1+n, num_users = n)
df_performance = simulate(campaign_type = 'performance', initial_user_id = 1+2*n, num_users = n)
df_brand_plus_performance = simulate(campaign_type = 'brand_plus_performance', initial_user_id = 1+3*n, num_users = n)
df_full_funnel = simulate(campaign_type = 'full_funnel', initial_user_id = 1+4*n, num_users = n)

In [5]:
df = pd.concat([df_control, df_brand, df_performance, df_brand_plus_performance, df_full_funnel], ignore_index=True)

In [6]:
df.to_csv('data.csv', index = False)

### Predicted Funnel Stage

In realistic scenarios, marketers may not have access to a user’s **true funnel stage**. To simulate this uncertainty, an additional experimental condition is introduced, reflecting ad targeting based on **predicted** funnel stages rather than actual ones.

---

#### Prediction Mechanism

- The **predicted funnel stage** matches the **true funnel stage** with probability equal to the prediction accuracy (0.9, 0.6, or 0.3 for the high, medium, and low accuracy runs in this notebook).
- When the prediction is incorrect, the predicted stage is **sampled uniformly at random** from the three remaining funnel stages.
- The predicted stage is used **only** to determine ad targeting.  
- Funnel progression and purchase decisions continue to be based on the **true funnel stage**.
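
The prediction mechanism can be isolated in a small function. This is a sketch under the assumptions stated in the list above; `predict_stage` and its `rng` parameter are hypothetical names chosen for illustration.

```python
import random

STAGES = ["not aware", "aware", "consider", "purchase"]

def predict_stage(true_stage, accuracy, rng=random):
    """Return the true stage with probability `accuracy`, else a different stage."""
    if rng.random() < accuracy:
        return true_stage
    # Wrong prediction: pick uniformly among the three other stages.
    return rng.choice([s for s in STAGES if s != true_stage])

# Empirical check with a dedicated, seeded generator.
rng = random.Random(0)
sample = [predict_stage("aware", 0.6, rng) for _ in range(10_000)]
hit_rate = sample.count("aware") / len(sample)
print(f"share of correct predictions: {hit_rate:.3f}")
```

Using a separate `random.Random` instance keeps the check from disturbing the global random state that the simulation cells depend on.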

---

#### Condition 6: Predicted Full-Funnel Group

Users are shown ads based on their **predicted funnel stage** as follows:

- **Branding Ads** if the predicted stage is **Not Aware**  
- **Performance Ads** if the predicted stage is **Aware**, **Consider**, or **Purchase**

This condition allows for analysis of how the **accuracy of funnel stage prediction** impacts campaign outcomes, and how robust the **full-funnel strategy** is under different levels of accuracy.
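
As a rough guide to what each accuracy level implies for targeting, the probability of showing each ad can be derived directly from the prediction mechanism. The sketch below assumes wrong predictions are spread uniformly over the three other stages; `p_branding` and `p_same_ad_as_true_targeting` are hypothetical helpers, not functions from this notebook.

```python
def p_branding(true_stage, accuracy):
    """Probability that a branding ad is shown, given the true stage."""
    if true_stage == "not aware":
        # Branding is shown only when the prediction is correct.
        return accuracy
    # Branding is shown only when a wrong prediction lands on "not aware".
    return (1 - accuracy) / 3

def p_same_ad_as_true_targeting(true_stage, accuracy):
    """Probability that the shown ad equals the one chosen from the true stage."""
    if true_stage == "not aware":
        return p_branding(true_stage, accuracy)
    return 1 - p_branding(true_stage, accuracy)

for accuracy in (0.9, 0.6, 0.3):
    print(accuracy,
          round(p_same_ad_as_true_targeting("not aware", accuracy), 3),
          round(p_same_ad_as_true_targeting("aware", accuracy), 3))
```

Under these assumptions, a wrong prediction always costs **Not Aware** users their branding ad, whereas users in later stages still receive a performance ad in two out of three wrong predictions.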

In [7]:
def simulate_predicted(initial_user_id,
                       num_users, # number of users in each group
                       accuracy, # accuracy of funnel stage prediction
                       initial_weights = [0.7, 0.1, 0.1, 0.1], # the distribution of initial funnel stages
                       price = 100):
  initial_funnel_stages = ["not aware", "aware", "consider", "purchase"]
  start_date = datetime.now().date() - timedelta(days=30)
 
  ad_type = []
  user_data = []
  for user_id in range(initial_user_id, (initial_user_id+num_users)):
      initial_stage = random.choices(initial_funnel_stages, weights = initial_weights)[0]
      initial_date = start_date + timedelta(days=random.randint(0, 30))
      user_data.append({
          "user_id": user_id,
          "next_funnel_stage": initial_stage,
          "ad_type": ad_type,
          "purchase": 0,
          "sales": 0,
          "date": initial_date
      })
  active_users = user_data[:]  # copy initial user data
  all_users = []
  for stage in range(1, 5):
      new_data = []
      for user in active_users:
          current_stage = user["next_funnel_stage"]
          # the predicted stage equals the true stage with probability = accuracy; otherwise it is sampled uniformly from the other stages
          current_stage_predicted = current_stage if random.random() < accuracy else random.choice([x for x in initial_funnel_stages if x != current_stage])
          # the ad is targeted using the predicted stage; transition and purchase are based on the true stage
          ad_type = "branding" if current_stage_predicted == "not aware" else "performance"
          new_stage = next_funnel_stage(current_stage, ad_type)
          # note: this prediction is drawn independently of the one made at the start of the next visit
          new_stage_predicted = new_stage if random.random() < accuracy else random.choice([x for x in initial_funnel_stages if x != new_stage])
          (new_purchase, new_sales) = purchase_decision(current_stage, ad_type, price)
          new_date = user["date"] + timedelta(days=random.randint(1, 7))

          # update user data with new information
          updated_user_info = {
              "user_id": user["user_id"],
              "current_funnel_stage": current_stage,
              "current_funnel_stage_predicted": current_stage_predicted,
              "next_funnel_stage": new_stage,
              "next_funnel_stage_predicted": new_stage_predicted,
              "ad_type": ad_type,
              "purchase": new_purchase,
              "sales": new_sales,
              "date": new_date,
              "campaign_type": "full_funnel_predicted",
              "visit": stage
            }
          all_users.append(updated_user_info)

          # continue only if no purchase was made
          if new_purchase == 0:
              new_data.append(updated_user_info)
      active_users = new_data  # update active users to only those who didn't make a purchase

  df_output = pd.DataFrame(all_users)
  return df_output

In [8]:
random.seed(10)

# number of users in each group
n = 10000

# simulate data with 90% prediction accuracy
df_predicted_high = simulate_predicted(initial_user_id = 1+5*n, num_users = n, accuracy = 0.9)

# simulate data with 60% prediction accuracy
df_predicted_medium = simulate_predicted(initial_user_id = 1+5*n, num_users = n, accuracy = 0.6)

# simulate data with 30% prediction accuracy
df_predicted_low = simulate_predicted(initial_user_id = 1+5*n, num_users = n, accuracy = 0.3)

In [9]:
df_predicted_high.to_csv('data_predicted_high.csv', index = False)
df_predicted_medium.to_csv('data_predicted_medium.csv', index = False)
df_predicted_low.to_csv('data_predicted_low.csv', index = False)