robyn_allocator - channels that are excluded from the model get excluded from the initial spend mix too #645

Closed
fraschen opened this issue Mar 8, 2023 · 6 comments

fraschen commented Mar 8, 2023

Issue description

    variable         coef     decompPer  decompAgg  CPA     mean_response  mean_spend
1   (Intercept)      0.38     0.02%      41.394     -       -              -
2   trend            0.115    11.51%     31.709K    -       -              -
3   season           0.208    -0.11%     -308.36    -       -              -
4   holiday          -0.015   0.00%      -8.989     -       -              -
5   context_var1     13.471K  49.79%     137.15K    -       -              -
6   paid_var1_cost   247.17   0.40%      1.096K     24.923  4.703          250.68
7   paid_var2_cost   443.86   3.49%      9.61K      11.136  75.767         981.84
8   paid_var3_cost   618.42   2.99%      8.224K     4.489   154.08         338.67
9   paid_var4_cost   0        0.00%      0          -       0              1.772K
10  paid_var5_cost   586.01   7.64%      21.031K    13.232  234.18         2.553K
11  paid_var6_cost   882.39   7.37%      20.291K    12.642  260.05         2.353K
12  paid_var7_cost   471.91   4.68%      12.901K    9.194   123.54         1.088K
13  paid_var8_cost   867.21   3.53%      9.718K     9.156   153.67         816.29
14  paid_var9_cost   271.37   4.62%      12.718K    13.812  122.74         1.611K
15  paid_var10_cost  660.22   3.05%      8.404K     3.344   84.346         257.79
16  paid_var11_cost  44.764   1.04%      2.876K     42.2    31.124         1.113K

RobynModel-2_216_5.json.zip

Based on this model, I would expect paid_var4_cost to be accounted for in the initial spend but not in the optimal spend in robyn_allocator(). However, it is excluded from both charts.

AllocatorCollect1 <- robyn_allocator(
  InputCollect = InputCollect,
  OutputCollect = OutputCollect,
  select_model = select_model,
  scenario = "max_historical_response",
  channel_constr_low = 0.7,
  channel_constr_up = c(1.2, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5),
  export = TRUE,
  date_min = "2022-08-29",
  date_max = "2022-09-25"
)

plot(AllocatorCollect1)

[Image: 2_216_5_reallocated_hist]
As you can see, paid_var4_cost is not there, and the initial spend percentages add up to 100% even though this variable is missing.

Here's a sample dataset to reproduce the issue:
MMMSampleDataset.csv

Related facebook group thread: https://www.facebook.com/groups/robynmmm/permalink/1395036564597806/

Environment & Robyn version

Robyn version: 3.10.0.9000

R version 4.2.2 (2022-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.1

@laresbernardo laresbernardo self-assigned this Mar 8, 2023
@laresbernardo
Collaborator

Hi @fraschen, thanks for sharing your issue in such a detailed ticket.

  • The parameters you used for picking the dates (date_min & date_max) were deprecated in v3.10; you should now use date_range instead (which by default takes the last month's worth of data).
  • Are you sure you have spend for paid_var4_cost in the last month? Note that in the table you first shared, the mean spend of 1.772K is computed over the whole training window, so it is not necessarily the same for the last month.
  • paid_var4_cost is excluded because coef = 0 means that its spend doesn't affect the response: any spend multiplied by zero gives a response of zero. I see your point ("but I did spend money, so why isn't it in the distribution of initial spend?"), but no matter how much money you spend, the response will be zero and the optimizer will not be able to give a different solution. It also means that paid_var4_cost represents 0% of the budget we can actually allocate.
  • That said, once you confirm the second point, we can definitely consider and double-check this case to make it more consistent. If you actually spent on paid_var4_cost during the last month's worth of data, is it included in the total spend shown in the first section (84K)?
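
The third point can be illustrated with a toy calculation. This is a minimal sketch, not Robyn's internals: it assumes a channel's response is its coefficient multiplied by a Hill-type saturation of spend. The helper names (hill_saturation, channel_response) and the values of alpha, gamma and the positive coefficient are hypothetical and chosen only for illustration.

```r
# Toy model (an assumption, not Robyn's actual code):
# response = coef * saturation(spend)
hill_saturation <- function(spend, alpha = 2, gamma = 1000) {
  # Hill curve bounded in [0, 1); alpha and gamma are made-up values
  spend^alpha / (spend^alpha + gamma^alpha)
}

channel_response <- function(spend, coef) {
  coef * hill_saturation(spend)
}

spend_levels <- c(0, 500, 1772, 10000)

# A channel with a positive (hypothetical) coefficient responds to spend...
resp_positive <- channel_response(spend_levels, coef = 500)
# ...while a coef = 0 channel returns zero at every spend level
resp_zero <- channel_response(spend_levels, coef = 0)

print(resp_positive)
print(resp_zero)
```

Whatever the shape of the saturation curve, multiplying it by a zero coefficient flattens the response to zero, so the optimizer has nothing to trade off for that channel.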

@fraschen
Author

fraschen commented Mar 8, 2023

@laresbernardo thanks for looking into it.

Yes there is some spend for paid_var4_cost, as you can see from the attached csv file. The total spend in the last four weeks is supposed to be 88k, not 84k as shown in the one-pager.

I get why the response is 0, but I believe the initial spend should reflect the actual spend, for a better comparison of the spend mix pre- and post-reallocation.

@laresbernardo
Collaborator

laresbernardo commented Mar 8, 2023

> Yes there is some spend for paid_var4_cost, as you can see from the attached csv file. The total spend in the last four weeks is supposed to be 88k, not 84k as shown in the one-pager.

Ok, then it's actually working as intended. After downloading the CSV and JSON, I was able to replicate a similar model and checked the data for that time period (last 4 weeks): you have a total spend of 92430. In this model, the coef = 0 variable (paid_var4_cost) accounts for 8455 of that spend, which means the "total optimizable budget" is actually ~84K (as shown in the first section of the allocator's one-pager). So when we refer to Initial Spend, we actually mean the initial spend on channels that produced a response > 0.

# Import the CSV and do some cleaning
> library(dplyr)  # for %>%, filter(), select(), summarise_all() and as_tibble()
> dt_input <- read.csv("~/Desktop/MMMSampleDataset.csv", row.names = NULL, sep = ";") %>%
+     lares::removenarows(all = FALSE) %>% as_tibble()
> dt_input$paid_var11_cost <- as.integer(lares::cleanText(dt_input$paid_var11_cost))
> dt_input$context_var1 <- lares::cleanText(dt_input$context_var1)

# Check the sums of costs
> (sums <- dt_input %>% filter(DATE >= "2022-09-04", DATE <= "2022-09-25") %>% 
+   select(ends_with("cost")) %>%
+   summarise_all(function(x) sum(x)) %>% as.data.frame())
  paid_var1_cost paid_var2_cost paid_var3_cost paid_var4_cost paid_var5_cost paid_var6_cost paid_var7_cost
1            100           8079           5201           8455          17745          20903           5351
  paid_var8_cost paid_var9_cost paid_var10_cost paid_var11_cost
1           9896           8657            3297            4746
> sum(sums)
[1] 92430
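
The split between total and optimizable budget follows directly from these sums. Here is a small sketch of that arithmetic using the figures printed above; the variable names are illustrative and not part of Robyn.

```r
# Spend per channel for the selected window, copied from the sums above
spend <- c(
  paid_var1_cost = 100, paid_var2_cost = 8079, paid_var3_cost = 5201,
  paid_var4_cost = 8455, paid_var5_cost = 17745, paid_var6_cost = 20903,
  paid_var7_cost = 5351, paid_var8_cost = 9896, paid_var9_cost = 8657,
  paid_var10_cost = 3297, paid_var11_cost = 4746
)

# Channels whose model coefficient is 0 (per the decomposition table)
zero_coef <- "paid_var4_cost"

total_spend <- sum(spend)                    # everything spent in the window
non_optimizable <- sum(spend[zero_coef])     # spend on coef = 0 channels
optimizable <- total_spend - non_optimizable # what the allocator can move around

cat("Total:", total_spend,
    "| Non-optimizable:", non_optimizable,
    "| Optimizable:", optimizable, "\n")
```

The optimizable figure comes out at 83975, which is the ~84K shown in the one-pager.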

When robyn_allocator() runs, we do print a message to inform the user:

Excluded in optimiser because their coefficients are 0: paid_var4_cost

All this said, if you want the total budget for those 4 weeks to be everything you actually spent, regardless of coef = 0 variables, you can set the total_budget = 92430 parameter; that way you'll "spend" all the money you can actually spend and distribute it across all channels that return conversions (response). Note that the Initial state will still be 84K, though.

@laresbernardo
Collaborator

I'm thinking of a way to let the users know how much budget wasn't optimizable and was excluded from the optimizer. Will get back to you after I've implemented a solution. Feel free to share your feedback as well.

@fraschen
Author

fraschen commented Mar 8, 2023

Thanks for your deep analysis of this issue.

I still feel that accounting for the coef = 0 vars in the initial spend percentages would be more intuitive from a business perspective. A marketer would probably question the validity of the model just because "where is my paid_var4 spend???". As an analyst, it is difficult to explain why those percentages look the way they do, and I would probably need to create my own viz with revised figures to avoid these questions.

@laresbernardo
Collaborator

Hi @fraschen,
We have just updated the main branch with the proposed change.
We now include the non-optimizable budget and add it to the initial spend, so that even though we can't/shouldn't optimize channels with a coefficient of 0, or channels whose lower and upper bounds are both set to 0, that budget is still counted.
You can check these changes by looking at the initial spend in the allocator's one-pager and by verifying that the percentages no longer sum to 100% when a channel is excluded; that way we reflect the correct initial distribution.
Please check these changes, and feel free to share additional feedback or close this task.
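
To see why the percentages of the displayed channels no longer sum to 100%, here is a rough sketch using the four-week sums reported earlier in this thread. It only illustrates the arithmetic and is not Robyn's implementation: it assumes the shares of the displayed channels are now computed against the full budget, and all variable names are hypothetical.

```r
# Four-week spend per channel, as reported earlier in the thread
spend <- c(
  paid_var1_cost = 100, paid_var2_cost = 8079, paid_var3_cost = 5201,
  paid_var4_cost = 8455, paid_var5_cost = 17745, paid_var6_cost = 20903,
  paid_var7_cost = 5351, paid_var8_cost = 9896, paid_var9_cost = 8657,
  paid_var10_cost = 3297, paid_var11_cost = 4746
)
excluded <- "paid_var4_cost"  # coef = 0, so not optimizable
shown <- spend[setdiff(names(spend), excluded)]

# Shares over the optimizable budget only (earlier behaviour)
share_optimizable_only <- shown / sum(shown)
# Shares over the full budget, excluded spend kept in the denominator
# (assumed new behaviour)
share_full_budget <- shown / sum(spend)

pct_before <- round(100 * sum(share_optimizable_only), 2)
pct_after <- round(100 * sum(share_full_budget), 2)
cat("Sum of shown shares before:", pct_before, "% | after:", pct_after, "%\n")
```

Under this assumption the shown channels add up to about 90.85% of initial spend, and the missing ~9% is the paid_var4_cost budget.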

@fraschen fraschen closed this as completed Mar 9, 2023