Negative Mean CarryOver #706

MarianaAlvarez1316 · 2023-04-25T17:09:09Z

Project Robyn

Describe issue

Hi!
I was checking my resulting models and noticed that the mean_carryover for channel PAID_4 is negative for all the model solutions available in the pareto_aggregated CSV. Below you can see an example I took from one model:

For a specific model I tried to replicate the mean_carryover using the robyn_response function, but I couldn't.

I think it's a bug in the code. Could you please help me?

Thanks in advance :)

Provide reproducible example

Here is an example data set and how I ran the code:

DATA_BUG_GITHUB.csv


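library(Robyn)
data("dt_prophet_holidays")  # holiday table shipped with Robyn
base_final <- read.csv("DATA_BUG_GITHUB.csv")  # assumption: base_final is the attached data set
robyn_object <- "~/Desktop/MyRobyn.RDS"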

###################################################################################################################

InputCollect <- robyn_inputs(
  dt_input = base_final,
  dt_holidays = dt_prophet_holidays,
  date_var = "DATE", # date format must be "2020-01-01"
  dep_var = "REVENUE", # there should be only one dependent variable
  dep_var_type = "revenue", # "revenue" (ROI) or "conversion" (CPA)
  # Components we want Prophet to decompose the series into
  prophet_vars = c("trend", "season", "holiday"), # "trend","season", "weekday" & "holiday"
  prophet_country = "MX", # input one country. dt_prophet_holidays includes 59 countries by default
  context_vars = c("CONTEXT_1",'CONTEXT_2','CONTEXT_3','CONTEXT_4','CONTEXT_5','CONTEXT_6'
                   ,'CONTEXT_7'), # e.g. competitors, discount, unemployment etc
  paid_media_spends = c("PAID_1_S","PAID_2_S","PAID_3_S","PAID_4_S"),  # mandatory input
  # NOTE: you changed the following line
  paid_media_vars = c("PAID_1_I","PAID_2_I","PAID_3_I","PAID_4_I"),  # mandatory.
  # paid_media_vars must have same order as paid_media_spends. Use media exposure metrics like
  # impressions, GRP etc. If not applicable, use spend instead.
  #organic_vars = "newsletter", # marketing activity without media spend
  factor_vars = c("CONTEXT_1"), # force variables in context_vars or organic_vars to be categorical
  window_start = "2018-12-31",
  window_end = "2023-01-23",
  adstock = "weibull_pdf" # geometric, weibull_cdf or weibull_pdf.
)
print(InputCollect)


###########################################################################################################

hyperparameters <- list(
  ######### PAID_1 ###########
  PAID_1_S_alphas = c(0.7, 1)
  ,PAID_1_S_gammas = c(0.3, 1)
  # ADSTOCK
  ,PAID_1_S_scales = c(0.00441,0.023)
  ,PAID_1_S_shapes = c(2.0001, 10)
  
  ######### PAID_2 ############
  ,PAID_2_S_alphas = c(1, 1.3)
  ,PAID_2_S_gammas = c(0.3, 1)
  # ADSTOCK
  ,PAID_2_S_scales = c(0.00018,0.001)
  ,PAID_2_S_shapes = c(2.0001, 10)
  
  ######### PAID_3 ###########
  ,PAID_3_S_alphas = c(1, 1.6)
  ,PAID_3_S_gammas = c(0.3, 1)
  # ADSTOCK
  ,PAID_3_S_scales = c(0.00018,0.001)
  ,PAID_3_S_shapes = c(2.0001, 10)
  
  ######### PAID_4 ###########
  ,PAID_4_S_alphas = c(0.6, 1)
  ,PAID_4_S_gammas = c(0.3, 1)
  # ADSTOCK
  ,PAID_4_S_scales = c(0.00441,0.02509)
  ,PAID_4_S_shapes = c(2.0001, 10)
  
  
  #,train_size = c(0.7, 0.8)

)


###################################################################################################################

InputCollect <- robyn_inputs(InputCollect = InputCollect, hyperparameters = hyperparameters)
print(InputCollect)

###################################################################################################################

OutputModels <- robyn_run(
  InputCollect = InputCollect, # feed in all model specification
  cores = NULL, # NULL defaults to max available - 1
  iterations = 7000, # 2000 recommended for the dummy dataset with no calibration
  trials = 10, # 5 recommended for the dummy dataset
  # New feature
  ts_validation = FALSE, # 3-way-split time series for NRMSE validation.
  add_penalty_factor = FALSE # Experimental feature. Use with caution.
  # outputs = FALSE # outputs = FALSE disables direct model output - robyn_outputs()
)
print(OutputModels)

####################################################################################################################

## Calculate Pareto optimality, cluster and export results and plots. See ?robyn_outputs
OutputCollect <- robyn_outputs(
  InputCollect, OutputModels,
  pareto_fronts = "auto", # automatically pick how many pareto-fronts to fill min_candidates
  # min_candidates = 100, # top pareto models for clustering. Default to 100
  # calibration_constraint = 0.1, # range c(0.01, 0.1) & default at 0.1
  csv_out = "pareto", # "pareto", "all", or NULL (for none)
  clusters = TRUE, # Set to TRUE to cluster similar models by ROAS. See ?robyn_clusters
  plot_pareto = FALSE, # Set to FALSE to deactivate plotting and saving model one-pagers
  plot_folder = robyn_object, # path for plots export
  export = TRUE # this will create files locally
)
print(OutputCollect)

Environment & Robyn version

Make sure you're using the latest Robyn version before you post an issue.

Check and share Robyn version: packageVersion("Robyn")
I'm using version ‘3.10.3.9000’
R version (Please, check and share: sessionInfo() or R.version$version.string)


laresbernardo · 2023-04-25T19:58:05Z

Reference: https://www.facebook.com/groups/robynmmm/posts/1423503721751090/

gufengzhou · 2023-05-03T05:50:23Z

After digging into the issue, I can confirm that it's not a bug. This happens only when using weibull_pdf with shape > 1, meaning the peak is lagged. Look at this plot and you can clearly see the lag effect: there are days where the adstocked spend < raw spend. Because carryover spend = adstocked spend - raw spend, the carryover can end up negative on those days.

I agree that it's very unintuitive, but the math is right, and I don't see the need to change it now. Or are you running into problems with this?
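To make the lag effect concrete, here is a minimal base-R sketch of the old definition (carryover = adstocked spend - raw spend). The helpers `weibull_weights()` and `lagged_adstock()` are hypothetical illustrations, not Robyn's internal adstock code, and the shape/scale values are arbitrary. With shape > 1 the decay weights peak after lag 0, so a single burst of spend yields an adstocked value below the raw spend on the day it happens, and that day's carryover comes out negative.

```r
# Hypothetical sketch, not Robyn's internal implementation.
# Decay weights from a Weibull PDF; index 1 is lag 0 (same day).
weibull_weights <- function(shape, scale, n = 10) {
  w <- dweibull(seq_len(n), shape = shape, scale = scale)
  w / max(w)  # rescale so the largest weight equals 1
}

# Lagged adstock: adstocked[t] = sum over k of spend[t - k + 1] * w[k]
lagged_adstock <- function(spend, w) {
  out <- numeric(length(spend))
  for (t in seq_along(spend)) {
    for (k in seq_along(w)) {
      src <- t - k + 1  # day whose spend reaches day t after k - 1 periods
      if (src >= 1) out[t] <- out[t] + spend[src] * w[k]
    }
  }
  out
}

w_lagged <- weibull_weights(shape = 3, scale = 5)    # shape > 1: peak after lag 0
w_decay  <- weibull_weights(shape = 0.8, scale = 5)  # shape <= 1: peak at lag 0
print(c(peak_lagged = which.max(w_lagged), peak_decay = which.max(w_decay)))

spend <- c(100, rep(0, 9))  # one burst of spend on day 1
carryover_old <- lagged_adstock(spend, w_lagged) - spend  # old rule: immediate == raw spend
print(round(carryover_old, 2))  # day 1 is negative
```

Swapping `w_lagged` for `w_decay` keeps every day's carryover non-negative, which is consistent with the comment that the issue only shows up for weibull_pdf with shape > 1.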

MarianaAlvarez1316 · 2023-05-03T14:25:21Z

Hi Gufeng!

Thanks for the reply; we do see the lag. What we still don't understand is why the mean carryover is negative. From your graph we can see that at certain points the carryover spend will be negative, and if we run the robyn_response function

Response <- robyn_response(
  InputCollect = InputCollectX,
  OutputCollect = OutputCollectX,
  select_model = OutputCollectX$selectID,
  metric_name = "PAID_4_S"
)

in the Response$input_carryover vector we do see some negative entries. But when we calculate mean(Response$input_carryover), the result is positive. Do you know why? How do you calculate the mean carryover?

gufengzhou · 2023-05-09T09:08:54Z

Hi, I've just pushed a commit to return the partial immediate & carryover effect for lagged adstock (weibull_pdf). Can you please update and check? There should be no negative values anymore.

MarianaAlvarez1316 · 2023-05-09T17:30:36Z

Hi Gufeng!
We reran the model but still see the negative mean_carryover.

We used the Robyn::robyn_update() line to update Robyn. Is it correct? Are we missing something?

laresbernardo · 2023-05-09T18:39:24Z

We used the Robyn::robyn_update() line to update Robyn. Is it correct? Are we missing something?

Yes, that's fine. That updates Robyn to the latest dev version. Did you refresh your R session so that the latest updated version is loaded before retrying?

MarianaAlvarez1316 · 2023-05-11T01:45:21Z

Hi Bernardo,

Yes, we refreshed the R session. We also restarted the computer, but we keep getting a negative carryover.

Something did change, though: we are getting different results for the same data with the same constraints. Not wildly different results, but they do differ.

- weibull_pdf creates a lagged peak and thus negative carryover, because previously carryover = total - raw, and with lagged peaks the total adstocked spend can be lower than the raw spend. The new calculation no longer takes raw spend as immediate; it derives immediate from the actual lagged decay matrix. This restores the relationship total = carryover + immediate with all values positive.
- This is the third code snippet change (robyn_response) that impacts the CSV output. The other two, in transformation.R and allocator.R, were fixed previously. Will look into functionalizing the three places.
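The decay-matrix idea in the note above can be sketched in base R. This is a hypothetical illustration with hand-picked weights (same-day weight 0.2, peak at lag 1), not the code from the commit: column j of the matrix holds how day j's spend is spread over the following days, the diagonal is the immediate part, and everything off the diagonal is carryover, so total = immediate + carryover with no negative terms even on days where total < raw spend.

```r
# Hypothetical sketch; weights are illustrative, not fitted Robyn values.
decay_matrix <- function(spend, weights) {
  n <- length(spend)
  m <- matrix(0, nrow = n, ncol = n)  # m[t, j]: part of day j's spend landing on day t
  for (j in seq_len(n)) {
    for (k in seq_along(weights)) {
      t <- j + k - 1  # day reached after k - 1 periods
      if (t <= n) m[t, j] <- spend[j] * weights[k]
    }
  }
  m
}

spend   <- c(100, 0, 0, 0, 0, 50, 0, 0)
weights <- c(0.2, 1, 0.5, 0.1)  # lagged peak: same-day weight below 1

m <- decay_matrix(spend, weights)
total     <- rowSums(m)         # adstocked spend per day
immediate <- diag(m)            # same-day share of each day's own spend
carryover <- total - immediate  # contributions from earlier days only

print(data.frame(spend, immediate, carryover, total))
```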

gufengzhou · 2023-05-15T07:56:54Z

Second try at fixing the negative values. Please update and let us know.

MarianaAlvarez1316 · 2023-05-17T22:52:47Z

Hi Gufeng!

You did it! We don't see the negative mean carryover anymore :)

We just have a couple of questions with this change

1. The channel that presented the negative mean carryover is in the fourth row. Although its carryover is no longer negative, the mean spend adstocked is not equal to mean carryover + mean spend; it equals the mean carryover alone. Why is that?
2. In the first row, mean carryover + mean spend does not equal mean spend adstocked; there is a difference of 236K. Is that correct? Maybe it is, but we just want to understand the logic behind this change.
3. Continuing with point 2, we would like to understand whether there has been a substantial change in how the mean carryover is calculated, or whether you are just splitting it up or something similar. We ask because we want to understand the difference between this new model and the one with the negative carryover. Does the one with the negative carryover have a bug, or is the old model fine and we are now just using another interpretation of mean_carryover?

Thanks a lot, really!!

MarianaAlvarez1316 · 2023-05-23T14:10:06Z

Sorry, Gufeng. It's us again.

We asked these questions because, when using the robyn_recreate function, the negative mean_carryover remains and the ROIs change (they are very similar to the original model with the negative mean carryover).

We just wanted to understand the change in the ROI.

Thanks!

gufengzhou · 2023-06-01T05:38:38Z

I can't reproduce this issue. I'd need your dataset and your script for debugging.

Before, we were computing carryover_spend = adstocked_spend - immediate_spend, with immediate_spend == raw_spend. This holds when there's no lag in adstocking, so geometric and weibull_cdf don't have this issue. But weibull_pdf with shape > 2 introduces lags into adstocking, so adstocked_spend can be < raw_spend, which causes the negative carryover. The last change stopped setting immediate_spend == raw_spend for weibull_pdf and instead actually calculates the immediate part, which worked out. So I'm a bit surprised that this still happens. Probably edge cases.
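The two definitions in the comment above can be compared on a toy series. This base-R sketch assumes a hypothetical lagged adstock with hand-picked weights and takes the immediate part as the same-day share of spend (spend * weights[1]); it illustrates the idea, not Robyn's actual code. It shows that mean adstocked spend equals mean spend + mean carryover only under the old definition, while under the new one it equals mean immediate + mean carryover.

```r
# Hypothetical lagged adstock: same-day weight 0.2, peak at lag 1.
weights <- c(0.2, 1, 0.5, 0.1)
spend   <- c(100, 0, 0, 0, 0, 50, 0, 0)

# Convolve spend with the weights; zero-pad the front so early days are defined.
pad <- rep(0, length(weights) - 1)
adstocked <- tail(
  as.numeric(stats::filter(c(pad, spend), weights, method = "convolution", sides = 1)),
  length(spend)
)

# Old: immediate == raw spend, so carryover can be negative.
carryover_old <- adstocked - spend
# New: immediate is only the same-day share of the decayed spend.
immediate_new <- spend * weights[1]
carryover_new <- adstocked - immediate_new

print(c(
  mean_adstocked          = mean(adstocked),
  mean_spend_plus_old     = mean(spend) + mean(carryover_old),
  mean_immediate_plus_new = mean(immediate_new) + mean(carryover_new),
  mean_spend_plus_new     = mean(spend) + mean(carryover_new)
))
```

In this toy series the last sum is larger than the mean adstocked spend, which is one way the old identity can stop holding once immediate is no longer the raw spend; it is not a diagnosis of the 236K gap reported above.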

laresbernardo assigned gufengzhou Apr 25, 2023

laresbernardo added the bug label Apr 25, 2023

gufengzhou removed the bug label May 3, 2023

gufengzhou added a commit that referenced this issue May 9, 2023

recode: return immediate for lagged adstock #706

102ca54

gufengzhou closed this as not planned Jun 15, 2023
