Negative Mean CarryOver #706

Closed
MarianaAlvarez1316 opened this issue Apr 25, 2023 · 11 comments

@MarianaAlvarez1316

Project Robyn

Describe issue

Hi!
I was checking my resulting models and noticed that the mean_carryover for channel PAID_4 is negative for all the model solutions available in the pareto_aggregated CSV. Below you can see an example I took from one model:

image

For a specific model I tried to replicate the mean_carryover using the robyn_response function, but I failed.

I think it is a bug in the code. Could you please help me?

Thanks in advance :)

Provide reproducible example

Here is an example data set and how I ran the code:

DATA_BUG_GITHUB.csv


robyn_object <- "~/Desktop/MyRobyn.RDS"

###################################################################################################################

InputCollect <- robyn_inputs(
  dt_input = base_final,
  dt_holidays = dt_prophet_holidays,
  date_var = "DATE", # date format must be "2020-01-01"
  dep_var = "REVENUE", # there should be only one dependent variable
  dep_var_type = "revenue", # "revenue" (ROI) or "conversion" (CPA)
  # Components we want Prophet to decompose the series into
  prophet_vars = c("trend", "season", "holiday"), # "trend","season", "weekday" & "holiday"
  prophet_country = "MX", # input one country. dt_prophet_holidays includes 59 countries by default
  context_vars = c("CONTEXT_1",'CONTEXT_2','CONTEXT_3','CONTEXT_4','CONTEXT_5','CONTEXT_6'
                   ,'CONTEXT_7'), # e.g. competitors, discount, unemployment etc
  paid_media_spends = c("PAID_1_S","PAID_2_S","PAID_3_S","PAID_4_S"),  # mandatory input
  # NOTE: you changed the following line
  paid_media_vars = c("PAID_1_I","PAID_2_I","PAID_3_I","PAID_4_I"),  # mandatory.
  # paid_media_vars must have same order as paid_media_spends. Use media exposure metrics like
  # impressions, GRP etc. If not applicable, use spend instead.
  #organic_vars = "newsletter", # marketing activity without media spend
  factor_vars = c("CONTEXT_1"), # force variables in context_vars or organic_vars to be categorical
  window_start = "2018-12-31",
  window_end = "2023-01-23",
  adstock = "weibull_pdf" # geometric, weibull_cdf or weibull_pdf.
)
print(InputCollect)


###########################################################################################################

hyperparameters <- list(
  ######### PAID_1 ###########
  PAID_1_S_alphas = c(0.7, 1)
  ,PAID_1_S_gammas = c(0.3, 1)
  # ADSTOCK
  ,PAID_1_S_scales = c(0.00441,0.023)
  ,PAID_1_S_shapes = c(2.0001, 10)
  
  ######### PAID_2 ############
  ,PAID_2_S_alphas = c(1, 1.3)
  ,PAID_2_S_gammas = c(0.3, 1)
  # ADSTOCK
  ,PAID_2_S_scales = c(0.00018,0.001)
  ,PAID_2_S_shapes = c(2.0001, 10)
  
  ######### PAID_3 ###########
  ,PAID_3_S_alphas = c(1, 1.6)
  ,PAID_3_S_gammas = c(0.3, 1)
  # ADSTOCK
  ,PAID_3_S_scales = c(0.00018,0.001)
  ,PAID_3_S_shapes = c(2.0001, 10)
  
  ######### PAID_4 ###########
  ,PAID_4_S_alphas = c(0.6, 1)
  ,PAID_4_S_gammas = c(0.3, 1)
  # ADSTOCK
  ,PAID_4_S_scales = c(0.00441,0.02509)
  ,PAID_4_S_shapes = c(2.0001, 10)
  
  
  #,train_size = c(0.7, 0.8)

)


###################################################################################################################

InputCollect <- robyn_inputs(InputCollect = InputCollect, hyperparameters = hyperparameters)
print(InputCollect)

###################################################################################################################

OutputModels <- robyn_run(
  InputCollect = InputCollect, # feed in all model specification
  cores = NULL, # NULL defaults to max available - 1
  iterations = 7000, # 2000 recommended for the dummy dataset with no calibration
  trials = 10, # 5 recommended for the dummy dataset
  # New feature
  ts_validation = FALSE, # 3-way-split time series for NRMSE validation.
  add_penalty_factor = FALSE, # Experimental feature. Use with caution.
  #outputs = FALSE # outputs = FALSE disables direct model output - robyn_outputs()
)
print(OutputModels)

####################################################################################################################

## Calculate Pareto optimality, cluster and export results and plots. See ?robyn_outputs
OutputCollect <- robyn_outputs(
  InputCollect, OutputModels,
  pareto_fronts = "auto", # automatically pick how many pareto-fronts to fill min_candidates
  # min_candidates = 100, # top pareto models for clustering. Default to 100
  # calibration_constraint = 0.1, # range c(0.01, 0.1) & default at 0.1
  csv_out = "pareto", # "pareto", "all", or NULL (for none)
  clusters = TRUE, # Set to TRUE to cluster similar models by ROAS. See ?robyn_clusters
  plot_pareto = FALSE, # Set to FALSE to deactivate plotting and saving model one-pagers
  plot_folder = robyn_object, # path for plots export
  export = TRUE # this will create files locally
)
print(OutputCollect)

Environment & Robyn version

Make sure you're using the latest Robyn version before you post an issue.

  • Check and share Robyn version: packageVersion("Robyn")
    I'm using version ‘3.10.3.9000’

  • R version (Please, check and share: sessionInfo() or R.version$version.string)
    image

@laresbernardo
Collaborator

@laresbernardo added the bug label Apr 25, 2023
@gufengzhou
Contributor

After digging into the issue, I can confirm that it's not a bug. This happens only when using weibull_pdf with shape > 1, meaning the peak is lagged. In the plot below we can clearly see the lag effect: there are days where the adstocked spend < raw spend. Because carryover spend = adstocked spend - raw spend, the carryover can end up negative for these days.

I agree that it's very unintuitive, but the math is right. I don't see the necessity to change it now. Or are you running into problems with this?

image
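
To make the lag effect concrete, here is a minimal R sketch. It is a simplified illustration, not Robyn's actual adstock code: the helper names (`weibull_pdf_weights`, `adstock`), the spend series, and the shape/scale values are all assumptions chosen for the example. It only shows how a lagged peak can push adstocked spend below raw spend, so that `adstocked - raw` turns negative on spend days.

```r
# Simplified illustration only; NOT Robyn's actual adstock code.
# All names and numbers below are made up for the example.

# Weibull PDF decay weights, rescaled so the peak weight equals 1.
weibull_pdf_weights <- function(n, shape, scale) {
  w <- dweibull(seq_len(n), shape = shape, scale = scale)
  w / max(w)
}

# Adstocked spend: each period's spend is spread over the same and later
# periods according to the weights (w[1] applies to the spend period itself).
adstock <- function(x, w) {
  n <- length(x)
  out <- numeric(n)
  for (j in seq_len(n)) {
    idx <- j:n
    out[idx] <- out[idx] + x[j] * w[seq_len(n - j + 1)]
  }
  out
}

raw <- c(100, 0, 0, 50, 0, 0, 0, 0)
w <- weibull_pdf_weights(length(raw), shape = 3, scale = 3)
adstocked <- adstock(raw, w)

# "Old" definition from this thread: carryover = adstocked - raw.
carryover_old <- adstocked - raw

print(round(data.frame(raw, adstocked, carryover_old), 2))
```

Because the weight on the spend period itself is below the peak weight, the first period keeps only part of its own spend, and subtracting the full raw spend gives a negative number there.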

@gufengzhou removed the bug label May 3, 2023
@MarianaAlvarez1316
Author

Hi Gufeng!

Thanks for the reply, we do see the lag. The only thing that we still don't understand is why the mean carryover is negative. From your graph we can see that at certain points the carryover spend will be negative, and if we run the robyn_response function

Response <- robyn_response(
  InputCollect = InputCollectX,
  OutputCollect = OutputCollectX,
  select_model = OutputCollectX$selectID,
  metric_name = "PAID_4_S"
)

in the Response$input_carryover vector we do see some negative entries. But when we calculate mean(Response$input_carryover) the result is positive. Do you know why? How do you calculate the mean carryover?

@gufengzhou
Contributor

Hi, I've just pushed a commit to return the partial immediate & carryover effect for lagged adstock (weibull_pdf). Can you please update and check? There should be no negative value any more.

@MarianaAlvarez1316
Author

Hi Gufeng!
We reran the model, but we continue to see the negative mean_carryover.

image

We used the Robyn::robyn_update() line to update Robyn. Is it correct? Are we missing something?

@laresbernardo
Collaborator

We used the Robyn::robyn_update() line to update Robyn. Is it correct? Are we missing something?

Yes, that's fine. That updates Robyn to latest dev version. Did you refresh your R session so that you load the latest updated version before retrying?

@MarianaAlvarez1316
Author

Hi Bernardo,

Yes, we refreshed the R session. We also restarted the computer, but we keep getting a negative carryover.

image

Though something did change, because we are getting different results for the same data with the same restrictions. Not crazy different results, but they do differ.

gufengzhou added a commit that referenced this issue May 15, 2023
- weibull_pdf creates a lagged peak and thus negative carryover, because the previous calculation was carryover = total - raw, and with lagged peaks the total adstocked spend can be lower than the raw spend. The new calculation no longer takes raw spend as immediate, but derives immediate from the actual lagged decay matrix. This restores the relationship total = carryover + immediate with no negative values.
- This is the 3rd code snippet change (robyn_response) that impacts the csv output. The other two, in transformation.R and allocator.R, were fixed previously. Will look into functionalizing the 3 places.
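
One way to picture the decomposition described in this commit message is the hedged R sketch below. It is a hypothetical illustration, not the code from the commit: `decay_matrix`, `split_adstock`, the spend series, and the Weibull parameters are all assumptions. The sketch builds a matrix whose column j holds the decayed contribution of period j's spend, reads the immediate part off the diagonal, and takes the carryover as the sum of the off-diagonal entries in each row.

```r
# Hypothetical sketch; Robyn's own implementation may differ.

# Row i = period receiving the effect, column j = period the spend occurred.
decay_matrix <- function(x, w) {
  n <- length(x)
  m <- matrix(0, nrow = n, ncol = n)
  for (j in seq_len(n)) {
    m[j:n, j] <- x[j] * w[seq_len(n - j + 1)]
  }
  m
}

split_adstock <- function(x, w) {
  m <- decay_matrix(x, w)
  immediate <- diag(m)          # effect of a period's spend in that same period
  m_off <- m
  diag(m_off) <- 0
  carryover <- rowSums(m_off)   # effect arriving from earlier periods
  list(total = rowSums(m), immediate = immediate, carryover = carryover)
}

raw <- c(100, 0, 0, 50, 0, 0, 0, 0)
w <- dweibull(seq_along(raw), shape = 3, scale = 3)
w <- w / max(w)                 # assumed rescaling: peak weight equals 1

parts <- split_adstock(raw, w)
print(round(as.data.frame(parts), 2))
```

With non-negative spend and weights, every matrix entry is non-negative, so both parts are non-negative and they add up to the total by construction.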
@gufengzhou
Contributor

Second try at fixing the negative value. Please update and let us know.

@MarianaAlvarez1316
Author

Hi Gufeng!

You made it, we don't see the negative mean carryover anymore :)

image

We just have a couple of questions about this change:

  1. The channel that presented the negative mean carryover is in the fourth line. Although the carryover is no longer negative, the mean spend adstocked is not equal to the sum of mean carryover + mean spend, but just the mean carryover again. Why is that?

  2. In the first row the mean carryover + mean spend is not equal to the mean spend adstocked; there is a difference of 236K. Is that correct? Maybe it is, but we just want to understand the logic behind this change.

  3. Continuing with the idea of point 2, we would like to understand whether there has been a substantial change in the way the mean carryover is calculated, or whether you are just splitting it up or something similar. We ask because we want to understand the difference between this new model and the one with the negative carryover. Does the one with the negative carryover have a bug? Or is the old model fine and we are now just using another interpretation of the mean_carryover?

Thanks a lot, really!!

@MarianaAlvarez1316
Author

Sorry, Gufeng. It's us again.

We asked the previous questions because, when using the robyn_recreate function, the negative mean_carryover remains and the ROIs change (they are very similar to the original model with negative mean carryover).

image

We just wanted to understand the change in the ROI.

Thanks!

@gufengzhou
Contributor

gufengzhou commented Jun 1, 2023

I can't reproduce this issue. I'd need your dataset and your script for debugging.

Before, we were doing carryover_spend = adstocked_spend - immediate_spend, with immediate_spend == raw_spend. This is valid when there's no lag in adstocking, so geometric & weibull_cdf don't have this issue. But with weibull_pdf and shape > 2, which introduces lags into adstocking, adstocked_spend can be < raw_spend, which causes the negative carryover. The last change was to stop setting immediate_spend == raw_spend for weibull_pdf and to actually calculate the immediate part instead, which worked. So I'm a bit surprised that this still happens. Probably edge cases.
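
For contrast, the no-lag case can be sketched with a plain geometric adstock in R. This is a minimal illustration under assumptions (the helper `geometric_adstock`, the spend series, and `theta = 0.5` are made up, and this is not Robyn's code). Each period keeps its full raw spend and adds a decayed share of the previous adstocked value, so adstocked spend never falls below raw spend and `adstocked - raw` stays non-negative.

```r
# Illustrative only; helper name and numbers are assumptions, not Robyn's code.

# Geometric adstock: adstocked[t] = x[t] + theta * adstocked[t - 1]
geometric_adstock <- function(x, theta) {
  as.numeric(stats::filter(x, filter = theta, method = "recursive"))
}

raw <- c(100, 0, 0, 50, 0, 0, 0, 0)
adstocked <- geometric_adstock(raw, theta = 0.5)

# With no lag, immediate == raw, so the simple subtraction is safe.
carryover <- adstocked - raw

print(data.frame(raw, adstocked, carryover))
```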

@gufengzhou closed this as not planned Jun 15, 2023