Is this expected output from robyn_refresh? #507

Closed
openerror opened this issue Oct 5, 2022 · 7 comments

openerror commented Oct 5, 2022

Project Robyn

Describe issue

I refreshed my model once and was looking forward to seeing how per-channel ROI had changed. For context, my data are daily, I am adding 14 days of new observations, and refresh_steps = 7 in Robyn.

There are two surprises:

  1. I am only seeing one refresh, which accounts for 7 days of new data. Where is the rest of the data?
  2. Even the 7-day update is confined to report_actual_fitted.png. Unlike what the docs show (see the Model Refresh section), there is no before/after refresh comparison in report_decomposition.png. For the record, I didn't check the report_*.csv files.

Here are PNGs I have mentioned, and the code that I am running.
[Images: report_actual_fitted.png, report_decomposition.png, and a screenshot of the code (Screen Shot 2022-10-04 at 2 20 50 PM)]

Am I misusing Robyn or is this actually a bug?

Provide reproducible example

Data CSV and model JSON file here.

bug_report.csv
RobynModel-1_433_1.txt

Environment & Robyn version

R 4.2.1, Robyn dev version 8398f12

@laresbernardo laresbernardo self-assigned this Oct 5, 2022
@laresbernardo
Collaborator

Hi @openerror

  1. It is the expected outcome. If you are using refresh_steps = 7, it'll refresh 7 steps (days in your case) forward, starting from the end of your modeling window (window_end). In the following refresh, if you add an additional 7 steps, you'll then have the first and second refreshes covering the 14 following days.
  2. We actually don't "compare" previous with new sections of the time series. The R2 and all the metrics/errors provided are computed over the whole time series (previous + new data).

@laresbernardo
Collaborator

laresbernardo commented Oct 5, 2022

About point 2, you probably mean that report_decomposition.png doesn't contain the original model but only the refresh results, I see. Let me take a further look... What I think might be happening here: given that the directory didn't exist and Robyn created it for you, it can't find the original model's Robyn_* directory, and thus can't find the original JSON file to reproduce the results. I will find a way around it.

@openerror
Author

Thanks for getting back to me!

I see, so refresh_steps controls how much new data to use in each refresh. Say I have 14 days of new data in total, and I want to see how each successive 7 days of it affects ROIs and trained parameters; then I should set refresh_steps = 7 and call robyn_refresh twice?

If that's the case, IMO it would be great to make that behavior explicit in the documentation, both the R docstring and the Analyst Guide. Before, I was expecting that Robyn would use all available data, rolling forward in time automatically :)

About report_decomposition.png. So my image actually contains refresh results? Interesting. Without the (date) labels it is hard to tell. The Analyst Guide is showing an example (copied below) where refresh and base results lie in separate panels, and that kind of visual organization would really help make robyn_refresh usable.

Looking forward to your updates!

@laresbernardo
Collaborator

laresbernardo commented Oct 5, 2022

  • No, you can actually set refresh_steps = 14 and all 14 new periods (days) will be consumed, so there is no need to run twice.
  • Yes, that plot is what you should have (but updated visuals are now available). The issue you actually detected here is that we are not able to reproduce the original + the refresh results for this plot when the plots folder does not follow the default chained structure (example: ~anydir/Robyn_202209201828_init/Robyn_202210050808_rf1/RobynModel-1_20_10.json). For now, you're only looking at your refresh window plotted.
  • The documentation you refer to says: "(refresh_steps) controls how many time units the refresh model build move forward". I think it's pretty explicit that it'll move N steps forward based on the user's input. Would you suggest any other wording?
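
To make the "default chained structure" concrete, here is a small base-R sketch that lists the Robyn_* folders along a model JSON path and checks whether an _init folder is part of it. The helper names chain_dirs and has_init are hypothetical and this is not Robyn's internal lookup logic; it only illustrates the folder naming shown in the example path above.

```r
# Illustrative only: NOT how Robyn resolves the original model internally.
# chain_dirs() and has_init() are hypothetical helper names.

# Return the Robyn_<timestamp>_init / Robyn_<timestamp>_rfN folders in a path.
chain_dirs <- function(json_path) {
  parts <- strsplit(json_path, "/", fixed = TRUE)[[1]]
  parts[grepl("^Robyn_[0-9]+_(init|rf[0-9]+)$", parts)]
}

# TRUE when the path contains an initial-model folder.
has_init <- function(json_path) {
  any(grepl("_init$", chain_dirs(json_path)))
}

# Chained path quoted in the comment above.
chained <- "~anydir/Robyn_202209201828_init/Robyn_202210050808_rf1/RobynModel-1_20_10.json"
# Made-up example of a JSON stored outside the chained structure.
loose <- "~anydir/plots/RobynModel-1_433_1.json"

chain_dirs(chained)
has_init(chained)
has_init(loose)
```

A path that yields no _init folder is the situation described above, where only the refresh window ends up plotted.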

@openerror
Author

  • I am aware that I can set refresh_steps = 14 and use all my new data at once; in the comments above I was discussing a different use case, where I want to roll forward and ingest only part of the new data at each step.
  • Regarding the documentation wording: how about adding another sentence at the end of what you quoted? The complete description would look like this:

#' @param refresh_steps Integer. It controls how many time units the refresh
#' model build move forward. For example, \code{refresh_steps = 4} on weekly data
#' means the InputCollect$window_start & InputCollect$window_end move forward
#' 4 weeks. If refresh_steps is smaller than the number of newly provided data points,
#' then Robyn would only use part of the new data.
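
The proposed wording can be checked with a little date arithmetic. The base-R sketch below assumes daily data and made-up dates; shift_window is a hypothetical helper, not a Robyn function. It moves both ends of the window forward by refresh_steps and reports how many of the newly provided days fall inside the refreshed window.

```r
# Hypothetical helper for illustration; not part of Robyn.
# Assumes daily data, so one step equals one calendar day.
shift_window <- function(window_start, window_end, refresh_steps, last_data_date) {
  # Days of new data available after the old window_end.
  new_days <- as.integer(last_data_date - window_end)
  list(
    window_start    = window_start + refresh_steps,
    window_end      = window_end + refresh_steps,
    new_days_used   = min(refresh_steps, new_days),
    new_days_unused = max(new_days - refresh_steps, 0L)
  )
}

# Made-up dates: 14 new daily observations, refreshed with refresh_steps = 7.
w <- shift_window(
  window_start   = as.Date("2022-01-01"),
  window_end     = as.Date("2022-09-20"),
  refresh_steps  = 7L,
  last_data_date = as.Date("2022-10-04")
)
w$window_end       # window_end moved forward by 7 days
w$new_days_unused  # new days left for a later refresh
```

With these inputs, half of the 14 new days are left over for a later refresh, which is the case the added sentence is meant to spell out.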

@laresbernardo
Collaborator

laresbernardo commented Oct 5, 2022

For now, if you follow the standard chained method, you'll be able to generate the right plots, because we can fetch the JSON files correctly. Sharing some screenshots of what you might expect:

Following the path (and using the dummy data): ~/bernardolares/Desktop/Robyn_202210051427_init/Robyn_202210051450_rf1/Robyn_202210051502_rf2/RobynModel-1_21_14.json

report_actual_fitted

report_decomposition
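
For the rolling use case discussed earlier (two successive 7-day refreshes), a chained run might look roughly like the sketch below. The robyn_refresh arguments are the ones that appear elsewhere in this thread. The wrapper name refresh_7_days and the object names in the commented usage are hypothetical, and the JSON paths are placeholders to be replaced with the files Robyn actually exports.

```r
# A minimal sketch, assuming the Robyn package is installed and that the
# arguments behave as described in this thread. refresh_7_days is a
# hypothetical wrapper, not a Robyn function.
refresh_7_days <- function(json_file, dt_input, dt_holidays) {
  Robyn::robyn_refresh(
    json_file = json_file,      # JSON of the model to refresh from
    dt_input = dt_input,        # data including the new observations
    dt_holidays = dt_holidays,
    refresh_steps = 7,          # move the window 7 steps (days) forward
    refresh_mode = "manual",
    refresh_iters = 500,
    refresh_trials = 1
  )
}

# Usage (not run; my_data, my_holidays and the paths are placeholders):
# rf1 <- refresh_7_days("<..._init>/<initial model>.json", my_data, my_holidays)
# rf2 <- refresh_7_days("<..._init>/<..._rf1>/<rf1 model>.json", my_data, my_holidays)
```

The second call starts from the JSON written by the first refresh, so the output folders nest as _init/_rf1/_rf2, as in the path above.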

@CJ2407

CJ2407 commented Aug 2, 2023

Hi @laresbernardo @gufengzhou - I have some questions about the Refresh functionality -

  1. My refresh output shows every variable compared between the initial and refresh models, instead of clubbing the baseline and promo variables together. Because of this, the chart is unreadable for most media channels in terms of % decomp. See below:
    [image]

  2. Are the % decomp and ROAS shown for refresh model 1_2_28[1] for the new time period I added in the refresh, or for the rolling window of 1185 days, i.e. the period from 2020-03-03 to 2023-05-31? I am asking because of the messages that were printed while the refresh model was running. The refresh code is below:

    json_file <- "/mnt/data/cherry/MMM/FP/Robyn_202307132108_init/RobynModel-3_60_31.json"

    robyn_object <- "/mnt/data/cherry/MMM/FP/Robyn_202307132108_init/"

    RobynRefresh <- robyn_refresh(
      json_file = json_file,
      dt_input = fp_m1_rf1_daily_data,
      dt_holidays = dt_prophet_holidays,
      refresh_steps = 396,
      refresh_mode = "manual",
      refresh_iters = 500, # 1k is an estimation
      refresh_trials = 1
    )

    [Screenshot: messages printed while the refresh was running]

  3. When I was iterating non-stop to find the best initial model, I didn't realize that the JSON file was not being created. So, once I found a good model that our leadership liked, I had to re-run the model with certain hyperparameters fixed to get the desired model. Is this the reason why most of my hyperparameters are reported as fixed while the refresh is running (see the comment highlighted in yellow in the above screenshot)? Generally, when we run a refresh, does the refresh model use exactly the same hyperparameter values for each channel as the initial model? If so, how does the model adjust the delay or decay rate as channel strategies or behaviors change?
