Robyn error while running OutputModels <- code - Error in { : task 1 failed - "arguments imply differing number of rows: 19, 20" #619

Closed
richa-makhija opened this issue Feb 8, 2023 · 22 comments
Assignees
Labels
bug Something isn't working

Comments

@richa-makhija

richa-makhija commented Feb 8, 2023

Project Robyn

Describe issue

My code reads in all the inputs correctly, but running the OutputModels <- robyn_run() step fails with the error in the title.

InputCollect_us <- robyn_inputs(
  dt_input = data_orig_us,
  dt_holidays = dt_prophet_holidays,
  date_var = c("date"),
  dep_var = "cust_acquired",
  dep_var_type = "conversion",
  prophet_vars = c("trend", "season", "holiday"),
  prophet_country ="US",
  context_vars = c("if_payday","if_covid","if_holiday","fx","fee_avg","fee_promo_spend", "fx_promo_spend", "us_unemployment_rate", "us_inflation"), 
  paid_media_spends = c("paid_social_s", "paid_search_s", "brand_s", "uac_s", "upper_funnel_s", "houston_s", "all_other_s"),
  paid_media_vars = c("paid_social_i", "paid_search_i", "brand_i", "uac_i", "upper_funnel_s", "houston_s", "all_other_s"),
  ## Follow the same sequence in spends and vars. If using impressions, don't add spends again to vars; if there are no impressions, repeat the spend variable.
  organic_vars = c(),
  ## can be kept empty
  factor_vars = c("if_payday","if_covid","if_holiday"), 
  ## category or binary vars
  window_start = "2022-08-15",
  window_end = "2022-12-31",
  adstock = "weibull_cdf" 
  ##geometric is simpler
)
print(InputCollect_us)

################ Set Hyper Parameters ####################

plot_adstock(plot = FALSE)
plot_saturation(plot = FALSE)

##use same ranges
hyperparameters_us <- list(
  paid_social_s_alphas = c(0.5,3),
  paid_social_s_gammas = c(0.3, 1),
  paid_social_s_shapes = c(0.0001, 2),
  paid_social_s_scales = c(0, 0.1),
  
  paid_search_s_alphas = c(0.5, 3),
  paid_search_s_gammas = c(0.3, 1),
  paid_search_s_shapes = c(0.0001, 2),
  paid_search_s_scales = c(0, 0.1),
  
  brand_s_alphas = c(0.5, 3),
  brand_s_gammas = c(0.3, 1),
  brand_s_shapes = c(0.0001, 2),
  brand_s_scales = c(0, 0.1),
  
  uac_s_alphas = c(0.5, 3),
  uac_s_gammas = c(0.3, 1),
  uac_s_shapes = c(0.0001, 2),
  uac_s_scales = c(0, 0.1),
  
  upper_funnel_s_alphas = c(0.5, 3),
  upper_funnel_s_gammas = c(0.3, 1),
  upper_funnel_s_shapes = c(0.0001, 2),
  upper_funnel_s_scales = c(0, 0.1),
  
  houston_s_alphas = c(0.5, 3),
  houston_s_gammas = c(0.3, 1),
  houston_s_shapes = c(0.0001, 2),
  houston_s_scales = c(0, 0.1),
  
  all_other_s_alphas = c(0.5, 3),
  all_other_s_gammas = c(0.3, 1),
  all_other_s_shapes = c(0.0001, 2),
  all_other_s_scales = c(0, 0.1)
)
InputCollect_us <- robyn_inputs(InputCollect = InputCollect_us, hyperparameters = hyperparameters_us)
print(InputCollect_us)

OutputModels_us <- robyn_run(
  InputCollect = InputCollect_us, # feed in all model specification
  # cores = NULL, # default to max available
  # add_penalty_factor = FALSE, # Untested feature. Use with caution.
  lambda_control = lambda_min,
  iterations = 2000, # 2000 recommended for the dummy dataset with no calibration
  trials = 5, # 5 recommended for the dummy dataset
  outputs = FALSE # outputs = FALSE disables direct model output - robyn_outputs()
)

Error:

Input data has 365 days in total: 2022-01-01 to 2022-12-31
Initial model is built on rolling window of 139 day: 2022-08-15 to 2022-12-31
Fitting time series with all available data...
Using weibull_cdf adstocking with 30 hyperparameters (29 to iterate + 1 fixed) on 11 cores
>>> Starting 5 trials with 2000 iterations each using TwoPointsDE nevergrad algorithm...
  Running trial 1 of 5
  |                                                                                                                                                                                                  |   0%
Timing stopped at: 2.284 1.347 1.129
Error in { : 
  task 1 failed - "arguments imply differing number of rows: 19, 20"
In addition: Warning message:
In hyper_collector(InputCollect, hyper_in = InputCollect$hyperparameters,  :
  Provided train_size but ts_validation = FALSE. Time series validation inactive.

Provide reproducible example

Issues are often related to custom input data that is difficult to debug without. If necessary, please modify your data to mask real values and share a dataset that is able to reproduce the issue. Please also share your model configuration and exported JSON files if available.

Environment & Robyn version

R.version$version.string
[1] "R version 4.2.2 (2022-10-31)"
packageVersion("Robyn")
[1] ‘3.9.0’

@richa-makhija
Author

Some additional context: my data ranges from 1/1/22 to 12/31/22, but I want to run this model on 8/15 to 12/31. If I set window_start and window_end to 1/1 and 12/31, the model works.
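
For readers unfamiliar with the message itself: "arguments imply differing number of rows" is the error base R's data.frame() raises when it is given columns of unequal length. The sketch below reproduces the wording with two made-up vectors. It says nothing about where inside Robyn the mismatch arises, and the names x and y are purely illustrative.

```r
# Base R refuses to build a data frame from columns of unequal length.
# The vectors are made up for illustration; they are not Robyn objects.
x <- seq_len(19)
y <- seq_len(20)

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  {
    data.frame(x = x, y = y)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

A mismatch of exactly one row or column, as in the reports in this thread, usually means that one element was dropped or added somewhere upstream of the data.frame() call.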

@laresbernardo laresbernardo added the bug Something isn't working label Feb 8, 2023
@dmacoritto

dmacoritto commented Mar 10, 2023

@Amyhaoming, any updates on this issue? I am hitting the same error when trying to read and reproduce, in version 3.9.0, a model that was built with version 3.7.0.

The error message is the following:

Input data has 1369 days in total: 2019-01-01 to 2022-09-30
Initial model is built on rolling window of 290 day: 2021-12-15 to 2022-09-30
Warning: longer argument not a multiple of length of shorter
Attention for loop 1: immediate & carryover decomp don't sum up to total
Timing stopped at: 0.02 0 0.19
Error in { : 
  task 1 failed - "arguments imply differing number of rows: 25, 26"

And this is the code:

model_chfr_trans_saved <- Robyn::robyn_read(json_file = "MMM/Model/Robyn_202301091301_init/RobynModel-1_342_6.json")

InputCollect_chfr_trans_saved <- robyn_inputs(dt_input = input_mmm_chfr_transaction, 
                                              json_file = model_chfr_trans_saved)


dt_hyper_fixed_chfr_trans_saved <- model_chfr_trans_saved$ExportedModel$hyper_values
# select_model <- model_chfr_trans$ExportedModel$select_model


OutputCollect_chfr_trans_saved <- robyn_run(
  InputCollect = InputCollect_chfr_trans_saved,
  dt_hyper_fixed = dt_hyper_fixed_chfr_trans_saved,
  json_file = model_chfr_trans_saved,
  export = FALSE
)

This works perfectly in the 3.7.0 version

And the traceback

[image: traceback screenshot]

Not sure what is going on. I have spent some time investigating but couldn't find the cause.

Thanks for your help

@gufengzhou
Contributor

It looks like you're loading a model that was built with an older version and then recreating it with the new version. This might be the reason, especially because since 3.9 there is a new hyperparameter, train_size, that didn't exist before. Can you please rerun your model with the latest package, using the narrower ranges from your old model?

@dmacoritto

@gufengzhou, yes, it works with version 3.7. However, I am not sure this is due to the train_size hyperparameter, since it is created automatically (with a value of 1) when it is not detected.
Do you mean that models are not backward compatible? Will the ones created in 3.9 be incompatible with later versions?

@sahbakn

sahbakn commented Apr 4, 2023

@Amyhaoming Is there an update on this bug? I have tried different Robyn versions with the same dataset, and the error appears starting with version 3.9.0, which makes me think it relates to the addition of ts_validation in that version. However, setting ts_validation to FALSE or train_size to 1 does not fix the error. In my case, the error only shows up for certain modeling periods. For example, with 1 year of training data it works without issues, but not with 9 months or less. I am not trying to load an old model; all of these are models built from scratch.

@arturodz

arturodz commented Apr 7, 2023

Hi!

I can report that I am having the same issue. Interestingly, it is only happening with one of my clients. I tried different date ranges with no success in either version 3.9 or 3.10. I was able to run Robyn successfully for this client after going back to version 3.7.2.

@sahbakn

sahbakn commented May 11, 2023

@gufengzhou @laresbernardo Can you kindly take a look at this bug again? As I mentioned in my previous comment, I am not trying to load a model based on an older version. I am building a model from scratch and I see this error from version 3.9.0 onward for some datasets. I can send you a sample dataset through our Meta marketing science partner for debugging if that helps.

@laresbernardo
Collaborator

Hi @sahbakn, please do. That will help us debug and understand what's happening in your specific case. Thanks!

@SeanRichterWalsh

SeanRichterWalsh commented May 12, 2023

I am getting the same error message when I try to pass a character vector to context_vars

However, if I unhash and use the factor_vars argument, it works.

What am I misunderstanding here?

@laresbernardo laresbernardo self-assigned this May 12, 2023
@laresbernardo
Collaborator

laresbernardo commented May 12, 2023

@SeanRichterWalsh not sure that's actually the case here. What version are you on?
Note that in the demo we set "events", which is a character column, as one of the variables in "context_vars". When doing that, Robyn automatically detects this and sets it as one of the "factor_vars" even though the user doesn't do that manually. That behavior is printed as a message so that you, the user, know about it:

Automatically set these variables as 'factor_vars': 'events'

Also keep in mind that this message won't show up if you add robyn_inputs(..., factor_vars = "events", ...)
If you have a reproducible example and are using the latest dev version, please do share it in another ticket given I don't think these are related with the information provided @SeanRichterWalsh
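
To see in advance which context variables are likely to be treated as categorical, one option is to list the character or factor columns yourself. The following is a minimal sketch with a toy data frame. The column names are illustrative, and the check is an assumption about what "character column" means here, not a copy of Robyn's own detection logic.

```r
# Toy data; the column names are illustrative only.
dt <- data.frame(
  date = as.Date("2022-01-01") + 0:4,
  events = c("na", "promo", "na", "na", "launch"),
  competitor_sales = c(10.5, 11.2, 9.8, 10.1, 12.0),
  stringsAsFactors = FALSE
)
context_vars <- c("events", "competitor_sales")

# Flag the context variables stored as character or factor columns.
is_categorical <- vapply(
  dt[context_vars],
  function(col) is.character(col) || is.factor(col),
  logical(1)
)
auto_factor_vars <- names(is_categorical)[is_categorical]
print(auto_factor_vars)
```

Comparing this list with the message Robyn prints is a quick way to confirm that the variables you expect are being handled as factors.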

@SeanRichterWalsh

Thanks @laresbernardo . Yes, I am using the latest dev version. I restarted my R session and tried again with my dataset and it seems to work fine now. My own data set's events variable is now being auto-forced to factor as expected and I don't have to explicitly use the factor_vars argument. Not sure what was up earlier. Thanks.

@TrunckYagora

Hi,

we're running into the same issue with both version 3.9 and 3.10: task 1 failed - "arguments imply differing number of rows: 6, 7"
Has there been an update on this issue yet?

@laresbernardo
Collaborator

Hi @TrunckYagora we haven't been able to replicate this issue yet. Can you please provide a reproducible example that returns that error so it can help us debug?

@SeanRichterWalsh

Just a final comment from me on this. I know my issue may not be related but I did get a similar error message when exploring different modelling windows. I believe what may have caused it is my events variable having zero variance (all "na"). I mistakenly shortened the window too much and lost the events I had coded in the variable.
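
A column can vary over the full dataset and still be constant inside a shorter modeling window. The sketch below shows one way to check for that in base R before building the model. The helper constant_cols() is hypothetical (it is not part of Robyn), and the data is made up.

```r
# Hypothetical helper (not part of Robyn): names of the columns that take
# a single distinct value between `start` and `end`, inclusive.
constant_cols <- function(dt, date_col, start, end) {
  in_window <- dt[[date_col]] >= as.Date(start) & dt[[date_col]] <= as.Date(end)
  windowed <- dt[in_window, setdiff(names(dt), date_col), drop = FALSE]
  is_constant <- vapply(
    windowed,
    function(col) length(unique(col)) <= 1,
    logical(1)
  )
  names(is_constant)[is_constant]
}

# Toy data: `events` varies over the full range but not in the last 4 days.
dt <- data.frame(
  date = as.Date("2022-01-01") + 0:9,
  spend = c(5, 7, 6, 8, 9, 4, 6, 7, 5, 8),
  events = c("promo", "na", "launch", "na", "na", "na", "na", "na", "na", "na"),
  stringsAsFactors = FALSE
)

full_range <- constant_cols(dt, "date", "2022-01-01", "2022-01-10")
short_window <- constant_cols(dt, "date", "2022-01-07", "2022-01-10")
print(full_range)
print(short_window)
```

Running the check with the same dates you plan to pass as window_start and window_end shows which variables would need to be dropped or recoded for that window.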

@arturodz

arturodz commented May 15, 2023 via email

@laresbernardo
Collaborator

YES! That was it. We actually checked for no variance on raw input data before running robyn_engineering() and not afterwards. Now I've just fixed this issue by checking both and returning meaningful and helpful errors. Can any of you update and validate it's fixed on latest dev version? Thanks for the valuable hint @SeanRichterWalsh

@sahbakn

sahbakn commented May 15, 2023

@laresbernardo I looked into my data and during the modeling time frame that I was getting the error, I had one variable with 0 variance, fixing that I do not see the error anymore!

@SeanRichterWalsh

> YES! That was it. We actually checked for no variance on raw input data before running robyn_engineering() and not afterwards. Now I've just fixed this issue by checking both and returning meaningful and helpful errors. Can any of you update and validate it's fixed on latest dev version? Thanks for the valuable hint @SeanRichterWalsh

Oh great! A silly mistake on my part but I am glad it has helped lead to a resolution here. I can confirm that the latest dev version gives a very informative message when a variable has zero variance. Thanks a lot.

[image: screenshot of the zero-variance error message]

@laresbernardo
Collaborator

Great! Thanks for confirming. I'll check with @richa-makhija and @dmacoritto as well! This should have fixed the issue for everyone. Will close ticket after a week or confirmation.

@TrunckYagora

TrunckYagora commented May 16, 2023

Hi @laresbernardo,

thanks for the update. Interestingly enough, we receive the error even when we run the model without factor_vars. For privacy reasons I altered the original data, but the error can still be reproduced. You can find the data we're using to reproduce the error here:
mockup_data.csv

package Version is: ‘3.10.3.9000’

The code is as follows:

data("dt_prophet_holidays")
head(dt_prophet_holidays)
selected_dt <- read.csv("mockup.csv")

InputCollect <- robyn_inputs(
  dt_input = selected_dt,
  dt_holidays = dt_prophet_holidays,
  date_var = "date", # date format must be "2020-01-01"
  dep_var = "Umsatz", # there should be only one dependent variable
  dep_var_type = "revenue", # "revenue" (ROI) or "conversion" (CPA)
  prophet_vars = c("trend", "season", "holiday"), # "trend","season", "weekday" & "holiday"
  prophet_country = "DE", # input one country. dt_prophet_holidays includes 59 countries by default
  paid_media_spends = c("cost_dv360", "cost_fb_insta", "cost_pinterest"), # mandatory input
  paid_media_vars = c("impression_dv360", "impression_fb_insta", "impression_pinterest"), # mandatory.
  # paid_media_vars must have same order as paid_media_spends. Use media exposure metrics like
  # impressions, GRP etc. If not applicable, use spend instead.
  # organic_vars = "events", # marketing activity without media spend
  # factor_vars = c("events"), # force variables in context_vars or organic_vars to be categorical
  # window_start = min(selected_dt$date),
  # window_end = max(selected_dt$date),
  adstock = "geometric" # geometric, weibull_cdf or weibull_pdf.
)
print(InputCollect)

hyper_names(adstock = InputCollect$adstock, all_media = InputCollect$all_media)
# plot_adstock(plot = TRUE)
# plot_saturation(plot = TRUE)
hyper_limits()

# Example hyperparameters ranges for Geometric adstock
hyperparameters <- list(
  cost_dv360_alphas = c(0.5, 3),
  cost_dv360_gammas = c(0.3, 1),
  cost_dv360_thetas = c(0, 0.3),
  cost_fb_insta_alphas = c(0.5, 3),
  cost_fb_insta_gammas = c(0.3, 1),
  cost_fb_insta_thetas = c(0.1, 0.4),
  cost_pinterest_alphas = c(0.5, 3),
  cost_pinterest_gammas = c(0.3, 1),
  cost_pinterest_thetas = c(0.3, 0.8),
  train_size = c(0.3, 0.8)
)

InputCollect <- robyn_inputs(InputCollect = InputCollect, hyperparameters = hyperparameters)
print(InputCollect)

OutputModels <- robyn_run(
  InputCollect = InputCollect, # feed in all model specification
  cores = NULL, # NULL defaults to (max available - 1)
  iterations = 2000, # 2000 recommended for the dummy dataset with no calibration
  trials = 5, # 5 recommended for the dummy dataset
  ts_validation = TRUE, # 3-way-split time series for NRMSE validation.
  add_penalty_factor = FALSE # Experimental feature. Use with caution.
)
print(OutputModels)

@laresbernardo
Collaborator

@TrunckYagora thanks for reporting this and providing a reproducible example. The problem occurred when some variables were not being used and so weren't found when unselecting them. Can you please update to the latest dev version and check? With your dummy dataset you should now get this error:

> InputCollect <- robyn_inputs(InputCollect = InputCollect, hyperparameters = hyperparameters)
>> Running feature engineering...
NOTE: potential improvement on splitting channels for better exposure fitting. Threshold (Minimum R2) = 0.8 
  Check: InputCollect$plotNLSCollect outputs
  Weak relationship for: 'impression_dv360', 'impression_fb_insta', 'impression_pinterest' and their spend
Error in check_novar(dt_mod_model_window, InputCollect) : 
  There are 1 column(s) with no-variance: 'holiday'. 
Please, remove variable(s) to proceed...
Note that there's no variance when filtering the modeling window (2022-10-10:2022-12-03)

@TrunckYagora

Hi @laresbernardo

awesome - thank you very much for your support. In the latest dev version I get the error message you mentioned and we can run Robyn as usual with the correct parameters.

9 participants