# higher intercept value for the outcome #73

Closed
wenyanyy opened this issue Apr 29, 2021 · 4 comments

Comments

@wenyanyy

[Attached screenshot: "Capture"]

Hi Team,

Sorry for asking so many questions. I ran the Robyn code on my own data, but in the results the intercept is far too high, at almost 100%, and every variable has a 0% coefficient. I tried three different datasets and ran into nearly the same problem each time. Could you please tell me how to fix this? For example, should I change the parameters or lambda in the function R file?

Best,

@gufengzhou
Contributor

Hi, are you saying that the waterfall plot looks normal with our simulated data, but not with your own data? That definitely sounds strange. How did you set the hyperparameters (set_hyperBoundLocal)?

@wenyanyy
Author

Hi,
You're right. When I used your simulated data, the results looked perfect. However, when I used my own data, they looked very strange, as you saw in the attachment. I didn't change the hyperparameter bounds. My data is monthly, so there are only 14 rows. Do you think that could be the reason for the very high intercept? Also, we don't want to see any variables with a 0 coefficient in the results. How can we avoid a 0 coefficient for any media channel spend variable? Thank you for your time; I look forward to your reply.

@gufengzhou
Contributor

gufengzhou commented Apr 30, 2021

Hey, 14 rows is a bit too sparse :) There's a rule of thumb about the relationship between n and p in your input data (n = number of rows, p = number of columns): n should be about 7 to 10 times p. For example, if you have 10 variables, the rule of thumb calls for 70-100 rows, and we recommend aiming for a multiplier of 10. With only 14 observations, I don't believe any technique can give you meaningful results. Therefore, I'd recommend breaking the time unit down to at least weekly, so that you'll have at least about 60 rows (14 months is roughly 14 * 52 / 12 ≈ 61 weeks).
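As a rough illustration of the n-to-p rule of thumb above, here is a minimal Python sketch. The function names `recommended_rows` and `has_enough_rows` are hypothetical helpers written for this example and are not part of Robyn; the 10-variable case is taken from the example in the comment, not from the reporter's actual dataset.

```python
def recommended_rows(p, multiplier=10):
    """Rows suggested by the rule of thumb n ≈ multiplier * p."""
    return multiplier * p


def has_enough_rows(n, p, multiplier=7):
    """True if n rows meet the rule of thumb at the given multiplier."""
    return n >= recommended_rows(p, multiplier)


def max_variables(n, multiplier=7):
    """Largest p that n rows can support at the given multiplier."""
    return n // multiplier


# Example from the comment: 10 variables call for 70 to 100 rows.
low = recommended_rows(10, multiplier=7)
high = recommended_rows(10, multiplier=10)
print(f"10 variables: {low} to {high} rows")

# With 14 monthly observations, even the lenient multiplier of 7
# leaves room for very few variables.
print("14 rows enough for 10 variables?", has_enough_rows(14, 10))
print("Variables supported by 14 rows:", max_variables(14))
```

A check like this only flags whether the dataset is in a plausible range; it says nothing about data quality or signal strength.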
Regarding avoiding 0 coefs: Robyn uses ridge regression, which reduces overfitting by shrinking the beta coefs. Predictors that correlate only weakly with your dependent variable will probably be reduced to 0. What we recommend is:

  1. Widen the hyperparameter ranges for the 0-coef channels on theta (maximum recommended range: c(0, 0.9)) and gamma (maximum recommended range: c(0.1, 1)) to give Robyn more freedom
  2. Split media into sub-channels, and/or aggregate similar channels, and/or introduce other media
  3. Increase the number of trials to get more samples

However, with only 14 rows, I'm afraid none of these will help. No data, no magic ;)
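To make the shrinkage effect mentioned above more concrete, here is a minimal Python sketch of ridge regression with a single predictor, using the textbook closed-form slope sum(xy) / (sum(x^2) + lambda) on centered data. This is not Robyn's implementation: the function `ridge_slope` and the toy `spend` and `sales` values are made up for illustration only. Note that in this plain form the coefficient moves toward zero as lambda grows rather than landing exactly on it.

```python
def ridge_slope(x, y, lam):
    """Closed-form ridge slope for one predictor, computed on centered data."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    # lam = 0 gives the ordinary least-squares slope;
    # larger lam pulls the slope toward zero.
    return sxy / (sxx + lam)


# Toy data (hypothetical): five periods of media spend and sales.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.1, 3.9, 6.2, 7.8, 10.1]

for lam in (0.0, 1.0, 10.0, 100.0):
    print(f"lambda={lam:>6}: slope={ridge_slope(spend, sales, lam):.3f}")
```

The weaker the relationship between a predictor and the outcome (a small sum(xy) relative to the penalty), the closer its slope ends up to zero for a given lambda, which matches the behaviour described in the comment.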

@wenyanyy
Author

wenyanyy commented May 1, 2021 via email
