
Training Data Selection Logic changed #81

Closed
sujithnair2021 opened this issue May 17, 2021 · 3 comments

Comments

@sujithnair2021

Issue

The training-size logic in the code seems to have been modified and no longer matches the Step by Step guide: https://facebookexperimental.github.io/Robyn/docs/step-by-step-guide

Can you give some insight on how the training data selection works now?

@gufengzhou
Contributor

Hey, yes, you're right that we haven't updated the guide, sorry about that. We've currently removed the time-series validation in preparation for our upcoming release: the "rolling window update" for continuous reporting. Previously, you could use, for example, 80% of the data to build the model and hold out 20% as an out-of-sample test set, and we reported the R-squared on the test set. However, when updating the result continuously, we want to always include the latest data in the model, which conflicts with time-series validation.

Ultimately, we see Robyn as more of a decomposition tool than a time-series forecasting tool, and the potential overfitting issue is already addressed by ridge regression. Therefore we've removed the time-series validation, meaning that at the moment the model is built on the entire dataset and only the "train R squared" is reported. With the upcoming update, users will be able to select the window for modelling and the cadence of updates. Hope that helps.
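To make the current behavior concrete, here is a minimal sketch (in Python with scikit-learn, not Robyn's actual R code) of what "fit on the entire dataset and report only the train R-squared" looks like with ridge regression. The spend columns, coefficients, and noise level are all made up for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n_weeks = 104

# Hypothetical weekly media spend (e.g. TV, search, social) and sales.
X = rng.uniform(0, 100, size=(n_weeks, 3))
y = X @ np.array([0.4, 0.8, 0.2]) + rng.normal(0, 5, n_weeks)

# Fit on the entire window: no train/test split, as described above.
model = Ridge(alpha=1.0)
model.fit(X, y)

# Only the "train R squared" is reported.
train_r2 = r2_score(y, model.predict(X))
print(f"train R^2: {train_r2:.3f}")
```

The ridge penalty (`alpha`) shrinks the coefficients, which is the regularization the comment above refers to when it says overfitting is "already addressed by ridge regression".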

@sujithnair2021
Author

Yes, that helps. Thank you! Looking forward to the rolling window update feature!

@albertsalgueda

Hi!

We have created several Robyn models, and our customers are a bit worried about the lack of a train/test split or cross-validation.

I agree that ridge regression helps prevent overfitting; however, I am not sure it is enough. I wonder whether Nevergrad should optimize towards the R-squared on test data.
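For what it's worth, a test-set objective like the one suggested here could be sketched as follows (again in Python/scikit-learn, purely illustrative; `holdout_test_r2` is a hypothetical helper, not part of Robyn or Nevergrad). The key point for time-series data is that the split must be chronological, not shuffled:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

def holdout_test_r2(X, y, alpha, train_frac=0.8):
    """Chronological split: fit on the earlier window, score on the
    later hold-out, so the test set is truly out-of-sample."""
    split = int(len(y) * train_frac)
    model = Ridge(alpha=alpha).fit(X[:split], y[:split])
    return r2_score(y[split:], model.predict(X[split:]))

# Hypothetical weekly media/sales data for illustration only.
rng = np.random.default_rng(1)
X = rng.uniform(0, 100, size=(104, 3))
y = X @ np.array([0.4, 0.8, 0.2]) + rng.normal(0, 5, 104)

score = holdout_test_r2(X, y, alpha=1.0)
print(f"hold-out R^2: {score:.3f}")
```

An optimizer such as Nevergrad could then maximize this score (or minimize its negative) over hyperparameters, which is essentially the behavior the old guide described.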

Even though it is a decomposition model, several functionalities regarding future estimates are key and already implemented (e.g. the Budget Allocator), so the model does have predictive utility.

Customers prefer a model that can generalize, or at least we can prove it does.

Are you guys considering including that functionality again? Maybe as an option?
It would be very helpful!

Thank you for your amazing work; we are really happy with Robyn and hope to start using it soon.
