
Training Data Selection Logic changed #81

Closed
sujithnair2021 opened this issue May 17, 2021 · 3 comments

Comments

@sujithnair2021

Issue

The training-size logic in the code seems to have been modified and no longer matches the Step by Step guide: https://facebookexperimental.github.io/Robyn/docs/step-by-step-guide

Can you give some insight on how the training data selection works now?

@gufengzhou
Contributor

Hey, yes, you're right that we haven't updated the guide, sorry about that. We've currently removed the time-series validation in preparation for our upcoming release: the "rolling window update" for continuous reporting. Previously, you could use, for example, 80% of the data to build the model and hold out 20% as an out-of-sample test set, and we reported the R-squared on the test set. However, when updating the result continuously, we want to always include the latest data in the model, which conflicts with time-series validation.

Ultimately, we see Robyn as more of a decomposition tool than a time-series forecasting tool, and the potential overfitting issue is already addressed by ridge regression. Therefore we've removed the time-series validation, meaning that at the moment the model is built on the entire dataset and only the "train R squared" is reported. With the upcoming update, users will be able to select the window for modelling and the cadence of updates. Hope that helps.
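To make the current behavior concrete, here is a minimal sketch (in Python with scikit-learn, not Robyn's actual R code) of what "fit on the entire dataset and report only the train R-squared" looks like with ridge regression. The spend columns, coefficients, and noise level are all made up for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n_weeks = 104

# Hypothetical weekly media spend (e.g. TV, search, social) and sales.
X = rng.uniform(0, 100, size=(n_weeks, 3))
y = X @ np.array([0.4, 0.8, 0.2]) + rng.normal(0, 5, n_weeks)

# Fit on the entire window: no train/test split, as described above.
model = Ridge(alpha=1.0)
model.fit(X, y)

# Only the "train R squared" is reported.
train_r2 = r2_score(y, model.predict(X))
print(f"train R^2: {train_r2:.3f}")
```

The ridge penalty (`alpha`) shrinks the coefficients, which is the regularization the comment above refers to when it says overfitting is "already addressed by ridge regression".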

@sujithnair2021
Author

Yes, that helps. Thank you! Looking forward to the rolling window update feature!

@albertsalgueda

Hi!

We have created several Robyn models, and our customers are a bit worried about the lack of a train/test split or cross-validation.

I agree that ridge regression helps prevent overfitting; however, I am not sure it is enough. I wonder whether Nevergrad should optimize towards the R-squared on test data.
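For what it's worth, a test-set objective like the one suggested here could be sketched as follows (again in Python/scikit-learn, purely illustrative; `holdout_test_r2` is a hypothetical helper, not part of Robyn or Nevergrad). The key point for time-series data is that the split must be chronological, not shuffled:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

def holdout_test_r2(X, y, alpha, train_frac=0.8):
    """Chronological split: fit on the earlier window, score on the
    later hold-out, so the test set is truly out-of-sample."""
    split = int(len(y) * train_frac)
    model = Ridge(alpha=alpha).fit(X[:split], y[:split])
    return r2_score(y[split:], model.predict(X[split:]))

# Hypothetical weekly media/sales data for illustration only.
rng = np.random.default_rng(1)
X = rng.uniform(0, 100, size=(104, 3))
y = X @ np.array([0.4, 0.8, 0.2]) + rng.normal(0, 5, 104)

score = holdout_test_r2(X, y, alpha=1.0)
print(f"hold-out R^2: {score:.3f}")
```

An optimizer such as Nevergrad could then maximize this score (or minimize its negative) over hyperparameters, which is essentially the behavior the old guide described.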

Even though it is a decomposition model, several functionalities regarding future estimates are key and already implemented (e.g. the Budget Allocator), so the model does have predictive utility.

Customers prefer a model that can generalize, or at least we can prove it does.

Are you guys considering including that functionality again? Maybe as an option?
It would be very helpful!

Thank you for your amazing work; we are really happy with Robyn and hope to start using it soon.
