Results differ at each run #323
Comments
Seems weird. I checked again quickly and all relevant randomness pulls from the random state. Are you running from the git code or from the release? Maybe try installing the git code, in case this is something we fixed already but haven't merged into a release. We're about to make a release today, so alternatively you could also wait a day.
I am running from the release (v0.8.0b1). FYI, things are OK when I specify the folds for cross-fitting "manually" (and they were not with the 0.7.0 version). I will wait for the new release and see if the problem persists.
Just to clarify, are the results differing when you both initialize and fit the model several times, or are you initializing just once and calling fit repeatedly? In the second case, the stored models' internal random state will evolve over time, and so seeing different results across fit calls would be expected.
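The distinction above can be illustrated with a small sketch. The class below is purely illustrative (it is not EconML's actual implementation): it stores a live `RandomState` at `__init__` and draws from it at each fit, so repeated fit calls on the same object see different randomness, while a freshly instantiated object reproduces the first run exactly.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy estimator mimicking the behavior described above.
# Names and structure are illustrative, not EconML's actual code.
class StatefulEstimator:
    def __init__(self, random_state=123):
        # A live RandomState is created once and then mutates across calls.
        self._random_state = np.random.RandomState(random_state)

    def fit_splits(self, n_samples):
        # Each call draws a fresh seed from the *evolving* RandomState,
        # so the crossfit splitter differs between fit calls.
        seed = self._random_state.randint(2**31)
        kf = KFold(n_splits=2, shuffle=True, random_state=seed)
        return [test.tolist() for _, test in kf.split(np.arange(n_samples))]

est = StatefulEstimator(random_state=123)
first = est.fit_splits(10)
second = est.fit_splits(10)   # same object: internal state has advanced
print(first == second)        # typically False: splits drift across fit calls

# Re-instantiating resets the state, reproducing the first result exactly.
print(StatefulEstimator(random_state=123).fit_splits(10) == first)  # True
```

This is why re-initializing the estimator restores reproducibility, while reusing the same object does not.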
@gcasamat after chatting with @kbattocchi this seems to be by design. You seem to be calling fit on the same object, but the random state is an evolving quantity within the object, and the crossfit splitter is initialized at fit time. So if you call init, then fit, and then fit again, the splitter is initialized with a different random state. To get the exact same results you should just re-instantiate the object, i.e.:

```python
est_forest_rf_train = ForestDMLCateEstimator(
    model_y=RandomForestRegressor(n_estimators=200, random_state=42),
    model_t=RandomForestRegressor(n_estimators=200, random_state=42),
    n_estimators=200,
    random_state=123)
est_forest_rf_train.fit(Y, T, X, W='blb')
est_forest_rf_train.const_marginal_effect_inference(X_test).summary_frame(alpha=0.05, value=0, decimals=3)

est_forest_rf_train = ForestDMLCateEstimator(
    model_y=RandomForestRegressor(n_estimators=200, random_state=42),
    model_t=RandomForestRegressor(n_estimators=200, random_state=42),
    n_estimators=200,
    random_state=123)
est_forest_rf_train.fit(Y, T, X, W='blb')
est_forest_rf_train.const_marginal_effect_inference(X_test).summary_frame(alpha=0.05, value=0, decimals=3)
```

Here you should be getting the same result both times. Let me know if you get otherwise.
I have tested what you propose and it works fine indeed. Thanks to both of you. I think it would be a nice feature to be able to initialize the model only once and then get the same results for each subsequent fit when random_state is specified.
@kbattocchi that seems reasonable. I checked sklearn and that's what it does. The fix is simple: we just need to move check_random_state to the beginning of fit instead of at init of _OrthoLearner.
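The proposed fix can be sketched as follows. This is an illustrative pattern, not EconML's actual _OrthoLearner code: the seed is stored at init, and check_random_state re-derives a fresh RandomState at the start of every fit, so repeated fits on the same object use identical randomness.

```python
import numpy as np
from sklearn.utils import check_random_state

# Sketch of the fix described above; class and method bodies are
# illustrative stand-ins, not EconML's actual _OrthoLearner.
class OrthoLearnerSketch:
    def __init__(self, random_state=None):
        # Store only the *seed* (or None), not a live RandomState object.
        self.random_state = random_state

    def fit(self, Y, T):
        # Re-derive a fresh RandomState at the start of every fit, so
        # repeated fit calls on the same object draw the same randomness.
        rng = check_random_state(self.random_state)
        # Stand-in for the crossfit fold assignment.
        return rng.permutation(len(Y))

est = OrthoLearnerSketch(random_state=123)
perm1 = est.fit(np.zeros(6), np.zeros(6))
perm2 = est.fit(np.zeros(6), np.zeros(6))
print((perm1 == perm2).all())  # True: repeated fits on one object agree
```

This is the same convention sklearn follows: estimators hold the seed and call check_random_state inside fit, which is what makes repeated fits reproducible without re-instantiation.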
@gcasamat I pushed some changes that would fix the issue you raised once the PR is merged. |
This is great! Thanks. |
I train a forest DML estimator and specify random_state. However, I get different results each time I run the same code.
It is surprising as I thought this bug had been corrected, as indicated in #252.