
Results differ at each run #323

Closed
gcasamat opened this issue Nov 18, 2020 · 8 comments · Fixed by #325
Comments

@gcasamat

I train a forest DML estimator and specify random_state:

from econml.dml import ForestDMLCateEstimator
from sklearn.ensemble import RandomForestRegressor

est_forest_rf_train = ForestDMLCateEstimator(
                            model_y = RandomForestRegressor(n_estimators = 200, random_state = 42),
                            model_t = RandomForestRegressor(n_estimators = 200, random_state = 42),
                            n_estimators = 200,
                            random_state = 123)

However, I get different results each time I run the following code:

est_forest_rf_train.fit(Y, T, X, W = 'blb')
est_forest_rf_train.const_marginal_effect_inference(X_test).summary_frame(alpha = 0.05, value = 0, decimals = 3)

This is surprising, as I thought this bug had been corrected, as indicated in #252.

@vsyrgkanis
Collaborator

Seems weird. I checked again quickly and all relevant randomness pulls from the random state. Are you running from the git code or from the release? Maybe try installing the git code, in case this is something we already fixed but have not yet merged into a release.

We’re about to make a release today, so alternatively you could also wait a day.

@gcasamat
Author

I am running from the release (v0.8.0b1). FYI, things are OK when I specify the folds for cross-fitting "manually" (they were not with version 0.7.0). I will wait for the new release and see if the problem persists.
Thanks!

@kbattocchi
Collaborator

Just to clarify, are the results differing when you both initialize and fit the model several times, or are you initializing just once and calling fit repeatedly? In the second case, the stored models' internal random state will evolve over time and so seeing different results across fit calls would be expected.
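This evolving-state behavior can be shown with a minimal, hypothetical sketch (a toy class, not econml's actual internals): if an estimator materializes its RandomState once at construction and draws from it on every fit, each fit consumes fresh entropy from the same stream.

```python
from numpy.random import RandomState

class ToyEstimator:
    """Hypothetical model that seeds its RandomState once, in __init__."""
    def __init__(self, random_state=None):
        self._rng = RandomState(random_state)  # shared across all fits

    def fit(self):
        # Each call advances the shared stream, so the cross-fitting
        # "split" would differ between consecutive fits of one object.
        return self._rng.rand()

est = ToyEstimator(random_state=123)
first, second = est.fit(), est.fit()
print(first == second)                       # False: state evolved between calls

# Re-instantiating with the same seed resets the stream.
fresh = ToyEstimator(random_state=123).fit()
print(first == fresh)                        # True
```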

@vsyrgkanis
Collaborator

@gcasamat after chatting with @kbattocchi, this seems to be by design. You seem to be calling fit on the same object, but the random state is an evolving quantity within the object, and the crossfit splitter is initialized at fit time. So if you construct the object, call fit, and then call fit again, the splitter is initialized with a different random state each time.

To get exactly the same results you should re-instantiate the object, i.e.:

est_forest_rf_train = ForestDMLCateEstimator(
                            model_y = RandomForestRegressor(n_estimators = 200, random_state = 42), 
                            model_t = RandomForestRegressor(n_estimators = 200, random_state = 42),
                            n_estimators = 200,
                            random_state = 123)
est_forest_rf_train.fit(Y, T, X, W = 'blb')
est_forest_rf_train.const_marginal_effect_inference(X_test).summary_frame(alpha = 0.05, value = 0, decimals = 3)
est_forest_rf_train = ForestDMLCateEstimator(
                            model_y = RandomForestRegressor(n_estimators = 200, random_state = 42), 
                            model_t = RandomForestRegressor(n_estimators = 200, random_state = 42),
                            n_estimators = 200,
                            random_state = 123)
est_forest_rf_train.fit(Y, T, X, W = 'blb')
est_forest_rf_train.const_marginal_effect_inference(X_test).summary_frame(alpha = 0.05, value = 0, decimals = 3)

Here you should get the same result both times. Let me know if you don't.

@gcasamat
Author

I have tested what you proposed and indeed it works fine. Thanks to both of you. I think it would be a nice feature to be able to initialize the model only once and then get the same results for each subsequent fit when random_state is specified.

@vsyrgkanis
Collaborator

@kbattocchi that seems reasonable. I checked sklearn and that's what it does. The fix is simple: we just need to move the check_random_state call to the beginning of fit rather than the init of _OrthoLearner.
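A sketch of that pattern, following sklearn's convention (the class and attribute names here are illustrative, not econml's actual code): the constructor stores the seed untouched, and fit materializes the RandomState each time it is called, so repeated fits of one object start from an identical stream.

```python
from sklearn.utils import check_random_state

class ReproducibleEstimator:
    """Illustrative sklearn-style pattern: fit() re-derives its RandomState."""
    def __init__(self, random_state=None):
        # Store the seed as-is; do NOT materialize the stream here.
        self.random_state = random_state

    def fit(self):
        # check_random_state at fit time: every fit of the same object
        # begins from the same stream when an integer seed was given.
        rng = check_random_state(self.random_state)
        return rng.rand()

est = ReproducibleEstimator(random_state=123)
print(est.fit() == est.fit())   # True: repeated fits now agree
```

Note that with `random_state=None`, check_random_state falls back to the global numpy stream, so repeated fits would still (intentionally) differ.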

@vsyrgkanis
Collaborator

@gcasamat I pushed some changes that would fix the issue you raised once the PR is merged.

@gcasamat
Author

This is great! Thanks.
