reproducibility and random state for AutoML.fit() #151

Closed
zhonghua-zheng opened this issue Aug 4, 2021 · 13 comments · Fixed by #150

@zhonghua-zheng (Collaborator)

Hello, I wonder if it is possible to reproduce the results of flaml.AutoML.fit()?
If so, could you please kindly let me know how to set the random_state (or seed) for flaml.AutoML.fit()?
Thanks!

@qingyun-wu (Collaborator)

Hi @zzheng93, that's a very good question, and thank you for raising it.

A short answer is: for now, unfortunately, we cannot guarantee that flaml.AutoML.fit() is 100% reproducible.

Here is the reason: although we use random seeds to control all the explicit randomness in AutoML, we cannot control the randomness in runtime. Time measurements are used in multiple places of flaml.AutoML to prioritize the search over multiple estimators and the retraining, given the time budget and the time left. The possible fluctuation in those measurements (usually subtle for a fixed task) means flaml.AutoML.fit() is not strictly reproducible.

We didn't anticipate this being a big issue because such randomness usually doesn't make a big difference in performance. There are ways to work around it if reproducibility is really desired. Would you like to share your use case and how much reproducibility matters to you?

Thanks!

@zhonghua-zheng (Collaborator, Author)


Hi @qingyun-wu, thank you very much for your prompt reply.
I set "estimator_list": ['lgbm', 'rf', 'xgboost', 'catboost']. With the same code and data, sometimes lgbm is reported as the best estimator and sometimes xgboost. Even with the same best estimator (e.g., lgbm), the models are different every time. Is there any way to keep the results consistent?

@sonichi (Collaborator) commented Aug 4, 2021

How long is the time budget? The results seem to indicate that the time budget is not enough to conclude what model is best.

@zhonghua-zheng (Collaborator, Author)

Hi @sonichi , thank you very much! I just used the default configuration. I have ~1M training samples. It would be great if you could offer some suggestions regarding the configurations (e.g., time budget)!

@sonichi (Collaborator) commented Aug 4, 2021

Oh the default budget is 60 seconds, which is too small for your use case. I'd suggest trying 1 hour (time_budget=3600) if that's affordable. If your desired time budget is shorter than that, then simply use your desired budget.

@zhonghua-zheng (Collaborator, Author)

Thank you @sonichi , I will follow your suggestions!

@qingyun-wu (Collaborator)


Hi @zzheng93,

Could you try calling np.random.seed(seed) before automl.fit()?

This presumably can help reduce the randomness.
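For reference, a minimal sketch of this workaround on synthetic data (the seed value, the dataset, and the short time budget here are just placeholders for illustration; zhonghua-zheng's full configuration appears further down the thread):

import numpy as np
from sklearn.datasets import make_regression
from flaml import AutoML

# Placeholder data standing in for the real training set.
X_train, y_train = make_regression(n_samples=1000, n_features=10, random_state=0)

np.random.seed(42)  # seed NumPy's global RNG right before fitting; 42 is an arbitrary choice
automl = AutoML()
automl.fit(X_train=X_train, y_train=y_train, task="regression", time_budget=60)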

@zhonghua-zheng (Collaborator, Author)


Thank you @qingyun-wu! I'll try it.

@sonichi linked a pull request Aug 12, 2021 that will close this issue
@zhonghua-zheng (Collaborator, Author)

Hi, I have followed your suggestions (seed and time_budget=3600). The best_estimator seems to have converged (in my case, to "lgbm"). Although the results differ between runs, they are very close (e.g., r2 on the testing data varied from 0.80 to 0.82).

However, I noticed the models are overfitting. For example, n_estimators is >25000. I wonder if there is any approach in FLAML to prevent overfitting and make the models simpler?

{'n_estimators': 26744,
 'num_leaves': 1009,
 'min_child_samples': 24,
 'learning_rate': 0.01940044907863862,
 'subsample': 1.0,
 'log_max_bin': 7,
 'colsample_bytree': 0.5739585077256129,
 'reg_alpha': 0.0009765625,
 'reg_lambda': 1.087179087107877,
 'FLAML_sample_size': 160000}

Here is the configuration:

import numpy as np
from flaml import AutoML

automl = AutoML()
automl_settings = {
    "metric": 'r2',
    "estimator_list": ['lgbm', 'rf', 'xgboost', 'catboost'],
    "task": 'regression',
    "log_file_name": "./train.log",
}

np.random.seed(66)  # seed NumPy's global RNG right before fit(), as suggested above
automl.fit(X_train=X_train, y_train=y_train,
           verbose=0, time_budget=3600,
           **automl_settings)

print(f"best_estimator: {automl.best_estimator}")
print(automl.best_config)

Thanks!

@qingyun-wu (Collaborator)


Hi @zzheng93, thank you for following up. I am wondering why you conclude that the models are overfitting. Is that conclusion based purely on the fact that the model found is complex, or do you also observe a large gap between the validation error and the test error for this model? Thanks!

@zhonghua-zheng (Collaborator, Author)

Hi @qingyun-wu, thank you for your prompt response. I guess it is overfitting because n_estimators is much larger than in the model trained with time_budget=300 (where n_estimators is about 1200), but the testing errors are similar (R2 = ~0.8).

The training error from the time_budget=3600 configuration (R2 = 0.99) is indeed smaller than that of the model trained with time_budget=300 (R2 = 0.93).

@qingyun-wu (Collaborator)


Hi @zzheng93, in your case, at least the larger model (the model found with the larger time budget) does not give worse testing results, so it is less worrisome (assuming you only care about the test error and do not worry about the model size), right? A more proactive suggestion is to design your own metric to guide AutoML's search. For example, this custom_metric penalizes the training loss, with the hope of alleviating overfitting; it was suggested by other data scientists using FLAML. Perhaps you can give it a try and let us know whether it helps in your case (a sketch of the idea follows below).

Thank you!
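For illustration, here is a minimal sketch of that idea adapted to the regression setting in this thread, written against the custom-metric interface that automl.fit() accepts (a callable returning the value to minimize plus a dict of metrics to log). The function name, the penalty weight alpha, and the trailing *args/**kwargs are illustrative assumptions rather than the exact custom_metric referenced above:

from sklearn.metrics import r2_score

def custom_metric(X_val, y_val, estimator, labels,
                  X_train, y_train, weight_val=None, weight_train=None,
                  *args, **kwargs):
    # Use 1 - r2 as a loss so that lower is better.
    val_loss = 1 - r2_score(y_val, estimator.predict(X_val), sample_weight=weight_val)
    train_loss = 1 - r2_score(y_train, estimator.predict(X_train), sample_weight=weight_train)
    alpha = 0.5  # placeholder weight; a larger alpha penalizes the train/validation gap more
    # Penalize configurations whose validation loss is much worse than their training loss.
    return val_loss + alpha * (val_loss - train_loss), {
        "val_loss": val_loss,
        "train_loss": train_loss,
    }

# Then pass the callable instead of the 'r2' string, e.g.:
# automl.fit(X_train=X_train, y_train=y_train, metric=custom_metric,
#            task='regression', time_budget=3600)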

@sonichi (Collaborator) commented Sep 1, 2021

This custom metric function is added as an example to the notebook https://github.com/microsoft/FLAML/blob/main/notebook/flaml_automl.ipynb in #178

@sonichi closed this as completed Sep 8, 2021