Describe the workflow you want to enable

I would like to be able to set n_jobs=-1 in one place and have it take effect in any function with an n_jobs parameter. Same for random_state.
Perhaps there are other parameters that fit the theme: "if a user sets this in one instance, they probably want to set it in all instances".
Describe your proposed solution
Expand the accepted parameters of sklearn.set_config, and update all functions to fall back to the config value when the parameter isn't passed.
This would require changing the default from None to a sentinel (à la _NoValue in NumPy), so that a user can override the global config with an explicit None while the code can still detect whether the argument was passed at all.
I'll admit, having to add something like resolve_arg_value("random_state", random_state, None) to tons of functions sounds painful, but for the user, being able to set and forget a random state or other params, potentially based on an environment variable, would be nice.
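A rudimentary mockup of what this could look like. All names here (_NoValue, _NO_VALUE, set_config, resolve_arg_value, fit_something) are hypothetical illustrations, not actual scikit-learn API:

```python
class _NoValue:
    """Sentinel meaning 'argument was not passed' (a la numpy's _NoValue)."""

    def __repr__(self):
        return "<no value>"


_NO_VALUE = _NoValue()

# Global config, roughly as sklearn.set_config might store it.
_config = {}


def set_config(**params):
    _config.update(params)


def resolve_arg_value(name, value, default):
    # An explicitly passed value -- including None -- always wins.
    if value is not _NO_VALUE:
        return value
    # Otherwise fall back to the global config, then to the library default.
    return _config.get(name, default)


def fit_something(random_state=_NO_VALUE):
    random_state = resolve_arg_value("random_state", random_state, None)
    return random_state
```

With this, set_config(random_state=77) makes fit_something() return 77, while fit_something(random_state=None) still resolves to None because the sentinel lets the code distinguish "not passed" from "passed None".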
For n_jobs, we ruled out this possibility here: #23253. It is already possible to do this with joblib's parallel_backend context manager. However, be extremely careful with n_jobs=-1.
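The existing joblib mechanism mentioned here can be sketched as follows (using the threading backend for a self-contained example; scikit-learn estimators that expose n_jobs pick up the context the same way):

```python
# parallel_backend sets the backend and the default worker count for every
# nested joblib.Parallel call, including those made internally by
# scikit-learn estimators with an n_jobs parameter.
from joblib import Parallel, delayed, parallel_backend

with parallel_backend("threading", n_jobs=2):
    # Parallel() without an explicit n_jobs inherits n_jobs=2 from the context.
    squares = Parallel()(delayed(pow)(i, 2) for i in range(5))

print(squares)  # [0, 1, 4, 9, 16]
```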
For random_state, isn't it enough to seed NumPy's singleton random generator with np.random.seed and leave random_state=None everywhere? In that case the results will be reproducible, since random_state=None draws from the global singleton.
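A minimal illustration of that global-seed approach, in pure NumPy (in scikit-learn, random_state=None draws from this same global singleton):

```python
# Seeding NumPy's global singleton makes everything that draws from it
# reproducible across runs, which is what the suggestion above relies on.
import numpy as np

np.random.seed(77)
first = np.random.permutation(5)

np.random.seed(77)
second = np.random.permutation(5)

assert (first == second).all()  # identical draws after reseeding
```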
Re random_state: np.random.seed will work, thanks for that. Although the docs specifically warn against it and recommend passing a random_state argument to every function that takes one.
So the goal of my global config suggestion was to allow the 'recommended' way of getting reproducible results without the clutter of passing a random_state parameter to all the scikit-learn functions that take it (and the mental overhead of thinking about which functions take it and which don't).
Also if scikit-learn one day moves from the legacy NumPy RandomState to the new Generator, a global config would provide an upgrade path to all the users who were using the discouraged np.random.seed technique.
Update: I've just noticed that setting np.random.seed(77) doesn't give me reproducible results when using GradientBoostingRegressor, but GradientBoostingRegressor(random_state=77) does.
I'm also using HalvingRandomSearchCV with scipy.stats.randint, so maybe that complicates things (I'm a newbie, the landscape is still a little foggy). If this isn't a known issue, I'll try to create a minimal repro.
Additional context
I see that the tests get a global random seed. #22749