Inspired by scikit-learn/enhancement_proposals#24, I decided to look into whether it is possible to seed the RNG used by sklearn "globally". Apparently it was even me who, after a heated discussion, added handling of the SKLEARN_SEED environment variable within the setup_module() fixture back in 2012 (4915d4d) to provide a means for reproducible testing.
AFAIK there is no generic handling of SKLEARN_SEED or any other env variable that would define a starting point of the RNG for an arbitrary script which (directly or indirectly) uses sklearn. It would be useful for scripts that use scikit-learn functionality but have no explicit seed handling built in. Setting the seed via an env variable would make it possible to get reproducible results by re-running the script with the same env variable value, without modifying the script or any underlying library that interfaces with scikit-learn and/or other libraries.
This is a strategy we used in PyMVPA, and we have started to collect similar cases (see https://github.com/ReproNim/reproseed/blob/master/reproseed.sh#L24 for the list, which is short for now) in the hope of being able to seed all relevant tools at once whenever it is necessary to make some issue or result reproducible.
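A minimal sketch of what such env-variable seeding could look like is below. It is an illustration under stated assumptions, not existing sklearn behavior: `seed_from_env` is a hypothetical helper, and the sketch relies on the fact that estimators left at `random_state=None` draw from NumPy's global `RandomState`, so seeding `numpy.random` (plus the stdlib `random` module for good measure) would cover them. Where such a call would live (for example at package import time) is left open.

```python
import os
import random

import numpy as np


def seed_from_env(var="SKLEARN_SEED", environ=None):
    """Seed the global RNGs from an environment variable, if it is set.

    Hypothetical helper, not part of scikit-learn. Returns the integer
    seed that was applied, or None when the variable is absent.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(var)
    if value is None:
        # No seed requested: leave the RNG state untouched.
        return None
    seed = int(value)
    # NumPy's global RandomState is what sklearn falls back to
    # when random_state=None.
    np.random.seed(seed)
    # Also seed the stdlib RNG in case the script uses it directly.
    random.seed(seed)
    return seed
```

With something like this in place, running `SKLEARN_SEED=42 python script.py` twice would start both runs from the same global RNG state, without touching the script itself. Code that passes an explicit `random_state` would not be affected.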
I wondered if there would be any interest in pursuing this direction.