
add common test: fixing random_state makes algorithms deterministic. #7139

Open

amueller opened this issue Aug 4, 2016 · 12 comments
Labels: Hard (Hard level of difficulty), Needs Decision (Requires decision)

Comments

@amueller (Member) commented Aug 4, 2016

I think we should add a common test checking that every estimator either is deterministic and has no random_state, or is deterministic after setting the random_state.
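
A minimal sketch of what such a check could look like (the name check_fit_is_deterministic is hypothetical, not existing scikit-learn API):

import numpy as np
from sklearn.datasets import make_classification

def check_fit_is_deterministic(Estimator):
    X, y = make_classification(n_samples=50, random_state=0)
    preds = []
    for _ in range(2):
        est = Estimator()
        if "random_state" in est.get_params():
            est.set_params(random_state=0)
        preds.append(est.fit(X, y).predict(X))
    # With the seed fixed, two fresh fits must agree exactly.
    np.testing.assert_allclose(preds[0], preds[1])

It would be called once per estimator in the common tests, e.g. check_fit_is_deterministic(RandomForestClassifier).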

@olologin (Contributor) commented Aug 7, 2016

How are we going to prove deterministic behaviour? Test every estimator 3-4 times after setting random_state?

@nelson-liu (Contributor)

A more manual way, faster at test time than the above, would be to run the estimator once with a set random_state, record the results, and then hardcode those results into the test.
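
For instance (a sketch of the recording step, not an actual test), the reference values would be generated once and pasted into the test:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=30, random_state=0)
pred = RandomForestClassifier(n_estimators=5, random_state=0).fit(X, y).predict_proba(X[:3])
# Run once, record the output, then hardcode it as the expected value
# in the test and compare with np.testing.assert_allclose.
print(repr(pred))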

@olologin (Contributor) commented Aug 7, 2016

@nelson-liu, this sounds better; it would be a kind of regression testing. We could serialise all the results, but we would have to regenerate them after any major change to an estimator.

@GaelVaroquaux (Member) commented Aug 8, 2016 via email

@betatim (Member) commented Aug 27, 2016

I started work on this.

@nelson-liu (Contributor)

Great, thanks @betatim! I think this is a really needed enhancement.

@rth (Member) commented Apr 11, 2019

> How are we going to prove deterministic behavior? Test every estimator 3-4 times after setting random_state?

Another way could be to fit the estimator on some data, serialize it, and hash the output. The hash should be identical after fitting the estimator twice (I think). This would allow checking the reproducibility of estimators that e.g. do not implement predict, though it would fail to detect non-deterministic behavior in the predict function itself.
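
A rough sketch of that idea (byte-level pickle equality may be stricter than strictly needed, so this is an assumption to validate):

import hashlib
import pickle
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(random_state=0)

def fit_hash(est):
    # Hash the pickled fitted estimator, which covers all fitted attributes
    # without requiring a predict method.
    return hashlib.sha256(pickle.dumps(est.fit(X, y))).hexdigest()

assert fit_hash(LogisticRegression(random_state=0)) == fit_hash(LogisticRegression(random_state=0))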

@jnothman added this to the 0.22 milestone Apr 16, 2019
@amueller modified the milestones: 0.22, 0.23 Oct 17, 2019
@amueller (Member, Author)

move to milestone 0.23

@adrinjalali added the Hard and Needs Decision labels Apr 15, 2020
@adrinjalali removed this from the 0.23 milestone Apr 15, 2020
@adrinjalali (Member)

removing from milestone

@Reksbril (Contributor)

@adrinjalali is it still needed? If so, I could start working on this.

@adrinjalali (Member)

Thanks @Reksbril, but there's already an open PR for this.

@aboucaud (Contributor) commented Oct 8, 2020

I was giving a tutorial recently when someone stopped me because random_state=0 was passed to train_test_split() but their score did not match mine. I asked all the participants for their scores: some matched mine down to the last decimal, while others got slightly different answers.

Here is the piece of code executed:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

data = load_breast_cancer()
# The split is seeded, so X_train/X_test should be identical on every machine.
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target,
    stratify=data.target, random_state=0)
lr = LogisticRegression().fit(X_train, y_train)
score = lr.score(X_test, y_test)

I have no idea whether this behavior is to be expected, since I did not set random_state in LogisticRegression(). The doc says it is not needed for the 'lbfgs' solver, which is the default in all the versions tested here.

Since this kind of reproducibility error cannot easily be tested in CI, I asked them for their system information. The reports with the scores obtained are below.
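
The reporting snippet was roughly the following (a reconstruction from the report fields; the exact code the participants ran may have differed):

import platform
import sys
import numpy
import sklearn

print(f"Python version: {sys.version}")
print(f"Numpy version: {numpy.__version__}")
print(f"Scikit-learn version: {sklearn.__version__}")
print(f"System: {platform.system()}")
print(f"Release: {platform.release()}")
print(f"Version: {platform.version()}")
print(f"Machine: {platform.machine()}")
print(f"Processor: {platform.processor()}")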

Python              | NumPy  | scikit-learn | OS                    | Processor             | Score
--------------------|--------|--------------|-----------------------|-----------------------|-------------------
3.8.5 (conda-forge) | 1.18.5 | 0.23.2       | Windows 10            | Model 142 Stepping 10 | 0.9370629370629371
3.8.5 (conda-forge) | 1.18.5 | 0.23.2       | Windows 10            | Model 142 Stepping 9  | 0.9370629370629371
3.8.5 (conda-forge) | 1.18.5 | 0.23.2       | Windows 10            | Model 142 Stepping 12 | 0.9440559440559441
3.8.5 (conda-forge) | 1.18.5 | 0.23.2       | Windows 10            | Model 61 Stepping 4   | 0.9300699300699301
3.8.5 (conda-forge) | 1.19.1 | 0.23.2       | Windows 10            | Model 142 Stepping 12 | 0.9230769230769231
3.8.3               | 1.18.5 | 0.23.1       | Windows 10            | Model 158 Stepping 13 | 0.9300699300699301
3.8.6 (conda-forge) | 1.19.1 | 0.23.2       | Windows 10            | Model 158 Stepping 13 | 0.9300699300699301
3.8.5 (conda-forge) | 1.18.5 | 0.23.2       | Windows 10            | Model 158 Stepping 10 | 0.9370629370629371
3.7.4               | 1.18.4 | 0.23.2       | macOS (Darwin 18.7.0) | x86_64                | 0.9300699300699301

All Windows machines were AMD64 on build 10.0.17763 with Intel64 Family 6 processors and MSC v.1916 Python builds; the macOS machine ran a Clang 10.0.1 build.

Let me know if I should write an independent issue.
