
add common test: fixing random_state makes algorithms deterministic. #7139

Open

amueller opened this issue Aug 4, 2016 · 12 comments
Labels: Hard (Hard level of difficulty), Needs Decision (Requires decision)

Comments

@amueller (Member) commented Aug 4, 2016

I think we should add a common test checking that every estimator either is deterministic and has no random_state, or is deterministic after setting the random_state.
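
A minimal sketch of what such a check could look like (the name check_fit_is_deterministic is hypothetical, not existing scikit-learn API):

import numpy as np
from sklearn.datasets import make_classification

def check_fit_is_deterministic(Estimator):
    X, y = make_classification(n_samples=50, random_state=0)
    preds = []
    for _ in range(2):
        est = Estimator()
        if "random_state" in est.get_params():
            est.set_params(random_state=0)
        preds.append(est.fit(X, y).predict(X))
    # With the seed fixed, two fresh fits must agree exactly.
    np.testing.assert_allclose(preds[0], preds[1])

It would be called once per estimator in the common tests, e.g. check_fit_is_deterministic(RandomForestClassifier).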

@olologin (Contributor) commented Aug 7, 2016

How are we going to prove deterministic behaviour? Test every estimator 3-4 times after setting random_state?

@nelson-liu (Contributor)

A more manual way, faster at test time than the above, would be to run the estimator once with a set random_state, record the results, and then hardcode those results into the test.
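
For instance (a sketch of the recording step, not an actual test), the reference values would be generated once and pasted into the test:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=30, random_state=0)
pred = RandomForestClassifier(n_estimators=5, random_state=0).fit(X, y).predict_proba(X[:3])
# Run once, record the output, then hardcode it as the expected value
# in the test and compare with np.testing.assert_allclose.
print(repr(pred))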

@olologin (Contributor) commented Aug 7, 2016

@nelson-liu, this sounds better; it would be a kind of regression testing. We could serialise all the results, but we would have to regenerate them after any major change to an estimator.

@GaelVaroquaux (Member) commented Aug 8, 2016 via email

@betatim (Member) commented Aug 27, 2016

I started work on this.

@nelson-liu (Contributor)

Great, thanks @betatim! I think this is a really needed enhancement.

@rth (Member) commented Apr 11, 2019

> How are we going to prove deterministic behavior? Test every estimator 3-4 times after setting random_state?

Another way could be to fit the estimator on some data, serialize it, and hash the output. The hash should be identical after fitting the estimator twice (I think). This would allow checking the reproducibility of estimators that e.g. do not implement predict, though it would fail to detect non-deterministic behavior in the predict function itself.
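
A rough sketch of that idea (byte-level pickle equality may be stricter than strictly needed, so this is an assumption to validate):

import hashlib
import pickle
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(random_state=0)

def fit_hash(est):
    # Hash the pickled fitted estimator, which covers all fitted attributes
    # without requiring a predict method.
    return hashlib.sha256(pickle.dumps(est.fit(X, y))).hexdigest()

assert fit_hash(LogisticRegression(random_state=0)) == fit_hash(LogisticRegression(random_state=0))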

@jnothman added this to the 0.22 milestone Apr 16, 2019
@amueller modified the milestones: 0.22, 0.23 Oct 17, 2019
@amueller (Member, Author)

move to milestone 0.23

@adrinjalali added the Hard and Needs Decision labels Apr 15, 2020
@adrinjalali removed this from the 0.23 milestone Apr 15, 2020
@adrinjalali (Member)

removing from milestone

@Reksbril (Contributor)

@adrinjalali is it still needed? If so, I could start working on this.

@adrinjalali (Member)

Thanks @Reksbril, but there's already an open PR for this.

@aboucaud (Contributor) commented Oct 8, 2020

I was giving a tutorial recently when someone stopped me because random_state=0 was passed to train_test_split() but their score did not match mine. I asked all the participants for their scores: some matched mine down to the last decimal, while others got slightly different answers.

Here is the piece of code executed:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

data = load_breast_cancer()
# The split is seeded, so X_train/X_test should be identical on every machine.
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target,
    stratify=data.target, random_state=0)
lr = LogisticRegression().fit(X_train, y_train)
score = lr.score(X_test, y_test)

I have no idea whether this behavior is to be expected, since I did not set random_state in LogisticRegression(). The doc says it is not needed for the 'lbfgs' solver, which is the default in all the versions tested here.

Since this kind of reproducibility error cannot easily be tested in CI, I asked them for their system information. The reports with the scores obtained are below.
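
The reporting snippet was roughly the following (a reconstruction from the report fields; the exact code the participants ran may have differed):

import platform
import sys
import numpy
import sklearn

print(f"Python version: {sys.version}")
print(f"Numpy version: {numpy.__version__}")
print(f"Scikit-learn version: {sklearn.__version__}")
print(f"System: {platform.system()}")
print(f"Release: {platform.release()}")
print(f"Version: {platform.version()}")
print(f"Machine: {platform.machine()}")
print(f"Processor: {platform.processor()}")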

Python              | NumPy  | scikit-learn | OS                    | Processor             | Score
--------------------|--------|--------------|-----------------------|-----------------------|-------------------
3.8.5 (conda-forge) | 1.18.5 | 0.23.2       | Windows 10            | Model 142 Stepping 10 | 0.9370629370629371
3.8.5 (conda-forge) | 1.18.5 | 0.23.2       | Windows 10            | Model 142 Stepping 9  | 0.9370629370629371
3.8.5 (conda-forge) | 1.18.5 | 0.23.2       | Windows 10            | Model 142 Stepping 12 | 0.9440559440559441
3.8.5 (conda-forge) | 1.18.5 | 0.23.2       | Windows 10            | Model 61 Stepping 4   | 0.9300699300699301
3.8.5 (conda-forge) | 1.19.1 | 0.23.2       | Windows 10            | Model 142 Stepping 12 | 0.9230769230769231
3.8.3               | 1.18.5 | 0.23.1       | Windows 10            | Model 158 Stepping 13 | 0.9300699300699301
3.8.6 (conda-forge) | 1.19.1 | 0.23.2       | Windows 10            | Model 158 Stepping 13 | 0.9300699300699301
3.8.5 (conda-forge) | 1.18.5 | 0.23.2       | Windows 10            | Model 158 Stepping 10 | 0.9370629370629371
3.7.4               | 1.18.4 | 0.23.2       | macOS (Darwin 18.7.0) | x86_64                | 0.9300699300699301

All Windows machines were AMD64 on build 10.0.17763 with Intel64 Family 6 processors and MSC v.1916 Python builds; the macOS machine ran a Clang 10.0.1 build.

Let me know if I should write an independent issue.
