[MRG] Fix SGD non deterministic behavior #13422
Conversation
Windows unit tests seem to fail, I need to investigate a bit.
Thanks for this. I agree the nondeterminism is a problem
    dataset, intercept_decay = make_dataset(X, y_i, sample_weight)

    # XXX should have random_state_!
    random_state = random_state if random_state is not None \
Not sure we need to support retrieving the `random_state` from the estimator. Let's just use `check_random_state` and keep it simple.
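For context, `check_random_state` normalizes `None`, an `int`, or a `RandomState` instance into a `RandomState` object. A minimal sketch of that pattern (mirroring the semantics of `sklearn.utils.check_random_state`, not its exact code):

```python
import numbers
import numpy as np

def check_random_state(seed):
    # None -> the global NumPy RandomState singleton
    if seed is None or seed is np.random:
        return np.random.mtrand._rand
    # an integer -> a fresh RandomState seeded with it
    if isinstance(seed, numbers.Integral):
        return np.random.RandomState(seed)
    # an existing RandomState -> returned unchanged
    if isinstance(seed, np.random.RandomState):
        return seed
    raise ValueError(f"{seed!r} cannot be used to seed a RandomState instance")
```

Note that passing the same integer twice yields two independent generators that produce identical sequences, which is exactly why the integer case masked the threading bug discussed below.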
(force-pushed from 4c4d02d to a5fbfe6)
@jnothman thanks, I made the change and used ...
(force-pushed from a5fbfe6 to fe23a0c)
Is it possible that `MAX_INT` is not platform-independent?
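One way to sidestep that concern is to derive the bound from a fixed-width dtype rather than the platform's native C `int`. A hedged illustration (the name `MAX_INT` here is just this sketch's, not necessarily the PR's constant):

```python
import numpy as np

# Deriving the bound from a fixed-width dtype gives the same value on
# every platform, unlike limits tied to the native C int.
MAX_INT = np.iinfo(np.int32).max  # 2147483647 everywhere

rng = np.random.RandomState(0)
seed = rng.randint(MAX_INT)  # reproducible, platform-independent seed
```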
(force-pushed from a89888d to 2015b20)
(force-pushed from aeddd29 to 171d863)
(force-pushed from e9e94e8 to ab03321)
OK I think I fixed it.
Under the assumption that there's no reasonable way to write a regression test, this LGTM.
@ClemDoum you'll need to rebase on current master.
Codecov Report
@@ Coverage Diff @@
## master #13422 +/- ##
==========================================
- Coverage 96.65% 96.39% -0.27%
==========================================
Files 376 377 +1
Lines 69711 69869 +158
==========================================
- Hits 67381 67347 -34
- Misses 2330 2522 +192
Continue to review full report at Codecov.
Thank you @ClemDoum. Sorry I forgot to merge it when I approved.
Side note: since GitHub now has a nice squash-and-merge button, we now prefer merging master instead of rebasing. It makes the discussion easier to follow.
This reverts commit 7560d92.
Does @ClemDoum or anyone else involved in this discussion remember the reason for this? I was not able to find any reference to it in the discussion ...
@lesteve yes, sorry, I didn't mention it in the PR, but if I remember well ...
Right, that makes sense, thanks a lot!
This removes the -Woverflow warnings observed when building scikit-learn. RAND_R_MAX is the max value for uint8; incrementing it causes an overflow (hence the warning). Elements were originally mentioned but seem to have been left unreviewed, see: scikit-learn#13422 (comment)

I think this commit fixes the implementation, yet it comes with backwards-incompatible results: tests for implementations relying on `our_rand_r` fail because results are now different. I see several alternatives to remove the warning while having the tests pass:
- preferred solution: adapt the test suite to the new results so that all tests pass, and acknowledge the change of behavior for impacted user-facing APIs in the changelog
- accept the quirk of this implementation but hardcode and rename the effective constant
- silence the -Woverflow warning by another means

Relates to: scikit-learn#13422 scikit-learn#24895
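The cause of the warning can be sketched in a toy example. If a constant like RAND_R_MAX equals the maximum of the generator's unsigned type, then computing `RAND_R_MAX + 1` in that fixed-width type wraps around to 0, so a modulo by `RAND_R_MAX + 1` does not do what the formula appears to say (uint8 stands in for the actual type here; the values are illustrative, not scikit-learn's):

```python
import numpy as np

# Toy illustration: the type's maximum plus one wraps to zero under
# fixed-width unsigned arithmetic, which is what -Woverflow flags
# when it happens in a compile-time constant expression.
RAND_R_MAX = np.iinfo(np.uint8).max  # 255 in this toy example

wrapped = (np.array([RAND_R_MAX], dtype=np.uint8) + 1)[0]
print(int(wrapped))  # wraps around to 0
```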
What does this fix:
While trying to make my intent classifier training deterministic in my NLU lib, I noticed that I couldn't because the `BaseSGDClassifier._fit_multiclass` method is non-deterministic. To reproduce, you have to initialize the `SGDClassifier` with a `RandomState` instance. I added a small print statement just before the joblib threads pass their seed to the `plain_sgd` or `average_sgd` function. This gives me the following output (it will give you a different output on your machine but will eventually fail).

What is happening is that the joblib threads share the `BaseSGDClassifier.random_state` and set a seed in the `fit_binary` function before it's passed to the `plain_sgd` or `average_sgd` function. Depending on the order in which the threads reach the seed setting, the output of the SGD can differ. The other reason is that the estimator random state was not passed to the `make_dataset` function.

I think the bug was not noticeable in the unit tests because the SGD estimators were initialized with `int` random states. In this case, the input of the `check_random_state` function in the `fit_binary` function is the integer seed, so each thread actually returns the exact same random state and then samples the exact same random seed for the SGD.

How does this fix the bug:
- I added a `seed` argument to the `fit_binary` function, which defaults to `None`. If the seed is not `None` it is used; otherwise the seed is derived from the estimator `random_state`. This allows setting the job seeds before the jobs are distributed to the threads and avoids the non-deterministic behavior.
- The estimator random state is now passed to the `make_dataset` function.

Help needed
(Now also fixes #5015.)
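The fix described above boils down to one pattern: draw every job's integer seed from the shared `RandomState` in the main thread, before any parallel dispatch. A minimal sketch of that pattern (`fit_binary_stub` is a hypothetical stand-in for the per-class fit, and `ThreadPoolExecutor` stands in for joblib's thread backend):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

MAX_INT = np.iinfo(np.int32).max

def fit_binary_stub(seed):
    # Each job builds its own RandomState from an integer seed,
    # instead of touching a RandomState shared across threads.
    rng = np.random.RandomState(seed)
    return rng.randint(1000)

master = np.random.RandomState(42)
# Draw one seed per job up front, in the main thread, so the result
# no longer depends on the order in which threads are scheduled.
seeds = [master.randint(MAX_INT) for _ in range(4)]
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(fit_binary_stub, seeds))
```

Because each job is a pure function of its pre-drawn seed, `results` is identical on every run, regardless of thread interleaving.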
Comment
While fixing the initial bug, another bug was found and fixed: `our_rand_r` in the Cython code of `sklearn/utils/seq_dataset.pyx` was not behaving consistently across platforms; see the full bug description and fix below.
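A common way to make a rand_r-style generator behave identically on every platform is to do the arithmetic in an explicitly sized unsigned type instead of a native word. A toy Python sketch of the idea, using an explicit 32-bit mask (the LCG constants are the classic C-standard example values, not necessarily scikit-learn's):

```python
def rand_r_32(seed):
    # The & 0xFFFFFFFF forces 32-bit modular arithmetic, so the
    # sequence does not depend on the platform's native word size
    # (e.g. 32-bit vs 64-bit `unsigned long`).
    return (seed * 1103515245 + 12345) & 0xFFFFFFFF

state = rand_r_32(42)  # same value on every platform
```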