[MRG+1] Fix lower_bound_ not equal to max lower bound in mixture models when n_init > 1 #10870
Conversation
I think this is right, but it makes the repeated setting of `lower_bound_` look inexplicable (unless it were intended for debugging a crash!). Can we use a local `lower_bound` instead?
Hi @jnothman, …
… training, in BaseMixture.fit()
There were a couple of traps, but I finally managed to get this thing to work. Here's the new behavior: …
I think this logic looks reasonable. Do attribute docstrings need improvement?
rand_data = RandomData(np.random.RandomState(random_state), scale=1)
n_components = rand_data.n_components
X = rand_data.X['full']
for random_state in range(100):
Could you double check that this runs quite quickly?
It runs in 4 seconds on my laptop. Is that sufficiently quick? If not, we could reduce this to less than a second by iterating just 25 times. The probability for the unfixed code to pass this test would be 1/2^25, which is roughly 3e-8. Tell me what you prefer.
Yes, 4s for this kind of test is excessively long, IMO. And if the unfixed code only has a chance of 3e-8 of passing (did you get that the right way around?), then we can run it for fewer iterations than that. Alternatively, we can reduce the size of `X` if that's not actually related to the property being tested.
Okay, reducing to 25 iterations. I think I got the calculation right: the unfixed code sets the `lower_bound_` to the lower bound of the last initialization, which has a 50% chance of being higher than the lower bound of the first initialization. So in order to pass the updated test, the unfixed code would need to win that 50% chance 25 times in a row, which is 0.5^25 ≈ 3e-8. In other words, the new test would catch the bug in the unfixed code.
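A quick check of that arithmetic in a Python shell:

```python
>>> 0.5 ** 25
2.9802322387695312e-08
```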
Sure, thanks. A smaller dataset would also be fine.
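For readers following along, here is a minimal sketch of the kind of regression test being discussed (a hypothetical illustration, not the exact test that was merged). With a fixed `random_state`, the first initialization of the `n_init=10` run matches the `n_init=1` run, so the best-of-10 lower bound can never be lower:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.RandomState(0).randn(100, 2)  # small synthetic dataset

for seed in range(25):
    gm1 = GaussianMixture(n_components=2, n_init=1, max_iter=1,
                          random_state=seed).fit(X)
    gm10 = GaussianMixture(n_components=2, n_init=10, max_iter=1,
                           random_state=seed).fit(X)
    # The bug set lower_bound_ to the last init's bound, which could be lower.
    assert gm10.lower_bound_ >= gm1.lower_bound_
```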
sklearn/mixture/base.py
Outdated
@@ -191,6 +191,7 @@ def fit(self, X, y=None):
X = _check_X(X, self.n_components, ensure_min_samples=2)
self._check_initial_parameters(X)

Rm blank line
Oops, indeed. Fixing this now.
I just updated the documentation; I hope it's clear. I also changed the logic slightly: the previous …
Hi @jnothman, is there anything else you need from me to fix this bug, or are we okay to merge?
Can that new logic be tested?
This requires a second review before merge
Please add an entry to the change log at …
…t file to fit on 80 columns
Thanks for your feedback, @jnothman. I just updated …
I also asked: Can that new logic be tested?
Hi @jnothman, I just added the tests for the new logic. In short: …

Hi @jnothman, …
This needs a review by someone else still.
doc/whats_new/v0.20.rst
Outdated
@@ -26,6 +26,8 @@ random sampling procedures.
- :class:`linear_model.OrthogonalMatchingPursuit` (bug fix)
- :class:`metrics.roc_auc_score` (bug fix)
- :class:`metrics.roc_curve` (bug fix)
- :class:`mixture.BayesianGaussianMixture` (bug fix)
Remind me: does this PR actually change the prediction? If it's only affecting the attribute value, I think we should leave it out of here.
Yes, it may change the prediction in some very rare cases. Consider:

```python
gm = GaussianMixture(warm_start=True, max_iter=10)
gm.fit(X1)
gm.fit(X2)
```

If `X2` is different from `X1`, it does not make sense to use the previous value of `lower_bound_` as a starting point. The previous implementation would do that, and therefore it might wrongly detect convergence after the first iteration (this is unlikely, but possible, in particular if `X2` is very similar to `X1`, or if `tol` is large). This PR fixes this. Thus, the fixed algorithm might converge to a better solution and produce different (better) predictions.
You could consider these as separate issues: (1) wrong `lower_bound_` value, and (2) wrong convergence detection logic. However, they're both caused by the same few lines of code, so I fixed them both in this one PR.
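To make the failure mode concrete, here is a hypothetical reproduction sketch (the datasets and parameters are made up; a very large `tol` is used to force the effect, as discussed further down this thread):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
X1 = rng.randn(200, 2)
X2 = rng.randn(200, 2) + 10  # a genuinely different dataset

gm = GaussianMixture(n_components=2, warm_start=True, tol=np.inf)
gm.fit(X1)
# Before the fix, this second fit compared its first lower bound against
# the lower_bound_ left over from fit(X1), so it could declare convergence
# after a single iteration on the new data.
gm.fit(X2)
```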
Thanks for the explanation
doc/whats_new/v0.20.rst
Outdated
@@ -212,8 +214,8 @@ Model evaluation and meta-estimators
:issue:`9304` by :user:`Breno Freitas <brenolf>`.

- Add `return_estimator` parameter in :func:`model_selection.cross_validate` to
return estimators fitted on each split. :issue:`9686` by :user:`Aurélien Bellet
<bellet>`.
return estimators fitted on each split. :issue:`9686` by
Argh! Please revert all changes unrelated to the present fix!
Please do not change unrelated things. It makes your contribution harder to review and may introduce merge conflicts to other pull requests.
Okay, sorry about that, makes sense (FYI, I was trying to make the file use 80 characters per line).
gmm.fit(X)
if gmm.converged_:
    break
assert_true(gmm.converged_)
Minor: since moving to pytest, we're trying to avoid such assert_* functions. Use a bare assert instead.
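Illustratively, the requested change is just (hypothetical before/after):

```python
# Before: helper from sklearn.utils.testing
# assert_true(gmm.converged_)

# After: a bare pytest-style assertion
assert gmm.converged_
```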
Okay, fixing this now.
sklearn/mixture/base.py
Outdated
which the model has the largest likelihood or lower bound. Within each
trial, the method iterates between E-step and M-step for `max_iter`
times until the change of likelihood or lower bound is less than
`tol`, otherwise, a `ConvergenceWarning` is raised.
If `warm_start` is `True`, then `n_init` is ignored and a single
Should be double backticks everywhere...
Oh right, just fixed this, thanks.
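For context, scikit-learn docstrings are reStructuredText, where inline code needs double backticks (single backticks denote interpreted text and usually render as italics). A hypothetical fragment in the corrected style:

```python
def fit(self, X, y=None):
    """Estimate model parameters with the EM algorithm.

    The method iterates between E-step and M-step for ``max_iter``
    times until the change of likelihood or lower bound is less
    than ``tol``; otherwise, a ``ConvergenceWarning`` is raised.
    """
```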
# lower_bound_ is very close to the lower_bound_ after the previous call
# to the fit method.
# Unlikely, but possible and problematic, so we might as well avoid it.
rng = np.random.RandomState(0)
Sorry I'm being slow. I don't see where this case is handled and I don't understand the test. Why do we always reset with max_iter > 1 now?
No worries, let me explain. The scenario I'm trying to avoid, which could happen today (without this PR), is: you use `warm_start=True`, and you run `gm.fit(X1)` followed by `gm.fit(X2)`, where `X1` and `X2` are different datasets, and if you are unlucky the first iteration of `gm.fit(X2)` happens to compute a lower bound very close to the final lower bound of `gm.fit(X1)`, so the algorithm thinks it has converged when in fact it should have continued to iterate. Sure, this is quite unlikely with the default `tol` and if `X1` and `X2` are very different, but it might be dangerously likely if `X1` and `X2` are very similar, but not identical (or if `tol` is high).

Since it is hard to find two datasets `X1` and `X2` where this scenario occurs, I test this scenario by setting `tol` to infinity.

This case is handled on line 217 of `base.py`: we start the iterations with `lower_bound = (-np.infty if do_init or self.max_iter > 1 else self.lower_bound_)`. So we only continue from the final lower bound of the last call to `fit()` if `warm_start` is `True` and `max_iter == 1`. The assumption is that people who use `warm_start=True` and `max_iter=1` are certainly doing this to manually run the training loop themselves on a single dataset, but if they are using `max_iter > 1`, it is unclear whether they are running consecutive calls to `fit()` on the same dataset or not, so we should err on the safe side.

So in short there are two issues in the current implementation: (1) if `n_init > 1`, the lower bound is the one from the last initialization, not from the best initialization, and (2) there is a risk of false convergence if `warm_start` is `True`. Since they are both due to the same few lines of code, I fixed them both in this one PR.

Hope this helps.
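For readers who want the full picture, here is a simplified sketch of the `fit()` loop with the behavior described above (an illustration only, not the exact scikit-learn source; the private helpers mirror `BaseMixture`'s structure):

```python
import numpy as np
from sklearn.utils import check_random_state

def fit_outline(model, X):
    # Simplified outline of BaseMixture.fit() with the changes under discussion.
    do_init = not (model.warm_start and hasattr(model, 'converged_'))
    n_init = model.n_init if do_init else 1
    max_lower_bound = -np.inf
    rng = check_random_state(model.random_state)

    for init in range(n_init):
        if do_init:
            model._initialize_parameters(X, rng)
        # Only reuse the previous fit's bound when warm-starting with
        # max_iter == 1; otherwise start from -infinity.
        lower_bound = (-np.inf if do_init or model.max_iter > 1
                       else model.lower_bound_)

        for n_iter in range(1, model.max_iter + 1):
            prev_lower_bound = lower_bound
            log_prob_norm, log_resp = model._e_step(X)
            model._m_step(X, log_resp)
            lower_bound = model._compute_lower_bound(log_resp, log_prob_norm)
            if abs(lower_bound - prev_lower_bound) < model.tol:
                model.converged_ = True
                break

        # Remember the best initialization, not just the last one.
        if lower_bound > max_lower_bound:
            max_lower_bound = lower_bound
            best_params = model._get_parameters()
            best_n_iter = n_iter

    model._set_parameters(best_params)
    model.n_iter_ = best_n_iter
    model.lower_bound_ = max_lower_bound  # the core of this PR's fix
    return model
```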
I've not got a deep understanding yet, but I'm not sure we should be focusing on users changing datasets between warm_start fits (which I consider an abnormal use) or making implicit assumptions about the meaning of `max_iter=1` …
@jnothman, thanks for your feedback. I did not know that switching datasets between …
I see what you mean now. That's a good use. Let's look again...
I think the only concerning part here is that `max_iter` controls multiple things now. If it's not hard to split that from this PR, it would help us ensure that at least the bug is fixed for release.
Sure @jnothman, it shouldn't be too hard; I'll split the PR (probably this weekend).
Hi @jnothman, I updated this PR to keep only the fix for the original bug, i.e., the …
Thanks. LGTM.
Hi there @amueller, this PR needs a second review, whenever you have the chance.
I resolved the conflicts. I will merge once the tests pass.
All tests pass aside from AppVeyor, which is lagging behind as an effect of the sprint. Merging!
Thank you!!!
Thanks to all the reviewers! 👍
Reference Issues/PRs
Fixes #10869
What does this implement/fix? Explain your changes.
Just set the `lower_bound_` to be equal to the `max_lower_bound` at the end of the loop over the initializations (at the end of `BaseMixture.fit()`).

Also fix the `test_init()` function that was supposed to catch this bug. I do this by looping over multiple random states rather than just trying one (which had a 50% chance of wrongly succeeding).

Any other comments?
Thanks for making such a great library! :)