In Gaussian mixtures, when n_init > 1, the lower_bound_ is not always the max #10869

Closed
ageron opened this Issue Mar 25, 2018 · 1 comment

@ageron
Contributor

ageron commented Mar 25, 2018

Description

In Gaussian mixtures, when n_init is set to any value greater than 1, the lower_bound_ is not the max lower bound across all initializations, but just the lower bound of the last initialization.

The bug can be fixed by adding the following line just before return self in BaseMixture.fit():

self.lower_bound_ = max_lower_bound

The test that should have caught this bug is test_init() in mixture/tests/test_gaussian_mixture.py, but it runs only a single trial, so it had a 50% chance of missing the issue. It should be updated to try many random states.
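To make the failure mode and the proposed fix concrete, here is a minimal toy sketch of the "keep the best of n_init restarts" pattern. It is not scikit-learn's code: `ToyMixture`, `run_one_init`, `failing_seeds` and the `apply_fix` flag are hypothetical names, and each "EM run" is replaced by a random score so the example stays self-contained. It only assumes that the attribute is overwritten on every restart, as described above.

```python
import random


def run_one_init(rng):
    """Hypothetical stand-in for one EM run: returns (params, lower_bound)."""
    lower_bound = rng.random()
    params = {"score": lower_bound}
    return params, lower_bound


class ToyMixture:
    """Toy model that keeps the best of n_init random restarts."""

    def __init__(self, n_init=1, random_state=0, apply_fix=True):
        self.n_init = n_init
        self.random_state = random_state
        self.apply_fix = apply_fix

    def fit(self):
        rng = random.Random(self.random_state)
        max_lower_bound = float("-inf")
        best_params = None
        for _ in range(self.n_init):
            params, lower_bound = run_one_init(rng)
            # Buggy pattern: the public attribute is overwritten on every
            # restart, so it ends up holding the value of the last one.
            self.lower_bound_ = lower_bound
            if lower_bound > max_lower_bound:
                max_lower_bound = lower_bound
                best_params = params
        self.params_ = best_params
        if self.apply_fix:
            # Proposed fix: report the bound that matches the kept parameters.
            self.lower_bound_ = max_lower_bound
        return self


def failing_seeds(apply_fix, n_seeds=100):
    """Seeds for which 10 restarts report a worse bound than 1 restart."""
    bad = []
    for seed in range(n_seeds):
        m1 = ToyMixture(n_init=1, random_state=seed, apply_fix=apply_fix).fit()
        m10 = ToyMixture(n_init=10, random_state=seed, apply_fix=apply_fix).fit()
        if m10.lower_bound_ < m1.lower_bound_:
            bad.append(seed)
    return bad
```

Calling `failing_seeds(apply_fix=False)` lists the seeds where the reported bound regresses, while `failing_seeds(apply_fix=True)` should come back empty. Checking a single seed can easily miss the problem, which is why looping over many random states makes for a more reliable test.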

Steps/Code to Reproduce

import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.rand(1000, 10)
for random_state in range(100):
    gm1 = GaussianMixture(n_components=2, n_init=1, random_state=random_state).fit(X)
    gm2 = GaussianMixture(n_components=2, n_init=10, random_state=random_state).fit(X)
    assert gm2.lower_bound_ >= gm1.lower_bound_, random_state

Expected Results

No error.

Actual Results

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
AssertionError: 4

Versions

>>> import platform; print(platform.platform())
Darwin-17.4.0-x86_64-i386-64bit
>>> import sys; print("Python", sys.version)
Python 3.6.4 (default, Dec 21 2017, 20:33:21)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.38)]
>>> import numpy; print("NumPy", numpy.__version__)
NumPy 1.14.2
>>> import scipy; print("SciPy", scipy.__version__)
SciPy 1.0.0
>>> import sklearn; print("Scikit-Learn", sklearn.__version__)
Scikit-Learn 0.19.1
@jnothman

Member

jnothman commented Mar 26, 2018

Thanks for the report and the analysis!

GaelVaroquaux added a commit that referenced this issue Jul 16, 2018

[MRG+1] Fix lower_bound_ not equal to max lower bound in mixture models when n_init > 1 (#10870)

* Set lower_bound_ to max lower bound at the end of BaseMixture.fit(), fixes #10869

* Use a local lower_bound variable rather than self.lower_bound_ during training, in BaseMixture.fit()

* Remove extra empty line

* Update documentation and reduce test_init() iterations from 100 to 25

* Update whats_new/v0.20.rst to add mention of issue 10869, and reformat file to fit on 80 columns

* Remove extra line in whats_new/v0.20.rst

* Add tests for convergence detection in Gaussian mixtures when warm_start=True

* Remove unnecessary catch_exception blocks

* Fix tests since n_iter_ was recently fixed to be increased by 1

* Revert changes unrelated to PR 10870 in doc/whats_new/v0.20.rst

* Replace assert_* with plain asserts because of the move to pytest

* Remove comment in whats_new/v0.20.rst that will be added upon merging

* Replace single backticks with double backticks in doc string

* Limit is 79 characters per line, not 80.

* Remove the false convergence fix to treat it in a separate PR