[MRG+1] Bayesian Gaussian Mixture (Integration of GSoC2015 -- second step) #6651

Merged
merged 33 commits into from Aug 30, 2016

9 participants
@tguillemot
Contributor

tguillemot commented Apr 12, 2016

This PR is the second part of the GSoC integration. It builds directly on the work in #6407.
Here I propose to integrate the Bayesian Gaussian Mixture:

  • Check the code and formulas
  • Add the Bayesian Gaussian class
  • Add the docstring
  • Add the tests
  • Deprecation of the old class
  • Remove the mixtures with small weights during the process
  • Change the doc
  • Create some examples

This PR is based on #6407, so it is better to review only the files that refer to the BayesianGaussianMixture class.

# XXX @xuewei4d I think you forgot n_component in your code ?
temp1 = (.5 * np.sum(temp1) +
         self.n_components * self._log_gaussian_norm_prior)

@tguillemot

tguillemot Apr 12, 2016

Author Contributor

@xuewei4d I think you forgot to multiply the log_gaussian_norm by n_components. Could you please confirm this for the 4 functions?

@xuewei4d

xuewei4d Apr 12, 2016

Contributor

I checked it. I didn't forget it; see line 791 in my PR.

@ogrisel

Member

ogrisel commented Apr 22, 2016

@tguillemot could you please rebase/squash on top of the current master to take the recent changes from #6666 into account in this PR?

@tguillemot tguillemot force-pushed the tguillemot:GSoC-BayesianMixture branch 2 times, most recently from 352b51a to 427650b Apr 22, 2016

@tguillemot tguillemot force-pushed the tguillemot:GSoC-BayesianMixture branch from 427650b to 8710b60 May 19, 2016

@ogrisel

Member

ogrisel commented May 25, 2016

@tguillemot This PR needs to be updated to take the precision-based parametrization into account:

ImportError: cannot import name _check_covariance_matrix
@tguillemot

Contributor Author

tguillemot commented May 25, 2016

@ogrisel I pushed the last commit I made, but I'm working on another PR at the moment.
There are some bugs I have to investigate.

@tguillemot tguillemot force-pushed the tguillemot:GSoC-BayesianMixture branch from 2d93f16 to 8b205df Jun 5, 2016

@tguillemot

Contributor Author

tguillemot commented Jul 17, 2016

I've solved the problem with VBGMM. I have some cleaning to do, but I think it will be ready to merge next week.
@xuewei4d Can you send me the LaTeX source of your GSoC PDF with the Bishop formulas? I will have to add these formulas to sklearn. Thanks in advance.

@xuewei4d

Contributor

xuewei4d commented Jul 17, 2016

@tguillemot Sure. Can I have your email address?

@ngoix

Contributor

ngoix commented Jul 26, 2016

Is it expected that when increasing n_components, the number of components found (with non-negligible weights) can decrease, even with a large n_init? (alpha_init was fixed to 0.1)
It is due to the initialization step, right?

@tguillemot

Contributor Author

tguillemot commented Jul 27, 2016

@xuewei4d Thanks for the formula.

@ngoix The current version of this PR is not working well and has a lot of problems, so I suspect that is the cause.
Fortunately, BayesianGaussianMixture now works perfectly (after a lot of corrections).
I will push everything in a few moments.
I will try on my side, but once I have pushed, can you confirm that?

@tguillemot tguillemot force-pushed the tguillemot:GSoC-BayesianMixture branch 4 times, most recently from d486d12 to 65e3400 Jul 28, 2016

@tguillemot

Contributor Author

tguillemot commented Jul 28, 2016

@ngoix The code of the BayesianGaussianMixture is corrected now.
I need to add some tests, examples and docs before MRG.

@xuewei4d

Contributor

xuewei4d commented Jul 28, 2016

@tguillemot Can I have the updated formula pdf?

@ngoix

Contributor

ngoix commented Jul 28, 2016

It can be due to my data, but now the number of components found is always maximal (even with n_components = 100). The algorithm does not compute bic/aic scores, right?

@ngoix

Contributor

ngoix commented Jul 29, 2016

Whoops, it does not always find the maximal number of components, sorry.
However, even when the number of components found is lower than n_components, it can still vary when increasing n_components (even with alpha_init fixed).
How much does the number of components found depend on n_init?

@tguillemot

Contributor Author

tguillemot commented Jul 29, 2016

@xuewei4d I haven't corrected the LaTeX formulas yet. I will put everything on scikit-learn once that is done.

@ngoix This method is an EM algorithm and converges to a local minimum. If the init is not good, you will never reach the global minimum.
As the init is currently done with kmeans, if the kmeans result is more or less the same for each of the n_init initializations, the solution will be the same every time.
I have added another init option called 'test' (I will remove it in a few moments); maybe you will get different results if you use it.
Can you do a notebook to show me your data and results ?
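
For readers who want to reproduce this kind of check, here is a minimal sketch of how the number of components with non-negligible weight could be counted for several values of n_components. It is not taken from this PR: the helper name `count_active_components`, the synthetic data, and the 1e-2 weight threshold are all hypothetical, and it assumes the released `weight_concentration_prior` parameter plays the role of the `alpha_init` mentioned above.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture


def count_active_components(X, n_components, n_init=1, seed=0,
                            weight_threshold=1e-2):
    """Fit the model and count components whose weight exceeds a threshold.

    Hypothetical helper; the threshold is an arbitrary illustrative choice.
    """
    model = BayesianGaussianMixture(
        n_components=n_components,
        weight_concentration_prior=0.1,  # assumed counterpart of alpha_init
        init_params="kmeans",            # the init discussed in this thread
        n_init=n_init,
        max_iter=500,
        random_state=seed,
    )
    model.fit(X)
    return int(np.sum(model.weights_ > weight_threshold))


# Synthetic data: three well-separated 2D blobs (illustrative only).
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(200, 2) + offset
               for offset in ([0, 0], [6, 6], [-6, 6])])

for k in (3, 5, 10):
    print(k, count_active_components(X, k))
```

Changing `seed` or `n_init` in such a script is one way to see how much the count depends on the initialization rather than on the data.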

@tguillemot

Contributor Author

tguillemot commented Aug 3, 2016

@tguillemot tguillemot changed the title from "[WIP] Bayesian Gaussian Mixture (Integration of GSoC2015 -- second step)" to "[MRG] Bayesian Gaussian Mixture (Integration of GSoC2015 -- second step)" Aug 3, 2016

@tguillemot

Contributor Author

tguillemot commented Aug 3, 2016

@agramfort @amueller @ogrisel BayesianGaussianMixture is mergeable.
Nevertheless, the review will be easier once #7123 and #7124 are merged.

  random_state = check_random_state(self.random_state)

  if self.init_params == 'kmeans':
      resp = np.zeros((n_samples, self.n_components))
      label = cluster.KMeans(n_clusters=self.n_components, n_init=1,
-                            random_state=random_state).fit(X).labels_
+                            random_state=0).fit(X).labels_

@agramfort

agramfort Aug 4, 2016

Member

0 -> random_state
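
To illustrate why the reviewer asks for the validated `random_state` rather than a hard-coded `0`, here is a small standalone sketch of a k-means based initialization. It is an assumption-laden illustration, not the code of this PR: the function name `kmeans_responsibilities` is hypothetical, and the one-hot fill of `resp` is assumed from context since the snippet above is cut before that step.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.utils import check_random_state


def kmeans_responsibilities(X, n_components, random_state=None):
    """One-hot responsibilities from a single k-means run (hypothetical helper)."""
    # Turn None, an int seed, or a RandomState instance into a RandomState.
    rng = check_random_state(random_state)
    labels = KMeans(n_clusters=n_components, n_init=1,
                    random_state=rng).fit(X).labels_
    n_samples = X.shape[0]
    resp = np.zeros((n_samples, n_components))
    # Assumed step: each sample fully belongs to its k-means cluster.
    resp[np.arange(n_samples), labels] = 1
    return resp


X_demo = np.random.RandomState(0).randn(60, 2)
resp_a = kmeans_responsibilities(X_demo, 3, random_state=42)
resp_b = kmeans_responsibilities(X_demo, 3, random_state=42)
```

With the seed forwarded, the caller controls reproducibility (same seed, same initialization) and can still get different initializations across runs; with a hard-coded `0`, every run would start from the same k-means solution regardless of the estimator's `random_state`.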

Parameters
----------
n_components: int, default to 1.

@agramfort

agramfort Aug 4, 2016

Member

default -> defaults

@@ -264,6 +265,8 @@ def _m_step(self, X, log_resp):
X : array-like, shape (n_samples, n_features)
log_resp : array-like, shape (n_samples, n_components)
Logarithm of the posterior probabilities (or responsibilities) of
the point of X.

@ogrisel

ogrisel Aug 29, 2016

Member

of each sample in X.

@TomDLT TomDLT added this to the 0.18 milestone Aug 29, 2016

@tguillemot tguillemot force-pushed the tguillemot:GSoC-BayesianMixture branch from 4d5de40 to 9c7ca50 Aug 30, 2016

@tguillemot

Contributor Author

tguillemot commented Aug 30, 2016

@ogrisel Would you prefer to wait until #7284 is ready to merge before merging this PR?

@ogrisel

Member

ogrisel commented Aug 30, 2016

Let's merge now. Thanks for all your efforts @tguillemot!

@ogrisel ogrisel merged commit f0862f7 into scikit-learn:master Aug 30, 2016

@ogrisel

Member

ogrisel commented Aug 30, 2016

And also thank you again @xuewei4d for the initial code refactoring and maths derivations.

@xuewei4d

Contributor

xuewei4d commented Aug 30, 2016

Thanks @tguillemot !
I will take a look at the math part, once I have time, @ogrisel

@jnothman

Member

jnothman commented Aug 31, 2016

Hurrah!!

@tguillemot

Contributor Author

tguillemot commented Aug 31, 2016

Hurrah !!!!!!!! Thanks everyone !!!

@raghavrv

Member

raghavrv commented Aug 31, 2016

yay! 🍻

@amueller

Member

amueller commented Aug 31, 2016

awesome :) Thanks everyone!

:class:`BayesianGaussianMixture`. The new class solves the computational
problems of the old class and computes the Variational Bayesian Gaussian
mixture faster than before.
Ref :ref:`b` for more information.

@amueller

amueller Sep 7, 2016

Member

@tguillemot what's b supposed to reference? It's a dead link.

@tguillemot

tguillemot Sep 7, 2016

Author Contributor

This will be fixed by #7295.

TomDLT added a commit to TomDLT/scikit-learn that referenced this pull request Oct 3, 2016

[MRG+1] Bayesian Gaussian Mixture (Integration of GSoC2015 -- second …
…step) (scikit-learn#6651)

* Add the new BayesianGaussianMixture class.
Add the test file for the BayesianGaussianMixture.

* Add the use of the cholesky decomposition of the precision matrix.

* Fix some bugs.

* Modification of GaussianMixture class.

The purpose here is to prepare the integration of BayesianGaussianMixture.

* Fix comments.

* Modification of the Docstring.

* Add license and author.

* Fix pb typo of eq 10.64 and 10.62.

* Correct VBGMM bugs.

* Fix full version.

* Fix the precision normalisation pb.

* Fix all cov_type algo for BayesianGaussianMixture.

* Optimisation of spherical and diag computation.

* Code simplification.

* Check the Gaussian Mixture tests are ok.

* Add test.

* Add new tests for BayesianGaussianMixture and GaussianMixture.

* Add the bayesian_gaussian_example and the doc.

* Fix comments.

* Fix review comments and add license and author.

* Fix test compare covar type.

* Fix reviews.

* Fix tests.

* Fix review comments.

* Correct reviews.

* Fix travis pb.

* Fix circleci pb.

* Fix review comments.

* Fix typo.

* Fix comments.

Add reg_covar and what's new.

* Fix comments.

* Fix comments.

* [ci skip] Correct legend.

bmanohar16 added a commit to bmanohar16/scikit-learn that referenced this pull request Jul 20, 2017

Change label vbgmm to bgmm
Previously modified with PR scikit-learn#6651

jnothman added a commit that referenced this pull request Jul 30, 2017

[MRG + 1] DOC Fix Sphinx errors (#9420)
* Fix Rouseeuw1984 broken link

* Change label vbgmm to bgmm
Previously modified with PR #6651

* Change tag name
Old refers to new tag added with PR #7388

* Remove prefix underscore to match tag

* Realign to fit 80 chars

* Link to metrics.rst.
pairwise metrics yet to be documented

* Remove tag as LSHForest is deprecated

* Remove all references to randomized_l1 and sphx_glr_auto_examples_linear_model_plot_sparse_recovery.py.
It is deprecated.

* Fix few Sphinx warnings

* Realign to 80 chars

* Changes based on PR review

* Remove unused ref in calibration

* Fix link ref in covariance.rst

* Fix linking issues

* Differentiate Rouseeuw1999 tag within file.

* Change all duplicate Rouseeuw1999 tags

* Remove numbers from tag Rousseeuw

jnothman added a commit to jnothman/scikit-learn that referenced this pull request Aug 6, 2017

[MRG + 1] DOC Fix Sphinx errors (scikit-learn#9420)

dmohns added a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017

[MRG + 1] DOC Fix Sphinx errors (scikit-learn#9420)

paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017

[MRG+1] Bayesian Gaussian Mixture (Integration of GSoC2015 -- second …
…step) (scikit-learn#6651)

paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017

[MRG + 1] DOC Fix Sphinx errors (scikit-learn#9420)

AishwaryaRK added a commit to AishwaryaRK/scikit-learn that referenced this pull request Aug 29, 2017

[MRG + 1] DOC Fix Sphinx errors (scikit-learn#9420)

maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017

[MRG + 1] DOC Fix Sphinx errors (scikit-learn#9420)

jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017

[MRG + 1] DOC Fix Sphinx errors (scikit-learn#9420)

MLopez-Ibanez pushed a commit to MLopez-Ibanez/scikit-learn that referenced this pull request Feb 9, 2019

[MRG + 1] DOC Fix Sphinx errors (scikit-learn#9420)