[MRG+2] GSoC Final : Dirichlet Gaussian Mixture #7295

Closed
wants to merge 12 commits into base: master

6 participants
@tguillemot
Contributor

tguillemot commented Aug 30, 2016

This closes #7377, closes #7115, closes #2473, closes #2454, closes #1764 and closes #1637.

This is the last PR needed to completely remove the old GMM classes.

Here you'll find the DirichletGaussianMixture class with its documentation, examples and tests.

It will be easier to review once #6651 is merged (there will also be no conflicts).

I have removed the example plot_gmm_sin.py because it wasn't showing the properties of DPGMM correctly for me (it modifies the covariance_type throughout the experiment to obtain better results).

Instead, I prefer to introduce an example similar to the one I introduced for the BayesianGaussianMixture.
(figure: dpgmm)
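For readers coming from the released library: the DirichletGaussianMixture class proposed in this PR was ultimately merged as BayesianGaussianMixture with a Dirichlet process weight prior, so that is the name used in the hedged sketch below (data and settings are illustrative). It shows the behavior described above: give the model a loose upper bound on the number of components and let unused components fall to near-zero weight.

    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture

    rng = np.random.RandomState(0)
    # Two well-separated clusters; the model only gets a loose upper bound of 5.
    X = np.vstack([rng.randn(100, 2), rng.randn(100, 2) + 5])

    dpgmm = BayesianGaussianMixture(
        n_components=5,
        weight_concentration_prior_type="dirichlet_process",
        random_state=0,
    ).fit(X)

    # Components with near-zero weights are effectively unused.
    print(np.round(dpgmm.weights_, 3))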

@tguillemot tguillemot changed the title from [WIP] GSoC Final : Dirichlet Gaussian Mixture to [MRG] GSoC Final : Dirichlet Gaussian Mixture Aug 31, 2016

The examples above compare Gaussian mixtures models with fixed number of
components, to the Dirichlet Gaussian Mixtures models. **On the left** the GMM


@TomDLT

TomDLT Aug 31, 2016

Member

GMM -> Gaussian mixtures

@@ -226,34 +227,41 @@ concentration parameter.
:target: ../auto_examples/mixture/plot_gmm.html
:scale: 48%
.. |plot_gmm_sin| image:: ../auto_examples/mixture/images/sphx_glr_plot_gmm_sin_001.png
:target: ../auto_examples/mixture/plot_gmm_sin.html
.. |plot_gmm_sin| image:: ../auto_examples/mixture/images/sphx_glr_plot_dirichlet_process_mixture_001.png


@TomDLT

TomDLT Aug 31, 2016

Member

plot_gmm_sin does not exist anymore

components, to the Dirichlet Gaussian Mixtures models. **On the left** the GMM
is fitted with 5 components on a dataset composed of 2 clusters. We can see that
the Dirichlet Gaussian Mixtures is able to limit itself to only 2 components
whereas the GMM fits the data fit too many components. Note that with very


@TomDLT

TomDLT Aug 31, 2016

Member

GMM/DPGMM here and also above in the rest of the file

Plot the resulting ellipsoids of a mixture of three Gaussians with
a Dirichlet Process Gaussian Mixture for three different values of the prior
the beta concentration.


@TomDLT

TomDLT Aug 31, 2016

Member

the beta concentration prior?

mean_precision_prior=None, mean_prior=None,
degrees_of_freedom_prior=None, covariance_prior=None,
random_state=None, warm_start=False, verbose=0,
verbose_interval=20):


@TomDLT

TomDLT Aug 31, 2016

Member

verbose_interval not in the docstring

X : array-like, shape (n_samples, n_features)
log-resp : array, shape (n_samples, n_components)


@TomDLT

TomDLT Aug 31, 2016

Member

as in BayesianGaussianMixture:

        log_resp : array, shape (n_samples, n_components)
            Logarithm of the posterior probabilities (or responsibilities) of
            the point of each sample in X.

        log_prob_norm : float
            Logarithm of the probability of each sample in X.
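As a hedged aside, the public counterparts of these private E-step outputs are predict_proba (the exponentiated responsibilities) and score_samples (per-sample log-probabilities); a minimal sketch with illustrative data:

    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture

    rng = np.random.RandomState(0)
    X = np.vstack([rng.randn(50, 2), rng.randn(50, 2) + 3])
    model = BayesianGaussianMixture(n_components=2, random_state=0).fit(X)

    resp = model.predict_proba(X)      # responsibilities, shape (n_samples, n_components)
    log_prob = model.score_samples(X)  # log-probability of each sample, shape (n_samples,)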
# Parameters
random_state = 2
n_components, n_features = 3, 2
colors = np.array(['mediumseagreen', 'royalblue', 'r', 'gold',


@TomDLT

TomDLT Aug 31, 2016

Member

You modified the colors. Are you sure it's colorblind-compatible?
cf #5576


@TomDLT

TomDLT Aug 31, 2016

Member

also in BayesianGaussianMixture example

@tguillemot


Contributor

tguillemot commented Aug 31, 2016

@TomDLT Thanks for this first round of review.

@ogrisel


Member

ogrisel commented Aug 31, 2016

I am in favor of keeping the "sin" example (adapted to use the new class). While I agree that it is a weird and artificial dataset, I also appreciate the facts that:

  • it's bad to break sacred links on the Holy Web,
  • it's good to show that models can in practice still be used (and be somewhat useful) even if the data violates the assumptions of the underlying generative model.
@tguillemot


Contributor

tguillemot commented Sep 1, 2016

Ok, fair enough. So I've modified the example to plot something that makes sense (changing only the beta concentration prior and not the covariance_type).
(figure: gmm_sin)

This class doesn't require the user to choose the number of
components, and at the expense of extra computational time the user
only needs to specify a loose upper bound on this number and a
concentration parameter.
.. |plot_gmm| image:: ../auto_examples/mixture/images/sphx_glr_plot_gmm_001.png
:target: ../auto_examples/mixture/plot_gmm.html
:scale: 48%
:scale: 31%


@TomDLT

TomDLT Sep 1, 2016

Member

I guess the rendered page is not as intended.


@tguillemot

tguillemot Sep 1, 2016

Contributor

I was wondering what the result would be, and it's ugly :).
I'll change that.

@TomDLT


Member

TomDLT commented Sep 1, 2016

This seems pretty clean to me

@tguillemot


Contributor

tguillemot commented Sep 2, 2016

This is the new doc.

@agramfort @ogrisel Do you think we can merge that for 0.18 ???

The BIC criterion can be used to select the number of components in a Gaussian
Mixture in an efficient way. In theory, it recovers the true number of
components only in the asymptotic regime (i.e. if much data is available). Note
that using a :ref:`DirichletGaussianMixture <dpgmm>` avoids the specification of


@raghavrv

raghavrv Sep 2, 2016

Member

Since this is not a class ref, could we have this as Dirichlet Gaussian Mixture?

Dirichlet Process: Infinite Gaussian mixtures
==============================================================
The :class:`DirichletGaussianMixture` object implements a variant of the


@raghavrv

raghavrv Sep 2, 2016

Member

object


@ogrisel

ogrisel Sep 2, 2016

Member

It should be:

The :class:`DirichletGaussianMixture` class implements...

and that will render to:

The DirichletGaussianMixture class implements...

which is proper English.

:align: center
:scale: 70%
Here, a classical Gaussian mixtures is fitted with 5 components on a dataset


@raghavrv

raghavrv Sep 2, 2016

Member

mixture?

This class doesn't require the user to choose the number of
components, and at the expense of extra computational time the user
only needs to specify a loose upper bound on this number and a
concentration parameter.
.. |plot_gmm| image:: ../auto_examples/mixture/images/sphx_glr_plot_gmm_001.png
The examples bellow compare Gaussian mixtures models with fixed number of


@raghavrv

raghavrv Sep 2, 2016

Member

mixtures --> mixture

.. |plot_gmm| image:: ../auto_examples/mixture/images/sphx_glr_plot_gmm_001.png
The examples bellow compare Gaussian mixtures models with fixed number of
components, to the Dirichlet Gaussian mixtures models.


@raghavrv

raghavrv Sep 2, 2016

Member

mixtures --> mixture

Here, a classical Gaussian mixtures is fitted with 5 components on a dataset
composed of 2 clusters. We can see that the Dirichlet Gaussian mixtures is able
to limit itself to only 2 components whereas the Gaussian mixtures fits the data
fit too many components. Note that with very little observations, the Dirichlet


@raghavrv

raghavrv Sep 2, 2016

Member

fit too --> with too

data.
The previous figure presents the resulting clusters computed by the Dirichlet
Gaussian mixtures for different values of `beta_concentration_prior`. The
`beta_concentration_prior` is directly linked to the number of clusters


@raghavrv

raghavrv Sep 2, 2016

Member

double backticks here too...

Gaussian mixtures for different values of `beta_concentration_prior`. The
`beta_concentration_prior` is directly linked to the number of clusters
obtained. As for the Variational Bayesian Gaussian Mixtures, small value of this
parameters will lead to some mixture components while high value leads more


@raghavrv

raghavrv Sep 2, 2016

Member

leads to


@raghavrv

raghavrv Sep 2, 2016

Member

Is this better maybe?

The ``beta_concentration_prior`` is directly proportional to the number of clusters obtained.
In Variational Bayesian Gaussian Mixture, smaller values of ``beta_concentration_prior`` lead to
fewer components and higher values lead to more components but is more stable as each
component is activated only if necessary.
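To make the suggested wording concrete, here is a hedged sketch of the effect; note that beta_concentration_prior was later renamed weight_concentration_prior in the merged API, and the data and prior values below are illustrative:

    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture

    rng = np.random.RandomState(1)
    X = np.vstack([rng.randn(150, 2), rng.randn(150, 2) + 4])

    for prior in (0.01, 1.0, 1000.0):
        model = BayesianGaussianMixture(
            n_components=6, weight_concentration_prior=prior, random_state=1,
        ).fit(X)
        # Count the components that keep a non-negligible weight after fitting.
        print(prior, np.sum(model.weights_ > 1e-2))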
Pros and cons of class :class:`DPGMM`: Dirichlet process mixture model
----------------------------------------------------------------------
Pros and cons of class :class:`DirichletGaussianMixture`: Dirichlet process mixture model


@raghavrv

raghavrv Sep 2, 2016

Member

Capital P, M, M?


@tguillemot

tguillemot Sep 2, 2016

Contributor

It's more "Pros and cons of the Dirichlet process with class :class:`DirichletGaussianMixture`"

this number needs to be provided. Note however that the DPMM is not
a formal model selection procedure, and thus provides no guarantee
on the result.
this number needs to be provided. Note however that the Dirichlet Gaussian


@raghavrv

raghavrv Sep 2, 2016

Member

Dirichlet Process Gaussian Mixture

@@ -5,7 +5,7 @@
Plot the resulting ellipsoids of a mixture of three Gaussians with
variational Bayesian Gaussian Mixture for three different values on the
prior the dirichlet concentration.
dirichlet concentration prior.
For all models, the Variationnal Bayesian Gaussian Mixture adapts its number of
mixture automatically. The parameter `dirichlet_concentration_prior` has a


@raghavrv

raghavrv Sep 2, 2016

Member

Double ticks here too..


@tguillemot

tguillemot Sep 2, 2016

Contributor

I will correct all the double ticks. :)

@@ -0,0 +1,115 @@
"""
=============================================
Dirichlet Process Mixture Beta Prior Analysis


@raghavrv

raghavrv Sep 2, 2016

Member

Gaussian Mixture

Plot the resulting ellipsoids of a mixture of three Gaussians with
a Dirichlet Process Gaussian Mixture for three different values of the
beta concentration prior.


@raghavrv

raghavrv Sep 2, 2016

Member
``beta_concentration_prior``
a Dirichlet Process Gaussian Mixture for three different values of the
beta concentration prior.
For all models, the Dirichlet Process Gaussian Mixture adapts its number of


@raghavrv

raghavrv Sep 2, 2016

Member

all the

beta concentration prior.
For all models, the Dirichlet Process Gaussian Mixture adapts its number of
mixture automatically. The parameter `beta_concentration_prior` has a


@raghavrv

raghavrv Sep 2, 2016

Member

components rather than mixtures for consistency maybe? (ambivalent)

For all models, the Dirichlet Process Gaussian Mixture adapts its number of
mixture automatically. The parameter `beta_concentration_prior` has a
direct link with the resulting number of components. Specifying a high value of
`beta_concentration_prior` leads more often to uniformly-sized mixture


@raghavrv

raghavrv Sep 2, 2016

Member

more often leads

each figure, we plot the results for three different values of the weight
concentration prior.
The ``BayesianGaussianMixture`` can adapt its number of mixture automatically.


@ogrisel

ogrisel Sep 8, 2016

Member

The BayesianGaussianMixture class


@ogrisel

ogrisel Sep 8, 2016

Member

its number of mixture components

The ``BayesianGaussianMixture`` can adapt its number of mixture automatically.
The parameter ``weight_concentration_prior`` has a direct link with the
resulting number of components. Specifying higher values more often leads to


@ogrisel

ogrisel Sep 8, 2016

Member

resulting number of components with non-zero weights.

The ``BayesianGaussianMixture`` can adapt its number of mixture automatically.
The parameter ``weight_concentration_prior`` has a direct link with the
resulting number of components. Specifying higher values more often leads to
uniformly- sized mixture components, while specifying smaller values will lead


@ogrisel

ogrisel Sep 8, 2016

Member

Please remove the reference to uniformly distributed components which is not true for the DP prior.

Add a note at the end of the paragraph stating that the DD prior will favor more uniformly weighted components.

This class allows to infer an approximate posterior distribution over the
parameters of a Gaussian mixture distribution. The effective number of
components can be inferred from the data.
This class implements two different prior type for the weight distribution:


@ogrisel

ogrisel Sep 8, 2016

Member

types of prior for the weights distribution

model with the Dirichlet Process. In practice the approximate the Dirichlet
Process inference algorithm uses a truncated distribution with a fixed
maximum number of components (called the Stick-breaking representation),
but almost always the number of components actually used depends on the


@ogrisel

ogrisel Sep 8, 2016

Member

Cut the sentence:

representation). The number of components actually used almost always depends on the data.

This class implements two different prior type for the weight distribution:
a finite mixture model with Dirichlet distribution and an infinite mixture
model with the Dirichlet Process. In practice the approximate the Dirichlet
Process inference algorithm uses a truncated distribution with a fixed


@ogrisel

ogrisel Sep 8, 2016

Member

In practice Dirichlet Process inference algorithm is approximated and uses a truncated distribution with a fixed...
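Both prior types discussed here exist in the merged API as values of weight_concentration_prior_type. A hedged sketch (illustrative data) of how the two priors distribute the mixture weights:

    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture

    rng = np.random.RandomState(2)
    X = np.vstack([rng.randn(200, 2), rng.randn(200, 2) + 6])

    for prior_type in ("dirichlet_process", "dirichlet_distribution"):
        model = BayesianGaussianMixture(
            n_components=5,
            weight_concentration_prior_type=prior_type,
            random_state=2,
        ).fit(X)
        # The Dirichlet distribution prior tends to spread weight more uniformly;
        # the Dirichlet process prior concentrates it on the needed components.
        print(prior_type, np.round(model.weights_, 2))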

@ogrisel


Member

ogrisel commented Sep 8, 2016

CI is broken because of a broken import and a PEP8 issue.

Also, could you please try to make the following test run in less than 1s by tweaking the training data size or the hyperparameters of the model?

sklearn.mixture.tests.test_bayesian_mixture.test_monotonic_likelihood: 4.5968s
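One plausible way to get such a test under 1s, sketched on the assumption that it checks the lower bound is non-decreasing across warm-started fits; the names, sizes and iteration counts below are illustrative, not the actual test code:

    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture

    def check_monotonic_lower_bound():
        rng = np.random.RandomState(0)
        X = rng.randn(100, 2)  # a small training set keeps each fit cheap
        model = BayesianGaussianMixture(n_components=2, max_iter=1,
                                        warm_start=True, random_state=0)
        prev = -np.inf
        for _ in range(20):  # a handful of warm-started single steps
            model.fit(X)
            assert model.lower_bound_ >= prev - 1e-7
            prev = model.lower_bound_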
@tguillemot


Contributor

tguillemot commented Sep 9, 2016

Thanks @ogrisel.
I've changed the doc and added the missing tests.

:class:`GaussianMixture` and :class:`BayesianGaussianMixture` to fit a
sine wave.
* See :ref:`sphx_glr_auto_example_mixture_plot_concentration_prior.py`


@ogrisel

ogrisel Sep 9, 2016

Member

This reference is broken.

@tguillemot


Contributor

tguillemot commented Sep 9, 2016

I have a problem with a ghost file in travis. @ogrisel can you remove the cache?

@@ -111,7 +118,14 @@ class BayesianGaussianMixture(BaseMixture):
'kmeans' : responsibilities are initialized using kmeans.
'random' : responsibilities are initialized randomly.
dirichlet_concentration_prior : float | None, optional.
weight_concentration_prior_type : {'dirichlet_process',
'dirichlet_distribution'}, defaults to 'full'.


@ogrisel

ogrisel Sep 9, 2016

Member

Two problems:

  • This new line breaks the sphinx rendering in classes.rst. I am not sure how this should be fixed.
  • The default value is 'dirichlet_process', not 'full'.


@tguillemot

tguillemot Sep 9, 2016

Contributor

I don't know how to fix the first problem, because if I put it on a single line pep8 will not be happy.

algorithm is approximated and uses a truncated distribution with a fixed
maximum number of components (called the Stick-breaking representation).
The number of components actually used almost always depends on the data.
Read more in the :ref:`User Guide <bgmm>`.


@ogrisel

ogrisel Sep 9, 2016

Member

Please insert the following marker:

.. versionadded:: 0.18
   *BayesianGaussianMixture*.

A similar marker should be inserted for the GaussianMixture class.

# mean_precision_prior= 0.8 to minimize the influence of the prior
estimators = [
(r"Bayesian Gaussian Mixture for $\gamma_0=$", BayesianGaussianMixture(


@ogrisel

ogrisel Sep 9, 2016

Member

Please change the title to "Finite mixture with a Dirichlet distribution prior and $\gamma_0=$" (maybe with a line break \n after "prior").

n_components=2 * n_components, reg_covar=0, init_params='random',
max_iter=1500, mean_precision_prior=.8,
random_state=random_state), [0.001, 1, 1000]),
(r"Dirichlet Process Mixture for $\gamma_0=$", BayesianGaussianMixture(


@ogrisel

ogrisel Sep 9, 2016

Member

Please change the title to "Infinite mixture with a Dirichlet process prior and $\gamma_0=$" (maybe with a line break \n after "prior").

automatically selects the correct number of components. Contrary to the
classical variation Bayesian model using a Dirichlet distribution, it activates
a component only if it is necessary (resulting in a better selection of the
mixtures) and will favor more uniformly weighted components.


@ogrisel

ogrisel Sep 9, 2016

Member

This is confusing, I would instead write:

The Dirichlet process prior allows defining an infinite number of components and automatically selects the correct number of components: it activates a component only if it is necessary.

On the contrary, the classical finite mixture model with a Dirichlet distribution prior will favor more uniformly weighted components and therefore tends to divide natural clusters into unnecessary sub-components.

@ogrisel


Member

ogrisel commented Sep 9, 2016

I think the red travis builds were caused by old cached versions of deleted python modules and test files. I manually deleted the cache for this PR in travis and relaunched the build to check if that fixes it.

@ogrisel


Member

ogrisel commented Sep 9, 2016

pyflakes has caught an unused variable:

./sklearn/mixture/tests/test_bayesian_mixture.py:403:19: F841 local variable 'n_features' is assigned to but never used
    n_components, n_features = 2 * rand_data.n_components, 2
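A minimal sketch of the fix: bind only the value the test uses (the first line below is a hypothetical stand-in for the test fixture):

    rand_data_n_components = 2  # hypothetical stand-in for ``rand_data.n_components``
    n_components = 2 * rand_data_n_components  # no unused ``n_features`` binding left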
process prior, however, show that the model can either learn a global structure
for the data (small ``weight_concentration_prior``) or easily interpolate to
finding relevant local structure (large ``weight_concentration_prior``), never
falling into the problems shown by the ``GaussianMixture`` class.


@ogrisel

ogrisel Sep 9, 2016

Member

I don't agree with this analysis. Let me suggest the following instead:

This example demonstrates the behavior of Gaussian mixture models on data that was not generated by a mixture of Gaussian random variables. The dataset is formed by 100 points loosely spaced following a noisy sine curve. There is therefore no ground truth value for the number of Gaussian components.

The first model is a classical Gaussian Mixture Model with 10 components fit with the Expectation Maximization algorithm.

The second model is a Bayesian Gaussian Mixture Model with a Dirichlet process prior fit with variational inference. The low value of the concentration prior makes the model favor a lower number of active components. This model "decides" to focus its modeling power on the big picture of the structure of the dataset: groups of points with alternating directions modeled by non-spherical covariance matrices. Those alternating directions roughly capture the alternating nature of the original sine signal.

The third model is also a Bayesian Gaussian Mixture Model with a Dirichlet process prior, but this time the value of the concentration prior is higher, giving the model more liberty to model the finer-grained structure of the data. The result is a mixture with a larger number of active components, similar to the first model, where we arbitrarily decided to fix the number of components to 10.

Which model is best is a matter of subjective judgment: do we want to favor models that only capture the big picture, summarizing and explaining most of the structure of the data while ignoring the details, or do we prefer models that closely follow the high-density regions of the signal?

The last two panels show how we can sample from the last two models. The resulting sample distributions do not look exactly like the original data distribution. The difference primarily stems from the approximation error we made by using a model that assumes the data was generated by a finite number of Gaussian components instead of a continuous noisy sine curve.
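A hedged sketch of the setup this description refers to, with illustrative values for the dataset and the two concentration priors:

    import numpy as np
    from sklearn.mixture import GaussianMixture, BayesianGaussianMixture

    rng = np.random.RandomState(0)
    n_samples = 100
    t = np.linspace(0, 4 * np.pi, n_samples)
    # Points loosely following a noisy sine curve.
    X = np.column_stack([t, np.sin(t) + 0.1 * rng.randn(n_samples)])

    # First model: classical EM fit with a fixed number of components.
    gmm = GaussianMixture(n_components=10, random_state=0).fit(X)
    # Second and third models: Dirichlet process prior with a low and a high
    # concentration value.
    dp_low = BayesianGaussianMixture(
        n_components=10, weight_concentration_prior=1e-2, random_state=0).fit(X)
    dp_high = BayesianGaussianMixture(
        n_components=10, weight_concentration_prior=1e2, random_state=0).fit(X)

    # Sampling from a fitted model, as in the last two panels described above.
    X_new, _ = dp_high.sample(n_samples=200)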

@ogrisel


Member

ogrisel commented Sep 9, 2016

This time the travis error is for real I think:

https://travis-ci.org/scikit-learn/scikit-learn/jobs/158774349#L2336

@ogrisel


Member

ogrisel commented Sep 9, 2016

I think I am done with the review. +1 for merge once CI is green and my last comment on plot_gmm_sin.py has been taken into account. Thanks very much for bearing with me @tguillemot :)

@ogrisel


Member

ogrisel commented Sep 10, 2016

I fixed the CI failure and addressed the doc of the example in #7386. If it's green, I will merge.

@ogrisel


Member

ogrisel commented Sep 10, 2016

Merged as #7386 🍻

@ogrisel ogrisel closed this Sep 10, 2016

@tguillemot


Contributor

tguillemot commented Sep 10, 2016

@ogrisel Sorry I had to go yesterday. Thanks for taking care of that.

@tguillemot


Contributor

tguillemot commented Sep 10, 2016

Thanks everyone for your reviews and help!!!
