dpgmm sample not working #1637

Closed
esheldon opened this Issue Jan 29, 2013 · 15 comments

@esheldon

Hi Guys -

Trying out DPGMM to see if I can get a more stable solution. Things look good so far except for one problem.

The sample() method does not work for DPGMM. The primary reason is that sample() assumes there is a self.covars_ attribute, but DPGMM never sets one.

There is a secondary problem, however: I tried using _get_covars() to supply the covariances, but the values it returns do not look correct, perhaps because the covariances are defined with a different convention.

best, and thanks for the good work on sklearn,
-e
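
For reference, a minimal way to reproduce this against the pre-0.18 mixture API (toy data; DPGMM was removed from scikit-learn in later releases) would be something like:

    import numpy as np
    from sklearn import mixture

    # Fit a DPGMM on toy data, then try to sample from it.
    X = np.random.randn(500, 2)
    dpgmm = mixture.DPGMM(n_components=3)
    dpgmm.fit(X)

    # sample() is inherited from GMM and reads self.covars_, but DPGMM
    # stores precisions in precs_ and never sets covars_, so this
    # raises an AttributeError.
    dpgmm.sample(10)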

@amueller

amueller (Owner) commented Jan 29, 2013

Thanks for reporting. I'll try to have a look soon (if no one else does before me).

@amueller

amueller (Owner) commented Jan 29, 2013

By the way, thanks for trying to help us with DPGMM. Maybe one good step forward would be to improve test coverage to prevent such mishaps :-/
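
A hedged sketch (not the project's actual test suite) of the kind of regression test that could catch this, against the old pre-0.18 API:

    import numpy as np
    from sklearn import mixture

    def test_dpgmm_sample():
        # Fitting and then sampling should not raise, and should
        # return an array of the requested shape.
        rng = np.random.RandomState(0)
        X = rng.randn(100, 2)
        dpgmm = mixture.DPGMM(n_components=3)
        dpgmm.fit(X)
        samples = dpgmm.sample(10)
        assert samples.shape == (10, 2)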


@amueller

amueller (Owner) commented Jan 29, 2013

Ok, so it seems that the sample method is inherited from GMM. I'm not sure that should be the case: the parameterization in which VBGMM and DPGMM store their fitted models is somewhat different from GMM's, IIRC.
It would be great if GMM and VBGMM/DPGMM had a common interface. That is not really the case now :-/

@GaelVaroquaux

GaelVaroquaux (Owner) commented Jan 30, 2013

> Ok, so it seems that the sample method is inherited from GMM. I'm not sure that should be the case: the parameterization in which VBGMM and DPGMM store their fitted models is somewhat different from GMM's, IIRC.

So the sample method should probably be rewritten in the subclass.
@esheldon do you want to have a go?
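
For illustration, a rough sketch of what such a subclass method could look like against the old API (this is not necessarily how it was eventually addressed; it assumes covariance_type='full', so that precs_ holds one precision matrix per component):

    import numpy as np

    # Hypothetical standalone version of a DPGMM sampling routine that
    # uses the precisions the model actually stores.
    def dpgmm_sample(model, n_samples=1, random_state=None):
        rng = np.random.RandomState(random_state)
        weights = model.weights_ / model.weights_.sum()
        # Pick a component for each draw, then sample from its Gaussian,
        # inverting the stored precision matrix to get a covariance.
        comps = rng.choice(len(weights), size=n_samples, p=weights)
        X = np.empty((n_samples, model.means_.shape[1]))
        for k in np.unique(comps):
            mask = comps == k
            cov = np.linalg.inv(model.precs_[k])
            X[mask] = rng.multivariate_normal(model.means_[k], cov,
                                              size=mask.sum())
        return X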


@esheldon

esheldon commented Jan 30, 2013

I am interested but I don't have much time over the next couple of weeks.

Looking at this, I think the sampling code is the easy part. We just need
to have the variances present in the object, correct?

More of an issue is that the variances produced by _get_covars() are not
correct. I played with it some more and I'm pretty sure they are
actually wrong, not just a misinterpretation on my part. I don't understand
the algorithm, so I'm not sure I could help.

-e
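
One way to check this claim is to fit a single full-covariance component to data with a known covariance and compare; a sketch against the old pre-0.18 API (parameter values are illustrative):

    import numpy as np
    from sklearn import mixture

    rng = np.random.RandomState(0)
    # Known per-feature standard deviations of 1 and 3.
    X = rng.randn(2000, 2) * np.array([1.0, 3.0])

    dpgmm = mixture.DPGMM(n_components=1, covariance_type='full', n_iter=100)
    dpgmm.fit(X)

    print(np.cov(X, rowvar=False))   # empirical covariance of the data
    print(dpgmm._get_covars()[0])    # covariance the model reports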


@amueller

amueller (Owner) commented Jan 30, 2013

This is probably the wrong function to get the covariances. VBGMM stores precisions, not covariances, IIRC.

@esheldon

esheldon commented Jan 30, 2013

It just inverts the precisions:

def _get_covars(self):
    return [pinvh(c) for c in self._get_precisions()]

-e
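
For context, pinvh computes the pseudo-inverse of a symmetric matrix; modern SciPy ships it as scipy.linalg.pinvh. A quick round-trip check on a toy precision matrix:

    import numpy as np
    from scipy.linalg import pinvh

    # Inverting a symmetric positive-definite precision matrix should
    # recover the covariance: prec times cov is (near) the identity.
    prec = np.array([[2.0, 0.5],
                     [0.5, 1.0]])
    cov = pinvh(prec)
    print(np.allclose(np.dot(prec, cov), np.eye(2)))  # True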


@amueller

amueller (Owner) commented Jan 30, 2013

Ok, I'll have to have a closer look. What gave you the impression they are wrong?

@esheldon

esheldon commented Jan 30, 2013

Looks like they are off by a factor of two, so maybe it is just definitional:

>>> import numpy
>>> from sklearn import mixture
>>> x = numpy.random.randn(1000)
>>> gmm = mixture.DPGMM(n_components=1)
>>> gmm.fit(x.reshape(1000, 1))
>>> gmm.precs_
[array([[ 0.49380025]])]
>>> gmm._get_covars()
[array([[ 2.02511035]])]

@amueller

amueller (Owner) commented Feb 20, 2013

Hm, that is a bit irritating behavior, though...

@amueller

amueller (Owner) commented Feb 20, 2013

_get_covars() just inverts precs_. I think this is an "off by two" error in VBGMM/DPGMM, and it is apparently the same for all covariance types. cc @alextp

@joelkuiper

joelkuiper commented Jan 13, 2014

Is there any word on this? Or a hint on how to fix it 😄 I'd love to get a sample from the DPGMM
@amueller @alextp

@HapeMask

HapeMask commented Jan 23, 2014

> Looks like they are off by a factor of two, so maybe it is just definitional:
>
> >>> x = numpy.random.randn(1000)
> >>> gmm = mixture.DPGMM(n_components=1)
> >>> gmm.fit(x.reshape(1000, 1))
> >>> gmm.precs_
> [array([[ 0.49380025]])]
> >>> gmm._get_covars()
> [array([[ 2.02511035]])]

Maybe I'm sorely mistaken / missing something obvious, but this appears to be correct behavior to me. Precision is inverse covariance, and the covariance returned by _get_covars() is equal to the inverse precision in your example:

>>> 1 / 2.02511035
0.49380025142827405

I don't see any off-by-two error here.
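
A hedged way to confirm this reading in general: each stored precision should be (numerically) the inverse of the corresponding reported covariance. Assuming a fitted old-API model gmm with full covariances, as in the session above:

    import numpy as np

    # Each precision times its covariance should be close to identity.
    for prec, cov in zip(gmm.precs_, gmm._get_covars()):
        assert np.allclose(np.dot(prec, cov), np.eye(cov.shape[0]))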


@amueller

amueller (Owner) commented Jan 29, 2015

I agree with @HapeMask, I think this looks ok. I fixed it in #4182.

@ogrisel

ogrisel (Owner) commented Sep 10, 2016

Closing: the new Dirichlet process GMM re-write has been merged in master. Its sample method is properly tested.


@ogrisel ogrisel closed this Sep 10, 2016
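
For readers landing here now: the rewrite is BayesianGaussianMixture (scikit-learn >= 0.18), whose sample method returns both the draws and their component labels. A minimal example:

    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture

    X = np.random.randn(500, 2)
    bgmm = BayesianGaussianMixture(
        n_components=5,
        weight_concentration_prior_type='dirichlet_process',
    ).fit(X)

    # sample() returns the drawn points and their component assignments.
    X_new, labels = bgmm.sample(n_samples=100)
    print(X_new.shape, labels.shape)  # (100, 2) (100,)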
