
WIP Mixture Models #1437

Merged: 17 commits into pymc-devs:master from AustinRochford:WIP-mixture-model on Oct 18, 2016
Conversation

AustinRochford (Member) commented Oct 10, 2016

Re: #1401

This is my first pass at a flexible mixture model class, and it definitely needs a lot of work. As this notebook shows, the code supports two use cases (sketched after the list below):

  1. comp_dists is a PyMC3 distribution; that is, each of the mixture components are from the same distributional family, differing only in their parameters (e.g. a mixture of normals)
  2. comp_dists is an iterable of PyMC3 distributions (e.g. a zero inflated Poisson)
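
For readers without access to the notebook, here is a minimal sketch of both usages; the priors, synthetic data, and variable names are illustrative and not taken from the notebook:

import numpy as np
import theano.tensor as tt
import pymc3 as pm

x = np.random.randn(1000)                  # placeholder continuous data
counts = np.random.poisson(2., size=500)   # placeholder count data

# Case 1: comp_dists is a single vectorized distribution -- components from
# the same family, differing only in their parameters.
with pm.Model():
    w = pm.Dirichlet('w', a=np.ones(2))
    mu = pm.Normal('mu', mu=0., sd=10., shape=2)
    normal_mix = pm.Mixture('normal_mix', w=w,
                            comp_dists=pm.Normal.dist(mu, sd=1.),
                            # trailing axis so the data broadcast against mu;
                            # see the broadcasting discussion below
                            observed=x[:, np.newaxis])

# Case 2: comp_dists is an iterable of distributions -- a zero-inflated
# Poisson built from a point mass at zero and a Poisson.
with pm.Model():
    psi = pm.Beta('psi', 1., 1.)
    lam = pm.Exponential('lam', 1.)
    zip_mix = pm.Mixture('zip_mix', w=tt.stack([1. - psi, psi]),
                         comp_dists=[pm.ConstantDist.dist(0),
                                     pm.Poisson.dist(lam)],
                         observed=counts)   # one-dimensional observations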

There are a few issues to address here that I'd love to get some feedback on:

  • How to subclass Discrete or Continuous as appropriate, based on the component distributions (it seems like there is some Python metaprogramming magic that should work here, but I am not very well versed in that sort of thing)
  • Intuitive broadcasting
  • NUTS seems slow for these mixture models; this may be an initialization/scaling problem

@twiecki @springcoil @fonnesbeck any feedback/guidance you could give would be much appreciated :)

twiecki (Member) commented Oct 11, 2016

Nice, this is a really clear implementation.

> How to subclass Discrete or Continuous as appropriate, based on the component distributions (it seems like there is some Python metaprogramming magic that should work here, but I am not very well versed in that sort of thing)

I assume you mean for discrete mixtures? If you look at the code, it's mostly about default dtypes:

class Discrete(Distribution):
    """Base class for discrete distributions"""

    def __init__(self, shape=(), dtype='int64', defaults=['mode'], *args, **kwargs):
        super(Discrete, self).__init__(
            shape, dtype, defaults=defaults, *args, **kwargs)


class Continuous(Distribution):
    """Base class for continuous distributions"""

    def __init__(self, shape=(), dtype='float64', defaults=['median', 'mean', 'mode'], *args, **kwargs):
        super(Continuous, self).__init__(
            shape, dtype, defaults=defaults, *args, **kwargs)

As such, maybe we could inherit from Distribution instead and take the dtype from the mixture?
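
For concreteness, here is a rough sketch of that idea; this is not the implementation that was eventually merged, just an illustration of inferring the dtype from the components:

from pymc3.distributions.distribution import Discrete, Distribution


def _all_discrete(comp_dists):
    """True if every component distribution is discrete."""
    comps = comp_dists if isinstance(comp_dists, (list, tuple)) else [comp_dists]
    return all(isinstance(comp, Discrete) for comp in comps)


class Mixture(Distribution):
    def __init__(self, w, comp_dists, shape=(), *args, **kwargs):
        # Take the dtype from the mixture components rather than from a
        # Discrete/Continuous base class.
        dtype = 'int64' if _all_discrete(comp_dists) else 'float64'
        super(Mixture, self).__init__(shape, dtype, *args, **kwargs)
        self.w = w
        self.comp_dists = comp_dists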

> Intuitive broadcasting

Can you specify an example case?

> NUTS seems slow for these mixture models; this may be an initialization/scaling problem

Yes, that's the most common reason. If you post an example it would help with testing.

AustinRochford (Member Author)

@twiecki thanks for the feedback.

The Continuous vs. Discrete approach makes complete sense.

For broadcasting, the notebook I posted shows two different cases. As the code stands, when the component distributions all come from the same family (as in the normal mixture case, comp_dists=pm.Normal.dist(mu, sd)), the observations for the Mixture need to be broadcastable against mu (hence observed=x[:, np.newaxis]). When the components are specified as a list of distributions (as in the zero-inflated Poisson case, comp_dists=[pm.ConstantDist.dist(0), pm.Poisson.dist(lam)]), the observations should be one-dimensional. As I write this out, I think it will be easier to handle than I originally thought.

AustinRochford (Member Author)

I added a NUTS example to the linked notebook; 6 samples/sec seems quite low in my experience.

twiecki (Member) commented Oct 11, 2016

I think this looks good. I bet the NUTS issue is initialization; can you try ADVI initialization (https://gist.github.com/jonsedar/cd4985bbfafdba61b3c8d077dd91f237)?
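
For reference, a sketch of that ADVI-initialization pattern with the PyMC3 API of the time (not necessarily the exact code in the gist; a toy model stands in for the mixture model, and the iteration and draw counts are arbitrary):

import numpy as np
import pymc3 as pm

x = np.random.randn(100)

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0., sd=10.)
    pm.Normal('obs', mu=mu, sd=1., observed=x)

    # Fit ADVI first, then use its posterior means/stds to initialize and
    # scale NUTS.
    v_params = pm.variational.advi(n=20000)   # ADVIFit(means, stds, elbo_vals)
    step = pm.NUTS(scaling=model.dict_to_array(v_params.stds) ** 2,
                   is_cov=True)
    trace = pm.sample(1000, step=step, start=v_params.means)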

In any case, I wouldn't block on the NUTS issue; we can resolve it later. Instead, I would focus on:

  • tests
  • docs (the example notebook you have is close, just needs a bit of text).

AustinRochford (Member Author)

@twiecki yup, those are my priorities. I would also add random value generation, to support posterior predictive sampling, to the list of necessary additions (a rough sketch of the idea follows).
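
As a point of reference, a NumPy-level sketch of what mixture random generation could look like (not the implementation that ended up in the PR): draw a component index per sample according to w, then sample from the chosen component.

import numpy as np


def mixture_random(w, comp_samplers, size=None):
    """Draw from a mixture: w are weights, comp_samplers are callables that
    accept a size keyword and return samples from each component."""
    w = np.asarray(w)
    idx = np.random.choice(len(w), p=w / w.sum(), size=size)
    samples = np.stack([sampler(size=size) for sampler in comp_samplers])
    return np.choose(idx, samples)


# e.g. draws from a 70/30 mixture of two normals
draws = mixture_random([0.7, 0.3],
                       [lambda size: np.random.normal(0., 1., size=size),
                        lambda size: np.random.normal(5., 2., size=size)],
                       size=1000)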

One more question: do we want to replace the current ZeroInflated* implementations with mixtures?

twiecki (Member) commented Oct 12, 2016

> One more question: do we want to replace the current ZeroInflated* implementations with mixtures?

Yes, that's probably cleaner.
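
To make the idea concrete, a rough sketch of what a mixture-based ZeroInflatedPoisson could look like (illustrative only; the module path is assumed, and as noted further down, the actual port was deferred to a later PR):

import theano.tensor as tt
from pymc3.distributions.discrete import ConstantDist, Poisson
from pymc3.distributions.mixture import Mixture   # assumed module path


class ZeroInflatedPoisson(Mixture):
    """Zero-inflated Poisson expressed as a two-component mixture."""

    def __init__(self, psi, theta, *args, **kwargs):
        # weight on the structural-zero component, then on the Poisson
        w = tt.stack([1. - psi, psi])
        comp_dists = [ConstantDist.dist(0), Poisson.dist(theta)]
        super(ZeroInflatedPoisson, self).__init__(w, comp_dists,
                                                  *args, **kwargs)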

AustinRochford (Member Author)

I also need to better understand/resolve the issues in #1449 before I can make this code work with dependent weights.

comp_dists = self.comp_dists

try:
    value_ = value if value.ndim > 1 else value[:, np.newaxis]
AustinRochford (Member Author) commented on this diff:

This should perhaps use tt.shape_padright instead of np.newaxis.
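
For reference, a one-line sketch of that substitution (value here is a stand-in for the logp argument in the diff above):

import theano.tensor as tt

value = tt.vector('value')   # stand-in for the logp argument
# shape_padright appends a broadcastable trailing dimension, replacing the
# NumPy-style value[:, np.newaxis] above.
value_ = value if value.ndim > 1 else tt.shape_padright(value)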

AustinRochford (Member Author) commented Oct 14, 2016

I am not going to port the ZeroInflated* models to be subclasses in this pull request. Due to #1452, we no longer broadcast a certain way in ConstantDist's logp, which makes that a bit trickier. To keep things simpler, I will separate it into a subsequent PR unless anyone has objections.

twiecki (Member) commented Oct 14, 2016

That sounds right. Let me know when you want us to take another look.

AustinRochford (Member Author)

@twiecki I think I have gotten this into a pretty good state and would love your feedback on it again. Happy to make any changes you think are necessary to merge.

_, sd = get_tau_sd(tau=kwargs.pop('tau', None),
                   sd=kwargs.pop('sd', None))

super(NormalMixture, self).__init__(w, Normal.dist(mu, sd=sd),
                                    *args, **kwargs)
A reviewer (Member) commented on this diff:

The simplicity of creating a NormalMixture here is really validating; nicely done.
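
For readers following along, a quick usage sketch of the NormalMixture convenience class added here (the priors and synthetic data are placeholders):

import numpy as np
import pymc3 as pm

# synthetic data from two well-separated normals
x = np.concatenate([np.random.normal(-2., 1., size=300),
                    np.random.normal(3., 1., size=700)])

with pm.Model():
    w = pm.Dirichlet('w', a=np.ones(2))
    mu = pm.Normal('mu', mu=0., sd=10., shape=2)
    sd = pm.HalfNormal('sd', sd=10., shape=2)
    obs = pm.NormalMixture('obs', w, mu, sd=sd, observed=x)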

twiecki (Member) commented Oct 18, 2016

This is a really high-quality PR, thanks!

@twiecki twiecki merged commit 2572852 into pymc-devs:master Oct 18, 2016
@AustinRochford AustinRochford deleted the WIP-mixture-model branch October 18, 2016 11:45
ColCarroll pushed a commit to ColCarroll/pymc3 that referenced this pull request Dec 2, 2016
* First pass at mixture modelling

* No longer necessary to reference self.comp_dists directly in logp

* Add dimension internally (when necessary)

* Import get_tau_sd

* Misc bugfixes

* Add sampling to Mixtures

* Differentiate between Discrete and Continuous mixtures when possible

* Add support for 2D weights

* Gracefully try to calculate mean and mode defaults

* Add docstrings for Mixture classes

* Export mixture models

* Reference self.comp_dists

* Remove unnecessary pm.

* Add Mixture tests

* Add missing imports

* Add marginalized Gaussian mixture model example

* Calculate the mode of the mixture distribution correctly