Create Ordered Multinomial distribution #4773

AlexAndorra · 2021-06-15T17:10:16Z

This PR enables the ordered logistic constraint on multinomial observed data (i.e., data aggregated by trial).
Currently, OrderedLogistic only accepts data in a disaggregated format (i.e., Categorical observed data).
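
To make the aggregated/disaggregated distinction concrete, here is a minimal sketch with made-up data (the variable names and values are illustrative assumptions, not taken from the PR). It turns per-observation ordered categories into the per-trial counts a multinomial likelihood expects:

```python
from collections import Counter

# Hypothetical disaggregated data: one ordered category (0, 1 or 2) per respondent.
categorical_obs = [0, 2, 1, 1, 2, 2, 0, 1]
n_categories = 3

# Aggregating by trial means counting how often each category occurs.
counts = Counter(categorical_obs)
multinomial_obs = [counts[k] for k in range(n_categories)]

print(multinomial_obs)       # counts per category for this single trial
print(sum(multinomial_obs))  # total number of draws, i.e. the multinomial n
```

Both representations carry the same information about the category probabilities; the aggregated one is just much smaller when each trial has many draws.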

I also added the option of having the inferred probabilities in the trace (not only the cutpoints, which are hard to interpret anyway). As I'm guessing that's mostly what people care about when using this likelihood, the option is True by default (it's only useful to disable it when memory usage is an issue).
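
For readers unfamiliar with why the probabilities are easier to read than the cutpoints, the sketch below shows the standard ordered-logistic mapping from cutpoints to category probabilities. It is written in plain Python for illustration; the function names are hypothetical and this is not the code added in the PR:

```python
import math


def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def ordered_logistic_probs(eta, cutpoints):
    """Category probabilities implied by ordered cutpoints and a predictor eta."""
    # Cumulative probabilities P(y <= k), padded with 0 and 1 at the two ends.
    cum = [0.0] + [sigmoid(c - eta) for c in cutpoints] + [1.0]
    # Each category's probability is the difference of adjacent cumulative values.
    return [hi - lo for lo, hi in zip(cum[:-1], cum[1:])]


probs = ordered_logistic_probs(eta=0.0, cutpoints=[-1.0, 0.0, 1.0])
print(probs)  # four probabilities, one more than the number of cutpoints
```

K - 1 ordered cutpoints yield K probabilities that sum to one, which is the vector the new option stores in the trace.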

This probably needs a few tests (inspired by those of OrderedLogistic), but here is already a proof of concept:

import numpy as np
import pymc3 as pm

true_cum_p = np.array([0.1, 0.15, 0.25, 0.50, 0.65, 0.90, 1.0])
true_p = np.hstack([true_cum_p[0], true_cum_p[1:] - true_cum_p[:-1]])
fake_elections = np.random.multinomial(n=1_000, pvals=true_p, size=60)

with pm.Model() as m:
    cutpoints = pm.Normal(
        "cutpoints",
        mu=np.arange(6) - 2.5,
        sigma=1.5,
        initval=np.arange(6) - 2.5,
        transform=pm.distributions.transforms.ordered,
    )

    pm.OrderedMultinomial(
        "results",
        eta=0.0,
        cutpoints=cutpoints,
        n=fake_elections.sum(1),
        observed=fake_elections,
    )

    trace = pm.sample()

This does recover the true probabilities.
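
As a rough reference for checking the fit, the sketch below computes the values the sampler should land near, assuming eta = 0 as in the model above. It uses plain Python lists instead of NumPy, and `implied_cutpoints` is a name introduced here for illustration:

```python
import math

true_cum_p = [0.1, 0.15, 0.25, 0.50, 0.65, 0.90, 1.0]

# Category probabilities are successive differences of the cumulative ones.
true_p = [true_cum_p[0]] + [b - a for a, b in zip(true_cum_p[:-1], true_cum_p[1:])]

# With eta = 0, cutpoint k satisfies sigmoid(c_k) = P(y <= k), so c_k = logit(P(y <= k)).
# The last cumulative probability (1.0) has no finite cutpoint and is dropped.
implied_cutpoints = [math.log(p / (1.0 - p)) for p in true_cum_p[:-1]]

print(true_p)
print(implied_cutpoints)  # six values, matching the six cutpoints in the model
```

Posterior means of the cutpoints and of the saved probabilities can then be compared against `implied_cutpoints` and `true_p`.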

No breaking changes
New tests
Run linting / style checks
Mention in RELEASE-NOTES.md

pymc3/distributions/multivariate.py

ricardoV94 · 2021-06-15T19:06:34Z

> I also added the option of having the inferred probabilities in the trace (not only the cutpoints, which are hard to interpret anyway).

This would also be nice for the other Ordered distributions no?

AlexAndorra · 2021-06-16T11:08:19Z

> This would also be nice for the other Ordered distributions no?

Just added that and addressed your other comments 🥳
Only the name issue and new tests remaining -- will handle this later

ricardoV94 · 2021-06-16T11:24:55Z

Maybe it's easier to keep the two things separate. Have a helper class that creates the raw Ordered distribution and then, optionally, creates the deterministic p. You can retrieve p from the inputs that go into the variable returned from the raw Ordered distribution. Something along the lines of:

class _OrderedMultinomial(Multinomial):
    # standard implementation without the helper Deterministic
    ...


class OrderedMultinomial:
    def __new__(cls, name, *args, save_p=True, **kwargs):
        out_rv = _OrderedMultinomial(name, *args, **kwargs)
        if save_p:
            # I think this is the p vector in the multinomial
            pm.Deterministic(f"{name}_p", out_rv.owner.inputs[4])
        return out_rv

    @classmethod
    def dist(cls, *args, **kwargs):
        return _OrderedMultinomial.dist(*args, **kwargs)

No need to even check if it's inside the model context, that's done by the default _OrderedMultinomial.
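
The wrapper pattern itself can be tried without PyMC. The following is a minimal pure-Python sketch in which `_RawDist`, `Dist` and `registry` are hypothetical stand-ins for the raw distribution, the user-facing wrapper and the model's named variables; it only illustrates how `__new__` can return the wrapped object while registering an extra quantity on the side:

```python
registry = {}  # hypothetical stand-in for the model's named variables


class _RawDist:
    """Stand-in for the raw distribution; keeps its inputs on the instance."""

    def __init__(self, name, p):
        self.name = name
        self.inputs = (p,)
        registry[name] = self


class Dist:
    """User-facing wrapper: constructing it returns a _RawDist instance."""

    def __new__(cls, name, *args, save_p=True, **kwargs):
        out = _RawDist(name, *args, **kwargs)
        if save_p:
            # Register the parameter under "<name>_p", like the helper Deterministic.
            registry[f"{name}_p"] = out.inputs[0]
        return out


x = Dist("results", [0.2, 0.8])
print(type(x).__name__)  # _RawDist
print(sorted(registry))  # ['results', 'results_p']
```

Because `__new__` returns an object that is not an instance of `Dist`, Python never calls `Dist.__init__`, so the wrapper adds no state of its own.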

ricardoV94 · 2021-06-16T11:52:55Z

~~Ignore the previous idea. I think you can overwrite __new__ to save name in self.name that's probably simpler.~~ No, doesn't work either. The previous comment goes back to being my best idea :p

AlexAndorra · 2021-06-16T12:51:53Z

Agreed, I was thinking the same -- that's actually how I did it for LKJCholeskyCov back in the day

AlexAndorra · 2021-06-22T14:57:43Z

`out_rv.owner.inputs[4]` does seem to work @ricardoV94! Small question though: how did you know that the p vector was at index 4? 😅

ricardoV94 · 2021-06-22T15:28:47Z

> `out_rv.owner.inputs[4]` does seem to work @ricardoV94! Small question though: how did you know that the p vector was at index 4? 😅

The RV's inputs are always of the form (RandomStateSharedVariable, size, dtype, param1, param2 ..., paramN)

import numpy as np
import aesara.tensor as at
x = at.random.multinomial(5, np.ones(3)/3, size=2)
print(x.owner.inputs)

[RandomStateSharedVariable(<RandomState(MT19937) at 0x7F125671EC40>),
 TensorConstant{(1,) of 2},
 TensorConstant{4},
 TensorConstant{5},
 TensorConstant{(3,) of 0...3333333333}]
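
The positional layout can be sketched in plain Python. The helper name `split_rv_inputs` is hypothetical, and strings stand in for the symbolic variables printed above:

```python
def split_rv_inputs(inputs):
    """Split RV inputs into (rng, size, dtype, params) by position."""
    rng, size, dtype, *params = inputs
    return rng, size, dtype, params


# Plain strings stand in for the five symbolic inputs of the multinomial above.
inputs = ["rng", "size=(2,)", "dtype_code=4", "n=5", "p=[1/3, 1/3, 1/3]"]
rng, size, dtype, params = split_rv_inputs(inputs)

print(params)                 # the distribution parameters: n, then p
print(inputs[4] is params[1])  # index 4 overall is the second parameter, p
```

For a multinomial the parameters are n and p, so p sits at index 4 of the full input list.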

AlexAndorra · 2021-06-22T15:50:21Z

> The RV's inputs are always of the form (RandomStateSharedVariable, size, dtype, param1, param2 ..., paramN)

Ooooh, ok, thanks! What does a dtype of TensorConstant{4} mean though? 🤔

ricardoV94 · 2021-06-22T16:41:53Z

> > The RV's inputs are always of the form (RandomStateSharedVariable, size, dtype, param1, param2 ..., paramN)
>
> Ooooh, ok, thanks! What does a dtype of TensorConstant{4} mean though? 🤔

It's a numerical code for different dtypes ("float32", "float64", "int32", and so on). I don't know where they are defined though.

4 is probably for "int64" since that's the default dtype of discrete RVs

AlexAndorra · 2021-06-22T17:21:58Z

Ah ok, I wouldn't have guessed that 😅
So the only thing left for this PR is adding a few tests. I think I need to write the logpdf for that but don't know how it works yet for this distribution

ricardoV94 · 2021-06-22T17:32:44Z

> So the only thing left for this PR is adding a few tests. I think I need to write the logpdf for that but don't know how it works yet for this distribution

The logp shouldn't have to be tested as it's just the multinomial. The tests should just make sure the parameters that go into it are the correct ones. Have a look at the check_pymc_params_match_rv_op tests in test_random.py (guide), I think it does just what is needed.

Plus some tests for the helper Deterministic

pymc3/distributions/discrete.py

ricardoV94

LGTM!

Some conflicts in the files it seems. Other than that it looks good.

codecov · 2021-07-04T15:27:36Z

Codecov Report

Merging #4773 (64b1ac8) into main (125256f) will increase coverage by 0.07%.
The diff coverage is 98.11%.

@@            Coverage Diff             @@
##             main    #4773      +/-   ##
==========================================
+ Coverage   71.97%   72.05%   +0.07%     
==========================================
  Files          85       85              
  Lines       13839    13872      +33     
==========================================
+ Hits         9961     9995      +34     
+ Misses       3878     3877       -1

| Impacted Files | Coverage Δ | |
|---|---|---|
| pymc3/distributions/__init__.py | `100.00% <ø> (ø)` | |
| pymc3/distributions/multivariate.py | `58.31% <95.45%> (+1.43%)` | ⬆️ |
| pymc3/distributions/discrete.py | `99.00% <100.00%> (+0.05%)` | ⬆️ |
| pymc3/step_methods/hmc/nuts.py | `96.87% <0.00%> (-0.63%)` | ⬇️ |

ricardoV94 · 2021-07-04T15:34:02Z

The failing tests in distributions_random.py are due to the test looking for the rv_op, which doesn't exist in the wrapper class. We probably just need to override the relevant part in the inherited test subclasses.

AlexAndorra · 2021-07-04T15:35:35Z

I just added back the rv_op in the wrapper class. Tests pass locally. Is that a good solution?

ricardoV94 · 2021-07-04T15:50:09Z

> Is that a good solution?

Actually, I think testing the wrapped distribution might be the best. Just test the _Ordered* classes instead.

This shouldn't require any changes to the wrapper classes

pymc3/tests/test_distributions_random.py

pymc3/distributions/multivariate.py

AlexAndorra · 2021-07-04T20:27:17Z

All tests are green 🥳
Just added a mention in RELEASE-NOTES.md

ricardoV94

Looks good! Left a minor suggestion that can be ignored.

pymc3/distributions/multivariate.py

Co-authored-by: Ricardo Vieira <28983449+ricardoV94@users.noreply.github.com>

ricardoV94 · 2021-07-05T04:36:24Z

The failing test comes from #4771. It can be ignored.

~~Finally, do we need to add an entry in the docs for the new distribution or is it automatic?~~ I added it

AlexAndorra · 2021-07-05T10:49:46Z

Merging 🍾 Thanks for your help and reviews @ricardoV94 !

AlexAndorra added 2 commits June 15, 2021 18:07

Add Ordered Multinomial implementation

99fd87b

isort

fdc09f2

AlexAndorra added enhancements v4 labels Jun 15, 2021

AlexAndorra added this to the vNext (4.0.0) milestone Jun 15, 2021

AlexAndorra self-assigned this Jun 15, 2021

AlexAndorra requested a review from ricardoV94 June 15, 2021 17:10

ricardoV94 reviewed Jun 15, 2021

pymc3/distributions/multivariate.py (outdated)

pymc3/distributions/multivariate.py (outdated)

pymc3/distributions/multivariate.py

AlexAndorra added 4 commits June 16, 2021 12:22

Address Ricardo's comments

bb5c745

Fix typo in quaddist_parse

6c1d51f

isort

04d8c57

Add docstring to new OrderedMultinomial

d1677c4

Black

5ef4f12

AlexAndorra added 3 commits June 24, 2021 16:44

Remove explicit import

6367c3f

Fixed typos in tests

d131214

Black

f444a6a

ricardoV94 reviewed Jul 1, 2021

pymc3/distributions/discrete.py (outdated)

AlexAndorra added 2 commits July 2, 2021 08:56

Move docstrings to the user-facing classes

8d98177

Merge branch 'main' into add-ordered-multinomial

7a52004

fonnesbeck modified the milestones: vNext (4.0.0), v4.0.1 Jul 2, 2021

ricardoV94 previously approved these changes Jul 4, 2021

Merge branch 'main' into add-ordered-multinomial

ee15e49

AlexAndorra added 2 commits July 4, 2021 17:38

Black

a62669d

Add back rv_op in wrapper classes

c9c9a97

ricardoV94 reviewed Jul 4, 2021

pymc3/tests/test_distributions_random.py (outdated)

isort

5281c4a

ricardoV94 reviewed Jul 4, 2021

pymc3/distributions/multivariate.py (outdated)

AlexAndorra added 2 commits July 4, 2021 20:53

Black

898c40e

Mention in release notes

4ce1294

AlexAndorra requested a review from ricardoV94 July 4, 2021 20:27

ricardoV94 approved these changes Jul 4, 2021

pymc3/distributions/multivariate.py (outdated)

Apply suggestions from code review

bd4470f

Co-authored-by: Ricardo Vieira <28983449+ricardoV94@users.noreply.github.com>

Add reference to new distribution in docs

64b1ac8

AlexAndorra merged commit e5e83d0 into main Jul 5, 2021

AlexAndorra deleted the add-ordered-multinomial branch July 5, 2021 10:49
