
Should Categorical broadcast? #2082

Closed
kyleabeauchamp opened this issue Apr 26, 2017 · 7 comments · Fixed by #3383

Comments

@kyleabeauchamp
Contributor

kyleabeauchamp commented Apr 26, 2017

What is the intended behavior when passing a 2D array into Categorical?

import numpy as np
import pymc3 as pm

with pm.Model():
    x = pm.Bernoulli("x", np.eye(4)[0], shape=4)
    tr = pm.sample(10)

tr["x"].mean(0)
Out[30]: array([ 1.,  0.,  0.,  0.])



with pm.Model():
    x = pm.Categorical("x", np.eye(4), shape=4)
    tr = pm.sample(10)

tr["x"].mean(0)
Out[35]: array([ 0.,  0.,  0.,  0.])

tr["x"].shape
Out[37]: (10, 4)

I was somewhat expecting to see [0, 1, 2, 3], assuming some sort of broadcast.

Also: do others find it alarming that the pymc3 Categorical automatically normalizes the input p vector to sum to 1.0? To me, raising an exception on un-normalized input was an important sanity check in pymc2. This would be particularly true if 2D inputs are tolerated, in which case row vs. column normalization is always an issue.
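
As a minimal numpy sketch of that row vs. column ambiguity (this assumes nothing about PyMC3 internals), normalizing the same un-normalized 2D p along different axes gives different distributions:

import numpy as np

p = np.array([[1., 3.],
              [2., 2.],
              [9., 1.]])

# Normalize each row (last axis) vs. each column (first axis)
row_norm = p / p.sum(axis=-1, keepdims=True)
col_norm = p / p.sum(axis=0, keepdims=True)

print(np.allclose(row_norm, col_norm))  # False: the two conventions disagree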

@kyleabeauchamp
Contributor Author

Here's another edge case. The output is all zeros with shape (10, 4). To me, this suggests the variable is just taking the first row and discarding the rest silently...


p = np.eye(4)
p[:, 1:] = 1.
with pm.Model():
    x = pm.Categorical("x", p, shape=4)
    tr = pm.sample(10)

In [3]: print(tr["x"].shape)
   ...: print(tr["x"].mean(0))
(10, 4)
[ 0.  0.  0.  0.]

@kyleabeauchamp
Contributor Author

pymc2 definitely did assume normalization along a particular 2D array axis:

https://github.com/pymc-devs/pymc/blob/0c9958838014e2b5693c56ebd4fc32a96632f189/pymc/distributions.py#L987

@kyleabeauchamp
Contributor Author

kyleabeauchamp commented Apr 26, 2017

IMHO, the following should also raise an exception but does not:


p = np.array([-1, 0, 0, 0])
with pm.Model():
    x = pm.Categorical("x", p)
    tr = pm.sample(10)

tr["x"].shape
tr["x"].mean(0)

@junpenglao
Member

I agree that the pm.Categorical shape is a bit confusing.
I would actually prefer that pm.Categorical only accept 2D output/observed, with the rows always being the same size as p. If you want higher dimensions you need to squeeze or reshape.

@fonnesbeck
Member

I think we ought to have a dimension argument that distinguishes the dimension of the distribution from the number of variables. We've had this discussion in the past, but have failed to come to a consensus.

@twiecki
Member

twiecki commented May 8, 2017

The most promising effort on this was done by @brandonwillard on #1125.

@lucianopaz
Contributor

The current status on this issue is that the last axis of p is taken to encode the category probabilities; the other dimensions are just independent repetitions. At least that is how it is handled at the random method level. I will make some adjustments to the logp to handle the edge cases mentioned above.
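
A plain numpy sketch of that convention (not the actual PyMC3 implementation): the last axis of p indexes the categories and each leading row is an independent draw, so np.eye(4) would yield the samples [0, 1, 2, 3] that the original post expected:

import numpy as np

p = np.eye(4)  # four independent categoricals, each with all mass on one category

# One draw per row; the last axis of p holds the category probabilities
samples = np.array([np.random.choice(p.shape[-1], p=row) for row in p])
print(samples)  # [0 1 2 3]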
