Memory Error: OrderedLogistic #3535

Closed
aloctavodia opened this issue Jun 29, 2019 · 19 comments · Fixed by #3563 or #3572
Comments

@aloctavodia
Member

I am updating and re-running the Statistical Rethinking notebooks, and I get a memory allocation error with model m_11 (code block 11.5). The problem seems to be related to #3383: reverting the changes that PR made to the Categorical distribution fixes the issue. @lucianopaz, you probably have a better idea of what is going on here.

@tohtsky
Contributor

tohtsky commented Jul 3, 2019

I think the basic example provided for OrderedLogistic (slightly modified to use ADVI) behaves differently in recent commits.
I've run the following code on the current HEAD and on older commits like 9ef2947.

import numpy as np
import pymc3 as pm

# Three clusters of 300 points each, centred at -1, 0, and 2
n1_c = 300; n2_c = 300; n3_c = 300
cluster1 = np.random.randn(n1_c) + -1
cluster2 = np.random.randn(n2_c) + 0
cluster3 = np.random.randn(n3_c) + 2

x = np.concatenate((cluster1, cluster2, cluster3))
y = np.concatenate((1*np.ones(n1_c),
                    2*np.ones(n2_c),
                    3*np.ones(n3_c))) - 1

# Ordered logistic regression
with pm.Model() as model:
    cutpoints = pm.Normal("cutpoints", mu=[-1,1], sigma=10, shape=2,
                          transform=pm.distributions.transforms.ordered)
    y_ = pm.OrderedLogistic("y", cutpoints=cutpoints, eta=x, observed=y)
    inference = pm.fit(30000, method='advi')
    tr = inference.sample(1000)

In the older versions the ELBO seems to converge to a reasonable value (~10^2), but in the recent commits the progress bar shows a much larger value (~10^5).

I suspect this is caused by the logp method of Categorical.
In the older versions, when p.ndim == 2 (which is the case in the above example), the log-likelihood

a = tt.log(p[tt.arange(p.shape[0]), value_clip])

is a vector of length p.shape[0].
In recent versions, however, a, defined as

pattern = (p.ndim - 1,) + tuple(range(p.ndim - 1))
a = tt.log(p.dimshuffle(pattern)[value_clip])

is a p.shape[0] x p.shape[0] matrix (which has 900**2 entries in the above example, roughly matching the scale of the ELBO).
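The shape difference is easy to check with plain NumPy (an illustrative sketch only; the real code operates on Theano tensors, but the advanced-indexing semantics are the same):

import numpy as np

N, K = 900, 3  # 900 observations, 3 categories, as in the example above
p = np.random.dirichlet(np.ones(K), size=N)   # shape (N, K)
value_clip = np.random.randint(0, K, size=N)  # shape (N,)

# Old behaviour: one log-probability per observation.
a_old = np.log(p[np.arange(p.shape[0]), value_clip])
print(a_old.shape)  # (900,)

# New behaviour: the class axis is moved to the front, and fancy indexing
# with value_clip then selects 900 rows of length 900.
pattern = (p.ndim - 1,) + tuple(range(p.ndim - 1))
a_new = np.log(p.transpose(pattern)[value_clip])
print(a_new.shape)  # (900, 900)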

@junpenglao
Member

Thanks @tohtsky, I think your diagnosis is right.

@lucianopaz
Contributor

@aloctavodia, sorry for the delay in taking a look at this. I'm still mostly offline until next week.

I looked a bit more into the commit you referenced and I think that there may be a bug in Categorical.logp, specifically here. If p is multidimensional, the dimshuffle moves the last axis around to simplify advanced indexing into it with the supplied values. The resulting a will have a weird shape like (value.shape, p.shape[:-1]) instead of having value.shape at the end. This could lead to strange broadcasting with the observed values and produce an intermediate array of shape (value.shape, value.shape), which could cause the memory error you encountered. Maybe that line could be replaced with this:

a = tt.log(p[..., value_clip])

I'll be able to test this in a bit less than two weeks, but if you want to try it out first, you're welcome to do so.

@lucianopaz
Contributor

Oops, I skipped over @tohtsky's answer, which points to exactly the same line of code that I thought was causing the bug. Maybe we could try the ellipsis in the indexing instead of doing a dimshuffle.

@aloctavodia
Member Author

I tried replacing the line you suggested and I still see the same problem :-(

@tohtsky
Contributor

tohtsky commented Jul 16, 2019

Would simply flattening the multi-dimensional tensor into a 2D array and then reshaping the resulting logp vector back into the original shape work?

if p.ndim > 1:
    original_shape = p.shape[:-1]
    p_flatten = p.reshape((-1, p.shape[-1]))
    a = tt.log(
        p_flatten[tt.arange(p_flatten.shape[0]), value_clip.ravel()]
    ).reshape(original_shape)
else:
    a = tt.log(p[value_clip])
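For what it's worth, the same idea in plain NumPy (a sketch only, not the actual Theano code) does give one log-probability per observation:

import numpy as np

p = np.random.dirichlet(np.ones(4), size=(5, 6))   # shape (5, 6, 4)
value_clip = np.random.randint(0, 4, size=(5, 6))  # shape (5, 6)

# Flatten all leading dimensions, index one probability per row,
# then reshape back to the original leading shape.
original_shape = p.shape[:-1]
p_flatten = p.reshape((-1, p.shape[-1]))            # shape (30, 4)
a = np.log(
    p_flatten[np.arange(p_flatten.shape[0]), value_clip.ravel()]
).reshape(original_shape)
print(a.shape)  # (5, 6) -- one entry per observation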

@lucianopaz
Contributor

I'll have time to look into this issue this Friday.

@bdyetton
Contributor

Hi,
Any progress on this issue?

@lucianopaz
Contributor

lucianopaz commented Jul 22, 2019

I managed to write a small test involving logp that highlights the error:

import numpy as np
from scipy.special import logit
import pymc3 as pm


loge = np.log10(np.exp(1))
size = 100
p = np.ones(10) / 10
cutpoints = logit(np.linspace(0, 1, 11)[1:-1])
obs = np.random.randint(0, 1, size=size)
with pm.Model():
    ol = pm.OrderedLogistic("ol", eta=0, cutpoints=cutpoints, observed=obs)
    c = pm.Categorical("c", p=p, observed=obs)

print(ol.logp({"ol": 1}) * loge)
print(c.logp({"c": 1}) * loge)

The OrderedLogistic variable ol should, given the provided cutpoints, be equivalent to the Categorical RV c. When we look at the logp returned by each RV for a given class, since all classes are equally likely, the base-10 log probability should be approximately equal to -size == -100. This is the case for the Categorical, but the OrderedLogistic returns -size**2 == -10000. The problem seems to be an unnecessary shape padding done in OrderedLogistic.__init__.

Furthermore,

>>> ol.distribution.p.ndim
2
>>> c.distribution.p.ndim
1

However, there is also an additional shape issue with how an array of cutpoints broadcasts against the passed observed values. I'll try to finish a fix tomorrow.

lucianopaz added a commit to lucianopaz/pymc that referenced this issue Jul 23, 2019
@lucianopaz
Contributor

In the end, it wasn't a broadcasting problem. It was an indexing problem when p was multidimensional.

twiecki pushed a commit that referenced this issue Jul 23, 2019
* Added tests for issue

* Fix for #3535

* Added release notes
@lucianopaz lucianopaz reopened this Jul 25, 2019
@lucianopaz
Contributor

tt.choose ended up making more of a mess, so we'll have to come up with a different fix for this problem.

@lucianopaz
Contributor

@seberg was kind enough to point me to take_along_axis, which is exactly what we need to replace choose with in order to fix this issue without falling into the problems raised in #3564 and #3567. Luckily, take_along_axis is implemented using advanced indexing, so we should be able to copy the relevant part to get a Theano-based implementation to use in Categorical.logp.
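As an illustration of why take_along_axis has the right semantics, here is the NumPy version (a sketch; the actual fix would rebuild the equivalent advanced indexing with Theano ops):

import numpy as np

p = np.random.dirichlet(np.ones(3), size=900)   # shape (900, 3)
value_clip = np.random.randint(0, 3, size=900)  # shape (900,)

# take_along_axis requires the index array to have the same ndim as p,
# so we add a trailing length-1 axis and squeeze it away afterwards.
picked = np.take_along_axis(p, value_clip[:, None], axis=-1)  # shape (900, 1)
a = np.log(picked.squeeze(axis=-1))                           # shape (900,)
print(a.shape)  # (900,) -- no (900, 900) blow-up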

@bdyetton
Contributor

bdyetton commented Jul 30, 2019

Hi @lucianopaz,
Unfortunately, that PR is still very slow for a multi-dim p.

import numpy as np
import pymc3 as pm

data = np.random.randint(0, 3, size=(1000, 1))

with pm.Model() as model:
    tp1 = pm.Dirichlet('tp1', a=np.array([0.25]*4), shape=(4,))  # 4 free RVs
    obs = pm.Categorical('obs', p=tp1, observed=data)
    trace = pm.sample()  # super fast!

data_indexer = np.random.randint(0, 2, size=(1000,))

with pm.Model() as model:
    tp1 = pm.Dirichlet('tp1', a=np.array([0.25]*4), shape=(2, 4))  # 8 free RVs
    obs = pm.Categorical('obs', p=tp1[data_indexer, :], observed=data)
    trace = pm.sample()  # takes ages!

Does the second model sample OK for you (just in case I've done something silly with my install)?

@lucianopaz
Contributor

@bdyetton, it samples super slow for me too. I'll try to find out why this is happening.

@bdyetton
Contributor

bdyetton commented Jul 31, 2019 via email

@bdyetton
Contributor

bdyetton commented Jul 31, 2019

@lucianopaz I'm not sure if this is helpful at all, but the second model above will not begin sampling with the slice step method, so this is not an issue affecting NUTS only.

@bdyetton
Contributor

bdyetton commented Aug 5, 2019

Hi @lucianopaz, any progress? Anything I can do to help?

@lucianopaz
Contributor

@bdyetton, sorry, I have to deal with other stuff from work first. Once I finish, I'll be able to look into this more deeply.

@bdyetton
Contributor

bdyetton commented Aug 6, 2019

@lucianopaz, Thanks!!!
