
LKJ priors influence sampling of unconnected RVs #3641

Closed
sammosummo opened this issue Oct 3, 2019 · 9 comments

Comments

@sammosummo

sammosummo commented Oct 3, 2019

Both LKJCorr and LKJCholeskyCov influence the sampling of random variables they are not connected to. This is not expected behavior, right?

The following code produces several figures to illustrate this:

import pymc3 as pm
import matplotlib.pyplot as plt


if __name__ == '__main__':

    with pm.Model():

        # Baseline: a single unconnected normal RV.
        a = pm.Normal(name="a")
        trace = pm.sample(10000, tune=2000, chains=2)
        pm.traceplot(trace, compact=True)
        plt.savefig("test_00.png")
        pm.autocorrplot(trace, "a")
        plt.savefig("test_01.png")

    with pm.Model():

        # Same normal RV plus an unconnected LKJCorr prior.
        a = pm.Normal(name="a")
        M = pm.LKJCorr(name="C", n=4, eta=1)
        trace = pm.sample(10000, tune=2000, chains=2)
        pm.traceplot(trace, compact=True)
        plt.savefig("test_10.png")
        pm.autocorrplot(trace, "a")
        plt.savefig("test_11.png")

    with pm.Model():

        # Same normal RV plus an unconnected LKJCholeskyCov prior.
        a = pm.Normal(name="a")
        M = pm.LKJCholeskyCov(name="M", n=4, eta=1, sd_dist=pm.HalfCauchy.dist(1))
        trace = pm.sample(10000, tune=2000, chains=2)
        pm.traceplot(trace, compact=True)
        plt.savefig("test_20.png")
        pm.autocorrplot(trace, "a")
        plt.savefig("test_21.png")

Here they are, in order of creation:

[Attached figures: test_00, test_01, test_10, test_11, test_20, test_21]

What is causing the obvious pathologies when sampling a in the latter two models if a and M are not connected in any way?

Also, why is the LKJ prior on the correlation matrix in test_10.png not flat? Is it related to https://github.com/pymc-devs/pymc3/issues/3473#issue-442041528?

@junpenglao
Member

That's because our transformation for the LKJ is incorrect (#3473 exactly). In pure forward sampling this gives an invalid correlation matrix, which interacts with the other RVs in unexpected ways (recall that all RVs are sampled in the same augmented auxiliary space; if some RVs live in a geometry that is difficult or incorrect, NUTS terminates too early, which affects the correctness of the other RVs).

@junpenglao
Member

LKJCholeskyCov should give you the correct result if you are not using a half-Cauchy in this case.

@sammosummo
Author

LKJCholeskyCov should give you the correct result if you are not using a half-Cauchy in this case.

What is the issue with the half-Cauchy? The same thing happens with other choices of sd_dist.

@sammosummo
Author

Confirmed: the same thing happens with sd_dist=pm.HalfNormal.dist().

[Attached figures: test_20, test_21]

@junpenglao
Member

How can you tell there is a problem? There are no divergent samples.

@junpenglao
Member

Note that eyeballing the trace is usually not a good way to diagnose problems.

@sammosummo
Author

sammosummo commented Oct 3, 2019

Note that eyeballing the trace is usually not a good way to diagnose problems.

Perhaps not in general, but there is clearly a pathology at ~1500 samples and ~2 in the posterior density of M, and another smaller one at ~7000 samples. There is also more autocorrelation in a, especially in chain 0.

How can you tell there is a problem? There are no divergent samples.

Just ran it for a second time:

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [M, a]
Sampling 2 chains: 100%|██████████| 24000/24000 [01:01<00:00, 388.15draws/s]
There was 1 divergence after tuning. Increase `target_accept` or reparameterize.
The estimated number of effective samples is smaller than 200 for some parameters.

And a third:

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [M, a]
Sampling 2 chains: 100%|██████████| 24000/24000 [00:44<00:00, 534.12draws/s]
The acceptance probability does not match the target. It is 0.6687596986054827, but should be close to 0.8. Try to increase the number of tuning steps.
The number of effective samples is smaller than 10% for some parameters.

There is not always a divergence, but there is always a problem. There are usually more of them with shorter chains and less tuning.

Here is the figure from the third run. Again, clearly, a massive pathology.

[Attached figure: test_20]

@junpenglao
Member

Usually the prior distribution covers a much larger range, which makes the tail areas much more pathological. For this reason we usually recommend using sample_prior_predictive for sampling from a model with no observed data.

@junpenglao
Member

Closing this for now but feel free to keep discussing it here.
