Shape of sample_ppc #1529

Closed
shkr opened this issue Nov 17, 2016 · 23 comments

@shkr
Contributor

shkr commented Nov 17, 2016

"""
Student T distribution

This one with uniform priors

Parameters
    ----------
    nu : int
        Degrees of freedom (nu > 0).
    mu : float
        Location parameter.
    lam : float
        Scale parameter (lam > 0).
"""
import logging

import pymc3 as pm
import scipy.optimize

logger = logging.getLogger(__name__)

# x and y are the observed predictor and response arrays (100 observations)

with pm.Model() as vvf_model:
    # Define priors
    b0 = pm.Normal("b0", mu=0, sd=20)
    b1 = pm.Normal("b1", mu=0, sd=20)
    lam = pm.Uniform("lam", lower=0.0, upper=20.0)
    nu = pm.Uniform("nu", lower=0.0, upper=20.0)

    # Identity link function
    mu = b0 + b1 * x

    y_obs = pm.StudentT("y_obs", mu=mu, lam=lam, nu=nu, observed=y)

with vvf_model:
    start = pm.find_MAP(model=vvf_model, fmin=scipy.optimize.fmin_powell)
    logger.info("Starting values = {}".format(start))

    # draw posterior samples
    trace = pm.sampling.sample(10000, start=start)

ppc = pm.sampling.sample_ppc(trace, samples=500, model=vvf_model, size=100)

As per the documentation found here, https://pymc-devs.github.io/pymc3/api.html#pymc3.sampling.sample_ppc, I expect a keyed dictionary, in this case a dict with the single key y_obs, since I have only one observed variable. I expect ppc['y_obs'] to be a matrix of 500 posterior predictive datasets, each consisting of 100 samples drawn at different values from the posterior distribution.

However, the shape of ppc['y_obs'] is not (500, 100) as expected but (500, 100, 100).

Can someone explain what this matrix is and why its shape is not (500, 100)?

@fonnesbeck
Member

If you use samples=100 rather than size=100, you will get what you expect. I can see this is confusing, however.

@AustinRochford can you lend some insight as to the need for a size parameter here?

@shkr
Contributor Author

shkr commented Nov 17, 2016

Thanks. I omitted the size parameter and set samples=100. Now I believe I have 100 datasets, each of 100 samples from the posterior predictive distribution.

I think this issue is related to the shape and size discussion on PR #862. The things I was able to infer from that discussion for my example are:

  • y_obs is sampled from the posterior predictive distribution because it is an observed, dependent variable
  • For each sample in the training data y, which has shape (100, 1), one sample from the posterior predictive distribution is drawn, which explains sample_ppc(..).shape = (samples, 100). Can someone shed light on why this rule is followed? My guess is that the input to the deterministic part of y_obs, i.e. x in b0 + b1*x, is kept aligned between each y_obs draw and the corresponding y in the training data. A sketch of the resulting shapes follows this list.
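
A minimal sketch of the shapes described above, assuming y holds 100 observations (names follow the model in the first comment):

ppc = pm.sampling.sample_ppc(trace, samples=100, model=vvf_model)

print(type(ppc))            # dict keyed by observed variable name, here 'y_obs'
print(ppc['y_obs'].shape)   # (100, 100): one row per posterior predictive draw,
                            # one column per observed data point in y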

@AustinRochford
Member

AustinRochford commented Nov 18, 2016

@fonnesbeck I am not familiar at all with the innards of the PPC sampling code, so I don't have much to add offhand, unfortunately.

@fonnesbeck
Member

Sorry, thought you had authored this.

@AustinRochford
Member

No problem.

@twiecki
Member

twiecki commented Nov 18, 2016

@taku-y did

@kyleabeauchamp
Contributor

Here's another question: why not have the PPC sampler return an object like the Trace object? Right now, the trace is an object that supports slicing, but the output of sample_ppc is a dictionary that requires more gymnastics.

@twiecki
Member

twiecki commented Nov 26, 2016 via email

@twiecki
Member

twiecki commented Nov 26, 2016

@kyleabeauchamp Want to do a PR?

@kyleabeauchamp
Contributor

Is there an obvious constructor for the MultiTrace object from a dictionary of traces? I'm not seeing a clear API on what it takes to construct the trace object outside of the usual chain.

@kyleabeauchamp
Contributor

Should we also document the fact that the PPC trace is actually reshuffled with respect to the original trace? E.g. does everyone find it obvious that this thing is first drawing shuffled points from the original trace (https://github.com/pymc-devs/pymc3/blob/481a231dd2ef31d5f1581e26320cf387edeed343/pymc3/sampling.py#L385) as opposed to iterating over the trace?

@kyleabeauchamp
Contributor

E.g. I want to prevent someone from assuming order and doing the following:

ppc = sample_ppc(trace)
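# note: rows of ppc come from randomly chosen points in the trace
# (see the sampling.py line linked above), so they do not line up
# with the rows of trace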
for (var, obsvar) in zip(trace[key1], ppc[key2]):
    pass

@twiecki
Member

twiecki commented Nov 30, 2016

That's indeed a bit counter-intuitive. Could we not just change the order of that loop?

@fonnesbeck
Member

It's not clear why you would use the trace and the posterior predictive check draws together anywhere. It's doing what I would expect:

  1. draw a random point from the trace
  2. use that point to generate a random draw from the posterior predictive distribution

Let me know if I am reading that wrong.
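
A rough sketch of those two steps for the model above (this is a paraphrase, not the actual pymc3 source; y_obs is the observed variable defined in the first comment):

import numpy as np

with vvf_model:
    # 1. draw a random point from the trace
    point = trace[np.random.randint(0, len(trace))]

    # 2. use that point to generate a random draw from the posterior
    #    predictive distribution of the observed variable
    y_draw = y_obs.distribution.random(point=point)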

@taku-y
Contributor

taku-y commented Nov 30, 2016

I did not write the code for sample_ppc, only for sample_vp. Sorry for the late reply.

@kyleabeauchamp
Contributor

OK, here's my simple model for why one might want to zip() the MCMC trace and the PPC trace.

Suppose we have a coin that lands on heads either 75% of the time or 25% of the time, and we have observed 4 coin flips. The key parameter in the model is then p, which is a discrete uniform over those two values.

Now we sample the model, and also the posterior predictive (PP). The PP draws are essentially resampled coin flips. I imagine I would want to plot a 2D histogram of p versus n_heads.

If we use the current PPC without zipping the MCMC trace for p, then we have smudged together the coin tosses where p = 75% and p = 25%, whereas we might want the ability to visualize those cases separately.
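
A minimal sketch of that toy model (the names coin_model, idx, and n_heads are made up for illustration):

import numpy as np
import pymc3 as pm

flips = np.array([1, 0, 1, 1])  # the 4 observed coin flips

with pm.Model() as coin_model:
    # p is either 0.25 or 0.75 with equal prior probability
    idx = pm.DiscreteUniform("idx", lower=0, upper=1)
    p = pm.Deterministic("p", 0.25 + 0.5 * idx)
    obs = pm.Bernoulli("obs", p=p, observed=flips)
    trace = pm.sample(2000)
    ppc = pm.sample_ppc(trace, samples=len(trace))

# number of heads in each resampled dataset
n_heads = ppc["obs"].sum(axis=1)

# a 2D histogram of p versus n_heads would need trace["p"] and n_heads
# to line up row by row, which the shuffling inside sample_ppc breaks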

@fonnesbeck
Member

fonnesbeck commented Dec 1, 2016

If you are doing posterior predictive checks (which is implied in the name of the function), all we want is a set of random draws from the conditional posterior for each data point we have observed. The only things we are conditioning on are the predictors associated with the observed outcomes, not any particular draw of the intermediate variables.

@kyleabeauchamp
Contributor

I agree; I'm just saying there may sometimes be value in having things conditioned on the intermediates. AFAIK, in pymc2 my use case could be achieved either with hand-introduced auxiliary nodes or with the masked-array / missing-values formalism.

@fonnesbeck
Member

fonnesbeck commented Dec 1, 2016

If I'm doing posterior predictive checks, not only do I want the original MCMC samples shuffled, I want them sampled with replacement, because in principle you could ask for more PPC samples than there are original samples, so there should be a means of providing that.

I'm happy to have a function that is more flexible, but I'm suggesting it probably shouldn't be sample_ppc. Ideally, we would have a super-flexible sampling function, and sample_ppc would be a convenience function that restricts it to the sort of sampling that posterior predictive checks require.
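
A minimal sketch of drawing trace points with replacement, which is what allows more PPC samples than trace samples (np.random.choice is just one way to express this):

import numpy as np

n_ppc_samples = 20000  # may exceed len(trace)
indices = np.random.choice(len(trace), size=n_ppc_samples, replace=True)
points = [trace[int(i)] for i in indices]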

@twiecki
Member

twiecki commented Dec 1, 2016

That's a good proposal.

@kyleabeauchamp
Contributor

agreed

@shkr
Contributor Author

shkr commented Dec 4, 2016

My question about how the predictor values are used to compute the posterior predictive samples is still not answered. I want to calculate the variance of the posterior predictive samples y_new after subtracting (a*X + b) [here X is the predictor variable], i.e. Var(y_new - (a*X + b)), where y_new is the array of posterior predictive samples returned. Can someone explain how to do this?

@fonnesbeck
Member

@shkr you can just manipulate the posterior predictive samples from sample_ppc directly, implementing exactly the calculation you describe in Python, since y_new will just be a NumPy array.
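
A minimal sketch of that calculation, using the posterior means of b1 and b0 from the model above as the point estimates a and b (any other point estimate would work the same way):

import numpy as np

y_new = ppc["y_obs"]                        # shape (samples, len(y))

a = trace["b1"].mean()                      # slope point estimate
b = trace["b0"].mean()                      # intercept point estimate

residuals = y_new - (a * np.ravel(x) + b)   # (samples, n) minus (n,)
var_residuals = residuals.var()             # Var(y_new - (a*X + b))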
