Shape of sample_ppc #1529

Closed
shkr opened this issue Nov 17, 2016 · 23 comments

@shkr
Contributor

shkr commented Nov 17, 2016

"""
Student T distribution

This one with uniform priors

Parameters
    ----------
    nu : int
        Degrees of freedom (nu > 0).
    mu : float
        Location parameter.
    lam : float
        Scale parameter (lam > 0).
"""
import logging

import pymc3 as pm
import scipy.optimize

logger = logging.getLogger(__name__)

# x and y are the observed predictor and response arrays (100 observations)

with pm.Model() as vvf_model:
    # Define priors
    b0 = pm.Normal("b0", mu=0, sd=20)
    b1 = pm.Normal("b1", mu=0, sd=20)
    lam = pm.Uniform("lam", lower=0.0, upper=20.0)
    nu = pm.Uniform("nu", lower=0.0, upper=20.0)

    # Identity link function
    mu = b0 + b1 * x

    y_obs = pm.StudentT("y_obs", mu=mu, lam=lam, nu=nu, observed=y)

with vvf_model:
    start = pm.find_MAP(model=vvf_model, fmin=scipy.optimize.fmin_powell)
    logger.info("Starting values = {}".format(start))

    # draw posterior samples
    trace = pm.sampling.sample(10000, start=start)

ppc = pm.sampling.sample_ppc(trace, samples=500, model=vvf_model, size=100)

As per the documentation found here, https://pymc-devs.github.io/pymc3/api.html#pymc3.sampling.sample_ppc, I expect a keyed dictionary, in this case a dict with the single key y_obs, since I have only one observed variable. I expect ppc['y_obs'] to be a matrix of 500 posterior predictive datasets, each consisting of 100 samples drawn at different values from the posterior distribution.

However, the shape of ppc['y_obs'] is not (500, 100) as expected but (500, 100, 100).

Can someone explain what this matrix is and why its shape is not (500, 100)?

@fonnesbeck
Member

If you use samples=100 rather than size=100, you will get what you expect. I can see this is confusing, however.

@AustinRochford can you lend some insight as to the need for a size parameter here?

@shkr
Contributor Author

shkr commented Nov 17, 2016

Thanks. I omitted the size parameter and set samples=100. Now I believe I have 100 datasets, each of 100 samples from the posterior predictive distribution.

I think this issue is related to the shape and size discussion on PR #862. The things I was able to infer from that discussion for my example are:

  • y_obs is sampled from the posterior predictive distribution because it is an observed, dependent variable
  • For each sample in the training data y, which has shape (100, 1), one sample from the posterior predictive distribution is drawn, which explains sample_ppc(..).shape = (samples, 100). Can someone shed light on why this rule is followed? My guess is that the input to the deterministic part of y_obs, i.e. x in b0 + b1*x, is kept aligned between each y_obs draw and the corresponding y in the training data. A sketch of the resulting shapes follows this list.
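
A minimal sketch of the shapes described above, assuming y holds 100 observations (names follow the model in the first comment):

ppc = pm.sampling.sample_ppc(trace, samples=100, model=vvf_model)

print(type(ppc))            # dict keyed by observed variable name, here 'y_obs'
print(ppc['y_obs'].shape)   # (100, 100): one row per posterior predictive draw,
                            # one column per observed data point in y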

@AustinRochford
Member

AustinRochford commented Nov 18, 2016

@fonnesbeck I am not familiar at all with the innards of the PPC sampling code, so I don't have much to add offhand, unfortunately.

@fonnesbeck
Member

Sorry, thought you had authored this.

@AustinRochford
Member

No problem.

@twiecki
Member

twiecki commented Nov 18, 2016

@taku-y did

@kyleabeauchamp
Contributor

Here's another question: why not have the PPC sampler return an object like the Trace object? Right now, the trace is an object that supports slicing, but the output of sample_ppc is a dictionary that requires more gymnastics.

@twiecki
Member

twiecki commented Nov 26, 2016 via email

@twiecki
Member

twiecki commented Nov 26, 2016

@kyleabeauchamp Want to do a PR?

@kyleabeauchamp
Contributor

Is there an obvious constructor for the MultiTrace object from a dictionary of traces? I'm not seeing a clear API on what it takes to construct the trace object outside of the usual chain.

@kyleabeauchamp
Contributor

Should we also document the fact that the PPC trace is actually reshuffled with respect to the original trace? E.g. does everyone find it obvious that this thing is first drawing shuffled points from the original trace (https://github.com/pymc-devs/pymc3/blob/481a231dd2ef31d5f1581e26320cf387edeed343/pymc3/sampling.py#L385) as opposed to iterating over the trace?

@kyleabeauchamp
Contributor

E.g. I want to prevent someone from assuming order and doing the following:

ppc = sample_ppc(trace)
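# note: rows of ppc come from randomly chosen points in the trace
# (see the sampling.py line linked above), so they do not line up
# with the rows of trace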
for (var, obsvar) in zip(trace[key1], ppc[key2]):
    pass

@twiecki
Member

twiecki commented Nov 30, 2016

That's indeed a bit counter-intuitive. Could we not just change the order of that loop?

@fonnesbeck
Member

It's not clear why you would use the trace and the posterior predictive check draws together anywhere. It's doing what I would expect:

  1. draw a random point from the trace
  2. use that point to generate a random draw from the posterior predictive distribution

Let me know if I am reading that wrong.
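
A rough sketch of those two steps for the model above (this is a paraphrase, not the actual pymc3 source; y_obs is the observed variable defined in the first comment):

import numpy as np

with vvf_model:
    # 1. draw a random point from the trace
    point = trace[np.random.randint(0, len(trace))]

    # 2. use that point to generate a random draw from the posterior
    #    predictive distribution of the observed variable
    y_draw = y_obs.distribution.random(point=point)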

@taku-y
Contributor

taku-y commented Nov 30, 2016

I did not write the code for sample_ppc, only for sample_vp. Sorry for the late reply.

@kyleabeauchamp
Contributor

OK, here's my simple model for why one might want to zip() the MCMC trace and the PPC trace.

Suppose we have a coin that lands on heads either 75% of the time or 25% of the time, and we have observed 4 coin flips. The key parameter in the model is then p, which is a discrete uniform over those two values.

Now we sample the model, and also the posterior predictive (PP). The PP draws are essentially resampled coin flips. I imagine I would want to plot a 2D histogram of p versus n_heads.

If we use the current PPC without zipping the MCMC trace for p, then we have smudged together the coin tosses where p = 75% and p = 25%, whereas we might want the ability to visualize those cases separately.
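
A minimal sketch of that toy model (the names coin_model, idx, and n_heads are made up for illustration):

import numpy as np
import pymc3 as pm

flips = np.array([1, 0, 1, 1])  # the 4 observed coin flips

with pm.Model() as coin_model:
    # p is either 0.25 or 0.75 with equal prior probability
    idx = pm.DiscreteUniform("idx", lower=0, upper=1)
    p = pm.Deterministic("p", 0.25 + 0.5 * idx)
    obs = pm.Bernoulli("obs", p=p, observed=flips)
    trace = pm.sample(2000)
    ppc = pm.sample_ppc(trace, samples=len(trace))

# number of heads in each resampled dataset
n_heads = ppc["obs"].sum(axis=1)

# a 2D histogram of p versus n_heads would need trace["p"] and n_heads
# to line up row by row, which the shuffling inside sample_ppc breaks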

@fonnesbeck
Member

fonnesbeck commented Dec 1, 2016

If you are doing posterior predictive checks (which is implied in the name of the function), all we want is a set of random draws from the conditional posterior for each data point we have observed. The only things we are conditioning on are the predictors associated with the observed outcomes, not any particular draw of the intermediate variables.

@kyleabeauchamp
Contributor

I agree; I'm just saying there may sometimes be value in having things conditioned on the intermediates. AFAIK, in pymc2 my use case could be achieved either with hand-introduced auxiliary nodes or with the masked-array / missing-values formalism.

@fonnesbeck
Member

fonnesbeck commented Dec 1, 2016

If I'm doing posterior predictive checks, not only do I want the original MCMC samples shuffled, I want them sampled with replacement, because in principle you could ask for more PPC samples than there are original samples, so there should be a means of providing that.

I'm happy to have a function that is more flexible, but I'm suggesting it probably shouldn't be sample_ppc. Ideally, we would have a super-flexible sampling function, and sample_ppc would be a convenience function that restricts it to the sort of sampling that posterior predictive checks require.
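
A minimal sketch of drawing trace points with replacement, which is what allows more PPC samples than trace samples (np.random.choice is just one way to express this):

import numpy as np

n_ppc_samples = 20000  # may exceed len(trace)
indices = np.random.choice(len(trace), size=n_ppc_samples, replace=True)
points = [trace[int(i)] for i in indices]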

@twiecki
Member

twiecki commented Dec 1, 2016

That's a good proposal.

@kyleabeauchamp
Contributor

agreed

@shkr
Contributor Author

shkr commented Dec 4, 2016

My question about how the predictor values are used to compute the posterior predictive samples is still not answered. I want to calculate the variance of the posterior predictive samples y_new after subtracting (a*X + b) [here X is the predictor variable], i.e. Var(y_new - (a*X + b)), where y_new is the array of posterior predictive samples returned. Can someone explain how to do this?

@fonnesbeck
Member

@shkr you can just manipulate the posterior predictive samples from sample_ppc directly, implementing exactly the calculation you describe in Python, since y_new will just be a NumPy array.
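
A minimal sketch of that calculation, using the posterior means of b1 and b0 from the model above as the point estimates a and b (any other point estimate would work the same way):

import numpy as np

y_new = ppc["y_obs"]                        # shape (samples, len(y))

a = trace["b1"].mean()                      # slope point estimate
b = trace["b0"].mean()                      # intercept point estimate

residuals = y_new - (a * np.ravel(x) + b)   # (samples, n) minus (n,)
var_residuals = residuals.var()             # Var(y_new - (a*X + b))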
