Add tutorial for models with missing data #415

Closed
fehiepsi opened this issue Oct 28, 2019 · 5 comments

@fehiepsi
Member

We should be able to run MCMC for models with missing data in NumPyro, because NumPyro supports improper priors through the param primitive. This is also a good chance to illustrate that support.

Related issues in other repos:

@fehiepsi fehiepsi added this to the version 0.2.2 milestone Oct 28, 2019
@fehiepsi fehiepsi self-assigned this Oct 28, 2019
@neerajprad
Member

I think the issue in Pyro refers to allowing a way to support models with sample statements of the kind pyro.sample('obs', dist(..), obs=[some partially observed tensor]), where we condition on the observed data and impute the missing entries. Could you clarify how having param statements in MCMC helps this use case?

@fehiepsi
Member Author

fehiepsi commented Oct 28, 2019

I will use the most naive way to impute those missing data. For example,

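import numpy as onp
import jax.numpy as np
from jax import ops
import numpyro.distributions as dist
from numpyro import param, sample

def model(x, y):
    loc = sample('loc', dist.Normal(0, 1))
    scale = sample('scale', dist.LogNormal(0, 1))
    isnan = onp.isnan(x)  # boolean mask of the missing entries
    # unconstrained parameters standing in for the missing values
    x_impute = param('x_impute', np.zeros(isnan.sum()))
    x = ops.index_update(x, onp.nonzero(isnan), x_impute)  # update x
    sample('x', dist.Normal(loc, scale), obs=x)
    ...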

I am not sure how to make this possible with pyro.sample... Does the above script do what the Pyro/brmp issues describe?
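To make the idea concrete without depending on a particular NumPyro or JAX version, here is a minimal standard-library sketch of what the model above evaluates for fixed `loc` and `scale`: the imputed values are written into the NaN positions, and the merged vector is scored under a single Normal. The helper names `normal_logpdf`, `merge_missing`, and `merged_log_density` are hypothetical and are not part of NumPyro.

```python
import math


def normal_logpdf(value, loc, scale):
    """Log density of Normal(loc, scale) evaluated at value."""
    z = (value - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)


def merge_missing(x, x_impute):
    """Copy x, replacing its NaN entries (in order) with the values in x_impute."""
    n_missing = sum(1 for v in x if math.isnan(v))
    if n_missing != len(x_impute):
        raise ValueError("need exactly one imputed value per missing entry")
    fill = iter(x_impute)
    return [next(fill) if math.isnan(v) else v for v in x]


def merged_log_density(x, x_impute, loc, scale):
    """Score observed and imputed entries together under one Normal(loc, scale)."""
    return sum(normal_logpdf(v, loc, scale) for v in merge_missing(x, x_impute))


# Toy data (made up for illustration): two observed and two missing entries.
x = [0.3, float("nan"), -1.2, float("nan")]
merged = merge_missing(x, [0.5, -0.1])
log_density = merged_log_density(x, [0.5, -0.1], loc=0.0, scale=1.0)
```

In this sketch the imputed entries enter the log density only through the single merged term, so each of them is scored exactly once.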

@neerajprad
Member

I see - I think your method will give us the ML estimate for the missing values.

> I am not sure how to make this possible with pyro.sample.

I am not quite sure, but sampling from dist.Normal(..) might work:

    x_impute = sample('x_impute', dist.Normal(loc, scale))
    x = ops.index_update(x, onp.nonzero(isnan), x_impute)  # update x
    sample('x', dist.Normal(loc, scale), obs=x)

In any case, I think this will make for a good tutorial.

@fehiepsi
Member Author

fehiepsi commented Oct 29, 2019

> the ML estimate for the missing values

Actually, those imputed values will have priors dist.Normal(loc, scale) because the statement

sample('x', dist.Normal(loc, scale), obs=x)

acts on the "merged" x (x_notnan & x_impute). Here I am using the fact that in NumPyro (not Pyro) MCMC

x = param('x', ...)
sample('x', dist.Normal(), obs=x)

and

x = sample('x', dist.Normal())

are equivalent (even for priors with constrained supports). This way also works for multivariate priors IIUC.

The script in your comment will count the log_prob of x_impute twice, so I think that it is not quite correct. Maybe I am overlooking something?
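The double counting can be checked by hand for fixed `loc` and `scale`. The following is a standard-library sketch with made-up numbers, not NumPyro code; `normal_logpdf` is a hypothetical helper. It compares the total log density when the merged `x` is scored once against the total when the imputed values also get their own sample site.

```python
import math


def normal_logpdf(value, loc, scale):
    """Log density of Normal(loc, scale) evaluated at value."""
    z = (value - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)


loc, scale = 0.0, 1.0
x_observed = [0.3, -1.2]  # entries that were actually observed (toy values)
x_impute = [0.5, -0.1]    # current values of the imputed entries (toy values)

observed_term = sum(normal_logpdf(v, loc, scale) for v in x_observed)
imputed_term = sum(normal_logpdf(v, loc, scale) for v in x_impute)

# param + obs on the merged x: every entry is scored exactly once.
merged_total = observed_term + imputed_term

# sample('x_impute', ...) followed by obs on the merged x:
# the imputed entries are scored by the sample site and again by the obs site.
double_total = imputed_term + (observed_term + imputed_term)

# The surplus is exactly one extra copy of the imputed log density.
excess = double_total - merged_total
```

Here `excess` equals `imputed_term`, which is the extra factor the second script would contribute.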

@neerajprad
Member

Interesting. No, you are absolutely right about the double counting. What I wrote above is incorrect unless we are doing masked observes. Actually, your original solution is quite clever! This will make for an interesting tutorial, and I look forward to it.
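The thread does not spell out what "masked observes" would look like. One reading, taken here as an assumption, is that the observed site scores only the entries that were actually observed, while the imputed entries are scored by their own sample site. Under that assumption, a standard-library sketch with toy numbers (the helper `normal_logpdf` is hypothetical, and nothing here is NumPyro API) gives the same total as scoring the merged vector once:

```python
import math


def normal_logpdf(value, loc, scale):
    """Log density of Normal(loc, scale) evaluated at value."""
    z = (value - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)


loc, scale = 0.0, 1.0
x = [0.3, float("nan"), -1.2, float("nan")]  # toy data with two missing entries
x_impute = [0.5, -0.1]                       # toy values for the missing entries

mask = [not math.isnan(v) for v in x]        # True where the entry was observed
fill = iter(x_impute)
merged = [v if m else next(fill) for v, m in zip(x, mask)]

# Sample site for the imputed values.
impute_term = sum(normal_logpdf(v, loc, scale) for v in x_impute)

# Masked observe: only entries with mask == True contribute.
masked_obs_term = sum(
    normal_logpdf(v, loc, scale) for v, m in zip(merged, mask) if m
)
masked_total = impute_term + masked_obs_term

# Reference: score the merged vector once, as in the param-based model.
merged_total = sum(normal_logpdf(v, loc, scale) for v in merged)
```

With the mask in place, `masked_total` and `merged_total` agree up to floating-point rounding, so no entry is counted twice.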

@fehiepsi fehiepsi modified the milestones: version 0.2.2, version 0.2.3 Dec 3, 2019
@fehiepsi fehiepsi removed this from the version 0.2.4 milestone Jan 23, 2020