Variance reduction for reparametrized ELBo #63

Closed
ngoodman opened this issue Jul 21, 2017 · 18 comments

ngoodman (Collaborator) commented Jul 21, 2017

For the reparameterizable case (courtesy @eb8680):

ngoodman (Collaborator, Author) commented:

splitting this off from #42 in order to keep issues somewhat approachable.

null-a (Collaborator) commented Aug 7, 2017

Sticking the Landing is also relevant, and might be worth considering. Perhaps best implemented as a control variate.

(I had a quick play with this in WebPPL a while back.)
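For context, the gist of Sticking the Landing is to drop the zero-expectation score-function component of the reparameterized ELBO gradient by evaluating log q with the guide parameters detached. A minimal sketch of the idea (not an existing Pyro implementation), assuming a diagonal Gaussian guide; `model_log_prob` is a placeholder for log p(x, z):

```python
# Minimal sketch of the Sticking the Landing estimator (Roeder et al. 2017),
# assuming a diagonal Gaussian guide; `model_log_prob` is a placeholder for
# log p(x, z), not an existing Pyro function.
import math
import torch

def stl_elbo(mu, log_sigma, model_log_prob):
    # Reparameterized sample: z = mu + sigma * eps, eps ~ N(0, I).
    eps = torch.randn_like(mu)
    z = mu + log_sigma.exp() * eps

    # Evaluate log q(z) with the guide parameters *detached*: gradients still
    # reach (mu, log_sigma) through z, but the direct (score-function) path
    # is blocked, which is exactly the zero-expectation term STL removes.
    mu_d, log_sigma_d = mu.detach(), log_sigma.detach()
    log_q = (-0.5 * ((z - mu_d) / log_sigma_d.exp()) ** 2
             - log_sigma_d - 0.5 * math.log(2 * math.pi)).sum()

    return model_log_prob(z) - log_q  # single-sample ELBO estimate
```

Calling `.backward()` on the negative of this and stepping an optimizer gives the path-derivative-only gradient from the paper; framing it as a control variate instead would add the removed score term back with an appropriate coefficient.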

null-a (Collaborator) commented Aug 16, 2017

I've put some thought into what it would take to support reparameterized accept/reject samplers. (I'm not sure if anyone else has looked at this already?)

I think the key change will be to add support for the idea of a partially reparameterized distribution, i.e. one in which the base distribution retains a dependency on the parameters. (Reparameterizing an accept/reject sampler produces a distribution of this type.)

Inference algorithms that assume distributions are fully reparameterized will need updating to correctly handle the partially reparameterized case. For the ELBO estimator, a partially reparameterized choice will have both reinforce and path-wise terms. The reinforce term requires us to compute the log density of the base sample under the base distribution. (Something we don't have to do for fully reparameterized distributions.) AFAICT this isn't possible with pyro's current distribution interface, so we may need to tweak that.
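To make the shape of that estimator concrete, here is a hedged single-sample surrogate in PyTorch-style Python. The `dist` object and its `sample_reparam` / `base_score` methods are hypothetical (a Python analog of the sampleReparam / baseScore interface described just below), and `downstream_log_weight` stands in for the usual log p - log q term:

```python
# Hedged sketch of a single-sample ELBO surrogate for a partially
# reparameterized choice z = T(eps; theta) with eps ~ base(theta), where the
# base distribution still depends on theta. `dist.sample_reparam()` and
# `dist.base_score()` are hypothetical methods (see the interface sketch
# below); `downstream_log_weight(z)` stands in for log p(x, z) - log q(z | x).
def partially_reparam_surrogate(dist, downstream_log_weight):
    z, eps = dist.sample_reparam()      # z is differentiable w.r.t. theta
    f = downstream_log_weight(z)

    # Path-wise term: gradients flow into theta through z (and through any
    # explicit theta-dependence inside f itself).
    pathwise = f

    # Reinforce term: score of the base sample under the base distribution,
    # scaled by the detached downstream weight. When the base distribution is
    # parameter-free this term contributes no gradient, recovering the fully
    # reparameterized estimator.
    reinforce = f.detach() * dist.base_score(eps.detach())

    return pathwise + reinforce
```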

To help get a feel for this, I made an initial attempt at adding the reparameterized gamma sampler to webppl. The way this works is that I added two new methods to distributions: sampleReparam generates a sample from the base distribution and passes it through the (auto-differentiable) transform (following current pyro), and returns a pair of the transformed sample and the base sample. The method baseScore can then be used to compute the log density of the base sample, if required. The code is here.

This seems to work OK on a super simple model. An obvious next step would be to extend this to other distributions to test the interfaces.
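For reference, a rough Python rendering of that two-method interface (names mirror the WebPPL sampleReparam / baseScore experiment and are not existing Pyro API):

```python
# Rough Python rendering of the interface described above; names mirror the
# WebPPL sampleReparam / baseScore experiment and are not existing Pyro API.
class PartiallyReparameterizedDistribution:
    def sample_reparam(self):
        """Draw eps from the base distribution, push it through the
        auto-differentiable transform, and return (transformed_sample, eps)."""
        raise NotImplementedError

    def base_score(self, eps):
        """Log density of the base sample eps under the base distribution.
        Unlike the fully reparameterized case, this may depend on the
        distribution's parameters, which is what produces the reinforce term."""
        raise NotImplementedError

    def log_pdf(self, x):
        """Ordinary log density of the transformed sample (unchanged)."""
        raise NotImplementedError
```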

null-a (Collaborator) commented Aug 16, 2017

Note: Related comment here.

eb8680 (Member) commented Aug 17, 2017

@null-a Cool stuff! In your opinion how important/valuable is this? Is the variance reduction worth the effort? Is it something we should consider adding to Pyro before release, or before splitting off the distributions library?

null-a (Collaborator) commented Aug 17, 2017

> Is it something we should consider adding to Pyro before release

@eb8680 I guess it depends on our goals, so perhaps we should let the anchor models drive this.

If models with Gamma/Beta/Dirichlet choices end up in that set, then I imagine that having something other than vanilla reinforce for these will be worth the effort.

The two most promising approaches I know of are this and the use of a transformed Gaussian (or other fully reparameterizable distribution) as a guide. I don't know whether one of these is strictly superior or both have their place.

ETA: Section 4 of the paper mentions one reason to think that reparam for accept/reject might be better than the transformation approach, at least in some settings: the transformation approach can't accurately approximate densities with singularities.
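For comparison, the transformed-Gaussian alternative is easy to write down with torch.distributions; a sketch for a positive latent (a log-normal guide standing in for, say, a Gamma-shaped posterior), with the variational parameter names chosen purely for illustration:

```python
# Sketch of the transformed-Gaussian guide alternative: a fully
# reparameterized log-normal used to approximate a positive latent.
# Parameter names (loc, log_scale) are illustrative.
import torch
from torch.distributions import Normal, TransformedDistribution
from torch.distributions.transforms import ExpTransform

loc = torch.zeros(1, requires_grad=True)
log_scale = torch.zeros(1, requires_grad=True)

guide = TransformedDistribution(Normal(loc, log_scale.exp()), [ExpTransform()])

z = guide.rsample()        # fully reparameterized positive sample
log_q = guide.log_prob(z)  # includes the log |d exp / dx| Jacobian term
```

The singularity point from Section 4 shows up here: a log-normal density always vanishes at the origin, so it can't match e.g. a Gamma with shape < 1, whose density blows up at zero.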

null-a (Collaborator) commented Aug 17, 2017

> The two most promising approaches I know of are this and the use of a transformed Gaussian

For reference, The Generalized Reparameterization Gradient is another, but reparameterized accept/reject probably has lower variance.

ngoodman (Collaborator, Author) commented Aug 17, 2017

this is neat. i suspect that partially reparametrized distributions are the more general case, and therefore a good move anyhow. i don't have a super strong opinion about rejection based samplers, but i think it is promising enough to try out.

my suggestion would be for you to add one partially reparametrized dist (eg gamma) and make an extended version of the elbo inference method that uses it. then we can all think through the code and interfaces to see if there are any tweaks we want to make before converting all the dists to this style.

(btw i think there might be a more general idiom for making distributions compositionally out of sampling pieces, deterministic pieces, and scorers. with a contract something like having a complete scorer after every composition step....)

ngoodman (Collaborator, Author) commented:

Note: but in terms of prioritizing this wrt other extensions, I agree that being guided by anchor models is probably best at this point!

null-a (Collaborator) commented Aug 17, 2017

> my suggestion would be for you to add one partially reparametrized dist (eg gamma) and make an extended version of the elbo inference method that uses it

Yeah, I was thinking I might come back and do that once #64 is merged and the anchor models demand it.

> btw i think there might be a more general idiom for making distributions compositionally out of sampling pieces, deterministic pieces, and scorers

I agree. I poked around with this for a while, but didn't arrive at anything satisfactory, so went with what I have here. Implicit models might fit here too.

martinjankowiak (Collaborator) commented:

@null-a cool, interesting stuff. in the context of accept/reject sampling wouldn't it be sufficient to do something like the following?

-- give the distribution class a score_function_term() method
-- by default it reverts to log_pdf
-- for a distribution of the accept/reject type, override score_function_term() with log q - log r
-- when constructing the elbo in the presence of non-reparameterizable distributions (which would include accept/reject distributions) use score_function_term() when constructing the gradient estimator

as far as i can tell, this would construct the right estimator. (one might need to take care that certain gradients are being blocked, but i think this would basically be automatic.)

or am i missing something?

one can imagine that other complex distributions with hidden/unexposed RVs could fit into the same framework, at least in certain cases.
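A minimal sketch of what that might look like (class and method names are hypothetical, not actual Pyro code):

```python
# Minimal sketch of the score_function_term() proposal above; class and
# method names are hypothetical, not actual Pyro code.
class Distribution:
    def log_pdf(self, x):
        raise NotImplementedError

    def score_function_term(self, x, **kwargs):
        # Default: the ordinary reinforce/score-function term.
        return self.log_pdf(x)

class AcceptRejectDistribution(Distribution):
    def proposal_log_pdf(self, eps):
        raise NotImplementedError

    def score_function_term(self, x, eps=None):
        # Override for accept/reject samplers: log q(x) - log r(eps), where
        # eps is the accepted base/proposal sample. Note that eps has to be
        # made available to this method somehow, which is the interface
        # question raised in the reply below.
        return self.log_pdf(x) - self.proposal_log_pdf(eps)

# In the ELBO gradient estimator, a non-reparameterizable choice would then
# contribute downstream_cost.detach() * dist.score_function_term(x, eps=eps)
# to the surrogate objective.
```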

null-a (Collaborator) commented Aug 24, 2017

@martinjankowiak Thanks!

I don't think I fully understand the suggestion, since I don't see where the log q - log r comes from? (I would have expected just log r, perhaps.)

That aside, it seems that score_function_term() would need to take the value sampled from the base distribution in order to compute log r, and adding a way of getting hold of that brings you back to something similar to my approach? (But maybe I'm missing something!)

martinjankowiak (Collaborator) commented Aug 24, 2017

@null-a we're probably ultimately thinking along the same lines (modulo possible difference in interface). when i have a chance i'll see about implementing a v0. it'll be easier to discuss adequacy/shortcomings with something concrete

jpchen (Member) commented Dec 17, 2017

@martinjankowiak what is the status of this? i don't know if you've looked back at this since your fancy variance-reduced estimators... otherwise can we close this in favor of concrete tasks?

martinjankowiak added this to the 0.2 release milestone Jan 10, 2018
cavaunpeu commented:

FWIW, I'm working on a PR for almost this in Edward. Perhaps I can do a port to Pyro when done. Would be 🌴 💯 ☀️ to be back working in PyTorch land...

fritzo (Member) commented Jan 21, 2018

@cavaunpeu we've implemented RSVI in #659 which should be merged within the next week or two (just needs some clean-up and tests).

cavaunpeu commented:

Super!

fritzo (Member) commented Apr 20, 2018

RSVI and Sticking the Landing are already in Pyro. I'm closing this issue in favor of more targeted issues.
