Implement GuideMessenger, AutoNormalMessenger, AutoRegressiveMessenger #2953
Conversation
@vitkl I think this should be ready to try out. |
@fritzo this looks quite exciting! |
So a guide with hierarchical dependencies for all sites should simply do this? See the line marked "# Here ->":

```python
class HierarchicalGuideMessenger(AutoRegressiveMessenger):
    def get_posterior(self, name, prior, upstream_values):
        # Use a distribution at every site whose value depends on upstream_values.
        with helpful_support_errors({"name": name, "fn": prior}):
            transform = biject_to(prior.support)
            loc, scale = self._get_params(name, prior)
            affine = dist.transforms.AffineTransform(
                loc + transform.inv(prior.mean),  # Here ->
                scale,
                event_dim=transform.domain.event_dim,
                cache_size=1,
            )
            posterior = dist.TransformedDistribution(
                prior, [transform.inv.with_cache(), affine, transform.with_cache()]
            )
            return posterior
```

Or should it be something more complex like below, where users need to provide a dictionary specifying which sites have which parents and how to transform them into each other?

```python
class HierarchicalGuideMessenger(AutoRegressiveMessenger):
    def __init__(self, *args, hierarchical_sites=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.hierarchical_sites = hierarchical_sites or {}

    def get_posterior(self, name, prior, upstream_values):
        if name in self.hierarchical_sites:
            # Use a custom distribution at this site whose value depends on upstream_values.
            with helpful_support_errors({"name": name, "fn": prior}):
                transform = biject_to(prior.support)
                # Get values of parent sites.
                parent_names = self.hierarchical_sites[name]["parent_nodes"]
                parent_upstream_values = {k: upstream_values[k] for k in parent_names}
                hierarchical_loc = self.hierarchical_sites[name]["fn"](**parent_upstream_values)
                hierarchical_loc_untransformed = transform.inv(hierarchical_loc)
                loc, scale = self._get_params(name, prior)
                affine = dist.transforms.AffineTransform(
                    loc + hierarchical_loc_untransformed,
                    scale,
                    event_dim=transform.domain.event_dim,
                    cache_size=1,
                )
                posterior = dist.TransformedDistribution(
                    prior, [transform.inv.with_cache(), affine, transform.with_cache()]
                )
                return posterior
        # Fall back to autoregressive.
        return super().get_posterior(name, prior, upstream_values)
```

where

```python
hierarchical_sites = {"x": {"parent_nodes": ["y", "z"], "fn": lambda y, z: y @ z}}
``` |
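For concreteness, hypothetical usage of the second, dictionary-based variant might look like this (the model, site names, and shapes are invented for illustration, matching the `hierarchical_sites` example above):

```python
import pyro
import pyro.distributions as dist

def model():
    y = pyro.sample("y", dist.Normal(0.0, 1.0).expand([2, 3]).to_event(2))
    z = pyro.sample("z", dist.Normal(0.0, 1.0).expand([3, 4]).to_event(2))
    # "x" depends on its parents "y" and "z" through a matrix product.
    pyro.sample("x", dist.Normal(y @ z, 1.0).to_event(2))

guide = HierarchicalGuideMessenger(
    model,
    hierarchical_sites={"x": {"parent_nodes": ["y", "z"], "fn": lambda y, z: y @ z}},
)
``` |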
I've added an |
BTW I think your idea is similar to ASVI, which samples from a posterior with the same dependency structure as the prior. Note that in general the posterior may have a more complex dependency structure, as described in Webb et al. (2017). |
Very interesting, thanks for adding this! @yozhikoff @la-sekretar @bv2 are probably interested in this too. |
```python
    ) -> Union[TorchDistribution, torch.Tensor]:
        with helpful_support_errors({"name": name, "fn": prior}):
            transform = biject_to(prior.support)
            loc, scale = self._get_params(name, prior)
```
The guide will become fully hierarchical if you do this, but it is not fully hierarchical by default, right?

```python
loc, scale = self._get_params(name, prior)
loc = loc + prior.loc
```

Ideally one could add some kind of test of whether this site has dependency sites. You also mention that it could be useful to encode a more complex dependency:

```python
loc, scale, weight = self._get_params(name, prior)
loc = loc + prior.loc * weight
```
Correct, the intention of this simple guide is to be mean field. Do you want to try contributing an `AutoHierarchicalNormalMessenger` guide as a follow-up to this PR? I tried to do something similar with `AutoRegressiveMessenger` below by sampling from the prior and then shifting in unconstrained space. I was unsure how to implement a general `AutoHierarchicalNormalMessenger` because not all `prior` distributions have a `.mean` property, and even then it is the mean in unconstrained space that we care about. E.g. how do we deal with `Gamma` or `LogNormal` or `Beta` or `Dirichlet`?
I understand your point about distributions that don't have a mean. What are those distributions, by the way?

I am thinking about this solution:

```python
loc, scale, weight = self._get_params(name, prior)
loc = loc + transform.inv(prior.loc) * weight
```

Does it make sense for all distributions that have a mean?
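Spelled out inside the messenger, that proposal would look something like the sketch below. This is hypothetical: it assumes `_get_params` is extended to also return a learnable `weight` tensor (the current version returns only `loc, scale`), it only works for priors that expose a `.loc`, and whether to use `prior.loc` or `prior.mean` is exactly what the next comment discusses. Import paths are per the pyro version under review and may differ.

```python
from torch.distributions import biject_to

import pyro.distributions as dist
from pyro.infer.autoguide import AutoNormalMessenger
from pyro.infer.autoguide.utils import helpful_support_errors

class WeightedHierarchicalMessenger(AutoNormalMessenger):
    def get_posterior(self, name, prior, upstream_values):
        with helpful_support_errors({"name": name, "fn": prior}):
            transform = biject_to(prior.support)
            # Hypothetical: assumes _get_params also returns a learnable
            # per-site weight, which the current implementation does not.
            loc, scale, weight = self._get_params(name, prior)
            # Shift the learned location towards the prior's location mapped
            # to unconstrained space, scaled by the learnable weight.
            loc = loc + transform.inv(prior.loc) * weight
            posterior = dist.TransformedDistribution(
                dist.Normal(loc, scale).to_event(transform.domain.event_dim),
                transform.with_cache(),
            )
            return posterior
``` |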
Yes, I will do the `AutoHierarchicalNormalMessenger` PR. Should I wait until this PR is merged?
> What [distributions do not have a `.mean` property], by the way?

- Heavy-tailed distributions may not have a mean, e.g. `Cauchy` and `Stable` have infinite variance and no defined mean.
- Non-Euclidean distributions such as `VonMises3D` and `ProjectedNormal` have no defined mean.
- Some complex distributions have no computable mean, e.g. `TransformedDistribution(Normal(...), MyNormalizingFlow)`.

> Does `prior.loc` make sense for all distributions that have a mean?

First, I would opt for `prior.mean` rather than `prior.loc`, since e.g. `LogNormal(...).loc` isn't a mean; rather it is the mean of the pre-transformed normal. Second, note that the transform of the constrained mean is not the same as the unconstrained mean or unconstrained median, e.g. for LogNormal, `mean = exp(loc + scale**2 / 2)` whereas `median = exp(loc)`.

I think your `.mean` idea is good enough in most cases, and for the cases where it fails, users can subclass and define their own custom `.get_posterior()` methods.
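As a quick numeric check of the LogNormal point (plain torch; values noted in the comments):

```python
import torch
import torch.distributions as dist

d = dist.LogNormal(torch.tensor(0.0), torch.tensor(1.0))

print(d.mean)                     # exp(0 + 1**2 / 2) ≈ 1.6487, the constrained mean
print(d.mean.log())               # ≈ 0.5, the log of the constrained mean
print(d.icdf(torch.tensor(0.5)))  # median = exp(0) = 1.0, so log(median) = 0.0
# The unconstrained (base Normal) mean is loc = 0.0: it matches log(median),
# not log(mean), so transforming the constrained mean does not recover it.
```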
Yes, Webb et al. (2017) was the idea behind |
```python
        if self.num_particles == 1:
            return fn
        return pyro.plate(
            "num_particles_vectorized",
            self.num_particles,
            dim=-self.max_plate_nesting,
        )(fn)
```
> this ensures serializability

How do these guides know about the plates? |
These guides directly use the model's plates. @vitkl I'm unsure how subsampling should behave in these models, what do you think? Should we always assume that subsampled plates are amortized (share a single parameter value)? Should we additionally provide an explicit option? EDIT: I've just pushed a fix to support subsampling, and an amortization option. |
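A minimal sketch of the subsampling pattern under discussion (model, site names, and sizes are invented for illustration): because the guide runs as an effect handler inside the model, it reuses the model's plate and its subsample indices directly.

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.infer.autoguide import AutoNormalMessenger

def model(data):
    loc = pyro.sample("loc", dist.Normal(0.0, 1.0))
    # The guide sees this plate (and its subsample indices) directly.
    with pyro.plate("data", len(data), subsample_size=10):
        batch = pyro.subsample(data, event_dim=0)
        # A local latent: the guide allocates parameters for the full plate
        # and slices them by the subsample indices at each step.
        scale = pyro.sample("scale", dist.LogNormal(0.0, 1.0))
        pyro.sample("obs", dist.Normal(loc, scale), obs=batch)

guide = AutoNormalMessenger(model)
``` |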
@martinjankowiak thanks for the helpful review, the new design is much simpler! @vitkl you'll need to update:

```diff
- def get_posterior(self, name, prior, upstream_values):
+ def get_posterior(self, name, prior):
``` |
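Under the revised signature, a site-specific override would fetch upstream samples by name rather than receiving them as an argument, roughly as below. The site names and the form of the dependence are made up, and this sketch assumes the `GuideMessenger.upstream_value()` accessor and a real-supported site `"x"`.

```python
import pyro.distributions as dist
from pyro.infer.autoguide import AutoRegressiveMessenger

class MyGuideMessenger(AutoRegressiveMessenger):
    def get_posterior(self, name, prior):
        if name == "x":  # hypothetical site with an upstream dependency on "y"
            y = self.upstream_value("y")
            loc, scale = self._get_params(name, prior)
            # Center the learned Normal on the sampled upstream value.
            return dist.Normal(loc + y, scale)
        # Fall back to the default autoregressive posterior.
        return super().get_posterior(name, prior)
``` |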
@martinjankowiak looks like tests now pass |
@fritzo I think that, in the simplest case, it would be good to support subsampling as it is currently done in AutoNormal (define all parameters on initialisation, use a subset of them according to plate indices). I assume this behaviour was already supported, right? I don't think I understand what this statement means:
The code seems to suggest that, for a parameter |
@vitkl feel free to clarify the docstrings in your follow-up PR. Indeed your language of "global" and "local" seems clearer.
Yes, your interpretation is correct. Can you explain what kind of amortization and minibatching strategies you would find useful, in cell2location or elsewhere? |
Thanks for explaining! The setting where the full parameter is initialized at the first invocation and subsets of it are extracted at each minibatch can be used in cell2location (although we find that it leads to reduced accuracy). It is also used in MOFA (https://biofam.github.io/MOFA2/) and in a few models related to cell2location (Stereoscope and DestVI in scvi-tools), so this approach is used by a few methods, although these particular models are implemented using pyro. Good to know that this setting works with the messenger guides. The amortization strategy I am thinking about will hopefully become clear when I add the amortised class as a PR. |
Addresses #2950. Replaces #2951, #2928.

This introduces a new pattern for writing variational posteriors: as effect handlers. Whereas ELBOs used to always first run the guide and then run the model, they now check whether the guide is a `GuideMessenger`, and if so run the guide as a messenger that intercepts model sites as they arise. This has two advantages.

The interface is backwards compatible with all ELBOs (except `TraceEnum_ELBO`, which is out of scope).

To demonstrate the new interface I've added two example autoguides subclassing `GuideMessenger`:

- `AutoNormalMessenger` is an analog of `AutoNormal` and is a useful base class for custom guides.
- `AutoRegressiveMessenger` is a simple autoregressive guide whose posteriors are learned affine transforms of the priors at each site (recursively conditioned on upstream posterior samples).
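Standard SVI usage is unchanged; for example, a minimal smoke-test-style script (model and data invented for illustration):

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoNormalMessenger
from pyro.optim import Adam

def model(data):
    loc = pyro.sample("loc", dist.Normal(0.0, 1.0))
    with pyro.plate("data", len(data)):
        pyro.sample("obs", dist.Normal(loc, 1.0), obs=data)

guide = AutoNormalMessenger(model)  # used exactly like any other autoguide
svi = SVI(model, guide, Adam({"lr": 0.01}), Trace_ELBO())
data = torch.randn(100) + 2.0
for step in range(501):
    loss = svi.step(data)
```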
Tested

Added `AutoNormalMessenger` and `AutoRegressiveMessenger` to a bunch of `test_autoguide.py` tests:

- `test_factor()`
- `test_shapes()`
- `test_subsample_model()`
- `test_subsample_model_amortized()` (this is a new test)
- `test_serialization()`
- `test_init_loc_fn()`
- `test_median()`
- `test_median_module()`
- `test_linear_regression_smoke()`
- `test_*_helpful_support_error()`
- `test_exact*()`