Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Params inside iarange #238

Closed
ngoodman opened this issue Oct 11, 2017 · 9 comments

Comments

Projects
None yet
5 participants
@ngoodman
Copy link
Collaborator

commented Oct 11, 2017

the current behavior of params created inside an iarange (aka vectorized map_data) is a bit nonsensical: a vector of params of length batch_size will be created and used on every iteration. this means that a param gets reused for whatever data point happens to be selected at that index in a batch. this comes up eg when there is a mean-field model for a local variable.

(there is a reasonable work-around, by creating the vector of needed params globally, and only selecting from it in the iarange.)

a better behavior would be to automatically associate params to their index into the data tensor, pulling out the appropriate params for a given batch (and creating them as needed). it might be tricky to get this working, and especially to do it efficiently.

i'm flagging this as a low priority bug for now. we can think through a solution after launch.

@martinjankowiak

This comment has been minimized.

Copy link
Collaborator

commented Oct 11, 2017

the other attitude to take is #alwaysamortize ; )

@fritzo

This comment has been minimized.

Copy link
Collaborator

commented Nov 8, 2017

It seems pretty simple to write this as:

with pyro.iarange("data", size, subsample_size) as ind:
    ps = pyro.param("ps", Variable(torch.ones(size), requires_grad=True)).index_select(ind)
    # ...do stuff with ps...

@ngoodman ngoodman removed the low priority label Dec 20, 2017

@ngoodman ngoodman added this to the 0.2 release milestone Dec 20, 2017

@fritzo

This comment has been minimized.

Copy link
Collaborator

commented Apr 13, 2018

I believe the only way to get this to work automatically is to add a notion of .event_dim to parameters. However the batch_dim/event_dim distinction has proven to be confusing in the context of distributions. It seems like it will be easier for us to provide documentation for this subtle use of param inside iarange, rather than implementing automatic machinery that is itself confusing and needing documentation.

I'll remove this from the 0.2 milestone (if you don't mind). This will give us a chance to see how users adapt to other automatic dimension handling in e.g. enumeration. Given what we learn after 0.2 release, there may be a clear path to resolving this issue in Pyro 0.3.

@fritzo fritzo added the low priority label Apr 13, 2018

@fritzo fritzo removed this from the 0.2.0 release milestone Apr 14, 2018

@ngoodman

This comment has been minimized.

Copy link
Collaborator Author

commented Apr 14, 2018

We can wait until after 0.2, but I do think this is a significant issue, both because it makes irange semantically diverge from iarange and because it messes up mean field guides.

@fritzo

This comment has been minimized.

Copy link
Collaborator

commented Oct 23, 2018

Here is a proposal for refactoring towards a more plate-friendly pyro.param statement.

I think we could modify param behavior by inspecting the cond_indep_stack. For each vectorized frame we could .expand(frame.size) at the appropriate dim. Determining the appropriate dim is tricky because param currently does not have a notion of event_dim, and we would need to add something akin to .independent() or an event_dim kwarg to param statements. For each non-vectorized frame we could do some simple autoname stuff like appending '_{}'.format(frame.counter). For consistency, we should probably also apply this autonaming to sample statements and avoid most of our current '_{}'.format(i) grossness.

To correctly handle subsampling, we would need to give pyro.param access to the subsampling tensors. This information can't live in the cond_indep_stack, but it could be looked-up in a trace via trace.nodes['_subsample_{}'.format(frame.name)]. This is a little gross, but param could simply find that trace in the _PYRO_STACK. Once this information is available, vectorized param would slice its tensors by the subsample index, and sequential param would index into subsample_index[frame.counter] to create the correct param name.

To account for global params, we could use various autonaming hacks. E.g. we could avoid expanding in vectorize_particles if the cond_indep_stack's frame.name.startswith('_pyro_'). Or we could add a kwarg pyro.param(..., global=True) to avoid the proposed .expand() behavior. @eb8680 could probably find cleaner patterns based on pyro.contrib.autoname.

@eb8680 @neerajprad @ngoodman WDYT?

@ngoodman

This comment has been minimized.

Copy link
Collaborator Author

commented Oct 23, 2018

what if we add an abstract index argument to param (and sample)? ie something along the lines of pyro.param("foo", index=[I1 I2], ...) where Ij can be either literals or abstract indices introduced by plates (assuming we go with the proposal to have plate bind a name for the index it introduces).

that is, within an iteration the indices are just strings/numbers that get appended to the name. within a plate block the index is an object that tells us which dimension to expand along. if a param is created without and index it is treated as global.

this by itself doesn't tell us what to do with subsampling, though as @eb8680 pointed out we should probably rethink that anyhow.... maybe @fritzo's suggestions is an answer?

@neerajprad

This comment has been minimized.

Copy link
Member

commented Oct 23, 2018

I like the idea of a more plate friendly param!

This information can't live in the cond_indep_stack, but it could be looked-up in a trace via trace.nodes['subsample{}'.format(frame.name)]. This is a little gross, but param could simply find that trace in the _PYRO_STACK.

It should mostly be fine, but I hope that the cost doesn't add up as we do this every time the model is run. In any case, I think we can explore other ideas of exposing this information to param.

E.g. we could avoid expanding in vectorize_particles if the cond_indep_stack's frame.name.startswith('pyro'). Or we could add a kwarg pyro.param(..., global=True) to avoid the proposed

Given our existing use cases, it might be easier to add a kwarg to plate, say global_params=Trueto denote that any params inside it do not need to be expanded. Then it will be easy for us to wrap a user model inside a plate without making any changes to the model code.

I have not thought through this, but I think this could also be extended to support iparam (e.g. by wrapping inside a plate with global_params=False) #1213, #1330.

@fritzo fritzo referenced this issue Oct 23, 2018

Closed

Add plate magic for autoname and subsampling #1491

1 of 7 tasks complete
@fritzo

This comment has been minimized.

Copy link
Collaborator

commented Oct 23, 2018

Ok, I've illustrated possible syntax changes in #1491 (based on #1486 ). The intent is to:

  1. illustrate current syntax in #1486;
  2. work towards consensus on proposed syntax changes in #1491;
  3. decide which changes we actually want to implement.
@eb8680

This comment has been minimized.

Copy link
Member

commented Oct 23, 2018

In addition to a batch_shape/event_shape distinction, it seems to me that there are at least two types of broadcasting operations we'd need to distinguish between - one that simply broadcasts and one that actually replicates the parameter tensor along a dimension to allow different values at each site. This distinction corresponds to the distinction between torch.Tensor.expand() and torch.Tensor.repeat().

For example, consider the case of SVI with vectorized num_particles and local parameters created by a param statement within a model plate. Presumably we'd want to repeat along the model plate, but only expand along the num_particles plate. It's easy enough to see that here, but what about more complicated settings, especially when a model may be used under multiple different broadcasting interpretations?

There's fundamentally ambiguity in the identity of each broadcasting dimension, and moreover expand/repeat decisions made in one setting will affect all future decisions that could be made about that parameter. Is there a design decision we could make that would remove that ambiguity? We can't simply create all parameters globally, since the existence, shape, or initial values of local parameters may depend on computation within a model. Should we only allow repeat when a parameter is first created, and assume all broadcasts thereafter are expands? That's a possibility, but it introduces a dependence on the context in which the parameter was created, which sounds like a recipe for very tricky bugs.

@martinjankowiak martinjankowiak removed this from the 0.3.0 release milestone Nov 26, 2018

@fritzo fritzo self-assigned this Mar 30, 2019

@eb8680 eb8680 closed this in #1796 Mar 30, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.