Provide access to initial values #2950

Open
jgabry opened this issue Aug 8, 2020 · 11 comments

@jgabry
Member

jgabry commented Aug 8, 2020

We've recently discussed how CmdStan does not have a way to give users access to the initial values it used.

It sounds like this is actually an io/services issue because the contents of a stan::io::var_context object would need to be dumped to text. Is that right?

I don't know how difficult this would be, but to me providing access to this information seems pretty important.

The most important information to convey to users, I think, would be the inits for parameters on the constrained scale. Transformed parameters, generated quantities, and values on the unconstrained scale are potentially useful but not as high a priority.

@bob-carpenter
Contributor

We need to answer at least the following before building this:

  • Is this a new service argument callback or do we piggyback on the iteration writer using iteration = 0 or something like that? If the latter, what's the effect on existing interfaces? If the former, what's the callback structure (probably just like the iteration writer, but it should be stated)?

  • Do we output transformed parameters and generated quantities as part of the initialization? They're not given by the var_context, but by running write_array. I'm not sure if that happens as is, because we don't have any need for the transformed parameters or generated quantities of the initial state now. If we do add it, it's going to cause slightly different output than last iteration because the RNG will get advanced.

  • Is the output on the constrained or unconstrained scale? Presumably the former, but the feature request needs to make this clear.

I would suggest grabbing the values after the var_context is used to extract the values and the results fed through write_array. We can't read from the var_context twice because it can advance the RNG and we don't have an easy and generic way to set it back.
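To make the RNG point concrete, here is a minimal sketch in plain C++. It does not use Stan's `var_context`; `draw_inits`, `InitRecord`, `read_once`, and `read_twice` are hypothetical names, and the uniform(-2, 2) draw is only an assumed stand-in for random initialization. It shows why the values have to be captured from the single read that the sampler also uses.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for a random-inits source: every read draws fresh
// uniform(-2, 2) values and therefore advances the shared RNG.
std::vector<double> draw_inits(std::mt19937& rng, std::size_t n) {
  assert(n > 0);
  std::uniform_real_distribution<double> unif(-2.0, 2.0);
  std::vector<double> inits(n);
  for (double& x : inits) x = unif(rng);
  return inits;
}

struct InitRecord {
  std::vector<double> reported;  // what would be written out for the user
  std::vector<double> used;      // what the sampler would start from
};

// Read once, keep a copy for reporting, and pass the same values on.
InitRecord read_once(std::mt19937& rng, std::size_t n) {
  std::vector<double> inits = draw_inits(rng, n);
  return InitRecord{inits, inits};
}

// The pattern to avoid: one read for reporting, a second read for sampling.
// The second read sees an advanced RNG, so the two sets of values differ.
InitRecord read_twice(std::mt19937& rng, std::size_t n) {
  std::vector<double> reported = draw_inits(rng, n);
  std::vector<double> used = draw_inits(rng, n);
  return InitRecord{reported, used};
}
```

With the same seed, `read_once` reports exactly the values the sampler starts from, while `read_twice` reports values the sampler never sees.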

@jgabry
Member Author

jgabry commented Aug 10, 2020

> We need to answer at least the following before building this:
>
>   • Is this a new service argument callback or do we piggyback on the iteration writer using iteration = 0 or something like that? If the latter, what's the effect on existing interfaces? If the former, what's the callback structure (probably just like the iteration writer, but it should be stated)?

Good question. I don't know enough of the current implementation details.

>   • Do we output transformed parameters and generated quantities as part of the initialization? They're not given by the var_context, but by running write_array. I'm not sure if that happens as is, because we don't have any need for the transformed parameters or generated quantities of the initial state now. If we do add it, it's going to cause slightly different output than last iteration because the RNG will get advanced.

If we have to output only the parameters to avoid RNG issues, then I think that's better than nothing. The parameters are of greatest interest here anyway.

>   • Is the output on the constrained or unconstrained scale? Presumably the former, but the feature request needs to make this clear.

Good point. Yes, constrained scale. I've updated the initial post.

@bob-carpenter
Contributor

For new callback vs. using existing one, you could trace down how RStan uses the iterations.

The RNG issue just means we can't call the var_context to print and then call it again for sampling because the RNG state won't match. Similarly, we can generate transformed parameters and generated quantities as long as we only do it once. Those only come out on the constrained scale.

@mitzimorris
Member

> I would suggest grabbing the values after the var_context is used to extract the values and the results fed through write_array. We can't read from the var_context twice because it can advance the RNG and we don't have an easy and generic way to set it back.

This is doable given the instantiated model and the var_context for the initial parameters. It would be very similar to how the standalone generated quantities method works.

@syclik
Member

syclik commented Aug 18, 2020

Sorry about the late reply. There's actually an init_writer() that should have the initial values written out to it.

https://github.com/stan-dev/stan/blob/develop/src/stan/services/sample/hmc_nuts_diag_e_adapt.hpp#L51
https://github.com/stan-dev/stan/blob/develop/src/stan/services/sample/hmc_nuts_diag_e_adapt.hpp#L65

It's on the unconstrained scale. There are convenience functions to get it back to the constrained scale.
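As a rough illustration of what "getting back to the constrained scale" means, here is a sketch of two of the standard transforms written in plain C++. The function names are hypothetical and are not the Stan convenience functions themselves; in practice the model's own write_array would apply the transforms for every parameter. It assumes a parameter declared with only a lower bound, or with both a lower and an upper bound.

```cpp
#include <cassert>
#include <cmath>

// Lower-bounded parameter: constrained = lb + exp(unconstrained).
double lower_bound_constrain(double u, double lb) {
  return lb + std::exp(u);
}

// Inverse logit, mapping the real line to (0, 1).
double inv_logit(double u) {
  return 1.0 / (1.0 + std::exp(-u));
}

// Interval-bounded parameter: constrained = lb + (ub - lb) * inv_logit(u).
double interval_constrain(double u, double lb, double ub) {
  assert(lb < ub);
  return lb + (ub - lb) * inv_logit(u);
}
```

Under these assumptions, an unconstrained initial value of 0 corresponds to `lb + 1` for a lower-bounded parameter and to the midpoint of the interval for an interval-bounded one, which is why the unconstrained value alone is hard for users to interpret.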

To get to the original post, @jgabry, I think it's a matter of writing it out... it's already there.

@mitzimorris
Member

@syclik
Member

syclik commented Aug 18, 2020

That's right. It just needs to be stored somewhere.

As a prototype, if you replaced that writer with one that writes out to std::cout, you should see the initial values. I'll try to bring up some code to show that.

@mitzimorris
Member

I understand that it needs to be stored somewhere.
The question is where to store it and how to label it so that it's easily interpretable.

@syclik
Member

syclik commented Aug 19, 2020

If you replace the linked line (command.hpp L154) with this one, the initial values are printed right before the first iteration:

  stan::callbacks::stream_writer init_writer(std::cout);

It looks something like this...

...

Gradient evaluation took 1e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.1 seconds.
Adjust your expectations accordingly!


0.277565
Iteration:    1 / 2000 [  0%]  (Warmup)

where 0.277565 is the initial value. (If you run it with init=0, you'll see that it's exactly 0.)

> I understand that it needs to be stored somewhere.
> the question is where and how to label it so that it's easily interpretable.

Got it. For that, I have no good solution... we could put it in as the first line of the CSV, but that would wreak havoc by introducing off-by-one errors everywhere the draws are read. We could add it as a comment, but that's not super useful. We could write it to a different file...

These all seem non-optimal.
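For the "different file" option, a labeled layout could look something like the sketch below. This is only an assumption about one possible format, not anything CmdStan does: `write_inits_csv` is a hypothetical helper that writes a header row of parameter names followed by one row of constrained initial values to any output stream.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: one header row of parameter names, then one row of
// initial values, comma-separated, so each value is labeled by its column.
void write_inits_csv(std::ostream& out,
                     const std::vector<std::string>& names,
                     const std::vector<double>& values) {
  assert(names.size() == values.size());
  for (std::size_t i = 0; i < names.size(); ++i) {
    out << (i ? "," : "") << names[i];
  }
  out << '\n';
  for (std::size_t i = 0; i < values.size(); ++i) {
    out << (i ? "," : "") << values[i];
  }
  out << '\n';
}
```

Passing a `std::ofstream` opened on a separate file (the file name would be up to the interface) keeps the draws CSV untouched, so nothing downstream picks up an off-by-one row.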

@mitzimorris
Member

mitzimorris commented Aug 19, 2020

> These all seem non-optimal.

Exactly, which is why Aki's hack is needed (for now). I documented it in the CmdStan Guide, section 9.5.4:
https://mc-stan.org/docs/2_24/cmdstan-guide/mcmc-config.html#initializing-parameters

@jgabry
Member Author

jgabry commented Aug 21, 2020

Thanks for the additional info. I agree none of these are optimal, but writing it to a different file seems like the least bad of the non-optimal solutions ;)
