Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rng_seed appears to no effect in map2stan() #47

Open
kmiddleton opened this issue Aug 30, 2016 · 5 comments
Open

rng_seed appears to no effect in map2stan() #47

kmiddleton opened this issue Aug 30, 2016 · 5 comments
Labels

Comments

@kmiddleton
Copy link

Setting rng_seed does not appear to ensure repeatable analyses with map2stan(). For example, a simple model from Rethinking:

library(rethinking)
data("Howell1")

d <- Howell1
d2 <- d[ d$age >= 18 , ]

m1 <- map2stan(
  alist(
    height ~ dnorm(mu, sigma),
    mu ~ dnorm(178, 20),
    sigma ~ dunif(0, 50)),
  data = d2,
  WAIC = FALSE,
  seed = 8
)

m2 <- map2stan(
  alist(
    height ~ dnorm(mu, sigma),
    mu ~ dnorm(178, 20),
    sigma ~ dunif(0, 50)),
  data = d2,
  WAIC = FALSE,
  seed = 8
)

m1
m2

Produce slightly different results.

> m1
map2stan model fit
1000 samples from 1 chain

Formula:
height ~ dnorm(mu, sigma)
mu ~ dnorm(178, 20)
sigma ~ dunif(0, 50)

Log-likelihood at expected values: -1219.41 
Deviance: 2438.83 
DIC: 2442.6 
Effective number of parameters (pD): 1.89 

> m2
map2stan model fit
1000 samples from 1 chain

Formula:
height ~ dnorm(mu, sigma)
mu ~ dnorm(178, 20)
sigma ~ dunif(0, 50)

Log-likelihood at expected values: -1219.41 
Deviance: 2438.82 
DIC: 2442.72 
Effective number of parameters (pD): 1.95

all.equal(m1, m2) show lots of differences, which seem to be fundamental rather than related to sampling (e.g., timing).:

 [1] "Attributes: < Component “coef”: Mean relative difference: 0.0001766267 >"                                                                                                                                                                                                                         
 [2] "Attributes: < Component “deviance”: Mean relative difference: 3.669022e-06 >"                                                                                                                                                                                                                     
 [3] "Attributes: < Component “DIC”: Mean relative difference: 6.030028e-05 >"                                                                                                                                                                                                                          
 [4] "Attributes: < Component “pD”: Mean relative difference: 0.03561425 >"       
...
[54] "Attributes: < Component “start”: Component 1: Component 1: Mean relative difference: 0.2771743 >"                                                                                                                                                                                                 
[55] "Attributes: < Component “start”: Component 1: Component 2: Mean relative difference: 0.1926305 >"                                                                                                                                                                                                 
[56] "Attributes: < Component “vcov”: Mean relative difference: 0.03782353 >"                                                                                                                                                                                                                      

These differences are admittedly small, but my thinking is that the models should be identical except for timing differences (the case when using seed = is a pure stan() call).

@rmcelreath
Copy link
Owner

Yeah, this is a mysterious issue. The map2stan argument is rng_seed. It's passed to stan as seed but seems not to have an effect. No idea why, since passing a value in a raw stan call works. But I can't yet figure out a way that my code is the issue. You can even check the internal stanfit object and see that stan is saving the passed seed. It just doesn't appear to use it?

Maybe something will occur to me. In the meantime, just use set.seed before the map2stan call.

@kmiddleton
Copy link
Author

It might be the init parameter in lines 1356 and 1358 of map2stan.R. The comment in this SO question suggests that seed only covers the RNG from the Boost library, and that init draws separately.

set.seed() coupled with set_rng does appear to work:

library(rethinking)
data("Howell1")

d <- Howell1
d2 <- d[ d$age >= 18 , ]

set.seed(5)

m1 <- map2stan(
  alist(
    height ~ dnorm(mu, sigma),
    mu ~ dnorm(178, 20),
    sigma ~ dunif(0, 50)),
  data = d2,
  WAIC = FALSE,
  rng_seed = 8
)

set.seed(5)

m2 <- map2stan(
  alist(
    height ~ dnorm(mu, sigma),
    mu ~ dnorm(178, 20),
    sigma ~ dunif(0, 50)),
  data = d2,
  WAIC = FALSE,
  rng_seed = 8
)

m1
m2

all.equal(m1, m2)

The only differences are in the date, time, dso, and model_cpp.

@jgabry
Copy link

jgabry commented Aug 30, 2016

On Tuesday, August 30, 2016, Kevin Middleton notifications@github.com
wrote:

The comment in this SO question
http://stackoverflow.com/questions/27204096/supplying-seed-to-stan-doesnt-guarantee-the-same-chains
suggests that seed only covers the RNG from the Boost library, and that
init draws separately.

Yes, that's correct. If any initial value generation is being done in R
(whether by you or by passing a function to init) then you'll have to set
both the R seed and the seed for the Boost library.

@rmcelreath
Copy link
Owner

Thanks. Sounds like an easy fix, with this information. Will try to roll it into the next push.

@rmcelreath rmcelreath added the bug label Aug 31, 2016
@rmcelreath
Copy link
Owner

This is fixed in experimental branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants