Control precision of sampling ASCII output #2515

Closed
bob-carpenter opened this issue Apr 22, 2018 · 7 comments

@bob-carpenter
Contributor

From @aaronjg on April 20, 2018 20:19

Summary:

Precision is lost when draws are written to the sample file. Draws written to the file and read back in with read_stan_csv should match the values stored in the rstan object.

Description:

When rstan writes to a file, it keeps only 6 significant digits, so the sample file differs from what is stored in the rstan object.

Reproducible Steps:

stan.model <- compile_model("bernoulli.stan")
source("bernoulli.data.R")
out <- sampling(stan.model, chains = 1, sample_file = "out.csv", seed = 1)

Current Output:

extract(out, permuted = FALSE, inc_warmup = TRUE)[1:10, , ]

          parameters
iterations      theta      lp__
      [1,] 0.10831226 -7.699964
      [2,] 0.10831226 -7.699964
      [3,] 0.10831226 -7.699964
      [4,] 0.10719339 -7.719830
      [5,] 0.08838579 -8.110978
      [6,] 0.23460585 -6.755824
      [7,] 0.22917849 -6.762448
      [8,] 0.17383842 -6.967571
      [9,] 0.17383842 -6.967571
     [10,] 0.19948957 -6.838531

head -n 35 out.csv| tail

lp__,accept_stat__,stepsize__,treedepth__,n_leapfrog__,divergent__,energy__,theta
-7.69996,0.821754,1,2,3,0,7.95282,0.108312
-7.69996,1.87223e-146,10.4034,1,1,0,9.9209,0.108312
-7.69996,0.0719279,1.59718,1,1,0,7.70735,0.108312
-7.71983,0.999813,0.180632,1,1,0,7.72391,0.107193
-8.11098,0.997778,0.23924,3,7,0,8.11104,0.0883858
-6.75582,0.995139,0.366777,2,5,0,8.47479,0.234606
-6.76245,0.978753,0.609755,2,3,0,6.91575,0.229178
-6.96757,0.901618,1.01542,1,1,0,6.9752,0.173838
-6.96757,0.00030679,1.36694,1,1,0,7.87545,0.173838
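
The rows above look like the in-memory values rounded to six significant digits. A minimal sketch to check that reading, assuming a POSIX awk is available; fmt6 is a hypothetical helper name and is not part of Stan or RStan:

```shell
# Format a number with six significant digits, which is what the CSV rows
# above appear to contain. fmt6 is a hypothetical helper for illustration.
fmt6() {
  awk -v x="$1" 'BEGIN { printf "%.6g\n", x }'
}

fmt6 0.10831226   # theta, iteration 1 (in-memory value)
fmt6 -7.699964    # lp__, iteration 1 (in-memory value)
fmt6 0.08838579   # theta, iteration 5 (in-memory value)
```

These calls print 0.108312, -7.69996 and 0.0883858, which match the theta and lp__ columns in out.csv.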

Expected Output:


File written should have the same values as the extract command.

RStan Version:

Compiled from: 4706b82028a7fc3a31cbdf6c60beed4c49233562

R Version:

"R version 3.4.4 (2018-03-15)"

Operating System:

Ubuntu 14.04

Copied from original issue: stan-dev/rstan#518

@bob-carpenter
Contributor Author

From @bgoodri on April 20, 2018 20:25

This is a Stan thing rather than an RStan one, and I believe it is intentional.

@bob-carpenter
Contributor Author

We could double the file size and clog up I/O to get that extra precision, but most computations don't retain much more precision than what we already write out. Even though double-precision floating point gives you about 16 digits, the values are usually not that accurate after sampling.
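
A rough sketch of the size trade-off, assuming a POSIX awk. The value 1/3 is only a stand-in for a draw that has no short decimal form, and 17 significant digits is the count a double needs to round-trip through decimal text:

```shell
# Compare the text width of one value at 6 vs 17 significant digits.
# 1/3 is an illustrative stand-in for a sampled value.
short=$(awk 'BEGIN { printf "%.6g", 1/3 }')
full=$(awk 'BEGIN { printf "%.17g", 1/3 }')

echo "6 digits:  $short (${#short} characters)"
echo "17 digits: $full (${#full} characters)"
```

For this value the field grows from 8 to 19 characters, which is in line with the file size at least doubling if every column were written at full precision.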

Ideally, we'd have a feature to control the precision.

@bob-carpenter
Contributor Author

I'm going to move this to being a Stan feature request. My guess is that we'll wind up providing a binary output format before fixing it, though you never know. Extending the precision should be easy; it's just a matter of how to control it in the calls.

@bob-carpenter changed the title from "Precision Differs RStan and sample file" to "Control precision of sampling ASCII output" on Apr 22, 2018
@bob-carpenter added this to the v3 milestone on Apr 22, 2018
@aaronjg
Contributor

aaronjg commented Apr 22, 2018

I don't particularly expect the extra precision to add much to the inference. However, as I was moving from keeping the results in memory to streaming to a file and loading them back in, I was expecting identical results and had some tests fail because of it. Having a binary output format seems ideal.

@jgabry
Member

jgabry commented Apr 22, 2018

Yeah, this would be nice, but at least the output should currently be deterministic, so a tolerance level in the tests will work reliably.

If it’s not documented anywhere we should do that too.
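
One way such a tolerance check could look, as a sketch only. close_enough is a hypothetical helper, and the 1e-5 default is an assumption chosen to sit just above the roughly 5e-6 worst-case relative error of rounding to six significant digits:

```shell
# close_enough A B [TOL]: succeed if |A - B| <= TOL * max(|A|, |B|).
# Hypothetical helper; the default TOL of 1e-5 is an assumed value.
close_enough() {
  awk -v a="$1" -v b="$2" -v tol="${3:-1e-5}" 'BEGIN {
    d = a - b; if (d < 0) d = -d      # absolute difference
    ma = (a < 0) ? -a : a             # |a|
    mb = (b < 0) ? -b : b             # |b|
    m = (ma > mb) ? ma : mb           # larger magnitude
    exit !(d <= tol * m)              # exit status 0 means "close enough"
  }'
}

# In-memory value vs. value read back from out.csv (first iteration above).
close_enough 0.10831226 0.108312 && echo "theta matches within tolerance"
close_enough -7.699964 -7.69996 && echo "lp__ matches within tolerance"
```

Because the rounding is deterministic, a relative tolerance like this should pass on every run rather than intermittently.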

@bob-carpenter
Contributor Author

bob-carpenter commented Apr 23, 2018 via email

@rok-cesnovar
Member

This was added to CmdStan via the sig_figs argument in 2.25. Closing.
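
For reference, a hedged sketch of how the argument might be used. It assumes CmdStan 2.25 or later, that sig_figs is passed as a sub-argument of output, and that the command is run from CmdStan's examples/bernoulli directory with the example model already built; the value 12 is only illustrative:

```shell
# Assumed setup: current directory is CmdStan's examples/bernoulli and the
# example model has been compiled to ./bernoulli.
SIG_FIGS=12   # illustrative value, not a recommendation from this thread

if [ -x ./bernoulli ]; then
  # Write draws to output.csv with the requested number of significant figures.
  ./bernoulli sample data file=bernoulli.data.json \
    output file=output.csv sig_figs="$SIG_FIGS"
else
  echo "compiled model ./bernoulli not found; skipping run"
fi
```

The guard only keeps the snippet from failing when the compiled model is absent; check the CmdStan documentation for your version for the accepted range of values.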
