Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CmdStan checks for file suffix .csv but should be checking for suffix .R #1076

Closed
mitzimorris opened this issue Feb 2, 2022 · 10 comments
Closed
Labels

Comments

@mitzimorris
Copy link
Member

mitzimorris commented Feb 2, 2022

Summary:

The introduction of single-process multi-chain parallel threads introduced a significant bug into CmdStan because instead of checking for input files with suffix .R the code checks for input files with suffix .csv.

} else if (file_ending == ".csv") {

We don't have input routines that can parse CSV files into a var_context object, only parsers for rdump and json.

Description:

reported on Discourse by Rob Goedman: https://discourse.mc-stan.org/t/file-ending-of-r-is-not-supported-by-cmdstan/26134

Reproducible Steps:

Run CmdStan sample method with num_chains greater than 1 and specify init file in Rdump format.

Additional Information:

  • either change .csv to .R or omit check and just throw file into the rdump reader.
  • add a unit test

Current Version:

v2.29.0

@mitzimorris mitzimorris added the bug label Feb 2, 2022
@WardBrian WardBrian mentioned this issue Feb 3, 2022
26 tasks
@mitzimorris
Copy link
Member Author

Rdump files can have any suffix - FWIW

@mitzimorris
Copy link
Member Author

reverting to old way will definitely work. checking to see if there's a way to catch the rdump reader error.

@mitzimorris
Copy link
Member Author

so-called "Rdump" format really is an R file which consists of variable definitions for Stan program variables.

it's not clear whether or not the rdump reader would do if it found other kinds of statements.
the rdump reader is used for data inputs as well as parameter and metric inits. getting rid of this input format will require a lot of planning and buy-in from the R user base.

@WardBrian
Copy link
Member

It appears we went nearly 10 months with the incorrect check in place before someone noticed and reported it, so I’m not sure if it’s being used all that much. I don’t think cmdstanr uses it over JSON, so it would most likely be people manually preparing files who would use it?

@rok-cesnovar
Copy link
Member

I don’t think cmdstanr uses it over JSON

That is correct. Parsing rdump in CmdStan is a few times slower than JSON, so we never use rdump. Its definitely a niche thing at this point and the only users that faced this issue are those that used CmdStan directly with num_chains > 1. But as long as we havent deprecated the format, we have to support it.

@WardBrian
Copy link
Member

@rok-cesnovar does rdump support complex numbers?

@mitzimorris
Copy link
Member Author

definitely not. it's an ad-hoc parser for a very small subset of R statements.
https://github.com/stan-dev/stan/blob/develop/src/stan/io/dump.hpp

@WardBrian
Copy link
Member

It seems like the answer is actually yes. It works the same as JSON where a complex value is just like a 2-long array:

data {
  complex z;
}
z <- c(1,2)

I'm sure this has all sorts of edge cases, though

@mitzimorris
Copy link
Member Author

cool! and of course, given that encoding, it should work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants