support I/O of NaN, +inf, -inf values #67
Comments
I'm happy to copy/translate RStan's tests. Do they exist for this?
I'm afraid not; RStan still doesn't support I/O for NaN or inf, so I'm not sure what you'd need to do to get it into Python. It's not a huge issue, because rarely is data or inits going to …
It looks like Python lets you check if a variable is defined, but …
On May 5, 2014, at 7:29 PM, Allen Riddell notifications@github.com wrote:
I think we need more general thinking about special values in Stan. I can't see what sense it makes to have Inf as input, but maybe as output. For missing values, it is often useful to distinguish between a value that could exist but happens to be missing and a value that is observed to be undefined or inapplicable. With mi, we use NA for the former and NaN for the latter. Another option would be to do what Stata does (which may be wise if there is ever a Stata interface) and reserve the 27 largest values representable in that numeric type for various missingness codes; no one should be using those values anyway because they are likely to overflow. See http://www.stata.com/help.cgi?dta_115#numbers
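The Stata-style idea of reserving the top of the floating-point range for missingness codes can be sketched in Python. This is a hypothetical illustration, not Stata's actual encoding (the dta_115 link above documents that): it reserves the 27 largest finite doubles for the codes `.` and `.a` through `.z`, and rejects ordinary values that would collide with them. The names `reserved_codes`, `encode_value`, `encode_missing`, and `decode` are made up for this sketch, and `math.nextafter` requires Python 3.9 or later.

```python
import math
import sys

# Hypothetical scheme: the 27 largest finite doubles act as missingness
# codes, mirroring Stata's "." < ".a" < ... < ".z" ordering. Not Stata's
# real bit-level encoding.
LABELS = ["."] + ["." + chr(ord("a") + i) for i in range(26)]

def reserved_codes(n=len(LABELS)):
    """Return the n largest finite doubles, in ascending order."""
    codes = [sys.float_info.max]
    while len(codes) < n:
        codes.append(math.nextafter(codes[-1], 0.0))  # step down one ulp
    return codes[::-1]

CODES = reserved_codes()
CODE_TO_LABEL = dict(zip(CODES, LABELS))        # smallest code maps to "."
LARGEST_VALID = math.nextafter(CODES[0], 0.0)   # largest ordinary value

def encode_value(x):
    """Pass an ordinary value through, rejecting collisions with the codes."""
    if not math.isfinite(x) or x > LARGEST_VALID:
        raise ValueError(f"{x!r} is not finite or collides with a reserved code")
    return x

def encode_missing(label):
    """Return the reserved double for a missingness label such as '.' or '.b'."""
    return CODES[LABELS.index(label)]

def decode(x):
    """Return (value, None) for ordinary numbers, (None, label) for codes."""
    label = CODE_TO_LABEL.get(x)
    return (None, label) if label is not None else (x, None)
```

The design trades 27 otherwise-usable doubles for codes that survive any I/O path that can carry a finite double, which is the appeal the comment points to.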
On May 6, 2014, at 1:05 AM, bgoodri notifications@github.com wrote:
I don't see much use for it yet. It could be used …
Absolutely. I get what R does with NA.
Yes, we're going to do a Stata interface for IES.
That's an interesting approach. I really don't like messing …
Technically, you could use exp(-Inf) to get 0 in transformed data, but why would …
Another option would be to distinguish between quiet and signaling NaN.
The Stata approach (in addition to being compatible with Stata) doesn't …
If we supported sparse Eigen::Matrix, we could do that thing I suggested at …
I don't think ignoring NA altogether is a viable option because missingness …
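To make the quiet/signaling distinction concrete: in IEEE 754 binary64, a NaN has an all-ones exponent and a nonzero significand, and IEEE 754-2008 recommends that the top significand bit be set for a quiet NaN and clear for a signaling one. The sketch below classifies raw 64-bit patterns under that convention and attaches the hypothetical meanings proposed later in this thread (quiet = missing, signaling = undefined). It works on integer bit patterns rather than `float` values because arithmetic on a signaling NaN usually produces a quiet one, and some older platforms (legacy MIPS, for example) used the opposite quiet-bit convention, which is part of the portability question. All names here are illustrative.

```python
import struct

EXP_MASK = 0x7FF0_0000_0000_0000   # all-ones exponent field
FRAC_MASK = 0x000F_FFFF_FFFF_FFFF  # 52-bit significand (fraction) field
QUIET_BIT = 0x0008_0000_0000_0000  # top significand bit (IEEE 754-2008 quiet flag)

def float_to_bits(x: float) -> int:
    """Reinterpret a Python float as its 64-bit pattern."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def classify_nan(bits: int) -> str:
    """Return 'quiet', 'signaling', or 'not_nan' for a binary64 bit pattern."""
    if (bits & EXP_MASK) != EXP_MASK or (bits & FRAC_MASK) == 0:
        return "not_nan"  # finite number or +/-inf
    return "quiet" if bits & QUIET_BIT else "signaling"

# Hypothetical convention from the thread, kept as raw bit patterns:
MISSING_BITS = EXP_MASK | QUIET_BIT  # quiet NaN -> "missing"
UNDEFINED_BITS = EXP_MASK | 0x1      # signaling NaN, payload 1 -> "undefined"

MEANING = {"quiet": "missing", "signaling": "undefined"}

def meaning(bits: int) -> str:
    """Map a bit pattern to 'missing', 'undefined', or 'value'."""
    return MEANING.get(classify_nan(bits), "value")
```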
On May 6, 2014, at 2:40 PM, bgoodri notifications@github.com wrote:
I'd rather go with some random high-valued finite values à la Stata than …
I know it's not the R or BUGS/JAGS way.
We could perhaps use a signaling NaN if they're implemented on all of …
http://en.wikipedia.org/wiki/NaN#Signaling_NaN
I haven't tried playing with clang++ on the Mac to see what happens with …
I'd think we'd want to separate two data structures, sparse matrices, which …
R and BUGS do, but obviously not "all" stat software does! Let's discuss this with more bandwidth at the next Stan meeting.
Summarizing my view, here is basically all you can do with Stan currently
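As a rough Python-side illustration of the kind of encoding this workaround requires, assuming missing cells arrive as NaN in a nested list `X`: each missing cell is overwritten with a placeholder, and a separate 0/1 matrix `R` records which cells were observed. The helper `encode_with_indicator` and the placeholder value 0.0 are hypothetical.

```python
import math

def encode_with_indicator(X, placeholder=0.0):
    """Replace NaN cells of X by `placeholder`; R[i][j] = 1 if X[i][j] was observed."""
    X_filled = [[placeholder if math.isnan(v) else v for v in row] for row in X]
    R = [[0 if math.isnan(v) else 1 for v in row] for row in X]
    return X_filled, R

nan = float("nan")
X = [[1.0, nan], [nan, 4.0]]
X_filled, R = encode_with_indicator(X)
n_missing = sum(row.count(0) for row in R)  # number of cells the model must estimate
print(X_filled, R, n_missing)  # [[1.0, 0.0], [0.0, 4.0]] [[1, 0], [0, 1]] 2
```

Nothing in `X_filled` distinguishes the placeholder 0.0 from a genuinely observed 0.0, so a model that reads X without consulting R cannot detect the mistake.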
This behavior is bad in several respects. First, if X were accidentally used in the model block, Stan would have no way to catch the mistake, because the elements of X where R == 0 hold some arbitrary, meaningless numeric value that Stan I/O accepts. Second, all that fiddling with R is tedious and error-prone. I still don't think we reached a consensus on whether it would be a good idea for Stan to have special values for missing or undefined, but if these existed, it would be possible to do this kind of thing more easily in Stan. So, let's suppose missing is a quiet NaN and undefined is a signaling NaN. You could introduce some syntax like
I suppose you only write the unknown elements of X_complete to the .csv file and permit something like
if the user actually wants to store tons of matrices that mix observed and estimated elements. That only works for continuous quantities. For discrete objects, counts are a fundamental problem. But for categoricals, I don't think it would be so hard to change the behavior of the PMFs to marginalize over the missing values. For example, bernoulli_logit would do one thing if y == 1, another if y == 0, and mix those if y == quiet_NaN. That's why I think some progress can be made if Stan would just do what every other statistical software does and have a way to represent missingness, so that the appropriate functions could recognize it.
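The marginalization idea for `bernoulli_logit` can be sketched outside Stan. The function below is a hypothetical, missing-aware log PMF (it is not Stan's `bernoulli_logit_lpmf`): for y equal to 1 or 0 it returns the usual log probability, and for y equal to NaN it sums over both outcomes with log-sum-exp. The helper names are illustrative.

```python
import math

def log_inv_logit(a):
    """Numerically stable log(1 / (1 + exp(-a)))."""
    if a >= 0:
        return -math.log1p(math.exp(-a))
    return a - math.log1p(math.exp(a))

def log_sum_exp(a, b):
    """Stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def bernoulli_logit_lpmf_missing(y, alpha):
    """Hypothetical log PMF where y is 1, 0, or NaN (missing, marginalized out)."""
    lp1 = log_inv_logit(alpha)   # log Pr[y = 1]
    lp0 = log_inv_logit(-alpha)  # log Pr[y = 0]
    if math.isnan(y):
        return log_sum_exp(lp1, lp0)  # log(Pr[y = 1] + Pr[y = 0])
    if y == 1:
        return lp1
    if y == 0:
        return lp0
    raise ValueError("y must be 0, 1, or NaN")
```

With no other information about y, the missing branch evaluates to log(1) = 0, so on its own it contributes nothing to the log density; marginalizing becomes informative only when the same missing y also appears in other terms, which then have to be summed over jointly.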
It's been a couple of years since the last message in this thread; are there any updates on allowing missing data (e.g., via imputation)? Missing data are quite common in data analysis, and being able to handle them would make Stan much more flexible.
Little has changed. With R-style indexing, it takes a bit less typing to do …
On Sun, May 15, 2016 at 12:01 AM, dadrivr notifications@github.com wrote:
Thanks for the update! I hope handling missing data is a consideration for the future. I love Stan and would love for it to be able to handle the data I work with! Thanks for all of your work on the project.
It's not impossible to code up missing-data models in Stan.
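A common pattern is to pass only the observed values plus integer index arrays, declare the missing values as parameters, and reassemble the full vector inside the model. Below is a Python sketch of the data-side bookkeeping, assuming NaN marks missing entries and Stan's 1-based indexing; the names `split_missing`, `merge`, `ii_obs`, and `ii_mis` are illustrative only.

```python
import math

def split_missing(y):
    """Return observed values and 1-based index arrays (NaN = missing)."""
    ii_obs = [i + 1 for i, v in enumerate(y) if not math.isnan(v)]
    ii_mis = [i + 1 for i, v in enumerate(y) if math.isnan(v)]
    y_obs = [y[i - 1] for i in ii_obs]
    return y_obs, ii_obs, ii_mis

def merge(y_obs, y_mis, ii_obs, ii_mis):
    """Reassemble the full vector from observed data and imputed values."""
    if len(y_obs) != len(ii_obs) or len(y_mis) != len(ii_mis):
        raise ValueError("values and index arrays must have equal lengths")
    y = [math.nan] * (len(ii_obs) + len(ii_mis))
    for i, v in zip(ii_obs, y_obs):
        y[i - 1] = v
    for i, v in zip(ii_mis, y_mis):
        y[i - 1] = v
    return y

y = [1.0, float("nan"), 3.0, float("nan")]
y_obs, ii_obs, ii_mis = split_missing(y)
print(y_obs, ii_obs, ii_mis)  # [1.0, 3.0] [1, 3] [2, 4]
# y_mis would be a parameter vector of length len(ii_mis) in the Stan model;
# here two made-up draws stand in for it.
print(merge(y_obs, [9.0, 8.0], ii_obs, ii_mis))  # [1.0, 9.0, 3.0, 8.0]
```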
This is on the border between bug and feature. We should be able to read NaN, +inf, and -inf values through all our interfaces. I think output is OK now, but we can't input NaN or inf. Or if we can, it's not clear to me or our users how to do it.
The corresponding issue for CmdStan: stan-dev/stan#637
I'm not sure what the status is of PyStan on this.
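For the input side, one possible text encoding is the R-dump style, in which R writes the special values as `NaN`, `Inf`, and `-Inf`. The sketch below handles only a flat `c(...)` vector of numbers and is not any interface's actual reader; it relies on Python's `float()` already accepting those spellings case-insensitively. R's `NA` is deliberately not handled and makes `float()` raise `ValueError`. The function names are hypothetical.

```python
import math
import re

# Matches a flat R vector literal such as "c(1.5, NaN, Inf, -Inf)".
_VECTOR = re.compile(r"^\s*c\((.*)\)\s*$", re.DOTALL)

def parse_r_vector(text):
    """Parse a flat numeric c(...) vector; NaN/Inf/-Inf go through float()."""
    m = _VECTOR.match(text)
    body = m.group(1) if m else text
    return [float(tok) for tok in body.split(",") if tok.strip()]

def format_r_value(x):
    """Write a float using R's spellings for the special values."""
    if math.isnan(x):
        return "NaN"
    if math.isinf(x):
        return "Inf" if x > 0 else "-Inf"
    return repr(x)

def format_r_vector(xs):
    return "c(" + ", ".join(format_r_value(x) for x in xs) + ")"

v = parse_r_vector("c(1.5, NaN, Inf, -Inf)")
print(format_r_vector(v))  # c(1.5, NaN, Inf, -Inf)
```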