Warning when storing NA (integer value -2^63 replaced NA) #61

Closed
jonocarroll opened this issue May 27, 2020 · 4 comments

@jonocarroll

I'd like to clarify the warning I get when storing NA_integer_ ...

library(rhdf5)
m <- matrix(c(0L, 1L, NA_integer_, 0L, 1L, NA_integer_), nrow=2)
h5write(m, "test.h5", "M1")
h5read("test.h5", "M1")
#      [,1] [,2] [,3]
# [1,]    0   NA    1
# [2,]    1    0   NA
# Warning message:
# In H5Dread(h5dataset = h5dataset, h5spaceFile = h5spaceFile, h5spaceMem = h5spaceMem,  :
#   integer value -2^63 replaced NA. See the section 'Large integer data types' in the 'rhdf5' vignette for more details.

(rhdf5 2.30.1)

I see #58 has this (the issue there is more severe) and another discussion in #42 but is this still an expected warning? I spent quite a while trying to figure out why I apparently had large negative ints in my input data when I really only had some NA.

Is it possible to identify when NA is being used and avoid this warning? I couldn't immediately find where this was documented (if it is). The aforementioned section does not seem to appear in this vignette https://bioconductor.org/packages/release/bioc/vignettes/rhdf5/inst/doc/rhdf5.html

@grimbough grimbough self-assigned this May 29, 2020
@grimbough
Owner

Thanks for the report. I've set aside the next couple of days to look at the outstanding rhdf5 issues, so hopefully I'll get round to addressing this by the end of the week.

@grimbough
Owner

This message was intended to warn someone that had created an HDF5 file outside R that any instance of the "smallest integer" had been replaced by NA in the resulting R object. It looks like for 64-bit integers there's a typo and it should read "integer value -2^63 replaced by NA" - which might have made the intention a little clearer.

The side effect of this is that any R object that contains NA and is written to HDF5 will then trigger the warning when read back, because the original NA values are stored as "smallest int" in the file. I guess this probably happens at least as frequently as someone reading a file generated outside R.
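
A minimal base R sketch of why the two cases cannot be told apart (an illustration only, not code from rhdf5): R has no separate NA flag for integers and instead reserves the smallest 32-bit value, -2^31, as NA_integer_. The variable names are made up for this example, and big-endian byte order is chosen here only so the bytes look the same on every platform.

```r
# R represents NA_integer_ as the smallest 32-bit signed integer (-2^31).
# Serialise it big-endian so the byte order does not depend on the platform.
na_bytes <- writeBin(NA_integer_, raw(), endian = "big")
print(na_bytes)
#> [1] 80 00 00 00

# The same bit pattern arriving from "outside" R is read back as NA,
# so a genuine -2^31 in a file is indistinguishable from a stored NA.
from_bytes <- readBin(as.raw(c(0x80, 0x00, 0x00, 0x00)),
                      what = "integer", endian = "big")
is.na(from_bytes)
#> [1] TRUE
```

This is why a reader of the file alone cannot know whether the value was an NA written from R or a real minimum integer written by another tool.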

The warning is annoying, but if you're writing and reading things containing NA with rhdf5, they should be preserved despite the message.

I propose to add an attribute to anything written with rhdf5 that contains NA values and use this to suppress the warning. Then it should only show up for someone encountering the original use case.

@grimbough
Owner

As of rhdf5 v. 2.33.3 you shouldn't get this warning if the original file was created with rhdf5.

library(rhdf5)
m <- matrix(c(0L, 1L, NA_integer_, 0L, 1L, NA_integer_), nrow=2)
file <- tempfile(fileext = '.h5')
h5write(m, file, "M1")
h5read(file, "M1")
#>      [,1] [,2] [,3]
#> [1,]    0   NA    1
#> [2,]    1    0   NA

For a dataset not generated with rhdf5 the information is still printed, but downgraded to a message since there's nothing a user can do about R using those values to represent NA.

## This code removes the 'rhdf5-NA.OK' attribute to simulate data not written by rhdf5
fid <- H5Fopen(name = file)
did <- H5Dopen(fid, name = "M1")
H5Adelete(did, "rhdf5-NA.OK")
H5Dclose(did)
H5Fclose(fid)

h5read(file, "M1")
#> The value -2^31 was detected in the dataset.
#> This has been converted to NA within R.
#>      [,1] [,2] [,3]
#> [1,]    0   NA    1
#> [2,]    1    0   NA

@jonocarroll
Author

Confirmed resolved in 2.33.7 - thank you!!!
