H5Awrite for strings has premature null termination #132

Closed
LTLA opened this issue Nov 28, 2023 · 0 comments
LTLA commented Nov 28, 2023

library(rhdf5)
tmp <- tempfile(fileext=".h5")
fhandle <- H5Fcreate(tmp, "H5F_ACC_TRUNC")
ghandle <- H5Gcreate(fhandle, "whee")

tid <- H5Tcopy("H5T_C_S1")
H5Tset_strpad(tid, strpad = "NULLPAD")
H5Tset_size(tid, 5L) # size of 5 bytes

ahandle <- H5Acreate(ghandle, "name", dtype_id=tid, h5space=H5Screate("H5S_SCALAR"))
H5Awrite(ahandle, "Aaron") # string of length 5

H5Aclose(ahandle)
H5Gclose(ghandle)
H5Fclose(fhandle)

One would expect my name to fit inside the attribute, but alas:

h5readAttributes(tmp, "whee")
## $name
## [1] "Aaro"

This seems to be caused by line 457 of rhdf5/src/H5A.c (at commit cb102ba):

for (j=0; (j < LENGTH(STRING_ELT(_buf,i))) & (j < (stsize-1)); j++) {

where the loop stops prematurely because of the j < (stsize-1) condition. The fix is probably quite simple; just make this j < stsize instead, which would cause the entire string to be written.
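
To make the off-by-one concrete, here is a minimal standalone sketch of the same copy loop. It is not rhdf5's actual code: `copy_padded` and its `limit` parameter are hypothetical names made up for illustration, and the only thing taken from the issue is the loop bound. Passing `stsize - 1` as the limit mimics the current condition; passing `stsize` mimics the proposed fix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (NOT part of rhdf5): copy at most `limit` bytes of
 * `src` into a fixed-width field of `stsize` bytes, then fill the rest of
 * the field with null bytes, as a null-padded string type would store it. */
void copy_padded(char *dst, size_t stsize, const char *src, size_t limit) {
    size_t len = strlen(src);
    size_t j;

    assert(limit <= stsize); /* never write past the end of the field */

    /* Same shape as the loop quoted above: stop at the end of the string
     * or at the limit, whichever comes first. */
    for (j = 0; j < len && j < limit; j++) {
        dst[j] = src[j];
    }

    /* Null-pad whatever is left of the field. */
    for (; j < stsize; j++) {
        dst[j] = '\0';
    }
}
```

With a 5-byte field and the input "Aaron", a limit of `stsize - 1` leaves the bytes `A a r o \0` in the field, which reads back as "Aaro". A limit of `stsize` fills all five bytes with "Aaron" and no terminator, which is what a null-padded type of size 5 is allowed to hold.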

Incidentally, datasets do the right thing, so I don't see why attributes have this weird behavior.

library(rhdf5)
tmp <- tempfile(fileext=".h5")
fhandle <- H5Fcreate(tmp, "H5F_ACC_TRUNC")

tid <- H5Tcopy("H5T_C_S1")
H5Tset_strpad(tid, strpad = "NULLPAD")
H5Tset_size(tid, 5L) # size of 5 bytes

dhandle <- H5Dcreate(fhandle, "name", dtype_id=tid, h5space=H5Screate("H5S_SCALAR"))
H5Dwrite(dhandle, "Aaron") # string of length 5

H5Dclose(dhandle)
H5Fclose(fhandle)

h5read(tmp, "name")
## [1] "Aaron"
Session information
R Under development (unstable) (2023-11-10 r85507)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS:   /home/luna/Software/R/trunk/lib/libRblas.so 
LAPACK: /home/luna/Software/R/trunk/lib/libRlapack.so;  LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/Los_Angeles
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rhdf5_2.47.0

loaded via a namespace (and not attached):
[1] compiler_4.4.0      rhdf5filters_1.15.1 Rhdf5lib_1.25.0    
grimbough added a commit that referenced this issue Nov 29, 2023