H5SpecWriter: Store the cached specification as text in UTF-8 encoding (#62)

We currently store the specification in ASCII format.

Inspecting the file with h5dump -pH basic_example.nwb | grep -A 40 specifications gives

  GROUP "specifications" {
     GROUP "core" {
        GROUP "2.0.0b" {
           DATASET "namespace" {
              DATATYPE  H5T_STRING {
                 STRSIZE H5T_VARIABLE;
                 STRPAD H5T_STR_NULLTERM;
                 CSET H5T_CSET_ASCII;
                 CTYPE H5T_C_S1;
              }
              DATASPACE  SCALAR
              STORAGE_LAYOUT {
                 CONTIGUOUS
                 SIZE 16
                 OFFSET 3768
              }
              FILTERS {
                 NONE
              }
              FILLVALUE {
                 FILL_TIME H5D_FILL_TIME_ALLOC
                 VALUE  NULL
              }
              ALLOCATION_TIME {
                 H5D_ALLOC_TIME_LATE
              }
           }
           DATASET "nwb.base" {
              DATATYPE  H5T_STRING {
                 STRSIZE H5T_VARIABLE;
                 STRPAD H5T_STR_NULLTERM;
                 CSET H5T_CSET_ASCII;
                 CTYPE H5T_C_S1;
              }
              DATASPACE  SCALAR
              STORAGE_LAYOUT {
                 CONTIGUOUS
                 SIZE 16
                 OFFSET 3592
              }
              FILTERS {

The CSET entry denotes the encoding.

ASCII does not play well with future users who write extensions, nor with
future code that will store the specification by default.
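
One plausible reading of this concern (an assumption, the message does not spell it out) is that extension authors may put non-ASCII characters into their specification text. A minimal standard-library sketch, using a made-up specification string, shows that such text cannot be represented in ASCII but can in UTF-8:

```python
# Hypothetical extension spec text (not taken from any real namespace);
# the doc string deliberately contains non-ASCII characters.
spec_text = '{"doc": "Membrane potential in µV, recorded by José"}'


def can_encode(text, encoding):
    """Return True if `text` can be represented in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


ascii_ok = can_encode(spec_text, "ascii")
utf8_ok = can_encode(spec_text, "utf-8")
print("ASCII:", ascii_ok, "UTF-8:", utf8_ok)
```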

Switch over to UTF-8 encoding using the text type from six, see also
http://docs.h5py.org/en/latest/strings.html#variable-length-utf-8.
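
The effect of the one-line change can be sketched directly with h5py. On Python 3, six's binary_type is bytes and text_type is str, so the sketch below uses those built-ins. It assumes h5py 2.10 or newer (for check_string_dtype); the file name, dataset content, and in-memory driver are illustrative choices and not part of this commit:

```python
import h5py

# Variable-length string dtypes as in the diff: bytes maps to an ASCII
# character set, str maps to UTF-8.
ascii_dtype = h5py.special_dtype(vlen=bytes)
utf8_dtype = h5py.special_dtype(vlen=str)

# In-memory file, nothing is written to disk; the name is a placeholder.
with h5py.File("spec_demo.h5", "w", driver="core", backing_store=False) as f:
    # Scalar dataset, comparable to the "namespace" dataset in the dump above.
    ds = f.create_dataset("namespace", data='{"doc": "µV"}', dtype=utf8_dtype)
    stored_encoding = h5py.check_string_dtype(ds.dtype).encoding

ascii_encoding = h5py.check_string_dtype(ascii_dtype).encoding
print("old dtype:", ascii_encoding, "new dtype:", stored_encoding)
```

Running h5dump -pH on a file written this way should report CSET H5T_CSET_UTF8 for the dataset instead of H5T_CSET_ASCII.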
t-b authored and rly committed Jun 11, 2019
1 parent 8ddc857 commit 812edbf
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion src/hdmf/backends/hdf5/h5_utils.py
@@ -125,7 +125,7 @@ def dtype(self):

 class H5SpecWriter(SpecWriter):

-    __str_type = special_dtype(vlen=binary_type)
+    __str_type = special_dtype(vlen=text_type)

     @docval({'name': 'group', 'type': Group, 'doc': 'the HDF5 file to write specs to'})
     def __init__(self, **kwargs):
