Skip to content

malcolmw/HDF5eis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

HDF5eis Python API (read H-D-F-Size)

A solution for handling big, multidimensional timeseries data from environmental sensors in HPC applications.

HDF5eis is designed to

  1. store primitive timeseries data with any number of dimensions;
  2. store auxiliary and meta data in columnar format or as UTF-8 encoded byte streams alongside timeseries data;
  3. provide a single point of fast access to diverse data distributed across many files; and
  4. simultaneously leverage existing technology and minimize external dependencies.
import hdf5eis

with hdf5eis.File("demo.hdf5", mode="w") as demo_file:
    # Add some random multidimensional timeseries data to the demo.hdf5 file.
    first_sample_time = "2022-01-01T00:00:00Z"
    sampling_rate = 100
    demo_file.timeseries.add(
        np.random.rand(32, 16, 8, 16, 32, 1000),
        first_sample_time, 
        sampling_rate,
        tag="random"
    )
    
    # Data can be efficiently retrieved using hybrid dictionary (with regular expression parsing)
    # and array metaphors.
    start_time, end_time = "2022-01-01T00:01:00Z", "2022-01-01T00:02:00Z"
    sliced_data = demo_file.timeseries["rand*", 8:12, ..., 0, start_time: end_time]

Installation

>$ pip install hdf5eis

Credit

Please cite White et al. (2023) if you make use of HDF5eis in published work.

White, M.C.A., Zhang, Z., Bai, T., Qiu, H., Chang, H., and Nakata, N., 2023, HDF5eis: A storage and input/output solution for big multidimensional time series data from environmental sensors: GEOPHYSICS, v. 88, p. F29–F38, doi:10.1190/geo2022-0448.1.

About

A solution for handling big, multidimensional timeseries data from environmental sensors in HPC applications.

Resources

License

Stars

Watchers

Forks

Packages

No packages published