H. Joe Lee edited this page Jan 13, 2023 · 20 revisions

Kerchunk Study

Table of Contents

Introduction

Kerchunk

DMR++

Kerchunk to DMR++

DMR++ to Kerchunk

Introduction

This wiki documents the Kerchunk Study. We studied Kerchunk using a few sample NASA Earthdata HDF5 files. We also studied the feasibility of converting Kerchunk to and from OPeNDAP Hyrax DMR++.

Kerchunk

Kerchunk is a derived work based on Hypertext Transfer Protocol (HTTP/1.1): Range Requests (2014), Utilizing HDF4 File Content Maps for the Cloud Computing (2016), and Cloudydap (2017).

Thus, it has many similarities with OPeNDAP Hyrax DMR++, which the Cloudydap project produced. Both rely on HDF5 API calls to get the offset and length of each data chunk. Kerchunk obtains this information through high-level h5py Python calls. DMR++ obtains the same information by calling the HDF5 C API directly.
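Both formats ultimately turn an offset/length pair into an HTTP range request. The following is a minimal sketch, using only the Python standard library, of how one such pair maps to a `Range` header. The function name `range_header` and the sample numbers are illustrative assumptions and are not part of Kerchunk or DMR++.

```python
def range_header(offset: int, length: int) -> dict:
    """Return HTTP headers requesting `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends,
    # so the last requested byte is offset + length - 1.
    last = offset + length - 1
    return {"Range": f"bytes={offset}-{last}"}


# Hypothetical chunk that starts 4096 bytes into a file and is 1024 bytes long.
headers = range_header(4096, 1024)
print(headers)
```

A client would pass these headers with a GET request for the HDF5 file and receive only the bytes of that chunk.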

Although the basic idea is the same, there are a few differences between them. The following table summarizes the key differences.

| Workflow | Kerchunk | DMR++ |
| --- | --- | --- |
| source | HDF5/fits/grib2/netCDF3/tiff | HDF5 |
| language | Python | C/C++ |
| schema | zarr v2 | dmrpp/1.0.0 |
| output | json | xml |
| aggregation | fsspec+multizarr API | NcML |
| inline threshold | Yes | No |
| subchunking | Yes for netCDF-3 | No |
| IPFS | Yes | No |
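To make the output and inline threshold rows concrete, here is a rough sketch of a Kerchunk-style version 1 reference set built with the Python standard library. It assumes that a chunk smaller than the threshold is embedded in the JSON and that a larger chunk is recorded as `[url, offset, length]`. The helper `make_ref`, the bucket URL, the variable names, and the byte values are all hypothetical, and always base64-encoding inline data is a simplification.

```python
import base64
import json


def make_ref(data: bytes, url: str, offset: int, inline_threshold: int):
    """Build one reference entry for a chunk whose raw bytes are `data`."""
    if len(data) < inline_threshold:
        # Small chunk: embed the bytes themselves in the JSON output.
        return "base64:" + base64.b64encode(data).decode("ascii")
    # Large chunk: point at the byte range in the original file.
    return [url, offset, len(data)]


url = "s3://example-bucket/sample.h5"  # hypothetical file location
refs = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "temperature/0.0": make_ref(b"\x00" * 4000, url, 8192, inline_threshold=100),
        "time/0": make_ref(b"\x01\x02\x03\x04", url, 2048, inline_threshold=100),
    },
}
print(json.dumps(refs, indent=2))
```

In this sketch the 4000-byte chunk stays in the source file and is referenced by offset and length, while the 4-byte chunk travels inside the JSON. DMR++ has no counterpart to the inline form.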

Kerchunk is still under active development, and it currently has some issues with NASA HDF5/netCDF-4 data products. See Kerchunk for the details.

Reading NASA data through (nc)Zarr/xarray also has some interoperability issues. For example, an xarray-based DataTree reports an error when reading a NASA HDF5 data product. See DataTree for the details.

Unidata NCZarr cannot read Kerchunk files. NCZarr also has other issues in handling Zarr files. See NCZarr for the details.

We tested the above software packages to cross-check whether Kerchunk misses any information from the original HDF5 file.

DMR++

The dmrpp_module can serve NASA HDF5 data products robustly. However, the pydap client has some issues. See DMR++ for the details.

OPeNDAP Hyrax provides a few options that can affect DMR++ output. One important option is EnableCF. See DMR++-CF for the details.

DMR++ to Kerchunk

We studied the feasibility of converting DMR++ to Kerchunk. See DMR++ to Kerchunk for the details.
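As a rough illustration of the DMR++ to Kerchunk direction, the sketch below reads chunk offsets and sizes from a small, hand-written DMR++ fragment and emits Kerchunk-style `[url, offset, length]` entries. The fragment, the URL, and the function `chunks_to_refs` are hypothetical. The sketch assumes that `chunkPositionInArray` holds the element index of each chunk's first element, so dividing by the chunk dimension size gives the Zarr chunk index. It handles only top-level `Float32` variables and omits the `.zarray` and `.zattrs` metadata that a full conversion would need.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed for DAP4 and DMR++ documents.
NS = {
    "dap": "http://xml.opendap.org/ns/DAP/4.0#",
    "dmrpp": "http://xml.opendap.org/dap/dmrpp/1.0.0#",
}

# Hand-written, simplified DMR++ fragment (not taken from a real product).
SAMPLE = """<Dataset xmlns="http://xml.opendap.org/ns/DAP/4.0#"
         xmlns:dmrpp="http://xml.opendap.org/dap/dmrpp/1.0.0#">
  <Float32 name="temperature">
    <dmrpp:chunks>
      <dmrpp:chunkDimensionSizes>2 2</dmrpp:chunkDimensionSizes>
      <dmrpp:chunk offset="4016" nBytes="16" chunkPositionInArray="[0,0]"/>
      <dmrpp:chunk offset="4032" nBytes="16" chunkPositionInArray="[0,2]"/>
    </dmrpp:chunks>
  </Float32>
</Dataset>"""


def chunks_to_refs(dmrpp_xml: str, url: str) -> dict:
    """Map each dmrpp:chunk element to a Kerchunk-style reference entry."""
    refs = {}
    root = ET.fromstring(dmrpp_xml)
    for var in root.findall("dap:Float32", NS):
        name = var.get("name")
        chunks = var.find("dmrpp:chunks", NS)
        sizes_text = chunks.find("dmrpp:chunkDimensionSizes", NS).text
        sizes = [int(s) for s in sizes_text.split()]
        for chunk in chunks.findall("dmrpp:chunk", NS):
            position_text = chunk.get("chunkPositionInArray").strip("[]")
            position = [int(p) for p in position_text.split(",")]
            # Element position divided by chunk size gives the chunk index.
            index = ".".join(str(p // s) for p, s in zip(position, sizes))
            key = f"{name}/{index}"
            refs[key] = [url, int(chunk.get("offset")), int(chunk.get("nBytes"))]
    return refs


refs = chunks_to_refs(SAMPLE, "s3://example-bucket/sample.h5")
print(refs)
```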

Kerchunk to DMR++

We also studied the feasibility of converting Kerchunk to DMR++. See Kerchunk to DMR++ for the details.
