### NClimGrid Daily NetCDF Kerchunk Indexing Status
After spending some time learning how chunk sizes in NetCDF, Zarr, and Dask relate to each other when xarray generates Zarr metadata, I have a better understanding of the problem I'm encountering. In its current form, Kerchunk cannot combine the Zarr metadata it builds from NetCDF files that cover months with different numbers of days, e.g., June (30) and July (31).
- The NetCDF chunksizes, which Kerchunk needs to describe (using Zarr metadata generated by xarray), are different:
    - June NetCDF chunksize = [1, 147, 343]
    - July NetCDF chunksize = [1, 144, 337]
- Zarr, however, expects all chunks within a Zarr array to have the same shape. Even though Kerchunk isn't creating a Zarr array, it still needs to conform to Zarr's requirements.
- The desire to combine data with different sizes is noted in Kerchunk issue [#85](https://github.com/fsspec/kerchunk/issues/85) and listed as a planned improvement in [issue #106](https://github.com/fsspec/kerchunk/issues/106).
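
The mismatch described above can be checked before attempting a combine. The following is a minimal sketch, assuming each reference set returned by `translate()` is a dict whose `"refs"` entry maps `"<variable>/.zarray"` to a JSON string (or dict) with a `"chunks"` field. The helper names `chunk_shapes` and `has_uniform_chunks` are hypothetical, and the `june` and `july` dicts are stand-ins that carry only the chunk shapes listed above, not real Kerchunk output.

```python
import json


def chunk_shapes(reference_sets, variable):
    """Return the chunk shape recorded for `variable` in each reference set."""
    shapes = []
    for refs in reference_sets:
        # Assumption: references live under a "refs" key; fall back to a flat dict.
        entries = refs.get("refs", refs)
        meta = entries[f"{variable}/.zarray"]
        # Assumption: the .zarray entry may be stored as a JSON string.
        if isinstance(meta, (str, bytes)):
            meta = json.loads(meta)
        shapes.append(tuple(meta["chunks"]))
    return shapes


def has_uniform_chunks(reference_sets, variable):
    """True when every reference set reports the same chunk shape."""
    return len(set(chunk_shapes(reference_sets, variable))) <= 1


# Stand-in reference sets using the chunk shapes reported above (not real output)
june = {"version": 1, "refs": {"prcp/.zarray": json.dumps({"chunks": [1, 147, 343]})}}
july = {"version": 1, "refs": {"prcp/.zarray": json.dumps({"chunks": [1, 144, 337]})}}

print(chunk_shapes([june, july], "prcp"))
print(has_uniform_chunks([june, july], "prcp"))
```

Running a check like this over the `translated` list would flag which monthly files disagree before the combine step is attempted.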

### Paths Forward
- Kerchunk each monthly file for each variable (tmin, tmax, tavg, prcp)
- COG (or Kerchunk) each day for each variable
    - 70 years of data will produce over 100,000 files
- Combine daily data into a single Zarr store for each variable
- Wait for and/or contribute to the enhancements required to combine Kerchunk reference data having different native NetCDF chunksizes.
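
The file count for the per-day option can be estimated with a little arithmetic. This sketch assumes one file per day per variable and a hypothetical 70-year span of 1951 through 2020; the actual start and end years are not stated here, so only the order of magnitude should be read from it.

```python
import calendar

VARIABLES = ("tmin", "tmax", "tavg", "prcp")
FIRST_YEAR, LAST_YEAR = 1951, 2020  # assumed 70-year span, for illustration only


def days_in_years(first_year, last_year):
    """Count calendar days from Jan 1 of first_year through Dec 31 of last_year."""
    return sum(
        366 if calendar.isleap(year) else 365
        for year in range(first_year, last_year + 1)
    )


days = days_in_years(FIRST_YEAR, LAST_YEAR)
total_files = days * len(VARIABLES)  # one file per day per variable
print(days, total_files)  # 25568 102272
```

Under these assumptions, roughly 25,500 files per variable add up to a little over 100,000 across the four variables.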

Given a short timeframe, combining the daily data into a single Zarr store for each variable seems like the best fit to me.

In [7]:
import json
import fsspec
import kerchunk.hdf
import kerchunk.combine
import xarray as xr

def make_full_path(variable, year, month):
    return f'https://nclimgridwesteurope.blob.core.windows.net/nclimgrid/nclimgrid-daily/beta/by-month/{year}/{month:02d}/{variable}-{year}{month:02d}-grd-scaled.nc'

In [12]:
# Works: Combine Kerchunk reference data from months with same number of days

# Get month urls
urls = [make_full_path('prcp', year, month) for year in range(1970, 1971) for month in range(7, 9)]

# Kerchunk the reference data
translated = []
for url in urls:
    with fsspec.open(url) as fobj:
        h5chunks = kerchunk.hdf.SingleHdf5ToZarr(fobj, url)
        translated.append(h5chunks.translate())

# Concatenate along time
mzz = kerchunk.combine.MultiZarrToZarr(
    translated,
    remote_protocol='https',
    xarray_concat_args={
        "dim": "time"
    }
)

# Convert and export to json
output_file = 'daily-prcp.json'
mzz.translate(output_file)

In [14]:
# Does not work: Combine Kerchunk reference data from months with different number of days

# Get month urls
urls = [make_full_path('prcp', year, month) for year in range(1970, 1971) for month in range(6, 8)]

# Kerchunk the reference data
translated = []
for url in urls:
    with fsspec.open(url) as fobj:
        h5chunks = kerchunk.hdf.SingleHdf5ToZarr(fobj, url)
        translated.append(h5chunks.translate())

# Concatenate along time
mzz = kerchunk.combine.MultiZarrToZarr(
    translated,
    remote_protocol='https',
    xarray_concat_args={
        "dim": "time"
    }
)

# Convert and export to json
output_file = 'daily-prcp.json'
mzz.translate(output_file)

NotImplementedError: Specified zarr chunks encoding['chunks']=(1, 147, 343) for variable named 'prcp' would overlap multiple dask chunks ((1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), (144, 3, 141, 6, 138, 9, 135, 12, 8), (337, 6, 331, 12, 325, 18, 319, 24, 13)). Writing this array in parallel with dask could lead to corrupted data. Consider either rechunking using `chunk()`, deleting or modifying `encoding['chunks']`, or specify `safe_chunks=False`.
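
The irregular dask chunks in the error above appear consistent with overlaying the June and July chunk grids on the same array: every chunk boundary from either month becomes a boundary in the concatenated result. The sketch below reproduces the latitude and longitude chunk tuples from that idea using plain arithmetic. The helper names are hypothetical, and the grid dimensions (596 and 1385) are inferred by summing the chunk tuples in the error message rather than read from the files.

```python
def chunk_edges(chunk, length):
    """Positions where chunks of size `chunk` end along an axis of `length`."""
    edges = set(range(chunk, length, chunk))
    edges.add(length)  # the final (possibly partial) chunk ends at the axis length
    return edges


def merged_chunks(chunk_sizes, length):
    """Chunk sizes that result from overlaying several regular chunk grids."""
    edges = sorted(set().union(*(chunk_edges(c, length) for c in chunk_sizes)))
    return tuple(end - start for start, end in zip([0] + edges, edges))


LAT, LON = 596, 1385  # inferred from the sums of the dask chunk tuples in the error

print(merged_chunks([147, 144], LAT))  # (144, 3, 141, 6, 138, 9, 135, 12, 8)
print(merged_chunks([343, 337], LON))  # (337, 6, 331, 12, 325, 18, 319, 24, 13)
```

Both tuples match the error message, which is why a single Zarr chunk of shape (1, 147, 343) would straddle several dask chunks. The 61 unit-length time chunks are the 30 days of June plus the 31 days of July.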