
move chunk grid off metadata #6

Merged
maxrjones merged 1 commit into maxrjones:poc/unified-chunk-grid from d-v-b:poc/unified-chunk-grid
Mar 20, 2026


d-v-b commented Mar 19, 2026

This PR moves the chunk grid object off the array metadata objects and onto the AsyncArray. ArrayV3Metadata.chunk_grid is now a plain dataclass whose only job is to model the contents of array metadata.

The rationale is that we want to keep the array metadata classes scoped to modelling the contents of array metadata. Giving these classes methods for array indexing complicates, or even interferes with, their role as models of array metadata documents.
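To make the split concrete, here is a minimal sketch of the separation described above. It is not the code in this PR: `RegularChunkGridMetadata` and `RuntimeChunkGrid` are hypothetical names chosen for illustration, and only the regular chunk grid is covered. The metadata class holds data and serializes it; the runtime class, which would be owned by the array, answers indexing questions.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class RegularChunkGridMetadata:
    """Hypothetical plain model of the `chunk_grid` metadata field: data only."""

    chunk_shape: tuple[int, ...]

    def to_dict(self) -> dict:
        # Mirrors the shape of a regular chunk grid entry in a Zarr v3 metadata document.
        return {"name": "regular", "configuration": {"chunk_shape": list(self.chunk_shape)}}


class RuntimeChunkGrid:
    """Hypothetical runtime object that carries the indexing logic."""

    def __init__(self, array_shape: tuple[int, ...], chunk_shape: tuple[int, ...]) -> None:
        self.array_shape = tuple(array_shape)
        self.chunk_shape = tuple(chunk_shape)

    @classmethod
    def from_metadata(
        cls, array_shape: tuple[int, ...], meta: RegularChunkGridMetadata
    ) -> "RuntimeChunkGrid":
        # The runtime grid is derived from metadata; the metadata never depends on it.
        return cls(array_shape, meta.chunk_shape)

    @property
    def grid_shape(self) -> tuple[int, ...]:
        # Ceiling division: number of chunks along each axis.
        return tuple(-(-s // c) for s, c in zip(self.array_shape, self.chunk_shape))

    def chunk_coords(self) -> list[tuple[int, ...]]:
        # Every chunk coordinate in the grid, in C order.
        return list(product(*(range(n) for n in self.grid_shape)))


meta = RegularChunkGridMetadata(chunk_shape=(4, 4))
grid = RuntimeChunkGrid.from_metadata(array_shape=(10, 6), meta=meta)
```

In this sketch the metadata model can be round-tripped to a document without knowing anything about indexing, which is the property the PR description argues for.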

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.md
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

  shape: tuple[int, ...],
  dtype: ZDType[TBaseDType, TBaseScalar],
- chunk_grid: ChunkGrid,
+ chunk_grid: ChunkGridMetadata,
Author

@d-v-b d-v-b Mar 19, 2026


This change will affect downstream implementations of the codec pipeline. I think that's just zarrs-python at this time.

array_config: ArrayConfig,
prototype: BufferPrototype,
) -> ArraySpec:
"""Build an ArraySpec for a single chunk using the behavioral ChunkGrid."""
Author


Claude is calling this a "behavioral chunkgrid", which is not my preferred phrasing. We would probably benefit from a different name to distinguish this Python object from the chunk_grid field in array metadata.

Author

d-v-b commented Mar 19, 2026

One thing I struggled with is explaining to myself why the ChunkGrid doesn't belong on the array metadata class. We start with the fact that ArrayV3Metadata is a model of the array metadata document, and the Array class is where the array metadata, storage, and runtime config come together. The ChunkGrid depends only on array metadata; it doesn't depend on storage or the config. So why shouldn't the ChunkGrid exist on array metadata?

The reason is an API we have not built yet: when we implement lazy slicing, we can use the chunk grid to model the set of chunks that support a sliced array. This leads to the following set of layers:

  1. array schema:
    components: array metadata document
    describes an abstract Zarr array, conveys the basic chunk structure

  2. indexed array:
    components: array schema + indexing operation
    describes a subset of an abstract array.
    can be concatenated with other indexed arrays, even if they have different schemas, to form another indexed array.
    the chunk grid object lives at this level

  3. IO Array (better name needed):
    components: indexed array + storage backend
    describes an indexed array that can drive IO operations against an external stored representation
    the zarr.core.array.AsyncArray object sits at this level
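
The three layers above can be sketched roughly as follows. This is an illustration only: `ArraySchema`, `IndexedArray`, and `IOArray` are hypothetical names, none of them exist in zarr-python, the store is a plain dict standing in for a storage backend, and concatenation of indexed arrays is left out. The point is that the chunk-selection logic needs only the schema and a selection, so it sits in the middle layer.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ArraySchema:
    """Layer 1 (hypothetical): what the metadata document says."""

    shape: tuple[int, ...]
    chunk_shape: tuple[int, ...]


@dataclass(frozen=True)
class IndexedArray:
    """Layer 2 (hypothetical): schema plus an indexing operation; no storage."""

    schema: ArraySchema
    selection: tuple[slice, ...]

    def chunks_touched(self) -> list[tuple[int, ...]]:
        # Chunk-grid logic: which chunk coordinates overlap the selection.
        per_axis = []
        for sel, size, chunk in zip(self.selection, self.schema.shape, self.schema.chunk_shape):
            start, stop, step = sel.indices(size)
            if step != 1:
                raise NotImplementedError("this sketch only handles unit-step slices")
            if stop <= start:
                per_axis.append(range(0))  # empty selection along this axis
            else:
                per_axis.append(range(start // chunk, (stop - 1) // chunk + 1))
        return list(product(*per_axis))


@dataclass
class IOArray:
    """Layer 3 (hypothetical): indexed array plus a storage backend."""

    indexed: IndexedArray
    store: dict[tuple[int, ...], bytes]  # stand-in for a real store

    def stored_chunks(self) -> list[tuple[int, ...]]:
        # Only this layer can say which of the touched chunks actually exist in storage.
        return [c for c in self.indexed.chunks_touched() if c in self.store]


schema = ArraySchema(shape=(10, 6), chunk_shape=(4, 4))
view = IndexedArray(schema, (slice(2, 9), slice(0, 4)))
io_array = IOArray(view, store={(0, 0): b"a", (2, 0): b"c", (1, 1): b"x"})
```

Here `view.chunks_touched()` can be computed without any store, while `io_array.stored_chunks()` cannot, which matches the layering argued for in the comment.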

Author

d-v-b commented Mar 19, 2026

In zarr-python today, we have only formalized levels 1 and 3. The middle level is missing, but that's where the chunk grid should sit.

@maxrjones
Owner

Thanks @d-v-b!

@maxrjones maxrjones merged commit 67e540c into maxrjones:poc/unified-chunk-grid Mar 20, 2026