Add a fast path that doesn't include normalized chunks in tokenize #11017
- Tests added
- User visible changes (including notable bug fixes) are documented in `whats-new.rst`

The idea in this PR is to add a fast path for `open_dataset` that just uses the token passed into `_maybe_chunk` and doesn't worry about including the normalized chunks within the token (see the sketch after the screenshots below).

Before:
[profiling screenshot not reproduced in this extract]

After:

[profiling screenshot not reproduced in this extract]
This PR shaves ~30 s off the previous runtime for the dataset from the original issue. I was still seeing fairly heavy memory consumption (17.14 GB) for this `open_dataset` call, though. That isn't new with this PR; I just wanted to flag it.
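For anyone who wants to check the memory side independently, here is one possible sketch using the standard-library `tracemalloc`. The file path and chunking arguments are placeholders, this is not necessarily how the figure above was measured, and note that `tracemalloc` only sees Python-level allocations:

```python
import tracemalloc

import xarray as xr

tracemalloc.start()
# Placeholder path; chunks={} asks xarray to return dask-backed variables
# using the engine's preferred chunk sizes.
ds = xr.open_dataset("data.nc", chunks={})
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"peak traced memory during open_dataset: {peak / 1e9:.2f} GB")
```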