perf(load): skip Variable.load dispatch for numpy data by FBumann · Pull Request #11355 · pydata/xarray

FBumann · 2026-05-25T16:52:13Z

Description

Variable.load() and Variable.load_async() always end with self._data = to_duck_array(self._data, ...). For an in-memory numpy.ndarray, that dispatch walks is_chunked_array, the ExplicitlyIndexed | ImplicitToExplicitIndexingAdapter isinstance check, and is_duck_array only to return self._data unchanged. The whole call is pure overhead — the same no-op that IndexVariable.load (xarray/core/variable.py:2781) already short-circuits with return self.

This PR adds an isinstance(self._data, np.ndarray) guard at the top of both Variable.load and Variable.load_async. Behavior is unchanged on chunked, ExplicitlyIndexed, and non-numpy duck-array inputs.
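A minimal sketch of the guard described above, written against a toy class rather than xarray itself. `ToyVariable` and `to_duck_array_stand_in` are hypothetical names introduced only for illustration; the real `Variable.load` and `to_duck_array` do more than this.

```python
import numpy as np

dispatch_calls = 0  # counts how often the slow path runs


def to_duck_array_stand_in(data):
    """Hypothetical stand-in for the to_duck_array dispatch (not xarray's code)."""
    global dispatch_calls
    dispatch_calls += 1
    return np.asarray(data)


class ToyVariable:
    """Toy model of an object holding ``_data``; not xarray's Variable."""

    def __init__(self, data):
        self._data = data

    def load(self):
        # Fast path: in-memory numpy data is already loaded, so return early.
        if isinstance(self._data, np.ndarray):
            return self
        # Everything else still goes through the dispatch.
        self._data = to_duck_array_stand_in(self._data)
        return self


in_memory = ToyVariable(np.arange(3))
in_memory.load()
calls_after_numpy = dispatch_calls  # the guard skipped the dispatch

lazy_like = ToyVariable([1, 2, 3])  # a list stands in for non-numpy data
lazy_like.load()
calls_after_list = dispatch_calls  # the dispatch ran once
```

The early return leaves `_data` untouched, which is why behavior on numpy-backed data should be unchanged apart from the skipped calls.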

Where this fires

Narrower than #11354 — this fast-path only fires when Variable.load() itself runs. Every caller of Variable.load() gets a per-variable saving on numpy-backed data:

Top-level .load() / .compute():

- ds.load(), da.load(), ds.compute(), da.compute(), xr.load_dataset(...), xr.load_dataarray(...), xr.load_datatree(...), da.persist()
- After building its chunked_data dict, Dataset.load ends with [v.load() for k, v in self.variables.items() if k not in chunked_data] (xarray/core/dataset.py:577), so every non-chunked variable goes through Variable.load().
- DataTree.load (xarray/core/datatree.py:2482) walks every node's variables and calls v.load() on each.

Backend writers:

- xarray/backends/writers.py:768: the zarr writer materializes zero-size variables with v.load() (a workaround for dask<2023.12.1). On numpy-backed empty/scalar dims, this is the only Variable.load() call in the write path.

Concat with compat="...":

- xarray/structure/concat.py:489: when xarray must decide whether two variables are equal across input datasets, it calls datasets[0].variables[k].load() (and .compute() on the others). On numpy-backed inputs the load() is pure dispatch overhead.

Not affected:

- Arithmetic (lazy stays lazy).
- da.values, da.to_numpy(), da.to_dataframe(), da.to_pandas(), da.plot(), repr: these go through to_duck_array directly without Variable.load, so they benefit from #11354 (perf(load): short-circuit is_chunked_array for numpy arrays) but not from this PR.
- AbstractDataStore.load (used in xarray/conventions.py:578): same name, different method; not touched here.

Benchmark numbers

isel(...).load() on synthetic scalar-var datasets, against main, best of 5, GC off:

| Dataset | per call (main) | per call (this PR) | speedup |
|---|---|---|---|
| 50 scalar vars | 0.090 ms | 0.063 ms | 1.44× |
| 200 scalar vars | 0.283 ms | 0.175 ms | 1.62× |
| 400 scalar vars | 0.524 ms | 0.324 ms | 1.62× |
| 1000 scalar vars | 1.271 ms | 0.760 ms | 1.67× |
| 2000 scalar vars | 2.484 ms | 1.490 ms | 1.67× |

Scaling check across per-variable data size (200 vars fixed): flat ~1.56× speedup from size=0 to size=10,000, confirming the saving is pure dispatch overhead — not work-per-element.
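For readers who want to reproduce this kind of measurement, here is a rough sketch of a "best of 5, GC off" harness using only the standard library. The helper names `best_of` and `speedup` are assumptions made for this example, and the timed lambda is a placeholder workload; the PR does not show the script that produced its numbers.

```python
import gc
import timeit


def best_of(func, repeat=5, number=100):
    """Return the best per-call time in milliseconds over `repeat` runs."""
    gc_was_enabled = gc.isenabled()
    # timeit already disables GC while timing; the explicit call just
    # makes the "GC off" condition visible in the harness.
    gc.disable()
    try:
        totals = timeit.repeat(func, repeat=repeat, number=number)
    finally:
        if gc_was_enabled:
            gc.enable()
    # Each total covers `number` calls; take the fastest run.
    return min(totals) / number * 1e3


def speedup(before_ms, after_ms):
    """Ratio of the baseline time to the new time."""
    return before_ms / after_ms


# Placeholder workload, only to show the harness running.
per_call_ms = best_of(lambda: sum(range(100)))

# Ratio recomputed from the 2000-variable timings reported above.
ratio_2000 = round(speedup(2.484, 1.490), 2)
```

Taking the minimum over repeats is a common choice for micro-benchmarks because it is the run least disturbed by other activity on the machine.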

Note on overlap with #11354

#11354 makes is_chunked_array(numpy) near-free. This PR skips the entire to_duck_array body for numpy Variable._data, which makes the is_chunked_array call inside it dead code on that path. The two PRs are still complementary, not redundant:

- Dataset.load calls is_chunked_array(v._data) in its dict comprehension (xarray/core/dataset.py:563) before dispatching to Variable.load; that call is only sped up by #11354 (perf(load): short-circuit is_chunked_array for numpy arrays).
- The dispatch inside Variable.load itself is what this PR removes.

If both land, the per-variable saving on in-memory datasets compounds.

Checklist

- Non-numpy paths preserved: ExplicitlyIndexed/ImplicitToExplicitIndexingAdapter adapters are not np.ndarray instances and still take the to_duck_array path; dask arrays likewise hit the existing chunked-compute branch.
- pytest xarray/tests/test_variable.py xarray/tests/test_dataset.py: 1051 passed, 76 skipped, 9 xfailed, 4 xpassed
- doc/whats-new.rst entry under Internal Changes

AI Disclosure

This PR contains AI-generated content.
- I have tested any AI-generated content in my PR.
- I take responsibility for any AI-generated content in my PR.

Tools: Claude (Claude Code)

[This is Claude Code on behalf of Felix Bumann]

Variable.load() and Variable.load_async() always end with ``self._data = to_duck_array(self._data)`` which, for an in-memory ``numpy.ndarray``, walks the dispatch chain only to return ``self._data`` unchanged. The whole call is pure overhead in that case, the same no-op pattern that ``IndexVariable.load`` already short-circuits.

Add an ``isinstance(self._data, np.ndarray)`` guard at the top of both methods. Behavior is unchanged on chunked, ExplicitlyIndexed, or non-numpy duck-array inputs.

Measured on ``isel(...).load()`` of synthetic scalar-var datasets against upstream/main (best of 5, GC off):

400 scalar vars: 0.524 ms -> 0.324 ms (~1.62x)
2000 scalar vars: 2.484 ms -> 1.490 ms (~1.67x)

Speedup scales with the number of variables (1.44x at 50 vars -> 1.67x at 2000 vars) and is flat across per-variable data size (~1.56x from size=0 to size=10,000), confirming the saving is pure dispatch overhead removal.

Refs pydata#11352.

Co-authored-by: Claude <noreply@anthropic.com>

The previous `isinstance(self._data, np.ndarray)` short-circuit incorrectly returned `self` (skipping the load) for ndarray subclasses with a `chunks` attribute: test fakes like DummyChunkedArray, or any third-party chunked array implementation that subclasses ndarray.

Narrow to `isinstance + not hasattr("chunks")` so plain ndarrays and non-chunked subclasses (MaskedArray, np.matrix) still skip the to_duck_array dispatch, while subclasses that advertise chunks fall through to the full path.

Co-authored-by: Claude <noreply@anthropic.com>
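The narrowed guard from this second commit message can be sketched as a standalone predicate. `can_skip_dispatch` and `FakeChunkedArray` are hypothetical names used only here (the commit mentions a test fake called DummyChunkedArray, whose definition is not shown); this is an illustration of the condition, not the code in the PR.

```python
import numpy as np


def can_skip_dispatch(data):
    """Sketch of the narrowed guard: an ndarray (or subclass) without `chunks`."""
    return isinstance(data, np.ndarray) and not hasattr(data, "chunks")


class FakeChunkedArray(np.ndarray):
    """Hypothetical ndarray subclass that advertises chunks."""

    chunks = ((3,),)


plain = np.arange(3)
masked = np.ma.masked_array([1, 2, 3], mask=[False, True, False])
chunked = np.arange(3).view(FakeChunkedArray)

skip_plain = can_skip_dispatch(plain)      # plain ndarray: fast path
skip_masked = can_skip_dispatch(masked)    # non-chunked subclass: fast path
skip_chunked = can_skip_dispatch(chunked)  # advertises chunks: full path
```

Checking for the `chunks` attribute keeps the fast path cheap while letting anything that presents itself as chunked fall through to the existing logic.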

FBumann · 2026-05-26T10:42:35Z

After some more testing, I think this is a pretty marginal improvement. Happy to take a "Not planned".

FBumann and others added 2 commits May 23, 2026 19:58

FBumann wants to merge 2 commits into pydata:main from FBumann:perf/variable-load-skip-numpy
