perf(load): skip Variable.load dispatch for numpy data #11355

Open
FBumann wants to merge 2 commits into pydata:main from FBumann:perf/variable-load-skip-numpy

@FBumann FBumann commented May 25, 2026

Description

Variable.load() and Variable.load_async() always end with self._data = to_duck_array(self._data, ...). For an in-memory numpy.ndarray, that dispatch walks is_chunked_array, the ExplicitlyIndexed | ImplicitToExplicitIndexingAdapter isinstance check, and is_duck_array, only to return self._data unchanged. The whole call is pure overhead: it is the same no-op that IndexVariable.load (xarray/core/variable.py:2781) already short-circuits with return self.

This PR adds an isinstance(self._data, np.ndarray) guard at the top of both Variable.load and Variable.load_async. Behavior is unchanged on chunked, ExplicitlyIndexed, and non-numpy duck-array inputs.
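
As a rough illustration of the guard described above, the sketch below uses a toy class rather than xarray's real Variable. ToyVariable, LazyData, and to_duck_array_stub are hypothetical names invented for this example, and the stub only counts how often the dispatch would run. Note that the second commit in this PR narrows the check to also require that the array has no chunks attribute.

```python
import numpy as np

# Counts how often the (stubbed) dispatch runs.
calls = {"dispatch": 0}


def to_duck_array_stub(data):
    """Hypothetical stand-in for xarray's to_duck_array dispatch."""
    calls["dispatch"] += 1
    return np.asarray(data)


class LazyData:
    """Hypothetical non-numpy wrapper that materializes on request."""

    def __init__(self, values):
        self._values = values

    def __array__(self, dtype=None, copy=None):
        return np.array(self._values, dtype=dtype)


class ToyVariable:
    """Toy model of the load() fast path; not xarray's Variable."""

    def __init__(self, data):
        self._data = data

    def load(self):
        # Fast path: in-memory numpy data is already loaded.
        if isinstance(self._data, np.ndarray):
            return self
        # Everything else still goes through the full dispatch.
        self._data = to_duck_array_stub(self._data)
        return self


in_memory = ToyVariable(np.arange(3)).load()
calls_after_numpy = calls["dispatch"]  # the guard returned before the stub ran

lazy = ToyVariable(LazyData([1, 2, 3])).load()
calls_after_lazy = calls["dispatch"]  # the wrapper went through the stub once
```

In this toy model the numpy-backed variable never reaches the stub, while the wrapper is materialized exactly as before, which is the behavior the description claims for the real methods.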

Where this fires

Narrower than #11354 — this fast-path only fires when Variable.load() itself runs. Every caller of Variable.load() gets a per-variable saving on numpy-backed data:

Top-level .load() / .compute():

  • ds.load(), da.load(), ds.compute(), da.compute(), xr.load_dataset(...), xr.load_dataarray(...), xr.load_datatree(...), da.persist()
  • Dataset.load ends with the list comprehension [v.load() for k, v in self.variables.items() if k not in chunked_data] (xarray/core/dataset.py:577), so every non-chunked variable goes through Variable.load().
  • DataTree.load (xarray/core/datatree.py:2482) walks every node's variables and calls v.load() on each.

Backend writers:

  • xarray/backends/writers.py:768 — the zarr writer materializes zero-size variables with v.load() (a workaround for dask<2023.12.1). On numpy-backed empty/scalar dims, this is the only Variable.load() call in the write path.

Concat with compat="...":

  • xarray/structure/concat.py:489 — when xarray must decide whether two variables are equal across input datasets, it calls datasets[0].variables[k].load() (and .compute() on the others). On numpy-backed inputs the load() is pure dispatch overhead.

Not affected:

  • Arithmetic (lazy stays lazy).
  • da.values, da.to_numpy(), da.to_dataframe(), da.to_pandas(), da.plot(), repr: these go through to_duck_array directly without Variable.load, so they benefit from #11354 (perf(load): short-circuit is_chunked_array for numpy arrays) but not from this PR.
  • AbstractDataStore.load (used in xarray/conventions.py:578) — same name, different method; not touched here.

Benchmark numbers

isel(...).load() on synthetic scalar-var datasets, against main, best of 5, GC off:

| | per call (main) | per call (this PR) | speedup |
| --- | --- | --- | --- |
| 50 scalar vars | 0.090 ms | 0.063 ms | 1.44× |
| 200 scalar vars | 0.283 ms | 0.175 ms | 1.62× |
| 400 scalar vars | 0.524 ms | 0.324 ms | 1.62× |
| 1000 scalar vars | 1.271 ms | 0.760 ms | 1.67× |
| 2000 scalar vars | 2.484 ms | 1.490 ms | 1.67× |
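
One way to read the table above is to divide the absolute saving by the number of variables. The short calculation below does that using only the rounded timings shown in the table, so the last digit is approximate. The helper name per_variable_saving_us is made up for this sketch.

```python
# (number of scalar vars, ms per call on main, ms per call with this PR),
# copied from the rounded values in the table above.
rows = [
    (50, 0.090, 0.063),
    (200, 0.283, 0.175),
    (400, 0.524, 0.324),
    (1000, 1.271, 0.760),
    (2000, 2.484, 1.490),
]


def per_variable_saving_us(n_vars, main_ms, pr_ms):
    """Absolute saving per variable, converted from ms to microseconds."""
    return (main_ms - pr_ms) * 1000.0 / n_vars


savings = [per_variable_saving_us(*row) for row in rows]

for (n_vars, _, _), saving in zip(rows, savings):
    print(f"{n_vars:>5} vars: ~{saving:.2f} us saved per variable")
```

Across all five rows the saving works out to roughly half a microsecond per variable, which fits the description of a fixed per-variable dispatch cost.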

Scaling check across per-variable data size (200 vars fixed): flat ~1.56× speedup from size=0 to size=10,000, confirming the saving is pure dispatch overhead — not work-per-element.
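
The benchmark script itself is not included in this PR, so the following is only a sketch of how a "best of 5, GC off" measurement could be reproduced with the standard library. The helper best_of_ms and the placeholder workload are assumptions; to compare branches, the callable would be replaced by the isel(...).load() call under test.

```python
import timeit


def best_of_ms(func, repeat=5, number=1000):
    """Return the best per-call time in milliseconds over `repeat` runs.

    timeit turns garbage collection off while timing by default,
    which corresponds to the "GC off" condition.
    """
    totals = timeit.repeat(func, repeat=repeat, number=number)
    return min(totals) / number * 1e3


def placeholder_workload():
    # Placeholder only; swap in the call being benchmarked.
    return sum(range(100))


best_ms = best_of_ms(placeholder_workload)
print(f"best of 5: {best_ms:.4f} ms per call")
```

Taking the minimum rather than the mean is a common choice for microbenchmarks, since slower runs mostly reflect interference from other processes.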

Note on overlap with #11354

#11354 makes is_chunked_array(numpy) near-free. This PR skips the entire to_duck_array body for numpy Variable._data, which makes the is_chunked_array call inside it dead code on that path. The two PRs are still complementary, not redundant:

If both land, the per-variable saving on in-memory datasets compounds.

Checklist

  • Non-numpy paths preserved: ExplicitlyIndexed/ImplicitToExplicitIndexingAdapter adapters are not np.ndarray instances and still take the to_duck_array path; dask arrays likewise hit the existing chunked-compute branch.
  • pytest xarray/tests/test_variable.py xarray/tests/test_dataset.py — 1051 passed, 76 skipped, 9 xfailed, 4 xpassed
  • doc/whats-new.rst entry under Internal Changes

AI Disclosure

  • This PR contains AI-generated content.
    • I have tested any AI-generated content in my PR.
    • I take responsibility for any AI-generated content in my PR.

Tools: Claude (Claude Code)


[This is Claude Code on behalf of Felix Bumann]

FBumann and others added 2 commits May 23, 2026 19:58
Variable.load() and Variable.load_async() always end with
``self._data = to_duck_array(self._data)`` which, for an in-memory
``numpy.ndarray``, walks the dispatch chain only to return ``self._data``
unchanged. The whole call is pure overhead in that case — the same
no-op pattern that ``IndexVariable.load`` already short-circuits.

Add an ``isinstance(self._data, np.ndarray)`` guard at the top of both
methods. Behavior is unchanged on chunked, ExplicitlyIndexed, or
non-numpy duck-array inputs.

Measured on ``isel(...).load()`` of synthetic scalar-var datasets
against upstream/main (best of 5, GC off):

  400 scalar vars:  0.524 ms -> 0.324 ms   ~1.62x
  2000 scalar vars: 2.484 ms -> 1.490 ms   ~1.67x

Speedup scales with the number of variables (1.44x at 50 vars ->
1.67x at 2000 vars) and is flat across per-variable data size
(~1.56x from size=0 to size=10,000), confirming the saving is pure
dispatch overhead removal.

Refs pydata#11352.

Co-authored-by: Claude <noreply@anthropic.com>
The previous `isinstance(self._data, np.ndarray)` short-circuit incorrectly
returned `self` (skipping the load) for ndarray subclasses with a `chunks`
attribute — test fakes like DummyChunkedArray, or any third-party chunked
array implementation that subclasses ndarray.

Narrow to `isinstance + not hasattr("chunks")` so plain ndarrays and
non-chunked subclasses (MaskedArray, np.matrix) still skip the
to_duck_array dispatch, while subclasses that advertise chunks fall
through to the full path.

Co-authored-by: Claude <noreply@anthropic.com>
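
The narrowed condition from the second commit can be sketched as a standalone predicate. can_skip_dispatch and FakeChunkedArray are hypothetical names used only for illustration (the PR applies the check inside Variable.load, and the test fake it mentions is DummyChunkedArray). The sketch assumes that a chunks attribute is what marks an ndarray subclass as chunked.

```python
import numpy as np


class FakeChunkedArray(np.ndarray):
    """Hypothetical ndarray subclass that advertises a chunks attribute."""

    chunks = ((3,),)


def can_skip_dispatch(data):
    """Return True when data is an ndarray (or subclass) without chunks."""
    return isinstance(data, np.ndarray) and not hasattr(data, "chunks")


plain = np.arange(3)
masked = np.ma.masked_array([1, 2, 3], mask=[False, True, False])
fake_chunked = np.arange(3).view(FakeChunkedArray)

results = {
    "plain ndarray": can_skip_dispatch(plain),
    "MaskedArray": can_skip_dispatch(masked),
    "chunked subclass": can_skip_dispatch(fake_chunked),
    "python list": can_skip_dispatch([1, 2, 3]),
}
```

Plain arrays and MaskedArray pass the predicate, while the subclass with chunks and non-ndarray inputs fall through to the full path, which matches the behavior described in the commit message.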

FBumann commented May 26, 2026

After some more testing, I think this is a pretty marginal improvement. Happy to take a "Not planned".
