Variable.load() is a no-op for numpy data; can skip dispatch entirely #11352

@FBumann

What

For a Variable whose ._data is already an in-memory numpy.ndarray, Variable.load() does no useful work: it walks a dispatch chain that always returns the same array unchanged. The cost is paid once per variable inside Dataset.load() / DataArray.load(), and it comes after is_chunked_array has already concluded that the variable isn't chunked.

Code path

Variable.load (xarray/core/variable.py:1022):

def load(self, **kwargs) -> Self:
    self._data = to_duck_array(self._data, **kwargs)
    return self

to_duck_array (xarray/namedarray/pycompat.py:139):

def to_duck_array(data, **kwargs):
    if is_chunked_array(data):
        ...
    if isinstance(data, ExplicitlyIndexed | ImplicitToExplicitIndexingAdapter):
        return data.get_duck_array()
    elif is_duck_array(data):
        return data
    else:
        return np.asarray(data)

For np.ndarray:

  1. is_chunked_array(data) → False (fast once #11351, "Speed up Dataset.load for in-memory datasets with many variables", lands)
  2. isinstance(data, ExplicitlyIndexed | ImplicitToExplicitIndexingAdapter) → False (np.ndarray is neither)
  3. is_duck_array(data) → True → returns data unchanged

So self._data = to_duck_array(self._data) reduces to self._data = self._data. The entire call is dispatch overhead.
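The reduction above can be checked with a small stand-alone sketch. It does not import xarray: `to_duck_array_sketch` is a hypothetical, simplified mirror of the dispatch quoted earlier, with the chunked and explicitly-indexed checks reduced to attribute tests as an assumption for illustration.

```python
import numpy as np


def to_duck_array_sketch(data):
    """Simplified, hypothetical mirror of the to_duck_array dispatch."""
    # Stand-in for is_chunked_array: assume chunked arrays expose `.chunks`.
    if hasattr(data, "chunks"):
        return data.compute()
    # Stand-in for the ExplicitlyIndexed / adapter isinstance check.
    if hasattr(data, "get_duck_array"):
        return data.get_duck_array()
    # Stand-in for is_duck_array: only plain ndarrays are modelled here.
    if isinstance(data, np.ndarray):
        return data
    return np.asarray(data)


arr = np.arange(6).reshape(2, 3)
result = to_duck_array_sketch(arr)

# For an in-memory ndarray every branch falls through to `return data`,
# so the "loaded" object is the very same array.
assert result is arr
```

Only the final branch does any work, and only for inputs that are not already arrays (for example a Python list).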

Why it matters

Dataset.load() calls Variable.load() per variable. On many-variable datasets — common with open_mfdataset / concat results — this overhead compounds. With #11351 the is_chunked_array portion is near-free, so this dispatch is the remaining cost on the same hot path.

Proposed fix

Add a numpy fast-path at the top of Variable.load() and Variable.load_async():

def load(self, **kwargs) -> Self:
    if isinstance(self._data, np.ndarray):
        return self
    self._data = to_duck_array(self._data, **kwargs)
    return self
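To make the intended behaviour concrete for both methods, here is a minimal sketch on a toy class. `ToyVariable`, `_convert`, and `dispatch_calls` are hypothetical names introduced for illustration. The shape of `load_async` is also an assumption (an awaited conversion step) and is not copied from xarray.

```python
import asyncio

import numpy as np


class ToyVariable:
    """Hypothetical stand-in for Variable; models only `_data` and loading."""

    def __init__(self, data):
        self._data = data
        self.dispatch_calls = 0  # how often the slow path ran

    def _convert(self, data):
        # Stand-in for to_duck_array; the real function has more branches.
        self.dispatch_calls += 1
        return np.asarray(data)

    def load(self):
        if isinstance(self._data, np.ndarray):
            return self  # fast path: already in memory, nothing to do
        self._data = self._convert(self._data)
        return self

    async def load_async(self):
        if isinstance(self._data, np.ndarray):
            return self  # same fast path for the async variant
        await asyncio.sleep(0)  # placeholder for real asynchronous I/O
        self._data = self._convert(self._data)
        return self


in_memory = ToyVariable(np.arange(4))
original = in_memory._data
in_memory.load()
asyncio.run(in_memory.load_async())
assert in_memory._data is original and in_memory.dispatch_calls == 0

not_yet_array = ToyVariable([1, 2, 3])
not_yet_array.load()
assert isinstance(not_yet_array._data, np.ndarray)
assert not_yet_array.dispatch_calls == 1
```

One design point to settle: `isinstance` also matches ndarray subclasses. If only exact ndarrays should take the shortcut, `type(self._data) is np.ndarray` would be the stricter check.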

(Happy to open a follow-up PR. Filing here so the design can be discussed independently of the implementation.)


[This is Claude Code on behalf of Felix Bumann]
