Variable.load() is a no-op for numpy data; can skip dispatch entirely

### What

For a `Variable` whose `._data` is already an in-memory `numpy.ndarray`, `Variable.load()` does no useful work: it walks a dispatch chain that always returns the same array unchanged. `Dataset.load()` / `DataArray.load()` pay this cost once per variable, in addition to (and after) the `is_chunked_array` check that has already established the variable isn't chunked.

### Code path

`Variable.load` ([`xarray/core/variable.py:1022`](https://github.com/pydata/xarray/blob/main/xarray/core/variable.py#L1022)):

```python
def load(self, **kwargs) -> Self:
    self._data = to_duck_array(self._data, **kwargs)
    return self
```

`to_duck_array` ([`xarray/namedarray/pycompat.py:139`](https://github.com/pydata/xarray/blob/main/xarray/namedarray/pycompat.py#L139)):

```python
def to_duck_array(data, **kwargs):
    if is_chunked_array(data):
        ...
    if isinstance(data, ExplicitlyIndexed | ImplicitToExplicitIndexingAdapter):
        return data.get_duck_array()
    elif is_duck_array(data):
        return data
    else:
        return np.asarray(data)
```

For `np.ndarray`:

1. `is_chunked_array(data)` → `False` (fast once #11351 lands)
2. `isinstance(data, ExplicitlyIndexed | ImplicitToExplicitIndexingAdapter)` → `False` (`np.ndarray` is neither)
3. `is_duck_array(data)` → `True` → returns `data` unchanged

So `self._data = to_duck_array(self._data)` reduces to `self._data = self._data`. The entire call is dispatch overhead.
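
To make the three steps concrete, here is a simplified, numpy-only model of the dispatch order. It is a sketch, not xarray's code: `FakeIndexedAdapter`, `is_chunked_array_sketch`, `is_duck_array_sketch` and `to_duck_array_sketch` are hypothetical stand-ins, the chunked check is reduced to "has a `.chunks` attribute", and the chunked branch body is omitted just as in the excerpt above.

```python
import numpy as np


class FakeIndexedAdapter:
    """Hypothetical stand-in for xarray's lazily indexed wrappers
    (ExplicitlyIndexed / ImplicitToExplicitIndexingAdapter)."""

    def __init__(self, array):
        self._array = array

    def get_duck_array(self):
        return np.asarray(self._array)


def is_chunked_array_sketch(data):
    # Simplified assumption: anything exposing `.chunks` counts as chunked.
    return hasattr(data, "chunks")


def is_duck_array_sketch(data):
    # Simplified duck-array test; np.ndarray short-circuits to True.
    if isinstance(data, np.ndarray):
        return True
    return all(hasattr(data, a) for a in ("ndim", "shape", "dtype", "__array_function__"))


def to_duck_array_sketch(data):
    # Same branch order as to_duck_array, with the chunked branch left out.
    if is_chunked_array_sketch(data):
        raise NotImplementedError("chunked branch omitted from this sketch")
    if isinstance(data, FakeIndexedAdapter):
        return data.get_duck_array()
    elif is_duck_array_sketch(data):
        return data
    else:
        return np.asarray(data)


arr = np.arange(5)
print(to_duck_array_sketch(arr) is arr)  # the very same object comes back
```

For an `np.ndarray` every check runs, and the result is the input object itself.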

### Why it matters

`Dataset.load()` calls `Variable.load()` once per variable, so on many-variable datasets (common with `open_mfdataset` / `concat` results) this overhead grows linearly with the number of variables. With #11351 the `is_chunked_array` portion is near-free, so this dispatch is the remaining per-variable cost on the same hot path.
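
A rough way to see the per-variable scaling is a micro-benchmark over many small in-memory arrays. This is a hedged sketch: `dispatch` and `load_all` are hypothetical stand-ins that only approximate the branch sequence (xarray's real checks differ), so the timings illustrate the shape of the cost, not xarray's actual numbers.

```python
import timeit

import numpy as np


def dispatch(data):
    # Hypothetical stand-in for the to_duck_array branch sequence on numpy input:
    # a chunked check, an adapter check, then the duck-array check.
    if hasattr(data, "chunks"):
        raise NotImplementedError("chunked branch not modelled")
    if hasattr(data, "get_duck_array"):
        return data.get_duck_array()
    if isinstance(data, np.ndarray):
        return data
    return np.asarray(data)


def load_all(datas, fast_path):
    # Models a Dataset.load()-style loop: one "load" per variable.
    out = []
    for d in datas:
        if fast_path and isinstance(d, np.ndarray):
            out.append(d)  # proposed fast path: skip dispatch entirely
        else:
            out.append(dispatch(d))
    return out


# 1,000 small in-memory "variables", as a many-variable dataset might have.
datas = [np.zeros(3) for _ in range(1000)]

slow = timeit.timeit(lambda: load_all(datas, fast_path=False), number=200)
fast = timeit.timeit(lambda: load_all(datas, fast_path=True), number=200)
print(f"dispatch: {slow:.4f}s  fast path: {fast:.4f}s")
```

For real numbers, the same comparison would be made by timing `ds.load()` on a `Dataset` with many small numpy-backed variables before and after the change.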

### Proposed fix

Add a numpy fast path at the top of `Variable.load()` and `Variable.load_async()`:

```python
def load(self, **kwargs) -> Self:
    # Fast path: in-memory numpy data is already loaded; skip to_duck_array dispatch.
    if isinstance(self._data, np.ndarray):
        return self
    self._data = to_duck_array(self._data, **kwargs)
    return self
```
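
The guard should be behaviour-preserving: per step 3 above, `is_duck_array` is `True` for an `np.ndarray`, so dispatch would have returned the same object anyway, and in the excerpt `**kwargs` only matters in the elided chunked branch. The sketch below checks that on a toy model; `ToyVariable` and `to_duck_array_stub` are hypothetical stand-ins, not xarray classes. It assumes ndarray subclasses such as `np.ma.MaskedArray` were already returned unchanged by dispatch, which is worth confirming in a test against the real `to_duck_array`.

```python
import numpy as np


def to_duck_array_stub(data, **kwargs):
    # Hypothetical stand-in for to_duck_array's non-chunked branches.
    if isinstance(data, np.ndarray):
        return data
    return np.asarray(data)


class ToyVariable:
    """Hypothetical minimal stand-in for xarray.Variable, holding only `_data`."""

    def __init__(self, data):
        self._data = data

    def load(self, **kwargs):
        # Proposed fast path: numpy data is already in memory, nothing to do.
        if isinstance(self._data, np.ndarray):
            return self
        self._data = to_duck_array_stub(self._data, **kwargs)
        return self


arr = np.arange(3)
v = ToyVariable(arr)
print(v.load()._data is arr)  # fast path: same array object, no dispatch

masked = np.ma.masked_array([1, 2, 3], mask=[False, True, False])
print(ToyVariable(masked).load()._data is masked)  # ndarray subclasses take it too

print(type(ToyVariable([1, 2, 3]).load()._data))  # non-numpy data still converted
```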

(Happy to open a follow-up PR. Filing here so the design can be discussed independently of the implementation.)

---

[This is Claude Code on behalf of Felix Bumann]
