Per-chunk path for interp(linear|nearest) on dask-chunked core dims by thodson-usgs · Pull Request #11312 · pydata/xarray

thodson-usgs · 2026-04-23T19:12:38Z

Description

When the core dim is chunked, interp(method="linear"|"nearest"|"slinear") currently goes through apply_ufunc(allow_rechunk=True), which concatenates the full interp axis per task. For each target point, this PR looks up the source chunk that contains its coord value and runs the interpolator over that chunk + a size-1 halo — so per-task memory scales with source_chunk + halo rather than the full axis.

import numpy as np, xarray as xr

src = xr.DataArray(
    np.random.default_rng(0).standard_normal((6000, 5000)),
    dims=("lat", "lon"),
    coords={"lat": np.linspace(-89.5, 89.5, 6000),
            "lon": np.linspace(-179.5, 179.5, 5000)},
).chunk({"lat": 60})                                  # 100 chunks; 240 MB/block post-rechunk > 128 MiB
out = src.interp(
    lat=np.linspace(-89.5, 89.5, 1500),
    lon=np.linspace(-179.5, 179.5, 1250),
    method="linear",
)
len(out.__dask_graph__()), out.compute()
# before: 41,106 tasks, ~2,700 ms     after: 1,126 tasks, ~370 ms

Cubic / multi-dim interpnd / non-monotonic coord / numpy input all fall back to the existing path unchanged. The fast path also requires the source to have more than one chunk along the interp dim; single-chunk-source cases still use the existing path.

Related: #9907 (the original report that motivated dask#11683; already closed but the same root cause), #10130 (open memory-OOM report — partial overlap; the single-chunk-source case there still uses the existing path and is better addressed by the dask-side guard in dask/dask#12360).

Companion dask PR (dask/dask#12360) addresses a narrower regression in apply_gufunc(allow_rechunk=True) that also blows up graphs on small inputs. The two fixes are complementary; either one improves the common case, both together cover both the "small input, huge graph" and "large input, huge memory" regimes.

Checklist

Tests added
User visible changes documented in `whats-new.rst`

AI Disclosure

This PR contains AI-generated content.
- I have tested any AI-generated content in my PR.
- I take responsibility for any AI-generated content in my PR.

Tools: Claude Code (Claude Opus 4.7).

Routes ``xarray.interp(method="linear"|"nearest"|"slinear")`` on a dask-chunked core dim through a per-chunk dispatch instead of ``apply_ufunc(..., allow_rechunk=True)``. For each target point, look up the source chunk that contains its coord value and run the interpolator over that chunk plus a size-1 halo. Per-task memory scales with ``source_chunk + halo`` rather than the full interp axis. Fall-back path preserves the existing behavior for cubic, multi-dim interpn, non-monotonic source coord, empty target, and numpy input. Verified against the existing apply_ufunc path on 200x400 -> 50x100 for several source-chunk layouts (bit-identical), on a 3D time-chunked input (time chunking preserved), and on the memory-constrained 6000x5000 case where the new path beats ``apply_ufunc`` by ~10x. The per-chunk path materializes 1D source coords (searchsorted-based routing); data stays lazy. ``test_dataset_interp_datetime_dask`` bumped its ``raise_if_dask_computes`` budget to account for this. Related: :issue:`9907` (already closed; same root cause) and :issue:`10130` (open; partial overlap — single-chunk-source cases still use the existing path, better addressed by the dask-side guard in dask/dask#12360). Co-Authored-By: Claude <noreply@anthropic.com>

thodson-usgs · 2026-04-24T01:44:19Z

Closing this. Benchmarking across a range of chunk sizes shows the speedup is concentrated in the small-chunk regime, not in typical workloads:

chunks along lat	chunk size	main (tasks / ms)	this PR (tasks / ms)
100	2.3 MiB	41,106 / 3825	1,126 / 131
10	22.9 MiB	516 / 141	116 / 97
4	57.2 MiB	114 / 88	46 / 111
2	128 MiB (auto)	71 / 105	51 / 142
1	229 MiB	17 / 180	17 / 153

At dask's default auto-chunking (~128 MiB), this PR shows no measurable improvement and is slightly slower; single-chunk sources fall back to the existing path. The big win is specifically when post-rechunk blocks are well below array.chunk-size — i.e., when apply_ufunc(allow_rechunk=True) is doing pathological graph construction rather than hitting a real memory wall.

That's a legitimate pain point (see #9907, #10130) but not wide enough to justify adding a second interp code path that needs its own maintenance. The motivating memory-OOM scenario is better addressed by guarding the rechunk itself (something like dask/dask#12360, also closed for now) or by guidance for users to pick reasonable chunk sizes before calling .interp().

Happy to revisit with a narrower framing if this keeps hitting real workloads.

[This is Claude Code on behalf of Tim Hodson]

thodson-usgs mentioned this pull request Apr 23, 2026

Skip apply_gufunc compensation when post-rechunk block fits array.chunk-size dask/dask#12360

Closed

2 tasks

thodson-usgs force-pushed the fix/interp-map-overlap branch 2 times, most recently from b839f7e to 57b096d Compare April 23, 2026 23:30

thodson-usgs force-pushed the fix/interp-map-overlap branch from 08ab770 to beb87cf Compare April 24, 2026 01:13

thodson-usgs mentioned this pull request Apr 24, 2026

Simplify per-chunk interp1d dask path thodson-usgs/xarray#1

Closed

2 tasks

thodson-usgs closed this Apr 24, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Per-chunk path for interp(linear|nearest) on dask-chunked core dims#11312

Per-chunk path for interp(linear|nearest) on dask-chunked core dims#11312
thodson-usgs wants to merge 1 commit into
pydata:mainfrom
thodson-usgs:fix/interp-map-overlap

thodson-usgs commented Apr 23, 2026 •

edited

Loading

Uh oh!

thodson-usgs commented Apr 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

thodson-usgs commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Checklist

AI Disclosure

Uh oh!

thodson-usgs commented Apr 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

thodson-usgs commented Apr 23, 2026 •

edited

Loading