Skip to content

Per-chunk path for interp(linear|nearest) on dask-chunked core dims#11312

Closed
thodson-usgs wants to merge 1 commit into
pydata:mainfrom
thodson-usgs:fix/interp-map-overlap
Closed

Per-chunk path for interp(linear|nearest) on dask-chunked core dims#11312
thodson-usgs wants to merge 1 commit into
pydata:mainfrom
thodson-usgs:fix/interp-map-overlap

Conversation

@thodson-usgs
Copy link
Copy Markdown
Contributor

@thodson-usgs thodson-usgs commented Apr 23, 2026

Description

When the core dim is chunked, interp(method="linear"|"nearest"|"slinear") currently goes through apply_ufunc(allow_rechunk=True), which concatenates the full interp axis per task. For each target point, this PR looks up the source chunk that contains its coord value and runs the interpolator over that chunk + a size-1 halo — so per-task memory scales with source_chunk + halo rather than the full axis.

import numpy as np, xarray as xr

src = xr.DataArray(
    np.random.default_rng(0).standard_normal((6000, 5000)),
    dims=("lat", "lon"),
    coords={"lat": np.linspace(-89.5, 89.5, 6000),
            "lon": np.linspace(-179.5, 179.5, 5000)},
).chunk({"lat": 60})                                  # 100 chunks; 240 MB/block post-rechunk > 128 MiB
out = src.interp(
    lat=np.linspace(-89.5, 89.5, 1500),
    lon=np.linspace(-179.5, 179.5, 1250),
    method="linear",
)
len(out.__dask_graph__()), out.compute()
# before: 41,106 tasks, ~2,700 ms     after: 1,126 tasks, ~370 ms

Cubic / multi-dim interpnd / non-monotonic coord / numpy input all fall back to the existing path unchanged. The fast path also requires the source to have more than one chunk along the interp dim; single-chunk-source cases still use the existing path.

Related: #9907 (the original report that motivated dask#11683; already closed but the same root cause), #10130 (open memory-OOM report — partial overlap; the single-chunk-source case there still uses the existing path and is better addressed by the dask-side guard in dask/dask#12360).

Companion dask PR (dask/dask#12360) addresses a narrower regression in apply_gufunc(allow_rechunk=True) that also blows up graphs on small inputs. The two fixes are complementary; either one improves the common case, both together cover both the "small input, huge graph" and "large input, huge memory" regimes.

Checklist

  • Tests added
  • User visible changes documented in `whats-new.rst`

AI Disclosure

  • This PR contains AI-generated content.
    • I have tested any AI-generated content in my PR.
    • I take responsibility for any AI-generated content in my PR.

Tools: Claude Code (Claude Opus 4.7).

Routes ``xarray.interp(method="linear"|"nearest"|"slinear")`` on a
dask-chunked core dim through a per-chunk dispatch instead of
``apply_ufunc(..., allow_rechunk=True)``. For each target point, look up
the source chunk that contains its coord value and run the interpolator
over that chunk plus a size-1 halo. Per-task memory scales with
``source_chunk + halo`` rather than the full interp axis.

Fall-back path preserves the existing behavior for cubic, multi-dim
interpn, non-monotonic source coord, empty target, and numpy input.

Verified against the existing apply_ufunc path on 200x400 -> 50x100 for
several source-chunk layouts (bit-identical), on a 3D time-chunked input
(time chunking preserved), and on the memory-constrained 6000x5000 case
where the new path beats ``apply_ufunc`` by ~10x.

The per-chunk path materializes 1D source coords (searchsorted-based
routing); data stays lazy. ``test_dataset_interp_datetime_dask`` bumped
its ``raise_if_dask_computes`` budget to account for this.

Related: :issue:`9907` (already closed; same root cause) and
:issue:`10130` (open; partial overlap — single-chunk-source cases still
use the existing path, better addressed by the dask-side guard in
dask/dask#12360).

Co-Authored-By: Claude <noreply@anthropic.com>
@thodson-usgs
Copy link
Copy Markdown
Contributor Author

Closing this. Benchmarking across a range of chunk sizes shows the speedup is concentrated in the small-chunk regime, not in typical workloads:

chunks along lat chunk size main (tasks / ms) this PR (tasks / ms)
100 2.3 MiB 41,106 / 3825 1,126 / 131
10 22.9 MiB 516 / 141 116 / 97
4 57.2 MiB 114 / 88 46 / 111
2 128 MiB (auto) 71 / 105 51 / 142
1 229 MiB 17 / 180 17 / 153

At dask's default auto-chunking (~128 MiB), this PR shows no measurable improvement and is slightly slower; single-chunk sources fall back to the existing path. The big win is specifically when post-rechunk blocks are well below array.chunk-size — i.e., when apply_ufunc(allow_rechunk=True) is doing pathological graph construction rather than hitting a real memory wall.

That's a legitimate pain point (see #9907, #10130) but not wide enough to justify adding a second interp code path that needs its own maintenance. The motivating memory-OOM scenario is better addressed by guarding the rechunk itself (something like dask/dask#12360, also closed for now) or by guidance for users to pick reasonable chunk sizes before calling .interp().

Happy to revisit with a narrower framing if this keeps hitting real workloads.

[This is Claude Code on behalf of Tim Hodson]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant