Per-chunk path for interp(linear|nearest) on dask-chunked core dims#11312
Per-chunk path for interp(linear|nearest) on dask-chunked core dims#11312thodson-usgs wants to merge 1 commit into
Conversation
b839f7e to
57b096d
Compare
Routes ``xarray.interp(method="linear"|"nearest"|"slinear")`` on a dask-chunked core dim through a per-chunk dispatch instead of ``apply_ufunc(..., allow_rechunk=True)``. For each target point, look up the source chunk that contains its coord value and run the interpolator over that chunk plus a size-1 halo. Per-task memory scales with ``source_chunk + halo`` rather than the full interp axis. Fall-back path preserves the existing behavior for cubic, multi-dim interpn, non-monotonic source coord, empty target, and numpy input. Verified against the existing apply_ufunc path on 200x400 -> 50x100 for several source-chunk layouts (bit-identical), on a 3D time-chunked input (time chunking preserved), and on the memory-constrained 6000x5000 case where the new path beats ``apply_ufunc`` by ~10x. The per-chunk path materializes 1D source coords (searchsorted-based routing); data stays lazy. ``test_dataset_interp_datetime_dask`` bumped its ``raise_if_dask_computes`` budget to account for this. Related: :issue:`9907` (already closed; same root cause) and :issue:`10130` (open; partial overlap — single-chunk-source cases still use the existing path, better addressed by the dask-side guard in dask/dask#12360). Co-Authored-By: Claude <noreply@anthropic.com>
08ab770 to
beb87cf
Compare
|
Closing this. Benchmarking across a range of chunk sizes shows the speedup is concentrated in the small-chunk regime, not in typical workloads:
At dask's default auto-chunking (~128 MiB), this PR shows no measurable improvement and is slightly slower; single-chunk sources fall back to the existing path. The big win is specifically when post-rechunk blocks are well below That's a legitimate pain point (see #9907, #10130) but not wide enough to justify adding a second interp code path that needs its own maintenance. The motivating memory-OOM scenario is better addressed by guarding the rechunk itself (something like dask/dask#12360, also closed for now) or by guidance for users to pick reasonable chunk sizes before calling Happy to revisit with a narrower framing if this keeps hitting real workloads. [This is Claude Code on behalf of Tim Hodson] |
Description
When the core dim is chunked,
interp(method="linear"|"nearest"|"slinear")currently goes throughapply_ufunc(allow_rechunk=True), which concatenates the full interp axis per task. For each target point, this PR looks up the source chunk that contains its coord value and runs the interpolator over that chunk + a size-1 halo — so per-task memory scales withsource_chunk + halorather than the full axis.Cubic / multi-dim
interpnd/ non-monotonic coord / numpy input all fall back to the existing path unchanged. The fast path also requires the source to have more than one chunk along the interp dim; single-chunk-source cases still use the existing path.Related: #9907 (the original report that motivated dask#11683; already closed but the same root cause), #10130 (open memory-OOM report — partial overlap; the single-chunk-source case there still uses the existing path and is better addressed by the dask-side guard in dask/dask#12360).
Companion dask PR (dask/dask#12360) addresses a narrower regression in
apply_gufunc(allow_rechunk=True)that also blows up graphs on small inputs. The two fixes are complementary; either one improves the common case, both together cover both the "small input, huge graph" and "large input, huge memory" regimes.Checklist
AI Disclosure
Tools: Claude Code (Claude Opus 4.7).