Improve interp performance #4069

fujiisoup · 2020-05-16T04:23:47Z

Closes DataArray.interp() : poor performance #2223
Passes isort -rc . && black . && mypy . && flake8
Fully documented, including whats-new.rst for all changes and api.rst for new API

n-dimensional interp now interpolates sequentially, one dimension at a time, where possible.
This may speed up some cases.
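
As a rough illustration of the idea (not the PR's code), the sketch below uses plain NumPy and SciPy to show that, for an orthogonal destination and linear interpolation, one n-dimensional `interpn` call and a chain of 1-D interpolations give the same result. The arrays and variable names are made up for the example.

```python
import numpy as np
from scipy.interpolate import interp1d, interpn

# Source data on a rectilinear (x, y) grid (illustrative values only).
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 10.0, 20.0, 30.0])
values = np.arange(12.0).reshape(3, 4) ** 2

# Orthogonal destination: every x_new is combined with every y_new.
x_new = np.array([0.25, 1.5])
y_new = np.array([5.0, 12.0, 27.0])

# n-dimensional route: expand the destination to all (x, y) pairs first.
xx, yy = np.meshgrid(x_new, y_new, indexing="ij")
xi = np.stack([xx.ravel(), yy.ravel()], axis=-1)
nd_result = interpn((x, y), values, xi, method="linear").reshape(xx.shape)

# Sequential route: one 1-D linear interpolation per axis.
step1 = interp1d(x, values, axis=0)(x_new)      # shape (2, 4)
seq_result = interp1d(y, step1, axis=1)(y_new)  # shape (2, 3)

print(np.allclose(nd_result, seq_result))
```

The equivalence holds here because multilinear interpolation on a rectilinear grid is a product of 1-D linear interpolations; it is not claimed for arbitrary methods.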

fujiisoup · 2020-05-17T20:56:34Z

Maybe I'll merge this in a few days.

dcherian

Thanks @fujiisoup

dcherian · 2020-05-18T14:29:17Z

doc/whats-new.rst

@@ -34,6 +34,11 @@ Breaking changes
  (:pull:`3274`)
  By `Elliott Sales de Andrade <https://github.com/QuLogic>`_

+Enhancements
+~~~~~~~~~~~~
+- Performance improvement of :py:meth:`DataArray.interp` and :py:func:`Dataset.interp` (:issue:`2223`)


Can we add one more line describing the improvement?

dcherian · 2020-05-18T14:39:41Z

xarray/tests/test_interp.py

+
+
+@requires_scipy
+def test_decompose():


Can we test both linear and nearest methods?
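
One way such a test could look is sketched below. It is not the PR's `test_decompose`; the helper name `check_decomposed_interp` and the data are hypothetical. It compares a single multi-dimensional `interp` call with chained 1-D calls for both methods, using query points chosen away from midpoints so that nearest-neighbour ties do not matter.

```python
import numpy as np
import xarray as xr


def check_decomposed_interp(method):
    # Hypothetical helper for illustration; values are da[x, y] = 4 * x + y.
    da = xr.DataArray(
        np.arange(12.0).reshape(3, 4),
        dims=["x", "y"],
        coords={"x": [0.0, 1.0, 2.0], "y": [0.0, 1.0, 2.0, 3.0]},
    )
    x_new = [0.2, 1.4]
    y_new = [0.3, 1.1, 2.8]

    # One call over both dimensions versus one call per dimension.
    combined = da.interp(x=x_new, y=y_new, method=method)
    chained = da.interp(x=x_new, method=method).interp(y=y_new, method=method)

    xr.testing.assert_allclose(combined, chained)
    return combined


results = {m: check_decomposed_interp(m) for m in ("linear", "nearest")}
```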

dcherian · 2020-05-18T14:44:02Z

xarray/core/missing.py

+        len(indexes_coords) > 1
+        and method in ["linear", "nearest"]
+        and all(dest[1].ndim == 1 for dest in indexes_coords.values())
+        and len(set([d[1].dims[0] for d in indexes_coords.values()]))


This condition is confusing me. This will not speed up this case:

da = xr.DataArray(
    np.arange(6).reshape(3, 2),
    dims=["x", "y"],
    coords={"x": [0, 1, 2], "y": [-0.1, -0.3]},
)
x_new = xr.DataArray([0.5, 1.5, 2.5], dims=["x1"])
y_new = xr.DataArray([-0.15, -0.25, -0.35], dims=["x1"])
da.interp(x=x_new, y=y_new)

Correct? Is that intentional?

fujiisoup · 2020-05-18

Thanks @dcherian
It was intentional, but you are right. This case can also be improved.

da.interp(x=x_new, y=y_new)

should be equivalent to

da.interp(x=x_new).interp(y=y_new)

But then I'm a bit confused about which case should be improved.
For example, if len(x_new) = len(y_new) = 1000000, then the original implementation may be faster, although this is a rare use case.
Maybe we can use some heuristics, such as

len(x_new) < len(x)

?

fujiisoup · 2020-05-24

I'm now thinking that the simpler behavior is better: for an orthogonal interpolation we interpolate sequentially, and otherwise we use interpn.
Further improvements can be made upstream.
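
The rule described above could be sketched roughly as follows. This is one reading of "orthogonal" (every destination coordinate is 1-D and no two destinations share a dimension), not the condition actually merged. `is_orthogonal`, `choose_strategy`, and the simplified `destinations` mapping are hypothetical names introduced for the example.

```python
def is_orthogonal(destinations):
    """Return True if every destination is 1-D and no two share a dimension.

    `destinations` maps a source dimension name to the tuple of dimension
    names of its new coordinate (a simplified, hypothetical stand-in for
    the data structure used in the PR).
    """
    if len(destinations) < 2:
        return False  # nothing to decompose for a single dimension
    if any(len(dims) != 1 for dims in destinations.values()):
        return False
    dest_dims = [dims[0] for dims in destinations.values()]
    return len(set(dest_dims)) == len(dest_dims)


def choose_strategy(destinations, method="linear"):
    # Assumption: only "linear" and "nearest" are decomposed, as in the diff.
    if method in ("linear", "nearest") and is_orthogonal(destinations):
        return "sequential 1-D"
    return "single n-D call"


# Orthogonal destination: x and y each get their own dimension.
print(choose_strategy({"x": ("x",), "y": ("y",)}))
# Pointwise destination: both new coordinates share the dimension "x1".
print(choose_strategy({"x": ("x1",), "y": ("x1",)}))
```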

dcherian · 2020-05-24

Sounds good.

fujiisoup · 2020-05-25

da = xr.DataArray(
    np.arange(6).reshape(3, 2),
    dims=["x", "y"],
    coords={"x": [0, 1, 2], "y": [-0.1, -0.3]},
)
x_new = xr.DataArray([0.5, 1.5, 2.5], dims=["x1"])
y_new = xr.DataArray([-0.15, -0.25, -0.35], dims=["x1"])
da.interp(x=x_new, y=y_new)

It looks like this case is not slow even with our current code.
The problem arises when the final destination is a regular grid, where interpn has to repeat the computation many times.
So this PR should probably work well enough for this case.
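
To make the difference between the two cases concrete, the sketch below counts the query points in each. It assumes the n-dimensional routine receives its queries as an array of shape `(npoints, ndim)`, as SciPy's `interpn` does; how xarray builds that array internally is not shown here, and the sizes are made up.

```python
import numpy as np

x_new = np.linspace(0.0, 2.0, 200)
y_new = np.linspace(-0.3, -0.1, 300)

# Pointwise destination (new coordinates share one dimension, as in the
# x1 example): the query is just the paired points.
n_pairs = 3
pointwise_xi = np.stack([x_new[:n_pairs], y_new[:n_pairs]], axis=-1)

# Regular-grid destination: an n-D routine needs every (x, y) combination.
xx, yy = np.meshgrid(x_new, y_new, indexing="ij")
grid_xi = np.stack([xx.ravel(), yy.ravel()], axis=-1)

print(pointwise_xi.shape)  # (3, 2)
print(grid_xi.shape)       # (60000, 2)
```

The grid case grows with the product of the destination sizes, which is where interpolating one dimension at a time can help.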

fujiisoup · 2020-05-25T00:09:39Z

I'll merge this tomorrow.

* upstream/master:
  Improve interp performance (pydata#4069)
  Auto chunk (pydata#4064)
  xr.cov() and xr.corr() (pydata#4089)
  allow multiindex levels in plots (pydata#3938)
  Fix bool weights (pydata#4075)
  fix dangerous default arguments (pydata#4006)

fujiisoup added 2 commits May 16, 2020 13:21

Fixes 2223

b81a624

more tests

7c1919f

fujiisoup added 3 commits May 18, 2020 07:11

add @requires_scipy to test

4a4e295

fix tests

238a08d

black

200dacc

dcherian reviewed May 18, 2020

update whatsnew. Added a test for nearest

1a7d738

fujiisoup merged commit d1f7cb8 into pydata:master May 25, 2020

fujiisoup deleted the improve_interp branch May 25, 2020 20:02

slevang mentioned this pull request Jul 18, 2022

interp performance with chunked dimensions #6799

Open
