
polyval: Use Horner's algorithm + support chunked inputs #6548

Merged: 38 commits into pydata:main on May 5, 2022

Conversation

@headtr1ck (Collaborator) commented Apr 30, 2022

@headtr1ck (Collaborator, Author) commented Apr 30, 2022

Several points are still open:

  1. Unit tests for datetime values are failing; I might need some help with that, since I have no idea what datetimes mean for polynomials.
  2. The algorithm should also work with Datasets (any combination of DataArray and Dataset for the coord and coeffs inputs). This still needs to be checked and tested. (How does one define typing for such cases, i.e. DataArray + Dataset -> Dataset but DataArray + DataArray -> DataArray?)
  3. It uses Horner's method instead of a Vandermonde matrix, which should be faster and consume less memory (unless the overhead of sorting the index, isel, etc. is too large). Some performance comparisons should probably be done.
  4. Instead of coord, the input should simply be called x or similar; however, this would break backwards compatibility, so maybe we just leave it as is.
  5. I had to add a copy(deep=True) since the broadcast returned a read-only DataArray. Any better ideas?
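For reference, Horner's method (point 3) evaluates c[0] + c[1]*x + ... + c[n]*x**n by folding in one coefficient per step instead of materializing the powers of x. A minimal NumPy sketch, independent of the actual xarray implementation:

```python
import numpy as np

def horner(x, coeffs):
    # Evaluate c[0] + c[1]*x + ... + c[n]*x**n with Horner's scheme:
    # start from the highest-degree coefficient and fold in one per step.
    res = np.full_like(np.asarray(x, dtype=float), coeffs[-1])
    for c in reversed(coeffs[:-1]):
        res = res * x + c
    return res

print(horner(np.array([0.0, 1.0, 2.0]), [1.0, 2.0, 3.0]))  # [ 1.  6. 17.]
```

Only n multiplications and n additions are needed, and the running result can be updated in place.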

@headtr1ck (Collaborator, Author):
I noticed that broadcasting Datasets behaves weirdly (see #6549), so I used a "hack" of adding a 0-valued DataArray/Dataset.
Does anyone have a better idea?
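As a plain-NumPy illustration of the idea behind the hack (not the actual xarray code): adding a 0-valued array of the right shape changes only the shape of the result, not its values, so it can stand in for a missing broadcast operation.

```python
import numpy as np

a = np.arange(3.0)        # shape (3,)
zeros = np.zeros((2, 1))  # 0-valued array carrying the extra dimension
broadcasted = a + zeros   # shape (2, 3); values are unchanged copies of a
print(broadcasted.shape)  # (2, 3)
```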

@max-sixty (Collaborator):

This looks like excellent code @headtr1ck, especially so for a first PR. Welcome!

so I used a "hack" of adding a 0-valued DataArray/Dataset.

I think it's fine to do the hack and reference that issue; we can clean it up then (though if others have ideas then great).

I had to add a copy(deep=True) since the broadcast returned a read-only DataArray. Any better ideas?

IIRC we generally leave this up to the caller (and generally discourage mutation).


Someone who knows the algorithm better should review it and opine on the datetime issue; I agree, I'm not sure whether we need to support datetimes.

@headtr1ck (Collaborator, Author):

Some performance comparison:
With 5th order polynomial and 10 x-values:
old: 1.05 ms ± 15.8 µs per loop
new: 1.41 ms ± 11.6 µs per loop

With 5th order polynomial and 10000 x-values:
old: 1.46 ms ± 10.5 µs per loop
new: 1.41 ms ± 14.5 µs per loop

With 5th order polynomial and 1 million x-values:
old: 65.1 ms ± 332 µs per loop
new: 6.99 ms ± 168 µs per loop

As expected, the new method adds some overhead for small arrays, but for larger arrays the speedup is quite nice.
It also uses in-place operations, with much lower memory usage.
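A rough, self-contained harness in plain NumPy can reproduce this kind of comparison (this is not the exact benchmark used above; `np.polynomial.polynomial.polyval`, which uses Horner's scheme internally, stands in for the new implementation):

```python
import numpy as np
from timeit import timeit

x = np.random.randn(1_000_000)
c = np.random.randn(6)  # 5th-order polynomial

def vandermonde(x, c):
    # Old approach: materialize the full (n, 6) Vandermonde matrix.
    return np.vander(x, len(c), increasing=True) @ c

def horner(x, c):
    # New approach: numpy's polyval evaluates via Horner's scheme.
    return np.polynomial.polynomial.polyval(x, c)

assert np.allclose(vandermonde(x, c), horner(x, c))
print(f"vander: {timeit(lambda: vandermonde(x, c), number=10):.3f}s")
print(f"horner: {timeit(lambda: horner(x, c), number=10):.3f}s")
```

The Vandermonde variant allocates an n-by-(deg+1) temporary, which explains the memory difference for large inputs.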

@headtr1ck (Collaborator, Author):

I added rough support for datetime values. Someone with more knowledge of handling them should take a look; the code seems too complicated and I am sure there is a cleverer solution (I could not use get_clean_interp_index since the coordinate is not an index anymore).

I agree I'm not sure whether we need to support them.

I think keeping support is nice, since datetimes are commonly occurring coordinates and we do not want to break anything if possible.
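The underlying idea is that polynomials need numeric x-values, so datetime coordinates have to be converted to numeric offsets first. A simplified stand-in for xarray's internal datetime_to_numeric (not the code in this PR):

```python
import numpy as np

def datetime_to_float(times, offset=None):
    # Convert datetimes to float nanosecond offsets from a reference
    # time, so they can be fed into polynomial evaluation.
    times = np.asarray(times, dtype="datetime64[ns]")
    if offset is None:
        offset = times.min()
    return (times - offset) / np.timedelta64(1, "ns")

t = np.array(["2022-05-01", "2022-05-02"], dtype="datetime64[D]")
print(datetime_to_float(t))  # [0.0e+00 8.64e+13]  (one day = 86400e9 ns)
```

The choice of reference time matters for polyfit/polyval round-trips: the coefficients are only valid with respect to the same offset.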

@dcherian (Contributor) left a comment:

Thanks @headtr1ck. What a great PR! This looks like a great improvement; the tests are certainly more readable.

I've left some minor comments.

One optional request: since you seem to have benchmark code, can you add it to benchmarks/polyfit.py (new file)? If this is too much, just add the code in a comment here and I'll send in a followup PR. We have some documentation on using asv here.

@max-sixty (Collaborator):

The benchmark did not succeed since the inputs are not compatible with the old algorithm...
Do we change it so that it is compatible?

If you're confident this is faster and it's not a trivial amount of work to adjust them, I would leave it...

@headtr1ck (Collaborator, Author) commented May 2, 2022

Edit: never mind, that was just confusing output while the benchmark was failing. Now the benchmark looks good :)

This is my first time working with asv...
It seems that module-level variables affect all other peakmem tests (I guess the memory usage of the whole Python process is measured).

We should refactor all DataArrays into the setup functions; otherwise O(n)-memory algorithms will report wrong numbers, and adding new tests will show regressions on unrelated tests.
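For illustration, an asv-style benchmark class with the data allocated in setup() rather than at module level (the class and method names are hypothetical, and plain NumPy stands in for xarray here):

```python
import numpy as np

class Polyval:
    # Sketch of an asv benchmark. Allocating the arrays in setup()
    # instead of at module import time keeps each peakmem_* measurement
    # from being polluted by data created for other benchmarks, since
    # asv measures the peak memory of the whole process.
    def setup(self):
        self.x = np.random.randn(10_000)
        self.coeffs = np.random.randn(6)  # 5th-order polynomial

    def time_polyval(self):
        np.polynomial.polynomial.polyval(self.x, self.coeffs)

    def peakmem_polyval(self):
        np.polynomial.polynomial.polyval(self.x, self.coeffs)
```

asv calls setup() before each timed/measured method, so the allocation cost itself is excluded from the results.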

@dcherian (Contributor) commented May 2, 2022

Nice, this looks like an improvement for everything other than dask arrays with only 100 elements, which is not a good use case for dask anyway.

I was slightly concerned that the recursive algorithm wouldn't work well with dask, but it actually seems to work better.

import dask.array
import numpy as np
import xarray as xr

def other_polyval(coord, coeffs, degree_dim="degree"):
    # Old-style polyval for comparison: materialize the full Vandermonde
    # matrix and contract it against the coefficients with xr.dot.
    x = coord.data

    deg_coord = coeffs[degree_dim]
    N = int(deg_coord.max()) + 1

    lhs = xr.DataArray(
        np.stack([x ** (N - 1 - i) for i in range(N)], axis=1),
        dims=(coord.name, degree_dim),
        coords={
            coord.name: coord.data,
            degree_dim: np.arange(deg_coord.max() + 1)[::-1],
        },
    )
    return xr.dot(lhs, coeffs, dims=degree_dim)


coeffs = xr.DataArray(
    np.random.randn(2), dims="degree", coords={"degree": np.arange(2)}
)
da = xr.DataArray(
    dask.array.random.random((10**6), chunks=(10000)), dims="x", name="x"
)
print(len(da.data.dask))                         # 100
print(len(xr.polyval(da, coeffs).data.dask))     # 502
print(len(other_polyval(da, coeffs).data.dask))  # 1005

@dcherian dcherian added the plan to merge Final call for comments label May 2, 2022
@dcherian dcherian changed the title new polyval algo polyval: Use Horner's algorithm + support chunked inputs May 2, 2022
@headtr1ck (Collaborator, Author) commented May 3, 2022

One minor open point: what to do with a non-integer "degree" index?
A float type could be cast to integer (that's what is happening now),
but a (nonsensical) datetime etc. should probably raise an error?

@headtr1ck (Collaborator, Author) commented May 4, 2022

Personally, I would allow coeffs without an explicit index, since I am a lazy person and would like to do coeffs = xr.DataArray([1, 2], dims="degree").
But I guess with the new indexing system you want to encourage people to use them.

In any case, I am happy with this code and look forward to using it in my projects :)

@dcherian dcherian merged commit 6fbeb13 into pydata:main May 5, 2022
@dcherian (Contributor) commented May 5, 2022

Forcing the user to be explicit reduces bugs and user support requests :) so we like to do that.

Thanks again @headtr1ck this is a great PR!

@max-sixty (Collaborator):

Thanks @headtr1ck !

dcherian added a commit to dcherian/xarray that referenced this pull request May 20, 2022
* main: (24 commits)
  Fix overflow issue in decode_cf_datetime for dtypes <= np.uint32 (pydata#6598)
  Enable flox in GroupBy and resample (pydata#5734)
  Add setuptools as dependency in ASV benchmark CI (pydata#6609)
  change polyval dim ordering (pydata#6601)
  re-add timedelta support for polyval (pydata#6599)
  Minor Dataset.map docstr clarification (pydata#6595)
  New inline_array kwarg for open_dataset (pydata#6566)
  Fix polyval overloads (pydata#6593)
  Restore old MultiIndex dropping behaviour (pydata#6592)
  [docs] add Dataset.assign_coords example (pydata#6336) (pydata#6558)
  Fix zarr append dtype checks (pydata#6476)
  Add missing space in exception message (pydata#6590)
  Doc Link to accessors list in extending-xarray.rst (pydata#6587)
  Fix Dataset/DataArray.isel with drop=True and scalar DataArray indexes (pydata#6579)
  Add some warnings about rechunking to the docs (pydata#6569)
  [pre-commit.ci] pre-commit autoupdate (pydata#6584)
  terminology.rst: fix link to Unidata's "netcdf_dataset_components" (pydata#6583)
  Allow string formatting of scalar DataArrays (pydata#5981)
  Fix mypy issues & reenable in tests (pydata#6581)
  polyval: Use Horner's algorithm + support chunked inputs (pydata#6548)
  ...
dcherian added a commit to headtr1ck/xarray that referenced this pull request May 20, 2022
commit 398f1b6
Author: dcherian <deepak@cherian.net>
Date:   Fri May 20 08:47:56 2022 -0600

    Backward compatibility dask

commit bde40e4
Merge: 0783df3 4cae8d0
Author: dcherian <deepak@cherian.net>
Date:   Fri May 20 07:54:48 2022 -0600

    Merge branch 'main' into dask-datetime-to-numeric

    * main:
      concatenate docs style (pydata#6621)
      Typing for open_dataset/array/mfdataset and to_netcdf/zarr (pydata#6612)
      {full,zeros,ones}_like typing (pydata#6611)

commit 0783df3
Merge: 5cff4f1 8de7061
Author: dcherian <deepak@cherian.net>
Date:   Sun May 15 21:03:50 2022 -0600

    Merge branch 'main' into dask-datetime-to-numeric

    * main: (same 24 commits as listed above)

commit 5cff4f1
Merge: dfe200d 6144c61
Author: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>
Date:   Sun May 1 15:16:33 2022 -0700

    Merge branch 'main' into dask-datetime-to-numeric

commit dfe200d
Author: dcherian <deepak@cherian.net>
Date:   Sun May 1 11:04:03 2022 -0600

    Minor cleanup

commit 35ed378
Author: dcherian <deepak@cherian.net>
Date:   Sun May 1 10:57:36 2022 -0600

    Support dask arrays in datetime_to_numeric
Labels: plan to merge (Final call for comments), run-benchmark (Run the ASV benchmark workflow)

Successfully merging this pull request may close these issues: xr.polyval first arg requires name attribute

4 participants