flox performance regression for cftime resampling #7730
Comments
Thanks, can you add version info please?
Of course - I added it above.
The slowness is basically a bunch of copies happening in
And then there's xarray-contrib/flox#222
Also because your groups are sorted,
gb = da.groupby("time.year")
# using max
xr.set_options(use_flox=True)
%timeit gb.max("time")
%timeit gb.max("time", engine="flox")
xr.set_options(use_flox=False)
%timeit gb.max("time")
Thanks for looking into this!
* align: Avoid reindexing when join="exact" xref #7730
* Aligner.copy always copies even with join="exact"
* Move logic to Aligner.align
* Add whats-new
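For context on why reindexing can be skipped in that case: with `join="exact"`, alignment only succeeds when the indexes already match, so the aligned objects carry the same labels as the inputs. A small sketch of that behavior, using hypothetical arrays `a`, `b`, and `c` that are not part of the commit:

```python
import numpy as np
import xarray as xr

# Hypothetical arrays for illustration: a and b share an index, c does not.
a = xr.DataArray(np.arange(3), dims="x", coords={"x": [0, 1, 2]})
b = xr.DataArray(np.arange(3) * 10, dims="x", coords={"x": [0, 1, 2]})
c = xr.DataArray(np.arange(3), dims="x", coords={"x": [1, 2, 3]})

# Matching indexes: alignment succeeds and the labels are unchanged.
a2, b2 = xr.align(a, b, join="exact")

# Mismatched indexes: join="exact" raises instead of reindexing.
try:
    xr.align(a, c, join="exact")
    raised = False
except ValueError:
    raised = True

print("exact alignment of mismatched indexes raised:", raised)
```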
Thanks for the report! I think we should add your example as a benchmark.
* [skip-ci] Add cftime groupby, resample benchmarks xref #7730
* [skip-ci] try setting temp dir
* [skip-ci] try mamba?
* [skip-ci] increase conda verbosity
* [skip-ci] specify channels
* [skip-ci] Update .github/workflows/benchmarks.yml
* [skip-ci] bugfix
* [skip-ci] Parameterize use_flox
* [skip-ci] cleanup
* [skip-ci] fixes
* [skip-ci] fix resample parameterizing

* Optimize broadcasting xref pydata/xarray#7730
* reorder
* Fix tests
* Another optimization
* fixes
* fix
I looked into this a bit today and I think the performance regression comes from using
Yes, this is a known slowness. numpy did improve somewhat in 1.25, but I'm not sure whether our typical workloads are affected. See numpy/numpy#23176 for remaining work.
What happened?
Running an in-memory groupby operation took much longer than expected. Turning off flox fixed this - but I don't think that's the idea ;-)
What did you expect to happen?
flox to be at least on par with our naive implementation
Minimal Complete Verifiable Example
MVCE confirmation
Relevant log output
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: f8127fc
python: 3.10.10 | packaged by conda-forge | (main, Mar 24 2023, 20:08:06) [GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-69-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.1
xarray: main
pandas: 1.5.3
numpy: 1.23.5
scipy: 1.10.1
netCDF4: 1.6.3
pydap: installed
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: 2.14.2
cftime: 1.6.2
nc_time_axis: 1.4.1
PseudoNetCDF: 3.2.2
iris: 3.4.1
bottleneck: 1.3.7
dask: 2023.3.2
distributed: 2023.3.2.1
matplotlib: 3.7.1
cartopy: 0.21.1
seaborn: 0.12.2
numbagg: 0.2.2
fsspec: 2023.3.0
cupy: None
pint: 0.20.1
sparse: 0.14.0
flox: 0.6.10
numpy_groupies: 0.9.20
setuptools: 67.6.1
pip: 23.0.1
conda: None
pytest: 7.2.2
mypy: None
IPython: 8.12.0
sphinx: None