BUG: Future time decoding #5050
Conversation
@znicholls oof, this was an oversight on my part. I naively took the tests passing, and the fact that the upstream issue in pandas had been fixed, to mean that we did not need this anymore. As you noted, it turns out we still do.
Basically the issue stems from this optimization:
Lines 190 to 194 in a6f51c6:

```python
# Cast input ordinals to integers of nanoseconds because pd.to_timedelta
# works much faster when dealing with integers (GH 1399).
flat_num_dates_ns_int = (flat_num_dates * _NS_PER_TIME_DELTA[delta]).astype(
    np.int64
)
```
If the encoded values are too large to be represented with a timedelta64[ns]
type, this line does not catch it; instead it silently overflows:
```python
In [1]: import xarray as xr; import pandas as pd; import numpy as np

In [2]: num_dates = np.array([164375])

In [3]: units = "days since 1850-01-01 00:00:00"

In [4]: calendar = "proleptic_gregorian"

In [5]: (num_dates * xr.coding.times._NS_PER_TIME_DELTA["D"]).astype(np.int64)
Out[5]: array([-4244744073709551616])
```
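For reference, 164375 days is 164375 × 86400 × 10⁹ ≈ 1.42 × 10¹⁹ nanoseconds, which exceeds the int64 maximum of roughly 9.22 × 10¹⁸, so the cast wraps around to the negative value shown above. Below is a minimal sketch of the kind of bounds check that could catch this before casting; the helper name and the error raised are illustrative assumptions, not necessarily the fix adopted in this PR:

```python
import numpy as np

INT64_MAX = np.iinfo(np.int64).max  # 9223372036854775807
NS_PER_DAY = 86_400 * 10**9

def cast_to_ns_int_with_check(flat_num_dates, ns_per_unit):
    # Hypothetical guard: scale in float64 first and verify the result fits in
    # int64 before casting, instead of letting the cast overflow silently.
    scaled = flat_num_dates.astype(np.float64) * ns_per_unit
    if np.any(np.abs(scaled) > INT64_MAX):
        raise OverflowError("encoded times do not fit in timedelta64[ns]")
    return scaled.astype(np.int64)

# 164375 days * 86400e9 ns/day ~= 1.42e19 ns > 2**63 - 1, so this raises:
cast_to_ns_int_with_check(np.array([164375]), NS_PER_DAY)
```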
So this is indeed the correct place to fix this (since it has to do with our bespoke decoding logic, not with pandas). If you can just add a what's new entry, I think this is ready to go!
Done, thanks!
a3e6755 to 2274f8d
Co-authored-by: Spencer Clark <spencerkclark@gmail.com>
In the meantime it occurs to me that if you (or anyone else experiencing this issue) need a workaround, you should be able to set …
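(The comment above is cut off in this excerpt, so the exact setting it suggests is not shown. Purely as a hypothetical illustration of the kind of workaround that sidesteps nanosecond-resolution decoding, one could open the data with cftime-based decoding or with time decoding disabled:)

```python
import xarray as xr

# Hypothetical workaround sketch; the truncated comment above may suggest a
# different option. Both keywords below are standard open_dataset options
# that avoid the overflowing nanosecond cast:
ds = xr.open_dataset("far_future_times.nc", use_cftime=True)
# or skip time decoding altogether and decode manually later:
ds_raw = xr.open_dataset("far_future_times.nc", decode_times=False)
```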
Thanks @znicholls!
pre-commit run --all-files
whats-new.rst
This identifies (and hopefully soon fixes) an issue with decoding times that are far in the future. Without re-introducing a change made in #4684, the test fails (see https://github.com/pydata/xarray/pull/5050/checks?check_run_id=2136824754). With the change put back in, the test passes (see https://github.com/pydata/xarray/pull/5050/checks?check_run_id=2136892412).
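As a rough sketch of what such a regression test could look like (the value, units, and assertion here are illustrative assumptions rather than the exact test added in this PR, and it assumes cftime is installed so out-of-range dates fall back to cftime objects):

```python
import numpy as np
import xarray as xr

# 164375 days after 1850-01-01 lands in the year 2300, beyond the
# nanosecond-precision datetime64 range, so decoding must not overflow.
num_dates = np.array([164375])
units = "days since 1850-01-01 00:00:00"
calendar = "proleptic_gregorian"

decoded = xr.coding.times.decode_cf_datetime(num_dates, units, calendar)
# With the overflow bug, the result silently wrapped to a date before 1850.
assert decoded[0].year == 2300
```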
@spencerkclark can I ask for your help on this one please? It seems that the removal in #4684 caused this case (which wasn't tested) to break in v0.17, and it's unclear to me where the fix should go (I'm guessing pandas, based on the discussions in #4684...).