
Add support for CFTimeIndex in get_clean_interp_index #3631

Merged: 39 commits merged into pydata:master on Jan 26, 2020

Conversation

@huard (Contributor) commented Dec 16, 2019

Related to #3349

As suggested by @spencerkclark, index values are computed as a delta with respect to 1970-01-01.

At the moment, this fails if dates fall outside the range supported by nanosecond timedeltas (roughly 1678 AD to 2262 AD). Is this something we can fix?
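For context, a minimal sketch of the offsets-from-1970 idea and of why nanosecond resolution is the limiting factor (the calendar, dates, and variable names below are chosen only for illustration):

```python
import datetime

import cftime

# Index values as offsets from 1970-01-01, here at microsecond resolution.
epoch = cftime.DatetimeNoLeap(1970, 1, 1)
date = cftime.DatetimeNoLeap(3000, 6, 1)  # well outside 1678-2262 AD

delta = date - epoch                      # a datetime.timedelta
offset_us = delta / datetime.timedelta(microseconds=1)
print(offset_us)                          # ~3.2e16 microseconds: fine as a float offset

# A nanosecond-resolution np.timedelta64 is backed by a 64-bit integer, which
# only spans about +/- 292 years around the epoch (roughly 1678 AD to 2262 AD),
# so the same span cannot be represented as timedelta64[ns].
```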

@pep8speaks commented Jan 6, 2020

Hello @huard! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-01-26 13:27:14 UTC

@spencerkclark (Member) left a review comment

Thanks @huard for all your work on this. I think we're getting closer now to a better situation regarding interpolation with cftime coordinates.

Review threads: xarray/core/duck_array_ops.py, xarray/core/missing.py

Excerpt under review in xarray/core/duck_array_ops.py:

    array = array.astype(np.float64)  # [ns]
    res = "ns"
else:
    array = np.asarray(array).astype("timedelta64[us]").astype(np.float64)  # [us]
@spencerkclark (Member):

So the potential for loss of precision is here, but only when array comes in with dtype np.timedelta64[ns] or finer. I feel like we can guard against this; we could do this in one of two ways:

  1. Only pass dtype object arrays to this function and write a separate function to handle np.timedelta64 arrays.
  2. Add some logic to handle np.timedelta64 arrays more robustly in this function, and name it something that reflects that both standard library timedeltas and NumPy timedeltas can be safely passed to it.

I probably prefer (1) as a solution.
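For illustration, a rough sketch of what option (1) could look like, splitting the conversion by input type; the function names echo helpers mentioned later in this thread, but the signatures and details here are assumptions rather than the merged code:

```python
import numpy as np


def np_timedelta64_to_float(array, datetime_unit):
    # Sketch: np.timedelta64 input is integer-backed at ns resolution, so
    # convert those integers to float directly and rescale to the requested
    # unit, avoiding the lossy intermediate cast to timedelta64[us] above.
    array = array.astype("timedelta64[ns]").astype(np.float64)
    conversion = np.timedelta64(1, datetime_unit).astype("timedelta64[ns]").astype(np.float64)
    return array / conversion


def py_timedelta_to_float(array, datetime_unit):
    # Sketch: object arrays of datetime.timedelta (e.g. differences of cftime
    # dates) are microsecond-resolution to begin with, so going through
    # timedelta64[us] loses nothing further.
    array = np.asarray(array).astype("timedelta64[us]").astype(np.float64)
    conversion = np.timedelta64(1, datetime_unit).astype("timedelta64[us]").astype(np.float64)
    return array / conversion
```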

@huard (Contributor, author):

I think there are two cases where we're losing precision:

  1. Conversion from ns to us when the time steps are < 1 us.
  2. Conversion to float of large timedelta[us] values. For example:
In [154]: np.float(13510798882111486) - np.float(13510798882111485)                                                                                                                                         
Out[154]: 2.0

When we discussed loss of precision, I was more worried about 2 than 1, hence maybe some confusion earlier. I'm concerned 2 will lead to hard-to-track bugs.
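To make both cases concrete, a short illustrative snippet:

```python
import numpy as np

# Case 1: the cast from ns to us truncates sub-microsecond steps.
steps = np.array([0, 500, 1000], dtype="timedelta64[ns]")
print(steps.astype("timedelta64[us]"))  # [0 0 1] -- the 500 ns step collapses to 0

# Case 2: float64 carries ~15-16 significant digits, so adjacent large
# microsecond counts can no longer be distinguished exactly.
big = 13510798882111486
print(float(big) - float(big - 1))      # 2.0 rather than 1.0
print(np.spacing(float(big)))           # 2.0: gap between representable floats at this magnitude
```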

@spencerkclark (Member):

> When we discussed loss of precision, I was more worried about 2 than 1, hence maybe some confusion earlier. I'm concerned 2 will lead to hard-to-track bugs.

Ah, that makes sense. Indeed I had mainly been thinking about 1. Thanks for the clear example. My apologies for being dense earlier. At least in the short term, I guess I'm still not so worried about 2, primarily because it is already an issue in the existing code (i.e. even for np.datetime64[ns] dates). So this update is not changing the behavior materially; it's just extending the approach to allow for long time ranges between cftime dates.[1]

Earlier you raised a good question regarding whether there would be a benefit to keeping these values as integers as long as possible to avoid these floating point problems -- it seems like there could be in the case of differentiate and integrate -- however, that would probably require some refactoring to do the unit conversion after the operation involving the numeric times, rather than before, to avoid overflow. I'd be open to discussing this more, but I don't feel like it's necessarily a blocker here, unless you have an example that clearly shows otherwise.

Regardless, I really appreciate your careful thoughts and help with this. This has been useful discussion.


[1] I guess it's also important to note that we are currently naively differencing cftime dates to produce timedeltas, which even before float conversion is not microsecond-exact:

In [1]: import cftime

In [2]: cftime.DatetimeNoLeap(2000, 1, 1, 0, 0, 0, 123456) - cftime.DatetimeNoLeap(2000, 1, 1, 0, 0, 0, 123455)
Out[2]: datetime.timedelta(microseconds=8)

We have worked around this before in xarray; see exact_cftime_datetime_difference.
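The idea behind such an exact difference is roughly the following; this is an illustrative sketch of the approach, not necessarily xarray's exact implementation:

```python
import datetime


def exact_difference(b, a):
    # cftime subtraction goes through floating-point arithmetic and can be off
    # by a few microseconds, as in the example above.  Differencing the
    # whole-second part and the microsecond part separately sidesteps that.
    seconds = b.replace(microsecond=0) - a.replace(microsecond=0)
    seconds = int(round(seconds.total_seconds()))
    microseconds = b.microsecond - a.microsecond
    return datetime.timedelta(seconds=seconds, microseconds=microseconds)


# exact_difference(
#     cftime.DatetimeNoLeap(2000, 1, 1, 0, 0, 0, 123456),
#     cftime.DatetimeNoLeap(2000, 1, 1, 0, 0, 0, 123455),
# ) -> datetime.timedelta(microseconds=1)
```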

Review threads: xarray/core/duck_array_ops.py (×2), xarray/core/missing.py (×2), xarray/coding/cftimeindex.py, xarray/tests/test_duck_array_ops.py
@spencerkclark (Member) left a review comment

Thanks @huard for these latest updates. I think this is pretty close.


Review threads: xarray/core/duck_array_ops.py (×2), xarray/tests/test_interp.py, xarray/tests/test_duck_array_ops.py (×3), xarray/core/missing.py, xarray/tests/test_missing.py, doc/whats-new.rst
@spencerkclark (Member) left a review comment

This looks great @huard -- a couple more minor things:

  1. Could you update the docstrings for DataArray.interpolate_na and Dataset.interpolate_na to reflect that max_gap can now be a datetime.timedelta object too?
  2. Upstream failures are unrelated, though these Windows failures look real. I think they're likely due to imprecise cftime arithmetic. We should probably use assert_allclose instead of assert_equal for these tests.
Windows failures (CI log, condensed; one representative failure shown, timestamps stripped):

================================== FAILURES ===================================
____ test_interpolate_na_max_gap_time_specifier[3H-<lambda>0-cftime_range] ____

da_time = <xarray.DataArray (t: 11)>
array([nan,  1.,  2., nan, nan,  5., nan, nan, nan, nan, 10.])
Coordinates:
  * t        (t) object 2001-01-01 00:00:00 ... 2001-01-01 10:00:00
max_gap = '3H', transform = <function <lambda> at 0x000001EB4C167E58>
time_range_func = <function cftime_range at 0x000001EB453B0B88>

    @requires_bottleneck
    @pytest.mark.parametrize("time_range_func", [pd.date_range, xr.cftime_range])
    @pytest.mark.parametrize("transform", [lambda x: x, lambda x: x.to_dataset(name="a")])
    @pytest.mark.parametrize(
        "max_gap", ["3H", np.timedelta64(3, "h"), pd.to_timedelta("3H")]
    )
    def test_interpolate_na_max_gap_time_specifier(
        da_time, max_gap, transform, time_range_func
    ):
        da_time["t"] = time_range_func("2001-01-01", freq="H", periods=11)
        expected = transform(
            da_time.copy(data=[np.nan, 1, 2, 3, 4, 5, np.nan, np.nan, np.nan, np.nan, 10])
        )
        actual = transform(da_time).interpolate_na("t", max_gap=max_gap)
>       assert_equal(actual, expected)
E       AssertionError: Left and right DataArray objects are not equal
E
E       Differing values:
E       L
E           array([nan,  1.,  2.,  3.,  4.,  5., nan, nan, nan, nan, 10.])
E       R
E           array([nan,  1.,  2.,  3.,  4.,  5., nan, nan, nan, nan, 10.])

xarray\tests\test_missing.py:568: AssertionError

The other five failures are the analogous cftime_range cases (max_gap passed as '3H', np.timedelta64(3, 'h'), and Timedelta('0 days 03:00:00'), for both the DataArray and the Dataset transform) and fail with the same assertion; in each case the "differing" values print identically, presumably differing only at floating-point precision.

@dcherian mentioned this pull request on Jan 20, 2020.
@dcherian (Contributor):

Is the switch to assert_allclose all that is left here?

@huard (Contributor, author) commented Jan 24, 2020

I think so. If there's a branch fixing the assert_allclose failures, I can merge it here.

@dcherian (Contributor):

I think it's just this one function: test_interpolate_na_max_gap_time_specifier. If you switch assert_equal to assert_allclose, things should work.

@huard (Contributor, author) commented Jan 24, 2020

It seems like assert_allclose doesn't like NaNs. Any other suggestion?

@spencerkclark (Member):

> It seems like assert_allclose doesn't like NaNs. Any other suggestion?

Hmm...I think it should treat NaNs as equal by default. Are you using the version defined here?

def assert_allclose(a, b, **kwargs):

@huard (Contributor, author) commented Jan 24, 2020

Ah! No, I had tried numpy.testing's version. It works now. Thanks!
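For reference, a minimal illustration of why the switch resolves the Windows failures: xarray's assert_allclose tolerates tiny floating-point differences and treats NaNs in matching positions as equal, whereas assert_equal demands exact equality of the non-NaN values (the arrays below are invented for the example):

```python
import numpy as np
import xarray as xr

actual = xr.DataArray([np.nan, 1.0, 2.0, 3.0 + 1e-12, np.nan])
expected = xr.DataArray([np.nan, 1.0, 2.0, 3.0, np.nan])

xr.testing.assert_allclose(actual, expected)   # passes: close values, matching NaNs
# xr.testing.assert_equal(actual, expected)    # would raise: 3.0 + 1e-12 != 3.0 exactly
```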

@dcherian (Contributor):

Thanks @huard and @spencerkclark. This took a solid amount of work.

@spencerkclark merge if you think it's good to go?

@spencerkclark (Member):

Indeed thanks @huard -- I'll give this another once-over tomorrow. If everything checks out, which I anticipate it will, I'll merge it then.

@spencerkclark (Member) left a review comment

@huard in looking things over again, I noticed a few things related to the implementation of timedelta_to_numeric that I think need attention. I think it would be helpful to have some explicit tests for it too.

Sorry for not catching these earlier.
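As a sketch of what such explicit tests might look like (the import location, signature, and accepted inputs of timedelta_to_numeric are assumptions here):

```python
import datetime

import numpy as np
import pandas as pd
import pytest

from xarray.core.duck_array_ops import timedelta_to_numeric  # assumed location


@pytest.mark.parametrize(
    "td",
    [np.timedelta64(1, "D"), pd.Timedelta(1, "D"), datetime.timedelta(days=1)],
)
def test_timedelta_to_numeric(td):
    # Each spelling of "one day" should convert to the same float count of hours.
    out = timedelta_to_numeric(td, "h")
    assert out == 24.0
    assert isinstance(out, float)
```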

Review threads: xarray/core/duck_array_ops.py (×4)
@spencerkclark (Member) left a review comment

Great, thanks @huard, in particular for the new test. It would be great if you could take care of some last few nits; then I think things should be ready to merge.

> I think it's clearer if both functions datetime_to_numeric and timedelta_to_numeric are side by side.

I see what you mean, particularly given the similarity in the handling of the datetime_unit argument. I think it's fine to leave them side by side for now. We can always move them later, given they are private functions.
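For illustration, one way a single timedelta_to_numeric entry point could mirror datetime_to_numeric's handling of a datetime_unit argument; this is a hypothetical sketch, and the merged implementation may be organized differently:

```python
import datetime

import numpy as np
import pandas as pd


def timedelta_to_numeric(value, datetime_unit="ns"):
    # Normalize the various timedelta flavours to np.timedelta64, then rescale
    # to the requested unit.  Note pd.Timedelta subclasses datetime.timedelta,
    # so it is checked first.
    if isinstance(value, str):
        value = pd.to_timedelta(value).to_timedelta64()
    elif isinstance(value, pd.Timedelta):
        value = value.to_timedelta64()
    elif isinstance(value, datetime.timedelta):
        value = np.timedelta64(value)
    if not isinstance(value, np.timedelta64):
        raise TypeError(f"Expected a timedelta-like value, got {type(value)}")
    return value / np.timedelta64(1, datetime_unit)
```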

Review threads: xarray/core/duck_array_ops.py (×3)
huard and others added 4 commits on January 26, 2020, including:
…named array to value for pd_timedelta_to_float. removed pd_timedeltaindex_to_float.
@spencerkclark merged commit 8772355 into pydata:master on Jan 26, 2020
@spencerkclark (Member):

Thanks @huard!

@huard (Contributor, author) commented Jan 26, 2020

Thanks @spencerkclark for shepherding this to completion.

dcherian added a commit to dcherian/xarray that referenced this pull request Jan 27, 2020
* master:
  Add support for CFTimeIndex in get_clean_interp_index (pydata#3631)
  sel with categorical index (pydata#3670)
  bump min deps for 0.15 (pydata#3713)
  setuptools-scm and isort tweaks (pydata#3720)
  Allow binned coordinates on 1D plots y-axis. (pydata#3685)
  apply_ufunc: Add meta kwarg + bump dask to 2.2 (pydata#3660)
  setuptools-scm and one-liner setup.py (pydata#3714)
  Feature/align in dot (pydata#3699)
  ENH: enable `H5NetCDFStore` to work with already open h5netcdf.File a… (pydata#3618)
  One-off isort run (pydata#3705)
  hardcoded xarray.__all__ (pydata#3703)
  Bump mypy to v0.761 (pydata#3704)
  remove DataArray and Dataset constructor deprecations for 0.15  (pydata#3560)
  Tests for variables with units (pydata#3654)
  Add an example notebook using apply_ufunc to vectorize 1D functions (pydata#3629)
  Use encoding['dtype'] over data.dtype when possible within CFMaskCoder.encode (pydata#3652)

Successfully merging this pull request may close these issues.

interp with long cftime coordinates raises an error