CFTimeIndex Resampling #2593

Merged

shoyer merged 72 commits into pydata:master from Ouranosinc:master

Feb 3, 2019

Contributor

jwenfai commented Dec 5, 2018

I would appreciate some feedback on why this implementation of CFTimeIndex resampling doesn't match pandas' output 100%.

Tentative attempt at addressing #2191 (Adding resample functionality to CFTimeIndex). Downsampling produces results that match pandas' results in all the tests done thus far. However, upsampling has trouble assigning the right values to the right bins.
Tests (test_cftimeindex_resample.py) have been created for standard calendars but not for non-standard calendars (360 days etc.). The contents of test_cftimeindex_resample.py will be moved into test_dataarray.py, and test_cftimeindex_resample.py will be deleted, once the resampling implementation is finalized. The files in the tests/temp folder are meant to highlight resampling discrepancies between pandas and this implementation. Both the files and the folder will be removed once the resampling implementation is finalized.
Not fully documented. Docstrings have yet to be added to resample_cftime.py. There were no docstrings in https://github.com/pandas-dev/pandas/blob/master/pandas/core/resample.py (which is where the code was ported from) that could be conveniently copied.

jwenfai added 12 commits

November 9, 2018 16:58


          First implementation of resampling for CFTimeIndex.

daa3a71


          First implementation of resampling for CFTimeIndex, cleaned.

f9f3347


          First implementation of resampling for CFTimeIndex, cleaned.


          First implementation of resampling for CFTimeIndex, cleaned.

89f418a


          First implementation of resampling for CFTimeIndex.

39c9d11


          First implementation of resampling for CFTimeIndex,

073b8e0

more bugs fixed, cleaned.


          First implementation of resampling for CFTimeIndex, test file written.

193c4c4


          First implementation of resampling for CFTimeIndex, test file written…

2c97738

…, cleaned.


          First implementation of resampling for CFTimeIndex, test file written…

9993ed9

…, cleaned.


          First implementation of resampling for CFTimeIndex, test file written…

f01745c

…, cleaned.


          First implementation of resampling for CFTimeIndex, test file written…

ffbf265

…, cleaned.


          Merge pull request #1 from jwenfai/resample-v2-clean

e64fedb

Resample v2 clean

pep8speaks commented Dec 5, 2018 (edited)

Hello @jwenfai! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on February 02, 2019 at 20:46 Hours UTC

max-sixty reviewed

xarray/tests/test_cftimeindex_resample.py Outdated

+                                        list(itertools.product(
+                                            ['left', 'right'],
+                                            ['left', 'right'],
+                                            ['2MS', '2M', '3MS', '3M', '7MS', '7M'])))

Collaborator

max-sixty Dec 5, 2018

Brief note on the test approach - overall looks great - on these items, you can put each set of params in its own parametrize decorator, and avoid making the products yourself

(no need to change this time though)
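The suggestion above might look roughly like the sketch below. Only the parameter values come from the diff; the constant names and the test body are placeholders for illustration, not the PR's actual test.

```python
import itertools

import pytest

CLOSED = ['left', 'right']
LABEL = ['left', 'right']
FREQS = ['2MS', '2M', '3MS', '3M', '7MS', '7M']


# Stacking decorators lets pytest generate the full cross product itself,
# so there is no need to build it by hand with itertools.product.
@pytest.mark.parametrize('closed', CLOSED)
@pytest.mark.parametrize('label', LABEL)
@pytest.mark.parametrize('freq', FREQS)
def test_downsampler(closed, label, freq):
    # Placeholder body (hypothetical): the real test compares resampled output.
    assert closed in CLOSED and label in LABEL and freq in FREQS


# The hand-built product covers the same combinations (2 * 2 * 6 = 24).
manual_cases = list(itertools.product(CLOSED, LABEL, FREQS))
```

Each decorator contributes one axis of the product, which also makes it easy to add or remove a parameter later without touching the others.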

Contributor Author

jwenfai Dec 17, 2018

Thanks for the comment, I made the change, much better than the torturous code I was writing before!

spencerkclark reviewed

Member

spencerkclark left a comment

@jwenfai I'm excited to see some progress on this! I may be a bit slow these next few days, but I'll try to provide some more extensive feedback soon.

xarray/tests/temp/cftime_resample_pandas_comparison.py Outdated

jwenfai and others added 4 commits

December 8, 2018 01:43


          Docstrings for resample_cftime.py written. Upsample still not fixed.

2850dd5


          Fixed PEP8 and test parametrization.

770b778


          PEP8

181e82c


          Merge pull request #3 from Ouranosinc/PEP8

5a41ee2

PEP8

spencerkclark reviewed

Member

spencerkclark left a comment

@jwenfai thanks for your patience; I was away at a conference all of last week. Here are some initial comments on this PR, which is a very good start.

I'll probably need a few more passes on this, particularly to understand the issues with the upsampling portion.

xarray/core/resample_cftime.py Outdated

+              def _get_time_bins(index, freq, closed, label, base):
+                  # This portion of code comes from TimeGrouper __init__ #
+                  end_types = {'M', 'A'}

Member

spencerkclark Dec 16, 2018

Perhaps it might be cleaner to separate some of this logic into another method. For example define:

def _default_closed_or_label(freq):
    if freq._freq in {'M', 'A'}:
        return 'right'
    else:
        return 'left'

Then within _get_time_bins you could use something like this:

if closed is None:
    closed = _default_closed_or_label(freq)

if label is None:
    label = _default_closed_or_label(freq)
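A self-contained version of this refactoring could be sketched as follows. `FakeOffset` and `resolve_defaults` are hypothetical names used only to make the snippet runnable; the real offset classes live in xarray and carry more state than a `_freq` attribute.

```python
class FakeOffset:
    """Hypothetical stand-in for a cftime offset; only _freq is modeled."""

    def __init__(self, freq):
        self._freq = freq


def _default_closed_or_label(freq):
    # Month-end and year-end frequencies default to 'right', mirroring the
    # end_types set shown in the diff; everything else defaults to 'left'.
    if freq._freq in {'M', 'A'}:
        return 'right'
    return 'left'


def resolve_defaults(freq, closed=None, label=None):
    # Hypothetical wrapper showing how _get_time_bins might fill in defaults
    # while still respecting explicitly passed values.
    if closed is None:
        closed = _default_closed_or_label(freq)
    if label is None:
        label = _default_closed_or_label(freq)
    return closed, label
```

For example, `resolve_defaults(FakeOffset('M'))` yields `('right', 'right')`, while an explicit `closed='left'` is left untouched.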

xarray/core/resample_cftime.py Outdated

+              def _adjust_bin_edges(binner, ax_values, freq):
+                  # Some hacks for > daily data, see #1471, #1458, #1483
+                  if freq._freq not in ['D', 'H', 'T', 'min', 'S']:

Member

spencerkclark Dec 16, 2018

It might be worth adding a CFTIME_TICKS variable in cftime_offsets.py that we could import and use for these instance checks.

# pandas defines these offsets as "Tick" objects, which for instance have 
# distinct behavior from monthly or longer frequencies in resample.
CFTIME_TICKS = (Day, Hour, Minute, Second)

Then here and in other methods this check could just be something like if not isinstance(freq, CFTIME_TICKS) or if isinstance(freq, CFTIME_TICKS).
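As a rough sketch of that idea, with minimal stand-in classes in place of the real ones in cftime_offsets.py (the class bodies and the function name `needs_bin_edge_adjustment` are assumptions for illustration):

```python
class BaseCFTimeOffset:
    """Minimal stand-in for the offset base class (assumption)."""

    def __init__(self, n=1):
        self.n = n


class Day(BaseCFTimeOffset):
    _freq = 'D'


class Hour(BaseCFTimeOffset):
    _freq = 'H'


class Minute(BaseCFTimeOffset):
    _freq = 'T'


class Second(BaseCFTimeOffset):
    _freq = 'S'


class MonthEnd(BaseCFTimeOffset):
    _freq = 'M'


# Fixed-length offsets, analogous to pandas "Tick" objects.
CFTIME_TICKS = (Day, Hour, Minute, Second)


def needs_bin_edge_adjustment(freq):
    # Hypothetical helper: replaces the string comparison
    # `freq._freq not in ['D', 'H', 'T', 'min', 'S']` with a type check.
    return not isinstance(freq, CFTIME_TICKS)
```

The type check avoids keeping a second list of frequency strings in sync with the offset classes.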

xarray/core/resample_cftime.py Outdated

		return fresult, lresult


		def _offset_timedelta(offset):

Member

spencerkclark Dec 16, 2018

It might make sense to add an as_timedelta method (where applicable) to the cftime offset objects. For example:

class Day(BaseCFTimeOffset):
    _freq = 'D'

    def as_timedelta(self):
        return timedelta(days=self.n)

    def __apply__(self, other):
        return other + self.as_timedelta()

That way we would not need this method to do the translation.
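A runnable sketch of this pattern is shown below. It uses plain `datetime.datetime` in place of cftime dates and a stripped-down `BaseCFTimeOffset`, both of which are simplifications made for illustration.

```python
from datetime import datetime, timedelta


class BaseCFTimeOffset:
    """Stripped-down stand-in for the real base class (assumption)."""

    def __init__(self, n=1):
        self.n = n


class Day(BaseCFTimeOffset):
    _freq = 'D'

    def as_timedelta(self):
        return timedelta(days=self.n)

    def __apply__(self, other):
        return other + self.as_timedelta()


class Hour(BaseCFTimeOffset):
    _freq = 'H'

    def as_timedelta(self):
        return timedelta(hours=self.n)

    def __apply__(self, other):
        return other + self.as_timedelta()


# Applying an offset and converting it to a timedelta share one definition.
start = datetime(2000, 1, 1)
shifted = Day(n=3).__apply__(start)
```

Offsets without a fixed length (month or year based) would simply not define `as_timedelta`, which is what "where applicable" suggests.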

xarray/core/resample_cftime.py Outdated

+                  base = base % offset.n
+                  start_day = normalize_date(first)
+                  base_td = datetime.timedelta(0)
+                  if offset._freq == 'D':

Member

spencerkclark Dec 16, 2018

If we defined an as_timedelta method for cftime offsets (as described below), we could instead replace this conditional block with one line:

base_td = type(offset)(n=base).as_timedelta()
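To illustrate how that one-liner behaves, here is a minimal sketch with two stand-in offset classes. The wrapper name `base_timedelta` is hypothetical; the `base % offset.n` step is taken from the diff above.

```python
from datetime import timedelta


class Day:
    """Stand-in offset with a fixed length of n days (assumption)."""

    def __init__(self, n=1):
        self.n = n

    def as_timedelta(self):
        return timedelta(days=self.n)


class Minute:
    """Stand-in offset with a fixed length of n minutes (assumption)."""

    def __init__(self, n=1):
        self.n = n

    def as_timedelta(self):
        return timedelta(minutes=self.n)


def base_timedelta(offset, base):
    # Wrap base into the range [0, offset.n), then build a new offset of the
    # same type with n=base and convert it, instead of branching on _freq.
    base = base % offset.n
    return type(offset)(n=base).as_timedelta()
```

With a 5-minute offset and `base=7`, the wrapped base is 2, so the result is a 2-minute timedelta.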

xarray/core/resample_cftime.py Outdated

		return binner, labels


		def _adjust_bin_edges(binner, ax_values, freq):

Member

spencerkclark Dec 16, 2018

This method does not appear to be used in this implementation. Should it be?

Contributor Author

jwenfai Dec 17, 2018

Yeah, I also noticed I've been too slavishly copying pandas logic. This and some other unnecessary code have been/will be removed.

xarray/tests/test_cftimeindex_resample.py Outdated

+                                           ['left', 'right'],
+                                           ['2MS', '2M', '3MS', '3M', '7MS', '7M'])))
+              def test_downsampler(closed, label, freq):
+                  downsamp_series = series(pd_index()).resample(

Member

spencerkclark Dec 16, 2018

It seems like things might be more apples to apples in this test if we compared the results of resampling a DataArray indexed using a DatetimeIndex to the results of resampling a DataArray indexed using a CFTimeIndex (with a standard calendar type).

This would also allow us to make use of xarray's testing methods, like xarray.testing.assert_equal, which checks the equivalence of two DataArrays (including the equivalence of their coordinates, and NaN placement).

xarray/core/resample.py Outdated

+                      # from ..coding.cftimeindex import CFTimeIndex
+                      import cftime as cf
+                      import numpy as np
+                      if isinstance(self._obj[self._dim].values[0], cf.datetime):

Member

spencerkclark Dec 16, 2018

I think something like if isinstance(self._obj.indexes[self._dim], CFTimeIndex) might be cleaner here.

xarray/core/resample.py Outdated

+                      import numpy as np
+                      if isinstance(self._obj[self._dim].values[0], cf.datetime):
+                          t = self._obj[self._dim]
+                          x = np.insert([td.total_seconds() for td in

Member

spencerkclark Dec 16, 2018

I think you can just make use of xarray.core.utils.datetime_to_numeric here, e.g.

x = datetime_to_numeric(t, datetime_unit='s')

xarray/tests/test_cftimeindex_resample.py Outdated

+              def test_downsampler(closed, label, freq):
+                  downsamp_series = series(pd_index()).resample(
+                      freq, closed=closed, label=label).mean().dropna()
+                  downsamp_da = da(xr_index()).resample(

Member

spencerkclark Dec 16, 2018

pytest fixtures are typically provided as arguments to the test functions, rather than called directly.

xarray/tests/test_cftimeindex_resample.py Outdated

+              @pytest.fixture()
+              def xr_index():
+                  return xr.cftime_range('2000-01-01', periods=30, freq='MS', tz='UTC')

Member

spencerkclark Dec 16, 2018

cftime_range does not support a tz option.

Contributor Author

jwenfai Dec 17, 2018

I'll remove that option but someone needs to fix this misleading doc

http://xarray.pydata.org/en/stable/generated/xarray.cftime_range.html
xarray.cftime_range(start=None, end=None, periods=None, freq='D', tz=None, normalize=False, name=None, closed=None, calendar='standard')

Member

spencerkclark Dec 17, 2018

Sorry about that! Indeed that signature is misleading; see #2613 for a fix.

Contributor Author

jwenfai commented Dec 17, 2018

@spencerkclark Thanks for the detailed review! I'll fix up my code over the next few days.

I haven't completely solved the upsampling issue yet but I think I might have some clues as to what's happening. Timedelta operations on cftime.datetime do not always return correct values. Sometimes, they are a few microseconds or one second off.

The issue can be sidestepped by shifting the bins 1 second forward for closed=='right' and 1 second back for closed=='left' in groupby.py, but this obviously introduces issues for resampling operations at the second and microsecond resolution. This workaround doesn't pass all the tests. An extra time bin is still sometimes created. You'll see what I mean when I make a new commit sometime next week.
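The trade-off described above can be illustrated with a small sketch. It uses the standard library's `datetime` and `bisect` rather than cftime or the PR's grouping code, and the 3-microsecond drift is a simulated value, so it only models the idea rather than reproducing the actual bug.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta

edges = [datetime(2000, 1, 1), datetime(2000, 2, 1), datetime(2000, 3, 1)]


def bin_of(ts, edges, closed='left'):
    """Index i of the bin [edges[i], edges[i+1]) for closed='left',
    or (edges[i], edges[i+1]] for closed='right'."""
    if closed == 'left':
        return bisect_right(edges, ts) - 1
    return bisect_left(edges, ts) - 1


exact = datetime(2000, 2, 1)
drifted = exact - timedelta(microseconds=3)  # simulated arithmetic error

# The drifted timestamp lands in the January bin instead of February.
wrong_bin = bin_of(drifted, edges)

# Workaround for closed='left': shift every edge one second back.
shifted = [e - timedelta(seconds=1) for e in edges]
fixed_bin = bin_of(drifted, shifted)

# Side effect: a genuine second-resolution value at 23:59:59 on January 31
# now falls into the February bin.
late_january = datetime(2000, 1, 31, 23, 59, 59)
side_effect_bin = bin_of(late_january, shifted)
```

Here `wrong_bin` is 0, `fixed_bin` is 1, and `side_effect_bin` is 1 even though the timestamp belongs to bin 0, which is why the shift is unsafe at second or finer resolution.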

spencerkclark mentioned this pull request

Remove tz argument in cftime_range #2613

Merged

jwenfai added 5 commits

December 18, 2018 08:10


          Test file fixes and other optimizations (2018-12-16 @spencerclark and…

6b948c5

… 2018-12-05 @max-sixty GitHub reviews for resample-v2-clean pull request). Not cleaned.


          Merge pull request #1 from Ouranosinc/master

97c0948

Get PEP8 changes from Ouranosinc.


          Merge remote-tracking branch 'origin/resample-v2-clean' into resample…

63d25ab

…-v2-clean

# Conflicts:
#	xarray/tests/test_cftimeindex_resample.py


          Test file fixes and other optimizations (2018-12-16 @spencerclark and…

05af869

… 2018-12-05 @max-sixty GitHub reviews for resample-v2-clean pull request). Cleaned.


          Merge branch 'resample-v2-upsample' into resample-v2-clean

85f1a84

# Conflicts:
#	xarray/core/resample.py
#	xarray/tests/test_cftimeindex_resample.py

jwenfai mentioned this pull request

Resample v2 clean Ouranosinc/xarray#4

Merged


          Merge pull request #4 from jwenfai/resample-v2-clean

2e8ced3

Resample v2 clean

Member

spencerkclark commented Dec 19, 2018

@jwenfai thanks for the updates. It looks like there are some merge conflicts that are preventing our CI from running. Could you please resolve those when you get a chance, so we can see those results?

jwenfai added 2 commits

January 29, 2019 02:15


          Merge remote-tracking branch 'origin/master'

31ccebf


          Merge pull request #15 from jwenfai/master

8ac6f76

 Moved full_index and first_items generation logic to a helper function

tlogan2000 mentioned this pull request

module 'xclim' has no attribute 'icclim' Ouranosinc/xclim#149

Closed

Member

fmaussion commented Jan 29, 2019

I can't diagnose what's wrong from the error message (something to do with conda it seems)

Some connection error. I restarted Travis, let's see if this happens again.


          Merge branch 'master' into master

8dbee52

Member

shoyer commented Feb 1, 2019

It looks like tests are passing now. I'm going to give this another look over and then (probably) merge

shoyer reviewed

xarray/core/groupby.py Outdated

                               # TODO: sort instead of raising an error
                               raise ValueError('index must be monotonic for resampling')
                           s = pd.Series(np.arange(index.size), index)

Member

shoyer Feb 1, 2019

Could you make this object s inside the helper function instead? It's not needed outside here

xarray/tests/test_formatting.py Outdated

@@ -1,6 +1,8 @@
               # -*- coding: utf-8 -*-
               from textwrap import dedent
+              from textwrap import dedent

Member

shoyer Feb 1, 2019

It looks like changes from another PR have leaked in here? Let's try to figure that out...

Contributor Author

jwenfai Feb 1, 2019

Seems to have accidentally crept in when I merged changes from pydata/master into my local repo 4~5 days back. Here's what I managed to trace (from latest to earliest instance of test_formatting.py being changed):
Ouranosinc#14
jwenfai@31ccebf
jwenfai@9fbb016

Member

shoyer Feb 2, 2019

It looks like it’s just a bad merge: these tests are now duplicated. You can simply delete the redundant code and push a new commit.

jwenfai added 4 commits

February 1, 2019 16:45


          In groupby.py, moved s to _get_index_and_items helper function.

afad30d


          Removed redundant code from test_formatting.py due to bad merge.

1381dab


          Merge pull request #16 from jwenfai/master

Fix helper function and undo bad merge


          Merge branch 'master' into master

shoyer reviewed

xarray/tests/test_formatting.py Outdated

@@ -189,6 +189,53 @@ def test_attribute_repr(self):
                       assert '\n' not in newlines
                       assert '\t' not in tabs
+                  def test_diff_dataset_repr(self):

Member

shoyer Feb 2, 2019

You still need to delete this repeated method

Contributor Author

jwenfai Feb 2, 2019

Sorry about that, I don't know how I missed it.

xarray/tests/test_dataarray.py Outdated

xarray/core/groupby.py Outdated

+                      if isinstance(grouper, CFTimeGrouper):
+                          first_items = grouper.first_items(index)
+                          full_index = first_items.index
+                          if first_items.isnull().any():

Member

shoyer Feb 2, 2019

If you merge in master again, you could switch this block back to using Series.dropna().

Contributor Author

jwenfai Feb 2, 2019

Done.
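For context, the pattern being restored can be sketched with an ordinary pandas Series. The values and the DatetimeIndex here are made up for illustration; in the PR, first_items comes from grouper.first_items(index) and is indexed by a CFTimeIndex, for which dropna support arrived separately in pydata#2734.

```python
import numpy as np
import pandas as pd

# Hypothetical data: NaN marks a bin that received no observations.
index = pd.DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03'])
first_items = pd.Series([0.0, np.nan, 4.0], index=index)

# Keep every bin label for the output coordinate...
full_index = first_items.index

# ...but drop the empty bins before using first_items to slice the data.
first_items = first_items.dropna()
```

After this, `full_index` still has three entries while `first_items` has two, so no explicit `isnull().any()` branch is needed.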

jwenfai and others added 6 commits

February 2, 2019 15:17


          Merge branch 'pydata-master'

6c4b609


          Removed redundant test and simplify code now that dropna is implemented.

59f1f94


          Merge branch 'master' into master

db62a96


          Merge pull request #17 from jwenfai/master

6edb45a

 Removed redundant test and simplify code now that dropna is implemented.


          delete unnecessary test

f7f2c38


          eliminate some repetition

ef68960

Member

shoyer commented Feb 2, 2019

I'm going to merge this when tests pass

Member

spencerkclark commented Feb 2, 2019

Sounds good @shoyer, thanks for bringing this to the finish line.

jwenfai mentioned this pull request

Quarter offset implemented (base is now latest pydata-master). #2721

Merged

3 tasks

Contributor Author

jwenfai commented Feb 2, 2019

All tests passed. Thanks, @spencerkclark and @shoyer, for all the help!

shoyer merged commit d8ff079 into pydata:master

Member

shoyer commented Feb 3, 2019

thanks @jwenfai and @spencerkclark !

shoyer mentioned this pull request

WIP: sketch of resample support for CFTimeIndex #2458

Closed

spencerkclark mentioned this pull request

Adding resample functionality to CFTimeIndex #2191

Closed

dcherian pushed a commit to yohai/xarray that referenced this pull request


          Merge branch 'master' into yohai-ds_scatter

4e41fc3

* master:
  remove xfail from test_cross_engine_read_write_netcdf4 (pydata#2741)
  Reenable cross engine read write netCDF test (pydata#2739)
  remove bottleneck dev build from travis, this test env was failing to build (pydata#2736)
  CFTimeIndex Resampling (pydata#2593)
  add tests for handling of empty pandas objects in constructors (pydata#2735)
  dropna() for a Series indexed by a CFTimeIndex (pydata#2734)
  deprecate compat & encoding (pydata#2703)
  Implement integrate (pydata#2653)
  ENH: resample methods with tolerance (pydata#2716)
  improve error message for invalid encoding (pydata#2730)
  silence a couple of warnings (pydata#2727)

spencerkclark mentioned this pull request

xarray potential inconstistencies with cftime #2437

Closed

spencerkclark mentioned this pull request

Resampling daily input data to half-yearly data generates an excessive time coordinate #2787

Closed

spencerkclark mentioned this pull request

.sel() failures when using latest cftime release (v1.0.4) #3426

Closed
