
An asfreq method without resample, and clarify or improve resample().asfreq() behavior for down-sampling #3242

Open
amatsukawa opened this issue Aug 22, 2019 · 2 comments

@amatsukawa
Contributor

amatsukawa commented Aug 22, 2019

MCVE Code Sample

>>> import numpy as np
>>> import xarray as xr
>>> import pandas as pd

>>> data = np.random.random(300)

# Make a time grid that doesn't start exactly on the hour.
>>> time = pd.date_range('2019-01-01', periods=300, freq='T') + pd.Timedelta('3T')
>>> time
DatetimeIndex(['2019-01-01 00:03:00', '2019-01-01 00:04:00',
               '2019-01-01 00:05:00', '2019-01-01 00:06:00',
               '2019-01-01 00:07:00', '2019-01-01 00:08:00',
               '2019-01-01 00:09:00', '2019-01-01 00:10:00',
               '2019-01-01 00:11:00', '2019-01-01 00:12:00',
               ...
               '2019-01-01 04:53:00', '2019-01-01 04:54:00',
               '2019-01-01 04:55:00', '2019-01-01 04:56:00',
               '2019-01-01 04:57:00', '2019-01-01 04:58:00',
               '2019-01-01 04:59:00', '2019-01-01 05:00:00',
               '2019-01-01 05:01:00', '2019-01-01 05:02:00'],
              dtype='datetime64[ns]', length=300, freq='T')

>>> da = xr.DataArray(data, dims=['time'], coords={'time': time})
>>> resampled = da.resample(time='H').asfreq()
>>> resampled
<xarray.DataArray (time: 6)>
array([0.478601, 0.488425, 0.496322, 0.479256, 0.523395, 0.201718])
Coordinates:
  * time     (time) datetime64[ns] 2019-01-01 ... 2019-01-01T05:00:00

# Each value is actually the mean over its hourly window; e.g., the third value is:
>>> da.loc['2019-01-01T02:00:00':'2019-01-01T02:59:00'].mean()
<xarray.DataArray ()>
array(0.496322)

Expected Output

The docs say:

Return values of original object at the new up-sampling frequency; 
essentially a re-index with new times set to NaN.

I suppose the docstring is not technically wrong: on careful reading, it does not define any behavior for down-sampling. But it's easy to (1) assume that down-sampling behaves like up-sampling (i.e., as a reindex), and/or (2) expect behavior similar to df.asfreq() in pandas.
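To make expectation (2) concrete, here is a small pandas sketch. It uses seeded random data in place of the unseeded values above; the seed, the Series s and the 30-second up-sampling step are illustrative choices, not part of the original report. As far as I can tell, pandas' asfreq behaves as a reindex in both directions: down-sampling keeps the existing values at the new timestamps without aggregating them, and up-sampling inserts the new timestamps as NaN. The lowercase aliases "h", "min" and "s" are used because newer pandas versions prefer them over "H", "T" and "S".

```python
import numpy as np
import pandas as pd

# Seeded data on a one-minute grid starting at 00:03, like the MCVE above.
rng = np.random.default_rng(0)
time = pd.date_range("2019-01-01 00:03", periods=300, freq="min")
s = pd.Series(rng.random(300), index=time)

# Down-sampling: asfreq picks the existing values at 00:03, 01:03, ...
# (no aggregation over the hour).
down = s.asfreq("h")

# Up-sampling: asfreq inserts the new 30-second timestamps and fills them with NaN.
up = s.iloc[:3].asfreq("30s")

print(down)
print(up)
```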

Problem Description

I would argue for an asfreq method, independent of resampling, that matches the pandas behavior, which, AFAIK, is to reindex at the specified interval starting from the first timestamp.

>>> df = pd.DataFrame(da.values, index=time)
>>> df.asfreq('H')
                            0
2019-01-01 00:03:00  0.065304
2019-01-01 01:03:00  0.325814
2019-01-01 02:03:00  0.841201
2019-01-01 03:03:00  0.610266
2019-01-01 04:03:00  0.613906

This can already be achieved easily with reindex, so it's not a blocker:

>>> da.reindex(time=pd.date_range(da.time[0].values, da.time[-1].values, freq='H'))
<xarray.DataArray (time: 5)>
array([0.065304, 0.325814, 0.841201, 0.610266, 0.613906])
Coordinates:
  * time     (time) datetime64[ns] 2019-01-01T00:03:00 ... 2019-01-01T04:03:00
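A minimal sketch of the proposed method, wrapping the reindex workaround above. The name asfreq_reindex and its signature are hypothetical (not part of xarray); the helper mirrors what pandas appears to do internally for asfreq, namely a date_range from the minimum to the maximum timestamp followed by a reindex. Seeded data replaces the random values from the report.

```python
import numpy as np
import pandas as pd
import xarray as xr


def asfreq_reindex(obj, freq, dim="time"):
    """Hypothetical helper (not an xarray API): pandas-style asfreq.

    Builds a regular grid from the earliest to the latest timestamp along
    `dim` and reindexes onto it. Existing values are kept as-is (no
    aggregation); grid points with no matching timestamp become NaN.
    """
    index = obj.indexes[dim]
    new_index = pd.date_range(index.min(), index.max(), freq=freq)
    return obj.reindex({dim: new_index})


rng = np.random.default_rng(0)
time = pd.date_range("2019-01-01 00:03", periods=300, freq="min")
da = xr.DataArray(rng.random(300), dims=["time"], coords={"time": time})

# Down-sampling: values at 00:03, 01:03, ..., 04:03.
hourly = asfreq_reindex(da, "h")

# Up-sampling the first three minutes: the inserted 30-second points are NaN.
up = asfreq_reindex(da.isel(time=slice(0, 3)), "30s")
```

Because it is a plain reindex, the same helper covers up- and down-sampling with one rule, which is the consistency the issue asks for.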

The reason I argue for asfreq functionality outside of resampling is that asfreq(freq) in pandas is purely a reindex, whereas, e.g., resample(freq).first() gives a different time index: one label per bin, aligned to the frequency (00:00, 01:00, ...), instead of the original timestamps.
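The index difference can be checked directly in pandas. This sketch (seeded data, illustrative variable names) compares asfreq("h") with resample("h").first(): the former keeps the original timestamps, while the latter, with pandas' default left-closed and left-labelled bins, labels each hourly bin by its left edge and stores the first observation that falls inside it.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
time = pd.date_range("2019-01-01 00:03", periods=300, freq="min")
s = pd.Series(rng.random(300), index=time)

# asfreq: pure reindex, so the labels are original timestamps (00:03, 01:03, ...).
via_asfreq = s.asfreq("h")

# resample(...).first(): one row per hourly bin labelled 00:00, 01:00, ..., 05:00;
# the 00:00 bin holds the 00:03 value, the 01:00 bin holds the 01:00 value, etc.
via_first = s.resample("h").first()

print(via_asfreq.index)
print(via_first.index)
```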

Output of xr.show_versions()

I'm still on Python 2.7, where show_versions() throws an exception because some HDF5 library doesn't have a magic property. I don't think this detail is relevant here, though.

```
>>> xr.__version__
u'0.11.3'
```
@stale

stale bot commented Jul 24, 2021

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

stale bot added the stale label on Jul 24, 2021
dcherian removed the stale label on Apr 18, 2022
@dcherian
Contributor

The current down-sampling behaviour of resample().asfreq() does seem useless, since we can get the same result with just resample(time="H").mean(). I'll mark it as a bug.
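For comparison, pandas' own Resampler.asfreq() appears to handle down-sampling as a reindex onto the bin labels rather than as a mean: a label with no matching timestamp becomes NaN, and every other label takes the exact value recorded at that time. This sketch uses seeded data and illustrative names; it only shows one possible reading of "re-index with new times set to NaN" for down-sampling, not a decision about what xarray should do.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
time = pd.date_range("2019-01-01 00:03", periods=300, freq="min")
s = pd.Series(rng.random(300), index=time)

# pandas down-sampling asfreq: reindex onto the hourly bin labels 00:00 ... 05:00.
# There is no observation at 00:00, so that label is NaN; 01:00 ... 05:00 pick up
# the exact values recorded at those times.
via_resample_asfreq = s.resample("h").asfreq()

# Per-bin mean, i.e. what xarray's resample().asfreq() matched in the report above.
mean_per_bin = s.resample("h").mean()

print(via_resample_asfreq)
print(mean_per_bin)
```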

dcherian added the bug label on Apr 18, 2022