ENH: resample().interpolate() #12925

Closed
jorisvandenbossche opened this Issue Apr 19, 2016 · 5 comments

3 participants
jorisvandenbossche (Member) commented Apr 19, 2016

From #12449 (comment)

When upsampling with a Resampler object, you now have different fillna methods to fill the NaNs (or asfreq for a plain reindex-like operation without NaN filling). It could also make sense to have interpolate available to fill the missing values directly (instead of first calling mean/asfreq).

In [20]: df = pd.DataFrame(data=[1,3], index=[dt.timedelta(), dt.timedelta(minutes=3)])

In [21]: df
Out[21]:
          0
00:00:00  1
00:03:00  3

In [22]: df.resample('1T').mean().interpolate('linear')
Out[22]:
                 0
00:00:00  1.000000
00:01:00  1.666667
00:02:00  2.333333
00:03:00  3.000000

Should df.resample('1T').interpolate('linear') give the same result?
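A minimal sketch of the two spellings side by side, assuming a pandas version that includes `Resampler.interpolate` (the method this issue asks for). It uses `'1min'` rather than `'1T'`, since newer pandas releases deprecate the `'T'` alias, and it imports `datetime` as `dt`, which the session above assumes implicitly.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Same data as in the issue: two observations three minutes apart.
df = pd.DataFrame(data=[1, 3], index=[dt.timedelta(), dt.timedelta(minutes=3)])

# Two-step approach from the issue: upsample to 1-minute bins (empty bins
# become NaN), then fill the NaNs by linear interpolation.
two_step = df.resample('1min').mean().interpolate('linear')

# One-step approach proposed in the issue.
one_step = df.resample('1min').interpolate('linear')

# Both should hold 1, 1.666667, 2.333333, 3 on the 00:00 .. 00:03 grid.
print(one_step)
```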

benoit9126 (Contributor) commented Apr 22, 2016

Hi!
I finally found some time to put together what could be a pull request for #12925. Unfortunately, my additional tests run into the problem @jorisvandenbossche raised with .asfreq in #12449:

In[39]: df.resample('1T').ffill()
Out[39]: 
          0
00:00:00  1
00:01:00  1
00:02:00  1
In[40]: df.resample('1T').bfill()
Out[40]: 
          0
00:00:00  1
00:01:00  3
00:02:00  3

It seems that .mean() is the only method that returns 4 values in this example, and of course the new interpolate method of the Resampler has the same problem.

Should I submit my PR with a test that fails because of this bug, or should I remove that test for now?
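For reference, a small check of what the upsampling methods are expected to return on the same two-point timedelta frame once the bug is fixed, assuming a recent pandas version: every method should produce one row per minute from 00:00:00 through 00:03:00 (4 rows), with `ffill`/`bfill` copying the neighbouring observation into the empty bins.

```python
import datetime as dt

import pandas as pd

df = pd.DataFrame(data=[1, 3], index=[dt.timedelta(), dt.timedelta(minutes=3)])

r = df.resample('1min')
ffilled = r.ffill()   # previous observation carried forward into empty bins
bfilled = r.bfill()   # next observation carried backward into empty bins
averaged = r.mean()   # empty bins become NaN

# With the fix in place, all three should cover 00:00 .. 00:03 (4 rows).
for name, result in [('ffill', ffilled), ('bfill', bfilled), ('mean', averaged)]:
    print(name, len(result), result[0].tolist())
```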

jreback (Contributor) commented Apr 22, 2016

#12928 fixed that. Make sure you are up to date.

benoit9126 (Contributor) commented Apr 24, 2016

Are you sure it really works?

In[2]: import pandas as pd
In[3]: pd.__version__
Out[3]: '0.18.0+142.g92b5322'

So this is the latest version (10 commits after #12928).

Here is an example with timedeltas:

In[4]: df = pd.DataFrame(data=[[1, 3], [-5, 10], [0, 0]],index=pd.timedelta_range('00:00:00', '00:10:00', freq='5T'))
In[5]: df
Out[5]: 
          0   1
00:00:00  1   3
00:05:00 -5  10
00:10:00  0   0
In[6]: df.resample('1T').mean()
Out[6]: 
            0     1
00:00:00  1.0   3.0
00:01:00  NaN   NaN
00:02:00  NaN   NaN
00:03:00  NaN   NaN
00:04:00  NaN   NaN
00:05:00 -5.0  10.0
00:06:00  NaN   NaN
00:07:00  NaN   NaN
00:08:00  NaN   NaN
00:09:00  NaN   NaN
00:10:00  0.0   0.0
In[7]: df.resample('1T').asfreq()
Out[7]: 
            0     1
00:00:00  1.0   3.0
00:01:00  NaN   NaN
00:02:00  NaN   NaN
00:03:00  NaN   NaN
00:04:00  NaN   NaN
00:05:00 -5.0  10.0
00:06:00  NaN   NaN
00:07:00  NaN   NaN
00:08:00  NaN   NaN
00:09:00  NaN   NaN
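A sketch of the check this output fails, assuming a pandas version where the timedelta upsampling fix is complete: `.mean()` and `.asfreq()` should produce the same 11-bin index, ending at 00:10:00. The frequency strings are written as `'5min'`/`'1min'`, the current spellings of `'5T'`/`'1T'`.

```python
import pandas as pd

# Three observations five minutes apart, as in the example above.
df = pd.DataFrame(data=[[1, 3], [-5, 10], [0, 0]],
                  index=pd.timedelta_range('00:00:00', '00:10:00', freq='5min'))

r = df.resample('1min')
via_mean = r.mean()
via_asfreq = r.asfreq()

# Both should span 00:00:00 .. 00:10:00 inclusive, i.e. 11 one-minute bins.
print(len(via_mean), len(via_asfreq))
print(via_asfreq.index[-1])
```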

And here is the example with datetimes:

In[8]: df = pd.DataFrame(data=[[1, 3],[-5,10]], index=pd.date_range('22/4/2016 00:00:00', '22/4/2016 00:05:00', freq='5T'))
In[9]: df
Out[9]: 
                     0   1
2016-04-22 00:00:00  1   3
2016-04-22 00:05:00 -5  10

In[11]: df.resample('1T').mean()
Out[11]: 
                       0     1
2016-04-22 00:00:00  1.0   3.0
2016-04-22 00:01:00  NaN   NaN
2016-04-22 00:02:00  NaN   NaN
2016-04-22 00:03:00  NaN   NaN
2016-04-22 00:04:00  NaN   NaN
2016-04-22 00:05:00 -5.0  10.0
In[12]: df.resample('1T').asfreq()
Out[12]: 
                       0     1
2016-04-22 00:00:00  1.0   3.0
2016-04-22 00:01:00  NaN   NaN
2016-04-22 00:02:00  NaN   NaN
2016-04-22 00:03:00  NaN   NaN
2016-04-22 00:04:00  NaN   NaN
2016-04-22 00:05:00 -5.0  10.0

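For timedeltas, .mean() and .asfreq() (and some other operations, such as .bfill()) still do not behave the same: .asfreq() stops at 00:09:00 and drops the final 00:10:00 bin, whereas with datetimes both methods return the same index.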
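Because the proposed `interpolate` relies on the same upsampling step (as noted earlier in the thread, it has the same problem as `.asfreq()`), the timedelta example is a natural test case for it. A sketch of the expected result, assuming a pandas version with the fix: linear interpolation over all 11 one-minute bins, with the hand-computed values noted in the comments.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=[[1, 3], [-5, 10], [0, 0]],
                  index=pd.timedelta_range('00:00:00', '00:10:00', freq='5min'))

# Linear interpolation between the original observations on a 1-minute grid.
interpolated = df.resample('1min').interpolate('linear')

# Column 0: 1 -> -5 in steps of -1.2, then -5 -> 0 in steps of +1.
# Column 1: 3 -> 10 in steps of +1.4, then 10 -> 0 in steps of -2.
print(interpolated)
```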

benoit9126 referenced this issue Apr 24, 2016 in pull request #12974, "ENH: add .resample(..).interpolate() #12925" (closed; 4 of 4 tasks complete).
jreback (Contributor) commented Apr 24, 2016

OK, so the solution is not complete. Seeing if I can fix it.

benoit9126 (Contributor) commented Apr 24, 2016

Thanks a lot.

jreback modified the milestones: 0.18.1, 0.18.2 Apr 25, 2016

jreback closed this in 8ab7d3a Apr 26, 2016

nps added a commit to nps/pandas that referenced this issue May 17, 2016

ENH: add .resample(..).interpolate() pandas-dev#12925
closes pandas-dev#12925

Author: Benoît Vinot <benoit.vinot@inria.fr>

Closes pandas-dev#12974 from benoit9126/bug_12925 and squashes the following commits:

b860b5b [Benoît Vinot] ENH resample().interpolate() pandas-dev#12925