BUG: reindex() and reindex_like() fill behavior is different in pandas 12.0 and 13.1? #6418

Closed
meelmaar opened this issue Feb 20, 2014 · 5 comments · Fixed by #6421
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate

@meelmaar

I just came across an issue that has caused me serious trouble since upgrading from pandas 12.0 to 13.1. It happens when using a fill method with reindex() or reindex_like(). Moreover, those methods no longer give consistent results! I have not tested whether this issue originates from a changed .ffill() or similar methods, but I see that it propagates to resample(). I could not find any recent mention of this strange behavior, and there are no hints in the docs or the What's New section.

This is the problem I encounter with pandas 12.0 (numpy 1.7.1, tested with both 32-bit Python 2.7.5 from Python(x,y) and 64-bit WinPython-64bit-2.7.4.1, on Windows 7) compared with pandas 13.1 (D:\PortableApps\WinPython-64bit-2.7.6.2, numpy 1.8.0). The pandas 12.0 behavior is the same in the 32-bit and 64-bit versions, so bitness cannot explain the problem.

Code:

import numpy as np
import pandas as pd
# Make low frequency timeseries (30-minute spacing) with one missing value:
i30 = pd.date_range('2002-02-02', periods=4, freq='30T')
s = pd.Series(np.arange(4.), index=i30)
s.iloc[2] = np.nan

# Upsample by factor 3 with reindex() and resample() methods:
i10 = pd.date_range(i30[0], i30[-1], freq='10T')
s10 = s.reindex(index=i10, method='bfill')
s10_2 = s.reindex(index=i10, method='bfill', limit=2)
r10 = s.resample('10Min', fill_method='bfill')
r10_2 = s.resample('10Min', fill_method='bfill', limit=2)

In pandas 12.0, s10, s10_2, r10 and r10_2 are all equal:

s10
Out[60]: 
2002-02-02 00:00:00     0
2002-02-02 00:10:00     1
2002-02-02 00:20:00     1
2002-02-02 00:30:00     1
2002-02-02 00:40:00   NaN
2002-02-02 00:50:00   NaN
2002-02-02 01:00:00   NaN
2002-02-02 01:10:00     3
2002-02-02 01:20:00     3
2002-02-02 01:30:00     3
Freq: 10T, dtype: float64

In pandas 13.1, s10 does not equal s10_2; s10 has all NaNs filled:

s10
Out[120]: 
2002-02-02 00:00:00    0
2002-02-02 00:10:00    1
2002-02-02 00:20:00    1
2002-02-02 00:30:00    1
2002-02-02 00:40:00    3
2002-02-02 00:50:00    3
2002-02-02 01:00:00    3
2002-02-02 01:10:00    3
2002-02-02 01:20:00    3
2002-02-02 01:30:00    3
Freq: 10T, dtype: float64

The same holds for the resampled series r10.
Conclusion: in pandas 13.1, all NaNs are filled when limit=None, which breaks with the pandas 12.0 behavior. I think the 12.0 behavior is more sensible: only fill the gaps created by upsampling.
This is even more important for the reindex_like method, because in pandas 13.1 the limit keyword there cannot restrict which gaps are filled:

s.reindex_like(s10, method='bfill', limit=2)
Out[121]: 
2002-02-02 00:00:00    0
2002-02-02 00:10:00    1
2002-02-02 00:20:00    1
2002-02-02 00:30:00    1
2002-02-02 00:40:00    3
2002-02-02 00:50:00    3
2002-02-02 01:00:00    3
2002-02-02 01:10:00    3
2002-02-02 01:20:00    3
2002-02-02 01:30:00    3
Freq: 10T, dtype: float64

I hope this is clear and can be reproduced, and that it can be fixed soon. Of course, if you can reproduce this behavior and it has indeed changed from 12.0 to 13.1, this should be in the docs.
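For comparison, here is a minimal sketch of the two fill semantics discussed above, assuming a recent pandas release (the aliases `30min`/`10min` stand in for the older `30T`/`10T`). `reindex(..., method='bfill')` fills each new label from the next *original* label, so the NaN already present in the data stays NaN, matching the 12.0 output. Reindexing first and then calling `.bfill()` fills every NaN, matching the 13.1 output. The name `filled_all` is illustrative only, and this shows current pandas behavior, not necessarily that of any 0.13.x build.

```python
import numpy as np
import pandas as pd

# Same data as in the report: 30-minute series with a missing value at 01:00.
i30 = pd.date_range("2002-02-02", periods=4, freq="30min")
s = pd.Series(np.arange(4.0), index=i30)
s.iloc[2] = np.nan

# Target index: upsample by a factor of 3 (10-minute spacing).
i10 = pd.date_range(i30[0], i30[-1], freq="10min")

# Fill while reindexing: 00:40 through 01:00 stay NaN because the next
# original label (01:00) holds NaN; only the new gaps are filled.
s10 = s.reindex(i10, method="bfill")
s10_2 = s.reindex(i10, method="bfill", limit=2)

# Reindex first, then fill: every NaN is back-filled, including the original one.
filled_all = s.reindex(i10).bfill()

print(pd.DataFrame({"s10": s10, "s10_2": s10_2, "filled_all": filled_all}))
```

With `limit=2` every upsampling gap (two new labels each) is still fully covered, so `s10` and `s10_2` agree, as they did in 12.0.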

@jreback
Contributor

jreback commented Feb 20, 2014

reindex_like wasn't passing the limit keyword through to reindex, so that is already fixed. I will have to look at the rest.
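Since `reindex_like(other, ...)` conforms a series to `other`'s index, a quick consistency check, assuming a recent pandas release, is that it returns the same result as `reindex(other.index, ...)` with the same `method` and `limit`. `target` is a hypothetical stand-in for `s10` from the report; `limit=1` is chosen so that the limit visibly changes the result (each upsampling gap has two new labels).

```python
import numpy as np
import pandas as pd

i30 = pd.date_range("2002-02-02", periods=4, freq="30min")
s = pd.Series([0.0, 1.0, np.nan, 3.0], index=i30)

# Hypothetical target whose 10-minute index we want to conform to.
target = pd.Series(0.0, index=pd.date_range(i30[0], i30[-1], freq="10min"))

# limit should be forwarded: both calls ought to give identical results.
via_like = s.reindex_like(target, method="bfill", limit=1)
via_reindex = s.reindex(target.index, method="bfill", limit=1)

# Without a limit, every new label in each gap gets filled.
unlimited = s.reindex_like(target, method="bfill")

print(via_like.equals(via_reindex), via_like.equals(unlimited))
```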

@jreback
Contributor

jreback commented Feb 20, 2014

@meelmaar all fixed up. I had been trying to fix a weird case, and that change 'caused' this.

@jreback
Contributor

jreback commented Feb 20, 2014

@meelmaar thanks for reporting! You can give master a try if you'd like.
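When checking a build, resample is worth testing too. The `fill_method=` keyword used in the report is not accepted by current pandas; there, upsampling is written as `resample(...).bfill(limit=...)`. A minimal check, assuming a recent pandas release, that upsampling with `bfill` leaves the original NaN in place, just like `reindex(method='bfill')`. The name `expected` is illustrative only.

```python
import numpy as np
import pandas as pd

i30 = pd.date_range("2002-02-02", periods=4, freq="30min")
s = pd.Series([0.0, 1.0, np.nan, 3.0], index=i30)
i10 = pd.date_range(i30[0], i30[-1], freq="10min")

# Upsample 30min -> 10min, back-filling the newly created slots.
r10 = s.resample("10min").bfill()
r10_2 = s.resample("10min").bfill(limit=2)

# Both should match reindexing with a fill method on the 10-minute index.
expected = s.reindex(i10, method="bfill")
print(r10.equals(expected), r10_2.equals(expected))
```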

@meelmaar
Author

Wow, that was fast! Thanks!
I might try master, but it will take a bit...

@jreback
Copy link
Contributor

jreback commented Feb 20, 2014

FYI, I post Windows binaries here: http://pandas.pydata.org/pandas-build/dev/ (they will be updated in the next day or two with the latest commits).
