Odd behaviour of resample+mean+interpolate on int64 series #16361

Closed
myyc opened this Issue May 15, 2017 · 4 comments

myyc commented May 15, 2017

this issue is present on the latest stable release (as well as on latest master at the time of writing). for frames with only int64 values, the following behaves strangely:

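import pandas as pd

# daily frame with a single int64 column
df = {"a": [1,3,1,4]}
df = pd.DataFrame(df, index=pd.date_range("2017-01-01", "2017-01-04"))

# these two are not the same
df.resample("H").mean()["a"].interpolate("cubic")  # bad: mean over the frame, then select the column
df.resample("H")["a"].mean().interpolate("cubic")  # good: select the column, then mean

# this works (cast to float64 before resampling)
df.astype("float64").resample("H").mean()["a"].interpolate("cubic")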

my workaround is more than enough for me but i figured i'd report it anyway...
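
The comparison above can be turned into a small self-contained check. This is only a sketch under a few assumptions that are not in the report: the helper name `resample_paths` is made up for illustration, the hourly frequency is spelled `"60min"` instead of `"H"` so the alias is accepted across pandas versions, and linear interpolation replaces `"cubic"` so SciPy is not required. On a pandas build without the bug, both orderings should give the same series.

```python
import pandas as pd


def resample_paths(freq="60min"):
    """Return the hourly mean of column "a" computed in both orders.

    Hypothetical helper for illustration; not part of pandas.
    """
    df = pd.DataFrame(
        {"a": [1, 3, 1, 4]},
        index=pd.date_range("2017-01-01", "2017-01-04"),
    )
    # Order reported as "bad": aggregate the whole frame, then pick the column.
    frame_first = df.resample(freq).mean()["a"]
    # Order reported as "good": pick the column, then aggregate.
    column_first = df.resample(freq)["a"].mean()
    return frame_first, column_first


frame_first, column_first = resample_paths()

# Linear interpolation stands in for "cubic" here (assumption: avoids SciPy).
filled_frame_first = frame_first.interpolate(method="linear")
filled_column_first = column_first.interpolate(method="linear")

print(filled_frame_first.equals(filled_column_first))
```

If the printed value is `False`, the two orderings disagree on that pandas version, which is the symptom described above.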

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 16.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_IE.UTF-8
LANG: en_IE.UTF-8
LOCALE: en_IE.UTF-8

pandas: 0.21.0.dev+31.g0ea0f25bf
pytest: 3.0.7
pip: 9.0.1
setuptools: 35.0.2
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 6.0.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

Contributor

chris-b1 commented May 15, 2017

Thanks for the report! It looks like in the first case the internal data structures are getting into an invalid state: the block is labelled IntBlock but holds float64 data.

In [106]: s1 = df.resample("H").mean()["a"]

In [107]: s1._data.blocks[0]
Out[107]: IntBlock: 73 dtype: float64

In [108]: s2 = df.resample("H")["a"].mean()

In [109]: s2._data.blocks[0]
Out[109]: FloatBlock: 73 dtype: float64
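
The inspection in the session above can be wrapped in a small helper. This is a rough sketch that relies on private pandas internals, so treat every detail as an assumption: the helper name `first_block_info` is hypothetical, the manager attribute is assumed to be `_mgr` on newer pandas and `_data` on older releases such as the one in this thread, and block class names differ between versions (`IntBlock` and `FloatBlock` are what the release discussed here reports). The frequency is written as `"60min"` rather than `"H"` so the alias is accepted across versions.

```python
import pandas as pd


def first_block_info(series):
    """Return (block class name, block dtype) for the first internal block.

    Hypothetical helper; it touches private attributes that may change
    or disappear between pandas versions.
    """
    # Assumption: newer pandas exposes the manager as `_mgr`,
    # older releases (as in this thread) as `_data`.
    mgr = getattr(series, "_mgr", None)
    if mgr is None:
        mgr = series._data
    block = mgr.blocks[0]
    return type(block).__name__, block.dtype


df = pd.DataFrame(
    {"a": [1, 3, 1, 4]},
    index=pd.date_range("2017-01-01", "2017-01-04"),
)

s1 = df.resample("60min").mean()["a"]   # frame mean, then column
s2 = df.resample("60min")["a"].mean()   # column, then mean

print(first_block_info(s1))
print(first_block_info(s2))
```

On an affected version the first pair would show an integer block class next to a float64 dtype, matching the mismatch in `Out[107]`.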

chris-b1 added this to the Next Major Release milestone May 15, 2017

jreback modified the milestones: 0.20.2, Next Major Release May 25, 2017

Contributor

TomAugspurger commented May 31, 2017

@jreback I think this is a blocker for 0.20.2? I can take a look tomorrow if you don't have time.

Contributor

jreback commented May 31, 2017

hmm did this work on 0.19.2?

Contributor

TomAugspurger commented May 31, 2017

I think so

In [6]: pd.__version__
Out[6]: '0.19.2'

In [7]: df = {"a": [1,3,1,4]}
   ...: df = pd.DataFrame(df, index=pd.date_range("2017-01-01", "2017-01-04"))
   ...:

In [8]: df.resample("H").mean()["a"]._data.blocks[0]
Out[8]: FloatBlock: 73 dtype: float64

jreback added a commit to jreback/pandas that referenced this issue May 31, 2017

jreback added a commit to jreback/pandas that referenced this issue May 31, 2017

jreback closed this in #16549 May 31, 2017

jreback added a commit that referenced this issue May 31, 2017

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Jun 1, 2017

TomAugspurger added a commit that referenced this issue Jun 4, 2017
