resample using for subsample will regards NaN as zeros with sum #29382

Open
veager opened this issue Nov 3, 2019 · 3 comments
Labels
Bug Resample resample method

Comments

@veager
veager commented Nov 3, 2019

Problem description

I found that when resample is used to downsample and aggregate, NaN (np.nan) values are treated as zero.
I have seen a similar question in Handling NaN during resampling #22065.

I tested the same series with .sum(), .sum(skipna=False), .apply(np.sum), and my own function .apply(lambda x: np.sum(x.values)). Only my own function gives the result I expect.

my environment
numpy: 1.16.5
pandas: 0.25.1

import numpy as np
import pandas as pd 
index = pd.date_range('1/1/2019', periods=12, freq='T')
series = pd.Series(range(12), index=index)
series[5:7] = np.nan
series
> 2019-01-01 00:00:00     0.0
> 2019-01-01 00:01:00     1.0
> 2019-01-01 00:02:00     2.0
> 2019-01-01 00:03:00     3.0
> 2019-01-01 00:04:00     4.0
> 2019-01-01 00:05:00     NaN
> 2019-01-01 00:06:00     NaN
> 2019-01-01 00:07:00     7.0
> 2019-01-01 00:08:00     8.0
> 2019-01-01 00:09:00     9.0
> 2019-01-01 00:10:00    10.0
> 2019-01-01 00:11:00    11.0
> Freq: T, dtype: float64

1 using .sum()

series.resample('2T').sum()
> 2019-01-01 00:00:00     1.0
> 2019-01-01 00:02:00     5.0
> 2019-01-01 00:04:00     4.0
> 2019-01-01 00:06:00     7.0
> 2019-01-01 00:08:00    17.0
> 2019-01-01 00:10:00    21.0
> Freq: 2T, dtype: float64

2 using .sum(skipna=False) will raise an error

series.resample('2T').sum(skipna=False)
> UnsupportedFunctionCall: numpy operations are not valid with resample. Use .resample(...).sum() instead

3 using .apply(np.sum)

series.resample('2T').apply(np.sum)
> 2019-01-01 00:00:00     1.0
> 2019-01-01 00:02:00     5.0
> 2019-01-01 00:04:00     4.0
> 2019-01-01 00:06:00     7.0
> 2019-01-01 00:08:00    17.0
> 2019-01-01 00:10:00    21.0
> Freq: 2T, dtype: float64

4 using .apply(lambda x: np.sum(x.values))

series.resample('2T').apply(lambda x: np.sum(x.values))
> 2019-01-01 00:00:00     1.0
> 2019-01-01 00:02:00     5.0
> 2019-01-01 00:04:00     NaN
> 2019-01-01 00:06:00     NaN
> 2019-01-01 00:08:00    17.0
> 2019-01-01 00:10:00    21.0
> Freq: 2T, dtype: float64
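
For readers looking for a faster route to the result in case 4, here is a minimal sketch of one possible workaround, not a fix for the behaviour reported above. It relies on the `min_count` argument of the resampler's `.sum()`, and assumes every bin is expected to hold exactly two rows, as in the example series. The frequency strings `'min'` and `'2min'` are used instead of `'T'` and `'2T'` only because newer pandas versions deprecate the `'T'` alias.

```python
import numpy as np
import pandas as pd

# Same data as in the report above ('min' is the newer spelling of 'T').
index = pd.date_range("2019-01-01", periods=12, freq="min")
series = pd.Series(range(12), index=index, dtype="float64")
series.iloc[5:7] = np.nan

# min_count=2: a bin's sum is NaN unless it has at least two non-NaN values.
# Assumption: every 2-minute bin is supposed to contain exactly two rows.
strict_sum = series.resample("2min").sum(min_count=2)
print(strict_sum)
```

With this setting the 00:04 and 00:06 bins come out as NaN instead of 4.0 and 7.0, while the other bins are unchanged. If bins can have different sizes, a fixed `min_count` is not enough and a per-bin completeness check is needed instead.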
@alimcmaster1 alimcmaster1 added the Resample resample method label Nov 3, 2019
@mroeschke mroeschke added the Bug label May 11, 2020
@mroeschke mroeschke changed the title resample using for subsample will regards NaN as zeros resample using for subsample will regards NaN as zeros with sum May 11, 2020
@aricooperdavis

Just an observation: your solution using .apply() takes far longer than the built-in .sum() 🤷‍♂️

@Steven-Livingstone

I can confirm that this bug is still present in pandas v1.3.4, and that it affects other aggregation functions such as .max and .mean when used in combination with .resample.
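
Since .max and .mean take no `min_count` argument, one way to get NaN-propagating results for them without falling back to a slow `.apply()` is to mask the built-in aggregate wherever a bin's non-NaN count differs from its size. This is only a sketch: `strict_resample` is a hypothetical helper name, not part of pandas, and the example series mirrors the one from the original report.

```python
import numpy as np
import pandas as pd


def strict_resample(series, rule, how):
    """Hypothetical helper: aggregate per bin, NaN for any bin containing a NaN."""
    resampler = series.resample(rule)
    result = resampler.agg(how)  # built-in aggregate, which skips NaN
    # count() counts non-NaN rows, size() counts all rows in each bin.
    complete = resampler.count() == resampler.size()
    return result.where(complete)  # keep only bins with no missing values


index = pd.date_range("2019-01-01", periods=12, freq="min")
series = pd.Series(range(12), index=index, dtype="float64")
series.iloc[5:7] = np.nan

strict_max = strict_resample(series, "2min", "max")
strict_mean = strict_resample(series, "2min", "mean")
print(strict_max)
print(strict_mean)
```

The mask is computed with vectorized resampler methods, so it avoids calling a Python function once per bin.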

@bheudorfer

I second Steven: the bug persists.
