resample using for subsample will regards NaN as zeros with sum #29382

veager · 2019-11-03T12:53:41Z

Problem description

I found that when resample is used to downsample and aggregate with sum, NaN (np.nan) values are treated as zero.
I have seen a similar question in "Handling NaN during resampling" #22065.

I tested .sum(), .sum(skipna=False), .apply(np.sum), and my own function .apply(lambda x: np.sum(x.values)) on the same series. Only my function produced the result I expected.

My environment:
numpy: 1.16.5
pandas: 0.25.1

import numpy as np
import pandas as pd 
index = pd.date_range('1/1/2019', periods=12, freq='T')
series = pd.Series(range(12), index=index)
series[5:7] = np.nan

series
> 2019-01-01 00:00:00     0.0
> 2019-01-01 00:01:00     1.0
> 2019-01-01 00:02:00     2.0
> 2019-01-01 00:03:00     3.0
> 2019-01-01 00:04:00     4.0
> 2019-01-01 00:05:00     NaN
> 2019-01-01 00:06:00     NaN
> 2019-01-01 00:07:00     7.0
> 2019-01-01 00:08:00     8.0
> 2019-01-01 00:09:00     9.0
> 2019-01-01 00:10:00    10.0
> 2019-01-01 00:11:00    11.0
> Freq: T, dtype: float64

1. Using .sum()

series.resample('2T').sum()
> 2019-01-01 00:00:00     1.0
> 2019-01-01 00:02:00     5.0
> 2019-01-01 00:04:00     4.0
> 2019-01-01 00:06:00     7.0
> 2019-01-01 00:08:00    17.0
> 2019-01-01 00:10:00    21.0
> Freq: 2T, dtype: float64
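For context: pandas reductions skip missing values by default (skipna=True), so the 00:04 bin sums only 4.0, and a bin that is entirely NaN sums to 0.0 because the default min_count is 0. The minimal sketch below, using a small hypothetical series, shows how the min_count parameter of Resampler.sum turns all-NaN bins into NaN. Note that it does not make partially-NaN bins like 00:04 return NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical 4-minute series: the second 2-minute bin is entirely NaN.
index = pd.date_range('2019-01-01', periods=4, freq='min')
s = pd.Series([1.0, np.nan, np.nan, np.nan], index=index)

r = s.resample('2min')

# Default: NaN values are skipped, and an all-NaN bin sums to 0.0.
default_sum = r.sum()

# min_count=1: a bin needs at least one non-NaN value, otherwise the result is NaN.
strict_empty = r.sum(min_count=1)

print(default_sum)
print(strict_empty)
```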

2. Using .sum(skipna=False) raises an error

series.resample('2T').sum(skipna=False)
>UnsupportedFunctionCall: numpy operations are not valid with resample. Use .resample(...).sum() instead
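In the reported version, Resampler.sum() does not accept a skipna argument, and the extra keyword is rejected with the UnsupportedFunctionCall shown above. One per-bin workaround passes a function to .agg() so that each bin's own Series.sum is called with skipna=False, which propagates NaN the same way case 4 does. This is a minimal sketch that rebuilds the same 12-minute series (as floats, using the '2min' alias):

```python
import numpy as np
import pandas as pd

# Same data as the report: 12 one-minute values with NaN at positions 5 and 6.
index = pd.date_range('2019-01-01', periods=12, freq='min')
series = pd.Series(np.arange(12, dtype=float), index=index)
series.iloc[5:7] = np.nan

# Call Series.sum(skipna=False) on each 2-minute bin, so that any NaN in a bin
# makes that bin's total NaN.
strict = series.resample('2min').agg(lambda x: x.sum(skipna=False))
print(strict)
```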

3. Using .apply(np.sum)

series.resample('2T').apply(np.sum)
> 2019-01-01 00:00:00     1.0
> 2019-01-01 00:02:00     5.0
> 2019-01-01 00:04:00     4.0
> 2019-01-01 00:06:00     7.0
> 2019-01-01 00:08:00    17.0
> 2019-01-01 00:10:00    21.0
> Freq: 2T, dtype: float64

4. Using .apply(lambda x: np.sum(x.values))

series.resample('2T').apply(lambda x: np.sum(x.values))
> 2019-01-01 00:00:00     1.0
> 2019-01-01 00:02:00     5.0
> 2019-01-01 00:04:00     NaN
> 2019-01-01 00:06:00     NaN
> 2019-01-01 00:08:00    17.0
> 2019-01-01 00:10:00    21.0
> Freq: 2T, dtype: float64
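Cases 3 and 4 differ because of what np.sum receives. Given a pandas Series, np.sum defers to the object's own .sum() method, which skips NaN by default. Given the underlying NumPy array (x.values), it performs a plain NumPy reduction, which propagates NaN. Here is a minimal check on a hypothetical two-element bin:

```python
import numpy as np
import pandas as pd

# A hypothetical bin like 00:04 in the report: one value and one NaN.
bin_values = pd.Series([4.0, np.nan])

# np.sum on a Series dispatches to Series.sum, which skips NaN (skipna=True).
from_series = np.sum(bin_values)

# np.sum on the raw ndarray is a plain NumPy reduction, so NaN propagates.
from_array = np.sum(bin_values.to_numpy())

print(from_series, from_array)
```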


aricooperdavis · 2021-08-19T10:22:00Z

Just an observation: your solution using .apply() takes far longer than the built-in .sum() 🤷‍♂️

Steven-Livingstone · 2022-01-27T03:53:19Z

I can confirm that this bug is still present in pandas v1.3.4, and it affects other aggregation functions such as .max and .mean when used in combination with .resample.
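.max() and .mean() also skip NaN by default. Because the per-bin .apply() approach was reported above to be slow, one vectorized alternative is to compute the usual built-in aggregate and then mask every bin whose non-NaN count (Resampler.count) is smaller than its size (Resampler.size). This is a sketch only, and strict_resample is a hypothetical helper name, not a pandas API:

```python
import numpy as np
import pandas as pd

def strict_resample(series, rule, how):
    """Hypothetical helper: aggregate with `how` ('sum', 'mean', 'max', ...),
    but return NaN for any bin that contains at least one NaN."""
    r = series.resample(rule)
    result = getattr(r, how)()          # built-in aggregation (skips NaN)
    complete = r.count() == r.size()    # True where the bin has no NaN
    return result.where(complete)       # NaN for bins with a missing value

index = pd.date_range('2019-01-01', periods=12, freq='min')
series = pd.Series(np.arange(12, dtype=float), index=index)
series.iloc[5:7] = np.nan

strict_sum = strict_resample(series, '2min', 'sum')
strict_max = strict_resample(series, '2min', 'max')
strict_mean = strict_resample(series, '2min', 'mean')
print(strict_sum)
```

Bins with no rows at all (possible with irregular data) have count equal to size, so they keep the built-in result.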

bheudorfer · 2023-12-05T13:07:44Z

I second Steven; the bug persists.

alimcmaster1 added the "Resample" label (resample method) on Nov 3, 2019

mroeschke added the "Bug" label on May 11, 2020

mroeschke changed the title from "resample using for subsample will regards NaN as zeros" to "resample using for subsample will regards NaN as zeros with sum" on May 11, 2020
