
resample used for downsampling treats NaN as zero with sum #29382

@veager

Description


Problem description

When resampling to a coarser frequency (downsampling) and aggregating with sum, resample treats NaN (np.nan) as zero.
I have seen a similar question in Handling NaN during resampling #22065.

I tested the same series with .sum(), .sum(skipna=False), .apply(np.sum), and my own function .apply(lambda x: np.sum(x.values)). Only my own function produced the result I expected: NaN for every 2-minute bin that contains a NaN.
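All four results follow from how Series.sum handles missing values. skipna defaults to True, so NaN entries are dropped before summing. With the default min_count=0, a bin made up only of NaN sums to 0.0, which is where the "NaN as zero" impression comes from. Here is a minimal sketch on plain Series, with no resampling involved. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

mixed = pd.Series([4.0, np.nan])        # like the 00:04 bin: one value, one NaN
all_nan = pd.Series([np.nan, np.nan])   # a bin with no valid values at all

# Default skipna=True drops the NaN, so the mixed bin sums to 4.0.
mixed_default = mixed.sum()
# skipna=False propagates the NaN instead.
mixed_strict = mixed.sum(skipna=False)
# With the default min_count=0, an all-NaN input sums to 0.0.
all_nan_default = all_nan.sum()
# min_count=1 requires at least one non-NaN value, otherwise the result is NaN.
all_nan_min1 = all_nan.sum(min_count=1)

print(mixed_default, mixed_strict, all_nan_default, all_nan_min1)  # prints: 4.0 nan 0.0 nan
```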

My environment:
numpy: 1.16.5
pandas: 0.25.1

import numpy as np
import pandas as pd 
index = pd.date_range('1/1/2019', periods=12, freq='T')
series = pd.Series(range(12), index=index)
series[5:7] = np.nan
series
> 2019-01-01 00:00:00     0.0
> 2019-01-01 00:01:00     1.0
> 2019-01-01 00:02:00     2.0
> 2019-01-01 00:03:00     3.0
> 2019-01-01 00:04:00     4.0
> 2019-01-01 00:05:00     NaN
> 2019-01-01 00:06:00     NaN
> 2019-01-01 00:07:00     7.0
> 2019-01-01 00:08:00     8.0
> 2019-01-01 00:09:00     9.0
> 2019-01-01 00:10:00    10.0
> 2019-01-01 00:11:00    11.0
> Freq: T, dtype: float64

1. Using .sum()

series.resample('2T').sum()
> 2019-01-01 00:00:00     1.0
> 2019-01-01 00:02:00     5.0
> 2019-01-01 00:04:00     4.0
> 2019-01-01 00:06:00     7.0
> 2019-01-01 00:08:00    17.0
> 2019-01-01 00:10:00    21.0
> Freq: 2T, dtype: float64
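Resampler.sum does not take skipna, but it does take min_count: the minimum number of non-NaN values a bin needs before a sum is returned instead of NaN. Below is a minimal sketch on the same data. It is rebuilt with a float dtype, .iloc, and the '2min' alias, because pandas 2.2 and later deprecate 'T'. Every 2-minute bin here holds exactly two rows, so min_count=2 happens to yield NaN for the two bins that contain a NaN. That holds only for this particular data and is not a general substitute for skipna=False. When bins are only partly filled, the two settings behave differently.

```python
import numpy as np
import pandas as pd

index = pd.date_range("2019-01-01", periods=12, freq="min")
series = pd.Series(np.arange(12, dtype=float), index=index)
series.iloc[5:7] = np.nan  # same NaN positions as in the report (00:05, 00:06)

# Each 2-minute bin contains exactly two rows, so requiring two non-NaN
# values turns any bin with a NaN into NaN. This mirrors skipna=False
# only because every bin is full.
result = series.resample("2min").sum(min_count=2)
print(result)
```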

2. Using .sum(skipna=False) raises an error

series.resample('2T').sum(skipna=False)
> UnsupportedFunctionCall: numpy operations are not valid with resample. Use .resample(...).sum() instead
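Resampler.sum rejects the skipna keyword. One possible workaround, offered as a sketch rather than an officially documented recipe, is to call Series.sum(skipna=False) inside apply, so the keyword reaches each per-bin Series directly. The data matches the report, rebuilt with the '2min' alias.

```python
import numpy as np
import pandas as pd

index = pd.date_range("2019-01-01", periods=12, freq="min")
series = pd.Series(np.arange(12, dtype=float), index=index)
series.iloc[5:7] = np.nan  # same NaN positions as in the report

# apply hands each bin to the lambda as a Series, so skipna=False is
# honoured and any NaN in a bin propagates to that bin's result.
result = series.resample("2min").apply(lambda x: x.sum(skipna=False))
print(result)
```

This keeps the per-bin data as a Series, so pandas' own missing-value handling decides the result rather than a raw NumPy reduction.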

3. Using .apply(np.sum)

series.resample('2T').apply(np.sum)
> 2019-01-01 00:00:00     1.0
> 2019-01-01 00:02:00     5.0
> 2019-01-01 00:04:00     4.0
> 2019-01-01 00:06:00     7.0
> 2019-01-01 00:08:00    17.0
> 2019-01-01 00:10:00    21.0
> Freq: 2T, dtype: float64
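Case 3 matches case 1 because NumPy reductions called on a non-ndarray object delegate to that object's method of the same name. As a result, np.sum on a bin's Series ends up calling Series.sum, whose default skipna=True drops the NaN. Here is a small check on the contents of the 00:04 bin. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

bin_values = pd.Series([4.0, np.nan])  # contents of the 00:04 bin

# np.sum on a pandas object defers to the object's own .sum() method,
# so the NaN is skipped just as with Series.sum() (skipna=True by default).
via_numpy_on_series = np.sum(bin_values)
via_pandas = bin_values.sum()

print(via_numpy_on_series, via_pandas)  # prints: 4.0 4.0
```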

4. Using .apply(lambda x: np.sum(x.values))

series.resample('2T').apply(lambda x: np.sum(x.values))
> 2019-01-01 00:00:00     1.0
> 2019-01-01 00:02:00     5.0
> 2019-01-01 00:04:00     NaN
> 2019-01-01 00:06:00     NaN
> 2019-01-01 00:08:00    17.0
> 2019-01-01 00:10:00    21.0
> Freq: 2T, dtype: float64
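Case 4 behaves differently because x.values is a plain NumPy array, and np.sum on an ndarray propagates NaN. NumPy's np.nansum provides the skipping behaviour on arrays when that is what is wanted. Here is a small check on the 00:04 bin. The variable names are illustrative.

```python
import numpy as np

bin_array = np.array([4.0, np.nan])  # what x.values holds for the 00:04 bin

plain = np.sum(bin_array)        # NaN propagates through a raw ndarray sum
skipping = np.nansum(bin_array)  # nansum treats NaN as missing and skips it

print(plain, skipping)  # prints: nan 4.0
```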
