resample().apply not returning multiple columns like groupby(pd.TimeGrouper()).apply #15169

Closed
tdpetrou opened this Issue Jan 19, 2017 · 5 comments

tdpetrou commented Jan 19, 2017

Code Sample, a copy-pastable example if possible

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: date = pd.date_range('1-1-2015', '12-31-15', freq='D')

In [4]: df = pd.DataFrame(data={'col1': np.random.rand(len(date))}, index=date)

In [5]: def calc(x):
   ...:     s = pd.Series([1, 2], index=['a', 'b'])
   ...:     return s
   ...:

In [6]: df.resample('M').apply(calc)
Out[6]:
            col1
2015-01-31   NaN
2015-02-28   NaN
2015-03-31   NaN
2015-04-30   NaN
2015-05-31   NaN
2015-06-30   NaN
2015-07-31   NaN
2015-08-31   NaN
2015-09-30   NaN
2015-10-31   NaN
2015-11-30   NaN
2015-12-31   NaN

In [7]: df.groupby(pd.TimeGrouper('M')).apply(calc)
Out[7]:
            a  b
2015-01-31  1  2
2015-02-28  1  2
2015-03-31  1  2
2015-04-30  1  2
2015-05-31  1  2
2015-06-30  1  2
2015-07-31  1  2
2015-08-31  1  2
2015-09-30  1  2
2015-10-31  1  2
2015-11-30  1  2
2015-12-31  1  2

Problem description

It is my understanding that resample with apply should work very similarly to groupby(pd.TimeGrouper) with apply. In a more complex example, I was trying to return many aggregated results calculated from several columns. It seems that resample with apply cannot return anything other than a Series indexed by the calling DataFrame's columns.
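Until resample().apply handles this case, the multi-column, multi-result aggregation described above can be done through groupby, the same way Out[7] does it. The sketch below is hypothetical: the columns price and qty and the function summarize are invented for illustration, since the actual function is not shown in the issue. It uses pd.Grouper(freq=...), the spelling that replaced pd.TimeGrouper once that was deprecated in pandas 0.21. It also passes pd.offsets.MonthEnd() instead of the 'M' string so that it does not depend on the month-end alias, which newer pandas versions spell 'ME'.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data with two columns (not the data from the issue).
date = pd.date_range('2015-01-01', '2015-12-31', freq='D')
df = pd.DataFrame(
    {'price': np.linspace(1.0, 2.0, len(date)),
     'qty': np.arange(1, len(date) + 1)},
    index=date,
)

def summarize(g):
    """Reduce one month of rows to several named results built from several columns."""
    total_qty = g['qty'].sum()
    return pd.Series({
        'total_qty': total_qty,
        # Quantity-weighted mean price: needs both columns at once.
        'vwap': (g['price'] * g['qty']).sum() / total_qty,
        'max_price': g['price'].max(),
    })

# Group the DatetimeIndex into calendar months; apply receives each month as a
# DataFrame, and the returned Series become the columns of the result.
monthly = df.groupby(pd.Grouper(freq=pd.offsets.MonthEnd())).apply(summarize)
print(monthly.head())
```

Because every call returns a Series with the same index, groupby.apply lays the results out as one row per month and one column per result, which is the shape the issue expects from resample().apply.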

Expected Output

The output should look exactly like the output of df.groupby(pd.TimeGrouper('M')).apply(calc).

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.2
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.24
numpy: 1.12.0
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: 1.3.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.6.1
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.7
lxml: 3.6.0
bs4: 4.4.1
html5lib: 0.9999999
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: 0.7.5.None
psycopg2: None
jinja2: 2.8
boto: 2.40.0
pandas_datareader: 0.2.1

jreback Jan 19, 2017

These should be the same. You're welcome to have a look.

jreback added this to the Next Major Release milestone Jan 19, 2017

tdpetrou Jan 20, 2017

@jreback Line 330 of tseries/resample.py has apply = aggregate, so the two are exactly the same thing: apply was never implemented on its own.
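To make the point about the alias concrete, here is a minimal sketch using a hypothetical ToyResampler class (not the pandas class). A class-level assignment such as apply = aggregate binds both names to the same function object, so calling either one runs the same code. The toy only illustrates what the alias means; it says nothing about what aggregate does internally.

```python
class ToyResampler:
    """Hypothetical stand-in used only to illustrate method aliasing."""

    def aggregate(self, func):
        return 'aggregate called with ' + func.__name__

    # Class-level alias: both attribute names refer to one function object.
    apply = aggregate


assert ToyResampler.apply is ToyResampler.aggregate
r = ToyResampler()
print(r.apply(len))  # prints "aggregate called with len"
```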

jreback Jan 20, 2017

You have to trace deeper; there is a lot of indirection.

discort Jul 7, 2017

@jreback
The source of the error is the call to the aggregate method inside core/resample.py:Resampler.apply; the apply logic is only used when aggregate fails.
In our example, however, aggregate does not fail and returns:

            col1
2015-01-31   NaN
2015-02-28   NaN
2015-03-31   NaN
2015-04-30   NaN
2015-05-31   NaN
2015-06-30   NaN
2015-07-31   NaN
2015-08-31   NaN
2015-09-30   NaN
2015-10-31   NaN
2015-11-30   NaN
2015-12-31   NaN

So the apply method is never called.

A possible solution would be to check whether the applied function is reducing, instead of calling aggregate directly.

What do you think?
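The control flow described in this comment, and the proposed change, can be sketched as a pure-Python toy. All names here (aggregate_path, apply_path, current_dispatch, proposed_dispatch) are hypothetical, groups are plain dicts of lists rather than pandas objects, and recording a non-scalar result as NaN is a simplification that mimics the symptom, not how pandas actually produces the NaN.

```python
import math


def aggregate_path(groups, func):
    # Toy "aggregate": assumes func reduces each group to a number. A non-scalar
    # result is not rejected; it is silently recorded as NaN (mimics the symptom).
    out = {}
    for label, values in groups.items():
        res = func(values)
        out[label] = res if isinstance(res, (int, float)) else math.nan
    return out


def apply_path(groups, func):
    # Toy "apply": keeps whatever func returns for each group.
    return {label: func(values) for label, values in groups.items()}


def current_dispatch(groups, func):
    # Current flow: apply is only a fallback for when aggregate raises, so an
    # all-NaN result that did not raise is returned as-is.
    try:
        return aggregate_path(groups, func)
    except Exception:
        return apply_path(groups, func)


def proposed_dispatch(groups, func):
    # Proposed flow: probe func on one group and route on whether it reduces.
    probe = func(next(iter(groups.values())))
    if isinstance(probe, (int, float)):
        return aggregate_path(groups, func)
    return apply_path(groups, func)


def calc(values):
    # Non-reducing, like calc in the issue: ignores its input, returns two labels.
    return {'a': 1, 'b': 2}


groups = {'2015-01-31': [0.2, 0.4], '2015-02-28': [0.6]}
print(current_dispatch(groups, calc))   # {'2015-01-31': nan, '2015-02-28': nan}
print(proposed_dispatch(groups, calc))  # each label maps to {'a': 1, 'b': 2}
print(proposed_dispatch(groups, sum))   # reducing function still aggregates
```

Probing one group costs an extra call to the user function and assumes every group returns the same kind of result; a real implementation would need to decide how to handle empty groups and functions with side effects.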

jreback Aug 30, 2017

Sure, you are welcome to propose that as a solution.

jreback modified the milestones: Next Major Release, 0.21.1 Oct 27, 2017
