DataFrame.apply() raises ValueError when output df is different size than input df #17437

Closed
jennirinker opened this Issue Sep 5, 2017 · 4 comments

jennirinker commented Sep 5, 2017

MWE

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 2))  # dummy array
df1 = df.apply(np.fft.fft, axis=0)  # works
print(df1.shape)  # for testing
df2 = df.apply(np.fft.rfft, axis=0)  # breaks
print(df2.shape)  # for testing

Problem description

I would like to take a DataFrame of time series and apply the real-fft along the columns, but it seems that DataFrame.apply only works if the function to be applied returns output that is the same size as the input.

Expected Output

I expect the code block above to run without error and produce the following output:

(10, 2)
(6, 2)

Alternatively, raising an error that tells the user "output array size must match input array size" would be fine, if we want to restrict apply to functions that return arrays of the same size as their inputs.
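
As a possible workaround in the meantime (a minimal sketch, not a fix to apply itself), the real FFT can be computed on the underlying NumPy array along axis 0 and the result wrapped in a new DataFrame. The name `spectrum` is only illustrative, and the output rows get a fresh integer index because they are frequency bins rather than the original time samples.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 2))  # same dummy data as in the MWE

# np.fft.rfft takes an axis argument, so every column can be transformed
# at once without going through DataFrame.apply.
spectrum = pd.DataFrame(np.fft.rfft(df.values, axis=0), columns=df.columns)

# 10 real samples give 10 // 2 + 1 = 6 frequency bins per column.
print(spectrum.shape)
```

This keeps the column labels and sidesteps the shape inference that apply performs.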

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None

pandas: 0.20.2
pytest: 3.2.1
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.13.1
scipy: 0.19.0
xarray: 0.9.6
IPython: 6.1.0
sphinx: 1.6.2
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

jreback commented Sep 6, 2017

see commentary #15628

this is a thorny issue. someone would have to dig in and see what, if anything, could be done. .apply tries to infer the output shape and is not always successful.

@jreback jreback added this to the Next Major Release milestone Sep 6, 2017

jennirinker commented Sep 6, 2017

Yeah, I figured that this was non-trivial. I suppose the easiest fix would be to just update the documentation so that it explicitly states that the function passed to .apply must return a vector of the same length as the input. A slightly less trivial fix could be to apply the function to the first column and use the resulting size to initialize the new dataframe. But I'm not sure if this works with the underlying logic, nor am I certain how much overhead that might add to the function.
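
To make the "use the first column's result to size the output" idea concrete, here is a rough sketch written outside of pandas internals. The helper name `apply_columnwise` is hypothetical and is not part of pandas; it only illustrates the approach and says nothing about how .apply is actually implemented.

```python
import numpy as np
import pandas as pd


def apply_columnwise(df, func):
    """Hypothetical helper: apply func to each column and build a new frame.

    The length of the first column's result fixes the expected output
    length; any later column that disagrees raises a clear error.
    """
    results = {}
    expected_len = None
    for name in df.columns:
        out = np.asarray(func(df[name].values))
        if expected_len is None:
            expected_len = len(out)  # size inferred from the first column
        elif len(out) != expected_len:
            raise ValueError(
                "column %r produced %d values, expected %d"
                % (name, len(out), expected_len)
            )
        results[name] = out
    # The output gets a fresh integer index because its rows no longer
    # line up with the rows of the input.
    return pd.DataFrame(results, columns=df.columns)


df = pd.DataFrame(np.random.rand(10, 2))
print(apply_columnwise(df, np.fft.fft).shape)
print(apply_columnwise(df, np.fft.rfft).shape)
```

The extra cost here is one length check per column, though the overhead inside pandas itself could well be different.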

jreback commented Sep 6, 2017

no this needs some debugging. can you trace both cases and see where things go wrong? basically step thru.

jennirinker commented Sep 7, 2017

I can try, but it won't be for a few days at least.

jreback added a commit to jreback/pandas that referenced this issue Nov 30, 2017

@jreback jreback modified the milestones: Next Major Release, 0.22.0 Nov 30, 2017

jreback added a commit to jreback/pandas that referenced this issue Dec 2, 2017

jreback added a commit to jreback/pandas that referenced this issue Dec 3, 2017

jreback added a commit to jreback/pandas that referenced this issue Dec 7, 2017

jreback added a commit to jreback/pandas that referenced this issue Dec 10, 2017

jreback added a commit to jreback/pandas that referenced this issue Dec 14, 2017

jreback added a commit to jreback/pandas that referenced this issue Dec 21, 2017

jreback added a commit to jreback/pandas that referenced this issue Dec 23, 2017

jreback added a commit to jreback/pandas that referenced this issue Jan 6, 2018

API/BUG: .apply will correctly infer output shape when axis=1
closes #16353
closes #17348
closes #17437
closes #18573
closes #17970
closes #17892
closes #17602
closes #18775
closes #18901
closes #18919
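
For context on the axis=1 behaviour named in this commit message: later pandas releases document a `result_type` argument on DataFrame.apply (listed as added in 0.23.0) that controls how list-like return values are shaped when axis=1. The sketch below assumes such a release; the function `min_max_mean` is just an illustrative example, not something taken from the commit.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 2))


def min_max_mean(row):
    # Returns a plain list, so pandas has to decide how to shape the output.
    return [row.min(), row.max(), row.mean()]


# Default: each list is stored as a single element of a Series.
as_series = df.apply(min_max_mean, axis=1)

# result_type="expand": each list is spread across columns of a DataFrame.
expanded = df.apply(min_max_mean, axis=1, result_type="expand")

print(type(as_series).__name__, as_series.shape)
print(type(expanded).__name__, expanded.shape)
```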

jreback added a commit to jreback/pandas that referenced this issue Feb 5, 2018

jorisvandenbossche added a commit that referenced this issue Feb 7, 2018

harisbal pushed a commit to harisbal/pandas that referenced this issue Feb 28, 2018
