DataFrame.apply with axis=1 returning (also erroring) different results when returning a list #17970

Closed
tdpetrou opened this Issue Oct 25, 2017 · 7 comments

@tdpetrou
Contributor

tdpetrou commented Oct 25, 2017

Code Sample, a copy-pastable example if possible

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(data=np.random.randint(0, 5, (5, 3)),
...                   columns=['a', 'b', 'c'])
>>> df
   a  b  c
0  4  0  0
1  2  0  1
2  2  2  2
3  1  2  2
4  3  0  0

>>> df.apply(lambda x: list(range(2)), axis=1)  # returns a Series
0    [0, 1]
1    [0, 1]
2    [0, 1]
3    [0, 1]
4    [0, 1]
dtype: object

>>> df.apply(lambda x: list(range(3)), axis=1) # returns a DataFrame
   a  b  c
0  0  1  2
1  0  1  2
2  0  1  2
3  0  1  2
4  0  1  2

>>> i = 0
>>> def f(x):
...     global i
...     if i == 0:
...         i += 1
...         return list(range(3))
...     return list(range(4))

>>> df.apply(f, axis=1) 
ValueError: Shape of passed values is (5, 4), indices imply (5, 3)

Problem description

There are three possible outcomes. When the length of the returned list equals the number of columns, a DataFrame is returned and each column gets the corresponding value from the list.

If the length of the returned list does not equal the number of columns, a Series of lists is returned.

If the returned list has as many elements as there are columns for the first row, but a different number of elements for at least one later row, a ValueError is raised.
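To make the three cases easier to compare, here is a toy model in plain Python. It only restates the description above for list return values; it is not pandas' actual implementation, and the helper name `predicted_outcome` and the assumption that the first row decides the path are mine.

```python
def predicted_outcome(n_columns, returned_lengths):
    """Toy restatement of the reported behaviour (hypothetical helper, not pandas code).

    n_columns: number of columns in the DataFrame.
    returned_lengths: length of the list returned for each row, in row order.
    """
    # Assumption: the first row's result decides which path is taken.
    if returned_lengths[0] != n_columns:
        return "Series of lists"
    # The first row matched the column count, so every later row must match too.
    if any(n != n_columns for n in returned_lengths):
        raise ValueError("a later row returned a list of a different length")
    return "DataFrame"


print(predicted_outcome(3, [2] * 5))  # the list(range(2)) example
print(predicted_outcome(3, [3] * 5))  # the list(range(3)) example
try:
    predicted_outcome(3, [3, 4, 4, 4, 4])  # the f(x) example
except ValueError as exc:
    print("ValueError:", exc)
```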

Expected Output

Need consistency. Probably should default to a Series of lists for all examples.
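Until apply behaves consistently, one way to always get a Series of lists is to build the Series by hand instead of letting apply guess the output shape. This is only a sketch of a workaround: the helper name `row_lists` is hypothetical, and the small frame stands in for the random one above.

```python
import pandas as pd

# Small fixed frame standing in for the random example above.
df = pd.DataFrame({"a": [4, 2, 2], "b": [0, 0, 2], "c": [0, 1, 2]})


def row_lists(frame, func):
    """Call func on every row and keep each result as one list-valued element."""
    values = [func(row) for _, row in frame.iterrows()]
    return pd.Series(values, index=frame.index)


# Same result type whether or not the list length matches the column count.
same_width = row_lists(df, lambda row: list(range(3)))
other_width = row_lists(df, lambda row: list(range(2)))
print(type(same_width).__name__, type(other_width).__name__)
```

Iterating with `iterrows` is slower than a vectorised operation, so this is meant for small frames or for checking results, not as a general replacement for apply.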

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.21.0rc1
pytest: 3.0.7
pip: 9.0.1
setuptools: 35.0.2
Cython: 0.25.2
numpy: 1.13.3
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 6.0.0
sphinx: 1.5.5
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.0
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.5.0

@jonathanrocher

jonathanrocher commented Nov 10, 2017

The problem is wider. I am running into the same bug when running the following:

>>> df = pd.DataFrame({"a": [1, 2, 3]})
>>> df.apply(lambda row: np.ones(1), axis=1)
     a
0  1.0
1  1.0
2  1.0
>>> df.apply(lambda row: np.ones(2), axis=1)
ValueError: Shape of passed values is (3, 2), indices imply (3, 1)

Related to #17437 (where there are some comments from @jreback )

@jreback

Contributor

jreback commented Nov 10, 2017

this is a duplicate of #17437 & #15628.

@jreback jreback closed this Nov 10, 2017

@jreback jreback added this to the No action milestone Nov 10, 2017

@tdpetrou

Contributor

tdpetrou commented Nov 10, 2017

@jreback Do the others cover the three possible outcomes? It's really bizarre behavior.

@jreback

Contributor

jreback commented Nov 10, 2017

@tdpetrou having lists as elements is the bizarre part. These are not in any way supported, so the apply behavior is really undefined. If you want to have a look, go right ahead. This is an edge case which requires apply to basically guess at user intentions.

jreback added a commit to jreback/pandas that referenced this issue Nov 30, 2017

@jreback jreback modified the milestones: No action, 0.22.0 Nov 30, 2017

jreback added a commit to jreback/pandas that referenced this issue Dec 2, 2017

jreback added a commit to jreback/pandas that referenced this issue Dec 3, 2017

jreback added a commit to jreback/pandas that referenced this issue Dec 7, 2017

jreback added a commit to jreback/pandas that referenced this issue Dec 10, 2017

jreback added a commit to jreback/pandas that referenced this issue Dec 14, 2017

jreback added a commit to jreback/pandas that referenced this issue Dec 21, 2017

jreback added a commit to jreback/pandas that referenced this issue Dec 23, 2017

jreback added a commit to jreback/pandas that referenced this issue Jan 6, 2018

API/BUG: .apply will correctly infer output shape when axis=1
closes #16353
closes #17348
closes #17437
closes #18573
closes #17970
closes #17892
closes #17602
closes #18775
closes #18901
closes #18919
@jorisvandenbossche

Member

jorisvandenbossche commented Jan 28, 2018

FYI, this will be fixed in #18577
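For readers on a later pandas release: as far as I know, `DataFrame.apply` gained a `result_type` argument (documented as new in 0.23.0), and a function returning a list with axis=1 gives a Series of lists by default. The sketch below assumes such a release and uses a small stand-in frame; it is not taken from the PR itself.

```python
import pandas as pd

# Small fixed frame standing in for the random example in the report.
df = pd.DataFrame({"a": [4, 2, 2], "b": [0, 0, 2], "c": [0, 1, 2]})

# Default: each returned list is kept as one element of a Series,
# even though its length matches the number of columns.
as_series = df.apply(lambda row: list(range(3)), axis=1)

# Opt in explicitly to spreading the list across columns.
as_frame = df.apply(lambda row: list(range(3)), axis=1, result_type="expand")

print(type(as_series).__name__)
print(type(as_frame).__name__, as_frame.shape)
```

With `result_type="expand"` the new columns are labelled by position (0, 1, 2) and not with the original column names.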

@tdpetrou

Contributor

tdpetrou commented Jan 28, 2018

@jorisvandenbossche Personally, I would disallow complex data structures as elements of a pandas DataFrame, especially if they are not supported.

@jorisvandenbossche

Member

jorisvandenbossche commented Jan 28, 2018

You can comment on the PR if you want. But changing that would be a big backwards compatibility break (much bigger than the current PR).
And they are in some way supported, just discouraged.

jreback added a commit to jreback/pandas that referenced this issue Feb 5, 2018

jorisvandenbossche added a commit that referenced this issue Feb 7, 2018

harisbal pushed a commit to harisbal/pandas that referenced this issue Feb 28, 2018
