Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: raise when doing arithmetic on DataFrame and list of iterables #37132

Merged
merged 11 commits into from
Oct 16, 2020

Conversation

AlexKirko
Copy link
Member

Problem

We currently allow doing arithmetic on a DataFrame, and, for example, a list of Series. The latter gets broadcast with align_method_FRAME and this leads to weirdness.

Solution

We should raise. There is no definite answer to how such arithmetic should be handled, and the user should just make another DataFrame and add that to the frame instead of doing stuff like [Series, Series].

@jreback jreback added Numeric Operations Arithmetic, Comparison, and Logical operations Error Reporting Incorrect or improved errors from pandas labels Oct 15, 2020
@@ -494,6 +494,7 @@ Other
- Fixed metadata propagation in the :class:`Series.dt` and :class:`Series.str` accessors (:issue:`28283`)
- Bug in :meth:`Index.union` behaving differently depending on whether operand is a :class:`Index` or other list-like (:issue:`36384`)
- Passing an array with 2 or more dimensions to the :class:`Series` constructor now raises the more specific ``ValueError``, from a bare ``Exception`` previously (:issue:`35744`)
- Bug in :class:`DataFrame` allowing arithmetic operations with list of array-likes with undefined results. Behavior changed to raising ``ValueError`` (:issue:`36702`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put in Numeric section

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

pandas/tests/frame/test_arithmetic.py Show resolved Hide resolved
pandas/core/ops/__init__.py Show resolved Hide resolved
@jreback
Copy link
Contributor

jreback commented Oct 15, 2020

cc @jbrockmendel

@@ -1577,3 +1577,17 @@ def test_arith_reindex_with_duplicates():
result = df1 + df2
expected = pd.DataFrame([[np.nan, 0, 0]], columns=["first", "second", "second"])
tm.assert_frame_equal(result, expected)


@pytest.mark.parametrize("arg1", [pd.DataFrame({"x": [1, 2], "y": [1, 2]})])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if there's just one of these (and i think there only needs to be one), then just put it in the test function instead of parameter

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for spotting this! Moved the DataFrame inside the test.

@jreback jreback added this to the 1.2 milestone Oct 16, 2020
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment by @jbrockmendel ping on green

@AlexKirko
Copy link
Member Author

AlexKirko commented Oct 16, 2020

Ran indexing benchmarks three times, no stable regressions on my machine.
I'm running the benchmark suite on Ubuntu inside Windows with plenty of free CPU and memory, but I still saw different 2-5 regressions / improvements during the three runs that I did. Sorry, don't have a dedicated machine for benchmarking, and my setup isn't always stable (probably because of all the junk in Windows grabbing and releasing resources).
We should be fine, since no regression or improvement held across runs.

@AlexKirko
Copy link
Member Author

AlexKirko commented Oct 16, 2020

@jreback

Made the change suggested by @jbrockmendel , all green.

@jreback jreback merged commit 9140647 into pandas-dev:master Oct 16, 2020
@jreback
Copy link
Contributor

jreback commented Oct 16, 2020

thanks @AlexKirko nice!

@AlexKirko AlexKirko deleted the add-list-to-df-raise branch October 17, 2020 05:22
@vnmabus
Copy link

vnmabus commented Jul 10, 2021

Unfortunately this fix breaks test_arith_frame_with_scalar for my custom ExtensionArray. I have to say that my type is a bit "weird" in the sense that indexing it gives a object of the same type (like the string type in the standard library). This fix does not seem to like that very much.

@jbrockmendel
Copy link
Member

@vnmabus can you give a link or example?

@vnmabus
Copy link

vnmabus commented Jul 10, 2021

I have two extension arrays in https://github.com/GAA-UAM/scikit-fda, FDataBasis and FDataGrid. Both of them are used to represent a sample of functions (functional data). However, for simplicity, as I did not want to create a "just one function" object to represent a scalar value which won't be used much, I made __getitem__ with an integer key return an object of the same kind and length 1.

These extension arrays (mostly) work with Pandas, but this fix raises an exception, as they are "list like" and the elements (that is, the same class) are "array like". Yo can see the error here: https://github.com/GAA-UAM/scikit-fda/runs/3037343824.

@jbrockmendel
Copy link
Member

Haven't looked at this too closely, but I think maybe if you make FDataAFoo.__iter__ yield the scalars (functions) that might avoid this particular problem. That would mean an inconsistency between list(fdata) vs [fdata[i] for i in range(len(fdata))] though.

@vnmabus
Copy link

vnmabus commented Jul 11, 2021

The thing is that currently there isn't a separate class for the scalars. A scalar in my case is just an object of the same kind but with length one.

@jbrockmendel
Copy link
Member

Yah I don't see any obvious way to avoid inconsistencies. If this is a serious problem, I'd suggest opening a new issue for dedicated discussion

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Error Reporting Incorrect or improved errors from pandas Numeric Operations Arithmetic, Comparison, and Logical operations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Undocumented behavior change (1.0.5->1.1.0) when using arithmetic operations on dataframes
4 participants