BUG: Dataframe arithmetic operators don't work with Series using fill_value #61828
@pytest.mark.parametrize("op", ["add", "sub", "mul", "div", "mod", "truediv", "pow"])
def test_df_series_fill_value(op):
I'll need to take a closer look at this, just because I'm really skeptical that the fix is this easy.
OK, I think the trouble is that in _maybe_align_series_as_frame we broadcast the 1D object to 2D for numpy dtypes, but not for EA dtypes. So can you add a test for non-numpy dtypes and see how it goes?
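To make the numpy/EA distinction concrete, here is a minimal sketch using only public APIs. It does not reproduce the internal _maybe_align_series_as_frame code path; the array contents and the (3, 2) target shape are made up for illustration.

```python
import numpy as np
import pandas as pd

# NumPy-backed values: a 1D array can be broadcast to 2D cheaply (no copy).
np_values = np.array([1.0, 2.0, 3.0])
np_2d = np.broadcast_to(np_values.reshape(-1, 1), (3, 2))
print(np_2d.shape)

# Extension array (nullable Int64): exposed to users as a 1D object,
# so it cannot simply be broadcast the same way.
ea_values = pd.array([1, 2, None], dtype="Int64")
print(ea_values.ndim)
```

The numpy case yields a 2D view that can be combined with the frame's values directly, while the EA case stays 1D, which is presumably why the two dtype families behave differently here.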
You're correct about the EAs. I'll update the test cases and then change the function to work with other EA types (mainly the ones that work with operators, int and float I believe).
I'm closing the PR for now until the additional fixes for EAs are deployed.
Reopened to talk about fixes for this specific issue before I get sidetracked by 1D operations again (ignore all the failed checks for now).
The appropriate fix is going to be in _maybe_align_series_as_frame.
So this was what I was working on locally, and had questions about. I was able to reshape EAs in _maybe_align_series_as_frame and am still working on various places to get the operation smoothed out. But I feel like this deviates from the original issue, which is only related to fill_value. As far as I can tell this is not related to that issue, so we should probably file it under another and mark the original closed for bookkeeping. I can add another test case which wouldn't require 2D EA operations for the dtype test. (There was originally a bunch of brain spew about issues I was currently having, but I'll organize it before reposting if needed.)
for i in range(5):
    df.iat[i, i] = np.nan
    df.iat[i + 1, i] = pd.NA
    df.iat[i + 4, i] = pd.NA
There's a lot going on here. Is this the simplest example you can come up with that hits the relevant issue?
I wanted two sequential NaNs and then one with a gap. Added the diagonal to make sure that no issues are caused by more complicated patterns. Used np.nan and pd.NA to test the two versions of NaNs. I can simplify it if you think any of these are unneeded.
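For readers trying to picture the pattern, a small sketch that only tracks which cells the loop touches, using a boolean mask instead of the real frame. The 9 x 5 shape is an assumption (the smallest shape the loop indices fit in); the actual test frame is not shown in this excerpt.

```python
import numpy as np

# Assumed shape: 9 rows x 5 columns, since i + 4 reaches row 8 when i == 4.
n_rows, n_cols = 9, 5
missing = np.zeros((n_rows, n_cols), dtype=bool)

# Same index pattern as the loop in the test, marking cells instead of
# assigning np.nan / pd.NA.
for i in range(n_cols):
    missing[i, i] = True
    missing[i + 1, i] = True
    missing[i + 4, i] = True

# Row positions of the missing cells in each column.
rows_per_column = [np.flatnonzero(missing[:, j]).tolist() for j in range(n_cols)]
print(rows_per_column)
```

Each column ends up with two adjacent missing rows followed by one more after a gap, shifted down by one row per column.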
@pytest.mark.parametrize("op", ["add", "sub", "mul", "div", "mod", "truediv", "pow"])
def test_df_fill_value_operations(op):
This test seems redundant given the next one.
Valid, I'll collapse it into the next one once I get it working. Sorry for the state of the PR, I wasn't expecting to open it so soon.
Just making sure, do you agree with splitting the 1D part off?
pandas/core/frame.py
Outdated
# We can losslessly+cheaply cast to ndarray
if rvalues.dtype in ("datetime64[ns]", "timedelta64[ns]") or isinstance(
    rvalues, (IntegerArray, FloatingArray)
):
    rvalues = np.asarray(rvalues)
This isn't it. Do you want me to tell you the answer or to figure it out yourself?
Sure, what were you thinking?
Something like:
if not isinstance(rvalues, (np.ndarray, DatetimeArray, TimedeltaArray)):
    if axis == 0:
        df = DataFrame({i: rvalues for i in range(other.shape[1])})
    else:
        nrows = other.shape[0]
        df = DataFrame({i: rvalues[[i]].repeat(nrows) for i in range(other.shape[1])}, dtype=rvalues.dtype)
    df.index = other.index
    df.columns = other.columns
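A standalone sketch of the same idea using only public pandas APIs may help when reading the suggestion. The helper name `broadcast_ea_to_frame` and the sample data are hypothetical and not part of pandas or this PR; the sketch only shows how a 1D extension array can be expanded column by column into a frame shaped like `other`.

```python
import pandas as pd


def broadcast_ea_to_frame(values, other, axis):
    """Hypothetical helper: expand a 1D extension array to the shape of `other`."""
    ncols = other.shape[1]
    if axis == 0:
        # Values line up with the rows, so every column is the full array.
        out = pd.DataFrame({i: values for i in range(ncols)})
    else:
        # Values line up with the columns, so column i repeats element i.
        nrows = other.shape[0]
        out = pd.DataFrame({i: values[[i]].repeat(nrows) for i in range(ncols)})
    out.index = other.index
    out.columns = other.columns
    return out


other = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]}, index=list("abc"))

by_rows = broadcast_ea_to_frame(pd.array([1, None, 3], dtype="Int64"), other, axis=0)
by_cols = broadcast_ea_to_frame(pd.array([10, None], dtype="Int64"), other, axis=1)
print(by_rows)
print(by_cols)
```

Building the frame one column at a time keeps each column as a 1D extension array, so no 2D EA support is needed.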
I see how it would be a better implementation than what I had. I've gone ahead and implemented it with some tweaks, like keeping the datetime64[ns] check, as it throws efficiency errors if removed. Thank you very much for taking the time to help me out, I'm still trying to get better so I can help out more.
It looks like the change might have inadvertently changed some behavior that I don't know if I should keep or not. It reverts the error message that is expected in the test_period_add_timestamp_raises test back to what it was pre-resolution-inference, according to your comment from a year ago. And it makes test_add_strings in test_string.py return a success, rather than the xfail that it was supposed to be. test_add_frame unfortunately still fails though, so I don't know if I should purposefully break it to keep the actions in line with each other. I read the linked issue but don't think there was a consensus (#28527).
What's the updated exception message for the period one? Fixing xfailed tests is a good thing.
Can you remove the xfail and let's see how the CI does?
doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
Removed a test which checked for an expected error to be raised and a corner case. Added a test case to test multiple operators with DataFrame x Series operations while using fill_value.
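For context on what fill_value is expected to do, here is a short sketch of the already supported DataFrame x DataFrame case, with made-up data. It deliberately avoids the DataFrame x Series call that this PR targets, since that behavior depends on the pandas version.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 4.0]})
right = pd.DataFrame({"a": [10.0, 20.0], "b": [np.nan, 40.0]})

# A value missing on only one side is replaced by fill_value before the
# operation; a cell missing on both sides stays missing.
result = left.add(right, fill_value=0)
print(result)
```

Here column "a" becomes 11.0 and 20.0, while column "b" keeps NaN in the first row (missing on both sides) and gives 44.0 in the second.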