BUG: DataFrame arithmetic operators don't work with Series using fill_value #61828

Open: wants to merge 14 commits into base: main

eicchen
Contributor

@eicchen eicchen commented Jul 10, 2025

Removed a test that checked that the expected error was raised, as well as a corner case. Added a test case covering multiple operators for DataFrame x Series operations with fill_value.
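As a rough illustration of the behaviour the new test targets (a sketch, not code from the PR): assuming the Series operand should behave as if it had been broadcast across the frame, the expected result can be built with a DataFrame x DataFrame flex op, where fill_value is already supported. The frame, the Series values, and the name `ser_as_frame` are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})
ser = pd.Series([10.0, np.nan, 30.0])  # aligned with df.index (axis=0)

# Broadcast the Series to the frame's shape by hand: one copy per column.
ser_as_frame = pd.DataFrame({col: ser for col in df.columns})

results = {}
for op in ["add", "sub", "mul", "truediv", "pow"]:
    # With fill_value, a value missing on exactly one side is replaced before
    # the operation; where both sides are missing, the result stays missing.
    results[op] = getattr(df, op)(ser_as_frame, fill_value=1.0)

print(results["mul"])
```

In the "mul" result, column "b" has no missing values because each missing operand has a present counterpart, while row 1 of column "a" stays missing because both operands are missing there.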



@pytest.mark.parametrize("op", ["add", "sub", "mul", "div", "mod", "truediv", "pow"])
def test_df_series_fill_value(op):

Member

I'll need to take a closer look at this, just because I'm really skeptical that the fix is this easy.

Member

OK, I think the trouble is that in _maybe_align_series_as_frame we broadcast the 1D object to 2D for numpy dtypes, but not for EA dtypes. So can you add a test for non-numpy dtypes and see how it goes?
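The difference described here can be sketched outside pandas internals. This is only an illustration of the idea, not the pandas source: a NumPy-backed 1D array can be reshaped and broadcast to the frame's shape, while a masked extension array such as Int64 is one-dimensional. The shape and values below are made up.

```python
import numpy as np
import pandas as pd

frame_shape = (3, 2)  # hypothetical frame: 3 rows, 2 columns

# NumPy-backed values: view the 1D array as a column and broadcast it.
np_values = pd.Series([1.0, 2.0, 3.0]).to_numpy()
as_column = np_values.reshape(-1, 1)  # shape (3, 1), aligned with the rows
broadcast = np.broadcast_to(as_column, frame_shape)
print(broadcast.shape)  # (3, 2)

# Extension-array values: a masked Int64 array is not an ndarray and is 1D.
ea_values = pd.Series([1, None, 3], dtype="Int64").array
print(type(ea_values).__name__, ea_values.ndim)
```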

Contributor Author

You're correct about the EAs. I'll update the test cases and then change the function to work with other EA types (mainly the ones that work with operators, int and float I believe).

@eicchen
Contributor Author

eicchen commented Jul 15, 2025

I'm closing the PR for now until the additional fixes for EAs are deployed.

@eicchen eicchen closed this Jul 15, 2025
@eicchen eicchen reopened this Aug 19, 2025
@eicchen
Contributor Author

eicchen commented Aug 19, 2025

Reopened to talk about fixes for this specific issue before I get sidetracked by 1D operations again (ignore all the failed checks for now)

@jbrockmendel
Member

The appropriate fix is going to be in _maybe_align_series_as_frame

@eicchen
Contributor Author

eicchen commented Aug 20, 2025

> The appropriate fix is going to be in _maybe_align_series_as_frame

So this is what I was working on locally and had questions about. I was able to reshape EAs in _maybe_align_series_as_frame and am still working through various places to get the operation running smoothly. But I feel this work deviates from the original issue, which is only about fill_value. As far as I can tell the two are unrelated, so we should probably file the EA work under a separate issue and mark the original closed for bookkeeping.

I can add another test case which wouldn't require 2D EA operations for the dtype test.

(There was originally a bunch of brain spew about issues I was currently having, but I'll organize it before reposting if needed.)

for i in range(5):
    df.iat[i, i] = np.nan
    df.iat[i + 1, i] = pd.NA
    df.iat[i + 4, i] = pd.NA

Member

there's a lot going on here. is this the simplest example you can come up with that hits the relevant issue?

Contributor Author

I wanted two sequential NaNs and then one with a gap. I added the diagonal to make sure that no issues are caused by more complicated patterns, and used np.nan and pd.NA to test the two versions of NaNs. I can simplify it if you think any of these are unneeded.
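For readers less familiar with the two missing-value sentinels mentioned here, a small standalone sketch (not taken from the PR's tests) shows how they are alike and how they differ:

```python
import numpy as np
import pandas as pd

# Both sentinels are reported as missing by pandas...
print(pd.isna(np.nan), pd.isna(pd.NA))  # True True

# ...but they are different objects with different comparison semantics.
print(np.nan is pd.NA)            # False
print(np.nan == np.nan)           # False: NaN compares unequal to itself
print((pd.NA == pd.NA) is pd.NA)  # True: comparisons with NA propagate NA

# In a nullable (masked) column, a missing entry is returned as pd.NA.
col = pd.array([1, None, 3], dtype="Int64")
print(col[1] is pd.NA)  # True
```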



@pytest.mark.parametrize("op", ["add", "sub", "mul", "div", "mod", "truediv", "pow"])
def test_df_fill_value_operations(op):

Member

this test seems redundant given the next one

Contributor Author

Valid, I'll collapse it into the next one once I get it working. Sorry for the state of the PR, I wasn't expecting to open it so soon.

@eicchen
Contributor Author

eicchen commented Aug 20, 2025

Just making sure, do you agree with splitting the 1D part off?

# We can losslessly+cheaply cast to ndarray
if rvalues.dtype in ("datetime64[ns]", "timedelta64[ns]") or isinstance(
    rvalues, (IntegerArray, FloatingArray)
):
    rvalues = np.asarray(rvalues)

Member

this isn't it. do you want me to tell you the answer or to figure it out yourself?

Contributor Author

Sure, what were you thinking?

Member

something like:

if not isinstance(rvalues, (np.ndarray, DatetimeArray, TimedeltaArray)):
    if axis == 0:
        df = DataFrame({i: rvalues for i in range(other.shape[1])})
    else:
        nrows = other.shape[0]
        df = DataFrame(
            {i: rvalues[[i]].repeat(nrows) for i in range(other.shape[1])},
            dtype=rvalues.dtype,
        )
    df.index = other.index
    df.columns = other.columns
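A self-contained version of the sketch above might look like the following. The helper name `broadcast_series_as_frame`, its signature, and the sample data are hypothetical and only for illustration; in the PR the logic would sit inside _maybe_align_series_as_frame, and the Series is assumed to be already aligned with the relevant axis of the frame.

```python
import pandas as pd


def broadcast_series_as_frame(frame, series, axis):
    """Expand an extension-array-backed Series into a frame shaped like `frame`.

    Hypothetical helper: assumes `series` is already aligned with
    frame.index (axis=0) or frame.columns (axis=1).
    """
    values = series.array
    ncols = frame.shape[1]
    if axis == 0:
        # One value per row: every column is a copy of the whole array.
        data = {i: values for i in range(ncols)}
    else:
        # One value per column: repeat the i-th value down the rows.
        nrows = frame.shape[0]
        data = {i: values[[i]].repeat(nrows) for i in range(ncols)}
    out = pd.DataFrame(data)
    out.index = frame.index
    out.columns = frame.columns
    return out


frame = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
per_column = pd.Series(pd.array([10, None], dtype="Int64"), index=["a", "b"])
out = broadcast_series_as_frame(frame, per_column, axis=1)
print(out)
print(out.dtypes)
```

Building the result column by column keeps each column as a 1D extension array, so no 2D extension-array support is needed.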

Contributor Author

I see how it would be a better implementation than what I had. I've gone ahead and implemented it with some tweaks, like keeping the datetime64[ns] check, as it throws efficiency errors if removed. Thank you very much for taking the time to help me out; I'm still trying to get better so I can help out more.
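As background on the "losslessly+cheaply cast to ndarray" comment in the earlier snippet, here is a small sketch (my own illustration, not from the PR) of how np.asarray treats the two kinds of values. The sample data is made up, and the exact dtype of the masked-integer result is left unasserted because it may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# datetime64 values: NaT is representable in a datetime64 ndarray, so the
# cast keeps a datetime64 dtype and every value.
dt_values = pd.Series(pd.to_datetime(["2025-01-01", None])).array
dt_ndarray = np.asarray(dt_values)
print(dt_ndarray.dtype.kind)  # "M", i.e. a datetime64 dtype

# Masked integers with a missing value: an int64 ndarray cannot hold pd.NA,
# so the cast cannot keep the integer dtype.
int_values = pd.array([1, None, 3], dtype="Int64")
int_ndarray = np.asarray(int_values)
print(int_ndarray.dtype)
```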

@eicchen
Contributor Author

eicchen commented Aug 21, 2025

It looks like the change might have inadvertently changed some behavior that I don't know if I should keep or not.

It reverts the error message that is expected in the test_period_add_timestamp_raises test back to what it was pre-resolution-inference according to your comment from a year ago.

It also makes test_add_strings in test_string.py pass, rather than xfail as it was supposed to. test_add_frame unfortunately still fails, though, so I don't know if I should purposefully break it to keep the behavior of the two in line with each other. I read the linked issue (#28527) but don't think there was a consensus.

@jbrockmendel
Member

What's the updated exception message for the period one?

Fixing xfailed tests is a good thing.

@eicchen
Contributor Author

eicchen commented Aug 21, 2025

It is now "cannot add PeriodArray and DatetimeArray", which is in line with what it is for everything else.

Here's the code snippet I modified:
[image: screenshot of the modified code snippet]

However, contrary to my earlier statement, it looks like add_to_frame doesn’t consistently pass as xfail on the pipeline: some jobs fail while others don’t. It works as expected locally, so I’m not sure how best to debug this properly. Do you have any advice?

@jbrockmendel
Member

Can you remove the xfail so we can see how the CI does?

Development

Successfully merging this pull request may close these issues.

BUG: pd.DataFrame.mul has not support fill_value?
2 participants