Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN: enforce deprecation of interpolate with object dtype #57820

Conversation

natmokval
Copy link
Contributor

xref #53638

enforced deprecation of interpolate with object dtype

@natmokval natmokval marked this pull request as ready for review March 12, 2024 16:05
@natmokval natmokval added Clean Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels Mar 12, 2024
@natmokval natmokval requested a review from phofl March 12, 2024 16:07
@natmokval
Copy link
Contributor Author

I enforced deprecation of interpolate with object dtype. @phofl could you please take a look at this PR? I think CI failures are unrelated to my changes.

Comment on lines 7848 to 7850
if np.any(obj.dtypes == object):
# GH#53631
if not (obj.ndim == 2 and np.all(obj.dtypes == object)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you combine these conditionals? np.any and np.all overlap

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for the comment. I combined two if statement into one.

)
# GH#53631
if np.any(obj.dtypes == object) and (
obj.ndim == 1 or not np.all(obj.dtypes == object)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like part of this is already handled below (if obj.ndim == 2 and np.all(obj.dtypes == object)).

In instead nearby that check could you add a if obj.ndim == 1 and obj.dtype == object:?

Copy link
Contributor Author

@natmokval natmokval Mar 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. To avoid duplicate checks I changed blocks of if statements as you suggested.

Comment on lines 7854 to 7859
if np.all(obj.dtypes == object):
raise TypeError(
"Cannot interpolate with all object-dtype columns "
"in the DataFrame. Try setting at least one "
"column to a numeric dtype."
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can now simplify things and remove this check. The reason: we raise

TypeError(
    f"{type(self).__name__} cannot interpolate with object dtype."
)

in case np.any(obj.dtypes == object) below and there is no need to raise Type Error if np.all(obj.dtypes == object). Should I remove raising

if np.all(obj.dtypes == object):
    raise TypeError(
        "Cannot interpolate with all object-dtype columns "
        "in the DataFrame. Try setting at least one "
        "column to a numeric dtype."
    )

?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! Yeah try that out

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the messaging will need to probably change, this could be a Series

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I don't understand why is it a Series? I thought it's a DataFrame, because we are in the block
if obj.ndim == 2:

I suggest to use the message f"{type(self).__name__} cannot interpolate with object dtype.", the same one we use in the block below

elif np.any(obj.dtypes == object):
    raise TypeError(
        f"{type(self).__name__} cannot interpolate with object dtype."
    )

I think we can just remove the block with the old message:

if obj.ndim == 2:
    if np.all(obj.dtypes == object):
...

and change the old message in tests to the new one "DataFrame cannot interpolate with object dtype"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry I had misunderstood your initial suggestion.

You can also go a step further and replace All of these checks with

if np.any(obj.dtypes == object):
    raise TypeError(f"{type(self).__name__} cannot interpolate with object dtype.")

Since Series.dtypes and DataFrame.dtypes can be checked at the same time

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, my wording wasn't clear.

My idea was to remove the second part of the error message:

"Cannot interpolate with all object-dtype columns in the DataFrame. Try setting at least one column to a numeric dtype."

"Try setting at least one column to a numeric dtype" won't work from now on because of the enforcing in this PR. Then I thought why not to replace the first part: "Cannot interpolate with all object-dtype columns in the DataFrame." with a new msg: "... cannot interpolate with object dtype."
If we do this we can remove one check block.

I made changes. Could you please take a look at it?

@mroeschke mroeschke merged commit 7ea1c44 into pandas-dev:main Mar 15, 2024
46 of 47 checks passed
@mroeschke
Copy link
Member

Thanks @natmokval

@mroeschke mroeschke added this to the 3.0 milestone Mar 15, 2024
FutureWarning,
stacklevel=find_stack_level(),
)
raise TypeError(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest doing this check inside Block.interpolate

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's a TODO(3.0) comment related to this inside Block.interpolate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you, I found the comment. In #58083 I moved raising TypeError from NDFrame to Block.

I think maybe it's better to change the error message from "NumpyBlock cannot interpolate with object dtype." to "Can not interpolate with object dtype."?

pmhatre1 pushed a commit to pmhatre1/pandas-pmhatre1 that referenced this pull request May 7, 2024
…ev#57820)

* remove interpolate with object dtype

* enforce deprecation interpolate with object dtype, correct tests

* fix ruff error

* add a note to v3.0.0

* combine two conditions

* change blocks of if statements to avoid duplicate checks

* replace err msg containing 'Try setting at least one column to a numeric dtype'

* simplify if condition
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants