Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

properly handle pandas 2.1.4 deprecation warning #203

Closed
wants to merge 1 commit into from

Conversation

EpigeneMax
Copy link
Contributor

Following additional comments in #198, this PR improves on #199.

@codecov-commenter
Copy link

codecov-commenter commented Jan 3, 2024

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (f43e282) 98.37% compared to head (cc44908) 98.37%.

Files Patch % Lines
patsy/util.py 0.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master     #203   +/-   ##
=======================================
  Coverage   98.37%   98.37%           
=======================================
  Files          30       30           
  Lines        3140     3140           
  Branches      695      695           
=======================================
  Hits         3089     3089           
  Misses         26       26           
  Partials       25       25           
Flag Coverage Δ
unittests 97.96% <0.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@lesteve
Copy link

lesteve commented Jan 3, 2024

Thanks for the PR, I'll try to check whether that gets rid of the warnings. In the mean time I have a few comments.

It would seem more straightforward to first try isinstance(..., pd.CategoricalDtype) since this is the recommended way since pandas 2.1.0 and I am guessing that it works for older pandas versions as well. Having said that I did not manage to find the relevant info about which pandas version supports it.

This does seem cleaner than trying to catch a warning with a deprecated code path and the warnings catching logic seems a bit brittle. Why catching the last warning rather than any warnings if there are more than one, although this is probably edge-casy.

Also when using the warnings catching logic, when pd.types.api.is_categorical_dtype gets removed from pandas you once again need to fix in patsy which does not seem great.

To finish with I don't think the code logic is correct. When you catch the warning, you set _pandas_is_categorical_dtype = None. I believe it was correct (although a bit convoluted) in your original PR #199 because you added a if _pandas_is_categorical_dtype is None afterwards, but a subsequent commit 01c6b70 by @matthewwardrop broke it by removing the if clause.

All in all I do think that in 0.5.5 patsy.util.safe_is_categorical_pandas is currently broken with pandas 2.1.0 to 2.1.3:

import pandas as pd
import patsy
print(f"{pd.__version__=}")
import patsy
print(f"{patsy.__version__=}")

categorical_dtype = pd.CategoricalDtype(['a', 'b'])
print(patsy.util.safe_is_pandas_categorical_dtype(categorical_dtype))

Output:

pd.__version__='2.1.3'
patsy.__version__='0.5.5'
Traceback (most recent call last):
  File "/tmp/test.py", line 8, in <module>
    print(patsy.util.safe_is_pandas_categorical_dtype(categorical_dtype))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lesteve/micromamba/lib/python3.11/site-packages/patsy/util.py", line 681, in safe_is_pandas_categorical_dtype
    return _pandas_is_categorical_dtype(dt)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'NoneType' object is not callable

@matthewwardrop
Copy link
Collaborator

matthewwardrop commented Jan 3, 2024

It would seem more straightforward to first try isinstance(..., pd.CategoricalDtype) since this is the recommended way since pandas 2.1.0 and I am guessing that it works for older pandas versions as well. Having said that I did not manage to find the relevant info about which pandas version supports it.

Agreed. I took a quick look into this and didn't find it either when reviewing the original PR. I figured that the code would unbreak things in the indefinite future, and so wasn't too bothered.

Also when using the warnings catching logic, when pd.types.api.is_categorical_dtype gets removed from pandas you once again need to fix in patsy which does not seem great.

+1.

To finish with I don't think the code logic is correct. When you catch the warning, you set _pandas_is_categorical_dtype = None. I believe it was correct (although a bit convoluted) in your original PR #199 because you added a if _pandas_is_categorical_dtype is None afterwards, but a subsequent commit 01c6b70 by @matthewwardrop broke it by removing the if clause.

Guilty as charged. Should have looked over these changes a little bit more closely. Sorry.


I took a look into switching the order of these lookups again and it is subtle, which is why I didn't recommend switching the order in the original PR. There's small differences between the old and new behaviour, especially around looking up things from dtype strings. For the most part I doubt it matters, since the series type is realised already from actual pandas Series objects; and so we should just be able to invert the lookup check and that should work for any version of pandas that defined the Categorical dtype. We might even be able to just drop the alternative lookup path, which would simplify this considerably. I'll check this and put up a PR tomorrow.

@lesteve
Copy link

lesteve commented Jan 3, 2024

Sounds great, let me know if I can help!

@EpigeneMax
Copy link
Contributor Author

Also when using the warnings catching logic, when pd.types.api.is_categorical_dtype gets removed from pandas you once again need to fix in patsy which does not seem great.

Guilty as charged, I did focus too much on detecting deprecation. I agree with you that the warning catching logic is not great, but I think it is good enough for this use case.

It would seem more straightforward to first try isinstance(..., pd.CategoricalDtype) since this is the recommended way since pandas 2.1.0

Agreed, but I did not want to break the behavior on older versions of pandas.

I'll let @matthewwardrop go with a PR, let me know if I can help!

@matthewwardrop
Copy link
Collaborator

Fixed in #204

@lesteve
Copy link

lesteve commented Jan 5, 2024

Nice, thanks a lot for this and the 0.5.6 release!

Also, since there there is an error for patsy 0.5.5 and pandas>=2.1.0,<=2.1.3 as noted in #203 (comment), it may be worth considering yanking patsy 0.5.5 from PyPI and marking the package as broken in conda-forge?

It could well be the case that this is not worth it, since that not that many people will bump into this error and if they do, they will find this issue where the easy fix is to update patsy to 0.5.6.

@matthewwardrop
Copy link
Collaborator

matthewwardrop commented Jan 5, 2024

I don't think there was a whole tonne of utility, but givent that is trivial I went ahead and yanked it from PyPI. Marking packages as broken on conda (or updating the metadata to require pandas < 2.1) is a little more involved, and not worth the effort, IMHO. If you feel differently, feel free to propose the changes via the appropriate conda-forge channels.

@lesteve
Copy link

lesteve commented Jan 6, 2024

Thanks 🙏, this seems perfectly good enough.

I was not too sure it was worth it either, so I hope I did not twist your arm too much 😉.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants