Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue warning in slow ExtensionArray base methods #24433

Open
TomAugspurger opened this issue Dec 26, 2018 · 4 comments
Open

Issue warning in slow ExtensionArray base methods #24433

TomAugspurger opened this issue Dec 26, 2018 · 4 comments
Labels
ExtensionArray Extending pandas with custom dtypes or arrays. Performance Memory or execution speed performance Warnings Warnings that appear or should be added to pandas

Comments

@TomAugspurger
Copy link
Contributor

In the ExtensionArray docstring, we note that

    Some methods require casting the ExtensionArray to an ndarray of Python
    objects with ``self.astype(object)``, which may be expensive. When
    performance is a concern, we highly recommend overriding the following
    methods:

    * fillna
    * dropna
    * unique
    * factorize / _values_for_factorize
    * argsort / _values_for_argsort

I think we should also issue a warning in those cases. We would filter the warning by default, so it doesn't appear to users, but would show up when the base tests are being run. Then the EA author can choose to ignore that warning, or implement it more efficiently.

@TomAugspurger TomAugspurger added Performance Memory or execution speed performance ExtensionArray Extending pandas with custom dtypes or arrays. labels Dec 26, 2018
@jreback
Copy link
Contributor

jreback commented Dec 26, 2018

sounds reasonable to issue a PerformanceWarning but don’t want it to show for end users

@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented Dec 26, 2018 via email

@TomAugspurger
Copy link
Contributor Author

I may not get to this for 0.24.

https://github.com/pandas-dev/pandas/compare/master...TomAugspurger:astype-warning?expand=1 is basically done, but it exposed some troubles with our current approach to parametrizing tests. Basically, if a parametrized test was defined directly on a fixture, I had to duplicate the parametrization in the subclass, which is no good.

That branch has started moving those parametrization to class attributes, e.g.
https://github.com/pandas-dev/pandas/compare/master...TomAugspurger:astype-warning?expand=1#diff-9bd888be54b675f517a4ee682a58cbe9R88. If anyone has a cleaner alternative I'd be happy to hear it.

@mroeschke mroeschke added the Warnings Warnings that appear or should be added to pandas label Jun 25, 2021
@jbrockmendel
Copy link
Member

Tried this out and got 500ish test failures for uncaught warnings. A little more than half of those were in IntervalArray and can be silenced be overriding IntervalArray._values_for_factorize to not issue the warning (on the theory that the warning is more useful for 3rd party authors than internal EAss). A bunch more are for MaskedArray and decimal.

The tests that would be annoying to fix are the arrow tests that already check for a PerformanceWarning.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays. Performance Memory or execution speed performance Warnings Warnings that appear or should be added to pandas
Projects
None yet
Development

No branches or pull requests

4 participants