Skip to content

Conversation

@jorisvandenbossche
Copy link
Member

@jorisvandenbossche jorisvandenbossche commented May 31, 2023

PR #51319 added the EA._hash_pandas_object method to let ExtensionArrays override how they are hashed. But it also changed to no longer use the values returned by EA._values_for_factorize() by default for hashing, but change this to EA.to_numpy(). The previous behaviour was documented, and changing this can cause regressions / changes in behaviour or performance (depending on the return values of those two methods).

See https://github.com/pandas-dev/pandas/pull/51319/files#r1212106303 for some more details.

@jorisvandenbossche jorisvandenbossche added the ExtensionArray Extending pandas with custom dtypes or arrays. label May 31, 2023
@jorisvandenbossche jorisvandenbossche added this to the 2.0.3 milestone May 31, 2023
@jorisvandenbossche
Copy link
Member Author

@jbrockmendel are you OK with this?

@jbrockmendel
Copy link
Member

No objections.

@jorisvandenbossche jorisvandenbossche merged commit 2a1d2da into pandas-dev:main Jun 19, 2023
@jorisvandenbossche jorisvandenbossche deleted the EA-hash branch June 19, 2023 14:31
@lumberbot-app

This comment was marked as outdated.

@jorisvandenbossche
Copy link
Member Author

Sorry for the backport noise, the original PR was only for 2.1, so of course this fixup only needs to target 2.1 as well

@lithomas1
Copy link
Contributor

Does this need a whatsnew?

This looks like only a regression on main.
(If so, the whatsnew can be deleted)

@jorisvandenbossche
Copy link
Member Author

Good catch, sorry! I only realized after merging it was only a change on main and thus didn't need to be backported, but forgot that I actually added a whatsnew ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ExtensionArray Extending pandas with custom dtypes or arrays.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants