Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REF: Add ExtensionArray.map #51809

Merged
merged 6 commits into from
Mar 13, 2023
Merged

Conversation

topper-123
Copy link
Contributor

@topper-123 topper-123 commented Mar 6, 2023

Working with the .map method is very uncomfortable in main, because of issues with ExtensionArray:

  • ExtensionArray is not required to have a .map method, but may have it.
  • Some ExtensionArray subclasses, that do have a map method have one function parameter (mapper), while other have two parameters (mapper and na_action).
  • There is fallback functionality for calculation ExtensionArray.map in IndexOps.Mixin._map_values. This means we can calculate .map for ExtensionArrays indirectly, if we wrap them in a Series or Index, but we cannot rely on the ability to do it directly on the ExtensionArray.
  • ExtensionArray subclasses often want create their own .map methods
  • Because of all the above issues it is often very maze-like to follow code that uses .map

This PR solves this by:

  1. moving the functionality in IndexOpsMixin._map_values to a function map_array in algorithms.
  2. adding a .map method to ExtensionArray that calls the new map_array function. Subclasses will override this method where needed.
  3. Call the new map_array function inside IndexOpsMixin._map_values.
  4. Ensuring a common signature for the .map method for ExtensionArray, i.e. always two method parameters (mapper and na_action), so we can reliably call the .map method on any ExtensionArray subclass.

Note that this PR doesn't really add new functionality, because we're mostly just moving code around, though it is now possible to call the map method on any Extensionarray directly, instead of going through e.g. Series.map.

In the main branch, there are currently several ExtensionArray subclasses where na_action='ignore' doesn't work. I'm working on making it work with all of the ExtensionArray subclasses and there is already #51645, which handles Categorical/CategoricalIndex and more will follow.

@jbrockmendel
Copy link
Member

Does this close any of #23179, #15706, #32815?

@topper-123
Copy link
Contributor Author

topper-123 commented Mar 7, 2023

Does this close any of #23179, #15706, #32815?

return new_values

# we must convert to python types
values = arr.astype(object)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy=False?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, good point.

@jbrockmendel
Copy link
Member

Might need something in the extension tests?

@mroeschke mroeschke added the ExtensionArray Extending pandas with custom dtypes or arrays. label Mar 8, 2023
@topper-123
Copy link
Contributor Author

Might need something in the extension tests?

I've added some simple tests in the newest version.

.pre-commit-config.yaml Outdated Show resolved Hide resolved
@topper-123
Copy link
Contributor Author

I've updated.

Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation LGTM

@topper-123
Copy link
Contributor Author

Gentle ping.

I’m intending to follow up with fixes towards #51644, and it would be great if this could be merged.

@mroeschke
Copy link
Member

Merge when ready @jbrockmendel

@jbrockmendel jbrockmendel merged commit ecc6ead into pandas-dev:main Mar 13, 2023
@jbrockmendel
Copy link
Member

thanks @topper-123

@jbrockmendel
Copy link
Member

@topper-123 does any of your recent work on this close #15706?

@topper-123
Copy link
Contributor Author

I'vew answered in #15706 to keep the discussion in one place.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants