Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: MultiIndex.putmask losing ea dtype #49847

Merged
merged 4 commits into from
Nov 24, 2022
Merged

Conversation

phofl
Copy link
Member

@phofl phofl commented Nov 22, 2022

This is something we have to fix before we can address the issue

Its also significantly faster

       before           after         ratio
     [80527f48]       [b73beee0]
                      <49830>   
-        68.3±1ms      2.26±0.04ms     0.03  multiindex_object.Putmask.time_putmask_all_different
-      72.4±0.6ms      2.20±0.03ms     0.03  multiindex_object.Putmask.time_putmask

@@ -162,6 +162,34 @@ def test_putmask_multiindex_other(self):
expected = MultiIndex.from_tuples([right[0], right[1], left[2]])
tm.assert_index_equal(result, expected)

def test_putmask_keep_dtype(self, any_numeric_ea_dtype):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the two test cases also covering for the all-nan case in #49830 (comment) or do you think it's not necessary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's about losing the dtype, so yes this covers the all nan case. But this does not fix the issue yet

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable might be a misnomer then any_numeric_ea_dtype (if it covers all nans, this would also include non EA dtype).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It covers the case in a sense that the current tests represent the all nan case as well, I’ll add a specific test when actually fixing the issue anyway

zip(subset.levels, self.levels, self.codes)
):
new_elements = value_level.difference(level)
new_level = level.append(new_elements)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the two lines above be replaced with new_level = level.union(value_level)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, was thinking about duplicates when implementing this, but levels are unique…

@phofl phofl mentioned this pull request Nov 23, 2022
5 tasks
@phofl
Copy link
Member Author

phofl commented Nov 23, 2022

failure unrelated

@mroeschke mroeschke added this to the 2.0 milestone Nov 24, 2022
@mroeschke mroeschke merged commit 2236346 into pandas-dev:main Nov 24, 2022
@mroeschke
Copy link
Member

Thanks @phofl

mliu08 pushed a commit to mliu08/pandas that referenced this pull request Nov 27, 2022
* BUG: MultiIndex.putmask losing ea dtype

* Fix typing

* Add asv

* Simplify and add whatsnew
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants