Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Concat multiple different ExtensionArray types #22997

Merged
merged 9 commits into from Oct 18, 2018

Conversation

TomAugspurger
Copy link
Contributor

@TomAugspurger TomAugspurger commented Oct 4, 2018

xref #22994

This builds on #22996, since we need hashing.

2a1660c is the relevant commit.

Sparse tests will fail for now. I'll revisit once SparseArray is in.

@pep8speaks
Copy link

Hello @TomAugspurger! Thanks for submitting the PR.

@TomAugspurger TomAugspurger changed the title Dtype concat Concat multiple different ExtensionArray types Oct 4, 2018
@TomAugspurger TomAugspurger added this to the 0.24.0 milestone Oct 4, 2018
@TomAugspurger TomAugspurger added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Dtype Conversions Unexpected or buggy dtype conversions ExtensionArray Extending pandas with custom dtypes or arrays. labels Oct 4, 2018
@TomAugspurger TomAugspurger changed the title Concat multiple different ExtensionArray types [WIP]Concat multiple different ExtensionArray types Oct 8, 2018
return all(
getattr(self, attr) == getattr(other, attr)
for attr in self._metadata
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corner case: other is an instance of a subclass of type(self) and self._metadata != other._metadata

@@ -560,11 +560,6 @@ def _concat_sparse(to_concat, axis=0, typs=None):

fill_values = [x.fill_value for x in to_concat
if isinstance(x, SparseArray)]

if len(set(fill_values)) > 1:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was leftover from an early implementation that didn't handle multiple fill values. And I think @jorisvandenbossche noticed this in his review, but I misunderstood.

Regardless, we do allow concating Series[sparse] for different fill_values, we just change them all to have the first one's fill value (in SparseSeries._concat_same_type)

@TomAugspurger
Copy link
Contributor Author

Corner case: other is an instance of a subclass of type(self) and self._metadata != other._metadata

This relates to the commits adding hashing (which I was building on top of).

I think that's probably up to the EA author to define a __eq__ that's correct. We can only do so much.

@codecov
Copy link

codecov bot commented Oct 15, 2018

Codecov Report

Merging #22997 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #22997      +/-   ##
==========================================
+ Coverage   92.14%   92.14%   +<.01%     
==========================================
  Files         170      170              
  Lines       51016    51015       -1     
==========================================
+ Hits        47009    47010       +1     
+ Misses       4007     4005       -2
Flag Coverage Δ
#multiple 90.57% <100%> (ø) ⬆️
#single 42.31% <100%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/dtypes/concat.py 98.13% <ø> (+0.44%) ⬆️
pandas/core/internals/managers.py 95.77% <100%> (ø) ⬆️
pandas/core/arrays/datetimelike.py 94.89% <0%> (-0.46%) ⬇️
pandas/core/strings.py 98.58% <0%> (-0.05%) ⬇️
pandas/core/frame.py 97.11% <0%> (ø) ⬆️
pandas/util/testing.py 86.45% <0%> (+0.02%) ⬆️
pandas/core/arrays/period.py 95.56% <0%> (+1.27%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a0ca4d7...e17d397. Read the comment docs.

@TomAugspurger
Copy link
Contributor Author

I had to skip some decimal tests on py2.

I couldn't conditionally skip individual tests that had used fixtures defined on the test, e.g. TestReshaping.test_concat_all_na_block[False], since that in_frame fixture isn't available to the subclass.

So I opted to skip all the tests on Py2 that had at least one test case failing.

@@ -602,6 +602,7 @@ update the ``ExtensionDtype._metadata`` tuple to match the signature of your
- :meth:`Series.astype` and :meth:`DataFrame.astype` now dispatch to :meth:`ExtensionArray.astype` (:issue:`21185:`).
- Slicing a single row of a ``DataFrame`` with multiple ExtensionArrays of the same type now preserves the dtype, rather than coercing to object (:issue:`22784`)
- Added :meth:`pandas.api.types.register_extension_dtype` to register an extension type with pandas (:issue:`22664`)
- Bug in concatenation an Series with two different extension dtypes not casting to object dtype (:issue:`22994`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double back ticks

@TomAugspurger TomAugspurger changed the title [WIP]Concat multiple different ExtensionArray types Concat multiple different ExtensionArray types Oct 18, 2018
@TomAugspurger
Copy link
Contributor Author

I think this is good to go. Merging later today.

FYI, I'm leaving #22994 open to discuss a more general concat protocol so that types can negotiate on what should happen. I'm hopeful that will clean up all the special concats we have like concat_sparse and concat_categorical.

@jorisvandenbossche jorisvandenbossche changed the title Concat multiple different ExtensionArray types BUG: Concat multiple different ExtensionArray types Oct 18, 2018
@jorisvandenbossche jorisvandenbossche merged commit 6030452 into pandas-dev:master Oct 18, 2018
@jorisvandenbossche
Copy link
Member

Thanks!

tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions ExtensionArray Extending pandas with custom dtypes or arrays. Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants