BUG: divmod return type #22932

Merged
merged 15 commits into from Oct 8, 2018

TomAugspurger
Contributor

Closes #22930

@TomAugspurger TomAugspurger added the Numeric Operations (Arithmetic, Comparison, and Logical operations) and ExtensionArray (Extending pandas with custom dtypes or arrays) labels Oct 1, 2018
@TomAugspurger TomAugspurger added this to the 0.24.0 milestone Oct 1, 2018
@pep8speaks

Hello @TomAugspurger! Thanks for submitting the PR.

@@ -0,0 +1,22 @@
import pytest
Contributor Author

I didn't see a good place for generic ops tests. Am I missing someplace?

Member

You mean one that is not supposed to be subclassed by the actual EA implementations?

Contributor Author

I mean things that are testing pandas dispatching, not something in pandas (so not in tests/arrays) and not something that's part of the EA interface (so not part of tests/extension/base).

Member

not something that's part of the EA interface (so not part of tests/extension/base).

OK, but I think we have basically been testing the ScalarOpsMixin through its use in DecimalArray, so you could also see this one as such a test.

@TomAugspurger TomAugspurger mentioned this pull request Oct 2, 2018
@codecov

codecov bot commented Oct 2, 2018

Codecov Report

Merging #22932 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #22932      +/-   ##
==========================================
- Coverage    92.2%   92.19%   -0.01%     
==========================================
  Files         169      169              
  Lines       50822    50837      +15     
==========================================
+ Hits        46858    46870      +12     
- Misses       3964     3967       +3

| Flag | Coverage Δ |
|---|---|
| #multiple | 90.61% <100%> (-0.01%) ⬇️ |
| #single | 42.34% <0%> (-0.02%) ⬇️ |

| Impacted Files | Coverage Δ |
|---|---|
| pandas/core/arrays/base.py | 95.89% <100%> (+0.17%) ⬆️ |
| pandas/core/reshape/merge.py | 93.89% <0%> (-0.26%) ⬇️ |
| pandas/util/testing.py | 86.18% <0%> (-0.05%) ⬇️ |
| pandas/core/strings.py | 98.63% <0%> (ø) ⬆️ |
| pandas/core/dtypes/cast.py | 88.58% <0%> (ø) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update fe67b94...ec814db.

@TomAugspurger
Contributor Author

FYI, this, and the following are holding up PeriodArray a bit, so if I could get a quick +/- 1 on

it'd be appreciated. (cc @jbrockmendel for this one in particular.)

result of the element-wise operation. Whether or not that succeeds depends on
whether the operation returns a result that's valid for the ``ExtensionArray``.
If an ``ExtensionArray`` cannot be reconstructed, a list containing the scalars
is returned instead.
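Below is a minimal sketch of the fallback this docstring describes, built on a toy array type. The operator is applied element-wise, and the resulting scalars are passed back through a _from_sequence-style constructor. If that constructor raises TypeError, the plain list of scalars is returned instead. ToyDecimalArray and its _binop(other, op) signature are hypothetical stand-ins, not the actual ScalarOpsMixin code.

```python
from decimal import Decimal
import operator


class ToyDecimalArray:
    """Hypothetical stand-in for an ExtensionArray that only holds Decimals."""

    def __init__(self, values):
        self.values = list(values)

    @classmethod
    def _from_sequence(cls, scalars):
        # Refuse anything that is not a Decimal, mimicking a failed
        # reconstruction of the extension array.
        scalars = list(scalars)
        if not all(isinstance(x, Decimal) for x in scalars):
            raise TypeError("cannot build ToyDecimalArray from these scalars")
        return cls(scalars)

    def _binop(self, other, op):
        # Broadcast a scalar ``other`` against every element.
        if isinstance(other, ToyDecimalArray):
            rvalues = other.values
        else:
            rvalues = [other] * len(self.values)
        res = [op(a, b) for a, b in zip(self.values, rvalues)]
        try:
            return self._from_sequence(res)
        except TypeError:
            # Fall back to the raw list of scalars.
            return res


arr = ToyDecimalArray([Decimal("1"), Decimal("2")])
added = arr._binop(Decimal("1"), operator.add)    # still a ToyDecimalArray
compared = arr._binop(Decimal("1"), operator.gt)  # bools, so a plain list
```

Comparison ops are the typical case that ends up in the list branch, because booleans are not valid Decimal array members.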
Member

I find it a bit strange that we return a list rather than an array
(but that's maybe off topic for this PR).

Contributor Author

Yes, I found that strange as well. An array would be better to return.

This is new in 0.24, right? If so, I'll just make the change here.

Contributor Author

I have another PR touching about the same place right now, so I'm going to hold off on changing that till later.

@jorisvandenbossche
Member

FYI, this, and the following are holding up PeriodArray a bit

Just a question: I would expect that Period does not need the ScalarOpsMixin. Doesn't it have its own vectorized implementations?

@TomAugspurger
Contributor Author

Hmm, yes you're right. I'll look on my other branch to see if / why I was hitting this...

    pass
if op.__name__ in {'divmod', 'rdivmod'}:
    try:
        a, b = zip(*res)
Member

Is the zip-star necessary? If we get here, shouldn't res just be a 2-tuple, so a, b = res?

Contributor Author

Right now res is a list of tuples, so I think we have to split them apart with zip(*res):

> /Users/taugspurger/sandbox/pandas/pandas/core/arrays/base.py(781)_binop()
    779                     try:
    780                         import pdb; pdb.set_trace()
--> 781                         a, b = zip(*res)
    782                         res = (self._from_sequence(a),
    783                                self._from_sequence(b))

ipdb> res
[(Decimal('0'), Decimal('1')), (Decimal('1'), Decimal('0')), (Decimal('1'), Decimal('1')), (Decimal('2'), Decimal('0'))]
ipdb> n
> /Users/taugspurger/sandbox/pandas/pandas/core/arrays/base.py(782)_binop()
    780                         import pdb; pdb.set_trace()
    781                         a, b = zip(*res)
--> 782                         res = (self._from_sequence(a),
    783                                self._from_sequence(b))
    784                     except TypeError:

ipdb> p a, b
((Decimal('0'), Decimal('1'), Decimal('1'), Decimal('2')), (Decimal('1'), Decimal('0'), Decimal('1'), Decimal('0')))
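The pdb session shows why the zip-star is needed. res holds one (quotient, remainder) pair per element, and zip(*res) transposes that into one tuple of quotients and one tuple of remainders. The stdlib-only sketch below reproduces the same Decimal values. It assumes the array was [1, 2, 3, 4] divided by 2, which is consistent with the printed res.

```python
from decimal import Decimal

values = [Decimal("1"), Decimal("2"), Decimal("3"), Decimal("4")]

# Element-wise divmod yields one (quotient, remainder) tuple per element,
# which is the shape of ``res`` in the pdb session.
res = [divmod(v, Decimal("2")) for v in values]

# zip(*res) transposes the list of pairs into a tuple of quotients and a
# tuple of remainders; each would then be passed to _from_sequence.
a, b = zip(*res)

# A plain ``a, b = res`` only unpacks when there happen to be exactly two
# elements (and would then be wrong); with four pairs it raises ValueError.
try:
    a_wrong, b_wrong = res
    plain_unpack_ok = True
except ValueError:
    plain_unpack_ok = False
```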

@jbrockmendel
Member

This may be over-optimizing for code-sharing, but it might be worth updating ops._construct_result and ops._construct_divmod_result for compatibility with EAs (instead of just Series) and using the same pattern as _arith_method_SERIES:

construct_result = (_construct_divmod_result
                    if op is divmod else _construct_result)

Now that I look at it, it looks like ops.dispatch_to_extension_op hard-codes _construct_result, so it will fail on divmod/rdivmod.
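The suggested pattern has the dispatcher pick a divmod-aware constructor. That way a (quotient, remainder) pair is re-boxed as two wrapped results instead of one. The sketch below is rough, and all of its names are hypothetical, simplified stand-ins: ToySeries, array_op, dispatch_to_array_op, construct_result and construct_divmod_result. The private pandas helpers ops._construct_result and ops._construct_divmod_result take different arguments.

```python
import operator
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToySeries:
    """Hypothetical minimal wrapper: a list of values plus a name."""
    values: list
    name: Optional[str] = None


def construct_result(result, name):
    # Re-box a single array-level result.
    return ToySeries(list(result), name=name)


def construct_divmod_result(result, name):
    # divmod returns a (quotient, remainder) pair: re-box each half.
    return (construct_result(result[0], name),
            construct_result(result[1], name))


def array_op(op, lvalues, rvalues):
    # Stand-in for the EA's own operator: element-wise, and for divmod it
    # returns two sequences rather than one sequence of pairs.
    res = [op(a, b) for a, b in zip(lvalues, rvalues)]
    if op is divmod:
        div, mod = zip(*res)
        return div, mod
    return res


def dispatch_to_array_op(op, left, right):
    # Unbox, call the array-level op, then choose the re-boxing helper
    # based on the op instead of hard-coding construct_result.
    rvalues = right.values if isinstance(right, ToySeries) else right
    result = array_op(op, left.values, rvalues)
    construct = (construct_divmod_result
                 if op is divmod else construct_result)
    return construct(result, name=left.name)


s = ToySeries([1, 2, 3, 4], name="x")
div, mod = dispatch_to_array_op(divmod, s, ToySeries([2, 2, 2, 2]))
total = dispatch_to_array_op(operator.add, s, [1, 1, 1, 1])
```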

@jbrockmendel
Member

Can you add a test that does divmod(Series[EA], Series[EA]) (or divmod(Series[EA], EA), or ...)? I think that will require a small fix in ops.dispatch_to_extension_op, which should be made at the same time as this one (or heck, add the test(s), xfail them, and I'll fix it in a follow-up).

Pending that: LGTM.

@TomAugspurger
Contributor Author

I'll make those fixes to ops.dispatch_to_extension_op here. Will see what we can share.

@TomAugspurger
Contributor Author

Added a test with divmod(Series[EA], EA).

Now that I look at it, it looks like ops.dispatch_to_extension_op hard-codes _construct_result so will fail on divmod/rdivmod

It seems correct on master; maybe someone (probably you) fixed it in the meantime?

but it might be worth updating ops._construct_result and ops._construct_divmod_result for compat with EA

Perhaps leave that for #22974, as I think there are some details to work out. I haven't put any tests in place for divmod(EA, Series[EA]) yet, since I think master is incorrect: right now that returns a tuple of (EA, EA), but it should maybe be a tuple of (Series[EA], Series[EA]).

@jbrockmendel
Member

Perhaps leave for #22974, as I think there are some details to work out

Sounds good.

@TomAugspurger
Contributor Author

Updated.

  1. Deduplicated the try / except, now that we do it in 3 places instead of 2.
  2. Changed some tests that were unexpectedly passing when they should have been skipped.

    (True, [2, 1, 0, 0], [0, 0, 2, 2]),
])
def test_divmod_array(self, reverse, expected_div, expected_mod):
    # https://github.com/pandas-dev/pandas/issues/22930
Member

Should we try to better indicate that this is an "extra" test (not overriding a base one)?

(This doesn't necessarily need to be solved here, but the question is relevant in general: we sometimes add tests to e.g. decimal to test specific aspects not covered in the base tests, and it would help to see clearly which tests those are. Maybe we could put them outside the class?)

Contributor Author

Having it off the class makes the most sense. Moved.
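Below is a stdlib-only sketch of the module-level version of the test. It uses plain lists of Decimal in place of DecimalArray, and a plain loop in place of pytest.mark.parametrize. The reverse case comes from the parametrization above. The forward case assumes the array is [1, 2, 3, 4] and the scalar is 2, which is consistent with the pdb output earlier in the thread. elementwise_divmod is a hypothetical helper.

```python
from decimal import Decimal


def elementwise_divmod(values, other, reverse=False):
    """Hypothetical helper: element-wise (r)divmod returning two lists."""
    if reverse:
        pairs = [divmod(other, v) for v in values]
    else:
        pairs = [divmod(v, other) for v in values]
    div, mod = zip(*pairs)
    return list(div), list(mod)


# (reverse, expected_div, expected_mod), mirroring the parametrization.
CASES = [
    (False, [0, 1, 1, 2], [1, 0, 1, 0]),
    (True, [2, 1, 0, 0], [0, 0, 2, 2]),
]


def test_divmod_array():
    # Lives at module level, not inside the base-test class.
    # https://github.com/pandas-dev/pandas/issues/22930
    arr = [Decimal(x) for x in [1, 2, 3, 4]]
    for reverse, expected_div, expected_mod in CASES:
        div, mod = elementwise_divmod(arr, Decimal(2), reverse=reverse)
        assert div == [Decimal(x) for x in expected_div]
        assert mod == [Decimal(x) for x in expected_mod]


test_divmod_array()
```

Keeping such "extra" tests outside the class makes it obvious that they do not override a base-class test.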

        res = np.asarray(arr)
    return res

if op.__name__ in {'divmod', 'rdivmod'}:
Contributor

@jbrockmendel can't we use dispatch_to_extension_op here to avoid duplication of code?

Member

I asked something similar a few days ago. If Tom says it isn't feasible, I believe him.

Member

We are inside the extension array here, so it would also be strange to use it (which doesn't prevent the two from sharing a helper function, if that would be appropriate).
But here we need to construct the divmod result correctly, while dispatch_to_extension_op should assume the EA has already done this correctly.

Contributor Author

which doesn't prevent that both could share a helper function,

Right. This is possible, if people want it. I'll push up a commit with some kind of do_extension_op that both of these call, so people can take a look.

Contributor Author

Ah, now that I take a look, it's not so straightforward. The two are similar, but they differ in enough places that they wouldn't really benefit from sharing code:

  1. The unboxing of values: dispatch_to_extension_op knows that at least one of the two is a Series[EA]; _binop knows that self is an EA.
  2. The op: dispatch_to_extension_op dispatches it, while _binop defines it in a list comprehension.
  3. The re-boxing: _binop has the whole maybe-reconstruct-with-_from_sequence step, which dispatch_to_extension_op doesn't have to worry about at all.

Member

They do not do "conceptually exactly the same thing". Paraphrasing myself from above:

The dispatch function calls the EA to perform an operation, the above code is the EA doing the operation.

Why would those two different things necessarily need to live in the same place / code path?

Of course, we could still move the whole EA._create_method to ops.py (which would indeed be similar to functions like add_flex_arithmetic_methods in ops.py, which is used in series.py to add methods to Series). But that is unrelated to the change in this PR and should be left for another issue/PR to discuss (personally I don't think that would be an improvement).

I would see if it is possible to integrate these rather than adding a bunch of new code.

Well, both Tom and I, who have looked into the code, say: we don't think it is possible.

Contributor

@TomAugspurger I saw your comment, and I also saw @jorisvandenbossche's comments. I have not looked at this in detail, nor do I have time to. My point is that this instantly creates technical debt no matter how you slice it.

It may require some reorganization to integrate this, and I appreciate that. So I'm happy to defer this; maybe @jbrockmendel has more insight.

Member

I'll be doing a sparse de-duplication PR following #22880 and can take a fresh look at this then. In the interim, I wouldn't let this issue hold up this PR.

Contributor Author

My point is that this instantly creates technical debt no matter how you slice it.

It really doesn't. They're doing two different things.

Member

This PR is fixing a bug, not refactoring how the ops on EAs are implemented**. If somebody wants to look into that, it should be done in a separate PR anyway. So merging.

** And I fully acknowledge that sometimes, to properly fix a bug, you also need to refactor; otherwise you just keep adding hacks. However, I don't think that is the case here; see all the comments above.

@TomAugspurger
Contributor Author

Good to merge this then?

@jorisvandenbossche jorisvandenbossche merged commit e510b1a into pandas-dev:master Oct 8, 2018
@TomAugspurger TomAugspurger deleted the ea-divmod branch October 8, 2018 13:56
tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018
5 participants