dispatch scalar DataFrame ops to Series #22163

Merged
merged 19 commits into pandas-dev:master on Aug 14, 2018

4 participants
@jbrockmendel
Member

commented Aug 2, 2018

Many issues closed; will track them down and update. Will also need whatsnew.

closes #18874
closes #20088
closes #15697
closes #13128
closes #8554
closes #8932
closes #21610
closes #22005
closes #22047
closes #22242

This will be less verbose after #22068 implements ops.dispatch_to_series.

This still only dispatches a subset of ops. #22019 dispatches another (disjoint) subset. After that is another easy-ish case where alignment is known. Saved for last are cases with ambiguous alignment that is currently done in an ad-hoc best-guess way.

@pep8speaks

commented Aug 2, 2018

Hello @jbrockmendel! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 10, 2018 at 14:11 Hours UTC

jbrockmendel added some commits Aug 2, 2018

@jbrockmendel

Member Author

commented Aug 2, 2018

Uses too many kludges, should wait for fixes in series and index comparisons. Closing.

@jbrockmendel

Member Author

commented Aug 3, 2018

Re-opening after scaling back an unreasonably ambitious py2/py3 compat goal. In particular consider:

df = pd.Series(['bar', 'bar'], name='foo').to_frame()
df < 0
df['foo'] < 0

In PY3 ATM this gives:

>>> df < 0
    foo
0  True
1  True
>>> df['foo'] < 0
[...]
TypeError: '<' not supported between instances of 'str' and 'int'

And in PY2:

>>> df < 0
     foo
0  False
1  False
>>> df['foo'] < 0
0    False
1    False
Name: foo, dtype: bool

Making the PY2/PY3 behavior identical is not feasible, but we can (and this PR does) ensure that the DataFrame behavior matches the Series behavior. In PY2 this is unchanged; in PY3 the DataFrame comparison now correctly raises TypeError, as the Series comparison does.
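
To illustrate the matching behavior described above, here is a minimal pure-Python sketch, not the pandas implementation. It assumes a frame can be modeled as a dict of column lists; `column_op` and `frame_scalar_op` are hypothetical names. Because the frame-level op simply calls the column-level op once per column, any TypeError raised by a column surfaces from the frame as well.

```python
import operator


def column_op(values, op, scalar):
    """Hypothetical stand-in for a Series-level op: apply op elementwise."""
    return [op(v, scalar) for v in values]


def frame_scalar_op(columns, op, scalar):
    """Hypothetical frame-level op that dispatches column by column.

    Whatever the column-level op returns or raises is what the frame-level
    op returns or raises.
    """
    return {name: column_op(values, op, scalar)
            for name, values in columns.items()}


numeric = {"a": [-1, 2], "b": [3, -4]}
result = frame_scalar_op(numeric, operator.lt, 0)

strings = {"foo": ["bar", "bar"]}
try:
    frame_scalar_op(strings, operator.lt, 0)
    raised = False
except TypeError:
    # On Python 3, 'bar' < 0 raises, so the frame-level op raises too.
    raised = True
```

With this structure there is no separate frame-level comparison path that could disagree with the column-level one.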

@jbrockmendel jbrockmendel reopened this Aug 3, 2018

@codecov

commented Aug 3, 2018

Codecov Report

Merging #22163 into master will decrease coverage by 0.03%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #22163      +/-   ##
==========================================
- Coverage   92.08%   92.05%   -0.04%     
==========================================
  Files         169      169              
  Lines       50694    50700       +6     
==========================================
- Hits        46682    46672      -10     
- Misses       4012     4028      +16
Flag        Coverage Δ
#multiple   90.46% <100%> (-0.04%) ⬇️
#single     42.26% <80%> (-0.08%) ⬇️

Impacted Files                     Coverage Δ
pandas/core/ops.py                 96.71% <100%> (+0.14%) ⬆️
pandas/core/frame.py               97.26% <100%> (ø) ⬆️
pandas/core/internals/blocks.py    93.83% <0%> (-0.81%) ⬇️
pandas/core/dtypes/missing.py      92.98% <0%> (-0.59%) ⬇️
pandas/util/testing.py             85.69% <0%> (-0.21%) ⬇️
pandas/core/generic.py             96.42% <0%> (-0.05%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update cc3ab4a...9b62135.

@gfyoung gfyoung requested a review from jreback Aug 7, 2018

@jbrockmendel

Member Author

commented Aug 7, 2018

Updated OP with issues this closes

@jreback
Contributor

left a comment

I would like to have GH issue references in comments on the tests where appropriate. Please also do the whatsnew. It's pretty important that we nail down and match the closed issues with the code.

@@ -4949,6 +4949,14 @@ def _combine_match_columns(self, other, func, level=None, try_cast=True):
        return self._constructor(new_data)

    def _combine_const(self, other, func, errors='raise', try_cast=True):
        if lib.is_scalar(other) or np.ndim(other) == 0:

@jreback

jreback Aug 8, 2018

Contributor

It is pretty annoying that we have to do this. I would make an explicit function, maybe is_any_scalar, as we have these types of checks all over. Please make an issue for this.

@jbrockmendel

jbrockmendel Aug 8, 2018

Author Member

Done.
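
The kind of helper suggested in this thread could look like the sketch below. `is_any_scalar` is a hypothetical name taken from the review comment, not an existing pandas function, and `np.isscalar` is used here only as a stand-in for pandas' internal `lib.is_scalar`. The point is that a zero-dimensional array is not reported as a scalar by the first check, but `np.ndim(obj) == 0` catches it.

```python
import numpy as np


def is_any_scalar(obj):
    """Hypothetical helper: True for Python/NumPy scalars and 0-dim arrays."""
    return np.isscalar(obj) or np.ndim(obj) == 0


zero_dim = np.array(5)  # zero-dimensional ndarray, not a true scalar
checks = {
    "python int": is_any_scalar(3),
    "numpy float": is_any_scalar(np.float64(1.5)),
    "zero-dim array": is_any_scalar(zero_dim),
    "list": is_any_scalar([1, 2]),
    "1-d array": is_any_scalar(np.array([1, 2])),
}
```

Centralizing the check in one function would replace the repeated two-part condition at each call site.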

@@ -1327,6 +1327,10 @@ def wrapper(self, other, axis=None):

        res_name = get_op_result_name(self, other)

        if isinstance(other, list):

@jreback

jreback Aug 8, 2018

Contributor

Why isn't is_list_like (maybe after some other comparisons) enough here?

@jbrockmendel

jbrockmendel Aug 8, 2018

Author Member

ATM the isinstance(other, list) check is done below the isinstance(other, (np.ndarray, pd.Index)) check. Wrapping lists earlier lets us send lists through that same ndarray/Index block. Ideally the catch-all else: block can be reduced to scalars only, but we're not there yet.
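
A toy illustration of the wrap-early pattern described in this reply, using only the standard library. `array.array` stands in for `np.ndarray`/`pd.Index`, and `compare_lt` is a hypothetical function, not the pandas wrapper. Converting a list before the type checks means the list and array cases share one branch.

```python
from array import array


def compare_lt(values, other):
    """Hypothetical comparison wrapper: elementwise values < other."""
    if isinstance(other, list):
        # Wrap lists up front so they take the same path as arrays below.
        other = array("d", other)

    if isinstance(other, array):
        if len(other) != len(values):
            raise ValueError("Lengths must match to compare")
        return [v < o for v, o in zip(values, other)]
    else:
        # Catch-all branch: lists no longer reach it.
        return [v < other for v in values]


from_list = compare_lt([1.0, 5.0, 3.0], [2.0, 2.0, 2.0])
from_array = compare_lt([1.0, 5.0, 3.0], array("d", [2.0, 2.0, 2.0]))
from_scalar = compare_lt([1.0, 5.0, 3.0], 2.0)
```

The length check and elementwise logic are written once, and a list input gets exactly the same treatment as an array input.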

@@ -1706,7 +1708,8 @@ def f(self, other, axis=default_axis, level=None, fill_value=None):
             if fill_value is not None:
                 self = self.fillna(fill_value)
 
-            return self._combine_const(other, na_op, try_cast=True)
+            pass_op = op if lib.is_scalar(other) else na_op

@jreback

jreback Aug 8, 2018

Contributor

you are checking for a scalar here and above?

@jbrockmendel

jbrockmendel Aug 8, 2018

Author Member

It's kind of annoying. If lib.is_scalar(other) then we will be dispatching to the Series op, in which case we want to pass the "raw" op (e.g. operator.add) and not the wrapped op na_op.

This PR handles only scalars since that is a relatively easy case. A few PRs down the road we'll have all these ops dispatch to series, at which point this won't be necessary.
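
The choice between the raw and the wrapped op can be sketched in plain Python as follows. This is an illustration under assumptions, not the pandas code: `make_na_op`, `column_op`, and `frame_op` are hypothetical, `None` stands in for a missing value, and a simple type check stands in for `lib.is_scalar`. The column-level op already applies its own missing-value handling, so the scalar path hands it the raw operator rather than the wrapped one.

```python
import operator


def make_na_op(op):
    """Wrap op so that a missing operand (None) propagates as None."""
    def na_op(x, y):
        if x is None or y is None:
            return None
        return op(x, y)
    return na_op


def column_op(values, op, scalar):
    """Hypothetical Series-level op; it does its own missing-value wrapping."""
    safe = make_na_op(op)
    return [safe(v, scalar) for v in values]


def frame_op(columns, op, other):
    """Hypothetical frame-level op over a dict of column lists."""
    na_op = make_na_op(op)
    is_scalar = not isinstance(other, (list, tuple, dict))
    # Scalars are dispatched to the column-level op, which expects the raw
    # operator; other inputs stay on the frame-level path with na_op.
    pass_op = op if is_scalar else na_op
    if is_scalar:
        return {name: column_op(vals, pass_op, other)
                for name, vals in columns.items()}
    # Non-scalar path (toy): `other` holds one value per row.
    return {name: [pass_op(v, o) for v, o in zip(vals, other)]
            for name, vals in columns.items()}


frame = {"a": [1, None], "b": [10, 20]}
scalar_result = frame_op(frame, operator.add, 1)
listlike_result = frame_op(frame, operator.add, [100, 200])
```

Once every input type is dispatched to the column-level op, the `pass_op` switch in a sketch like this would disappear, which matches the plan stated in the reply.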

@@ -273,6 +273,8 @@ def test_getitem_boolean(self):
        # test df[df > 0]
        for df in [self.tsframe, self.mixed_frame,
                   self.mixed_float, self.mixed_int]:
            if compat.PY3 and df is self.mixed_frame:
                continue

@jreback

jreback Aug 8, 2018

Contributor

let's strip out the mixed_frame to another function (even though that duplicates some code), bonus can parametrize this test.

@jbrockmendel

jbrockmendel Aug 8, 2018

Author Member

> bonus can parametrize this test.

I don't think tsframe, mixed_frame, mixed_float, mixed_int are available in the namespace.

@jreback

jreback Aug 9, 2018

Contributor

These need to be made fixtures; this becomes so much easier then.

@jbrockmendel

jbrockmendel Aug 9, 2018

Author Member

I agree, am starting to implement this in the test_arithmetic sequence of PRs. Will update this test when that lands.

@jreback

jreback Aug 14, 2018

Contributor

k thanks

        tm.assert_frame_equal(actual, dfn)
        actual = df1 - NA
        tm.assert_frame_equal(actual, dfn)
        with pytest.raises(TypeError):

@jreback

jreback Aug 8, 2018

Contributor

why is this raising? this is a big change if you don't allow nan to act as NaT in ops

@jbrockmendel

jbrockmendel Aug 8, 2018

Author Member

This is the current behavior for Series and Index.

@jreback

jreback Aug 9, 2018

Contributor

this needs a subsection in the whatsnew then, marked as an api change.

        expected = op(s, value).dtypes
        assert_series_equal(result, expected)

        invalid = {(operator.pow, '<M8[ns]'),

@jreback

jreback Aug 8, 2018

Contributor

pull this out and parametrize

@jbrockmendel

jbrockmendel Aug 8, 2018

Author Member

This test already has two layers of parametrization; it isn't clear how to pull this out without making it more verbose+repetitive. Let me give this some thought and circle back.

@jbrockmendel

Member Author

commented Aug 8, 2018

Phew. Just added GH references to tests and a ton of Whats New.

@jreback

Contributor

commented Aug 9, 2018

needs a rebase and some comments.

jbrockmendel added some commits Aug 9, 2018

@jbrockmendel jbrockmendel referenced this pull request Aug 10, 2018

Merged

implement masked_arith_op to de-duplicate ops code #22182

1 of 1 task complete
@jreback

Contributor

commented Aug 10, 2018

needs rebase again

@jreback
Contributor

left a comment

lgtm.

@jreback jreback merged commit f7f266c into pandas-dev:master Aug 14, 2018

3 checks passed

ci/circleci Your tests passed on CircleCI!
continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/travis-ci/pr The Travis CI build passed
@jreback

Contributor

commented Aug 14, 2018

thanks @jbrockmendel

nice squashings!
