dispatch scalar DataFrame ops to Series #22163

Merged
merged 19 commits into from
Aug 14, 2018

Member

@jbrockmendel jbrockmendel commented Aug 2, 2018

Many issues closed; will track them down and update. Will also need whatsnew.

closes #18874
closes #20088
closes #15697
closes #13128
closes #8554
closes #8932
closes #21610
closes #22005
closes #22047
closes #22242

This will be less verbose after #22068 implements ops.dispatch_to_series.

This still only dispatches a subset of ops. #22019 dispatches another (disjoint) subset. After that comes another easy-ish case where alignment is known. Saved for last are the cases with ambiguous alignment, which is currently handled in an ad-hoc, best-guess way.

@pep8speaks

pep8speaks commented Aug 2, 2018

Hello @jbrockmendel! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 10, 2018 at 14:11 Hours UTC

@jbrockmendel
Member Author

Uses too many kludges, should wait for fixes in series and index comparisons. Closing.

@jbrockmendel
Member Author

Re-opening after scaling back an unreasonably ambitious py2/py3 compat goal. In particular consider:

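import pandas as pd

df = pd.Series(['bar', 'bar'], name='foo').to_frame()
df < 0         # DataFrame compared with a scalar
df['foo'] < 0  # the same comparison on the underlying Series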

In PY3 ATM this gives:

>>> df < 0
    foo
0  True
1  True
>>> df['foo'] < 0
[...]
TypeError: '<' not supported between instances of 'str' and 'int'

And in PY2:

>>> df < 0
     foo
0  False
1  False
>>> df['foo'] < 0
0    False
1    False
Name: foo, dtype: bool

Making the PY2/PY3 behavior identical is not feasible, but we can (and this PR does) ensure that the DataFrame and Series behavior match. In PY2 this is unchanged; in PY3 the DataFrame comparison now correctly raises.
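To make the consistency goal concrete, here is a minimal pure-Python sketch of the idea, not the pandas implementation. `ToySeries` and `ToyFrame` are hypothetical stand-ins invented for illustration. The point is only that once the frame-level op delegates to the column-level op, the two cannot disagree: on Python 3 both raise the same `TypeError` for a string column compared with `0`.

```python
import operator


class ToySeries:
    """Hypothetical stand-in for a Series: a named list of values."""

    def __init__(self, values, name=None):
        self.values = list(values)
        self.name = name

    def compare(self, op, scalar):
        # Apply the raw operator elementwise; on Python 3, 'bar' < 0 raises TypeError.
        return ToySeries([op(v, scalar) for v in self.values], name=self.name)


class ToyFrame:
    """Hypothetical stand-in for a DataFrame: a dict of named columns."""

    def __init__(self, columns):
        self.columns = dict(columns)

    def compare(self, op, scalar):
        # Dispatch column by column, so frame and series behavior cannot diverge.
        return ToyFrame({name: col.compare(op, scalar)
                         for name, col in self.columns.items()})


ser = ToySeries(['bar', 'bar'], name='foo')
frame = ToyFrame({'foo': ser})

for obj in (ser, frame):
    try:
        obj.compare(operator.lt, 0)
    except TypeError as err:
        print(type(obj).__name__, 'raised TypeError:', err)
```

With numeric columns the same dispatch path returns elementwise booleans, so only the error behavior for incomparable types changes.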

@jbrockmendel jbrockmendel reopened this Aug 3, 2018
@codecov

codecov bot commented Aug 3, 2018

Codecov Report

Merging #22163 into master will decrease coverage by 0.03%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #22163      +/-   ##
==========================================
- Coverage   92.08%   92.05%   -0.04%     
==========================================
  Files         169      169              
  Lines       50694    50700       +6     
==========================================
- Hits        46682    46672      -10     
- Misses       4012     4028      +16
Flag       Coverage Δ
#multiple  90.46% <100%> (-0.04%) ⬇️
#single    42.26% <80%> (-0.08%) ⬇️

Impacted Files                   Coverage Δ
pandas/core/ops.py               96.71% <100%> (+0.14%) ⬆️
pandas/core/frame.py             97.26% <100%> (ø) ⬆️
pandas/core/internals/blocks.py  93.83% <0%> (-0.81%) ⬇️
pandas/core/dtypes/missing.py    92.98% <0%> (-0.59%) ⬇️
pandas/util/testing.py           85.69% <0%> (-0.21%) ⬇️
pandas/core/generic.py           96.42% <0%> (-0.05%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update cc3ab4a...9b62135.

@gfyoung gfyoung added the Refactor (internal refactoring of code) and Algos (non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff) labels Aug 7, 2018
@gfyoung gfyoung requested a review from jreback August 7, 2018 17:03
@jbrockmendel
Member Author

Updated OP with issues this closes

Contributor

@jreback jreback left a comment

I would like to have GH comments on the tests where appropriate. Please also do the whatsnew. It's pretty important that we nail down and match the closed issues with the code.

@@ -4949,6 +4949,14 @@ def _combine_match_columns(self, other, func, level=None, try_cast=True):
    return self._constructor(new_data)

def _combine_const(self, other, func, errors='raise', try_cast=True):
    if lib.is_scalar(other) or np.ndim(other) == 0:
Contributor

It is pretty annoying that we have to do this. I would make an explicit function, maybe is_any_scalar, as we have these types of checks all over. Please make an issue for this.

Member Author

Done.
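For reference, a rough sketch of what such a helper could look like. The name `is_any_scalar` comes from the review comment above; the body is an assumption written for illustration, and it uses a simplified stdlib/NumPy type check rather than pandas' own `lib.is_scalar`. The case it exists to cover is the zero-dimensional array, which behaves like a scalar in ops even though it is an `ndarray`.

```python
import datetime
import numbers

import numpy as np


def is_any_scalar(obj):
    """Hypothetical helper: True for scalars and zero-dimensional arrays."""
    if isinstance(obj, np.ndarray):
        # np.array(5) has ndim 0 and should be treated like the scalar 5.
        return obj.ndim == 0
    # Simplified stand-in for a real scalar check (an assumption, not pandas' logic).
    scalar_types = (str, bytes, numbers.Number, np.generic,
                    datetime.date, datetime.timedelta)
    return obj is None or isinstance(obj, scalar_types)


print(is_any_scalar(5))              # True
print(is_any_scalar(np.array(5)))    # True
print(is_any_scalar(np.array([5])))  # False
print(is_any_scalar([5]))            # False
```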

@@ -1327,6 +1327,10 @@ def wrapper(self, other, axis=None):

res_name = get_op_result_name(self, other)

if isinstance(other, list):
Contributor

Why isn't is_list_like (maybe after some other comparisons) enough here?

Member Author

ATM the isinstance(other, list) check is done below the isinstance(other, (np.ndarray, pd.Index)) check. Wrapping lists earlier lets us send lists through that same ndarray/Index block. Ideally the catch-all else: block can be reduced to scalars only, but we're not there yet.
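A simplified sketch of that control flow, assuming NumPy only: the function `compare` below is a toy written for illustration, not the actual `wrapper` in `pandas/core/ops.py`, and it leaves out the `pd.Index` half of the check. It shows why converting the list first means the length validation in the array branch is reused instead of being duplicated in the catch-all.

```python
import operator

import numpy as np


def compare(values, other, op):
    """Toy comparison wrapper; `values` is a 1-D ndarray."""
    if isinstance(other, list):
        # Wrap lists up front so they share the ndarray branch below.
        other = np.asarray(other)

    if isinstance(other, np.ndarray) and other.ndim > 0:
        # Array-like operands get a length check before the elementwise op.
        if len(other) != len(values):
            raise ValueError('Lengths must match to compare')
        return op(values, other)

    # Catch-all branch: by now `other` should be scalar-like.
    return op(values, other)


vals = np.array([1, 2, 3])
print(compare(vals, [0, 2, 4], operator.gt).tolist())  # [True, False, False]
print(compare(vals, 2, operator.gt).tolist())          # [False, False, True]
```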

@@ -1706,7 +1708,8 @@ def f(self, other, axis=default_axis, level=None, fill_value=None):
if fill_value is not None:
    self = self.fillna(fill_value)

return self._combine_const(other, na_op, try_cast=True)
pass_op = op if lib.is_scalar(other) else na_op
Contributor

you are checking for a scalar here and above?

Member Author

It's kind of annoying. If lib.is_scalar(other) then we will be dispatching to the Series op, in which case we want to pass the "raw" op (e.g. operator.add) and not the wrapped op na_op.

This PR handles only scalars since that is a relatively easy case. A few PRs down the road we'll have all these ops dispatch to series, at which point this won't be necessary.
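A toy illustration of the raw-versus-wrapped distinction, in plain Python. Everything here (`make_na_op`, `series_op`, `frame_op`, and the simplified `is_scalar`) is hypothetical and only mimics the shape of the logic: the Series-level op applies its own wrapping, so the frame hands it the raw operator for scalars and keeps the frame-level wrapper for everything else.

```python
import operator


def make_na_op(op):
    """Hypothetical wrapper: propagate missing values (None) instead of raising."""
    def na_op(x, y):
        return None if x is None or y is None else op(x, y)
    return na_op


def is_scalar(obj):
    # Simplified stand-in for pandas' lib.is_scalar.
    return isinstance(obj, (int, float, str, bool))


def series_op(values, scalar, op):
    """Toy Series-level op: wraps the raw operator itself."""
    wrapped = make_na_op(op)
    return [wrapped(v, scalar) for v in values]


def frame_op(columns, other, op):
    scalar = is_scalar(other)
    # Scalars dispatch to the Series op, which expects the raw operator;
    # other operands keep using the frame-level wrapped op.
    pass_op = op if scalar else make_na_op(op)
    if scalar:
        return {name: series_op(col, other, pass_op)
                for name, col in columns.items()}
    return {name: [pass_op(v, o) for v, o in zip(col, other[name])]
            for name, col in columns.items()}


cols = {'a': [1, None, 3]}
print(frame_op(cols, 10, operator.add))                # {'a': [11, None, 13]}
print(frame_op(cols, {'a': [1, 1, 1]}, operator.add))  # {'a': [2, None, 4]}
```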

@@ -273,6 +273,8 @@ def test_getitem_boolean(self):
# test df[df > 0]
for df in [self.tsframe, self.mixed_frame,
           self.mixed_float, self.mixed_int]:
    if compat.PY3 and df is self.mixed_frame:
        continue
Contributor

let's strip out the mixed_frame to another function (even though that duplicates some code), bonus can parametrize this test.

Member Author

> bonus can parametrize this test.

I don't think tsframe, mixed_frame, mixed_float, mixed_int are available in the namespace.

Contributor

these need to be made fixtures. this becomes so much easier.

Member Author

I agree, am starting to implement this in the test_arithmetic sequence of PRs. Will update this test when that lands.

Contributor

k thanks

tm.assert_frame_equal(actual, dfn)
actual = df1 - NA
tm.assert_frame_equal(actual, dfn)
with pytest.raises(TypeError):
Contributor

why is this raising? this is a big change if you don't allow nan to act as NaT in ops

Member Author

This is the current behavior for Series and Index.

Contributor

this needs a subsection in the whatsnew then, marked as an api change.

expected = op(s, value).dtypes
assert_series_equal(result, expected)

invalid = {(operator.pow, '<M8[ns]'),
Contributor

pull this out and parametrize

Member Author

This test already has two layers of parametrization; it isn't clear how to pull this out without making it more verbose+repetitive. Let me give this some thought and circle back.

@jbrockmendel
Member Author

Phew. Just added GH references to tests and a ton of Whats New.

@jreback
Contributor

jreback commented Aug 9, 2018

needs a rebase and some comments.

@jreback
Contributor

jreback commented Aug 10, 2018

needs rebase again

Contributor

@jreback jreback left a comment

lgtm.

@jreback jreback merged commit f7f266c into pandas-dev:master Aug 14, 2018
@jreback
Contributor

jreback commented Aug 14, 2018

thanks @jbrockmendel

nice squashings!
