New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Frame ops prelims - de-duplicate, remove unused kwargs #19522

Merged
merged 8 commits into from Feb 7, 2018

Conversation

Projects
None yet
3 participants
@jbrockmendel
Member

jbrockmendel commented Feb 3, 2018

Starting in on making DataFrame ops consistent with Series and Index ops. As a first pass, this goes through and removes some duplicate code and unused arguments.

Changes logic in exactly two corner cases: arithmetic operations frame.op(series, axis=None, fill_value=not_none) where one of series or frame has length zero. axis needs to be specifically passed as None to hit the changed case.

ser = pd.Series([])
df = pd.DataFrame([[1, 2], [3, 4]])

>>> df.add(ser, axis=None, fill_value='E')
>>> ser.to_frame().sub(df[0], axis=None, fill_value=3)

will now raise instead of returning

>>> df.add(ser, axis=None, fill_value='E')
    0   1
0 NaN NaN
1 NaN NaN

>>> ser.to_frame().sub(df[0], axis=None, fill_value=3)
Empty DataFrame
Columns: [0]
Index: []

jbrockmendel added some commits Feb 3, 2018

@codecov

This comment has been minimized.

codecov bot commented Feb 3, 2018

Codecov Report

Merging #19522 into master will decrease coverage by <.01%.
The diff coverage is 90.9%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #19522      +/-   ##
==========================================
- Coverage    91.6%    91.6%   -0.01%     
==========================================
  Files         150      150              
  Lines       48750    48736      -14     
==========================================
- Hits        44657    44644      -13     
+ Misses       4093     4092       -1
Flag Coverage Δ
#multiple 89.97% <90.9%> (-0.01%) ⬇️
#single 41.74% <50%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/sparse/frame.py 94.83% <100%> (-0.02%) ⬇️
pandas/core/indexes/base.py 96.43% <100%> (-0.02%) ⬇️
pandas/core/ops.py 95.22% <100%> (ø) ⬆️
pandas/core/frame.py 97.42% <80%> (-0.01%) ⬇️
pandas/core/reshape/merge.py 94.22% <0%> (ø) ⬆️
pandas/io/parsers.py 95.56% <0%> (+0.06%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ed10bf6...85d501c. Read the comment docs.

fill_value=fill_value,
try_cast=try_cast)
else:
if len(other) == 0:

This comment has been minimized.

@jreback

jreback Feb 4, 2018

Contributor

not len

fill_value=None, try_cast=True):
if len(other) == 0:
return self * np.nan
if len(self) == 0:

This comment has been minimized.

@jreback

jreback Feb 4, 2018

Contributor

same

if len(other) == 0:
return self * np.nan
if len(self) == 0:
# Ambiguous case, use _series so works with DataFrame

This comment has been minimized.

@jreback

jreback Feb 4, 2018

Contributor

what is this comment?

This comment has been minimized.

@jbrockmendel

jbrockmendel Feb 4, 2018

Member

Not totally sure. Possibly a holdover from higher-dim implementation?

This comment has been minimized.

@jbrockmendel

jbrockmendel Feb 5, 2018

Member

Blame shows that this comment was introduced in c35a39d in 2011. Looks like it replaced a comment # Ambiguous case, use _series so works with DataMatrix. @wesm is this still relevant? Suggestions for fleshing this out?

@@ -3960,52 +3959,40 @@ def f(i):
def _combine_series(self, other, func, fill_value=None, axis=None,
level=None, try_cast=True):
if fill_value is not None:
raise NotImplementedError("fill_value %r not supported." %

This comment has been minimized.

@jreback

jreback Feb 4, 2018

Contributor

is this hit in tests?

This comment has been minimized.

@jreback

jreback Feb 4, 2018

Contributor

can you use .format() here instead

This comment has been minimized.

@jbrockmendel

jbrockmendel Feb 4, 2018

Member

Yes to both. tests.frame.test_operators.TestDataFrameOperators.test_arith_flex_frame:

        with tm.assert_raises_regex(NotImplementedError, 'fill_value'):
           self.frame.add(self.frame.iloc[0], fill_value=3)
@jreback

This comment has been minimized.

Contributor

jreback commented Feb 6, 2018

so you need some tests to validate your claims at the top of the PR

Changes logic in exactly two corner cases: arithmetic operations frame.op(series, axis=None,

and a release note.

@@ -581,6 +581,7 @@ Numeric
- Bug in :class:`Index` multiplication and division methods where operating with a ``Series`` would return an ``Index`` object instead of a ``Series`` object (:issue:`19042`)
- Bug in the :class:`DataFrame` constructor in which data containing very large positive or very large negative numbers was causing ``OverflowError`` (:issue:`18584`)
- Bug in :class:`Index` constructor with ``dtype='uint64'`` where int-like floats were not coerced to :class:`UInt64Index` (:issue:`18400`)
- Bug in :class:`DataFrame` flex arithmetic methods that failed to raise ``NotImplementedError` in cases where either argument has length zero (:issue:`19522`)

This comment has been minimized.

@jreback

jreback Feb 6, 2018

Contributor

need double backticks on NotImplementedError. can you be slightly more descriptive here.

This comment has been minimized.

@jreback

jreback Feb 6, 2018

Contributor

a user is reading this and would want to know what if anything they need to change.

@@ -80,6 +80,26 @@ def _try_get_item(x):
return x
def _make_invalid_op(name):

This comment has been minimized.

@jreback

jreback Feb 6, 2018

Contributor

consider moving this function to core/ops (and parameterizing so you can use it for index/series)

@@ -451,6 +451,18 @@ def test_arith_flex_frame(self):
with tm.assert_raises_regex(NotImplementedError, 'fill_value'):
self.frame.add(self.frame.iloc[0], axis='index', fill_value=3)
def test_arith_flex_zero_len_raises(self):

This comment has been minimized.

@jreback

jreback Feb 6, 2018

Contributor

shouldn't these be in test_arithmetic?

This comment has been minimized.

@jbrockmendel

jbrockmendel Feb 6, 2018

Member

Eventually yes. The more important grouping for now is that this belongs with the test above it, which I plan to split/parametrize before moving to test_arithmetic

df.add(ser, fill_value='E')
with tm.assert_raises_regex(NotImplementedError, 'fill_value'):
ser.to_frame().sub(df[0], axis=None, fill_value=3)

This comment has been minimized.

@jreback

jreback Feb 6, 2018

Contributor

just create the required frame here rather than using ser.to_frame()

This comment has been minimized.

@jbrockmendel
df = pd.DataFrame([[1, 2], [3, 4]])
with tm.assert_raises_regex(NotImplementedError, 'fill_value'):
df.add(ser, fill_value='E')

This comment has been minimized.

@jreback

jreback Feb 6, 2018

Contributor

this should raise ValueError, not NIE.

This comment has been minimized.

@jbrockmendel

jbrockmendel Feb 6, 2018

Member

The existing implementation raises NIE outside of these corner cases. For the time being I'm assuming there's a reason for that.

jbrockmendel added some commits Feb 6, 2018

@jreback jreback added this to the 0.23.0 milestone Feb 7, 2018

@jreback

jreback approved these changes Feb 7, 2018

@jreback jreback merged commit a7d1103 into pandas-dev:master Feb 7, 2018

3 checks passed

ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@jreback

This comment has been minimized.

Contributor

jreback commented Feb 7, 2018

thanks

@jbrockmendel jbrockmendel deleted the jbrockmendel:ops-frame3 branch Feb 11, 2018

harisbal pushed a commit to harisbal/pandas that referenced this pull request Feb 28, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment