Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make DataFrame arithmetic ops with 2D arrays behave like numpy analogues #23000

Merged
merged 14 commits into from Oct 7, 2018

Conversation

Projects
None yet
3 participants
@jbrockmendel
Copy link
Member

commented Oct 4, 2018

  • closes #22686
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Change DataFrame arithmetic behavior when operating against 2D-ndarray such that op(df, arr) broadcasts like op(df.values, arr).

Related: #22880. Note this does NOT change the behavior of DataFrame comparison operations, so that PR (more specifically, the tests) will have to be updated if this is merged.

@timlod can you confirm that this is what you ha din mind in #22686?

@pep8speaks

This comment has been minimized.

Copy link

commented Oct 4, 2018

Hello @jbrockmendel! Thanks for submitting the PR.

@codecov

This comment has been minimized.

Copy link

commented Oct 5, 2018

Codecov Report

Merging #23000 into master will decrease coverage by <.01%.
The diff coverage is 80%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #23000      +/-   ##
==========================================
- Coverage   92.19%   92.19%   -0.01%     
==========================================
  Files         169      169              
  Lines       50835    50843       +8     
==========================================
+ Hits        46868    46874       +6     
- Misses       3967     3969       +2
Flag Coverage Δ
#multiple 90.61% <80%> (-0.01%) ⬇️
#single 42.34% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/ops.py 97.19% <80%> (-0.24%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d523d9f...01ab2b5. Read the comment docs.

df = pd.DataFrame(arr, columns=[True, False], index=['A', 'B', 'C'])

rowlike = arr[[1], :] # shape --> (1, ncols)
expected = pd.DataFrame([[2, 4],

This comment has been minimized.

Copy link
@jreback

jreback Oct 5, 2018

Contributor

can you assert on the shape of rowlike here (like your comment but more explict)

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Oct 5, 2018

Author Member

Good idea; I had messed up one of them.

tm.assert_frame_equal(result, expected)
result = collike + df
tm.assert_frame_equal(result, expected)

This comment has been minimized.

Copy link
@jreback

jreback Oct 5, 2018

Contributor

do we have sufficient converage for a broadcast op with a non-homogenous frame?

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Oct 5, 2018

Author Member

its pretty scattered. specifically within this module its pretty bare

jbrockmendel added some commits Oct 5, 2018

@jreback jreback added the Numeric label Oct 6, 2018


.. ipython:: python
In [3]: arr = np.arange(6).reshape(3, 2)
In [4]: df = pd.DataFrame(arr)

This comment has been minimized.

Copy link
@jreback

jreback Oct 6, 2018

Contributor

don't need the ipython prompts, just the statements themselves. I would make 2 blocks, 1 where you show the starting frame, then do ops on it in the 2nd

arr = np.arange(6).reshape(3, 2)
df = pd.DataFrame(arr)
df

This comment has been minimized.

Copy link
@jreback

jreback Oct 6, 2018

Contributor

add another ipython:: python here, the blank line gets removed and it appears as a single block (the way it is written), if you add another block, then you get another cell

right = left._constructor(right,
index=left.index,
columns=left.columns)
# TODO: Double-check this doesn't make copies

This comment has been minimized.

Copy link
@jreback

jreback Oct 6, 2018

Contributor

is this relevant?

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Oct 6, 2018

Author Member

For performance, if the answer is that it does make copies, then yes. At least in the sufficiently-new numpy case, we're passing a view in left._constructor.

a = np.arange(3)
b = a.reshape(3, 1)
c = np.broadcast_to(b, (3, 2))
d = c.copy()

df = pd.DataFrame(c)
df2 = pd.DataFrame(d)

>>> df.values.base is a  # <-- the concern is that this comes back False
True

>>> df2.values.base is d
True

In this example its OK. I left the comment to do a more thorough check. Are you confident this is always OK?

@@ -252,6 +253,48 @@ def test_arith_flex_zero_len_raises(self):


class TestFrameArithmetic(object):
# TODO: tests for other arithmetic ops
def test_df_add_2d_array_rowlike_broadcasts(self):

This comment has been minimized.

Copy link
@jreback

jreback Oct 6, 2018

Contributor

can you expand to use the all_arithmetic_ops fixture? (or some of them)?

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Oct 6, 2018

Author Member

Sure. I'll keep these tests as-is because they have nice explicitly-written-out expecteds, and add another pair of tests using the fixtures.

jbrockmendel added some commits Oct 6, 2018

@jreback jreback added this to the 0.24.0 milestone Oct 7, 2018

@jreback

jreback approved these changes Oct 7, 2018

@jreback jreback merged commit c82552d into pandas-dev:master Oct 7, 2018

6 checks passed

ci/circleci: py27_compat Your tests passed on CircleCI!
Details
ci/circleci: py35_ascii Your tests passed on CircleCI!
Details
ci/circleci: py36_locale Your tests passed on CircleCI!
Details
ci/circleci: py36_locale_slow Your tests passed on CircleCI!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
pandas-dev.pandas Build #20181006.14 succeeded
Details
@jreback

This comment has been minimized.

Copy link
Contributor

commented Oct 7, 2018

thanks!

@jbrockmendel jbrockmendel deleted the jbrockmendel:bcast branch Oct 8, 2018

brute4s99 added a commit to brute4s99/pandas that referenced this pull request Nov 19, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.