API/BUG: Fix Series ops inconsistencies #13894

Merged: 3 commits, Aug 25, 2016

Conversation

sinhrks
Member

@sinhrks sinhrks commented Aug 3, 2016

This includes the fix only for #1134, not for other inconsistencies (like #13637), which need further discussion.

Changes:

  • series comparison operator to check whether labels are identical (currently: ignores labels)
  • series boolean operator to align with labels (currently: only keeps left index)
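A minimal sketch of the two changes listed above, assuming pandas 0.19.0 or later. The Series names and values are illustrative, not taken from the PR's test suite:

```python
import pandas as pd

# Illustrative data: same length, but the last label differs.
s1 = pd.Series([True, False, True], index=list("ABC"))
s2 = pd.Series([True, True, True], index=list("ABD"))

# Comparison operators now require identical labels and raise otherwise.
try:
    s1 == s2
    compare_raised = False
except ValueError:
    compare_raised = True

# Logical operators align on the union of both indexes;
# labels present on only one side end up False.
both = s1 & s2

print(compare_raised)
print(both)
```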

@sinhrks sinhrks added the "API Design" and "Numeric Operations" (Arithmetic, Comparison, and Logical operations) labels Aug 3, 2016
@sinhrks sinhrks added this to the 0.19.0 milestone Aug 3, 2016
@codecov-io

codecov-io commented Aug 4, 2016

Current coverage is 85.25% (diff: 100%)

Merging #13894 into master will decrease coverage by <.01%

@@             master     #13894   diff @@
==========================================
  Files           139        139          
  Lines         50386      50394     +8   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          42959      42965     +6   
- Misses         7427       7429     +2   
  Partials          0          0          

Powered by Codecov. Last update 51b20de...fe322be

- ``Series`` comparison operators now raise ``ValueError`` when ``index`` are different.
- ``Series`` logical operators align both ``index``.

As a result, ``Series`` and ``DataFrame`` operators behave consistently as below:
Contributor

add a comment on how this is more strict than previously. (maybe in a warning). IOW in cases where it previously worked, it will now error if the alignment is incorrect. This will now raise rather than silently pass.

Member Author

ok, added warning.

@sinhrks sinhrks force-pushed the ops_series_compat branch 2 times, most recently from 89515c9 to 3cf1f77 Compare August 7, 2016 12:10
@jreback
Contributor

jreback commented Aug 11, 2016

will have a look.

@jorisvandenbossche
@wesm

s1 & s2

.. note::
   ``Series`` logical operators fill ``NaN`` result with ``False``.
Member

This is inconsistent with how DataFrame behaves?

Member Author

Yes, see #13896.

@jorisvandenbossche
Member

For dataframes, the operator (eg ==) is strict regarding indices, but the equivalent method (eg df1.eq(df2)) is flexible as it aligns.
For series, both options raise now (current PR) or are broken (master). I think we should also make this distinction and make s1.eq(s2) flexible (=align and not raise) as discussed in #1134?
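For reference, the DataFrame distinction described above can be sketched as follows. The frames `df1` and `df2` are illustrative, and the behaviour shown is that of the strict `==` operator versus the flexible `DataFrame.eq` method:

```python
import pandas as pd

# Illustrative frames whose row labels differ in the last position.
df1 = pd.DataFrame({"x": [1, 2, 3]}, index=list("ABC"))
df2 = pd.DataFrame({"x": [1, 2, 3]}, index=list("ABD"))

# The operator is strict: differently-labeled frames raise.
try:
    df1 == df2
    op_raised = False
except ValueError:
    op_raised = True

# The flex method aligns first; rows missing on either side compare as False.
flex = df1.eq(df2)

print(op_raised)
print(flex)
```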

@wesm
Member

wesm commented Aug 17, 2016

This seems OK to me. I'm sure this will cause some API breakage but it would be best to encourage folks to use the flex comparison methods if they want auto-alignment

@jorisvandenbossche
Member

In light of Wes' comment, I think the whatsnew notice can be a bit more explicit in that regard. So say that if you did s1 == s2 before (which now raises), you have to do s1.values == s2.values if you want to keep the same non-aligning / index-ignoring behaviour, or s1.eq(s2) to get auto-alignment.
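The two migration paths mentioned here might look like this in user code. This is a sketch with made-up Series, assuming pandas 0.19.0 or later, where `s1.eq(s2)` aligns on labels:

```python
import pandas as pd

# Illustrative data: equal values by position, different labels.
s1 = pd.Series([1, 2, 3], index=list("ABC"))
s2 = pd.Series([1, 2, 3], index=list("ABD"))

# Option 1: ignore the index and compare by position (returns a NumPy array).
by_position = s1.values == s2.values

# Option 2: align on labels, then compare (returns a Series over the union index).
by_label = s1.eq(s2)

print(by_position)
print(by_label)
```

Comparing `.values` keeps the pre-0.19.0 result for same-length Series, while `.eq` treats labels found on only one side as unequal.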

@sinhrks sinhrks force-pushed the ops_series_compat branch 2 times, most recently from b4980b5 to b6a837b Compare August 17, 2016 21:56
Until 0.18.1, comparing ``Series`` with the same length has been succeeded even if
these ``index`` are different (the result ignores ``index``).
As of 0.19.1, it raises ``ValueError`` to be more strict.

Contributor

0.19.0

Member

As I said in another comment, I would be more explicit here about how to deal with this when you used such code (so in case you wanted this behaviour, you now have to use s1.values == s2.values)

Maybe an example where you show all 4 cases (old behaviour, new default, new .values, new flex method) next to each other. The overview below is very structured, which is certainly good to keep! But I would also give one example for probably the case where most likely breakages will occur that compares all different behaviour together.

@jorisvandenbossche
Member

Modulo the doc comments, this is good to go for me!

@jorisvandenbossche
Member

LGTM
@jreback ?

@jorisvandenbossche jorisvandenbossche merged commit 5152cdd into pandas-dev:master Aug 25, 2016
@jorisvandenbossche
Member

@sinhrks Thanks a lot!
