Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

return empty MultiIndex for symmetrical difference on equal MultiIndexes #16486

Merged
merged 12 commits into from May 31, 2017
Merged

Conversation

Tafkas
Copy link
Contributor

@Tafkas Tafkas commented May 24, 2017

@@ -909,6 +907,13 @@ def test_symmetric_difference(self):
expected = MultiIndex.from_tuples([('bar', 2), ('baz', 3), ('bar', 3)])
assert tm.equalContents(result, expected)

# on equal multiIndexs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this to it's own dedicated test, and add a comment with a link to the github issue?

@TomAugspurger
Copy link
Contributor

Some of your style changes caused linting errors: https://travis-ci.org/pandas-dev/pandas/jobs/235803966#L1385

pandas/core/indexes/base.py:184:25: E127 continuation line over-indented for visual indent
pandas/core/indexes/base.py:194:23: E127 continuation line over-indented for visual indent
pandas/core/indexes/base.py:298:37: E127 continuation line over-indented for visual indent
pandas/core/indexes/base.py:904:38: E127 continuation line over-indented for visual indent
pandas/core/indexes/base.py:1466:13: E122 continuation line missing indentation or outdented
pandas/core/indexes/base.py:1467:13: E122 continuation line missing indentation or outdented
pandas/core/indexes/base.py:3374:45: E127 continuation line over-indented for visual indent
pandas/core/indexes/base.py:3392:13: E128 continuation line under-indented for visual indent
pandas/core/indexes/base.py:3670:29: E127 continuation line over-indented for visual indent
pandas/core/indexes/base.py:3746:17: E122 continuation line missing indentation or outdented
pandas/core/indexes/base.py:3747:17: E122 continuation line missing indentation or outdented
pandas/core/indexes/base.py:3764:17: E122 continuation line missing indentation or outdented
pandas/core/indexes/base.py:3765:17: E122 continuation line missing indentation or outdented
pandas/core/indexes/base.py:3772:21: E122 continuation line missing indentation or outdented
pandas/core/indexes/base.py:3773:21: E122 continuation line missing indentation or outdented
pandas/core/indexes/base.py:3912:29: E127 continuation line over-indented for visual indent

if 'freq' in attribs:
attribs['freq'] = None
return self._shallow_copy_with_infer(the_diff, **attribs)
if self.nlevels > 1 and len(the_diff) == 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could use a comment explaining why we need the special case.

if self.nlevels > 1 and len(the_diff) == 0:
return type(self)([[] for _ in range(self.nlevels)],
[[] for _ in range(self.nlevels)])
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove this else condition, and just dedent everything like it was before.

@codecov
Copy link

codecov bot commented May 24, 2017

Codecov Report

Merging #16486 into master will decrease coverage by 50.23%.
The diff coverage is 22.22%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #16486       +/-   ##
===========================================
- Coverage    90.4%   40.17%   -50.24%     
===========================================
  Files         161      161               
  Lines       51033    51035        +2     
===========================================
- Hits        46136    20501    -25635     
- Misses       4897    30534    +25637
Flag Coverage Δ
#multiple ?
#single 40.17% <22.22%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/base.py 56.39% <22.22%> (-39.35%) ⬇️
pandas/io/sas/sas_constants.py 0% <0%> (-100%) ⬇️
pandas/types/concat.py 0% <0%> (-100%) ⬇️
pandas/parser.py 0% <0%> (-100%) ⬇️
pandas/tslib.py 0% <0%> (-100%) ⬇️
pandas/tools/hashing.py 0% <0%> (-100%) ⬇️
pandas/io/sas/sas7bdat.py 0% <0%> (-90.67%) ⬇️
pandas/io/sas/sas_xport.py 0% <0%> (-90.28%) ⬇️
pandas/core/tools/numeric.py 10.44% <0%> (-89.56%) ⬇️
pandas/io/json/normalize.py 8.16% <0%> (-88.78%) ⬇️
... and 112 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 97ad3fb...743aa47. Read the comment docs.

@codecov
Copy link

codecov bot commented May 24, 2017

Codecov Report

Merging #16486 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16486      +/-   ##
==========================================
+ Coverage   90.75%   90.75%   +<.01%     
==========================================
  Files         161      161              
  Lines       51071    51073       +2     
==========================================
+ Hits        46348    46351       +3     
+ Misses       4723     4722       -1
Flag Coverage Δ
#multiple 88.59% <100%> (ø) ⬆️
#single 40.11% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/multi.py 96.64% <100%> (+0.08%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 92d0799...6c67a4a. Read the comment docs.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couple of changes. pls add a whatsnew entry in Bug Fixes (reshaping) for 0.20.2

# On equal MultiIndexes the difference is empty. Therefore an empty
# MultiIndex is returned GH13490
if self.nlevels > 1 and len(the_diff) == 0:
return type(self)([[] for _ in range(self.nlevels)],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use self._shallow_copy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That wouldn't work, self._shallow_copy fails if the_diff is empty. That's why I am returning an empty MI instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In [2]: pd.MultiIndex([[], []], [[],[]])
Out[2]: 
MultiIndex(levels=[[], []],
           labels=[[], []])

In [3]: pd.MultiIndex([[], []], [[],[]])._shallow_copy()
Out[3]: 
MultiIndex(levels=[[], []],
           labels=[[], []])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I was confusing it with self._shallow_copy_with_infer. But still, if I call self._shallow_copy with the_diff being empty from_tuples gives me TypeError: Cannot infer number of levels from empty list

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback what do you think about a special case in MulltiIndex._shallow_copy when values is length 0 (that's what is passed from symmetric_difference)?

    @Appender(_index_shared_docs['_shallow_copy'])
    def _shallow_copy(self, values=None, **kwargs):
        if values is not None:
            if 'name' in kwargs:
                kwargs['names'] = kwargs.pop('name', None)
            # discards freq
            kwargs.pop('freq', None)
            # this if block is new
            if len(values) == 0:
                return MultiIndex(levels=[[] for _ in range(self.nlevels)],
                                  labels=[[] for _ in range(self.nlabels)])
            return MultiIndex.from_tuples(values, **kwargs)
        return self.view()

this would "work" but I don't know if "array of length 0" means same structure, but empty. Maybe it's ok.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change should be done in _shallow_copy_with_infer, you need to construct a MI equiv of self[0:0]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change should be done in _shallow_copy_with_infer, you need to construct a MI equiv of self[0:0]

This leaves around some levels unfortunately:

In [16]: idx = pd.MultiIndex.from_product([['a', 'b'], ['A', 'B']])

In [17]: idx[0:0]
Out[17]:
MultiIndex(levels=[['a', 'b'], ['A', 'B']],
           labels=[[], []])

In [18]: idx.difference(idx) | idx.difference(idx)
Out[18]:
MultiIndex(levels=[[], []],
           labels=[[], []])

I believe we want Out[18]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, then I guess you can special case values=None in _shallow_copy to take a different path of construction (IOW don't use the MultiIndex.from_tuples). The crucial point is to propogate the meta-data.

idx2 = MultiIndex.from_tuples(self.tuples)
result = idx1.symmetric_difference(idx2)
expected = MultiIndex(levels=[[], []], labels=[[], []])
assert tm.equalContents(result, expected)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use tm.assert_index_equal

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the expected value be None then?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

tm.assert_index_equal(result, expected) will work this compares the indexes, including all meta-data.

in fact your example is NOT propogating meta-data (that's what _shallow_copy does and why you need to use it). add some names for the levels and see.

@TomAugspurger
Copy link
Contributor

@Tafkas sorry if the above conversation about _shallow_copy wasn't clear. I think the idea is to revert the changes in core/indexes/base.py and only change MultiIndex._shallow_copy in core/indexes/multi.py. That's where the new if block can go

            if len(values) == 0:
                return MultiIndex(levels=[[] for _ in range(self.nlevels)],
                                  labels=[[] for _ in range(self.nlabels)], **kwargs)

Can you also add a couple tests with metadata (names)? I think the rule is to propagate names if they are the same on either index. So

def test_multiindex_symmetric_difference(self):
    idx = MultiIndex.from_product([['a', 'b'], ['A', 'B']],
                                  names=['a', 'b'])
    result = idx ^ idx
    assert result.names == idx.names

    idx2 = idx.copy().rename(['A', 'B'])
    result = idx ^ idx
    assert result.names == [None, None]

Do you have a chance to make those changes?

@Tafkas
Copy link
Contributor Author

Tafkas commented May 29, 2017

@TomAugspurger sure I can do that. But what about the case if values is None?

@TomAugspurger
Copy link
Contributor

Then it should do whatever it's doing now. The new case should only apply when the sym diff is empty.

@TomAugspurger TomAugspurger added this to the 0.20.2 milestone May 30, 2017
@TomAugspurger
Copy link
Contributor

@Tafkas if you can rebase / update, we can merge this before 0.20.2 is release (end of the week)

@@ -38,6 +38,7 @@ Bug Fixes
~~~~~~~~~

- Bug in using ``pathlib.Path`` or ``py.path.local`` objects with io functions (:issue:`16291`)
- Bug in ``Index`` calling symmetric_difference() on two equal multiindices results in a TypeError (:issue `13490`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Index.symmetric_difference() on two equal MultiIndex's, results in a TypeError

@@ -419,6 +419,11 @@ def _shallow_copy_with_infer(self, values=None, **kwargs):
@Appender(_index_shared_docs['_shallow_copy'])
def _shallow_copy(self, values=None, **kwargs):
if values is not None:
# On equal MultiIndexes the difference is empty.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this block can simply go in _shallow_copy_with_infer, idea being we don't pass None to _shallow_copy unless we actually mean that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed it

@jreback
Copy link
Contributor

jreback commented May 30, 2017

lgtm. ping on green.

@jreback jreback merged commit d31ffdb into pandas-dev:master May 31, 2017
@jreback
Copy link
Contributor

jreback commented May 31, 2017

thanks!

TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request Jun 1, 2017
TomAugspurger pushed a commit that referenced this pull request Jun 4, 2017
Kiv pushed a commit to Kiv/pandas that referenced this pull request Jun 11, 2017
stangirala pushed a commit to stangirala/pandas that referenced this pull request Jun 11, 2017
yarikoptic added a commit to neurodebian/pandas that referenced this pull request Jul 12, 2017
Version 0.20.2

* tag 'v0.20.2': (68 commits)
  RLS: v0.20.2
  DOC: Update release.rst
  DOC: Whatsnew fixups (pandas-dev#16596)
  ERRR: Raise error in usecols when column doesn't exist but length matches (pandas-dev#16460)
  BUG: convert numpy strings in index names in HDF pandas-dev#13492 (pandas-dev#16444)
  PERF: vectorize _interp_limit (pandas-dev#16592)
  DOC: whatsnew 0.20.2 edits (pandas-dev#16587)
  API: Make is_strictly_monotonic_* private (pandas-dev#16576)
  BUG: reimplement MultiIndex.remove_unused_levels (pandas-dev#16565)
  Strictly monotonic (pandas-dev#16555)
  ENH: add .ngroup() method to groupby objects (pandas-dev#14026) (pandas-dev#14026)
  fix linting
  BUG: Incorrect handling of rolling.cov with offset window (pandas-dev#16244)
  BUG: select_as_multiple doesn't respect start/stop kwargs GH16209 (pandas-dev#16317)
  return empty MultiIndex for symmetrical difference on equal MultiIndexes (pandas-dev#16486)
  BUG: Bug in .resample() and .groupby() when aggregating on integers (pandas-dev#16549)
  BUG: Fixed tput output on windows (pandas-dev#16496)
  Strictly monotonic (pandas-dev#16555)
  BUG: fixed wrong order of ordered labels in pd.cut()
  BUG: Fixed to_html ignoring index_names parameter
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Symmetric difference on equal MultiIndexes raises TypeError
3 participants