Cannot represent empty MultiIndex #263

Closed
asqui opened this issue Oct 20, 2011 · 8 comments
asqui commented Oct 20, 2011

An empty MultiIndex cannot be constructed:

In [120]: pa.MultiIndex([],[])
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
C:\Users\dfortunov\<ipython-input-120-ca5695a01949> in <module>()
----> 1 pa.MultiIndex([],[])

C:\VirtualEnvs\mss.dev\lib\site-packages\pandas-0.5.0.dev20111019_17461b7-py2.6-win32.egg\pandas\core\index.pyc in __new__(cls, levels, labels, sortorder, names)
    833         assert(len(levels) == len(labels))
    834         if len(levels) == 0:
--> 835             raise Exception('Must pass non-zero number of levels/labels')
    836
    837         if len(levels) == 1:

Exception: Must pass non-zero number of levels/labels

This does not maintain parity with regular indices, where an empty index may be constructed:

In [119]: pa.Index([])
Out[119]: Index([], dtype=object)

And causes problems when doing set operations with MultiIndexes.

Member

wesm commented Oct 20, 2011

Hm, that's a bit ill-defined, isn't it (having no levels vs. having levels of length 0, which is what Index([]) does)? This is not a problem, for example:

MultiIndex([[]], [[]])

Could you show how it leads to problems in set operations?

Author

asqui commented Oct 20, 2011

MultiIndex([[]], [[]]) doesn't fail immediately, but it also doesn't quite "work", because MultiIndex.__new__ cleverly turns that into a regular empty index, rather than an empty MultiIndex -- the result is Index([], dtype=object)

The only problem with set operations is with the patch which I submitted adding MultiIndex.diff, since difference is the only set operation that can result in an empty index. I'm just trying to fix that up, because some_multiindex.diff(some_multiindex) needs to result in an empty MultiIndex.

@wesm
Copy link
Member

wesm commented Oct 20, 2011

I got it. I'll take a look later today or early tomorrow and sort out a resolution, there may indeed be an upstream problem.

Member

wesm commented Oct 21, 2011

I think this is resolved in my commit earlier tonight. So this works fine now:

In [2]: index
Out[2]: 
MultiIndex([('foo', 'one'), ('foo', 'two'), ('foo', 'three'), ('bar', 'one'),
       ('bar', 'two'), ('baz', 'two'), ('baz', 'three'), ('qux', 'one'),
       ('qux', 'two'), ('qux', 'three')], dtype=object)

In [3]: index - index
Out[3]: MultiIndex([], dtype=object)

wesm closed this as completed Oct 21, 2011
Author

asqui commented Oct 21, 2011

Not quite. I think the MultiIndex constructor should permit construction of an empty MultiIndex.

Try this:

In [80]: index
Out[80]:
MultiIndex([('foo', 'one'), ('foo', 'two'), ('foo', 'three'), ('bar', 'one'),
       ('bar', 'two'), ('baz', 'two'), ('baz', 'three'), ('qux', 'one'),
       ('qux', 'two'), ('qux', 'three')], dtype=object)

In [81]: index - index.sortlevel(0)[0]
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
C:\Users\dfortunov\<ipython-input-81-62c2391d25ca> in <module>()
----> 1 index - index.sortlevel(0)[0]

C:\VirtualEnvs\mss.dev\lib\site-packages\pandas-0.5.0.dev20111021_a8c38b1-py2.6-win32.egg\pandas\core\index.pyc in __sub__(self, other)
    261
    262     def __sub__(self, other):
--> 263         return self.diff(other)
    264
    265     def __and__(self, other):

C:\VirtualEnvs\mss.dev\lib\site-packages\pandas-0.5.0.dev20111021_a8c38b1-py2.6-win32.egg\pandas\core\index.pyc in diff(self, other)
   1537         difference = sorted(set(self.values) - set(other.values))
   1538         return MultiIndex.from_tuples(difference, sortorder=0,
-> 1539                                       names=self.names)
   1540
   1541     def _assert_can_do_setop(self, other):

C:\VirtualEnvs\mss.dev\lib\site-packages\pandas-0.5.0.dev20111021_a8c38b1-py2.6-win32.egg\pandas\core\index.pyc in from_tuples(cls, tuples, sortorder, names)
   1002         arrays = zip(*tuples)
   1003         return MultiIndex.from_arrays(arrays, sortorder=sortorder,
-> 1004                                       names=names)
   1005
   1006     @property

C:\VirtualEnvs\mss.dev\lib\site-packages\pandas-0.5.0.dev20111021_a8c38b1-py2.6-win32.egg\pandas\core\index.pyc in from_arrays(cls, arrays, sortorder, names)
    983
    984         return MultiIndex(levels=levels, labels=labels, sortorder=sortorder,
--> 985                           names=names)
    986
    987     @classmethod

C:\VirtualEnvs\mss.dev\lib\site-packages\pandas-0.5.0.dev20111021_a8c38b1-py2.6-win32.egg\pandas\core\index.pyc in __new__(cls, levels, labels, sortorder, names)
    840         assert(len(levels) == len(labels))
    841         if len(levels) == 0:
--> 842             raise Exception('Must pass non-zero number of levels/labels')
    843
    844         if len(levels) == 1:

Exception: Must pass non-zero number of levels/labels

The different sort order means it fails the equality check in MultiIndex.diff (which would normally create an empty index with the sneaky self[:0] trick rather than constructing one directly) and then goes on to pass the empty difference set to MultiIndex.from_tuples(), which fails.
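
To make the failure path concrete, here is a minimal sketch in plain Python. It assumes lists of tuples as stand-ins for the two MultiIndex objects, and the names `index`, `resorted`, `same_order` and `difference` are illustrative only; it is not the pandas implementation.

```python
# Toy model: plain lists of tuples stand in for two MultiIndex objects.
index = [("foo", "one"), ("foo", "two"), ("bar", "one"), ("bar", "two")]
resorted = sorted(index)  # same entries, different order

# An order-sensitive equality check (the shortcut) does not fire...
same_order = index == resorted

# ...but the set difference is still empty, so the fallback path
# ends up with an empty list of tuples to build an index from.
difference = sorted(set(index) - set(resorted))

print(same_order)   # False
print(difference)   # []
```

So any fix that lives only in the equality shortcut still leaves the general path exposed to an empty result.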


Sidenote 1: Are the repr outputs of MultiIndex supposed to be round-trippable? The Python docs suggest that this should be the case for __repr__ (but not __str__); however, I couldn't construct the example MultiIndex from your previous comment by simply pasting that in.

Sidenote 2: MultiIndex.sortlevel() docs are out of date -- they say it only returns the sorted index but in fact it returns a tuple containing the sorted index and an index array.

Member

wesm commented Oct 21, 2011

You're right, I was being lazy, this is the way to create an empty index (also only passing on the level names if they're the same...will write a test for this):

        if self.equals(other):
            names = self.names if self.names == other.names else other.names
            return MultiIndex(levels=[[]]*self.nlevels,
                              labels=[[]]*self.nlevels,
                              names=names)

here's the commit
wesm@1b30cfc

Sidenote 1: round-trippable __repr__ isn't that common in science.
Sidenote 2: pull request pls? =P
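
For readers unfamiliar with the levels/labels layout used in the snippet above, the following is a rough sketch in plain Python of why one empty list per level gives an empty index that still knows how many levels it has. The `decode` helper and all variable names are hypothetical; this is a toy model of the representation, not pandas code.

```python
def decode(levels, labels):
    """Rebuild the tuples of a toy multi-level index.

    levels[i] holds the distinct values of level i;
    labels[i] holds, for each entry, a position into levels[i].
    """
    assert len(levels) == len(labels)
    per_level = [[lvl[pos] for pos in lab] for lvl, lab in zip(levels, labels)]
    return list(zip(*per_level))


nlevels = 2
# Fresh lists per level; [[]] * nlevels would repeat one shared list object,
# which is harmless only as long as nothing mutates it in place.
empty_levels = [[] for _ in range(nlevels)]
empty_labels = [[] for _ in range(nlevels)]

full = decode([["bar", "foo"], ["one", "two"]], [[1, 1, 0], [0, 1, 0]])
empty = decode(empty_levels, empty_labels)

print(full)               # [('foo', 'one'), ('foo', 'two'), ('bar', 'one')]
print(empty)              # []
print(len(empty_levels))  # 2
```

The empty case carries zero entries but two levels, which is the distinction between "no levels" and "levels of length 0" raised earlier in the thread.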

Member

wesm commented Oct 21, 2011

There's a separate bug in the function, which is that you should not pass an empty list to from_tuples because it can't infer the correct number of levels.
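
The inference problem can be seen with the same `zip(*tuples)` transpose that appears in the traceback above. This is a small sketch in plain Python; `split_into_arrays` is a hypothetical helper name, and `list(...)` is added because `zip` returns an iterator on Python 3.

```python
def split_into_arrays(tuples):
    """Transpose a list of tuples into one sequence per level."""
    return list(zip(*tuples))


two_level = split_into_arrays([("foo", "one"), ("bar", "two")])
empty = split_into_arrays([])

print(len(two_level))  # 2
print(len(empty))      # 0
```

With no tuples to inspect, the transpose yields zero arrays, so the number of levels has to come from somewhere else (for example from the index being differenced).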

Member

wesm commented Oct 21, 2011

wesm@20ae0ed
wesm@b82a93f
