MultiIndex needs to validate that levels and labels are compatible. #5213

Closed
jtratner opened this issue Oct 14, 2013 · 4 comments · Fixed by #5214

Comments

@jtratner
Contributor

commented Oct 14, 2013

Urp:

In [1]: import pandas as pd

In [2]: mi = pd.MultiIndex(labels = [[1.25, 0.25, 3.1], [3.2, 1.2, 4.1]], levels=[['a'], ['a', 'b', 'c']])

In [3]: mi
Out[3]:
MultiIndex(levels=[[u'a'], [u'a', u'b', u'c']],
           labels=[[1, 0, 3], [3, 1, 4]])

In [4]: mi.values
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-4-e9591d864c48> in <module>()
----> 1 mi.values

/Users/jtratner/projects/python/pandas/pandas/core/index.py in values(self)
   2163             values = []
   2164             for lev, lab in zip(self.levels, self.labels):
-> 2165                 taken = com.take_1d(lev.values, lab)
   2166                 # Need to box timestamps, etc.
   2167                 if hasattr(lev, '_box_values'):

/Users/jtratner/projects/python/pandas/pandas/core/common.py in take_nd(arr, indexer, axis, out, fill_value, mask_info, allow_fill)
    630     func = _get_take_nd_function(arr.ndim, arr.dtype, out.dtype,
    631                                  axis=axis, mask_info=mask_info)
--> 632     func(arr, indexer, out, fill_value)
    633     return out
    634

/Users/jtratner/projects/python/pandas/pandas/algos.so in pandas.algos.take_1d_object_object (pandas/algos.c:68489)()

IndexError: Out of bounds on buffer access (axis 0)

In [5]: mi
Out[5]:
MultiIndex(levels=[[u'a'], [u'a', u'b', u'c']],
           labels=[[1, 0, 3], [3, 1, 4]])

In [6]: mi.levels
Out[6]: FrozenList([[u'a'], [u'a', u'b', u'c']])

In [7]: mi.labels
Out[7]: FrozenList([[1, 0, 3], [3, 1, 4]])

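labels.max() needs to be < len(level) for each pair of levels and labels, since labels are zero-based positions into the level. Above, label 3 points past the end of a three-entry level.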
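A minimal sketch of the kind of check being asked for, in plain Python so it runs without pandas. The function name `check_levels_and_labels` and the error messages are hypothetical and are not taken from the PR. It also assumes that -1 is reserved as a missing-value marker, so only values below -1 are rejected at the low end.

```python
def check_levels_and_labels(levels, labels):
    """Raise ValueError if any label cannot index into its level."""
    if len(levels) != len(labels):
        raise ValueError(
            "Length of levels (%d) and labels (%d) must match"
            % (len(levels), len(labels))
        )
    for i, (level, label) in enumerate(zip(levels, labels)):
        if len(label) == 0:
            continue
        # Labels are zero-based positions, so the largest valid one
        # is len(level) - 1.
        if max(label) >= len(level):
            raise ValueError(
                "On level %d, label max (%d) >= length of level (%d)"
                % (i, max(label), len(level))
            )
        # Assumption: -1 is reserved as the missing-value marker.
        if min(label) < -1:
            raise ValueError(
                "On level %d, label value (%d) < -1" % (i, min(label))
            )


# The levels and labels from the report, as shown in Out[3].
levels = [["a"], ["a", "b", "c"]]
bad_labels = [[1, 0, 3], [3, 1, 4]]
good_labels = [[0, 0, 0], [2, 1, 0]]

check_levels_and_labels(levels, good_labels)  # passes silently

try:
    check_levels_and_labels(levels, bad_labels)
except ValueError as err:
    print(err)
```

Running the check at construction time would turn the late IndexError from `mi.values` into an early, readable ValueError.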

@ghost ghost assigned jtratner Oct 14, 2013

@jtratner

Contributor Author

commented Oct 14, 2013

@jreback can this skip integrity check (from io/pytables)?:

    def read_multi_index(self, key):
        nlevels = getattr(self.attrs, '%s_nlevels' % key)

        levels = []
        labels = []
        names = []
        for i in range(nlevels):
            level_key = '%s_level%d' % (key, i)
            name, lev = self.read_index_node(getattr(self.group, level_key))
            levels.append(lev)
            names.append(name)

            label_key = '%s_label%d' % (key, i)
            lab = self.read_array(label_key)
            labels.append(lab)

        return MultiIndex(levels=levels, labels=labels, names=names)

@jreback

Contributor

commented Oct 14, 2013

not sure where the integrity check is?

@jtratner

Contributor Author

commented Oct 14, 2013

check the PR. This implicitly passed verify_integrity=True to the constructor. I'll edit it to reflect that. Just want to confirm that this is something that ought to be checked.

@jreback

Contributor

commented Oct 14, 2013

I see. It doesn't hurt to do; in theory the data could be corrupted in a subtle way.
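To illustrate the corruption case, here is a rough sketch of a reader that validates on load. It is not the pytables code: `read_multi_index_parts` and the plain-dict `store` are hypothetical stand-ins, and only the key naming (`%s_nlevels`, `%s_level%d`, `%s_label%d`) follows the snippet quoted above. The `verify_integrity` flag here just mimics the constructor argument under discussion.

```python
def read_multi_index_parts(store, key, verify_integrity=True):
    """Rebuild (levels, labels) for `key` from a dict-like store."""
    nlevels = store["%s_nlevels" % key]
    levels, labels = [], []
    for i in range(nlevels):
        level = list(store["%s_level%d" % (key, i)])
        label = list(store["%s_label%d" % (key, i)])
        # Reject labels that point past the end of their level.
        if verify_integrity and label and max(label) >= len(level):
            raise ValueError(
                "On level %d, label max (%d) >= length of level (%d)"
                % (i, max(label), len(level))
            )
        levels.append(level)
        labels.append(label)
    return levels, labels


# Hypothetical stored data; 7 stands in for a corrupted label.
store = {
    "index_nlevels": 2,
    "index_level0": ["a", "b"],
    "index_label0": [0, 1, 1],
    "index_level1": ["x", "y", "z"],
    "index_label1": [2, 0, 7],
}

# With the check skipped, the bad label comes back unnoticed and
# would only fail later, when values are looked up.
levels, labels = read_multi_index_parts(store, "index", verify_integrity=False)

try:
    read_multi_index_parts(store, "index")
except ValueError as err:
    print("rejected on read:", err)
```

The cost is one pass over each label array per read, which is the trade-off being weighed in this thread.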
