s.loc[[]] raises error (only) if index is not unique #13691

toobaz · 2016-07-18T12:46:10Z

Code Sample, a copy-pastable example if possible


In [2]: s = pd.Series(0, index=pd.MultiIndex.from_product([[0], [1,2]]))

In [3]: s.loc[[]]
Out[3]: Series([], dtype: int64)

In [4]: s = pd.Series(0, index=pd.MultiIndex.from_product([[0], [1,1]]))

In [5]: s.loc[[]]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-f9b6211189ca> in <module>()
----> 1 s.loc[[]]

/home/pietro/nobackup/repo/pandas/pandas/core/indexing.py in __getitem__(self, key)
   1304             return self._getitem_tuple(key)
   1305         else:
-> 1306             return self._getitem_axis(key, axis=0)
   1307 
   1308     def _getitem_axis(self, key, axis=0):

/home/pietro/nobackup/repo/pandas/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1464                     raise ValueError('Cannot index with multidimensional key')
   1465 
-> 1466                 return self._getitem_iterable(key, axis=axis)
   1467 
   1468             # nested tuple slicing

/home/pietro/nobackup/repo/pandas/pandas/core/indexing.py in _getitem_iterable(self, key, axis)
   1095 
   1096                 new_target, indexer, new_indexer = labels._reindex_non_unique(
-> 1097                     keyarr)
   1098 
   1099                 if new_indexer is not None:

/home/pietro/nobackup/repo/pandas/pandas/indexes/base.py in _reindex_non_unique(self, target)
   2497                 new_indexer[~check] = -1
   2498 
-> 2499         new_index = self._shallow_copy_with_infer(new_labels, freq=None)
   2500         return new_index, indexer, new_indexer
   2501 

/home/pietro/nobackup/repo/pandas/pandas/indexes/multi.py in _shallow_copy_with_infer(self, values, **kwargs)
    393 
    394     def _shallow_copy_with_infer(self, values=None, **kwargs):
--> 395         return self._shallow_copy(values, **kwargs)
    396 
    397     @Appender(_index_shared_docs['_shallow_copy'])

/home/pietro/nobackup/repo/pandas/pandas/indexes/multi.py in _shallow_copy(self, values, **kwargs)
    402             # discards freq
    403             kwargs.pop('freq', None)
--> 404             return MultiIndex.from_tuples(values, **kwargs)
    405         return self.view()
    406 

/home/pietro/nobackup/repo/pandas/pandas/indexes/multi.py in from_tuples(cls, tuples, sortorder, names)
    889         if len(tuples) == 0:
    890             # I think this is right? Not quite sure...
--> 891             raise TypeError('Cannot infer number of levels from empty list')
    892 
    893         if isinstance(tuples, (np.ndarray, Index)):

TypeError: Cannot infer number of levels from empty list
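The last frame is where the failure originates: `MultiIndex.from_tuples` has nothing to infer the number of levels from when it is handed an empty list. A minimal sketch of that behaviour in isolation, assuming a recent pandas release; the `names=` variant reflects how current versions behave as far as I can tell and is not something shown in the traceback above:

```python
import pandas as pd

# With no tuples and no names there is nothing to infer the level count from.
try:
    pd.MultiIndex.from_tuples([])
    raised = False
except TypeError as err:
    raised = True
    print(err)

# Passing names tells the constructor how many levels to build.
empty_mi = pd.MultiIndex.from_tuples([], names=["a", "b"])
print(empty_mi.nlevels, len(empty_mi))
```

This suggests the information needed (the level count) is available to the caller, which already holds the original index; it just is not passed down.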

Expected Output

Out[5]: Series([], dtype: int64)
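Put as a check, the expectation is that an empty list selects nothing whether or not the index has duplicates. A minimal sketch, assuming a pandas release that already includes the eventual fix (this issue was closed under the 1.2 milestone); on the affected versions the second lookup raises the TypeError shown above:

```python
import pandas as pd

# Same two series as in the report: one unique MultiIndex, one with a duplicate.
unique = pd.Series(0, index=pd.MultiIndex.from_product([[0], [1, 2]]))
non_unique = pd.Series(0, index=pd.MultiIndex.from_product([[0], [1, 1]]))

# Both lookups are expected to give an empty Series.
res_unique = unique.loc[[]]
res_non_unique = non_unique.loc[[]]

print(len(res_unique), len(res_non_unique))
```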

By the way, on top of the inconsistency above,

In [6]: s.loc[s.iloc[0:0]]

results in the same error, which is even more unexpected because s.iloc[0:0].index is a perfectly valid MultiIndex, and so it is certainly possible to infer the number of levels from it.
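To make that point concrete, here is a small sketch showing that the empty slice still carries a two-level MultiIndex, so the level count is available even though no rows are selected (the variable name `empty_index` is only for illustration):

```python
import pandas as pd

s = pd.Series(0, index=pd.MultiIndex.from_product([[0], [1, 1]]))

# An empty positional slice keeps the index type and its levels.
empty_index = s.iloc[0:0].index

print(type(empty_index).__name__, empty_index.nlevels, len(empty_index))
```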

Might be related to #13490 (the error is the same, but I don't know at which level the open PR applies).

output of `pd.show_versions()`


In [7]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.5.0-2-amd64
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: it_IT.utf8
LOCALE: it_IT.UTF-8

pandas: 0.18.1+206.gc42455b
nose: 1.3.7
pip: 1.5.6
setuptools: 18.4
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.16.0
statsmodels: 0.8.0.dev0+111ddc0
xarray: None
IPython: 5.0.0.dev
sphinx: 1.3.1
patsy: 0.3.0-dev
dateutil: 2.2
pytz: 2012c
blosc: None
bottleneck: 1.1.0dev
tables: 3.2.2
numexpr: 2.5
matplotlib: 1.5.1
openpyxl: None
xlrd: 0.9.4
xlwt: 1.1.2
xlsxwriter: 0.7.3
lxml: None
bs4: 4.4.0
html5lib: 0.999
httplib2: 0.9.1
apiclient: 1.5.0
sqlalchemy: 1.0.11
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.38.0
pandas_datareader: 0.2.1


jreback · 2016-07-18T21:48:27Z

ok, will mark it, though non-unique multi-indexes are not very well tested, nor should they really be allowed (but they are) :<

mborysow · 2016-07-19T19:32:18Z

I have just recently run into this issue as well. I do this very often, in fact, and now a lot of my code is broken. I would probably do a lot differently now, as I've learned more about pandas, but it is what it is. (FYI, merging with df1_good with [left_index=True, right_index=True, how='inner'] produces an equivalent result.)

df1 = pd.DataFrame(data=dict(A=[1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
                             B=[1, 1, 2, 2, 2, 3, 1, 1, 1, 2, 3, 4],
                             C=[1, 2, 1, 2, 3, 2, 1, 2, 3, 2, 1, 4],
                             V=[1, 5, 2, 3, 8, 3, 3, 3, 1, 2, 1, 4]))
df1 = df1.set_index(['A', 'B', 'C']).sortlevel()

df2 = df1.groupby(level=[0, 2])[['V']].max()

df2_good = df2[df2.V > 9]

...this apparently works and returns an empty dataframe with the index labels and columns intact.
df1_good = df1.loc[df2_good.index]

...this fails with TypeError: Cannot infer number of levels from empty list
df1_good = df1.reset_index().set_index(['A', 'C']).sortlevel().loc[df2_good.index]

Anyhow, it's perfectly acceptable in my analysis that the initial query returns an empty DataFrame. This used to work just fine. Now, any time I want to do anything remotely like this, I first need to check that the result of the initial query isn't empty.

This works fine in pandas 0.16, and does not in 0.17 or 0.18. I don't know exactly where it broke, but I get the same error as above.
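For anyone stuck on an affected version, the emptiness check described above can be wrapped once rather than repeated at every call site. A rough sketch, not an official pandas API: `loc_or_empty` is a hypothetical helper name, the small frame is made up for illustration, and `sort_index` is used here in place of the older `sortlevel`:

```python
import pandas as pd


def loc_or_empty(df, labels):
    """Hypothetical helper: label lookup that tolerates an empty selection."""
    if len(labels) == 0:
        # An empty positional slice keeps the columns and index levels intact.
        return df.iloc[0:0]
    return df.loc[labels]


# Small frame with a non-unique (A, C) MultiIndex, in the spirit of the example above.
df1 = pd.DataFrame(dict(A=[1, 1, 2], C=[1, 1, 2], V=[1, 5, 3]))
df1 = df1.set_index(["A", "C"]).sort_index()

df2 = df1.groupby(level=[0, 1])[["V"]].max()
df2_good = df2[df2.V > 9]  # nothing exceeds 9, so this is empty

df1_good = loc_or_empty(df1, df2_good.index)
print(df1_good.shape)
```

The guard only sidesteps the failing code path; it says nothing about whether `.loc` itself should accept the empty key, which is what this issue is about.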


toobaz changed the title ~~s.loc[[]] raises error (only) if index is not unique~~ s.loc[[]] raises error (only) if index is not unique Jul 18, 2016

jreback added Bug Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex Difficulty Intermediate labels Jul 18, 2016

jreback added this to the Next Major Release milestone Jul 18, 2016

jbrockmendel removed Difficulty Intermediate labels Oct 21, 2019

jbrockmendel added a commit to jbrockmendel/pandas that referenced this issue Oct 15, 2020

BUG: series.loc[[]] with non-unique MultiInex pandas-dev#13691

42add9f

jbrockmendel mentioned this issue Oct 15, 2020

BUG: indexing bugs #26490, #13691 #37150

Closed

6 tasks

jreback modified the milestones: Contributions Welcome, 1.2 Oct 16, 2020

jbrockmendel added a commit to jbrockmendel/pandas that referenced this issue Oct 16, 2020

BUG: Series.loc[[]] with non-unique MultiIndex pandas-dev#13691

2d2b1ed

jbrockmendel mentioned this issue Oct 16, 2020

BUG: Series.loc[[]] with non-unique MultiIndex #13691 #37161

Merged

5 tasks

jreback closed this as completed in #37161 Oct 16, 2020

jreback pushed a commit that referenced this issue Oct 16, 2020

BUG: Series.loc[[]] with non-unique MultiIndex #13691 (#37161)

4c395b0

JulianWgs pushed a commit to JulianWgs/pandas that referenced this issue Oct 26, 2020

BUG: Series.loc[[]] with non-unique MultiIndex pandas-dev#13691 (pand…

f10c724

…as-dev#37161)

kesmit13 pushed a commit to kesmit13/pandas that referenced this issue Nov 2, 2020

BUG: Series.loc[[]] with non-unique MultiIndex pandas-dev#13691 (pand…

e4bb761

…as-dev#37161)
