s.loc[[]] raises error (only) if index is not unique #13691

Closed
toobaz opened this issue Jul 18, 2016 · 2 comments · Fixed by #37161
Labels: Bug, Indexing (Related to indexing on series/frames, not to indexes themselves), MultiIndex

Comments

@toobaz
Member

toobaz commented Jul 18, 2016

Code Sample, a copy-pastable example if possible


In [2]: s = pd.Series(0, index=pd.MultiIndex.from_product([[0], [1,2]]))

In [3]: s.loc[[]]
Out[3]: Series([], dtype: int64)

In [4]: s = pd.Series(0, index=pd.MultiIndex.from_product([[0], [1,1]]))

In [5]: s.loc[[]]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-f9b6211189ca> in <module>()
----> 1 s.loc[[]]

/home/pietro/nobackup/repo/pandas/pandas/core/indexing.py in __getitem__(self, key)
   1304             return self._getitem_tuple(key)
   1305         else:
-> 1306             return self._getitem_axis(key, axis=0)
   1307 
   1308     def _getitem_axis(self, key, axis=0):

/home/pietro/nobackup/repo/pandas/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1464                     raise ValueError('Cannot index with multidimensional key')
   1465 
-> 1466                 return self._getitem_iterable(key, axis=axis)
   1467 
   1468             # nested tuple slicing

/home/pietro/nobackup/repo/pandas/pandas/core/indexing.py in _getitem_iterable(self, key, axis)
   1095 
   1096                 new_target, indexer, new_indexer = labels._reindex_non_unique(
-> 1097                     keyarr)
   1098 
   1099                 if new_indexer is not None:

/home/pietro/nobackup/repo/pandas/pandas/indexes/base.py in _reindex_non_unique(self, target)
   2497                 new_indexer[~check] = -1
   2498 
-> 2499         new_index = self._shallow_copy_with_infer(new_labels, freq=None)
   2500         return new_index, indexer, new_indexer
   2501 

/home/pietro/nobackup/repo/pandas/pandas/indexes/multi.py in _shallow_copy_with_infer(self, values, **kwargs)
    393 
    394     def _shallow_copy_with_infer(self, values=None, **kwargs):
--> 395         return self._shallow_copy(values, **kwargs)
    396 
    397     @Appender(_index_shared_docs['_shallow_copy'])

/home/pietro/nobackup/repo/pandas/pandas/indexes/multi.py in _shallow_copy(self, values, **kwargs)
    402             # discards freq
    403             kwargs.pop('freq', None)
--> 404             return MultiIndex.from_tuples(values, **kwargs)
    405         return self.view()
    406 

/home/pietro/nobackup/repo/pandas/pandas/indexes/multi.py in from_tuples(cls, tuples, sortorder, names)
    889         if len(tuples) == 0:
    890             # I think this is right? Not quite sure...
--> 891             raise TypeError('Cannot infer number of levels from empty list')
    892 
    893         if isinstance(tuples, (np.ndarray, Index)):

TypeError: Cannot infer number of levels from empty list
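
The last frame of the traceback can be reproduced in isolation. The following is only a minimal sketch, not the code path pandas takes internally: it shows that an empty list of tuples gives `MultiIndex.from_tuples` nothing to infer the number of levels from, while an empty index built from one empty array per level is well defined. The names `error_message` and `empty_mi` are introduced here for illustration.

```python
import pandas as pd

# An empty list of tuples carries no information about the number of levels,
# so from_tuples has nothing to infer from when no names are given.
try:
    pd.MultiIndex.from_tuples([])
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Illustrative alternative: one empty array per level makes the number
# of levels explicit, so an empty two-level index can be built.
empty_mi = pd.MultiIndex.from_arrays([[], []])
print(empty_mi.nlevels, len(empty_mi))
```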

Expected Output

Out[5]: Series([], dtype: int64)

By the way, on top of the inconsistency above,

In [6]: s.loc[s.iloc[0:0]]

results in the same error, which is even more unexpected because s.iloc[0:0].index is a perfectly valid MultiIndex, and so it is certainly possible to infer the number of levels from it.
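
The claim about the empty slice can be checked directly. This is a small sketch using the same non-unique series as above; the variable name `empty` is only illustrative.

```python
import pandas as pd

# Same non-unique MultiIndex as in the report: the label (0, 1) appears twice.
s = pd.Series(0, index=pd.MultiIndex.from_product([[0], [1, 1]]))

# A positional empty slice keeps the index type and its levels.
empty = s.iloc[0:0]
print(type(empty.index).__name__, empty.index.nlevels, len(empty))
```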

Might be related to #13490 (the error is the same, but I don't know at which level the open PR applies).

output of pd.show_versions()


In [7]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.5.0-2-amd64
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: it_IT.utf8
LOCALE: it_IT.UTF-8

pandas: 0.18.1+206.gc42455b
nose: 1.3.7
pip: 1.5.6
setuptools: 18.4
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.16.0
statsmodels: 0.8.0.dev0+111ddc0
xarray: None
IPython: 5.0.0.dev
sphinx: 1.3.1
patsy: 0.3.0-dev
dateutil: 2.2
pytz: 2012c
blosc: None
bottleneck: 1.1.0dev
tables: 3.2.2
numexpr: 2.5
matplotlib: 1.5.1
openpyxl: None
xlrd: 0.9.4
xlwt: 1.1.2
xlsxwriter: 0.7.3
lxml: None
bs4: 4.4.0
html5lib: 0.999
httplib2: 0.9.1
apiclient: 1.5.0
sqlalchemy: 1.0.11
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.38.0
pandas_datareader: 0.2.1

@jreback
Contributor

jreback commented Jul 18, 2016

ok, will mark it, though non-unique multi-indexes are not very well tested, nor should they really be allowed (but they are) :<

@jreback added the Bug, Indexing (Related to indexing on series/frames, not to indexes themselves), MultiIndex, and Difficulty Intermediate labels on Jul 18, 2016
@jreback added this to the Next Major Release milestone on Jul 18, 2016
@mborysow

mborysow commented Jul 19, 2016

I have just recently run into this issue as well. I do this very often in fact, and now a lot of my code is broken. I would probably do a lot of it differently now, as I've learned more about pandas, but it is what it is. (FYI, merging with df1_good using [left_index=True, right_index=True, how='inner'] produces an equivalent result.)

df1 = pd.DataFrame(data=dict(A=[1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
                             B=[1, 1, 2, 2, 2, 3, 1, 1, 1, 2, 3, 4],
                             C=[1, 2, 1, 2, 3, 2, 1, 2, 3, 2, 1, 4],
                             V=[1, 5, 2, 3, 8, 3, 3, 3, 1, 2, 1, 4]))
df1 = df1.set_index(['A', 'B', 'C']).sortlevel()

df2 = df1.groupby(level=[0, 2])[['V']].max()

df2_good = df2[df2.V > 9]

...this apparently works and returns an empty dataframe with the index labels and columns intact.
df1_good = df1.loc[df2_good.index]

...this fails with TypeError: Cannot infer number of levels from empty list
df1_good = df1.reset_index().set_index(['A', 'C']).sortlevel().loc[df2_good.index]

Anyhow, it's perfectly acceptable in my analysis that the initial query returns an empty DataFrame. This used to work just fine. Now, any time I want to do anything remotely like this, I always need to check that the result of the initial query isn't empty.

This works fine in pandas 0.16, but not in 0.17 or 0.18. I don't know exactly where it broke, but I get the same error as above.
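
The emptiness check described above could be wrapped in a small helper. This is a hedged sketch of one possible workaround, not a fix in pandas itself: `loc_or_empty` is a hypothetical name, and `sort_index()` is used here in place of `sortlevel()`, which was removed in later pandas releases.

```python
import pandas as pd


def loc_or_empty(frame, labels):
    """Hypothetical helper: select `labels` from `frame`, skipping .loc when empty."""
    if len(labels) == 0:
        # An empty positional slice keeps the columns and the index levels.
        return frame.iloc[0:0]
    return frame.loc[labels]


df1 = pd.DataFrame(data=dict(A=[1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
                             B=[1, 1, 2, 2, 2, 3, 1, 1, 1, 2, 3, 4],
                             C=[1, 2, 1, 2, 3, 2, 1, 2, 3, 2, 1, 4],
                             V=[1, 5, 2, 3, 8, 3, 3, 3, 1, 2, 1, 4]))
df1 = df1.set_index(['A', 'B', 'C']).sort_index()

df2 = df1.groupby(level=[0, 2])[['V']].max()
df2_good = df2[df2.V > 9]  # no group has a maximum above 9, so this is empty

# Re-indexing by (A, C) makes the index non-unique, which is the failing case.
by_ac = df1.reset_index().set_index(['A', 'C']).sort_index()
df1_good = loc_or_empty(by_ac, df2_good.index)
print(df1_good)
```

The guard only avoids the empty-key path; non-empty selections still go through `.loc` unchanged.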
