BUG: "IndexingError: Too many indexers" when accessing a None value using .loc through a MultiIndex #34318

Closed
dechamps opened this issue May 22, 2020 · 7 comments · Fixed by #34450
Labels
Bug, Indexing (related to indexing on series/frames, not to indexes themselves), MultiIndex
Milestone

Comments

@dechamps

Steps to reproduce

import pandas as pd

# A Series holding None, indexed by a two-level MultiIndex
s = pd.Series(
    [None],
    index=pd.MultiIndex.from_arrays([['Level1'], ['Level2']]),
)
print(s.loc[('Level1', 'Level2')])

Expected output

None

Actual output

/usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py in __getitem__(self, key)
   1760                 except (KeyError, IndexError, AttributeError):
   1761                     pass
-> 1762             return self._getitem_tuple(key)
   1763         else:
   1764             # we by definition only have the 0th axis

/usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
   1275 
   1276         # no multi-index, so validate all of the indexers
-> 1277         self._has_valid_tuple(tup)
   1278 
   1279         # ugly hack for GH #836

/usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py in _has_valid_tuple(self, key)
    699         for i, k in enumerate(key):
    700             if i >= self.ndim:
--> 701                 raise IndexingError("Too many indexers")
    702             try:
    703                 self._validate_key(k, i)

IndexingError: Too many indexers

Additional information

If any value other than None (even np.nan) is used, the code behaves correctly.

If a single index level is used, the code behaves correctly.

Workaround

Seems to work if .loc(axis=0)[('Level1', 'Level2')] is used instead.

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.8.3.candidate.1
python-bits : 64
OS : Linux
OS-release : 5.6.0-1-amd64
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 1.0.3
numpy : 1.18.2
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.1.3
Cython : None
pytest : 4.6.9
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.5.0
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 4.6.9
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

@dechamps added the Bug and Needs Triage labels May 22, 2020
@jorisvandenbossche added the Indexing and MultiIndex labels and removed the Needs Triage label May 22, 2020
@jorisvandenbossche added this to the Contributions Welcome milestone May 22, 2020
@jorisvandenbossche
Member

@dechamps thanks for the report!

Reproducing with an example that has more than one element (the length doesn't seem to matter):

In [32]: midx = pd.MultiIndex.from_product([['Level1'], ['Level2_a', 'Level2_b']])

In [33]: s = pd.Series([None]*len(midx), dtype=object, index=midx) 

In [35]: s.loc[('Level1', 'Level2_a')]  
...
IndexingError: Too many indexers

In [36]: s = pd.Series([1]*len(midx), dtype=object, index=midx) 

In [37]: s.loc[('Level1', 'Level2_a')] 
Out[37]: 1

So the error indeed occurs only when the Series value being accessed is actually None.

@jorisvandenbossche
Member

The problem lies here:

result = self._handle_lowerdim_multi_index_axis0(tup)
if result is not None:
    return result

If the result is None, we continue with other code (that then incorrectly raises), and this _handle_lowerdim_multi_index_axis0 method returns None to indicate an error happened. However, it can also return None when that was the result of indexing (as in the example here). So we will probably need another way to communicate between those methods.
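
The ambiguity described above can be reproduced outside pandas. The following is only a sketch, not the pandas implementation: `lookup_or_none` and `get_item` are hypothetical stand-ins for `_handle_lowerdim_multi_index_axis0` and its caller, and a plain dict is assumed in place of the Series.

```python
# Hypothetical stand-ins, not pandas code.

def lookup_or_none(data, key):
    """Return data[key], or None to signal 'could not handle this key'."""
    try:
        return data[key]
    except KeyError:
        return None  # failure signal, indistinguishable from a stored None


def get_item(data, key):
    result = lookup_or_none(data, key)
    if result is not None:
        return result
    # Fallback path: reached for a missing key AND for a stored None.
    raise LookupError("Too many indexers")


data = {("Level1", "Level2_a"): 1, ("Level1", "Level2_b"): None}

print(get_item(data, ("Level1", "Level2_a")))  # a non-None value comes back

try:
    get_item(data, ("Level1", "Level2_b"))
except LookupError as exc:
    print("stored None misread as failure:", exc)
```

The key exists in both calls; only the one whose stored value is None ends up on the fallback path.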

Contributions to fix this are always welcome!

@pedrooa
Contributor

pedrooa commented May 26, 2020

Can I work on this?

@jorisvandenbossche
Member

@pedrooa Sure, that would be very welcome!

@pedrooa
Contributor

pedrooa commented May 28, 2020

I've managed to get it to work by changing the default return value of _handle_lowerdim_multi_index_axis0(tup) to False instead of None, and then checking it like this:

 result = self._handle_lowerdim_multi_index_axis0(tup) 
 if result is not False: 
     return result 

Is this a valid solution?

@jorisvandenbossche
Member

The problem, I think, is that False in principle can also be a valid result (so similar problem as with None).

So I think we need another way: either raise an error and catch it in the layer above, or use a custom sentinel object like no_result = object() and then check if result is not no_result: return result

@pedrooa
Contributor

pedrooa commented May 28, 2020

Ah yes, you are correct.

I think the cleanest solution would be to raise an exception. If the call raises, the caller catches the exception and falls through to the remaining code instead of returning, like this:

try:
    result = self._handle_lowerdim_multi_index_axis0(tup)
    return result
except Exception:
    pass

It seems to solve this problem.
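
One possible shape for the exception-based approach, sketched on a plain dict with tuple keys. Everything here is an assumption for illustration: `LowerDimLookupFailed`, `fast_lookup`, `general_lookup`, and `get_item` are hypothetical names, and catching a dedicated exception rather than a bare `Exception` is a design choice of this sketch, not something taken from pandas.

```python
# Hypothetical names; a sketch of the exception-based variant, not pandas code.

class LowerDimLookupFailed(Exception):
    """Raised when the fast path cannot handle the key (assumed name)."""


def fast_lookup(data, key):
    """Exact-match lookup that raises instead of returning a marker value."""
    try:
        return data[key]
    except KeyError as err:
        raise LowerDimLookupFailed(key) from err


def general_lookup(data, key):
    """Stand-in for the general path: treat key as a prefix of tuple keys."""
    matches = {k: v for k, v in data.items() if k[: len(key)] == key}
    if not matches:
        raise KeyError(key)
    return matches


def get_item(data, key):
    try:
        # Any returned value, including None or False, is a real result.
        return fast_lookup(data, key)
    except LowerDimLookupFailed:
        pass  # only this specific failure falls through
    return general_lookup(data, key)


data = {("Level1", "Level2_a"): None, ("Level1", "Level2_b"): 1}
print(get_item(data, ("Level1", "Level2_a")))  # exact match, value is None
print(get_item(data, ("Level1",)))             # partial key, general path
```

Catching only the dedicated exception keeps unrelated errors raised inside the fast path visible, whereas `except Exception: pass` would silently swallow them.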

pedrooa added a commit to pedrooa/pandas that referenced this issue May 29, 2020
@jreback changed the milestone from Contributions Welcome to 1.1 on Jun 2, 2020