DataSet.reset_index() fails using level argument with MultiIndex #6

Closed
JBGreisman opened this issue Jul 28, 2020 · 0 comments
DataSet.reset_index() raises a KeyError when the level argument selects only some of the levels in a MultiIndex. This happens because reset_index() assumes that every level is being removed from the index when it reassigns the cached MTZ dtypes, so it tries to look up a column for a level that is still in the index:

import reciprocalspaceship as rs

dataset = rs.read_mtz("tests/data/algorithms/HEWL_unmerged.mtz")
print(dataset.index.names)      # prints ['H', 'K', 'L']
dataset.reset_index(level=['H', 'K'])   # raises KeyError: 'L'

Outputs:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/rs/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2888             try:
-> 2889                 return self._engine.get_loc(casted_key)
   2890             except KeyError as err:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'L'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-12-93c78908ac77> in <module>
----> 1 dataset.reset_index(level=['H', 'K'])

~/reciprocalspaceship/reciprocalspaceship/dataset.py in reset_index(self, **kwargs)
    135                 for key in newdf._cache_index_dtypes.keys():
    136                     dtype = newdf._cache_index_dtypes[key]
--> 137                     newdf[key] = newdf[key].astype(dtype)
    138                 newdf._cache_index_dtypes = {}
    139             return newdf

~/rs/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2897             if self.columns.nlevels > 1:
   2898                 return self._getitem_multilevel(key)
-> 2899             indexer = self.columns.get_loc(key)
   2900             if is_integer(indexer):
   2901                 indexer = [indexer]

~/rs/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2889                 return self._engine.get_loc(casted_key)
   2890             except KeyError as err:
-> 2891                 raise KeyError(key) from err
   2892 
   2893         if tolerance is not None:

KeyError: 'L'
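
The mechanism can be illustrated with plain pandas, without the MTZ file. This is a minimal sketch: the toy DataFrame and its `I` column are made up for illustration, and the final lookup stands in for the `newdf[key]` access at line 137 of dataset.py in the traceback above.

```python
import pandas as pd

# Toy stand-in for the reflection table; the data are invented.
df = pd.DataFrame(
    {"H": [1, 1], "K": [0, 1], "L": [2, 3], "I": [10.0, 20.0]}
).set_index(["H", "K", "L"])

# Only H and K are moved out of the index.
partial = df.reset_index(level=["H", "K"])

print(list(partial.columns))       # H and K are columns again
print(list(partial.index.names))   # L is still an index level

try:
    partial["L"]                   # column lookup for a level still in the index
except KeyError as err:
    print("KeyError:", err)
```

Because `L` remains an index level, it is not a column, and any loop that indexes the frame by every cached key will fail on it.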

Since pandas supports a level= argument to reset_index(), the overridden method should only try to restore the dtypes of the levels that are actually moved out of the index and into columns. Levels that remain in the index should be skipped.
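
One possible shape for that change is sketched below on a plain pandas DataFrame. The helper name `reset_index_restoring_dtypes` and the explicit `cached_dtypes` dict are hypothetical stand-ins for the method and the `_cache_index_dtypes` attribute seen in the traceback, and `int32` stands in for the MTZ dtypes. This is not the actual fix in the library, and it ignores `inplace=True`.

```python
import pandas as pd


def reset_index_restoring_dtypes(df, cached_dtypes, **kwargs):
    """Hypothetical helper: reset the index, then re-apply cached dtypes
    only to the levels that actually became columns."""
    newdf = df.reset_index(**kwargs)
    remaining = {}
    for key, dtype in cached_dtypes.items():
        if key in newdf.columns:
            # Level was moved into a column: restore its cached dtype.
            newdf[key] = newdf[key].astype(dtype)
        elif key in newdf.index.names:
            # Level is still in the index: keep its dtype for a later call.
            remaining[key] = dtype
        # Otherwise the level was dropped (drop=True) and its entry is discarded.
    return newdf, remaining


# Toy data, invented for illustration.
df = pd.DataFrame(
    {"H": [1, 1], "K": [0, 1], "L": [2, 3], "I": [10.0, 20.0]}
).set_index(["H", "K", "L"])
cache = {"H": "int32", "K": "int32", "L": "int32"}

partial, cache = reset_index_restoring_dtypes(df, cache, level=["H", "K"])
print(partial.dtypes)
print(cache)
```

Keeping the entries for levels that stay in the index means a later reset_index() call on the result can still restore their dtypes, rather than clearing the whole cache after the first call.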

@JBGreisman JBGreisman added the bug Something isn't working label Jul 28, 2020
@JBGreisman JBGreisman self-assigned this Jul 28, 2020
@JBGreisman JBGreisman changed the title DataSet.reset_index() fails using level argument with MultiIndex DataSet.reset_index() fails using level argument with MultiIndex Jul 28, 2020