ERR: disallow non-hashables in Index construction & rename #20527

Closed
hodossy opened this Issue Mar 29, 2018 · 4 comments

hodossy commented Mar 29, 2018

from pandas import Index, DataFrame

df = DataFrame(index=Index([1, 2, 3, 4]), columns=['A', 'B'])
df.index.rename(['foo'], inplace=True)
df.reset_index()

TypeError Traceback (most recent call last)
in ()
----> 5 df.reset_index()

c:\program files\python36\lib\site-packages\pandas\core\frame.py in reset_index(self, level, drop, inplace, col_level, col_fill)
3377 # to ndarray and maybe infer different dtype
3378 level_values = _maybe_casted_values(lev, lab)
-> 3379 new_obj.insert(0, name, level_values)
3380
3381 new_obj.index = new_index

c:\program files\python36\lib\site-packages\pandas\core\frame.py in insert(self, loc, column, value, allow_duplicates)
2611 value = self._sanitize_column(column, value, broadcast=False)
2612 self._data.insert(loc, column, value,
-> 2613 allow_duplicates=allow_duplicates)
2614
2615 def assign(self, **kwargs):

c:\program files\python36\lib\site-packages\pandas\core\internals.py in insert(self, loc, item, value, allow_duplicates)
4059
4060 """
-> 4061 if not allow_duplicates and item in self.items:
4062 # Should this be a different kind of error??
4063 raise ValueError('cannot insert {}, already exists'.format(item))

c:\program files\python36\lib\site-packages\pandas\core\indexes\base.py in contains(self, key)
1692 @Appender(_index_shared_docs['contains'] % _index_doc_kwargs)
1693 def contains(self, key):
-> 1694 hash(key)
1695 try:
1696 return key in self._engine

TypeError: unhashable type: 'list'

Problem description

I think it is reasonable to expect Index([1, 2, 3, 4]).rename(['foo']) and Index([1, 2, 3, 4]).rename('foo') to give the same result; however, this is not the case.

Probably this line should be modified in Index.rename:

- return self.set_names([name], inplace=inplace)
+ return self.set_names(name, inplace=inplace)
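To make the effect of that one-line change concrete, here is a minimal pure-Python sketch. The `set_names` function below is a toy model, not the pandas implementation. It assumes that a non-list-like argument gets wrapped into a one-element list, and `rename_current` / `rename_proposed` are hypothetical names for the two variants of the line.

```python
def set_names(names):
    """Toy model: return the name a one-level index would end up with.

    Assumption: a non-list-like argument is wrapped into a one-element
    list before use.
    """
    if not isinstance(names, (list, tuple)):
        names = [names]
    if len(names) != 1:
        raise ValueError('a one-level index takes exactly one name')
    return names[0]


def rename_current(name):
    # Current line: the argument is always wrapped, so a list stays a list.
    return set_names([name])


def rename_proposed(name):
    # Proposed line: the argument is passed through, so a list is unpacked.
    return set_names(name)


current_from_list = rename_current(['foo'])      # the list itself becomes the name
proposed_from_list = rename_proposed(['foo'])    # the single element becomes the name
proposed_from_str = rename_proposed('foo')       # plain strings still work
proposed_from_tuple = rename_proposed(('foo',))  # a tuple is unpacked as well
```

Under this model the proposed line gives 'foo' for both the string and the list, but it also unpacks a tuple that the current line would have kept as the name.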

Expected Output

>>> Index([1, 2, 3, 4]).rename(['foo'])
Int64Index([1, 2, 3, 4], dtype='int64', name='foo')

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 28.8.0
Cython: None
numpy: 1.14.1
scipy: None
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Member

jorisvandenbossche commented Mar 29, 2018

The problem with your expected output is that some other 'containers', like a tuple, are actually allowed as an index name:

In [45]: df = DataFrame(index=Index([1, 2, 3, 4]), columns=['A', 'B'])

In [46]: df.index = df.index.rename(('foo',))

In [47]: df.reset_index()
Out[47]: 
   (foo,)    A    B
0       1  NaN  NaN
1       2  NaN  NaN
2       3  NaN  NaN
3       4  NaN  NaN

So since this is already allowed, we would need to keep this behaviour. And then unpacking a list but not a tuple would also be strange I think.
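The difference between the tuple and the list comes down to hashability, which is what the `hash(key)` call at the bottom of the traceback trips over. A small sketch in plain Python, where `is_hashable` is a simplified stand-in written for this example rather than the pandas helper:

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds (simplified stand-in)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


tuple_name = ('foo',)
list_name = ['foo']

tuple_ok = is_hashable(tuple_name)  # tuples of hashable items are hashable
list_ok = is_hashable(list_name)    # lists are never hashable

# A membership test against a hash-based container needs hash(key),
# so it works for the tuple and raises TypeError for the list.
columns = {'A', 'B'}
tuple_in_columns = tuple_name in columns

try:
    list_name in columns
    list_lookup_failed = False
except TypeError:
    list_lookup_failed = True
```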

That said, I think if we choose to not change this behaviour, we should raise the error earlier, and df.index.rename(['foo']) could already raise an error.

Contributor

jreback commented Mar 29, 2018

Technically we do allow non-hashable things in index names. IIRC we did try to remove this in a previous PR (but it was not merged). Note that we do require hashability in Series names. So we would take a PR to raise (both on construction and renaming).

In [3]: Series([1,2, 3], name=['foo'])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-7e95b8d7ccb6> in <module>()
----> 1 Series([1,2, 3], name=['foo'])

~/pandas/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    272         generic.NDFrame.__init__(self, data, fastpath=True)
    273 
--> 274         self.name = name
    275         self._set_axis(0, index, fastpath=True)
    276 

~/pandas/pandas/core/generic.py in __setattr__(self, name, value)
   4398             object.__setattr__(self, name, value)
   4399         elif name in self._metadata:
-> 4400             object.__setattr__(self, name, value)
   4401         else:
   4402             try:

~/pandas/pandas/core/series.py in name(self, value)
    391     def name(self, value):
    392         if value is not None and not is_hashable(value):
--> 393             raise TypeError('Series.name must be a hashable type')
    394         object.__setattr__(self, '_name', value)
    395 

TypeError: Series.name must be a hashable type

so will repurpose this issue
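As a rough illustration of what raising on both construction and renaming could look like, here is a sketch that mirrors the Series.name setter from the traceback above. `ToyIndex` and `_is_hashable` are hypothetical names invented for this example; this is not the pandas implementation or the eventual fix.

```python
def _is_hashable(obj):
    """Return True if hash(obj) succeeds (simplified stand-in)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


class ToyIndex:
    """Hypothetical stand-in for an index that validates its name."""

    def __init__(self, data, name=None):
        self._data = list(data)
        self.name = name  # goes through the validating setter below

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        # Same rule as the Series.name setter: None is allowed,
        # anything else must be hashable.
        if value is not None and not _is_hashable(value):
            raise TypeError('ToyIndex.name must be a hashable type')
        self._name = value

    def rename(self, name):
        # Building a new object re-runs the check, so rename raises early too.
        return ToyIndex(self._data, name=name)


idx = ToyIndex([1, 2, 3, 4])
renamed = idx.rename(('foo',))  # a tuple is hashable, so this is accepted
```

With a check like this, `idx.rename(['foo'])` and `ToyIndex([1, 2], name=['foo'])` would both fail immediately with a TypeError instead of failing later in reset_index().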

jreback added this to the Next Major Release milestone Mar 29, 2018

jreback changed the title from "Calling rename() on Index object with a list makes reset_index() fail on a DataFrame" to "ERR: disallow non-hashables in Index construction & rename" Mar 29, 2018

jreback added the Error Reporting label and removed the Bug label Mar 29, 2018

Contributor

arminv commented Mar 29, 2018

I would like to work on this issue.

hodossy commented Apr 3, 2018

What if rename were modified only as

def rename(self, *names, inplace=False):
    """..."""
    # All positional names are forwarded to set_names as a tuple
    return self.set_names(names, inplace=inplace)

and remove the rename override

class MultiIndex(Index):
    
    rename = Index.set_names  # delete this

from MultiIndex. This way the old functionality is kept (except MultiIndex.rename), but one can achieve what I want by passing multiple name arguments. This means that

df = DataFrame(index=Index([1, 2, 3, 4]), columns=['A', 'B'])
df.index.rename(['foo'], inplace=True)
df.reset_index()

still raises an error, but

index.rename(*['foo'])
index.rename(*['baz', 'quz'])

can be called on both Index and MultiIndex objects. So in cases like mine, where it is not known beforehand whether I will be dealing with a multi-level or single-level index, I can avoid the type check, rename the levels easily, and then reset the index on the DataFrame.

Please note that I did not have time to check the corner cases or run a regression on this idea; I am only interested in your opinion.
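The variadic idea can be tried out on a toy class without touching pandas. In the sketch below, `ToyIndex` and `rename_levels` are hypothetical names. The class only tracks level names, and the early hashability check is an assumption added for illustration.

```python
class ToyIndex:
    """Hypothetical index that only tracks its level names."""

    def __init__(self, names):
        self.names = list(names)

    def set_names(self, names):
        names = list(names)
        if len(names) != len(self.names):
            raise ValueError(
                'expected {} name(s), got {}'.format(len(self.names), len(names))
            )
        for name in names:
            hash(name)  # raises TypeError early for unhashable names
        return ToyIndex(names)

    def rename(self, *names):
        # Variadic form from the proposal: the names arrive as a tuple.
        return self.set_names(names)


def rename_levels(index, new_names):
    # The caller does not need to know how many levels the index has.
    return index.rename(*new_names)


single = ToyIndex([None])
multi = ToyIndex([None, None])

single_renamed = rename_levels(single, ['foo'])
multi_renamed = rename_levels(multi, ['baz', 'quz'])
```

Passing the list itself, as in `single.rename(['foo'])`, still raises in this sketch, because the whole list is treated as one unhashable name.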

jreback modified the milestones: Next Major Release, 0.23.0 Apr 19, 2018
