ERR: disallow non-hashables in Index/MultiIndex construction & rename #20548

arminv · 2018-03-30T12:41:55Z

Index & MultiIndex names need to be hashable. Both constructing and renaming with a non-hashable name now raise a TypeError.
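The rule rests on one test: a name counts as hashable if calling `hash()` on it succeeds. The stdlib-only sketch below re-implements that check. It borrows the name `is_hashable` from the helper that appears in the tracebacks, but it is an illustration, not the pandas source. It also shows why an `isinstance(obj, Hashable)` test alone would not be enough.

```python
from collections.abc import Hashable


def is_hashable(obj):
    """Return True if hash(obj) succeeds (re-implementation sketch).

    isinstance(obj, Hashable) only inspects the type, so a tuple that
    holds a list passes it even though hashing that tuple fails.
    """
    try:
        hash(obj)
    except TypeError:
        return False
    return True


# A list is rejected by both checks.
print(is_hashable(['foo']), isinstance(['foo'], Hashable))  # False False

# A tuple holding a list fools the isinstance check but cannot be hashed.
name = ('foo', ['bar'])
print(is_hashable(name), isinstance(name, Hashable))  # False True
```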

Examples:

Index:

In [2]: pd.Index([1, 2, 3], name=['foo'])
>>> Int64Index([1, 2, 3], dtype='int64', name=['foo'])

In [3]: pd.Index([1, 2, 3], name='foo').rename(['bar'])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-f3327eccf0fc> in <module>()
----> 1 pd.Index([1, 2, 3], name='foo').rename(['bar'])

~/Documents/GitHub/pandas/pandas/core/indexes/base.py in rename(self, name, inplace)
   1406         new index (of same type and class...etc) [if inplace, returns None]
   1407         """
-> 1408         return self.set_names([name], inplace=inplace)
   1409 
   1410     @property

~/Documents/GitHub/pandas/pandas/core/indexes/base.py in set_names(self, names, level, inplace)
   1387         else:
   1388             idx = self._shallow_copy()
-> 1389         idx._set_names(names, level=level)
   1390         if not inplace:
   1391             return idx

~/Documents/GitHub/pandas/pandas/core/indexes/base.py in _set_names(self, values, level)
   1323                 if not is_hashable(name):
   1324                     raise TypeError('{}.name must be a hashable type'
-> 1325                                     .format(self.__class__.__name__))
   1326         if len(values) != 1:
   1327             raise ValueError('Length of new names must be 1, got %d' %

TypeError: Int64Index.name must be a hashable type
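The traceback above shows the chain `rename` -> `set_names` -> `_set_names`, with the check living in `_set_names`. The sketch below mimics that chain with a hypothetical `ToyIndex` class (not pandas code) to show why one check there covers construction, `set_names` and `rename`, and how `type(self).__name__` puts the subclass name into the message.

```python
class ToyIndex:
    """Hypothetical stand-in for pandas.Index that only models names."""

    def __init__(self, data, name=None):
        self._data = list(data)
        self._set_names([name])  # construction is validated too

    def _set_names(self, values, level=None):
        # Single validation point: rename -> set_names -> _set_names.
        for name in values:
            try:
                hash(name)
            except TypeError:
                raise TypeError('{}.name must be a hashable type'
                                .format(type(self).__name__)) from None
        if len(values) != 1:
            raise ValueError('Length of new names must be 1, got %d'
                             % len(values))
        self._name = values[0]

    def _shallow_copy(self):
        return type(self)(self._data, name=self._name)

    def set_names(self, names, level=None, inplace=False):
        idx = self if inplace else self._shallow_copy()
        idx._set_names(names, level=level)
        if not inplace:
            return idx
        return None

    def rename(self, name, inplace=False):
        # rename wraps the new name in a list and defers to set_names.
        return self.set_names([name], inplace=inplace)


class ToyInt64Index(ToyIndex):
    """Subclass: the error message picks up type(self).__name__."""


try:
    ToyInt64Index([1, 2, 3], name='foo').rename(['bar'])
except TypeError as exc:
    print(exc)  # ToyInt64Index.name must be a hashable type
```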

MultiIndex:

In [4]: pd.MultiIndex(levels=[[1, 2], [u'one', u'two']],
                        labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
                        names=((['foo'], ['bar'])))
                        
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-d211526eaa3d> in <module>()
      1 pd.MultiIndex(levels=[[1, 2], [u'one', u'two']],
      2                     labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
----> 3                     names=((['foo'], ['bar'])))
      4 

~/Documents/GitHub/pandas/pandas/core/indexes/multi.py in __new__(cls, levels, labels, sortorder, names, dtype, copy, name, verify_integrity, _set_identity)
    230         if names is not None:
    231             # handles name validation
--> 232             result._set_names(names)
    233 
    234         if sortorder is not None:

~/Documents/GitHub/pandas/pandas/core/indexes/multi.py in _set_names(self, names, level, validate)
    646             if not is_hashable(name):
    647                 raise TypeError('{}.name must be a hashable type'
--> 648                                 .format(self.__class__.__name__))
    649 
    650         # GH 15110

TypeError: MultiIndex.name must be a hashable type

In [5]: pd.MultiIndex(levels=[[1, 2], [u'one', u'two']],
                       labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
                       names=('foo', 'bar')).rename(([1], [2]))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-ff74dfc48455> in <module>()
      1 pd.MultiIndex(levels=[[1, 2], [u'one', u'two']],
      2                      labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
----> 3                      names=('foo', 'bar')).rename(([1], [2]))
      4 

~/Documents/GitHub/pandas/pandas/core/indexes/base.py in set_names(self, names, level, inplace)
   1387         else:
   1388             idx = self._shallow_copy()
-> 1389         idx._set_names(names, level=level)
   1390         if not inplace:
   1391             return idx

~/Documents/GitHub/pandas/pandas/core/indexes/multi.py in _set_names(self, names, level, validate)
    646             if not is_hashable(name):
    647                 raise TypeError('{}.name must be a hashable type'
--> 648                                 .format(self.__class__.__name__))
    649 
    650         # GH 15110

TypeError: MultiIndex.name must be a hashable type
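For a MultiIndex both tracebacks end in the same `_set_names`, which validates one name per level. The hypothetical `ToyMultiIndex` below sketches that idea, including the `level` argument. It checks every name before assigning any of them, so a rejected rename leaves the existing names untouched; that ordering is a design choice of this sketch, not something the thread states about pandas.

```python
class ToyMultiIndex:
    """Hypothetical MultiIndex stand-in that tracks one name per level."""

    def __init__(self, nlevels, names=None):
        self._names = [None] * nlevels
        if names is not None:
            self._set_names(names)  # construction reuses the same check

    def _set_names(self, names, level=None):
        names = list(names)
        if level is None:
            levels = list(range(len(self._names)))
        elif isinstance(level, int):
            levels = [level]
        else:
            levels = list(level)
        if len(names) != len(levels):
            raise ValueError('Length of names must match number of levels')
        # Validate each individual level name (not the container) first.
        for name in names:
            try:
                hash(name)
            except TypeError:
                raise TypeError('{}.name must be a hashable type'
                                .format(type(self).__name__)) from None
        for lev, name in zip(levels, names):
            self._names[lev] = name

    @property
    def names(self):
        return tuple(self._names)

    def rename(self, names, level=None):
        new = type(self)(len(self._names), names=self._names)  # copy
        new._set_names(names, level=level)
        return new


mi = ToyMultiIndex(2, names=('foo', 'bar'))
print(mi.rename(['baz'], level=1).names)  # ('foo', 'baz')
try:
    mi.rename(([1], [2]))
except TypeError as exc:
    print(exc)  # ToyMultiIndex.name must be a hashable type
```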

closes #20527
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

jreback · 2018-03-30T13:01:45Z

this already works on Series

the issue is about Index

arminv · 2018-03-30T13:24:29Z

Sorry I got confused. I will update it for Index.

codecov · 2018-03-30T17:01:24Z

Codecov Report

Merging #20548 into master will decrease coverage by <.01%.
The diff coverage is 90%.

@@            Coverage Diff             @@
##           master   #20548      +/-   ##
==========================================
- Coverage   91.84%   91.84%   -0.01%     
==========================================
  Files         153      153              
  Lines       49305    49313       +8     
==========================================
+ Hits        45286    45293       +7     
- Misses       4019     4020       +1

Flag	Coverage Δ
#multiple	`90.23% <90%> (-0.01%)`	⬇️
#single	`41.89% <60%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/indexes/multi.py	`95.07% <100%> (+0.01%)`	⬆️
pandas/core/indexes/base.py	`96.63% <80%> (-0.05%)`	⬇️
pandas/core/generic.py	`95.94% <0%> (ø)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8def649...97a2b06.

jreback

can you add a few tests, and a whatsnew note (other API changes).

jreback · 2018-03-30T18:17:38Z

pandas/core/indexes/base.py

-        return self.set_names([name], inplace=inplace)
+        if name is not None and not is_hashable(name):
+            raise TypeError('Index.name must be a hashable type')
+        else:


rather do this in set_names

I did this in set_names and a lot of tests failed. Is there a particular reason we can’t keep it here?

you are going to need it in set_names as that is the canonical way to do this. that's where it should validate. if we have tests that are clearly in error they should be changed.

jreback

can you add a note in other api changes section
can you add tests on construction & for rename (these should use our current infrastructure to exercise all subclasses)
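The request is for construction and rename tests that run against every Index subclass. pandas' own suite does this with pytest and shared fixtures; the stdlib `unittest` sketch below only imitates the loop-over-subclasses shape, using hypothetical toy classes and a hypothetical `ALL_INDEX_CLASSES` list in place of a fixture. The test method names echo the ones in the commit list.

```python
import re
import unittest


def _validate_name(cls, name):
    # Hypothetical helper: hashability check with a class-specific message.
    try:
        hash(name)
    except TypeError:
        raise TypeError('{}.name must be a hashable type'
                        .format(cls.__name__)) from None


class ToyIndex:
    def __init__(self, data, name=None):
        _validate_name(type(self), name)
        self.data = list(data)
        self.name = name

    def rename(self, name):
        _validate_name(type(self), name)
        return type(self)(self.data, name=name)


class ToyInt64Index(ToyIndex):
    pass


class ToyFloat64Index(ToyIndex):
    pass


# Stand-in for a fixture that yields every Index subclass.
ALL_INDEX_CLASSES = [ToyIndex, ToyInt64Index, ToyFloat64Index]


class TestNonHashableNames(unittest.TestCase):

    def _message(self, cls):
        return re.escape('{}.name must be a hashable type'
                         .format(cls.__name__))

    def test_constructor_nonhashable_name(self):
        for cls in ALL_INDEX_CLASSES:
            with self.subTest(cls=cls.__name__):
                with self.assertRaisesRegex(TypeError, self._message(cls)):
                    cls([1, 2, 3], name=['foo'])

    def test_rename_nonhashable_name(self):
        for cls in ALL_INDEX_CLASSES:
            with self.subTest(cls=cls.__name__):
                idx = cls([1, 2, 3], name='foo')
                with self.assertRaisesRegex(TypeError, self._message(cls)):
                    idx.rename(['bar'])
                # A hashable replacement name still goes through.
                self.assertEqual(idx.rename('bar').name, 'bar')
```

Saved as a module, it runs with `python -m unittest <module>`; `subTest` reports each failing subclass separately instead of stopping at the first.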

jreback · 2018-03-30T18:21:51Z

pandas/core/indexes/base.py

@@ -251,6 +252,9 @@ def __new__(cls, data=None, dtype=None, copy=False, name=None,
        if name is None and hasattr(data, 'name'):
            name = data.name

+        if name is not None and not is_hashable(name):
+            raise TypeError('Index.name must be a hashable type')
+


this very likely also needs checking for MultiIndex (as that's a different path in some cases).

Do we allow non-hashable names for MultiIndex?

For a MultiIndex, it seems that names is converted into FrozenList after creation. I found this answer from you on StackOverflow about hashability of a FrozenList.

Right now, if names can’t be converted to a FrozenList (if not hashable), it throws an exception. For example:

In [1]: pd.MultiIndex(levels=[[1, 2], [u'one', u'two']],
   ...:               labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
   ...:               names=(['foo'], ['bar']))
TypeError: unhashable type: 'list'

while this passes:

In [2]: pd.MultiIndex(levels=[[1, 2], [u'one', u'two']],
   ...:               labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
   ...:               names=[('foo'), ('bar')])

Do we need to change anything here?

no you just need to check that each name is hashable, not the frozen list itself. that's why .set_names is the best place for this
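Two details in this exchange are easy to miss. First, `('foo')` is just the string `'foo'` (parentheses without a comma do not make a tuple), so the passing example above is simply `names=['foo', 'bar']`. Second, the check belongs on each name: the container of names may itself be unhashable (a list) while every name in it is fine. The helper `unhashable_names` below is hypothetical and only illustrates the per-element check.

```python
def unhashable_names(names):
    """Return the names that cannot be hashed (hypothetical helper)."""
    bad = []
    for name in names:
        try:
            hash(name)
        except TypeError:
            bad.append(name)
    return bad


# Parentheses alone do not create a tuple: ('foo') is the string 'foo'.
print(('foo') == 'foo')         # True
print(type(('foo',)).__name__)  # tuple

# The list container is unhashable, but every name in it is fine.
print(unhashable_names(['foo', 'bar']))          # []
# names=[('foo'), ('bar')] is the same as ['foo', 'bar'].
print(unhashable_names([('foo'), ('bar')]))      # []
# names=(['foo'], ['bar']) holds two lists, and both are rejected.
print(unhashable_names((['foo'], ['bar'])))      # [['foo'], ['bar']]
# Real one-element tuples are valid, hashable level names.
print(unhashable_names([('foo',), ('bar',)]))    # []
```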

jreback · 2018-03-30T18:21:56Z

pandas/core/indexes/base.py

@@ -251,6 +252,9 @@ def __new__(cls, data=None, dtype=None, copy=False, name=None,
        if name is None and hasattr(data, 'name'):
            name = data.name

+        if name is not None and not is_hashable(name):
+            raise TypeError('Index.name must be a hashable type')


use self.__class__.__name__ rather than Index here

pep8speaks · 2018-04-02T06:33:57Z

Hello @arminv! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on April 22, 2018 at 14:48 Hours UTC

arminv · 2018-04-17T11:07:06Z

pandas/core/indexes/base.py

@@ -473,7 +474,7 @@ def _simple_new(cls, values, name=None, dtype=None, **kwargs):

        result = object.__new__(cls)
        result._data = values
-        result.name = name
+        result._set_names([name])


@jreback I wasn't sure if _set_names was getting called from _simple_new, so I made it explicit. Is this ok?

Also, we are not checking in __new__ anymore (as you suggested).

you shouldn't need to do this, and just leave the original code

setting .name goes through a property that calls _set_names
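The point here is that `_simple_new` skips `__init__`, yet a plain `result.name = name` is still validated because `name` is a property whose setter forwards to `_set_names`. The hypothetical `ToyIndex` below sketches that arrangement; it is not the pandas implementation, only a model of the behavior described in the reply.

```python
class ToyIndex:
    """Hypothetical sketch: `name` is a property routed through _set_names."""

    def _set_names(self, values, level=None):
        for value in values:
            try:
                hash(value)
            except TypeError:
                raise TypeError('{}.name must be a hashable type'
                                .format(type(self).__name__)) from None
        self._name = values[0]

    def _get_name(self):
        return self._name

    def _set_name(self, value):
        # Plain attribute assignment (idx.name = x) lands here.
        self._set_names([value])

    name = property(_get_name, _set_name)

    @classmethod
    def _simple_new(cls, values, name=None):
        # Skips __init__, like the _simple_new in the diff, but still
        # validates because assigning result.name runs the property setter.
        result = object.__new__(cls)
        result._data = values
        result.name = name
        return result
```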

arminv · 2018-04-17T11:07:59Z

pandas/core/indexes/base.py

+
+        Notes
+        -----
+        Both `set_names` and `rename` call this function to set name.


I'm not sure if this is needed?

jreback

lgtm. some doc-comments. ping on green.

jreback · 2018-04-19T15:56:25Z

pandas/core/indexes/base.py

@@ -473,7 +474,7 @@ def _simple_new(cls, values, name=None, dtype=None, **kwargs):

        result = object.__new__(cls)
        result._data = values
-        result.name = name
+        result._set_names([name])


you shouldn't need to do this, and just leave the original code

setting .name goes through a property that calls _set_names

jreback · 2018-04-19T15:57:07Z

pandas/core/indexes/base.py

+
+        Examples
+        --------
+        on an index with no names:


you don't need the full docstring here (e.g. examples and such; leave Parameters and such), only on .set_names

jreback · 2018-04-21T16:33:18Z

can you update. ping on green.

jreback · 2018-04-21T18:02:13Z

pandas/core/indexes/base.py

+            If the index is a MultiIndex (hierarchical), level(s) to set (None
+            for all levels).  Otherwise level must be None
+
+        Returns


this should be Raises (and it's a TypeError)

jreback · 2018-04-21T18:02:55Z

pandas/core/indexes/base.py

@@ -1311,6 +1312,28 @@ def _get_names(self):
        return FrozenList((self.name, ))

    def _set_names(self, values, level=None):
+        """


can you also add a mention on set_names itself that the names must be hashable (and examples if you want)
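The two review comments above ask for a numpydoc `Raises` section naming TypeError and for `set_names` to state that names must be hashable. The stub below sketches one way that docstring could be laid out; the `level` wording comes from the diff, while the rest of the wording (and the bare module-level stub itself) is an assumption, not the merged text.

```python
def set_names(names, level=None, inplace=False):
    """
    Set Index or MultiIndex name.

    Hypothetical stub: only the docstring layout is meant to be illustrative.

    Parameters
    ----------
    names : label or list of label
        Name(s) to set. Each name must be hashable.
    level : int, label or list of int or label, optional
        If the index is a MultiIndex (hierarchical), level(s) to set (None
        for all levels). Otherwise level must be None.
    inplace : bool, default False
        Modifies the object directly, instead of creating a new Index or
        MultiIndex.

    Returns
    -------
    Index or None
        The same type as the caller, or None if ``inplace=True``.

    Raises
    ------
    TypeError
        If any of the given names is not hashable.
    """
```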

jreback · 2018-04-22T14:48:59Z

moved the logic slightly. will merge on green.

TomAugspurger · 2018-04-23T19:05:20Z

Thanks @arminv !

jorisvandenbossche · 2018-06-27T17:09:34Z

pandas/tests/indexes/test_multi.py

+        tm.assert_raises_regex(TypeError, message, mi.set_names, names=renamed)
+
+    @pytest.mark.parametrize('names', [['a', 'b', 'a'], ['1', '1', '2'],
+                                       ['1', 'a', '1']])


@arminv Is there a reason that you changed those parametrize values to all strings? (I suppose by accident?)
I am reworking the test in #21423, so will revert there if this was by accident

@jorisvandenbossche IIRC I changed it (in this commit) because the test was failing, but the implementation changed a lot after that commit, so I'm not sure whether reverting it would cause a problem now.

Seems to be passing there!

Check non-hashability on series construction and renaming

9047d60

arminv added 2 commits March 30, 2018 09:29

Removed changes from pandas/core/series.py

df7650d

Check non-hashability on Index construction and renaming

dd64219

arminv changed the title ~~ERR: disallow non-hashables in Series construction & rename~~ ERR: disallow non-hashables in Index construction & rename Mar 30, 2018

modified test_getitem_list example to disallow non-hashable names

89e92ab

jreback requested changes Mar 30, 2018

jreback reviewed Mar 30, 2018

jreback requested changes Mar 30, 2018

jreback added the Indexing and Error Reporting labels Mar 30, 2018

arminv added 14 commits March 30, 2018 16:10

Merge remote-tracking branch 'upstream/master' into non_hashable_err

cd3e53a

Merge remote-tracking branch 'upstream/master' into non_hashable_err

cd070e3

Changed ErrorType message for hashability requirement

351691f

Fixed how rename calls set_names to allow for MultiIndex hashable type checking

3a7b0b2

Moved type checking from set_names back to rename

70933d5

Merge remote-tracking branch 'upstream/master' into non_hashable_err

56fd617

Moved hashable checking to set_names. Changed exception messages.

d4ed636

Modified test_duplicate_level_names to pass with new (hashable names) API change

b554bb3

Added test_constructor_nonhashable_names for checking hashability on names

6efd6cc

Fixed a typo

4fb3a6b

Minor refactoring of test_constructor_nonhashable_names

786f43f

Added test_constructor_nonhashable_name for checking hashability on name

01b712e

Added note in Other API Changes on hashability of names

6f13cd0

Improved wording of the note

26433c3

Addressed PEP 8 issues

91ef466

arminv added 7 commits April 16, 2018 09:44

Merge remote-tracking branch 'upstream/master' into non_hashable_err

c4c1011

Merge remote-tracking branch 'upstream/master' into non_hashable_err

bd75433

Merge remote-tracking branch 'upstream/master' into non_hashable_err

74a9b54

Refactoring. Internal docstring. Minor typos

b1cb7fd

PEP 8

863f7d3

Merge remote-tracking branch 'upstream/master' into non_hashable_err

0723009

Improved docstring wording

7092d49

arminv commented Apr 17, 2018

arminv closed this Apr 17, 2018

arminv reopened this Apr 17, 2018

Merge remote-tracking branch 'upstream/master' into non_hashable_err

1d8f67a

jreback requested changes Apr 19, 2018

jreback added this to the 0.23.0 milestone Apr 19, 2018

arminv added 2 commits April 20, 2018 18:23

Merge remote-tracking branch 'upstream/master' into non_hashable_err

12488ff

Shorten docstring

4a500ba

Merge remote-tracking branch 'upstream/master' into non_hashable_err

9ec64b0

jreback requested changes Apr 21, 2018

arminv and others added 4 commits April 21, 2018 20:23

Merge remote-tracking branch 'upstream/master' into non_hashable_err

47903ae

Added examples

04f2eed

remove examples from _set_names

1a68188

consolidate logic a bit

97a2b06

jreback approved these changes Apr 22, 2018

jreback mentioned this pull request Apr 22, 2018

DOC: clean Index.set_name / .rename doc-strings #20787

Closed

TomAugspurger merged commit add3fbf into pandas-dev:master Apr 23, 2018

arminv deleted the non_hashable_err branch April 23, 2018 19:35

jorisvandenbossche reviewed Jun 27, 2018
