Styler.applymap(subset=...) breaks promise that "any valid indexer to .loc will work" for MultiIndex #25858

Closed
jeremysalwen opened this issue Mar 24, 2019 · 1 comment · Fixed by #29346
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions
Milestone
1.0

Comments

@jeremysalwen

jeremysalwen commented Mar 24, 2019

Code Sample, a copy-pastable example if possible

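import pandas as pd
import numpy as np

# Column MultiIndex: ('a', '%'), ('a', '#'), ('b', '%'), ('b', '#').
# Note: `labels=` was deprecated in favour of `codes=` in 0.24 and removed
# in 1.0; on newer pandas pass `codes=labels` instead.
labels = np.array([[0, 0, 1, 1],
                   [0, 1, 0, 1]])
columns = pd.MultiIndex(
    levels=[["a", "b"], ['%', '#']], labels=labels, names=['', ''])
df = pd.DataFrame(
    [[1, -1, 1, 1], [-1, 1, 1, 1]],
    index=["hello", "world"],
    columns=columns)

# All rows; on the columns, every first-level label but only the '%' second level.
pct_subset = pd.IndexSlice[:, pd.IndexSlice[:, '%':'%']]

def color_negative_red(val):
    color = 'red' if val < 0 else 'black'
    return 'color: %s' % color

# Works on both 0.22 and 0.24.
df.loc[pct_subset]

# This works on 0.22, but `TypeError: unhashable type` on 0.24!
df.style.applymap(color_negative_red, subset=pct_subset)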

Problem description

Quoting the docs for the subset keyword argument of Styler.applymap (https://pandas.pydata.org/pandas-docs/stable/user_guide/style.html):

For row and column slicing, any valid indexer to .loc will work.

The code sample shows an indexer for a DataFrame with MultiIndex columns that works with .loc but fails when passed as the subset argument to applymap. This indexer worked in pandas 0.22 and fails in 0.24.2, so this is a regression.
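A possible workaround on affected versions, sketched here and not verified against 0.24.2: let .loc resolve the nested slice first, then give Styler an explicit list of column tuples, which avoids passing the nested slice itself. The names pct_columns and explicit_subset are hypothetical, the frame is rebuilt with codes= (the post-0.24 name for labels=), and the level names are omitted.

```python
import pandas as pd

# Same frame as the report; `codes=` replaces the deprecated `labels=`.
columns = pd.MultiIndex(
    levels=[["a", "b"], ["%", "#"]],
    codes=[[0, 0, 1, 1], [0, 1, 0, 1]],
)
df = pd.DataFrame(
    [[1, -1, 1, 1], [-1, 1, 1, 1]],
    index=["hello", "world"],
    columns=columns,
)
pct_subset = pd.IndexSlice[:, pd.IndexSlice[:, "%":"%"]]

# Let .loc resolve the nested slice, then keep only the resulting column labels.
pct_columns = list(df.loc[pct_subset].columns)

# Explicit (level-0, level-1) tuples select the same columns without a nested slice.
explicit_subset = pd.IndexSlice[:, pct_columns]

print(pct_columns)  # [('a', '%'), ('b', '%')]
```

On an affected version, explicit_subset would then be passed as subset= to df.style.applymap(color_negative_red, ...) in place of pct_subset.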

Expected Output

Expected the indexer to apply the styling to the "%" columns without raising an error.

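Full Backtrace

Note: this backtrace was captured from df.style.format('{:.1f}%', subset=pct_subset) rather than applymap; it raises the same TypeError, from the self.data.loc[subset] lookup inside Styler.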


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-17-994feb9d52cf> in <module>()
      1 df.style.format(
----> 2     '{:.1f}%', subset=pct_subset)

/home/jeremysalwen/.local/lib/python2.7/site-packages/pandas/io/formats/style.pyc in format(self, formatter, subset)
    399                 subset = subset, self.data.columns
    400 
--> 401             sub_df = self.data.loc[subset]
    402             row_locs = self.data.index.get_indexer_for(sub_df.index)
    403             col_locs = self.data.columns.get_indexer_for(sub_df.columns)

/home/jeremysalwen/.local/lib/python2.7/site-packages/pandas/core/indexing.pyc in __getitem__(self, key)
   1492             except (KeyError, IndexError, AttributeError):
   1493                 pass
-> 1494             return self._getitem_tuple(key)
   1495         else:
   1496             # we by definition only have the 0th axis

/home/jeremysalwen/.local/lib/python2.7/site-packages/pandas/core/indexing.pyc in _getitem_tuple(self, tup)
    866     def _getitem_tuple(self, tup):
    867         try:
--> 868             return self._getitem_lowerdim(tup)
    869         except IndexingError:
    870             pass

/home/jeremysalwen/.local/lib/python2.7/site-packages/pandas/core/indexing.pyc in _getitem_lowerdim(self, tup)
    967         # we may have a nested tuples indexer here
    968         if self._is_nested_tuple_indexer(tup):
--> 969             return self._getitem_nested_tuple(tup)
    970 
    971         # we maybe be using a tuple to represent multiple dimensions here

/home/jeremysalwen/.local/lib/python2.7/site-packages/pandas/core/indexing.pyc in _getitem_nested_tuple(self, tup)
   1046 
   1047             current_ndim = obj.ndim
-> 1048             obj = getattr(obj, self.name)._getitem_axis(key, axis=axis)
   1049             axis += 1
   1050 

/home/jeremysalwen/.local/lib/python2.7/site-packages/pandas/core/indexing.pyc in _getitem_axis(self, key, axis)
   1900                     raise ValueError('Cannot index with multidimensional key')
   1901 
-> 1902                 return self._getitem_iterable(key, axis=axis)
   1903 
   1904             # nested tuple slicing

/home/jeremysalwen/.local/lib/python2.7/site-packages/pandas/core/indexing.pyc in _getitem_iterable(self, key, axis)
   1203             # A collection of keys
   1204             keyarr, indexer = self._get_listlike_indexer(key, axis,
-> 1205                                                          raise_missing=False)
   1206             return self.obj._reindex_with_indexers({axis: [keyarr, indexer]},
   1207                                                    copy=True, allow_dups=True)

/home/jeremysalwen/.local/lib/python2.7/site-packages/pandas/core/indexing.pyc in _get_listlike_indexer(self, key, axis, raise_missing)
   1152             if len(ax) or not len(key):
   1153                 key = self._convert_for_reindex(key, axis)
-> 1154             indexer = ax.get_indexer_for(key)
   1155             keyarr = ax.reindex(keyarr)[0]
   1156         else:

/home/jeremysalwen/.local/lib/python2.7/site-packages/pandas/core/indexes/base.pyc in get_indexer_for(self, target, **kwargs)
   4453         """
   4454         if self.is_unique:
-> 4455             return self.get_indexer(target, **kwargs)
   4456         indexer, _ = self.get_indexer_non_unique(target, **kwargs)
   4457         return indexer

/home/jeremysalwen/.local/lib/python2.7/site-packages/pandas/core/indexes/multi.pyc in get_indexer(self, target, method, limit, tolerance)
   2157                                                           method=method,
   2158                                                           limit=limit,
-> 2159                                                           tolerance=tolerance)
   2160 
   2161         if not self.is_unique:

/home/jeremysalwen/.local/lib/python2.7/site-packages/pandas/core/indexes/base.pyc in get_indexer(self, target, method, limit, tolerance)
   2753                                  'backfill or nearest reindexing')
   2754 
-> 2755             indexer = self._engine.get_indexer(target._ndarray_values)
   2756 
   2757         return ensure_platform_int(indexer)

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_indexer()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.lookup()

TypeError: unhashable type

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.14.final.0
python-bits: 64
OS: Linux
OS-release: 4.19.20-1rodete1-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.24.2
pytest: None
pip: 19.0.3
setuptools: 36.7.1
Cython: None
numpy: 1.16.2
scipy: None
pyarrow: None
xarray: None
IPython: 5.8.0
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: 4.2.5
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@mroeschke
Member

This appears to work on master now. It could use a test.

In [49]: import pandas as pd
    ...: import numpy as np
    ...: labels = np.array([[0, 0,1,1],
    ...:                      [0, 1, 0, 1]])
    ...: columns = pd.MultiIndex(
    ...:   levels=[["a", "b"], ['%', '#']], labels=labels, names=['', ''])
    ...: df = pd.DataFrame(
    ...:   [[1,-1,1,1],[-1,1,1,1]],
    ...:   index=["hello", "world"],
    ...:   columns=columns)
    ...: pct_subset = pd.IndexSlice[:, pd.IndexSlice[:, '%':'%']]
    ...:
    ...: def color_negative_red(val):
    ...:     color = 'red' if val < 0 else 'black'
    ...:     return 'color: %s' % color
    ...:
    ...: # Works on both 0.22 and 0.24.
    ...: df.loc[pct_subset]
/anaconda3/envs/pandas-dev/bin/ipython:6: FutureWarning: the 'labels' keyword is deprecated, use 'codes' instead

Out[49]:
       a  b
       %  %
hello  1  1
world -1  1

In [50]: df.style.applymap(color_negative_red, subset=pct_subset)
    ...:
Out[50]: <pandas.io.formats.style.Styler at 0x1a24fb1b90>

In [51]: pd.__version__
Out[51]: '0.26.0.dev0+734.g0de99558b'

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions labels Nov 2, 2019
sofiane87 added a commit to sofiane87/pandas that referenced this issue Nov 2, 2019
@jreback jreback added this to the 1.0 milestone Nov 2, 2019
gfyoung pushed a commit that referenced this issue Nov 5, 2019
* TST: Adding styler applymap multindex & code test

Closes gh-25858
Reksbril pushed a commit to Reksbril/pandas that referenced this issue Nov 18, 2019
proost pushed a commit to proost/pandas that referenced this issue Dec 19, 2019
proost pushed a commit to proost/pandas that referenced this issue Dec 19, 2019
3 participants