ENH: make `Styler` compatible with non-unique indexes #41269

attack68 · 2021-05-02T18:38:41Z

closes ENH: Make Styler work with non-unique indexes #41143

This PR aims to make `Styler` compatible with non-unique indexes/columns, so that every DataFrame can be rendered (even if no styling is applied).
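Non-unique labels are a problem for rendering because code that turns a label into one integer position works for a unique index and has no single answer once a label repeats. The minimal stdlib-only sketch below shows this without pandas. `build_positions` and `get_single_loc` are hypothetical helpers for illustration, not pandas functions.

```python
from collections import defaultdict


def build_positions(labels):
    """Map each label to every integer position it occupies."""
    positions = defaultdict(list)
    for pos, label in enumerate(labels):
        positions[label].append(pos)
    return dict(positions)


def get_single_loc(labels, key):
    """Return the single position of `key`; refuse missing or duplicated labels."""
    matches = build_positions(labels).get(key, [])
    if len(matches) != 1:
        raise KeyError(f"{key!r} does not map to exactly one position: {matches}")
    return matches[0]


index = ["a", "b", "a"]  # non-unique row labels

print(get_single_loc(index, "b"))   # 1
print(build_positions(index)["a"])  # [0, 2]: both rows labelled "a" must be rendered
```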

- `Styler.format`: made FULLY compatible with some modifications to the loops (inc. TESTS).
- ~~`Styler.apply` and `Styler.applymap`: made PARTIALLY compatible:~~
  - ~~if subsets are non-unique slices will raise a not compatible KeyError, inc. TESTS~~
- `Styler.apply` and `Styler.applymap` are NOT compatible: they raise a `KeyError`.
- `Styler.set_table_styles`: made FULLY compatible; a non-unique key styles all of its matching rows/columns (inc. TESTS).
- `Styler.set_td_classes` uses `reindex`, so it is only PARTIALLY compatible: it works where `classes` has a unique index/columns and now raises a `KeyError` in the non-unique case (inc. TESTS).
- `Styler.set_tooltips` uses `reindex`, so it is only PARTIALLY compatible: it works where `ttips` has a unique index/columns and now raises a `KeyError` in the non-unique case (inc. TESTS).
- `Styler.hide_index` and `Styler.hide_columns` are already FULLY compatible through existing code (inc. TESTS).
- All the built-in styling functions use some version of `apply` or `applymap`, so they are covered by the cases above.
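For `set_table_styles`, styling multiple rows/columns from a non-unique key means one key expands into one CSS rule per matching position. Here is a minimal stdlib sketch, assuming selectors are built by appending a `.col{n}` (or `.row{n}`) class to each rule's selector. `expand_table_styles` is a hypothetical name, the output format is illustrative, and unknown keys simply produce no rules in this sketch.

```python
def expand_table_styles(labels, styles_by_key, prefix):
    """Expand {label: [rule, ...]} into one CSS rule per matching position.

    `prefix` is ".col" when the keys are column labels and ".row" when they
    are index labels; a duplicated label yields one rule per occurrence.
    """
    expanded = []
    for key, rules in styles_by_key.items():
        positions = [pos for pos, label in enumerate(labels) if label == key]
        for pos in positions:
            for rule in rules:
                expanded.append(
                    {"selector": f"{rule['selector']}{prefix}{pos}", "props": rule["props"]}
                )
    return expanded


columns = ["x", "y", "x"]  # "x" appears twice
styles = {"x": [{"selector": "td", "props": "color: red;"}]}
for entry in expand_table_styles(columns, styles, ".col"):
    print(entry["selector"])  # td.col0, then td.col2
```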

I believe this covers all the relevant functionality.


jreback · 2021-05-03T00:00:49Z

pandas/io/formats/style.py

+                    if not c:
+                        continue
+                    css_list = maybe_convert_css_to_tuples(c)
+                    i, j = self.index.get_loc(rn), self.columns.get_loc(cn)


Can we instead just not use the indices to look up locations, and use indexers instead (e.g. iterate over the number of columns and use `iloc`)?

No, because `attrs` may be a subset of the main `self.data`. To maintain performance and ensure the mapped CSS is placed at the right integer locations, it needs a label lookup, and lookup doesn't work (without ambiguity) for non-unique indexes.
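The point about subsets can be illustrated with plain lists. Enumerating a subset yields positions relative to the subset, not to the full table, so labels must be mapped back into the full index. A simple label-to-position dict, however, silently keeps only one position for a duplicated label. All names below are illustrative, not pandas internals.

```python
full_index = ["a", "b", "c", "d"]
subset_index = ["b", "d"]  # rows selected by a hypothetical `subset` argument
subset_css = ["color: red;", "color: blue;"]

# Naive: positions counted within the subset, which is wrong for the full table.
naive = {i: css for i, css in enumerate(subset_css)}

# Label lookup: map each subset label back to its position in the full index.
lookup = {label: pos for pos, label in enumerate(full_index)}
correct = {lookup[label]: css for label, css in zip(subset_index, subset_css)}

print(naive)    # {0: 'color: red;', 1: 'color: blue;'}
print(correct)  # {1: 'color: red;', 3: 'color: blue;'}

# With duplicated labels the same dict is ambiguous: "a" keeps only its last position.
dup_lookup = {label: pos for pos, label in enumerate(["a", "b", "a"])}
print(dup_lookup["a"])  # 2 (position 0 is silently lost)
```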


jreback · 2021-05-05T12:55:22Z

pandas/io/formats/style.py

+            try:
+                for rn, c in attrs[[cn]].itertuples():
+                    if not c:


Can you limit the try/except to just the `.get_loc` call rather than the entire loop?

It would be even better to simply refuse to render non-uniques entirely.

I removed the separate cases, as suggested.
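Refusing non-unique input before any styling work, instead of catching errors inside the loop, might look like the sketch below. It mirrors the outcome the PR describes for `apply`/`applymap` (a `KeyError`), but `check_unique` and its message are illustrative assumptions, not the pandas implementation.

```python
def check_unique(index_labels, column_labels, method):
    """Raise KeyError before any styling work if either axis has duplicate labels."""
    for axis_name, labels in (("index", index_labels), ("columns", column_labels)):
        if len(set(labels)) != len(labels):
            raise KeyError(
                f"`Styler.{method}` is not compatible with non-unique {axis_name}"
            )


check_unique(["a", "b"], ["x", "y"], "apply")  # unique on both axes: no error

try:
    check_unique(["a", "a"], ["x", "y"], "apply")
except KeyError as err:
    print(err)  # reports the non-unique index
```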

jreback · 2021-05-05T12:55:43Z

pandas/io/formats/style.py

+                    css_list = maybe_convert_css_to_tuples(c)
+                    i, j = self.index.get_loc(rn), self.columns.get_loc(cn)
+                    self.ctx[(i, j)].extend(css_list)
+            except ValueError as ve:


This should be much more fine-grained.
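One fine-grained pattern guards only the position lookup and keeps the loop body outside the `try`, re-raising with context. This sketch also hoists the lookup out of the loop. `lookup_positions`, `apply_css`, and the `ctx` dict are hypothetical simplifications of the code in the diff, not the actual pandas code.

```python
from collections import defaultdict


def lookup_positions(labels):
    """Map label -> position, raising ValueError on a duplicated label."""
    positions = {}
    for pos, label in enumerate(labels):
        if label in positions:
            raise ValueError(f"duplicate label {label!r}")
        positions[label] = pos
    return positions


def apply_css(ctx, index, columns, cells):
    """cells: iterable of (row_label, col_label, css_list) tuples."""
    try:  # only the lookup is guarded
        rows, cols = lookup_positions(index), lookup_positions(columns)
    except ValueError as err:
        raise KeyError("non-unique index or columns are not supported") from err
    for row_label, col_label, css_list in cells:  # loop body is outside the try
        if not css_list:
            continue
        ctx[(rows[row_label], cols[col_label])].extend(css_list)


ctx = defaultdict(list)
apply_css(ctx, ["a", "b"], ["x"], [("b", "x", [("color", "red")])])
print(dict(ctx))  # {(1, 0): [('color', 'red')]}
```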

jreback · 2021-05-06T01:45:16Z

pandas/io/formats/style_render.py

-                self._display_funcs[(i, j)] = format_func
+            for row in data[[col]].itertuples():
+                i_ = self.index.get_indexer_for([row[0]])  # handle duplicate keys in
+                j_ = self.columns.get_indexer_for([col])  # non-unique indexes


You can compute `j_` outside of the loop, right? (`col` doesn't change.)
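Because `col` is fixed for the inner loop, its position(s) can be computed once per column rather than once per cell. Below is a stdlib sketch of that hoisting, where `positions_for` is a hypothetical all-matches lookup standing in for `get_indexer_for`.

```python
def positions_for(labels, key):
    """All integer positions of `key` in `labels` (handles duplicates)."""
    return [pos for pos, label in enumerate(labels) if label == key]


def assign_formatters(index, columns, subset_rows, subset_cols, func):
    """Map (row_pos, col_pos) -> func; subset labels are assumed to be distinct."""
    display_funcs = {}
    for col in subset_cols:
        col_positions = positions_for(columns, col)  # hoisted: once per column
        for row in subset_rows:
            for i in positions_for(index, row):
                for j in col_positions:
                    display_funcs[(i, j)] = func
    return display_funcs


funcs = assign_formatters(["a", "b", "a"], ["x", "y"], ["a"], ["y"], str.upper)
print(sorted(funcs))  # [(0, 1), (2, 1)]
```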

Does this change perf at all? (I don't think so, but checking.)

Good catch. The multiple loops really killed performance for the unique case (which is benchmarked):

| | before `[4af3eed5]` `<styler_non_unique~1^2>` | after `[3a8f11e5]` `<styler_non_unique>` | ratio | benchmark |
|---|---|---|---|---|
| `+` | 57.8±2ms | 163±4ms | 2.82 | `io.style.Render.time_format_render(24, 120)` |
| `+` | 87.1±5ms | 242±4ms | 2.78 | `io.style.Render.time_format_render(36, 120)` |
| `+` | 30.5±0.9ms | 82.2±0.4ms | 2.69 | `io.style.Render.time_format_render(12, 120)` |
| `+` | 8.53±0.08ms | 14.7±0.2ms | 1.72 | `io.style.Render.time_format_render(12, 12)` |
| `+` | 16.2±0.3ms | 27.7±0.7ms | 1.71 | `io.style.Render.time_format_render(24, 12)` |
| `+` | 25.3±1ms | 40.9±0.7ms | 1.62 | `io.style.Render.time_format_render(36, 12)` |
| `+` | 16.6±0.2ms | 18.5±0.2ms | 1.11 | `io.style.Render.time_classes_render(36, 12)` |

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY. PERFORMANCE DECREASED.

So I had to separate the non-unique and unique cases with a conditional; after that, performance was unchanged:

BENCHMARKS NOT SIGNIFICANTLY CHANGED.
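A conditional split of this kind could look roughly like the following. A unique axis takes a fast path with one position per label, and a non-unique axis keeps every position. This is an illustrative sketch, not the code from the PR's intermediate commits, and `make_locator` is a hypothetical name.

```python
from collections import defaultdict


def make_locator(labels):
    """Return a callable mapping label -> list of integer positions."""
    if len(set(labels)) == len(labels):
        # Fast path: unique labels, one position each.
        single = {label: pos for pos, label in enumerate(labels)}
        return lambda key: [single[key]]
    # Slow path: keep every position for each (possibly repeated) label.
    multi = defaultdict(list)
    for pos, label in enumerate(labels):
        multi[label].append(pos)
    frozen = dict(multi)  # plain dict, so unknown labels raise KeyError
    return lambda key: list(frozen[key])


loc_unique = make_locator(["a", "b", "c"])
loc_dupes = make_locator(["a", "b", "a"])
print(loc_unique("b"))  # [1]
print(loc_dupes("a"))   # [0, 2]
```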

jreback · 2021-05-06T12:41:11Z

pandas/io/formats/style_render.py

+
+                j = self.columns.get_loc(col)  # single value
+                for row, value in data[[col]].itertuples():
+                    i = self.index.get_loc(row)  # single value


If you pull the column indexer out of the loop, is there still a perf hit? E.g. `get_indexer_for` calls `get_loc` if it's unique (which is cached) anyhow.

Good reviewing! Third time's a charm: the code is simpler, works for both non-unique and unique indexes, and performance has improved:

| | before `[4af3eed5]` `<text_gradient^2>` | after `[c3b7af82]` `<styler_non_unique>` | ratio | benchmark |
|---|---|---|---|---|
| `-` | 33.1±0.4ms | 25.9±0.5ms | 0.78 | `io.style.Render.time_format_render(12, 120)` |
| `-` | 86.4±0.4ms | 66.8±0.5ms | 0.77 | `io.style.Render.time_format_render(36, 120)` |
| `-` | 62.3±2ms | 46.4±0.4ms | 0.75 | `io.style.Render.time_format_render(24, 120)` |
| `-` | 10.1±0.3ms | 4.16±0.2ms | 0.41 | `io.style.Render.time_format_render(12, 12)` |
| `-` | 24.7±0.2ms | 9.53±0.2ms | 0.39 | `io.style.Render.time_format_render(36, 12)` |
| `-` | 17.7±0.2ms | 6.76±0.1ms | 0.38 | `io.style.Render.time_format_render(24, 12)` |

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY. PERFORMANCE INCREASED.
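The final shape computes all row and column indexers once and only then loops over integer positions. It can be sketched with the stdlib as below. `indexer_for` is a hypothetical helper modelled loosely on `Index.get_indexer_for`; this sketch returns a plain list and simply skips labels that are not found.

```python
from collections import defaultdict


def indexer_for(labels, targets):
    """Positions in `labels` matching each distinct target, in target order."""
    positions = defaultdict(list)
    for pos, label in enumerate(labels):
        positions[label].append(pos)
    result = []
    for target in dict.fromkeys(targets):  # de-duplicate targets, keep order
        result.extend(positions.get(target, []))
    return result


def set_display_funcs(index, columns, subset_index, subset_columns, func):
    ris = indexer_for(index, subset_index)      # computed once, outside the loops
    cis = indexer_for(columns, subset_columns)  # computed once, outside the loops
    return {(ri, ci): func for ci in cis for ri in ris}


funcs = set_display_funcs(["a", "b", "a"], ["x", "x", "y"], ["a"], ["x"], str)
print(sorted(funcs))  # [(0, 0), (0, 1), (2, 0), (2, 1)]
```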

jreback · 2021-05-06T23:32:04Z

ok!

ENH: make Styler.format compatible with non-unique indexes (with Tests)

1cc569f

attack68 changed the title ~~ENH: make Styler.format compatible with non-unique indexes~~ ENH: make Styler compatible with non-unique indexes May 2, 2021

attack68 added 5 commits May 2, 2021 21:07

ENH: error when using a non-unique subset with .apply and .applymap (inc Tests)

6982554

ENH: error when using a non-unique subset with .apply and .applymap: fix Tests

a3694db

ENH: make table_styles work with non-unique + TST: refactor to own file

5c6669c

ENH: error catching

732c7d5

ENH: error catching

4c99130

jreback requested changes May 3, 2021


jreback added the Styler conditional formatting using DataFrame.style label May 3, 2021

attack68 added 6 commits May 3, 2021 07:45

Merge remote-tracking branch 'upstream/master' into styler_non_unique

57e8bef

ENH: deal with tooltips and raise (inc Tests)

a7a2966

ENH: deal with tooltips and raise (inc Tests)

19fb7f9

ENH: deal with tset_td_classes and raise (inc Tests)

4ce559e

ENH: tests for hide_columns

7f28111

ENH: remove style ValueError

5043c01

attack68 marked this pull request as ready for review May 3, 2021 06:57

attack68 added 2 commits May 4, 2021 10:19

Merge remote-tracking branch 'upstream/master' into styler_non_unique

9fc6cd3

whats new

09764ba

jreback requested changes May 5, 2021


attack68 added 2 commits May 5, 2021 16:09

Merge remote-tracking branch 'upstream/master' into styler_non_unique

9451aae

prohibit apply and applymap in non-unique case

4faeb29

jreback reviewed May 6, 2021


attack68 added 4 commits May 6, 2021 08:06

Merge remote-tracking branch 'upstream/master' into styler_non_unique

aed0536

move outside loop

3a8f11e

create conditional for performance

51233be

create conditional for performance

8454c5e

jreback requested changes May 6, 2021


attack68 added 2 commits May 6, 2021 20:51

take indexing out of loops

c3b7af8

take indexing out of loops

20cd19f

jreback added this to the 1.3 milestone May 6, 2021

jreback approved these changes May 6, 2021


jreback merged commit ebf3b98 into pandas-dev:master May 6, 2021

attack68 deleted the styler_non_unique branch May 7, 2021 05:18

attack68 mentioned this pull request May 9, 2021

ENH: Styler.to_latex(): conditional styling with native latex format #40422

Merged


This was referenced May 24, 2021

REF: DataFrame.to_latex directs to Styler.to_latex #41648

Closed

DEPR: LatexFormatter in DataFrame.to_latex in favour of Styler #41649

Closed

REF: DataFrame.to_html should call Styler.to_html #41693

Open

JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021

ENH: make Styler compatible with non-unique indexes (pandas-dev#41269)

a24bf3b
