CLN/API: implemented to_html in terms of .style #11700

jreback · 2015-11-25T13:37:40Z

Implement to_html / notebook repr based on .style.

Probably need to expand this to take a use argument to select the style; it needs to be 'classic' for a while, to replicate the current .to_html output.


jorisvandenbossche · 2017-01-03T22:41:44Z

Some discussion related to this was going on in #14975 (comment). Summarizing some elements here:

Barriers: some missing features are needed before such a replacement is possible (see also some elements in #11610)

- truncated display
- writing to a file (ENH: write Styler rendered output to file #13379)

Advantages:

- would eliminate a lot of code that gives similar functionality (HTMLFormatter, possibly other formatters), converging on one formatting system

Disadvantages:

- formally adding jinja2 as a dependency
- performance?
  - plain HTML rendering of a DataFrame with 10 columns / 10,000 rows of floats: df.style.render() takes 19.6 s vs 2.7 s for df.to_html()
  - for notebook reprs (which are typically truncated) this will probably not be a problem

cc @TomAugspurger For basic html output / notebook repr, it would maybe be useful to have a base class that has a simpler template and does not support all the different customization methods? For example, I can imagine that leaving out all the id=.. (which are not needed for basic display I think?) can improve perf / simplify things.

TomAugspurger · 2017-01-03T22:49:51Z

> For basic html output / notebook repr, it would maybe be useful to have a base class that has a simpler template and does not support all the different customization methods?

100% agree with your comments here. This wouldn't really be implementing df.to_html using .style.
Instead we'd have a common Jinja2 template that would handle the logic of iterating over rows, inserting tags.
Then .to_html() and .style would extend that base template. .to_html probably wouldn't change much from the base really.
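
To make the shared-template idea concrete, here is a minimal sketch using Jinja2 template inheritance. These are not pandas' actual templates: the template names (base.html, plain.html, styled.html), the cell block, the CSS class, and the sample rows are all invented for illustration.

```python
from jinja2 import DictLoader, Environment

# Hypothetical templates: the base one owns the row/cell iteration,
# and children only override the per-cell markup.
TEMPLATES = {
    "base.html": (
        "<table>"
        "{% for row in rows %}<tr>"
        "{% for value in row %}"
        # "scoped" lets the block see the loop variable `value`.
        "{% block cell scoped %}<td>{{ value }}</td>{% endblock %}"
        "{% endfor %}"
        "</tr>{% endfor %}"
        "</table>"
    ),
    # A to_html-like template that changes nothing from the base.
    "plain.html": '{% extends "base.html" %}',
    # A Styler-like template that adds extra attributes to each cell.
    "styled.html": (
        '{% extends "base.html" %}'
        '{% block cell %}<td class="data">{{ value }}</td>{% endblock %}'
    ),
}

env = Environment(loader=DictLoader(TEMPLATES))
rows = [[1, 2], [3, 4]]

plain = env.get_template("plain.html").render(rows=rows)
styled = env.get_template("styled.html").render(rows=rows)

print(plain)
print(styled)
```

The point of the design is that the looping logic lives in one place, while each output flavour only states what it does differently.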

Also, Jinja depends on MarkupSafe, so that becomes another dependency.

attack68 · 2021-02-21T11:32:39Z

Was there ever any progression on these ideas?

FYI the performance disadvantage above is much improved since 2017. Where the figures then were 19.6 s vs 2.7 s, I now get about 3.9 s versus 1.9 s.

Also note #39951

moi90 · 2021-03-10T09:16:33Z

I don't agree with the advantage mentioned by @jorisvandenbossche: While I'm all for one convergent formatting system, a templating engine is not the solution. It just does not work for everything:
As I said in #21673, there are other formats (like Excel) that cannot (realistically) be built using a templating engine.

Also, I am not enthusiastic about making Jinja a hard dependency for rendering templates (for both HTML and LaTeX, or anything else).

EDIT: My idea is that the various (styleable) *Formatters (HTMLFormatter, NotebookFormatter, ExcelFormatter, ...) should be extended to get the ability to optionally apply styles to their output (like I described in #21673).

toobaz · 2021-03-10T16:31:07Z

> EDIT: My idea is that the various (styleable) *Formatters (HTMLFormatter, NotebookFormatter, ExcelFormatter, ...) should be extended to get the ability to optionally apply styles to their output

Isn't ExcelFormatter already used to do precisely this?

attack68 · 2021-03-10T21:36:31Z

> I don't agree with the advantage mentioned by @jorisvandenbossche: While I'm all for one convergent formatting system, a templating engine is not the solution. It just does not work for everything:

I don't believe the objective here is to have one convergent system for everything, rather this post is about having one convergent formatting system for to_html, as opposed to Styler with jinja2 and DataFrame.to_html with HTMLFormatter.

jinja2 is a go-to for generating HTML from Python, thanks to packages like Flask and Django, so it is a logical combination if you are rendering HTML tables from pandas. It also gives users template-extension flexibility that HTMLFormatter cannot.

Since jinja2 is a dependency of Styler, and assuming that is not going away, any Styler.to_latex method would have jinja2 available to it. Some initial work suggests this is quite easy to incorporate, or at least that it can replicate the existing DataFrame.to_latex() functionality, without the (imo horrible) subclassing of Formatters. See master...attack68:latex_styler_mvp

toobaz · 2021-03-10T23:25:56Z

I'm conflicted. On one hand, it's nice to remove code. On the other, I'm not sure how much code we would really save in exchange for a "stronger" dependency on jinja2. In #40344, you say that some of the arguments of to_html() (e.g. min_rows) are pointless because they are "related to console display"... but if the idea is that DataFrame.to_html() and Styler.to_html() are formatted with templates but not DataFrame._repr_html_(), then we are not really gaining much: we still need internal code to produce HTML for console display, right? And by the way, the fact that Styler._repr_html_() does not truncate data like DataFrame._repr_html_() does should probably be considered a bug.

The possibility of exporting to other formats via jinja2 is also potentially interesting, but needs further investigation. While your attempt in master...attack68:latex_styler_mvp is cool, I suspect the complexity will increase quite a bit once we start supporting formatting (which won't use stuff like css), to the point that what jinja2 actually delivers is only a small part of the task of formatting to LaTeX.

I would be happy to be proven wrong though. How difficult would it be, in #40312, to run the test suite with DataFrame.to_html() replaced with the jinja2 implementation, just to see what breaks?

moi90 · 2021-03-11T10:19:14Z

> I don't believe the objective here is to have one convergent system for everything, rather this post is about having one convergent formatting system for to_html, as opposed to Styler with jinja2 and DataFrame.to_html with HTMLFormatter.

You're right if it is certain that HTMLFormatter can be completely removed. Is that the case? It seems not, guessing from @toobaz' comment.

attack68 · 2021-03-11T15:39:50Z

> You're right if it is certain that HTMLFormatter can be completely removed. Is that the case? It seems not, guessing from @toobaz' comment.

@moi90 If the goal is to replicate all of the functionality of DataFrame.to_html(), then yes, it can be done, and a lot has already been done in my WIP PR. Not all of it, though, because I wanted to raise the issue of blindly replicating a function which in some cases produces deprecated HTML, and instead consider the merits of making some changes, perhaps with a view to pandas 2.0.

> While your attempt in master...attack68:latex_styler_mvp is cool, I suspect the complexity will increase quite a bit once we start supporting formatting (which won't use stuff like css), to the point that what jinja2 actually delivers is only a small part of the task of formatting to LaTeX.

@toobaz I progressed the MVP to a state where it now has a lot of general conditional styling capability for LaTeX tables. See my response here.
I still want to be able to add some table-level styles like column colouring or odd/even colouring, but these are quite easy extensions.

> I would be happy to be proven wrong though. How difficult would it be, in #40312, to run the test suite with DataFrame.to_html() replaced with the jinja2 implementation, just to see what breaks?

Quite easy: the method just needs to be redirected. When I push it I will ping you to take a look at the test results.

attack68 · 2021-03-11T18:09:09Z

> And by the way, the fact that Styler._repr_html_() does not truncate data like DataFrame._repr_html_() does should probably be considered a bug.

Actually I think the opposite. The docstring for _repr_html_ states it is mainly for IPython / Jupyter, which has its own auto-scrolling feature. I find it a real nuisance when pandas truncates my dataframes, so I always revert to the default df.style display because it shows everything. If you want to view a dataframe in a console, don't use an HTML representation, no?

toobaz · 2021-03-11T18:17:37Z

> The docstring for _repr_html_ states it is mainly for IPython / Jupyter, which has its own auto-scrolling feature.

Sure, but passing the notebook a table with millions of rows will just make it crash, whether or not you scroll. We can discuss the optimal number of rows to show (notice that you can easily customize it), but I'm afraid "no limit" is not an option.
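
As a quick illustration of the customization mentioned here, the sketch below changes the row limit of the notebook repr through the display.max_rows and display.min_rows options. The frame and the chosen limits are arbitrary, and _repr_html_() is called directly only to inspect what a notebook would receive.

```python
import pandas as pd

df = pd.DataFrame({"x": range(1_000)})

# Limit the repr: frames longer than max_rows are truncated,
# and min_rows controls how many rows are shown once truncation kicks in.
with pd.option_context("display.max_rows", 20, "display.min_rows", 10):
    truncated = df._repr_html_()

# max_rows=None removes the limit, so every row is rendered.
with pd.option_context("display.max_rows", None):
    full = df._repr_html_()

print(f"truncated repr: {len(truncated)} characters")
print(f"full repr:      {len(full)} characters")
```

Using option_context keeps the change local; pd.set_option would apply it for the rest of the session.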

> If you want to view a dataframe in a console, don't use an HTML representation, no?

Sure, the point is indeed about notebooks.

attack68 · 2021-03-12T07:31:49Z

> Sure, but passing the notebook a table with millions of rows will just make it crash, whether or not you scroll. We can discuss the optimal number of rows to show (notice that you can easily customize it), but I'm afraid "no limit" is not an option.

Does pandas set a limit on the size of a DataFrame you can construct, or is its limit just naturally determined by system constraints? The same logic could be argued here, albeit one is inside native Python and the other is rendering in an external application like Jupyter in a browser (so an error might not be as obvious).

I have seen multiple use cases of wanting to visualise large tables: one is here, with the other up to 20,000 rows. To be honest that's the largest I've seen, so even if I'm not convinced a limit is necessary, having one above that would not have affected any use case I have seen so far. From memory that only took seconds to render, so I would be happy with that.

toobaz · 2021-03-12T08:17:34Z

> I have seen multiple use cases of wanting to visualise large tables: one is here, with the other up to 20,000 rows.

I regularly use tables with a couple of million rows inside Jupyter and it's great to see them easily. I would hate to crash my notebook every time I view them without thinking about truncating them. I'm sure many people use pandas with much larger databases. Again, I think deprecating the truncated visualization is not an option. I might be wrong on the need to truncate Styler too, however, so we can leave that option out of this discussion.

jorisvandenbossche · 2021-03-12T08:50:59Z

Indeed, removing truncation from the default html repr is currently not an option I think (unless we would use a more advanced widget that eg does that automatically, but that's another discussion). There are already settings to change the number of rows to show, if you want to change this as a user.

So if we want to replace the to_html/_repr_html_ with Styler, the truncation functionality will need to be added to Styler (although I don't think that Styler needs to do that by default).

attack68 · 2021-03-12T10:42:12Z

OK seems well supported, adding this to the list of things needed.

jorisvandenbossche · 2021-08-25T20:34:05Z

This wasn't really closed by #40312, which only added a Styler.to_html, and didn't implement the main to_html in terms of Styler

attack68 · 2022-01-26T21:54:20Z

In #45382 I'm proposing changing the signature of DataFrame.to_latex to:

DataFrame.to_latex(hide, format, format_index, render_kwargs)

and this will perform the following:

DataFrame.style.hide(**hide).format(**format).format_index(**format_index).to_latex(**render_kwargs)

This has the advantage of:

- converting the method to use the Styler implementation
- not requiring updates to the argument signature of DataFrame.to_latex, since it passes the kwargs through
- allowing a structured deprecation cycle in which all the existing args can be restructured into this format as documented

Is this reasonable and would it be appropriate to aim for something similar with to_html for v2.0?

jreback added the Output-Formatting, API Design, IO HTML, and Clean labels Nov 25, 2015

jreback added this to the 0.18.0 milestone Nov 25, 2015

TomAugspurger mentioned this issue Nov 25, 2015

Followup to Conditional HTML Styling #11610

Closed

13 tasks

jreback modified the milestones: Next Major Release, 0.18.0 Jan 24, 2016

TomAugspurger added the Code Style label Mar 11, 2016

TomAugspurger removed the Code Style label May 17, 2016

jreback mentioned this issue Jun 6, 2016

ENH: write Styler rendered output to file #13379

Closed

jorisvandenbossche mentioned this issue Jan 3, 2017

BUG: GH14882 Incorrect index label displayed on MultiIndex DataFrame #14975

Closed

4 tasks

attack68 added the Styler label Feb 20, 2021

This was referenced Mar 8, 2021

ENH: Styler.to_latex() #21673

Closed

API: Add Styler.to_html, for saving output to HTML file #40312

Merged

jreback modified the milestones: Contributions Welcome, 1.3 May 21, 2021

attack68 mentioned this issue May 23, 2021

ENH: set render limits on Styler to automatically trim dataframes #41635

Merged

2 tasks

jreback closed this as completed in #40312 May 26, 2021

jorisvandenbossche reopened this Aug 25, 2021

jorisvandenbossche mentioned this issue Aug 25, 2021

WIP: REF: DataFrame.to_html directs to Styler.to_html #43161

Closed
