CLN/API: implemented to_html in terms of .style #11700

Open
jreback opened this issue Nov 25, 2015 · 17 comments · Fixed by #40312
Labels
API Design Clean IO HTML read_html, to_html, Styler.apply, Styler.applymap Output-Formatting __repr__ of pandas objects, to_string Styler conditional formatting using DataFrame.style
Milestone

Comments

@jreback
Contributor

jreback commented Nov 25, 2015

Implement to_html / notebook repr based on .style.

Probably need to expand this to take a use argument to select the style; it would need to be 'classic' for a while, to replicate the current .to_html output.

@jreback jreback added Output-Formatting __repr__ of pandas objects, to_string API Design IO HTML read_html, to_html, Styler.apply, Styler.applymap Clean labels Nov 25, 2015
@jreback jreback added this to the 0.18.0 milestone Nov 25, 2015
@jreback jreback modified the milestones: Next Major Release, 0.18.0 Jan 24, 2016
@TomAugspurger TomAugspurger added the Code Style Code style, linting, code_checks label Mar 11, 2016
@TomAugspurger TomAugspurger removed the Code Style Code style, linting, code_checks label May 17, 2016
@jorisvandenbossche
Member

Some discussion related to this was going on in #14975 (comment). Summarizing some elements here:

Barriers: some missing features are needed before such a replacement is possible (see also some elements in #11610)

Advantages:

  • would eliminate a lot of code that gives similar functionality (HTMLFormatter, possibly other formatters) -> converging to one formatting system

Disadvantages:

  • formally adding jinja2 as a dependency.
  • performance?
    • plain HTML rendering of a DataFrame with 10 columns and 10,000 rows of floats: df.style.render() takes 19.6 s vs 2.7 s for df.to_html()
    • for notebook reprs (which are typically truncated) this will probably not be a problem

cc @TomAugspurger For basic html output / notebook repr, it would maybe be useful to have a base class that has a simpler template and does not support all the different customization methods? For example, I can imagine that leaving out all the id=.. (which are not needed for basic display I think?) can improve perf / simplify things.

@TomAugspurger
Contributor

TomAugspurger commented Jan 3, 2017

For basic html output / notebook repr, it would maybe be useful to have a base class that has a simpler template and does not support all the different customization methods?

100% agree with your comments here. This wouldn't really be implementing df.to_html using .style.
Instead we'd have a common Jinja2 template that would handle the logic of iterating over rows, inserting tags.
Then .to_html() and .style would extend that base template. .to_html probably wouldn't change much from the base really.
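A minimal sketch of that idea, using Jinja2 template inheritance. The template names, block names, and the uuid variable are all made up for illustration and are not the templates pandas ships; the point is only that a base template owns the row/cell iteration and a child fills in optional attributes.

```python
from jinja2 import DictLoader, Environment

# Hypothetical templates: "base.html" handles iteration and tags,
# "styled.html" extends it by filling in the empty attribute blocks.
TEMPLATES = {
    "base.html": (
        "<table{% block table_attrs %}{% endblock %}>\n"
        "{% for row in rows %}"
        "<tr>{% for cell in row %}"
        "<td{% block cell_attrs %}{% endblock %}>{{ cell }}</td>"
        "{% endfor %}</tr>\n"
        "{% endfor %}"
        "</table>"
    ),
    "styled.html": (
        '{% extends "base.html" %}'
        '{% block table_attrs %} id="T_{{ uuid }}"{% endblock %}'
        '{% block cell_attrs %} class="data"{% endblock %}'
    ),
}

env = Environment(loader=DictLoader(TEMPLATES), autoescape=True)
rows = [[1, 2], [3, 4]]

# Plain output, roughly what a basic to_html / notebook repr would need.
plain = env.get_template("base.html").render(rows=rows)
# Same iteration logic, with ids and classes added by the child template.
styled = env.get_template("styled.html").render(rows=rows, uuid="abc")

print(plain)
print(styled)
```

Here the base template renders without any id or class attributes, which is the simpler, cheaper output suggested for the basic repr.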

Also, Jinja depends on MarkupSafe, so that becomes another dependency.

@attack68 attack68 added the Styler conditional formatting using DataFrame.style label Feb 20, 2021
@attack68
Contributor

Was there ever any progression on these ideas?

FYI the performance disadvantage above is much improved since 2017: instead of 19.6 s vs 2.7 s, I now get about 3.9 s vs 1.9 s.

Also note #39951

@moi90
Contributor

moi90 commented Mar 10, 2021

I don't agree with the advantage mentioned by @jorisvandenbossche: While I'm all for one convergent formatting system, a templating engine is not the solution. It just does not work for everything:
As I said in #21673, there are other formats (like Excel) that cannot (realistically) be built using a templating engine.

Also, I am not enthusiastic about making Jinja a hard dependency to render templates (for both HTML and LaTeX, or anything else).

EDIT: My idea is that the various (styleable) *Formatters (HTMLFormatter, NotebookFormatter, ExcelFormatter, ...) should be extended to get the ability to optionally apply styles to their output (like I described in #21673).

@toobaz
Member

toobaz commented Mar 10, 2021

EDIT: My idea is that the various (styleable) *Formatters (HTMLFormatter, NotebookFormatter, ExcelFormatter, ...) should be extended to get the ability to optionally apply styles to their output

Isn't ExcelFormatter already used to do precisely this?

@attack68
Contributor

I don't agree with the advantage mentioned by @jorisvandenbossche: While I'm all for one convergent formatting system, a templating engine is not the solution. It just does not work for everything:

I don't believe the objective here is to have one convergent system for everything, rather this post is about having one convergent formatting system for to_html, as opposed to Styler with jinja2 and DataFrame.to_html with HTMLFormatter.

jinja2 is a go-to for generating HTML from Python due to packages like Flask and Django, so if you are rendering HTML tables from pandas it is a logical combination. It also gives users additional template extension flexibility that HTMLFormatter cannot.

Since jinja2 is a dependency of Styler, and if we assume that is not going away, any Styler.to_latex method would have jinja2 available to it. Some initial work suggests it is quite easy to incorporate, or at least to replicate the existing DataFrame.to_latex() functionality, without the (imo horrible) subclassing of Formatters: master...attack68:latex_styler_mvp

@toobaz
Member

toobaz commented Mar 10, 2021

I'm conflicted. On one hand, it's nice to remove code. On the other, I'm not sure how much code we would really save in exchange for a "stronger" dependency on jinja2. In #40344, you say that some of the arguments of to_html() (e.g. min_rows) are pointless because they are "related to console display"... but if the idea is that DataFrame.to_html() and Styler.to_html() are formatted with templates but not DataFrame._repr_html_(), then we are not really gaining much - we still need internal code to produce html for console display, right? And by the way, the fact that Styler._repr_html_() does not truncate data like DataFrame._repr_html_() does should probably be considered a bug.

The possibility to export to other formats via jinja2 is also something potentially interesting but to be better investigated. While your attempt in master...attack68:latex_styler_mvp is cool, I suspect the complexity will increase quite a bit once we start supporting formatting (which won't use stuff like css), to the point that what jinja2 actually delivers is only a small part of the task of formatting to LaTeX.

I would be happy to be proven wrong though. How difficult would it be, in #40312, to run the test suite with DataFrame.to_html() replaced with the jinja2 implementation, just to see what breaks?

@moi90
Contributor

moi90 commented Mar 11, 2021

I don't believe the objective here is to have one convergent system for everything, rather this post is about having one convergent formatting system for to_html, as opposed to Styler with jinja2 and DataFrame.to_html with HTMLFormatter.

You're right if it is certain that HTMLFormatter can be completely removed. Is that the case? It seems not, guessing from @toobaz' comment.

@attack68
Contributor

You're right if it is certain that HTMLFormatter can be completely removed. Is that the case? It seems not, guessing from @toobaz' comment.

@moi90 If the goal is to replicate all of the functionality from DataFrame.to_html() then yes it can be done and a lot has already been done in my wip pr. Not all though, because I wanted to raise the issue about simply blindly replicating a function which in some cases produces deprecated HTML, and instead consider the merits of making some changes perhaps with a view to pandas 2.0.

While your attempt in master...attack68:latex_styler_mvp is cool, I suspect the complexity will increase quite a bit once we start supporting formatting (which won't use stuff like css), to the point that what jinja2 actually delivers is only a small part of the task of formatting to LaTeX.

@toobaz I progressed the MVP to a state where it now has a lot of general conditional styling capability for LaTeX tables. See my response here.
I still want to be able to add some table-level styles like column colouring or odd/even colouring, but these are quite easy extensions.

I would be happy to be proven wrong though. How difficult would it be, in #40312, to run the test suite with DataFrame.to_html() replaced with the jinja2 implementation, just to see what breaks?

Quite easy, just need to redirect the method, when I push it I will ping you to take a look at test results.

@attack68
Contributor

And by the way, the fact that Styler._repr_html_() does not truncate data like DataFrame._repr_html_() does should probably be considered a bug.

Actually I think the opposite. The docstring for _repr_html_ states it is mainly for IPython / Jupyter, which has its own auto scrolling feature. I find it a real nuisance when pandas truncates my dataframes, so I always revert to the default df.style display because it shows everything. If you want to view a dataframe in a console don't use an html representation, no?

@toobaz
Member

toobaz commented Mar 11, 2021

The docstring for _repr_html_ states it is mainly for IPython / Jupyter, which has its own auto scrolling feature.

Sure, but passing the notebook a table with millions of rows will just make it crash, whether or not you scroll. We can discuss the optimal number of rows to show (notice that you can easily customize it), but I'm afraid "no limit" is not an option.

If you want to view a dataframe in a console don't use an html representation, no?

Sure, the point is indeed about notebooks.

@attack68
Contributor

Sure, but passing the notebook a table with millions of rows will just make it crash, whether or not you scroll. We can discuss the optimal number of rows to show (notice that you can easily customize it), but I'm afraid "no limit" is not an option.

Does pandas set a limit on the size of a DataFrame you can construct, or is its limit just naturally determined by system constraints? The same logic could be argued here, albeit one is inside native Python and the other is rendering in an external application like Jupyter in a browser (so the error might not be as obvious).

I have seen multiple use cases of wanting to visualise large tables: one is here, with the other up to 20,000 rows. To be honest that's the largest I've seen, so even if I'm not convinced a limit is necessary, I think having one above that would not have affected any use case I have seen so far. From memory that only took seconds to render, so I would be happy with that.

@toobaz
Member

toobaz commented Mar 12, 2021

I have seen multiple use cases of wanting to visualise large tables: one is here, with the other up to 20,000 rows.

I regularly use tables with a couple of million rows inside Jupyter and it's great to see them easily. I would hate to crash my notebook every time I view them without thinking about truncating them. I'm sure many people use pandas with much larger databases. Again, I think deprecating the truncated visualization is not an option. I might be wrong on the need to truncate Styler too, however, so we can leave that option out of this discussion.

@jorisvandenbossche
Member

Indeed, removing truncation from the default html repr is currently not an option I think (unless we would use a more advanced widget that eg does that automatically, but that's another discussion). There are already settings to change the number of rows to show, if you want to change this as a user.
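For reference, a small sketch of those existing settings, assuming the display.max_rows and display.min_rows options and the max_rows argument of DataFrame.to_html. The 100-row frame is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": range(100), "b": range(100)})

# Untruncated output: every row is rendered.
full_html = df.to_html()

# Notebook-style repr, truncated according to the display options.
with pd.option_context("display.max_rows", 10, "display.min_rows", 10):
    truncated_html = df._repr_html_()

# Explicit truncation through the to_html argument.
explicit_html = df.to_html(max_rows=10)

print(full_html.count("<tr"), truncated_html.count("<tr"), explicit_html.count("<tr"))
```

Calling _repr_html_() directly is only for demonstration; in a notebook it is invoked automatically when a DataFrame is displayed.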

So if we want to replace the to_html/_repr_html_ with Styler, the truncation functionality will need to be added to Styler (although I don't think that Styler needs to do that by default).

@attack68
Contributor

OK seems well supported, adding this to the list of things needed.

@jreback jreback modified the milestones: Contributions Welcome, 1.3 May 21, 2021
@jorisvandenbossche
Member

This wasn't really closed by #40312, which only added a Styler.to_html, and didn't implement the main to_html in terms of Styler

@attack68
Contributor

In #45382 I'm proposing changing the signature of DataFrame.to_latex to:

DataFrame.to_latex(hide, format, format_index, render_kwargs)

and this will perform the following:

DataFrame.style.hide(**hide).format(**format).format_index(**format_index).to_latex(**render_kwargs)

This has the advantages of:

  • converting the method to use the Styler implementation
  • not requiring updates to the argument signature of DataFrame.to_latex, since it passes the kwargs through
  • allowing a structured deprecation cycle in which all the existing args can be restructured into this format as documented.

Is this reasonable and would it be appropriate to aim for something similar with to_html for v2.0?


6 participants