REF: Use Styler implementation for DataFrame.to_latex #47970

Merged
merged 28 commits into from Jan 19, 2023

attack68
Contributor

@attack68 commented Aug 4, 2022

After a year of patching things up in #41649, and with @jreback's merge of #47864, I can finally propose this for review.

Objective

  • Maintain DataFrame.to_latex with its existing arguments, adding no arguments
  • Process the rendering via Styler.to_latex
  • Eliminate the need for LatexFormatter (code removal not part of this PR) and dual pandas code systems.
  • Redocument and direct users to the Styler implementation for forward development

Outcome

  • All arguments in DataFrame.to_latex were replicable, with the exception of col_space, which has no impact on the LaTeX render and which I personally don't like anyway. col_space was first deprecated with a warning and test; it is now removed.
  • All original tests pass with minor changes to the LaTeX formatting and no significant changes to the LaTeX render.
  • The default formatting of some floats changes because pandas Styler options and DataFrame options overlap; this should be addressed later.
  • The performance of Styler is marginally better, although for the table sizes one would typically render in LaTeX the difference is negligible anyway.
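To make "all arguments were replicable" concrete, here is a rough sketch (not the code in this PR) of how a few legacy DataFrame.to_latex keyword arguments could be expressed as Styler calls. The helper name `plan_styler_calls` and the chosen subset of arguments are assumptions for illustration; only the target methods `Styler.hide`, `Styler.format` and `Styler.to_latex` are real pandas API.

```python
from typing import Any, Dict, Optional


def plan_styler_calls(
    index: bool = True,
    na_rep: str = "NaN",
    escape: bool = True,
    longtable: bool = False,
    column_format: Optional[str] = None,
    caption: Optional[str] = None,
    label: Optional[str] = None,
    position: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: map a subset of legacy DataFrame.to_latex kwargs
    onto the Styler method calls that would reproduce them."""
    plan: Dict[str, Any] = {
        # index=False corresponds to Styler.hide(axis="index")
        "hide_index": not index,
        # missing-value text and LaTeX escaping are handled by Styler.format
        "format": {"na_rep": na_rep, "escape": "latex" if escape else None},
        # remaining options are keyword arguments of Styler.to_latex
        "to_latex": {},
    }
    if longtable:
        # Styler.to_latex renders a longtable via its `environment` argument
        plan["to_latex"]["environment"] = "longtable"
    for key, value in (
        ("column_format", column_format),
        ("caption", caption),
        ("label", label),
        ("position", position),
    ):
        if value is not None:
            plan["to_latex"][key] = value
    return plan
```

With pandas and jinja2 installed, such a plan could be applied as `styler = df.style`, then `styler.hide(axis="index")` when `hide_index` is true, `styler.format(**plan["format"])`, and finally `styler.to_latex(**plan["to_latex"])`.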

No whats_new: awaiting feedback.

New docs

The key new section of the docs:

[Screenshot (2022-08-04): the key new section of the docs]

@attack68 added this to the 1.5 milestone Aug 5, 2022
@attack68 added the Styler (conditional formatting using DataFrame.style) and IO LaTeX (to_latex) labels Aug 5, 2022
@attack68
Contributor Author

attack68 commented Aug 8, 2022

@ivanovmg @rhshadrach you both had input to the underlying issue so I think your opinion on the output here is very welcome.

@@ -351,6 +350,8 @@ def test_read_fspath_all(self, reader, module, path, datapath):
    ],
)
def test_write_fspath_all(self, writer_name, writer_kwargs, module):
    if writer_name in ["to_latex"]:  # uses Styler implementation
        pytest.importorskip("jinja2")
Member

So this new implementation will require jinja2 to be installed first?

Contributor Author

Correct. @jreback was fine with this being the case.

Member

No objection in having that as a requirement, but we regard this as a breaking change, yes?

Contributor Author

It is a breaking change, yes. Existing calls to DataFrame.to_latex will fail after this PR if the user does not have jinja2 installed.

Fallback option: since the PR does not remove the LatexFormatter, code could be redirected to it when jinja2 is not detected. This would then be non-breaking.
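A minimal sketch of that fallback idea, assuming the dispatch is done on whether jinja2 can be imported. `jinja2_available` and `render_latex` are hypothetical names, and the two callables stand in for the Styler-based renderer and the existing LatexFormatter path; this is not code from the PR.

```python
import importlib.util
from typing import Callable, Optional


def jinja2_available() -> bool:
    # find_spec returns None when no importable jinja2 package is found
    return importlib.util.find_spec("jinja2") is not None


def render_latex(
    styler_render: Callable[[], str],
    legacy_render: Callable[[], str],
    have_jinja2: Optional[bool] = None,
) -> str:
    """Use the Styler path when jinja2 is present, else fall back (hypothetical)."""
    if have_jinja2 is None:
        have_jinja2 = jinja2_available()
    return styler_render() if have_jinja2 else legacy_render()
```

The `have_jinja2` override exists only so both branches can be exercised in tests without uninstalling jinja2.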

Member

If the fallback behavior were implemented, then I think this PR would be okay for 1.5. If not, this could possibly be one of the "breaking without deprecation" behaviors for 2.0, if this change is "too large" (#44823).

Contributor Author

When thinking about doing this, I realised the tests I have altered would be a nightmare to maintain for a dual implementation, so if this needs to go to 2.0, I don't oppose that.

But then, for 2.0, I will probably look to change the argument signature of all of this anyway.

Member

What would the args be changed to? The current arguments should be deprecated.

pandas/core/generic.py (outdated, resolved)
@@ -3377,14 +3376,16 @@ def test_filepath_or_buffer_arg(
    filepath_or_buffer_id,
):
    df = DataFrame([data])
    if method in ["to_latex"]:  # uses styler implementation
Member

Does it make sense to simplify it to method == 'to_latex'?

Contributor Author

If other methods move over to Styler (the intention being to_html), we will need a list here anyway.
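A sketch of the extensible pattern being argued for, with a hypothetical module-level `STYLER_WRITERS` set standing in for the inline `["to_latex"]` list. Moving to_html over later would then only mean adding it to the set; neither name exists in pandas.

```python
# Hypothetical registry of DataFrame writer methods rendered via Styler.
# Styler rendering needs jinja2, so tests should skip these when it is missing.
STYLER_WRITERS = frozenset({"to_latex"})


def needs_jinja2(method: str) -> bool:
    """Return True if the named writer goes through Styler (and so jinja2)."""
    return method in STYLER_WRITERS
```

In a test this would read `if needs_jinja2(method): pytest.importorskip("jinja2")`.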

Member

@ivanovmg left a comment

Some of my comments so far.

Member

@rhshadrach left a comment

Need to take a more detailed look, but definitely +1 on the direction this is going. I have concerns on merging this prior to 2.0 though.

pandas/core/generic.py (resolved)
pandas/core/generic.py (outdated, resolved)
pandas/core/generic.py (outdated, resolved)

@attack68
Contributor Author

I have concerns on merging this prior to 2.0 though.

Can you share those? I'm ambivalent as to 1.5.0 versus 2.0, but here are some good reasons for 1.5.0:

  • It allows us to get feedback on these transition methods ahead of 2.0. Since to_html is also planned for the same transition, and I imagine it is a much more popular method, this feedback might be useful.
  • This implementation doesn't change the current arguments of DataFrame.to_latex, so it introduces minimal breaking changes and gives a smoother transition from 1.4 to 1.5 to 2.0.

Arguments for including in 2.0

  • Breaking issues do not matter as much.
  • The keyword arguments of DataFrame.to_latex can be changed (or rather simplified), which makes the function much simpler in code. (Importantly, they do not have to conform to the current arguments.)

@mroeschke modified the milestones: 1.5, 2.0 Aug 22, 2022
@github-actions
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions bot added the Stale label Sep 22, 2022
@rhshadrach
Member

rhshadrach commented Sep 24, 2022

Apologies, @attack68, for not getting back to you here.

I have concerns on merging this prior to 2.0 though.

Can you share those? I'm ambivalent as to 1.5.0 versus 2.0, but here are some good reasons for 1.5.0.

The concerns are the breaking changes this introduces, highlighted in my comments above. I think it would be okay to have the added requirement of jinja2 in 2.0 (#47970 (comment)). #47970 (comment) is still outstanding.

.. _whatsnew_200.api_breaking.to_latex:

DataFrame to LaTeX has a new render engine
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Member

Suggested change
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Contributor Author

Sphinx allows the underline to be exactly as long as, or longer than, the title text, so this isn't strictly necessary, but I committed it anyway.
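The rule being described can be written as a small check. This is a simplified sketch of the docutils section-title rule (one repeated punctuation character, at least as long as the title text), not Sphinx's own implementation; `underline_ok` is a hypothetical helper, and it counts characters rather than display width.

```python
import string


def underline_ok(title: str, underline: str) -> bool:
    """Check an RST section underline against its title (simplified rule)."""
    if not underline:
        return False
    char = underline[0]
    # The underline must repeat a single punctuation character throughout.
    if char not in string.punctuation or underline != char * len(underline):
        return False
    # It may match the title length exactly or run past it.
    return len(underline) >= len(title)
```

Under this rule, both an exact-length underline and a longer one are accepted for "DataFrame to LaTeX has a new render engine"; only a shorter one would be rejected.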


The pandas options below are no longer used and will be removed in future releases.
The alternative options giving similar functionality are indicated below:
- ``display.latex.escape``: replaced with ``styler.format.escape``,
Member

I think this indentation is causing this docbuild error:

2023-01-05T18:42:10.0685513Z /home/runner/work/pandas/pandas/doc/source/whatsnew/v2.0.0.rst:435: WARNING: Block quote ends without a blank line; unexpected unindent.

Contributor Author

Let's see if the pushed change fixes that error.

@datapythonista mentioned this pull request Jan 6, 2023
@mroeschke
Member

Also, as a reminder, could you remove any FutureWarning filters in the test suite that were added because of the original deprecation?
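For illustration, a stdlib sketch of why such filters are worth removing. `legacy_to_latex` is a hypothetical stand-in for the old code path that emitted a FutureWarning, not pandas code; in a pytest suite the same kind of filter may appear as a `pytest.mark.filterwarnings` marker instead of a `warnings` context manager. While the filter is active the warning is silently dropped, so a stale filter can also hide unrelated FutureWarnings.

```python
import warnings


def legacy_to_latex() -> str:
    """Hypothetical stand-in for the deprecated code path."""
    warnings.warn(
        "to_latex will use the Styler implementation in a future version",
        FutureWarning,
        stacklevel=2,
    )
    return "\\begin{tabular}{}\n\\end{tabular}\n"


def call_with_filter() -> list:
    # Mimics a test that silences FutureWarning: the kind of filter to remove.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        warnings.filterwarnings("ignore", category=FutureWarning)
        legacy_to_latex()
    return caught


def call_without_filter() -> list:
    # Without the filter, the FutureWarning is recorded and visible.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        legacy_to_latex()
    return caught
```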

@mroeschke
Member

Just noting that during yesterday's dev call we decided that this PR isn't necessarily a blocker for releasing 2.0, i.e. this FutureWarning will persist until 3.0 if the PR is not ready.

That being said, this PR seems close.

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
@attack68
Contributor Author

attack68 commented Jan 13, 2023

Just noting that during yesterday's dev call we decided that this PR isn't necessarily a blocker for releasing 2.0, i.e. this FutureWarning will persist until 3.0 if the PR is not ready.

That being said, this PR seems close.

Yes, if I can just get this to green, I don't think there is more to do.

Post-PR clean-up, such as removing redundant code and checking the warning filters, can be done after 2.0. Whilst it is not a blocker, I still think it would be good to get this in if possible.

@attack68
Contributor Author

@ivanovmg @rhshadrach @mroeschke this is greenish now. I believe the http doc build is unrelated.
Please take another look and check whether all your comments have been addressed.

pandas/core/generic.py (outdated, resolved)
pandas/core/generic.py (outdated, resolved)
@mroeschke merged commit 5e4ea2e into pandas-dev:main Jan 19, 2023
@mroeschke
Member

Thanks for the great work here @attack68!

So, to clarify the follow-ups:

  1. Removing any remaining related warning filters
  2. Removing the display options that are no longer relevant?

@attack68
Contributor Author

Yes, and removing the redundant LatexFormatter code.

@attack68 deleted the to_latex_styler_implement branch January 19, 2023 21:42
pooja-subramaniam pushed a commit to pooja-subramaniam/pandas that referenced this pull request Jan 25, 2023
…#47970)

* Base implementation

* Base implementation

* test fix up

* test fix up

* test fix up

* doc change

* doc change

* doc change

* mypy fixes

* ivanov doc comment

* ivanov doc comment

* rhshadrach reduction

* change text  from 1.5.0 to 2.0.0

* remove argument col_space and add whatsnew

* mroeschke requests

* mroeschke requests

* pylint fix

* Whats new text improvements and description added

* Update doc/source/whatsnew/v2.0.0.rst

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

* Update doc/source/whatsnew/v2.0.0.rst

* remove trailing whitespace

* remove trailing whitespace

* Whats new linting fixes

* mroeschke requests

Co-authored-by: JHM Darbyshire (iMac) <attack68@users.noreply.github.com>
Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
Labels
Deprecate (Functionality to remove in pandas), IO LaTeX (to_latex), Styler (conditional formatting using DataFrame.style)
5 participants