Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Allow to assign/create custom body_style and header_style property to an instance of ExcelFormatter used in method df.to_excel() #53973

Closed
wants to merge 14 commits into from

Conversation

rmhowe425
Copy link
Contributor

@rmhowe425 rmhowe425 commented Jul 2, 2023

@rmhowe425 rmhowe425 marked this pull request as draft July 2, 2023 19:05
@rmhowe425
Copy link
Contributor Author

@rhshadrach I'm reading through the linked GitHub issue, as well as @attack68's comments. Do we want to use setter methods to declare values for body_style and header_style?

Or are we letting Styler be used to add and define all the custom style?

@attack68
Copy link
Contributor

attack68 commented Jul 3, 2023

My preferred option as stated in the issues was to remove the formatting added by default when doing df.to_excel(), and make the user use Styler to control all of the output formatting.

However, given that might annoy a number of users I would suggest a compromise.

  1. Adding properties and setters as per one of the issues.
  2. Default these default properties to no styling later in a major release and document how a user can set these properties to retrieve the original behaviour (bolding /bordering the header / index cells).

@@ -577,19 +577,32 @@ def __init__(
self.header = header
self.merge_cells = merge_cells
self.inf_rep = inf_rep
self._header_styledict[str, Any] = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be self._header_style: dict[str, Any] | None = None

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference between defaulting to None and {} here?

@header_style.setter
def header_style(self, val):
if not isinstance(val, dict):
return None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setters shouldn't return a value

@mroeschke mroeschke added Styler conditional formatting using DataFrame.style IO Excel read_excel, to_excel labels Jul 6, 2023
@rmhowe425 rmhowe425 marked this pull request as ready for review July 7, 2023 01:15
Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is overwriting a property on a class, replacing it with (presumably) a Python dictionary. More generally, it is relying on modifying global state but not by any standard approach (e.g. options). I think it is also requiring users to access a class that was previously internal, further confusing the public/private nature of pandas.io that I think we should be trying to clear up.

For these reasons, I'm -1 on this approach in its current state. Is using an option for this appropriate?


# Set Excel worksheet header and body style
>>> df = pd.DataFrame([{"A": 1, "B": 2, "C": 3}, {"A": 1, "B": 2, "C": 3}])
>>> pd.io.formats.excel.ExcelFormatter.header_style = {"font": {"bold": True},}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is doing what you might expect. In particular, this is not using getters/setters, but rather just overwriting a property.

class Foo:
    @property
    def foo(self):
        raise ValueError
        
    @foo.setter
    def foo(self, other):
        raise ValueError
        
print(Foo.foo)
# <property object at 0x7f5438093290>
Foo.foo = 5
print(Foo.foo)
# 5

Copy link
Contributor Author

@rmhowe425 rmhowe425 Jul 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rhshadrach @attack68

What do we think would be the appropriate implementation for this request then? It seems like the initial suggestion is not ideal.

What do we think about setting the default style to None for both header_style and body_style and have users rely on Styler to format output of to_excel as mentioned here?

@rhshadrach
Copy link
Member

rhshadrach commented Jul 12, 2023

@attack68 (from #53973 (comment))

However, given that might annoy a number of users I would suggest a compromise.

Is it only the removing of the default style that you feel might annoy users? Or being forced into the Styler in order to add style? Or something else?

My preferred option as stated in the issues was to remove the formatting added by default when doing df.to_excel(), and make the user use Styler to control all of the output formatting.

Pending a response to the above, I'm +1 on this approach. I think modifying global state should be avoided where ever possible, and this feature seems perfect for the Styler to me.

@attack68
Copy link
Contributor

Is it only the removing of the default style that you feel might annoy users? Or being forced into the Styler in order to add style? Or something else?

to_excel has been around for a while bolding the headers, perhaps users expect that by now? If we could deprecate that and use a keyword instead (like we do in to_latex and to_html) then I would prefer that, e.g. bold_rows=False/True. The method could re-route via the Styler implementation to do this (as it does for to_latex)

Since the Styler uses the underlying implementation of DataFrame.to_excel, it is a shame that an unstyled Styler ends up with bold headers by default because of the status quo.

Pending a response to the above, I'm +1 on this approach. I think modifying global state should be avoided where ever possible, and this feature seems perfect for the Styler to me.

It does seem unnecessary to add this to the state variables of the excel writer when Styler can already do this by default and with greater flexibility. Perhaps documenting users to use the Styler is the better approach that making the code more complicated.

@rmhowe425
Copy link
Contributor Author

@attack68 just to recap, add a parameter to to_excel that takes a boolean value representing whether the headers should be bolded. As you said, the value would propagate through the code so that the Styler implementation is responsible for formatting the output.

For the formatting of the spreadsheet body, that would be controlled via Styler as well.

@rhshadrach Are you good with this?

@attack68
Copy link
Contributor

@rmhowe425 I just reviewed the current arguments for DataFrame.to_excel. It is actually very complicated to add a bold_rows argument that uses the Styler implementation Not because this is complicated itself, but because you have to ensure all the other argument combinations that might used also result in the same excel file. So let me save you the bother now - don't attempt it.

@rhshadrach - If we remove styling from to_excel and do not bold or create borders for any cells do we need a deprecation cycle for that, even though it isn't actually an option it is just created by default? As part of the deprecation we could write notes in the docs how to replicate similar features with the Styler.to_excel example?

@rhshadrach
Copy link
Member

rhshadrach commented Jul 15, 2023

If we remove styling from to_excel and do not bold or create borders for any cells do we need a deprecation cycle for that, even though it isn't actually an option it is just created by default?

I'm not sure I would even consider this a "breaking" change; I don't believe removing bolding from the cells could ever break any workflow. Even if it is considered breaking, I would lean toward adding this to the list of breaking changes in 3.0 in #44823.

@rmhowe425
Copy link
Contributor Author

@rhshadrach I'll plan on updating my implementation so that spreadsheets have no formatting by default, and I'll update documentation so that we're telling users to use Styler for formatting.

This sounds like the changes that we agree on?

@rhshadrach
Copy link
Member

This sounds like the changes that we agree on?

This is what I've suggested, but I think we need to make sure there is a consensus among those involved before saying "we agree".

@attack68
Copy link
Contributor

I have created an issue to highlight this to move forward: #54154 54154

@rmhowe425
Copy link
Contributor Author

Closing PR as work is continuing on #54154

@rmhowe425 rmhowe425 closed this Jul 29, 2023
@rmhowe425 rmhowe425 deleted the dev/excel-formatter-style branch July 30, 2023 00:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO Excel read_excel, to_excel Styler conditional formatting using DataFrame.style
Projects
None yet
4 participants