ENH: support Styler in ExcelFormatter #15530

Closed
jnothman wants to merge 53 commits into master from jnothman/excel_style

Conversation

jnothman
Contributor

@jnothman jnothman commented Feb 28, 2017

Pandas is a nice way to export data from Python to Excel. It even has an internal representation of Excel cell styling. It also has a data-driven framework for styling Pandas dataframes with CSS.

This PR connects those features together in a rudimentary and experimental way, for advanced users only. It allows ExcelFormatter to accept a Styler in place of a DataFrame and to provide a style_converter callback for conversion from CSS styles to the styles understood by ExcelWriter.
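
As a rough illustration of the kind of callback meant here, the sketch below maps a CSS declaration string to a style dict of the shape used in the test further down. The function name, its single-string signature, and the choice to return None for unstyled cells are assumptions made for illustration, not the signature this PR defines.

```python
def style_converter(css):
    """Hypothetical callback: turn 'prop: value; ...' into an Excel style dict.

    Only 'background-color' with a '#rrggbb' value is handled in this sketch.
    """
    style = {}
    for declaration in css.split(';'):
        if ':' not in declaration:
            continue
        prop, value = declaration.split(':', 1)
        prop, value = prop.strip().lower(), value.strip().lower()
        if (prop == 'background-color' and value.startswith('#')
                and len(value) == 7):
            # Excel fills need a pattern type; the colour goes in fgColor.
            style['fill'] = {'fgColor': value[1:], 'patternType': 'solid'}
    # Unstyled cells get None rather than an empty dict.
    return style or None


print(style_converter('background-color: #c6c6c6'))
print(style_converter('color: red'))
```

Anything the sketch does not recognise is silently dropped, which is exactly the limitation a fuller converter would need to address.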

While I acknowledge that I am extending a hidden feature (to make it more usable), and that high-coverage conversion of CSS to Excel styles is no small feat, so all prospective usages will be a hack, I would enjoy using this feature and think others would too.

I wasn't sure how to test this as no tests directly instantiate ExcelFormatter. (I could of course alter the to_excel parameters to provide the new feature, but I thought it might best remain hidden. WDYT?)

A test follows:

import pandas as pd
from pandas.formats.format import ExcelFormatter


def test_style_converter():
    df = pd.DataFrame({'a': list(range(3))})

    header_style = '!unset'
    # default Styler provided
    formatter = ExcelFormatter(df)
    for cell in formatter.get_formatted_cells():
        if cell.row < 1 or cell.col < 1:
            if header_style == '!unset':
                header_style = cell.style
                assert header_style is not None
            else:
                assert cell.style == header_style
        else:
            assert cell.style is None

    styler = df.style.background_gradient(cmap='Greys', low=.5, high=0)
    expected = ['c6c6c6', '686868', '000000']
    formatter = ExcelFormatter(styler)
    for cell in formatter.get_formatted_cells():
        if cell.row < 1 or cell.col < 1:
            assert cell.style is header_style
        else:
            assert cell.style == {'fill': {'fgColor': expected[cell.row - 1],
                                           'patternType': 'solid'}}

    styler.to_excel('/tmp/out.xlsx', engine='openpyxl')
    with open('/tmp/out.html', 'w') as f:
        f.write(styler.render())

@@ -1902,11 +1907,26 @@ def _format_hierarchical_rows(self):
indexcolval, header_style)
gcolidx += 1

for cell in self._generate_body(coloffset=gcolidx):
Contributor Author

And if uninterested in merging the whole patch, could you consider merging just this refactoring, so that it's easier to extend ExcelFormatter?

@jreback
Contributor

jreback commented Feb 28, 2017

is this intended to address #1663?

@jnothman
Contributor Author

is this intended to address #1663?

I suppose so. And it seems I hacked a bit quickly and need to fix some test failures.

@jnothman
Contributor Author

I couldn't tell whether #1663 had anything to do with Styler objects.

@jreback
Contributor

jreback commented Feb 28, 2017

also have a look at comments #7565

@jnothman
Contributor Author

also have a look at comments #7565

I think I've got the gist. Obviously this builds on that support together with Styler

@codecov-io

codecov-io commented Feb 28, 2017

Codecov Report

Merging #15530 into master will increase coverage by 0.03%.
The diff coverage is 97.51%.

@@            Coverage Diff             @@
##           master   #15530      +/-   ##
==========================================
+ Coverage   90.79%   90.82%   +0.03%     
==========================================
  Files         156      159       +3     
  Lines       50534    50794     +260     
==========================================
+ Hits        45883    46135     +252     
- Misses       4651     4659       +8
Flag         Coverage Δ
#multiple    88.6% <97.51%> (+0.04%) ⬆️
#single      40.36% <17.6%> (-0.09%) ⬇️

Impacted Files                   Coverage Δ
pandas/io/formats/format.py      95.02% <100%> (-0.1%) ⬇️
pandas/io/formats/style.py       96.36% <100%> (+0.07%) ⬆️
pandas/core/frame.py             97.64% <100%> (-0.02%) ⬇️
pandas/io/formats/css.py         100% <100%> (ø)
pandas/io/formats/common.py      94.44% <94.44%> (ø)
pandas/io/formats/excel.py       96.64% <96.64%> (ø)
pandas/plotting/_converter.py    63.54% <0%> (-1.82%) ⬇️
pandas/core/series.py            94.97% <0%> (ø) ⬆️
... and 5 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f114af0...c7a51ca. Read the comment docs.

@jreback jreback added the IO Excel (read_excel, to_excel) and Output-Formatting (__repr__ of pandas objects, to_string) labels Feb 28, 2017
@jreback
Contributor

jreback commented Feb 28, 2017

can you show a picture of what this does? e.g. the excel picture.

@chris-b1
Contributor

While I like the idea of using Styler to define Excel styles, I'm not sure about this api. I do understand that this would be intended for advanced / experimental use.

I would prefer it if a default style_converter were provided and were extensible in some way. It could handle simple things like background color, text color, etc. by default.

@jnothman
Contributor Author

jnothman commented Feb 28, 2017 via email

@jnothman
Contributor Author

Sorted. I needed a patternType and to use fgColor rather than bgColor.

@jnothman
Contributor Author

I have added writing to the test in the PR description above... with paths in /tmp/ so not actually to be used in a test. Output is as follows:

[Screenshot of the resulting Excel output, taken 2017-03-01]

@chris-b1, in the PR description I have also drafted a CSSToExcelStyle class with an architecture for a more full-fledged converter. WDYT?

@jnothman
Contributor Author

jnothman commented Mar 1, 2017

Note that openpyxl cannot support conversion of the linear-gradient used for bar chart backgrounds without https://bitbucket.org/openpyxl/openpyxl/issues/771 being fixed.

@jnothman
Contributor Author

jnothman commented Mar 1, 2017

If we go down @chris-b1's path, can we have a dependency on tinycss or tinycss2 when this feature is used to avoid replicating a full-fledged CSS parser here?

Also, what would be the public API? df.style.apply(...).to_excel() seems most natural.

@jnothman
Contributor Author

jnothman commented Mar 1, 2017

I do have some concern that exporting to HTML and using Excel's import may be appropriate and much simpler in many cases.

@jreback
Contributor

jreback commented Mar 1, 2017

@jnothman would not be a problem to have an optional dep like that

@chris-b1
Contributor

chris-b1 commented Mar 1, 2017

@jnothman - thanks - a couple thoughts.

Your CSSToExcelStyle mockup is basically what I was thinking - though I would probably make the handlers return a dict of style rather than mutating. My concern was that a relatively complex callback is too hard to think about (at least for me!) - something like that, which can be used as an example / extended is probably friendlier.

And to be clear, I personally would be fine initially supporting a tiny subset of CSS (even just font color and background). Could take the dependency if needed, although I'm not sure how much it would really add in this case?

@jreback
Contributor

jreback commented Mar 1, 2017

df.style.apply(...).to_excel() would be nice.

we de-facto have .to_html() now via _repr_html_

you might want to define this new style sheet as _repr_excel_

@TomAugspurger

@jnothman
Contributor Author

jnothman commented Mar 1, 2017

Thanks @jreback for the feedback on API.

@chris-b1:

though I would probably make the handlers return a dict of style rather than mutating.

I considered this. I mutate in order to take advantage of setdefault. This allows setting the patternType to 'solid' only if it was not set to something else. On the other hand, this may be construed as ExcelWriter having a bug in that it doesn't default to patternType='solid'. I would rather not have to modify ExcelWriter every time something like this is needed, so mutating is best, IMO.
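
A minimal sketch of the setdefault argument, assuming a handler that receives the style dict and mutates it in place. The handler name and the 'darkGrid' preset are illustrative assumptions, not code from this PR.

```python
def handle_background_color(value, style):
    """Hypothetical handler: mutate `style` in place for a background colour."""
    fill = style.setdefault('fill', {})
    fill['fgColor'] = value.lstrip('#')
    # Default to a solid pattern only if nothing else was chosen earlier.
    fill.setdefault('patternType', 'solid')


fresh = {}
handle_background_color('#ff0000', fresh)

# A style whose pattern was already set elsewhere keeps that pattern.
preset = {'fill': {'patternType': 'darkGrid'}}
handle_background_color('#ff0000', preset)

print(fresh)   # {'fill': {'fgColor': 'ff0000', 'patternType': 'solid'}}
print(preset)  # {'fill': {'patternType': 'darkGrid', 'fgColor': 'ff0000'}}
```

A handler that returned a fresh dict instead would need a separate merge step to get the same "only if unset" behaviour.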

And to be clear, I personally would be fine initially supporting a tiny subset of CSS (even just font color and background). Could take the dependency if needed, although I'm not sure how much it would really add in this case?

I'm not yet certain about the need for a CSS parser, but as is the way of these things, you can implement something that works for the limited cases you've thought of, but then someone will break it, even just by specifying "red" or "hsl(0, 100%, 50%)" or "rgb(255, 0, 0)" or "rgba(255, 0, 0, 100)" instead of "#ff0000", let alone by inserting a comment or unexpected use of whitespace, etc.
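
To make the point concrete, here is a small sketch that normalises several of those spellings to a six-digit hex string using only the standard library. The function name and the three-entry table of named colours are assumptions for illustration; rgba() and many other valid forms are deliberately left unhandled and return None.

```python
import colorsys
import re

# Tiny illustrative subset of the CSS named colours.
NAMED = {'red': 'ff0000', 'black': '000000', 'white': 'ffffff'}


def css_color_to_hex(value):
    """Return 'rrggbb' for a few CSS colour spellings, else None."""
    value = value.strip().lower()
    if value in NAMED:
        return NAMED[value]
    m = re.fullmatch(r'#([0-9a-f]{6})', value)
    if m:
        return m.group(1)
    m = re.fullmatch(r'#([0-9a-f]{3})', value)
    if m:
        # '#f00' is shorthand for '#ff0000'.
        return ''.join(ch * 2 for ch in m.group(1))
    m = re.fullmatch(r'rgb\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\)', value)
    if m:
        r, g, b = (min(int(x), 255) for x in m.groups())
        return '%02x%02x%02x' % (r, g, b)
    m = re.fullmatch(r'hsl\(\s*(\d+)\s*,\s*(\d+)%\s*,\s*(\d+)%\s*\)', value)
    if m:
        h, s, l = (int(x) for x in m.groups())
        s, l = min(s, 100), min(l, 100)
        # colorsys takes hue, lightness, saturation (in that order), all in 0..1.
        rgb = colorsys.hls_to_rgb((h % 360) / 360.0, l / 100.0, s / 100.0)
        return '%02x%02x%02x' % tuple(int(round(c * 255)) for c in rgb)
    return None


for spelling in ['red', '#ff0000', '#F00', 'rgb(255, 0, 0)',
                 'hsl(0, 100%, 50%)']:
    print(spelling, '->', css_color_to_hex(spelling))
```

Even this short function needs five branches for one colour, which is the maintenance burden being weighed against a parser dependency.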

I personally would be fine initially supporting a tiny subset of CSS (even just font color and background)

Font and background colour seem a good place to start, followed by borders. Ideally we'd support the bar charting provided by Styler, but will have to wait on openpyxl for that.

@jnothman
Contributor Author

jnothman commented Mar 1, 2017

I've attempted to implement a few handlers. Tricky parsing cases include font, which can be composed of identifiers, strings for font name (comma separated for a back-off strategy which won't be handled in Excel except to set font family) and numeric quantities (for size). Functions may also be present, at least in theory, and while we would ignore them, splitting on whitespace becomes insufficient. The many expressions of font would still need handling with tinycss, but at least it would identify the token types and handle string escaping.
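
A small sketch of why whitespace splitting falls short for font values, with a regex tokenizer that keeps quoted font names intact. The tokenizer is an illustrative assumption, not what tinycss does, and it makes no attempt to handle function notation.

```python
import re

TOKEN = re.compile(r'''
    "(?:[^"\\]|\\.)*"      # double-quoted string, allowing escapes
  | '(?:[^'\\]|\\.)*'      # single-quoted string, allowing escapes
  | ,                      # comma separating fallback font names
  | [^\s,"']+              # identifier or numeric quantity
''', re.VERBOSE)


def tokenize(value):
    """Split a CSS value into strings, commas and bare words."""
    return TOKEN.findall(value)


value = 'bold 12pt "Times New Roman", serif'
print(value.split())    # the quoted font name is broken into three pieces
print(tokenize(value))  # the quoted font name survives as one token
```

The tokens still have to be classified (weight, size, family) afterwards, which is where the remaining complexity of the font shorthand lives.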

Tricky not in terms of parsing but in terms of conversion are elements that require state to be stored that is not naturally represented in the Excel style dicts. One (not very useful) example is that text-decoration-style may indicate single, double etc, but it applies to underline, overline (not supported in Excel) or line-through (double not supported in openpyxl) depending on text-decoration-line. So we can provide limited support, or need to maintain state that is not part of the output.

@jnothman
Contributor Author

jnothman commented Mar 1, 2017

I have wasted far too much time on this, but I think the principled way to go about CSS to Excel conversion for a list of CSS declarations is:

  1. remove comments, normalize whitespace, tokenize and parse function-argument structures
  2. expand CSS shorthands (e.g. font, border-width) to atomic properties
  3. override earlier values for same property with later values
  4. resolve values relative to parent/initial settings (does not need to be fully fledged for our application, but font-size: 1.5em needs to be interpreted)
  5. compute excel styles with reference to calculated CSS properties (rather than by processing each CSS declaration in turn)
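
The steps above can be sketched roughly as follows. This is a deliberately small illustration under stated assumptions: all function names are hypothetical, only border-width is expanded, only em font sizes are resolved (against an assumed inherited size of 12pt), and the output keys 'size' and 'bold' are assumed to be what the Excel style dict expects.

```python
import re

SIDES = ('top', 'right', 'bottom', 'left')


def parse_declarations(css):
    """Step 1 (simplified): drop comments, yield (property, value) pairs."""
    css = re.sub(r'/\*.*?\*/', '', css, flags=re.DOTALL)
    for decl in css.split(';'):
        prop, sep, value = decl.partition(':')
        if sep and value.strip():
            yield prop.strip().lower(), ' '.join(value.split())


def expand_shorthand(prop, value):
    """Step 2: expand 'border-width' into four atomic properties."""
    if prop != 'border-width':
        yield prop, value
        return
    parts = value.split()
    # CSS maps 1 to 4 values onto top, right, bottom, left.
    index = {1: (0, 0, 0, 0), 2: (0, 1, 0, 1),
             3: (0, 1, 2, 1), 4: (0, 1, 2, 3)}
    if len(parts) not in index:
        return  # invalid shorthand: ignore the declaration
    for side, i in zip(SIDES, index[len(parts)]):
        yield 'border-%s-width' % side, parts[i]


def resolve(css, inherited_font_size_pt=12.0):
    """Steps 3 and 4: later declarations win; em sizes become pt."""
    props = {}
    for prop, value in parse_declarations(css):
        for atomic, atomic_value in expand_shorthand(prop, value):
            props[atomic] = atomic_value
    m = re.fullmatch(r'(\d+(?:\.\d+)?|\.\d+)em', props.get('font-size', ''))
    if m:
        size_pt = float(m.group(1)) * inherited_font_size_pt
        props['font-size'] = '%gpt' % size_pt
    return props


def to_excel_style(props):
    """Step 5: build the style dict from the resolved properties."""
    style = {}
    m = re.fullmatch(r'(\d+(?:\.\d+)?)pt', props.get('font-size', ''))
    if m:
        style.setdefault('font', {})['size'] = float(m.group(1))
    if props.get('font-weight') == 'bold':
        style.setdefault('font', {})['bold'] = True
    return style


css = ('font-size: 1em; /* overridden below */ font-weight: bold; '
       'font-size: 1.5em; border-width: 1pt 2pt')
props = resolve(css)
print(props)
print(to_excel_style(props))
```

Working from the resolved property dict rather than declaration by declaration is what lets the last step ignore ordering and shorthand concerns entirely.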

A full CSS interpreter would also deal with matching selectors to contexts and ordering declarations by specificity.

xhtml2pdf does some of these things in a different context, but I can't at a glance commend its implementation. I haven't found other Python implementations of this CSS resolution process.

@jreback
Contributor

jreback commented Apr 3, 2017

status on this @chris-b1

@chris-b1
Contributor

chris-b1 commented Apr 3, 2017

@jnothman - thoughts? I'm not opposed to exposing something small / experimental, like your top comment.

@jnothman
Contributor Author

jnothman commented Apr 3, 2017 via email

@jreback jreback added this to the 0.20.0 milestone Apr 19, 2017
@@ -858,6 +847,53 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Export to Excel\n",
"\n",
"*New in version 0.20.0*\n",
Contributor

you might want to add an Experimental statement somewhere

Contributor Author

There's one right below, isn't there?

Contributor

oh i c, ok then.

@jreback
Contributor

jreback commented Apr 19, 2017

@jnothman ok if you'd make those small changes. ping on green.

@jnothman
Contributor Author

jnothman commented Apr 19, 2017 via email

@jreback jreback closed this in 1b52b12 Apr 20, 2017
@jreback
Contributor

jreback commented Apr 20, 2017

thanks, very nice patch!

watch the dev docs: http://pandas-docs.github.io/pandas-docs-travis/ to see what the built docs look like; if needed please issue a followup. (this could take an hour or so).

@TomAugspurger
Contributor

Thanks @jnothman!

@jnothman
Contributor Author

thanks, very nice patch!

Oh I hope so! I kept wondering if maybe we should just export HTML and ask Google to convert it to Excel... But I'd much rather this interface, personally!

@jnothman
Contributor Author

Thanks a lot for the reviews and approvals.

import os
os.remove('styled.xlsx')

See the :ref:`Style documentation <style>` for more detail.
Contributor Author

This ref isn't working. @TomAugspurger, I assume this is related to the nbsphinx changes. There is another ref to style in 0.17.1

analyticalmonk pushed a commit to analyticalmonk/pandas that referenced this pull request Apr 20, 2017
closes pandas-dev#1663

Author: Joel Nothman <joel.nothman@gmail.com>

Closes pandas-dev#15530 from jnothman/excel_style and squashes the following commits:

c7a51ca [Joel Nothman] Test currently fails on openpyxl1 due to version incompatibilities
836f39e [Joel Nothman] Revert changes to xlwt
de53808 [Joel Nothman] Remove debug code
a5d51f9 [Joel Nothman] Merge branch 'master' into excel_style
934df06 [Joel Nothman] Display df, not styled
6465913 [Joel Nothman] More pytest-like test_styler_to_excel; enhancements to xlwt
6168765 [Joel Nothman] Recommended changes to what's new
9669d7d [Joel Nothman] Require jinja in test with df.style
14035c5 [Joel Nothman] Merge branch 'master' into excel_style
3071bac [Joel Nothman] Complete tests
ceb9171 [Joel Nothman] reasons for xfails
e2cfa77 [Joel Nothman] Test Styler.to_excel
d5db0ac [Joel Nothman] Remove obsolete TODO
0256fc6 [Joel Nothman] Return after unhandled font size warning
60d6a3b [Joel Nothman] add doc/source/styled.xlsx to the gitignore
4e72993 [Joel Nothman] Fix what's new heading
d144fdf [Joel Nothman] Font name strings
61fdc69 [Joel Nothman] Complete testing basic CSS -> Excel conversions
6ff8a46 [Joel Nothman] Fix loose character; sorry
6d3ffc6 [Joel Nothman] Lint
79eae41 [Joel Nothman] Documentation tweaks
c4f59c6 [Joel Nothman] Doc tweaks
2c3d015 [Joel Nothman] Fix JSON syntax in IPynb
b1d774b [Joel Nothman] What's new heading
096f26c [Joel Nothman] Merge remote-tracking branch 'upstream/master' into excel_style
433be03 [Joel Nothman] Documentation
9a62699 [Joel Nothman] Fix tests and add TODOs to tests
7c54a69 [Joel Nothman] Fix test failures; avoid hair border which renders strangely
8e9a567 [Joel Nothman] Fixes from integration testing
c1fc232 [Joel Nothman] Remove debugging print statements
a43d6b7 [Joel Nothman] Cleaner imports
a1127f6 [Joel Nothman] Merge branch 'master' into excel_style
306eebe [Joel Nothman] Module-level docstring
350eab5 [Joel Nothman] remove spurious blank line
efce9b6 [Joel Nothman] More CSS to Excel testing; define ExcelFormatter.write
f17a0f4 [Joel Nothman] Some border style tests
1a8818f [Joel Nothman] Lint
9a5b791 [Joel Nothman] Fix testing ImportError
1984cab [Joel Nothman] Fix making get_level_lengths non-private
eb02cc1 [Joel Nothman] Fix testing ImportError
3b26087 [Joel Nothman] Make get_level_lengths non-private
f62f02d [Joel Nothman] File restructure
dc953d4 [Joel Nothman] Font size and border width
7db59c0 [Joel Nothman] Test inherited styles in converter
d103f61 [Joel Nothman] Refactoring and initial tests for CSS to Excel
176e51c [Joel Nothman] Fix NameError
c589c35 [Joel Nothman] Fix some lint errors (yes, the code needs testing)
cb5cf02 [Joel Nothman] Fix bug where inherited not being passed; avoid classmethods
0ce72f9 [Joel Nothman] Use inherited font size for em_pt
8780076 [Joel Nothman] Merge branch 'master' into excel_style
96680f9 [Joel Nothman] Largely complete CSSToExcelConverter and Styler.to_excel()
f1cde08 [Joel Nothman] FIX column offset incorrect in refactor
ada5101 [Joel Nothman] ENH: support Styler in ExcelFormatter
@jnothman
Contributor Author

I'm interested in making a separate package of the CSS handling here, in part so that it can be extended without small changes to Pandas, and in part so that it is available to be used elsewhere. Is there a nice way to avoid synchrony issues?

@jnothman
Contributor Author

css.py and test_css.py are now living (under different names) at https://github.com/jnothman/cssdecl

@jreback
Contributor

jreback commented Apr 26, 2017

what you would do is release a version of this package

then in pandas we would do an import of this package and use its api

this would have to be for 0.21.0 (which is starting soon actually)

@jnothman
Contributor Author

jnothman commented Apr 26, 2017 via email

@jreback
Contributor

jreback commented Apr 26, 2017

no that's exactly what you do
release 0.1 and we can pin in pandas for now

@jnothman
Contributor Author

jnothman commented Apr 26, 2017 via email

@jnothman
Contributor Author

jnothman commented Apr 27, 2017

I assume you don't want me to include a hard dependency on pypi's cssdecl, but make it an optional dependency like openpyxl is? If so it seems hard to require a particular version when I'm expecting backwards incompatible changes.
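
For reference, a generic sketch of the optional-dependency pattern being discussed: import lazily, and fail with a helpful message only when the feature is actually used. The helper name and message wording are assumptions for illustration, not pandas internals, and the sketch does not attempt any version check.

```python
import importlib


def import_optional(name, feature):
    """Import `name` on demand, explaining which feature needs it."""
    try:
        return importlib.import_module(name)
    except ImportError:
        raise ImportError(
            '%s is required for %s; install it to use this feature'
            % (name, feature))


# A module that is present is returned as usual.
json = import_optional('json', 'the demo')
print(json.dumps([1, 2]))

# A missing module only raises when the feature is requested.
try:
    import_optional('no_such_module_for_demo', 'styled Excel export')
except ImportError as exc:
    print(exc)
```

Pinning a version range on top of this would need an explicit comparison against the imported module's version, which is where backwards-incompatible releases become awkward.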

@jreback
Contributor

jreback commented Apr 27, 2017

yes

normally u start versions > 0 iow 0.1

@jnothman
Contributor Author

Fyi, I'm having trouble with conda build: https://groups.google.com/a/continuum.io/forum/m/#!topic/conda/yKDGIMZZl2o

pcluo pushed a commit to pcluo/pandas that referenced this pull request May 22, 2017
Labels
IO Excel (read_excel, to_excel), Output-Formatting (__repr__ of pandas objects, to_string)
Development

Successfully merging this pull request may close these issues.

Styling in DataFrame.to_excel
5 participants