Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: XlsxWriter ignoring formats on numpy types if merged cells #27006

Merged
merged 11 commits into from Jun 28, 2019

Conversation

@riptusk331
Copy link
Contributor

commented Jun 23, 2019

The XlsxWriter write_cells method takes a cells parameter that is itself derived from a generator of the various cells in the DataFrame. It calls the _value_with_format() to convert any numpy types to Python types for the Excel writer.

In the section of code that deals with writing merged cells, the original passed cell.val parameter was being passed into the writer, rather than the val returned from the _value_with_format() function. This
caused incompatible numpy or Pandas types to get passed into the writer when the condition of being in a 'merged' (grouped) DataFrame cell was met.

In my case, I had a Pandas Period object in the MultiIndex of a grouped DataFrame . All that needed to be done to fix this was simply removing the cell. from cell.val that was being passed into wks.merge_range() function, and replacing it with the correct val, which is 'type safe' for the Excel writer.

  • closes #26999
  • tests added / passed
  • whatsnew entry
BUG: XlsxWriter ignoring formats on numpy types if merged cells
The write_cells method takes a 'cells' parameter that is itself derived
from a generator of the various cells in the dataframe. It calls
_value_with_format to convert any numpy types to Python types for the
Excel writer. In the section of code that deals with writing merged
cells, the original 'cell.val' parameter was being passed into the
writer, rather than the 'val' returned from the format function. This
caused incompatible numpy or Pandas formats to get passed into the
writer when they were in a 'merged' (grouped) DataFrame cell. In my
case I had a Period object in the DataFrame index. All that needed to
be done was simply removing 'cell.'
@simonjayhawkins

This comment has been minimized.

Copy link
Member

commented Jun 23, 2019

@riptusk331 Thanks for the PR.

can you add tests and a whatsnew.

@WillAyd

This comment has been minimized.

Copy link
Member

commented Jun 23, 2019

Nice catch! Yea as mentioned I think just need a test and a whatsnew for v0.25 and can get this in

@WillAyd WillAyd added this to the 0.25.0 milestone Jun 23, 2019

@riptusk331

This comment has been minimized.

Copy link
Contributor Author

commented Jun 23, 2019

Never written a proper test before, so giving the docs a read and will try to push something this week.

@jreback

This comment has been minimized.

Copy link
Contributor

commented Jun 27, 2019

can you add a test here

@riptusk331

This comment has been minimized.

Copy link
Contributor Author

commented Jun 27, 2019

yes trying to get to this tomorrow

@pep8speaks

This comment has been minimized.

Copy link

commented Jun 27, 2019

Hello @riptusk331! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-06-28 00:43:56 UTC
@codecov

This comment has been minimized.

Copy link

commented Jun 27, 2019

Codecov Report

Merging #27006 into master will decrease coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #27006      +/-   ##
==========================================
- Coverage   91.99%   91.98%   -0.01%     
==========================================
  Files         180      180              
  Lines       50774    50774              
==========================================
- Hits        46712    46707       -5     
- Misses       4062     4067       +5
Flag Coverage Δ
#multiple 90.63% <ø> (ø) ⬆️
#single 41.83% <ø> (-0.11%) ⬇️
Impacted Files Coverage Δ
pandas/io/excel/_xlsxwriter.py 94.36% <ø> (ø) ⬆️
pandas/io/gbq.py 88.88% <0%> (-11.12%) ⬇️
pandas/core/frame.py 96.89% <0%> (-0.12%) ⬇️
pandas/util/testing.py 90.84% <0%> (-0.11%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b4d4ec5...4d5eab1. Read the comment docs.

@codecov

This comment has been minimized.

Copy link

commented Jun 27, 2019

Codecov Report

❗️ No coverage uploaded for pull request base (master@08a599b). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #27006   +/-   ##
=========================================
  Coverage          ?   92.04%           
=========================================
  Files             ?      180           
  Lines             ?    50714           
  Branches          ?        0           
=========================================
  Hits              ?    46679           
  Misses            ?     4035           
  Partials          ?        0
Flag Coverage Δ
#multiple 90.68% <ø> (?)
#single 41.87% <ø> (?)
Impacted Files Coverage Δ
pandas/io/excel/_xlsxwriter.py 94.36% <ø> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 08a599b...b16397d. Read the comment docs.

riptusk331 added some commits Jun 27, 2019

@WillAyd
Copy link
Member

left a comment

Can you also add a whatsnew note for v0.25?

Show resolved Hide resolved pandas/tests/io/excel/test_xlsxwriter.py Outdated
Show resolved Hide resolved pandas/tests/io/excel/test_xlsxwriter.py Outdated
Show resolved Hide resolved pandas/tests/io/excel/test_xlsxwriter.py Outdated
Show resolved Hide resolved pandas/tests/io/excel/test_xlsxwriter.py Outdated
@riptusk331

This comment has been minimized.

Copy link
Contributor Author

commented Jun 27, 2019

whatsnew added

Show resolved Hide resolved pandas/tests/io/excel/test_writers.py
Show resolved Hide resolved pandas/tests/io/excel/test_xlsxwriter.py Outdated
expected.to_excel(self.path)
result = pd.read_excel(self.path, header=[0, 1], index_col=0)
# need to convert PeriodIndexes to standard Indexes for assert equal
expected.columns.set_levels([[str(i) for i in mi.levels[0]],

This comment has been minimized.

Copy link
@WillAyd

WillAyd Jun 27, 2019

Member

Can just simplify this as expected.columns = expected.columns.astype(object) - Period won't be lossless to / from Excel so makes sense

This comment has been minimized.

Copy link
@WillAyd

WillAyd Jun 27, 2019

Member

Simplify this as stated

This comment has been minimized.

Copy link
@riptusk331

riptusk331 Jun 28, 2019

Author Contributor

This simplication doesn't seem to work. Comparison fails - the Index read back in isn't preserved as a Period.

E  MultiIndex level [0] classes are not equivalent
E  [left]:  PeriodIndex(['2018', '2018'], dtype='period[A-DEC]', freq='A-DEC')
E  [right]: Index(['2018', '2018'], dtype='object')
Show resolved Hide resolved pandas/tests/io/excel/test_writers.py Outdated
Show resolved Hide resolved pandas/tests/io/excel/test_writers.py Outdated
@jreback

This comment has been minimized.

Copy link
Contributor

commented Jun 27, 2019

@riptusk331 looks like a linting error, can you fixup and repush.

@WillAyd
Copy link
Member

left a comment

Just minor comments to improve coverage and readability but lgtm once resolved

Show resolved Hide resolved pandas/tests/io/excel/test_writers.py Outdated
Show resolved Hide resolved pandas/tests/io/excel/test_writers.py Outdated
mi = MultiIndex.from_tuples([(pd.Period('2018'), pd.Period('2018Q1')),
(pd.Period('2018'), pd.Period('2018Q2'))])
expected = DataFrame(np.random.rand(2, len(mi)), columns=mi)
expected.to_excel(self.path)

This comment has been minimized.

Copy link
@WillAyd

WillAyd Jun 27, 2019

Member

Once you add the merge_cells fixture do merge_cells=merge_cells as part of the call here

This comment has been minimized.

Copy link
@riptusk331

riptusk331 Jun 28, 2019

Author Contributor

So if I do this, won't this always fail the assert comparison? When this is False, it creates a single cell that contains both levels in the excel file (for instance '2018.2018Q2'). When I read this back in, it's read in as a single column Index and fails comparing against the original MultiIndex.

expected.to_excel(self.path)
result = pd.read_excel(self.path, header=[0, 1], index_col=0)
# need to convert PeriodIndexes to standard Indexes for assert equal
expected.columns.set_levels([[str(i) for i in mi.levels[0]],

This comment has been minimized.

Copy link
@WillAyd

WillAyd Jun 27, 2019

Member

Simplify this as stated

@riptusk331

This comment has been minimized.

Copy link
Contributor Author

commented Jun 28, 2019

@jreback fixed linting and repushed.

@WillAyd This passes all pytest tests as I pushed it, but I was not able to implement all your suggestions without failing tests for the reasons mentioned in the comments above. Namely using merge_cells in the call and the column type setting simplification. Thinking of how to best resolve, but wanted to get most current working iteration pushed.

@WillAyd
Copy link
Member

left a comment

lgtm @jreback

@jreback jreback merged commit c3133db into pandas-dev:master Jun 28, 2019

14 checks passed

codecov/patch Coverage not affected.
Details
codecov/project 91.83% (target 82%)
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
pandas-dev.pandas Build #20190628.2 succeeded
Details
pandas-dev.pandas (Checks) Checks succeeded
Details
pandas-dev.pandas (Docs) Docs succeeded
Details
pandas-dev.pandas (Linux py35_compat) Linux py35_compat succeeded
Details
pandas-dev.pandas (Linux py36_locale_slow) Linux py36_locale_slow succeeded
Details
pandas-dev.pandas (Linux py36_locale_slow_old_np) Linux py36_locale_slow_old_np succeeded
Details
pandas-dev.pandas (Linux py37_locale) Linux py37_locale succeeded
Details
pandas-dev.pandas (Linux py37_np_dev) Linux py37_np_dev succeeded
Details
pandas-dev.pandas (Windows py36_np15) Windows py36_np15 succeeded
Details
pandas-dev.pandas (Windows py37_np141) Windows py37_np141 succeeded
Details
pandas-dev.pandas (macOS py35_macos) macOS py35_macos succeeded
Details
@jreback

This comment has been minimized.

Copy link
Contributor

commented Jun 28, 2019

thanks @riptusk331

@riptusk331

This comment has been minimized.

Copy link
Contributor Author

commented Jun 28, 2019

@jreback glad to contribute and look forward to helping out more. will delete this branch shortly.

@WillAyd thanks for guiding me! when you have time, would love to get input on how my test code could be further improved over what i pushed, as i was unable to figure out how to properly implement 2 of your suggestions (or was possibly not fully understanding what you were looking for / what's proper in a unit test). cheers.

@riptusk331 riptusk331 deleted the riptusk331:period-excel-fix branch Jun 28, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.