Problem with xlsx with merged cells and multiline header #320

mcarans · 2020-05-13T07:10:34Z

Overview

When trying to read the file below, tabulator reads the headers on rows 10-12 incorrectly:
http://mapipcissprd.us-east-1.elasticbeanstalk.com/api/public/population-tracking-tool/data/2017,2020/?page=1&limit=1&condition=A&export=true&country=AF

I used options headers=[10,12], fill_merged_cells=True:
stream = Stream('http://mapipcissprd.us-east-1.elasticbeanstalk.com/api/public/population-tracking-tool/data/2017,2020/?page=1&limit=1&condition=A&export=true&country=AF', headers=[10,12], fill_merged_cells=True, format='xlsx')
I also tried fill_merged_cells=False.

An example of what's missing can be seen in column K of the spreadsheet. The header should be for that one "Current Phase 1 #" but instead is just "Phase 1 #".

Pandas is able to read it using:
df = pandas.read_excel(url, header=[9, 10, 11])

Please preserve this line to notify @roll (lead of this repository)

The text was updated successfully, but these errors were encountered:

roll · 2020-05-19T07:08:26Z

Hi @mcarans,

There is some magic behind this issue because Openpyxl 3.0.3 gives this list of merged ranges:

A10:E11
W10:AL10
G11:H11
M11:N11
Q11:R11
U11:V11
AA11:AB11
AE11:AF11
AI11:AJ11
AM11:AN11
AS11:AT11
AW11:AX11
BA11:BB11

while "Current" and "Second projection" are definitely merged but at the same time definitely no on this list although "First projection" is here "W10:AL10".

I'm closing it, for now, could you please:

try to re-create this table in case it's somehow broken
or create a bug report for Openpyxl

?

mcarans · 2020-05-19T07:58:55Z

@roll I got a different result from openpyxl. I saved the link I gave in the OP as test.xlsx then:

wb = load_workbook('/home/mcarans/Downloads/test.xlsx')
sheet_ranges = wb['IPC']
print(sheet_ranges.merged_cells.ranges)```

[<CellRange A10:E11>, <CellRange G10:V10>, <CellRange W10:AL10>, <CellRange AM10:BB10>, <CellRange G11:H11>, <CellRange K11:L11>, <CellRange M11:N11>, <CellRange O11:P11>, <CellRange Q11:R11>, <CellRange S11:T11>, <CellRange U11:V11>, <CellRange W11:X11>, <CellRange AA11:AB11>, <CellRange AC11:AD11>, <CellRange AE11:AF11>, <CellRange AG11:AH11>, <CellRange AI11:AJ11>, <CellRange AK11:AL11>, <CellRange AM11:AN11>, <CellRange AQ11:AR11>, <CellRange AS11:AT11>, <CellRange AU11:AV11>, <CellRange AW11:AX11>, <CellRange AY11:AZ11>, <CellRange BA11:BB11>, <CellRange A13:D13>]

Please can you check.

roll · 2020-05-19T08:02:06Z

Is it opynpyxl 3.0.3? вт, 19 мая 2020 г., 10:59 Mike <notifications@github.com>:

…

@roll <https://github.com/roll> I got a different result from openpyxl. I saved the link I gave in the OP as test.xlsx then: wb = load_workbook('/home/mcarans/Downloads/test.xlsx') sheet_ranges = wb['IPC'] print(sheet_ranges.merged_cells.ranges)``` [<CellRange A10:E11>, <CellRange G10:V10>, <CellRange W10:AL10>, <CellRange AM10:BB10>, <CellRange G11:H11>, <CellRange K11:L11>, <CellRange M11:N11>, <CellRange O11:P11>, <CellRange Q11:R11>, <CellRange S11:T11>, <CellRange U11:V11>, <CellRange W11:X11>, <CellRange AA11:AB11>, <CellRange AC11:AD11>, <CellRange AE11:AF11>, <CellRange AG11:AH11>, <CellRange AI11:AJ11>, <CellRange AK11:AL11>, <CellRange AM11:AN11>, <CellRange AQ11:AR11>, <CellRange AS11:AT11>, <CellRange AU11:AV11>, <CellRange AW11:AX11>, <CellRange AY11:AZ11>, <CellRange BA11:BB11>, <CellRange A13:D13>] Please can you check. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#320 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAEICU4MNBSSNUBMTN2ULCDRSI345ANCNFSM4M7PKX3A> .

mcarans · 2020-05-19T09:07:10Z

@roll Yes I can confirm it is 3.0.3.

roll · 2020-05-19T09:34:12Z

@mcarans
And with tabulator@1.46.0 the bug is still here?

mcarans · 2020-05-19T11:14:21Z

@roll
The bug seems to still exist with 1.46 unfortunately. This still seems to be a problem:
An example of what's missing can be seen in column K of the spreadsheet. The header should be for that one "Current Phase 1 #" but instead is just "Phase 1 #".

roll · 2020-05-19T11:36:44Z

@mcarans
I hope I have found the problem. It seems self.__sheet.merged_cells.ranges was changed in-place during merging so some ranges were skipped

Could you please try tabulator@1.46.1

mcarans · 2020-05-19T12:40:00Z

Super, that looks good, thx @roll !

roll added the bug label May 18, 2020

roll mentioned this issue May 19, 2020

Fix the way we merge multi-line headers #323

Merged

roll closed this as completed in #323 May 19, 2020

mcarans mentioned this issue Apr 13, 2022

Incorrect reading of 3 line header that includes merged cells frictionlessdata/frictionless-py#1024

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Problem with xlsx with merged cells and multiline header #320

Problem with xlsx with merged cells and multiline header #320

mcarans commented May 13, 2020 •

edited

roll commented May 19, 2020 •

edited

mcarans commented May 19, 2020

roll commented May 19, 2020 via email

mcarans commented May 19, 2020

roll commented May 19, 2020

mcarans commented May 19, 2020

roll commented May 19, 2020

mcarans commented May 19, 2020

Problem with xlsx with merged cells and multiline header #320

Problem with xlsx with merged cells and multiline header #320

Comments

mcarans commented May 13, 2020 • edited

Overview

roll commented May 19, 2020 • edited

mcarans commented May 19, 2020

roll commented May 19, 2020 via email

mcarans commented May 19, 2020

roll commented May 19, 2020

mcarans commented May 19, 2020

roll commented May 19, 2020

mcarans commented May 19, 2020

mcarans commented May 13, 2020 •

edited

roll commented May 19, 2020 •

edited