Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support missing fonts and <dimension> tags #10

Closed
wants to merge 1 commit into from

Conversation

SGrondin
Copy link

@SGrondin SGrondin commented Apr 5, 2018

Hi, thank you for making and maintaining saxlsx, it's a wonderful library.

I've recently encountered an XLSX file that Excel, LibreOffice, Numbers and the Creek library were able to handle correctly, but not saxlsx. The file itself is incorrect, and it would be great if saxlsx was able to tolerate these small errors.

There were 2 issues, this PR fixes both and includes a test XLSX that contains both errors.

Missing fonts

In xl/styles.xml:

<cellXfs count="3">
  <xf applyBorder="1" applyFill="1" applyFont="1" borderId="0" fillId="0" fontId="1" numFmtId="0"/>
  <xf applyAlignment="1" applyBorder="1" applyFill="1" applyFont="1" applyNumberFormat="1" borderId="0" fillId="0" fontId="2">
    <alignment horizontal="left" readingOrder="1" textRotation="0" vertical="top" wrapText="1"/>
  </xf>
  <xf applyAlignment="1" applyBorder="1" applyFill="1" applyFont="1" applyNumberFormat="1" borderId="0" fillId="0" fontId="2" numFmtId="82">
    <alignment horizontal="left" readingOrder="1" textRotation="0" vertical="top" wrapText="1"/>
  </xf>
</cellXfs>

Notice how one of the <xf> tags is missing numFmtId. Probably related to its fonts:

<fonts count="3">
  <font>
    <sz val="11"/>
    <color rgb="FF000000"/>
    <name val="Calibri"/>
    <family val="2"/>
    <scheme val="minor"/>
  </font>
  <font/>
  <font>
    <b val="0"/>
    <i val="0"/>
    <strike val="0"/>
    <u val="none"/>
    <sz val="8"/>
    <color rgb="FF000000"/>
    <name val="Arial"/>
  </font>
</fonts>

This leads to @num_fmt_id being nil on line 36 in lib/saxlsx/style_collection_parser.rb. My patch simply calls to_i on @num_fmt_id which turns any nil into 0.

Missing <dimension> tag

I think using the <dimension> tag is a brilliant way to very quickly retrieve the number of rows, however some spreadsheets do not include this tag.

My patch adds a fallback, if the quick lookup failed it iterates through the rows.

I hope this PR meets your quality requirements and gets merged/released.

Thank you

@ebeigarts
Copy link
Member

Hi, thanks, I have opened #11, that includes a much faster #count that looks for <dimension> or counts <row>. If <dimension> is missing, then it counts <row> without extra formatting/typecasting etc, this way it is much faster than each { @count += 1 }. Can you check #11?

@SGrondin
Copy link
Author

SGrondin commented Apr 6, 2018

I reviewed and approved #11

@SGrondin SGrondin closed this Apr 6, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants