Cliffhanger sentence in README #5
Comments
Thanks Tim. Have corrected, but a more detailed explainer below that I probably need to document somewhere else in due course. Long-story-short: XML is generally awful, ODS is specifically awful XML.
The main efficiency comes from (i), as it extracts only a small number of specific named attributes from a cell, which can be used directly in the construction of the output tibble. In the "full" process, by contrast, the code extracts all attributes associated with a cell and returns them as a list, and there is a processing overhead both from that extraction by […]

On (ii), I think my refactored approach to text processing is relatively quick, but I've not yet tried it on sheets with a lot of complex text. There are three things at play here: (a) Microsoft Office and LibreOffice save repeated whitespace in different ways (I've not checked Google), (b) cell content with explicit line breaks, and (c) comments/annotations.
<!--MS Excel: multi-space and multi-paragraph text-->
<table:table-cell office:value-type="string" table:style-name="ce2">
  <text:p>Cells with <text:s/>repeated <text:s text:c="2"/>spaces</text:p>
  <text:p>And multiple <text:s text:c="3"/>lines</text:p>
</table:table-cell>

<!--LibreOffice: multi-space and multi-paragraph text-->
<table:table-cell office:value-type="string" calcext:value-type="string">
  <text:p>Cells with repeated spaces</text:p>
  <text:p>And multiple lines</text:p>
</table:table-cell>

<!--MS Excel: cell with comment-->
<table:table-cell office:value-type="string" table:style-name="ce2">
  <office:annotation draw:style-name="a14" svg:x="2.13541666666667in"
      svg:y="2.15625in" svg:width="1.08333333333333in" svg:height="0.75in">
    <dc:creator>Microsoft Office User</dc:creator>
    <text:p>
      <text:span text:style-name="T1">Test comment</text:span>
    </text:p>
  </office:annotation>
  <text:p>Cell with new Excel comment</text:p>
</table:table-cell>

<!--LibreOffice: cell with comment-->
<table:table-cell office:value-type="string" calcext:value-type="string">
  <office:annotation draw:style-name="gr1" draw:text-style-name="P2"
      svg:width="2.899cm" svg:height="1.799cm" svg:x="6.62cm" svg:y="0.451cm"
      draw:caption-point-x="-0.61cm" draw:caption-point-y="0.462cm">
    <dc:date>2022-06-16T00:00:00</dc:date>
    <text:p text:style-name="P1">
      <text:span text:style-name="T1">Test comment</text:span>
    </text:p>
  </office:annotation>
  <text:p>Cell with comment</text:p>
</table:table-cell>
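To make the three issues concrete, here is a minimal Python sketch (an illustration only, not tidyods's R code) that takes all descendant text of a cell in the naive way. The helper names `parse_cell` and `naive_text` are hypothetical, and the fragments are trimmed copies of the Excel examples above with styling attributes removed.

```python
import xml.etree.ElementTree as ET

# Standard OpenDocument namespace URIs needed by the trimmed fragments below.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def parse_cell(fragment):
    """Hypothetical helper: wrap a cell fragment in a root element that
    declares the namespaces, parse it, and return the cell element."""
    decls = " ".join(f'xmlns:{prefix}="{uri}"' for prefix, uri in NS.items())
    root = ET.fromstring(f"<root {decls}>{fragment}</root>")
    return root[0]


def naive_text(cell):
    """Concatenate every piece of descendant text, in document order."""
    return "".join(cell.itertext())


excel_spaces = parse_cell(
    '<table:table-cell office:value-type="string">'
    '<text:p>Cells with <text:s/>repeated <text:s text:c="2"/>spaces</text:p>'
    '<text:p>And multiple <text:s text:c="3"/>lines</text:p>'
    "</table:table-cell>"
)

excel_comment = parse_cell(
    '<table:table-cell office:value-type="string">'
    "<office:annotation>"
    "<dc:creator>Microsoft Office User</dc:creator>"
    "<text:p><text:span>Test comment</text:span></text:p>"
    "</office:annotation>"
    "<text:p>Cell with new Excel comment</text:p>"
    "</table:table-cell>"
)

# The <text:s> elements carry no text, and nothing separates the paragraphs.
print(repr(naive_text(excel_spaces)))
# 'Cells with repeated spacesAnd multiple lines'

# The annotation's author and comment text are swept up with the cell text.
print(repr(naive_text(excel_comment)))
# 'Microsoft Office UserTest commentCell with new Excel comment'
```

In this sketch the repeated spaces encoded by `<text:s>` vanish, the two paragraphs run together without a line break, and the comment text is prepended to the cell's own content.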
Applying
Firstly, Microsoft have implemented the […]

Secondly, […]

Finally, the text contained within annotations is also captured, as annotations are children of the cell, and it is subject to the same peculiarities described above. This could be improved by extracting just the immediate […]

The "full" process takes a more complex approach to text processing. Having identified the immediate […]
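As a rough illustration of the more careful approach described above, the following Python sketch (not tidyods's actual R implementation) assumes that the "immediate" nodes are the cell's direct `<text:p>` children. It expands each `<text:s>` element into the number of spaces given by its `text:c` attribute (one if the attribute is absent) and joins paragraphs with a newline. The helper names `parse_cell`, `paragraph_text` and `cell_text` are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
}
TEXT = "{" + NS["text"] + "}"  # ElementTree's "{uri}" prefix for text: names


def parse_cell(fragment):
    """Hypothetical helper: wrap a cell fragment in a namespaced root."""
    decls = " ".join(f'xmlns:{prefix}="{uri}"' for prefix, uri in NS.items())
    return ET.fromstring(f"<root {decls}>{fragment}</root>")[0]


def paragraph_text(node):
    """Return the text of a node, expanding <text:s> into literal spaces."""
    parts = [node.text or ""]
    for child in node:
        if child.tag == TEXT + "s":
            # text:c holds the number of spaces; it defaults to 1 when absent.
            parts.append(" " * int(child.get(TEXT + "c", "1")))
        else:
            parts.append(paragraph_text(child))  # e.g. <text:span>
        parts.append(child.tail or "")
    return "".join(parts)


def cell_text(cell):
    """Join only the cell's direct <text:p> children, one per line.
    Paragraphs nested inside <office:annotation> are not direct children,
    so comment text is left out."""
    paragraphs = cell.findall("text:p", NS)
    return "\n".join(paragraph_text(p) for p in paragraphs)


excel_spaces = parse_cell(
    '<table:table-cell office:value-type="string">'
    '<text:p>Cells with <text:s/>repeated <text:s text:c="2"/>spaces</text:p>'
    '<text:p>And multiple <text:s text:c="3"/>lines</text:p>'
    "</table:table-cell>"
)

excel_comment = parse_cell(
    '<table:table-cell office:value-type="string">'
    "<office:annotation>"
    "<dc:creator>Microsoft Office User</dc:creator>"
    "<text:p><text:span>Test comment</text:span></text:p>"
    "</office:annotation>"
    "<text:p>Cell with new Excel comment</text:p>"
    "</table:table-cell>"
)

print(repr(cell_text(excel_spaces)))
# 'Cells with  repeated   spaces\nAnd multiple    lines'
print(repr(cell_text(excel_comment)))
# 'Cell with new Excel comment'
```

The extra tree walking is the price of correct text, which fits the trade-off discussed here: a quicker extraction is only safe when cells are known not to contain repeated spaces, multiple paragraphs or comments.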
In essence, if you know that your sheet(s) meet the following criteria then […]

Adding a […]
There's a bit of a cliffhanger in the README when explaining what the `quick` argument does:

tidyods/README.Rmd, line 105 in 589704d