Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ODS files from Google Sheets #7

Open
2 tasks done
mattkerlogue opened this issue May 6, 2023 · 1 comment
Open
2 tasks done

ODS files from Google Sheets #7

mattkerlogue opened this issue May 6, 2023 · 1 comment

Comments

@mattkerlogue
Copy link
Owner

Issue for tracking development issues with Google Sheets

Basic compatibility checked:

  • Generate version of basic example file in Google Sheets
  • Check basic compatibility with {tidyods} functions

tidyods::read_ods_cells() can read an ODS file generated from Google Sheets, but naturally it has its own weirdnesses.

Similarities with LibreOffice

XML namespace

The Google generated ODS has the same XML namespace set-up as LibreOffice (same 35 namespaces, Excel has just 15).

Dates

Dates that are pure dates (15/06/2022) not date-times (15/06/2022 13:24:56) are stored as just date components (Excel stores these as a date-time at midnight).

Similarities with Microsoft Excel

Whitespace replication

Google ODS follows the white space replication approach used by Microsoft Excel (text:s tags with a text:c attribute).

Trailing blank cells

Like Excel a Google ODS has replicated blank cells at the end of a row (i.e. with column repetition) and at the end of the sheet (i.e. with row repetition). However, end of sheet replication is different in that there are three blank rows instead of just one, the penultimate row is replicated (over 1 million times in the example file) but is preceded and followed by a blank row that is not repeated. Column repetition is just over 1,000 columns in the example file, but unlike rows this is the final cell in the row.

Google ODS's unique weirdnesses

Comments with author

When saving a comment thread the format is different from Excel, the author's name is included as a text component.

cell_tbl$cell_annotation[c(156, 166)]
#> [1] "Test comment\n-Matt Kerlogue"                                    
#> [2] "Test comment 1\n-Matt Kerlogue\nReply to comment\n-Matt Kerlogue"

In Excel the author information for a comment thread is not retained.

cell_tbl$cell_annotation[c(154, 164)]
#> [1] "Comment:\n    Test comment"                                
#> [2] "Comment:\n    Test comment 1\nReply:\n    Reply to comment"

LibreOffice does not support comment threads, only singular "note" annotations.

Booleans

!! THIS IS VERY DIFFERENT !!

Boolean (TRUE/FALSE) values are not stored as booleans! They are stored with a numeric value (0 for FALSE, 1 for TRUE, in the office:value attribute) and the cell's general text (what ultimately becomes cell_content).

As a result the office:value-type of a Google boolean is float not boolean!

The boolean aspect of the value is generated via the use of formula (table:formula attribute) of either of:=FALSE() or of:=TRUE().

Probably need consideration for how to deal with in the smart rectifier.

mattkerlogue added a commit that referenced this issue May 6, 2023
mattkerlogue added a commit that referenced this issue May 6, 2023
@mattkerlogue
Copy link
Owner Author

79aee31 adds code to generate the relevant TRUE/FALSE value in the boolean_value column of the output dataset when using full extraction, however does not change any other parts of the extracted info (i.e. still has 0/1 in cell_content and numeric_value columns, a value_type of float, and entries in the in the has_formula and formula columns.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant