Warning: This project is an experiment; do not use for anything other than amusement/frustration purposes.
This package implements a very slow, but thorough, Excel (xlsx) reader. If you have a rectangular region of cells to read you will be better off with another Excel reading package such as:
RODBC, or possibly even
tablaxlsx (not clear how current these last 4 are). Mango Solutions has a nice review article, R: the Excel Connection, in which they compare several of the above packages, with a special emphasis on those that can both read and write Excel files (XLConnect, xlsx, openxlsx, excel.link).
Compared with the above packages,
rexcel tries to read all the data from an Excel sheet using
linen as an intermediate representation in R. The eventual goal is to provide a common receptacle for detailed spreadsheet information from both Excel and Google Sheets. Rather than trying to create a single data.frame in one shot, it allows access to data, formulae and formatting information. Excel type information is preserved, especially for heterogeneous columns. It has no non-R dependencies (e.g. on Perl or Java) and should run on any platform regardless of whether Excel is installed.
Requires the development version of
xml2, as the newer version changes the behaviour of matching functions in fairly large ways. For terminal printing we use the most recent copy of
crayon which includes more accurate rendering. And we use
linen as the R spreadsheet representation.
devtools::install_github("hadley/xml2") devtools::install_github("gaborcsardi/crayon") devtools::install_github("rsheets/linen") devtools::install_github("rsheets/rexcel")
- Cell fill colour
- Cell patterns
- Cell gradients
- Text colour
- Text bold, italic, underline, strikethrough, outline, shadow, condense, extend
- Text font
- Text size
- Text alignment (horizontal, vertical)
- Column/row visibility
- Column/row width/height
- Styles applied at the column level (though the spec seems vague about if that's a real thing - compare p. 1600 and 1596)
- Conditional formatting (the rule and the outcome)
- Borders (position and colour)
- Numeric/date formatting, possibly also formatted text?
- Table styles (e.g. for pivot tables)
Of particular concern:
A cell can have both direct formatting (e.g., bold) and a cell style (e.g., Explanatory) applied to it. Therefore, both the cell style xf records and cell xf records shall be read to understand the full set of formatting applied to a cell.
When the color palette is modified, the indexedColors collection is written. When a custom color has been selected, the mruColors collection is written.
So we should find a sheet that includes this and see what this looks like empirically.
Other worthwhile things to get
- named ranges
- comments (author, ref, rich text, visibility)
- other drawings
- pivot tables
- frozen rows / split panes
- calculation chain
The Excel XML Spec
Page and section numbers refer to the "ECMA Office Open XML Part 1 - Fundamentals and Markup Language Reference" document; a massive and fairly unweidly 5026 PDF! The spreadsheet material is mostly in section 18 (pages 1518 - 2508).
Writing Excel files
Writing workbooks is not currently supported. Before implementing it, we want to see how much information we can preserve while reading. Once we know that, we can start working towards seeing how much can be written and how lossy a read/write cycle will be (it is likely to be very lossy as there is an enormous number of things that might be stored in a worksheet.