Skip to content
Permalink
Branch: master
Find file Copy path
Find file Copy path
4 contributors

Users who have contributed to this file

@richfitz @jjchern @jennybc @j450h1
76 lines (54 sloc) 4.46 KB

rexcel

Build Status

Warning: This project is an experiment; do not use for anything other than amusement/frustration purposes.

Design

This package implements a very slow, but thorough, Excel (xlsx) reader. If you have a rectangular region of cells to read you will be better off with another Excel reading package such as: readxl, openxlsx, XLConnect, xlsx, gdata, RODBC, or possibly even excel.link, WriteXLS, table1xlsx, tablaxlsx (not clear how current these last 4 are). Mango Solutions has a nice review article, R: the Excel Connection, in which they compare several of the above packages, with a special emphasis on those that can both read and write Excel files (XLConnect, xlsx, openxlsx, excel.link).

Compared with the above packages, rexcel tries to read all the data from an Excel sheet using linen as an intermediate representation in R. The eventual goal is to provide a common receptacle for detailed spreadsheet information from both Excel and Google Sheets. Rather than trying to create a single data.frame in one shot, it allows access to data, formulae and formatting information. Excel type information is preserved, especially for heterogeneous columns. It has no non-R dependencies (e.g. on Perl or Java) and should run on any platform regardless of whether Excel is installed.

Installation

Requires the development version of xml2, as the newer version changes the behaviour of matching functions in fairly large ways. For terminal printing we use the most recent copy of crayon which includes more accurate rendering. And we use linen as the R spreadsheet representation.

devtools::install_github("hadley/xml2")
devtools::install_github("gaborcsardi/crayon")
devtools::install_github("rsheets/linen")
devtools::install_github("rsheets/rexcel")

Formatting preserved

  • Cell fill colour
  • Cell patterns
  • Cell gradients
  • Text colour
  • Text bold, italic, underline, strikethrough, outline, shadow, condense, extend
  • Text font
  • Text size
  • Text alignment (horizontal, vertical)
  • Column/row visibility
  • Column/row width/height
  • Styles applied at the column level (though the spec seems vague about if that's a real thing - compare p. 1600 and 1596)
  • Conditional formatting (the rule and the outcome)
  • Borders (position and colour)
  • Numeric/date formatting, possibly also formatted text?
  • Table styles (e.g. for pivot tables)

Of particular concern:

A cell can have both direct formatting (e.g., bold) and a cell style (e.g., Explanatory) applied to it. Therefore, both the cell style xf records and cell xf records shall be read to understand the full set of formatting applied to a cell.

(18.8.10, p.1754)

Also

When the color palette is modified, the indexedColors collection is written. When a custom color has been selected, the mruColors collection is written.

So we should find a sheet that includes this and see what this looks like empirically.

Other worthwhile things to get

  • named ranges
  • comments (author, ref, rich text, visibility)
  • graphs
  • other drawings
  • pivot tables
  • frozen rows / split panes
  • calculation chain
  • header/footer

The Excel XML Spec

Page and section numbers refer to the "ECMA Office Open XML Part 1 - Fundamentals and Markup Language Reference" document; a massive and fairly unweidly 5026 PDF! The spreadsheet material is mostly in section 18 (pages 1518 - 2508).

Writing Excel files

Writing workbooks is not currently supported. Before implementing it, we want to see how much information we can preserve while reading. Once we know that, we can start working towards seeing how much can be written and how lossy a read/write cycle will be (it is likely to be very lossy as there is an enormous number of things that might be stored in a worksheet.

You can’t perform that action at this time.