<font size="6"><b>IMPORT/EXPORT DATA FROM/TO EXCEL FILES</b></font>

In [None]:
library(data.table)
library(tidyverse)
library(xlsx)
library(readxl)
library(tidyxl)

In [None]:
options(repr.matrix.max.rows=20, repr.matrix.max.cols=10) # limit how many rows and columns are shown when tables are printed

![xkcd](../imagesbb/spreadsheets.png)

(https://xkcd.com/2180)

Here we will show three alternative packages to write data from R to Excel and/or read data from Excel into R.

Let's create a data.table copy of the built-in iris dataset:

In [None]:
iris2 <- iris

In [None]:
setDT(iris2)

In [None]:
iris2 %>% str

In [None]:
iris2 %>% head
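As a small aside, the same copy can be made in one step. This is a minimal sketch, assuming only that data.table is installed; the name `iris_dt` is hypothetical and not used elsewhere in this notebook. `as.data.table()` returns a new object and leaves its input untouched, whereas `setDT()` converts the object it is given in place.

```r
library(data.table)

# as.data.table() returns a new data.table; the built-in iris stays a data.frame
iris_dt <- as.data.table(iris)

is.data.table(iris_dt)
is.data.table(iris)
```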

# xlsx package

Write iris2 into an Excel file:

In [None]:
write.xlsx(iris2, file = "~/databb/temp/iris.xlsx", row.names = FALSE)
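`write.xlsx` can also add sheets to an existing workbook through its `sheetName` and `append` arguments. The sketch below assumes the xlsx package (and the Java runtime it depends on) is available; the temporary file and the object names `setosa`, `versicolor` and `versicolor_back` are illustrative only.

```r
library(xlsx)

# Temporary file so the example does not touch the notebook's data directory
tmp <- tempfile(fileext = ".xlsx")

setosa     <- iris[iris$Species == "setosa", ]
versicolor <- iris[iris$Species == "versicolor", ]

# The first call creates the workbook, the second adds a sheet to it
write.xlsx(setosa, file = tmp, sheetName = "setosa", row.names = FALSE)
write.xlsx(versicolor, file = tmp, sheetName = "versicolor",
           row.names = FALSE, append = TRUE)

# Read one sheet back by name instead of by index
versicolor_back <- read.xlsx(tmp, sheetName = "versicolor")
```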

Read from Excel into a data.frame and convert to data.table:

In [None]:
iris_from_xl1 <- read.xlsx("~/databb/temp/iris.xlsx", sheetIndex = 1)

In [None]:
setDT(iris_from_xl1)

Change Species from character to factor:

In [None]:
iris_from_xl1[, Species := factor(Species, levels(iris$Species))]

The first table imported from Excel is identical to the original one:

In [None]:
identical(iris2, iris_from_xl1)
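When `identical` returns `FALSE`, `all.equal` can help locate the difference. The following sketch simulates a round trip without touching any file: `original` and `roundtrip` are hypothetical names, and the character `Species` column stands in for what an Excel import typically returns.

```r
library(data.table)

original  <- as.data.table(iris)
roundtrip <- copy(original)

# Simulate an import where Species comes back as character
roundtrip[, Species := as.character(Species)]

identical(original, roundtrip)   # not identical while the column types differ
all.equal(original, roundtrip)   # describes the mismatch instead of returning TRUE

# Restoring the factor with the original levels removes the difference
roundtrip[, Species := factor(Species, levels(iris$Species))]
identical(original, roundtrip)
```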

# readxl package

Reading works much like with the xlsx package. `read_xlsx` returns a tibble, which we convert to a data.table:

In [None]:
iris_from_xl2 <- read_xlsx("~/databb/temp/iris.xlsx", sheet = 1)

In [None]:
setDT(iris_from_xl2)

In [None]:
iris_from_xl2[, Species := factor(Species, levels(iris$Species))]

The second table imported from Excel is also identical to the original one:

In [None]:
identical(iris2, iris_from_xl2)
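readxl can also list the sheets of a workbook and fix the column types at import time. This sketch uses the `datasets.xlsx` example workbook that ships with readxl (located with `readxl_example`) rather than the file written above, so it assumes only that readxl is installed; `iris_typed` is a hypothetical name.

```r
library(readxl)

# Example workbook bundled with readxl; it contains an "iris" sheet
path <- readxl_example("datasets.xlsx")
excel_sheets(path)

# Select the sheet by name and declare one type per column
iris_typed <- read_xlsx(
  path,
  sheet = "iris",
  col_types = c("numeric", "numeric", "numeric", "numeric", "text")
)
str(iris_typed)
```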

# tidyxl

The tidyxl package is overkill for Excel files with a regular format.

But if the Excel sheets hold multiple tables with a complex structure (multiple header columns/rows, formatting that is relevant to the data, formulas to extract, etc.), tidyxl is a powerhouse: it reads data cell by cell into a long object, with all metadata, formats and coordinates as separate columns:

In [None]:
iris_from_xl_long <- tidyxl::xlsx_cells("~/databb/temp/iris.xlsx", sheets = 1)

In [None]:
setDT(iris_from_xl_long)

See how much metadata is imported:

In [None]:
iris_from_xl_long %>% str
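A quick way to get an overview of such a cell-wise import is to count the cells per `data_type` and look at a few coordinate columns. The sketch below assumes tidyxl and readxl are installed and reads the example workbook bundled with readxl instead of the file written above; `cells` is a hypothetical name.

```r
library(tidyxl)

# Example workbook bundled with readxl; it contains an "iris" sheet
path <- readxl::readxl_example("datasets.xlsx")

# One row per cell, with coordinates and type information
cells <- xlsx_cells(path, sheets = "iris")

# How many cells of each type were imported
table(cells$data_type)

# Address and coordinates of the first few cells
head(cells[, c("address", "row", "col", "data_type")])
```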

Select relevant columns:

In [None]:
iris_from_xl_long_2 <- iris_from_xl_long %>% select(row, col, is_blank, data_type, character, numeric)

And keep only the numeric values, together with their coordinates (rows and columns):

In [None]:
values1 <- iris_from_xl_long_2[data_type == "numeric", .(row, col, values = numeric)]

In [None]:
values1

Now let's filter for character values. These include column headers and the Species column:

In [None]:
iris_from_xl_long_3 <- iris_from_xl_long_2[data_type == "character"]

Column names reside in the first row:

In [None]:
colnamesx <- iris_from_xl_long_3[row == 1, .(col, colname = character)]

In [None]:
colnamesx

Species values reside on the rest of the rows:

In [None]:
values2 <- iris_from_xl_long_3[row != 1, .(row, col, values = character)]

In [None]:
values2

Now we have two data.tables for values: one for the numeric columns, the other for Species. Both are in long format.

Now join the column names through `col`:

In [None]:
values1b <- values1 %>% left_join(colnamesx, by = "col")

In [None]:
values2b <- values2 %>% left_join(colnamesx, by = "col")

And convert into wide format:

In [None]:
values1c <- values1b %>% dcast(row ~ colname, value.var = "values")

In [None]:
values1c

In [None]:
values2c <- values2b %>% dcast(row ~ colname, value.var = "values")

In [None]:
values2c

Join both tables using `row`:

In [None]:
iris_from_xl3 <- values1c %>% left_join(values2c, by = "row") %>% select(-row)
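The reshaping steps above can be condensed on a toy table. This is a minimal sketch, assuming a hand-made, hypothetical `cells` table that mimics the columns selected from the tidyxl output (two data rows, two columns); it uses data.table's `merge` in place of `left_join`, and all object names are illustrative.

```r
library(data.table)

# Hypothetical cell-wise layout: row 1 holds the headers, rows 2-3 the data
cells <- data.table(
  row       = c(1L, 1L, 2L, 2L, 3L, 3L),
  col       = c(1L, 2L, 1L, 2L, 1L, 2L),
  data_type = c("character", "character", "numeric", "character",
                "numeric", "character"),
  character = c("Length", "Species", NA, "setosa", NA, "virginica"),
  numeric   = c(NA, NA, 5.1, NA, 6.3, NA)
)

# Column names live in the first row; attach them to every data cell via col
header <- cells[row == 1, .(col, colname = character)]
body   <- merge(cells[row > 1], header, by = "col")

# Cast numeric and character cells separately, since their values differ in type
num_wide <- dcast(body[data_type == "numeric"], row ~ colname, value.var = "numeric")
chr_wide <- dcast(body[data_type == "character"], row ~ colname, value.var = "character")

# Join on the sheet row and drop the coordinate column
toy <- merge(num_wide, chr_wide, by = "row")[, row := NULL][]
toy
```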

Change Species into a factor:

In [None]:
iris_from_xl3[, Species := factor(Species, levels(iris$Species))]

Reorder columns according to iris2:

In [None]:
setcolorder(iris_from_xl3, names(iris2))

The `dcast` operation adds a `sorted` attribute, which is how data.table stores the key of a table.

That is the only remaining difference from iris2, so let's delete that attribute:

In [None]:
attributes(iris_from_xl3)$sorted <- NULL
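To see where this attribute comes from, the toy example below casts a small, hypothetical long table and inspects its key. It is a sketch that assumes only data.table; `setkey(x, NULL)` is shown as an alternative way to drop the key, not as the method used in this notebook.

```r
library(data.table)

# Hypothetical long table with one id column
long <- data.table(
  id  = c(1L, 1L, 2L, 2L),
  var = c("a", "b", "a", "b"),
  val = 1:4
)

# dcast returns a table keyed by the left-hand side of the formula
wide <- dcast(long, id ~ var, value.var = "val")
key(wide)
attr(wide, "sorted")

# Dropping the key also removes the sorted attribute
setkey(wide, NULL)
attr(wide, "sorted")
```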

The third table imported from Excel is also identical to the original one:

In [None]:
identical(iris2, iris_from_xl3)