-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Experiment with converting the whole df to character matrix first #126
Comments
Current HEAD system.time(write_ods(nycflights13::flights))
## user system elapsed
##203.636 4.029 207.797 This is experimental (it won't work for single-row df, for example) .write_sheet_con <- function(x, con, sheet = "Sheet1", row_names = FALSE, col_names = FALSE, na_as_string = FALSE, padding = FALSE) {
cmax <- force(if(ncol(x) > 1024) { 16384 } else { 1024 })
types <- unlist(lapply(x, class))
types <- ifelse(types %in% c("integer", "numeric"), "float", "string")
colj <- seq_len(NCOL(x))
cols <- ncol(x)
if (row_names) {
cols <- cols + 1
}
rows <- nrow(x)
if (col_names) {
rows <- rows + 1
}
if (padding) {
.write_as_utf8(.gen_sheet_tag(sheet = sheet, cols = cmax), con)
} else {
.write_as_utf8(.gen_sheet_tag(sheet = sheet, cols = cols), con)
}
## add data
rownames_x <- rownames(x)
colnames_x <- colnames(x)
if (col_names) {
.write_as_utf8("<table:table-row table:style-name=\"ro1\">", con)
if (row_names) {
.cell_out("string", value = "", con = con)
}
for (j in colj) {
.cell_out(type = "string", value = colnames_x[j], con = con)
}
if (cols < cmax && padding) {
.write_as_utf8(stringi::stri_join("<table:table-cell table:number-columns-repeated=\"", as.character(cmax - cols), "\"/>", sep = ""), con)
}
.write_as_utf8("</table:table-row>", con)
}
x2 <- apply(x, 2, as.character)
for (i in seq_len(NROW(x2))) {
## create a row
.write_as_utf8("<table:table-row table:style-name=\"ro1\">", con)
if (row_names) {
.cell_out(type = "string", value = rownames_x[i], con = con)
}
for (j in colj) {
value <- x2[i, j, drop = TRUE]
write_empty_cell <- FALSE
if (is.na(value) && !na_as_string) {
write_empty_cell <- TRUE
}
if (is.na(value) && na_as_string) {
type <- "string"
value <- "NA"
} else {
type <- types[j]
}
.cell_out(type = type, value = value, con = con, write_empty_cell = write_empty_cell)
}
if (cols < cmax && padding) {
.write_as_utf8(stringi::stri_join("<table:table-cell table:number-columns-repeated=\"", as.character(cmax - cols), "\"/>", sep = ""), con)
}
.write_as_utf8("</table:table-row>", con)
}
if (rows < 2^20 && padding) {
.write_as_utf8(stringi::stri_join("<table:table-row table:style-name=\"ro1\" table:number-rows-repeated=\"", 2^20 - rows, "\"><table:table-cell table:number-columns-repeated=\"", cmax, "\"/></table:table-row>", sep = ""), con)
}
.write_as_utf8("</table:table>", con)
return(invisible(con))
}
system.time(write_ods(nycflights13::flights))
## user system elapsed
## 126.228 0.827 127.168 |
I think this is probably the route we'll want to go down for #113. I haven't looked into it that heavily, but my thought is to do a similar thing to For very large datasets, this presents a problem, as it possibly doubles the memory required for writing, and so it may be that it's a little better to do it in chunks, but that's by the by. |
For the research of #113
readODS/R/writeODS.R
Line 107 in dfe3d06
Would there be performance gain for converting the whole df to character matrix first, rather than do this for every cell?
The text was updated successfully, but these errors were encountered: