
using temp files on windows? #13

Closed
r2evans opened this issue Jun 20, 2016 · 2 comments
r2evans commented Jun 20, 2016

Using capture.output with large objects can be really slow. Doing some console testing, the time-to-encode grows non-linearly with the number of rows:

system.time( zz <- capture.output(write.table(data.frame(a=1:10000), sep="\t")) )
#    user  system elapsed 
#     0.2     0.0     0.2 
system.time( zz <- capture.output(write.table(data.frame(a=1:20000), sep="\t")) )
#    user  system elapsed 
#    0.84    0.00    0.84 
system.time( zz <- capture.output(write.table(data.frame(a=1:30000), sep="\t")) )
#    user  system elapsed 
#    2.08    0.00    2.08 
system.time( zz <- capture.output(write.table(data.frame(a=1:40000), sep="\t")) )
#    user  system elapsed 
#    4.20    0.00    4.21 
system.time( zz <- capture.output(write.table(data.frame(a=1:50000), sep="\t")) )
#    user  system elapsed 
#    7.70    0.02    7.75 
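As a quick sanity check on the scaling (using the elapsed times measured above), dividing each time by the squared row count gives a slowly varying ratio, consistent with super-linear, roughly quadratic growth:

```r
# Elapsed times from the benchmarks above, against row counts
n <- c(10, 20, 30, 40, 50) * 1000
t <- c(0.2, 0.84, 2.08, 4.21, 7.75)
round(t / (n / 1000)^2, 4)  # ratio changes slowly -> growth is near O(n^2)
```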

Whereas using temporary files is quite a bit faster:

tf <- "foo.txt"
system.time( {
  write.table(data.frame(a=1:10000), sep = "\t", file = tf)
  zz <- paste(readLines(tf), collapse = "\r\n")
  writeClipboard(zz)
})
#    user  system elapsed 
#    0.03    0.00    0.03 
system.time( {
  write.table(data.frame(a=1:50000), sep = "\t", file = tf)
  zz <- paste(readLines(tf), collapse = "\r\n")
  writeClipboard(zz)
})
#    user  system elapsed 
#    0.13    0.00    0.13 

And pasting into Excel works as expected in a fraction of the time.

Even going to the extreme row-count of Excel 2013/2016 (allowing for a header row):

system.time( {
  write.table(data.frame(a=1:1048575), sep = "\t", file = tf)
  zz <- paste(readLines(tf), collapse = "\r\n")
  writeClipboard(zz)
})
#    user  system elapsed 
#    3.72    0.08    3.89 

(I don't want to try that with capture.output, though extrapolating its super-linear growth I imagine it would take on the order of 700 seconds, if it completed at all.)

BTW: I'm testing this on R-3.2.5 on win10_64, so I don't know if or how much impact this would have on other architectures.

mdlincoln self-assigned this Jun 20, 2016
mdlincoln (Owner) commented:
Oh, interesting problem! I'd not expected clipr to be used for such large payloads, but then again, why not? The temp file solution looks elegant enough (I'd use the tempfile function to create the file path, though).

capture.output is also used on OS X and X11-like systems, so let me take a look at the behavior there. If there are no adverse performance effects from writing to a tempfile on those platforms, I'll implement it.
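A minimal sketch of the tempfile-based approach being discussed (the helper name is hypothetical, and this assumes Windows, where utils::writeClipboard is available):

```r
# Hypothetical sketch: write a data.frame to the Windows clipboard via a
# temporary file instead of capture.output(). Windows-only, since it relies
# on utils::writeClipboard().
write_clip_via_tempfile <- function(df, sep = "\t") {
  tf <- tempfile(fileext = ".txt")   # tempfile() instead of a hard-coded path
  on.exit(unlink(tf), add = TRUE)    # remove the temp file when done
  write.table(df, sep = sep, file = tf)
  utils::writeClipboard(paste(readLines(tf), collapse = "\r\n"))
}
```

A cross-platform implementation would still need the existing capture.output (or an equivalent) path on OS X and X11-like systems, which is exactly the behavior to benchmark there first.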

r2evans commented Jun 20, 2016

Yes, tempfile would be the preferred method over my hard-coded filename.

It's always interesting to see how others use your packages in ways you had not imagined. I'm frequently trying to copy things between R and Excel, at times pushing Excel's limits. I was pointed to your package by @alistaire on StackOverflow, and that's when I learned about utils::readClipboard and family. I've had my own home-grown function using read.delim("clipboard", ...), and it would instantly fail with large data, so I've been resorting to intermediary CSV files. Though I may not replace my home-grown utils with clipr, I'm deeply appreciative of being able to look at your code to improve my own. Thanks!
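For the read direction, a sketch of the analogous pattern (a hypothetical helper; assumes Windows, where utils::readClipboard returns the clipboard contents as a character vector of lines, avoiding the size problems of the read.delim("clipboard", ...) connection):

```r
# Hypothetical sketch: read tab-separated clipboard contents into a
# data.frame. read.table(text = ...) accepts a character vector of lines,
# so no "clipboard" connection is needed. Windows-only.
read_clip_table <- function(sep = "\t", header = TRUE, ...) {
  read.table(text = utils::readClipboard(), sep = sep, header = header, ...)
}
```

Note that data copied from write.table with its default row names will have a shifted header row, so the exact read.table arguments depend on how the data was written.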
