WISH: .rm(x) - a fast light-weight version of rm(x) #18

Open
HenrikBengtsson opened this issue Mar 22, 2016 · 1 comment

Comments

@HenrikBengtsson
Owner

Background

rm(x) and rm(list="x") are slow. The latter is faster (about 1.5 times in the benchmark below), but still very slow (100-200 times slower) compared to a simple assignment, e.g. x <- NULL. For a small number of calls to rm() this makes little difference, but if it is called thousands of times it is noticeable.
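When reading the comparison, it may help to keep in mind that the two operations are not equivalent: an assignment of NULL keeps the binding and only replaces the value, whereas rm() removes the binding itself. A minimal sketch of the difference, where the scratch environment `env` and the result names are only there for illustration:

```r
# Illustration only: 'env' is a scratch environment, not part of the proposal
env <- new.env()

assign("x", 1, envir = env)
assign("x", NULL, envir = env)   # same effect as `x <- NULL` evaluated in env
null_keeps_binding <- exists("x", envir = env, inherits = FALSE)

assign("x", 1, envir = env)
rm(list = "x", envir = env)      # removes the binding itself
rm_keeps_binding <- exists("x", envir = env, inherits = FALSE)
```

So x <- NULL is a useful lower bound for timing purposes, but it is only a drop-in replacement when the code does not later test whether x exists.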

Some benchmark results:

> options(digits=3)
> microbenchmark::microbenchmark(
  "rm(x)"            = { x <- 1; rm(x) },
  "rm(list='x')"     = { x <- 1; rm(list="x") },
  ".Internal(rm(x))" = { x <- 1; .Internal(remove("x", parent.frame(), FALSE)) },
  "x <- NULL"        = { x <- 1; x <- NULL },
  times=10e3, unit="ms"
)

Unit: milliseconds
             expr      min       lq     mean   median       uq    max neval
            rm(x) 0.030027 0.033492 0.036719 0.034647 0.036186 3.3753 10000
     rm(list='x') 0.018479 0.021558 0.023979 0.022329 0.023483 1.5960 10000
 .Internal(rm(x)) 0.000385 0.001155 0.001249 0.001156 0.001541 0.0192 10000
        x <- NULL 0.000000 0.000001 0.000174 0.000001 0.000386 0.0273 10000

Troubleshooting

One reason rm() is slow is that, already at the R level, it carries a lot of extra weight in order to support many different calling forms, e.g. rm(x), rm(list="x"), rm(x,y), rm(list=c("x", "y"), envir=env, inherits=TRUE), etc. As the benchmark statistics show, calling .Internal(remove("x", ...)) directly is much faster, but still roughly 10 times slower than a plain assignment.

> base::rm
function (..., list = character(), pos = -1, envir = as.environment(pos),
    inherits = FALSE)
{
    dots <- match.call(expand.dots = FALSE)$...
    if (length(dots) && !all(vapply(dots, function(x) is.symbol(x) ||
        is.character(x), NA, USE.NAMES = FALSE)))
        stop("... must contain names or character strings")
    names <- vapply(dots, as.character, "")
    if (length(names) == 0L)
        names <- character()
    list <- .Primitive("c")(list, names)
    .Internal(remove(list, envir, inherits))
}
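To make the R-level overhead more concrete, here is a rough sketch of the name-collection step in isolation. The helper `capture_names()` is hypothetical; it copies the first few lines of the listing above (minus the validation) and returns the names instead of removing anything:

```r
# Hypothetical helper mimicking the argument handling of base::rm() above
capture_names <- function(..., list = character()) {
  # Capture the unevaluated '...' arguments (symbols or strings)
  dots <- match.call(expand.dots = FALSE)$...
  names <- vapply(dots, as.character, "")
  if (length(names) == 0L)
    names <- character()
  # Names given via 'list' come first, as in base::rm()
  c(list, names)
}

collected <- capture_names(x, "y", list = "z")
only_list <- capture_names(list = "z")
```

Note that match.call() and vapply() run even in the second call, where nothing is passed via '...', which is the case rm(list="x") falls into.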

Suggestion 1

As a straightforward first improvement, the base package could provide:

.rm <- function(x) .Internal(remove(x, parent.frame(), FALSE))
> options(digits=3)
> microbenchmark::microbenchmark(
  "rm(x)"            = { x <- 1; rm(x) },
  "rm(list='x')"     = { x <- 1; rm(list="x") },
  ".Internal(rm(x))" = { x <- 1; .Internal(remove("x", parent.frame(), FALSE)) },
  ".rm('x')" = { x <- 1; .rm("x") },
  "x <- NULL"        = { x <- 1; x <- NULL },
  times=10e3, unit="ms"
)

Unit: milliseconds
             expr      min       lq     mean   median       uq    max neval
            rm(x) 0.030412 0.033492 0.036597 0.034647 0.036186 1.6772 10000
     rm(list='x') 0.018863 0.021558 0.023578 0.022328 0.023483 1.5206 10000
 .Internal(rm(x)) 0.000385 0.000771 0.001293 0.001156 0.001540 1.4509 10000
         .rm('x') 0.000770 0.001540 0.001976 0.001925 0.002310 1.5279 10000
        x <- NULL 0.000000 0.000001 0.000154 0.000001 0.000386 0.0189 10000
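For reference, a small usage sketch of the proposed .rm(), assuming it is defined exactly as above. Two things are worth noting: it takes the variable name as a character string, and because of parent.frame() it acts on the frame of whoever calls it. The function `f()` and the result name are made up for the example, and .Internal() is not a public API, so this is for experimentation only:

```r
# As proposed above; .Internal() is not a public API
.rm <- function(x) .Internal(remove(x, parent.frame(), FALSE))

# Hypothetical caller: removes a local variable from its own frame
f <- function() {
  tmp <- 1
  .rm("tmp")                        # acts on f()'s frame via parent.frame()
  exists("tmp", inherits = FALSE)   # looks in f()'s frame only
}

still_exists <- f()
```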

Suggestion 2

The above could probably be improved further by a native implementation. In [1], @s-u suggests:

If you really want to go overboard, you can define your own function:

SEXP rm(SEXP x, SEXP rho) { setVar(x, R_UnboundValue, rho); return R_NilValue; }
poof <- function(x) .Call(rm_C, substitute(x), parent.frame())

That will be faster than anything else (mainly because it avoids the trip through strings as it can use the symbol directly).

Miscellaneous

Alternative names for this function:

  • .rm()
  • poof()
  • yank()

See also

@HenrikBengtsson
Owner Author

From the NEWS of R 4.3.0:

  • rm(list = *) is faster and more readable thanks to Kevin Ushey's PR#18492.

The gist of the speedup was to replace:

    dots <- match.call(expand.dots=FALSE)$...
    if(length(dots) && ... {

with

    if(...length()) {
      dots <- match.call(expand.dots=FALSE)$...
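The effect of that reordering can be sketched as follows. This is not the actual base R source; `names_to_remove()` is a hypothetical stand-in that returns the names it would remove, and it assumes R (>= 3.5.0), where ...length() is available:

```r
# Hypothetical stand-in for the argument handling in rm(); nothing is removed
names_to_remove <- function(..., list = character()) {
  if (...length()) {
    # Only pay for match.call() when something was passed via '...'
    dots <- match.call(expand.dots = FALSE)$...
    list <- c(list, vapply(dots, as.character, ""))
  }
  list
}

fast_path <- names_to_remove(list = "x")   # skips match.call() entirely
slow_path <- names_to_remove(x, y)         # still captures the symbols
```

Since ...length() only counts the arguments without evaluating them, the rm(list = "x") form skips the expensive part, while rm(x) still has to go through match.call().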

Results

This update made rm(list = "x") approximately 8 times faster. In R (>= 4.3.0), rm(list = "x") is now almost as fast as the .rm(x) function proposed above.

Benchmarking with:

.rm <- function(x) .Internal(remove(x, parent.frame(), FALSE))

microbenchmark::microbenchmark(
  "rm(x)"            = { x <- 1; rm(x) },
  "rm(list='x')"     = { x <- 1; rm(list="x") },
  ".rm('x')" = { x <- 1; .rm("x") },
  ".Internal(rm(x))" = { x <- 1; .Internal(remove("x", parent.frame(), FALSE)) },
  "x <- NULL"        = { x <- 1; x <- NULL },
  times=10e3, unit="ms"
)

we get, for R 4.3.0:

Unit: milliseconds
             expr      min       lq         mean   median        uq      max
            rm(x) 0.008709 0.009790 0.0118062818 0.010497 0.0114025 1.953368
     rm(list='x') 0.000646 0.000740 0.0009125270 0.000802 0.0009260 0.016631   <==
         .rm('x') 0.000493 0.000575 0.0016054615 0.000622 0.0006960 8.968469
 .Internal(rm(x)) 0.000332 0.000385 0.0004852244 0.000415 0.0004680 0.016766
        x <- NULL 0.000088 0.000130 0.0001465821 0.000140 0.0001520 0.006369

and for R 4.2.3 we get:

Unit: milliseconds
             expr      min       lq         mean   median        uq      max
            rm(x) 0.008701 0.009471 0.0109918104 0.010079 0.0109485 1.912056
     rm(list='x') 0.005817 0.006338 0.0072416564 0.006768 0.0074360 0.708731   <==
         .rm('x') 0.000490 0.000560 0.0015168166 0.000604 0.0006490 8.591883
 .Internal(rm(x)) 0.000326 0.000377 0.0005168445 0.000417 0.0004540 0.605446
        x <- NULL 0.000086 0.000122 0.0001330333 0.000132 0.0001410 0.002661

The details are in wch/r-source@4dc057f.
