Substitution Functions
Replacing text matching a regular expression with something new is a common operation. Base R provides the sub and gsub functions for this purpose, while the ore package provides ore.subst. However, the latter offers one variant of this operation which I believe is currently unique: the ability to use a general R function to derive the replacement strings from the corresponding matches in the original text.
Motivation
The substituted text is very often related to the matched text. Subgroups and back-references allow a substantial amount of flexibility, but they can only reinsert matched text verbatim, not transform it. Additional flexibility can be made available via extra syntax: with perl=TRUE, sub and gsub accept \U and \L in replacements to indicate a conversion to upper or lower case, for example.
sub("(\\w)(\\w*)", "\\U\\1\\L\\2", "teST", perl=TRUE)
# [1] "Test"
Again, this helps with some use cases, but is ultimately limited in scope. Any more complicated manipulation would involve a tedious extraction of the matches, application of the function in each case, and then a piecemeal reconstruction of the final string.
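As a further illustration of the case-conversion escapes, the sketch below applies them with gsub to capitalise every word, and uses \E to end the conversion part-way through a replacement. The input strings and variable names are made up for illustration, and all three escapes require perl=TRUE.

```r
# Capitalise each word: \U upper-cases group 1, \L lower-cases group 2
capitalised <- gsub("\\b(\\w)(\\w*)", "\\U\\1\\L\\2", "heLLo woRLD", perl=TRUE)
capitalised
# [1] "Hello World"

# \E switches case conversion off again, so only the first word is affected
first_upper <- sub("(\\w+) (\\w+)", "\\U\\1\\E \\2", "hello world", perl=TRUE)
first_upper
# [1] "HELLO world"
```

Anything beyond a change of case, such as arithmetic on the matched text, has no equivalent escape.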
A toy example is to identify integers in a string and replace them with their squares. This is awkward to achieve using only base R. Let's take an example string and create the function.
text <- "I have 2 dogs, 3 cats and 4 hamsters"
fun <- function(x) as.integer(x)^2
Now, a simple version of the code, for only one string, might look something like
# Locate every (optionally negative) integer in the string
match <- gregexpr("-?\\d+", text, perl=TRUE)[[1]]
len <- attr(match, "match.length")
result <- ""
start <- 1
for (i in seq_along(match)) {
  # Copy the unmatched text before this match, then the transformed match
  result <- paste0(result, substr(text, start, match[i]-1))
  result <- paste0(result, fun(substr(text, match[i], match[i]+len[i]-1)))
  start <- match[i] + len[i]
}
# Append whatever follows the final match
result <- paste0(result, substr(text, start, nchar(text)))
result
# [1] "I have 4 dogs, 9 cats and 16 hamsters"
Fully generalising this would be further work. By contrast, ore makes this easy, and consistent with simpler string-based substitutions:
library(ore)
ore.subst("-?\\d+", fun, text, all=TRUE)
# [1] "I have 4 dogs, 9 cats and 16 hamsters"
In this case, all of the logic is implemented in C, and only the call to fun itself takes place at the R level.
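For comparison, the "further work" of generalising the base R loop might look something like the sketch below. The helper name subst_with and the extra test strings are hypothetical, not part of base R or ore. The sketch assumes Perl-style patterns, no NA strings, and a function that returns a single value for each match.

```r
# Hypothetical helper: apply `fun` to every regex match in each element of `text`
subst_with <- function(pattern, fun, text) {
  one <- function(s) {
    m <- gregexpr(pattern, s, perl=TRUE)[[1]]
    if (m[1] == -1) return(s)  # no matches: leave the string unchanged
    len <- attr(m, "match.length")
    result <- ""
    start <- 1
    for (i in seq_along(m)) {
      # Unmatched text before the match, followed by the transformed match
      result <- paste0(result, substr(s, start, m[i]-1),
                       fun(substr(s, m[i], m[i]+len[i]-1)))
      start <- m[i] + len[i]
    }
    # Trailing text after the final match
    paste0(result, substr(s, start, nchar(s)))
  }
  vapply(text, one, character(1), USE.NAMES=FALSE)
}

square <- function(x) as.integer(x)^2
out <- subst_with("-?\\d+", square,
                  c("I have 2 dogs, 3 cats and 4 hamsters", "no numbers here", "-5 and 10"))
out
# [1] "I have 4 dogs, 9 cats and 16 hamsters" "no numbers here"
# [3] "25 and 100"
```

Even in this form, the matching, extraction and reassembly all happen at the R level for each string.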
A "real-world" example: Expression substitution
The example above demonstrates the point, but it isn't a hugely useful application. An example with more practical use would be to implement "expression substitution", an extension to string syntax allowing R expressions embedded in a string to be evaluated, and their values substituted back in.
First, we need a vectorised function for parsing and evaluating R expressions. (The base function eval is not vectorised in this way: given several expressions, it returns only the value of the last.)
veval <- function(x,e) sapply(x, function(xi) eval(parse(text=xi),envir=e))
We can then implement a simple form of expression substitution via
expr_subst <- function (string, envir = parent.frame()) {
ore.subst("#\\{([^\\}]+)\\}", function(match,envir) return(veval(groups(match),envir)), string, envir=envir)
}
This can then be used as follows:
expr_subst("pi is #{pi}")
# [1] "pi is 3.14159265358979"
x <- 3
expr_subst("x^2 is #{x^2}")
# [1] "x^2 is 9"
The es function provided with the ore package implements a more flexible variant of this approach. Give it a try!
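The vectorised evaluation step can also be tried on its own in base R. The sketch below repeats the definition of veval given earlier and evaluates it in a scratch environment. The environment env, the variable names res and last, and the expression strings are illustrative assumptions.

```r
veval <- function(x, e) sapply(x, function(xi) eval(parse(text=xi), envir=e))

# A scratch environment holding the variable the expressions refer to
env <- new.env()
assign("x", 3, envir=env)

# Each string is parsed and evaluated separately; sapply names the results by input string
res <- veval(c("x^2", "x + 1"), env)
unname(res)
# [1] 9 4

# Passing both parsed expressions to eval directly yields only the last value
last <- eval(parse(text=c("x^2", "x + 1")), envir=env)
last
# [1] 4
```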