Documentation question: how does one convert a string to a quosure? #116

Open
JohnMount opened this Issue May 3, 2017 · 31 comments

@JohnMount

JohnMount commented May 3, 2017

I have skimmed the dplyr/tidyeval/rlang documentation and tutorials, and I don't remember or see how to convert a string to a quosure easily. What I want to do (and I think it is an important use case) is take the name of a column as a string from some external source (say from colnames(), or from the YAML control block of an R Markdown document) and then use that string as a variable name. It looks like to do that you have to promote the string up to a quosure, and that is the part I don't know how to do in pure tidyeval idiom. I've tried things like quo(), but I am missing something.

Below is a specific example with a work-around that shows the effect I want. The only question is how does one produce the variable varQ from the value stored in varName (again, assuming the value stored is a string and not known to the programmer)?

# devtools::install_github("tidyverse/dplyr")
library("dplyr")
## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
packageVersion("dplyr")
## [1] '0.5.0.9004'
library("wrapr")

# imagine this string comes from somewhere else
varName <- 'disp' 
# The following does not currently work, as
# rlang/tidyeval/dplyr is expecting a
# quosure to represent the variable name,
# not a string.
mtcars %>% 
  select(!!varName)
## Error: `"disp"` must resolve to integer column positions, not string
# How does one idiomatically 
# create a quosure varQ that refers
# to the name stored in varName?
# Here is a work-around using wrapr
# to show the desired effect:
stringToQuoser <- function(varName) {
  wrapr::let(c(VARNAME = varName), quo(VARNAME))
}
varQ <- stringToQuoser(varName)

# the code we want to run:
mtcars %>% 
  select(!!varQ) %>%
  head()
##                   disp
## Mazda RX4          160
## Mazda RX4 Wag      160
## Datsun 710         108
## Hornet 4 Drive     258
## Hornet Sportabout  360
## Valiant            225
@lionel-

Member

lionel- commented May 3, 2017

You can use sym() or syms(). You can unquote these symbols directly in capturing functions:

select(df, !! sym("foo"))
select(df, !!! syms(c("foo", "bar")))

or you can unquote them while creating quosures (which is actually the same mechanism):

quo(!! sym("foo"))
quo(list(!!! syms(letters)))
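If you would rather not load rlang, a minimal base-R sketch of the same idea uses as.name() in place of sym() and bquote()'s .() in place of !!. It only builds the select() call as an expression and never evaluates it, so dplyr is not needed; the variable names are illustrative.

```r
# Turn a string into a symbol (base-R counterpart of rlang::sym())
var_name <- "disp"
var_sym  <- as.name(var_name)
is.name(var_sym)  # TRUE

# Splice the symbol into an expression; .() plays the role of !!
one_col <- bquote(select(mtcars, .(var_sym)))
print(one_col)    # select(mtcars, disp)

# Several columns at once (the role of !!! syms()): assemble the call from a list
var_names <- c("disp", "cyl")
many_cols <- as.call(c(as.name("select"), quote(mtcars), lapply(var_names, as.name)))
print(many_cols)  # select(mtcars, disp, cyl)
```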

@lionel- lionel- closed this May 3, 2017

@JohnMount

JohnMount commented May 3, 2017

Thanks. Just a note: that doesn't work until you explicitly load rlang (dplyr does not seem to re-export sym()).

library("dplyr") # installed today
packageVersion("dplyr")
# [1] ‘0.5.0.9004’
select(mtcars,  !! sym("disp"))
# Error in (function (x)  : could not find function "sym" 
library('rlang')
packageVersion("rlang")
# [1] ‘0.0.0.9018’
select(mtcars,  !! sym("disp"))
@lionel-

Member

lionel- commented May 3, 2017

You can qualify: rlang::sym()

@JohnMount

JohnMount commented May 3, 2017

Also I don't think varQ = quo(sym(varName)) works (and it is the bit I really need).

library("dplyr")
library("rlang")
varName <- 'disp'
varQ = quo(sym(varName))
varQ
# <quosure: global>
# ~sym(varName)
select(mtcars, varQ)
#  Error: `varQ` must resolve to integer column positions, not formula 

Even if that did work, it would be dangerous, as it looks like it is holding on to a reference to varName (instead of taking its value at the current moment). That means that if we later changed the value of varName, the select could (due to lazy evaluation) give a different result (I call this a dragging reference). This drives bugs somewhat like this one: tidyverse/dplyr#2455.

Can you please re-open this issue until we find a working pure rlang solution?

@hadley

Member

hadley commented May 3, 2017

But you don't need to create a quosure here; just pass in a symbol (which you can also create with as.name() if you don't want to import rlang).

@lionel-

Member

lionel- commented May 3, 2017

When you're programming with NSE functions, you are building an expression. Unquoting makes it possible to change parts of the expression and is what a programmer should focus on:

varName <- "disp"
varQ <- quo(!! sym(varName))  # unquoting the symbol
select(mtcars, !! varQ)  # unquoting the quosure

Note that the quosure is superfluous here as Hadley mentions, since you're not referring to symbols from the contextual environment.

it would be dangerous as it looks like it is hanging on a reference to varName

I don't understand what you mean. In the snippet above we build a quosure containing the value of sym(varName) which is the symbol disp. Note that you can also unquote literal values (i.e. vectors) just like you can unquote expressions (though in this case select_vars() expects column positions rather than actual columns; it evaluates expressions in an environment where column symbols evaluate to column positions).

@MZLABS

MZLABS commented May 3, 2017

John Mount here again (different account, sorry).

I see the later solutions do work, and thank you for that. I also understand answering questions is a volunteer activity, so I appreciate you working on this for me.

I had a typo in my attempt to use the answer, and I apologize for that. But variations on that idea do not work even if I type it "correctly":

library("dplyr")
library("rlang")
varName <- 'disp'
varQ = quo(sym(varName))
select(mtcars, !! varQ)
# Error: `sym(varName)` must resolve to integer column positions, not symbol

To address some of the concerns raised and give some context, I have a few follow-up notes, but all my questions are now answered.

Regarding the appearance of unbound variables and CRAN checks: if I had been submitting the above code to CRAN, I would have added the following line above the let-block:

VARNAME <- NULL # mark variable symbol as not an unbound reference

The reason I asked for "quosure" is: I thought that was all rlang accepted. I had tested it does not take strings and base-R formulas, so I was just asking for what I thought was the help I needed based on my state of knowledge.

I do not mind importing rlang or qualifying rlang::sym, I only mentioned that dplyr did not re-export rlang::sym to point out the difficulty in discovering that command starting from a dplyr task (i.e. working on a select()).

The latest solution does work (and thank you for it):

library("dplyr")
library("rlang")
varName <- 'disp'
varQ = quo(!! sym(varName))
mtcars %>%
  select(!! varQ) %>% 
  head()
#                  disp
#Mazda RX4          160
#Mazda RX4 Wag      160
#Datsun 710         108
#Hornet 4 Drive     258
#Hornet Sportabout  360
#Valiant            225

It also appears not to have the captured-reference issue. Since that issue is not present in this variation (there is no visible reference to the variable name "varName" in "varQ"), there isn't much point going further into it. But the rough idea is: if a quosure worked by capturing varName, and varName changed between when we thought we set it and when we used the value, we would not want the new value to enter into the calculation. The effect (albeit in another context) was discussed at length in my linked issue tidyverse/dplyr#2455, which was meant as a worked example by analogy. But as I said, I don't think the current solutions hold a reference to varName (they seem to properly take the value), so we don't have to worry about whether we have this issue. Typically R doesn't have these issues due to its copy-by-value semantics, but anywhere names are used directly I worry about possible (unintended, and therefore undesirable) reference-like semantics leaking in.

Also, I understand that sym() may not be the form you would prefer, and that as.name() can be used instead:

library("dplyr")
varName <- 'disp'
varQ = as.name(varName)
mtcars %>%
  select(!! varQ) %>% 
  head()
#                  disp
#Mazda RX4          160
#Mazda RX4 Wag      160
#Datsun 710         108
#Hornet 4 Drive     258
#Hornet Sportabout  360
#Valiant            225

Finally, I did work on this before asking. I tried all of the following:

varQ = as.formula(~varName)
varQ = quo(!! varName)
varQ = quote(varName)
varQ = quote(!! varName)

And none of the above worked.

Frankly, I was guessing, but the reason I was guessing is that I did not find a worked example of converting a string to something rlang is willing to use as a variable name. That is: guessing was not my first choice.

I had tried help(quo) and help(UQ) (the equivalent of !!), and neither of them mentions, links to, or has an example of sym() or as.name().

Anyway thank you very much for your solutions.

@hadley

Member

hadley commented May 3, 2017

You can also use the .data pronoun to avoid both R CMD check notes and the need to convert to symbols:

varName <- "disp"
mtcars %>% select(.data[[varName]])

(Well, you will be able to once tidyverse/dplyr#2718 is merged)

@lionel-

Member

lionel- commented May 3, 2017

I worry about possible (unintended, and therefore undesirable) reference-like semantics leaking in.

It's not about reference semantics, it's about delayed evaluation. We're building an expression, sometimes in several steps, and if the value of some symbols changes before evaluation actually happens this could be a problem. To work around this, you can unquote values rather than symbols, or you could make sure the symbols are in read-only environments (e.g. by building an appropriate quosure).
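The difference between keeping a symbol and unquoting its value can be seen with base R alone, using quote() for the delayed version and bquote()'s .() (analogous to !!) for the unquoted one. This is a sketch of the general mechanism, not of rlang's internals; threshold is an illustrative variable.

```r
threshold <- 20
delayed  <- quote(mpg > threshold)      # keeps the symbol `threshold`
unquoted <- bquote(mpg > .(threshold))  # inlines the current value, 20

print(unquoted)                         # mpg > 20

threshold <- 30                         # the value changes before evaluation

# The delayed expression now filters with 30; the unquoted one still uses 20
sum(eval(delayed, mtcars))
sum(eval(unquoted, mtcars))
```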

I had tested it does not take strings and base-R formulas

tidyeval works with pure expressions. The only adjustment we make is that quosures self-evaluate within their environments (with overscoped data attached). This is why the following expressions are completely equivalent:

select(mtcars, "cyl")

var <- "cyl"
select(mtcars, !! var)

This doesn't work because select() doesn't accept strings; it works with column positions (or with expressions evaluating to column positions). Hence the following works:

select(mtcars, cyl)

var <- sym("cyl")
select(mtcars, !! var)

Alternatively, you can also supply the values it understands (column positions):

select(mtcars, 1)

var <- 1
select(mtcars, !! var)
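A rough base-R model of the rule described here: evaluate the expression in a list that binds each column name to its position, so a symbol evaluates to an integer position while a string simply evaluates to itself. positions_of() is a hypothetical helper for illustration; dplyr's actual selection code is more elaborate.

```r
# Hypothetical helper: evaluate `expr` with column names bound to their positions
positions_of <- function(df, expr) {
  overscope <- as.list(seq_along(df))
  names(overscope) <- names(df)
  eval(expr, envir = overscope, enclos = baseenv())
}

positions_of(mtcars, quote(cyl))           # 2
positions_of(mtcars, as.name("cyl"))       # 2, same symbol built from a string
positions_of(mtcars, "cyl")                # "cyl": a string evaluates to itself
positions_of(mtcars, quote(c(disp, cyl)))  # 3 2
```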
@yeedle

yeedle commented Jun 16, 2017

It seems that if an expression (as opposed to a column name) is passed as a string, the above solutions do not work. Consider:

count(mtcars, !!! rlang::syms(c("2 * cyl", "am")))
#> Error in grouped_df_impl(data, unname(vars), drop): Column `2 * cyl` is unknown

While this works:

count(mtcars, !!! rlang::syms(c("cyl", "am")))
#> # A tibble: 6 x 3
#>     cyl    am     n
#>   <dbl> <dbl> <int>
#> 1     4     0     3
#> 2     4     1     8
#> 3     6     0     4
#> 4     6     1     3
#> 5     8     0    12
#> 6     8     1     2

I must be missing something but I'm not sure what.

@hadley

Member

hadley commented Jun 16, 2017

mtcars doesn't have a column called 2 * cyl?

@yeedle

yeedle commented Jun 16, 2017

Right. It does not. But count() (like group_by()) does not require a column name:

count(mtcars, 2 * cyl, am)
#> # A tibble: 6 x 3
#>   `2 * cyl`    am     n
#>       <dbl> <dbl> <int>
#> 1         8     0     3
#> 2         8     1     8
#> 3        12     0     4
#> 4        12     1     3
#> 5        16     0    12
#> 6        16     1     2

This even works with count_:

count_(mtcars, c("2 * cyl", "am"))
#> # A tibble: 6 x 3
#>   `2 * cyl`    am     n
#>       <dbl> <dbl> <int>
#> 1         8     0     3
#> 2         8     1     8
#> 3        12     0     4
#> 4        12     1     3
#> 5        16     0    12
#> 6        16     1     2

Can I get the same results if 2 * cyl is a string with tidyeval?

@hadley

Member

hadley commented Jun 16, 2017

Yes, you need to parse it, not convert it to a symbol. I forget if rlang has an equivalent of parse(text = text).

@lionel-

Member

lionel- commented Jun 17, 2017

There are parse_expr() and parse_quo(), which parse a single expression (e.g. "foo(bar)"), and the plural variants, which parse a list of expressions (e.g. "foo(bar); baz").
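In base R, parse(text = ...) (or str2lang(), available since R 3.6.0) plays the role of parse_expr(). A small sketch with the "2 * cyl" string from above shows why a symbol is the wrong tool here: as.name() produces one symbol whose name happens to contain spaces and an asterisk, while parsing produces the call 2 * cyl. It evaluates against mtcars directly rather than through count().

```r
txt <- "2 * cyl"

as_symbol <- as.name(txt)            # a single symbol literally named `2 * cyl`
as_call   <- parse(text = txt)[[1]]  # the call 2 * cyl; str2lang(txt) gives the same

class(as_symbol)                     # "name"
class(as_call)                       # "call"

# The parsed call can be evaluated with the columns of mtcars in scope
head(eval(as_call, mtcars))

# The symbol cannot: mtcars has no column named `2 * cyl`
try(eval(as_symbol, mtcars))
```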

@shearerp

shearerp commented Jun 19, 2017

A couple of Stack Overflow answers that illustrate how one can use quosures and parse_quosure() to pass strings to dplyr, as we once did with the deprecated underscore verbs:
https://stackoverflow.com/a/44594223/845800
https://stackoverflow.com/a/44593617/845800

@gwhiteford-cwt

gwhiteford-cwt commented Nov 14, 2017

@hadley , including references to as.name() or sym() with examples in the dplyr programming vignette would be very helpful. I must have tried just about every permutation in the (0.7.3) vignette examples trying to get a code-generated string (in a Shiny app) to be a column name before I stumbled upon this thread. FWIW. (Will happily post this in whatever venue is more appropriate.)

@lionel-

Member

lionel- commented Nov 14, 2017

@gwhiteford-cwt we're working on a new vignette: http://rpubs.com/lionel-/programming-draft

@lionel-

Member

lionel- commented Nov 14, 2017

btw, if you need a symbol, don't use the parse_ functions; use sym() or syms().

@tmastny

tmastny commented Mar 13, 2018

@lionel- Why? What's the difference?

For example:

> var <- "Species"
> new_expr <- parse_expr(var)
> rlang::is_symbol(new_expr)
[1] TRUE
> new_sym <- rlang::sym(var)
> rlang::is_expr(new_sym)
[1] TRUE
> identical(new_sym, new_expr)
[1] TRUE

Where can I read about the difference?

@lionel-

Member

lionel- commented Mar 13, 2018

sym() will create non-syntactic symbols while parse_expr() will give an error because it tries to interpret the string as an R expression rather than an R symbol:

parse_expr("foo+")
#> Error in parse(text = x) : <text>:2:0: unexpected end of input
#> 1: foo+
#>    ^

sym("foo+")
#> `foo+`
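The same distinction exists in base R between as.name() and parse(). A sketch, with an illustrative data frame df whose column name is not syntactic:

```r
as.name("foo+")        # `foo+`: any string can become a symbol
res <- tryCatch(parse(text = "foo+"), error = function(e) "parse error")
res                    # "parse error": "foo+" is incomplete R code

df <- data.frame(`mass (kg)` = c(20, 80), check.names = FALSE)

# As a symbol, the non-syntactic name refers to the column
eval(as.name("mass (kg)"), df)   # 20 80

# Parsed as code, the same string is read as the function call mass(kg)
parse(text = "mass (kg)")[[1]]
```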
@adamryczkowski

adamryczkowski commented Mar 16, 2018

@lionel- I think you should also include a reference to rlang::parse_quo and an example of how to replace deprecated uses of variable-held expressions in mutate_ or filter_. This is the third time I have lost many minutes forgetting this function when modernizing (not that old) legacy dplyr code like this:

apply_filter<-function(db, filterstring) {
 return(dplyr::filter_(db, filterstring))
}

into this

apply_filter<-function(db, filterstring) {
     filterexpr<-rlang::parse_quosure(filterstring)
     return(dplyr::filter(db, !!filterexpr))
}
@lionel-

This comment has been minimized.

Member

lionel- commented Mar 16, 2018

Note that the old rlang::parse_quosure() function has been renamed to parse_quo() and now requires you to specify an environment. The old function used the current environment by default, which was a bug. It should at least be caller_env() (or parent.frame()), but if the string has been passed around several times you should pass the original environment alongside it.

You are right we should document this process somewhere.
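A base-R sketch of why the environment has to travel with the string: the hypothetical apply_filter_base() below parses the string and evaluates it with the data frame's columns in scope and env as the enclosure, which is roughly what a quosure bundles together. With env = parent.frame(), a helper defined in the calling function is found; evaluating against an environment that does not contain it fails. This relies on eval()'s enclos argument, not on rlang.

```r
apply_filter_base <- function(db, filterstring, env = parent.frame()) {
  cond <- parse(text = filterstring)[[1]]
  keep <- eval(cond, envir = db, enclos = env)  # columns first, then `env`
  db[!is.na(keep) & keep, , drop = FALSE]
}

report <- function(db) {
  is_efficient <- function(mpg) mpg > 30        # local helper, not global
  apply_filter_base(db, "is_efficient(mpg)")    # env defaults to report()'s frame
}

nrow(report(mtcars))

# Evaluating against the global environment instead cannot see is_efficient()
try(apply_filter_base(mtcars, "is_efficient(mpg)", env = globalenv()))
```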

@lionel- lionel- reopened this Mar 16, 2018

@lionel- lionel- added the tidyeval label Mar 16, 2018

@efh0888

efh0888 commented May 30, 2018

Hey @lionel-, I noticed that there's no mention of sym() or syms() in the final vignette. Has the best practice changed? My use case is similar to @gwhiteford-cwt's, i.e. allowing the user to choose which column to group_by() in a Shiny app...

@lionel-

Member

lionel- commented May 30, 2018

If the symbols reference data frame columns, you can safely use sym() and syms(). Quosures are for complex expressions so dplyr can find where your functions are defined.

@efh0888

efh0888 commented May 31, 2018

@lionel- thanks for clarifying!

@RolandASc

RolandASc commented Jun 5, 2018

Edit: OK, I guess I'm just repeating what was said a few comments above, although parse_expr() seems simpler since it doesn't need an env.

@lionel-
I stumbled across this issue, so sorry for adding this here. I think it would be helpful if the dplyr programming vignette included an example illustrating something like the code below (i.e. a utility function that pastes together string conditions). This was straightforward with SE semantics.
Apologies if you've covered this elsewhere.

cond <- "mass < 20"
dplyr::filter(dplyr::starwars, !!rlang::parse_expr(cond))

cond <- "mass > 1000"
dplyr::filter(dplyr::starwars, !!rlang::parse_expr(cond))

my_filter <- function(df, cond) {...}
@lionel-

Member

lionel- commented Jun 5, 2018

Agreed, filter() expressions should be a prominent example in the vignette. But it won't involve strings or parse_expr() because we believe meta-programming with strings is sloppy and not robust. Here is one way to do it with quasiquotation:

my_filter <- function(data, col, op, value) {
  op_sym <- sym(op)
  col_sym <- sym(col)

  cond <- expr((!!op_sym)(!!col_sym, !!value))
  filter(data, !!cond)
}
my_filter(starwars, "mass", "<", 20)
my_filter(starwars, "mass", ">", 1000)
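For comparison, the same expression-building step can be sketched in base R with bquote(), where .() stands in for !! and as.name() for sym(). build_cond() is a hypothetical helper that only returns the expression; it does not call filter().

```r
build_cond <- function(col, op, value) {
  # .() in the function position inserts the operator (or function) symbol
  bquote(.(as.name(op))(.(as.name(col)), .(value)))
}

build_cond("mass", "<", 20)               # mass < 20
build_cond("mass", ">", 1000)             # mass > 1000
build_cond("name", "startsWith", "Luke")  # startsWith(name, "Luke")
```

The last call shows the point made below: op can name any binary predicate function, not only an operator.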

This is compatible with op being any binary predicate function, not just a binary operator. There is a bug, though: because we didn't unquote a quosure, local functions won't be available (functions in the global env and on the search path are always available, though). To fix this (which is important if you want my_filter() to be usable within packages), you just wrap cond in a quosure, which you can do manually with:

cond <- new_quosure(cond, parent.frame())

just before the filter() call.

@lionel-

Member

lionel- commented Jun 5, 2018

Though it's better to take the environment as argument in case your function is called from another function rather than by the user. So the complete solution is:

my_filter <- function(data, col, op, value, env = parent.frame()) {
  op_sym <- sym(op)
  col_sym <- sym(col)

  cond <- expr((!!op_sym)(!!col_sym, !!value))
  cond <- new_quosure(cond, env)

  filter(data, !!cond)
}
@RolandASc

RolandASc commented Jun 5, 2018

Ok, thanks, that's a helpful example

@lionel-

Member

lionel- commented Jun 5, 2018

And you can also build your expression with call() instead of quasiquotation. Then op is automatically converted to a symbol:

cond <- call(op, sym(col), value)

That does make things a bit simpler:

my_filter <- function(data, col, op, value, env = parent.frame()) {
  cond <- call(op, sym(col), value)
  cond <- new_quosure(cond, env)

  filter(data, !!cond)
}
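Without rlang, the call() approach has a direct base-R counterpart: as.name() replaces sym(), and evaluating with the data frame as envir and env as enclos stands in for the quosure. my_filter_base() is a hypothetical sketch that subsets with [ rather than calling filter(), and drops rows where the condition is NA, as filter() does.

```r
my_filter_base <- function(data, col, op, value, env = parent.frame()) {
  cond <- call(op, as.name(col), value)           # e.g. mpg > 30
  keep <- eval(cond, envir = data, enclos = env)  # columns first, then env
  data[!is.na(keep) & keep, , drop = FALSE]       # drop NA rows, like filter()
}

call(">", as.name("mpg"), 30)                     # mpg > 30
my_filter_base(mtcars, "mpg", ">", 30)[, c("mpg", "cyl")]
```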
@RolandASc

RolandASc commented Jun 5, 2018

yes I like that!

@lionel- lionel- added this to the 0.3.0 milestone Sep 28, 2018

@lionel- lionel- added the doc 📒 label Oct 3, 2018

@lionel- lionel- removed this from the 0.3.0 milestone Oct 19, 2018
