double-dot notation for formula #41

Closed
kgoldfeld opened this issue Sep 18, 2020 · 4 comments · Fixed by #58

@kgoldfeld
Owner

The double-dot notation for external variable reference is awesome. A couple of things.

(1) In my workflow, I often put the data definitions at the top, so it seems natural to sometimes reference an external variable in a definition first and only assign it further down.

 #--- this doesn't work but is the way I like to do things

def <- defData(varname = "age", formula=10, dist = "nonrandom")
def <- defData(def, varname="agemult", 
               formula="age * ..age_effect", dist="nonrandom")
#> Error: Escaped variables referenced not defined (or not numeric): age_effect

age_effect <- 3
genData(2, def)
#>    id age
#> 1:  1  10
#> 2:  2  10

  #--- this does work as I am sure you know

age_effect <- 3

def <- defData(varname = "age", formula=10, dist = "nonrandom")
def <- defData(def, varname="agemult", 
               formula="age * ..age_effect", dist="nonrandom")

genData(2, def)
#>    id age agemult
#> 1:  1  10      30
#> 2:  2  10      30
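
The difference between the two cases in (1) comes down to when `..age_effect` is looked up: at definition time or at generation time. Below is a minimal base R sketch of the deferred variant. The helper `eval_dotdot()` is hypothetical and is not part of simstudy; it only illustrates storing the formula as a string and resolving the `..` references when the formula is finally evaluated.

```r
# Hypothetical helper (NOT a simstudy function): resolve `..name` references
# in a formula string when the formula is evaluated, not when it is stored.
eval_dotdot <- function(formula, data, env = parent.frame()) {
  force(env)

  # Find every `..name` token in the formula string
  tokens <- unique(regmatches(
    formula,
    gregexpr("\\.\\.[A-Za-z_][A-Za-z0-9._]*", formula)
  )[[1]])
  ext_names <- sub("^\\.\\.", "", tokens)

  # Look up the external variables only now; mget() errors if one is missing
  ext_vals <- mget(ext_names, envir = env, inherits = TRUE)
  names(ext_vals) <- tokens

  # Evaluate the formula with data columns and external values in scope
  eval(parse(text = formula), c(as.list(data), ext_vals))
}

formula <- "age * ..age_effect"   # stored before age_effect exists
age_effect <- 3                   # assigned later
eval_dotdot(formula, data.frame(age = c(10, 10)))
#> [1] 30 30
```

With this ordering the definition can sit at the top of a script, and a missing variable is only reported once data are actually generated.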

(2) The strength of the double-dot notation is how easy it makes dynamic data definition. There are at least two ways to generate different data sets in R under different assumptions. The first, a for loop, works fine, but I find it a little clunky and a bit less amenable to parallelization. (By the way, this is exactly the kind of case where it doesn't make sense to introduce age_effect before creating def. In fact, it isn't possible, except by creating a dummy version of age_effect, which is not so appealing. But maybe there's no way around this.)

  #--- this works but is a little clunky

list_of_data <- list()

age_effects <- c(0, 5, 10)
for (i in seq_along(age_effects)) {
  age_effect <- age_effects[i]
  list_of_data[[i]] <- genData(2, def)  
}

list_of_data
#> [[1]]
#>    id age agemult
#> 1:  1  10       0
#> 2:  2  10       0
#> 
#> [[2]]
#>    id age agemult
#> 1:  1  10      50
#> 2:  2  10      50
#> 
#> [[3]]
#>    id age agemult
#> 1:  1  10     100
#> 2:  2  10     100

I prefer to use lapply or mclapply - it seems more compact and faster than the for loop, especially when doing thousands of replications. But it doesn't work, and I'm not sure how hard it would be to make it work.

  #--- this doesn't work but is how I like to do things

myreplicate <- function(x, def) {
  age_effect <- x
  genData(2, def)
}

lapply(c(0, 5, 10), function(x) myreplicate(x, def))
#> [[1]]
#>    id age agemult
#> 1:  1  10     100
#> 2:  2  10     100
#> 
#> [[2]]
#>    id age agemult
#> 1:  1  10     100
#> 2:  2  10     100
#> 
#> [[3]]
#>    id age agemult
#> 1:  1  10     100
#> 2:  2  10     100
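
One plausible reading of this output, stated here as an assumption and not as something confirmed in the thread: the `age_effect <- x` inside `myreplicate()` is local to that call, while the lookup still finds the global `age_effect`, which the earlier for loop left at 10. The base R sketch below reproduces that pattern without simstudy; `gen_value()` is a hypothetical stand-in for `genData()`.

```r
age_effect <- 10   # value left behind by the last iteration of the for loop

# Hypothetical stand-in for genData(): it finds age_effect in the environment
# where gen_value() was defined, not in the environment of its caller.
gen_value <- function() {
  10 * age_effect
}

myreplicate_sketch <- function(x) {
  age_effect <- x   # local to this call, so gen_value() never sees it
  gen_value()
}

res <- unlist(lapply(c(0, 5, 10), myreplicate_sketch))
res
#> [1] 100 100 100
```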
@kgoldfeld kgoldfeld added the feature feature request or enhancement label Sep 18, 2020
@assignUser
Collaborator

assignUser commented Sep 18, 2020 via email

@assignUser
Collaborator

Yeah, we can just move these and similar blocks in the other .check* functions into a separate function and use the new error system too.

simstudy/R/define_data.R

Lines 773 to 805 in bfcecd6

naFormFuncs <-
  is.na(mget(
    formFuncs,
    ifnotfound = NA,
    mode = "function",
    envir = parent.frame(),
    inherits = TRUE
  ))
if (any(naFormFuncs)) {
  stop(paste(
    "Functions(s) referenced not defined:",
    paste(formFuncs[naFormFuncs], collapse = ", ")
  ), call. = FALSE)
}
naDotVars <-
  is.na(mget(
    sub("..", "", formVars[dotVarsBol]),
    ifnotfound = NA,
    mode = "numeric",
    envir = parent.frame(),
    inherits = TRUE
  ))
if (any(naDotVars)) {
  stop(paste(
    "Escaped variables referenced not defined (or not numeric):",
    paste(names(naDotVars), collapse = ", ")
  ),
  call. = FALSE
  )
}
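
To make the check above easier to follow, here is a small standalone sketch of the same `mget()` pattern in base R. The function `check_external()` and the variable names are made up for illustration and do not appear in simstudy; the point is only that `mode = "numeric"` together with `ifnotfound = NA` flags both undefined and non-numeric variables.

```r
# Hypothetical helper (NOT from simstudy): return the names that are either
# undefined or not numeric when looked up starting from `env`.
check_external <- function(var_names, env = parent.frame()) {
  force(env)
  found <- mget(
    var_names,
    ifnotfound = NA,     # placeholder for anything that cannot be found
    mode = "numeric",    # objects of any other mode count as "not found"
    envir = env,
    inherits = TRUE      # keep searching enclosing environments
  )
  not_found <- is.na(found)
  names(not_found)[not_found]
}

age_effect <- 3
effect_label <- "three"  # defined, but not numeric

check_external(c("age_effect", "effect_label", "undefined_effect"))
#> [1] "effect_label"     "undefined_effect"
```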

@assignUser assignUser self-assigned this Sep 19, 2020
@kgoldfeld
Owner Author

I noticed you never reacted to my second comment - was that intentional? Too hard to deal with?

@assignUser
Collaborator

:D I will check it out once we have moved the checks, but I assume it has something to do with execution environments...
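
As a rough illustration of the execution-environment idea, and not a description of how the eventual fix was implemented, the base R sketch below starts the lookup from the caller's environment. With that change, a variable assigned inside an `lapply()` worker is found. Both function names are hypothetical.

```r
# Hypothetical generator: begin the lookup in the caller's environment
# and let it fall through to enclosing environments if needed.
gen_value_env <- function(env = parent.frame()) {
  force(env)
  10 * get("age_effect", envir = env, inherits = TRUE)
}

replicate_env <- function(x) {
  age_effect <- x    # local variable in this call's environment
  gen_value_env()    # parent.frame() resolves to this call's environment
}

res_env <- unlist(lapply(c(0, 5, 10), replicate_env))
res_env
#> [1]   0  50 100
```

The same lookup still falls back to a global `age_effect` when the caller has no local one, so the for-loop style keeps working under this sketch.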

@assignUser assignUser added this to the 0.2.0 milestone Sep 22, 2020
This was referenced Sep 25, 2020
assignUser added a commit that referenced this issue Oct 2, 2020
This allows for non defined external vars to be used in data defs
fixes double-dot notation for formula #41
@assignUser assignUser linked a pull request Oct 3, 2020 that will close this issue