Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tagged parameters and inventory version 2 #1078

Merged
merged 25 commits into from
Jul 25, 2023
Merged

Tagged parameters and inventory version 2 #1078

merged 25 commits into from
Jul 25, 2023

Conversation

kylebaron
Copy link
Collaborator

@kylebaron kylebaron commented Apr 12, 2023

Summary

Models have parameters that can be updated through the input data set; mrgsolve takes care of this by matching parameter names against the names in the data set.

One problem is detecting when the user intended to pass in new parameter values via the data set, but misspelled or otherwise misspecified the name of the parameter (e.g. the data says WGT but the model is looking for WT). Because we bypass data set column names that we don't recognize, it's impossible to tell when the user made a mistake in the data.

This PR expands an existing feature where we can tag or label some of the parameters in a model so that they can be (optionally) checked in the input data set. A warning or error is generated when we check the data set names and don't find all the parameters we were expecting based on these tags. This check is optional, so users still have the freedom to exclude certain parameters from the data set and this is still perfectly reasonable to do.

The existing functionality was to add a @covariates attribute so the user could look into the model and learn which parameters are considered covariates. This will have to stay in place and, by default, we'll check data for parameters that have this attribute.

We also introduce a more general @input attribute as well as a new block $INPUT. This is more general and sort of mimics NONMEM syntax. $INPUT is just like $PARAM but all the parameters get the @input attribute and the data set will be checked for these tagged parameters when requested.

We also allow users to pass through a user-defined @tag for a parameter block. This gives quite granular control over which parameter will be required in which circumstances.

Blocks of parameters can have multiple tags or attributes.

There are two new functions:

  • check_data_names() which takes the model object and a data set and does the inventory and either warns or errors
  • param_tags() returns parameters and their tags for the user to see

By default, we warn when (1) a tagged parameter is missing in data or (2) no parameters have ben tagged. Use mode = "error" to generate errors in these situations. The user can also choose inform to receive a message when a tagged parameter is missing. I wanted this option because there is so much variability / flexibility in the input data sets, it would be reasonable to want to be simply notified about the parameter reconciliation and let the user decide ... no need to raise a red flag about it.

I think eventually there will be a "safe" simulation mode that will automatically call check_data_names() with the simulation call. Going to wait on that for now.

Some models in the internal model library have been updated to utilize some of these features.

Example

library(mrgsolve)
#> 
#> Attaching package: 'mrgsolve'
#> The following object is masked from 'package:stats':
#> 
#>     filter
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

mod <- modlib("popex")
#> Building popex ...
#> done.

data <- expand.ev(amt = 1, WGT = 70)

check_data_names(data, mod)
#> Warning: Could not find the following parameter names in `data`:
#> • WT (covariates)
#> ℹ Please check names in `data` against names in the parameter list.

try(check_data_names(data, mod, strict = TRUE))
#> Error in check_data_names(data, mod, strict = TRUE) : 
#>   Could not find the following parameter names in `data`:
#> • WT (covariates)
#> ✖ Please check names in `data` against names in the parameter list.

check_data_names(data, mod, inform = TRUE)
#> Could not find the following parameter names in `data`:
#> • WT (covariates)

Created on 2023-07-18 with reprex v2.0.2

@kylebaron kylebaron linked an issue Jul 7, 2023 that may be closed by this pull request
@kylebaron kylebaron self-assigned this Jul 15, 2023
@kylebaron kylebaron added this to the v1.1.0 milestone Jul 15, 2023
@kylebaron kylebaron added this to Evaluate in Development Jul 15, 2023
@kylebaron kylebaron requested a review from kyleam July 19, 2023 22:11
Copy link
Contributor

@kyleam kyleam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The background and motivation makes sense to me.

Code-wise I didn't spot any issues. (In addition to a read through, I stepped through and experimented with check_data_names a good amount.)

R/inven.R Outdated
#' `"tag1,tag2"`).
#' @param strict if `TRUE`, generate an error when `data` is missing some
#' expected column names; otherwise, issue a warning.
#' @param inform if `TRUE`, inform the user when `data` is missing some
Copy link
Contributor

@kyleam kyleam Jul 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a user, I tend to find dependent boolean flags like strict and inform more confusing than having one option with multiple modes: if_missing = c("error", "warn", "inform"). Not sure how many users feel the same though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand; I wondered about that approach for check_covariates and check_input too, but wanted to force the user to opt out of both if they really wanted.

inform was a late addition and I just put it in as another boolean. I can pull those into a single argument.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been re-factored; good call.

Copy link
Contributor

@kyleam kyleam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update looks good to me. Thanks.

@kylebaron kylebaron merged commit c13335f into main Jul 25, 2023
2 checks passed
Development automation moved this from Evaluate to Done Jul 25, 2023
@kylebaron kylebaron deleted the strict-inventory branch July 25, 2023 21:00
@kylebaron kylebaron mentioned this pull request Aug 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development
  
Done
Development

Successfully merging this pull request may close these issues.

Inventory expected columns in data against param
2 participants