Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

export functions for setting init values #18

Merged
merged 13 commits into from
Dec 4, 2023
Merged

export functions for setting init values #18

merged 13 commits into from
Dec 4, 2023

Conversation

kyleam
Copy link
Collaborator

@kyleam kyleam commented Nov 16, 2023

This series implements the nmrec-side for the bbr "inherit inits" feature (metrumresearchgroup/bbr#582, metrumresearchgroup/bbr#608). It adds three functions for setting parameter init values: set_theta, set_omega, and set_sigma.

Series overview

[01/12]  theta parser: fix handling of no-init form with spaces
[02/12]  matrix parser: parse values(diag,odiag) to nested option
[03/12]  define matrix helpers
[04/12]  get_record_option: extract selection logic to helper

Patches 1-4 are prep work. Functionality-wise, patch 3 is the most interesting. It adds some general helper functions that will be used for working the OMEGA and SIGMA input and creating the map from matrix indices to the record options.

[05/12]  add helper for mapping from parameter space to option

This adds the main behind-the-scenes logic for "indexing" parameter records (gathering the required info and making the parameter space to option map). The docstring of create_param_index is the best place to start to get a high-level overview of what's going on there.

[06/12]  add functions to set theta, omega, and sigma init values
[07/12]  set_param: mention prior records if aborting due to length mismatch
[08/12]  set_param: allow overriding default format specifier
[09/12]  set_{omega,sigma}: support resetting representation
[10/12]  set_theta: support discarding bounds
[11/12]  set_param: drop parens if option was reduced to single item

Patches 6-11 introduce the user-exported functions and make various extensions. Patch 9 is probably the one that deserves the most attention (and is something we haven't discussed), but of course any comments on the interface overall would be very appreciated.

[12/12]  README: drop paragraph about higher-level forms

Housekeeping.


I'm marking this as a draft because I'd like to go over it with fresh eyes (especially the docs, which I haven't read through again) and will likely reroll it with minor changes. However, please feel free to start to take a look and comment. @barrettk it's in a state where I think you can begin wiring it up to bbr whenever you're ready.

cc @seth127

Closes #9.

theta can take the form "(low,,up)", with optional white space between
the two commas.  parse_theta_paren() botches its "has white space?"
check because, when at the first comma, it looks for a sequence of two
white space elements (something the parser would never produce) rather
than a white space element followed by a comma.  This leads to an init
option that uses the up value.

Look for a comma in the second position, as intended.
The values(diag,odiag) form for omega and sigma is stored as an option
with "(diag,odiag)" as the value.  For the upcoming set_{omega,sigma}
feature, we need to be able to set diag and odiag individually.

Parse values(...) into a nested option with diag and odiag options.
These will be used in the upcoming set_{omega,sigma} feature, which
needs to map values from input matrices to the init values in the
parsed records.
The upcoming parameter indexing feature is going to need the core
logic of get_record_option() in a few spots, but for internal code
there's no reason to go through the name resolution.

Pull the logic out into an internal function.
The new function create_param_index() will be used by the upcoming
set_{theta,omega,sigma} functions.
One of the original motivations for nmrec was to support a feature in
bbr where initial estimates for a new model are inherited from the
final estimates of a previous (typically parent) model.

To make that possible, add set_theta(), set_omega(), and set_sigma()
functions that accept a full parameter value---which is the form that
would come from calling bbr::get_{theta,omega,sigma} on the parent
model---and then map each of those values to the corresponding option
in the parsed records.

Two constraints simplify things from the design standpoint:

 * only support updating parameters that are explicitly written in the
   control stream

   This makes it much easier to stick close to the original form
   because we don't have to drastically alter the control stream and
   instead are (mostly) working with options that are already there.

   And that also means we don't need complex logic for determining if
   two representations are the same, and deciding what's best to use
   in a given context.

 * only accept the _full_ parameter

   Even though we only update the explicit values, we always require
   the full parameter to be passed, so we don't need provide an
   explicit interface for selecting a sub-part of vector or matrix.

   Note, however, that the caller can still effectively update a
   subset of the explicit values by setting some input values to NA.

Closes #9.
@kyleam kyleam marked this pull request as ready for review November 16, 2023 13:43
@kyleam kyleam requested a review from barrettk November 16, 2023 13:44
@kyleam kyleam mentioned this pull request Nov 16, 2023
@barrettk
Copy link
Collaborator

barrettk commented Nov 27, 2023

@kyleam I think we may want different handling when keeping the bounds. If two values are found, I would think we'd want to add a middle value as opposed to replacing the higher value:

Keeping Bounds or adding new initial value

> test_case$input_nmrec
$PROBLEM theta bounds - maintain bounds; add starting value if not present

$THETA
(0, 2)      ; KA
(0, 3)      ; CL
(0, 10)     ; V2
(0.02)      ; RUVp
(1)         ; RUVa
(0, 0.5, 1) ; F
(-10, -3.92E+00, 10)
> test_case$replacement
[1] 1 2 3 4 5 6 7

> nmrec::set_theta(test_case$input_nmrec, test_case$replacement)
> test_case$input_nmrec
$PROBLEM theta bounds - maintain bounds; add starting value if not present

$THETA
(0, 1)      ; KA
(0, 2)      ; CL
(0, 3)     ; V2
(4)      ; RUVp
(5)         ; RUVa
(0, 6, 1) ; F
(-10, 7, 10)

Referring to the above example, I think we'd want to return the second record:
image

Discarding Bounds

Additionally, it would be ideal if we could maintain parentheses and any other formatting when discarding the bounds (second record in image below again):

> test_case$input_nmrec
$PROBLEM theta bounds - replace with single value

$THETA
(0, 2)      ; KA
(0, 3)      ; CL
(0, 10)     ; V2
(0.02)      ; RUVp
(1)         ; RUVa
(0, 0.5, 1) ; F
(-10, -3.92E+00, 10)
>     # Copy Thetas
>     nmrec::set_theta(test_case$input_nmrec, test_case$replacement, bounds = "discard")
> test_case$input_nmrec
$PROBLEM theta bounds - replace with single value

$THETA
1      ; KA
2      ; CL
3     ; V2
(4)      ; RUVp
(5)         ; RUVa
6 ; F
7

image

@seth127
Copy link
Collaborator

seth127 commented Nov 27, 2023

I looked over this a fair amount, but alone and with @barrettk , and I think it's looking great. I like how the test cases are set up, and they all made sense to me. @barrettk is going to take a look at the test cases he was using before to see if there are any others we might consider adding. He is also going to begin wiring this up to bbr to see how that looks.

I had two other thoughts to discuss as well:

  1. What do you think about trying to recruit some users to look at the test-set-param.R test cases and make sure they are doing what they would expect? They all make sense to me, but NONMEM is so complex and nuanced that I'm sometimes surprised by what the "expected" behavior is, from a user point of view.
  2. I'm very impressed how few constraints we ended up needing, but I wanted to discuss the constraint about priors. Mainly because this is something that @barrettk saw a lot of, in looking through example control streams, so it would be nice to be able to support it. My understanding is that you can pretty easily filter them by looking at $PRIOR, but I'm sure there is some nuance that I'm missing that makes this more complicated. This might merit a separate thread/issue, so feel free to start one if you'd prefer to isolate this discussion elsewhere.

Thanks again. Great work on this.

@kyleam
Copy link
Collaborator Author

kyleam commented Nov 28, 2023

@barrettk:

I think we may want different handling when keeping the bounds. If two values are found, I would think we'd want to add a middle value as opposed to replacing the higher value:

The two-value form with one comma is (low, init). Why would we want to move to insert a value in between (i.e. move init to up, adding another init) rather than replacing init?

Additionally, it would be ideal if we could maintain parentheses and any other formatting when discarding the bounds (second record in image below again):

Why you think that would be ideal?

This behavior (dropping parens if we've reduced a multi-value options down to one value) comes with ab7fab1. As mentioned in that message, this applies to cases where we are dropping things at the request of a non-default option, so I think it's reasonable to simplify the result.

Given that either way we go in some cases the result may or may not be consistent with neighboring single-value items, I lean toward giving the simplified result. However, if you and @seth127 prefer retaining the parens in these cases, I'm okay with ejecting ab7fab1 from this series.

@kyleam
Copy link
Collaborator Author

kyleam commented Nov 28, 2023

@seth127:

What do you think about trying to recruit some users to look at the test-set-param.R test cases and make sure they are doing what they would expect? [...]

Sure, that sounds good to me.

[...] but I wanted to discuss the constraint about priors. Mainly because this is something that @barrettk saw a lot of, in looking through example control streams, so it would be nice to be able to support it.

Were those examples NONMEM code from projects, or the examples that ship with NONMEM/bbr?

Is there a reason why someone would prefer the use the old form instead of the more informative names? I don't think there is, in which case my view (84a591b) is that it's not worth the added complication and potential for bugs.

My understanding is that you can pretty easily filter them by looking at $PRIOR, but I'm sure there is some nuance that I'm missing that makes this more complicated.

Even accepting that we can rely on prior records, that seems like non-trivial logic that we'd be better off not adding unless there was a strong reason why someone shouldn't just use the informative names. (However, as far as I understand, a PRIOR subroutine can be used instead of a PRIOR record, so looking at the PRIOR record doesn't cover everything.)

If someone wants to use set_{theta,omega,sigma} (or more likely the bbr wrapper), requiring them to transition the control stream to the informative names seems like a small ask, and their model code will be more readable for it.

@barrettk
Copy link
Collaborator

barrettk commented Nov 28, 2023

The two-value form with one comma is (low, init). Why would we want to move to insert a value in between (i.e. move init to up, adding another init) rather than replacing init?

Ahh I think I misunderstood this specification, though it sounds familiar from years ago when I was first learning about NONMEM. I was thinking the bounds were (low, hi) if two values were provided, but I confirmed the two-form you mentioned.

Why you think that would be ideal?

I would have said it would make the tests nicer/easier using the framework we made in bbr, but since we're moving away from that, my main motivation is just the sentiment of wanting to change as little as possible from the original control stream file, and ideally only update the values. It doesn't really feel like a simplification from a non-technical (perhaps scientist) perspective, though I get what you're saying in your commit message. I would ask for @seth127's opinion on this as well though.

  • What do you think about passing representation to set_theta as well, and have that argument toggle a similar feature?

Even accepting that we can rely on prior records, that seems like non-trivial logic that we'd be better off not adding unless there was a strong reason why someone shouldn't just use the informative names. (However, as far as I understand, a PRIOR subroutine can be used instead of a PRIOR record, so looking at the PRIOR record doesn't cover everything.)

If someone wants to use set_{theta,omega,sigma} (or more likely the bbr wrapper), requiring them to transition the control stream to the informative names seems like a small ask, and their model code will be more readable for it.

Im slightly conflicted here. In terms of where I saw the examples involving the older method of specifying priors, I saw a notable number of examples. There are also cases where your error would trigger, but priors are not used. Here is one such example, and related test we had. This is because the second $THETA 4 FIX block is actually used as an alternative method for specifying $OMEGAPD. Though I agree with the sentiment 'we should push scientists to be more explicit and be rigid with what NONMEM syntax we support', I think scientists (and especially clients that use bbr) would end up running into this concern somewhat frequently. Given that, my vote would be to support cases where the expected number of replacements doesnt match the found number of records (including priors and the above case I mentioned).

@kyleam
Copy link
Collaborator Author

kyleam commented Nov 28, 2023

It doesn't really feel like a simplification from a non-technical (perhaps scientist) perspective

Simplification is referring to reducing (X) to X (think of simplifying a math expression), not making any larger claims about what's simpler conceptually.

From a user standpoint, if you want to say (1,2,3), you have no choice but to add the parens, but now the user is requesting 1 and 3 be dropped, and we can't say whether they would have written that as (2) or 2 (and you can find lots of examples one way or the other).

my main motivation is just the sentiment of wanting to change as little as possible from the original control stream file, and ideally only update the values

Given we're touching the value anyway, I think it makes sense to go with the simplified form (but I'm just repeating myself). If others share your preference, we can drop ab7fab1.

What do you think about passing representation to set_theta as well, and have that argument toggle a similar feature?

I'd rather remove the "drop parens" simplification entirely than bloat the interface for a cosmetic detail.

@kyleam
Copy link
Collaborator Author

kyleam commented Nov 28, 2023

Im slightly conflicted here. In terms of where I saw the examples involving the older method of specifying priors, I saw a notable number of examples.

In real project code or NONMEM examples?

Though I agree with the sentiment 'we should push scientists to be more explicit and be rigid with what NONMEM syntax we support', I think scientists (and especially clients that use bbr) would end up running into this concern somewhat frequently.

I'm not convinced that it's a problem to tell them to switch to the the informative names if they want to use the "inherit inits" feature.

@barrettk
Copy link
Collaborator

With regards to the parentheses simplification, I'll defer to Seth. I see the argument for and against it, and would honestly lean towards what Seth thinks scientists would prefer/expect (but dont think it's significant enough to require scientist feedback)

In real project code or NONMEM examples?

Both. Im not sure I could back track to see where I got specific examples from, as I just went through the repos I had to find example cases I had seen before. I think could be something worth asking @callistosp or someone about to get a better feel for how frequent this is seen in:

  • client models (past 3-5 years)
  • older Metrum projects
  • newer projects
    My assumption is that Metrum scientists use the newer, more explicit version of specifying priors now, but that we have a lot of older models that still use the older method (unsure if we need to care about supporting these/how far back this starts). Im also under the assumption that many clients fall under the same boat, and if anything, are further behind in that transition.

Im of the opinion that if we determined/ballparked ~10% of older models and 20% of client models use the older method, it's worth supporting the older method.

I'm not convinced that it's a problem to tell them to switch to the the informative names if they want to use the "inherit inits" feature.

I think if scientists ran into it somewhat frequently, it would be better to support it. I know we develop these packages with mostly our scientists in mind, but given that we've given bbr demos to outside scientists that are transitioning to R/learning to code, I like the idea of reducing the burden on the scientist when possible. The more we can worry about syntax and they can worry about the math/science, the better. That being said, it seems like it could be worth getting scientist feedback before moving further with any refactor

@callistosp
Copy link

My two cents:

  • drop the parens if you are dropping the bounds. They aren't needed at that point and I think most users don't include parens around boundless initial estimates

  • I think most scientists use the new $THETAP/$OMEGAP naming conventions when they are creating a new model. However, if they are using an older model which was developed before this syntax was introduced, they may still have that old syntax included in their model. The new syntax was introduced in NONMEM 7.3 which was released in 2015. I would assume the vast majority of our users will be using NONMEM 7.3 or newer so there shouldn't by anything preventing them from using the THETAP/OMEGAP syntax.
    TL;DR: I don't think it's an unreasonable ask for the user to update any prior blocks to use the THETAP/OMEGAP syntax if they want to use the update inits function.

@barrettk
Copy link
Collaborator

@callistosp Thanks so much for your feedback! Discussing with Seth later today and that will be helpful in coming to a decision.

R/set-param.R Outdated Show resolved Hide resolved
Control streams can use additional $THETA, $OMEGA, and $SIGMA records
to define priors rather than the more informative record names (e.g.,
$THETAP and $THETAPV).  In this case the set_{param} functions will
fail when fed a correctly sized parameter.

create_param_index() could look at the $PRIOR record and discard some
records when indexing, though that would still miss the case where
PRIOR subroutine is used instead of $PRIOR record.

Supporting the use of overloaded $THETA, $OMEGA, and $SIGMA records
isn't worth the complication given that using the informative record
names seem to be best practice and transitioning to them is easy.
Instead just mention this known incompatibility in the error message.
The values for the set_* functions will typically be floating point
numbers (often the final estimates from a previous NONMEM run).  The
default as.character() behavior uses 15 significant digits, which is
too many in the context of initial estimates.

For this reason, param_set_value() formats the value with "%.3G".
That's probably good for most cases, but expose the format string as
an option to let callers override it.
A common source of values for the set_* function will be the final
estimates from another model.  Regardless of how the model for that
run specified OMEGA/SIGMA, NONMEM outputs variances and covariances,
so care must be taken to not feed those values into a control stream
that specifies something other than variance/covariance.

Add an option that lets the caller tell set_omega() and set_sigma()
that options for non-default representations should be ejected from
the modified records/options.

Go with two modes rather than a boolean option because it's not clear
whether we'll end up needing to support more variants.
Go with two modes rather than a boolean option because it's not clear
whether we'll end up needing to support more variants.
Both discarding theta bounds ("(low,init,up)" => "(up)") and dropping
alternative representations from matrix options (e.g., "(SD 1)" =>
"(1)") can lead to a result that no longer needs parentheses.

In general, the set_* functions try to retain all of the original
control stream formatting when updating the init values.  However, in
the cases above, we're already dropping values at the request of a
non-default option, so it seems reasonable to simplify the result.

Note that the parentheses as left as is when updating a single initial
estimate surrounded by parentheses.
The motivation for "getting things in and out of higher-level forms"
was the functionality now covered by the new set_{theta,omega,sigma}
functions.  Those didn't end up needing to construct higher-level
forms from a record, and instead just rely on a map between parameter
space and the record options.
Copy link
Collaborator

@seth127 seth127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all looking good to me at this point. I think we have agreement on both points discussed above:

  • Don't support old method of specifying priors
  • Drop parens when no longer needed

@kyleam @barrettk did either of you have any outstanding comments or anything on this? If not, I think we're good to merge.

@seth127
Copy link
Collaborator

seth127 commented Dec 4, 2023

In terms of next steps here, I think we probably want a release sooner than later here, right? My motivation is that we'll want to gate the nmrec version on the accompanying bbr function, so we can't really move forward with metrumresearchgroup/bbr#623 until this is actually released. Do both of you agree that that's the right approach?

@kyleam
Copy link
Collaborator Author

kyleam commented Dec 4, 2023

@kyleam @barrettk did either of you have any outstanding comments or anything on this?

No, from my standpoint this is ready to merge. (I'll push a dev version bump on top before merging and tag.)

In terms of next steps here, I think we probably want a release sooner than later here, right?

Yes. I'll plan to do a release this week. There are one or two small things I may try to squeeze in (not functional changes), but that won't hold things up.

My motivation is that we'll want to gate the nmrec version on the accompanying bbr function

A version gate is fine with me. Even better in my view would be gating on the presence of one of these functions, but I know we do the version gate plenty.

so we can't really move forward with metrumresearchgroup/bbr#623 until this is actually released.

I'd say you can move forward in terms of getting that CI wired up to test against this version and merging the PR. It's just that a bbr release is held up by this actually being released.

@barrettk
Copy link
Collaborator

barrettk commented Dec 4, 2023

Just to echo what Seth mentioned - this all looks good to me. Feel free to merge whenever

@kyleam kyleam merged commit 9c83c7d into main Dec 4, 2023
2 checks passed
@kyleam kyleam deleted the set-params branch December 4, 2023 19:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

higher-level helpers for parameters
4 participants