export functions for setting init values #18

kyleam · 2023-11-16T03:34:38Z

This series implements the nmrec-side for the bbr "inherit inits" feature (metrumresearchgroup/bbr#582, metrumresearchgroup/bbr#608). It adds three functions for setting parameter init values: set_theta, set_omega, and set_sigma.

Series overview

[01/12]  theta parser: fix handling of no-init form with spaces
[02/12]  matrix parser: parse values(diag,odiag) to nested option
[03/12]  define matrix helpers
[04/12]  get_record_option: extract selection logic to helper

Patches 1-4 are prep work. Functionality-wise, patch 3 is the most interesting. It adds some general helper functions that will be used for working the OMEGA and SIGMA input and creating the map from matrix indices to the record options.

[05/12]  add helper for mapping from parameter space to option

This adds the main behind-the-scenes logic for "indexing" parameter records (gathering the required info and making the parameter space to option map). The docstring of create_param_index is the best place to start to get a high-level overview of what's going on there.

[06/12]  add functions to set theta, omega, and sigma init values
[07/12]  set_param: mention prior records if aborting due to length mismatch
[08/12]  set_param: allow overriding default format specifier
[09/12]  set_{omega,sigma}: support resetting representation
[10/12]  set_theta: support discarding bounds
[11/12]  set_param: drop parens if option was reduced to single item

Patches 6-11 introduce the user-exported functions and make various extensions. Patch 9 is probably the one that deserves the most attention (and is something we haven't discussed), but of course any comments on the interface overall would be very appreciated.

[12/12]  README: drop paragraph about higher-level forms

Housekeeping.

I'm marking this as a draft because I'd like to go over it with fresh eyes (especially the docs, which I haven't read through again) and will likely reroll it with minor changes. However, please feel free to start to take a look and comment. @barrettk it's in a state where I think you can begin wiring it up to bbr whenever you're ready.

cc @seth127

Closes #9.

theta can take the form "(low,,up)", with optional white space between the two commas. parse_theta_paren() botches its "has white space?" check because, when at the first comma, it looks for a sequence of two white space elements (something the parser would never produce) rather than a white space element followed by a comma. This leads to an init option that uses the up value. Look for a comma in the second position, as intended.

The values(diag,odiag) form for omega and sigma is stored as an option with "(diag,odiag)" as the value. For the upcoming set_{omega,sigma} feature, we need to be able to set diag and odiag individually. Parse values(...) into a nested option with diag and odiag options.

These will be used in the upcoming set_{omega,sigma} feature, which needs to map values from input matrices to the init values in the parsed records.

The upcoming parameter indexing feature is going to need the core logic of get_record_option() in a few spots, but for internal code there's no reason to go through the name resolution. Pull the logic out into an internal function.

The new function create_param_index() will be used by the upcoming set_{theta,omega,sigma} functions.

One of the original motivations for nmrec was to support a feature in bbr where initial estimates for a new model are inherited from the final estimates of a previous (typically parent) model. To make that possible, add set_theta(), set_omega(), and set_sigma() functions that accept a full parameter value---which is the form that would come from calling bbr::get_{theta,omega,sigma} on the parent model---and then map each of those values to the corresponding option in the parsed records. Two constraints simplify things from the design standpoint: * only support updating parameters that are explicitly written in the control stream This makes it much easier to stick close to the original form because we don't have to drastically alter the control stream and instead are (mostly) working with options that are already there. And that also means we don't need complex logic for determining if two representations are the same, and deciding what's best to use in a given context. * only accept the _full_ parameter Even though we only update the explicit values, we always require the full parameter to be passed, so we don't need provide an explicit interface for selecting a sub-part of vector or matrix. Note, however, that the caller can still effectively update a subset of the explicit values by setting some input values to NA. Closes #9.

barrettk · 2023-11-27T18:35:41Z

@kyleam I think we may want different handling when keeping the bounds. If two values are found, I would think we'd want to add a middle value as opposed to replacing the higher value:

Keeping Bounds or adding new initial value

> test_case$input_nmrec
$PROBLEM theta bounds - maintain bounds; add starting value if not present

$THETA
(0, 2)      ; KA
(0, 3)      ; CL
(0, 10)     ; V2
(0.02)      ; RUVp
(1)         ; RUVa
(0, 0.5, 1) ; F
(-10, -3.92E+00, 10)
> test_case$replacement
[1] 1 2 3 4 5 6 7

> nmrec::set_theta(test_case$input_nmrec, test_case$replacement)
> test_case$input_nmrec
$PROBLEM theta bounds - maintain bounds; add starting value if not present

$THETA
(0, 1)      ; KA
(0, 2)      ; CL
(0, 3)     ; V2
(4)      ; RUVp
(5)         ; RUVa
(0, 6, 1) ; F
(-10, 7, 10)

Referring to the above example, I think we'd want to return the second record:

Discarding Bounds

Additionally, it would be ideal if we could maintain parentheses and any other formatting when discarding the bounds (second record in image below again):

> test_case$input_nmrec
$PROBLEM theta bounds - replace with single value

$THETA
(0, 2)      ; KA
(0, 3)      ; CL
(0, 10)     ; V2
(0.02)      ; RUVp
(1)         ; RUVa
(0, 0.5, 1) ; F
(-10, -3.92E+00, 10)
>     # Copy Thetas
>     nmrec::set_theta(test_case$input_nmrec, test_case$replacement, bounds = "discard")
> test_case$input_nmrec
$PROBLEM theta bounds - replace with single value

$THETA
1      ; KA
2      ; CL
3     ; V2
(4)      ; RUVp
(5)         ; RUVa
6 ; F
7

seth127 · 2023-11-27T19:45:25Z

I looked over this a fair amount, but alone and with @barrettk , and I think it's looking great. I like how the test cases are set up, and they all made sense to me. @barrettk is going to take a look at the test cases he was using before to see if there are any others we might consider adding. He is also going to begin wiring this up to bbr to see how that looks.

I had two other thoughts to discuss as well:

What do you think about trying to recruit some users to look at the test-set-param.R test cases and make sure they are doing what they would expect? They all make sense to me, but NONMEM is so complex and nuanced that I'm sometimes surprised by what the "expected" behavior is, from a user point of view.
I'm very impressed how few constraints we ended up needing, but I wanted to discuss the constraint about priors. Mainly because this is something that @barrettk saw a lot of, in looking through example control streams, so it would be nice to be able to support it. My understanding is that you can pretty easily filter them by looking at $PRIOR, but I'm sure there is some nuance that I'm missing that makes this more complicated. This might merit a separate thread/issue, so feel free to start one if you'd prefer to isolate this discussion elsewhere.

Thanks again. Great work on this.

kyleam · 2023-11-28T15:29:36Z

@barrettk:

I think we may want different handling when keeping the bounds. If two values are found, I would think we'd want to add a middle value as opposed to replacing the higher value:

The two-value form with one comma is (low, init). Why would we want to move to insert a value in between (i.e. move init to up, adding another init) rather than replacing init?

Additionally, it would be ideal if we could maintain parentheses and any other formatting when discarding the bounds (second record in image below again):

Why you think that would be ideal?

This behavior (dropping parens if we've reduced a multi-value options down to one value) comes with ab7fab1. As mentioned in that message, this applies to cases where we are dropping things at the request of a non-default option, so I think it's reasonable to simplify the result.

Given that either way we go in some cases the result may or may not be consistent with neighboring single-value items, I lean toward giving the simplified result. However, if you and @seth127 prefer retaining the parens in these cases, I'm okay with ejecting ab7fab1 from this series.

kyleam · 2023-11-28T15:52:39Z

@seth127:

What do you think about trying to recruit some users to look at the test-set-param.R test cases and make sure they are doing what they would expect? [...]

Sure, that sounds good to me.

[...] but I wanted to discuss the constraint about priors. Mainly because this is something that @barrettk saw a lot of, in looking through example control streams, so it would be nice to be able to support it.

Were those examples NONMEM code from projects, or the examples that ship with NONMEM/bbr?

Is there a reason why someone would prefer the use the old form instead of the more informative names? I don't think there is, in which case my view (84a591b) is that it's not worth the added complication and potential for bugs.

My understanding is that you can pretty easily filter them by looking at $PRIOR, but I'm sure there is some nuance that I'm missing that makes this more complicated.

Even accepting that we can rely on prior records, that seems like non-trivial logic that we'd be better off not adding unless there was a strong reason why someone shouldn't just use the informative names. (However, as far as I understand, a PRIOR subroutine can be used instead of a PRIOR record, so looking at the PRIOR record doesn't cover everything.)

If someone wants to use set_{theta,omega,sigma} (or more likely the bbr wrapper), requiring them to transition the control stream to the informative names seems like a small ask, and their model code will be more readable for it.

barrettk · 2023-11-28T16:04:57Z

The two-value form with one comma is (low, init). Why would we want to move to insert a value in between (i.e. move init to up, adding another init) rather than replacing init?

Ahh I think I misunderstood this specification, though it sounds familiar from years ago when I was first learning about NONMEM. I was thinking the bounds were (low, hi) if two values were provided, but I confirmed the two-form you mentioned.

Why you think that would be ideal?

I would have said it would make the tests nicer/easier using the framework we made in bbr, but since we're moving away from that, my main motivation is just the sentiment of wanting to change as little as possible from the original control stream file, and ideally only update the values. It doesn't really feel like a simplification from a non-technical (perhaps scientist) perspective, though I get what you're saying in your commit message. I would ask for @seth127's opinion on this as well though.

What do you think about passing representation to set_theta as well, and have that argument toggle a similar feature?

Even accepting that we can rely on prior records, that seems like non-trivial logic that we'd be better off not adding unless there was a strong reason why someone shouldn't just use the informative names. (However, as far as I understand, a PRIOR subroutine can be used instead of a PRIOR record, so looking at the PRIOR record doesn't cover everything.)

If someone wants to use set_{theta,omega,sigma} (or more likely the bbr wrapper), requiring them to transition the control stream to the informative names seems like a small ask, and their model code will be more readable for it.

Im slightly conflicted here. In terms of where I saw the examples involving the older method of specifying priors, I saw a notable number of examples. There are also cases where your error would trigger, but priors are not used. Here is one such example, and related test we had. This is because the second $THETA 4 FIX block is actually used as an alternative method for specifying $OMEGAPD. Though I agree with the sentiment 'we should push scientists to be more explicit and be rigid with what NONMEM syntax we support', I think scientists (and especially clients that use bbr) would end up running into this concern somewhat frequently. Given that, my vote would be to support cases where the expected number of replacements doesnt match the found number of records (including priors and the above case I mentioned).

kyleam · 2023-11-28T18:18:35Z

It doesn't really feel like a simplification from a non-technical (perhaps scientist) perspective

Simplification is referring to reducing (X) to X (think of simplifying a math expression), not making any larger claims about what's simpler conceptually.

From a user standpoint, if you want to say (1,2,3), you have no choice but to add the parens, but now the user is requesting 1 and 3 be dropped, and we can't say whether they would have written that as (2) or 2 (and you can find lots of examples one way or the other).

my main motivation is just the sentiment of wanting to change as little as possible from the original control stream file, and ideally only update the values

Given we're touching the value anyway, I think it makes sense to go with the simplified form (but I'm just repeating myself). If others share your preference, we can drop ab7fab1.

What do you think about passing representation to set_theta as well, and have that argument toggle a similar feature?

I'd rather remove the "drop parens" simplification entirely than bloat the interface for a cosmetic detail.

kyleam · 2023-11-28T18:23:00Z

Im slightly conflicted here. In terms of where I saw the examples involving the older method of specifying priors, I saw a notable number of examples.

In real project code or NONMEM examples?

Though I agree with the sentiment 'we should push scientists to be more explicit and be rigid with what NONMEM syntax we support', I think scientists (and especially clients that use bbr) would end up running into this concern somewhat frequently.

I'm not convinced that it's a problem to tell them to switch to the the informative names if they want to use the "inherit inits" feature.

barrettk · 2023-11-28T18:46:02Z

With regards to the parentheses simplification, I'll defer to Seth. I see the argument for and against it, and would honestly lean towards what Seth thinks scientists would prefer/expect (but dont think it's significant enough to require scientist feedback)

In real project code or NONMEM examples?

Both. Im not sure I could back track to see where I got specific examples from, as I just went through the repos I had to find example cases I had seen before. I think could be something worth asking @callistosp or someone about to get a better feel for how frequent this is seen in:

client models (past 3-5 years)
older Metrum projects
newer projects
My assumption is that Metrum scientists use the newer, more explicit version of specifying priors now, but that we have a lot of older models that still use the older method (unsure if we need to care about supporting these/how far back this starts). Im also under the assumption that many clients fall under the same boat, and if anything, are further behind in that transition.

Im of the opinion that if we determined/ballparked ~10% of older models and 20% of client models use the older method, it's worth supporting the older method.

I'm not convinced that it's a problem to tell them to switch to the the informative names if they want to use the "inherit inits" feature.

I think if scientists ran into it somewhat frequently, it would be better to support it. I know we develop these packages with mostly our scientists in mind, but given that we've given bbr demos to outside scientists that are transitioning to R/learning to code, I like the idea of reducing the burden on the scientist when possible. The more we can worry about syntax and they can worry about the math/science, the better. That being said, it seems like it could be worth getting scientist feedback before moving further with any refactor

callistosp · 2023-11-30T15:33:52Z

My two cents:

drop the parens if you are dropping the bounds. They aren't needed at that point and I think most users don't include parens around boundless initial estimates
I think most scientists use the new $THETAP/$OMEGAP naming conventions when they are creating a new model. However, if they are using an older model which was developed before this syntax was introduced, they may still have that old syntax included in their model. The new syntax was introduced in NONMEM 7.3 which was released in 2015. I would assume the vast majority of our users will be using NONMEM 7.3 or newer so there shouldn't by anything preventing them from using the THETAP/OMEGAP syntax.
TL;DR: I don't think it's an unreasonable ask for the user to update any prior blocks to use the THETAP/OMEGAP syntax if they want to use the update inits function.

barrettk · 2023-11-30T16:34:17Z

@callistosp Thanks so much for your feedback! Discussing with Seth later today and that will be helpful in coming to a decision.

tests/testthat/test-set-param.R

R/set-param.R

Control streams can use additional $THETA, $OMEGA, and $SIGMA records to define priors rather than the more informative record names (e.g., $THETAP and $THETAPV). In this case the set_{param} functions will fail when fed a correctly sized parameter. create_param_index() could look at the $PRIOR record and discard some records when indexing, though that would still miss the case where PRIOR subroutine is used instead of $PRIOR record. Supporting the use of overloaded $THETA, $OMEGA, and $SIGMA records isn't worth the complication given that using the informative record names seem to be best practice and transitioning to them is easy. Instead just mention this known incompatibility in the error message.

The values for the set_* functions will typically be floating point numbers (often the final estimates from a previous NONMEM run). The default as.character() behavior uses 15 significant digits, which is too many in the context of initial estimates. For this reason, param_set_value() formats the value with "%.3G". That's probably good for most cases, but expose the format string as an option to let callers override it.

A common source of values for the set_* function will be the final estimates from another model. Regardless of how the model for that run specified OMEGA/SIGMA, NONMEM outputs variances and covariances, so care must be taken to not feed those values into a control stream that specifies something other than variance/covariance. Add an option that lets the caller tell set_omega() and set_sigma() that options for non-default representations should be ejected from the modified records/options. Go with two modes rather than a boolean option because it's not clear whether we'll end up needing to support more variants.

Go with two modes rather than a boolean option because it's not clear whether we'll end up needing to support more variants.

Both discarding theta bounds ("(low,init,up)" => "(up)") and dropping alternative representations from matrix options (e.g., "(SD 1)" => "(1)") can lead to a result that no longer needs parentheses. In general, the set_* functions try to retain all of the original control stream formatting when updating the init values. However, in the cases above, we're already dropping values at the request of a non-default option, so it seems reasonable to simplify the result. Note that the parentheses as left as is when updating a single initial estimate surrounded by parentheses.

The motivation for "getting things in and out of higher-level forms" was the functionality now covered by the new set_{theta,omega,sigma} functions. Those didn't end up needing to construct higher-level forms from a record, and instead just rely on a map between parameter space and the record options.

seth127

This is all looking good to me at this point. I think we have agreement on both points discussed above:

Don't support old method of specifying priors
Drop parens when no longer needed

@kyleam @barrettk did either of you have any outstanding comments or anything on this? If not, I think we're good to merge.

seth127 · 2023-12-04T16:49:48Z

In terms of next steps here, I think we probably want a release sooner than later here, right? My motivation is that we'll want to gate the nmrec version on the accompanying bbr function, so we can't really move forward with metrumresearchgroup/bbr#623 until this is actually released. Do both of you agree that that's the right approach?

kyleam · 2023-12-04T16:59:03Z

@kyleam @barrettk did either of you have any outstanding comments or anything on this?

No, from my standpoint this is ready to merge. (I'll push a dev version bump on top before merging and tag.)

In terms of next steps here, I think we probably want a release sooner than later here, right?

Yes. I'll plan to do a release this week. There are one or two small things I may try to squeeze in (not functional changes), but that won't hold things up.

My motivation is that we'll want to gate the nmrec version on the accompanying bbr function

A version gate is fine with me. Even better in my view would be gating on the presence of one of these functions, but I know we do the version gate plenty.

so we can't really move forward with metrumresearchgroup/bbr#623 until this is actually released.

I'd say you can move forward in terms of getting that CI wired up to test against this version and merging the PR. It's just that a bbr release is held up by this actually being released.

barrettk · 2023-12-04T19:17:46Z

Just to echo what Seth mentioned - this all looks good to me. Feel free to merge whenever

kyleam added 6 commits November 15, 2023 18:55

define matrix helpers

9f08bf3

These will be used in the upcoming set_{omega,sigma} feature, which needs to map values from input matrices to the init values in the parsed records.

get_record_option: extract selection logic to helper

7420524

The upcoming parameter indexing feature is going to need the core logic of get_record_option() in a few spots, but for internal code there's no reason to go through the name resolution. Pull the logic out into an internal function.

add helper for mapping from parameter space to option

e5db1cc

The new function create_param_index() will be used by the upcoming set_{theta,omega,sigma} functions.

kyleam force-pushed the set-params branch from 0b902e4 to 4fba405 Compare November 16, 2023 13:43

kyleam marked this pull request as ready for review November 16, 2023 13:43

kyleam requested a review from barrettk November 16, 2023 13:44

kyleam mentioned this pull request Nov 16, 2023

Two documentation fixes #19

Merged

seth127 mentioned this pull request Nov 28, 2023

inherit_param_estimates() Edge Cases metrumresearchgroup/bbr#585

Closed

2 tasks

seth127 mentioned this pull request Nov 28, 2023

Creation of new model with inital parameter values from final parameter values of prior model metrumresearchgroup/bbr#582

Closed

barrettk reviewed Nov 30, 2023

View reviewed changes

tests/testthat/test-set-param.R Show resolved Hide resolved

seth127 reviewed Nov 30, 2023

View reviewed changes

R/set-param.R Outdated Show resolved Hide resolved

kyleam added 6 commits November 30, 2023 13:57

set_theta: support discarding bounds

9d90939

Go with two modes rather than a boolean option because it's not clear whether we'll end up needing to support more variants.

kyleam force-pushed the set-params branch from 4fba405 to bdc6ead Compare November 30, 2023 22:39

seth127 approved these changes Dec 4, 2023

View reviewed changes

DESCRIPTION: bump dev version

fd0d1f7

kyleam merged commit 9c83c7d into main Dec 4, 2023
2 checks passed

kyleam deleted the set-params branch December 4, 2023 19:47

kyleam mentioned this pull request Jan 24, 2024

Get initial parameter estimates and tweak initial parameter estimates metrumresearchgroup/bbr#635

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

export functions for setting init values #18

export functions for setting init values #18

kyleam commented Nov 16, 2023 •

edited

Loading

barrettk commented Nov 27, 2023 •

edited

Loading

seth127 commented Nov 27, 2023 •

edited

Loading

kyleam commented Nov 28, 2023

kyleam commented Nov 28, 2023

barrettk commented Nov 28, 2023 •

edited

Loading

kyleam commented Nov 28, 2023 •

edited

Loading

kyleam commented Nov 28, 2023

barrettk commented Nov 28, 2023

callistosp commented Nov 30, 2023

barrettk commented Nov 30, 2023

seth127 left a comment

seth127 commented Dec 4, 2023

kyleam commented Dec 4, 2023

barrettk commented Dec 4, 2023

export functions for setting init values #18

export functions for setting init values #18

Conversation

kyleam commented Nov 16, 2023 • edited Loading

Series overview

barrettk commented Nov 27, 2023 • edited Loading

Keeping Bounds or adding new initial value

Discarding Bounds

seth127 commented Nov 27, 2023 • edited Loading

kyleam commented Nov 28, 2023

kyleam commented Nov 28, 2023

barrettk commented Nov 28, 2023 • edited Loading

kyleam commented Nov 28, 2023 • edited Loading

kyleam commented Nov 28, 2023

barrettk commented Nov 28, 2023

callistosp commented Nov 30, 2023

barrettk commented Nov 30, 2023

seth127 left a comment

Choose a reason for hiding this comment

seth127 commented Dec 4, 2023

kyleam commented Dec 4, 2023

barrettk commented Dec 4, 2023

kyleam commented Nov 16, 2023 •

edited

Loading

barrettk commented Nov 27, 2023 •

edited

Loading

seth127 commented Nov 27, 2023 •

edited

Loading

barrettk commented Nov 28, 2023 •

edited

Loading

kyleam commented Nov 28, 2023 •

edited

Loading