Feature request: Data cleaner and standardization functions #52

Closed
billdenney opened this issue Sep 29, 2022 · 11 comments

Comments

@billdenney
Contributor

I know that nlmixr2 (mainly in rxode2, I think) cleans the data before the model is sent to the integrator, but I don't know where this is done overall. I also think that there would be additional needs within nlmixr2est. It would be helpful to simplify, standardize, and centralize this data cleaning.

Related to #45

@billdenney
Contributor Author

My overall thought is that there would likely be two functions:

getStandardColumnNames <- function(data, cols) {
...
}

setStandardColumnNames <- function(data, cols) {
...
}

If cols is missing, the "get" version returns all of the columns that nlmixr2 knows about, with standardized names (e.g. "CMT" instead of "CmT"), and the "set" version renames those columns to their standardized names. If cols is present, each function operates only on the requested columns.
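As an illustration of the proposal above, here is a minimal sketch of what the two functions might look like. It is not the implementation that ended up in any package: the `standardCols` vector is a hypothetical stand-in for the columns nlmixr2 knows about, and the case-insensitive matching rule and the named-vector return value are assumptions made for the example.

```r
# Hypothetical list of standardized column names; the set that
# nlmixr2/rxode2 actually recognizes may differ.
standardCols <- c("ID", "TIME", "DV", "AMT", "EVID", "CMT",
                  "RATE", "DUR", "II", "ADDL", "SS", "MDV")

# Return a named character vector: names are the standardized column
# names, values are the matching column names as they appear in `data`.
getStandardColumnNames <- function(data, cols) {
  if (missing(cols)) cols <- standardCols
  cols <- toupper(cols)
  dataNames <- names(data)
  # Case-insensitive match; if several columns differ only by case,
  # the first one wins.
  idx <- match(cols, toupper(dataNames))
  found <- !is.na(idx)
  stats::setNames(dataNames[idx[found]], cols[found])
}

# Rename the matched columns of `data` to their standardized names.
setStandardColumnNames <- function(data, cols) {
  mapping <- if (missing(cols)) {
    getStandardColumnNames(data)
  } else {
    getStandardColumnNames(data, cols)
  }
  names(data)[match(mapping, names(data))] <- names(mapping)
  data
}

# Example data with non-standard capitalization
d <- data.frame(id = 1:2, TiMe = c(0, 1), CmT = c("a", "b"), wt = c(70, 80))
getStandardColumnNames(d)               # maps ID, TIME, CMT to id, TiMe, CmT
names(setStandardColumnNames(d))        # "ID" "TIME" "CMT" "wt"
names(setStandardColumnNames(d, "cmt")) # "id" "TiMe" "CMT" "wt"
```

Returning the mapping from the "get" version keeps the original names available, so a caller can still find the column the user supplied after the data have been renamed.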

@mattfidler
Member

I think babelmixr2 or even a separate package is a fine place to put something like this.

I am unsure what value this adds beyond what babelmixr2 already provides.

@billdenney billdenney transferred this issue from nlmixr2/rxode2 Sep 30, 2022
@billdenney
Contributor Author

Moved issue from rxode2 to babelmixr2

@billdenney
Contributor Author

@mattfidler, as I'm trying to implement this for the PKNCA linkage, I still want some of the original information that isn't present after the direct data conversion. As an example, if the user gives an ID column as "id", I'd like to know that and to be able to use that column for the subject identifier. The rationale is that I'd like to ensure that I can have a text subject identifier (e.g. "Study-001-Site-002-Subject-0003") instead of a cleaned, numeric ID.

The PKNCA connection would still work to give initial estimates if I use the cleaned, numeric ID. But, if someone is wanting to track back to "why is this starting clearance 5 when I thought it would be 3", they would not have a link between the "ID" column that PKNCA used and the "ID" column that they originally gave.

So, I think that column name mapping function would still be helpful. I'll make a first pass, and please let me know if it should do something else to improve it.

@mattfidler
Member

I think it could be easy enough to use the nlmixrRowNumber and a conversion function to help with this. It could possibly be a thin reference to the merged, standardized dataset.

@mattfidler
Member

I am still not clear what this provides. I am sure when you get to it I will understand 😄

@billdenney
Contributor Author

😄

I just linked to the function I'm thinking of. The value for the PKNCA link is:

  • The user provides the id columns with unique subject identifiers (like "STUDY1-001-1001"). For interpretability in NCA results, it would be best to have those identifiers reported in the NCA.
    • Or maybe, they give the name of the drug in the cmt column (realizing that it must start with a letter and cannot have dashes or anything that doesn't map to an R name in it).
  • The cleaned column name for id would be ID. PKNCA needs the subject identifier, so the nlmixr2-PKNCA link would go to the ID column and get numbers instead of unique subject identifiers.

Overall, I'm wanting the NCA results to be interpretable so that when they're printed out, the user can have them available for comparison to the final modeled results.
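To make the interpretability concern concrete, the sketch below keeps a lookup table between the user's text identifiers and a cleaned numeric ID, then joins it back onto results keyed by the numeric ID. Everything here is hypothetical: the data, the `idLookup` and `ncaResult` names, and the rule of numbering subjects in order of first appearance are assumptions for illustration, not how nlmixr2, babelmixr2, or PKNCA do it.

```r
# Hypothetical user data with text subject identifiers
dat <- data.frame(
  id   = c("STUDY1-001-1001", "STUDY1-001-1001", "STUDY1-001-1002"),
  time = c(0, 1, 0),
  dv   = c(0, 5.2, 0),
  stringsAsFactors = FALSE
)

# Cleaned numeric ID, numbered in order of first appearance (an assumption)
idLevels <- unique(dat$id)
dat$ID <- match(dat$id, idLevels)

# Lookup table linking the cleaned ID back to the original identifier
idLookup <- data.frame(
  ID = seq_along(idLevels),
  id = idLevels,
  stringsAsFactors = FALSE
)

# Hypothetical per-subject results keyed by the cleaned numeric ID
ncaResult <- data.frame(ID = c(1, 2), cl = c(5, 3))

# Join the original identifiers back on so the printed results are readable
labeled <- merge(ncaResult, idLookup, by = "ID")
labeled
```

With a lookup like this, a user asking why a starting clearance is 5 rather than 3 can trace the value back to the subject identifier they originally supplied.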

@billdenney
Contributor Author

I have an issue with the "nlmixrRowNums" column where all rows are set to 1. I'm having trouble finding where it's set. Can you please help me find where it's set?

@billdenney
Contributor Author

Thanks. My issue was that my input was a tibble, so the length was 1. I made a PR to address that.
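The thread does not show the line of code that was at fault, so the following is only a guess at the kind of behavior involved. A tibble keeps single-column subsetting with `[` as a one-column table instead of dropping to a vector, and the length of a one-column table is 1 (its number of columns). The sketch reproduces that with base R by passing `drop = FALSE`, which mimics the tibble behavior without needing the tibble package.

```r
dat <- data.frame(id = c("a", "a", "b"), time = c(0, 1, 0))

# data.frame default: single-column subsetting drops to a vector
asVector <- dat[, "id"]
# What a tibble returns for the same call: still a one-column table
asOneCol <- dat[, "id", drop = FALSE]

length(asVector)    # 3, the number of rows
length(asOneCol)    # 1, the number of columns
seq_along(asOneCol) # 1, so every row would be assigned row number 1

# Counting rows with nrow() gives the same answer for both input types
rowNums <- seq_len(nrow(dat))
rowNums             # 1 2 3
```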

@billdenney
Contributor Author

This is handled now.
