small cleans by Nowosad · Pull Request #6 · goergen95/CAST

Nowosad · 2026-03-19T12:43:35Z

Hi Darius,

I started reviewing the current state of the code with the focus on aoa and trainDI. I have not yet looked at knn-engine, because my brain is fried enough with trying to understand the code.

I made some small changes to the code, but they are only cosmetic -- no real action changes.

I also have the first set of comments/suggestions:

General:

Start internal funs with .

AOA:

You are modifying "train" inside drop_unknown_levels (or rather pretending to modify, because the result is not returned)
Maybe we can simplify drop_unknown_levels to (it gives the same results on my small tests)

drop_unknown_levels <- function(train, newdata, catvar) {
  train_levels <- levels(droplevels(train[[catvar]]))
  newdata[[catvar]] <- factor(newdata[[catvar]], levels = train_levels)
  newdata
}

We use the "percentage" phrase in validate_LPD, but the actual number is not in percents (it is 0 to 1), so maybe we should use the "proportion" phrase instead?
"process_categorical_variables" -- maybe a better name for this function would be recommended, e.g., encode_categorical_vars? convert_factors_to_dummy?
Is the message below in the right place? Given that the most time-consuming calculations are before it -- do we even need it?

  if (verbose) {
    message("Computing AOA...")
  }

trainDI:

aoa_get_train, aoa_get_variables, aoa_get_weights, etc. -- I think all of these functions should have better names. They extract information from caret models, so caret_get_x or caretmodel_get_x, or similar?
user_weights is also not the best function name (to understand what it does)
train_backup -- a better object name here would be welcomed
"if(!inherits(weight, "error")){" -- I am not a fan of allowing the errors to be stored in the object and then providing them further; cannot we use empty data.frame/NA/NULL instead?

Best,
J.

goergen95 · 2026-03-19T13:54:46Z

Thanks for reviewing and the suggestions! I'll merge your changes right away and I'll start working on some of your propositions separately. Some reactions to your comment:

internal functions: agreed, we should start them with a dot. I also like your suggestion with reaming depending on their functionality (e.g. caret_x()). I'd suggest to create helper files (caret-helpers.R) to better organize corresponding functionality. This could also be a nice basis for future adapter methods for other ML frameworks.
your suggestion for drop_unknown_levels looks solid and should work. ~~Since there are no unknown levels in train this function purposely only modifies newdata.~~ I'd guess its saver to also modify the train object and drop unused levels there too. There is also some duplication with categorical data handling in trainDI and elsewhere in the package which should be further centralized
the message("Computing AOA...") was there before I started the refactor. I tried to modify user facing behavior only were absolutely needed so far. But maybe we can be more liberate in changing existing behavior here?
your comments on trainDI touch on pre-existing issues and I agree on all of them. Most of them should be solved if we opt for a better organization of the internal helpers (see my first comment).

small cleans

e1aabf3

goergen95 merged commit a93ea8a into goergen95:distances Mar 19, 2026
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

small cleans#6

small cleans#6
goergen95 merged 1 commit intogoergen95:distancesfrom
Nowosad:jn-p1

Nowosad commented Mar 19, 2026

Uh oh!

goergen95 commented Mar 19, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Nowosad commented Mar 19, 2026

Uh oh!

goergen95 commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

goergen95 commented Mar 19, 2026 •

edited

Loading