
small cleans #6

Merged
goergen95 merged 1 commit into goergen95:distances from Nowosad:jn-p1
Mar 19, 2026

Nowosad commented Mar 19, 2026

Hi Darius,

I started reviewing the current state of the code with a focus on aoa and trainDI. I have not yet looked at knn-engine, because my brain is fried enough from trying to understand the code.

I made some small changes to the code, but they are only cosmetic -- no functional changes.

I also have the first set of comments/suggestions:

General:

  1. Start internal funs with .

AOA:

  1. You are modifying "train" inside drop_unknown_levels (or rather pretending to modify it, because the result is not returned).
  2. Maybe we can simplify drop_unknown_levels to the following (it gives the same results in my small tests):
drop_unknown_levels <- function(train, newdata, catvar) {
  train_levels <- levels(droplevels(train[[catvar]]))
  newdata[[catvar]] <- factor(newdata[[catvar]], levels = train_levels)
  newdata
}
  3. We use the word "percentage" in validate_LPD, but the actual number is not a percentage (it ranges from 0 to 1), so maybe we should use "proportion" instead?
  4. "process_categorical_variables" -- maybe this function deserves a better name, e.g., encode_categorical_vars? convert_factors_to_dummy?
  5. Is the message below in the right place? Given that the most time-consuming calculations happen before it -- do we even need it?
  if (verbose) {
    message("Computing AOA...")
  }
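
To make points 1 and 2 concrete, here is a small sketch in base R. The toy data frames and the `landuse` column are made up for illustration, and `modify_without_return` is a hypothetical helper that only mimics the pattern described in point 1; `drop_unknown_levels` is the simplified version suggested above.

```r
# Simplified helper from the suggestion above.
drop_unknown_levels <- function(train, newdata, catvar) {
  train_levels <- levels(droplevels(train[[catvar]]))
  newdata[[catvar]] <- factor(newdata[[catvar]], levels = train_levels)
  newdata
}

# Hypothetical toy data: "urban" is a declared but unused level in train,
# and "water" appears only in newdata.
train <- data.frame(
  landuse = factor(c("forest", "crop", "forest"),
                   levels = c("crop", "forest", "urban"))
)
newdata <- data.frame(
  landuse = factor(c("crop", "water", "forest"))
)

cleaned <- drop_unknown_levels(train, newdata, "landuse")

levels(cleaned$landuse)  # only the levels actually used in train
is.na(cleaned$landuse)   # the "water" row has become NA

# R copies on modify: the assignment below changes a local copy only,
# so the caller's train keeps its unused "urban" level.
modify_without_return <- function(train, catvar) {
  train[[catvar]] <- droplevels(train[[catvar]])
  invisible(NULL)
}
modify_without_return(train, "landuse")
levels(train$landuse)    # still includes "urban"
```

If the change to train is actually wanted, the function would have to return it, which is a change to the helper's interface rather than a cosmetic cleanup.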

trainDI:

  1. aoa_get_train, aoa_get_variables, aoa_get_weights, etc. -- I think all of these functions should have better names. They extract information from caret models, so caret_get_x or caretmodel_get_x, or similar?
  2. user_weights is also not the best function name (to understand what it does)
  3. train_backup -- a better object name here would be welcome
  4. "if(!inherits(weight, "error")){" -- I am not a fan of storing errors in the object and then passing them along; can't we use an empty data.frame, NA, or NULL instead?
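
For point 4, one possible pattern is to turn the error into NULL where it occurs, so that later code checks `is.null()` instead of `inherits(weight, "error")`. This is only a sketch: `fake_varimp` and `get_weights_or_null` are hypothetical stand-ins, not functions from the package or from caret, and the message text is illustrative.

```r
# Hypothetical stand-in for a call that may fail, e.g. extracting
# variable importance from a model that does not provide it.
fake_varimp <- function(model) {
  if (is.null(model$importance)) {
    stop("variable importance not available for this model")
  }
  model$importance
}

# Return NULL on failure instead of storing the condition object.
get_weights_or_null <- function(model) {
  tryCatch(
    fake_varimp(model),
    error = function(e) NULL
  )
}

with_imp <- list(importance = c(a = 0.7, b = 0.3))
without_imp <- list(importance = NULL)

weights_ok <- get_weights_or_null(with_imp)
weights_missing <- get_weights_or_null(without_imp)

# Downstream code tests for NULL rather than for an error class.
if (is.null(weights_missing)) {
  message("No weights available.")
}
```

The trade-off is that the original error message is lost unless it is logged or re-signalled as a warning inside the handler.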

Best,
J.

goergen95 (Owner) commented Mar 19, 2026

Thanks for reviewing and the suggestions! I'll merge your changes right away and I'll start working on some of your propositions separately. Some reactions to your comment:

  • internal functions: agreed, we should start them with a dot. I also like your suggestion of renaming them according to their functionality (e.g. caret_x()). I'd suggest creating helper files (caret-helpers.R) to better organize the corresponding functionality. This could also be a nice basis for future adapter methods for other ML frameworks.

  • your suggestion for drop_unknown_levels looks solid and should work. Since there are no unknown levels in train, this function purposely only modifies newdata. I'd guess it's safer to also modify the train object and drop unused levels there too. There is also some duplication of categorical data handling in trainDI and elsewhere in the package, which should be further centralized.

  • the message("Computing AOA...") was there before I started the refactor. So far I have tried to modify user-facing behavior only where absolutely needed. But maybe we can be more liberal in changing existing behavior here?

  • your comments on trainDI touch on pre-existing issues and I agree on all of them. Most of them should be solved if we opt for a better organization of the internal helpers (see my first comment).
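
A rough sketch of what the first two bullets could look like together, assuming the helper is allowed to change its return type: a dot-prefixed variant that drops unused levels in train as well and returns both objects, so the change to train is not lost. The name `.drop_unknown_levels`, the list return value, and the toy data are assumptions for illustration, not the merged code.

```r
# Hypothetical internal helper (dot prefix marks it as internal).
.drop_unknown_levels <- function(train, newdata, catvar) {
  # Drop levels that are declared but unused in the training data.
  train[[catvar]] <- droplevels(train[[catvar]])
  # Re-encode newdata against the remaining training levels;
  # values unknown to train become NA.
  newdata[[catvar]] <- factor(newdata[[catvar]],
                              levels = levels(train[[catvar]]))
  list(train = train, newdata = newdata)
}

# Toy data: "c" is unused in train, "d" is unknown to train.
train <- data.frame(x = factor(c("a", "b"), levels = c("a", "b", "c")))
newdata <- data.frame(x = factor(c("a", "d")))

res <- .drop_unknown_levels(train, newdata, "x")
levels(res$train$x)  # unused level "c" is gone
res$newdata$x        # "d" has become NA
```

Callers would then need to unpack both elements, which is a user-invisible but internal interface change.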

goergen95 merged commit a93ea8a into goergen95:distances Mar 19, 2026
1 check passed
2 participants