You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I started reviewing the current state of the code with the focus on aoa and trainDI. I have not yet looked at knn-engine, because my brain is fried enough with trying to understand the code.
I made some small changes to the code, but they are only cosmetic -- no real action changes.
I also have the first set of comments/suggestions:
General:
Start internal funs with .
AOA:
You are modifying "train" inside drop_unknown_levels (or rather pretending to modify, because the result is not returned)
Maybe we can simplify drop_unknown_levels to (it gives the same results on my small tests)
We use the "percentage" phrase in validate_LPD, but the actual number is not in percents (it is 0 to 1), so maybe we should use the "proportion" phrase instead?
"process_categorical_variables" -- maybe a better name for this function would be recommended, e.g., encode_categorical_vars? convert_factors_to_dummy?
Is the message below in the right place? Given that the most time-consuming calculations are before it -- do we even need it?
if (verbose) {
message("Computing AOA...")
}
trainDI:
aoa_get_train, aoa_get_variables, aoa_get_weights, etc. -- I think all of these functions should have better names. They extract information from caret models, so caret_get_x or caretmodel_get_x, or similar?
user_weights is also not the best function name (to understand what it does)
train_backup -- a better object name here would be welcomed
"if(!inherits(weight, "error")){" -- I am not a fan of allowing the errors to be stored in the object and then providing them further; cannot we use empty data.frame/NA/NULL instead?
Thanks for reviewing and the suggestions! I'll merge your changes right away and I'll start working on some of your propositions separately. Some reactions to your comment:
internal functions: agreed, we should start them with a dot. I also like your suggestion with reaming depending on their functionality (e.g. caret_x()). I'd suggest to create helper files (caret-helpers.R) to better organize corresponding functionality. This could also be a nice basis for future adapter methods for other ML frameworks.
your suggestion for drop_unknown_levels looks solid and should work. Since there are no unknown levels in train this function purposely only modifies newdata. I'd guess its saver to also modify the train object and drop unused levels there too. There is also some duplication with categorical data handling in trainDI and elsewhere in the package which should be further centralized
the message("Computing AOA...") was there before I started the refactor. I tried to modify user facing behavior only were absolutely needed so far. But maybe we can be more liberate in changing existing behavior here?
your comments on trainDI touch on pre-existing issues and I agree on all of them. Most of them should be solved if we opt for a better organization of the internal helpers (see my first comment).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Hi Darius,
I started reviewing the current state of the code with the focus on aoa and trainDI. I have not yet looked at knn-engine, because my brain is fried enough with trying to understand the code.
I made some small changes to the code, but they are only cosmetic -- no real action changes.
I also have the first set of comments/suggestions:
General:
AOA:
trainDI:
Best,
J.