Feature weighting for decision trees and random forests #17307
Comments
Thanks for the suggestion @jhogg11, but this does not seem to meet our inclusion criteria in terms of number of citations (see https://scikit-learn.org/stable/faq.html#what-are-the-inclusion-criteria-for-new-algorithms). Would you know of any examples where this has been successfully used?
I'm not aware of any real-world implementations. I just found the paper by searching after noticing that certain models would generalize worse after adding additional features. It seems like a natural extension of decision trees, especially given that weighting parameters already exist for samples and classes, and I would think it could significantly improve the effectiveness of random forests. Personally, I would not call it a new algorithm but rather a modification to an existing one; understandable if there is disagreement about that.
@NicolasHug @jhogg11 Unlike Isolation Forest, which randomly selects one attribute at each internal node, the robust cut tree in this paper selects one attribute with probability proportional to the input range on that attribute (corresponds to
Here's another paper that takes a slightly different approach to feature weights: rather than retaining all features as candidates at every split point but altering the likelihood of each feature being chosen (like the paper referenced in the OP, "RFW"), this paper proposes using only a subset of features at each decision point, with the weights making each feature more or less likely to be included in the subset ("IRFA"). IRFA is also asserted to be better than vanilla random forests.

I think RFW is the better and more generalized of the two approaches (and the paper is more in-depth in terms of comparisons with other tree-based algorithms). RFW does not preclude effectively using a subset of features and might have the same effect as IRFA, whereas IRFA does preclude using the full set of features.

One other thing worth mentioning is that feature weighting is already available in the nearest-neighbor algorithms in sklearn through https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.DistanceMetric.html. I look at decision trees as being similar to KNN in that both attempt to use near neighbors, just with slightly different ways of getting there (a hyperrectangle vs. a radius based on the k-th nearest neighbor). Maybe someone with more knowledge can comment, but tuning feature weights so that a given feature is more or less likely to be selected for a split point seems highly analogous to tuning feature weights to make features more or less likely to affect the nearest-neighbor determination.
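The KNN analogy can be made concrete: a per-feature weight vector turns the ordinary Euclidean metric into a weighted one, so heavily weighted features dominate the neighbor determination while a zero weight removes a feature entirely, much as a zero feature weight would remove it from split selection. A minimal sketch (the helper name is made up, not a scikit-learn API):

```python
import math

def weighted_euclidean(a, b, w):
    """Distance in which feature i contributes w[i] * (a[i] - b[i])**2.

    With all weights equal to 1 this is the ordinary Euclidean distance;
    setting w[i] = 0 makes feature i invisible to the neighbor search.
    """
    return math.sqrt(sum(wi * (ai - bi) ** 2 for ai, bi, wi in zip(a, b, w)))
```

For example, `weighted_euclidean((0, 0), (3, 4), (1, 1))` gives the usual 5.0, while weights `(1, 0)` reduce it to 3.0 because the second feature no longer contributes.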
Can this be implemented as an extension Splitter to the current code base? I'm not sure how easily we should justify adding it to the primary implementation, as every piece of added complexity comes with substantial maintenance cost.

Would this be as simple as adding an additional
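Whatever route an extension Splitter takes, the per-node behaviour can at least be prototyped in pure Python: draw the candidate-feature subset at each node with inclusion probability proportional to the weights, here via Efraimidis–Spirakis weighted sampling without replacement. This is an illustrative sketch, not existing scikit-learn API:

```python
import random

def weighted_feature_subset(feature_weights, k, rng):
    """Pick k distinct feature indices; higher weight => more likely included.

    Efraimidis-Spirakis trick: draw key u**(1/w) per feature and keep
    the k largest keys, which samples without replacement with
    probabilities proportional to the weights.
    """
    keyed = [(rng.random() ** (1.0 / w), i) for i, w in enumerate(feature_weights)]
    keyed.sort(reverse=True)
    return sorted(i for _, i in keyed[:k])

rng = random.Random(0)
# Feature 2 carries most of the weight, so it should appear in nearly
# every sampled subset of size 2, while feature 0 appears only rarely.
counts = [0, 0, 0]
for _ in range(1000):
    for i in weighted_feature_subset([0.1, 1.0, 10.0], 2, rng):
        counts[i] += 1
```

Repeating the draw confirms the intended bias: the heavily weighted feature is selected far more often than the lightly weighted one, without ever being guaranteed or excluded.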
Basic idea: Allow decision trees and random forests (and possibly other ensemble algorithms) to have feature weightings that make the selection of specific features as split points more or less likely.
https://www.researchgate.net/publication/220338672_Random_feature_weights_for_decision_tree_ensemble_construction
I was looking to see if anything similar existed and came across the above paper. If I'm understanding the method correctly, the idea is to use random feature weights to create highly diverse ensembles that outperform other common methods, particularly where noise is present.
A general feature_weights parameter could allow for tuning based on domain knowledge or other methods that might vastly improve the ability of a tree or forest to generalize. Random forests could accept a single set of weights to be used on all trees, or an array of weights, one for each tree estimator.

In full disclosure, I'm really just getting my feet wet in terms of decision trees, random forests, etc. Feel free to let me know if this is wrong or does not make sense.
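One plausible reading of how such a weight would enter the split search (a sketch only, not the RFW paper's exact procedure or any scikit-learn behaviour) is to scale each feature's impurity gain by its weight, so a down-weighted feature must offer a much larger gain to win a split:

```python
def pick_split_feature(gains, feature_weights):
    """Return the index of the feature winning the weighted split search.

    gains[i] is the best impurity decrease achievable by splitting on
    feature i; multiplying by feature_weights[i] makes low-weight
    features less likely to be chosen without excluding them outright.
    """
    scored = [g * w for g, w in zip(gains, feature_weights)]
    return max(range(len(scored)), key=scored.__getitem__)

# Feature 0 offers the larger raw gain (0.50 vs 0.40), but a weight of
# 0.3 shrinks its score to 0.15, so feature 1 wins the split instead.
best = pick_split_feature([0.50, 0.40], [0.3, 1.0])
```

With uniform weights the function degenerates to the ordinary argmax over gains, which is the property that makes this a modification of the existing algorithm rather than a new one.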