
Feature weighting for decision trees and random forests #17307

Open
jhogg11 opened this issue May 21, 2020 · 6 comments

Comments

@jhogg11

jhogg11 commented May 21, 2020

Basic idea: allow decision trees and random forests (and possibly other ensemble algorithms) to accept feature weightings that make specific features more or less likely to be selected as split points.

https://www.researchgate.net/publication/220338672_Random_feature_weights_for_decision_tree_ensemble_construction

I was looking to see if anything similar existed and came across the above paper. If I'm understanding the method correctly, the idea is to use random feature weights to create highly diverse ensembles that outperform other common methods, particularly where noise is present.

A general feature_weights parameter could allow for tuning based on domain knowledge or other methods that might vastly improve the ability of a tree or forest to generalize. Random forests could accept either a single set of weights to be used by every tree or an array of weights, one per tree estimator.
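To make the two options concrete, the inputs might look something like this (my own illustration; a feature_weights parameter does not currently exist in scikit-learn, and the values are made up):

import numpy as np

n_features, n_estimators = 10, 100
rng = np.random.default_rng(0)

# Option 1: a single weight vector shared by every tree, e.g. encoding domain
# knowledge that features 2 and 7 should be favoured.
shared_weights = np.ones(n_features)
shared_weights[[2, 7]] = 5.0

# Option 2: one random weight vector per tree, in the spirit of the RFW paper,
# to increase ensemble diversity (the exponent is an arbitrary choice here).
per_tree_weights = rng.uniform(size=(n_estimators, n_features)) ** 3
per_tree_weights /= per_tree_weights.sum(axis=1, keepdims=True)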

In full disclosure, I'm really just getting my feet wet in terms of decision trees, random forests, etc. Feel free to let me know if this is wrong or does not make sense.

@NicolasHug
Member

Thanks for the suggestion @jhogg11, but this does not seem to meet our inclusion criteria in terms of number of citations: https://scikit-learn.org/stable/faq.html#what-are-the-inclusion-criteria-for-new-algorithms

Would you know of any examples where this has been successfully used?

@jhogg11
Author

jhogg11 commented May 21, 2020

I'm not aware of any real-world implementations. I just found the paper by searching after noticing that certain models would generalize worse after additional features were added. It seems like a natural extension of decision trees, especially given that weighting parameters already exist for samples and classes, and I would think it would provide a significant opportunity to improve the effectiveness of random forests.

Personally, I would not call it a new algorithm but rather a modification to an existing one. Understandable if there is disagreement about that.

@xuyxu

xuyxu commented May 22, 2020

@NicolasHug @jhogg11
Here is one example in Anomaly Detection: Robust Random Cut Forest Based Anomaly Detection On Streams (https://proceedings.mlr.press/v48/guha16.pdf):

Unlike Isolation Forest, which randomly selects one attribute at each internal node, the robust random cut tree in this paper selects an attribute with probability proportional to the range of the data along that attribute (which corresponds to feature_weights here).
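Roughly, the selection rule is something like this (a toy sketch of my reading of it, not the paper's implementation):

import numpy as np

rng = np.random.default_rng(0)
# Toy data: five attributes with very different value ranges.
X = rng.normal(size=(200, 5)) * np.array([1.0, 0.1, 5.0, 2.0, 0.5])

# Choose the split attribute with probability proportional to its range, so
# attributes spanning a wider range are picked more often.
ranges = X.max(axis=0) - X.min(axis=0)
chosen_attribute = rng.choice(X.shape[1], p=ranges / ranges.sum())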

@jhogg11
Author

jhogg11 commented May 23, 2020

Here's another paper that takes a slightly different approach to feature weights:
http://www.aclweb.org/old_anthology/O/O08/O08-6001.pdf

Rather than retaining all features as candidates at every split point and altering the likelihood of each feature being chosen (like the paper referenced in the OP - "RFW"), this paper proposes using only a subset of features at each decision point, with the features weighted to be more or less likely to be included in the subset ("IRFA"). IRFA is also claimed to outperform vanilla random forests.

I think RFW is the better and more general of the two approaches (and the paper is more in-depth in its comparisons with other tree-based algorithms). RFW does not preclude using a subset of features at each split and might then have the same effect as IRFA, whereas IRFA does preclude using the full set of features.
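To make the contrast concrete, here is how I picture the two schemes at a single node (my own toy sketch, not code from either paper; improvements stands in for the per-feature split gains):

import numpy as np

rng = np.random.default_rng(0)
n_features = 8
weights = rng.uniform(size=n_features)        # per-feature weights
improvements = rng.uniform(size=n_features)   # stand-in for per-feature split gains

# RFW-style: every feature stays a candidate, but its gain is scaled by its
# weight before the best split is chosen.
best_rfw = int(np.argmax(improvements * weights))

# IRFA-style: only a weighted random subset of features is considered at the node.
subset = rng.choice(n_features, size=3, replace=False, p=weights / weights.sum())
best_irfa = int(subset[np.argmax(improvements[subset])])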

One other thing that might be worth mentioning is that feature weighting is already available in the nearest neighbor algorithms in sklearn through the w parameter of wminkowski distance.

https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.DistanceMetric.html

I look at decision trees as being similar to KNN in that both rely on nearby points, just with different ways of defining the neighborhood (a hyperrectangle vs. a radius determined by the kth nearest neighbor). Maybe someone with more knowledge can comment, but tuning feature weights so that a given feature is more or less likely to be selected for a split point seems highly analogous to tuning feature weights to make features more or less likely to affect the nearest-neighbor determination.
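For comparison, this is roughly what feature weighting looks like on the KNN side today (the weights are arbitrary, and I'm using a custom metric callable rather than wminkowski just to keep the sketch self-contained):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
feature_weights = np.array([1.0, 1.0, 3.0, 3.0])  # arbitrary: favour the petal features

def weighted_euclidean(a, b, w=feature_weights):
    # Scale each squared coordinate difference by its feature weight.
    return np.sqrt(np.sum(w * (a - b) ** 2))

knn = KNeighborsClassifier(n_neighbors=5, metric=weighted_euclidean).fit(X, y)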

@jnothman
Member

jnothman commented May 25, 2020 via email

Can this be implemented as an extension Splitter to the current code base? I'm not sure how easily we should justify adding it to the primary implementation, as every piece of added complexity comes with substantial maintenance cost.

@jhogg11
Author

jhogg11 commented May 30, 2020


Would this be as simple as adding an additional feature_weights parameter and applying the weights to current_proxy_improvement in the splitter node_split method, e.g., something like this?

current_proxy_improvement = self.criterion.proxy_impurity_improvement()
current_proxy_improvement *= self.feature_weights[current.feature]  # new line

if current_proxy_improvement > best_proxy_improvement:
    ...
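In plain Python, the intent of that change would be something like the following (just a toy illustration of the selection logic at one node; the real splitter is Cython and feature_weights is hypothetical):

import numpy as np

def pick_split_feature(proxy_improvements, feature_weights):
    # proxy_improvements: the best proxy impurity improvement found for each
    # candidate feature at this node; feature_weights: non-negative weights.
    weighted = np.asarray(proxy_improvements) * np.asarray(feature_weights)
    return int(np.argmax(weighted))

# Feature 1 wins after weighting even though feature 0 has the larger raw gain.
print(pick_split_feature([0.30, 0.25, 0.10], [0.5, 2.0, 1.0]))  # -> 1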
