SIRUS.StableForestClassifier
— Type

```julia
StableForestClassifier(; rng, partial_sampling, n_trees, max_depth, q, min_data_in_leaf, max_rules, lambda)
```
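A minimal usage sketch via the MLJ machine interface. The toy dataset and the hyperparameter values below are illustrative assumptions, not part of the package documentation:

```julia
using MLJ, SIRUS, StableRNGs

# Illustrative toy data; any Tables.jl-compatible table works with MLJ.
rng = StableRNG(1)
X = (; x1 = rand(rng, 100), x2 = rand(rng, 100))
y = coerce(rand(rng, ["yes", "no"], 100), Multiclass)

model = StableForestClassifier(; rng=StableRNG(1), n_trees=1000)
mach = machine(model, X, y)
fit!(mach)
yhat = predict(mach, X)  # probabilistic predictions per class
```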
Arguments
- `rng`: Random number generator. `StableRNGs` are advised.
- `partial_sampling`: Ratio of samples to use in each subset of the data. The default of 0.7 should be fine for most cases.
- `n_trees`: The number of trees to use. The higher the number, the more likely it is that the correct rules are extracted from the trees, but also the longer model fitting will take. In most cases, 1000 trees should be more than enough, but it can be useful to run once with 2000 trees and verify that the model performance does not change much.
- `max_depth`: The depth of the trees. A lower depth decreases model complexity and can therefore improve accuracy when the sample size is small (it reduces overfitting).
- `q`: Number of cutpoints to use per feature. The default value of 10 should be good for most situations.
- `min_data_in_leaf`: Minimum number of data points per leaf.
- `max_rules`: This is the most important hyperparameter. In general, more rules make the model more accurate, but also less interpretable, so it is important to find a good balance. In most cases, 10-40 rules provide reasonable accuracy while remaining interpretable.
- `lambda`: The weights of the final rules are determined via a regularized regression over each rule as a binary feature. This hyperparameter specifies the strength of the ridge (L2) regularizer. Since the rules are quite strongly correlated, the ridge regularizer is the most useful for stabilizing the weight estimates.
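To make the role of `lambda` concrete, here is a standalone sketch of ridge regression over binary rule features, using the closed-form normal equations. This is an illustration of the technique, not the SIRUS internals; the matrix and target values are made up:

```julia
using LinearAlgebra

# Binary rule-activation matrix: rows = samples, columns = rules.
# Columns 1 and 2 are highly correlated, as rules typically are,
# which is why the ridge penalty helps stabilize the weights.
R = [1.0 1.0 0.0;
     1.0 1.0 0.0;
     0.0 0.0 1.0;
     1.0 0.0 1.0]
y = [1.0, 1.0, 0.0, 1.0]

lambda = 1.0
# Ridge solution: w = (R'R + lambda*I) \ R'y.
# Larger lambda shrinks the weights of correlated rules toward
# each other instead of letting them blow up in opposite signs.
w = (R' * R + lambda * I) \ (R' * y)
```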