Use subclause terminology in docs
rikhuijzer committed Nov 23, 2023
1 parent 3174533 commit 0714a56
Showing 2 changed files with 9 additions and 7 deletions.
6 changes: 3 additions & 3 deletions docs/src/binary-classification.jl
@@ -215,7 +215,7 @@ Therefore, it makes more sense to truncate the rules to somewhere in the range 5
`max_depth` specifies how many levels the trees have.
For larger datasets, `max_depth=2` makes the most sense since it can find more complex patterns in the data.
For smaller datasets, `max_depth=1` makes more sense since it reduces the chance of overfitting.
-It also simplifies the rules because with `max_depth=1`, the rule will contain only one conditional (for example, "if A then ...") versus two conditionals (for example, "if A & B then ...").
+It also simplifies the rules because with `max_depth=1`, the rule will contain only one subclause (for example, "if A then ...") versus two subclauses (for example, "if A & B then ...").
In some cases, model accuracy can be improved by increasing `n_trees`.
The higher this number, the more trees are fitted and, hence, the higher the chance that the right rules are extracted from the trees.
"""
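To make the difference between one and two subclauses concrete, here is a minimal Julia sketch. The types `Subclause` and `Rule` and the functions `satisfies` and `predict` are hypothetical illustrations, not the types used by the package in this commit, and the thresholds and outputs are made up. A rule's clause is modeled as the conjunction of its subclauses.

```julia
# Hypothetical, simplified rule representation (not the package's actual types).
# A subclause is a single condition on one feature, e.g. `x[1] < 5.0`.
struct Subclause
    feature::Int        # column index of the feature
    threshold::Float64  # split point
    direction::Symbol   # :L means `<`, :R means `>=`
end

satisfies(s::Subclause, x::AbstractVector) =
    s.direction == :L ? x[s.feature] < s.threshold : x[s.feature] >= s.threshold

# A clause is the conjunction ("&") of all subclauses in the rule.
struct Rule
    subclauses::Vector{Subclause}
    then_value::Float64  # output when every subclause holds
    else_value::Float64  # output otherwise
end

predict(r::Rule, x::AbstractVector) =
    all(s -> satisfies(s, x), r.subclauses) ? r.then_value : r.else_value

# Like a rule from a depth-1 tree: one subclause, "if A then ... else ...".
rule_depth1 = Rule([Subclause(1, 5.0, :L)], 0.8, 0.2)
# Like a rule from a depth-2 tree: two subclauses, "if A & B then ... else ...".
rule_depth2 = Rule([Subclause(1, 5.0, :L), Subclause(2, 1.0, :R)], 0.9, 0.1)

x = [3.0, 0.5]
println(predict(rule_depth1, x))  # 0.8, because x[1] < 5.0 holds
println(predict(rule_depth2, x))  # 0.1, because x[2] >= 1.0 fails
```

The extra subclause makes the second rule more specific, which can capture an interaction between two features but is also harder to read and easier to overfit on a small dataset.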
@@ -232,8 +232,8 @@ Since we know that the model performs well on the cross-validations, we can fit
md"""
## Visualization
-Since our rules are relatively simple with only a binary outcome and only one clause in each rule, the following figure is a way to visualize the obtained rules per fold.
-For multiple clauses, I would not know how to visualize the rules.
+Since our rules are relatively simple with only a binary outcome and only one subclause in each rule, the following figure is a way to visualize the obtained rules per fold.
+For multiple subclauses, I would not know how to visualize the rules.
Also, this plot is probably not perfect; let me know if you have suggestions.
This figure shows the model uncertainty by visualizing the obtained models for different cross-validation folds.
10 changes: 6 additions & 4 deletions src/extract.jl
@@ -20,12 +20,14 @@ end
Estimate the importance of the given `feature_name`.
The aim is to satisfy the following property:
-> Given two features X and Y, if X has more effect on the outcome, then
-> feature_importance(model, X) > feature_importance(model, Y).
+> Given two features A and B, if A has more effect on the outcome, then
+> feature_importance(model, A) > feature_importance(model, B).
!!! note
-    This function provides only an importance _estimate_ because
-    the effect on the outcome depends on the data.
+    This function provides only an importance _estimate_ because the effect on
+    the outcome depends on the data, and because it doesn't take into account
+    that a feature can have a lower effect if it is in a clause together with
+    another subclause.
"""
function feature_importance(
model::StableRules,
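A minimal sketch of how such an importance estimate could work, and why the caveat in the note matters. It assumes a hypothetical `SimpleRule` type and `importance_estimate` function; this is not the `feature_importance` implementation from this commit. It scores a feature by summing, over every rule whose clause uses that feature, the rule weight times the gap between the rule's then and else outputs.

```julia
# Hypothetical simplified rule: the features referenced by its subclauses,
# its then/else outputs, and a weight. Not the StableRules internals.
struct SimpleRule
    features::Vector{Int}  # features used in the rule's clause
    then_value::Float64
    else_value::Float64
    weight::Float64
end

# Sum the weighted then/else gap over all rules whose clause uses `feature`.
# A rule with several subclauses credits each of its features with the
# full gap, which is the limitation described in the docstring note.
function importance_estimate(rules::Vector{SimpleRule}, feature::Int)
    total = 0.0
    for r in rules
        if feature in r.features
            total += r.weight * abs(r.then_value - r.else_value)
        end
    end
    return total
end

rules = [
    SimpleRule([1], 0.9, 0.1, 1.0),     # feature 1 alone, large gap
    SimpleRule([2], 0.6, 0.4, 1.0),     # feature 2 alone, small gap
    SimpleRule([1, 2], 0.7, 0.3, 0.5),  # features 1 and 2 share one clause
]
println(importance_estimate(rules, 1))  # about 1.0
println(importance_estimate(rules, 2))  # about 0.4
```

For this toy input the stated property holds: feature 1 has the larger effect and the larger score. The third rule, however, credits both features with its full gap even though each shares the clause with another subclause, which is the kind of overestimate the note warns about.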