Use subclause terminology in docs

rikhuijzer · Nov 23, 2023 · 0714a56
1 parent 3174533

Showing 2 changed files with 9 additions and 7 deletions.
diff --git a/docs/src/binary-classification.jl b/docs/src/binary-classification.jl
@@ -215,7 +215,7 @@ Therefore, it makes more sense to truncate the rules to somewhere in the range 5
 `max_depth` specifies how many levels the trees have.
 For larger datasets, `max_depth=2` makes the most sense since it can find more complex patterns in the data.
 For smaller datasets, `max_depth=1` makes more sense since it reduces the chance of overfitting.
-It also simplifies the rules because with `max_depth=1`, the rule will contain only one conditional (for example, "if A then ...") versus two conditionals (for example, "if A & B then ...").
+It also simplifies the rules because with `max_depth=1`, the rule will contain only one subclause (for example, "if A then ...") versus two subclauses (for example, "if A & B then ...").
 In some cases, model accuracy can be improved by increasing `n_trees`.
 The higher this number, the more trees are fitted and, hence, the higher the chance that the right rules are extracted from the trees.
 """
@@ -232,8 +232,8 @@ Since we know that the model performs well on the cross-validations, we can fit
 md"""
 ## Visualization
 
-Since our rules are relatively simple with only a binary outcome and only one clause in each rule, the following figure is a way to visualize the obtained rules per fold.
-For multiple clauses, I would not know how to visualize the rules.
+Since our rules are relatively simple with only a binary outcome and only one subclause in each rule, the following figure is a way to visualize the obtained rules per fold.
+For multiple subclauses, I would not know how to visualize the rules.
 Also, this plot is probably not perfect; let me know if you have suggestions.
 
 This figure shows the model uncertainty by visualizing the obtained models for different cross-validation folds.

diff --git a/src/extract.jl b/src/extract.jl
@@ -20,12 +20,14 @@ end
 Estimate the importance of the given `feature_name`.
 The aim is to satisfy the following property:
 
-> Given two features X and Y, if X has more effect on the outcome, then
-> feature_importance(model, X) > feature_importance(model, Y).
+> Given two features A and B, if A has more effect on the outcome, then
+> feature_importance(model, A) > feature_importance(model, B).
 
 !!! note
-    This function provides only an importance _estimate_ because
-    the effect on the outcome depends on the data.
+    This function provides only an importance _estimate_ because the effect on
+    the outcome depends on the data, and because it doesn't take into account
+    that a feature can have a lower effect if it is in a clause together with
+    another subclause.
 """
 function feature_importance(
         model::StableRules,