Implement regression #16

Merged · 15 commits from rh/regression merged into main on Jun 15, 2023

Conversation

@rikhuijzer (Owner) commented on Jun 9, 2023:

Works towards #13.

@rikhuijzer rikhuijzer marked this pull request as ready for review June 15, 2023 08:19
@rikhuijzer (Owner, Author) commented:

Merging now because the PR is in a pretty good place.

@rikhuijzer rikhuijzer merged commit 1fb92bb into main Jun 15, 2023
4 checks passed
@rikhuijzer rikhuijzer deleted the rh/regression branch June 15, 2023 08:20
rikhuijzer added a commit that referenced this pull request Jun 15, 2023
Follow-up on #16.

Works towards #13.

### Results

```
13×7 DataFrame
 Row │ Dataset   Model                   Hyperparameters    `nfolds`  AUC     RMS     1.96*SE
     │ String    String                  String             Int64     String  String  String
─────┼────────────────────────────────────────────────────────────────────────────────────────
   1 │ blobs     LGBMClassifier          (;)                      10  0.99            0.01
   2 │ blobs     LGBMClassifier          (max_depth = 2,)         10  0.99            0.01
   3 │ blobs     StableRulesClassifier   (n_trees = 50,)          10  1.00            0.00
   4 │ titanic   LGBMClassifier          (;)                      10  0.87            0.03
   5 │ titanic   LGBMClassifier          (max_depth = 2,)         10  0.85            0.02
   6 │ titanic   StableForestClassifier  (n_trees = 1500,)        10  0.85            0.02
   7 │ titanic   StableRulesClassifier   (n_trees = 1500,)        10  0.83            0.02
   8 │ haberman  LGBMClassifier          (;)                      10  0.71            0.06
   9 │ haberman  LGBMClassifier          (max_depth = 2,)         10  0.67            0.05
  10 │ haberman  StableForestClassifier  (n_trees = 1500,)        10  0.70            0.05
  11 │ haberman  StableRulesClassifier   (n_trees = 1500,)        10  0.67            0.04
  12 │ boston    LinearRegressor         (;)                      10          0.70    0.05
  13 │ boston    StableForestRegressor   (;)                      10          0.66    0.07
```
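For context, a minimal sketch of how one row of this table could be reproduced with MLJ; the repository's actual benchmark script may differ, and the `rsq` measure is an assumption about the metric behind these scores:

```julia
# Editor's sketch (assumed setup, not the repository's benchmark code):
# 10-fold cross-validation of a SIRUS regressor on the Boston dataset.
using MLJ
using SIRUS: StableForestRegressor

X, y = @load_boston  # Boston housing data shipped with MLJ
model = StableForestRegressor()
evaluate(model, X, y; resampling=CV(nfolds=10, shuffle=true), measure=rsq)
```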
rikhuijzer added a commit that referenced this pull request Jun 21, 2023
in order to find the bug in the `StableRulesRegressor` (#18).

## Notes

The bug seems to be related to the scores in the rules being too high:
```julia
julia> include("test/mlj.jl")

julia> preds[1:5]
5-element Vector{Float64}:
 286.98408203125
 280.405224609375
 306.151708984375
 310.74091796875
 310.74091796875

julia> rulesmach.fitresult
StableRules model with 7 rules:
 if X[i, :x6] < 6.8 then 48.767 else 65.298 +
 if X[i, :x11] < 19.2 then 45.919 else 36.811 +
 if X[i, :x13] < 9.04 then 40.428 else 31.213 +
 if X[i, :x3] < 3.97 then 22.868 else 18.279 +
 if X[i, :x10] < 437.0 then 49.438 else 39.331 +
 if X[i, :x1] < 2.44953 then 50.514 else 39.592 +
 if X[i, :x5] < 0.52 then 36.275 else 29.05

julia> rulesmach.fitresult.weights
7-element Vector{Float16}:
 2.066
 1.897
 1.486
 0.778
 2.197
 2.275
 1.475

julia> rulesmach.fitresult.rules
7-element Vector{SIRUS.Rule}:
 SIRUS.Rule(TreePath(" X[i, :x6] < 6.8 "), [23.6], [31.6])
 SIRUS.Rule(TreePath(" X[i, :x11] < 19.2 "), [24.2], [19.4])
 SIRUS.Rule(TreePath(" X[i, :x13] < 9.04 "), [27.2], [21.0])
 SIRUS.Rule(TreePath(" X[i, :x3] < 3.97 "), [29.4], [23.5])
 SIRUS.Rule(TreePath(" X[i, :x10] < 437.0 "), [22.5], [17.9])
 SIRUS.Rule(TreePath(" X[i, :x1] < 2.44953 "), [22.2], [17.4])
 SIRUS.Rule(TreePath(" X[i, :x5] < 0.52 "), [24.6], [19.7])
```
So the printed summary of the `rules` and `weights` looks fine, but the
`then` and `otherwise` contents make no sense since `y` lies in a
different range:
```julia
julia> y[1:5]
5-element Vector{Float64}:
 24.0
 21.6
 34.7
 33.4
 36.2
```
It could be something else, but the values of the `then` and `otherwise`
seem the most likely culprit.

On second thought, the weights seem the most likely culprit: those
weights make no sense, whereas the `then` and `otherwise` values could
correspond to `y` values.
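
As a rough check (not in the original commit message, but following from the numbers above), the per-rule values in the model summary appear to be `weight * then`, and their unnormalized sum lands right in the range of the inflated `preds`:

```julia
# Editor's sketch: reconstruct the model-summary numbers from the rules
# and weights printed above.
weights = Float64[2.066, 1.897, 1.486, 0.778, 2.197, 2.275, 1.475]
thens = [23.6, 24.2, 27.2, 29.4, 22.5, 22.2, 24.6]  # `then` values of the 7 rules

weights .* thens       # ≈ [48.8, 45.9, 40.4, 22.9, 49.4, 50.5, 36.3]; matches the summary
sum(weights .* thens)  # ≈ 294, near the inflated predictions of ~287-311
sum(weights)           # ≈ 12.2, so the weights are far from summing to 1
```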

Works towards fixing #16.
rikhuijzer added a commit that referenced this pull request Jun 22, 2023
Normalizing the regularized fit on the weights improves the predictive
performance from -1300.0 ± 248 to 0.33 ± 0.04. However, there is still
something wrong, since the score should be near 0.6.

Goes towards #16
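
A minimal sketch of the normalization idea (the actual SIRUS.jl change may differ): dividing the fitted weights by their sum turns the combined rule output into a weighted average, which keeps predictions in the range of `y`:

```julia
# Editor's sketch of the idea, not the actual SIRUS.jl implementation:
# rescale the fitted weights so they sum to one.
normalize_weights(w::AbstractVector) = w ./ sum(w)

weights = Float64[2.066, 1.897, 1.486, 0.778, 2.197, 2.275, 1.475]
w = normalize_weights(weights)
sum(w)                                                 # 1.0
sum(w .* [23.6, 24.2, 27.2, 29.4, 22.5, 22.2, 24.6])   # ≈ 24.2, back in `y`'s range
```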