Implement regression #16
Merged
Merging now because the PR is in a pretty good place.
rikhuijzer added a commit that referenced this pull request on Jun 15, 2023
Follow-up on #16. Works towards #13.

### Results

```
13×7 DataFrame
 Row │ Dataset   Model                   Hyperparameters    `nfolds`  AUC     RMS     1.96*SE
     │ String    String                  String             Int64     String  String  String
─────┼─────────────────────────────────────────────────────────────────────────────────────────
   1 │ blobs     LGBMClassifier          (;)                      10  0.99            0.01
   2 │ blobs     LGBMClassifier          (max_depth = 2,)         10  0.99            0.01
   3 │ blobs     StableRulesClassifier   (n_trees = 50,)          10  1.00            0.00
   4 │ titanic   LGBMClassifier          (;)                      10  0.87            0.03
   5 │ titanic   LGBMClassifier          (max_depth = 2,)         10  0.85            0.02
   6 │ titanic   StableForestClassifier  (n_trees = 1500,)        10  0.85            0.02
   7 │ titanic   StableRulesClassifier   (n_trees = 1500,)        10  0.83            0.02
   8 │ haberman  LGBMClassifier          (;)                      10  0.71            0.06
   9 │ haberman  LGBMClassifier          (max_depth = 2,)         10  0.67            0.05
  10 │ haberman  StableForestClassifier  (n_trees = 1500,)        10  0.70            0.05
  11 │ haberman  StableRulesClassifier   (n_trees = 1500,)        10  0.67            0.04
  12 │ boston    LinearRegressor         (;)                      10          0.70    0.05
  13 │ boston    StableForestRegressor   (;)                      10          0.66    0.07
```
rikhuijzer added a commit that referenced this pull request on Jun 21, 2023
in order to find the bug in the `StableRulesRegressor` (#18).

## Notes

The bug seems to be related to the scores in the rules being too high:

```julia
julia> include("test/mlj.jl")

julia> preds[1:5]
5-element Vector{Float64}:
 286.98408203125
 280.405224609375
 306.151708984375
 310.74091796875
 310.74091796875

julia> rulesmach.fitresult
StableRules model with 7 rules:
 if X[i, :x6] < 6.8 then 48.767 else 65.298 +
 if X[i, :x11] < 19.2 then 45.919 else 36.811 +
 if X[i, :x13] < 9.04 then 40.428 else 31.213 +
 if X[i, :x3] < 3.97 then 22.868 else 18.279 +
 if X[i, :x10] < 437.0 then 49.438 else 39.331 +
 if X[i, :x1] < 2.44953 then 50.514 else 39.592 +
 if X[i, :x5] < 0.52 then 36.275 else 29.05

julia> rulesmach.fitresult.weights
7-element Vector{Float16}:
 2.066
 1.897
 1.486
 0.778
 2.197
 2.275
 1.475

julia> rulesmach.fitresult.rules
7-element Vector{SIRUS.Rule}:
 SIRUS.Rule(TreePath(" X[i, :x6] < 6.8 "), [23.6], [31.6])
 SIRUS.Rule(TreePath(" X[i, :x11] < 19.2 "), [24.2], [19.4])
 SIRUS.Rule(TreePath(" X[i, :x13] < 9.04 "), [27.2], [21.0])
 SIRUS.Rule(TreePath(" X[i, :x3] < 3.97 "), [29.4], [23.5])
 SIRUS.Rule(TreePath(" X[i, :x10] < 437.0 "), [22.5], [17.9])
 SIRUS.Rule(TreePath(" X[i, :x1] < 2.44953 "), [22.2], [17.4])
 SIRUS.Rule(TreePath(" X[i, :x5] < 0.52 "), [24.6], [19.7])
```

So the summary built from the `rules` and `weights` is consistent, but the `then` and `otherwise` contents in that summary make no sense since `y` is in a different range:

```julia
julia> y[1:5]
5-element Vector{Float64}:
 24.0
 21.6
 34.7
 33.4
 36.2
```

It could be something else, but the values of the `then` and `otherwise` seem the most likely culprit.

On second thought, the weights seem the most likely culprit. Those weights make no sense, whereas the `then` and `otherwise` values in the rules could correspond to `y` values.

Works towards fixing #16.
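The numbers printed above can be cross-checked with a few lines of plain Julia. The sketch below only uses values copied from the REPL output; it does not call SIRUS.jl. It checks that each score in the model summary is roughly the rule's weight times its `then`/`otherwise` value, and that the weights sum to about 12 rather than 1. The `takes_then` vector is an assumption (the feature values of the first sample are not shown in this thread); it was chosen because it reproduces `preds[1]`.

```julia
# Values copied from the REPL output in the commit message.
weights   = [2.066, 1.897, 1.486, 0.778, 2.197, 2.275, 1.475]
then_vals = [23.6, 24.2, 27.2, 29.4, 22.5, 22.2, 24.6]
else_vals = [31.6, 19.4, 21.0, 23.5, 17.9, 17.4, 19.7]

# Scores as printed in the `StableRules model with 7 rules` summary.
shown_then = [48.767, 45.919, 40.428, 22.868, 49.438, 50.514, 36.275]
shown_else = [65.298, 36.811, 31.213, 18.279, 39.331, 39.592, 29.05]

# Largest gap between weight * value and the printed score.
# Small gaps are expected because the printed weights are rounded.
max_dev = maximum(abs.(vcat(weights .* then_vals .- shown_then,
                            weights .* else_vals .- shown_else)))

# If the weights summed to 1, the prediction would be a weighted average
# of y-like values. Here the sum is much larger.
weight_sum = sum(weights)

# ASSUMPTION: the first sample takes the `then` branch on every rule
# except the last one. This is not stated in the thread.
takes_then = [true, true, true, true, true, true, false]
first_pred = sum(ifelse.(takes_then, shown_then, shown_else))

println("max deviation: ", max_dev)
println("sum of weights: ", weight_sum)
println("reconstructed first prediction: ", first_pred)
```

Under that assumption, the reconstructed first prediction agrees with the printed `preds[1]` to within rounding, which supports reading the prediction as a plain sum of weighted rule outputs.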
rikhuijzer added a commit that referenced this pull request on Jun 22, 2023
Normalizing the regularized fit on the weights improves the predictive performance from -1300.0 ± 248 to 0.33 ± 0.04. However, there is still something wrong since it should be near 0.6. Goes towards #16.
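As a rough illustration of why normalization helps, the sketch below rescales the weights and predictions reported earlier in this thread. It is not the code from the commit: `normalize_weights` is a hypothetical helper, and dividing by the sum is only one possible normalization. It also assumes the prediction is linear in the weights, so that dividing the weights by their sum divides every prediction by the same factor.

```julia
# Values copied from the REPL output reported earlier in this thread.
preds   = [286.98408203125, 280.405224609375, 306.151708984375,
           310.74091796875, 310.74091796875]
y       = [24.0, 21.6, 34.7, 33.4, 36.2]
weights = [2.066, 1.897, 1.486, 0.778, 2.197, 2.275, 1.475]

# Hypothetical helper: scale the weights so that they sum to 1.
normalize_weights(w) = w ./ sum(w)

normalized = normalize_weights(weights)

# ASSUMPTION: predictions are linear in the weights, so normalizing the
# weights is equivalent to dividing the old predictions by sum(weights).
rescaled = preds ./ sum(weights)

println("normalized weights sum to: ", sum(normalized))
println("rescaled predictions:      ", round.(rescaled; digits=2))
println("observed y:                ", y)
```

With these numbers the rescaled predictions fall in the low-to-mid 20s, which is the same order of magnitude as `y` but noticeably less spread out, so a remaining gap in performance would not be surprising.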
Works towards #13.