Is there another way to find out what is happening with max_cat_threshold without trying to find it in the C++ source code? To save anyone kind enough to reply some time, I'll share my hypothesis, and hopefully you can just correct me if I'm wrong.
Suppose there is a categorical variable of cardinality 10 (i.e. 10 unique levels), and suppose we set max_cat_threshold = 1. At each split opportunity, the algorithm computes sum(gradients) / sum(hessians) over the records in each of the 10 categories and then sorts the 10 categories by that ratio from lowest to highest (or highest to lowest; it doesn't matter). Then, because max_cat_threshold = 1, the algorithm has only one split point to evaluate, and this split point is as close to the median (as determined by the number of observations or weighted observations) as possible?
I appreciate any help on this!
Thank you, @guolinke. I realize that max_cat_threshold limits split points, but would you mind commenting on whether my description of how it does so is more or less correct? I really appreciate it.
@pford221
I think most of it is correct.
But for the part
"Then because max_cat_threshold = 1, the algorithm only has one split point to evaluate and this split point is as close to the median (as determined by number of observations or weighted observations) as possible?"
I want to clarify that the split point will be the one with the highest (or lowest) sum(gradients) / sum(hessians).
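To make the clarification concrete, here is a minimal Python sketch of the behaviour described in this thread. It is not LightGBM's actual implementation: the function name `candidate_category_splits` and its arguments are hypothetical, hessians are assumed to be positive, and smoothing and regularization (for example the `cat_smooth` parameter) are ignored. It only illustrates the idea that categories are ordered by sum(gradients) / sum(hessians) and that candidate splits are taken from the ends of that ordering, not from the middle.

```python
from collections import defaultdict


def candidate_category_splits(categories, gradients, hessians, max_cat_threshold):
    """Illustrative only: order categories by sum(grad) / sum(hess) and
    list the category groups taken from either end of that ordering."""
    sum_g = defaultdict(float)
    sum_h = defaultdict(float)
    for c, g, h in zip(categories, gradients, hessians):
        sum_g[c] += g
        sum_h[c] += h

    # Sort categories by their gradient/hessian ratio (assumes sum_h > 0).
    ordered = sorted(sum_g, key=lambda c: sum_g[c] / sum_h[c])

    # At most max_cat_threshold categories go on one side of a split,
    # and at least one category must remain on the other side.
    limit = min(max_cat_threshold, len(ordered) - 1)

    candidates = []
    for k in range(1, limit + 1):
        candidates.append(frozenset(ordered[:k]))   # k lowest-ratio categories
        candidates.append(frozenset(ordered[-k:]))  # k highest-ratio categories
    return ordered, candidates


if __name__ == "__main__":
    cats = ["a", "a", "b", "c", "c"]
    grads = [1.0, 1.0, -3.0, 0.5, 0.5]
    hess = [1.0, 1.0, 1.0, 1.0, 1.0]
    order, cands = candidate_category_splits(cats, grads, hess, max_cat_threshold=1)
    print(order)  # categories from lowest to highest ratio
    print(cands)  # one-category groups from each end of the ordering
```

Under these assumptions, with max_cat_threshold = 1 the only candidates are "the lowest-ratio category versus the rest" and "the highest-ratio category versus the rest", which matches the correction above: the split isolates an extreme category and does not sit near the median.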