`max_cat_threshold` Hyperparameter #2261

pford221 · 2019-07-12T03:54:27Z

Hi,

Is there a way to find out what max_cat_threshold does without digging through the C++ source code? To save time for anyone kind enough to reply, I'll share my hypothesis, and hopefully you can just correct me if I'm wrong.

Suppose there is a categorical variable of cardinality 10 (i.e. 10 unique levels). Also suppose that we set max_cat_threshold = 1. At each split opportunity, the algorithm computes sum(gradients) / sum(hessians) over the records in each of the 10 categories and then sorts the 10 categories by that ratio from lowest to highest (or highest to lowest; the direction doesn't matter). Then because max_cat_threshold = 1, the algorithm only has one split point to evaluate and this split point is as close to the median (as determined by number of observations or weighted observations) as possible?
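The aggregation-and-sort step in this hypothesis can be sketched in plain Python. This is only an illustration of the description above, not LightGBM's actual code: the function name `sort_categories_by_ratio` and the toy data are hypothetical, and any smoothing or filtering the library may apply is ignored.

```python
from collections import defaultdict


def sort_categories_by_ratio(categories, gradients, hessians):
    """Order category labels by sum(gradients) / sum(hessians), lowest first.

    Hypothetical helper for illustration; not part of LightGBM.
    """
    grad_sum = defaultdict(float)
    hess_sum = defaultdict(float)
    # Accumulate per-category gradient and hessian totals.
    for cat, g, h in zip(categories, gradients, hessians):
        grad_sum[cat] += g
        hess_sum[cat] += h
    # Sort the categories by their aggregated ratio.
    return sorted(grad_sum, key=lambda c: grad_sum[c] / hess_sum[c])


# Toy data: three categories, two records each, unit hessians.
cats = ["a", "b", "a", "c", "b", "c"]
grads = [0.5, -1.0, 0.3, 0.1, -0.6, 0.2]
hess = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

order = sort_categories_by_ratio(cats, grads, hess)
print(order)
```

Here the per-category ratios are 0.4 for `a`, -0.8 for `b`, and 0.15 for `c`, so the sorted order starts with `b` and ends with `a`.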

I appreciate any help on this!


guolinke · 2019-07-12T04:48:39Z

@pford221 Yes, you are right. max_cat_threshold limits the number of categorical split points.

pford221 · 2019-07-12T04:56:18Z

Thank you, @guolinke. I realize that max_cat_threshold limits split points, but would you mind commenting on whether my description of how it does so is more or less correct? I really appreciate it.

guolinke · 2019-07-12T05:56:25Z

@pford221
I think most of it is correct.
But regarding

> Then because max_cat_threshold = 1, the algorithm only has one split point to evaluate and this split point is as close to the median (as determined by number of observations or weighted observations) as possible?

I want to clarify that the split point will be the one with the highest (or lowest) sum(gradients) / sum(hessians).
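One way to read this clarification: once the categories are sorted by sum(gradients) / sum(hessians), the candidate splits are taken from the extremes of that ordering rather than from the middle. The sketch below assumes that candidates are prefixes of the sorted list taken from either end, capped at max_cat_threshold categories. That reading is an assumption, the helper `candidate_left_sets` is hypothetical, and the sketch ignores any smoothing or minimum-data constraints the library may apply.

```python
def candidate_left_sets(sorted_cats, max_cat_threshold):
    """Enumerate candidate category groups from both ends of a sorted list.

    sorted_cats: category labels already ordered by sum(gradients) / sum(hessians).
    Hypothetical helper for illustration; not LightGBM's implementation.
    """
    # Leave at least one category on the other side of the split.
    limit = max(0, min(max_cat_threshold, len(sorted_cats) - 1))
    # Groups built from the lowest-ratio end.
    from_low = [sorted_cats[:k] for k in range(1, limit + 1)]
    # Groups built from the highest-ratio end.
    from_high = [sorted_cats[-k:] for k in range(1, limit + 1)]
    return from_low, from_high


# Ten categories, assumed to be already sorted by ratio (c0 lowest, c9 highest).
sorted_cats = [f"c{i}" for i in range(10)]

from_low, from_high = candidate_left_sets(sorted_cats, max_cat_threshold=1)
print(from_low)   # the single lowest-ratio category versus the rest
print(from_high)  # the single highest-ratio category versus the rest
```

Under this reading, max_cat_threshold = 1 isolates only the category with the lowest (or highest) ratio, which is why the resulting split is not near the median.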

StrikerRUS closed this as completed Jul 18, 2019

lock bot locked as resolved and limited conversation to collaborators Mar 11, 2020
