Setting value for max_bin_size will break binning process #14
Comments
Hi, setting a small max_bin_size might produce no bins if the prebins are very heterogeneous. In addition, the parameter min_prebin_size (default value = 0.05) must be <= max_bin_size (default value is None). Could you please provide a reproducible example? Thanks.
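To make the two conditions above concrete, here is a minimal pure-Python sketch. It does not call optbinning; the helper names `check_size_params` and `oversized_prebins` and the prebin fractions are hypothetical, and the sketch assumes that final bins are formed by merging adjacent prebins, so a bin can never be smaller than any single prebin it contains.

```python
from typing import List, Optional, Sequence


def check_size_params(min_prebin_size: float, max_bin_size: Optional[float]) -> None:
    """Raise ValueError if min_prebin_size exceeds max_bin_size.

    Hypothetical helper: it only mirrors the rule stated in the comment above.
    """
    if max_bin_size is None:  # None means no upper bound on the bin size
        return
    if min_prebin_size > max_bin_size:
        raise ValueError(
            f"min_prebin_size ({min_prebin_size}) must be <= "
            f"max_bin_size ({max_bin_size})"
        )


def oversized_prebins(prebin_fractions: Sequence[float], max_bin_size: float) -> List[int]:
    """Return the indices of prebins whose fraction alone exceeds max_bin_size.

    Assumption: bins are merges of adjacent prebins, so any index returned
    here means no merge can satisfy the max_bin_size constraint.
    """
    return [i for i, frac in enumerate(prebin_fractions) if frac > max_bin_size]


# Made-up prebin fractions with one dominating prebin (not data from this issue).
fractions = [0.62, 0.08, 0.10, 0.20]

check_size_params(min_prebin_size=0.05, max_bin_size=0.4)  # passes silently
print(oversized_prebins(fractions, max_bin_size=0.4))  # [0]
print(oversized_prebins(fractions, max_bin_size=0.7))  # []
```

If the first call returns a non-empty list, lowering max_bin_size further cannot help; under the stated assumption, the prebinning itself would have to produce finer prebins first.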
Hi @guillermo-navas-palencia, I have a column with a distribution like this:
When I use OptimalBinning with default parameters, the binning table I get is:
I feel that bin 0 contains too many records compared with the others, so I set max_bin_size to 0.4, hoping that the algorithm would split bin 0 into smaller bins. But the result is not what I expected:
If I set max_bin_size = 0.5, the result is similar to the first result. Thanks.
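One way to quantify how dominant a bin is, independent of optbinning, is to compute the fraction of records per bin from the split points. The sketch below is only an illustration: `bin_fractions` is a hypothetical helper, the sample values and split points are made up, and bins are assumed to be half-open intervals of the form [a, b).

```python
from bisect import bisect_right
from collections import Counter
from typing import List, Sequence


def bin_fractions(values: Sequence[float], splits: Sequence[float]) -> List[float]:
    """Fraction of records falling in each bin defined by sorted split points.

    Bin 0 is (-inf, splits[0]), bin i is [splits[i-1], splits[i]),
    and the last bin is [splits[-1], inf).
    """
    counts = Counter(bisect_right(splits, v) for v in values)
    n = len(values)
    return [counts.get(i, 0) / n for i in range(len(splits) + 1)]


# Made-up, skewed sample (20 records); not the data from this issue.
values = [0] * 5 + [1] * 4 + [2] * 3 + [3] * 3 + [5] * 2 + [8] * 2 + [13]
splits = [2.5, 6.5]  # hypothetical split points

fractions = bin_fractions(values, splits)
print(fractions)  # bin 0 holds 12 of the 20 records

max_bin_size = 0.4
print([i for i, frac in enumerate(fractions) if frac > max_bin_size])  # bins above the cap
```

Comparing these fractions against the chosen max_bin_size shows directly which bins violate the cap, which can be easier to read than the full binning table.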
Hi, thanks for all the details. It might be that, given the data distribution, the pre-binning algorithm (CART) considers 2.5 to be the best split to maximize Gini/IV. For a binary target, to reduce the presence of dominating bins (such as bin 0), you can try different values of the parameter
Thank you very much @guillermo-navas-palencia for your suggestion. I decrease
When I run any type of binning, especially continuous binning, and set the parameter max_bin_size expecting each bin's fraction of records to be smaller than this value, I usually get a result in which all observations belong to just 1 bin. I think this may be a mistake.