Setting value for max_bin_size will break binning process #14

Closed
nic9lif3 opened this issue Mar 23, 2020 · 4 comments

Comments

@nic9lif3

When I run any type of binning, especially continuous binning, and set the parameter max_bin_size, expecting each bin's fraction of records to be smaller than this value, I usually get a result in which all observations belong to just 1 bin. I think this may be a mistake.

@guillermo-navas-palencia
Owner

Hi,

Setting a small max_bin_size might produce no bins if prebins are very heterogeneous. In addition, the parameter min_prebin_size (default value = 0.05) must be <= max_bin_size (default value is None). Could you please provide a reproducible example?

Thanks

@nic9lif3
Author

nic9lif3 commented Mar 23, 2020

Hi @guillermo-navas-palencia, I have a column with a distribution like this:

Value Ratio
2 0.266009
3 0.191488
1 0.175111
4 0.165533
5 0.0925422
6 0.044826
7 0.0193812
8 0.00901054
9 0.00504364
10 0.00187011
11 0.00136008
12 0.000850051
13 0.000453361
14 0.00028335

When I use OptimalBinning with default parameters, the binning table I get is:

Bin             Count  Count (%)  Non-event  Event  Event rate  WoE                     IV           JS
0 [-inf, 2.50)  7784   0.44112    7672       112    0.0143885   -0.015700910587103323   0.000109578  1.36971e-05
1 [2.50, 3.50)  3379   0.191488   3331       48     0.0142054   -0.0027078285694939197  1.4059e-06   1.75738e-07
2 [3.50, 4.50)  2921   0.165533   2892       29     0.00992811  0.359873097436898       0.0180819    0.00224811
3 [4.50, 5.50)  1633   0.0925422  1610       23     0.0140845   0.005960586194076356    3.27839e-06  4.09798e-07
4 [5.50, inf)   1466   0.0830783  1445       21     0.0143247   -0.011192493032173623   1.04642e-05  1.30801e-06
5 Special       0      0          0          0      0           0.0                     0            0
6 Missing       463    0.0262382  446        17     0.0367171   -0.9754290478914349     0.041321     0.00496963
Totals          17646  1          17396      250    0.0141675                           0.0595276    0.00723334

I feel that bin 0 contains too many observations compared with the others, so I set max_bin_size to 0.4, hoping that the algorithm would split bin 0 into smaller bins. But the result is not what I expected:

Bin            Count  Count (%)  Non-event  Event  Event rate  WoE                  IV          JS
0 [-inf, inf)  17183  0.973762   16950      233    0.0135599   0.04445000338761318  0.00188299  0.000235354
1 Special      0      0          0          0      0           0.0                  0           0
2 Missing      463    0.0262382  446        17     0.0367171   -0.9754290478914349  0.041321    0.00496963
Totals         17646  1          17396      250    0.0141675                        0.043204    0.00520499

If I set max_bin_size=0.5, the result is similar to the first result.
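
The arithmetic behind these two outcomes can be sketched in a few lines. This is only an illustration under an assumption: that with default settings the pre-binning step keeps values 1 and 2 together in a single pre-bin (as the split at 2.50 in the first table suggests), so the optimizer cannot divide that group any further. The helper name `bin_fraction` is made up for this sketch and is not part of optbinning.

```python
# Fraction of all records per value, copied from the distribution above.
ratios = {
    1: 0.175111, 2: 0.266009, 3: 0.191488, 4: 0.165533, 5: 0.0925422,
    6: 0.044826, 7: 0.0193812, 8: 0.00901054, 9: 0.00504364,
    10: 0.00187011, 11: 0.00136008, 12: 0.000850051, 13: 0.000453361,
    14: 0.00028335,
}


def bin_fraction(low, high):
    """Share of all records whose value v satisfies low <= v < high."""
    return sum(r for v, r in ratios.items() if low <= v < high)


# Values 1 and 2 together form the bin [-inf, 2.50) of the first table.
first_bin = bin_fraction(float("-inf"), 2.5)

for max_bin_size in (0.4, 0.5):
    fits = first_bin <= max_bin_size
    print(f"max_bin_size={max_bin_size}: bin [-inf, 2.50) fits -> {fits}")
```

Under that assumption, the group of values 1 and 2 covers about 0.441 of the records, which is above 0.4 but below 0.5. That would be consistent with max_bin_size=0.5 leaving the first result unchanged while max_bin_size=0.4 cannot be satisfied by any combination of the default pre-bins.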

Thanks.

@guillermo-navas-palencia
Owner

Hi, thanks for all the details.

It might be that, given the data distribution, the pre-binning algorithm (CART) considers 2.5 to be the best split to maximize Gini/IV. For a binary target, to reduce the presence of dominating bins (such as bin 0), you can try different values of the parameter gamma: http://gnpalencia.org/optbinning/tutorials/tutorial_binary.html#Reduction-of-dominating-bins. Other approaches that you might try are:

  • Pass your splits using option user_splits.
  • Use a smaller min_prebin_size and a larger max_n_prebins. Then, set min_bin_size=0.05 and max_bin_size=0.4.
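
A minimal sketch of the second suggestion, for readers who want something to start from. The values for min_prebin_size and max_n_prebins are assumptions chosen for illustration (the thread does not state which values were used), `"x"` is a placeholder variable name, and `fit_binning` is a hypothetical helper, not an optbinning function.

```python
# Settings for the second bullet above. The two pre-binning values are
# assumptions for illustration; only the bin-size limits come from the thread.
suggested_params = {
    "name": "x",               # placeholder variable name
    "dtype": "numerical",
    "min_prebin_size": 0.01,   # assumed: smaller than the 0.05 default
    "max_n_prebins": 50,       # assumed: larger than the library default
    "min_bin_size": 0.05,      # from the comment above
    "max_bin_size": 0.4,       # from the comment above
}


def fit_binning(x, y, **overrides):
    """Fit OptimalBinning with the suggested settings; return the fitted object."""
    # Imported lazily so the settings can be inspected without optbinning installed.
    from optbinning import OptimalBinning

    optb = OptimalBinning(**{**suggested_params, **overrides})
    optb.fit(x, y)
    return optb


# Example usage, given a feature array x and a binary target y:
#   optb = fit_binning(x, y)
#   print(optb.status)
#   print(optb.binning_table.build())
```

The other options mentioned above can be tried through the same helper, for example `fit_binning(x, y, gamma=0.1)` or `fit_binning(x, y, user_splits=[1.5, 2.5, 3.5, 4.5, 5.5])`; the gamma value and the split points here are illustrative guesses, not recommendations from the thread.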

@nic9lif3
Author

nic9lif3 commented Mar 23, 2020

Thank you very much @guillermo-navas-palencia for your suggestion. I decreased min_prebin_size and increased max_n_prebins, and now max_bin_size takes effect as expected.
