Quick questions #40

skwskwskwskw · 2020-06-02T15:48:32Z

Nice package! By the way, may I ask:

How can I remove the special/missing rows from binning_table when plotting?
How can I do customised binning?

Thanks.

guillermo-navas-palencia · 2020-06-02T17:10:02Z

Hi,

Currently, there is no option to remove these rows, but it could be added in the next release. I would add the parameters add_missing and add_special to the binning_table.plot() function, both with default value True.
For customised binning, you can use the parameters user_splits and user_splits_fixed. See the example at http://gnpalencia.org/optbinning/tutorials/tutorial_binary.html#User-defined-split-points.

Let me know if you have more questions.

Best,
Guillermo

skwskwskwskw · 2020-06-03T17:28:20Z

Hi, I am not sure if it has been implemented, but perhaps you could create rules for the binning (e.g., an upper limit column).

guillermo-navas-palencia · 2020-06-03T17:31:07Z

Hi, could you provide an example to clarify?

skwskwskwskw · 2020-06-03T18:11:50Z

Hi, for example, on this page: http://gnpalencia.org/optbinning/tutorials/tutorial_scorecard_binary_target.html

Say I have a new dataset: how should I score it with the points?

skwskwskwskw · 2020-06-03T18:14:59Z

Also, regarding the user_splits suggestion above: if I want to run BinningProcess, how can I set splits selectively for just one variable? The example seems to apply only to binning a single variable.

Many thanks for the prompt response.

guillermo-navas-palencia · 2020-06-03T18:21:11Z

To score a new dataset, you can use the score method, but you need to make sure that the new dataset contains the same columns as the one used for training.
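
A minimal sketch of that column check, using only plain Python. The helper name missing_columns and the column names are hypothetical, made up for illustration; they are not part of optbinning.

```python
def missing_columns(training_columns, new_columns):
    """Return the training columns absent from the new dataset, in training order."""
    available = set(new_columns)
    return [col for col in training_columns if col not in available]


# Hypothetical column names, for illustration only.
training_columns = ["O1", "O2", "F1", "I1"]
new_columns = ["I1", "O1", "F1", "extra"]

missing = missing_columns(training_columns, new_columns)
if missing:
    print(f"Cannot score yet, missing columns: {missing}")
```

With pandas DataFrames, the two arguments could be the .columns of the training and new datasets. Extra columns in the new dataset are ignored by this check.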

guillermo-navas-palencia · 2020-06-03T18:46:04Z

The BinningProcess includes the parameter binning_fit_params. Example:

binning_fit_params = {
    "my_variable": {"user_splits": [1.2, 1.8, 2.3], "user_splits_fixed": [False, True, True]}
}

See example: http://gnpalencia.org/optbinning/tutorials/tutorial_binning_process_FICO_xAI.html

skwskwskwskw · 2020-06-04T15:20:44Z

Nice.... Thanks =)

skwskwskwskw · 2020-06-04T17:08:50Z

Thanks, but I think the above is not working (so far it only works with float variables without missing values). I attach the data here, together with the code I used and the errors.

(a) For a variable (dtype = float) with missing values
`params = {"I1": {"user_splits": [1., 2., 3., 4., 5., np.nan],
"user_splits_fixed": [True, True, True, True, True, True]}}
optb = BinningProcess(['O1', 'O2', 'F1', 'I1'], categorical_variables=['O1', 'O2'], binning_fit_params=params)

optb.fit(ch.drop(columns='TARGET', axis=1), ch.TARGET)`

Error:

(b) For an integer variable (dtype = int) without missing values
`params = {"I1": {"user_splits": [0., 1., 2., 4., 3., 5., 6., 9., 12., 7., 8.],
"user_splits_fixed": [True, True, True, True, True, True, True, True, True, True, True]}}
optb = BinningProcess(['O1', 'O2', 'F1', 'I1'], categorical_variables=['O1', 'O2'], binning_fit_params=params)

optb.fit(ch.drop(columns='TARGET', axis=1), ch.TARGET)`

Error:

(c) For an object variable (dtype = object) with missing values
`params = {"I1": {"user_splits": ['2.0', '5.0', '1.0', '4.0', '3.0', np.nan],
"user_splits_fixed": [True, True, True, True, True, True]}}
optb = BinningProcess(['O1', 'O2', 'F1', 'I1'], categorical_variables=['O1', 'O2'], binning_fit_params=params)

optb.fit(ch.drop(columns='TARGET', axis=1), ch.TARGET)`

Error:

Attachment of sample data:
to_check.zip

guillermo-navas-palencia · 2020-06-04T17:16:10Z

OK, I will look into it. Note that adding np.nan to user_splits is not correct: np.nan is not a split or a bin. A bin for missing values will be added automatically if the dataset includes missing values.
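
As a rough sketch of how the NaN entry could be stripped before building the parameters, keeping the two lists aligned. The helper drop_nan_splits is hypothetical and not part of optbinning; it only cleans the input lists.

```python
import math


def drop_nan_splits(user_splits, user_splits_fixed):
    """Remove NaN entries from user_splits, keeping the fixed flags aligned."""
    if len(user_splits) != len(user_splits_fixed):
        raise ValueError("user_splits and user_splits_fixed must have the same length")
    kept = [
        (split, fixed)
        for split, fixed in zip(user_splits, user_splits_fixed)
        # np.nan is a Python float, so this check also catches it.
        if not (isinstance(split, float) and math.isnan(split))
    ]
    splits = [split for split, _ in kept]
    fixed = [flag for _, flag in kept]
    return splits, fixed


splits, fixed = drop_nan_splits(
    [1.0, 2.0, 3.0, 4.0, 5.0, float("nan")],
    [True, True, True, True, True, True],
)
params = {"I1": {"user_splits": splits, "user_splits_fixed": fixed}}
print(params)
```

The resulting params dictionary has the same shape as the binning_fit_params examples in this thread, with five splits and five flags.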

skwskwskwskw · 2020-06-05T13:20:45Z

Yes, you're right. After changing that, I can only run case (b), int without missing values. The rest are still hitting errors. Thanks.

guillermo-navas-palencia · 2020-06-05T18:33:34Z

Hi,

Case a)

from optbinning import OptimalBinning

x = df["I1"]
y = df["TARGET"]

optb = OptimalBinning(user_splits=[1.,2.,3.,4.,5],
                      user_splits_fixed=[True, True, True, True, True])
optb.fit(x, y)

optb.binning_table.build()

This will return no bins because the "auto" monotonicity constraint is activated.

Try:

optb = OptimalBinning(user_splits=[1.,2.,3.,4.,5],
                      user_splits_fixed=[True, True, True, True, True],
                      monotonic_trend=None)
optb.fit(x, y)

optb.binning_table.build()

Case b)

optb = OptimalBinning(user_splits=[ 0., 1., 2., 4., 3., 5., 6., 9., 12., 7., 8.],
                      user_splits_fixed=[True,True, True, True, True,True, True,True, True, True, True])

This will raise the following exception:

ValueError: Fixed user_splits [ 0. 12.  7.  8.] are removed because produce pure prebins. Provide different splits to be fixed.

This is due to the impossibility of computing the WoE and Information Value metrics when the number of event or non-event records in a bin is 0.
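
To illustrate what a pure bin is, here is a small sketch in plain Python that counts events and non-events per bin for a set of splits. It is not optbinning's internal implementation: the helper names, the toy data, and the bin edge convention (left-closed intervals) are assumptions made for illustration.

```python
from bisect import bisect_right


def bin_counts(x, y, splits):
    """Count non-events (y == 0) and events (y == 1) per bin.

    Bins are (-inf, s1), [s1, s2), ..., [sk, inf) for sorted splits s1..sk.
    """
    splits = sorted(splits)
    counts = [[0, 0] for _ in range(len(splits) + 1)]
    for value, target in zip(x, y):
        counts[bisect_right(splits, value)][target] += 1
    return counts


def pure_bins(counts):
    """Return the indices of bins that contain no events or no non-events."""
    return [i for i, (non_event, event) in enumerate(counts)
            if non_event == 0 or event == 0]


# Toy data, made up for illustration.
x = [0, 1, 1, 2, 2, 3, 3, 4]
y = [0, 0, 1, 0, 1, 1, 0, 1]

counts = bin_counts(x, y, splits=[1, 2, 3])
print(counts)             # [non-events, events] per bin
print(pure_bins(counts))  # indices of pure bins
```

Here the first bin holds a single non-event and no events, so it is pure. WoE takes a logarithm of a ratio of event and non-event proportions, so a zero count in either makes it undefined for that bin. Running a check like this on candidate splits before fixing them may help pick splits that avoid the exception.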

Case c)

First, variable "I1" does not contain strings but numbers, so that error is expected. However, I have encountered a few errors when using user_splits_fixed with dtype="categorical". I will look into it. Note that categorical bins are given as a list of lists, for example: user_splits=[[2], [5], [1], [4], [3]].

The problem with BinningProcess is that it does not correctly pass the parameters user_splits and user_splits_fixed. This is easy to fix.

Thanks, I will keep you posted.

guillermo-navas-palencia · 2020-06-07T16:00:15Z

Hi,

The current version in the master branch includes several bug fixes, and now it should work as expected.

Case a)

params = {"I1": {"user_splits": [1.,2.,3.,4.,5],
                 "user_splits_fixed": [True, True, True, True, True]}}

binning_process = BinningProcess(variable_names=variable_names,
                                 categorical_variables=['O1', 'O2'],
                                 binning_fit_params=params)

binning_process.fit(df.drop(columns='TARGET', axis=1), df.TARGET)

binning_process.summary()

Case b)

params = {"I1": {"user_splits": [ 0., 1., 2., 4., 3., 5., 6., 9., 12., 7., 8.],
                 "user_splits_fixed": [True, True, True, True, True, True, True,
                                       True, True, True, True]}}

binning_process = BinningProcess(variable_names=variable_names,
                                 categorical_variables=['O1', 'O2'],
                                 binning_fit_params=params)

binning_process.fit(df.drop(columns='TARGET', axis=1), df.TARGET)

As above, this will return the following exception:

ValueError: Fixed user_splits [ 0. 12.  7.  8.] are removed because produce pure prebins. Provide different splits to be fixed.

Case c) In this example, the variable 'I1' is treated as nominal (categorical).

params = {"I1": {"user_splits": [[2], [5], [1], [4], [3]],
                 "user_splits_fixed": [True, True, True, True, True]}}

binning_process = BinningProcess(variable_names=variable_names,
                                 categorical_variables=['I1', 'O1', 'O2'],
                                 binning_fit_params=params)

binning_process.fit(df.drop(columns='TARGET', axis=1), df.TARGET)

binning_process.summary()

Binning table:

optb = binning_process.get_binned_variable("I1")
optb.binning_table.build()

optb.binning_table.plot(metric="event_rate")

guillermo-navas-palencia · 2020-06-07T21:40:39Z

Version 0.6.1 is available with the discussed bug fixes.

Thanks for your feedback!

skwskwskwskw · 2020-06-08T05:29:22Z

Many thanks. By the way, I think you haven't yet added the plot options to remove bins that have a count of 0, right?

guillermo-navas-palencia · 2020-06-08T05:55:46Z

Options add_special and add_missing are included. The default value is set to True. If you want to hide these bins from the plot, set add_special and/or add_missing to False.

skwskwskwskw · 2020-06-08T06:23:15Z

Awesome! Many thanks.

skwskwskwskw changed the title ~~quick question~~ Quick questions Jun 2, 2020

guillermo-navas-palencia mentioned this issue Jun 2, 2020

Add parameters add_missing and add_special in binning_table.plot() function #41

Closed

guillermo-navas-palencia added a commit that referenced this issue Jun 2, 2020

Add parameters add_special and add_missing to plot binning table functions #40 (6995cf4)

guillermo-navas-palencia added the enhancement and question labels Jun 2, 2020

guillermo-navas-palencia closed this as completed Jun 9, 2020
