Quick questions #40

Closed
skwskwskwskw opened this issue Jun 2, 2020 · 17 comments
Labels
enhancement New feature or request question Further information is requested

Comments

@skwskwskwskw

skwskwskwskw commented Jun 2, 2020

Nice package. May I ask:

  1. How can I remove the special/missing rows from binning_table when plotting?
  2. How can I do customised binning?

Thanks.

@skwskwskwskw changed the title from "quick question" to "Quick questions" on Jun 2, 2020
@guillermo-navas-palencia
Owner

Hi,

  1. Currently, there is no option to remove these rows, but it could be added in the next release. I would add the parameters add_missing and add_special to the binning_table.plot() function, both with default value True.
  2. You can use the parameters user_splits and user_splits_fixed. See the example at http://gnpalencia.org/optbinning/tutorials/tutorial_binary.html#User-defined-split-points.
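
As a rough illustration of what user-defined split points mean, the sketch below assigns values to the intervals defined by a list of splits. It assumes left-closed intervals such as [1.2, 1.8), and `assign_bin` is a hypothetical helper written for this example, not an optbinning function.

```python
from bisect import bisect_right


def assign_bin(value, splits):
    """Return the index of the interval containing value.

    With splits [s0, s1, ...] the intervals are assumed to be
    (-inf, s0), [s0, s1), ..., [s_last, inf).
    """
    return bisect_right(splits, value)


splits = [1.2, 1.8, 2.3]          # illustrative split points
values = [0.5, 1.2, 2.0, 9.9]     # illustrative data
bin_indices = [assign_bin(v, splits) for v in values]
print(bin_indices)
```

Three split points give four intervals, so the indices range from 0 to 3.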

Let me know if you have more questions.

Best,
Guillermo

@skwskwskwskw
Author

Hi, I am not sure whether this has been implemented, but perhaps it could be possible to create rules for the binning (e.g., an upper limit column)?

@guillermo-navas-palencia
Owner

Hi, could you provide an example to clarify?

@skwskwskwskw
Author

Hi, for example, on this page: http://gnpalencia.org/optbinning/tutorials/tutorial_scorecard_binary_target.html

Say I have a new dataset: how should I score it with the points?

@skwskwskwskw
Author

Also, on the above: if I want to run BinningProcess, how can I set user-defined splits for just one variable? The example seems to apply only to a single variable.

Many thanks for the prompt response.

@guillermo-navas-palencia
Owner

You can score it with the score method, but you need to make sure that the new dataset contains the same columns as the one used for training.
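
A quick way to check the column requirement before scoring might look like the sketch below. The column names are illustrative and `missing_columns` is a hypothetical helper, not part of optbinning.

```python
def missing_columns(training_columns, new_columns):
    """List the training columns that are absent from the new dataset."""
    available = set(new_columns)
    return [col for col in training_columns if col not in available]


# Illustrative column names; replace with the columns used for training.
training_columns = ["O1", "O2", "F1", "I1"]
new_columns = ["O1", "F1", "I1", "EXTRA"]

missing = missing_columns(training_columns, new_columns)
print(missing)
```

An empty result means every training column is present; extra columns in the new dataset are not reported by this check.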

@guillermo-navas-palencia
Owner

The BinningProcess includes the parameter binning_fit_params. Example:

binning_fit_params = {
    "my_variable": {"user_splits": [1.2, 1.8, 2.3], "user_splits_fixed": [False, True, True]}
}

See example: http://gnpalencia.org/optbinning/tutorials/tutorial_binning_process_FICO_xAI.html
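
A minimal sketch of building such a dictionary with a basic consistency check is shown below. It assumes, as in the example above, that user_splits and user_splits_fixed have one entry per split; `make_fit_params` is a hypothetical helper, not an optbinning function.

```python
def make_fit_params(variable, user_splits, user_splits_fixed):
    """Build a per-variable entry for binning_fit_params.

    Raises ValueError if the two lists differ in length (assumed requirement).
    """
    if len(user_splits) != len(user_splits_fixed):
        raise ValueError(
            "user_splits and user_splits_fixed must have the same length"
        )
    return {
        variable: {
            "user_splits": list(user_splits),
            "user_splits_fixed": list(user_splits_fixed),
        }
    }


binning_fit_params = make_fit_params(
    "my_variable", [1.2, 1.8, 2.3], [False, True, True]
)
print(binning_fit_params)
```

Variables that are not listed in the dictionary are left without user-defined splits.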

@skwskwskwskw
Author

Nice.... Thanks =)

@skwskwskwskw
Author

Thanks, but I think the above is not working (so far it only works with float without missing). I have attached the data here, together with the code I used and the errors.

(a) For variable (dtype = float) with missing
params = {"I1": {"user_splits": [1., 2., 3., 4., 5., np.nan],
                 "user_splits_fixed": [True, True, True, True, True, True]}}
optb = BinningProcess(['O1', 'O2', 'F1', 'I1'], categorical_variables=['O1', 'O2'],
                      binning_fit_params=params)

optb.fit(ch.drop(columns='TARGET', axis=1), ch.TARGET)

Error:
image

(b) For Integer (dtype = int) without missing
params = {"I1": {"user_splits": [0., 1., 2., 4., 3., 5., 6., 9., 12., 7., 8.],
                 "user_splits_fixed": [True, True, True, True, True, True, True,
                                       True, True, True, True]}}
optb = BinningProcess(['O1', 'O2', 'F1', 'I1'], categorical_variables=['O1', 'O2'],
                      binning_fit_params=params)

optb.fit(ch.drop(columns='TARGET', axis=1), ch.TARGET)

Error:
image

(c) For object (dtype = object) with missing
params = {"I1": {"user_splits": ['2.0', '5.0', '1.0', '4.0', '3.0', np.nan],
                 "user_splits_fixed": [True, True, True, True, True, True]}}
optb = BinningProcess(['O1', 'O2', 'F1', 'I1'], categorical_variables=['O1', 'O2'],
                      binning_fit_params=params)

optb.fit(ch.drop(columns='TARGET', axis=1), ch.TARGET)

Error:
image

Attachment of sample data:
to_check.zip

@guillermo-navas-palencia
Owner

Ok, I will look into it. Note that adding np.nan to user_splits is not correct: np.nan is not a split or a bin. A bin for missing values will be added automatically if the dataset includes missing values.
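
One way to prepare a list of numeric splits that avoids this mistake is sketched below. `clean_numeric_splits` is a hypothetical helper, and the de-duplication and sorting are conveniences added for this example rather than something stated in this thread.

```python
import math


def clean_numeric_splits(candidates):
    """Drop NaN entries, remove duplicates and return the splits in order."""
    finite = {float(c) for c in candidates if not math.isnan(float(c))}
    return sorted(finite)


raw_splits = [1.0, 2.0, 3.0, 4.0, 5.0, float("nan")]
user_splits = clean_numeric_splits(raw_splits)
user_splits_fixed = [True] * len(user_splits)  # one flag per remaining split
print(user_splits, user_splits_fixed)
```

Building user_splits_fixed from the cleaned list keeps the two lists the same length after the NaN entry is removed.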

@skwskwskwskw
Author

skwskwskwskw commented Jun 5, 2020

Yes, you're right. Even after changing that, I can only run case (b) (int without missing). The rest are still hitting errors. Thanks.

@guillermo-navas-palencia
Owner

Hi,

Case a)

from optbinning import OptimalBinning

x = df["I1"]
y = df["TARGET"]

optb = OptimalBinning(user_splits=[1.,2.,3.,4.,5],
                      user_splits_fixed=[True, True, True, True, True])
optb.fit(x, y)
optb.binning_table.build()

This will return no bins because the "auto" monotonicity constraint is activated.

image

Try:

optb = OptimalBinning(user_splits=[1.,2.,3.,4.,5],
                      user_splits_fixed=[True, True, True, True, True],
                      monotonic_trend=None)
optb.fit(x, y)
optb.binning_table.build()

image
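
As a rough illustration of why forcing every split can clash with a monotonicity constraint, the sketch below checks whether a sequence of per-bin event rates is monotonic. The rates are made up for illustration and `is_monotonic` is a hypothetical helper, not part of optbinning; the library's solver is more involved than this check.

```python
def is_monotonic(rates):
    """Return True if rates never decrease or never increase."""
    pairs = list(zip(rates, rates[1:]))
    ascending = all(a <= b for a, b in pairs)
    descending = all(a >= b for a, b in pairs)
    return ascending or descending


# Hypothetical event rates for bins produced by fixed splits.
fixed_split_rates = [0.10, 0.25, 0.18, 0.30]
merged_rates = [0.10, 0.21, 0.30]

print(is_monotonic(fixed_split_rates))
print(is_monotonic(merged_rates))
```

When every split is fixed, bins cannot be merged to restore a monotonic trend, which is consistent with the suggestion above to set monotonic_trend=None.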

Case b)

optb = OptimalBinning(user_splits=[0., 1., 2., 4., 3., 5., 6., 9., 12., 7., 8.],
                      user_splits_fixed=[True, True, True, True, True, True, True,
                                         True, True, True, True])
optb.fit(x, y)

This will raise the following exception:

ValueError: Fixed user_splits [ 0. 12.  7.  8.] are removed because produce pure prebins. Provide different splits to be fixed.

This is because the WoE and Information Value metrics cannot be computed when the number of event or non-event records in a bin is 0.
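
To illustrate why a pure bin is a problem, here is a minimal sketch of a WoE calculation. It assumes the convention WoE = log((non-event share) / (event share)); the counts are made up and `woe` is a hypothetical helper, not the library's implementation.

```python
import math


def woe(non_events, events, total_non_events, total_events):
    """Weight of Evidence for one bin, or None when the bin is pure."""
    if non_events == 0 or events == 0:
        # log(0) or a division by zero would occur, so WoE is undefined.
        return None
    return math.log(
        (non_events / total_non_events) / (events / total_events)
    )


# Hypothetical (non-event, event) counts per bin; the last bin is pure.
bins = [(40, 10), (30, 20), (5, 0)]
total_ne = sum(ne for ne, _ in bins)
total_e = sum(e for _, e in bins)

woes = [woe(ne, e, total_ne, total_e) for ne, e in bins]
print(woes)
```

The last bin has no events, so its WoE cannot be computed, and the Information Value, which sums over bins using WoE, is undefined as well.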

Case c)

First, variable "I1" does not contain strings but numbers, so that error is justified. However, I have encountered a few errors when using user_splits_fixed with dtype="categorical". I will look into it. Note that categorical bins are given as a list of lists, for example: user_splits=[[2], [5], [1], [4], [3]].
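
The list-of-lists form can be read as "one inner list of categories per bin". The sketch below makes that explicit by building a category-to-bin lookup; `category_to_bin` is a hypothetical helper written for this example, not an optbinning function.

```python
def category_to_bin(user_splits):
    """Map each category to the index of the bin (inner list) holding it."""
    mapping = {}
    for bin_index, categories in enumerate(user_splits):
        for category in categories:
            if category in mapping:
                raise ValueError(f"category {category!r} appears in two bins")
            mapping[category] = bin_index
    return mapping


user_splits = [[2], [5], [1], [4], [3]]
mapping = category_to_bin(user_splits)
print(mapping)
```

An inner list with several categories, such as [1, 4], would group those categories into a single bin.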

The problem with BinningProcess is that it is not passing the parameters user_splits and user_splits_fixed correctly. This is easy to fix.

Thanks, I will keep you posted.

@guillermo-navas-palencia
Owner

guillermo-navas-palencia commented Jun 7, 2020

Hi,

The current version in the master branch includes several bug fixes, and now it should work as expected.

Case a)

from optbinning import BinningProcess

# Column names as used in the reporter's snippet earlier in this thread.
variable_names = ['O1', 'O2', 'F1', 'I1']

params = {"I1": {"user_splits": [1., 2., 3., 4., 5.],
                 "user_splits_fixed": [True, True, True, True, True]}}

binning_process = BinningProcess(variable_names=variable_names,
                                 categorical_variables=['O1', 'O2'],
                                 binning_fit_params=params)

binning_process.fit(df.drop(columns='TARGET', axis=1), df.TARGET)
binning_process.summary()

image

Case b)

params = {"I1": {"user_splits": [ 0., 1., 2., 4., 3., 5., 6., 9., 12., 7., 8.],
                 "user_splits_fixed": [True, True, True, True, True, True, True,
                                       True, True, True, True]}}

binning_process = BinningProcess(variable_names=variable_names,
                                 categorical_variables=['O1', 'O2'],
                                 binning_fit_params=params)

binning_process.fit(df.drop(columns='TARGET', axis=1), df.TARGET)

As above, this will raise the following exception:

ValueError: Fixed user_splits [ 0. 12.  7.  8.] are removed because produce pure prebins. Provide different splits to be fixed.

Case c) In this example, the variable 'I1' is treated as nominal (categorical).

params = {"I1": {"user_splits": [[2], [5], [1], [4], [3]],
                 "user_splits_fixed": [True, True, True, True, True]}}

binning_process = BinningProcess(variable_names=variable_names,
                                 categorical_variables=['I1', 'O1', 'O2'],
                                 binning_fit_params=params)

binning_process.fit(df.drop(columns='TARGET', axis=1), df.TARGET)
binning_process.summary()

image

Binning table:

optb = binning_process.get_binned_variable("I1")
optb.binning_table.build()

image

optb.binning_table.plot(metric="event_rate")

image

@guillermo-navas-palencia
Owner

Version 0.6.1 is available with the discussed bug fixes.

Thanks for your feedback!

@skwskwskwskw
Author

Many thanks. By the way, I think you haven't yet added the plot options to remove bins that have a count of 0, right?

@guillermo-navas-palencia
Owner

Options add_special and add_missing are included. The default value is set to True. If you want to hide these bins from the plot, set add_special and/or add_missing to False.
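
Conceptually, the two flags act as filters on the rows that get drawn. The sketch below mimics that behaviour on a plain list of rows; it is not the library's implementation, `rows_to_plot` is a hypothetical helper, and the row labels "Special" and "Missing" and the counts are assumptions made for this example.

```python
def rows_to_plot(rows, add_special=True, add_missing=True):
    """Return the rows that would be shown for the given flag settings."""
    hidden = set()
    if not add_special:
        hidden.add("Special")
    if not add_missing:
        hidden.add("Missing")
    return [row for row in rows if row["bin"] not in hidden]


# Hypothetical binning table rows.
rows = [
    {"bin": "(-inf, 1.50)", "count": 120},
    {"bin": "[1.50, inf)", "count": 80},
    {"bin": "Special", "count": 0},
    {"bin": "Missing", "count": 15},
]

visible = rows_to_plot(rows, add_special=False, add_missing=False)
print([row["bin"] for row in visible])
```

With both flags left at their default of True, all four rows are kept.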

@skwskwskwskw
Author

Awesome! Many thanks.
