Quick questions #40
Comments
Hi,
Let me know if you have more questions. Best, |
Hi, I am not sure if it has been implemented, but perhaps one could create rules for the binning (i.e., an upper limit column). |
Hi, could you provide an example to clarify? |
Hi, for example on page: http://gnpalencia.org/optbinning/tutorials/tutorial_scorecard_binary_target.html; say I have another new dataset, how should I score it with the points? |
Also, on the above: if I want to run the BinningProcess, how can I set the splits selectively for just one variable? It seems the example only applies to a single variable. Many thanks for the prompt response. |
you can score it with the method |
The BinningProcess includes the parameter binning_fit_params, a dictionary keyed by variable name:
binning_fit_params = {
    "my_variable": {"user_splits": [1.2, 1.8, 2.3], "user_splits_fixed": [False, True, True]}
}
See example: http://gnpalencia.org/optbinning/tutorials/tutorial_binning_process_FICO_xAI.html |
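A minimal sketch of how such a dictionary could be assembled for several variables. The helper build_fit_params is hypothetical (it is not part of optbinning); it only pairs each split with its fixed flag so the two lists cannot drift out of sync. As far as I understand, a True flag means the split must be kept, while a False flag leaves it as a candidate.

```python
def build_fit_params(specs):
    """Build a binning_fit_params-style dict.

    Hypothetical helper, not part of optbinning.
    specs maps a variable name to a list of (split, fixed) pairs.
    """
    params = {}
    for name, pairs in specs.items():
        params[name] = {
            # Candidate split points for this variable.
            "user_splits": [float(split) for split, _ in pairs],
            # One flag per split, in the same order as user_splits.
            "user_splits_fixed": [bool(fixed) for _, fixed in pairs],
        }
    return params


binning_fit_params = build_fit_params({
    "my_variable": [(1.2, False), (1.8, True), (2.3, True)],
})
print(binning_fit_params)
```

The resulting dictionary has the same shape as the one above and could be passed as binning_fit_params; variables that are not listed keep their default settings.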
Nice.... Thanks =) |
Thanks, but I think the above is not working (so far it only works with float without missing values). I attach the data here, together with the code I used and the errors.
(a) For a variable (dtype = float) with missing values:
optb.fit(ch.drop(columns='TARGET', axis=1), ch.TARGET)
(b) For an integer variable (dtype = int) without missing values:
optb.fit(ch.drop(columns='TARGET', axis=1), ch.TARGET)
(c) For an object variable (dtype = object):
optb.fit(ch.drop(columns='TARGET', axis=1), ch.TARGET)
Attachment of sample data: |
Ok, I will look into it. Note that adding np.nan in user_splits is not correct, np.nan is not a split or bin. A bin with nan will be automatically added if the datasets include missing values. |
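To illustrate the point about np.nan, here is a small standard-library sketch of the idea, not optbinning's actual implementation: split points only partition the non-missing values, and missing values are routed to a separate bin. The function assign_bins and the "Missing" label are assumptions made for this example, as is the convention that a value equal to a split falls into the upper bin.

```python
import bisect
import math


def assign_bins(values, splits):
    """Return one bin label per value.

    Illustrative only. Non-missing values get an integer bin index
    (0 = below the first split); missing values get the label "Missing".
    """
    ordered = sorted(splits)
    labels = []
    for v in values:
        if v is None or math.isnan(v):
            # Missing values never take part in the split search.
            labels.append("Missing")
        else:
            # bisect_right puts a value equal to a split in the upper bin.
            labels.append(bisect.bisect_right(ordered, v))
    return labels


values = [0.5, 1.2, float("nan"), 2.0, 3.1, None]
print(assign_bins(values, [1.2, 1.8, 2.3]))
# prints [0, 1, 'Missing', 2, 3, 'Missing']
```

Note that the list of splits contains only real numbers; the missing bin appears without being requested.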
Yes, you're right. Even after changing that, I can only run case (b), int without missing values. The rest still hit errors. Thanks. |
Hi,
Case a)
from optbinning import OptimalBinning
x = df["I1"]
y = df["TARGET"]
optb = OptimalBinning(user_splits=[1., 2., 3., 4., 5],
                      user_splits_fixed=[True, True, True, True, True])
optb.fit(x, y)
optb.binning_table.build()
This will return no bins because the "auto" monotonicity constraint is activated. Try:
optb = OptimalBinning(user_splits=[1., 2., 3., 4., 5],
                      user_splits_fixed=[True, True, True, True, True],
                      monotonic_trend=None)
optb.fit(x, y)
optb.binning_table.build()
Case b)
optb = OptimalBinning(user_splits=[0., 1., 2., 4., 3., 5., 6., 9., 12., 7., 8.],
                      user_splits_fixed=[True, True, True, True, True, True,
                                         True, True, True, True, True])
This will return the following exception:
This is due to the impossibility of computing the WoE and Information Value metrics when the number of event or non-event records per bin is 0.
Case c) First, variable "I1" does not contain strings but numbers, therefore that error is justified. However, I have encountered a few errors when using The problem with Thanks, I will keep you posted. |
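The zero-count problem in case b) can be reproduced with plain arithmetic. The sketch below assumes the common definition WoE = ln(share of non-events in the bin / share of events in the bin) and IV = sum of (share of non-events - share of events) * WoE; the function woe_table and the counts are made up for illustration and are not taken from the attached data.

```python
import math


def woe_table(counts):
    """Compute per-bin WoE and the total IV.

    counts is a list of (non_events, events) tuples, one per bin.
    Illustrative only; not optbinning's implementation.
    """
    total_ne = sum(ne for ne, _ in counts)
    total_e = sum(e for _, e in counts)
    woes, iv = [], 0.0
    for ne, e in counts:
        if ne == 0 or e == 0:
            # ln(0) or a division by zero would be needed here.
            raise ValueError("WoE undefined: bin with zero events or non-events")
        p_ne = ne / total_ne  # share of all non-events in this bin
        p_e = e / total_e     # share of all events in this bin
        woe = math.log(p_ne / p_e)
        woes.append(woe)
        iv += (p_ne - p_e) * woe
    return woes, iv


woes, iv = woe_table([(80, 20), (60, 40)])
print(woes, iv)

try:
    woe_table([(10, 0), (5, 5)])  # first bin has no events
except ValueError as err:
    print(err)
```

Fixing many splits on a small sample makes such empty bins more likely, which is why merging or unfixing some of the splits usually removes the error.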
Hi, The current version in the master branch includes several bug fixes, and now it should work as expected.
Case a)
params = {"I1": {"user_splits": [1., 2., 3., 4., 5],
                 "user_splits_fixed": [True, True, True, True, True]}}
binning_process = BinningProcess(variable_names=variable_names,
                                 categorical_variables=['O1', 'O2'],
                                 binning_fit_params=params)
binning_process.fit(df.drop(columns='TARGET', axis=1), df.TARGET)
binning_process.summary()
Case b)
params = {"I1": {"user_splits": [0., 1., 2., 4., 3., 5., 6., 9., 12., 7., 8.],
                 "user_splits_fixed": [True, True, True, True, True, True, True,
                                       True, True, True, True]}}
binning_process = BinningProcess(variable_names=variable_names,
                                 categorical_variables=['O1', 'O2'],
                                 binning_fit_params=params)
binning_process.fit(df.drop(columns='TARGET', axis=1), df.TARGET)
As above, this will return the following exception:
Case c) In this example, the variable 'I1' is treated as nominal (categorical).
params = {"I1": {"user_splits": [[2], [5], [1], [4], [3]],
                 "user_splits_fixed": [True, True, True, True, True]}}
binning_process = BinningProcess(variable_names=variable_names,
                                 categorical_variables=['I1', 'O1', 'O2'],
                                 binning_fit_params=params)
binning_process.fit(df.drop(columns='TARGET', axis=1), df.TARGET)
binning_process.summary()
Binning table:
optb = binning_process.get_binned_variable("I1")
optb.binning_table.build()
optb.binning_table.plot(metric="event_rate") |
Version 0.6.1 is available with the discussed bug fixes. Thanks for your feedback! |
Many thanks. By the way, I think you haven't yet added the plot option to remove bins that have a count of 0, right? |
Options |
Awesome! Many thanks. |
Nice package I came across. By the way, may I know:
Thanks.