Create tables using OptBinning with custom bins #245

Closed
tomasleon2 opened this issue Jun 2, 2023 · 1 comment
Labels
question Further information is requested

Comments

@tomasleon2

I want to use the OptBinning library to create binning tables with all the metrics, assuming I already have the bins. I don't want to optimize the binning; I just want the tables for my current bins. Although I've found a "solution", I'm not sure whether there is a bug or something I'm missing in the parameters. Here is my example. First, I create a fake dataset:

```python
import random
from datetime import datetime, timedelta

import pandas as pd
from optbinning import BinningProcess, OptimalBinning

# Set seed for reproducibility
random.seed(42)

# Generate fake data
data = {
    'GB': [random.choice([0, 1]) for _ in range(2000)],
    'Period': [(datetime(2021, 1, 1) + timedelta(days=random.randint(0, 731))).strftime("%m/%Y") for _ in range(2000)],
    'Age': [random.randint(18, 80) if random.random() > 0.2 else None for _ in range(2000)],
    'L6ag': [random.randint(0, 9) if random.random() > 0.2 else None for _ in range(2000)],
    'L_3M': [chr(random.randint(65, 90)) if random.random() > 0.2 else None for _ in range(2000)],
    'M36m': [random.randint(0, 1000) for _ in range(2000)],
    'Balance': [random.randint(0, 100000) for _ in range(2000)]
}

# Create DataFrame
df = pd.DataFrame(data)
df
```

Next, I want to create a table for `Age` using my own bins, for example `custom_bins = [28, 37, 63, 67]`. This is the code I use:

```python
# Define your custom bins
custom_bins = [28, 37, 63, 67]

# Define the binning object
optb = OptimalBinning(name="Age", dtype="numerical", user_splits=custom_bins)

# Fit the binning object
optb.fit(df["Age"], df["GB"])  # GB is your target variable

optb.binning_table.build()
```

This produces the following table, which is missing the first bin (-inf, 28):
https://i.stack.imgur.com/TBfNt.png
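For comparison, four split points define five intervals, so the full table should contain a (-inf, 28) row. The pandas sketch below builds such a fixed-bin table directly, without any optimization. It is a hypothetical cross-check, not OptBinning's API: `fixed_bin_table` is an illustrative name, it assumes OptBinning-style left-closed `[a, b)` intervals and the WoE convention ln(non-event share / event share), and it simply drops missing values instead of reporting a separate Missing row.

```python
import numpy as np
import pandas as pd


def fixed_bin_table(x: pd.Series, y: pd.Series, splits: list) -> pd.DataFrame:
    """Hypothetical helper: binning table for fixed splits, no optimization."""
    # Four splits give five intervals: [-inf, 28), [28, 37), ..., [67, inf)
    edges = [-np.inf, *splits, np.inf]
    bins = pd.cut(x, bins=edges, right=False)  # left-closed intervals
    # Rows with a missing x get no interval and are dropped by groupby
    grouped = y.groupby(bins, observed=False)
    table = pd.DataFrame({"Count": grouped.size(), "Event": grouped.sum()})
    table["Non-event"] = table["Count"] - table["Event"]
    table["Event rate"] = table["Event"] / table["Count"]
    # Assumed convention: WoE = ln(non-event share / event share)
    event_share = table["Event"] / table["Event"].sum()
    nonevent_share = table["Non-event"] / table["Non-event"].sum()
    table["WoE"] = np.log(nonevent_share / event_share)
    table["IV"] = (nonevent_share - event_share) * table["WoE"]
    return table


# Tiny illustrative sample (not the data from the question)
age = pd.Series([20, 22, 25, 30, 35, 40, 45, 64, 65, 70, 75, None])
gb = pd.Series([1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1])
table = fixed_bin_table(age, gb, [28, 37, 63, 67])
print(table)
```

With this sample every interval, including the first one, appears as a row; the first bin has the highest event rate and therefore a negative WoE under the assumed convention.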

If I use the `user_splits_fixed` parameter to "force" every split to be kept, the result is even worse:

```python
# Define your custom bins
custom_bins = [28, 37, 63, 67]
user_splits_fixed = [True, True, True, True]

# Define the binning object
optb = OptimalBinning(name="Age", dtype="numerical", user_splits=custom_bins, user_splits_fixed=user_splits_fixed)

# Fit the binning object
optb.fit(df["Age"], df["GB"])  # GB is your target variable

optb.binning_table.build()
```

https://i.stack.imgur.com/ZpZqD.png

Any help would be greatly appreciated.

@tomasleon2
Author

Got the answer: `monotonic_trend` should be set to `None`, so that no further optimization is applied. OptBinning then just builds the table from the provided bins :)

@guillermo-navas-palencia guillermo-navas-palencia added the question Further information is requested label Jun 2, 2023