I want to use the OptBinning library to create binning tables with all the metrics, but under the assumption that I already have all the bins. I don't want to optimize the binning, I just want the tables for my current bins. Although I've found a "solution", I'm not sure whether there's a bug or whether I'm missing something in the parameters. Here is my example. First I create a fake dataset:
import random
from datetime import datetime, timedelta

import pandas as pd
from optbinning import BinningProcess, OptimalBinning

# Set seed for reproducibility
random.seed(42)

# Generate fake data
data = {
    'GB': [random.choice([0, 1]) for _ in range(2000)],
    'Period': [(datetime(2021, 1, 1) + timedelta(days=random.randint(0, 731))).strftime("%m/%Y") for _ in range(2000)],
    'Age': [random.randint(18, 80) if random.random() > 0.2 else None for _ in range(2000)],
    'L6ag': [random.randint(0, 9) if random.random() > 0.2 else None for _ in range(2000)],
    'L_3M': [chr(random.randint(65, 90)) if random.random() > 0.2 else None for _ in range(2000)],
    'M36m': [random.randint(0, 1000) for _ in range(2000)],
    'Balance': [random.randint(0, 100000) for _ in range(2000)]
}

# Create DataFrame
df = pd.DataFrame(data)
df
Then I want to create a table for Age, for example, using the following bins: custom_bins = [28, 37, 63, 67]
So I use the following code:
# Define your custom bins
custom_bins = [28, 37, 63, 67]

# Define the binning object
optb = OptimalBinning(name="Age", dtype="numerical", user_splits=custom_bins)

# Fit the binning object
optb.fit(df["Age"], df["GB"])  # GB is your target variable
optb.binning_table.build()
And I get the following table, which is missing the first bin (-inf to 28):
https://i.stack.imgur.com/TBfNt.png
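For comparison, a table for these fixed bins can be computed without OptBinning at all. The following is only a minimal sketch: the helper name `fixed_bin_table` is hypothetical, and both the bin convention (left-closed, right-open) and the WoE sign (log of non-event share over event share) are assumptions chosen to resemble what OptBinning appears to report. It is a cross-check, not the library's implementation.

```python
import math
from bisect import bisect_right


def fixed_bin_table(values, target, splits):
    """Count records, events and WoE per bin for fixed split points.

    Assumed convention: bins are (-inf, s0), [s0, s1), ..., [sk, inf).
    Missing values (None or NaN) go to a separate "Missing" row.
    """
    splits = sorted(splits)
    edges = [-math.inf] + splits + [math.inf]
    labels = [f"[{lo}, {hi})" for lo, hi in zip(edges[:-1], edges[1:])]
    labels.append("Missing")
    counts = [[0, 0] for _ in labels]  # [non-events, events] per row

    for x, y in zip(values, target):
        missing = x is None or x != x  # NaN is the only value not equal to itself
        row = len(labels) - 1 if missing else bisect_right(splits, x)
        counts[row][int(y)] += 1

    total_non_event = sum(c[0] for c in counts)
    total_event = sum(c[1] for c in counts)

    table = []
    for label, (non_event, event) in zip(labels, counts):
        n = non_event + event
        if non_event and event:
            # Assumed sign convention: log(non-event share / event share)
            woe = math.log((non_event / total_non_event) / (event / total_event))
        else:
            woe = 0.0  # assumption: report 0 when WoE is undefined
        table.append({
            "Bin": label,
            "Count": n,
            "Non-event": non_event,
            "Event": event,
            "Event rate": event / n if n else 0.0,
            "WoE": woe,
        })
    return table
```

Calling `pd.DataFrame(fixed_bin_table(df["Age"], df["GB"], custom_bins))` returns one row per bin, including the one for Age below 28, plus a "Missing" row, so it shows which rows the OptBinning table should contain if no bins are merged.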
If I try the user_splits_fixed parameter to force every split to be kept, the result is even worse:
# Define your custom bins
custom_bins = [28, 37, 63, 67]
user_splits_fixed = [True, True, True, True]

# Define the binning object
optb = OptimalBinning(name="Age", dtype="numerical", user_splits=custom_bins, user_splits_fixed=user_splits_fixed)

# Fit the binning object
optb.fit(df["Age"], df["GB"])  # GB is your target variable
optb.binning_table.build()
https://i.stack.imgur.com/ZpZqD.png
Any help would be much appreciated.