
Error: Fixed user_splits are removed because produce pure prebins #285

Closed
ben-herbst opened this issue Nov 23, 2023 · 2 comments
Labels
question Further information is requested


@ben-herbst

I need to fix the bins for certain categorical values. Your example in the tutorial is the following:

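import numpy as np
from optbinning import OptimalBinning

# variable_cat, x_cat and y_cat are defined earlier in the tutorial
user_splits = np.array([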
               ['Businessman'],
               ['Working'],
               ['Commercial associate'],
               ['Pensioner', 'Maternity leave'],
               ['State servant'],
               ['Unemployed', 'Student']], dtype=object)
optb = OptimalBinning(name=variable_cat, dtype="categorical", solver="cp",
                      user_splits=user_splits,
                      user_splits_fixed=[False, True, True, True, True, True])

optb.fit(x_cat, y_cat)

Then

optb.splits

gives

[['Businessman', 'Pensioner', 'Maternity leave'],
 ['State servant'],
 ['Commercial associate'],
 ['Working'],
 ['Unemployed', 'Student']]

This is not what I need: the bins must remain exactly as specified. If I instead fix all of the splits:

optb = OptimalBinning(name=variable_cat, dtype="categorical", solver="cp",
                      user_splits=user_splits,
                      user_splits_fixed=[True, True, True, True, True, True])

optb.fit(x_cat, y_cat)

I get the error:

ValueError: Fixed user_splits [list(['Businessman'])] are removed because produce pure prebins. Provide different splits to be fixed.

I get the same error if I try

user_splits = np.array([
               ['Businessman'],
               ['Working'],
               ['Commercial associate'],
               ['Pensioner', 'Maternity leave'],
               ['State servant'],
               ['Unemployed'], 
               ['Student']], dtype=object)
optb = OptimalBinning(name=variable_cat, dtype="categorical", solver="cp",
                      user_splits=user_splits,
                      user_splits_fixed=[False, True, True, True, True, True, True])

optb.fit(x_cat, y_cat)

However, the following, seemingly similar, change does not raise the error:

user_splits = np.array([
               ['Businessman'],
               ['Working'],
               ['Commercial associate'],
               ['Pensioner'], 
               ['Maternity leave'],
               ['State servant'],
               ['Unemployed', 'Student']], dtype=object)
optb = OptimalBinning(name=variable_cat, dtype="categorical", solver="cp",
                      user_splits=user_splits,
                      user_splits_fixed=[False, True, True, True, True, True, True])

optb.fit(x_cat, y_cat)
@guillermo-navas-palencia
Owner

Hi @ben-herbst.

The reason is simple: both "Businessman" and "Student" form pure bins; in this case, their average target is zero. A pure bin contains a single class, so its weight of evidence (WoE) is infinite. In the latter example, the first bin ["Businessman"] is not fixed, and "Student" is merged with "Unemployed", so no fixed split is pure.
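The explanation above can be checked before fitting. Below is a minimal sketch, assuming a binary 0/1 target and a categorical feature like `x_cat`/`y_cat`: it counts events and non-events in each user split and reports the splits that would be pure. `find_pure_splits` and the toy data are hypothetical and not part of optbinning, whose own pre-binning check may differ in details (for example, in how it treats splits with no records).

```python
import numpy as np


def find_pure_splits(x, y, user_splits):
    """Return indices of user splits that would form pure bins.

    Hypothetical helper, not part of optbinning. Assumes a binary target
    y in {0, 1} and a categorical x; each user split is a list of categories.
    """
    x = np.asarray(x, dtype=object)
    y = np.asarray(y)
    pure = []
    for i, split in enumerate(user_splits):
        categories = set(split)
        # Boolean mask selecting the records that fall into this split.
        mask = np.array([value in categories for value in x], dtype=bool)
        n_event = int(y[mask].sum())
        n_nonevent = int(mask.sum()) - n_event
        # Pure bin: every record has the same class, so its WoE is infinite.
        # Splits with no records at all are also flagged here.
        if n_event == 0 or n_nonevent == 0:
            pure.append(i)
    return pure


# Hypothetical toy data, not the tutorial dataset.
x_toy = ["Businessman", "Working", "Working", "Student", "Unemployed", "Unemployed"]
y_toy = [0, 1, 0, 0, 1, 0]

print(find_pure_splits(x_toy, y_toy, [["Businessman"], ["Working"], ["Unemployed"], ["Student"]]))  # [0, 3]
print(find_pure_splits(x_toy, y_toy, [["Businessman", "Working"], ["Unemployed", "Student"]]))  # []
```

Any split index it returns should either be left unfixed in `user_splits_fixed` or merged with another category, as in the last example where "Student" is grouped with "Unemployed".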

guillermo-navas-palencia added the question (Further information is requested) label on Dec 1, 2023
@ben-herbst
Author

ben-herbst commented Dec 4, 2023 via email
