ZeroDivisionError in merge_ebms #485

Closed
jfleh opened this issue Nov 8, 2023 · 4 comments


jfleh commented Nov 8, 2023

I am trying to merge two EBMs (classifier or regressor, it does not matter which) and I get the following error:

Traceback (most recent call last):
  File "/code/trainingmanagerapi.py", line 725, in multiple_local_training
    fitted_model = merge_ebms([fitted_model, ebm2])
  File "/usr/local/lib/python3.9/site-packages/interpret/glassbox/_ebm/_merge_ebms.py", line 719, in merge_ebms
    ) = process_terms(n_classes, ebm.bagged_scores_, ebm.bin_weights_, ebm.bag_weights_)
  File "/usr/local/lib/python3.9/site-packages/interpret/glassbox/_ebm/_utils.py", line 235, in process_terms
    score_mean = np.average(scores, weights=weights)
  File "<__array_function__ internals>", line 180, in average
  File "/usr/local/lib/python3.9/site-packages/numpy/lib/function_base.py", line 547, in average
    raise ZeroDivisionError(
ZeroDivisionError: Weights sum to zero, can't be normalized

The two models have been fitted on the exact same dataset.

paulbkoch (Collaborator) commented

Hi @jfleh -- Are you using sample weights when generating either of the models? When fitting EBMs, we sum up the sample weights for all the samples within each bin of each term, and we put that information in the ebm.bin_weights_ attribute of the model. The exception above is saying that the total of the sample weights for some term is zero (the total, not just a zero in one of the bins).

I could potentially see a model built with extremely small sample weights doing this naturally, but the conditions would have to be almost impossibly special. The more likely scenario is that there's a bug somewhere in the merge_ebms function, probably having to do with merging pairs where a spurious term is somehow created during the merge. I can look through merge_ebms and see if I can figure that out, but it would be easier to have some more information about the model first.

If the model is private and cannot be posted here, can you look at the bin_weights_ attribute of your models and see if any of the terms have all-zero bin weights? If your model is public, could you use the ebm.to_json(FILE_NAME) function to export a JSON representation of the models and post them here or email them to interpret@microsoft.com?

Documentation link:
https://interpret.ml/docs/ExplainableBoostingClassifier.html#interpret.glassbox.ExplainableBoostingClassifier.to_json
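
As a rough sketch of that check (assuming a fitted EBM bound to the name ebm; the term_names_ attribute is used here only to label the output, and the file name passed to to_json is a placeholder):

import numpy as np

# Flag any term whose bin weights sum to zero -- the condition that makes
# np.average raise "Weights sum to zero" inside merge_ebms.
for i, weights in enumerate(ebm.bin_weights_):
    if np.sum(weights) == 0:
        print(f"term {i} ({ebm.term_names_[i]}) has all-zero bin weights")

# If the model can be shared, export a JSON representation as described above.
ebm.to_json("model1.json")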


jfleh commented Nov 9, 2023

Hi @paulbkoch, thanks for the response. I do indeed see lots of zeroes in the bin_weights_. I also noticed that I was trying to combine two models that were exactly identical (as a result of being fitted on the same dataset with the same random_state). I am attaching the model. It has been created with default parameters and is fit on synthetically created data; the predictors are independent of the targets, so there is not actually anything that can be learned from this dataset. I am curious whether it is something about the model itself, or the fact that the two models are identical, that causes this problem.
model1.txt
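
For reference, a rough reconstruction of the setup described above (the sample size, feature count, and random_state value are invented; the real script may differ):

import numpy as np
from interpret.glassbox import ExplainableBoostingClassifier, merge_ebms

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))      # synthetic predictors...
y = rng.integers(0, 2, size=1000)   # ...independent of the targets

# Two models fitted on the exact same data with the same random_state,
# so they come out identical.
ebm1 = ExplainableBoostingClassifier(random_state=42).fit(X, y)
ebm2 = ExplainableBoostingClassifier(random_state=42).fit(X, y)

merged = merge_ebms([ebm1, ebm2])   # the call that raises the ZeroDivisionError above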


jfleh commented Nov 27, 2023

I am still getting the same error, now also with models that are trained on real data that should be able to pick up effects.

paulbkoch added a commit that referenced this issue on Dec 4, 2023: …n when a term in the resulting merged model has only a single non-missing bin

paulbkoch commented Dec 4, 2023

I've pushed a fix for this issue, which will be included in our next release. For details, see 0c6c985.

In the meantime, you can avoid this issue by not merging models that have features with only 1 value. Such features are entirely useless anyway, so removing them should not affect the model's performance. You can do this with:

import numpy as np
ebm.remove_terms([i for i, scores in enumerate(ebm.term_scores_) if np.sum(np.abs(scores)) == 0])

Thanks @jfleh for reporting this. It was a good bug to fix.
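
Put together, a sketch of that workaround applied to both models before merging (ebm1 and ebm2 are placeholder names for the two fitted models):

import numpy as np
from interpret.glassbox import merge_ebms

for ebm in (ebm1, ebm2):
    # Drop terms whose scores are identically zero before merging.
    dead_terms = [i for i, scores in enumerate(ebm.term_scores_)
                  if np.sum(np.abs(scores)) == 0]
    ebm.remove_terms(dead_terms)

merged = merge_ebms([ebm1, ebm2])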
