
Can I limit magnitude of feature curves #122

Open
jamie-murdoch opened this issue Apr 14, 2020 · 9 comments
Labels: enhancement (New feature or request)


@jamie-murdoch

I'm fitting an EBM multiclass classifier, and am getting feature curves with values in excess of 10^10. Is there a way to fix this somehow? Ideally I could specify a maximum value, and clip the feature curves beyond that.

For more context: in my particular dataset there is an interval covering ~2% of the data where, if, say, X_1 is less than -10, the output is guaranteed to be class 0. The curve takes values around -10^10 in this region and around 10^8 everywhere else, with an intercept around 10^8.

This certainly makes sense from a prediction perspective, but for interpretation purposes, having the feature's contribution sit at ~10^8 for 98% of the data isn't ideal.

@jamie-murdoch
Author

For additional context, I just tried a binary one-vs-all EBM classifier for the affected class, and the computed tree values are all reasonable (absolute values less than 3), so this is probably a result of the "experimental" multiclass classification.

@interpret-ml
Collaborator

Hi @jamie-murdoch, thanks for raising this issue. It's a reasonable feature suggestion, and one that we've been thinking about on our side as well. It's good to know it would be useful for you, and that the scores are particularly extreme when you frame your problem as multiclass classification.

For now, here's some lightweight code to post-process ("clip") the magnitude of the graphs for each feature. All the graphs are stored as plain numpy arrays inside the attribute_set_models_ property. Manipulating these numpy arrays changes the graphs rendered by the explain_global() call, and also modifies the predictions the model will make in the future.

Here's how you would "edit the model" by clipping the scores:

import numpy as np

min_score = -5
max_score = 5
for index in range(len(ebm.attribute_set_models_)):
    ebm.attribute_set_models_[index] = np.clip(ebm.attribute_set_models_[index], min_score, max_score) 

In this example, all model graphs would be clipped to the range [-5, 5]. The only caveat is that the overall feature importance graph (shown in the "Summary" view) is pre-calculated when the model is trained, so it wouldn't be updated by this post-processing.

Let us know if you'd like some code for re-calculating the overall feature importances, or need any other help with the code!

-InterpretML Team

@jamie-murdoch
Author

Thanks for the follow up!

So, I had a 10-class problem and ended up training 10 one vs all classifiers, and aggregating them. A bit gnarly, but the 0-1 accuracy was nearly as good (54-56%), and the curve magnitudes were reasonable.
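
In case it helps anyone, the workaround looked roughly like this (a simplified, untested sketch; X, y, X_test and the class count are placeholders for my data):

import numpy as np
from interpret.glassbox import ExplainableBoostingClassifier

n_classes = 10
binary_ebms = []
for k in range(n_classes):
    # Fit one binary EBM per class: "is class k" vs. "is anything else"
    ebm_k = ExplainableBoostingClassifier()
    ebm_k.fit(X, (y == k).astype(int))
    binary_ebms.append(ebm_k)

# Aggregate: pick the class whose one-vs-all model assigns the highest
# probability to its positive ("is class k") label
probs = np.column_stack([m.predict_proba(X_test)[:, 1] for m in binary_ebms])
y_pred = probs.argmax(axis=1)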

For the benefit of future readers: I suspect using the clipping code above may require shifting the intercepts (the intercept in question was ~10^8) and/or refitting, since the model had learned that ~4% of feature values contributed about -10^10 and the rest about 10^8, so squashing everything to +/- 5 would lose that signal.

@andro536

Hi!
I am fitting an ExplainableBoostingClassifier and get some spikes in the model that I want to remove:
[screenshot: the fitted graph for var_6, showing a large spike]

However, if I try to use the method above to access and edit the model I get the following error:
AttributeError: 'ExplainableBoostingClassifier' object has no attribute 'attribute_set_models_'

Is there another way to do this?
Many thanks!

@interpret-ml
Collaborator

Hi @andro536 ,

The 0.2.0 release of interpret had a few breaking changes which included some attribute renames. The new name of the property is additive_terms_ -- just substituting that in place of attribute_set_models_ should make the code posted above work.

import numpy as np

min_score = -5
max_score = 5
for index in range(len(ebm.additive_terms_)):
    ebm.additive_terms_[index] = np.clip(ebm.additive_terms_[index], min_score, max_score) 

Note that this clips all values on all graphs to the range [-5, 5]. In your case, you may want to inspect and edit the graph for just that single feature, ebm.additive_terms_[feature_index]. Sometimes large spikes like this can also be indicators of data errors or leakage, so it might be worth investigating the datapoints where var_6 has this abnormally high prediction value.
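
If you do want to edit just that one graph, something along these lines should work (feature_index is a placeholder for the position of var_6 among the model's features):

import numpy as np

min_score, max_score = -5, 5
feature_index = 6  # placeholder: the index of var_6 among the model's features
ebm.additive_terms_[feature_index] = np.clip(
    ebm.additive_terms_[feature_index], min_score, max_score
)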

We're looking into introducing a better API in the future for model/graph editing, but manipulating the additive_terms_ array is the best way to do so right now. Hope this helps!

-InterpretML Team

@candalfigomoro

> Let us know if you'd like some code for re-calculating the overall feature importances, or need any other help with the code!
>
> -InterpretML Team

How do we re-calculate the overall feature importances? Thanks!

@interpret-ml
Collaborator

Hi @candalfigomoro,

The overall feature importances are simply calculated as a mean absolute contribution per feature on the training dataset. Our code for calculating them is just a few lines of Python here:

self.feature_importances_ = []
if isinstance(self, (DPExplainableBoostingClassifier, DPExplainableBoostingRegressor)):
    # DP method of generating feature importances can generalize to non-DP
    # if preprocessors start tracking joint distributions
    for i in range(len(self.feature_groups_)):
        mean_abs_score = np.average(
            np.abs(self.additive_terms_[i]),
            weights=self.preprocessor_.col_bin_counts_[i],
        )
        self.feature_importances_.append(mean_abs_score)
else:
    scores_gen = EBMUtils.scores_by_feature_group(
        X, X_pair, self.feature_groups_, self.additive_terms_
    )
    for set_idx, _, scores in scores_gen:
        mean_abs_score = np.mean(np.abs(scores))
        self.feature_importances_.append(mean_abs_score)

You can import the scores_by_feature_group function from the EBMUtils class in interpret.glassbox.ebm.utils, and generate the inputs X and X_pair by using the ebm.preprocessor_ and ebm.pair_preprocessor_ attributes on the fitted EBM model respectively. Feel free to reply back if you have further questions!
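
For completeness, a rough sketch of what that could look like (untested; the exact preprocessing calls and array orientation can differ between interpret versions, so treat it as a starting point rather than exact code):

import numpy as np
from interpret.glassbox.ebm.utils import EBMUtils

# X_train is the same raw training data originally passed to ebm.fit()
X_main = ebm.preprocessor_.transform(X_train).T        # binned main-effect columns, features-by-samples
X_pair = ebm.pair_preprocessor_.transform(X_train).T   # binned columns for pairwise terms (if any)

importances = []
for _, _, scores in EBMUtils.scores_by_feature_group(
    X_main, X_pair, ebm.feature_groups_, ebm.additive_terms_
):
    importances.append(np.mean(np.abs(scores)))
ebm.feature_importances_ = importances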

-InterpretML Team

@paulbkoch paulbkoch mentioned this issue Jan 22, 2023
@paulbkoch
Collaborator

In the latest 0.3.0 version, feature/term importances are calculated when explanations are generated. This means that any changes made to the model after it has been trained will be reflected in the importances, eliminating the need to recalculate them. A post-processing clipping utility function remains on our backlog, since it would be nice to get both clipping and re-centering operations into a single function.
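
Roughly, such a utility would look something like this (a sketch against the current term_scores_ / intercept_ attributes, for the regression and binary classification case; not a tested or supported API):

import numpy as np

def clip_and_recenter(ebm, min_score=-5.0, max_score=5.0):
    # Sketch only: clip each term's scores, then push the removed mean into the
    # intercept so average predictions stay roughly unchanged. Multiclass models
    # would need the per-class axis handled separately, and a bin_weights_-weighted
    # mean would be more faithful than the plain mean used here.
    for i in range(len(ebm.term_scores_)):
        clipped = np.clip(ebm.term_scores_[i], min_score, max_score)
        shift = clipped.mean()
        ebm.term_scores_[i] = clipped - shift
        ebm.intercept_ += shift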

@paulbkoch paulbkoch added the enhancement New feature or request label Jan 22, 2023
@paulbkoch
Collaborator

I've changed my perspective somewhat on this issue. Today the best solution is to do the post-processing clipping as described above; however, there is a better long-term solution that should be implemented instead of a clipping utility. The fundamental issue we have today is that our boosting is of the MART variety rather than LogitBoost. What this means is that we calculate updates using hessians, but we calculate the gain from the gradient and the count of samples within each potential leaf. If all the samples within a potential leaf are positive examples (or all negative examples), then we get into trouble, since boosting can keep pushing the scores towards +infinity or -infinity given that we only store the total number of samples and not the per-class counts. If we used LogitBoost instead of MART, we could implement a min_child_weight parameter like the ones XGBoost and LightGBM have. Setting min_child_weight to something non-zero would disallow pure leaf nodes, which should eliminate this issue and also improve the models in these scenarios.
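
For reference, this is the analogous knob in those libraries (illustrative only; the parameter values and the X_train / y_train data are placeholders):

from xgboost import XGBClassifier

# A non-zero min_child_weight disallows leaves whose total hessian weight is too
# small, which is what prevents pure leaves from pushing scores toward +/- infinity.
clf = XGBClassifier(min_child_weight=1.0, max_depth=3, n_estimators=100)
clf.fit(X_train, y_train)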

For multiclass, there's an additional requirement that the trees are built per-class instead of jointly. If there's a minority class, then we don't want to disallow growth at the tail ends of each feature due to the minority class potentially not having any samples in those regions. Both XGBoost and LightGBM build their trees per-class after calculating the gradients and hessians, and I suspect this is the reason they do it this way.
