
adding sparse support to shap linear explainer #645

Merged · 1 commit · Jun 18, 2019

Conversation

imatiach-msft (Collaborator)

similar to kernel explainer, adding scipy sparse support to linear explainer
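As a rough sketch of what the "independent" linear explainer computes for sparse input: each attribution is the coefficient times the deviation of the feature from its background mean, phi_ij = coef_j * (x_ij - mean_j). The function name and the densify-then-broadcast approach below are illustrative assumptions, not the shap library's actual implementation:

```python
import numpy as np
import scipy.sparse as sp

def linear_shap_values(coef, background_mean, X):
    """Hypothetical sketch: phi_ij = coef_j * (x_ij - mean_j) for a linear
    model with independent features; accepts a scipy sparse X."""
    if sp.issparse(X):
        X = np.asarray(X.todense())
    # broadcasts coef and background_mean across all rows of X
    return X * coef - background_mean * coef

coef = np.array([2.0, -1.0, 0.5])
mean = np.array([0.1, 0.0, 0.2])
X = sp.csr_matrix(np.array([[1.0, 0.0, 0.0],
                            [0.0, 3.0, 1.0]]))
phi = linear_shap_values(coef, mean, X)
```

A useful sanity check on this formulation: the attributions for each row sum to the model output minus the base value, i.e. `phi.sum(axis=1) == X @ coef - coef @ mean`.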

@@ -35,7 +36,8 @@ class LinearExplainer(Explainer):
     input is correlated with another input, then both get some credit for the model's behavior. The
     independent option stays "true to the model" meaning it will only give credit to features that are
     actually used by the model, while the correlation option stays "true to the data" in the sense that
     it only considers how the model would behave when respecting the correlations in the input data.
+    For sparse case only independent option is supported.
     """

     def __init__(self, model, data, nsamples=1000, feature_dependence=None):
imatiach-msft (Collaborator, Author)
One thing I am wondering about is the binary classification scenario. I believe we should be consistent with the other explainers and output a list of shap values, which the code is currently not doing (for binary classification, I believe we would need to take -coef for the negative-class case). What I am not sure about is how to determine whether this is a binary classification model or a multiclass model with a single class: the two would seem to have the same structure but should output different shap values:

        # sklearn style model
        elif hasattr(model, "coef_") and hasattr(model, "intercept_"):
            # work around for multi-class with a single class
            if len(model.coef_.shape) > 1 and model.coef_.shape[0] == 1:
                self.coef = model.coef_[0]
                self.intercept = model.intercept_[0]
            else:
                self.coef = model.coef_
                self.intercept = model.intercept_
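To see why the two cases are hard to tell apart, consider that a binary sklearn estimator exposes `coef_` of shape `(1, n_features)` and `intercept_` of shape `(1,)`, which is exactly the shape the workaround above collapses. A minimal sketch with a mock object standing in for the sklearn model (the class name is made up for illustration):

```python
import numpy as np

class MockBinaryModel:
    # Same shapes a binary LogisticRegression exposes:
    # coef_ is (1, n_features), intercept_ is (1,)
    coef_ = np.array([[0.5, -2.0, 1.0]])
    intercept_ = np.array([0.25])

model = MockBinaryModel()
# The shape check from the snippet above cannot distinguish
# "binary model" from "multiclass model with one class".
if len(model.coef_.shape) > 1 and model.coef_.shape[0] == 1:
    coef = model.coef_[0]          # collapse (1, n) -> (n,)
    intercept = model.intercept_[0]
else:
    coef = model.coef_
    intercept = model.intercept_
```

After the collapse, `coef` is a flat `(n_features,)` vector, which is what the single-output explainer path expects.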

Collaborator

I know this is a future direction we need to sort out. But it seems like for consistency we could eventually just always return a list for all multi-class outputs.

slundberg (Collaborator)

Thanks! At first I was thinking that we would also want to have a sparse output (instead of dense), but since the mean offset is usually non-zero this is not actually that helpful, so dense seems best (as you have done).
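The point about the mean offset can be seen in a small sketch: assuming the attribution formula phi_j = coef_j * (x_j - mean_j) for the independent option (an assumption about the computation, not the library's exact code), a mostly-zero sparse row still yields non-zero attributions wherever the background mean is non-zero, so a sparse output format would save nothing:

```python
import numpy as np
import scipy.sparse as sp

coef = np.array([1.0, 2.0, 3.0])
mean = np.array([0.5, 0.5, 0.5])      # background means are rarely all zero
x = sp.csr_matrix([[0.0, 0.0, 4.0]])  # mostly-zero sparse row

# Even the zero entries of x contribute -coef_j * mean_j,
# so every attribution here is non-zero and phi is fully dense.
phi = coef * (np.asarray(x.todense())[0] - mean)
```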

I am going to go ahead and merge this, with the idea of getting multi-output consistency as a later issue.
