AnchorTabularExplainer without categorical features #3

Closed
asstergi opened this issue Feb 5, 2018 · 10 comments

Comments

@asstergi

asstergi commented Feb 5, 2018

Hi @marcotcr,

Firstly, the paper is great and I'm really looking forward to using the package.

I tried to use it on my own data, where the AnchorTabularExplainer() object has no categorical_names (i.e. no categorical features). When I call the explain_instance() method, execution reaches https://github.com/marcotcr/anchor/blob/master/anchor/anchor_tabular.py#L215 and, since there are no categorical features, the mapping dict stays empty, so the method does not work.

Am I missing something? Or, is there something I can do to overcome this?

@marcotcr
Owner

marcotcr commented Feb 5, 2018

Hello,
I'm glad you found the paper interesting.
You are not missing anything; this is a bug in the code.
The anchor method needs categorical data, so I used to have a discretizer in the __init__ method for when the model uses numerical features. To be clear: the black box model can use continuous data, but the resulting anchor will be in discretized bins, such as "If Salary > 5000, predict X".

I must have removed that at some point and forgotten to put it back in.
I'll try to add it back soon, thanks for letting me know.

@marcotcr
Owner

marcotcr commented Feb 6, 2018

In the meantime, you can discretize your data first, similar to what I do here
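
A minimal sketch of what discretizing first could look like, assuming quartile bins computed from the training data. The helper names (`quartile_edges`, `discretize_column`, `bin_names`) are hypothetical and not part of the anchor package, and plain Python is used so the snippet runs without extra dependencies.

```python
import bisect
import statistics


def quartile_edges(values):
    """Return the three quartile cut points of a numeric column."""
    return statistics.quantiles(values, n=4)


def discretize_column(values, edges):
    """Map each value to a bin index; bin i covers (edges[i-1], edges[i]]."""
    return [bisect.bisect_left(edges, v) for v in values]


def bin_names(feature, edges):
    """Human-readable bin labels in the style of anchor conditions."""
    names = ['%s <= %.2f' % (feature, edges[0])]
    for lo, hi in zip(edges, edges[1:]):
        names.append('%.2f < %s <= %.2f' % (lo, feature, hi))
    names.append('%s > %.2f' % (feature, edges[-1]))
    return names


# Hand-picked cut points for illustration; in practice they would come
# from quartile_edges(training_column).
edges = [-9.5, -5.5]
print(discretize_column([-12.0, -9.5, -7.0, 3.0], edges))  # [0, 0, 1, 2]
print(bin_names('grid', edges))
```

With this approach the classifier would be trained on the bin indices, and the bin labels would serve as the categorical names handed to the explainer.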

@asstergi
Author

Hi @marcotcr,

I discretized the data and got anchor working, thank you!

However, I'm seeing some inconsistencies in the reported coverage and precision when I try to use the anchor explanation on the original dataset (i.e. before the discretization).

Not sure if you can help just by looking at this code, but here's what I'm doing:
```python
import numpy as np

print('Anchor: %s' % (' AND '.join(exp.names())))

# Rows of the discretized test set that fall in the same bins as the explained instance
fit_anchor = np.where(np.all(X_trans_test_disc[:, exp.features()] == X_trans_test_disc[idx][exp.features()], axis=1))[0]
print('Anchor test coverage: %.4f' % (fit_anchor.shape[0] / float(X_trans_test_disc.shape[0])))
print('Anchor test precision: %.4f' % (np.mean(predict_fn(X_trans_test_disc[fit_anchor]) == predict_fn(X_trans_test_disc[idx].reshape(1, -1)))))

# Same anchor applied by hand to the original (non-discretized) data
anch = y_trans[(X_trans['this_race_last_year_result'] > 1.50) &
               (X_trans['grid'] > -9.50) &
               (X_trans['grid'] <= -5.50)]
print('Anchor test coverage (orig): %.4f' % (1.0 * anch.shape[0] / y_trans.shape[0]))
print('Anchor test precision (orig): %.4f' % (1.0 * anch.sum() / anch.shape[0]))
```

And here's the output:

Anchor: -9.50 < grid <= -5.50 AND this_race_last_year_result > 1.50

Anchor test coverage: 0.0316
Anchor test precision: 1.0000

Anchor test coverage (orig): 0.0486
Anchor test precision (orig): 0.8527

I would expect the figures to match. Any idea on this?
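
For reference, a minimal sketch of how coverage and precision of an anchor can be computed on one and the same dataset. It assumes the usual definitions: coverage is the fraction of rows that satisfy every anchor condition, and precision is the fraction of those rows for which the model's prediction equals its prediction for the explained instance. The helper names and the toy `predict` function are hypothetical. Note that the second pair of figures above is computed from the labels `y_trans` on `X_trans` rather than from model predictions on the test set, which may be one source of the mismatch.

```python
def satisfies(row, conditions):
    """True if the row meets every (feature, lower, upper) condition,
    read as lower < row[feature] <= upper."""
    return all(lo < row[f] <= hi for f, lo, hi in conditions)


def coverage_and_precision(rows, conditions, predict, instance):
    """Coverage and precision of an anchor, both measured on `rows`."""
    covered = [r for r in rows if satisfies(r, conditions)]
    coverage = len(covered) / len(rows)
    if not covered:
        return coverage, float('nan')
    # Precision compares model predictions, not ground-truth labels.
    target = predict(instance)
    precision = sum(predict(r) == target for r in covered) / len(covered)
    return coverage, precision


# Toy data and a toy model, purely for illustration.
rows = [
    {'grid': -6.0, 'this_race_last_year_result': 2.0},
    {'grid': -6.0, 'this_race_last_year_result': 1.0},
    {'grid': 0.0, 'this_race_last_year_result': 2.0},
    {'grid': -7.0, 'this_race_last_year_result': 3.0},
]
anchor = [
    ('grid', -9.5, -5.5),
    ('this_race_last_year_result', 1.5, float('inf')),
]


def predict(row):
    return int(row['this_race_last_year_result'] > 2.0)


print(coverage_and_precision(rows, anchor, predict, rows[0]))  # (0.5, 0.5)
```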

@marcotcr
Owner

If the validation and test distributions are similar, the numbers should match. I would have to see it in more detail to understand if your discretization is doing something or if there's a bug in the code. I can take a look if you can share a notebook.

The newest version I uploaded has discretization built in, so you may want to give it a try.
It may be buggy, since I didn't test it thoroughly; it may be safer to train a classifier on discretized data like you're doing.

@ajayaadhikari

ajayaadhikari commented Mar 29, 2018

Hello @marcotcr,
I am also trying to use numerical features.
You suggested discretizing the data before giving it to AnchorTabularExplainer, right?
How will the AnchorTabularExplainer know how to undo the discretization to get predictions on the perturbed samples?

@marcotcr
Owner

If you discretize the data before you give it to AnchorTabularExplainer, you have to train the model on the discretized features. If you want the black box model to use numerical features, you have to use the newest version with built-in discretization.
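
To illustrate the second option: when discretization lives inside the explainer, perturbed samples drawn in bin space have to be mapped back to raw values before they reach a model trained on numerical features. The sketch below shows one way this could work, drawing a value uniformly from inside the sampled bin. The function name, the uniform sampling, and the `lo`/`hi` bounds for the outer bins are all assumptions made for illustration, not the anchor package's actual implementation.

```python
import random


def undiscretize(bin_index, edges, lo, hi, rng=random):
    """Draw a raw value from bin `bin_index`.

    Bin i covers (edges[i-1], edges[i]]; `lo` and `hi` bound the
    open-ended first and last bins (e.g. the observed min and max).
    """
    bounds = [lo] + list(edges) + [hi]
    left, right = bounds[bin_index], bounds[bin_index + 1]
    # rng.random() lies in [0, 1), so the result lies in (left, right].
    return right - (right - left) * rng.random()


edges = [1.5, 5.5]          # cut points of a hypothetical feature
raw = undiscretize(1, edges, lo=0.0, hi=10.0)
print(raw)                  # some value between 1.5 and 5.5
```

The model is then queried on `raw`, while the anchor itself is still reported in terms of the bin.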

@eindzl

eindzl commented Jul 13, 2018

Hi there.
I found the same problem and used the following workaround, which works fine for me.
In the file anchor_tabular.py, add an else clause to the __init__ method of class AnchorTabularExplainer:

 class AnchorTabularExplainer(object):

    ... original code ...

    def __init__(self, class_names, feature_names, data=None,

        ... original code ...

        if categorical_names:
            # TODO: Check if this n_values is correct!!
            cat_names = sorted(categorical_names.keys())
            n_values = [len(categorical_names[i]) for i in cat_names]
            self.encoder = sklearn.preprocessing.OneHotEncoder(
                categorical_features=cat_names,
                n_values=n_values)
            self.encoder.fit(data)
            self.categorical_features = self.encoder.categorical_features
        else:  ## Allow for datasets without categorical names
            categorical_names = {}

        ... original code ...

This prevents the update from failing and allows your numerical variables to be discretized within the explainer.
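
The failure this avoids is presumably a dict `update` being called on `categorical_names` while it is still `None`. A stripped-down, hypothetical illustration of the pattern follows; the `TinyExplainer` class and its `discretizer_names` argument are invented for this sketch and are not the package's code.

```python
class TinyExplainer:
    def __init__(self, categorical_names=None, discretizer_names=None):
        # Normalise a missing mapping to an empty dict so that the
        # update below also works for purely numerical datasets.
        if not categorical_names:
            categorical_names = {}
        # Copy so the caller's dict is not mutated.
        self.categorical_names = dict(categorical_names)
        # Bin labels produced by discretizing the numerical features.
        self.categorical_names.update(discretizer_names or {})


# No categorical features at all: only the discretizer's bin labels remain.
explainer = TinyExplainer(None, {0: ['grid <= -9.50', 'grid > -9.50']})
print(explainer.categorical_names)
```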

@amrebaid

amrebaid commented Feb 5, 2019

> The anchor method needs categorical data, so I used to have a discretizer in the __init__ method for when the model uses numerical features. To be clear: the black box model can use continuous data, but the resulting anchor will be in discretized bins, such as "If Salary > 5000, predict X".
>
> I must have removed that at some point and forgotten to put it back in.
> I'll try to add it back soon, thanks for letting me know.

Has this been fixed in the code? Or do we still have to use the workaround?
Never mind, I figured it out. I had to fit the classifier too, not only the explainer.

Thanks,
Amr

@ykshitij

ykshitij commented Jul 3, 2019

@eindzl Thanks, I had the same problem, and it now works correctly after your update.

@seansaito

seansaito commented Oct 7, 2019

> Hi there.
> I found the same problem and used the following workaround, which works fine for me.
> In the file anchor_tabular.py add an else clause to the __init__ method of class AnchorTabularExplainer
>
>     class AnchorTabularExplainer(object):
>
>         ... original code ...
>
>         def __init__(self, class_names, feature_names, data=None,
>
>             ... original code ...
>
>             if categorical_names:
>                 # TODO: Check if this n_values is correct!!
>                 cat_names = sorted(categorical_names.keys())
>                 n_values = [len(categorical_names[i]) for i in cat_names]
>                 self.encoder = sklearn.preprocessing.OneHotEncoder(
>                     categorical_features=cat_names,
>                     n_values=n_values)
>                 self.encoder.fit(data)
>                 self.categorical_features = self.encoder.categorical_features
>             else:  ## Allow for datasets without categorical names
>                 categorical_names = {}
>
>             ... original code ...
>
> This will prevent the update to fail and allow for discretization of your numerical variables within the explainer.

Will this workaround be implemented at some point?
