AnchorTabularExplainer without categorical features #3

asstergi · 2018-02-05T19:03:04Z

Firstly, the paper is great and I'm really looking forward to using the package.

I tried it on my own data, where the AnchorTabularExplainer() object has no categorical_names (i.e. no categorical features). When explain_instance() is called, execution reaches https://github.com/marcotcr/anchor/blob/master/anchor/anchor_tabular.py#L215 and, since there are no categorical features, the mapping dict remains empty, so the method does not work.

Am I missing something? Or, is there something I can do to overcome this?


marcotcr · 2018-02-05T22:56:46Z

Hello,
I'm glad you found the paper interesting.
You are not missing anything; this is a bug in the code.
The anchor method needs categorical data, so I used to have a discretizer in the __init__ method for when the model uses numerical features. To be clear: the black box model can use continuous data, but the resulting anchor will be in discretized bins, such as "If Salary > 5000, predict X".

I must have removed that at some point and forgotten to put it back in.
I'll try to add it back soon, thanks for letting me know.
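
To make the idea of discretized bins concrete, here is a minimal sketch of quartile binning for a single numeric feature. It is not the library's discretizer: the function names (`fit_quartile_edges`, `discretize`, `bin_label`) are hypothetical, and quartiles are only one assumed choice of cut points.

```python
from bisect import bisect_left
from statistics import quantiles


def fit_quartile_edges(column):
    """Return the three quartile cut points of one numeric column."""
    return quantiles(column, n=4)


def discretize(value, edges):
    """Map a value to a bin index.

    Bin 0 is value <= edges[0]; bin i is edges[i-1] < value <= edges[i];
    the last bin (index len(edges)) is value > edges[-1].
    """
    return bisect_left(edges, value)


def bin_label(name, bin_index, edges):
    """Build a human-readable condition for a bin, e.g. 'Salary > 5000.00'."""
    if bin_index == 0:
        return "%s <= %.2f" % (name, edges[0])
    if bin_index == len(edges):
        return "%s > %.2f" % (name, edges[-1])
    return "%.2f < %s <= %.2f" % (edges[bin_index - 1], name, edges[bin_index])


if __name__ == "__main__":
    salaries = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000]
    edges = fit_quartile_edges(salaries)
    for s in (1500, 4800, 7900):
        print(s, "->", bin_label("Salary", discretize(s, edges), edges))
```

With binning like this, the black box model can still be trained on the raw values, while anchor conditions are expressed over the bin indices.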

marcotcr · 2018-02-06T21:23:49Z

In the meantime, you can discretize your data first, similar to what I do here

asstergi · 2018-02-22T07:55:10Z

Hi @marcotcr,

I discretized the data and got anchor working, thank you!

However, I'm seeing some inconsistencies in the reported coverage and precision when I try to use the anchor explanation on the original dataset (i.e. before the discretization).

Not sure if you can help just by looking at this code, but here's what I'm doing:

    print('Anchor: %s' % (' AND '.join(exp.names())))

    fit_anchor = np.where(np.all(X_trans_test_disc[:, exp.features()] == X_trans_test_disc[idx][exp.features()], axis=1))[0]
    print('Anchor test coverage: %.4f' % (fit_anchor.shape[0] / float(X_trans_test_disc.shape[0])))
    print('Anchor test precision: %.4f' % (np.mean(predict_fn(X_trans_test_disc[fit_anchor]) == predict_fn(X_trans_test_disc[idx].reshape(1, -1)))))

    anch = y_trans[(X_trans['this_race_last_year_result'] > 1.50) &
                 (X_trans['grid'] > -9.50) &
                 (X_trans['grid'] <= -5.50)]
    print('Anchor test coverage (orig): %.4f' % (1.0*anch.shape[0]/y_trans.shape[0]))
    print('Anchor test precision (orig): %.4f' % (1.0*anch.sum()/anch.shape[0]))

And here's the output:

Anchor: -9.50 < grid <= -5.50 AND this_race_last_year_result > 1.50

Anchor test coverage: 0.0316
Anchor test precision: 1.0000

Anchor test coverage (orig): 0.0486
Anchor test precision (orig): 0.8527

I would expect the figures to match. Any idea on this?

marcotcr · 2018-02-27T00:20:57Z

If the validation and test distributions are similar, the numbers should match. I would have to look at it in more detail to tell whether your discretization is causing the difference or there's a bug in the code. I can take a look if you can share a notebook.

The newest version I uploaded has discretization built in; you may want to give it a try.
It may be buggy, since I didn't test it thoroughly, so it may be safer to train a classifier on discretized data like you're doing.
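
For anyone comparing such figures, the sketch below spells out one way to compute coverage and precision so that both are measured on the same rows and against the model's predictions. The helper name `anchor_coverage_precision` and the list-of-lists data layout are assumptions for illustration, not part of the package. One thing possibly worth checking in the snippet above: the "(orig)" figures average `y_trans` over rows of `X_trans`; if `y_trans` holds ground-truth labels, or `X_trans` is not the test split, those figures measure something different from the first pair and need not match.

```python
def anchor_coverage_precision(rows, predict_fn, instance, anchor_features):
    """Compute (coverage, precision) of an anchor on a dataset.

    rows            : list of discretized feature rows (lists of bin indices)
    predict_fn      : maps a list of rows to a list of predicted labels
    instance        : the explained row
    anchor_features : indices of the features that make up the anchor
    """
    # Rows that agree with the explained instance on every anchor feature.
    matching = [
        r for r in rows
        if all(r[f] == instance[f] for f in anchor_features)
    ]
    coverage = len(matching) / len(rows)
    if not matching:
        return coverage, float("nan")

    # Precision compares model predictions, not ground-truth labels.
    target = predict_fn([instance])[0]
    predictions = predict_fn(matching)
    precision = sum(p == target for p in predictions) / len(matching)
    return coverage, precision


if __name__ == "__main__":
    data = [[0, 1], [0, 0], [1, 1], [0, 1]]
    model = lambda rs: [int(r[1] == 1) for r in rs]  # toy black box
    print(anchor_coverage_precision(data, model, [0, 1], [0]))
```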

ajayaadhikari · 2018-03-29T14:26:49Z

Hello @marcotcr,
I am also trying to use numerical features.
You suggested discretizing the data before giving it to AnchorTabularExplainer, right?
How will AnchorTabularExplainer know how to invert the discretization to get predictions on the perturbed samples?

marcotcr · 2018-03-29T18:46:50Z

If you discretize the data before you give it to AnchorTabularExplainer, you would have to learn the model on discretized features. If you want the black box model to use numerical features, you have to use the newest version with built in discretizing.
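
The second option implies some mapping from bins back to numeric values before the model is called. The following is a rough sketch of that idea only, under the assumption that a representative value can be drawn uniformly from each bin; it does not claim to reproduce what the built-in discretizer does. `make_discretized_predict_fn` and all of its parameters are hypothetical names.

```python
import random


def make_discretized_predict_fn(predict_fn, edges_per_feature, lower, upper, seed=0):
    """Wrap a numeric-feature model so it accepts rows of bin indices.

    edges_per_feature : per-feature sorted cut points
    lower, upper      : per-feature bounds used for the open-ended first/last bins
    """
    rng = random.Random(seed)

    def undiscretize_row(bins):
        values = []
        for f, b in enumerate(bins):
            edges = edges_per_feature[f]
            lo = lower[f] if b == 0 else edges[b - 1]
            hi = upper[f] if b == len(edges) else edges[b]
            # Draw a numeric value inside the bin (endpoints are possible
            # in principle, which is acceptable for a sketch).
            values.append(rng.uniform(lo, hi))
        return values

    def wrapped(binned_rows):
        return predict_fn([undiscretize_row(r) for r in binned_rows])

    return wrapped


if __name__ == "__main__":
    black_box = lambda rows: [int(r[0] > 20) for r in rows]  # toy numeric model
    predict_on_bins = make_discretized_predict_fn(
        black_box, [[10.0, 20.0, 30.0]], lower=[0.0], upper=[40.0])
    print(predict_on_bins([[0], [3]]))
```

The explainer would then perturb bin indices, while the model only ever sees numeric inputs.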

eindzl · 2018-07-13T07:57:54Z

Hi there.
I found the same problem and used the following workaround, which works fine for me.
In the file anchor_tabular.py add an else clause to the __init__ method of class AnchorTabularExplainer

 class AnchorTabularExplainer(object):

    ... original code ...

    def __init__(self, class_names, feature_names, data=None,

        ... original code ...

        if categorical_names:
            # TODO: Check if this n_values is correct!!
            cat_names = sorted(categorical_names.keys())
            n_values = [len(categorical_names[i]) for i in cat_names]
            self.encoder = sklearn.preprocessing.OneHotEncoder(
                categorical_features=cat_names,
                n_values=n_values)
            self.encoder.fit(data)
            self.categorical_features = self.encoder.categorical_features
        else:  ## Allow for datasets without categorical names
            categorical_names = {}

        ... original code ...

This prevents the later update of categorical_names from failing and allows your numerical variables to be discretized within the explainer.
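
The same guard can be shown in isolation. The class below is a hypothetical stand-in, not the library's AnchorTabularExplainer; it only illustrates why defaulting a `None` argument to an empty dict lets a later `update` succeed.

```python
class ExplainerSketch(object):
    """Hypothetical stand-in used only to illustrate the None guard."""

    def __init__(self, feature_names, categorical_names=None):
        self.feature_names = list(feature_names)
        # None (or an empty mapping) becomes {}; copying avoids mutating
        # the caller's dict when names are added later.
        self.categorical_names = dict(categorical_names or {})
        self.categorical_features = sorted(self.categorical_names.keys())

    def add_discretized_names(self, names_by_feature):
        # Without the guard above, this would raise AttributeError
        # when categorical_names was passed as None.
        self.categorical_names.update(names_by_feature)
        self.categorical_features = sorted(self.categorical_names.keys())


if __name__ == "__main__":
    explainer = ExplainerSketch(["grid", "age"])
    explainer.add_discretized_names({0: ["grid <= -5.50", "grid > -5.50"]})
    print(explainer.categorical_features)
```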

amrebaid · 2019-02-05T23:12:44Z

> The anchor method needs categorical data, so I used to have a discretizer in the __init__ method for when the model uses numerical features. To be clear: the black box model can use continuous data, but the resulting anchor will be in discretized bins, such as "If Salary > 5000, predict X".
>
> I must have removed that at some point and forgotten to put it back in.
> I'll try to add it back soon, thanks for letting me know.

~~Has this been fixed in the code? Or we still have to do the workaround?~~
Never mind, I figured it out. I had to fit the classifier too, not only the explainer.

Thanks,
Amr

ykshitij · 2019-07-03T11:53:37Z

@eindzl Thanks, I had the same problem, and it works correctly after your update.

seansaito · 2019-10-07T06:43:06Z

> Hi there.
> I found the same problem and used the following workaround, which works fine for me.
> In the file anchor_tabular.py add an else clause to the __init__ method of class AnchorTabularExplainer
>
>  class AnchorTabularExplainer(object):
>
>     ... original code ...
>
>     def __init__(self, class_names, feature_names, data=None,
>
>         ... original code ...
>
>         if categorical_names:
>             # TODO: Check if this n_values is correct!!
>             cat_names = sorted(categorical_names.keys())
>             n_values = [len(categorical_names[i]) for i in cat_names]
>             self.encoder = sklearn.preprocessing.OneHotEncoder(
>                 categorical_features=cat_names,
>                 n_values=n_values)
>             self.encoder.fit(data)
>             self.categorical_features = self.encoder.categorical_features
>         else:  ## Allow for datasets without categorical names
>             categorical_names = {}
>
>         ... original code ...
>
> This prevents the later update of categorical_names from failing and allows your numerical variables to be discretized within the explainer.

Will this workaround be implemented at some point?

seansaito mentioned this issue Oct 7, 2019

Add check for None categorical_names #33

Merged

marcotcr closed this as completed Dec 21, 2019
