Description
I have fine-tuned a BERT model on the CoLA task (see also #303), so that I can classify sentences as grammatically acceptable or not. Take these two examples:
"These tests don't work as expected" (grammatically acceptable)
"These tests doesn't work as expected" (unacceptable)
If I run the two examples through the classifier, the softmax scores from the binary classifier are 0.99395967 and 0.00011, respectively.
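For reference, the scoring step looks roughly like the sketch below. This is a minimal reconstruction assuming a `BertForSequenceClassification` checkpoint fine-tuned on CoLA; the model path is a placeholder, not the actual one used.

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# hypothetical path to the CoLA fine-tuned checkpoint
model = BertForSequenceClassification.from_pretrained("path/to/cola-finetuned-bert")
model.eval()

for sentence in ["These tests don't work as expected",
                 "These tests doesn't work as expected"]:
    input_ids = tokenizer.encode(sentence, return_tensors="pt")
    with torch.no_grad():
        logits = model(input_ids)[0]      # shape (1, 2) for the binary task
    probs = torch.softmax(logits, dim=1)  # acceptability probabilities
    print(sentence, probs)
```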
In the two cases, I compute interpretations via the notebook in the gist https://gist.github.com/davidefiocco/3e1a0ed030792230a33c726c61f6b3a5 (adapted from the SQuAD example in this repo). In particular, I create the visualization with:
```python
score_vis = viz.VisualizationDataRecord(
    attributions_sum,
    torch.max(torch.softmax(score[0][0], dim=0)),
    torch.argmax(score[0][0]),
    torch.argmax(score[0][0]),
    text,
    attributions_sum.sum(),
    all_tokens,
    delta)

print('\033[1m', 'Visualization For Score', '\033[0m')
viz.visualize_text([score_vis])
```
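For completeness, the attributions fed into the record above are computed roughly as in the sketch below, which follows the Captum SQuAD tutorial pattern; names such as `predict`, `ref_input_ids`, `token_type_ids`, and `attention_mask` come from the gist's setup and are assumptions here, not fixed Captum API.

```python
from captum.attr import LayerIntegratedGradients

# forward function returning the classification logits
def predict(input_ids, token_type_ids=None, attention_mask=None):
    return model(input_ids, token_type_ids=token_type_ids,
                 attention_mask=attention_mask)[0]

lig = LayerIntegratedGradients(predict, model.bert.embeddings)

logits = predict(input_ids, token_type_ids, attention_mask)
target = logits[0].argmax().item()        # attribute w.r.t. the predicted class

attributions, delta = lig.attribute(
    inputs=input_ids,
    baselines=ref_input_ids,              # e.g. [CLS] + [PAD]*n + [SEP] reference
    additional_forward_args=(token_type_ids, attention_mask),
    target=target,
    return_convergence_delta=True,
)

# sum over the embedding dimension and normalize, as in the SQuAD tutorial
attributions_sum = attributions.sum(dim=-1).squeeze(0)
attributions_sum = attributions_sum / torch.norm(attributions_sum)
```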
In the two cases I get
The interpretations look extremely similar despite the dramatic change in score, whereas I would expect the interpretation to change and the model to focus on the grammatical mistake (e.g., on the verb or the noun).
Am I doing something wrong in my implementation (likely!), or is LayerIntegratedGradients just not performing well on this example? Can someone suggest viable alternatives to try?

