Visualization of BertForSequenceClassification is hard to understand #311

@davidefiocco

Description

I am training a BERT model on the CoLA task (see also #303), so that I can classify sentences as grammatically acceptable or not. Take these two examples:

"These tests don't work as expected" (grammatically acceptable)
"These tests doesn't work as expected" (unacceptable)

If I run the two examples through the binary classifier, the softmax scores I get are
0.99395967 and 0.00011, respectively.
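
(For reference, a minimal sketch of how such scores can be obtained, assuming a BertForSequenceClassification checkpoint fine-tuned on CoLA; the checkpoint path and variable names below are illustrative and not taken from the gist:)

import torch
from transformers import BertForSequenceClassification, BertTokenizer

model_path = "path/to/cola-finetuned-bert"  # hypothetical checkpoint location
tokenizer = BertTokenizer.from_pretrained(model_path)
model = BertForSequenceClassification.from_pretrained(model_path)
model.eval()

for sentence in ["These tests don't work as expected",
                 "These tests doesn't work as expected"]:
    input_ids = tokenizer.encode(sentence, return_tensors="pt")
    with torch.no_grad():
        logits = model(input_ids)[0]       # classification logits
    probs = torch.softmax(logits, dim=-1)  # probabilities over the two classes
    print(sentence, probs[0].tolist())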

For both cases, I compute interpretations via the notebook in the gist https://gist.github.com/davidefiocco/3e1a0ed030792230a33c726c61f6b3a5 (adapted from the SQuAD example in this repo), and in particular create the visualization with:

score_vis = viz.VisualizationDataRecord(
                        attributions_sum,                              # per-token attribution scores
                        torch.max(torch.softmax(score[0][0], dim=0)),  # predicted probability
                        torch.argmax(score[0][0]),                     # predicted class
                        torch.argmax(score[0][0]),                     # true class (set to the prediction here)
                        text,                                          # class the attribution refers to
                        attributions_sum.sum(),                        # total attribution score
                        all_tokens,                                    # tokens shown in the visualization
                        delta)                                         # convergence delta

print('\033[1m', 'Visualization For Score', '\033[0m')
viz.visualize_text([score_vis])
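
(For context on the inputs above: attributions_sum and delta come from a LayerIntegratedGradients attribution over the embedding layer, summed across the embedding dimension and normalized. A minimal sketch of that step, where model, input_ids and ref_input_ids are placeholders not defined here:)

import torch
from captum.attr import LayerIntegratedGradients

def custom_forward(inputs):
    # return the classification logits of the model
    return model(inputs)[0]

lig = LayerIntegratedGradients(custom_forward, model.bert.embeddings)
attributions, delta = lig.attribute(inputs=input_ids,
                                    baselines=ref_input_ids,
                                    return_convergence_delta=True)

# collapse the per-dimension attributions to one score per token and normalize
attributions_sum = attributions.sum(dim=-1).squeeze(0)
attributions_sum = attributions_sum / torch.norm(attributions_sum)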

In the two cases I get:

[screenshot: attribution visualization for the grammatically acceptable sentence]

[screenshot: attribution visualization for the unacceptable sentence]

The interpretations look extremely similar despite the dramatic change in score, whereas I would expect the interpretation to change and the model to focus on the grammar mistake (e.g. on the verb or the noun).

Am I doing something wrong in my implementation (likely!), or is LayerIntegratedGradients not performing great in this example? Can someone suggest viable alternatives to try out?
