Description
I have fine-tuned a BERT model on the CoLA task (see also #303), so that I can classify sentences as grammatically acceptable or not. Take these two examples:
"These tests don't work as expected" (grammatically acceptable)
"These tests doesn't work as expected" (unacceptable)
If I run the two examples through the classifier, the softmax scores from the binary classifier are 0.99395967 and 0.00011, respectively.
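For reference, the scoring step looks roughly like the sketch below. This is a minimal reconstruction assuming a `BertForSequenceClassification` checkpoint fine-tuned on CoLA; the model path is a placeholder, not the actual one used.

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# hypothetical path to the CoLA fine-tuned checkpoint
model = BertForSequenceClassification.from_pretrained("path/to/cola-finetuned-bert")
model.eval()

for sentence in ["These tests don't work as expected",
                 "These tests doesn't work as expected"]:
    input_ids = tokenizer.encode(sentence, return_tensors="pt")
    with torch.no_grad():
        logits = model(input_ids)[0]      # shape (1, 2) for the binary task
    probs = torch.softmax(logits, dim=1)  # acceptability probabilities
    print(sentence, probs)
```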
In the two cases, I compute interpretations via the notebook in the gist https://gist.github.com/davidefiocco/3e1a0ed030792230a33c726c61f6b3a5 (adapted from the SQuAD example in this repo). In particular, I create the visualization with:
```python
score_vis = viz.VisualizationDataRecord(
    attributions_sum,
    torch.max(torch.softmax(score[0][0], dim=0)),
    torch.argmax(score[0][0]),
    torch.argmax(score[0][0]),
    text,
    attributions_sum.sum(),
    all_tokens,
    delta)

print('\033[1m', 'Visualization For Score', '\033[0m')
viz.visualize_text([score_vis])
```
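For completeness, the attributions fed into the record above are computed roughly as in the sketch below, which follows the Captum SQuAD tutorial pattern; names such as `predict`, `ref_input_ids`, `token_type_ids`, and `attention_mask` come from the gist's setup and are assumptions here, not fixed Captum API.

```python
from captum.attr import LayerIntegratedGradients

# forward function returning the classification logits
def predict(input_ids, token_type_ids=None, attention_mask=None):
    return model(input_ids, token_type_ids=token_type_ids,
                 attention_mask=attention_mask)[0]

lig = LayerIntegratedGradients(predict, model.bert.embeddings)

logits = predict(input_ids, token_type_ids, attention_mask)
target = logits[0].argmax().item()        # attribute w.r.t. the predicted class

attributions, delta = lig.attribute(
    inputs=input_ids,
    baselines=ref_input_ids,              # e.g. [CLS] + [PAD]*n + [SEP] reference
    additional_forward_args=(token_type_ids, attention_mask),
    target=target,
    return_convergence_delta=True,
)

# sum over the embedding dimension and normalize, as in the SQuAD tutorial
attributions_sum = attributions.sum(dim=-1).squeeze(0)
attributions_sum = attributions_sum / torch.norm(attributions_sum)
```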
In the two cases I get
The interpretations look extremely similar despite the dramatic change in score, whereas I would expect the interpretation to change and the model to focus on the grammatical mistake (e.g., on the verb or the noun).
Am I doing something wrong in my implementation (likely!), or is LayerIntegratedGradients just not performing well on this example? Can someone suggest viable alternatives to try?

