Is there any tutorial on how to apply Lime to BERT? #356

Closed · ThiagoSousa opened this issue Jul 15, 2019 · 13 comments

Comments

@ThiagoSousa

ThiagoSousa commented Jul 15, 2019

I have a fine-tuned BERT model for binary classification. Is it possible to apply Lime to the model for explanations on BERT's predictions?

Thank you!

@marcotcr
Owner

There is a tutorial for text classification [html, ipynb]. LIME does not care whether the black-box model is BERT or a random forest. You just need to make sure your classifier takes a list of strings as input and outputs a 2d array of prediction probabilities.
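
For concreteness, here is a minimal sketch of that interface (the model, its scoring call, and the class names are placeholders, not taken from the tutorial):

import numpy as np
from lime.lime_text import LimeTextExplainer

# Placeholder wrapper: `my_model.score_one` stands in for whatever call your
# classifier (BERT, random forest, ...) uses to return per-class probabilities
# for a single string. LIME only ever sees this function.
def predict_proba(texts):
    return np.array([my_model.score_one(t) for t in texts])  # shape: (n_texts, n_classes)

explainer = LimeTextExplainer(class_names=['negative', 'positive'])
exp = explainer.explain_instance("some input text", predict_proba, num_features=6)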

@ThiagoSousa
Author

ThiagoSousa commented Jul 16, 2019

Thank you very much for your reply. It gave me insight into how to write an appropriate wrapper function around BERT's predictor. For those interested in the topic, I'll share my function below:

import os
import numpy as np

# The helpers used below (InputExample, PaddingInputExample, FLAGS, tokenizer,
# label_list, estimator, file_based_convert_examples_to_features,
# file_based_input_fn_builder) come from BERT's run_classifier.py.
def predictor(texts):
	examples = []
	for t in texts:
		examples.append(InputExample(guid="test-0", text_a=t, text_b=None, label=None))

	num_actual_predict_examples = len(examples)

	# TPU prediction requires full batches, so pad with dummy examples.
	if FLAGS.use_tpu:
		while len(examples) % FLAGS.predict_batch_size != 0:
			examples.append(PaddingInputExample())

	predict_file = os.path.join(FLAGS.output_dir, "lime.tf_record")
	file_based_convert_examples_to_features(examples, label_list, FLAGS.max_seq_length, tokenizer, predict_file)

	predict_drop_remainder = True if FLAGS.use_tpu else False
	predict_input_lime_fn = file_based_input_fn_builder(input_file=predict_file, seq_length=FLAGS.max_seq_length, is_training=False, drop_remainder=predict_drop_remainder)

	result = estimator.predict(input_fn=predict_input_lime_fn)

	# LIME expects a 2d array of class probabilities, one row per input string.
	return np.array([r["probabilities"].tolist() for r in result])

# `explainer` is a LimeTextExplainer and `text` is the string being explained.
exp = explainer.explain_instance(text, predictor, num_features=6)

@Elizabithi1-dev

Could you please give us more insight into your variables?

@j-hartmann

Thank you @marcotcr and @ThiagoSousa! Your comments were super helpful in creating a function to use LIME with a fine-tuned RoBERTa model from the simpletransformers library. For those interested, see a simple example below:

import numpy as np
from simpletransformers.classification import ClassificationModel
from lime.lime_text import LimeTextExplainer

# load fine-tuned RoBERTa model
args = {'no_cache': True, 'use_cached_eval_features': False, 'reprocess_input_data': True, 'silent': True}
model = ClassificationModel('roberta', 'PATH_TO_FINE-TUNED_MODEL', use_cuda=True, args=args)

# define softmax function
def softmax(x):
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

# define prediction function: simpletransformers returns (predictions, raw_outputs),
# so take the raw logits and convert each row to probabilities
def predict_probs(texts):
    predictions = model.predict(texts)
    x = np.array(list(predictions)[1])
    return np.apply_along_axis(softmax, 1, x)

# explain instance with LIME (assumed setup: a LimeTextExplainer and `text`, the string to explain)
explainer = LimeTextExplainer()
exp = explainer.explain_instance(text, predict_probs, num_features=6)

@rachel-sorek

Here's my code for using LIME on a pre-trained classification model from the huggingface transformers library:

import numpy as np
import lime
import torch
import torch.nn.functional as F
from lime.lime_text import LimeTextExplainer

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("ProsusAI/finbert")
model = AutoModelForSequenceClassification.from_pretrained("ProsusAI/finbert")
class_names = ['positive', 'negative', 'neutral']

def predictor(texts):
    outputs = model(**tokenizer(texts, return_tensors="pt", padding=True))
    probas = F.softmax(outputs.logits, dim=-1).detach().numpy()
    return probas

explainer = LimeTextExplainer(class_names=class_names)

str_to_predict = "surprising increase in revenue in spite of decrease in market share"
exp = explainer.explain_instance(str_to_predict, predictor, num_features=20, num_samples=2000)
exp.show_in_notebook(text=str_to_predict)
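
A possible variant (a sketch, not part of the snippet above): LIME sends all num_samples perturbed strings to the predictor in a single call, so chunking the forward pass and disabling gradient tracking can help avoid memory issues on longer texts. This assumes the same `model` and `tokenizer` as above:

import numpy as np
import torch
import torch.nn.functional as F

def predictor_batched(texts, batch_size=64):
    all_probas = []
    with torch.no_grad():  # gradients are not needed for prediction
        for i in range(0, len(texts), batch_size):
            batch = texts[i:i + batch_size]
            enc = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
            logits = model(**enc).logits
            all_probas.append(F.softmax(logits, dim=-1).cpu().numpy())
    return np.concatenate(all_probas, axis=0)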

@loukasilias

loukasilias commented Mar 20, 2021

I am trying to use LIME with a model consisting of BERT --> pooling layer --> Dense (128 units) --> Dense (1 unit, sigmoid activation). This is a binary classification task.

I have implemented the model as follows:

class MyModel(tf.keras.Model):
    
    def __init__(self, flag):
        
        super(MyModel,self).__init__()
        self.bert_model = TFBertModel.from_pretrained("bert-base-uncased")
        self.bert_model.trainable = flag
        self.layer_3 = Dense(units = 128, activation = 'relu')
        self.layer_4 = Dense(units = 1, activation = 'sigmoid')
        self.pooling = tf.keras.layers.GlobalMaxPool1D()

    def call(self,inputs):
        output_sentence_1 = self.bert_model(input_ids = inputs[0], attention_mask = inputs[1])
        output_sentence_1 = self.pooling(output_sentence_1['last_hidden_state'])
        layer_output = self.layer_3(output_sentence_1)
        output = self.layer_4(layer_output)

        return output

I am trying to use LIME as follows:

from lime.lime_text import LimeTextExplainer
explainer = LimeTextExplainer(class_names = ['truthful', 'deceptive'])

def predict_probab(STR):
    z = tokenizer.encode_plus(STR, add_special_tokens=True, max_length=512, truncation=True,
                              padding='max_length', return_token_type_ids=True,
                              return_attention_mask=True, return_tensors='np')
    inputs = [z['input_ids'], z['attention_mask']]
    k = []
    k.append(float(model.predict(inputs).reshape(-1, 1)))
    k.append(float(1 - model.predict(inputs).reshape(-1, 1)))
    k = np.array(k).reshape(1, -1)

    return k


STR = str(X_test[100])
exp = explainer.explain_instance(STR, predict_probab, num_features=10, num_samples=1)

I am getting the following result: [screenshot of the LIME explanation output]

Why am I getting zero contribution for all tokens?
In addition, I always get the same prediction probabilities for every string.

@pijusch

pijusch commented Apr 5, 2021

> I am trying to use LIME with a model consisting of BERT --> Pooling layer --> Dense (128 units) --> Dense (1 unit, sigmoid activation function). [...] Why am I getting zero contribution of tokens? In addition, I am getting always the same prediction probabilities for all strings.

(quoting @loukasilias's comment above)

I noticed that you set num_samples to 1. Might wanna increase that to 1000-2000.
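
On top of raising num_samples, the prediction function also has to handle the list of perturbed strings that LIME passes in (as @marcotcr noted at the top of the thread); the quoted version only scores a single string. A minimal sketch, assuming the same model and tokenizer as in the quoted comment and keeping the same column order:

import numpy as np

def predict_probab(texts):
    probs = []
    for t in texts:  # LIME calls this with a list of perturbed strings
        z = tokenizer.encode_plus(t, add_special_tokens=True, max_length=512, truncation=True,
                                  padding='max_length', return_token_type_ids=True,
                                  return_attention_mask=True, return_tensors='np')
        p = float(model.predict([z['input_ids'], z['attention_mask']]).reshape(-1)[0])
        probs.append([p, 1 - p])  # same two-column layout as the original function
    return np.array(probs)

exp = explainer.explain_instance(STR, predict_probab, num_features=10, num_samples=1000)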

@ryuzakace

ryuzakace commented Apr 8, 2021

Hi! I am trying to use the BERT tokenizer as LIME's split_expression, but I am getting an error:
    192     text_ptr += 1
    193     if text_ptr >= len(text):
--> 194         raise ValueError("Tokenization produced tokens that do not belong in string!")
    195     text_ptr += len(token)
    196     if inter_token_string:

ValueError: Tokenization produced tokens that do not belong in string!

explainer = LimeTextExplainer(split_expression=tokenizer.tokenize, bow=False, class_names=['Incorrect', 'Correct'])
exp = explainer.explain_instance(text[0], mod_ev, num_features=10)

For the string "I wouldn't consider to write in pnik color!":

tokenizer.tokenize returns:

['i',
 'wouldn',
 "'",
 't',
 'consider',
 'to',
 'write',
 'in',
 'p',
 '##nik',
 'color',
 '!']

The default tokenizer in LIME gives this result:


![image](https://user-images.githubusercontent.com/25671784/114011132-80dbf000-9882-11eb-9850-c5a5c57a89fb.png)

@parthplc

parthplc commented Jun 2, 2022

I am getting the error below when I run the above code (#356 (comment)):

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
c:\Projects\bluealtair\model_interpret\lime\nlp\Lime- Transformer.ipynb Cell 11' in <cell line: 21>()
     18 explainer = LimeTextExplainer(class_names=class_names)
     20 str_to_predict = "surprising increase in revenue in spite of decrease in market share"
---> 21 exp = explainer.explain_instance(str_to_predict, predictor, num_features=20, num_samples=2000)
     22 exp.show_in_notebook(text=str_to_predict)

File c:\Users\Parth.chokhra\Miniconda3\lib\site-packages\lime\lime_text.py:252, in LimeTextExplainer.explain_instance(self, text_instance, classifier_fn, labels, top_labels, num_features, num_samples, distance_metric, model_regressor)
    215 def explain_instance(self,
    216                      text_instance,
    217                      classifier_fn,
   (...)
    222                      distance_metric='cosine',
    223                      model_regressor=None):
    224     """Generates explanations for a prediction.
   (...)
    250         explanations.
    251     """
--> 252     indexed_string = IndexedString(text_instance, bow=self.bow,
    253                                    split_expression=self.split_expression)
    254     domain_mapper = TextDomainMapper(indexed_string)
    255     data, yss, distances = self.__data_labels_distances(
    256         indexed_string, classifier_fn, num_samples,
    257         distance_metric=distance_metric)

File c:\Users\Parth.chokhra\Miniconda3\lib\site-packages\lime\lime_text.py:100, in IndexedString.__init__(self, raw_string, split_expression, bow)
     97 self.as_np = np.array(self.as_list)
     98 non_word = re.compile(r'(%s)|$' % split_expression).match
     99 self.string_start = np.hstack(
--> 100     ([0], np.cumsum([len(x) for x in self.as_np[:-1]])))
    101 vocab = {}
    102 self.inverse_vocab = []

File c:\Users\Parth.chokhra\Miniconda3\lib\site-packages\lime\lime_text.py:100, in <listcomp>(.0)
--> 100     ([0], np.cumsum([len(x) for x in self.as_np[:-1]])))

TypeError: object of type 'NoneType' has no len()

Not sure what is wrong

@samueldomdey

samueldomdey commented Dec 22, 2022

Greetings,

I am looking to apply a LIME explainer to a fine-tuned BERT model with a linear output layer. My training pipeline is vanilla; I am just stuck on integrating my model into the LIME explainer.

The training data takes the form of a list of sentences, each mapping onto a numeric value.

The idea is to use LIME to explain a BERT regression model, and the approaches mentioned above have not worked for me.

If someone has solved this problem before or has an idea how, I'd be thankful if you let me know.

@Achinth04

> I noticed that you set num_samples to 1. Might wanna increase that to 1000-2000.

(quoting @loukasilias's question and @pijusch's reply above)

Hey, I need some help. Can you share the entire code?

@avinashtrivedi

exp = explainer.explain_instance(STR, predict_probab, num_features=10, num_samples = 1000)

@nime-sha256

If anyone who wants to perform multi-class classification ends up on this thread, the code below might save you some time.

classes = ['class_1', 'class_2', 'class_3', 'class_4', 'class_5']

explainer = LimeTextExplainer(class_names=classes)
  1. Let's say you want to know the model's explanation with regard to a specific class, i.e. you know that the sentence belongs to "class_4", and you want to know why the model predicts it as "class_4".

  2. By default, you'll be shown the explanation for the class at index 1 of your class list, i.e. in this example the plotted explanation would say "Not class_2" / "class_2" even if the predicted class is "class_4".

  3. The reason is explained in "why not default explain_instance to 1st top label, rather than 1st label?" #32.

  4. Therefore, use the "labels" argument to explicitly request explanations for the required class(es), as shown below. Note that "labels" takes integer indices into class_names rather than the class name strings.

label_to_explain = classes.index('class_4')  # labels are integer indices into class_names

exp = explainer.explain_instance(sentence_to_predict, predictor_function, labels=[label_to_explain], num_features=20)

exp.save_to_file('/path/to/save/the/file/file_name.html', labels=[label_to_explain])

exp.as_pyplot_figure(label=label_to_explain).savefig('/path/to/save/the/file/file_name.png')

Source: Lime Documentation
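
A related option (a sketch based on the LIME API rather than the comment above): if you just want the explanation for whichever class the model actually predicts, top_labels does that without knowing the index in advance.

# Sketch: explain the single most probable class for this instance.
exp = explainer.explain_instance(sentence_to_predict, predictor_function,
                                 top_labels=1, num_features=20)
predicted_label = exp.available_labels()[0]  # index of the top predicted class
exp.as_pyplot_figure(label=predicted_label)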
