# Perturbation-Based Techniques

These methods explain a model by observing how its output changes when you slightly change (perturb) its input. Think: “What happens if I change one pixel or one word?”
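
To make that question concrete, here is a minimal leave-one-word-out sketch. The `toy_score` function and its word lists are hypothetical stand-ins for a real model; the loop removes one word at a time and records how much the score drops.

```python
# Hypothetical stand-in for a real model: scores a sentence by counting
# sentiment-bearing words. Any callable mapping text -> number would work.
POSITIVE = {"love", "great", "fantastic"}
NEGATIVE = {"hate", "terrible", "boring"}

def toy_score(text):
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def leave_one_out(text, score_fn):
    """Return (word, score drop) pairs, one per word removed from the text."""
    words = text.split()
    base = score_fn(text)
    effects = []
    for i, word in enumerate(words):
        # Perturb the input by deleting exactly one word.
        perturbed = " ".join(words[:i] + words[i + 1:])
        effects.append((word, base - score_fn(perturbed)))
    return effects

effects = leave_one_out("I love this movie", toy_score)
print(effects)
```

A large drop means the model relied on that word; a drop of zero means removing the word made no difference to this toy scorer.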

## LIME (Local Interpretable Model-agnostic Explanations)

**Idea:** Take the original input (e.g., an image or sentence), make small modifications, see how model predictions change, and fit a simple, interpretable model (like a linear model) to approximate behavior around that input.
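
The idea above can be sketched in a few lines of NumPy. This is a simplified illustration, not the `lime` library's implementation: the black box `toy_predict_proba`, the distance measure (fraction of words removed), the kernel width, and the ridge penalty are all assumptions chosen for brevity.

```python
import numpy as np

def toy_predict_proba(texts):
    """Hypothetical black box: P(positive) rises when 'love' is present."""
    p_pos = np.array([0.9 if "love" in t.split() else 0.3 for t in texts])
    return np.column_stack([1.0 - p_pos, p_pos])

def lime_sketch(text, predict_proba, label=1, num_samples=500,
                kernel_width=0.75, seed=0):
    rng = np.random.default_rng(seed)
    words = text.split()
    n = len(words)

    # 1. Perturb: each row is a binary mask saying which words are kept.
    masks = rng.integers(0, 2, size=(num_samples, n))
    masks[0] = 1  # the first sample is the unperturbed input
    perturbed = [" ".join(w for w, keep in zip(words, row) if keep)
                 for row in masks]

    # 2. Query the black box on every perturbed sample.
    y = predict_proba(perturbed)[:, label]

    # 3. Weight samples by proximity to the original input.
    distance = 1.0 - masks.mean(axis=1)  # fraction of words removed
    weights = np.exp(-(distance ** 2) / kernel_width ** 2)

    # 4. Fit a weighted ridge regression (with intercept) as the local surrogate.
    X = np.column_stack([np.ones(num_samples), masks])
    XtW = X.T * weights
    coef = np.linalg.solve(XtW @ X + 1e-3 * np.eye(n + 1), XtW @ y)

    # Report per-word coefficients, largest magnitude first.
    return sorted(zip(words, coef[1:]), key=lambda wc: abs(wc[1]), reverse=True)

explanation = lime_sketch("I love this movie", toy_predict_proba)
print(explanation)
```

Because the toy black box depends only on the word "love", the surrogate should assign nearly all of the weight to that word, which is the behavior the real library approximates for more complex models.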

In [17]:
!pip install lime

Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com





### Basic use case

In [52]:
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from lime.lime_text import LimeTextExplainer

In [53]:
# Step 1: Sample review dataset 
texts = [
    "I love this movie",
    "This film is terrible",
    "What a fantastic performance!",
    "I hated every moment",
    "It was an average movie",
    "Absolutely brilliant and moving"
]
labels = [1, 0, 1, 0, 0, 1]  # 1 = Positive, 0 = Negative

In [54]:
# Step 2: Train a simple model
vectorizer = TfidfVectorizer()
classifier = LogisticRegression()
model = make_pipeline(vectorizer, classifier)
model.fit(texts, labels)

In [55]:
# Step 3: Define black-box prediction function
def blackbox_model(text_list):
    return model.predict_proba(text_list)

In [56]:
# Step 4: Run LIME explanation
explainer = LimeTextExplainer(class_names=['Negative', 'Positive'])
exp = explainer.explain_instance("I love this movie", blackbox_model, num_features=3)
print(exp.as_list())

[('love', 0.04723679886633938), ('movie', 0.0038787401431432215), ('I', 0.0012371047096181628)]


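In this run, “love” carries by far the largest weight toward "Positive" (about 0.047). The weights for “movie” and “I” are more than ten times smaller, so their contribution is negligible. LIME samples perturbations randomly, so the exact values vary between runs.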
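
One way to compare the weights is to express each as a share of the total absolute weight. The snippet below is a small illustration that reuses the values printed above, rounded to four decimals; the `shares` dictionary is just a helper name for this example.

```python
# Weights copied (rounded) from the exp.as_list() output shown above.
weights = [("love", 0.0472), ("movie", 0.0039), ("I", 0.0012)]

# Normalize by the total absolute weight so the shares sum to 1.
total = sum(abs(w) for _, w in weights)
shares = {word: abs(w) / total for word, w in weights}

for word, share in shares.items():
    print(f"{word}: {share:.0%} of total absolute weight")
```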

### Use case: explaining BERT sentiment classification with LIME

In [None]:
!pip install transformers torch lime scikit-learn

In [57]:
from transformers import BertTokenizer, BertForSequenceClassification
from transformers import TextClassificationPipeline
import torch
from lime.lime_text import LimeTextExplainer

In [58]:
# Step 1: Load a pretrained BERT model fine-tuned for sentiment classification
model_name = "nlptown/bert-base-multilingual-uncased-sentiment"  # You can replace with another
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)

In [59]:
# Step 2: Define a prediction pipeline
pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer, return_all_scores=True)



In [60]:
# Step 3: Create a black-box function wrapper for LIME
import numpy as np

def blackbox_model(texts):
    # LIME expects a 2D NumPy array of shape (n_samples, n_classes).
    # Returning nested Python lists fails inside LIME with
    # "TypeError: list indices must be integers or slices, not tuple".
    outputs = [pipe(text)[0] for text in texts]  # [0] gets the list of per-class dicts
    return np.array([[label["score"] for label in example] for example in outputs])

In [61]:
# Step 4: Run LIME explainability
explainer = LimeTextExplainer(class_names=["1 star", "2 stars", "3 stars", "4 stars", "5 stars"])
exp = explainer.explain_instance("This movie was surprisingly good and fun to watch.",
                                  blackbox_model,
                                  num_features=6,
                                  top_labels=1,     # explain the predicted class, not the default label 1 ("2 stars")
                                  num_samples=500)  # fewer perturbations than the default 5000, to keep BERT inference fast

In [None]:
# Step 5: Print explanation for the top predicted class
top_label = exp.available_labels()[0]
print(f"Top words contributing to the prediction '{explainer.class_names[top_label]}':")
for word, weight in exp.as_list(label=top_label):
    print(f"{word}: {weight:.4f}")
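
A note on the return type of the wrapper function: LIME appears to slice the classifier output column-wise, like a 2D NumPy array, so a wrapper that returns nested Python lists fails with `TypeError: list indices must be integers or slices, not tuple`. The sketch below reproduces that behavior outside LIME. The scores are made-up values, and the claim about LIME's internal indexing is an assumption based on the error message.

```python
import numpy as np

# Made-up class scores for two texts and five classes (illustrative values only).
scores_as_lists = [
    [0.05, 0.05, 0.10, 0.30, 0.50],
    [0.40, 0.30, 0.15, 0.10, 0.05],
]

# Column-style indexing is not supported by nested Python lists.
try:
    scores_as_lists[:, 4]
    list_error = None
except TypeError as err:
    list_error = str(err)
print("list:", list_error)

# Converting to an array makes the same indexing valid.
scores_as_array = np.array(scores_as_lists)
five_star_column = scores_as_array[:, 4]
print("array:", five_star_column)
```

Wrapping the wrapper's return value in `np.array(...)` is therefore enough to make it compatible with this kind of indexing.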