<a href="https://colab.research.google.com/github/rusini666/AletheiaAI-LIME-SHAP/blob/main/lime_shap.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install shap lime pdfplumber transformers peft
!pip install torch  # use 'pip install -U torch' if you need an updated torch version
!pip install nltk

Collecting lime
  Downloading lime-0.2.0.1.tar.gz (275 kB)
  Preparing metadata (setup.py) ... done
Collecting pdfplumber
  Downloading pdfplumber-0.11.5-py3-none-any.whl.metadata (42 kB)
Collecting pdfminer.six==20231228 (from pdfplumber)
  Downloading pdfminer.six-20231228-py3-none-any.whl.metadata (4.2 kB)
Collecting pypdfium2>=4.18.0 (from pdfplumber)
  Downloading pypdfium2-4.30.1-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (48 kB)
Downloading pdfplumber-0.11.5-py3-none-any.whl (59 kB)

In [2]:
import re
import string
import json
from datetime import datetime
import types
import pickle
import os
os.environ['PYTORCH_MPS_HIGH_WATERMARK_RATIO'] = '0.0'  # only relevant on Apple Silicon with MPS

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader
from torch.optim import AdamW

from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    f1_score,
    accuracy_score,
    classification_report,
    confusion_matrix,
    ConfusionMatrixDisplay,
    precision_recall_fscore_support
)

# SHAP & LIME
import shap
from lime.lime_text import LimeTextExplainer

# For reading PDFs
import pdfplumber

from peft import get_peft_model, LoraConfig, TaskType
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    get_linear_schedule_with_warmup
)

from tqdm import tqdm
import nltk
import ssl

# (Optional) If behind a restricted SSL environment, fall back to an
# unverified HTTPS context. This has to run before any nltk.download() call.
try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context

# Download NLTK resources
nltk.download('punkt_tab')
nltk.download('punkt')
nltk.download('stopwords')

# Initialize SHAP’s JS for interactive plots
shap.initjs()

app_configs = {}


[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


In [3]:
class PreprocessDataset(Dataset):
    def __init__(self, dataframe, tokenizer):
        texts = dataframe.text.values.tolist()
        texts = [self._preprocess(text) for text in texts]

        self.texts = [
            tokenizer(
                text,
                padding='max_length',
                max_length=150,
                truncation=True,
                return_tensors="pt"
            )
            for text in texts
        ]

        if 'label' in dataframe:
            self.labels = dataframe.label.values.tolist()

    def _preprocess(self, text):
        text = self._remove_amp(text)
        text = self._remove_links(text)
        text = self._remove_hashes(text)
        text = self._remove_mentions(text)
        text = self._remove_multiple_spaces(text)
        text = self._remove_punctuation(text)

        text_tokens = self._tokenize(text)
        text_tokens = self._stopword_filtering(text_tokens)
        text = " ".join(text_tokens)
        return text.strip()

    def _remove_amp(self, text):
        return text.replace("&amp;", " ")

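    def _remove_mentions(self, text):
        # Match '@' plus any following non-space characters, so a mention
        # at the very end of the text (no trailing whitespace) is removed too.
        return re.sub(r'@\S*', ' ', text)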

    def _remove_multiple_spaces(self, text):
        return re.sub(r'\s+', ' ', text)

    def _remove_links(self, text):
        return re.sub(r'https?:\/\/[^\s\n\r]+', ' ', text)

    def _remove_hashes(self, text):
        return re.sub(r'#', ' ', text)

    def _tokenize(self, text):
        return nltk.word_tokenize(text, language="english")

    def _stopword_filtering(self, text_tokens):
        # A set gives O(1) membership tests instead of scanning a list per token
        stop_words = set(nltk.corpus.stopwords.words('english'))
        return [token for token in text_tokens if token.lower() not in stop_words]

    def _remove_punctuation(self, text):
        return ''.join(ch for ch in text if ch not in string.punctuation)

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        text = self.texts[idx]
        label = -1
        if hasattr(self, 'labels'):
            label = self.labels[idx]
        return text, label
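
The regex-based cleaning steps in `_preprocess` can be tried in isolation. The following is a minimal, stdlib-only sketch: the function name `clean_text` and the sample string are hypothetical, the mention pattern is a simplified assumption, and the NLTK tokenization and stop-word filtering are deliberately left out so the snippet runs without downloads.

```python
import re
import string


def clean_text(text: str) -> str:
    """Apply the cleaning steps in the same order as PreprocessDataset._preprocess.

    Hypothetical helper for illustration; tokenization and stop-word
    filtering are omitted.
    """
    text = text.replace("&amp;", " ")                 # drop HTML-escaped ampersands
    text = re.sub(r'https?:\/\/[^\s\n\r]+', ' ', text)  # drop links
    text = re.sub(r'#', ' ', text)                    # drop hash signs, keep the tag word
    text = re.sub(r'@\S+', ' ', text)                 # drop mentions (assumed pattern)
    text = re.sub(r'\s+', ' ', text)                  # collapse whitespace runs
    text = ''.join(ch for ch in text if ch not in string.punctuation)
    return text.strip()


sample = "@alice check https://example.com &amp; read #AI news!"
cleaned = clean_text(sample)
print(cleaned)  # check read AI news
```

Note that punctuation is stripped after whitespace is collapsed, so removing a standalone punctuation character can leave a double space behind; the real class then tokenizes and re-joins the text, which hides this.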


In [4]:
# Example for an OPT-based classifier
class CustomOPTClassifier(nn.Module):
    def __init__(self, pretrained_model):
        super().__init__()

        self.opt = pretrained_model
        # The causal-LM head emits logits over the vocabulary, so the
        # classifier head takes vocab_size features as input.
        self.fc1 = nn.Linear(pretrained_model.config.vocab_size, 32)
        self.fc2 = nn.Linear(32, 1)
        self.relu = nn.ReLU()

    def forward(self, input_ids, attention_mask):
        attention_mask = attention_mask.squeeze(1)
        opt_out = self.opt(
            input_ids=input_ids,
            attention_mask=attention_mask
        ).logits  # (batch_size, seq_length, vocab_size)

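        # Take the logits at the last sequence position. If the tokenizer pads
        # on the right, this is a pad position for inputs shorter than max_length.
        opt_out = opt_out[:, -1, :]  # shape: (batch_size, vocab_size)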

        x = self.fc1(opt_out)
        x = self.relu(x)
        x = self.fc2(x)
        return x


def str_to_class(s):
    return globals()[s]

def target_device():
    """Use GPU if available, else MPS on Apple Silicon, else CPU."""
    if torch.cuda.is_available():
        device = torch.device("cuda")  # Use GPU
    elif torch.backends.mps.is_available():
        device = torch.device("mps")  # Use MPS on Apple Silicon
    else:
        device = torch.device("cpu")  # Fallback to CPU
    app_configs['device'] = device
    print(f"Using device: {device}")
    return device

def get_pretrained_model():
    """Load the tokenizer and base model, then wrap the model with LoRA adapters."""
    tokenizer = AutoTokenizer.from_pretrained(app_configs['base_model'])
    if tokenizer.pad_token is None:
        tokenizer.add_special_tokens({'pad_token': tokenizer.eos_token})

    pretrained_model = AutoModelForCausalLM.from_pretrained(app_configs['base_model'])

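    # LoRA configuration; it should match the one used during training so that
    # the adapter weights stored in the checkpoint load correctly.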
    lora_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        r=8,
        lora_alpha=32,
        lora_dropout=0.1,
        target_modules=["q_proj", "v_proj"]
    )
    model_with_lora = get_peft_model(pretrained_model, lora_config)

    return tokenizer, model_with_lora
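
To get a feel for what the `LoraConfig` above implies, the arithmetic below estimates how many trainable parameters the adapters add. It is a rough sketch under stated assumptions: the hidden size (2048) and layer count (24) are assumed values for `facebook/opt-1.3b` and should be checked against `pretrained_model.config`; the helper name `lora_param_count` is hypothetical. Each adapted linear layer gets two low-rank matrices of shapes `(r, d_in)` and `(d_out, r)`, and the update is scaled by `lora_alpha / r`.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r, d_in), B is (d_out, r)."""
    return r * (d_in + d_out)


# Assumed model dimensions (verify with pretrained_model.config)
hidden_size = 2048
num_layers = 24

# Values taken from the LoraConfig in this notebook
r = 8
lora_alpha = 32
targets_per_layer = 2  # q_proj and v_proj

per_module = lora_param_count(hidden_size, hidden_size, r)
total = per_module * targets_per_layer * num_layers
scaling = lora_alpha / r

print(f"per adapted module: {per_module:,}")
print(f"total LoRA parameters: {total:,}")
print(f"update scaling (lora_alpha / r): {scaling}")
```

Under these assumptions the adapters add on the order of 1.6 million parameters, a small fraction of the base model, which is the main reason for using LoRA here.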


In [5]:
def predict_proba_for_explanations(texts, model, tokenizer, device):
    """
    Return [p(Human), p(AI)] for each text in 'texts'.
    """
    model.eval()
    probs = []
    for txt in texts:
        inputs = tokenizer(txt, padding="max_length", truncation=True, max_length=150, return_tensors="pt").to(device)
        with torch.no_grad():
            logits = model(inputs["input_ids"], inputs["attention_mask"])
            p_ai = torch.sigmoid(logits).item()
            p_human = 1 - p_ai
        probs.append([p_human, p_ai])
    return np.array(probs)


def shap_explanation(model, tokenizer, device, text_sample):
    """
    Standard SHAP explanation, returning shap_values plus a visual.
    """
    explainer = shap.Explainer(
        lambda T: predict_proba_for_explanations(T, model, tokenizer, device),
        masker=shap.maskers.Text(tokenizer=tokenizer)
    )
    shap_values = explainer([text_sample])

    # Show the text highlight in Colab (interactive)
    shap.plots.text(shap_values[0])
    return shap_values[0]


def lime_explanation(model, tokenizer, device, text_sample):
    """
    Standard LIME explanation, returning the LIME object.
    """
    def lime_predict(txts):
        return predict_proba_for_explanations(txts, model, tokenizer, device)

    explainer = LimeTextExplainer(class_names=["Human", "AI"])
    exp = explainer.explain_instance(
        text_sample,
        classifier_fn=lime_predict,
        labels=[1],  # focusing on label=1 => 'AI'
        num_features=10
    )
    # Print raw table
    print("LIME raw feature contributions (label=1):", exp.as_list(label=1))
    return exp


def generate_explanation_report(
    text_sample,
    shap_values_for_text,
    lime_exp_for_text,
    top_n=5
):
    """
    Create a more "human-readable" summary from SHAP & LIME outputs.

    1) From SHAP:
       - We can see which tokens have highest positive contribution to 'Output 1' (AI),
         and which have largest negative contribution (pushing 'Human').
    2) From LIME:
       - We have exp.as_list(label=1), which shows (token, weight).
       - Positive weight => pushes AI class. Negative => pushes Human class.

    We'll produce a short textual summary as a string.
    """
    # =============== Extract top tokens from SHAP ===============
    # shap_values_for_text is the SHAP Explanation for a single text:
    #   .data   -> the tokens
    #   .values -> array of shape (n_tokens, 2):
    #              (contribution to Human, contribution to AI)
    # We only need the "AI" channel => index 1

    token_contribs = []
    for i, token in enumerate(shap_values_for_text.data):
        # shap_values_for_text.data is the list of tokens
        # shap_values_for_text.values is the array of SHAP vals for each class
        # class 1 => AI
        shap_val_ai = shap_values_for_text.values[i, 1]
        token_contribs.append((token, shap_val_ai))

    # Sort by absolute SHAP value descending
    token_contribs.sort(key=lambda x: abs(x[1]), reverse=True)

    # Top positive for AI
    top_positive = [t for t in token_contribs if t[1] > 0]
    top_positive.sort(key=lambda x: x[1], reverse=True)
    top_positive = top_positive[:top_n]

    # Top negative for AI (i.e. pushing toward Human)
    top_negative = [t for t in token_contribs if t[1] < 0]
    top_negative.sort(key=lambda x: x[1])  # ascending, more negative => bigger magnitude
    top_negative = top_negative[:top_n]

    # =============== Extract top tokens from LIME ===============
    # LIME returns a list of (token, weight). Positive => pushes AI, Negative => pushes Human
    lime_list = lime_exp_for_text.as_list(label=1)  # focusing on AI class
    # Sort by descending absolute value
    lime_list.sort(key=lambda x: abs(x[1]), reverse=True)
    # We'll just keep top_n for brevity
    lime_top = lime_list[:top_n]

    # =============== Build a textual summary ===============
    lines = []
    lines.append("=============== EXPLANATION REPORT ===============\n")
    lines.append("**SHAP** found these top tokens pushing the text toward AI:\n")
    for tok, val in top_positive:
        lines.append(f"  + {tok} (SHAP: +{val:.4f})")
    lines.append("")
    lines.append("**SHAP** found these top tokens pushing the text toward Human:\n")
    for tok, val in top_negative:
        lines.append(f"  - {tok} (SHAP: {val:.4f})")
    lines.append("")
    lines.append("**LIME** top features for AI class:\n")
    for tok, weight in lime_top:
        sign = "+" if weight >= 0 else "-"
        lines.append(f"  {sign} {tok} (weight={weight:.4f})")
    lines.append("\n==================================================")

    report_text = "\n".join(lines)
    return report_text
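
The ranking logic inside `generate_explanation_report` can be exercised without running SHAP. The sketch below uses a hypothetical helper, `split_top_contributions`, on a handful of illustrative `(token, value)` pairs; positive values push toward the AI class and negative values toward Human, as in the report.

```python
def split_top_contributions(token_contribs, top_n=5):
    """Split (token, value) pairs into the strongest AI and Human contributors.

    Hypothetical helper mirroring the sorting done in the report function.
    """
    # Largest positive values first => strongest push toward AI
    positive = sorted(
        (t for t in token_contribs if t[1] > 0),
        key=lambda t: t[1],
        reverse=True,
    )
    # Most negative values first => strongest push toward Human
    negative = sorted(
        (t for t in token_contribs if t[1] < 0),
        key=lambda t: t[1],
    )
    return positive[:top_n], negative[:top_n]


# Illustrative values only; tokens with a contribution of exactly 0 are dropped
contribs = [
    ("fiction", 0.0036),
    ("or", -0.0034),
    ("making", -0.0018),
    ("increased", 0.0032),
    ("the", 0.0),
]
top_ai, top_human = split_top_contributions(contribs, top_n=2)
print(top_ai)
print(top_human)
```

Keeping the split separate from the string formatting makes it easier to test the ranking on small inputs before spending minutes on a full SHAP run.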

In [6]:
def classify_and_explain(user_text, model, tokenizer, device):
    # 1) Classify
    inputs = tokenizer(user_text, padding="max_length", max_length=150, truncation=True, return_tensors="pt").to(device)
    model.eval()
    with torch.no_grad():
        logits = model(inputs["input_ids"], inputs["attention_mask"])
        p_ai = torch.sigmoid(logits).item()
        p_human = 1 - p_ai
    label_pred = 1 if p_ai > 0.5 else 0

    # Print classification
    print("\n=== CLASSIFICATION ===")
    print(f"Predicted Label: {'AI' if label_pred==1 else 'Human'}")
    print(f"Prob(AI)={p_ai:.4f}, Prob(Human)={p_human:.4f}")

    # 2) SHAP => returns shap_values[0]
    shap_vals = shap_explanation(model, tokenizer, device, user_text)

    # 3) LIME => returns the explanation object
    lime_exp = lime_explanation(model, tokenizer, device, user_text)

    # 4) Generate a short textual "report" combining both
    final_report = generate_explanation_report(
        text_sample=user_text,
        shap_values_for_text=shap_vals,
        lime_exp_for_text=lime_exp,
        top_n=5
    )
    print(final_report)
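
The classification step turns a single logit into two class probabilities and a label. A minimal stdlib sketch of that conversion follows; the helper names `logit_to_probs` and `predicted_label` are hypothetical, and `math.exp` stands in for `torch.sigmoid` so the snippet runs without PyTorch.

```python
import math


def logit_to_probs(logit: float):
    """Return (p_human, p_ai) for a single-logit binary classifier."""
    p_ai = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
    return 1.0 - p_ai, p_ai


def predicted_label(logit: float, threshold: float = 0.5) -> str:
    """Label is 'AI' only when p_ai is strictly above the threshold."""
    _, p_ai = logit_to_probs(logit)
    return "AI" if p_ai > threshold else "Human"


for logit in (-2.0, 0.0, 2.0):
    p_human, p_ai = logit_to_probs(logit)
    print(f"logit={logit:+.1f}  p_ai={p_ai:.4f}  p_human={p_human:.4f}  "
          f"label={predicted_label(logit)}")
```

Because the comparison is strict, a logit of exactly 0 (probability 0.5) is labeled Human, matching `label_pred = 1 if p_ai > 0.5 else 0` in `classify_and_explain`.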

In [7]:
def get_user_text():
    """
    In a Colab environment, you might just define user_text in a cell,
    but here's a console-like approach:
    """
    print("Enter your text (then press Enter):")
    user_input_text = input()
    return user_input_text

if __name__ == "__main__":
    # 1) Basic config
    absolute_path = "/content/drive/My Drive/Model"  # or a different path in Colab
    default_configs = {
        'base_model': 'facebook/opt-1.3b',
        'classifier': 'CustomOPTClassifier',
        'models_path': os.path.join(absolute_path, "models"),
        'model_name': '202409230028_subtaskA_monolingual_facebook_opt-1.3b',  # Example
    }
    app_configs.update(default_configs)

    # 2) Device
    device = target_device()

    # 3) Load tokenizer & base model
    tokenizer, base_model = get_pretrained_model()
    model = CustomOPTClassifier(base_model).to(device)

    # 4) (Optional) Load checkpoint if present
    model_path = os.path.join(app_configs['models_path'], app_configs['model_name'] + ".pt")
    if os.path.isfile(model_path):
        model.load_state_dict(torch.load(model_path, map_location="cpu"))
        print(f"Model loaded from: {model_path}")
    else:
        print(f"[WARNING] No checkpoint found at: {model_path}. Classification may be random.")

    # 5) Get user text
    user_text = get_user_text().strip()
    if user_text:
        # Possibly truncate if extremely long
        # user_text = user_text[:2000]
        classify_and_explain(user_text, model, tokenizer, device)
    else:
        print("No text entered!")

Using device: cuda


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Enter your text (then press Enter):
Conspiracy Theories: A Double-Edged Sword Conspiracy theories have fascinated humanity for centuries, weaving tales that blend fact and fiction. These theories, ranging from alien cover-ups to secret government plots, often captivate the imagination and provoke debates about truth, trust, and the human desire for understanding. While they can reveal societal anxieties and foster skepticism, they can also lead to dangerous misinformation and erode public trust in institutions.  The Roots of Conspiracy Theories At their core, conspiracy theories stem from a basic human need to make sense of the world, especially in times of uncertainty or crisis. When events seem inexplicable or overwhelming, the mind seeks patterns and connections to restore a sense of control. This cognitive bias, known as apophenia, fuels the belief in hidden forces or agendas behind complex events. Historical examples, such as the moon landing hoax or the assassination of John F. K

PartitionExplainer explainer: 2it [01:26, 86.76s/it]


LIME raw feature contributions (label=1): [('Theories', 0.026429626401086724), ('known', 0.02566859785018692), ('Historical', 0.025454768849564838), ('theories', 0.024253765968917828), ('seem', 0.023125120998101257), ('and', 0.022972488751170736), ('of', 0.02206031311245963), ('These', 0.021790895679572844), ('overwhelming', 0.021670165806074924), ('apophenia', 0.02166116036604581)]

**SHAP** found these top tokens pushing the text toward AI:

  +  fiction (SHAP: +0.0036)
  + . (SHAP: +0.0036)
  +  increased (SHAP: +0.0032)
  +  disease (SHAP: +0.0032)
  +  outbreaks (SHAP: +0.0032)

**SHAP** found these top tokens pushing the text toward Human:

  -  or (SHAP: -0.0034)
  -  marginalized (SHAP: -0.0034)
  - . (SHAP: -0.0034)
  -  making (SHAP: -0.0018)
  -  them (SHAP: -0.0018)

**LIME** top features for AI class:

  + Theories (weight=0.0264)
  + known (weight=0.0257)
  + Historical (weight=0.0255)
  + theories (weight=0.0243)
  + seem (weight=0.0231)

