# AIPI 590 - XAI | Assignment #05
### Description
### Your Name: Wilson Tseng

#### Assignment 5 - Explainable Techniques:
[GitHub Link](https://github.com/smilewilson1999/XAI/tree/9912736953e039b0ebfdcf6e7356a669785815b9/Assignment%205%20-%20Explainable%20Techniques)


[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/smilewilson1999/XAI/blob/main/Assignment%204%20-%20Interpretable%20ML%20II/Interpretable_ML_Test_2.ipynb) #pending to edit

## DO:
* Use markdown and comments effectively
* Pull out classes and functions into scripts
* Ensure cells are executed in order and avoid skipping cells to maintain reproducibility
* Choose the appropriate runtime (e.g., GPU) if needed
* If you are using a dataset that is too large to put in your GitHub repository, you must either pull it in via Hugging Face Datasets or put it in an S3 bucket and use boto3 to pull from there.
* Use versioning on all installs (e.g., pandas==1.3.0) to ensure consistent results across environments
* Implement error handling where appropriate

## DON'T:
* Absolutely NO sending us Google Drive links or zip files with data (see above).
* Load packages throughout the notebook. Please load all packages in the first code cell in your notebook.
* Add API keys or tokens directly to your notebook!!!! EVER!!!
* Include cells that you used for testing or debugging. Delete these before submission.
* Have errors rendered in your notebook. Fix errors prior to submission.
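One way to act on the versioning and error-handling rules is to check, early in the notebook, that the installed versions match the pins. A minimal sketch using only the standard library's `importlib.metadata`; the helper name `check_pinned_versions` is hypothetical, and the `pandas==1.3.0` pin is the example from the list above, not a requirement of this notebook:

```python
from importlib.metadata import PackageNotFoundError, version


def check_pinned_versions(pins):
    """Compare installed package versions against a {name: expected_version} mapping.

    Returns a dict of mismatches: name -> (expected, installed), where installed is
    None if the package is missing entirely.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    # Hypothetical pins; replace them with the versions you actually install
    pins = {"pandas": "1.3.0"}
    for name, (want, got) in check_pinned_versions(pins).items():
        print(f"{name}: expected {want}, found {got}")
```

Raising an error instead of printing would make a mismatch stop the notebook before any results are produced.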

In [None]:
# Please use this to connect your GitHub repository to your Google Colab notebook
# Connects to any needed files from GitHub and Google Drive
import os

# Remove Colab default sample_data
!rm -rf ./sample_data  # -f: no error if the folder is already gone

# Clone GitHub files to colab workspace
repo_name = "XAI" # Change to your repo name
git_path = 'https://github.com/smilewilson1999/XAI.git' #Change to your path
!git clone "{git_path}"

# Install dependencies from requirements.txt file
#!pip install -r "{os.path.join(repo_name,'requirements.txt')}" #Add if using requirements.txt

# Change working directory to location of notebook
notebook_dir = 'Assignment 5 - Explainable Techniques'
path_to_notebook = os.path.join(repo_name, notebook_dir)
%cd "{path_to_notebook}"
%ls

In [1]:
# Install necessary libraries
!pip install transformers lime --quiet



In [2]:
# Import necessary libraries
import transformers
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch
from lime.lime_text import LimeTextExplainer
import numpy as np
import warnings
warnings.filterwarnings('ignore')

In [3]:
# Load the pre-trained GPT-2 model and tokenizer
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')
model.eval()

GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2SdpaAttention(
          (c_attn): Conv1D(nf=2304, nx=768)
          (c_proj): Conv1D(nf=768, nx=768)
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D(nf=3072, nx=768)
          (c_proj): Conv1D(nf=768, nx=3072)
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=768, out_features=50257, bias=False)
)
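The printed module shapes are enough to reconstruct GPT-2 small's parameter count by hand. A short worked calculation, assuming (as the `gpt2` checkpoint does by default) that `lm_head` shares its weight matrix with `wte`; under that assumption the result should match `sum(p.numel() for p in model.parameters())`:

```python
# Parameter count of GPT-2 small, derived from the layer shapes printed above.
# Conv1D(nf, nx) holds an nx x nf weight plus an nf bias; LayerNorm(768) holds weight + bias.
d_model, vocab, n_ctx, n_layer, d_ff = 768, 50257, 1024, 12, 3072


def conv1d(nx, nf):
    return nx * nf + nf


layer_norm = 2 * d_model

embeddings = vocab * d_model + n_ctx * d_model  # wte + wpe
per_block = (
    layer_norm                       # ln_1
    + conv1d(d_model, 3 * d_model)   # attn.c_attn (q, k, v stacked: nf=2304)
    + conv1d(d_model, d_model)       # attn.c_proj
    + layer_norm                     # ln_2
    + conv1d(d_model, d_ff)          # mlp.c_fc
    + conv1d(d_ff, d_model)          # mlp.c_proj
)
total = embeddings + n_layer * per_block + layer_norm  # final term is ln_f
# lm_head (bias=False) is assumed to be tied to wte, so it adds no new parameters
print(f"{total:,}")  # 124,439,808
```

Roughly 31% of the parameters (the 50257 x 768 `wte` matrix) sit in the token embedding alone.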

In [4]:
# Define a text prompt
prompt_text = "The stock market crashed yesterday due to"
print(f"Prompt: {prompt_text}")

Prompt: The stock market crashed yesterday due to


In [5]:
# Tokenize the input prompt
inputs = tokenizer.encode(prompt_text, return_tensors='pt')
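`tokenizer.encode` returns only token ids, with no attention mask. Because GPT-2 has no dedicated pad token (its pad id falls back to EOS), `generate` cannot infer a mask on its own and warns unless one is passed. For a single unpadded prompt the mask is simply all ones; it starts to matter once prompts of different lengths are batched. A minimal numpy sketch of left padding, which decoder-only models like GPT-2 need because they read the next-token distribution from the last position; the helper `left_pad` and the token ids are hypothetical. In `transformers` the rough equivalent is setting `tokenizer.pad_token = tokenizer.eos_token` and `tokenizer.padding_side = 'left'` before calling `tokenizer(..., padding=True, return_tensors='pt')`.

```python
import numpy as np


def left_pad(batch, pad_id):
    """Left-pad variable-length token-id lists and build the matching attention mask.

    Real tokens are kept flush against the right edge so that position -1 is always
    the last real token of every sequence.
    """
    width = max(len(ids) for ids in batch)
    input_ids = np.full((len(batch), width), pad_id, dtype=np.int64)
    attention_mask = np.zeros((len(batch), width), dtype=np.int64)
    for row, ids in enumerate(batch):
        if ids:  # an empty sequence stays fully padded and fully masked
            input_ids[row, width - len(ids):] = ids
            attention_mask[row, width - len(ids):] = 1
    return input_ids, attention_mask


# Hypothetical token ids; 50256 is GPT-2's <|endoftext|> id, reused here as padding
ids, mask = left_pad([[11, 12, 13], [21, 22]], pad_id=50256)
print(ids)
print(mask)
```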

In [6]:
# Generate the model's continuation (the next 5 tokens)
attention_mask = torch.ones_like(inputs)  # single unpadded prompt: attend to every token
with torch.no_grad():
    outputs = model.generate(
        inputs,
        attention_mask=attention_mask,
        max_new_tokens=5,
        num_return_sequences=1,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; reuse EOS
    )
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Generated Text: {generated_text}")


Generated Text: The stock market crashed yesterday due to the collapse of the U
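Because no sampling options are passed, `generate` here appears to run greedy search (assuming the `gpt2` checkpoint's default generation config leaves sampling off): at each step it appends the single most likely token, so re-running the cell yields the same continuation. A minimal sketch of that loop, with a hypothetical `greedy_decode` helper and a toy bigram score table standing in for GPT-2; the real `generate` also handles EOS stopping, key/value caching and batching.

```python
import numpy as np


def greedy_decode(next_token_logits, prompt_ids, max_new_tokens):
    """Append the arg-max token `max_new_tokens` times.

    `next_token_logits(ids)` returns a 1-D array of vocabulary scores for the token that
    follows `ids`; in the notebook this role is played by model(...).logits[0, -1, :].
    """
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = next_token_logits(ids)
        ids.append(int(np.argmax(logits)))
    return ids


# Hypothetical 4-token vocabulary whose "model" is a fixed bigram score table
bigram = np.array([
    [0.0, 2.0, 0.5, 0.1],
    [0.1, 0.0, 3.0, 0.2],
    [0.3, 0.2, 0.0, 1.5],
    [1.0, 0.4, 0.2, 0.0],
])
print(greedy_decode(lambda ids: bigram[ids[-1]], [0], max_new_tokens=5))  # [0, 1, 2, 3, 0, 1]
```

Greedy search is deterministic but can loop, as the toy table does after four steps; sampling or beam search trade that determinism for variety or a better sequence-level score.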


In [7]:
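# Define a function to predict probabilities for LIME
# Candidate next words. GPT-2's byte-level BPE marks a leading space with "Ġ", so the word
# after "due to" is the token " inflation", not "inflation". Looking up bare strings with
# convert_tokens_to_ids returns mid-word pieces, and strings absent from the vocabulary
# (e.g. "COVID") silently map to <|endoftext|>. Encoding " " + word and keeping the first
# sub-token avoids both problems; multi-token words are approximated by their first piece.
target_words = ['inflation', 'oil', 'COVID', 'uncertainty', 'speculation']
target_ids = [tokenizer.encode(' ' + word)[0] for word in target_words]

def predict_proba(texts):
    """Return, for each perturbed text, the next-token probability of each target word."""
    probs = []
    for text in texts:
        input_ids = tokenizer.encode(text, return_tensors='pt')
        if input_ids.shape[1] == 0:
            # Guard against an empty perturbation: use <|endoftext|> as the only context
            input_ids = torch.tensor([[tokenizer.eos_token_id]])
        with torch.no_grad():
            logits = model(input_ids).logits
        # Softmax at the last position = distribution over the next token
        softmax = torch.nn.functional.softmax(logits[0, -1, :], dim=-1)
        probs.append(softmax[target_ids].numpy())
    return np.array(probs)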
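`predict_proba` reads five entries out of a 50,257-way next-token distribution, so each row is a slice of a larger distribution rather than a distribution over the five words, and it can sum to well under 1. LIME fits a separate surrogate regression to each requested label column, so the resulting weights are in units of raw next-token probability. A small numpy sketch of the same softmax-then-select step on a hypothetical six-token vocabulary (the helper name `next_token_target_probs` is illustrative):

```python
import numpy as np


def next_token_target_probs(last_logits, target_ids):
    """Softmax over the full vocabulary, then keep only the target token ids.

    Subtracting the max logit first leaves the softmax unchanged but avoids overflow.
    """
    z = last_logits - np.max(last_logits)
    p = np.exp(z) / np.exp(z).sum()
    return p[target_ids]


# Hypothetical 6-token vocabulary; tokens 1, 3 and 4 play the role of the target words
logits = np.array([2.0, 1.0, 0.5, 3.0, -1.0, 0.0])
target_probs = next_token_target_probs(logits, [1, 3, 4])
print(target_probs, target_probs.sum())  # the targets share only part of the mass
```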

In [8]:
# Create a LIME text explainer
class_names = ['inflation', 'oil', 'COVID', 'uncertainty', 'speculation']
explainer = LimeTextExplainer(class_names=class_names)
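`LimeTextExplainer` explains each label by perturbing the prompt (dropping random subsets of words), scoring every perturbed string with `predict_proba`, weighting samples by how close they stay to the original, and fitting a weighted linear model whose coefficients become the word importances. The sketch below re-implements that idea in plain numpy for one output column. It is not lime's code: the helper `lime_text_weights` and the toy black box are hypothetical, and it skips lime's feature-selection step. The distance (cosine distance × 100), kernel (`sqrt(exp(-d²/width²))` with width 25) and ridge penalty (alpha 1) follow what we understand to be lime's text defaults.

```python
import numpy as np


def lime_text_weights(words, predict, num_samples=2000, kernel_width=25.0, alpha=1.0, seed=0):
    """Minimal LIME-style surrogate for one output column.

    1. Perturb: each sample keeps a random subset of the words (row 0 keeps all).
    2. Weight: exponential kernel on cosine distance (x100) to the original.
    3. Fit: weighted ridge regression of predict(mask) on the 0/1 word mask.
    Returns one weight per word.
    """
    rng = np.random.default_rng(seed)
    d = len(words)
    Z = np.ones((num_samples, d))
    for i in range(1, num_samples):
        n_remove = rng.integers(1, d + 1)  # remove between 1 and d words
        Z[i, rng.choice(d, size=n_remove, replace=False)] = 0
    y = np.array([predict(z) for z in Z])

    kept = Z.sum(axis=1)
    cos_sim = np.sqrt(kept / d)  # cosine similarity of a 0/1 mask to the all-ones vector
    dist = (1.0 - cos_sim) * 100
    w = np.sqrt(np.exp(-dist**2 / kernel_width**2))

    A = np.hstack([np.ones((num_samples, 1)), Z])  # intercept column
    penalty = alpha * np.eye(d + 1)
    penalty[0, 0] = 0.0  # do not shrink the intercept
    beta = np.linalg.solve(A.T @ (w[:, None] * A) + penalty, A.T @ (w * y))
    return dict(zip(words, beta[1:]))


words = "The stock market crashed yesterday due to".split()
# Hypothetical black box: only the presence of "crashed" and "due" changes the score
weights = lime_text_weights(words, lambda z: 0.05 + 0.4 * z[3] + 0.1 * z[5])
print(sorted(weights.items(), key=lambda kv: -kv[1])[:2])
```

Because the toy black box is linear in the word mask, the surrogate should recover weights close to 0.4 for "crashed" and 0.1 for "due", with the other words near zero. GPT-2's response is not linear, which is why LIME's weights are only a local approximation around the original prompt.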

In [None]:
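# Explain the prediction: LIME fits one local surrogate per label (one per target word)
exp = explainer.explain_instance(
    prompt_text,
    predict_proba,
    num_features=10,
    labels=[0, 1, 2, 3, 4]
)

# Print the word weights behind each target word's probability
for label, name in enumerate(class_names):
    print(name, exp.as_list(label=label))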