# Responsible AI: XAI GenAI project

## 0. Background



The previous lessons on explainability introduced post-hoc methods for explaining a model, such as saliency maps, SmoothGrad, LRP, LIME, and SHAP. Take LRP (Layer-wise Relevance Propagation) as an example. It backpropagates relevance through the network to highlight the pixels most relevant to the prediction of the class "cat" (image source: [Montavon et al. (2016)](https://giorgiomorales.github.io/Layer-wise-Relevance-Propagation-in-Pytorch/)).

![LRP example](images/catLRP.jpg)

Another example is text sentiment classification. Here we visualize how important each word is for the prediction 'positive':

![text example](images/textGradL2.png)

Words highlighted in darker colours are more important for predicting that the sentence has a 'positive' sentiment.
More examples can be found [here](http://34.160.227.66/?models=sst2-tiny&dataset=sst_dev&hidden_modules=Explanations_Attention&layout=default).

Both cases above require the class or the prediction of the model. But:

***How do you explain a model that does not predict but generates?***

In this project, we will explain a generative model through the dependencies between words. We will first look at a simple example that uses Point-wise Mutual Information (PMI) to compute a saliency map for a sentence. After that, we will construct the experiment step by step, followed by exercises and questions.


## 1. A simple example to start with
Given a sample sentence: 
> *Tokyo is the capital city of Japan.* 

We will explain this sentence with a saliency map built from the dependencies between its words.
The dependency between two words in the sentence can be measured by [Point-wise mutual information (PMI)](https://en.wikipedia.org/wiki/Pointwise_mutual_information), estimated as follows.


Mask out two words, e.g.
> \[MASK-1\] is the capital city of \[MASK-2\].


Ask the generative model to fill in the masked sentence 10 times, and we get:

| MASK-1      | MASK-2 |
| ----------- | ----------- |
|    tokyo   |     japan   |
|  paris  |     france    |
|  london  |     england    |
|  paris  |     france    |
|  beijing |  china |
|    tokyo   |     japan   |
|  paris  |     france    |
|  paris  |     france    |
|  london  |     england    |
|  beijing |  china |

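PMI is calculated as:

$PMI(x,y)=\log_2 \frac{P(\{x,y\} \mid s-\{x,y\})}{P(\{x\} \mid s-\{x,y\})\,P(\{y\} \mid s-\{x,y\})}$

where $x$ and $y$ are the masked-out words, $s$ is the sentence, and $s-\{x,y\}$ denotes the tokens of the sentence that remain after removing $x$ and $y$. Each probability is estimated as a frequency over the model's responses: $P(\{x\} \mid s-\{x,y\})$ is the fraction of responses that fill MASK-1 with $x$, $P(\{y\} \mid s-\{x,y\})$ the fraction that fill MASK-2 with $y$, and $P(\{x,y\} \mid s-\{x,y\})$ the fraction that do both.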

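In this example, with $x$ = Tokyo and $y$ = Japan, each of the three probabilities is $2/10 = 0.2$, so $PMI(Tokyo, Japan) = \log_2 \frac{0.2}{0.2 \times 0.2} = \log_2 5 \approx 2.32$.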
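A minimal sketch of this calculation in plain Python, using the 10 fill-ins from the table above. The `samples` list and the `pmi` helper are illustrative only and are not part of the project's code.

```python
import math

# The 10 (MASK-1, MASK-2) fill-ins from the table above
samples = [
    ("tokyo", "japan"), ("paris", "france"), ("london", "england"),
    ("paris", "france"), ("beijing", "china"), ("tokyo", "japan"),
    ("paris", "france"), ("paris", "france"), ("london", "england"),
    ("beijing", "china"),
]

def pmi(samples, x, y):
    """PMI in bits between filling MASK-1 with x and MASK-2 with y."""
    n = len(samples)
    p_x = sum(a == x for a, _ in samples) / n               # P({x} | s - {x, y})
    p_y = sum(b == y for _, b in samples) / n               # P({y} | s - {x, y})
    p_xy = sum(a == x and b == y for a, b in samples) / n   # P({x, y} | s - {x, y})
    if p_xy == 0:
        return float("-inf")  # x and y never co-occur in the samples
    return math.log2(p_xy / (p_x * p_y))

print(round(pmi(samples, "tokyo", "japan"), 2))   # 2.32
print(round(pmi(samples, "paris", "france"), 2))  # 1.32
```

Because the fill-ins here always come in matched pairs, $P(x,y) = P(x) = P(y)$ and PMI reduces to $-\log_2 P(x)$. As a result, the rarer pair (Tokyo, Japan) scores higher than the more frequent pair (Paris, France).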

Select a word of interest in the sentence (the *anchor word*). We can then use the generative model to compute the PMI between the anchor word and every other word.
(Here, we use a longer sentence and generate 20 responses per word.)
![PMI saliency map for a longer sentence](images/resPMI.png)


## 2. Preparation
### 2.1 Conda environment

```
conda env create -f environment.yml
conda activate xai_llm
```


### 2.2 Download the offline LLM

We use an offline LLM from Hugging Face. The model file is approximately 5 GB.
Download it with the command below, which saves it under `./models/`.
```
huggingface-cli download TheBloke/openchat-3.5-0106-GGUF openchat-3.5-0106.Q4_K_M.gguf --local-dir ./models --local-dir-use-symlinks False
# credit to https://huggingface.co/TheBloke/openchat-3.5-0106-GGUF
```

## 3. Mask the sentence and get the responses from the LLM
### 3.1 Get the input sentence

**Remember to change the anchor word index when changing the input sentence.**

```python
def get_input():
    # Ideally this would read sentences from a file; for now it asks on stdin
    return input("Enter a sentence: ")

anchor_word_idx = 0    # index of the anchor word (the word of interest)
prompts_per_word = 20  # number of responses generated per masked prompt

# sentence = get_input()  # uncomment to type a sentence interactively
sentence = "Tokyo is the capital city of Japan."
print("Sentence: ", sentence)
```

```
Sentence:  Tokyo is the capital city of Japan.
```


### 3.2 Load the model

```python
from models.ChatModel import ChatModel

model_name = "openchat"
model = ChatModel(model_name)
print(f"Model: {model_name}")
```

```
Model: openchat
```


### 3.3 Run the prompts and get all the responses


```python
from tools.command_generator import generate_prompts, prefix_prompt
from tools.evaluate_response import get_replacements
from tqdm import tqdm

def run_prompts(model, sentence, anchor_idx, prompts_per_word=20, blob=False):
    # Build the masked prompts (the log below shows one per non-anchor word)
    prompts = generate_prompts(sentence, anchor_idx, blob)
    all_replacements = []
    for prompt in prompts:
        replacements = []
        # Query the model prompts_per_word times with the same masked prompt
        for _ in tqdm(range(prompts_per_word), desc=f"Input: {prompt}"):
            response = model.get_response(prefix_prompt(prompt)).strip()
            if response:
                # Extract the words the model put in place of the [MASK] tokens
                replacement = get_replacements(prompt, response)
                if replacement:
                    replacements.append(replacement)
        # Prompts without a single valid response are skipped
        if len(replacements) > 0:
            all_replacements.append(replacements)
    return all_replacements

all_responses = run_prompts(model, sentence, anchor_word_idx, prompts_per_word)
```


```
Input: [MASK] [MASK] the capital city of Japan.: 100%|██████████| 20/20 [00:29<00:00,  1.47s/it]
Input: [MASK] is [MASK] capital city of Japan.:  40%|████      | 8/20 [00:12<00:17,  1.44s/it]
 Response is not valid. ['[mask]', 'is', '[mask]', 'capital', 'city', 'of', 'japan'] ['tokyo', 'is', 'japans', 'capital', 'city']
Input: [MASK] is [MASK] capital city of Japan.:  85%|████████▌ | 17/20 [00:25<00:04,  1.36s/it]
 Response is not valid. ['[mask]', 'is', '[mask]', 'capital', 'city', 'of', 'japan'] ['tokyo', 'is', 'japans', 'capital', 'city']
Input: [MASK] is [MASK] capital city of Japan.: 100%|██████████| 20/20 [00:29<00:00,  1.47s/it]
Input: [MASK] is the [MASK] city of Japan.: 100%|██████████| 20/20 [00:33<00:00,  1.66s/it]
Input: [MASK] is the capital [MASK] of Japan.:  45%|████▌     | 9/20 [00:12<00:14,  1.27s/it]
 Response is not valid. ['[mask]', 'is', 'the', 'capital', '[mask]', 'of', 'japan'] ['tokyo', 'is', 'the', 'capital', '[japan]']
Input: [MASK] is the capital [MASK] of Japan.: 100%|██████████| 20/20 [00:26<00:00,  1.30s/it]
Input: [MASK] is the capital city [MASK] Japan.: 100%|██████████| 20/20 [00:27<00:00,  1.39s/it]
Input: [MASK] is the capital city of [MASK]: 100%|██████████| 20/20 [00:28<00:00,  1.42s/it]
```
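The `Input:` lines in the log show the masking scheme. The anchor word is always masked, together with one other word at a time. Because the sentence appears to be split on whitespace, the final period disappears along with `Japan.`. The function below is a hypothetical sketch (`make_masked_prompts` is not part of the project) that reproduces these six prompts. It is not the project's `generate_prompts`, whose implementation, including the `blob` option, is not shown here.

```python
def make_masked_prompts(sentence, anchor_idx, mask="[MASK]"):
    """Hypothetical sketch: mask the anchor word together with each other
    word in turn, giving one prompt per non-anchor word."""
    words = sentence.split()  # whitespace split, so "Japan." is a single token
    prompts = []
    for j in range(len(words)):
        if j == anchor_idx:
            continue  # the anchor is masked in every prompt, never paired with itself
        masked = list(words)
        masked[anchor_idx] = mask
        masked[j] = mask
        prompts.append(" ".join(masked))
    return prompts

for prompt in make_masked_prompts("Tokyo is the capital city of Japan.", 0):
    print(prompt)
```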


### 3.4 EXERCISE: compute the PMI for each word

$PMI(x,y)=\log_2 \frac{P(\{x,y\} \mid s-\{x,y\})}{P(\{x\} \mid s-\{x,y\})\,P(\{y\} \mid s-\{x,y\})}$

* Compute $P(x)$, $P(y)$ and $P(x,y)$ first and print them out.
* Compute the PMI for each word.
* Visualize the result by coloring the words. Tip: you may need to normalize the result first.
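One possible approach, sketched under assumptions. The names `pmi_from_responses` and `normalize` are hypothetical, and the project's own `calculate_pmis` is the reference implementation. The `p_df` table printed further below suggests that the probabilities are divided by the number of prompts rather than the number of valid responses. For `the`, 18 of the 20 responses recover `tokyo`, which gives $P(x) = 0.9$. The table also suggests that the `saliency` row is the PMI divided by the largest PMI (2.00 for `capital`). The sketch follows both readings.

```python
import math

def pmi_from_responses(responses, anchor, word, n_prompts):
    """Estimate P(x), P(y), P(x,y) and their PMI for one masked prompt.

    responses: [anchor_fill, word_fill] pairs, as in the printed output below
    anchor, word: the original lowercased words that were masked out
    n_prompts: number of requests sent; invalid or empty responses count as misses
    """
    px = sum(r[0] == anchor for r in responses) / n_prompts
    py = sum(r[1] == word for r in responses) / n_prompts
    pxy = sum(r[0] == anchor and r[1] == word for r in responses) / n_prompts
    pmi = math.log2(pxy / (px * py)) if pxy > 0 else float("-inf")
    return px, py, pxy, pmi

def normalize(pmis):
    """Divide by the largest finite PMI so the most salient word scores 1.0."""
    top = max((v for v in pmis if math.isfinite(v)), default=0.0)
    if top <= 0:
        return list(pmis)  # nothing positive to scale by
    return [v / top for v in pmis]

# Reconstructed from the printed responses for "[MASK] is [MASK] capital city of Japan."
responses = ([["tokyo", "the"]] * 12 + [["tokyo", "tokyos"]] * 5
             + [["tokyo", "[mask]"]] + [["", ""]] * 2)
print(pmi_from_responses(responses, "tokyo", "the", 20))
```

With these 20 responses the sketch gives $P(x) = 0.9$, $P(y) = P(x,y) = 0.6$ and a PMI of about 0.152, matching the `the` column of the table.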


In [5]:
#print (all_responses)
for reponse in all_responses:
    print(reponse)

[['tokyo', 'is'], ['tokyo', 'is'], ['tokyo', 'is'], ['tokyo', 'is'], ['tokyo', 'is'], ['tokyo', 'is'], ['tokyo', 'is'], ['tokyo', 'is'], ['tokyo', 'is'], ['tokyo', 'is'], ['tokyo', 'is'], ['tokyo', 'is'], ['tokyo', 'is'], ['tokyo', 'is'], ['tokyo', 'is'], ['tokyo', 'is'], ['tokyo', 'is'], ['tokyo', 'is'], ['tokyo', 'is'], ['tokyo', 'is']]
[['tokyo', 'tokyos'], ['tokyo', 'tokyos'], ['tokyo', 'tokyos'], ['tokyo', 'the'], ['tokyo', 'the'], ['tokyo', '[mask]'], ['tokyo', 'the'], ['', ''], ['tokyo', 'tokyos'], ['tokyo', 'the'], ['tokyo', 'tokyos'], ['tokyo', 'the'], ['tokyo', 'the'], ['tokyo', 'the'], ['tokyo', 'the'], ['tokyo', 'the'], ['', ''], ['tokyo', 'the'], ['tokyo', 'the'], ['tokyo', 'the']]
[['osaka', 'third largest'], ['osaka', 'second largest'], ['tokyo', 'capital'], ['osaka', 'second largest'], ['tokyo', 'capital'], ['kyoto', 'former [mask]'], ['osaka', 'secondlargest'], ['hiroshima', 'third largest'], ['osaka', 'second largest'], ['osaka', 'second largest'], ['osaka', 'second l

```python
from sentences import calculate_pmis

p_df = calculate_pmis(sentence, all_responses, anchor_word_idx, prompts_per_word)
```




```python
# Print the px, py, pxy, PMI and saliency table
print(p_df)

# Print the sentence colored by its saliency values
from sentences import colorize_sentence

saliency = p_df.loc['saliency'].values
colored_sentence = colorize_sentence(sentence, saliency)
print(colored_sentence)
```

```
          tokyo            is       the  capital      city            of  \
px          NaN  1.000000e+00  0.900000     0.25  0.950000  1.000000e+00   
py          NaN  1.000000e+00  0.600000     0.20  0.400000  1.000000e+00   
pxy         NaN  1.000000e+00  0.600000     0.20  0.400000  1.000000e+00   
pmi         NaN -7.213476e-12  0.152003     2.00  0.074001 -7.213476e-12   
saliency    NaN -3.606738e-12  0.076002     1.00  0.037000 -3.606738e-12   

             japan  
px        0.450000  
py        0.450000  
pxy       0.450000  
pmi       1.152003  
saliency  0.576002  

Tokyo is the capital city of Japan.
```

In the terminal, *capital* and *Japan.* are printed in red and the other words in green.
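The colored sentence is produced with ANSI escape codes (`31` for red, `32` for green). The sketch below shows a hypothetical two-color version. The function `colorize` is not the project's `colorize_sentence`, and its 0.5 threshold is an assumption chosen so that the output above is reproduced. For a finer-grained picture, you could map each score to a color gradient instead of a single threshold.

```python
RED, GREEN, RESET = "\033[31m", "\033[32m", "\033[0m"  # standard ANSI color codes

def colorize(sentence, saliency, threshold=0.5):
    """Color each whitespace-separated word red if its saliency is at least
    `threshold`, and green otherwise. NaN (e.g. the anchor word) compares
    False, so it ends up green."""
    colored = []
    for word, score in zip(sentence.split(), saliency):
        color = RED if score >= threshold else GREEN
        colored.append(f"{color}{word}{RESET}")
    return " ".join(colored)

saliency = [float("nan"), 0.0, 0.076, 1.0, 0.037, 0.0, 0.576]
print(colorize("Tokyo is the capital city of Japan.", saliency))
```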



## 4. EXERCISE: Try more examples; maybe come up with your own. Report the results.

* Come up with more examples, change the anchor word and the number of responses, and observe the results. What does the explanation mean? Do you think it is a good explanation? Why or why not?
* What are the limitations of the current method? When does it fail to explain?

## 5. Bonus Exercises
### 5.1 Language pre-processing
In this exercise, we only lowercase the text and split sentences into words. Language pre-processing involves much more than this, for example handling contractions (*I'll*, *She's*, *world's*), suffixes and prefixes, and compound words (*hard-working*). In NLP this step is called word tokenization. Python packages such as [*TextBlob*](https://textblob.readthedocs.io/en/dev/) can do it for us.
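To see what tokenization changes, here is a rough stand-in for the simple "lowercase and split" approach. It is an assumption, because the project's own pre-processing code is not shown, and `naive_tokenize` is a hypothetical helper. A regular expression keeps runs of letters, digits and apostrophes. As a result, the contraction in *Japan's* stays glued to its noun, while *hard-working* is broken in two. TextBlob does the opposite in the next cell.

```python
import re

def naive_tokenize(sentence):
    """Lowercase, then keep runs of letters, digits and apostrophes.
    Hypothetical approximation of 'lowercase and split into words'."""
    return re.findall(r"[a-z0-9']+", sentence.lower())

print(naive_tokenize("Japan's capital is Tokyo."))
print(naive_tokenize("Japanese people are hard-working."))
```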




```python
from textblob import TextBlob

sentence = "Japan's capital is Tokyo."
print(TextBlob(sentence).words)  # the possessive "'s" becomes its own token

# Index 4 is 'Tokyo' in the TextBlob word list printed for this sentence
prompts = generate_prompts(sentence, 4, True)

sentence = "Japanese people are hard-working."
print(TextBlob(sentence).words)  # the compound word stays a single token

all_responses = run_prompts(model, sentence, anchor_word_idx, prompts_per_word)
```

```
['Japan', "'s", 'capital', 'is', 'Tokyo']
['Japanese', 'people', 'are', 'hard-working']
```


### 5.2 Better word matching
Consider the longer example above:
> Tokyo is the capital of Japan and a popular metropolis in the world.

When 'metropolis' is masked out, the generative model never produces that exact word. Instead, it sometimes gives words like 'city', which is a different word with a similar meaning. Rather than measuring exact matches of words (i.e. 0 or 1), we can measure the similarity of two words, e.g. the cosine similarity of their word embeddings. Cosine similarity ranges from -1 to 1, so negative values need to be clipped (or the score rescaled) to obtain a match score between 0 and 1.
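A minimal sketch of this soft matching in plain Python. The 3-dimensional vectors are invented purely for illustration; a real version would look words up in pretrained embeddings such as word2vec or GloVe. `soft_probability` is a hypothetical drop-in for the 0/1 counts used to estimate $P(y)$. Each fill-in contributes its clipped cosine similarity to the target word instead of 1 or 0.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def soft_match(u, v):
    """Clip negative similarities to 0 so the match score lies in [0, 1]."""
    return max(0.0, cosine_similarity(u, v))

# Toy embeddings, invented for illustration only (not real word vectors)
toy_embeddings = {
    "metropolis": [0.9, 0.4, 0.1],
    "city": [0.8, 0.5, 0.2],
    "banana": [-0.2, 0.1, 0.9],
}

def soft_probability(fills, target, emb):
    """Average match score of the model's fill-ins against the target word.
    Fill-ins without an embedding score 0, like a failed exact match."""
    if not fills:
        return 0.0
    scores = [soft_match(emb[f], emb[target]) if f in emb else 0.0 for f in fills]
    return sum(scores) / len(fills)

fills = ["city", "city", "metropolis", "banana"]
# With exact matching, P(y) here would be 1/4 = 0.25
print(soft_probability(fills, "metropolis", toy_embeddings))
```

Here the two 'city' fill-ins count almost as much as 'metropolis' itself, while the unrelated 'banana' has a negative cosine similarity and is clipped to 0.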