## Interpreting a Text Classifier

As a data scientist at a movie streaming company, you've been tasked with finding ways to improve the platform's movie recommendation system. One of the biggest challenges is understanding how users feel about the movies they watch. Sure, you can look at the star ratings they give, but those aren't always the most informative. That's why you've decided to dig deeper and use interpretable AI to analyze movie reviews and understand the sentiment behind them.

<center><img src='https://media.tenor.com/VF5vI70hNv0AAAAC/film-izlemek.gif'></center>

You are given an NLP model that was trained on millions of tweets, which are not necessarily movie reviews. The model extracts features, such as the presence of certain words or phrases that might indicate a positive or negative opinion, and predicts the sentiment of a review based on these features. But the model's predictions aren't enough for you. You want to understand why it makes the predictions it does. So you use interpretable AI techniques like LIME and SHAP to understand the most important factors that drive a positive or negative opinion.

With this information in hand, your aim is to create a visualization that clearly illustrates the key factors that drive positive or negative opinions about a film. This way, you and the team can easily identify which movies are likely to be well-received by users and which ones are likely to be overlooked. By using interpretable AI to understand movie reviews, your goal is to create a powerful tool that will help the company make better movie recommendations and keep users coming back for more awesome content. Good luck on this journey!

## Installation and Imports
We will use [Ferret](https://ferret.readthedocs.io/en/latest/readme.html), a Python package for benchmarking interpretability techniques on Transformers.

In [None]:
!pip install -U ferret-xai

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting ferret-xai
  Downloading ferret_xai-0.4.1-py2.py3-none-any.whl (52 kB)
Collecting transformers
  Downloading transformers-4.25.1-py3-none-any.whl (5.8 MB)
Collecting pytreebank
  Downloading pytreebank-0.2.7.tar.gz (34 kB)
  Preparing metadata (setup.py) ... done
Collecting datasets
  Downloading datasets-2.8.0-py3-none-any.whl (452 kB)
Collecting shap
  Downloading shap-0.41.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (575 kB)

In [None]:
## The Usual Suspects
import pandas as pd
import numpy as np  
import torch

## Transformer Models
from transformers import AutoModelForSequenceClassification, AutoTokenizer
## Ferret Benchmarker
from ferret import Benchmark

## Sentiment Analysis Model
For this project, let's use a sentiment analysis model published by Cardiff NLP on the Hugging Face 🤗 Hub. Specifically, we will use the pre-trained `twitter-XLM-roBERTa-base` model, which was trained on over 190M tweets and fine-tuned for sentiment analysis. We will use the Hugging Face `transformers` library to load the pre-trained model and tokenizer. See the documentation [here](https://huggingface.co/docs/transformers/v4.25.1/en/autoclass_tutorial#autotokenizer).

In [None]:
## Before we build our transformer, let's make sure to set up the device.
## To run this notebook on a GPU: Edit -> Notebook settings -> Hardware accelerator -> GPU
## If your GPU is working, device is "cuda"
device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cuda'

In [None]:
name = "cardiffnlp/twitter-xlm-roberta-base-sentiment" 

## TODO: Load the pre-trained model and tokenizer using AutoModelForSequenceClassification and AutoTokenizer
## Make sure to move the model onto the device so that it can use the GPU

# model = ...
# tokenizer = ...

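One possible way to complete the TODO above, as a minimal sketch rather than the reference solution. `load_sentiment_model` is a hypothetical helper name introduced here for illustration; `from_pretrained` and `.to(device)` are the standard `transformers` and PyTorch calls.

```python
def load_sentiment_model(name, device="cpu"):
    """Load a sequence-classification checkpoint and its tokenizer."""
    # Imported lazily so that defining this helper neither requires the
    # library to be installed nor triggers a download.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    model = AutoModelForSequenceClassification.from_pretrained(name)
    model = model.to(device)   # move the weights to the GPU when one is available
    model.eval()               # inference only, so disable dropout
    tokenizer = AutoTokenizer.from_pretrained(name)
    return model, tokenizer
```

With the variables defined earlier, `model, tokenizer = load_sentiment_model(name, device)` would download the checkpoint on first use and cache it afterwards.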

### Build Explainer
Using Ferret XAI, we will benchmark explanations for our pre-trained Hugging Face model. See the [Benchmark](https://ferret.readthedocs.io/en/latest/readme.html#visualization) documentation.

In [None]:
## TODO: Build the explainer using the `Benchmark` class
# explainer = ...

#### Use the `score` method to predict the overall sentiment for a sample text

In [None]:
from transformers import TextClassificationPipeline
sample_text = "The movie had great narration and visuals despite a boring storyline."

## TODO: Use the `score` method to obtain the class scores and print them
### print(...)

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


{'negative': 0.09874014556407928,
 'neutral': 0.13105592131614685,
 'positive': 0.7702038884162903}
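The three scores above sum to 1, which is what a softmax over the classifier's output logits produces. Here is a minimal sketch of that last step, assuming this is how the class scores are derived. The logits below are hypothetical values chosen only for illustration; they are not taken from the model.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    shift = max(logits)                            # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits, for illustration only.
labels = ["negative", "neutral", "positive"]
logits = [-0.9, -0.6, 1.2]

scores = dict(zip(labels, softmax(logits)))
print(scores)
```

The class with the largest logit always receives the largest probability, so the predicted label does not change; softmax only makes the scores comparable across texts.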

### Generate Explanations using the Explainer
Notice that the sentiment for the `sample_text` is positive overall, with small scores for the 'neutral' and 'negative' classes. Let us use the Ferret XAI explainers to understand the model's predictions. Ferret has various built-in post-hoc explainers, which are variants of the ones we studied and used in [Week 1](https://corise.com/course/interpreting-machine-learning-models/v2/module/interpreting-an-image-classifier) for image classification models. Here, we will use the same sample movie review and generate explanations for the different sentiment classes (positive, negative and neutral).

#### Generate explanations for positive class. 



In [None]:
## TODO: Generate explanations for the positive class and show them in a table
## Hint: use the `target` argument to specify the class as an integer. Note the order of the three classes in the scores above

# explain_posclass = ...
# explainer.show_table(...)


| Token | ▁The | ▁movie | ▁had | ▁great | ▁narra | tion | ▁and | ▁visual | s | ▁de | spite | ▁a | ▁bor | ing | ▁story | line | . |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Partition SHAP | 0.01 | 0.07 | -0.02 | 0.29 | 0.08 | 0.02 | 0.05 | 0.06 | 0.03 | 0.04 | 0.12 | 0.01 | -0.07 | -0.08 | 0.03 | 0.02 | -0.02 |
| LIME | -0.0 | 0.07 | 0.05 | 0.27 | 0.05 | 0.02 | 0.06 | 0.1 | 0.03 | 0.05 | 0.04 | 0.02 | -0.1 | -0.04 | 0.01 | 0.05 | 0.05 |
| Gradient | 0.03 | 0.07 | 0.05 | 0.07 | 0.06 | 0.03 | 0.03 | 0.08 | 0.02 | 0.03 | 0.12 | 0.02 | 0.13 | 0.04 | 0.06 | 0.05 | 0.02 |
| Gradient (x Input) | -0.06 | -0.1 | 0.0 | -0.0 | -0.01 | -0.03 | -0.03 | 0.07 | 0.04 | 0.05 | -0.04 | 0.01 | 0.19 | 0.06 | -0.03 | -0.1 | 0.03 |
| Integrated Gradient | -0.03 | 0.01 | -0.13 | -0.15 | -0.1 | 0.01 | 0.0 | -0.02 | 0.02 | -0.07 | 0.0 | -0.04 | -0.03 | 0.0 | -0.02 | 0.03 | -0.01 |
| Integrated Gradient (x Input) | -0.03 | 0.08 | 0.12 | 0.19 | 0.06 | 0.04 | 0.06 | 0.05 | 0.0 | -0.0 | -0.09 | 0.05 | -0.03 | 0.01 | 0.03 | 0.04 | 0.12 |
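The explainers attribute scores to sub-word pieces (for example `▁narra` and `tion`), which can be hard to read. A minimal sketch of one way to aggregate them into word-level scores, using the Partition SHAP row for the positive class shown above. `merge_subwords` is a hypothetical helper, and summing the pieces is one common choice, not the only one.

```python
def merge_subwords(tokens, scores, marker="▁"):
    """Sum the attributions of SentencePiece pieces that belong to the same word."""
    words = []
    for token, score in zip(tokens, scores):
        if token.startswith(marker) or not words:
            words.append([token.lstrip(marker), score])   # a new word begins
        else:
            words[-1][0] += token                          # continuation piece
            words[-1][1] += score
    return [(word, round(score, 2)) for word, score in words]

# Tokens and Partition SHAP values copied from the positive-class table.
tokens = ["▁The", "▁movie", "▁had", "▁great", "▁narra", "tion", "▁and", "▁visual", "s",
          "▁de", "spite", "▁a", "▁bor", "ing", "▁story", "line", "."]
shap_positive = [0.01, 0.07, -0.02, 0.29, 0.08, 0.02, 0.05, 0.06, 0.03,
                 0.04, 0.12, 0.01, -0.07, -0.08, 0.03, 0.02, -0.02]

word_scores = merge_subwords(tokens, shap_positive)

# Rank words from most supportive to most opposed to the positive class.
# Note: the final "." has no word marker, so it stays attached to "storyline".
for word, score in sorted(word_scores, key=lambda pair: pair[1], reverse=True):
    print(f"{word:<12} {score:+.2f}")
```

With these values, "great" ranks first and "boring" last, which matches the intuition for a positive prediction.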


#### Generate explanations for negative class. 

In [None]:
## TODO: Generate explanation for negative class and show the explanations in a table

# explain_negclass = ...
# explainer.show_table(...)


| Token | ▁The | ▁movie | ▁had | ▁great | ▁narra | tion | ▁and | ▁visual | s | ▁de | spite | ▁a | ▁bor | ing | ▁story | line | . |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Partition SHAP | -0.0 | -0.04 | 0.01 | -0.19 | -0.06 | 0.0 | -0.04 | -0.04 | -0.0 | -0.06 | -0.2 | -0.01 | 0.12 | 0.15 | -0.01 | 0.02 | 0.03 |
| LIME | 0.03 | -0.03 | -0.02 | -0.24 | -0.02 | 0.01 | -0.01 | -0.08 | -0.07 | -0.12 | -0.08 | -0.02 | 0.12 | 0.08 | -0.01 | -0.01 | -0.03 |
| Gradient | 0.03 | 0.06 | 0.04 | 0.06 | 0.06 | 0.03 | 0.03 | 0.07 | 0.02 | 0.03 | 0.11 | 0.02 | 0.16 | 0.05 | 0.06 | 0.05 | 0.02 |
| Gradient (x Input) | 0.07 | 0.07 | -0.0 | 0.06 | -0.02 | 0.02 | 0.04 | -0.06 | -0.03 | -0.02 | 0.05 | -0.01 | -0.15 | -0.07 | 0.03 | 0.09 | -0.02 |
| Integrated Gradient | -0.02 | -0.04 | 0.0 | 0.02 | 0.03 | -0.05 | -0.01 | -0.0 | 0.08 | 0.12 | 0.01 | 0.12 | -0.2 | 0.03 | 0.06 | -0.03 | 0.08 |
| Integrated Gradient (x Input) | 0.06 | -0.0 | -0.04 | -0.13 | -0.07 | -0.04 | -0.08 | -0.09 | 0.04 | 0.01 | 0.15 | -0.06 | 0.11 | 0.01 | -0.02 | -0.04 | -0.05 |
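Since the same `explain` call is repeated once per class, it can be wrapped in a small helper. A minimal sketch, assuming `bench` is the Ferret `Benchmark` object built earlier. `explain_all_classes` and `ID2LABEL` are hypothetical names, and the label order is an assumption that should be checked against `model.config.id2label`.

```python
def explain_all_classes(bench, text, id2label):
    """Return {label: explanations} by calling `bench.explain` once per class."""
    return {
        label: bench.explain(text, target=class_id)
        for class_id, label in id2label.items()
    }

# Assumed label order for this checkpoint; verify it with `model.config.id2label`.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}
```

With the explainer from this notebook, `explain_all_classes(explainer, sample_text, ID2LABEL)["negative"]` could then be passed to `explainer.show_table` to render the table for the negative class.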


## Leave-one-out
Besides the explainers above, another standard technique is the erasure method, also called leave-one-out. Here we delete one word at a time from the text and measure the change in the predicted probabilities. Let us create our own whitespace tokenizer.

In [None]:
sample_text = "The movie had great narration and visuals despite a boring storyline"

## TODO: Tokenize the text by splitting it into words. Then generate one sentence per word by leaving that word out
## Make sure the generated sentences have no additional white spaces

# tokenize_text = ...
# loo_texts = [... for word in tokenize_text]
# print(loo_texts)

['movie had great narration and visuals despite a boring storyline',
 'The had great narration and visuals despite a boring storyline',
 'The movie great narration and visuals despite a boring storyline',
 'The movie had narration and visuals despite a boring storyline',
 'The movie had great and visuals despite a boring storyline',
 'The movie had great narration visuals despite a boring storyline',
 'The movie had great narration and despite a boring storyline',
 'The movie had great narration and visuals a boring storyline',
 'The movie had great narration and visuals despite boring storyline',
 'The movie had great narration and visuals despite a storyline',
 'The movie had great narration and visuals despite a boring']
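A minimal sketch of one way to build the leave-one-out sentences. `leave_one_out` is a hypothetical helper name. It removes words by position rather than by string matching, so it also handles the last word and words that occur more than once.

```python
def leave_one_out(text):
    """Return the words of `text` and one variant of the text per omitted word."""
    words = text.split()   # whitespace tokenizer; also collapses repeated spaces
    variants = [" ".join(words[:i] + words[i + 1:]) for i in range(len(words))]
    return words, variants

sample_text = "The movie had great narration and visuals despite a boring storyline"
words, loo_texts = leave_one_out(sample_text)

print(loo_texts[0])    # the first word is omitted
print(loo_texts[-1])   # the last word is omitted
```

Joining with a single space guarantees that no variant contains leading, trailing or doubled spaces.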

In [None]:
## TODO: Score each leave-one-out sentence and tabulate the scores in a DataFrame indexed by the omitted word

# scores = [... for text in loo_texts]
# pd.DataFrame(..., index=...)

| Omitted word | negative | neutral | positive |
|---|---|---|---|
| The | 0.22385 | 0.206326 | 0.569824 |
| movie | 0.263907 | 0.219848 | 0.516245 |
| had | 0.120151 | 0.142724 | 0.737124 |
| great | 0.287786 | 0.29973 | 0.412484 |
| narration | 0.267983 | 0.214153 | 0.517864 |
| and | 0.188767 | 0.190426 | 0.620807 |
| visuals | 0.184359 | 0.182071 | 0.633569 |
| despite | 0.859677 | 0.100246 | 0.040077 |
| a | 0.112969 | 0.161386 | 0.725645 |
| boring | 0.355356 | 0.232932 | 0.411712 |
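To turn these scores into word importances, we can compare each row with the score of the full review. A minimal sketch using the positive-class column above. The baseline is an assumption: it reuses the positive score reported earlier for the version of the review that ends with a period, which is not exactly the text used here.

```python
# Positive-class probabilities copied from the leave-one-out table above.
loo_positive = {
    "The": 0.569824, "movie": 0.516245, "had": 0.737124, "great": 0.412484,
    "narration": 0.517864, "and": 0.620807, "visuals": 0.633569,
    "despite": 0.040077, "a": 0.725645, "boring": 0.411712,
}

# Assumption: positive score of the full review, taken from the earlier `score` output.
baseline_positive = 0.7702038884162903

# Importance of a word = drop in positive probability when that word is omitted.
importance = {word: baseline_positive - p for word, p in loo_positive.items()}
ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)

for word, drop in ranked:
    print(f"{word:<10} {drop:+.3f}")
```

Under this baseline, "despite" dominates: omitting it pushes the negative score to 0.86 in the table, which suggests the model relies heavily on the contrast marker.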


## Open-ended Explanations Using Language Models (LMs)

Besides the conventional methods of analyzing a given text, we will try open-ended language models to explain why a review has a particular sentiment. Let us take the previous `sample_text` and extend it slightly so that it ends in an incomplete sentence. Then, we will use BLOOM to complete that sentence. The architecture of BLOOM is essentially similar to that of GPT-3, and the full model has 176B parameters. However, we will use a variant of BLOOM with 560M parameters, which generates text faster. For documentation of the generator, refer [here](https://huggingface.co/docs/transformers/main_classes/text_generation).

In [None]:
# Import BLOOM tokenizer and generator from transformers
from transformers import BloomTokenizerFast, BloomForCausalLM
tokenizer = BloomTokenizerFast.from_pretrained("bigscience/bloom-560m")
model = BloomForCausalLM.from_pretrained("bigscience/bloom-560m")

# Let's add text at the end of the sample text to ask why the sentence has a positive sentiment. Also define an output length
prompt_text = sample_text + " has a positive sentiment. This is because"
print(prompt_text)
output_length = 100 # Feel free to change this


The movie had great narration and visuals despite a boring storyline has a positive sentiment. This is because


In [None]:
## TODO: Tokenize the sentence as tensors and then use the model to generate a complete sentence with a max length
# inputs = tokenizer(..., return_tensors="pt")
# gen1 = model.generate(..., max_length=...)[0]
# print(tokenizer.decode(...))

The movie had great narration and visuals despite a boring storyline has a positive sentiment. This is because the movie is a comedy and the characters are not bad. The movie is a good movie for the youngsters. The movie is a good movie for the adults. The movie is a good movie for the seniors. The movie is a good movie for the parents. The movie is a good movie for the teachers. The movie is a good movie for the students. The movie is a good movie


Notice that the sentences are somewhat repetitive. Let us avoid this by adding a penalty for repetition. To understand the repetition penalty (penalized sampling), refer to section 4 in this [paper](https://arxiv.org/pdf/1909.05858.pdf).
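The idea behind the repetition penalty can be sketched in a few lines: the logits of tokens that were already generated are lowered before the next token is chosen. This is a simplified illustration, not the library's implementation. The logits and the penalty of 1.2 are hypothetical values. Lowering a positive logit by dividing and a negative one by multiplying follows the convention used in `transformers`, to the best of our knowledge.

```python
import math

def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Lower the logits of tokens that were already generated."""
    penalized = list(logits)
    for token_id in set(generated_ids):
        value = penalized[token_id]
        # Dividing a positive logit or multiplying a negative one both lower it.
        penalized[token_id] = value / penalty if value > 0 else value * penalty
    return penalized

def softmax(logits):
    """Convert logits into probabilities that sum to 1."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, -1.0, 0.5]   # hypothetical scores for a 4-token vocabulary
generated_ids = [0, 2]           # tokens 0 and 2 already appear in the output

before = softmax(logits)
after = softmax(apply_repetition_penalty(logits, generated_ids, penalty=1.2))
print([round(p, 3) for p in before])
print([round(p, 3) for p in after])
```

A penalty of 1.0 leaves the distribution unchanged, while larger values make repeated tokens progressively less likely.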

In [None]:
## TODO: Add a penalty for repetition
# gen2 = model.generate(..., max_length=..., repetition_penalty=...)[0]
# print(tokenizer.decode(...))

The movie had great narration and visuals despite a boring storyline has a positive sentiment. This is because the film was written by an actor who plays himself in his own life, which makes it more realistic.
In this case we have to say that there are some scenes where you can see how he feels about being alone with her (and not having any other girl). He also talks openly on what happened during their relationship but does so without making him feel guilty or ashamed of anything else as well. (


In [None]:
## TODO: Also enable sampling when predicting the next word, so that the generated text reads more naturally
# gen3 = model.generate(..., max_length=..., repetition_penalty=..., do_sample=...)[0]
# print(tokenizer.decode(...))

The movie had great narration and visuals despite a boring storyline has a positive sentiment. This is because when you watch this film, there´s nothing wrong with it.
This makes the main character look strong but at home in reality he does not have that strength for real life.. He even says something like “Heshe said goodbye to us”! It might be difficult on an uninformed person which means they can’t understand what his feelings are behind!
Another interesting aspect of HESHE was


In text generation, the model picks each next word from a huge vocabulary. While always selecting the word with the highest likelihood results in repetitions, sampling at random from the whole vocabulary may lead to vague or uncommon sentences. A common practice is to use top-k or top-p approaches (see this [article](https://docs.cohere.ai/docs/controlling-generation-with-top-k-top-p)) to limit the pool of candidate words and cut off the long tail. Now, explore the generator by tuning these hyper-parameters. Refer to the [blog post](https://huggingface.co/blog/how-to-generate) on 'how to generate'.
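To make the two filters concrete, here is a minimal sketch that applies top-k and then top-p to a toy next-word distribution. `top_k_top_p_filter` is a hypothetical helper, and the words and probabilities are made up for illustration. The library applies the same idea to logits over the full vocabulary.

```python
import random

def top_k_top_p_filter(probs, top_k=0, top_p=1.0):
    """Keep only the most likely tokens and renormalize their probabilities.

    probs: dict mapping token -> probability.
    top_k: keep at most this many tokens (0 disables the limit).
    top_p: keep the smallest set of top tokens whose cumulative mass reaches top_p.
    """
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]
    total = sum(p for _, p in ranked)
    ranked = [(token, p / total) for token, p in ranked]   # renormalize after top-k

    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:                            # nucleus is complete
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}

# Made-up next-word distribution with a long tail.
next_word = {"good": 0.5, "great": 0.3, "fine": 0.15, "purple": 0.04, "zebra": 0.01}

filtered = top_k_top_p_filter(next_word, top_k=3, top_p=0.8)
print(filtered)

# Sample the next word from the filtered distribution.
rng = random.Random(0)
print(rng.choices(list(filtered), weights=list(filtered.values()), k=1)[0])
```

Here top-k first discards the two unlikely tail words, and top-p then keeps only the words needed to cover 80% of the remaining probability mass.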

In [None]:
## TODO: Tweak the hyper-parameters for our use case.
## Use top_p and top_k to have better sampling of words
# gen4 = model.generate(..., max_length=..., ...)[0]
# print(tokenizer.decode(gen4))
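One possible shape for the generation call, as a minimal sketch rather than the reference solution. `generate_explanation` is a hypothetical helper, and the default values are illustrative starting points, not tuned settings. The keyword arguments are standard `generate` parameters in `transformers`.

```python
def generate_explanation(model, tokenizer, prompt, max_length=100,
                         repetition_penalty=1.2, top_k=50, top_p=0.95):
    """Continue `prompt` using sampling restricted by top-k and top-p."""
    # If the model lives on a GPU, the inputs must be moved to the same device first.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_length=max_length,                   # prompt plus continuation, in tokens
        do_sample=True,                          # sample instead of greedy decoding
        top_k=top_k,                             # keep only the k most likely tokens
        top_p=top_p,                             # then keep the smallest nucleus of mass p
        repetition_penalty=repetition_penalty,   # discourage repeated tokens
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With the BLOOM objects loaded earlier, `print(generate_explanation(model, tokenizer, prompt_text, max_length=output_length))` would print one sampled continuation. Because sampling is random, each run can produce a different text.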

## Outro

Well done, Data Scientist! Now that we've seen different methods for analyzing the sentiment predictions, it's time to answer some questions!

1. What are your thoughts on Interpretable AI for Text Classification?
2. Compare the various explanations. Which method do you agree with most, and why?
3. Do you think the language models (open-ended explanations) capture the sentiment well and explain it? Did tuning the generation parameters help, and what worked best for you?

## Bonus
Kudos👏! It is amazing you made it here. In the bonus section, let us apply our model to some real world data. 



*   You can either use the Hugging Face `imdb` dataset or a review of any movie from anywhere. Get the sentiment for the review and see which words are most important for it according to the methods we used earlier.
*   The `cardiffnlp/twitter-xlm-roberta-base-sentiment` model is not trained on the imdb dataset. Let us see if using a model trained on the imdb dataset gives us better results. You can try `distilbert-imdb` from [here](https://huggingface.co/lvwerra/distilbert-imdb) or other models trained/fine-tuned on imdb from [here](https://huggingface.co/datasets/imdb).



---


Answer the following questions once you complete the analysis:

1. Do you think XAI is useful to understand the sentiment behind real-world movie reviews? 
2. Based on your observations, would your recommendation change from what it was previously?
3. Does training/fine-tuning help a model to understand and interpret sentiment better?

In [None]:
## Here is starter code to download the imdb dataset. You could also try any review for any movie.
# from datasets import load_dataset
# dataset = load_dataset("imdb")
# dataset['test'][10]