# Social Computing/Social Gaming - Summer 2023
# Exercise Sheet 6 - Transformers and Explainable AI

After working out the base and social models on the previous sheet, we will now look at a model with a different approach: Transformer-based classifiers. Transformers revolutionized the NLP field after the paper ["Attention Is All You Need"](https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf) was published. Later, the [HuggingFace](https://huggingface.co/) platform [2] was created to store and host a large number of open-source pre-trained NLP models. We will work with some of them today.

Furthermore, we will use SHAP and its underlying Shapley values to understand the basics of neural network explainability. This [site](https://christophm.github.io/interpretable-ml-book/shapley.html) [4] will help you understand the concept.

In [None]:
import pandas as pd
import numpy as np

## Task 6.0: The Data

Once again we will use the dataset of Waseem and Hovy [1] as you are already familiar with it and it offers us the possibility to compare it to our previous work.

In [None]:
# Reads the data set from a .csv file
waseem_hovy = pd.read_csv(data_path+'tweets.csv')
waseem_hovy = waseem_hovy.astype(str)

# This drop operation is necessary because of an inconsistency in the dataset
waseem_hovy = waseem_hovy.drop([3343, 3344])
waseem_hovy = waseem_hovy[['text', 'label']]

# We need to do a unique and precise reordering to match with graph information later on
unique_tweets, indices = np.unique(waseem_hovy['text'].to_numpy(), return_index=True)
ordered_labels = waseem_hovy['label'].to_numpy()[indices]
waseem_hovy = pd.DataFrame(np.stack((unique_tweets, ordered_labels), axis=1), columns=['text', 'label'])

## Task 6.1: Preprocessing

### Encode the labels
Since we are using the same dataset, we need to convert the labels from their textual representation into a numerical one. For this task (almost the same as on the previous sheet), we can use [LabelEncoder](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html).
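
As a minimal sketch of what the encoder does, the toy example below fits a `LabelEncoder` on a handful of hand-written labels. The label strings are assumed to match the three classes of the dataset (`none`, `racism`, `sexism`), and the `toy_*` names are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Toy labels; assumed to mirror the three classes in the dataset
toy_labels = np.array(["sexism", "none", "racism", "none", "sexism"])

encoder = LabelEncoder()
toy_encoded = encoder.fit_transform(toy_labels)

# classes_ is sorted, so each integer is the position of the label in classes_
print(encoder.classes_)
print(toy_encoded)

# The mapping can be reversed to recover the original strings
print(encoder.inverse_transform(toy_encoded))
```

Because the classes are sorted alphabetically, `sexism` would end up at index 2 under this assumption.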

In [None]:
from sklearn.preprocessing import LabelEncoder

In [None]:
# Extract labels from dataset

original_labels = np.array(waseem_hovy["label"].tolist())

# TODO: encode original_labels into integers and store the result in data_labels


# Shows the actual shape of the labels
print(original_labels.shape)
print(np.unique(original_labels))
print(np.unique(data_labels))

When using transformers, we do not need to preprocess and tokenize the sentences ourselves: the model's tokenizer does that for us. So let us dive into transformers right now!

## BERT for Toxic speech classification

We will fine-tune a BERT model [3] on our downstream task, more specifically its distilled version, [DistilBERT](https://huggingface.co/distilbert-base-uncased).

But first, we need to install the [transformers](https://huggingface.co/learn/nlp-course/chapter2/1?fw=pt) library:

In [None]:
# example of transformers library installation

#!pip install transformers

In [None]:
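# The Auto* classes pick the matching architecture for the chosen checkpoint (e.g. DistilBERT)
from transformers import AutoTokenizer, AutoModelForSequenceClassification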

In [None]:
# TODO: initialize the tokenizer and model for DistilBERT or
# for another model of your choice -- you are very welcome to try out something different!

tokenizer = ##
model = ##

### Training Batches Preparation

As in the previous tutorial, we will create custom datasets and loaders to generate batches for training. However, we need to adapt them to the transformer's input:

1.   Each dataset item should return ``input_ids``, ``attention_mask``, and ``label``.
2.   All of them should be ``tensors``.
3.   Finally, you need to apply ``collate_fn``, which pads all tensors in a batch to the length of the longest sequence in that batch (already implemented for you).
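
To illustrate the item format listed above, here is a small sketch that uses made-up token ids instead of a real tokenizer. It pads two hypothetical items of different lengths with the same `pad_sequence` call that `collate_fn` relies on.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Two hypothetical dataset items with made-up token ids of different lengths
items = [
    {"input_ids": torch.tensor([101, 7592, 102]),
     "attention_mask": torch.tensor([1, 1, 1]),
     "label": torch.tensor(0)},
    {"input_ids": torch.tensor([101, 7592, 2088, 999, 102]),
     "attention_mask": torch.tensor([1, 1, 1, 1, 1]),
     "label": torch.tensor(2)},
]

# pad_sequence fills shorter sequences with 0 up to the longest one in the batch
padded_ids = pad_sequence([item["input_ids"] for item in items], batch_first=True)
padded_mask = pad_sequence([item["attention_mask"] for item in items], batch_first=True)
labels = torch.stack([item["label"] for item in items])

print(padded_ids.shape)
print(padded_mask)
print(labels)
```

The padded positions receive an attention mask of 0, so the model ignores them.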



In [None]:
import torch
from torch.utils.data import Dataset
from torch.utils.data import DataLoader

In [None]:
# Special function that pads the sequences in a batch for the data loader

def collate_fn(batch):
    input_ids = torch.nn.utils.rnn.pad_sequence([item['input_ids'] for item in batch], batch_first=True)
    attention_mask = torch.nn.utils.rnn.pad_sequence([item['attention_mask'] for item in batch], batch_first=True)
    labels = torch.stack([item['label'] for item in batch])

    return {
        'input_ids': input_ids,
        'attention_mask': attention_mask,
        'label': labels
    }

In [None]:
# TODO: create your CustomDataset

class CustomDataset(Dataset):
    def __init__(self, texts, labels, tokenizer):
        # TODO

    def __len__(self):
        # TODO

    def __getitem__(self, index):
        # TODO


In [None]:
from sklearn.model_selection import train_test_split

# TODO: Split tweets and labels in Train/Test/Validation 60/20/20

###

print("Training data shape: {}, Labels shape: {}".format(X_train.shape, y_train.shape))
print("Test data shape: {}, Labels shape: {}".format(X_test.shape, y_test.shape))
print("Validation data shape: {}, Labels shape: {}".format(X_val.shape, y_val.shape))
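
One common way to obtain a 60/20/20 split is to call `train_test_split` twice: first hold out 40% of the data, then halve the held-out part. The sketch below shows this on dummy arrays; the `toy_*` names, the `random_state` value, and the use of `stratify` are assumptions, not requirements of the task.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy data standing in for the tweets and their encoded labels
toy_texts = np.array([f"tweet {i}" for i in range(100)])
toy_targets = np.array([i % 3 for i in range(100)])

# First split: 60% train, 40% held out
tr_x, rest_x, tr_y, rest_y = train_test_split(
    toy_texts, toy_targets, test_size=0.4, random_state=42, stratify=toy_targets)

# Second split: halve the held-out part into test and validation (20% each)
te_x, va_x, te_y, va_y = train_test_split(
    rest_x, rest_y, test_size=0.5, random_state=42, stratify=rest_y)

print(len(tr_x), len(te_x), len(va_x))
```

Stratifying keeps the class proportions similar in all three parts, which can help when the classes are imbalanced.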

In [None]:
BATCH_SIZE = 32

# Create the Datasets
train_dataset = CustomDataset(X_train, y_train, tokenizer)
val_dataset = CustomDataset(X_val, y_val, tokenizer)
test_dataset = CustomDataset(X_test, y_test, tokenizer)

# DataLoader for batching and parallel data loading
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, collate_fn=collate_fn)
val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False, collate_fn=collate_fn)
test_loader = DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False, collate_fn=collate_fn)

### Training Loop

We are ready to train the model! You can reuse the code from the previous tutorial:
1. Define an ``optimizer`` and a ``criterion`` for a classification task.
2. Use ``train_loader`` to sample batches for training.
3. Track the validation loss using data from ``val_loader``.
4. Caution: the items in the batches now have a different structure!
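
The steps above can be sketched on a toy problem. The example below is not the exercise solution: `ToyClassifier` is a hypothetical stand-in for the transformer, the batch is made up, and the learning rate and the choice of `AdamW` are assumptions. It only shows how a batch with the keys `input_ids`, `attention_mask`, and `label` flows through one optimization step.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class ToyClassifier(nn.Module):
    """Hypothetical stand-in for a transformer: embeds tokens and mean-pools them."""
    def __init__(self, vocab_size=50, hidden=16, num_classes=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        # Average only over positions that are not padding
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (self.embed(input_ids) * mask).sum(1) / mask.sum(1).clamp(min=1)
        return self.head(pooled)

# One made-up batch with the same keys that collate_fn returns
toy_batch = {
    "input_ids": torch.randint(1, 50, (4, 6)),
    "attention_mask": torch.ones(4, 6, dtype=torch.long),
    "label": torch.tensor([0, 1, 2, 0]),
}

toy_model = ToyClassifier()
toy_optimizer = torch.optim.AdamW(toy_model.parameters(), lr=1e-2)
toy_criterion = nn.CrossEntropyLoss()

losses = []
toy_model.train()
for epoch in range(20):
    for batch in [toy_batch]:  # in the exercise this would be train_loader
        toy_optimizer.zero_grad()
        # A Hugging Face model returns an output object; read its .logits there
        logits = toy_model(batch["input_ids"], batch["attention_mask"])
        loss = toy_criterion(logits, batch["label"])
        loss.backward()
        toy_optimizer.step()
        losses.append(loss.item())

print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

With a real model you would additionally move each tensor in the batch to `device` and evaluate on `val_loader` under `torch.no_grad()` after every epoch.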

In [None]:
import torch.nn as nn

In [None]:
# TODO: define optimizer and criterion

optimizer = ##
criterion = ##

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
model.train();

In [None]:
%%time

# TODO: implement training loop

# you can play with different number of epochs and check the model's performance!
num_epochs = 3

###

In [None]:
# TODO: Evaluate the model based on test_loader

###

## SHAP Explanations

We now have a decent model for toxic speech detection. However, it is not always clear why a sample is considered toxic or not. We can try to explain the model's decision! For this, we use [SHAP](https://shap.readthedocs.io/en/latest/index.html).
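
To build some intuition for the Shapley values behind SHAP, the sketch below computes them exactly for a tiny, made-up scoring function over three words. The function `toy_score` and its numbers are purely hypothetical; the point is only the averaging of marginal contributions over all word orderings.

```python
from itertools import permutations

# Hypothetical toy "model": a score for any subset of words (made-up numbers)
def toy_score(words):
    score = 0.0
    if "only" in words:
        score += 0.1
    if "men" in words:
        score += 0.2
    # Interaction: the two words together add extra score
    if "only" in words and "men" in words:
        score += 0.4
    return score

def shapley_values(players, value_fn):
    """Average each player's marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = set()
        for p in order:
            before = value_fn(coalition)
            coalition.add(p)
            totals[p] += value_fn(coalition) - before
    return {p: t / len(orderings) for p, t in totals.items()}

toy_words = ["only", "men", "can"]
phi = shapley_values(toy_words, toy_score)
print(phi)
```

The interaction bonus is shared between the two words involved, the uninvolved word gets zero, and the values sum to the score of the full sentence. Enumerating all orderings grows factorially with the number of words, which is why SHAP relies on approximations for real inputs.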

In [None]:
# example of module installation

# !pip install shap

In [None]:
import sys
# Adjust this path to the directory that contains explainability.py.
# sys.path entries must be directories, not individual .py files.
sys.path.append('/content/drive/MyDrive/tum/tum_exercises/Ex6_XAI_original/src')

In [None]:
import matplotlib.pyplot as plt
import shap
from explainability import shap_explain_text

# Initializes JavaScript to visualize plots generated by Shap
shap.initjs()

### Let's inspect our model!

#### Easy initialization and exploration of a cherry-picked sample.

In [None]:
import transformers

In [None]:
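# The SHAP explainer takes a text-classification pipeline as input.
# device=0 selects the first GPU; fall back to the CPU (-1) if none is available.

pred = transformers.pipeline("text-classification", model=model, tokenizer=tokenizer, device=0 if torch.cuda.is_available() else -1, return_all_scores=True)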

In [None]:
# Example of how to define an Explainer from SHAP

explainer = shap.Explainer(pred)
text = ['only men can have higher education']
shap_values = explainer(text)

In [None]:
shap.plots.text(shap_values)

Explanation objects also support slicing. Here we select the last axis with ``shap_values[:, :, 2]`` to show the impact of each word towards the “sexism” (``idx=2``) class. Note that we explain only a single example here; to get a better summary, you would want to explain a larger portion of the dataset.

In [None]:
shap.plots.text(shap_values[:, :, 2])

### We can also work with models already fine-tuned for sexism classification.

Several models are already fine-tuned for toxic or sexist speech detection. You can load, for instance, our model [bertweet-sexism](https://huggingface.co/tum-nlp/bertweet-sexism) and try to explain it as well!

In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

In [None]:
# TODO: define the tokenizer and model for the bertweet-sexism checkpoint.

###

Here is a small example on a cherry-picked sentence to explain the model:

In [None]:
# TODO: inspect the model on the sample!

**TODO: How can we tell whether one model is better than another?**
You can pick misclassified samples for each model and compare the explanations. Can you explain why the model made these mistakes? Are the mistakes the same across models? Try out other ways to use SHAP [here](https://shap.readthedocs.io/en/latest/example_notebooks/text_examples/sentiment_analysis/Emotion%20classification%20multiclass%20example.html). Perhaps other kinds of explanations are more useful?

**TODO: Write your observations here**

**Concluding questions:**
* Do the hate scores perform as expected in our model?
* If not, can you come up with a possible explanation for that even though the models with social scores performed better?
* What does that tell you about the applicability of neural networks and their trustworthiness?

**TODO: Write your answers here**

### <center> Thank you for participating in Social Computing/Social Gaming 2023. </center>
### <center> Good luck with the exams! </center>

## References

[1] Waseem, Z., & Hovy, D. (2016). Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter. In Proceedings of the NAACL Student Research Workshop (pp. 88-93).
<br> [2] [HuggingFace Tutorial](https://huggingface.co/learn/nlp-course/chapter1/1)
<br> [3] [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)
<br> [4] [Interpretable Machine Learning: Shapley Values](https://christophm.github.io/interpretable-ml-book/shapley.html)