# Leave-One-Out

This notebook explains in detail how to reproduce the experiments conducted with the XAI method Leave-One-Out applied to Key-Point Analysis.

### Utilities

First, we import the libraries used in this notebook and define helper functions and variables.

In [4]:
from sentence_transformers import SentenceTransformer, util
from nltk import word_tokenize
from collections import defaultdict
from itertools import chain, combinations
import pandas as pd
import copy
import torch
import pickle
import os
from tqdm.notebook import tqdm_notebook
import sbert_training

In [6]:
def save_with_pickle(path, data):
    with open(path, "wb") as handle:
        pickle.dump(data, handle, protocol=pickle.HIGHEST_PROTOCOL)

def load_from_pickle(path):
    with open(path, "rb") as handle:
        data = pickle.load(handle)
    return data

def load_closed_class_words(path):
    """Read the whitespace-separated closed-class (function) words from a text file."""
    data = []
    with open(path, "r") as f:
        for line in f:
            data.extend(line.split())
    return data

In [9]:
def compute_score(arg, kp, model):
    """Score an argument-key-point pair by the cosine similarity
    of their sentence embeddings under the given model."""
    arg = model.encode(arg, show_progress_bar=False)
    kp = model.encode(kp, show_progress_bar=False)
    return float(util.pytorch_cos_sim(arg, kp))
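`compute_score` relies on `util.pytorch_cos_sim` from sentence-transformers, which returns the cosine similarity between embedding vectors. As a rough illustration only, a plain-Python sketch of the same quantity for two vectors (the helper name `cosine_similarity` and the toy 2-d vectors are ours, not part of the notebook):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors: (u . v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 2-d "embeddings"; real SBERT embeddings have hundreds of dimensions.
print(round(cosine_similarity([1.0, 0.0], [1.0, 1.0]), 4))  # 0.7071
```

A score of 1 means the two embeddings point in the same direction; the key point with the highest score for an argument is its argmax key point.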

In [10]:
word_tokenizer = word_tokenize
device = "cuda:0" if torch.cuda.is_available() else "cpu"
closed_class_words = load_closed_class_words("./Data/LOO_Data/closed_class_words.txt")

### Model

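Next, we need a trained model. If no model is found in `./Data/LOO_Data/Model`, the cell below trains one with `sbert_training.train_model` (training can also be run via `sbert_training.py`); otherwise, it loads the saved model. A trained model can also be downloaded [here](https://drive.google.com/drive/folders/1qgGdoNMUcyQivTtu5udzGcQB8SFgxm-M?usp=sharing).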

In [11]:
model_path = "./Data/LOO_Data/Model"
if os.path.isdir(model_path) and os.listdir(model_path):
    # A trained model already exists: load it
    model = SentenceTransformer(model_path)
else:
    # Otherwise train a roberta-base SBERT model with contrastive loss
    model = sbert_training.train_model(dataset_path='./Data/LOO_Data/SiameseData', eval_data_path='./Data/LOO_Data/KPMData',
                                       subset_name='dev', output_path=model_path, model_name='roberta-base',
                                       model_suffix='contrastive-10-epochs', data_file_suffix='contrastive',
                                       num_epochs=10, max_seq_length=70, add_special_token=True, train_batch_size=32,
                                       loss='ContrastiveLoss')

2022-02-28 13:40:35 - Load pretrained SentenceTransformer: ./Data/LOO_Data/Model
2022-02-28 13:40:36 - Use pytorch device: cuda


### Data

Once we have the model at our disposal, we need to prepare the data. To do this, we pair every argument with every key point of the same topic and let the model score each pair. For the gold standard, each positively labelled argument-key-point pair is merged into a single row together with the argument and key point texts. We pre-compute both the model `predictions` and the gold standard, although later on we only work with the model predictions, since it is the model's behaviour that we want to explain. Computing the predictions is very expensive and takes almost a full day, so we include the pre-computed predictions and gold standard to avoid recomputing them.

The file `gold_labels_and_prediction_scores.pkl` contains a nested dictionary, which can be accessed as
`data[subset][kind]`, with `subset` in `{"dev", "train"}` and `kind` in `{"predictions", "gold_standard"}`.
Run the cell below the loading cell to see an example.
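The pairing step can be illustrated on toy data. The sketch below is a minimal, hypothetical example (made-up topics, arguments and key points, and no model scoring) of the per-topic cross join and of the nested `data[subset][kind]` layout; in the notebook, each pair additionally receives a `score` from `compute_score`. It assumes pandas 1.2 or later, where `how="cross"` is available:

```python
from collections import defaultdict
import pandas as pd

# Hypothetical toy data; the notebook reads arguments_{subset}.csv and key_points_{subset}.csv.
arguments_df = pd.DataFrame({
    "topic": ["T1", "T1", "T2"],
    "argument": ["arg a", "arg b", "arg c"],
})
key_points_df = pd.DataFrame({
    "topic": ["T1", "T1", "T2", "T2"],
    "key_point": ["kp 1", "kp 2", "kp 3", "kp 4"],
})

# Within each topic, pair every argument with every key point (cartesian product).
pairs = {
    topic: pd.merge(
        arguments_df.loc[arguments_df["topic"] == topic, ["argument"]].drop_duplicates(),
        key_points_df.loc[key_points_df["topic"] == topic, ["key_point"]].drop_duplicates(),
        how="cross",
    )
    for topic in arguments_df["topic"].unique()
}

# Same nested layout as the pickle: data[subset][kind]
data = defaultdict(dict)
data["dev"]["predictions"] = pd.concat(
    [df.assign(topic=topic) for topic, df in pairs.items()], ignore_index=True
)
print(len(data["dev"]["predictions"]))  # 2*2 pairs for T1 + 1*2 pairs for T2 = 6
```

The number of pairs per topic is the number of arguments times the number of key points, which is why scoring all pairs with the model is so expensive.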

In [21]:
def create_gold_labels_and_prediction_scores(model, path):
    data = defaultdict(dict)
    for subset in ["dev", "train"]:
        # Load files
        arguments_file = f"./Data/LOO_Data/kpm_data/arguments_{subset}.csv"
        key_points_file = f"./Data/LOO_Data/kpm_data/key_points_{subset}.csv"
        labels_file = f"./Data/LOO_Data/kpm_data/labels_{subset}.csv"
        arguments_df = pd.read_csv(arguments_file)
        key_points_df = pd.read_csv(key_points_file)
        labels_df = pd.read_csv(labels_file)
        # Get gold standard
        positive_labels_df = labels_df.loc[labels_df["label"] == 1]
        gold_standard = pd.merge(positive_labels_df, key_points_df, how="inner", on="key_point_id")
        gold_standard = pd.merge(gold_standard, arguments_df, how="inner", on=["arg_id","topic", "stance"])
        gold_standard = gold_standard.rename(columns={"label": "score"})
        data[subset]["gold_standard"] = gold_standard
        # Within a topic map every key-point to every argument
        arg_to_kps = {topic: pd.merge(arguments_df.loc[arguments_df["topic"] == topic][["argument"]].drop_duplicates(),
                              key_points_df.loc[key_points_df["topic"] == topic][["key_point"]].drop_duplicates(),
                              how="cross") for topic in arguments_df["topic"].unique()}
        # Create predictions
        mappings = []
        for topic, arg_kps_mapping in arg_to_kps.items():
            arg_kps_mapping['score'] = arg_kps_mapping.apply(lambda row: compute_score(row["argument"], row["key_point"], model), axis=1)
            arg_kps_mapping['topic'] = topic
            arg_kps_mapping = arg_kps_mapping[["topic", "argument", "key_point", "score"]]
            mappings.append(arg_kps_mapping)
        predictions = pd.concat(mappings, axis=0)
        data[subset]["predictions"] = predictions
    save_with_pickle(path, data)
    return data

NVIDIA GeForce RTX 3090 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/



RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Traceback (most recent call last):
  File "/home/marcelbraasch/.local/share/JetBrains/Toolbox/apps/PyCharm-P/ch-0/213.5744.248/plugins/python/helpers/pydev/_pydevd_bundle/pydevd_frame.py", line 861, in trace_dispatch
    result = plugin_manager.cmd_step_over(main_debugger, frame, event, self._args, stop_info, stop)
  File "/home/marcelbraasch/.local/share/JetBrains/Toolbox/apps/PyCharm-P/ch-0/213.5744.248/plugins/python/helpers-pro/jupyter_debug/pydev_jupyter_plugin.py", line 144, in cmd_step_over
    if _is_inside_jupyter_cell(frame, pydb):
  File "/home/marcelbraasch/.local/share/JetBrains/Toolbox/apps/PyCharm-P/ch-0/213.5744.248/plugins/python/helpers-pro/jupyter_debug/pydev_jupyter_plugin.py", line 209, in _is_inside_jupyter_cell
    if is_cell_filename(filename):
  File "/home/marcelbraasch/.local/share/JetBrains/Toolbox/apps/PyCharm-P/ch-0/213.5744.248/plugins/python/helpers-pro/jupyter_debug/pydev_jupyter_plugin.py", line 220, in is_cell_filename
    ipython_shell = get_ipython(

In [None]:
# Now we load or compute the data
data_path = "./Data/LOO_Data/gold_labels_and_prediction_scores.pkl"
try:
    data = load_from_pickle(data_path)
except FileNotFoundError:
    data = create_gold_labels_and_prediction_scores(model, data_path)

In [18]:
# Preview of what the model's predictions data structure looks like
predictions = data["dev"]["predictions"]
print(predictions.head(2))

                                         topic  \
0  We should abandon the use of school uniform   
1  We should abandon the use of school uniform   

                                            argument  \
0  having a school uniform can reduce bullying as...   
1  having a school uniform can reduce bullying as...   

                                           key_point     score  
0  Children can still express themselves using ot...  0.556963  
1                    School uniform reduces bullying  0.686188  


### Creating masked arguments

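Now that we have the model and the data prepared, we can move on to masking, or more precisely, dropping combinations of up to `n` words, as described in the technical report. The idea is simple: we tokenize each argument and select its content words, i.e. all tokens that are not in the closed-class word list. We then build every subset of these content words with at most `n` elements (`drop_size=4` by default; the empty subset is skipped), remove the words of each subset from the argument, and store the resulting reduced arguments in a mapping from each argument to its dropped combinations. Later, we iterate over all arguments and score these reduced arguments.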
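The number of reduced arguments follows directly from this construction: with k content words and at most n dropped words there are C(k,1) + ... + C(k,n) combinations, instead of the 2^k subsets of the full powerset. A minimal sketch of the enumeration, assuming a hypothetical list of content-word positions (`bounded_subsets` is our name, not a notebook function):

```python
from itertools import chain, combinations
from math import comb

def bounded_subsets(items, max_size):
    """All non-empty subsets of `items` with at most `max_size` elements, smallest first."""
    items = list(items)
    return list(chain.from_iterable(
        combinations(items, r) for r in range(1, min(max_size, len(items)) + 1)
    ))

content_indices = [0, 2, 3, 5, 6]   # hypothetical positions of content words
subsets = bounded_subsets(content_indices, max_size=4)
print(len(subsets))                  # C(5,1)+C(5,2)+C(5,3)+C(5,4) = 5+10+10+5 = 30
assert len(subsets) == sum(comb(5, r) for r in range(1, 5))
```

Because smaller subsets come first, truncating the list (the notebook later keeps at most 750 combinations per argument) favours single-word and two-word drops.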

In [22]:
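def powerset(iterable, max_size=None):
    """Return all subsets of `iterable` with at most `max_size` elements, ordered by size.
    Enumerating only the small subsets avoids materialising all 2^k subsets."""
    s = list(iterable)
    max_size = len(s) if max_size is None else min(max_size, len(s))
    return list(chain.from_iterable(combinations(s, r) for r in range(max_size + 1)))

def _create_dropped_combinations(argument, drop_size=4):
    tokens = word_tokenizer(argument)
    samples = []
    # Only content (open-class) words are candidates for dropping
    lexical_mask = [1 if x not in closed_class_words else 0 for x in tokens]
    lexical_indices = [i for i, x in enumerate(lexical_mask) if x]
    # All index subsets of size 1..drop_size ([1:] skips the empty subset)
    lexical_indices_combinations = powerset(lexical_indices, drop_size)[1:]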
    for combination in lexical_indices_combinations:
        combination = list(combination)
        combination.sort(reverse=True)
        new_arg = copy.deepcopy(tokens)
        dropped_words = [new_arg.pop(index) for index in combination]
        sample = {"dropped": dropped_words,
                  "new_arg": " ".join(new_arg),
                  "amount_dropped": len(combination),
                  "indices": combination}
        samples.append(sample)
    return samples

def create_dropped_combinations(arguments, data_path):
    mapping = {argument:_create_dropped_combinations(argument)
               for argument in tqdm_notebook(arguments)}
    save_with_pickle(data_path, mapping)
    return mapping

In [23]:
# Get the unique topics, arguments and key points of the dev predictions.
# These are needed by several functions below.
arguments = predictions["argument"].unique()
topics = predictions["topic"].unique()
key_points = predictions["key_point"].unique()

In [24]:
# Now we load or compute the mapping from the argument to all its dropped combinations
data_path = "./Data/LOO_Data/arg_to_dropped_mapping.pkl"
try:
    arg_to_dropped = load_from_pickle(data_path)
except FileNotFoundError:
    arg_to_dropped = create_dropped_combinations(arguments, data_path)



Now we can move on to the actual computation of Leave-N-Out. We have pre-computed the model's score for every argument-key-point pair and built a mapping from each argument to all its dropped combinations. For each pair, the importance of a word is the average drop in score over all combinations in which that word was dropped. Note that we compute word importances not only for each argument and its argmax key point, but for the argument paired with _all_ key points. This lets us compare across key points and ask which words matter for the argmax key point but not for the others.
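To make the averaging concrete, here is a minimal, self-contained sketch of the Leave-N-Out importance. It is an illustration under stated assumptions, not the notebook's pipeline: `toy_score` is a hypothetical stand-in for the SBERT cosine similarity, all tokens (not only content words) are candidates for dropping, and there is no cap on the number of combinations:

```python
from collections import defaultdict
from itertools import chain, combinations

def toy_score(tokens, key_point):
    """Hypothetical stand-in for the SBERT similarity: share of key-point words present."""
    kp_words = set(key_point.split())
    return len(kp_words & set(tokens)) / len(kp_words)

def leave_n_out_importance(tokens, key_point, score_fn, max_drop=2):
    reference = score_fn(tokens, key_point)
    diffs = defaultdict(list)
    # Every subset of token positions with 1..max_drop elements
    subsets = chain.from_iterable(
        combinations(range(len(tokens)), r) for r in range(1, max_drop + 1))
    for subset in subsets:
        kept = [t for i, t in enumerate(tokens) if i not in subset]
        difference = reference - score_fn(kept, key_point)
        for i in subset:
            diffs[tokens[i]].append(difference)
    # A word's importance is the mean score drop over all subsets that dropped it
    return {word: sum(d) / len(d) for word, d in diffs.items()}

tokens = ["uniforms", "reduce", "bullying", "today"]
importance = leave_n_out_importance(tokens, "uniforms reduce bullying", toy_score, max_drop=2)
print({w: round(v, 3) for w, v in importance.items()})
# {'uniforms': 0.5, 'reduce': 0.5, 'bullying': 0.5, 'today': 0.25}
```

Note how `today`, which never occurs in the key point, still receives a non-zero importance: it is dropped together with relevant words in some combinations, and the score drop of a combination is attributed to every word in it.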

In [25]:
class Importance:

    def __init__(self):
        self.counter = 0
        self.scores = []

    def update(self, score):
        self.counter += 1
        self.scores.append(score)

    def get(self):
        return sum(self.scores) / self.counter

def _compute_word_importance(argument, key_point, arg_to_dropped, model):
    """Importance of each dropped word = mean drop in similarity score over all
    used combinations in which the word was left out."""
    reference = compute_score(argument, key_point, model)
    # Use at most the first 750 combinations (smallest drop sizes come first)
    dropped = arg_to_dropped[argument][:750]
    word_to_importance = defaultdict(Importance)
    for example in tqdm_notebook(dropped):
        new_score = compute_score(example["new_arg"], key_point, model)
        difference = reference - new_score
        for word in example["dropped"]:
            word_to_importance[word].update(difference)
    return {word: importance.get() for word, importance in word_to_importance.items()}

def compute_word_importances_of_all_arg_kps(predictions, arg_to_dropped, model, path):
    # Copy so that the predictions dataframe itself is not modified
    args_kps = predictions.copy()
    args_kps["important_words"] = args_kps.apply(
        lambda row: _compute_word_importance(row["argument"], row["key_point"], arg_to_dropped, model), axis=1)
    save_with_pickle(path, args_kps)
    return args_kps

In [26]:
# Now we load or compute the word importances of all argument-key-point pairs
data_path = "./Data/LOO_Data/word_importance.pkl"
try:
    word_importances = load_from_pickle(data_path)
except FileNotFoundError:
    word_importances = compute_word_importances_of_all_arg_kps(predictions, arg_to_dropped, model, data_path)

### Leave-N-Out: Visualization

The word importances for each argument-key-point pair are now computed, but they are stored as raw dictionaries, which are hard to read. In this section we do a bit of data wrangling so that, for each argument, the importances for its five highest-scoring key points can be viewed side by side in one dataframe. A preview of the result is shown at the end of this section.
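The core reshaping step turns one `word -> importance` dictionary into two columns `words_i` and `importance_i`, prefixed by the key point and its score. A minimal sketch with a hypothetical toy dictionary (values made up for illustration); `pd.DataFrame(list(d.items()), columns=...)` is a shorter alternative that should give the same two columns as the `from_dict(...).transpose().reset_index().rename(...)` chain used in the notebook:

```python
import pandas as pd

# Hypothetical word-importance dict for one argument-key-point pair (toy values).
importances = {"school": 0.1, "uniform": 0.2, "bullying": 0.3}

i = 0  # rank of the key point among the argument's top key points
frame = pd.DataFrame(list(importances.items()), columns=[f"words_{i}", f"importance_{i}"])
frame.insert(0, "score", 0.5)                        # scalar is broadcast to every row
frame.insert(0, "key_point", "hypothetical key point")
print(frame.columns.tolist())
# ['key_point', 'score', 'words_0', 'importance_0']
```

Concatenating such frames for ranks 0 to 4 along `axis=1` yields the wide per-argument table previewed below.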

In [27]:
def create_leave_one_out_for_args(topics, arguments, path="./Data/LOO_Data/leave_one_out.pkl"):
    loo = []
    for topic in topics:
        # Only process the arguments that belong to the current topic
        topic_arguments = set(predictions.loc[predictions["topic"] == topic, "argument"])
        for argument in tqdm_notebook([a for a in arguments if a in topic_arguments]):
            # Get the top 5 key points (by model score) for the current argument
            top_n = word_importances.loc[(word_importances["topic"] == topic)
                                         & (word_importances["argument"] == argument)] \
                                    .sort_values(by=["score"], ascending=False).head(5)
            df = pd.DataFrame()
            for i, (_, row) in enumerate(top_n.iterrows()):
                _, _, key_point, score, importances = row
                # Extract word importance scores into two columns: words_i, importance_i
                importances = pd.DataFrame.from_dict({x:[y] for x,y in importances.items()}) \
                                .transpose().reset_index() \
                                .rename(columns={"index":f"words_{i}", 0:f"importance_{i}"})
                importances.insert(0, "score", score)
                importances.insert(0, "key_point", key_point)
                df = pd.concat((df, importances), axis=1)
            df.insert(0, 'argument', argument)
            loo.append(df)
    save_with_pickle(path, loo)
    return loo

In [28]:
# Now we load or compute the per-argument Leave-One-Out tables
data_path = "./Data/LOO_Data/leave_one_out.pkl"
try:
    leave_one_out = load_from_pickle(data_path)
except FileNotFoundError:
    leave_one_out = create_leave_one_out_for_args(topics, arguments)

Now let's inspect the final result.

In [31]:
leave_one_out[0]

| | argument | key_point | score | words_0 | importance_0 | key_point | score | words_1 | importance_1 | key_point | ... | words_2 | importance_2 | key_point | score | words_3 | importance_3 | key_point | score | words_4 | importance_4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | having a school uniform can reduce bullying as... | School uniforms increase conformity or harm in... | 0.74542 | school | 0.105151 | School uniform is harming the student's self e... | 0.719613 | school | 0.106442 | School uniforms are often uncomfortable/sexist | ... | school | 0.101659 | School uniform reduces bullying | 0.686188 | school | 0.111067 | School uniforms create a sense of equality/unity | 0.673554 | school | 0.10259 |
| 1 | having a school uniform can reduce bullying as... | School uniforms increase conformity or harm in... | 0.74542 | uniform | 0.120174 | School uniform is harming the student's self e... | 0.719613 | uniform | 0.113617 | School uniforms are often uncomfortable/sexist | ... | uniform | 0.114814 | School uniform reduces bullying | 0.686188 | uniform | 0.127685 | School uniforms create a sense of equality/unity | 0.673554 | uniform | 0.113907 |
| 2 | having a school uniform can reduce bullying as... | School uniforms increase conformity or harm in... | 0.74542 | reduce | 0.148124 | School uniform is harming the student's self e... | 0.719613 | reduce | 0.14525 | School uniforms are often uncomfortable/sexist | ... | reduce | 0.148596 | School uniform reduces bullying | 0.686188 | reduce | 0.204635 | School uniforms create a sense of equality/unity | 0.673554 | reduce | 0.161033 |
| 3 | having a school uniform can reduce bullying as... | School uniforms increase conformity or harm in... | 0.74542 | bullying | 0.152059 | School uniform is harming the student's self e... | 0.719613 | bullying | 0.151784 | School uniforms are often uncomfortable/sexist | ... | bullying | 0.162226 | School uniform reduces bullying | 0.686188 | bullying | 0.206463 | School uniforms create a sense of equality/unity | 0.673554 | bullying | 0.158163 |
| 4 | having a school uniform can reduce bullying as... | School uniforms increase conformity or harm in... | 0.74542 | students | 0.12215 | School uniform is harming the student's self e... | 0.719613 | students | 0.120078 | School uniforms are often uncomfortable/sexist | ... | students | 0.107013 | School uniform reduces bullying | 0.686188 | students | 0.124342 | School uniforms create a sense of equality/unity | 0.673554 | students | 0.12 |
| 5 | having a school uniform can reduce bullying as... | School uniforms increase conformity or harm in... | 0.74542 | style | 0.200251 | School uniform is harming the student's self e... | 0.719613 | style | 0.201913 | School uniforms are often uncomfortable/sexist | ... | style | 0.182905 | School uniform reduces bullying | 0.686188 | style | 0.142583 | School uniforms create a sense of equality/unity | 0.673554 | style | 0.142879 |
| 6 | having a school uniform can reduce bullying as... | School uniforms increase conformity or harm in... | 0.74542 | afford | 0.023918 | School uniform is harming the student's self e... | 0.719613 | afford | 0.032066 | School uniforms are often uncomfortable/sexist | ... | afford | 0.051921 | School uniform reduces bullying | 0.686188 | afford | 0.038486 | School uniforms create a sense of equality/unity | 0.673554 | afford | 0.028343 |
| 7 | having a school uniform can reduce bullying as... | School uniforms increase conformity or harm in... | 0.74542 | latest | 0.115572 | School uniform is harming the student's self e... | 0.719613 | latest | 0.113921 | School uniforms are often uncomfortable/sexist | ... | latest | 0.107768 | School uniform reduces bullying | 0.686188 | latest | 0.122354 | School uniforms create a sense of equality/unity | 0.673554 | latest | 0.11051 |
| 8 | having a school uniform can reduce bullying as... | School uniforms increase conformity or harm in... | 0.74542 | trends | 0.085532 | School uniform is harming the student's self e... | 0.719613 | trends | 0.078351 | School uniforms are often uncomfortable/sexist | ... | trends | 0.082918 | School uniform reduces bullying | 0.686188 | trends | 0.097824 | School uniforms create a sense of equality/unity | 0.673554 | trends | 0.080899 |
| 9 | having a school uniform can reduce bullying as... | School uniforms increase conformity or harm in... | 0.74542 | stand | 0.159164 | School uniform is harming the student's self e... | 0.719613 | stand | 0.14639 | School uniforms are often uncomfortable/sexist | ... | stand | 0.129567 | School uniform reduces bullying | 0.686188 | stand | 0.153177 | School uniforms create a sense of equality/unity | 0.673554 | stand | 0.143127 |


### Auxiliary experiment

The idea of this experiment is to reverse the previous approach. Instead of computing the argmax key point for each argument, we ask: "For each key point, which arguments were mapped to it?" This allows us to investigate which words are most prevalent among the most important words of the arguments matched to a specific key point.
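The counting can be sketched with `collections.Counter` (an alternative to the `defaultdict(int)` used in the notebook). The importance dictionaries below are hypothetical toy data, and `top_words` is our helper name: for each argument mapped to a key point, take its k most important words, then count how often each word appears across those arguments:

```python
from collections import Counter

def top_words(importances, k=5):
    """The k words with the highest importance in one word -> importance dict."""
    return [w for w, _ in sorted(importances.items(), key=lambda x: x[1], reverse=True)[:k]]

# Hypothetical importance dicts of arguments whose argmax key point is the same key point.
mapped_arguments = [
    {"uniform": 0.3, "bullying": 0.5, "cost": 0.1},
    {"bullying": 0.4, "teasing": 0.2, "school": 0.05},
]
counter = Counter(w for imp in mapped_arguments for w in top_words(imp, k=2))
print(counter.most_common(3))
# [('bullying', 2), ('uniform', 1), ('teasing', 1)]
```

A word that ranks high for many of the arguments mapped to a key point is a candidate for what drives the model to choose that key point.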

In [32]:
def create_kp_to_its_args(top_k_words=5, top_k_counts=10):
    # Keep, for every (topic, argument), only the row(s) of its argmax key point
    argmax_kps = word_importances[word_importances.groupby(['topic', "argument"])['score'].transform("max")
                                  == word_importances['score']]
    saved_word_rankings = {}
    for key_point in key_points:
        current_kp = argmax_kps.loc[argmax_kps["key_point"] == key_point]
        counter = defaultdict(int)
        for mapping in current_kp["important_words"]:
            # The top_k_words most important words of this argument
            top_words = sorted(mapping.items(), key=lambda x: x[1], reverse=True)[:top_k_words]
            for word, _ in top_words:
                counter[word] += 1
        # Keep the top_k_counts words that occur most often across the mapped arguments
        saved_word_rankings[key_point] = dict(
            sorted(counter.items(), key=lambda x: x[1], reverse=True)[:top_k_counts])
    return saved_word_rankings

In [33]:
kptia = create_kp_to_its_args()
# Open once in "w" mode so that re-running the cell does not append duplicates
with open("kps_importances.txt", "w") as f:
    for kp, important_words in kptia.items():
        s = f"{kp}\n"
        for word, occurrence in important_words.items():
            s += f"{word}\t{occurrence}\n"
        f.write(s + "\n")

In [34]:
# Helper functions for visualization / printing stuff

# def get_stuff(l):
#     new_df = pd.DataFrame()
#     argument = l.iloc[:,0][0]
#     scores = pd.concat([(l.iloc[:,(4*i)]) for i in range(1,6)], axis=1)
#     words = l.iloc[:,3]
#     df = pd.concat((words,scores),axis=1)
#     kps = [x[1] for x in [l.iloc[:,(4*i-3)] for i in range(1,6)]]
#     df = df.rename(columns={f"importance_{i}":kps[i] for i in range(len(kps))})
#     df = df.rename(columns={"words_0":"words"})
#     topic = list(predictions.loc[predictions["key_point"]==kps[0]]["topic"])[0]
#     reference_scores = [x[1] for x in [l.iloc[:,(4*i-2)] for i in range(1,6)]]
#     return topic, argument, df, reference_scores

# for i in tqdm_notebook(range(len(loo))):
#     topic, argument, df, reference_scores = get_stuff(loo[i])
#     name_excel = f"./Results/LOO/sheet_{i}.xlsx"
#     name_metad = f"./Results/LOO/sheet_{i}.txt"
#     if i in [0, 1283, 1690, 3306]:
#         print(reference_scores)
#     df.to_excel(name_excel)
#     with open(name_metad, "w") as f:
#         f.write(f"Topic: {topic}\nArgument: {argument}")