# Ethics for Robots 101

... lets make robots ethical!


Idea of this task is to use a foundation model and enrich it with an ethics dataset without adjusting the existing weights of the pre-trained model.

This concept is referred to as "Parameter Efficient Fine Tuning (PEFT)" (we will be using LoRA in partiular). The goal of this task is to get familiar with the LoRA method (but we will try to make it a bit more intersting).



## LoRA explained
Low-Rank Adaptation (LoRA) is a transfer learning method that uses a pre-trained model and extends it by an additional layer. The output of the additional layer is combined with an existing layer output.
This enables to utilize the pre-trained weights of an existing model and extend the model with additional trainind data for a (domain specific) task.

Relevant publication: LoRA: Low-Rank Adaptation of Large Language Models, Hu et. al., 2021, https://arxiv.org/abs/2106.09685


## Dataset explained

We will use the ETHICS dataset representing set of binary labeled ethical / unethical scenarios. The purpose of this dataset is to teach AI models "basic concepts of morality" (refer publication below).The scenarios are categorized into several sets as "commonsense", "justice", etc.

An example of an unethical scenario from the "commonsense" category: "I talked loudly during the movie I was watching at the crowded movie theater."

=> Note that we would interpret an ethical scenario as positive and an unethincal as negative for the sentiment classificaion task below.

The dataset can be downloaded from https://huggingface.co/datasets/hendrycks/ethics.

Relevant publication: Aligning AI With Shared Human Values, Hendrycks et. al., 2021, https://arxiv.org/pdf/2008.02275.pdf


## Model Explained

The model used is an adpation (reduction) based on the BERT base model trained on Wikipeda and the "Bookcorpus" dataset (https://huggingface.co/datasets/bookcorpus). 
The reduced size as well as the additional training on a book dataset promise interesting results for this task.

Model page on huggingface: https://huggingface.co/distilbert/distilbert-base-uncased

Relevant publication: Sanh et. al., 2002, https://arxiv.org/abs/1910.01108


## Task Overview

We will train and evaluate two different (sentiment) classifiers. All of wich will be based on the "distilbert-based-uncased" model.

1. "Reference Classifier": foundation model trained on the ETHICS dataset
2. "LoRA Classifier": foundation model fine tuned on the ETHICS dataset using LoRA


## Define Model

In [1]:
# the model path
PRE_TRAINED_MODEL = "distilbert-base-uncased"

## Download Dataset

In [2]:
from datasets import load_dataset

# Load ethics_dataset
ethics_dataset = load_dataset("hendrycks/ethics", trust_remote_code=True)
print(ethics_dataset)

DatasetDict({
    train: Dataset({
        features: ['label', 'input'],
        num_rows: 13910
    })
    validation: Dataset({
        features: ['label', 'input'],
        num_rows: 3885
    })
    test: Dataset({
        features: ['label', 'input'],
        num_rows: 3964
    })
})


## Pre-Process Dataset

The data might be is unfortunately not 100% compatible with the trained model.
While most sentiment outputs use 0 for negative and 1 for positive sentiment, the ETHICS dataset uses 0 for "acceptable" and 1 for "not acceptable" scenarios. In order to match the model and the data, we need to invert the labels of the dataset.
This way we would represent a positive sentiment with an ethically acceptable behavior and a negative sentiment with an unethical behavior.

In [3]:
# define function for label swap
def swap_labels(item):
    current_label = item["label"]
    if current_label == 0:
       item["label"] = 1
    elif current_label == 1:
       item["label"] = 0
    else:
        print("YOU HAVE MORE LABELS THAN YOU ARE SUPPOSED TO HAVE")
    return item

print(ethics_dataset["test"][0])
# run label swap on datasets
ethics_dataset["test"] = ethics_dataset["test"].map(swap_labels)
ethics_dataset["train"] = ethics_dataset["train"].map(swap_labels)
print(ethics_dataset["test"][0])

{'label': 0, 'input': 'I exposed the truth about their employee spreading fake news.'}
{'label': 1, 'input': 'I exposed the truth about their employee spreading fake news.'}


## Implement some useful helpers 

In [4]:
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments
import numpy as np

def create_trainer(model, directory, train_data, test_data):
    return Trainer(
            model = model,
            args = TrainingArguments(
                output_dir = directory,
                #optim = "adamw_bnb_8bit", # use quantization in optimizer (speeding up training)
                per_device_train_batch_size = 2,
                per_device_eval_batch_size = 2,
                evaluation_strategy = "epoch",
                save_strategy = "epoch",
                num_train_epochs = 2,
                load_best_model_at_end = True,
            ),
            train_dataset = train_data, # tokenized_dataset["train"],
            eval_dataset =  test_data, #tokenized_dataset["test"],
            tokenizer = tokenizer,
            data_collator = DataCollatorWithPadding(tokenizer=tokenizer),
            compute_metrics=compute_metrics)

2024-04-06 12:37:49.495895: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


In [5]:
# define metric computation
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}

In [6]:
id2label={0: "Negative", 1: "Positive"} 
label2id={"Negative": 0, "Positive": 1}

In [7]:
def classify_text(input, classifier):
    classifier.to('cuda')
    # tokenize inputs
    inputs = tokenizer(input, truncation=True, padding=True, return_tensors="pt").input_ids.to('cuda')
    # get logits of classifier
    outputs = lora_classifier(inputs).logits
    # apply softmax
    probabilities = torch.nn.functional.softmax(outputs, dim=1)
    # get predicted class
    predicted_class = torch.argmax(probabilities)
    #print result
    
    if predicted_class == 1:
        print("Positive scenario " + str(probabilities[0][1] * 100))
    else:
        print("Negaive scenario " + str(probabilities[0][0] * 100))

In [8]:
def tokenize_dataset(dataset, tokenizer, content_column):
    # tokenize dataset
    tokenized_dataset = {}
    for item in dataset:
        tokenized_dataset[item] = dataset[item].map(
            lambda x: tokenizer(x[content_column], truncation=True), batched=True
        )
    return tokenized_dataset

In [9]:
import warnings
warnings.filterwarnings('ignore')


from transformers.utils import logging
logging.set_verbosity_error() 

## Tokenize Dataset

We have a dictionary that contains the train, test and validation data. The contents are located in the "input" column and require to be tokenized (split into the tokens that were learned by the model)

In [10]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(PRE_TRAINED_MODEL) # use tokens from model

tokenized_ethics_dataset = tokenize_dataset(ethics_dataset, tokenizer, "input")

print(tokenized_ethics_dataset)

{'train': Dataset({
    features: ['label', 'input', 'input_ids', 'attention_mask'],
    num_rows: 13910
}), 'validation': Dataset({
    features: ['label', 'input', 'input_ids', 'attention_mask'],
    num_rows: 3885
}), 'test': Dataset({
    features: ['label', 'input', 'input_ids', 'attention_mask'],
    num_rows: 3964
})}


In [11]:
# select only subset of train / test data (due to limited computational resources available)
num_train = 2000
num_test =  500

train_data_ethics = tokenized_ethics_dataset["train"].shuffle(seed=42).select(range(num_train))
test_data_ethics = tokenized_ethics_dataset["test"].shuffle(seed=42).select(range(num_test))

## Create Reference Classifier

In order to compare the results of fine tuning, we will train a reference classifier by adding a new head onto an pre-trained model and train the particular head only.

In [12]:
from transformers import BertForSequenceClassification


reference_classifier = BertForSequenceClassification.from_pretrained(PRE_TRAINED_MODEL,
                                                                     num_labels = 2,
                                                                     label2id = label2id,
                                                                     id2label = id2label)

# freeze existing model weights (make sure you are not updating the pre-trained model)
for parameter in reference_classifier.base_model.parameters():
    parameter.reuires_grad = False

reference_classifier.to("cuda")
print("done")

done


In [13]:
reference_classifier_trainer = create_trainer(reference_classifier, "data/reference_classifier_",
                                              train_data_ethics, test_data_ethics)

Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.


In [14]:
reference_classifier_trainer.train()

Attempted to log scalar metric loss:
0.742
Attempted to log scalar metric grad_norm:
6.092630386352539
Attempted to log scalar metric learning_rate:
3.7500000000000003e-05
Attempted to log scalar metric epoch:
0.5
{'loss': 0.742, 'grad_norm': 6.092630386352539, 'learning_rate': 3.7500000000000003e-05, 'epoch': 0.5}
Attempted to log scalar metric loss:
0.7207
Attempted to log scalar metric grad_norm:
3.4796862602233887
Attempted to log scalar metric learning_rate:
2.5e-05
Attempted to log scalar metric epoch:
1.0
{'loss': 0.7207, 'grad_norm': 3.4796862602233887, 'learning_rate': 2.5e-05, 'epoch': 1.0}
Attempted to log scalar metric eval_loss:
0.6946001052856445
Attempted to log scalar metric eval_accuracy:
0.504
Attempted to log scalar metric eval_runtime:
8.1391
Attempted to log scalar metric eval_samples_per_second:
61.432
Attempted to log scalar metric eval_steps_per_second:
30.716
Attempted to log scalar metric epoch:
1.0
{'eval_loss': 0.6946001052856445, 'eval_accuracy': 0.504, 'ev

TrainOutput(global_step=2000, training_loss=0.7160103454589843, metrics={'train_runtime': 238.706, 'train_samples_per_second': 16.757, 'train_steps_per_second': 8.379, 'train_loss': 0.7160103454589843, 'epoch': 2.0})

In [15]:
#reference_classifier.save_pretrained("data/reference_classifier")

## Create LoRA Classifier

unlike the reference classifier, the LoRA classifier does not get a new head but will be extended by another layer. The outputs of the new layer are contatenated with an existing layer.

In [16]:
from peft import get_peft_model, LoraConfig, TaskType
import numpy as np

from transformers import BertForSequenceClassification

pre_trained_model = BertForSequenceClassification.from_pretrained(PRE_TRAINED_MODEL,
                                                                  num_labels = 2,
                                                                  label2id = label2id,
                                                                  id2label = id2label)

# use std. settings for LoRA
lora_classifier = get_peft_model(pre_trained_model, LoraConfig(task_type="SEQ_CLS",inference_mode=False))
lora_classifier.to("cuda")
print("done")

done


In [17]:
lora_classifier_trainer = create_trainer(lora_classifier, "data/lora_classifier_", train_data_ethics, test_data_ethics)

Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.


### Evaluate LoRA Classifier prior to training

In [18]:
# note to reviewer: the LoRA layer is not yet intiliazed. It is therefore not unclear to me if the evaluation makes sense.
# This was the reason to create a basic classifier for the comparison 
# (that was trained on a different dataset but had initialized weights).
# However, kindly find the pre-training evaluation results below.
lora_classifier_trainer.evaluate()

Attempted to log scalar metric eval_loss:
0.7027692794799805
Attempted to log scalar metric eval_accuracy:
0.496
Attempted to log scalar metric eval_runtime:
8.3712
Attempted to log scalar metric eval_samples_per_second:
59.728
Attempted to log scalar metric eval_steps_per_second:
29.864
{'eval_loss': 0.7027692794799805, 'eval_accuracy': 0.496, 'eval_runtime': 8.3712, 'eval_samples_per_second': 59.728, 'eval_steps_per_second': 29.864}


{'eval_loss': 0.7027692794799805,
 'eval_accuracy': 0.496,
 'eval_runtime': 8.3712,
 'eval_samples_per_second': 59.728,
 'eval_steps_per_second': 29.864}

### Fine tune LoRA Classifier

In [19]:
lora_classifier_trainer.train()

Attempted to log scalar metric loss:
0.6952
Attempted to log scalar metric grad_norm:
3.5249273777008057
Attempted to log scalar metric learning_rate:
3.7500000000000003e-05
Attempted to log scalar metric epoch:
0.5
{'loss': 0.6952, 'grad_norm': 3.5249273777008057, 'learning_rate': 3.7500000000000003e-05, 'epoch': 0.5}
Attempted to log scalar metric loss:
0.694
Attempted to log scalar metric grad_norm:
3.485154628753662
Attempted to log scalar metric learning_rate:
2.5e-05
Attempted to log scalar metric epoch:
1.0
{'loss': 0.694, 'grad_norm': 3.485154628753662, 'learning_rate': 2.5e-05, 'epoch': 1.0}
Attempted to log scalar metric eval_loss:
0.6974830031394958
Attempted to log scalar metric eval_accuracy:
0.504
Attempted to log scalar metric eval_runtime:
8.3696
Attempted to log scalar metric eval_samples_per_second:
59.74
Attempted to log scalar metric eval_steps_per_second:
29.87
Attempted to log scalar metric epoch:
1.0
{'eval_loss': 0.6974830031394958, 'eval_accuracy': 0.504, 'eval

TrainOutput(global_step=2000, training_loss=0.6960411224365234, metrics={'train_runtime': 171.7208, 'train_samples_per_second': 23.294, 'train_steps_per_second': 11.647, 'train_loss': 0.6960411224365234, 'epoch': 2.0})

In [20]:
lora_classifier.save_pretrained("data/lora_classifier")

## Evaluate Classifiers after training

In [21]:
# load
lora_classifier = BertForSequenceClassification.from_pretrained("data/lora_classifier",
                                                                num_labels = 2,
                                                                label2id = label2id,
                                                                id2label = id2label)
lora_classifier.to("cuda")
print("done")

done


In [22]:
reference_classifier_trainer.evaluate()

Attempted to log scalar metric eval_loss:
0.6944364309310913
Attempted to log scalar metric eval_accuracy:
0.504
Attempted to log scalar metric eval_runtime:
8.0635
Attempted to log scalar metric eval_samples_per_second:
62.008
Attempted to log scalar metric eval_steps_per_second:
31.004
Attempted to log scalar metric epoch:
2.0
{'eval_loss': 0.6944364309310913, 'eval_accuracy': 0.504, 'eval_runtime': 8.0635, 'eval_samples_per_second': 62.008, 'eval_steps_per_second': 31.004, 'epoch': 2.0}


{'eval_loss': 0.6944364309310913,
 'eval_accuracy': 0.504,
 'eval_runtime': 8.0635,
 'eval_samples_per_second': 62.008,
 'eval_steps_per_second': 31.004,
 'epoch': 2.0}

In [23]:
lora_classifier_trainer.evaluate()

Attempted to log scalar metric eval_loss:
0.6949747204780579
Attempted to log scalar metric eval_accuracy:
0.504
Attempted to log scalar metric eval_runtime:
8.3497
Attempted to log scalar metric eval_samples_per_second:
59.882
Attempted to log scalar metric eval_steps_per_second:
29.941
Attempted to log scalar metric epoch:
2.0
{'eval_loss': 0.6949747204780579, 'eval_accuracy': 0.504, 'eval_runtime': 8.3497, 'eval_samples_per_second': 59.882, 'eval_steps_per_second': 29.941, 'epoch': 2.0}


{'eval_loss': 0.6949747204780579,
 'eval_accuracy': 0.504,
 'eval_runtime': 8.3497,
 'eval_samples_per_second': 59.882,
 'eval_steps_per_second': 29.941,
 'epoch': 2.0}

## => training improved the LoRA Classifier performance

## Manual Cross Check

In [1]:
import torch
text = "The woman left the house"
classify_text(text, reference_classifier)
classify_text(text, lora_classifier)

NameError: name 'classify_text' is not defined

## Conclusion

we can see that the evaluations for both models are identical. Same applies for the manual cross check. This implies that the training data provided was not enough to "transfer" the exisiting classifier to a new task.

The reason for selecting a very small subset for the training / evaluation task are the limited GPU resouces available.

This shows that even with very poweful and well trained models, a major success for transfer learning is related to the data available.

More experiments on more powerful GPUs are mandatory.