## Knowledge Distillation approach with Bert
- Dataset das zum Training verwendet wird ist imdb

### Preparing environment (kdein)
Folgende Befehle in der bash ausführen
- conda create -n kdein python==3.10
- conda activate kdein
- pip install torch==2.0.1 transformers==4.40.2 datasets ipywidgets accelerate==0.30.1 wandb platformdirs
- python -m ipykernel install --user --name=kdein

In [2]:
# Control pytorch version --> Must be 2.0.1
!conda list | grep torch 

pytorch-revgrad           0.2.0                    pypi_0    pypi
torch                     2.0.1                    pypi_0    pypi
torchaudio                2.0.1+cu117              pypi_0    pypi
torchvision               0.15.1+cu117             pypi_0    pypi


### Define Models, dataset and output dir

In [1]:
import torch
from transformers import BertForSequenceClassification, BertTokenizerFast, Trainer, TrainingArguments
from datasets import load_dataset

# Laden des IMDB-Datensatzes
dataset = load_dataset("imdb")

# Laden des vortrainierten BERT-Modells und des Tokenizers für den Lehrer

teacher_model_name = "bert-base-uncased"
teacher_model = BertForSequenceClassification.from_pretrained(teacher_model_name)
teacher_tokenizer = BertTokenizerFast.from_pretrained(teacher_model_name)
print(teacher_model.classifier)

# Laden eines vereinfachten BERT-Modells und des Tokenizers für den Schüler
student_model_name = "distilbert-base-uncased"
student_model = BertForSequenceClassification.from_pretrained(student_model_name)
student_tokenizer = BertTokenizerFast.from_pretrained(student_model_name)
print(student_model.classifier)
print(f"Memory footprint Teacher: {teacher_model.get_memory_footprint() / 1e6:.2f} MB")
print(f"Memory footprint Student: {student_model.get_memory_footprint() / 1e6:.2f} MB")
save_path="/home/thsch026/masterarbeit/models/generated/kd2"

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Linear(in_features=768, out_features=2, bias=True)


You are using a model of type distilbert to instantiate a model of type bert. This is not supported for all configurations of models and can yield errors.
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'embeddings.LayerNorm.bias', 'embeddings.LayerNorm.weight', 'embeddings.position_embeddings.weight', 'embeddings.token_type_embeddings.weight', 'embeddings.word_embeddings.weight', 'encoder.layer.0.attention.output.LayerNorm.bias', 'encoder.layer.0.attention.output.LayerNorm.weight', 'encoder.layer.0.attention.output.dense.bias', 'encoder.layer.0.attention.output.dense.weight', 'encoder.layer.0.attention.self.key.bias', 'encoder.layer.0.attention.self.key.weight', 'encoder.layer.0.attention.self.query.bias', 'encoder.layer.0.attention.self.query.weight', 'encoder.layer.0.attention.self.value.bias', 'encoder.layer.0.attention.self.value.weight', 'enc

Linear(in_features=768, out_features=2, bias=True)
Memory footprint Teacher: 437.94 MB
Memory footprint Student: 437.94 MB


### Prepare Training and needed functions

In [2]:
# Definieren der Trainingsargumente
training_args = TrainingArguments(
    per_device_train_batch_size=8,
    num_train_epochs=3,
    evaluation_strategy="epoch",
    logging_dir="./logs",
    output_dir="./out2"
)

# Funktion zur Berechnung der distillationsverlust
def compute_distillation_loss(student_logits, teacher_logits, temperature=2.0, alpha=0.5):
    soft_labels = torch.nn.functional.softmax(teacher_logits / temperature, dim=-1)
    soft_loss = torch.nn.functional.kl_div(torch.nn.functional.log_softmax(student_logits / temperature, dim=-1), soft_labels, reduction='batchmean')
    hard_loss = torch.nn.functional.cross_entropy(student_logits, torch.argmax(soft_labels, dim=-1))
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Laden und vorverarbeiten der Daten
def preprocess_function(examples):
    return teacher_tokenizer(examples["text"], truncation=True, padding="max_length")

train_dataset = dataset["train"].map(preprocess_function, batched=True)
eval_dataset = dataset["test"].map(preprocess_function, batched=True)

# Funktion zum Trainieren des Schülermodells
def compute_metrics(eval_predictions):
    return {"accuracy": (eval_predictions.predictions.argmax(axis=1) == eval_predictions.label_ids).mean()}

# Definition des Trainerobjekts
trainer = Trainer(
    model=student_model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
)

# Überprüfen der Felder im Datensatz
print(train_dataset.column_names)

# Überprüfen eines Beispiels im Datensatz
print(train_dataset[0])


Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.


['text', 'label', 'input_ids', 'token_type_ids', 'attention_mask']
{'text': 'I rented I AM CURIOUS-YELLOW from my video store because of all the controversy that surrounded it when it was first released in 1967. I also heard that at first it was seized by U.S. customs if it ever tried to enter this country, therefore being a fan of films considered "controversial" I really had to see this for myself.<br /><br />The plot is centered around a young Swedish drama student named Lena who wants to learn everything she can about life. In particular she wants to focus her attentions to making some sort of documentary on what the average Swede thought about certain political issues such as the Vietnam War and race issues in the United States. In between asking politicians and ordinary denizens of Stockholm about their opinions on politics, she has sex with her drama teacher, classmates, and married men.<br /><br />What kills me about I AM CURIOUS-YELLOW is that 40 years ago, this was considered

In [None]:
# Trainieren des Schülermodells mit Knowledge Distillation
trainer.train()

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
[34m[1mwandb[0m: Currently logged in as: [33mthomas-t-schmitt[0m ([33mpumaai[0m). Use [1m`wandb login --relogin`[0m to force relogin


Epoch,Training Loss,Validation Loss


In [None]:
save_path="/home/thsch026/masterarbeit/models/generated/kd2"
student_model.save_pretrained(save_path)
student_tokenizer.save_pretrained(save_path)
