<a href="https://colab.research.google.com/github/sadevans/DL_NLP_ITMO/blob/hw_5/hw_5.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Homework 5 - 10 points

In this assignment you will fine-tune a transformer model for a classification task using several techniques and compare them with each other.

Dataset: [dair-ai/emotion](https://huggingface.co/datasets/dair-ai/emotion)

Model: [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased) (feel free to swap in something more interesting)

1. Download the dataset and the model. Measure the baseline classification metrics before starting the experiments.

**NB!** For every fine-tuning method, measure:
- the resulting classification quality
- the fine-tuning time
- the number of trainable parameters
- resource consumption (no need to bother with profiling; looking at `nvidia-smi` or `torch.cuda.memory_allocated` is enough)

2. Train the model with full fine-tuning - **1 point**
3. Train the model with linear probing: implement a custom classification head and train only that head. Remember to explain what motivates the design of the head and how you arrived at this architecture - **2 points**
4. Train the model with PEFT using [prompt tuning or prefix tuning](https://ericwiener.github.io/ai-notes/AI-Notes/Large-Language-Models/Prompt-Tuning-and-Prefix-Tuning). When choosing the method, write a few words about why you settled on it - **2 points**
5. Train the model with PEFT using LoRA. Try to find the optimal rank `r` and, if you like, experiment with the other hyperparameters. Explain what motivates your final configuration - **2 points**

6. Collect the results of all individual measurements into a table and draw conclusions about the computational cost of the methods, the final quality, and other observed properties of the models - **1 point**
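
The four measurements requested above could be collected with a small helper like the one below. This is only a sketch: `count_trainable_params` and `measure_run` are hypothetical names that do not appear elsewhere in the notebook. It reads peak memory with `torch.cuda.max_memory_allocated`, which reports the maximum since the last reset, whereas `torch.cuda.memory_allocated` reports only the current value.

```python
import time

import torch
from torch import nn


def count_trainable_params(model: nn.Module) -> int:
    """Number of parameters that will receive gradient updates."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


def measure_run(train_fn):
    """Run train_fn() and return (result, seconds, peak_gpu_bytes).

    peak_gpu_bytes is None when no CUDA device is available.
    """
    use_cuda = torch.cuda.is_available()
    if use_cuda:
        # Reset the peak counter to the current allocation level
        torch.cuda.reset_peak_memory_stats()
    start = time.perf_counter()
    result = train_fn()
    if use_cuda:
        # Wait for queued GPU work before stopping the clock
        torch.cuda.synchronize()
    seconds = time.perf_counter() - start
    peak = torch.cuda.max_memory_allocated() if use_cuda else None
    return result, seconds, peak


# Tiny stand-in model: a 4 -> 2 linear layer has 4*2 weights + 2 biases
toy = nn.Linear(4, 2)
toy_params = count_trainable_params(toy)
_, toy_seconds, toy_peak = measure_run(lambda: toy(torch.zeros(1, 4)))
print(toy_params, toy_seconds >= 0.0)
```

In a real run, `train_fn` would be something like `trainer.train`. Tensors that are already on the GPU before the call still count toward the peak, so freeing earlier models first would make the numbers easier to compare.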


In [1]:
%%capture
!pip install datasets transformers peft torchmetrics
!pip install evaluate
!pip install hf_xet


In [99]:
import os
import random
import subprocess
import time
from random import sample
from IPython.display import clear_output

import evaluate
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from torch import nn
from torch.optim import AdamW
from torch.utils.data import DataLoader
from torchmetrics.functional import accuracy
from tqdm import tqdm
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    BertConfig,
    BertForSequenceClassification,
    BertTokenizer,
    BertModel,
    Trainer,
    TrainingArguments,
)

sns.set_theme()
os.environ['TOKENIZERS_PARALLELISM'] = 'false'
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

## Fixing the seed

In [3]:
RANDOM_SEED = 42
os.environ["SEED"] = str(RANDOM_SEED)
random.seed(RANDOM_SEED)
np.random.seed(RANDOM_SEED)
torch.manual_seed(RANDOM_SEED)

<torch._C.Generator at 0x7f95085df590>
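
For reference, the same seeding can be wrapped in one function that also covers the GPU generators explicitly. This is a minimal sketch with a hypothetical helper name, `seed_everything`. The `transformers` library also provides `set_seed`, which serves a similar purpose.

```python
import random

import numpy as np
import torch


def seed_everything(seed: int) -> None:
    """Seed Python, NumPy and PyTorch (CPU and all visible GPUs)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    # Safe to call without a GPU: it is silently ignored in that case
    torch.cuda.manual_seed_all(seed)


# Re-seeding must reproduce the same random draws
seed_everything(42)
first = torch.rand(3)
seed_everything(42)
second = torch.rand(3)
print(torch.equal(first, second))
```

Seeding makes the random draws repeatable, but some GPU kernels are non-deterministic, so two training runs may still differ slightly.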

## Downloading the dataset and the model

In [4]:
dataset = load_dataset("dair-ai/emotion")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")



### Tokenization

In [5]:
dataset

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 16000
    })
    validation: Dataset({
        features: ['text', 'label'],
        num_rows: 2000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 2000
    })
})

In [24]:
def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True, max_length=128)

In [25]:
tokenized_datasets = dataset.map(tokenize_function, batched=True)
tokenized_datasets = tokenized_datasets.rename_column("label", "labels")
tokenized_datasets.set_format("torch", columns=["text", "input_ids", "attention_mask", "labels"])

In [26]:
tokenized_datasets['train']['labels'].unique()

tensor([0, 1, 2, 3, 4, 5])

In [27]:
batch_size = 128

train_loader = DataLoader(tokenized_datasets['train'], shuffle=True, batch_size=batch_size, num_workers=8)
valid_loader = DataLoader(tokenized_datasets['validation'], shuffle=False, batch_size=batch_size, num_workers=8)
test_loader = DataLoader(tokenized_datasets['test'], shuffle=False, batch_size=batch_size, num_workers=8)


### Loading the metrics

In [110]:
accuracy_metric = evaluate.load("accuracy")
f1_metric = evaluate.load("f1")
precision_metric = evaluate.load("precision")
recall_metric = evaluate.load("recall")

def compute_metrics(eval_pred):
    logits, labels = eval_pred

    # If the model returns several outputs, the logits come first
    if isinstance(logits, tuple):
        logits = logits[0]

    if hasattr(logits, "numpy"):
        logits = logits.detach().cpu().numpy()

    predictions = np.argmax(logits, axis=-1)

    predictions = predictions.tolist()
    labels = labels.tolist() if hasattr(labels, "tolist") else list(labels)

    return {
        "accuracy": accuracy_metric.compute(predictions=predictions, references=labels)["accuracy"],
        "f1": f1_metric.compute(predictions=predictions, references=labels, average="macro")["f1"],
        "precision": precision_metric.compute(predictions=predictions, references=labels, average="macro")["precision"],
        "recall": recall_metric.compute(predictions=predictions, references=labels, average="macro")["recall"],
    }

### Measuring the baseline quality

In [35]:
config = BertConfig.from_pretrained("bert-base-uncased", num_labels=6)
base_model = BertForSequenceClassification.from_pretrained("bert-base-uncased", config=config)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [36]:
training_args = TrainingArguments(
    output_dir="./peft/baseline/",
    per_device_eval_batch_size=16,
    report_to="none"
)

trainer = Trainer(
    model=base_model,
    args=training_args,
    eval_dataset=tokenized_datasets["test"],
    compute_metrics=compute_metrics,
)

baseline = trainer.evaluate()
print("Baseline metrics:")
print('\n'.join(f"{key}: {value}" for key, value in baseline.items()))

Baseline metrics:
eval_loss: 1.8653087615966797
eval_model_preparation_time: 0.0032
eval_accuracy: 0.0795
eval_f1: 0.024651162790697675
eval_precision: 0.013309894525364139
eval_recall: 0.16666666666666666
eval_runtime: 12.5178
eval_samples_per_second: 159.772
eval_steps_per_second: 9.986




In [37]:
# df_results = pd.DataFrame()
data_results = {'exp':[], 'accuracy':[], 'f1': [], 'recall':[], 'precision':[], 'time_finetuning':[], 'num_params_finetuning':[], 'resources_used':[]}

In [38]:
full_memory = torch.cuda.memory_allocated()

In [39]:
data_results['exp'].append('baseline')
data_results['accuracy'].append(baseline['eval_accuracy'])
data_results['f1'].append(baseline['eval_f1'])
data_results['recall'].append(baseline['eval_recall'])
data_results['precision'].append(baseline['eval_precision'])
data_results['time_finetuning'].append(None)
data_results['num_params_finetuning'].append(None)
data_results['resources_used'].append(None)

## Full fine-tuning

In [40]:
torch.cuda.empty_cache()

In [41]:
config = BertConfig.from_pretrained(
    "bert-base-uncased",
    num_labels=6
)
model_full_finetune = BertForSequenceClassification.from_pretrained("bert-base-uncased", config=config)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [43]:
start_time = time.time()
training_args_full_finetune = TrainingArguments(
    output_dir="./peft/full_finetuning",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    eval_strategy="epoch",
    eval_steps=100,
    save_strategy="no",
    logging_dir="logs",
    logging_steps=100,
    learning_rate=2e-5,
)

trainer_full_finetune = Trainer(
    model=model_full_finetune,
    args=training_args_full_finetune,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    compute_metrics=compute_metrics,
)

trainer_full_finetune.train()


[34m[1mwandb[0m: Currently logged in as: [33msaddevans[0m to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


| Epoch | Training Loss | Validation Loss | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|---|---|
| 1 | 0.2176 | 0.181463 | 0.9285 | 0.905014 | 0.896678 | 0.9178 |
| 2 | 0.1132 | 0.142525 | 0.945 | 0.920597 | 0.927169 | 0.915841 |
| 3 | 0.0743 | 0.150835 | 0.94 | 0.913122 | 0.919748 | 0.908379 |


TrainOutput(global_step=3000, training_loss=0.23817086919148764, metrics={'train_runtime': 1033.1628, 'train_samples_per_second': 46.459, 'train_steps_per_second': 2.904, 'total_flos': 3157446057984000.0, 'train_loss': 0.23817086919148764, 'epoch': 3.0})

In [44]:
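test_results = trainer_full_finetune.predict(tokenized_datasets["test"])
full_finetune_results = compute_metrics((test_results.predictions, test_results.label_ids))
# Note: the timer stops after test-set prediction, so full_time also includes inference
full_time = time.time() - start_time
full_params = sum(p.numel() for p in model_full_finetune.parameters() if p.requires_grad)
# Memory currently held by tensors on the GPU (not the peak reached during training)
full_memory = torch.cuda.memory_allocated()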

### Measuring the quality

In [45]:
print("Full finetune metrics:")
print('\n'.join(f"{key}: {value}" for key, value in full_finetune_results.items()))

Full finetune metrics:
accuracy: 0.9265
f1: 0.8775309197994615
precision: 0.8880486119900928
recall: 0.8691991487569148


In [46]:
data_results['exp'].append('full_finetuning')
data_results['accuracy'].append(full_finetune_results['accuracy'])
data_results['f1'].append(full_finetune_results['f1'])
data_results['recall'].append(full_finetune_results['recall'])
data_results['precision'].append(full_finetune_results['precision'])
data_results['time_finetuning'].append(full_time)
data_results['num_params_finetuning'].append(full_params)
data_results['resources_used'].append(full_memory)

In [47]:
data_results

{'exp': ['baseline', 'full_finetuning'],
 'accuracy': [0.0795, 0.9265],
 'f1': [0.024651162790697675, 0.8775309197994615],
 'recall': [0.16666666666666666, 0.8691991487569148],
 'precision': [0.013309894525364139, 0.8880486119900928],
 'time_finetuning': [None, 1048.3201813697815],
 'num_params_finetuning': [None, 109486854],
 'resources_used': [None, 2212633600]}

## Linear probing

For the custom head I used Dropout for regularization, two linear layers for richer feature processing, and GELU, since it is the activation used in the original BERT.

In [101]:
torch.cuda.empty_cache()

In [102]:
class CustomHead(nn.Module):
    def __init__(self, hidden_size=768, num_labels=6):
        super().__init__()
        self.dropout = nn.Dropout(0.1)
        self.linear1 = nn.Linear(hidden_size, hidden_size//2)
        self.linear2 = nn.Linear(hidden_size//2, num_labels)
        self.gelu = nn.GELU()

    def forward(self, x):
        x = self.dropout(x)
        x = self.linear1(x)
        x = self.gelu(x)
        return self.linear2(x)

In [103]:
class BertWithCustomHead(nn.Module):
    def __init__(self, num_labels=6):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.classifier = CustomHead(num_labels=num_labels)

        for param in self.bert.parameters():
            param.requires_grad = False

    def forward(self, input_ids=None, attention_mask=None, labels=None, **kwargs):
        outputs = self.bert(
            input_ids=input_ids,
            attention_mask=attention_mask,
            return_dict=True
        )

        pooled_output = outputs.last_hidden_state[:, 0, :]

        logits = self.classifier(pooled_output)

        loss = None
        if labels is not None:
            loss_fct = nn.CrossEntropyLoss()
            loss = loss_fct(logits.view(-1, logits.size(-1)), labels.view(-1))

        return {'loss': loss, 'logits': logits}

In [104]:
model_lin_prob = BertWithCustomHead(num_labels=6)
model_lin_prob.to("cuda" if torch.cuda.is_available() else "cpu")


BertWithCustomHead(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSdpaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elemen

In [105]:
start_time = time.time()
training_args_lin_prob = TrainingArguments(
    output_dir="./peft/linear_probing",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    eval_strategy="epoch",
    eval_steps=100,
    save_strategy="no",
    logging_dir="logs",
    logging_steps=100,
    learning_rate=2e-5,
    remove_unused_columns=False,
)

trainer_lin_prob = Trainer(
    model=model_lin_prob,
    args=training_args_lin_prob,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    compute_metrics=compute_metrics,
)

trainer_lin_prob.train()
# Note: the full_* names are reused, so from here on they hold the linear probing measurements
full_time = time.time() - start_time
full_params = sum(p.numel() for p in model_lin_prob.parameters() if p.requires_grad)
full_memory = torch.cuda.memory_allocated()

| Epoch | Training Loss | Validation Loss | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|---|---|
| 1 | 1.5167 | 1.481176 | 0.483 | 0.193693 | 0.159338 | 0.250767 |
| 2 | 1.4509 | 1.420908 | 0.5015 | 0.203327 | 0.165405 | 0.263902 |
| 3 | 1.4184 | 1.404785 | 0.5025 | 0.203993 | 0.166062 | 0.264839 |




In [106]:
test_results_lin_prob = trainer_lin_prob.predict(tokenized_datasets["test"])
lin_prob_results = compute_metrics((test_results_lin_prob.predictions, test_results_lin_prob.label_ids))



### Measuring the quality

In [107]:
print("Linear probing metrics:")
print('\n'.join(f"{key}: {value}" for key, value in lin_prob_results.items()))

Linear probing metrics:
accuracy: 0.507
f1: 0.2049649242707401
precision: 0.16789501320773925
recall: 0.2631161851929816


In [108]:
data_results['exp'].append('lin_probing')
data_results['accuracy'].append(lin_prob_results['accuracy'])
data_results['f1'].append(lin_prob_results['f1'])
data_results['recall'].append(lin_prob_results['recall'])
data_results['precision'].append(lin_prob_results['precision'])
data_results['time_finetuning'].append(full_time)
data_results['num_params_finetuning'].append(full_params)
data_results['resources_used'].append(full_memory)

In [109]:
data_results

{'exp': ['baseline', 'full_finetuning', 'lin_probing'],
 'accuracy': [0.0795, 0.9265, 0.507],
 'f1': [0.024651162790697675, 0.8775309197994615, 0.2049649242707401],
 'recall': [0.16666666666666666, 0.8691991487569148, 0.2631161851929816],
 'precision': [0.013309894525364139, 0.8880486119900928, 0.16789501320773925],
 'time_finetuning': [None, 1048.3201813697815, 372.8968884944916],
 'num_params_finetuning': [None, 109486854, 297606],
 'resources_used': [None, 2212633600, 4417774592]}
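
The linear probing value of 297606 trainable parameters can be checked by hand from the layer sizes of `CustomHead`. The short calculation below is only an illustration; `linear_params` is a hypothetical helper.

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


hidden_size, num_labels = 768, 6
head_params = (
    linear_params(hidden_size, hidden_size // 2)   # linear1: 768 -> 384
    + linear_params(hidden_size // 2, num_labels)  # linear2: 384 -> 6
)
print(head_params)  # 297606
```

The match confirms that only the head is being trained and that all BERT weights are frozen.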

## PEFT Prompt Tuning

In [116]:
torch.cuda.empty_cache()

In [117]:
class PromptTuning(torch.nn.Module):
    def __init__(self, model, prompt_length=20):
        super().__init__()
        self.model = model
        self.prompt_embeddings = torch.nn.Embedding(prompt_length, model.config.hidden_size)
        self.prompt_embeddings.weight.data.uniform_()

    def forward(self, input_ids, attention_mask=None, labels=None):
        inputs_embeds = self.model.get_input_embeddings()(input_ids)
        prompt_embeds = self.prompt_embeddings.weight.repeat(input_ids.shape[0], 1, 1)
        inputs_embeds = torch.cat([prompt_embeds, inputs_embeds], dim=1)
        attention_mask = torch.cat([
            torch.ones(input_ids.shape[0], prompt_embeds.shape[1]).to(attention_mask.device),
            attention_mask
        ], dim=1)
        return self.model(inputs_embeds=inputs_embeds, attention_mask=attention_mask, labels=labels)



In [118]:
config = BertConfig.from_pretrained(
    "bert-base-uncased",
    num_labels=6
)
model_prompt = BertForSequenceClassification.from_pretrained("bert-base-uncased", config=config)
prompt_model = PromptTuning(model_prompt)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [119]:
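# Freeze everything, including the newly initialized classifier head of the wrapped model
for param in prompt_model.parameters():
    param.requires_grad = False
# Re-enable gradients for the prompt embeddings only
for param in prompt_model.prompt_embeddings.parameters():
    param.requires_grad = True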
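
The loop above leaves only the prompt embeddings trainable, so the randomly initialized `classifier` of `BertForSequenceClassification` stays frozen as well. One variant worth trying keeps the head trainable alongside the prompt. The sketch below shows the idea on a toy module rather than on BERT; `freeze_except` and `ToyPromptModel` are hypothetical names, and this variant was not measured in the notebook.

```python
import torch
from torch import nn


def freeze_except(model: nn.Module, trainable_prefixes: tuple) -> int:
    """Freeze every parameter whose name does not start with one of the prefixes.

    Returns the number of parameters left trainable.
    """
    trainable = 0
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(trainable_prefixes)
        if param.requires_grad:
            trainable += param.numel()
    return trainable


class ToyPromptModel(nn.Module):
    """Toy stand-in: an encoder, a task head and a learnable prompt."""

    def __init__(self, hidden_size: int = 8, prompt_length: int = 4, num_labels: int = 6):
        super().__init__()
        self.encoder = nn.Linear(hidden_size, hidden_size)    # stands in for BERT
        self.classifier = nn.Linear(hidden_size, num_labels)  # task head
        self.prompt_embeddings = nn.Embedding(prompt_length, hidden_size)


toy = ToyPromptModel()
n_trainable = freeze_except(toy, ("prompt_embeddings", "classifier"))
print(n_trainable)  # 4*8 prompt weights + (8*6 + 6) head parameters
```

For `prompt_model` the head is nested inside the wrapped model, so the matching prefixes would be `"prompt_embeddings"` and `"model.classifier"`.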

In [120]:
start_time = time.time()
training_args_prompt = TrainingArguments(
    output_dir="./peft/prompt",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    eval_strategy="epoch",
    eval_steps=100,
    save_strategy="no",
    logging_dir="logs",
    logging_steps=100,
    learning_rate=2e-5,
)


trainer_prompt = Trainer(
    model=prompt_model,
    args=training_args_prompt,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    compute_metrics=compute_metrics,
)

trainer_prompt.train()
prompt_time = time.time() - start_time
prompt_params = sum(p.numel() for p in prompt_model.parameters() if p.requires_grad)
prompt_memory = torch.cuda.memory_allocated()

| Epoch | Training Loss | Validation Loss | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|---|---|
| 1 | 1.7176 | 1.702291 | 0.352 | 0.086785 | 0.058667 | 0.166667 |
| 2 | 1.7127 | 1.69656 | 0.352 | 0.086785 | 0.058667 | 0.166667 |
| 3 | 1.7059 | 1.694942 | 0.352 | 0.086785 | 0.058667 | 0.166667 |




The hyperparameters should have been tuned better, but I kept the same ones as in the other experiments for the sake of comparison.

### Measuring the quality

In [125]:
test_results_prompt = trainer_prompt.predict(tokenized_datasets["test"])
prompt_results = compute_metrics((test_results_prompt.predictions, test_results_prompt.label_ids))




In [126]:
print("Prompt tuning metrics:")
print('\n'.join(f"{key}: {value}" for key, value in prompt_results.items()))

Prompt tuning metrics:
accuracy: 0.3475
f1: 0.08596165739022882
precision: 0.057916666666666665
recall: 0.16666666666666666
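
These numbers follow a recognizable pattern: macro recall is exactly 1/6 and macro precision equals accuracy divided by 6. That is consistent with a classifier that outputs the same label for every sample. The sketch below reproduces the pattern on synthetic labels. The class counts are an illustrative assumption, chosen so that one class covers 34.75% of 2000 samples, and `macro_metrics` is a hypothetical helper.

```python
import numpy as np


def macro_metrics(y_true: np.ndarray, y_pred: np.ndarray, num_classes: int) -> dict:
    """Accuracy and macro-averaged precision/recall/F1; undefined ratios count as 0."""
    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        predicted = np.sum(y_pred == c)
        actual = np.sum(y_true == c)
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return {
        "accuracy": float(np.mean(y_true == y_pred)),
        "precision": float(np.mean(precisions)),
        "recall": float(np.mean(recalls)),
        "f1": float(np.mean(f1s)),
    }


# Illustrative class counts (an assumption): 2000 samples, class 1 covers 34.75%
counts = [581, 695, 159, 275, 224, 66]
y_true = np.repeat(np.arange(6), counts)
y_pred = np.full_like(y_true, 1)  # always predict class 1

constant_metrics = macro_metrics(y_true, y_pred, num_classes=6)
print({k: round(v, 4) for k, v in constant_metrics.items()})
```

Macro averaging gives each class equal weight, so a single-class predictor scores poorly on F1 even when its accuracy looks moderate.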


In [129]:
data_results['exp'].append('prompt_tuning')
data_results['accuracy'].append(prompt_results['accuracy'])
data_results['f1'].append(prompt_results['f1'])
data_results['recall'].append(prompt_results['recall'])
data_results['precision'].append(prompt_results['precision'])
data_results['time_finetuning'].append(prompt_time)
data_results['num_params_finetuning'].append(prompt_params)
data_results['resources_used'].append(prompt_memory)

## LoRA tuning

In [149]:
config = BertConfig.from_pretrained(
    "bert-base-uncased",
    num_labels=6
)
model_lora = BertForSequenceClassification.from_pretrained("bert-base-uncased", config=config)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [150]:
lora_config = LoraConfig(
    r=16, # gives a good balance between quality and parameter count
    lora_alpha=32, # keeps the scale of the updates
    target_modules=["query", "value"], # the modules most important for the task
    lora_dropout=0.1, # regularization
    bias="none",
    modules_to_save=["classifier"],
)
model_lora = get_peft_model(model_lora, lora_config)
# model_lora
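
To make the roles of `r` and `lora_alpha` concrete, here is a minimal sketch of a LoRA-style linear layer. It is a simplified illustration and not the `peft` implementation: `LoRALinear` is a hypothetical class, dropout is omitted, and the initialization of `lora_A` is an assumption. The frozen weight is augmented with a low-rank product scaled by `alpha / r`.

```python
import torch
from torch import nn


class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update scaled by alpha / r."""

    def __init__(self, base: nn.Linear, r: int = 16, alpha: int = 32):
        super().__init__()
        self.base = base
        for param in self.base.parameters():
            param.requires_grad = False  # pretrained weights stay fixed
        # A projects down to rank r, B projects back up to the output size
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        # Zero init for B: the layer behaves exactly like the base layer at the start
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        update = x @ self.lora_A.T @ self.lora_B.T
        return self.base(x) + self.scaling * update


layer = LoRALinear(nn.Linear(768, 768), r=16, alpha=32)
x = torch.randn(2, 768)
same_at_init = torch.allclose(layer(x), layer.base(x))
added_params = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(same_at_init, added_params)
```

Each adapted 768x768 projection adds 2 * 16 * 768 = 24576 trainable parameters. With `query` and `value` adapted in all 12 layers that is 589824, and adding the 4614 classifier parameters gives 594438, which matches the `num_params_finetuning` value reported for LoRA in the results table.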

In [153]:
training_args_lora = TrainingArguments(
    output_dir="./peft/lora",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    eval_strategy="epoch",
    eval_steps=100,
    save_strategy="no",
    logging_dir="logs",
    logging_steps=100,
    learning_rate=2e-5,
    label_names=["labels"]
)

In [154]:
start_time = time.time()

trainer_lora = Trainer(
    model=model_lora,
    args=training_args_lora,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    compute_metrics=compute_metrics,
)

trainer_lora.train()
lora_time = time.time() - start_time
lora_params = sum(p.numel() for p in model_lora.parameters() if p.requires_grad)
lora_memory = torch.cuda.memory_allocated()

| Epoch | Training Loss | Validation Loss | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|---|---|
| 1 | 1.3124 | 1.228146 | 0.55 | 0.225906 | 0.187646 | 0.292964 |
| 2 | 1.1749 | 1.119806 | 0.577 | 0.237714 | 0.198968 | 0.30767 |
| 3 | 1.117 | 1.095488 | 0.5805 | 0.239497 | 0.201073 | 0.309593 |




### Measuring the quality

In [155]:
test_results_lora = trainer_lora.predict(tokenized_datasets["test"])
lora_results = compute_metrics((test_results_lora.predictions, test_results_lora.label_ids))




In [156]:
print("LORA tuning metrics:")
print('\n'.join(f"{key}: {value}" for key, value in lora_results.items()))

LORA tuning metrics:
accuracy: 0.5895
f1: 0.24242362916258164
precision: 0.2051268167510324
recall: 0.30842506717517554


In [157]:
data_results['exp'].append('lora_tuning')
data_results['accuracy'].append(lora_results['accuracy'])
data_results['f1'].append(lora_results['f1'])
data_results['recall'].append(lora_results['recall'])
data_results['precision'].append(lora_results['precision'])
data_results['time_finetuning'].append(lora_time)
data_results['num_params_finetuning'].append(lora_params)
data_results['resources_used'].append(lora_memory)

## Conclusions

In [159]:
df_results = pd.DataFrame(data_results)
df_results

| | exp | accuracy | f1 | recall | precision | time_finetuning (s) | num_params_finetuning | resources_used (bytes) |
|---|---|---|---|---|---|---|---|---|
| 0 | baseline | 0.0795 | 0.024651 | 0.166667 | 0.01331 | | | |
| 1 | full_finetuning | 0.9265 | 0.877531 | 0.869199 | 0.888049 | 1048.320181 | 109486854.0 | 2212634000.0 |
| 2 | lin_probing | 0.507 | 0.204965 | 0.263116 | 0.167895 | 372.896888 | 297606.0 | 4417775000.0 |
| 3 | prompt_tuning | 0.3475 | 0.085962 | 0.166667 | 0.057917 | 853.139068 | 15360.0 | 4417775000.0 |
| 4 | lora_tuning | 0.5895 | 0.242424 | 0.308425 | 0.205127 | 733.574782 | 594438.0 | 5311043000.0 |
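
To put the table in relative terms, the short calculation below expresses each method's trainable parameters and training time as a fraction of full fine-tuning. It only reuses the numbers from the table; the variable names are illustrative.

```python
full_params = 109_486_854  # trainable parameters in the full fine-tuning row
full_time = 1048.320181    # seconds, full fine-tuning row

# (trainable parameters, training time in seconds) copied from the table
runs = {
    "lin_probing": (297_606, 372.896888),
    "prompt_tuning": (15_360, 853.139068),
    "lora_tuning": (594_438, 733.574782),
}

summary = {}
for name, (params, seconds) in runs.items():
    summary[name] = {
        "params_pct": 100 * params / full_params,
        "time_ratio": seconds / full_time,
    }
    print(f"{name}: {summary[name]['params_pct']:.3f}% of parameters, "
          f"{summary[name]['time_ratio']:.2f}x the training time")
```

All three methods train well under 1% of the parameters, yet the time savings are much smaller, because the forward pass through the whole of BERT is still executed on every step.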



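- Full fine-tuning gives the best quality by a wide margin (test accuracy 0.9265, macro F1 0.878), but it updates all 109.5M parameters and takes the longest to train (about 1048 s). The `resources_used` column is not a reliable memory comparison: `torch.cuda.memory_allocated()` was read after each run while models from earlier experiments were still referenced, so the later values are most likely cumulative.
- Linear probing is the fastest (about 373 s) but falls far behind full fine-tuning in quality (accuracy 0.507, macro F1 0.205). It clearly needs longer training and better-chosen hyperparameters.
- Prompt tuning showed the worst results (accuracy 0.3475, macro F1 0.086), although it trains the fewest parameters (15360) and finishes faster than full fine-tuning. In training time it sits between LoRA and full fine-tuning. The prompt length and the other hyperparameters need finer tuning.
- LoRA fine-tuning lands between linear probing and full fine-tuning in both quality (accuracy 0.5895) and training time (about 734 s). With better-chosen hyperparameters it should be possible to reach quality comparable to full fine-tuning at a lower resource cost.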

