# Homework 2 description

In seminar 5, propose different ways to improve the F1-score by varying the preprocessing, the models, the number of trainable layers in them, and so on.
**Use the same data and the same train/test split** (random_state is fixed in my seminar notebook).
Show all of your experiments in the code.

The maximum score for this assignment is 10 points.

Grading:
*   **Reproducibility and code readability - 3 points** (everything reproduces and is clear to the grader - 3 points; some parts are unclear but everything reproduces - 2 points; unclear code and/or it reproduces only after a small fix - 1 point; unclear code and/or nothing reproduces - 0 points).
*   **Technical report - 5 points** (1 point for each described method with its resulting metric, provided the method is correct; a full score therefore requires testing at least 5 methods).
*   **Innovation - 2 points** (you use a non-trivial preprocessing step, loss function, or model and can explain your result - 2 points; you use one but cannot explain your result - 1 point; you simply try out off-the-shelf models - 0 points).

!!! The homework must be done in Google Colab only !!!

Send to llmrisks@yandex.ru with the homework number and your full name in the subject line. Each homework goes in a separate email with its own subject.


# 1. Submission information

**Мартин Михаил Алексеевич**

# 2. Technical report

The results of the experiments are summarized in the table:

№ | Method | F1 Average |
--- | --- | --- |
1 | RuBERT-base, only the final layers trained | 0.664 |
2 | RuBERT-base, all layers trained | 0.790 |
3 | RuBERT-base, encoder and final layers trained | 0.796 |
4 | DeBERTa, only the final layers trained | 0.534 |
5 | DeBERTa, all layers trained | 0.760 |
6 | DeBERTa, encoder and final layers trained | 0.782 |
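
The F1 values are produced by `seqeval` (see `compute_metrics` in Section 3.2), which scores whole entities rather than individual tokens. The following is a simplified pure-Python sketch of that idea, not the `seqeval` implementation itself; the helper names `extract_entities` and `entity_f1` are made up for illustration, and edge-case handling may differ from the library.

```python
def extract_entities(tags):
    """Return a set of (entity_type, start, end) spans from one BIO tag sequence."""
    entities = set()
    start, current = None, None
    # A trailing "O" acts as a sentinel that closes an entity ending at the last tag.
    for i, tag in enumerate(list(tags) + ["O"]):
        prefix, _, etype = tag.partition("-")
        continues = prefix == "I" and etype == current
        if not continues:
            if current is not None:
                entities.add((current, start, i))
                start, current = None, None
            if prefix in ("B", "I"):
                start, current = i, etype
    return entities


def entity_f1(references, predictions):
    """Micro-averaged entity-level precision, recall and F1 over all sentences."""
    tp = n_pred = n_true = 0
    for ref, pred in zip(references, predictions):
        true_spans, pred_spans = extract_entities(ref), extract_entities(pred)
        tp += len(true_spans & pred_spans)  # exact match of type and boundaries
        n_true += len(true_spans)
        n_pred += len(pred_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_true if n_true else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example: the second entity is predicted with a wrong right boundary.
reference = [["B-Drugname", "O", "B-DI", "I-DI"]]
predicted = [["B-Drugname", "O", "B-DI", "O"]]
precision, recall, f1 = entity_f1(reference, predicted)
print(precision, recall, f1)  # 0.5 0.5 0.5
```

A partially matched entity counts as both a false positive and a false negative, which is why entity-level F1 is usually much lower than token accuracy in the training logs.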

The methodology compares fine-tuning only some of the layers against fine-tuning all layers of the model, together with adjusting the training hyperparameters.

1. The first model is **RuBERT-base** (DeepPavlov/rubert-base-cased), a BERT variant pretrained on a Russian text corpus. During training only the weights of the final layers (the classification head) were updated, while the weights of the backbone stayed unchanged.

2. This experiment also used **RuBERT-base**. All model weights were trained for 10 epochs with a learning rate of 2e-5.

3. This experiment again used **RuBERT-base**, but only the encoder and the final layers were updated, while the embedding layers were frozen. The approach assumes that the embeddings are already well trained, because the model was pretrained on a corpus of Russian texts.

4. The second model is **DeBERTa** (deepvk/deberta-v1-base). The first experiment with it trained only the final layers.

5. Then, as in the second experiment, all layers of **DeBERTa** were trained.

6. This experiment again used **DeBERTa**, updating only the encoder and the final layers while keeping the embedding layers frozen. As before, the assumption is that the embeddings are already well trained thanks to pretraining on a corpus of Russian texts.


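Conclusions:

1. Exposing the model to domain-specific text improves its predictions on similar data: fine-tuning the backbone raised F1 well above the frozen-backbone baseline for both models (0.664 to 0.790 for RuBERT-base, 0.534 to 0.760 for DeBERTa).

2. For models pretrained on large text corpora, freezing the embedding layers during fine-tuning reduces the number of parameters that have to be updated and, in these experiments, gave a higher F1 than training all layers (0.796 vs 0.790 for RuBERT-base, 0.782 vs 0.760 for DeBERTa). A likely explanation is that the pretrained embeddings are of higher quality than those obtained by fine-tuning on a small domain-specific corpus. The frozen-embedding runs also used a higher learning rate (1e-4 and 5e-5 versus 2e-5), so the gain cannot be attributed to freezing alone.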
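
The freezing strategies compared above all come down to switching `requires_grad` off for a group of parameters selected by name. A minimal sketch of that idea follows; `freeze_by_prefix` and `make_toy_params` are hypothetical helpers, and the toy parameters are plain stand-in objects (not real tensors) so that the snippet runs without downloading a model.

```python
from types import SimpleNamespace


def freeze_by_prefix(named_parameters, frozen_prefixes):
    """Disable gradients for parameters whose name starts with one of the prefixes.

    `named_parameters` is an iterable of (name, parameter) pairs, the same shape
    that `model.named_parameters()` yields in PyTorch.
    Returns the names of the parameters that remain trainable.
    """
    prefixes = tuple(frozen_prefixes)
    trainable = []
    for name, param in named_parameters:
        if name.startswith(prefixes):
            param.requires_grad = False
        else:
            trainable.append(name)
    return trainable


def make_toy_params():
    # Stand-ins for a few BERT token-classification parameters (illustrative subset).
    names = [
        "bert.embeddings.word_embeddings.weight",
        "bert.encoder.layer.0.attention.self.query.weight",
        "bert.encoder.layer.11.output.dense.weight",
        "classifier.weight",
    ]
    return [(name, SimpleNamespace(requires_grad=True)) for name in names]


# Experiments 1 and 4: freeze the whole backbone, train only the head.
print(freeze_by_prefix(make_toy_params(), ["bert."]))

# Experiments 3 and 6: freeze only the embeddings, train encoder and head.
print(freeze_by_prefix(make_toy_params(), ["bert.embeddings."]))
```

With a real model, the same helper could be called as `freeze_by_prefix(model.named_parameters(), ["bert.embeddings."])`; for DeBERTa the prefix would start with `deberta.` instead.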

# 3. *Code*

## 3.1 Environment setup

In [None]:
# pyarrow and datasets are pinned; load_metric is not available in datasets 3.x
!pip install -q pyarrow==15.0.0
!pip install -q --upgrade accelerate transformers
!pip install -q seqeval corus razdel
!pip install -q datasets==2.21.0

In [2]:
!wget -q https://github.com/cimm-kzn/RuDReC/raw/master/data/rudrec_annotated.json

In [3]:
from collections import Counter, defaultdict
from functools import partial
import logging

import torch
import numpy as np
import pandas as pd

from razdel import tokenize
from corus import load_rudrec
from datasets import Dataset, DatasetDict, load_metric

from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    DataCollatorForTokenClassification,
    Trainer,
    TrainingArguments,
)

from sklearn.model_selection import train_test_split

from transformers.trainer import logger as noisy_logger
noisy_logger.setLevel(logging.WARNING)

## 3.2 Functions

In [4]:
def tokenize_and_align_labels(examples, tokenizer, label_all_tokens=False):

    tokenized_inputs = tokenizer(
        examples["tokens"],
        truncation=True,
        is_split_into_words=True,
    )

    labels = []
    for i, label in enumerate(examples["tags"]):
        word_ids = tokenized_inputs.word_ids(batch_index=i)
        previous_word_idx = None
        label_ids = []
        for word_idx in word_ids:
            # Special tokens have a word id that is None.
            # We set the label to -100 so they are automatically ignored in the loss function.
            if word_idx is None:
                label_ids.append(-100)
            # We set the label for the first token of each word.
            elif word_idx != previous_word_idx:
                label_ids.append(label[word_idx])
            # For the other tokens in a word, we set the label
            # to either the current label or -100, depending on
            # the label_all_tokens flag.
            else:
                label_ids.append(label[word_idx] if label_all_tokens else -100)

            previous_word_idx = word_idx

        label_ids = [
            label_list.index(idx) if isinstance(idx, str) else idx
            for idx in label_ids
        ]

        labels.append(label_ids)

    tokenized_inputs["labels"] = labels

    return tokenized_inputs


def compute_metrics(p, tokenizer):
    # Entity-level precision/recall/F1 computed with seqeval.
    metric = load_metric("seqeval", trust_remote_code=True)

    labels, inputs = p.label_ids, p.inputs
    predictions = np.argmax(p.predictions, axis=2)

    true_predictions = []
    true_labels = []
    for prediction, label, tokens in zip(predictions, labels, inputs):
        true_predictions.append([])
        true_labels.append([])

        for pred_id, label_id, token_id in zip(prediction, label, tokens):
            # Skip ignored positions (-100) and WordPiece continuation tokens ("##...").
            if label_id != -100 and not tokenizer.convert_ids_to_tokens(int(token_id)).startswith("##"):
                true_predictions[-1].append(label_list[pred_id])
                true_labels[-1].append(label_list[label_id])

    results = metric.compute(
        predictions=true_predictions,
        references=true_labels,
        zero_division=0,
    )

    return {
        "precision": results["overall_precision"],
        "recall": results["overall_recall"],
        "f1": results["overall_f1"],
        "accuracy": results["overall_accuracy"],
    }

def get_trainer(
    model,
    tokenizer,
    datasets: dict,
    compute_metrics: "function",
    learning_rate: float = 2e-5,
    batch_size: int = 16,
    num_epochs: int = 10,
):
    data_collator = DataCollatorForTokenClassification(tokenizer)

    args = TrainingArguments(
        "ner",
        evaluation_strategy="epoch",
        learning_rate=learning_rate,
        per_device_train_batch_size=batch_size,
        per_device_eval_batch_size=batch_size,
        num_train_epochs=num_epochs,
        weight_decay=0.01,
        save_strategy="no",
        report_to="none",
        include_inputs_for_metrics=True,
    )

    return Trainer(
        model,
        args,
        train_dataset=datasets["train"],
        eval_dataset=datasets["test"],
        data_collator=data_collator,
        tokenizer=tokenizer,  # use the arguments instead of the notebook globals
        compute_metrics=compute_metrics,
    )


def extract_labels(item):

    raw_tokens = list(tokenize(item.text))

    words = [tok.text for tok in raw_tokens]
    word_labels = ["O"] * len(raw_tokens)
    char2word = [None] * len(item.text)

    for i, word in enumerate(raw_tokens):
        char2word[word.start:word.stop] = [i] * len(word.text)

    for e in item.entities:
        e_words = sorted({idx for idx in char2word[e.start:e.end] if idx is not None})
        word_labels[e_words[0]] = "B-" + e.entity_type
        for idx in e_words[1:]:
            word_labels[idx] = "I-" + e.entity_type

    return {"tokens": words, "tags": word_labels}
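
The label-alignment rule inside `tokenize_and_align_labels` can be isolated and checked without a tokenizer. The sketch below assumes a hand-written `word_ids` list in place of `tokenized_inputs.word_ids(...)`; the subword split shown in the comment is hypothetical, and `align_labels` is an illustrative name.

```python
IGNORE_INDEX = -100  # positions with this label are skipped by the loss


def align_labels(word_ids, word_labels, label_all_tokens=False):
    """Map word-level label ids onto subword tokens.

    word_ids[i] is the index of the word that token i belongs to,
    or None for special tokens such as [CLS] and [SEP].
    """
    aligned = []
    previous = None
    for word_idx in word_ids:
        if word_idx is None:
            aligned.append(IGNORE_INDEX)            # special token
        elif word_idx != previous:
            aligned.append(word_labels[word_idx])   # first subword of a word
        else:
            # continuation subword: labelled only if label_all_tokens is set
            aligned.append(word_labels[word_idx] if label_all_tokens else IGNORE_INDEX)
        previous = word_idx
    return aligned


# Hypothetical split: [CLS] Ви ##фер ##он помог [SEP]
word_ids = [None, 0, 0, 0, 1, None]
word_labels = [5, 0]  # indices of "B-Drugname" and "O" in label_list

print(align_labels(word_ids, word_labels))
# [-100, 5, -100, -100, 0, -100]
print(align_labels(word_ids, word_labels, label_all_tokens=True))
# [-100, 5, 5, 5, 0, -100]
```

With the default `label_all_tokens=False`, each word contributes exactly one labelled position, so the metric is computed per word rather than per subword.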

## 3.3 Data processing

In [5]:
drugs = list(load_rudrec("rudrec_annotated.json"))

# Dataset contents: entity counts and the most frequent mentions per type
type2text = defaultdict(Counter)
ents = Counter()

for item in drugs:
    for e in item.entities:
        ents[e.entity_type] += 1
        type2text[e.entity_type][e.entity_text] += 1

for k, v in ents.most_common():
    print(k, v)
    print(type2text[k].most_common(3))

DI 1401
[('простуды', 64), ('ОРВИ', 47), ('профилактики', 42)]
Drugname 1043
[('Виферон', 33), ('Анаферон', 25), ('Циклоферон', 24)]
Drugform 836
[('таблетки', 154), ('таблеток', 79), ('свечи', 63)]
ADR 720
[('аллергия', 16), ('слабость', 13), ('диарея', 12)]
Drugclass 330
[('противовирусный', 21), ('противовирусное', 18), ('противовирусных', 13)]
Finding 236
[('аллергии', 12), ('температуры', 6), ('сонливости', 5)]


In [6]:
# List of all label types
label_list = [
    "O", "B-ADR", "B-DI", "B-Drugclass", "B-Drugform", "B-Drugname", "B-Finding",
    "I-ADR", "I-DI", "I-Drugclass", "I-Drugform", "I-Drugname", "I-Finding",
]

ner_data = [extract_labels(item) for item in drugs]
ner_train, ner_test = train_test_split(ner_data, test_size=0.2, random_state=1)

ner_data = DatasetDict({
    "train": Dataset.from_pandas(pd.DataFrame(ner_train)),
    "test" : Dataset.from_pandas(pd.DataFrame(ner_test))
})
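
`extract_labels` turns character-level entity spans into word-level BIO tags. The sketch below reproduces that conversion on a toy sentence; it swaps `razdel` for a simple regular-expression tokenizer so it has no dependencies, and `spans_to_bio` and the example sentences are made up for illustration. Unlike the notebook function, it skips spans that cover no word instead of raising an error.

```python
import re


def spans_to_bio(text, entities):
    """Convert (start, end, entity_type) character spans into word-level BIO tags."""
    # Words and punctuation marks with their character offsets.
    words = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]
    tags = ["O"] * len(words)
    for start, end, etype in entities:
        # A word belongs to the entity if its character range overlaps the span.
        covered = [i for i, (_, w_start, w_end) in enumerate(words)
                   if w_start < end and w_end > start]
        if not covered:
            continue
        tags[covered[0]] = "B-" + etype
        for i in covered[1:]:
            tags[i] = "I-" + etype
    return [w for w, _, _ in words], tags


tokens, tags = spans_to_bio(
    "Виферон помог от простуды.",
    [(0, 7, "Drugname"), (17, 25, "DI")],
)
print(list(zip(tokens, tags)))

# A multi-word entity gets one B- tag followed by I- tags.
print(spans_to_bio("сильная аллергия", [(0, 16, "ADR")])[1])
# ['B-ADR', 'I-ADR']
```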

## 3.4 Experiments

### Models based on DeepPavlov/rubert-base-cased

In [7]:
model_checkpoint = "DeepPavlov/rubert-base-cased"
base_tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

tokenized_datasets = ner_data.map(partial(tokenize_and_align_labels, tokenizer=base_tokenizer), batched=True)

eval_metrics = partial(compute_metrics, tokenizer=base_tokenizer)

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.

#### Training only the final layer

In [8]:
model = AutoModelForTokenClassification.from_pretrained(model_checkpoint, num_labels=len(label_list))

model.config.id2label = dict(enumerate(label_list))
model.config.label2id = {v: k for k, v in model.config.id2label.items()}

trainer = get_trainer(
    model=model,
    tokenizer=base_tokenizer,
    datasets=tokenized_datasets,
    compute_metrics=eval_metrics,
    learning_rate=1e-3,
)

# Freeze the whole BERT backbone; only the classification head is trained
for param in model.bert.parameters():
    param.requires_grad = False

trainer.train()
trainer.evaluate()

Some weights of BertForTokenClassification were not initialized from the model checkpoint at DeepPavlov/rubert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Using `include_inputs_for_metrics` is deprecated and will be removed in version 5 of 🤗 Transformers. Please use `include_for_metrics` list argument instead.
  return Trainer(


Epoch,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy
1,No log,0.204489,0.630996,0.542283,0.583286,0.941568
2,No log,0.183541,0.637874,0.608879,0.623039,0.947108
3,0.264800,0.174511,0.656895,0.639535,0.648099,0.949324
4,0.264800,0.171017,0.648649,0.659619,0.654088,0.950211
5,0.171000,0.169964,0.631263,0.665962,0.648148,0.949767
6,0.171000,0.166649,0.643655,0.67019,0.656655,0.950949
7,0.162100,0.164484,0.663857,0.665962,0.664908,0.951836
8,0.162100,0.164182,0.648621,0.671247,0.65974,0.951688
9,0.154600,0.163329,0.649538,0.668076,0.658676,0.95191
10,0.154600,0.163223,0.651139,0.664905,0.65795,0.951983


{'eval_loss': 0.16322259604930878,
 'eval_precision': 0.6511387163561076,
 'eval_recall': 0.6649048625792812,
 'eval_f1': 0.6579497907949791,
 'eval_accuracy': 0.9519834527591047,
 'eval_runtime': 3.0427,
 'eval_samples_per_second': 316.163,
 'eval_steps_per_second': 20.048,
 'epoch': 10.0}
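
The log above shows that F1 does not improve monotonically, so the last epoch is not necessarily the best one. A small sketch of picking the best epoch from such a log follows; the values are copied from the table above, and `f1_by_epoch` and `best_epoch` are illustrative names.

```python
# F1 per epoch, copied from the training log of this run.
f1_by_epoch = {
    1: 0.583286, 2: 0.623039, 3: 0.648099, 4: 0.654088, 5: 0.648148,
    6: 0.656655, 7: 0.664908, 8: 0.65974, 9: 0.658676, 10: 0.65795,
}

best_epoch = max(f1_by_epoch, key=f1_by_epoch.get)
last_epoch = max(f1_by_epoch)

print(f"best epoch: {best_epoch}, F1 = {f1_by_epoch[best_epoch]:.4f}")
print(f"last epoch: {last_epoch}, F1 = {f1_by_epoch[last_epoch]:.4f}")
print(f"gap: {f1_by_epoch[best_epoch] - f1_by_epoch[last_epoch]:.4f}")
```

With `Trainer`, a similar effect can usually be obtained by setting `load_best_model_at_end=True` and `metric_for_best_model="f1"` in `TrainingArguments`, which requires checkpoint saving to be enabled. The runs in this notebook use `save_strategy="no"`, so `trainer.evaluate()` reports the model from the last epoch.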

#### Training all layers

In [9]:
model = AutoModelForTokenClassification.from_pretrained(model_checkpoint, num_labels=len(label_list))

model.config.id2label = dict(enumerate(label_list))
model.config.label2id = {v: k for k, v in model.config.id2label.items()}

trainer = get_trainer(
    model=model,
    tokenizer=base_tokenizer,
    datasets=tokenized_datasets,
    compute_metrics=eval_metrics,
)

trainer.train()
trainer.evaluate()


Epoch,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy
1,No log,0.157717,0.595004,0.705074,0.64538,0.953018
2,No log,0.128778,0.713186,0.783298,0.746599,0.965059
3,0.186100,0.128463,0.724239,0.780127,0.751145,0.964985
4,0.186100,0.137379,0.742126,0.79704,0.768603,0.96661
5,0.057000,0.148277,0.771894,0.801268,0.786307,0.967792
6,0.057000,0.164322,0.744786,0.792812,0.768049,0.966832
7,0.025900,0.16704,0.748054,0.812896,0.779129,0.966758
8,0.025900,0.178566,0.745402,0.813953,0.778171,0.966167
9,0.012300,0.179171,0.755162,0.811839,0.782476,0.966979
10,0.012300,0.181092,0.765114,0.816068,0.78977,0.967866


{'eval_loss': 0.1810922920703888,
 'eval_precision': 0.7651139742319127,
 'eval_recall': 0.8160676532769556,
 'eval_f1': 0.789769820971867,
 'eval_accuracy': 0.9678658491541701,
 'eval_runtime': 2.9746,
 'eval_samples_per_second': 323.406,
 'eval_steps_per_second': 20.507,
 'epoch': 10.0}

#### Training the encoder and the final layers

In [10]:
model = AutoModelForTokenClassification.from_pretrained(model_checkpoint, num_labels=len(label_list))

model.config.id2label = dict(enumerate(label_list))
model.config.label2id = {v: k for k, v in model.config.id2label.items()}

trainer = get_trainer(
    model=model,
    tokenizer=base_tokenizer,
    datasets=tokenized_datasets,
    compute_metrics=eval_metrics,
    learning_rate=1e-4,
)

# Freeze the embedding layers
for param in model.bert.embeddings.parameters():
    param.requires_grad = False

trainer.train()
trainer.evaluate()


Epoch,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy
1,No log,0.168138,0.614801,0.684989,0.648,0.948364
2,No log,0.143788,0.622047,0.751586,0.680708,0.953682
3,0.169600,0.139846,0.72066,0.785412,0.751644,0.962325
4,0.169600,0.155768,0.763804,0.789641,0.776507,0.964911
5,0.062800,0.173259,0.734753,0.802326,0.767054,0.964468
6,0.062800,0.193728,0.717822,0.766385,0.741309,0.962399
7,0.024400,0.186965,0.75,0.795983,0.772308,0.964985
8,0.024400,0.181332,0.76435,0.802326,0.782878,0.966167
9,0.008600,0.189442,0.752515,0.790698,0.771134,0.965945
10,0.008600,0.194765,0.755266,0.795983,0.77509,0.966241


{'eval_loss': 0.1947648674249649,
 'eval_precision': 0.7552657973921765,
 'eval_recall': 0.7959830866807611,
 'eval_f1': 0.775090066906845,
 'eval_accuracy': 0.9662406737090936,
 'eval_runtime': 3.2011,
 'eval_samples_per_second': 300.524,
 'eval_steps_per_second': 19.056,
 'epoch': 10.0}

#### Fine-tuning the last 6 encoder layers

In [11]:
# Freeze all encoder layers except the last 6 (embeddings and head stay trainable)
model = AutoModelForTokenClassification.from_pretrained(model_checkpoint, num_labels=len(label_list))

model.config.id2label = dict(enumerate(label_list))
model.config.label2id = {v: k for k, v in model.config.id2label.items()}

trainer = get_trainer(
    model=model,
    tokenizer=base_tokenizer,
    datasets=tokenized_datasets,
    compute_metrics=eval_metrics,
    learning_rate=1e-4,
)

for param in model.bert.encoder.layer[:-6].parameters():
    param.requires_grad = False

trainer.train()
trainer.evaluate()


Epoch,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy
1,No log,0.151701,0.618362,0.726216,0.667963,0.951466
2,No log,0.136716,0.697164,0.805497,0.747425,0.963434
3,0.154700,0.139869,0.761855,0.764271,0.763061,0.966462
4,0.154700,0.165474,0.734753,0.802326,0.767054,0.964689
5,0.043500,0.174583,0.728313,0.807611,0.765915,0.963581
6,0.043500,0.180091,0.767276,0.798097,0.782383,0.968678
7,0.013600,0.197539,0.761239,0.805497,0.782743,0.967349
8,0.013600,0.201068,0.75,0.805497,0.776758,0.966388
9,0.005000,0.203546,0.75996,0.806554,0.782564,0.967423
10,0.005000,0.206321,0.762525,0.80444,0.782922,0.967718


{'eval_loss': 0.2063213586807251,
 'eval_precision': 0.7625250501002004,
 'eval_recall': 0.8044397463002114,
 'eval_f1': 0.7829218106995884,
 'eval_accuracy': 0.9677181059318903,
 'eval_runtime': 3.0103,
 'eval_samples_per_second': 319.571,
 'eval_steps_per_second': 20.264,
 'epoch': 10.0}

#### Fine-tuning the first 6 encoder layers

In [12]:
# Freeze all encoder layers except the first 6 (embeddings and head stay trainable)
model = AutoModelForTokenClassification.from_pretrained(model_checkpoint, num_labels=len(label_list))

model.config.id2label = dict(enumerate(label_list))
model.config.label2id = {v: k for k, v in model.config.id2label.items()}

trainer = get_trainer(
    model=model,
    tokenizer=base_tokenizer,
    datasets=tokenized_datasets,
    compute_metrics=eval_metrics,
    learning_rate=1e-4,
)

for param in model.bert.encoder.layer[-6:].parameters():
    param.requires_grad = False

trainer.train()
trainer.evaluate()


Epoch,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy
1,No log,0.192991,0.55232,0.641649,0.593643,0.946074
2,No log,0.158996,0.713718,0.758985,0.735656,0.960774
3,0.217200,0.149554,0.730303,0.764271,0.746901,0.963951
4,0.217200,0.153798,0.724409,0.778013,0.750255,0.964837
5,0.063400,0.175426,0.729862,0.785412,0.756619,0.964246
6,0.063400,0.178948,0.729756,0.790698,0.759006,0.964394
7,0.023400,0.20013,0.735736,0.776956,0.755784,0.964837
8,0.023400,0.196442,0.734252,0.788584,0.760449,0.965428
9,0.009100,0.195652,0.738142,0.789641,0.763023,0.965354
10,0.009100,0.201391,0.747495,0.788584,0.76749,0.966019


{'eval_loss': 0.20139092206954956,
 'eval_precision': 0.7474949899799599,
 'eval_recall': 0.7885835095137421,
 'eval_f1': 0.7674897119341564,
 'eval_accuracy': 0.9660190588756741,
 'eval_runtime': 3.0362,
 'eval_samples_per_second': 316.845,
 'eval_steps_per_second': 20.091,
 'epoch': 10.0}
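
The two partial-encoder experiments differ only in the slice applied to `model.bert.encoder.layer`, and negative slices are easy to misread. The check below uses a plain list of layer indices as a stand-in for the layer list, assuming the 12 encoder layers of a BERT-base architecture; the variable names are illustrative.

```python
NUM_LAYERS = 12  # assumption: BERT-base encoder depth
layers = list(range(NUM_LAYERS))

# "Last 6 layers" experiment: everything before the last 6 layers is frozen.
frozen_tune_last = layers[:-6]
trainable_tune_last = layers[-6:]

# "First 6 layers" experiment: the last 6 layers are frozen.
frozen_tune_first = layers[-6:]
trainable_tune_first = layers[:-6]

print("tune last 6  -> frozen:", frozen_tune_last, "trainable:", trainable_tune_last)
print("tune first 6 -> frozen:", frozen_tune_first, "trainable:", trainable_tune_first)
```

In both experiments the embedding layers and the classification head are left trainable, because only encoder layers are selected by the slice.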

### Models based on DeBERTa

In [13]:
model_checkpoint = "deepvk/deberta-v1-base"
base_tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
base_tokenizer.add_prefix_space = True

tokenized_datasets = ner_data.map(partial(tokenize_and_align_labels, tokenizer=base_tokenizer), batched=True)

eval_metrics = partial(compute_metrics, tokenizer=base_tokenizer)


#### Training only the final layers

In [14]:
model = AutoModelForTokenClassification.from_pretrained(model_checkpoint, num_labels=len(label_list))

model.config.id2label = dict(enumerate(label_list))
model.config.label2id = {v: k for k, v in model.config.id2label.items()}

trainer = get_trainer(
    model=model,
    tokenizer=base_tokenizer,
    datasets=tokenized_datasets,
    compute_metrics=eval_metrics,
    learning_rate=1e-3,
)

# Freeze the whole DeBERTa backbone; only the classification head is trained
for param in model.deberta.parameters():
    param.requires_grad = False

trainer.train()
trainer.evaluate()

Some weights of DebertaForTokenClassification were not initialized from the model checkpoint at deepvk/deberta-v1-base and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Using `include_inputs_for_metrics` is deprecated and will be removed in version 5 of 🤗 Transformers. Please use `include_for_metrics` list argument instead.
  return Trainer(


Epoch,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy
1,No log,0.300824,0.617063,0.328753,0.428966,0.922804
2,No log,0.281753,0.570201,0.420719,0.484185,0.928492
3,0.358900,0.271046,0.606145,0.458774,0.522262,0.932186
4,0.358900,0.267971,0.589168,0.471459,0.523782,0.933516
5,0.278500,0.26492,0.584498,0.486258,0.530871,0.933368
6,0.278500,0.262342,0.59181,0.473573,0.52613,0.933442
7,0.269600,0.261222,0.608163,0.472516,0.531826,0.93455
8,0.269600,0.26066,0.578223,0.488372,0.529513,0.934254
9,0.261200,0.259729,0.592308,0.488372,0.535342,0.934402
10,0.261200,0.259506,0.58831,0.489429,0.534334,0.934476


{'eval_loss': 0.25950556993484497,
 'eval_precision': 0.5883100381194409,
 'eval_recall': 0.4894291754756871,
 'eval_f1': 0.534333525678015,
 'eval_accuracy': 0.9344758809189628,
 'eval_runtime': 4.5867,
 'eval_samples_per_second': 209.735,
 'eval_steps_per_second': 13.299,
 'epoch': 10.0}

#### Training all layers

In [15]:
model = AutoModelForTokenClassification.from_pretrained(model_checkpoint, num_labels=len(label_list))

model.config.id2label = dict(enumerate(label_list))
model.config.label2id = {v: k for k, v in model.config.id2label.items()}

trainer = get_trainer(
    model=model,
    tokenizer=base_tokenizer,
    datasets=tokenized_datasets,
    compute_metrics=eval_metrics,
)

trainer.train()
trainer.evaluate()


Epoch,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy
1,No log,0.18028,0.619235,0.667019,0.642239,0.950284
2,No log,0.159925,0.638298,0.729387,0.680809,0.955382
3,0.238300,0.141337,0.675,0.742072,0.706949,0.95878
4,0.238300,0.143116,0.680184,0.780127,0.726736,0.960405
5,0.094300,0.143511,0.748988,0.782241,0.765253,0.964763
6,0.094300,0.140906,0.733204,0.795983,0.763305,0.964985
7,0.056600,0.149281,0.763507,0.791755,0.777374,0.966093
8,0.056600,0.151491,0.738072,0.801268,0.768373,0.965502
9,0.035900,0.156768,0.740272,0.80444,0.771023,0.96528
10,0.035900,0.160118,0.737198,0.806554,0.770318,0.964615


{'eval_loss': 0.16011834144592285,
 'eval_precision': 0.7371980676328502,
 'eval_recall': 0.806553911205074,
 'eval_f1': 0.7703180212014133,
 'eval_accuracy': 0.9646154982640172,
 'eval_runtime': 5.0124,
 'eval_samples_per_second': 191.925,
 'eval_steps_per_second': 12.17,
 'epoch': 10.0}

#### Training the encoder and the final layers

In [16]:
model = AutoModelForTokenClassification.from_pretrained(model_checkpoint, num_labels=len(label_list))

model.config.id2label = dict(enumerate(label_list))
model.config.label2id = {v: k for k, v in model.config.id2label.items()}

trainer = get_trainer(
    model=model,
    tokenizer=base_tokenizer,
    datasets=tokenized_datasets,
    compute_metrics=eval_metrics,
    learning_rate=5e-5,
)

# Freeze the embedding layers
for param in model.deberta.embeddings.parameters():
    param.requires_grad = False

trainer.train()
trainer.evaluate()


Epoch,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy
1,No log,0.16069,0.647338,0.719873,0.681682,0.953535
2,No log,0.156009,0.645833,0.7537,0.69561,0.956637
3,0.203900,0.136421,0.711596,0.758985,0.734527,0.963507
4,0.203900,0.152605,0.646394,0.786469,0.709585,0.958189
5,0.076800,0.14411,0.757451,0.77907,0.768108,0.966093
6,0.076800,0.150384,0.764,0.807611,0.7852,0.967792
7,0.036200,0.154094,0.768997,0.802326,0.785308,0.967644
8,0.036200,0.165192,0.778689,0.803383,0.790843,0.969269
9,0.015700,0.162875,0.768924,0.816068,0.791795,0.969196
10,0.015700,0.163282,0.766932,0.813953,0.789744,0.968752


{'eval_loss': 0.1632823795080185,
 'eval_precision': 0.7669322709163346,
 'eval_recall': 0.813953488372093,
 'eval_f1': 0.7897435897435898,
 'eval_accuracy': 0.9687523084878481,
 'eval_runtime': 4.5544,
 'eval_samples_per_second': 211.223,
 'eval_steps_per_second': 13.394,
 'epoch': 10.0}