<a href="https://colab.research.google.com/github/patrycjalazna/transformers/blob/main/projekt.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Importy💅🏻💅🏻💅🏻

In [1]:
!pip install 'transformers==4.12.5' 'tokenizers==0.10.3' 'sentencepiece==0.1.96' 'datasets==1.16.1' 'accelerate==0.5.1' 'sacremoses==0.0.46' 'sacrebleu==2.0.0' 'torch';

Collecting transformers==4.12.5
  Downloading transformers-4.12.5-py3-none-any.whl (3.1 MB)
[K     |████████████████████████████████| 3.1 MB 4.4 MB/s 
[?25hCollecting tokenizers==0.10.3
  Downloading tokenizers-0.10.3-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (3.3 MB)
[K     |████████████████████████████████| 3.3 MB 29.5 MB/s 
[?25hCollecting sentencepiece==0.1.96
  Downloading sentencepiece-0.1.96-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
[K     |████████████████████████████████| 1.2 MB 46.2 MB/s 
[?25hCollecting datasets==1.16.1
  Downloading datasets-1.16.1-py3-none-any.whl (298 kB)
[K     |████████████████████████████████| 298 kB 49.1 MB/s 
[?25hCollecting accelerate==0.5.1
  Downloading accelerate-0.5.1-py3-none-any.whl (58 kB)
[K     |████████████████████████████████| 58 kB 4.4 MB/s 
[?25hCollecting sacremoses==0.0.46
  Downloading sacremoses-0.0.46-py3-none-any.whl (895 kB)
[K     |███████

In [2]:
import torch
from torch import nn
from torch.nn import MSELoss, CrossEntropyLoss, BCEWithLogitsLoss
from transformers import RobertaForSequenceClassification, RobertaModel
from transformers.modeling_outputs import SequenceClassifierOutput
import json
from pathlib import Path
from typing import Dict, List
from datasets import load_dataset
import os
import random

## 🤗 Dataset

Dataset *emotion* jest zbiorem danych angielskich wiadomości na Twitterze zawierających sześć podstawowych emocji: gniew, strach, radość, miłość, smutek i zaskoczenie.

Link do datasetu: [hugginface](https://huggingface.co/datasets/emotion)

Przykład:

```
{
    "label": 0,
    "text": "im feeling quite sad and sorry for myself but ill snap out of it soon"
}
```



In [3]:
dataset = load_dataset('emotion')

Downloading:   0%|          | 0.00/1.66k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/1.61k [00:00<?, ?B/s]

Using custom data configuration default


Downloading and preparing dataset emotion/default (download: 1.97 MiB, generated: 2.07 MiB, post-processed: Unknown size, total: 4.05 MiB) to /root/.cache/huggingface/datasets/emotion/default/0.0.0/348f63ca8e27b3713b6c04d723efe6d824a56fb3d1449794716c0f0296072705...


Downloading: 0.00B [00:00, ?B/s]

Downloading: 0.00B [00:00, ?B/s]

Downloading: 0.00B [00:00, ?B/s]

0 examples [00:00, ? examples/s]

0 examples [00:00, ? examples/s]

0 examples [00:00, ? examples/s]

Dataset emotion downloaded and prepared to /root/.cache/huggingface/datasets/emotion/default/0.0.0/348f63ca8e27b3713b6c04d723efe6d824a56fb3d1449794716c0f0296072705. Subsequent calls will reuse this data.


  0%|          | 0/3 [00:00<?, ?it/s]

Dane mamy automatycznie podzielone train set, validation set i test set w stosunku 8:1:1.

In [4]:
print(dataset)

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 16000
    })
    validation: Dataset({
        features: ['text', 'label'],
        num_rows: 2000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 2000
    })
})


Następnie tworzymy folder, w którym zapiszemy dane.

In [5]:
if not os.path.exists("./data"):
    os.makedirs("./data")

In [6]:
train_path = Path('data/train.json')
valid_path = Path('data/valid.json')
test_path = Path('data/test.json')

train_path_binary = Path('data/train_binary.json')
valid_path_binary = Path('data/valid_binary.json')
test_path_binary = Path('data/test_binary.json')

In [7]:
data_train_list, data_valid_list, data_test_list = [], [], []

for data_line, data_list in [
  (dataset['train'], data_train_list),
  (dataset['test'], data_test_list),
  (dataset['validation'], data_valid_list)
]:
  for i, data in enumerate(data_line):
    line = {
      'label': int(data['label']),
      'text': data['text'],
    }
    data_list.append(line)

print(f'Train: {len(data_train_list)}')
print(f'Test: {len(data_valid_list)}')
print(f'Validation: {len(data_test_list)}')

Train: 16000
Test: 2000
Validation: 2000


In [8]:
# Zależy czy mapujemy tylko na pozytywne i negatywne czy na 6 co są w datasecie
def get_map_label_translation(num_classes = 6):
    '''
    Possible numbers [2, 6]
    '''
    if(num_classes == 2):
        return {
            0: 'negative',
            1: 'positive',
            2: 'positive',
            3: 'negative',
            4: 'negative',
            5: 'positive',
        }
    elif(num_classes == 6):
        return {
            0: 'sadness',
            1: 'joy',
            2: 'love',
            3: 'anger',
            4: 'fear',
            5: 'suprise',
        }

def get_value_from_label(label):
    if(label in [1, 2, 5]):
        return 1
    else: 
        return 0

MAP_LABEL_TRANSLATION_2 = get_map_label_translation(2)
MAP_LABEL_TRANSLATION_6 = get_map_label_translation(6)

In [9]:
data_class_test = {}
data_class_train = {}
data_class_validation = {}

data_class_test_binary = {}
data_class_train_binary = {}
data_class_validation_binary = {}

for label in MAP_LABEL_TRANSLATION_6:
  if(MAP_LABEL_TRANSLATION_6[label] not in data_class_test):
    data_class_test[MAP_LABEL_TRANSLATION_6[label]] = []
    data_class_validation[MAP_LABEL_TRANSLATION_6[label]] = []
    data_class_train[MAP_LABEL_TRANSLATION_6[label]] = []

for label in MAP_LABEL_TRANSLATION_2:
  if(MAP_LABEL_TRANSLATION_2[label] not in data_class_test):
    data_class_test_binary[MAP_LABEL_TRANSLATION_2[label]] = []
    data_class_validation_binary[MAP_LABEL_TRANSLATION_2[label]] = []
    data_class_train_binary[MAP_LABEL_TRANSLATION_2[label]] = []

for data in data_valid_list:
  data_class_validation[MAP_LABEL_TRANSLATION_6[int(data['label'])]].append(data)
for data in data_train_list:
  data_class_train[MAP_LABEL_TRANSLATION_6[int(data['label'])]].append(data)
for data in data_test_list:
  data_class_test[MAP_LABEL_TRANSLATION_6[int(data['label'])]].append(data)

for data in data_valid_list:
  data_class_validation_binary[MAP_LABEL_TRANSLATION_2[int(data['label'])]].append(data)
for data in data_train_list:
  data_class_train_binary[MAP_LABEL_TRANSLATION_2[int(data['label'])]].append(data)
for data in data_test_list:
  data_class_test_binary[MAP_LABEL_TRANSLATION_2[int(data['label'])]].append(data)

print('-- Stats for train set on 6 labels --')
for label in data_class_train:
  print(f'Label {label}: {len(data_class_train[label]):6d}')
print('-- Stats for test set on 6 labels --')
for label in data_class_test:
  print(f'Label {label}: {len(data_class_test[label]):6d}')
print('-- Stats for validation set on 6 labels--')
for label in data_class_validation:
  print(f'Label {label}: {len(data_class_validation[label]):6d}')
  
print('-- Stats for train set on 2 labels --')
for label in data_class_train_binary:
  print(f'Label {label}: {len(data_class_train_binary[label]):6d}')
print('-- Stats for test set on 2 labels --')
for label in data_class_test_binary:
  print(f'Label {label}: {len(data_class_test_binary[label]):6d}')
print('-- Stats for validation set on 2 labels--')
for label in data_class_validation_binary:
  print(f'Label {label}: {len(data_class_validation_binary[label]):6d}')


-- Stats for train set on 6 labels --
Label sadness:   4666
Label joy:   5362
Label love:   1304
Label anger:   2159
Label fear:   1937
Label suprise:    572
-- Stats for test set on 6 labels --
Label sadness:    581
Label joy:    695
Label love:    159
Label anger:    275
Label fear:    224
Label suprise:     66
-- Stats for validation set on 6 labels--
Label sadness:    550
Label joy:    704
Label love:    178
Label anger:    275
Label fear:    212
Label suprise:     81
-- Stats for train set on 2 labels --
Label negative:   8762
Label positive:   7238
-- Stats for test set on 2 labels --
Label negative:   1080
Label positive:    920
-- Stats for validation set on 2 labels--
Label negative:   1037
Label positive:    963


In [10]:
   
def remove_if_exists(f):
    if(Path(f).exists()):
        f.unlink()

def save_unchanged(f, data, binary = True):
    remove_if_exists(f)
    print(f'Saving into: {f}')
    with open(f, 'wt') as f_write:
        for data_line in data:
            if(binary):
                data_line['label'] = get_value_from_label((data_line['label']))
            data_line_str = json.dumps(data_line)
            f_write.write(f'{data_line_str}\n')

def save_as_translations(f, data_classes, num_entries):
    file_name = 'translations-' + f.name
    file_path = f.parent / file_name
    stats = {}
    remove_if_exists(Path(file_path))
    print(f'Saving into: {file_path}')
    
    with open(file_path, 'wt') as f_write:
        for class_list in data_classes:
            if(num_entries > len(data_classes[class_list])):
                samples = data_classes[class_list]
            else:
                samples = random.sample(data_classes[class_list], num_entries)

            stats[f'{class_list} entries'] = len(samples)

            for data_line in samples:
                data_line['label'] = class_list
                data_line_str = json.dumps(data_line)
                f_write.write(f'{data_line_str}\n')
        print(stats)

In [11]:
# Rozmiar zbiorów, podana wartość to ilość lini dla każdegj klasy, jeżeli dana klasa nie posiada danej ilości lini, wszystkie linie zostaja przekazane.
def get_num_of_samples(set_name):
    if(set_name == 'train'):
        return 1000
    else:
        return 100

for file_path, data_to_save, data_classes, num_entries in [ (train_path, data_train_list, data_class_train, get_num_of_samples('train') ), (valid_path, data_valid_list, data_class_validation, get_num_of_samples('valid')), (test_path, data_test_list, data_class_test, get_num_of_samples('test'))]:
  save_unchanged(file_path, data_to_save, False)
  save_as_translations(file_path, data_classes, num_entries)

for file_path, data_to_save, data_classes, num_entries in [ (train_path_binary, data_train_list, data_class_train_binary, get_num_of_samples('train') ), (valid_path_binary, data_valid_list, data_class_validation_binary, get_num_of_samples('valid')), (test_path_binary, data_test_list, data_class_test_binary, get_num_of_samples('test'))]:
  save_unchanged(file_path, data_to_save)
  save_as_translations(file_path, data_classes, num_entries)

Saving into: data/train.json
Saving into: data/translations-train.json
{'sadness entries': 1000, 'joy entries': 1000, 'love entries': 1000, 'anger entries': 1000, 'fear entries': 1000, 'suprise entries': 572}
Saving into: data/valid.json
Saving into: data/translations-valid.json
{'sadness entries': 100, 'joy entries': 100, 'love entries': 100, 'anger entries': 100, 'fear entries': 100, 'suprise entries': 81}
Saving into: data/test.json
Saving into: data/translations-test.json
{'sadness entries': 100, 'joy entries': 100, 'love entries': 100, 'anger entries': 100, 'fear entries': 100, 'suprise entries': 66}
Saving into: data/train_binary.json
Saving into: data/translations-train_binary.json
{'negative entries': 1000, 'positive entries': 1000}
Saving into: data/valid_binary.json
Saving into: data/translations-valid_binary.json
{'negative entries': 100, 'positive entries': 100}
Saving into: data/test_binary.json
Saving into: data/translations-test_binary.json
{'negative entries': 100, 'pos

## 🤗 Train

Pobranie skryptu dostępnego w bibliotece transformes potrzebnego do uruchomienia modelu.

In [12]:
!wget 'https://raw.githubusercontent.com/huggingface/transformers/v4.12.5/examples/pytorch/text-classification/run_glue_no_trainer.py' -O 'original_run_glue_no_trainer.py'
!wget 'https://raw.githubusercontent.com/patrycjalazna/transformers/main/gpt2.py' -O 'gpt2.py'
!wget 'https://raw.githubusercontent.com/patrycjalazna/transformers/main/roberta.py' -O 'roberta.py'
!wget 'https://raw.githubusercontent.com/patrycjalazna/transformers/main/run_glue_no_trainer.py' -O 'run_glue_no_trainer.py'

--2022-02-21 16:03:54--  https://raw.githubusercontent.com/huggingface/transformers/v4.12.5/examples/pytorch/text-classification/run_glue_no_trainer.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 21091 (21K) [text/plain]
Saving to: ‘original_run_glue_no_trainer.py’


2022-02-21 16:03:54 (16.2 MB/s) - ‘original_run_glue_no_trainer.py’ saved [21091/21091]

--2022-02-21 16:03:54--  https://raw.githubusercontent.com/patrycjalazna/transformers/main/gpt2.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9202 (9.0K) [text/plain]
Sav

## GPT2

Podstawowy model GPT2, próba polegała na zwiększeniu ilości epoch co poskutkowało wzrostem accuracy z 0.83 na 0.938
- Epoch 0: accuracy: 0.9095
- Epoch 1: accuracy: 0.9315
- Epoch 2: accuracy: 0.9385
- Epoch 3: accuracy: 0.938
- Evaluation: accuracy: 0.9275

In [13]:
!python run_glue_no_trainer.py \
  --model_name_or_path gpt2 \
  --train_file data/train.json  \
  --validation_file data/valid.json \
  --test_file data/test.json \
  --per_device_train_batch_size 24 \
  --per_device_eval_batch_size 24 \
  --max_length 128 \
  --learning_rate 2e-5 \
  --num_train_epochs 4 \
  --output_dir out/gpt2/version1

02/21/2022 16:04:01 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: False

Downloading and preparing dataset json/default to /root/.cache/huggingface/datasets/json/default-a7eb5dd65320bcb5/0.0.0/c2d554c3377ea79c7664b93dc65d0803b45e3279000f993c7bfd18937fd7f426...
100% 3/3 [00:00<00:00, 9265.77it/s]
100% 3/3 [00:00<00:00, 1259.80it/s]
Dataset json downloaded and prepared to /root/.cache/huggingface/datasets/json/default-a7eb5dd65320bcb5/0.0.0/c2d554c3377ea79c7664b93dc65d0803b45e3279000f993c7bfd18937fd7f426. Subsequent calls will reuse this data.
100% 3/3 [00:00<00:00, 951.95it/s]
https://huggingface.co/gpt2/resolve/main/config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpnvrikdgn
Downloading: 100% 665/665 [00:00<00:00, 505kB/s]
storing https://huggingface.co/gpt2/resolve/main/config.json in cache at /root/.cache/huggingface/transforme

### Version 2
#### GPT2ForSequenceClassificationCustom
Model z pliku gpt2.py, dodatkowo uruchomiony z flagą freeze_model uruchomiony na 4 epochach:
- Epoch 0 accuracy: 0.462
- Epoch 1 accuracy: 0.4645
- Epoch 2 accuracy: 0.4615
- Epoch 3 accuracy: 0.4745
- Evaluation accurracy: 0.4795

In [14]:
!python run_glue_no_trainer.py \
  --model_name_or_path gpt2 \
  --train_file data/train.json  \
  --validation_file data/valid.json \
  --test_file data/test.json \
  --per_device_train_batch_size 24 \
  --per_device_eval_batch_size 24 \
  --max_length 128 \
  --freeze_model \
  --custom_model \
  --learning_rate 2e-5 \
  --num_train_epochs 4 \
  --output_dir out/gpt2/version2

02/21/2022 16:26:49 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: False

100% 3/3 [00:00<00:00, 784.13it/s]
loading configuration file https://huggingface.co/gpt2/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/fc674cd6907b4c9e933cb42d67662436b89fa9540a1f40d7c919d0109289ad01.7d2e0efa5ca20cef4fb199382111e9d3ad96fd77b849e1d4bed13a66e1336f51
Model config GPT2Config {
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3",
    "4": "LABEL_4",
    "5": "LABEL_5"
  },
  "initializer_range": 0.02,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2,
    "LABEL_3": 3,
    "LABEL_4": 4,
    "LABEL_5": 5
  },
  "layer_norm_epsilon": 1e-05,
  "mode

### Version 3
#### GPT2ForSequenceClassificationCustomVersion2
Dodana została nowa warstwa, dodatkowo uruchomiony z flagą freeze_model na 2 epochach. Zmieniony został parametr max_length z 128 na 256, oraz train_batch_size z 24 na 32: 
- Epoch 0: accuracy: 0.3765
- Epoch 1: accuracy: 0.4210
- Evaluation accurracy: 0.4339

In [15]:
!python run_glue_no_trainer.py \
  --model_name_or_path gpt2 \
  --train_file data/train.json  \
  --validation_file data/valid.json \
  --test_file data/test.json \
  --per_device_train_batch_size 32 \
  --per_device_eval_batch_size 32 \
  --max_length 254 \
  --freeze_model \
  --custom_model \
  --return_hidden_states \
  --learning_rate 2e-5 \
  --num_train_epochs 2 \
  --output_dir out/gpt2/version3

02/21/2022 16:37:12 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: False

100% 3/3 [00:00<00:00, 803.15it/s]
loading configuration file https://huggingface.co/gpt2/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/fc674cd6907b4c9e933cb42d67662436b89fa9540a1f40d7c919d0109289ad01.7d2e0efa5ca20cef4fb199382111e9d3ad96fd77b849e1d4bed13a66e1336f51
Model config GPT2Config {
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3",
    "4": "LABEL_4",
    "5": "LABEL_5"
  },
  "initializer_range": 0.02,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2,
    "LABEL_3": 3,
    "LABEL_4": 4,
    "LABEL_5": 5
  },
  "layer_norm_epsilon": 1e-05,
  "mode

### Version 4
#### GPT2ForSequenceClassificationCustomVersion2
Dodana została nowa warstwa, dodatkowo uruchomiony z flagą freeze_model na 8 epochach. Zmieniony został parametr, oraz train_batch_size z 24 na 16: 
- Epoch 0: accuracy: 0.3765
- Epoch 1: accuracy: 0.4210
- Evaluation accurracy: 0.4339

In [16]:
!python run_glue_no_trainer.py \
  --model_name_or_path gpt2 \
  --train_file data/train.json  \
  --validation_file data/valid.json \
  --test_file data/test.json \
  --per_device_train_batch_size 16 \
  --per_device_eval_batch_size 16 \
  --max_length 128 \
  --freeze_model \
  --custom_model \
  --return_hidden_states \
  --learning_rate 2e-5 \
  --num_train_epochs 8 \
  --output_dir out/gpt2/version4

02/21/2022 16:42:46 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: False

100% 3/3 [00:00<00:00, 814.59it/s]
loading configuration file https://huggingface.co/gpt2/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/fc674cd6907b4c9e933cb42d67662436b89fa9540a1f40d7c919d0109289ad01.7d2e0efa5ca20cef4fb199382111e9d3ad96fd77b849e1d4bed13a66e1336f51
Model config GPT2Config {
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3",
    "4": "LABEL_4",
    "5": "LABEL_5"
  },
  "initializer_range": 0.02,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2,
    "LABEL_3": 3,
    "LABEL_4": 4,
    "LABEL_5": 5
  },
  "layer_norm_epsilon": 1e-05,
  "mode

### Version 5
#### GPT2ForSequenceClassificationCustomVersion2
Dodana została nowa warstwa, dodatkowo uruchomiony na 4 epochach. Różnica w tej wersji polega na zmianie klasyfikacji z 6 labeli na 2. Uznaliśmy że ciekawym będzie porównanie wyników i na potrzeby prób przekonwertujemy nasz dataset emocji tylko na podział pomiędzy pozytywnymi, a negatywnymi:
- sadness = negative
- joy = positive
- love = positive
- anger = negative
- fear = negative
- suprise = positive

Wyniki prezentują się następująco:
- Epoch 0 accuracy: 0.75
- Epoch 1 accuracy: 0.7485
- Epoch 2 accuracy: 0.75
- Epoch 3 accuracy: 0.7505
- Evaluation accurracy: 0.7635

In [None]:
!python run_glue_no_trainer.py \
  --model_name_or_path gpt2 \
  --train_file data/train_binary.json  \
  --validation_file data/valid_binary.json \
  --test_file data/test_binary.json \
  --per_device_train_batch_size 24 \
  --per_device_eval_batch_size 24 \
  --freeze_model \
  --custom_model \
  --return_hidden_states \
  --max_length 128 \
  --learning_rate 2e-5 \
  --num_train_epochs 4 \
  --output_dir out/gpt2/version5

02/21/2022 17:02:55 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: False

Downloading and preparing dataset json/default to /root/.cache/huggingface/datasets/json/default-cc22392d4c309154/0.0.0/c2d554c3377ea79c7664b93dc65d0803b45e3279000f993c7bfd18937fd7f426...
100% 3/3 [00:00<00:00, 9482.22it/s]
100% 3/3 [00:00<00:00, 1173.56it/s]
Dataset json downloaded and prepared to /root/.cache/huggingface/datasets/json/default-cc22392d4c309154/0.0.0/c2d554c3377ea79c7664b93dc65d0803b45e3279000f993c7bfd18937fd7f426. Subsequent calls will reuse this data.
100% 3/3 [00:00<00:00, 701.66it/s]
loading configuration file https://huggingface.co/gpt2/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/fc674cd6907b4c9e933cb42d67662436b89fa9540a1f40d7c919d0109289ad01.7d2e0efa5ca20cef4fb199382111e9d3ad96fd77b849e1d4bed13a66e1336f51
Model config GPT2Config {
  "activation_function": "gelu_new",


## RoBERTa
Model RoBERTa został zaproponowany w książce RoBERTa: A Robustly Optimized BERT Pretraining Approach przez Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov. Jest on oparty na modelu BERT firmy Google wydanym w 2018 roku.

Podstawowy model RoBERTa
- Epoch 0: accuracy: 0.92
- Evaluation: accuracy: 0.92

In [None]:
! python run_glue_no_trainer.py \
  --model_name_or_path roberta-base \
  --train_file data/train.json  \
  --validation_file data/valid.json \
  --test_file data/test.json \
  --per_device_train_batch_size 24 \
  --per_device_eval_batch_size 24 \
  --max_length 128 \
  --learning_rate 2e-5 \
  --num_train_epochs 1 \
  --output_dir out/emotion/roberta

### Version 2
Kod poniżej odnosi się do customowego modelu zapisanego w pliku roberta.py. Zwiększona została liczba epok, a także zmniejszony batchsize na zbiorze treningowym i eval. Dodatkowo zostały zamrożone wagi (nie w głowie klasyfikacji). Learning rate i maksymalna długość sekwencji pozostała taka sama.

- Epoch 0: accuracy: 0.35
- Epoch 1: accuracy: 0.40
- Epoch 2: accuracy: 0.36
- Epoch 3: accuracy: 0.47
- Epoch 4: accuracy: 0.50
- Epoch 5: accuracy: 0.52
- Epoch 6: accuracy: 0.52
- Epoch 7: accuracy: 0.52
- Evaluation: accuracy: 0.52

In [None]:
! python run_glue_no_trainer.py \
  --model_name_or_path roberta-base \
  --train_file data/train.json  \
  --validation_file data/valid.json \
  --test_file data/test.json \
  --per_device_train_batch_size 16 \
  --per_device_eval_batch_size 16 \
  --max_length 128 \
  --freeze_model \
  --learning_rate 2e-5 \
  --num_train_epochs 8 \
  --custom_model \
  --output_dir out/emotion/roberta3

### Version 3
Kod poniżej odnosi się do pliku robertaforward.py. Batch został zmniejszony do 12, maksymalna długość sekwencji (max_length) zostaje taka sama. Dodatkowo, dodany został pooling layer. Wagi (nie w głowie klasyfikacji) zostały zamrożone. Model był trenowany w 12 epokach.

- Epoch 0: accuracy: 0.35
- Epoch 1: accuracy: 0.40
- Epoch 2: accuracy: 0.32
- Epoch 3: accuracy: 0.35
- Epoch 4: accuracy: 0.51
- Epoch 5: accuracy: 0.35
- Epoch 6: accuracy: 0.35
- Epoch 7: accuracy: 0.51
- Epoch 8: accuracy: 0.35
- Epoch 9: accuracy: 0.35
- Epoch 10: accuracy: 0.35
- Epoch 11 accuracy: 0.35

- Evaluation: accuracy: 0.38

In [None]:
! python run_glue_no_trainer2.py \
  --model_name_or_path roberta-base \
  --train_file data/train.json  \
  --validation_file data/valid.json \
  --test_file data/test.json \
  --per_device_train_batch_size 12 \
  --per_device_eval_batch_size 12 \
  --max_length 128 \
  --freeze_model \
  --learning_rate 2e-5 \
  --num_train_epochs 12 \
  --custom_model \
  --output_dir out/emotion/roberta4
   
  # --lr_scheduler_type "linear" \

### Version 4
Kod poniżej odnosi się do pliku robertasmallhead.py. Batchsize został zwiększony do 32, dodatkowo learning rate został zmieniony na 3*10^-5.

- Epoch 0: accuracy: 0.35
- Epoch 1: accuracy: 0.39
- Epoch 2: accuracy: 0.47
- Epoch 3: accuracy: 0.49
- Epoch 4: accuracy: 0.51
- Epoch 5: accuracy: 0.50
- Epoch 6: accuracy: 0.51
- Epoch 7: accuracy: 0.52
- Epoch 8: accuracy: 0.52
- Epoch 9: accuracy: 0.52
- Epoch 10: accuracy: 0.52
- Epoch 11 accuracy: 0.52
- Epoch 12: accuracy: 0.53

- Evaluation: accuracy: 0.51

In [None]:
! python run_glue_no_trainer3.py \
  --model_name_or_path roberta-base \
  --train_file data/train.json  \
  --validation_file data/valid.json \
  --test_file data/test.json \
  --per_device_train_batch_size 32 \
  --per_device_eval_batch_size 32 \
  --max_length 128 \
  --freeze_model \
  --learning_rate 3e-5 \
  --num_train_epochs 12 \
  --custom_model \
  --output_dir out/emotion/roberta5

### Version 5
Kod poniżej opiera się o plik roberta.py. Zmieniona została liczba epok, batch size i learning rate. Użyte zostały hidden states.

- Epoch 0: accuracy: 0.35
- Epoch 1: accuracy: 0.35
- Epoch 2: accuracy: 0.44
- Epoch 3: accuracy: 0.56
- Epoch 4: accuracy: 0.57
- Epoch 5: accuracy: 0.57
- Epoch 6: accuracy: 0.57
- Epoch 7: accuracy: 0.57

- Evaluation: accuracy: 0.57

In [None]:
! python run_glue_no_trainer.py \
  --model_name_or_path roberta-base \
  --train_file data/train.json  \
  --validation_file data/valid.json \
  --test_file data/test.json \
  --per_device_train_batch_size 26 \
  --per_device_eval_batch_size 26 \
  --max_length 128 \
  --return_hidden_states \
  --learning_rate 2e-7 \
  --num_train_epochs 8 \
  --custom_model \
  --output_dir out/emotion/roberta6

## T5
Podstawowy model T5-small
- Epoch 0: accuracy 0.46987951807228917
- Epoch 1: accuracy 0.5559380378657487
- Epoch 2: accuracy 0.6161790017211703
- Epoch 3: accuracy 0.6643717728055077
- Epoch 4: accuracy 0.6884681583476764
- Epoch 5: accuracy 0.7039586919104991


In [None]:
!python run_translation_no_trainer.py \
  --model_name_or_path t5-small \
  --train_file data/translations-train.json \
  --validation_file data/translations-valid.json \
  --test_file data/translations-test.json \
  --per_device_train_batch_size 8 \
  --per_device_eval_batch_size 8 \
  --source_prefix "emotion classification" \
  --max_source_length 256 \
  --max_target_length 128 \q
  --max_length 128 \
  --num_train_epochs 6 \
  --freeze_encoder \
  --output_dir out/emotion/t5_1

### Version 2 binary
- Epoch 0: accuracy 0.825
- Epoch 1: accuracy 0.85
- Epoch 2: accuracy 0.855
- Epoch 3: accuracy 0.895
- Epoch 4: accuracy 0.9
- Epoch 5: accuracy 0.915

In [None]:
!python run_translation_no_trainer_binary.py \
  --model_name_or_path t5-small \
  --train_file data/translations-train_binary.json \
  --validation_file data/translations-valid_binary.json \
  --test_file data/translations-test_binary.json \
  --per_device_train_batch_size 16 \
  --per_device_eval_batch_size 16 \
  --source_prefix "emotion classification" \
  --max_source_length 256 \
  --max_target_length 128 \
  --max_length 128 \
  --num_train_epochs 6 \
  --output_dir out/emotion/t5_2