# <h1><center><b>Lab 2: Encoders</b></center></h1>
## <h2><center><b>Fine-tuning on the CoLA dataset</b></center></h2>


We fine-tune a BERT model for binary classification:
the input is a sentence, and the model must decide whether it is grammatically and semantically well-formed.

For this we use the [HuCoLA dataset](https://huggingface.co/datasets/NYTK/HuCOLA). It contains well-formed and ill-formed Hungarian sentences
that were labeled by human annotators.

Fortunately, the [huBERT](https://huggingface.co/SZTAKI-HLT/hubert-base-cc) encoder model, pre-trained on Hungarian data, is available, so we only need to fine-tune it for this task.

In [None]:
!pip install transformers datasets folium==0.2.1


In [None]:
from typing import Tuple, Dict, Any, Optional, Union

from tqdm import tqdm
import torch
from torch.utils.data import DataLoader
from datasets import (
    load_dataset, Dataset,
    load_metric, Metric
)
from transformers import (
    BertTokenizer,
    PreTrainedTokenizer,
    BertForSequenceClassification,
    BatchEncoding,
    get_scheduler
)
from transformers.trainer_utils import SchedulerType

# Define global variables
BATCH_SIZE = 8
MAX_SEQ_LENGTH = 128
NUM_EPOCHS = 2
# huBERT is a pre-trained BERT model trained on Hungarian data 
MODEL_NAME = "SZTAKI-HLT/hubert-base-cc"

In [None]:
# Load the Hungarian CoLA dataset.
# This is a dataset for a binary classification task:
# Every data point contains a Hungarian sentence and a label.
# If the sentence is grammatically and semantically well-formed,
# the label is `1`. Otherwise, it is `0`.
# The next line of code downloads both the training and validation splits
# and puts them into a list which is immediately unpacked
dataset = load_dataset(
    "NYTK/HuCOLA", split=["train", "validation"], field="data")
train_dataset, val_dataset = dataset
del dataset

# Let us see an example from the training dataset
print(train_dataset[0])


{'Sent_id': 'train_0', 'Sent': 'Az Angliáról való könyv tetszik.', 'Label': '0'}


In [None]:
# We have seen that the labels are given as strings in the column 'Label'
# of the datasets. Let us write a function that converts these strings to
# integers and renames the column. The new name should be 'labels', as
# the `transformers.BertForSequenceClassification` object that we are about to
# use requires that labels be provided through the keyword argument `labels`.

def rename_column(
        dataset: Dataset,
        old_col_name: str,
        new_col_name: str,
        col_type: type
) -> Dataset:
    """Rename a column in a dataset and convert its data type
    
    Args:
        dataset: The dataset to be processed.
        old_col_name: The column name that is to be changed.
        new_col_name: The new column name.
        col_type: The data in the renamed column will be converted to this type.

    Returns:
        The dataset with the renamed column
    """
    return dataset.map(
        lambda example: {new_col_name: col_type(example[old_col_name])},
        remove_columns=[old_col_name]
    )
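
The effect of `rename_column` on a single example can be sketched without the `datasets` library. The helper below, `rename_and_cast`, is a hypothetical plain-Python stand-in that works on a list of dicts. It is only meant to show what the lambda and `remove_columns` do together, not how `Dataset.map` is implemented.

```python
from typing import Any, Callable, Dict, List


def rename_and_cast(
        rows: List[Dict[str, Any]],
        old_col_name: str,
        new_col_name: str,
        col_type: Callable[[Any], Any],
) -> List[Dict[str, Any]]:
    """Plain-Python stand-in for `rename_column` (illustration only)."""
    converted = []
    for row in rows:
        # Keep every column except the old one ...
        new_row = {k: v for k, v in row.items() if k != old_col_name}
        # ... and store the converted value under the new name.
        new_row[new_col_name] = col_type(row[old_col_name])
        converted.append(new_row)
    return converted


rows = [{"Sent_id": "train_0",
         "Sent": "Az Angliáról való könyv tetszik.",
         "Label": "0"}]
converted_rows = rename_and_cast(rows, "Label", "labels", int)
print(converted_rows[0])
```

The string label `'0'` becomes the integer `0` under the key `'labels'`, and the old `'Label'` key is gone.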

In [None]:
old_label_name, new_label_name = "Label", "labels"
train_dataset = rename_column(
    train_dataset, old_label_name, new_label_name, int)
val_dataset = rename_column(
    val_dataset, old_label_name, new_label_name, int)

# Let us see an example again
print(train_dataset[0])

  0%|          | 0/7274 [00:00<?, ?ex/s]

  0%|          | 0/910 [00:00<?, ?ex/s]

{'Sent_id': 'train_0', 'Sent': 'Az Angliáról való könyv tetszik.', 'labels': 0}


In [None]:
# Now, we tokenize the dataset.
# Note that we do not need the tokenizer to output `token_type_ids`.
# This would be a tensor with elements 0 and 1 which indicate whether a token
# comes from the first or the second input sentence. However, only single
# sentences will be tokenized now, as each data point contains only one
# sentence. The model will be able to handle this.

def tokenize_single_sent_dataset(
        dataset: Dataset,
        text_col_name: str,
        label_col_name: str,
        tokenizer: PreTrainedTokenizer,
        batch_size: int,
        max_seq_length: Optional[int] = None,
) -> DataLoader:
    """Tokenize a dataset

    Args:
        dataset: The input data as a `datasets.Dataset` object
        text_col_name: The dataset column (which can also be called a key)
            that contains the text data. It should not be `"input_ids"` or
            `"attention_mask"` as those are the columns
            returned by the tokenizer.
        label_col_name: The dataset column that contains the labels
        tokenizer: A pre-trained tokenizer
        batch_size: Batch size for tokenization and for the returned
            `DataLoader`
        max_seq_length: Optional. If the number of tokens in a sequence is
            `n` and `n` > `max_seq_length`, the sequence will be truncated.
            This means cutting off the last `n - max_seq_length` tokens.
            If not specified, truncation will not be applied.

    Returns:
         The tokenized dataset as a `DataLoader`
    """
    dataset_cols = dataset.features.keys()
    if text_col_name not in dataset_cols:
        raise KeyError(f"{text_col_name} is not a dataset field.")
    if label_col_name not in dataset_cols:
        raise KeyError(f"{label_col_name} is not a dataset field.")
    tokenizer_cols = tokenizer("Dummy text", return_token_type_ids=False).keys()
    if text_col_name in tokenizer_cols:
        raise KeyError(f"Invalid text column name: {text_col_name}")
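    # Only override the tokenizer's maximum length if one was specified;
    # otherwise truncation stays disabled, as the docstring promises.
    if max_seq_length is not None:
        tokenizer.model_max_length = max_seq_length

    def tok_func(example: Dict[str, Any]) -> BatchEncoding:
        # Call `tokenizer`: pad to the longest sequence in the batch, apply
        # truncation (if requested) and omit `token_type_ids`. See
        # https://huggingface.co/docs/transformers/main_classes/tokenizer
        return tokenizer(
            example[text_col_name],
            padding=True,
            truncation=max_seq_length is not None,
            return_token_type_ids=False
        )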

    dataset = dataset.map(tok_func, batched=True, batch_size=batch_size)
    dataset.set_format(
        type="torch", columns=list(tokenizer_cols) + [label_col_name])
    return DataLoader(dataset, batch_size=batch_size)
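
What padding, truncation and the attention mask mean can be sketched in plain Python. `pad_and_truncate` is a hypothetical helper and the token ids are made up. The real tokenizer also takes care of special tokens when truncating, which this simplified version ignores.

```python
from typing import Dict, List


def pad_and_truncate(
        sequences: List[List[int]],
        max_seq_length: int,
        pad_id: int = 0,
) -> Dict[str, List[List[int]]]:
    """Truncate each sequence, then pad to the longest one in the batch."""
    # Truncation: cut off everything beyond `max_seq_length` tokens
    truncated = [seq[:max_seq_length] for seq in sequences]
    batch_length = max(len(seq) for seq in truncated)
    input_ids, attention_mask = [], []
    for seq in truncated:
        num_pad = batch_length - len(seq)
        input_ids.append(seq + [pad_id] * num_pad)
        # 1 marks a real token, 0 marks padding
        attention_mask.append([1] * len(seq) + [0] * num_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Two toy sequences with made-up token ids
toy_batch = pad_and_truncate(
    [[101, 7, 8, 9, 102], [101, 7, 102]], max_seq_length=4)
print(toy_batch)
```

The first sequence loses its last token, the second receives one padding id, and the attention mask tells the model which positions to ignore.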

In [None]:
# As we use a pre-trained model, a tokenizer must already exist.
# We can simply download it from HuggingFace Hub.
tokenizer = BertTokenizer.from_pretrained(MODEL_NAME)

train_data_loader, val_data_loader = (tokenize_single_sent_dataset(
    dataset=dataset,
    text_col_name="Sent",
    label_col_name=new_label_name,
    tokenizer=tokenizer,
    batch_size=BATCH_SIZE,
    max_seq_length=MAX_SEQ_LENGTH
) for dataset in (train_dataset, val_dataset))


  0%|          | 0/910 [00:00<?, ?ba/s]

  0%|          | 0/114 [00:00<?, ?ba/s]

In [None]:
# Let us inspect the data again
# If your tokenization function is correct,
# you should get a dict with keys `'labels'`,
# `'input_ids'` and `'attention_mask'`
# and the values should be tensors
print(next(iter(train_data_loader)))

{'labels': tensor([1, 1, 1, 1, 1, 1, 1, 1]), 'input_ids': tensor([[  101,  2256,  2814,  2180,  1005,  1056,  4965,  2023,  4106,  1010,
          2292,  2894,  1996,  2279,  2028,  2057, 16599,  1012,   102],
        [  101,  2028,  2062, 18404,  2236,  3989,  1998,  1045,  1005,  1049,
          3228,  2039,  1012,   102,     0,     0,     0,     0,     0],
        [  101,  2028,  2062, 18404,  2236,  3989,  2030,  1045,  1005,  1049,
          3228,  2039,  1012,   102,     0,     0,     0,     0,     0],
        [  101,  1996,  2062,  2057,  2817, 16025,  1010,  1996, 13675, 16103,
          2121,  2027,  2131,  1012,   102,     0,     0,     0,     0],
        [  101,  2154,  2011,  2154,  1996,  8866,  2024,  2893, 14163,  8024,
          3771,  1012,   102,     0,     0,     0,     0,     0,     0],
        [  101,  1045,  1005,  2222,  8081,  2017,  1037,  4392,  1012,   102,
             0,     0,     0,     0,     0,     0,     0,     0,     0],
        [  101,  5965, 27129, 

In [None]:
# The following functions are related to fine-tuning.
# Note that the architecture that we are going to use handles binary
# classification as a special case of multi-class classification.
# It uses softmax rather than sigmoid (i.e. logistic regression) for
# classification. As a result, the classifier head outputs two floating point
# numbers (one for each class) per input sequence instead of one.

def _get_loss_log_accuracy(
        model: BertForSequenceClassification,
        metric: Metric,
        device: torch.device,
        batch: Dict[str, torch.Tensor],
        label_key: str
) -> torch.Tensor:
    """Helper function to calculate loss and add predictions to a metric
    after a training or validation step
    
    Args:
        model: A BERT model
        metric: A metric object which logs predictions but does not calculate
            the metric (e.g. accuracy) until explicitly requested to.
        device: The same device where the model was put
        batch: A training batch as a `dict` whose values are tensors
        label_key: The key that corresponds to the label values in the
            batch `dict`. This is typically `'labels'`.
    
    Returns:
        The loss as a scalar tensor.
    """
    batch = {k: v.to(device) for k, v in batch.items()}
    outputs = model(**batch)
    # We get logits from the model, but the metric object needs a concrete
    # prediction. We simply use an argmax operation to achieve this.
    # For classification, one would first call the softmax, but softmax
    # does not change the argmax result.
    # Note that `outputs.logits` is of shape `(batch_size, number_of_classes)`
    preds = torch.argmax(outputs.logits, dim=-1)
    metric.add_batch(predictions=preds, references=batch[label_key])
    return outputs.loss


def log_results(
        epoch: int,
        step: int,
        train_loss: Union[float, torch.Tensor],
        train_acc: Union[float, torch.Tensor],
        val_loss: Union[float, torch.Tensor],
        val_acc: Union[float, torch.Tensor],
) -> None:
    """Helper function to log loss and accuracy scores"""
    print(f"\nTraining loss at step {step}, epoch {epoch}: "
          f"{train_loss}")
    print(f"Training accuracy at step {step}, epoch {epoch}: "
          f"{train_acc}")
    print(f"Validation loss at step {step}, epoch {epoch}: "
          f"{val_loss}")
    print(f"Validation accuracy at step {step}, epoch {epoch}: "
          f"{val_acc}")


@torch.inference_mode()
def do_evaluation(
        model: BertForSequenceClassification,
        metric: Metric,
        device: torch.device,
        data_loader: DataLoader,
        label_key: str,
        metric_type: str
) -> Tuple[float, float]:
    """Do a validation epoch

    Args:
        model: A BERT model
        metric: A metric object which logs predictions but does not calculate
            the metric (e.g. accuracy) until explicitly requested to.
        device: The device to use, the same device where the model was put.
        data_loader: A validation dataset wrapped by a `DataLoader`.
            Batches will be expected to be `dict` instances that contain the 
            model inputs.
        label_key: The key that corresponds to the label values in the
            batch `dict`. This is typically `'labels'`.
        metric_type: Metric type to calculate, e.g. `'accuracy'`.
    
    Returns:
        The validation loss and accuracy
    """
    model.eval()
    val_loss = 0.
    for val_step, val_batch in enumerate(data_loader, start=1):
        loss = _get_loss_log_accuracy(
            model, metric, device, batch=val_batch, label_key=label_key)
        val_loss += loss
    val_loss /= val_step
    # Only now do we compute the metric score.
    # `metric.compute()` returns a `dict`, but we need only the score itself
    val_acc = metric.compute()[metric_type]
    return val_loss.item(), val_acc
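
The comments in `_get_loss_log_accuracy` state that softmax does not change the argmax result. The following plain-Python sketch checks this on a few made-up logit pairs. It also shows that, for two classes, the softmax probability of class 1 equals the sigmoid of the logit difference, which is why the two-output head is equivalent to logistic regression. The helper names and the logit values are illustrative only.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values: List[float]) -> int:
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


# Made-up logits of shape (batch_size, number_of_classes)
logits_batch = [[-1.2, 0.7], [2.5, -0.3], [0.1, 0.4]]
probs_batch = [softmax(row) for row in logits_batch]

preds_from_logits = [argmax(row) for row in logits_batch]
preds_from_probs = [argmax(row) for row in probs_batch]
print(preds_from_logits, preds_from_probs)

# Two-class softmax vs. sigmoid of the logit difference
z0, z1 = logits_batch[0]
sigmoid_of_diff = 1.0 / (1.0 + math.exp(-(z1 - z0)))
print(probs_batch[0][1], sigmoid_of_diff)
```

Because softmax is strictly increasing in each logit, the ordering of the classes is preserved and the predictions agree.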


In [None]:
def fine_tune_for_classification(
        model: BertForSequenceClassification,
        train_data_loader: DataLoader,
        val_data_loader: DataLoader,
        num_epochs: int,
        learning_rate: float = 1e-6,
        weight_decay: float = 1e-6,
        scheduler_type: Union[str, SchedulerType] = "linear",
        num_warmup_steps: int = 200,
        logging_freq: int = 100,
        metric_type: str = "accuracy"
) -> BertForSequenceClassification:
    """Fine-tune a model

    Args:
        model: A BERT model
        train_data_loader: A training dataset wrapped by a `DataLoader`.
            Batches will be expected to be `dict` instances that contain the 
            model inputs.
        val_data_loader: A validation dataset wrapped by a `DataLoader`.
            Batches will be expected to be `dict` instances that contain the 
            model inputs.
        num_epochs: Number of training epochs.
        learning_rate: Learning rate argument passed to an `AdamW` optimizer.
            Defaults to `1e-6`.
        weight_decay: Weight decay argument passed to an `AdamW` optimizer.
            Defaults to `1e-6`.
        scheduler_type: `name` parameter of the `transformers.get_scheduler`
            function. Defaults to `'linear'`.
        num_warmup_steps: Number of learning rate warmup steps.
        logging_freq: How often to log, expressed in terms of training steps.
            Counting starts over after each epoch end.
        metric_type: Metric type to calculate. Defaults to `'accuracy'`.
    
    Returns:
        The fine-tuned model
    """
    num_training_steps = num_epochs * len(train_data_loader)
    if num_training_steps <= num_warmup_steps:
        raise ValueError(f"The number of training steps ({num_training_steps}) "
                         "should be larger than the number of "
                         f"warmup steps ({num_warmup_steps}).")
    device = torch.device("cuda:0") if torch.cuda.is_available() \
        else torch.device("cpu")
    model.to(device).train()
    optimizer = torch.optim.AdamW(params=model.parameters(), lr=learning_rate,
                                  weight_decay=weight_decay)
    lr_scheduler = get_scheduler(
        name=scheduler_type,
        optimizer=optimizer,
        num_warmup_steps=num_warmup_steps,
        num_training_steps=num_training_steps
    )
    train_metric = load_metric(metric_type)
    val_metric = load_metric(metric_type)
    # `label_key` is the key in the batch dictionaries
    # whose values are the labels
    label_key = "labels"

    for epoch in range(1, num_epochs + 1):
        train_loss = 0.
        print(f"Epoch {epoch} started...")
        loss_step_tracker = 0
        for train_step, train_batch in enumerate(tqdm(train_data_loader),
                                                start=1):
            loss_step_tracker += 1
            loss = _get_loss_log_accuracy(
                model, train_metric, device,
                batch=train_batch, label_key=label_key)
            # Accumulate a plain float so that no autograd history is kept
            train_loss += loss.item()
            loss.backward()
            optimizer.step()
            lr_scheduler.step()
            optimizer.zero_grad()
            if train_step % logging_freq == 0:
                # Evaluate the model on the validation data with the
                # `do_evaluation` function implemented above.
                val_loss, val_acc = do_evaluation(
                    model, val_metric, device, val_data_loader,
                    label_key=label_key, metric_type=metric_type)
                train_acc = train_metric.compute()[metric_type]
                log_results(
                    epoch=epoch,
                    step=train_step,
                    train_loss=train_loss/loss_step_tracker,
                    train_acc=train_acc,
                    val_loss=val_loss,
                    val_acc=val_acc
                )
                train_loss = 0.
                loss_step_tracker = 0
                model.train()

    # Get the final logs after the training was completed
    if train_step % logging_freq != 0:
        # Evaluate the final model on the validation data with the
        # `do_evaluation` function implemented above.
        val_loss, val_acc = do_evaluation(
            model, val_metric, device, val_data_loader,
            label_key=label_key, metric_type=metric_type)
        train_acc = train_metric.compute()[metric_type]
        log_results(
            epoch=epoch,
            step=train_step,
            train_loss=train_loss/loss_step_tracker,
            train_acc=train_acc,
            val_loss=val_loss,
            val_acc=val_acc
        )
    return model.eval()
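
The `'linear'` scheduler combined with `num_warmup_steps` can be pictured as a multiplier on the base learning rate: it rises linearly from 0 to 1 during warmup and then falls linearly back to 0 by the last training step. The sketch below is a plain-Python approximation of that shape, not the library code. `linear_schedule_factor` is a hypothetical name, and the step counts assume 7274 training sentences, batch size 8 and 2 epochs, as in this run.

```python
import math


def linear_schedule_factor(
        step: int, num_warmup_steps: int, num_training_steps: int
) -> float:
    """Multiplier applied to the base learning rate at a given step."""
    if step < num_warmup_steps:
        # Warmup: grow linearly from 0 to 1
        return step / max(1, num_warmup_steps)
    # Decay: shrink linearly from 1 to 0 over the remaining steps
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


steps_per_epoch = math.ceil(7274 / 8)          # 910 batches per epoch
num_training_steps = 2 * steps_per_epoch       # 1820 steps in total
probe_steps = (0, 100, 200, 1010, 1820)
factors = [linear_schedule_factor(s, 200, num_training_steps)
           for s in probe_steps]
for step, factor in zip(probe_steps, factors):
    print(f"step {step}: learning rate {factor * 1e-6}")
```

With the default 200 warmup steps, the peak learning rate is reached well inside the first epoch, which is consistent with the check that the total number of training steps exceeds the number of warmup steps.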

In [None]:
# Load the pre-trained model. The classifier head weights will be
# initialized randomly.
# `BertForSequenceClassification.from_pretrained` loads the checkpoint;
# `num_labels=2` sets the number of classes. See
# https://huggingface.co/docs/transformers/main/en/main_classes/model
hu_model = BertForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=2)

# Fine-tune the model.
# If everything is all right, both the training and the
# validation accuracy should be larger than 80% by the
# end of the training.
hu_model = fine_tune_for_classification(
    model=hu_model,
    train_data_loader=train_data_loader,
    val_data_loader=val_data_loader,
    num_epochs=NUM_EPOCHS
)

Some weights of the model checkpoint at SZTAKI-HLT/hubert-base-cc were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.bias', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not 

Epoch 1 started...


Training loss at step 100, epoch 1: 0.7081525921821594
Training accuracy at step 100, epoch 1: 0.5025
Validation loss at step 100, epoch 1: 0.5923489928245544
Validation accuracy at step 100, epoch 1: 0.7494505494505495


Training loss at step 200, epoch 1: 0.55295729637146
Training accuracy at step 200, epoch 1: 0.76
Validation loss at step 200, epoch 1: 0.5279071927070618
Validation accuracy at step 200, epoch 1: 0.7835164835164835


Training loss at step 300, epoch 1: 0.4607868194580078
Training accuracy at step 300, epoch 1: 0.8225
Validation loss at step 300, epoch 1: 0.5373277068138123
Validation accuracy at step 300, epoch 1: 0.7846153846153846


Training loss at step 400, epoch 1: 0.5160442590713501
Training accuracy at step 400, epoch 1: 0.79
Validation loss at step 400, epoch 1: 0.514610230922699
Validation accuracy at step 400, epoch 1: 0.7846153846153846


Training loss at step 500, epoch 1: 0.4829214811325073
Training accuracy at step 500, epoch 1: 0.8025
Validation loss at step 500, epoch 1: 0.5097160339355469
Validation accuracy at step 500, epoch 1: 0.7846153846153846


Training loss at step 600, epoch 1: 0.49297264218330383
Training accuracy at step 600, epoch 1: 0.795
Validation loss at step 600, epoch 1: 0.5044310688972473
Validation accuracy at step 600, epoch 1: 0.7846153846153846


Training loss at step 700, epoch 1: 0.5081064701080322
Training accuracy at step 700, epoch 1: 0.77375
Validation loss at step 700, epoch 1: 0.48640176653862
Validation accuracy at step 700, epoch 1: 0.7846153846153846


Training loss at step 800, epoch 1: 0.4761144518852234
Training accuracy at step 800, epoch 1: 0.78625
Validation loss at step 800, epoch 1: 0.4794652760028839
Validation accuracy at step 800, epoch 1: 0.7835164835164835


Training loss at step 900, epoch 1: 0.4901476502418518
Training accuracy at step 900, epoch 1: 0.76875
Validation loss at step 900, epoch 1: 0.4589300751686096
Validation accuracy at step 900, epoch 1: 0.7879120879120879


100%|██████████| 910/910 [03:17<00:00,  4.62it/s]


Epoch 2 started...


Training loss at step 100, epoch 2: 0.4888635277748108
Training accuracy at step 100, epoch 2: 0.7643020594965675
Validation loss at step 100, epoch 2: 0.44222909212112427
Validation accuracy at step 100, epoch 2: 0.8


Training loss at step 200, epoch 2: 0.41819778084754944
Training accuracy at step 200, epoch 2: 0.80125
Validation loss at step 200, epoch 2: 0.4382742643356323
Validation accuracy at step 200, epoch 2: 0.8087912087912088


Training loss at step 300, epoch 2: 0.3578413426876068
Training accuracy at step 300, epoch 2: 0.85375
Validation loss at step 300, epoch 2: 0.4650581181049347
Validation accuracy at step 300, epoch 2: 0.8065934065934066


Training loss at step 400, epoch 2: 0.3928499221801758
Training accuracy at step 400, epoch 2: 0.82375
Validation loss at step 400, epoch 2: 0.4178796708583832
Validation accuracy at step 400, epoch 2: 0.8186813186813187


Training loss at step 500, epoch 2: 0.38756904006004333
Training accuracy at step 500, epoch 2: 0.83875
Validation loss at step 500, epoch 2: 0.41859903931617737
Validation accuracy at step 500, epoch 2: 0.8208791208791208


Training loss at step 600, epoch 2: 0.3797091841697693
Training accuracy at step 600, epoch 2: 0.83875
Validation loss at step 600, epoch 2: 0.40757429599761963
Validation accuracy at step 600, epoch 2: 0.8296703296703297


Training loss at step 700, epoch 2: 0.3966675400733948
Training accuracy at step 700, epoch 2: 0.84375
Validation loss at step 700, epoch 2: 0.39527544379234314
Validation accuracy at step 700, epoch 2: 0.8351648351648352


Training loss at step 800, epoch 2: 0.36744433641433716
Training accuracy at step 800, epoch 2: 0.83875
Validation loss at step 800, epoch 2: 0.39689067006111145
Validation accuracy at step 800, epoch 2: 0.8351648351648352


Training loss at step 900, epoch 2: 0.38668906688690186
Training accuracy at step 900, epoch 2: 0.84125
Validation loss at step 900, epoch 2: 0.3902340829372406
Validation accuracy at step 900, epoch 2: 0.8395604395604396


100%|██████████| 910/910 [03:16<00:00,  4.64it/s]



Training loss at step 910, epoch 2: 0.005272111389786005
Training accuracy at step 910, epoch 2: 0.0008910008910008911
Validation loss at step 910, epoch 2: 0.39026015996932983
Validation accuracy at step 910, epoch 2: 0.8395604395604396
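
Most training accuracies in the log are multiples of 1/800, since each logging window covers 100 batches of 8 sentences. The value at step 100 of epoch 2 is not. One plausible explanation, assuming the metric object keeps predictions until `compute()` is called, is that the predictions from the last ten steps of epoch 1 were never consumed and were counted in the first window of epoch 2. The arithmetic below is a sketch under that assumption and uses only numbers visible in the output of this run.

```python
import math

num_train_examples = 7274
batch_size = 8
logging_freq = 100

steps_per_epoch = math.ceil(num_train_examples / batch_size)  # 910
# The last step of an epoch at which results were logged
last_logged_step = (steps_per_epoch // logging_freq) * logging_freq
# Examples seen after that step: 9 full batches plus a final batch of 2
leftover_examples = num_train_examples - last_logged_step * batch_size
# Examples counted in the first logging window of the next epoch
examples_in_first_window = logging_freq * batch_size + leftover_examples

logged_accuracy = 0.7643020594965675
correct = round(logged_accuracy * examples_in_first_window)
print(leftover_examples, examples_in_first_window, correct)
print(correct / examples_in_first_window)
```

Under this reading the logged value corresponds to 668 correct predictions out of 874, so the training accuracy reported right after an epoch boundary mixes in a few predictions from the previous epoch.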


In [None]:
# Let us see the model in action!

hu_model.to("cpu")
my_input = "Ma jó napom van!"
with torch.inference_mode():
    tokenized_input = tokenizer(my_input, return_token_type_ids=False,
                                return_tensors="pt")
    prediction = hu_model(**tokenized_input, return_dict=False)[0]
    prediction = torch.argmax(torch.squeeze(prediction))

if prediction == 1:
    print(f"'{my_input}': well-formed sentence! :)")
else:
    print(f"'{my_input}': ill-formed sentence! :(")

tensor(1)
