*Copyright (c) Microsoft Corporation. All rights reserved.*

*Licensed under the MIT License.*

# Text Classification of MultiNLI Sentences using different Transformer models

In [None]:
import sys
import os
import json
import pandas as pd
import numpy as np
import scrapbook as sb
from sklearn.metrics import classification_report, accuracy_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
import torch
import torch.nn as nn
from tqdm import tqdm
from utils_nlp.dataset.multinli import load_pandas_df
from utils_nlp.models.transformers.sequence_classification import (
    SequenceClassifier,
    Processor,
)
from utils_nlp.common.timer import Timer

## Introduction
In this notebook, we fine-tune and evaluate a number of pretrained models on a subset of the [MultiNLI](https://www.nyu.edu/projects/bowman/multinli/) dataset.

We use a [sequence classifier](../../utils_nlp/models/transformers/sequence_classification.py) that wraps [Hugging Face's PyTorch implementation](https://github.com/huggingface/transformers) of different transformers, like [BERT](https://github.com/google-research/bert), [XLNet](https://github.com/zihangdai/xlnet), and [RoBERTa](https://github.com/pytorch/fairseq).

In [32]:
# notebook parameters
DATA_FOLDER = "./temp"
CACHE_DIR = "./temp"
DEVICE = "cuda"
NUM_EPOCHS = 1
BATCH_SIZE = 16
NUM_GPUS = 2
MAX_LEN = 150
TRAIN_DATA_FRACTION = 0.15
TEST_DATA_FRACTION = 0.15
TRAIN_SIZE = 0.75
LABEL_COL = "genre"
TEXT_COL = "sentence1"

## Read Dataset
We start by loading a subset of the data. The following function also downloads and extracts the files, if they don't exist in the data folder.

The MultiNLI dataset is mainly used for natural language inference (NLI) tasks, where the inputs are sentence pairs and the labels are entailment indicators. The sentence pairs are also classified into *genres* that allow for more coverage and better evaluation of NLI models.

For our classification task, we use the first sentence only as the text input, and the corresponding genre as the label. We select the examples corresponding to one of the entailment labels (*neutral* in this case) to avoid duplicate rows, as the sentences are not unique, whereas the sentence pairs are.

In [4]:
df = load_pandas_df(DATA_FOLDER, "train")
df = df[df["gold_label"]=="neutral"]  # get unique sentences
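To see why keeping a single entailment label removes repeated sentences, here is a minimal sketch on a hypothetical toy frame (`toy` and its values are made up for illustration). It assumes each first sentence appears once per entailment label; the same `duplicated()` check could be run on `df` to verify how well this holds for the real data.

```python
import pandas as pd

# Toy stand-in for the MultiNLI frame (hypothetical values):
# each first sentence appears once per entailment label.
toy = pd.DataFrame(
    {
        "genre": ["travel", "travel", "travel", "slate", "slate", "slate"],
        "sentence1": ["A", "A", "A", "B", "B", "B"],
        "gold_label": ["neutral", "entailment", "contradiction"] * 2,
    }
)

# Keeping one entailment label leaves one row per first sentence.
toy_neutral = toy[toy["gold_label"] == "neutral"]

# Fraction of rows whose sentence already appeared in an earlier row,
# before and after filtering.
dup_before = toy["sentence1"].duplicated().mean()
dup_after = toy_neutral["sentence1"].duplicated().mean()
print(dup_before, dup_after)
```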

In [5]:
df[[LABEL_COL, TEXT_COL]].head()

| | genre | sentence1 |
|---|---|---|
| 0 | government | Conceptually cream skimming has two basic dime... |
| 4 | telephone | yeah i tell you what though if you go price so... |
| 6 | travel | But a few Christian mosaics survive above the ... |
| 12 | slate | It's not that the questions they asked weren't... |
| 13 | travel | Thebes held onto power until the 12th Dynasty,... |


We split the data for training and testing, sample a fraction for faster execution, and encode the class labels:

In [6]:
# split
df_train, df_test = train_test_split(df, train_size=TRAIN_SIZE, random_state=0)



In [7]:
# sample
df_train = df_train.sample(frac=TRAIN_DATA_FRACTION).reset_index(drop=True)
df_test = df_test.sample(frac=TEST_DATA_FRACTION).reset_index(drop=True)
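The split-then-sample pattern can be tried in isolation on a small frame. The sketch below uses a hypothetical toy frame and two options that the cells above do not use and that are shown here only as assumptions about what may be useful: `stratify`, which keeps the genre proportions equal in both parts, and `random_state` on `sample()`, which makes the subsample repeatable across runs.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 rows, two balanced genres.
toy = pd.DataFrame(
    {
        "genre": ["fiction", "travel"] * 50,
        "sentence1": ["sentence {}".format(i) for i in range(100)],
    }
)

# 75/25 split; stratify (optional) preserves the genre proportions.
toy_train, toy_test = train_test_split(
    toy, train_size=0.75, random_state=0, stratify=toy["genre"]
)

# Seeding sample() makes the subsample identical on every run.
toy_train_small = toy_train.sample(frac=0.2, random_state=0).reset_index(drop=True)
toy_test_small = toy_test.sample(frac=0.2, random_state=0).reset_index(drop=True)

print(len(toy_train), len(toy_test), len(toy_train_small), len(toy_test_small))
```

Without a seed, each run draws a different subset, so the class counts and the reported metrics can vary slightly between runs.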

The examples in the dataset are grouped into 5 genres:

In [8]:
df_train[LABEL_COL].value_counts()

telephone     3146
fiction       2960
slate         2901
government    2893
travel        2826
Name: genre, dtype: int64

In [9]:
# encode labels
label_encoder = LabelEncoder()
labels_train = label_encoder.fit_transform(df_train[LABEL_COL])
labels_test = label_encoder.transform(df_test[LABEL_COL])

num_labels = len(np.unique(labels_train))
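As a small illustration of what the encoder does, the sketch below fits a `LabelEncoder` on a hypothetical list of genres and maps integer ids back to names. The list `genres` and the array `hypothetical_preds` are made up for the example.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels covering the five genres.
genres = ["travel", "fiction", "slate", "travel", "government", "telephone"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(genres)

# classes_ is sorted, so each integer id is the position of the genre in it.
print(list(encoder.classes_))
print(encoded.tolist())

# Map integer predictions back to genre names, e.g. for reporting.
hypothetical_preds = np.array([0, 4, 2])
decoded = encoder.inverse_transform(hypothetical_preds).tolist()
print(decoded)
```

Note that `transform` raises a `ValueError` for labels that were not seen during fitting, which is why the encoder is fitted on the training labels and only applied to the test labels.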

In [10]:
print("Number of unique labels: {}".format(num_labels))
print("Number of training examples: {}".format(df_train.shape[0]))
print("Number of testing examples: {}".format(df_test.shape[0]))

Number of unique labels: 5
Number of training examples: 14726
Number of testing examples: 4909


## Select Pretrained Models

Several pretrained models have been made available by [Hugging Face](https://github.com/huggingface/transformers). For text classification, the following pretrained models are supported.

In [11]:
pd.DataFrame({"model_name": SequenceClassifier.list_supported_models()})

| | model_name |
|---|---|
| 0 | bert-base-uncased |
| 1 | bert-large-uncased |
| 2 | bert-base-cased |
| 3 | bert-large-cased |
| 4 | bert-base-multilingual-uncased |
| 5 | bert-base-multilingual-cased |
| 6 | bert-base-chinese |
| 7 | bert-base-german-cased |
| 8 | bert-large-uncased-whole-word-masking |
| 9 | bert-large-cased-whole-word-masking |


## Fine-tune

Our wrappers make it easy to fine-tune different models in a unified way, hiding the preprocessing details that are needed before training. In this example, we're going to select the following models and use the same piece of code to fine-tune them on our genre classification task. Note that some models were pretrained on multilingual datasets and can be used with non-English datasets.

In [28]:
model_names = ["distilbert-base-uncased", "roberta-base", "xlnet-base-cased"]
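As a rough illustration of the kind of preprocessing such wrappers hide, transformer inputs are usually truncated or padded to a fixed length and paired with an attention mask that marks the real tokens. The sketch below is not the `Processor` implementation: `pad_or_truncate`, the token ids, and `pad_id=0` are hypothetical, and real tokenizers also add model-specific special tokens.

```python
from typing import List, Tuple


def pad_or_truncate(
    token_ids: List[int], max_len: int, pad_id: int = 0
) -> Tuple[List[int], List[int]]:
    """Return (input_ids, attention_mask), each exactly max_len long."""
    ids = token_ids[:max_len]          # drop tokens beyond max_len
    mask = [1] * len(ids)              # 1 marks a real token
    n_pad = max_len - len(ids)         # number of padding positions needed
    return ids + [pad_id] * n_pad, mask + [0] * n_pad


# Hypothetical token ids for a short and a long sentence.
short_ids, short_mask = pad_or_truncate([11, 12, 13], max_len=5)
long_ids, long_mask = pad_or_truncate([21, 22, 23, 24, 25, 26, 27], max_len=5)
print(short_ids, short_mask)
print(long_ids, long_mask)
```

A fixed length such as `MAX_LEN` lets examples be stacked into tensors of equal shape, at the cost of discarding the tail of longer sentences.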

For each pretrained model, we preprocess the data, fine-tune the classifier, score the test set, and store the evaluation results.
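The scoring and bookkeeping steps can be sketched on their own with a few made-up labels. In the example below, `y_true`, `y_pred`, `class_names`, and the key `"hypothetical-model"` are illustrative only; the macro-averaged F1 score is the unweighted mean of the per-class F1 scores.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, classification_report

# Hypothetical two-class example.
class_names = ["fiction", "travel"]
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]

accuracy = accuracy_score(y_true, y_pred)
report = classification_report(
    y_true, y_pred, target_names=class_names, output_dict=True
)
macro_f1 = report["macro avg"]["f1-score"]

# One entry per model; a DataFrame makes the models easy to compare side by side.
toy_results = {"hypothetical-model": {"accuracy": accuracy, "f1-score": macro_f1}}
print(pd.DataFrame(toy_results))
```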

In [33]:
results = {}

for model_name in tqdm(model_names):
    
    # preprocess
    processor = Processor(model_name=model_name, cache_dir=CACHE_DIR)
    ds_train = processor.preprocess(
        df_train[TEXT_COL], labels_train, max_len=MAX_LEN
    )
    ds_test = processor.preprocess(df_test[TEXT_COL], None, max_len=MAX_LEN)

    # fine-tune
    classifier = SequenceClassifier(
        model_name=model_name, num_labels=num_labels, cache_dir=CACHE_DIR
    )
    with Timer() as t:
        classifier.fit(
            ds_train,
            device=DEVICE,
            num_epochs=NUM_EPOCHS,
            batch_size=BATCH_SIZE,
            num_gpus=NUM_GPUS,
            verbose=False,
        )
    train_time = t.interval / 3600

    # predict
    preds = classifier.predict(
        ds_test, device=DEVICE, batch_size=BATCH_SIZE, num_gpus=NUM_GPUS
    )

    # eval
    accuracy = accuracy_score(labels_test, preds)
    class_report = classification_report(
        labels_test, preds, target_names=label_encoder.classes_, output_dict=True
    )

    # save results
    results[model_name] = {
        "accuracy": accuracy,
        "f1-score": class_report["macro avg"]["f1-score"],
        "time(hrs)": train_time,
    }


[Cell output condensed: Hugging Face logging messages for loading the cached `bert-base-uncased` vocabulary and the `distilbert-base-uncased` configuration (6 layers, 12 heads, dim 768, hidden_dim 3072), followed by periodically printed training loss values interleaved with tqdm progress bars. For the first model, the printed loss starts at 1.613472, evaluation runs over 154 batches, and the first of the 3 loop iterations completes at 445.45 s/it. For the second model, the printed loss starts at 1.559738.]
  0%|          | 0/3 [18:27<?, ?it/s]         
 33%|███▎      | 1/3 [17:41<14:50, 445.45s/it][A

Loss:0.705766


                                     
  0%|          | 0/3 [18:35<?, ?it/s]         
 33%|███▎      | 1/3 [17:49<14:50, 445.45s/it][A

Loss:0.025862


                                     
  0%|          | 0/3 [18:42<?, ?it/s]         
 33%|███▎      | 1/3 [17:57<14:50, 445.45s/it][A

Loss:0.535521


                                     
  0%|          | 0/3 [18:50<?, ?it/s]         
 33%|███▎      | 1/3 [18:04<14:50, 445.45s/it][A

Loss:0.281819


                                     
  0%|          | 0/3 [18:57<?, ?it/s]         
 33%|███▎      | 1/3 [18:12<14:50, 445.45s/it][A

Loss:0.186776


                                     
  0%|          | 0/3 [19:05<?, ?it/s]         
 33%|███▎      | 1/3 [18:19<14:50, 445.45s/it][A

Loss:0.256166


                                     
  0%|          | 0/3 [19:13<?, ?it/s]         
 33%|███▎      | 1/3 [18:27<14:50, 445.45s/it][A

Loss:0.078913


                                     
  0%|          | 0/3 [19:20<?, ?it/s]         
 33%|███▎      | 1/3 [18:34<14:50, 445.45s/it][A

Loss:0.206851


                                     
  0%|          | 0/3 [19:27<?, ?it/s]         
 33%|███▎      | 1/3 [18:42<14:50, 445.45s/it][A

Loss:0.165107


                                     
  0%|          | 0/3 [19:35<?, ?it/s]         
 33%|███▎      | 1/3 [18:50<14:50, 445.45s/it][A

Loss:0.362343


                                     
  0%|          | 0/3 [19:43<?, ?it/s]         
 33%|███▎      | 1/3 [18:57<14:50, 445.45s/it][A

Loss:0.030030





Loss:1.819770
Loss:1.304597
Loss:0.942012
Loss:0.797735
Loss:0.515332
Loss:0.404242
Loss:0.763274
Loss:0.802602
Loss:0.425111
Loss:0.593007
Loss:0.205651
Loss:0.052174
Loss:0.191218
Loss:0.248251
Loss:0.664998
Loss:0.791746
Loss:0.187446
Loss:0.260010
Loss:0.162322
Loss:0.341495
Loss:0.298152
Loss:0.233707
Loss:0.475194
Loss:0.208173
Loss:0.039098
Loss:0.052284
Loss:0.290496
Loss:0.234923
Loss:0.346255
Loss:0.188442
Loss:0.455602
Loss:0.609619
Loss:0.024805
Loss:0.795632
Loss:0.525080
Loss:0.258776
Loss:0.321544
Loss:0.084718
Loss:0.383003
Loss:0.203176
Loss:0.356263
Loss:0.088029
Loss:0.609321
Loss:0.233717
Loss:0.392966
Loss:0.239043
Loss:0.309102
Loss:0.033599
Loss:0.207570
Loss:0.566612
Loss:0.155714
Loss:0.355020
Loss:0.153492
Loss:0.195237
Loss:0.147319
Loss:0.551497
Loss:0.446988
Loss:0.392400
Loss:0.676942
Loss:0.447995
Loss:0.124699
Loss:0.044253
Loss:0.137005
Loss:0.384042
Loss:0.318679
Loss:0.152525
Loss:0.244761
Loss:0.476746
Loss:0.074933
Loss:0.304292
Loss:0.016483
Loss:0.385908
Loss:0.385652
Loss:0.106855
Loss:0.165334
Loss:0.280620
Loss:0.546574
Loss:0.524420
Loss:0.130612
Loss:0.045096
Loss:0.080000
Loss:0.660747
Loss:0.104844
Loss:0.044222
Loss:0.029447
Loss:0.209993
Loss:0.317221
Loss:0.413533
Loss:0.068395
Loss:0.511159
Loss:0.197926
Loss:0.332851
Loss:0.042949





## Evaluate

Finally, we report the accuracy and F1-score metrics for each model, as well as the fine-tuning time in hours.

In [35]:
pd.DataFrame(results)

| | distilbert-base-uncased | roberta-base | xlnet-base-cased |
|---|---|---|---|
| accuracy | 0.901406 | 0.919536 | 0.925647 |
| f1-score | 0.897829 | 0.916793 | 0.923171 |
| time (hours) | 0.111936 | 0.189581 | 0.270957 |
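
As a rough illustration of how one column of such a table could be assembled, the sketch below computes accuracy, an F1-score, and a time in hours for a single toy model. The `summarize` helper, the toy labels, and the `toy-model` name are hypothetical. The use of the macro-averaged F1 is an assumption, since the averaging scheme behind the reported F1-score is not stated here.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, classification_report


def summarize(y_true, y_pred, train_seconds):
    """Return one results column: accuracy, F1-score, and time in hours.

    Hypothetical helper. Macro-averaged F1 is an assumption, not
    necessarily the averaging used for the table above.
    """
    report = classification_report(y_true, y_pred, output_dict=True)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "f1-score": report["macro avg"]["f1-score"],
        "time": train_seconds / 3600,  # convert seconds to hours
    }


# Toy genre labels, for illustration only (not taken from MultiNLI).
y_true = ["fiction", "travel", "travel", "government"]
y_pred = ["fiction", "travel", "government", "government"]

# One dictionary entry per model name; each entry becomes a column.
toy_results = {"toy-model": summarize(y_true, y_pred, train_seconds=1800)}
toy_df = pd.DataFrame(toy_results)
print(toy_df)
```

Keying the dictionary by model name means `pd.DataFrame` places models in columns and metrics in rows, which matches the layout of the table above.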
