# Toutiao News Classification: BERT with Head Baseline

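## BERT Key Points
#### Why BERT uses self-attention
+ Computational complexity: each self-attention layer costs O(n^2*d), where n is the sentence length and d is the word-vector dimension.
![](./table1.png)
+ Parallelism: all positions in a layer are computed at once rather than one step after another.
+ Long-range dependencies: in an LSTM, information linking two positions has to pass through every step between them, whereas attention relates any two positions directly.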
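
The three points above can be seen in a minimal sketch of single-head self-attention. This is an illustration only, not BERT's actual layer: it assumes Q = K = V = x with no learned projections, and the names `softmax`, `self_attention`, and the toy `tokens` values are made up for the example.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Single-head self-attention with Q = K = V = x (no learned projections).

    x: list of n token vectors, each of dimension d.
    Returns (weights, output): weights is n x n, output is n x d.
    """
    n, d = len(x), len(x[0])
    # n * n dot products of length d each, hence O(n^2 * d) work per layer
    scores = [[sum(x[i][k] * x[j][k] for k in range(d)) / math.sqrt(d)
               for j in range(n)] for i in range(n)]
    # each row is independent of the others, so rows can be computed in parallel
    weights = [softmax(row) for row in scores]
    # every output position mixes all n input positions in a single step
    output = [[sum(weights[i][j] * x[j][k] for j in range(n)) for k in range(d)]
              for i in range(n)]
    return weights, output

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # n = 3, d = 2 (toy values)
weights, output = self_attention(tokens)
```

Here `weights[i][j]` links position i to position j directly, no matter how far apart they are in the sentence.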

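#### Main contributions
+ BERT uses a masked language model objective, which lets the pre-trained model learn bidirectional representations.
+ BERT is the first fine-tuning-based representation model to reach state-of-the-art results on a broad set of sentence-level and token-level tasks.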

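#### Task 1: Masked LM (masked language model)
To train deep bidirectional representations, 15% of the tokens are selected at random and masked, and the model predicts only those masked tokens.
Of the selected tokens:
1. 80% are replaced with [MASK]
2. 10% are replaced with a random token
3. 10% are left unchanged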
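
The 80/10/10 rule can be sketched as follows. This is a simplified illustration, not the original BERT preprocessing: it assumes each token is selected independently with probability 0.15 (so the selected share is only 15% on average), and `mask_tokens`, `vocab`, and the sample sentence are hypothetical.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, rng, mask_prob=0.15):
    """Return (masked_tokens, labels); labels[i] is None where nothing is predicted."""
    masked, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            masked[i] = MASK               # 80%: replace with [MASK]
        elif r < 0.9:
            masked[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return masked, labels

rng = random.Random(2022)
tokens = ["the", "weather", "in", "beijing", "is", "nice", "today"]
vocab = ["rain", "shanghai", "tomorrow", "cold"]
masked, labels = mask_tokens(tokens, vocab, rng)
```

The loss is computed only at positions where `labels[i]` is not `None`, which matches "predict only the masked tokens" above.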

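#### Task 2: Next Sentence Prediction
For a sentence pair (A, B), B is the actual next sentence after A 50% of the time and a sentence chosen at random from the dataset the other 50%.
If B follows A, the pair is labeled IsNext; otherwise it is labeled NotNext.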
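
A minimal sketch of how such pairs could be built. The function `make_nsp_pairs` and the toy `documents` are hypothetical, and the sketch assumes the random sentence is drawn from a different document so that it cannot be the true next sentence.

```python
import random

def make_nsp_pairs(documents, rng):
    """Build (sentence_a, sentence_b, label) triples.

    documents: list of documents, each a list of sentences (at least two documents).
    """
    pairs = []
    for doc_idx, doc in enumerate(documents):
        for i in range(len(doc) - 1):
            if rng.random() < 0.5:
                # positive pair: B really follows A
                pairs.append((doc[i], doc[i + 1], "IsNext"))
            else:
                # negative pair: B is sampled from another document
                other_idx = rng.choice([j for j in range(len(documents)) if j != doc_idx])
                pairs.append((doc[i], rng.choice(documents[other_idx]), "NotNext"))
    return pairs

documents = [["a1", "a2", "a3"], ["b1", "b2"], ["c1", "c2", "c3"]]
pairs = make_nsp_pairs(documents, random.Random(2022))
```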

## Configuration

In [1]:
import torch 
import torch.nn as nn

config = {
    'train_file_path': '../../../data/toutiao_news_cls/train.csv',
    'test_file_path': '../../../data/toutiao_news_cls/test.csv',
    'train_val_ratio': 0.1,  # hold out 10% of the training data for validation
    # ------ settings that differ from TextCNN ------
    # 'vocab_size': 10000,   # vocabulary (30k)
    'head': 'cnn',
    'model_path': '../../../pt/bert-base-chinese',
    # ------ settings that differ from TextCNN ------
    'batch_size': 16,      # batch size
    'num_epochs': 1,       # number of training epochs
    'learning_rate': 2e-5, # learning rate
    'logging_step': 300,   # evaluate and log every 300 batches
    'seed': 2022           # random seed
}

config['device'] = 'cuda' if torch.cuda.is_available() else 'cpu' # use the GPU when available

import random
import numpy as np

def seed_everything(seed):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    return seed

seed_everything(config['seed'])

2022

## Data Preprocessing and DataLoader

In [2]:
import pandas as pd
from tqdm import tqdm
from collections import defaultdict
from transformers import BertTokenizer
from torch.utils.data import DataLoader

In [3]:
# BERT tokenizer
bertTokenizer = BertTokenizer.from_pretrained(config['model_path'])
# wrap the tokenizer so it returns input_ids, token_type_ids and attention_mask
def tokenizer(sent):
    inputs = bertTokenizer.encode_plus(sent, add_special_tokens=True, return_token_type_ids=True, return_attention_mask=True)
    
    return inputs


In [4]:
def read_data(config, mode='train'):
    
    data_df = pd.read_csv(config[f'{mode}_file_path'], sep=',')
    LABEL, SENTENCE = 'label', 'sentence'
    data_df['bert_encode'] = data_df[SENTENCE].apply(tokenizer)
    # Sentences differ in length, so each field is stored as an object array of
    # variable-length lists (recent NumPy versions reject ragged input without
    # dtype=object); padding happens later in collate_fn.
    data_df['input_ids'] = data_df['bert_encode'].apply(lambda s: s['input_ids'])
    input_ids = np.array([[int(id_) for id_ in v] for v in data_df['input_ids'].values], dtype=object)
    data_df['token_type_ids'] = data_df['bert_encode'].apply(lambda s: s['token_type_ids'])
    token_type_ids = np.array([[int(id_) for id_ in v] for v in data_df['token_type_ids'].values], dtype=object)
    data_df['attention_mask'] = data_df['bert_encode'].apply(lambda s: s['attention_mask'])
    attention_mask = np.array([[int(id_) for id_ in v] for v in data_df['attention_mask'].values], dtype=object)

    if mode == 'train':
        labels = data_df[LABEL].values
        
        X_train, y_train = defaultdict(list), []
        X_val, y_val = defaultdict(list), []
        num_val = int(config['train_val_ratio'] * len(data_df))
        
        # shuffle ids
        ids = np.random.choice(range(len(data_df)), size=len(data_df), replace=False)
        train_ids = ids[num_val:]
        val_ids = ids[:num_val]
        
        # get input_ids
        X_train['input_ids'], y_train = input_ids[train_ids], labels[train_ids]
        X_val['input_ids'], y_val = input_ids[val_ids], labels[val_ids]
         # get token_type_ids
        X_train['token_type_ids'] = token_type_ids[train_ids]
        X_val['token_type_ids'] = token_type_ids[val_ids]
        # get attention_mask
        X_train['attention_mask'] = attention_mask[train_ids]
        X_val['attention_mask'] = attention_mask[val_ids]
     
        # label 
        label2id = {label: i for i, label in enumerate(np.unique(y_train))}
        id2label = {i: label for label, i in label2id.items()}
        y_train = torch.tensor([label2id[y] for y in y_train], dtype=torch.long)
        y_val = torch.tensor([label2id[y] for y in y_val], dtype=torch.long)

        return X_train, y_train, X_val, y_val, label2id, id2label

    else:
        X_test = defaultdict(list)
        X_test['input_ids'] = input_ids
        X_test['token_type_ids'] = token_type_ids
        X_test['attention_mask'] = attention_mask
        y_test = torch.zeros(len(data_df), dtype=torch.long)
        
        return X_test, y_test

In [5]:
X_train, y_train, X_val, y_val, label2id, id2label = read_data(config, mode='train')



In [6]:
X_test, y_test = read_data(config, mode='test')



#### Dataset wraps the data; a class that subclasses Dataset must implement:
+ `__len__`: the size of the whole dataset
+ `__getitem__`: indexing into the dataset

In [7]:
from torch.utils.data import Dataset
class TNEWSDataset(Dataset):
    def __init__(self, X, y):
        self.x = X
        self.y = y

    def __getitem__(self, idx):
        return {
            'input_ids' : self.x['input_ids'][idx],
            'label' : self.y[idx],
            'token_type_ids': self.x['token_type_ids'][idx],
            'attention_mask': self.x['attention_mask'][idx]
        }
    
    def __len__(self):
        return self.y.size(0)

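#### Loading the dataset in parallel with DataLoader
+ DataLoader provides an iterable over the dataset and can load data in parallel. It fetches one example at a time from TNEWSDataset until it has a list, examples, of length batch_size. (The loaders in this notebook set num_workers=0, so loading runs in the main process.)
+ Format of examples: [dict1, dict2, ...]
+ collate_fn() merges the data in examples into Tensors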
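
As a worked illustration of what the collate step does to `examples`, the sketch below pads two toy examples to a common length using plain Python lists instead of Tensors. The helper `pad_batch` and the label values are hypothetical, and padding with 0 is assumed to match the zero-filled tensors used in this notebook.

```python
def pad_batch(examples, pad_id=0):
    """Pad every per-example list to the longest input_ids length in the batch."""
    max_length = max(len(ex["input_ids"]) for ex in examples)
    batch = {"input_ids": [], "token_type_ids": [], "attention_mask": [], "labels": []}
    for ex in examples:
        n_pad = max_length - len(ex["input_ids"])
        batch["input_ids"].append(ex["input_ids"] + [pad_id] * n_pad)
        batch["token_type_ids"].append(ex["token_type_ids"] + [0] * n_pad)
        # attention_mask is 1 for real tokens and 0 for padding
        batch["attention_mask"].append(ex["attention_mask"] + [0] * n_pad)
        batch["labels"].append(ex["label"])
    return batch

examples = [
    {"input_ids": [101, 776, 691, 102], "token_type_ids": [0, 0, 0, 0],
     "attention_mask": [1, 1, 1, 1], "label": 3},
    {"input_ids": [101, 677, 102], "token_type_ids": [0, 0, 0],
     "attention_mask": [1, 1, 1], "label": 7},
]
batch = pad_batch(examples)
```

The second example gains one trailing 0 in `input_ids` and in `attention_mask`, so BERT can ignore the padded position.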

In [8]:
def collate_fn(examples):
    input_ids_lst = []
    labels = []
    # ------ differs from TextCNN ------
    token_type_ids_lst = []
    attention_mask_lst = []
    # ------ differs from TextCNN ------

    for example in examples:
        input_ids_lst.append(example['input_ids'])
        labels.append(example['label'])
        # ------ differs from TextCNN ------
        token_type_ids_lst.append(example['token_type_ids'])
        attention_mask_lst.append(example['attention_mask'])
        # ------ differs from TextCNN ------
        
    # find the longest sequence in the batch; every sequence is padded to this length
    max_length = max(len(input_ids) for input_ids in input_ids_lst)
    # allocate zero-filled tensors (0 is the padding value)
    input_ids_tensor = torch.zeros((len(labels), max_length), dtype=torch.long)
    # ------ differs from TextCNN ------
    token_type_ids_tensor = torch.zeros_like(input_ids_tensor)
    attention_mask_tensor = torch.zeros_like(input_ids_tensor)
    # ------ differs from TextCNN ------
    
    for i, input_ids in enumerate(input_ids_lst):
        seq_len = len(input_ids)
        input_ids_tensor[i, :seq_len] = torch.tensor(input_ids, dtype=torch.long)
        # ------ differs from TextCNN ------
        token_type_ids_tensor[i, :seq_len] = torch.tensor(token_type_ids_lst[i], dtype=torch.long)
        attention_mask_tensor[i, :seq_len] = torch.tensor(attention_mask_lst[i], dtype=torch.long)
        # ------ differs from TextCNN ------
        
    return {
        'input_ids': input_ids_tensor,
        'labels': torch.tensor(labels, dtype=torch.long),
        # ------ differs from TextCNN ------
        'token_type_ids': token_type_ids_tensor,
        'attention_mask': attention_mask_tensor
        # ------ differs from TextCNN ------
    }

In [9]:
from torch.utils.data import DataLoader

def build_dataloader(config):
    X_train, y_train, X_val, y_val, label2id, id2label = read_data(config, mode='train')
    X_test, y_test = read_data(config, mode='test')
    
    train_dataset = TNEWSDataset(X_train, y_train)
    val_dataset = TNEWSDataset(X_val, y_val)
    test_dataset = TNEWSDataset(X_test, y_test)
    
    train_dataloader = DataLoader(dataset=train_dataset, batch_size=config['batch_size'], num_workers=0, shuffle=True, collate_fn=collate_fn)
    val_dataloader = DataLoader(dataset=val_dataset, batch_size=config['batch_size'], num_workers=0, shuffle=False, collate_fn=collate_fn)
    test_dataloader = DataLoader(dataset=test_dataset, batch_size=config['batch_size'], num_workers=0, shuffle=False, collate_fn=collate_fn)

    return train_dataloader, val_dataloader, test_dataloader, id2label

In [10]:
train_dataloader, val_dataloader, test_dataloader, id2label = build_dataloader(config)



In [11]:
for batch in train_dataloader:
    print(len(batch['input_ids']), len(batch['labels']), len(batch['token_type_ids']), len(batch['attention_mask']))
    print(batch)
    break

16 16 16 16
{'input_ids': tensor([[ 101,  776,  691, 2356,  966, 5892, 1355, 1283,  783, 8024, 2832, 6598,
         5442,  812,  711,  862, 2898, 5330,  976, 4958,  776,  691, 8043,  102,
            0,    0,    0,    0,    0,    0,    0,    0,    0],
        [ 101,  677, 5468, 8038, 7599, 3235, 1915, 7455, 3449,  771, 7676, 8024,
          678, 5468, 2582,  720, 2190, 8043,  102,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0],
        [ 101, 1266,  776, 1957, 2094, 1745, 7063, 8038, 4385, 2141, 5445, 3655,
         6999, 8024, 6821, 2218, 3221, 4495, 3833, 8013,  102,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0],
        [ 101,  100, 2207, 1285, 1159,  100, 3173, 3124, 5862, 1765, 2157, 7270,
         4193, 5991, 6421,  679, 6421, 5314, 2015, 2845,  702, 4408,  102,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0],
        [ 101, 2809, 3124,  671, 1453, 2399, 8024, 7716, 1046, 7987, 6

## Training and Validation

In [12]:
# BERT + head part2
from transformers import BertPreTrainedModel, BertModel

class BertForTNEWS(BertPreTrainedModel):
    # classifier -- head
    def __init__(self, config, model_path, classifier):
        super(BertForTNEWS, self).__init__(config)

        self.bert = BertModel.from_pretrained(model_path, config=config)
        self.classifier = classifier
    
    def forward(self, input_ids, token_type_ids, attention_mask, labels=None):

        outputs = self.bert(input_ids=input_ids,
                            attention_mask=attention_mask, 
                            token_type_ids=token_type_ids, 
                            output_hidden_states=True)
        
        hidden_states = outputs.hidden_states

        logits = self.classifier(hidden_states, input_ids)
        
        outputs =(logits, )
        # labels are available for the training and validation sets
        if labels is not None:
            loss_fct = nn.CrossEntropyLoss()
            loss = loss_fct(logits, labels.view(-1))
            outputs =(loss, ) + outputs
        
        return outputs

In [13]:
import torch.nn.functional as F
import torch.nn as nn

class ConvClassifier(nn.Module):
    '''
    CNN + global max pool
    '''
    def __init__(self, config):
        super().__init__()
        self.conv = nn.Conv1d(in_channels=config.hidden_size, out_channels=config.hidden_size, kernel_size=3)
        self.global_max_pool = nn.AdaptiveMaxPool1d(1)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.fc = nn.Linear(config.hidden_size, config.num_labels)
    
    def forward(self, hidden_states, input_ids):
        hidden_states = self.dropout(hidden_states[-1])  # use only the last layer's hidden states
        # hidden_states shape (bs, seq_len, hidden_size) -> (bs, hidden_size, seq_len) 
        hidden_states = hidden_states.permute(0, 2, 1)
        out = F.relu(self.conv(hidden_states))
        
        # out (bs, hidden_size_out, seq_len_out)
        # out (bs, hidden_size, 1)
        # out (bs, hidden_size)
        out = self.global_max_pool(out).squeeze(dim=2)
        out = self.fc(out)
        return out
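
The shape comments in `ConvClassifier.forward` can be checked with a small calculation. The sketch below is plain Python and does not run the model; `conv1d_out_len` applies the output-length formula documented for `torch.nn.Conv1d`, and `head_shapes` is a hypothetical helper. The values `seq_len=33` (the padded length of the batch printed earlier) and `num_labels=15` are assumptions for illustration.

```python
def conv1d_out_len(seq_len, kernel_size=3, stride=1, padding=0, dilation=1):
    """Output length of a 1-D convolution along the sequence axis."""
    return (seq_len + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

def head_shapes(bs, seq_len, hidden_size, num_labels, kernel_size=3):
    """Trace tensor shapes through permute -> conv -> global max pool -> fc."""
    return [
        ("last hidden state", (bs, seq_len, hidden_size)),
        ("after permute", (bs, hidden_size, seq_len)),
        ("after conv", (bs, hidden_size, conv1d_out_len(seq_len, kernel_size))),
        ("after global max pool + squeeze", (bs, hidden_size)),
        ("logits", (bs, num_labels)),
    ]

# hidden_size=768 is the bert-base hidden size; num_labels=15 is an assumed class count
shapes = head_shapes(bs=16, seq_len=33, hidden_size=768, num_labels=15)
for name, shape in shapes:
    print(f"{name}: {shape}")
```

With `kernel_size=3` and no padding the sequence axis shrinks by 2, so a batch whose padded length is below 3 would make the convolution fail. The head also never reads the attention mask, so padded positions take part in the convolution and the max pool.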

In [14]:
def build_model(model_path, config, head):
    heads = {
        'cnn':ConvClassifier
    }
    model = BertForTNEWS(config, model_path, heads[head](config))
    return model

In [15]:
from sklearn.metrics import f1_score, accuracy_score, precision_score, recall_score

def evaluation(config, model, val_dataloader):
    model.eval()
    preds = []
    labels = []
    val_loss = 0.
    val_iterator = tqdm(val_dataloader, desc='Evaluation...', total=len(val_dataloader))
    with torch.no_grad():
        for batch in val_iterator:
            labels.append(batch['labels'])
            batch = {item:value.to(config['device']) for item, value in batch.items()}
            
            # val output (loss, out)
            loss, logits = model(**batch)[:2]
            val_loss += loss.item()
            
            preds.append(logits.argmax(dim=-1).detach().cpu())
            
    avg_val_loss = val_loss/len(val_dataloader)
    labels = torch.cat(labels, dim=0).numpy()
    preds = torch.cat(preds, dim=0).numpy()
    
    precision = precision_score(labels, preds, average='macro')
    recall = recall_score(labels, preds, average='macro')
    f1 =f1_score(labels, preds, average='macro')
    accuracy = accuracy_score(labels, preds)
    
    return avg_val_loss, f1, precision, recall, accuracy
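
For readers unfamiliar with `average='macro'`, the following sketch computes macro-averaged precision, recall, and F1 by hand: one score per class, then an unweighted mean, so rare classes count as much as frequent ones. The function `macro_scores` and the toy labels are hypothetical; the sketch is not a replacement for the scikit-learn functions used in `evaluation`.

```python
def macro_scores(labels, preds):
    """Macro-averaged precision, recall and F1 over all classes that occur."""
    classes = sorted(set(labels) | set(preds))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for y, p in zip(labels, preds) if y == c and p == c)
        fp = sum(1 for y, p in zip(labels, preds) if y != c and p == c)
        fn = sum(1 for y, p in zip(labels, preds) if y == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

labels = [0, 0, 0, 0, 1, 1]
preds = [0, 0, 0, 1, 1, 0]
precision, recall, f1 = macro_scores(labels, preds)
print(precision, recall, f1)  # class 0 scores 0.75, class 1 scores 0.5, mean 0.625
```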

In [16]:
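# BERT model + head training
from transformers import BertConfig, BertForSequenceClassification
# transformers.AdamW is deprecated in favor of the PyTorch implementation
from torch.optim import AdamW
from tqdm import trange

def train(config, train_dataloader, val_dataloader, model):

    # weight_decay=0.0 keeps the default of the transformers optimizer used originally
    optimizer = AdamW(model.parameters(), lr=config['learning_rate'], weight_decay=0.0)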
    
    model.to(config['device'])
    
    epoches_iterator = trange(config['num_epochs'])
    global_steps = 0
    train_loss = 0.
    logging_loss = 0.

    best_f1 = 0.
    best_precision = 0.
    best_recall = 0.
    best_accuracy = 0.
    
    for epoch in epoches_iterator:
        train_iterator = tqdm(train_dataloader, desc='Training', total=len(train_dataloader))
        model.train()
        for batch in train_iterator:
            batch = {item:value.to(config['device']) for item, value in batch.items()}
            
            # train output (loss, out)
            loss = model(**batch)[0]
            
            model.zero_grad()  # clear the parameter gradients
            loss.backward()  # backpropagate
            optimizer.step()  # update the parameters
            train_loss += loss.item()  # accumulate the loss
            global_steps += 1
            
            if global_steps % config['logging_step'] == 0:
                print_train_loss = (train_loss - logging_loss) / config['logging_step']
                logging_loss = train_loss
                avg_val_loss, f1, precision, recall, accuracy = evaluation(config, model, val_dataloader)
                
                if best_f1 < f1:
                    best_f1 = f1
                    best_precision = precision
                    best_recall = recall
                    best_accuracy = accuracy
                    print_log = f'''>>> training loss: {print_train_loss: .4f}, valid loss: {avg_val_loss: .4f}\n
                            valid f1 score: {f1: .4f}, valid precision score: {precision: .4f},
                            valid recall score: {recall: .4f}, valid accuracy score: {accuracy: .4f}'''
                    print(print_log)
                    
                model.train()
                
    return model, best_f1, best_precision, best_recall, best_accuracy

In [17]:
# First run: start from the pre-trained bert-base-chinese weights
# bert_config = BertConfig.from_pretrained(config['model_path'])
# bert_config.num_labels = len(id2label)
# model = build_model(config['model_path'], bert_config, config['head'])
# best_model, f1, precision, recall, accuracy = train(config, train_dataloader, val_dataloader, model)
# best_model.save_pretrained('../../../pt_tmp/bert_head_base_chinese')
# print_log = f'''valid f1 score: {f1: .4f}, valid precision score: {precision: .4f},
#                 valid recall score: {recall: .4f}, valid accuracy score: {accuracy: .4f}'''
# print(print_log)

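# Continued training: resume from the checkpoint saved by an earlier run.
# Note: BertModel.from_pretrained restores only the encoder weights; the saved
# classifier.* weights are skipped (see the warning below), so the head starts
# from a fresh initialization on every run.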
bert_config = BertConfig.from_pretrained('../../../pt_tmp/bert_head_base_chinese')
bert_config.num_labels = len(id2label)
model = build_model('../../../pt_tmp/bert_head_base_chinese', bert_config, config['head'])
best_model, f1, precision, recall, accuracy = train(config, train_dataloader, val_dataloader, model)
best_model.save_pretrained('../../../pt_tmp/bert_head_base_chinese')
print_log = f'''valid f1 score: {f1: .4f}, valid precision score: {precision: .4f},
                valid recall score: {recall: .4f}, valid accuracy score: {accuracy: .4f}'''
print(print_log)

Some weights of the model checkpoint at ../../../pt_tmp/bert_head_base_chinese were not used when initializing BertModel: ['classifier.conv.weight', 'classifier.conv.bias', 'classifier.fc.weight', 'classifier.fc.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

>>> training loss:  1.0563, valid loss:  1.2830

                            valid f1 score:  0.5403, valid precision score:  0.5573,
                            valid recall score:  0.5413, valid accuracy score:  0.5568



>>> training loss: 0.9695, valid loss: 1.2600
    valid f1 score: 0.5410, valid precision score: 0.5599,
    valid recall score: 0.5380, valid accuracy score: 0.5534



Training:  20%|███████▊                               | 601/3002 [51:41<50:27:31, 75.66s/it]



>>> training loss: 0.9968, valid loss: 1.2538
    valid f1 score: 0.5508, valid precision score: 0.5757,
    valid recall score: 0.5483, valid accuracy score: 0.5562



Training:  30%|██████████▌                        | 901/3002 [1:53:38<313:12:13, 536.67s/it]



>>> training loss: 1.0072, valid loss: 1.2482
    valid f1 score: 0.5536, valid precision score: 0.5815,
    valid recall score: 0.5514, valid accuracy score: 0.5654



Training:  50%|██████████████████                  | 1501/3002 [3:24:00<30:53:52, 74.11s/it]
Training:  64%|███████████████████████▋             | 1918/3002 [4:00:24<1:07:24,  3.73s/it][A
Training:  64%|███████████████████████▋             | 1919/3002 [4:00:28<1:08:34,  3.80s/it][A
Training:  64%|███████████████████████▋             | 1920/3002 [4:00:32<1:11:06,  3.94s/it][A
Training:  64%|███████████████████████▋             | 1921/3002 [4:00:36<1:11:49,  3.99s/it][A
Training:  64%|███████████████████████▋             | 1922/3002 [4:00:40<1:11:10,  3.95s/it][A
Training:  64%|███████████████████████▋             | 1923/3002 [4:00:45<1:15:35,  4.20s/it][A
Training:  64%|███████████████████████▋             | 1924/3002 [4:00:49<1:15:58,  4.23s/it][A
Training:  64%|███████████████████████▋ 

Training:  67%|████████████████████████▋            | 2000/3002 [4:05:54<1:08:00,  4.07s/it][A
Training:  67%|████████████████████████▋            | 2001/3002 [4:05:58<1:06:01,  3.96s/it][A
Training:  67%|████████████████████████▋            | 2002/3002 [4:06:02<1:04:32,  3.87s/it][A
Training:  67%|████████████████████████▋            | 2003/3002 [4:06:06<1:06:14,  3.98s/it][A
Training:  67%|████████████████████████▋            | 2004/3002 [4:06:11<1:09:43,  4.19s/it][A
Training:  67%|████████████████████████▋            | 2005/3002 [4:06:14<1:07:03,  4.04s/it][A
Training:  67%|████████████████████████▋            | 2006/3002 [4:06:18<1:05:44,  3.96s/it][A
Training:  67%|████████████████████████▋            | 2007/3002 [4:06:23<1:08:04,  4.11s/it][A
Training:  67%|████████████████████████▋            | 2008/3002 [4:06:26<1:05:43,  3.97s/it][A
Training:  67%|████████████████████████▊            | 2009/3002 [4:06:30<1:04:37,  3.91s/it][A
Training:  67%|████████████████████████▊

Training:  69%|█████████████████████████▋           | 2085/3002 [4:11:37<1:02:52,  4.11s/it][A
Training:  69%|█████████████████████████▋           | 2086/3002 [4:11:43<1:08:24,  4.48s/it][A
Training:  70%|█████████████████████████▋           | 2087/3002 [4:11:48<1:10:46,  4.64s/it][A
Training:  70%|█████████████████████████▋           | 2088/3002 [4:11:52<1:08:13,  4.48s/it][A
Training:  70%|█████████████████████████▋           | 2089/3002 [4:11:56<1:05:40,  4.32s/it][A
Training:  70%|█████████████████████████▊           | 2090/3002 [4:12:00<1:04:36,  4.25s/it][A
Training:  70%|█████████████████████████▊           | 2091/3002 [4:12:04<1:02:47,  4.14s/it][A
Training:  70%|█████████████████████████▊           | 2092/3002 [4:12:08<1:04:04,  4.22s/it][A
Training:  70%|█████████████████████████▊           | 2093/3002 [4:12:12<1:02:42,  4.14s/it][A
Training:  70%|█████████████████████████▊           | 2094/3002 [4:12:16<1:01:42,  4.08s/it][A
Training:  70%|█████████████████████████

Evaluation...:  20%|███████▊                               | 67/334 [01:15<05:03,  1.14s/it][A[A

Evaluation...:  20%|███████▉                               | 68/334 [01:16<05:04,  1.15s/it][A[A

Evaluation...:  21%|████████                               | 69/334 [01:17<04:51,  1.10s/it][A[A

Evaluation...:  21%|████████▏                              | 70/334 [01:18<04:48,  1.09s/it][A[A

Evaluation...:  21%|████████▎                              | 71/334 [01:19<04:41,  1.07s/it][A[A

Evaluation...:  22%|████████▍                              | 72/334 [01:20<04:35,  1.05s/it][A[A

Evaluation...:  22%|████████▌                              | 73/334 [01:21<04:31,  1.04s/it][A[A

Evaluation...:  22%|████████▋                              | 74/334 [01:22<04:46,  1.10s/it][A[A

Evaluation...:  22%|████████▊                              | 75/334 [01:23<04:38,  1.08s/it][A[A

Evaluation...:  23%|████████▊                              | 76/334 [01:24<04:32,  1.06s/it][A[A



Evaluation...:  44%|████████████████▊                     | 148/334 [02:43<03:26,  1.11s/it][A[A

Evaluation...:  45%|████████████████▉                     | 149/334 [02:44<03:18,  1.07s/it][A[A

Evaluation...:  45%|█████████████████                     | 150/334 [02:45<03:15,  1.06s/it][A[A

Evaluation...:  45%|█████████████████▏                    | 151/334 [02:46<03:25,  1.12s/it][A[A

Evaluation...:  46%|█████████████████▎                    | 152/334 [02:47<03:18,  1.09s/it][A[A

Evaluation...:  46%|█████████████████▍                    | 153/334 [02:48<03:12,  1.07s/it][A[A

Evaluation...:  46%|█████████████████▌                    | 154/334 [02:49<03:11,  1.06s/it][A[A

Evaluation...:  46%|█████████████████▋                    | 155/334 [02:50<03:07,  1.05s/it][A[A

Evaluation...:  47%|█████████████████▋                    | 156/334 [02:51<03:14,  1.09s/it][A[A

Evaluation...:  47%|█████████████████▊                    | 157/334 [02:52<03:09,  1.07s/it][A[A



Evaluation...:  69%|██████████████████████████            | 229/334 [04:12<01:57,  1.12s/it][A[A

Evaluation...:  69%|██████████████████████████▏           | 230/334 [04:13<01:58,  1.14s/it][A[A

Evaluation...:  69%|██████████████████████████▎           | 231/334 [04:14<01:53,  1.10s/it][A[A

Evaluation...:  69%|██████████████████████████▍           | 232/334 [04:15<01:50,  1.08s/it][A[A

Evaluation...:  70%|██████████████████████████▌           | 233/334 [04:16<01:48,  1.08s/it][A[A

Evaluation...:  70%|██████████████████████████▌           | 234/334 [04:18<01:57,  1.18s/it][A[A

Evaluation...:  70%|██████████████████████████▋           | 235/334 [04:19<01:58,  1.20s/it][A[A

Evaluation...:  71%|██████████████████████████▊           | 236/334 [04:20<01:54,  1.17s/it][A[A

Evaluation...:  71%|██████████████████████████▉           | 237/334 [04:21<01:49,  1.13s/it][A[A

Evaluation...:  71%|███████████████████████████           | 238/334 [04:22<01:45,  1.10s/it][A[A



Evaluation...:  93%|███████████████████████████████████▎  | 310/334 [05:44<00:25,  1.08s/it][A[A

Evaluation...:  93%|███████████████████████████████████▍  | 311/334 [05:45<00:24,  1.06s/it][A[A

Evaluation...:  93%|███████████████████████████████████▍  | 312/334 [05:46<00:22,  1.04s/it][A[A

Evaluation...:  94%|███████████████████████████████████▌  | 313/334 [05:47<00:21,  1.03s/it][A[A

Evaluation...:  94%|███████████████████████████████████▋  | 314/334 [05:48<00:20,  1.04s/it][A[A

Evaluation...:  94%|███████████████████████████████████▊  | 315/334 [05:49<00:19,  1.03s/it][A[A

Evaluation...:  95%|███████████████████████████████████▉  | 316/334 [05:50<00:19,  1.06s/it][A[A

Evaluation...:  95%|████████████████████████████████████  | 317/334 [05:51<00:18,  1.12s/it][A[A

Evaluation...:  95%|████████████████████████████████████▏ | 318/334 [05:52<00:17,  1.08s/it][A[A

Evaluation...:  96%|████████████████████████████████████▎ | 319/334 [05:53<00:16,  1.10s/it][A[A



Training:  72%|████████████████████████████           | 2159/3002 [4:23:04<57:53,  4.12s/it][A
Training:  72%|████████████████████████████           | 2160/3002 [4:23:07<56:41,  4.04s/it][A
Training:  72%|████████████████████████████           | 2161/3002 [4:23:12<58:14,  4.16s/it][A
Training:  72%|██████████████████████████▋          | 2162/3002 [4:23:17<1:02:15,  4.45s/it][A
Training:  72%|██████████████████████████▋          | 2163/3002 [4:23:21<1:00:02,  4.29s/it][A
Training:  72%|████████████████████████████           | 2164/3002 [4:23:25<58:48,  4.21s/it][A
Training:  72%|██████████████████████████▋          | 2165/3002 [4:23:30<1:00:26,  4.33s/it][A
Training:  72%|████████████████████████████▏          | 2166/3002 [4:23:33<58:42,  4.21s/it][A
Training:  72%|████████████████████████████▏          | 2167/3002 [4:23:37<57:33,  4.14s/it][A
Training:  72%|████████████████████████████▏          | 2168/3002 [4:23:42<57:46,  4.16s/it][A
Training:  72%|█████████████████████████

Training:  75%|█████████████████████████████▏         | 2244/3002 [4:29:02<52:14,  4.14s/it][A
Training:  75%|█████████████████████████████▏         | 2245/3002 [4:29:06<51:16,  4.06s/it][A
Training:  75%|█████████████████████████████▏         | 2246/3002 [4:29:10<50:39,  4.02s/it][A
Training:  75%|█████████████████████████████▏         | 2247/3002 [4:29:14<52:22,  4.16s/it][A
Training:  75%|█████████████████████████████▏         | 2248/3002 [4:29:19<52:24,  4.17s/it][A
Training:  75%|█████████████████████████████▏         | 2249/3002 [4:29:23<51:19,  4.09s/it][A
Training:  75%|█████████████████████████████▏         | 2250/3002 [4:29:27<50:36,  4.04s/it][A
Training:  75%|█████████████████████████████▏         | 2251/3002 [4:29:30<50:01,  4.00s/it][A
Training:  75%|█████████████████████████████▎         | 2252/3002 [4:29:34<49:38,  3.97s/it][A
Training:  75%|█████████████████████████████▎         | 2253/3002 [4:29:38<47:51,  3.83s/it][A
Training:  75%|█████████████████████████

Training:  78%|██████████████████████████████▎        | 2329/3002 [4:34:56<51:34,  4.60s/it][A
Training:  78%|██████████████████████████████▎        | 2330/3002 [4:35:00<49:20,  4.41s/it][A
Training:  78%|██████████████████████████████▎        | 2331/3002 [4:35:05<48:32,  4.34s/it][A
Training:  78%|██████████████████████████████▎        | 2332/3002 [4:35:08<47:14,  4.23s/it][A
Training:  78%|██████████████████████████████▎        | 2333/3002 [4:35:12<45:50,  4.11s/it][A
Training:  78%|██████████████████████████████▎        | 2334/3002 [4:35:16<45:20,  4.07s/it][A
Training:  78%|██████████████████████████████▎        | 2335/3002 [4:35:20<44:10,  3.97s/it][A
Training:  78%|██████████████████████████████▎        | 2336/3002 [4:35:24<43:59,  3.96s/it][A
Training:  78%|██████████████████████████████▎        | 2337/3002 [4:35:28<44:42,  4.03s/it][A
Training:  78%|██████████████████████████████▎        | 2338/3002 [4:35:32<44:21,  4.01s/it][A
Training:  78%|█████████████████████████

Evaluation...:   4%|█▌                                     | 13/334 [00:14<05:50,  1.09s/it][A[A

Evaluation...:   4%|█▋                                     | 14/334 [00:15<05:39,  1.06s/it][A[A

Evaluation...:   4%|█▊                                     | 15/334 [00:16<06:04,  1.14s/it][A[A

Evaluation...:   5%|█▊                                     | 16/334 [00:17<05:51,  1.11s/it][A[A

Evaluation...:   5%|█▉                                     | 17/334 [00:18<06:00,  1.14s/it][A[A

Evaluation...:   5%|██                                     | 18/334 [00:19<05:54,  1.12s/it][A[A

Evaluation...:   6%|██▏                                    | 19/334 [00:21<06:09,  1.17s/it][A[A

Evaluation...:   6%|██▎                                    | 20/334 [00:22<05:53,  1.13s/it][A[A

Evaluation...:   6%|██▍                                    | 21/334 [00:23<05:41,  1.09s/it][A[A

Evaluation...:   7%|██▌                                    | 22/334 [00:24<05:33,  1.07s/it][A[A



Evaluation...:  28%|██████████▉                            | 94/334 [01:43<04:11,  1.05s/it][A[A

Evaluation...:  28%|███████████                            | 95/334 [01:44<04:07,  1.04s/it][A[A

Evaluation...:  29%|███████████▏                           | 96/334 [01:45<04:05,  1.03s/it][A[A

Evaluation...:  29%|███████████▎                           | 97/334 [01:46<04:01,  1.02s/it][A[A

Evaluation...:  29%|███████████▍                           | 98/334 [01:47<04:01,  1.02s/it][A[A

Evaluation...:  30%|███████████▌                           | 99/334 [01:48<04:18,  1.10s/it][A[A

Evaluation...:  30%|███████████▍                          | 100/334 [01:49<04:09,  1.06s/it][A[A

Evaluation...:  30%|███████████▍                          | 101/334 [01:50<04:04,  1.05s/it][A[A

Evaluation...:  31%|███████████▌                          | 102/334 [01:52<04:32,  1.18s/it][A[A

Evaluation...:  31%|███████████▋                          | 103/334 [01:53<04:25,  1.15s/it][A[A



Evaluation...:  52%|███████████████████▉                  | 175/334 [03:13<02:48,  1.06s/it][A[A

Evaluation...:  53%|████████████████████                  | 176/334 [03:14<02:58,  1.13s/it][A[A

Evaluation...:  53%|████████████████████▏                 | 177/334 [03:15<02:52,  1.10s/it][A[A

Evaluation...:  53%|████████████████████▎                 | 178/334 [03:16<02:46,  1.06s/it][A[A

Evaluation...:  54%|████████████████████▎                 | 179/334 [03:17<02:53,  1.12s/it][A[A

Evaluation...:  54%|████████████████████▍                 | 180/334 [03:19<03:02,  1.19s/it][A[A

Evaluation...:  54%|████████████████████▌                 | 181/334 [03:20<02:53,  1.14s/it][A[A

Evaluation...:  54%|████████████████████▋                 | 182/334 [03:21<02:52,  1.13s/it][A[A

Evaluation...:  55%|████████████████████▊                 | 183/334 [03:22<02:45,  1.10s/it][A[A

Evaluation...:  55%|████████████████████▉                 | 184/334 [03:23<02:44,  1.10s/it][A[A



Evaluation...:  77%|█████████████████████████████▏        | 256/334 [04:43<01:29,  1.15s/it][A[A

Evaluation...:  77%|█████████████████████████████▏        | 257/334 [04:44<01:28,  1.14s/it][A[A

Evaluation...:  77%|█████████████████████████████▎        | 258/334 [04:45<01:26,  1.13s/it][A[A

Evaluation...:  78%|█████████████████████████████▍        | 259/334 [04:47<01:32,  1.23s/it][A[A

Evaluation...:  78%|█████████████████████████████▌        | 260/334 [04:48<01:29,  1.21s/it][A[A

Evaluation...:  78%|█████████████████████████████▋        | 261/334 [04:49<01:23,  1.14s/it][A[A

Evaluation...:  78%|█████████████████████████████▊        | 262/334 [04:50<01:23,  1.16s/it][A[A

Evaluation...:  79%|█████████████████████████████▉        | 263/334 [04:51<01:27,  1.23s/it][A[A

Evaluation...:  79%|██████████████████████████████        | 264/334 [04:53<01:28,  1.26s/it][A[A

Evaluation...:  79%|██████████████████████████████▏       | 265/334 [04:54<01:22,  1.19s/it][A[A



Training:  80%|█████████████████████████████▌       | 2403/3002 [4:46:19<7:02:32, 42.32s/it][A
Training:  80%|█████████████████████████████▋       | 2404/3002 [4:46:23<5:09:08, 31.02s/it][A
Training:  80%|█████████████████████████████▋       | 2405/3002 [4:46:27<3:48:06, 22.93s/it][A
Training:  80%|█████████████████████████████▋       | 2406/3002 [4:46:32<2:53:12, 17.44s/it][A
Training:  80%|█████████████████████████████▋       | 2407/3002 [4:46:36<2:13:55, 13.50s/it][A
Training:  80%|█████████████████████████████▋       | 2408/3002 [4:46:40<1:45:24, 10.65s/it][A
Training:  80%|█████████████████████████████▋       | 2409/3002 [4:46:44<1:25:41,  8.67s/it][A
Training:  80%|█████████████████████████████▋       | 2410/3002 [4:46:50<1:15:46,  7.68s/it][A
Training:  80%|█████████████████████████████▋       | 2411/3002 [4:46:54<1:04:48,  6.58s/it][A
Training:  80%|███████████████████████████████▎       | 2412/3002 [4:46:58<57:12,  5.82s/it][A
Training:  80%|█████████████████████████

Training:  83%|████████████████████████████████▎      | 2488/3002 [4:52:15<34:58,  4.08s/it][A
Training:  83%|████████████████████████████████▎      | 2489/3002 [4:52:19<34:36,  4.05s/it][A
Training:  83%|████████████████████████████████▎      | 2490/3002 [4:52:23<34:23,  4.03s/it][A
Training:  83%|████████████████████████████████▎      | 2491/3002 [4:52:28<36:57,  4.34s/it][A
Training:  83%|████████████████████████████████▎      | 2492/3002 [4:52:32<36:01,  4.24s/it][A
Training:  83%|████████████████████████████████▍      | 2493/3002 [4:52:36<35:24,  4.17s/it][A
Training:  83%|████████████████████████████████▍      | 2494/3002 [4:52:41<35:21,  4.18s/it][A
Training:  83%|████████████████████████████████▍      | 2495/3002 [4:52:44<33:18,  3.94s/it][A
Training:  83%|████████████████████████████████▍      | 2496/3002 [4:52:48<34:02,  4.04s/it][A
Training:  83%|████████████████████████████████▍      | 2497/3002 [4:52:52<33:22,  3.97s/it][A
Training:  83%|█████████████████████████

Training:  86%|█████████████████████████████████▍     | 2573/3002 [4:58:18<30:19,  4.24s/it][A
Training:  86%|█████████████████████████████████▍     | 2574/3002 [4:58:22<29:43,  4.17s/it][A
Training:  86%|█████████████████████████████████▍     | 2575/3002 [4:58:26<29:18,  4.12s/it][A
Training:  86%|█████████████████████████████████▍     | 2576/3002 [4:58:30<28:56,  4.08s/it][A
Training:  86%|█████████████████████████████████▍     | 2577/3002 [4:58:34<29:46,  4.20s/it][A
Training:  86%|█████████████████████████████████▍     | 2578/3002 [4:58:39<29:59,  4.24s/it][A
Training:  86%|█████████████████████████████████▌     | 2579/3002 [4:58:43<29:21,  4.17s/it][A
Training:  86%|█████████████████████████████████▌     | 2580/3002 [4:58:47<28:59,  4.12s/it][A
Training:  86%|█████████████████████████████████▌     | 2581/3002 [4:58:51<29:52,  4.26s/it][A
Training:  86%|█████████████████████████████████▌     | 2582/3002 [4:58:55<29:01,  4.15s/it][A
Training:  86%|█████████████████████████

Training:  89%|██████████████████████████████████▌    | 2658/3002 [5:04:07<21:35,  3.77s/it][A
Training:  89%|██████████████████████████████████▌    | 2659/3002 [5:04:11<21:47,  3.81s/it][A
Training:  89%|██████████████████████████████████▌    | 2660/3002 [5:04:15<21:42,  3.81s/it][A
Training:  89%|██████████████████████████████████▌    | 2661/3002 [5:04:19<21:50,  3.84s/it][A
Training:  89%|██████████████████████████████████▌    | 2662/3002 [5:04:23<22:25,  3.96s/it][A
Training:  89%|██████████████████████████████████▌    | 2663/3002 [5:04:26<21:45,  3.85s/it][A
Training:  89%|██████████████████████████████████▌    | 2664/3002 [5:04:30<21:06,  3.75s/it][A
Training:  89%|██████████████████████████████████▌    | 2665/3002 [5:04:34<22:03,  3.93s/it][A
Training:  89%|██████████████████████████████████▋    | 2666/3002 [5:04:38<22:10,  3.96s/it][A
Training:  89%|██████████████████████████████████▋    | 2667/3002 [5:04:42<22:03,  3.95s/it][A
Training:  89%|█████████████████████████

Evaluation...:  12%|████▊                                  | 41/334 [00:43<05:15,  1.08s/it][A[A

Evaluation...:  13%|████▉                                  | 42/334 [00:44<05:17,  1.09s/it][A[A

Evaluation...:  13%|█████                                  | 43/334 [00:45<05:05,  1.05s/it][A[A

Evaluation...:  13%|█████▏                                 | 44/334 [00:46<05:20,  1.10s/it][A[A

Evaluation...:  13%|█████▎                                 | 45/334 [00:47<05:12,  1.08s/it][A[A

Evaluation...:  14%|█████▎                                 | 46/334 [00:48<05:01,  1.05s/it][A[A

Evaluation...:  14%|█████▍                                 | 47/334 [00:49<05:00,  1.05s/it][A[A

Evaluation...:  14%|█████▌                                 | 48/334 [00:50<05:38,  1.18s/it][A[A

Evaluation...:  15%|█████▋                                 | 49/334 [00:51<05:31,  1.16s/it][A[A

Evaluation...:  15%|█████▊                                 | 50/334 [00:52<05:08,  1.09s/it][A[A



Evaluation...:  37%|█████████████▉                        | 122/334 [02:07<03:59,  1.13s/it][A[A

Evaluation...:  37%|█████████████▉                        | 123/334 [02:08<03:47,  1.08s/it][A[A

Evaluation...:  37%|██████████████                        | 124/334 [02:09<03:48,  1.09s/it][A[A

Evaluation...:  37%|██████████████▏                       | 125/334 [02:10<03:50,  1.10s/it][A[A

Evaluation...:  38%|██████████████▎                       | 126/334 [02:11<03:42,  1.07s/it][A[A

Evaluation...:  38%|██████████████▍                       | 127/334 [02:12<03:34,  1.04s/it][A[A

Evaluation...:  38%|██████████████▌                       | 128/334 [02:13<03:31,  1.02s/it][A[A

Evaluation...:  39%|██████████████▋                       | 129/334 [02:14<03:28,  1.02s/it][A[A

Evaluation...:  39%|██████████████▊                       | 130/334 [02:15<03:23,  1.00it/s][A[A

Evaluation...:  39%|██████████████▉                       | 131/334 [02:16<03:30,  1.03s/it][A[A



Evaluation...:  61%|███████████████████████               | 203/334 [03:33<02:27,  1.13s/it][A[A

Evaluation...:  61%|███████████████████████▏              | 204/334 [03:34<02:22,  1.09s/it][A[A

Evaluation...:  61%|███████████████████████▎              | 205/334 [03:35<02:24,  1.12s/it][A[A

Evaluation...:  62%|███████████████████████▍              | 206/334 [03:36<02:16,  1.07s/it][A[A

Evaluation...:  62%|███████████████████████▌              | 207/334 [03:37<02:07,  1.01s/it][A[A

Evaluation...:  62%|███████████████████████▋              | 208/334 [03:38<02:09,  1.03s/it][A[A

Evaluation...:  63%|███████████████████████▊              | 209/334 [03:39<02:06,  1.01s/it][A[A

Evaluation...:  63%|███████████████████████▉              | 210/334 [03:40<02:14,  1.08s/it][A[A

Evaluation...:  63%|████████████████████████              | 211/334 [03:42<02:18,  1.13s/it][A[A

Evaluation...:  63%|████████████████████████              | 212/334 [03:43<02:10,  1.07s/it][A[A



Evaluation...:  85%|████████████████████████████████▎     | 284/334 [05:00<00:52,  1.05s/it][A[A

Evaluation...:  85%|████████████████████████████████▍     | 285/334 [05:01<00:50,  1.04s/it][A[A

Evaluation...:  86%|████████████████████████████████▌     | 286/334 [05:02<00:48,  1.02s/it][A[A

Evaluation...:  86%|████████████████████████████████▋     | 287/334 [05:04<00:48,  1.04s/it][A[A

Evaluation...:  86%|████████████████████████████████▊     | 288/334 [05:05<00:52,  1.13s/it][A[A

Evaluation...:  87%|████████████████████████████████▉     | 289/334 [05:06<00:48,  1.07s/it][A[A

Evaluation...:  87%|████████████████████████████████▉     | 290/334 [05:07<00:47,  1.08s/it][A[A

Evaluation...:  87%|█████████████████████████████████     | 291/334 [05:08<00:46,  1.07s/it][A[A

Evaluation...:  87%|█████████████████████████████████▏    | 292/334 [05:09<00:43,  1.04s/it][A[A

Evaluation...:  88%|█████████████████████████████████▎    | 293/334 [05:10<00:41,  1.02s/it][A[A



Training:  91%|███████████████████████████████████▍   | 2728/3002 [5:14:44<18:21,  4.02s/it][A
Training:  91%|███████████████████████████████████▍   | 2729/3002 [5:14:48<17:57,  3.95s/it][A
Training:  91%|███████████████████████████████████▍   | 2730/3002 [5:14:51<17:42,  3.90s/it][A
Training:  91%|███████████████████████████████████▍   | 2731/3002 [5:14:55<17:29,  3.87s/it][A
Training:  91%|███████████████████████████████████▍   | 2732/3002 [5:14:59<17:09,  3.81s/it][A
Training:  91%|███████████████████████████████████▌   | 2733/3002 [5:15:03<16:59,  3.79s/it][A
Training:  91%|███████████████████████████████████▌   | 2734/3002 [5:15:06<16:47,  3.76s/it][A
Training:  91%|███████████████████████████████████▌   | 2735/3002 [5:15:10<16:58,  3.82s/it][A
Training:  91%|███████████████████████████████████▌   | 2736/3002 [5:15:15<18:36,  4.20s/it][A
Training:  91%|███████████████████████████████████▌   | 2737/3002 [5:15:19<18:05,  4.10s/it][A
Training:  91%|█████████████████████████

Training:  94%|████████████████████████████████████▌  | 2813/3002 [5:20:24<12:36,  4.00s/it][A
Training:  94%|████████████████████████████████████▌  | 2814/3002 [5:20:28<12:31,  3.99s/it][A
Training:  94%|████████████████████████████████████▌  | 2815/3002 [5:20:32<12:41,  4.07s/it][A
Training:  94%|████████████████████████████████████▌  | 2816/3002 [5:20:36<12:29,  4.03s/it][A
Training:  94%|████████████████████████████████████▌  | 2817/3002 [5:20:40<12:07,  3.93s/it][A
Training:  94%|████████████████████████████████████▌  | 2818/3002 [5:20:43<11:55,  3.89s/it][A
Training:  94%|████████████████████████████████████▌  | 2819/3002 [5:20:47<11:46,  3.86s/it][A
Training:  94%|████████████████████████████████████▋  | 2820/3002 [5:20:51<11:50,  3.90s/it][A
Training:  94%|████████████████████████████████████▋  | 2821/3002 [5:20:55<11:40,  3.87s/it][A
Training:  94%|████████████████████████████████████▋  | 2822/3002 [5:20:59<11:29,  3.83s/it][A
Training:  94%|█████████████████████████

Training:  97%|█████████████████████████████████████▋ | 2898/3002 [5:26:07<07:03,  4.07s/it][A
Training:  97%|█████████████████████████████████████▋ | 2899/3002 [5:26:10<06:49,  3.97s/it][A
Training:  97%|█████████████████████████████████████▋ | 2900/3002 [5:26:14<06:37,  3.90s/it][A
Training:  97%|█████████████████████████████████████▋ | 2901/3002 [5:26:18<06:29,  3.85s/it][A
Training:  97%|█████████████████████████████████████▋ | 2902/3002 [5:26:21<06:20,  3.80s/it][A
Training:  97%|█████████████████████████████████████▋ | 2903/3002 [5:26:26<06:38,  4.02s/it][A
Training:  97%|█████████████████████████████████████▋ | 2904/3002 [5:26:30<06:35,  4.04s/it][A
Training:  97%|█████████████████████████████████████▋ | 2905/3002 [5:26:34<06:21,  3.93s/it][A
Training:  97%|█████████████████████████████████████▊ | 2906/3002 [5:26:38<06:11,  3.87s/it][A
Training:  97%|█████████████████████████████████████▊ | 2907/3002 [5:26:41<06:04,  3.84s/it][A
Training:  97%|█████████████████████████

Training:  99%|██████████████████████████████████████▊| 2983/3002 [5:31:36<01:15,  3.96s/it][A
Training:  99%|██████████████████████████████████████▊| 2984/3002 [5:31:40<01:12,  4.00s/it][A
Training:  99%|██████████████████████████████████████▊| 2985/3002 [5:31:44<01:06,  3.92s/it][A
Training:  99%|██████████████████████████████████████▊| 2986/3002 [5:31:48<01:01,  3.87s/it][A
Training: 100%|██████████████████████████████████████▊| 2987/3002 [5:31:52<00:58,  3.93s/it][A
Training: 100%|██████████████████████████████████████▊| 2988/3002 [5:31:56<00:55,  3.97s/it][A
Training: 100%|██████████████████████████████████████▊| 2989/3002 [5:32:01<00:54,  4.18s/it][A
Training: 100%|██████████████████████████████████████▊| 2990/3002 [5:32:04<00:47,  3.99s/it][A
Training: 100%|██████████████████████████████████████▊| 2991/3002 [5:32:08<00:43,  3.98s/it][A
Training: 100%|██████████████████████████████████████▊| 2992/3002 [5:32:12<00:40,  4.06s/it][A
Training: 100%|█████████████████████████

Evaluation...:  19%|███████▌                               | 65/334 [01:06<04:34,  1.02s/it][A[A

Evaluation...:  20%|███████▋                               | 66/334 [01:07<04:25,  1.01it/s][A[A

Evaluation...:  20%|███████▊                               | 67/334 [01:08<04:37,  1.04s/it][A[A

Evaluation...:  20%|███████▉                               | 68/334 [01:09<04:38,  1.05s/it][A[A

Evaluation...:  21%|████████                               | 69/334 [01:10<04:24,  1.00it/s][A[A

Evaluation...:  21%|████████▏                              | 70/334 [01:11<04:21,  1.01it/s][A[A

Evaluation...:  21%|████████▎                              | 71/334 [01:12<04:16,  1.03it/s][A[A

Evaluation...:  22%|████████▍                              | 72/334 [01:13<04:11,  1.04it/s][A[A

Evaluation...:  22%|████████▌                              | 73/334 [01:14<04:08,  1.05it/s][A[A

Evaluation...:  22%|████████▋                              | 74/334 [01:15<04:21,  1.01s/it][A[A



Evaluation...:  44%|████████████████▌                     | 146/334 [02:26<03:08,  1.00s/it][A[A

Evaluation...:  44%|████████████████▋                     | 147/334 [02:28<03:15,  1.04s/it][A[A

Evaluation...:  44%|████████████████▊                     | 148/334 [02:29<03:07,  1.01s/it][A[A

Evaluation...:  45%|████████████████▉                     | 149/334 [02:29<03:00,  1.03it/s][A[A

Evaluation...:  45%|█████████████████                     | 150/334 [02:30<02:57,  1.03it/s][A[A

Evaluation...:  45%|█████████████████▏                    | 151/334 [02:32<03:08,  1.03s/it][A[A

Evaluation...:  46%|█████████████████▎                    | 152/334 [02:33<03:01,  1.00it/s][A[A

Evaluation...:  46%|█████████████████▍                    | 153/334 [02:33<02:56,  1.03it/s][A[A

Evaluation...:  46%|█████████████████▌                    | 154/334 [02:34<02:54,  1.03it/s][A[A

Evaluation...:  46%|█████████████████▋                    | 155/334 [02:35<02:51,  1.05it/s][A[A



Evaluation...:  68%|█████████████████████████▊            | 227/334 [03:49<01:40,  1.06it/s][A[A

Evaluation...:  68%|█████████████████████████▉            | 228/334 [03:49<01:39,  1.07it/s][A[A

Evaluation...:  69%|██████████████████████████            | 229/334 [03:51<01:48,  1.03s/it][A[A

Evaluation...:  69%|██████████████████████████▏           | 230/334 [03:52<01:49,  1.05s/it][A[A

Evaluation...:  69%|██████████████████████████▎           | 231/334 [03:53<01:44,  1.02s/it][A[A

Evaluation...:  69%|██████████████████████████▍           | 232/334 [03:54<01:41,  1.00it/s][A[A

Evaluation...:  70%|██████████████████████████▌           | 233/334 [03:55<01:40,  1.01it/s][A[A

Evaluation...:  70%|██████████████████████████▌           | 234/334 [03:56<01:48,  1.09s/it][A[A

Evaluation...:  70%|██████████████████████████▋           | 235/334 [03:57<01:50,  1.11s/it][A[A

Evaluation...:  71%|██████████████████████████▊           | 236/334 [03:58<01:46,  1.09s/it][A[A



Evaluation...:  92%|███████████████████████████████████   | 308/334 [05:13<00:25,  1.01it/s][A[A

Evaluation...:  93%|███████████████████████████████████▏  | 309/334 [05:14<00:24,  1.03it/s][A[A

Evaluation...:  93%|███████████████████████████████████▎  | 310/334 [05:15<00:24,  1.00s/it][A[A

Evaluation...:  93%|███████████████████████████████████▍  | 311/334 [05:16<00:22,  1.02it/s][A[A

Evaluation...:  93%|███████████████████████████████████▍  | 312/334 [05:17<00:20,  1.05it/s][A[A

Evaluation...:  94%|███████████████████████████████████▌  | 313/334 [05:18<00:19,  1.06it/s][A[A

Evaluation...:  94%|███████████████████████████████████▋  | 314/334 [05:19<00:19,  1.05it/s][A[A

Evaluation...:  94%|███████████████████████████████████▊  | 315/334 [05:20<00:17,  1.06it/s][A[A

Evaluation...:  95%|███████████████████████████████████▉  | 316/334 [05:21<00:17,  1.03it/s][A[A

Evaluation...:  95%|████████████████████████████████████  | 317/334 [05:22<00:17,  1.03s/it][A[A



valid f1 score:  0.5536, valid precision score:  0.5815,
                valid recall score:  0.5514, valid accuracy score:  0.5654
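
The four validation numbers above are standard multi-class classification metrics. The sketch below shows one way such numbers can be computed, assuming macro averaging over classes; the averaging mode actually used by the evaluation cell is not shown in this section, and `macro_scores` is a hypothetical helper rather than the notebook's own function. Under macro averaging, F1 is the mean of the per-class F1 values, so it need not equal the harmonic mean of the averaged precision and recall.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return (precision, recall, f1, accuracy), macro-averaged over all observed classes.

    Hypothetical helper for illustration; not taken from the notebook.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[t] += 1          # t was the true class but was missed

    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    accuracy = sum(tp.values()) / len(y_true)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n, accuracy


# Toy example with three classes
precision, recall, f1, accuracy = macro_scores([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f} accuracy={accuracy:.4f}")
```
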


## Predict and Save the Results

In [21]:
def predict(config, id2label, model, test_dataloader):
    """Run inference on the test set and write the predictions to submission.csv."""
    test_iterator = tqdm(test_dataloader, desc='Testing', total=len(test_dataloader))
    model.eval()  # switch to evaluation mode (disables dropout)
    test_preds = []
    
    with torch.no_grad():  # gradients are not needed at inference time
        for batch in test_iterator:
            # Move every tensor in the batch to the configured device
            batch = {item: value.to(config['device']) for item, value in batch.items()}

            # The logits are taken from position 1 of the model output
            logits = model(**batch)[1]
            test_preds.append(logits.argmax(dim=-1).detach().cpu())
            
    # Concatenate the per-batch predictions and map class indices back to the original labels
    test_preds = torch.cat(test_preds, dim=0).numpy()
    test_preds = [id2label[id_] for id_ in test_preds]
        
    # Append the predictions to the test data and save them
    test_df = pd.read_csv(config['test_file_path'], sep=',')
    test_df['label_pred'] = test_preds
    test_df.to_csv('submission.csv', index=False, encoding='utf8')

In [22]:
predict(config, id2label, best_model, test_dataloader)

Testing: 100%|████████████████████████████████████████████| 625/625 [11:58<00:00,  1.15s/it]
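
Once `submission.csv` has been written, a quick look at the distribution of predicted labels can reveal obvious problems, such as a model that predicts a single class for every row. The following is a minimal sketch using only the standard library; `count_predictions` is a hypothetical helper, and the `label_pred` column name is taken from the `predict` function above.

```python
import csv
from collections import Counter


def count_predictions(path, column="label_pred"):
    """Count how often each predicted label appears in a submission CSV.

    Hypothetical helper for illustration. Values read through the csv
    module are strings, so the keys are strings such as "108".
    """
    with open(path, newline="", encoding="utf8") as f:
        reader = csv.DictReader(f)
        return Counter(row[column] for row in reader)
```

Calling `count_predictions('submission.csv').most_common()` would list the predicted labels from most to least frequent.
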


In [23]:
test_df = pd.read_csv(config['test_file_path'], sep=',')

In [24]:
train_df = pd.read_csv(config['train_file_path'], sep=',')

In [25]:
train_df.head(10)

| Unnamed: 0 | id | label | label_desc | sentence |
|---|---|---|---|---|
| 0 | 0 | 108 | news_edu | 上课时学生手机响个不停，老师一怒之下把手机摔了，家长拿发票让老师赔，大家怎么看待这种事？ |
| 1 | 1 | 104 | news_finance | 商赢环球股份有限公司关于延期回复上海证券交易所对公司2017年年度报告的事后审核问询函的公告 |
| 2 | 2 | 106 | news_house | 通过中介公司买了二手房，首付都付了，现在卖家不想卖了。怎么处理？ |
| 3 | 3 | 112 | news_travel | 2018年去俄罗斯看世界杯得花多少钱？ |
| 4 | 4 | 109 | news_tech | 剃须刀的个性革新，雷明登天猫定制版新品首发 |
| 5 | 5 | 103 | news_sports | 再次证明了“无敌是多么寂寞”——逆天的中国乒乓球队！ |
| 6 | 6 | 109 | news_tech | 三农盾SACC-全球首个推出：互联网+区块链+农产品的电商平台 |
| 7 | 7 | 116 | news_game | 重做or新英雄？其实重做对暴雪来说同样重要 |
| 8 | 8 | 103 | news_sports | 如何在商业活动中不受人欺骗？ |
| 9 | 9 | 101 | news_culture | 87版红楼梦最温柔的四个丫鬟，娶谁都是一生的福气 |


In [26]:
train_df['label'].unique()

array([108, 104, 106, 112, 109, 103, 116, 101, 107, 100, 102, 110, 115,
       113, 114])
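
The raw label codes above are not contiguous: there are 15 distinct values between 100 and 116, with 105 and 111 absent. A classifier head outputs indices starting at 0, so the codes have to be mapped to class indices for training and mapped back (the role of `id2label` in `predict`) when writing predictions. The sketch below shows one possible mapping, assuming the codes are assigned indices in sorted order; the notebook's actual mapping is built in an earlier cell and may use a different order.

```python
# Label codes as returned by train_df['label'].unique() above
raw_labels = [108, 104, 106, 112, 109, 103, 116, 101, 107, 100, 102, 110, 115, 113, 114]

# Assumption: indices are assigned in ascending order of the label code
label2id = {label: idx for idx, label in enumerate(sorted(raw_labels))}
id2label = {idx: label for label, idx in label2id.items()}

# Decode a few hypothetical predicted class indices back to label codes
predicted_ids = [7, 0, 14]
predicted_labels = [id2label[i] for i in predicted_ids]
print(predicted_labels)
```
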