# Toutiao News Classification: BERT Baseline

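## BERT Key Points
#### Why BERT uses self-attention
+ Computational complexity: each self-attention layer costs O(n^2 * d), where n is the sentence length and d is the word-vector dimension.
![](./table1.png)
+ Parallelism: all positions in a layer can be computed at the same time.
+ Long-range dependencies: in an LSTM, information passing between two positions has to travel through the steps that separate them, whereas attention relates any two positions directly.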
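
To make the O(n^2 * d) cost and the direct position-to-position connections concrete, here is a minimal pure-Python sketch of single-head scaled dot-product self-attention. It assumes identity projections (queries, keys and values are all the raw input vectors), which is a simplification: real BERT layers apply learned projections and use several heads. The name `self_attention` is illustrative only.

```python
import math

def self_attention(x):
    """Scaled dot-product self-attention over a list of n vectors of size d.

    Simplifying assumption: Q = K = V = x (no learned projections).
    """
    n, d = len(x), len(x[0])
    # n x n score matrix: n^2 dot products, each costing d multiplications.
    scores = [
        [sum(x[i][k] * x[j][k] for k in range(d)) / math.sqrt(d) for j in range(n)]
        for i in range(n)
    ]
    # Row-wise softmax turns the scores into attention weights.
    weights = []
    for row in scores:
        peak = max(row)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in row]
        total = sum(exps)
        weights.append([e / total for e in exps])
    # Each output is a weighted sum over all positions, however far apart they are.
    outputs = [
        [sum(weights[i][j] * x[j][k] for j in range(n)) for k in range(d)]
        for i in range(n)
    ]
    return weights, outputs

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # n = 3 positions, d = 2
weights, outputs = self_attention(x)
print(len(weights), len(weights[0]))  # dimensions of the attention matrix
```

Every row of `weights` has one entry per position, so the first and last tokens interact in a single step, at the price of a matrix that grows quadratically with n.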

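#### Main contributions
+ BERT uses a masked language model objective, which lets the pre-trained model learn bidirectional representations.
+ BERT is the first fine-tuning-based representation model to reach state-of-the-art results across a broad suite of sentence-level and token-level tasks, outperforming many task-specific architectures.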

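#### Task 1: Masked LM (masked language model)
To train deep bidirectional representations, 15% of the tokens are selected at random and the model predicts only those selected tokens.
Of the selected tokens:
1. 80% are replaced with [MASK]
2. 10% are replaced with a random token
3. 10% are left unchanged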
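
The masking rule above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not BERT's actual data pipeline: each token is selected by an independent 15% coin flip instead of picking an exact 15% of positions, special tokens are not excluded, and `mask_tokens`, `vocab` and `seed` are hypothetical names.

```python
import random

MASK_TOKEN = "[MASK]"

def mask_tokens(tokens, vocab, select_prob=0.15, seed=0):
    """Return (inputs, targets); targets[i] is None where nothing is predicted."""
    rng = random.Random(seed)
    inputs = list(tokens)
    targets = [None] * len(tokens)
    for i, token in enumerate(tokens):
        if rng.random() >= select_prob:
            continue  # token not selected: leave it alone, no prediction target
        targets[i] = token  # the model must recover the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK_TOKEN         # 80%: replace with [MASK]
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token in the input
    return inputs, targets

tokens = "the quick brown fox jumps over the lazy dog again and again".split()
vocab = sorted(set(tokens))
inputs, targets = mask_tokens(tokens, vocab, seed=1)
print(inputs)
print(targets)
```

Keeping some selected tokens unchanged or randomly replaced means the model cannot rely on seeing [MASK] at every position it has to predict, which matters because [MASK] never appears during fine-tuning.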

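#### Task 2: Next Sentence Prediction
Each example is a sentence pair (A, B): with 50% probability B is the sentence that actually follows A, and with 50% probability B is a sentence chosen at random from the dataset.
The pair is labeled IsNext when B follows A, and NotNext otherwise.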
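
As a rough illustration of how such pairs could be built, the sketch below draws IsNext and NotNext examples from an ordered list of sentences. It is a simplification: negatives are drawn from the same list rather than from the whole corpus, at least three sentences are assumed, and `make_nsp_pairs` is a hypothetical helper.

```python
import random

def make_nsp_pairs(sentences, seed=0):
    """Build (A, B, label) triples from consecutive sentences."""
    rng = random.Random(seed)
    pairs = []
    for i in range(len(sentences) - 1):
        if rng.random() < 0.5:
            # Positive example: B is the true next sentence.
            pairs.append((sentences[i], sentences[i + 1], "IsNext"))
        else:
            # Negative example: any sentence except A itself and its true successor.
            choices = [j for j in range(len(sentences)) if j not in (i, i + 1)]
            pairs.append((sentences[i], sentences[rng.choice(choices)], "NotNext"))
    return pairs

sentences = ["s0", "s1", "s2", "s3", "s4"]
for a, b, label in make_nsp_pairs(sentences, seed=3):
    print(a, b, label)
```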

## Configuration

In [1]:
import torch 
import torch.nn as nn

config = {
    'train_file_path': '../../../data/toutiao_news_cls/train.csv',
    'test_file_path': '../../../data/toutiao_news_cls/test.csv',
    'train_val_ratio': 0.1,  # hold out 10% of the training data for validation
    # ------ settings that differ from TextCNN ------
    # 'vocab_size': 10000,   # vocabulary size (original note: 30k)
    'head': 'cnn',
    'model_path': '../../../pt/bert-base-chinese',
    # ------ settings that differ from TextCNN ------
    'batch_size': 16,      # batch size
    'num_epochs': 1,       # number of training epochs
    'learning_rate': 2e-5, # learning rate
    'logging_step': 300,   # log and evaluate every 300 batches
    'seed': 2022           # random seed
}

config['device'] = 'cuda' if torch.cuda.is_available() else 'cpu'  # use the GPU when available

import random
import numpy as np

def seed_everything(seed):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    return seed

seed_everything(config['seed'])

2022

## Data Preprocessing and DataLoader

In [2]:
import pandas as pd
from tqdm import tqdm
from collections import defaultdict
from transformers import BertTokenizer
from torch.utils.data import DataLoader

In [3]:
# BERT tokenizer
bertTokenizer = BertTokenizer.from_pretrained(config['model_path'])
# Wrap the tokenizer so each sentence yields input_ids, token_type_ids and attention_mask
def tokenizer(sent):
    inputs = bertTokenizer.encode_plus(sent, add_special_tokens=True, return_token_type_ids=True, return_attention_mask=True)
    
    return inputs


In [4]:
def read_data(config, mode='train'):
    
    data_df = pd.read_csv(config[f'{mode}_file_path'], sep=',')
    LABEL, SENTENCE = 'label', 'sentence'
    data_df['bert_encode'] = data_df[SENTENCE].apply(tokenizer)
    # Sentences have different lengths, so each field is stored as a 1-D object array of lists
    # (building a ragged integer array raises an error in NumPy >= 1.24).
    data_df['input_ids'] = data_df['bert_encode'].apply(lambda s: s['input_ids'])
    input_ids = np.array([[int(id_) for id_ in v] for v in data_df['input_ids'].values], dtype=object)
    data_df['token_type_ids'] = data_df['bert_encode'].apply(lambda s: s['token_type_ids'])
    token_type_ids = np.array([[int(id_) for id_ in v] for v in data_df['token_type_ids'].values], dtype=object)
    data_df['attention_mask'] = data_df['bert_encode'].apply(lambda s: s['attention_mask'])
    attention_mask = np.array([[int(id_) for id_ in v] for v in data_df['attention_mask'].values], dtype=object)

    if mode == 'train':
        labels = data_df[LABEL].values
        
        X_train, y_train = defaultdict(list), []
        X_val, y_val = defaultdict(list), []
        num_val = int(config['train_val_ratio'] * len(data_df))
        
        # shuffle ids
        ids = np.random.choice(range(len(data_df)), size=len(data_df), replace=False)
        train_ids = ids[num_val:]
        val_ids = ids[:num_val]
        
        # get input_ids
        X_train['input_ids'], y_train = input_ids[train_ids], labels[train_ids]
        X_val['input_ids'], y_val = input_ids[val_ids], labels[val_ids]
        # get token_type_ids
        X_train['token_type_ids'] = token_type_ids[train_ids]
        X_val['token_type_ids'] = token_type_ids[val_ids]
        # get attention_mask
        X_train['attention_mask'] = attention_mask[train_ids]
        X_val['attention_mask'] = attention_mask[val_ids]
     
        # label 
        label2id = {label: i for i, label in enumerate(np.unique(y_train))}
        id2label = {i: label for label, i in label2id.items()}
        y_train = torch.tensor([label2id[y] for y in y_train], dtype=torch.long)
        y_val = torch.tensor([label2id[y] for y in y_val], dtype=torch.long)

        return X_train, y_train, X_val, y_val, label2id, id2label

    else:
        X_test = defaultdict(list)
        X_test['input_ids'] = input_ids
        X_test['token_type_ids'] = token_type_ids
        X_test['attention_mask'] = attention_mask
        y_test = torch.zeros(len(data_df), dtype=torch.long)
        
        return X_test, y_test

In [5]:
X_train, y_train, X_val, y_val, label2id, id2label = read_data(config, mode='train')



In [6]:
X_test, y_test = read_data(config, mode='test')



#### Dataset wraps the data; a Dataset subclass must implement:
+ `__len__`: the total number of examples in the dataset
+ `__getitem__`: returns the example at a given index

In [7]:
from torch.utils.data import Dataset
class TNEWSDataset(Dataset):
    def __init__(self, X, y):
        self.x = X
        self.y = y

    def __getitem__(self, idx):
        return {
            'input_ids' : self.x['input_ids'][idx],
            'label' : self.y[idx],
            'token_type_ids': self.x['token_type_ids'][idx],
            'attention_mask': self.x['attention_mask'][idx]
        }
    
    def __len__(self):
        return self.y.size(0)

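#### Loading batches with DataLoader
+ DataLoader provides an iterable over the dataset and can load data in parallel. It fetches one example at a time from TNEWSDataset until it has a list `examples` of length batch_size.
+ Format of `examples`: [dict1, dict2, ...]
+ collate_fn() merges the data in `examples` into Tensors.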

In [8]:
def collate_fn(examples):
    input_ids_lst = []
    labels = []
    # ------ differs from TextCNN ------
    token_type_ids_lst = []
    attention_mask_lst = []
    # ------ differs from TextCNN ------

    for example in examples:
        input_ids_lst.append(example['input_ids'])
        labels.append(example['label'])
        # ------ differs from TextCNN ------
        token_type_ids_lst.append(example['token_type_ids'])
        attention_mask_lst.append(example['attention_mask'])
        # ------ differs from TextCNN ------
        
    # Find the longest sentence in the batch; every sequence is padded to this length
    max_length = max(len(input_ids) for input_ids in input_ids_lst)
    # Allocate zero-filled tensors (0 serves as the padding value)
    input_ids_tensor = torch.zeros((len(labels), max_length), dtype=torch.long)
    # ------ differs from TextCNN ------
    token_type_ids_tensor = torch.zeros_like(input_ids_tensor)
    attention_mask_tensor = torch.zeros_like(input_ids_tensor)
    # ------ differs from TextCNN ------
    
    for i, input_ids in enumerate(input_ids_lst):
        seq_len = len(input_ids)
        input_ids_tensor[i, :seq_len] = torch.tensor(input_ids, dtype=torch.long)
        # ------ differs from TextCNN ------
        token_type_ids_tensor[i, :seq_len] = torch.tensor(token_type_ids_lst[i], dtype=torch.long)
        attention_mask_tensor[i, :seq_len] = torch.tensor(attention_mask_lst[i], dtype=torch.long)
        # ------ differs from TextCNN ------
        
    return {
        'input_ids': input_ids_tensor,
        'labels': torch.tensor(labels, dtype=torch.long),
        # ------ differs from TextCNN ------
        'token_type_ids': token_type_ids_tensor,
        'attention_mask': attention_mask_tensor
        # ------ differs from TextCNN ------
    }
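
The padding step in `collate_fn` is easier to see on a tiny batch. The sketch below reproduces only that logic with plain lists instead of tensors; `pad_batch` is a hypothetical helper, and the token ids are taken from the sample batch output for illustration.

```python
def pad_batch(sequences, pad_id=0):
    """Right-pad every sequence to the longest length and build the matching mask."""
    max_length = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        padding = [pad_id] * (max_length - len(seq))
        input_ids.append(list(seq) + padding)
        # 1 marks a real token, 0 marks padding that the model should ignore.
        attention_mask.append([1] * len(seq) + [0] * len(padding))
    return input_ids, attention_mask

batch = [[101, 776, 691, 102], [101, 677, 102]]
input_ids, attention_mask = pad_batch(batch)
print(input_ids)       # [[101, 776, 691, 102], [101, 677, 102, 0]]
print(attention_mask)  # [[1, 1, 1, 1], [1, 1, 1, 0]]
```

Padding to the longest sentence in each batch, rather than to a global maximum, keeps short batches small. Using 0 as the padding id matches the `[PAD]` entry of the BERT vocabulary.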

In [9]:
from torch.utils.data import DataLoader

def build_dataloader(config):
    X_train, y_train, X_val, y_val, label2id, id2label = read_data(config, mode='train')
    X_test, y_test = read_data(config, mode='test')
    
    train_dataset = TNEWSDataset(X_train, y_train)
    val_dataset = TNEWSDataset(X_val, y_val)
    test_dataset = TNEWSDataset(X_test, y_test)
    
    train_dataloader = DataLoader(dataset=train_dataset, batch_size=config['batch_size'], num_workers=0, shuffle=True, collate_fn=collate_fn)
    val_dataloader = DataLoader(dataset=val_dataset, batch_size=config['batch_size'], num_workers=0, shuffle=False, collate_fn=collate_fn)
    test_dataloader = DataLoader(dataset=test_dataset, batch_size=config['batch_size'], num_workers=0, shuffle=False, collate_fn=collate_fn)

    return train_dataloader, val_dataloader, test_dataloader, id2label

In [10]:
train_dataloader, val_dataloader, test_dataloader, id2label = build_dataloader(config)



In [11]:
for batch in train_dataloader:
    print(len(batch['input_ids']), len(batch['labels']), len(batch['token_type_ids']), len(batch['attention_mask']))
    print(batch)
    break

16 16 16 16
{'input_ids': tensor([[ 101,  776,  691, 2356,  966, 5892, 1355, 1283,  783, 8024, 2832, 6598,
         5442,  812,  711,  862, 2898, 5330,  976, 4958,  776,  691, 8043,  102,
            0,    0,    0,    0,    0,    0,    0,    0,    0],
        [ 101,  677, 5468, 8038, 7599, 3235, 1915, 7455, 3449,  771, 7676, 8024,
          678, 5468, 2582,  720, 2190, 8043,  102,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0],
        [ 101, 1266,  776, 1957, 2094, 1745, 7063, 8038, 4385, 2141, 5445, 3655,
         6999, 8024, 6821, 2218, 3221, 4495, 3833, 8013,  102,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0],
        [ 101,  100, 2207, 1285, 1159,  100, 3173, 3124, 5862, 1765, 2157, 7270,
         4193, 5991, 6421,  679, 6421, 5314, 2015, 2845,  702, 4408,  102,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0],
        [ 101, 2809, 3124,  671, 1453, 2399, 8024, 7716, 1046, 7987, 6

## Training and Validation

In [12]:
from sklearn.metrics import f1_score, accuracy_score, precision_score, recall_score

def evaluation(config, model, val_dataloader):
    model.eval()
    preds = []
    labels = []
    val_loss = 0.
    val_iterator = tqdm(val_dataloader, desc='Evaluation...', total=len(val_dataloader))
    with torch.no_grad():
        for batch in val_iterator:
            labels.append(batch['labels'])
            batch = {item:value.to(config['device']) for item, value in batch.items()}
            
            # val output (loss, out)
            loss, logits = model(**batch)[:2]
            val_loss += loss.item()
            
            preds.append(logits.argmax(dim=-1).detach().cpu())
            
    avg_val_loss = val_loss/len(val_dataloader)
    labels = torch.cat(labels, dim=0).numpy()
    preds = torch.cat(preds, dim=0).numpy()
    
    precision = precision_score(labels, preds, average='macro')
    recall = recall_score(labels, preds, average='macro')
    f1 =f1_score(labels, preds, average='macro')
    accuracy = accuracy_score(labels, preds)
    
    return avg_val_loss, f1, precision, recall, accuracy
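
The `average='macro'` setting used above computes each metric per class and then takes the unweighted mean, so rare classes count as much as frequent ones. A minimal pure-Python sketch of that computation follows; `macro_scores` is a hypothetical helper written for illustration, not the scikit-learn implementation, and it scores a class as 0 when a denominator is empty.

```python
def macro_scores(labels, preds):
    """Return macro-averaged (precision, recall, f1) for two equal-length label lists."""
    classes = sorted(set(labels) | set(preds))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for y, p in zip(labels, preds) if y == c and p == c)
        fp = sum(1 for y, p in zip(labels, preds) if y != c and p == c)
        fn = sum(1 for y, p in zip(labels, preds) if y == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

labels = [0, 0, 1, 1, 2]
preds = [0, 1, 1, 1, 2]
precision, recall, f1 = macro_scores(labels, preds)
print(round(precision, 4), round(recall, 4), round(f1, 4))  # 0.8889 0.8333 0.8222
```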

In [13]:
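# BERT model training
from transformers import BertConfig, BertForSequenceClassification
from torch.optim import AdamW  # transformers.AdamW is deprecated; use the PyTorch optimizer instead
from tqdm import trange

def train(config, train_dataloader, val_dataloader, model):

    # weight_decay=0.0 matches the default of the deprecated transformers.AdamW
    optimizer = AdamW(model.parameters(), lr=config['learning_rate'], weight_decay=0.0)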
    
    model.to(config['device'])
    
    epoches_iterator = trange(config['num_epochs'])
    global_steps = 0
    train_loss = 0.
    logging_loss = 0.
    
    best_f1 = 0.
    best_precision = 0.
    best_recall = 0.
    best_accuracy = 0.
    
    for epoch in epoches_iterator:
        train_iterator = tqdm(train_dataloader, desc='Training', total=len(train_dataloader))
        model.train()
        for batch in train_iterator:
            batch = {item:value.to(config['device']) for item, value in batch.items()}
            
            # train output (loss, out)
            loss = model(**batch)[0]
            
            model.zero_grad()  # reset the parameter gradients
            loss.backward()  # backpropagate
            optimizer.step()  # update the parameters
            train_loss += loss.item()  # accumulate the loss
            global_steps += 1
            
            if global_steps % config['logging_step'] == 0:
                print_train_loss = (train_loss - logging_loss) / config['logging_step']
                logging_loss = train_loss
                avg_val_loss, f1, precision, recall, accuracy = evaluation(config, model, val_dataloader)

                if best_f1 < f1:
                    best_f1 = f1
                    best_precision = precision
                    best_recall = recall
                    best_accuracy = accuracy
                    print_log = f'''>>> training loss: {print_train_loss: .4f}, valid loss: {avg_val_loss: .4f}\n
                            valid f1 score: {f1: .4f}, valid precision score: {precision: .4f},
                            valid recall score: {recall: .4f}, valid accuracy score: {accuracy: .4f}'''
                    print(print_log)
                    model.save_pretrained('../../../pt_tmp/bert_base_chinese')
                    
                model.train()
                
    return best_f1, best_precision, best_recall, best_accuracy
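
The training loss printed by `train` is an average over the most recent `logging_step` batches, not over the whole run: `train_loss` keeps a running total and `logging_loss` remembers the total at the previous log point. A small stand-alone sketch of that bookkeeping, with made-up loss values and a hypothetical helper name:

```python
def windowed_loss_log(step_losses, logging_step):
    """Return the average loss of each completed window of `logging_step` steps."""
    train_loss = 0.0    # running sum of all losses so far
    logging_loss = 0.0  # running sum at the previous log point
    averages = []
    for global_step, loss in enumerate(step_losses, start=1):
        train_loss += loss
        if global_step % logging_step == 0:
            averages.append((train_loss - logging_loss) / logging_step)
            logging_loss = train_loss
    return averages

print(windowed_loss_log([4.0, 2.0, 1.0, 1.0, 0.5], logging_step=2))  # [3.0, 1.0]
```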

In [15]:
# First run: fine-tune from the pretrained checkpoint
# bert_config = BertConfig.from_pretrained(config['model_path'])
# bert_config.num_labels = len(id2label)
# model = BertForSequenceClassification.from_pretrained(config['model_path'], config=bert_config)
# train() returns four metrics and saves the best checkpoint itself
# f1, precision, recall, accuracy = train(config, train_dataloader, val_dataloader, model)
# print_log = f'''valid f1 score: {f1: .4f}, valid precision score: {precision: .4f},
#                 valid recall score: {recall: .4f}, valid accuracy score: {accuracy: .4f}'''
# print(print_log)

# Later runs: resume from the checkpoint saved by train()
bert_config = BertConfig.from_pretrained('../../../pt_tmp/bert_base_chinese')
bert_config.num_labels = len(id2label)
model = BertForSequenceClassification.from_pretrained('../../../pt_tmp/bert_base_chinese', config=bert_config)
f1, precision, recall, accuracy = train(config, train_dataloader, val_dataloader, model)
print_log = f'''valid f1 score: {f1: .4f}, valid precision score: {precision: .4f},
                valid recall score: {recall: .4f}, valid accuracy score: {accuracy: .4f}'''
print(print_log)

>>> training loss:  1.0147, valid loss:  1.2348

                            valid f1 score:  0.5281, valid precision score:  0.5349,
                            valid recall score:  0.5284, valid accuracy score:  0.5598



>>> training loss:  1.0450, valid loss:  1.2292

                            valid f1 score:  0.5638, valid precision score:  0.5696,
                            valid recall score:  0.5639, valid accuracy score:  0.5622






Training:  53%|███████████████████▌                 | 1588/3002 [2:14:06<1:26:51,  3.69s/it][A
Training:  53%|███████████████████▌                 | 1589/3002 [2:14:10<1:28:08,  3.74s/it][A
Training:  53%|███████████████████▌                 | 1590/3002 [2:14:14<1:26:47,  3.69s/it][A
Training:  53%|███████████████████▌                 | 1591/3002 [2:14:17<1:27:24,  3.72s/it][A
Training:  53%|███████████████████▌                 | 1592/3002 [2:14:22<1:33:05,  3.96s/it][A
Training:  53%|███████████████████▋                 | 1593/3002 [2:14:26<1:35:02,  4.05s/it][A
Training:  53%|███████████████████▋                 | 1594/3002 [2:14:31<1:38:19,  4.19s/it][A
Training:  53%|███████████████████▋                 | 1595/3002 [2:14:34<1:33:01,  3.97s/it][A
Training:  53%|███████████████████▋                 | 1596/3002 [2:14:38<1:34:24,  4.03s/it][A
Training:  53%|███████████████████▋                 | 1597/3002 [2:14:42<1:30:16,  3.85s/it][A
Training:  53%|███████████████████▋     

Training:  59%|█████████████████████▋               | 1758/3002 [2:24:59<1:22:51,  4.00s/it][A
Training:  59%|█████████████████████▋               | 1759/3002 [2:25:03<1:22:05,  3.96s/it][A
Training:  59%|█████████████████████▋               | 1760/3002 [2:25:06<1:19:33,  3.84s/it][A
Training:  59%|█████████████████████▋               | 1761/3002 [2:25:10<1:20:01,  3.87s/it][A
Training:  59%|█████████████████████▋               | 1762/3002 [2:25:14<1:19:54,  3.87s/it][A
Training:  59%|█████████████████████▋               | 1763/3002 [2:25:18<1:19:41,  3.86s/it][A
Training:  59%|█████████████████████▋               | 1764/3002 [2:25:22<1:19:34,  3.86s/it][A
Training:  59%|█████████████████████▊               | 1765/3002 [2:25:25<1:18:48,  3.82s/it][A
Training:  59%|█████████████████████▊               | 1766/3002 [2:25:30<1:24:41,  4.11s/it][A
Training:  59%|█████████████████████▊               | 1767/3002 [2:25:34<1:22:57,  4.03s/it][A
Training:  59%|█████████████████████▊   

Evaluation...:  37%|█████████████▉                        | 122/334 [02:02<03:50,  1.09s/it][A[A

Evaluation...:  37%|█████████████▉                        | 123/334 [02:03<03:38,  1.04s/it][A[A

Evaluation...:  37%|██████████████                        | 124/334 [02:04<03:44,  1.07s/it][A[A

Evaluation...:  37%|██████████████▏                       | 125/334 [02:05<03:45,  1.08s/it][A[A

Evaluation...:  38%|██████████████▎                       | 126/334 [02:06<03:37,  1.05s/it][A[A

Evaluation...:  38%|██████████████▍                       | 127/334 [02:07<03:28,  1.01s/it][A[A

Evaluation...:  38%|██████████████▌                       | 128/334 [02:08<03:22,  1.02it/s][A[A

Evaluation...:  39%|██████████████▋                       | 129/334 [02:09<03:19,  1.03it/s][A[A

Evaluation...:  39%|██████████████▊                       | 130/334 [02:10<03:15,  1.05it/s][A[A

Evaluation...:  39%|██████████████▉                       | 131/334 [02:11<03:20,  1.01it/s][A[A



Evaluation...:  85%|████████████████████████████████▎     | 284/334 [04:51<00:52,  1.04s/it][A[A

Evaluation...:  85%|████████████████████████████████▍     | 285/334 [04:52<00:51,  1.05s/it][A[A

Evaluation...:  86%|████████████████████████████████▌     | 286/334 [04:53<00:49,  1.03s/it][A[A

Evaluation...:  86%|████████████████████████████████▋     | 287/334 [04:54<00:49,  1.04s/it][A[A

Evaluation...:  86%|████████████████████████████████▊     | 288/334 [04:55<00:51,  1.13s/it][A[A

Evaluation...:  87%|████████████████████████████████▉     | 289/334 [04:56<00:47,  1.06s/it][A[A

Evaluation...:  87%|████████████████████████████████▉     | 290/334 [04:57<00:47,  1.07s/it][A[A

Evaluation...:  87%|█████████████████████████████████     | 291/334 [04:58<00:45,  1.06s/it][A[A

Evaluation...:  87%|█████████████████████████████████▏    | 292/334 [04:59<00:42,  1.02s/it][A[A

Evaluation...:  88%|█████████████████████████████████▎    | 293/334 [05:00<00:40,  1.00it/s][A[A



>>> training loss:  1.0543, valid loss:  1.2162
valid f1 score:  0.5658, valid precision score:  0.5713,
valid recall score:  0.5628, valid accuracy score:  0.5575




>>> training loss:  1.0156, valid loss:  1.2002
valid f1 score:  0.5709, valid precision score:  0.5698,
valid recall score:  0.5762, valid accuracy score:  0.5690



Training: 100%|███████████████████████████████████████| 3002/3002 [4:57:08<00:00,  5.94s/it][A
100%|████████████████████████████████████████████████████| 1/1 [4:57:08<00:00, 17828.56s/it]


## Predict and save results

In [16]:
def predict(config, id2label, model, test_dataloader):
    """Run inference on the test set and write the predictions to submission.csv."""
    test_iterator = tqdm(test_dataloader, desc='Testing', total=len(test_dataloader))
    model.eval()  # switch off dropout for inference
    test_preds = []

    with torch.no_grad():  # gradients are not needed at inference time
        for batch in test_iterator:
            # move every tensor in the batch to the configured device
            batch = {item: value.to(config['device']) for item, value in batch.items()}

            # the model output at index 1 is used as the logits
            logits = model(**batch)[1]
            test_preds.append(logits.argmax(dim=-1).detach().cpu())

    # concatenate the per-batch predictions and map class ids back to raw labels
    test_preds = torch.cat(test_preds, dim=0).numpy()
    test_preds = [id2label[id_] for id_ in test_preds]

    test_df = pd.read_csv(config['test_file_path'], sep=',')
    # test_df.insert(1, column=['label_pred'], value=test_preds)
    test_df['label_pred'] = test_preds
    # test_df.drop(columns=['sentence'], inplace=True)
    test_df.to_csv('submission.csv', index=False, encoding='utf8')

In [17]:
predict(config, id2label, best_model, test_dataloader)

Testing: 100%|████████████████████████████████████████████| 625/625 [10:52<00:00,  1.04s/it]
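
The decoding step inside `predict` (take the argmax over the logits, then look the class id up in `id2label`) can be illustrated without a model. The sketch below uses plain Python lists; the logits and the four-entry `id2label` are made-up values for illustration only, and the helper name `decode` is hypothetical rather than part of the notebook.

```python
# Hypothetical logits for 3 samples over 4 classes (made-up numbers).
logits = [
    [0.1, 2.3, -0.5, 0.0],
    [1.7, 0.2, 0.4, 0.9],
    [-1.0, -0.2, -0.1, 3.0],
]

# Hypothetical mapping from class id to raw dataset label.
id2label = {0: 100, 1: 101, 2: 102, 3: 103}


def decode(logits, id2label):
    """Return the raw label of the highest-scoring class for each row."""
    preds = []
    for row in logits:
        # index of the largest score, i.e. argmax over the class dimension
        best = max(range(len(row)), key=row.__getitem__)
        preds.append(id2label[best])
    return preds


decoded = decode(logits, id2label)
print(decoded)
```

In the notebook itself the same idea is expressed with `logits.argmax(dim=-1)` on a tensor, followed by the list comprehension over `id2label`.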


In [18]:
test_df = pd.read_csv(config['test_file_path'], sep=',')

In [19]:
train_df = pd.read_csv(config['train_file_path'], sep=',')

In [20]:
train_df.head(10)

| Unnamed: 0 | id | label | label_desc | sentence |
|---|---|---|---|---|
| 0 | 0 | 108 | news_edu | 上课时学生手机响个不停，老师一怒之下把手机摔了，家长拿发票让老师赔，大家怎么看待这种事？ |
| 1 | 1 | 104 | news_finance | 商赢环球股份有限公司关于延期回复上海证券交易所对公司2017年年度报告的事后审核问询函的公告 |
| 2 | 2 | 106 | news_house | 通过中介公司买了二手房，首付都付了，现在卖家不想卖了。怎么处理？ |
| 3 | 3 | 112 | news_travel | 2018年去俄罗斯看世界杯得花多少钱？ |
| 4 | 4 | 109 | news_tech | 剃须刀的个性革新，雷明登天猫定制版新品首发 |
| 5 | 5 | 103 | news_sports | 再次证明了“无敌是多么寂寞”——逆天的中国乒乓球队！ |
| 6 | 6 | 109 | news_tech | 三农盾SACC-全球首个推出：互联网+区块链+农产品的电商平台 |
| 7 | 7 | 116 | news_game | 重做or新英雄？其实重做对暴雪来说同样重要 |
| 8 | 8 | 103 | news_sports | 如何在商业活动中不受人欺骗？ |
| 9 | 9 | 101 | news_culture | 87版红楼梦最温柔的四个丫鬟，娶谁都是一生的福气 |


In [39]:
train_df['label'].unique()

array([108, 104, 106, 112, 109, 103, 116, 101, 107, 100, 102, 110, 115,
       113, 114])
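
The raw labels are not contiguous: 105 and 111 do not appear, so the 15 classes cannot be used directly as indices of a 15-way output layer. A minimal sketch of one way to build the two lookup tables follows. It assumes the ids are assigned in sorted label order; the notebook builds its own `id2label` elsewhere and may use a different order.

```python
# Raw labels as printed by train_df['label'].unique() above.
raw_labels = [108, 104, 106, 112, 109, 103, 116, 101, 107, 100, 102, 110, 115,
              113, 114]

# Assumption: contiguous ids follow the sorted order of the raw labels.
label2id = {label: idx for idx, label in enumerate(sorted(raw_labels))}
id2label = {idx: label for label, idx in label2id.items()}

num_classes = len(label2id)
print(num_classes, label2id[108], id2label[0])
```

Whatever order is chosen, the same mapping has to be used when encoding the training labels and when decoding predictions, otherwise the predicted labels in `submission.csv` will be shifted.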