# Toutiao News Classification: NeZha With Head And Focal Loss

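## BERT Key Points
#### BERT: why self-attention
+ Computational complexity: each self-attention layer costs O(n^2*d), where n is the sentence length and d is the word-vector dimension
![](./table1.png)
+ Computation can be parallelized across positions
+ Long-range dependencies: in an LSTM, information between two positions has to pass through every step between them, whereas attention relates any two positions directly.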
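
As a rough illustration of the complexity point above, the sketch below compares the per-layer cost O(n^2*d) of self-attention with O(n*d^2) for a recurrent layer. The recurrent figure is taken from the comparison table in the Transformer paper, and the hidden size and sentence lengths are illustrative assumptions; constant factors are ignored.

```python
def self_attention_ops(n: int, d: int) -> int:
    """Per-layer cost of self-attention, up to a constant factor: n^2 * d."""
    return n * n * d


def recurrent_ops(n: int, d: int) -> int:
    """Per-layer cost of a recurrent layer, up to a constant factor: n * d^2."""
    return n * d * d


d = 768  # assumed hidden size, for illustration only
for n in (32, 128, 512, 2048):
    ratio = self_attention_ops(n, d) / recurrent_ops(n, d)  # simplifies to n / d
    print(f"n={n:5d}  self-attention / recurrent cost ratio = {ratio:.3f}")
```

The ratio simplifies to n / d, so under these assumptions self-attention is the cheaper layer whenever the sentence is shorter than the hidden size, which is the usual case for short news headlines.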

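#### Main contributions
+ BERT uses a masked language model objective, which lets the pre-trained model learn bidirectional representations
+ According to the BERT paper, BERT is the first fine-tuning-based representation model to reach state-of-the-art results on a broad set of sentence-level and token-level tasks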

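#### Task 1: Masked LM (masked language model)
To train deep bidirectional representations, 15% of the tokens are selected at random, and the model predicts only those selected tokens.
Of the selected tokens:
1. 80% are replaced with [MASK]
2. 10% are replaced with a random token
3. 10% are left unchanged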
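
The 15% / 80-10-10 rule above can be sketched in plain Python. This is a simplified illustration, not BERT's actual preprocessing code: `mask_tokens` is a hypothetical helper, each token is selected independently with probability 0.15 (so the selected share is 15% only on average), and the toy vocabulary is an assumption.

```python
import random

MASK = "[MASK]"
SPECIAL_TOKENS = ("[CLS]", "[SEP]")


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Return (masked_tokens, labels).

    labels[i] holds the original token at selected positions and None elsewhere,
    so the loss would only be computed where labels[i] is not None.
    """
    if rng is None:
        rng = random.Random()
    masked = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        # Never select special tokens; select the rest with probability mask_prob.
        if tok in SPECIAL_TOKENS or rng.random() >= mask_prob:
            continue
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            masked[i] = MASK               # 80%: replace with [MASK]
        elif roll < 0.9:
            masked[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: leave the token unchanged
    return masked, labels


toy_vocab = ["news", "sports", "stock", "game", "market"]
sentence = ["[CLS]", "stock", "market", "news", "[SEP]"]
print(mask_tokens(sentence, toy_vocab, rng=random.Random(2022)))
```

Keeping some selected tokens unchanged or randomly replaced means the model cannot rely on [MASK] always marking the positions it has to predict.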

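#### Task 2: Next Sentence Prediction
Given a sentence pair (A, B), B is the actual next sentence after A 50% of the time and a random sentence from the dataset the other 50%.
If B follows A, the pair is labeled IsNext; otherwise it is labeled NotNext.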
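
A minimal sketch of how such pairs could be sampled from an ordered list of sentences. `make_nsp_pairs` is a hypothetical helper, and drawing the negative from the same list (excluding A and its true successor) is a simplifying assumption; real pre-training pipelines typically draw negatives from a different document.

```python
import random


def make_nsp_pairs(sentences, rng=None):
    """Build (A, B, label) triples from consecutive sentences.

    Assumes at least three sentences so that a negative candidate always exists.
    """
    if rng is None:
        rng = random.Random()
    pairs = []
    for i in range(len(sentences) - 1):
        a = sentences[i]
        if rng.random() < 0.5:
            # Positive example: B really is the sentence after A.
            pairs.append((a, sentences[i + 1], "IsNext"))
        else:
            # Negative example: any sentence other than A and its true successor.
            candidates = [j for j in range(len(sentences)) if j not in (i, i + 1)]
            pairs.append((a, sentences[rng.choice(candidates)], "NotNext"))
    return pairs


doc = ["sentence one", "sentence two", "sentence three", "sentence four"]
for pair in make_nsp_pairs(doc, rng=random.Random(2022)):
    print(pair)
```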

## Configuration

In [1]:
import torch 
import torch.nn as nn

config = {
    'train_file_path': '../../../data/toutiao_news_cls/train.csv',
    'test_file_path': '../../../data/toutiao_news_cls/test.csv',
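    'train_val_ratio': 0.1,  # hold out 10% of the training data for validation
    # ------ settings that differ from the TextCNN version ------
    # 'vocab_size': 10000,   # vocabulary size, TextCNN only (original note: 30k)
    'head': 'cnn',
    'model_path': '../../../pt/NeZha_model',
    # ------ settings that differ from the TextCNN version ------
    'batch_size': 16,      # batch size
    'num_epochs': 1,       # number of training epochs
    'learning_rate': 2e-5, # learning rate
    'logging_step': 500,   # evaluate and log every 500 batches
    'seed': 2022           # random seed
}

config['device'] = 'cuda' if torch.cuda.is_available() else 'cpu'  # use the GPU when available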

import random
import numpy as np

def seed_everything(seed):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    return seed

seed_everything(config['seed'])

2022

## Data Preprocessing and DataLoader

In [2]:
import pandas as pd
from tqdm import tqdm
from collections import defaultdict
from transformers import BertTokenizer
from torch.utils.data import DataLoader

In [3]:
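# BERT tokenizer, loaded from the NeZha checkpoint directory
bertTokenizer = BertTokenizer.from_pretrained(config['model_path'])
# Wrapper that returns input_ids, token_type_ids and attention_mask for one sentence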
def tokenizer(sent):
    inputs = bertTokenizer.encode_plus(sent, add_special_tokens=True, return_token_type_ids=True, return_attention_mask=True)
    
    return inputs


In [4]:
def read_data(config, mode='train'):
    
    data_df = pd.read_csv(config[f'{mode}_file_path'], sep=',')
    LABEL, SENTENCE = 'label', 'sentence'
    data_df['bert_encode'] = data_df[SENTENCE].apply(tokenizer)
    # Sentences differ in length, so each column is kept as a 1-D object array of
    # Python lists. Building a rectangular np.array from ragged lists triggers a
    # deprecation warning on older NumPy and an error on newer releases.
    data_df['input_ids'] = data_df['bert_encode'].apply(lambda s: s['input_ids'])
    input_ids = data_df['input_ids'].to_numpy()
    data_df['token_type_ids'] = data_df['bert_encode'].apply(lambda s: s['token_type_ids'])
    token_type_ids = data_df['token_type_ids'].to_numpy()
    data_df['attention_mask'] = data_df['bert_encode'].apply(lambda s: s['attention_mask'])
    attention_mask = data_df['attention_mask'].to_numpy()

    if mode == 'train':
        labels = data_df[LABEL].values
        
        X_train, y_train = defaultdict(list), []
        X_val, y_val = defaultdict(list), []
        num_val = int(config['train_val_ratio'] * len(data_df))
        
        # shuffle ids
        ids = np.random.choice(range(len(data_df)), size=len(data_df), replace=False)
        train_ids = ids[num_val:]
        val_ids = ids[:num_val]
        
        # get input_ids
        X_train['input_ids'], y_train = input_ids[train_ids], labels[train_ids]
        X_val['input_ids'], y_val = input_ids[val_ids], labels[val_ids]
         # get token_type_ids
        X_train['token_type_ids'] = token_type_ids[train_ids]
        X_val['token_type_ids'] = token_type_ids[val_ids]
        # get attention_mask
        X_train['attention_mask'] = attention_mask[train_ids]
        X_val['attention_mask'] = attention_mask[val_ids]
     
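        # label mapping, built from all labels so that a class appearing only in
        # the validation split cannot raise a KeyError
        label2id = {label: i for i, label in enumerate(np.unique(labels))}
        id2label = {i: label for label, i in label2id.items()}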
        y_train = torch.tensor([label2id[y] for y in y_train], dtype=torch.long)
        y_val = torch.tensor([label2id[y] for y in y_val], dtype=torch.long)

        return X_train, y_train, X_val, y_val, label2id, id2label

    else:
        X_test = defaultdict(list)
        X_test['input_ids'] = input_ids
        X_test['token_type_ids'] = token_type_ids
        X_test['attention_mask'] = attention_mask
        y_test = torch.zeros(len(data_df), dtype=torch.long)
        
        return X_test, y_test

In [5]:
X_train, y_train, X_val, y_val, label2id, id2label = read_data(config, mode='train')



In [6]:
X_test, y_test = read_data(config, mode='test')



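#### Dataset wraps the data; a class that subclasses Dataset must implement:
+ `__len__`: the total number of examples in the dataset
+ `__getitem__`: index-based access to a single example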

In [7]:
from torch.utils.data import Dataset
class TNEWSDataset(Dataset):
    def __init__(self, X, y):
        self.x = X
        self.y = y

    def __getitem__(self, idx):
        return {
            'input_ids' : self.x['input_ids'][idx],
            'label' : self.y[idx],
            'token_type_ids': self.x['token_type_ids'][idx],
            'attention_mask': self.x['attention_mask'][idx]
        }
    
    def __len__(self):
        return self.y.size(0)

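#### Batched loading with DataLoader
+ DataLoader provides an iterable that can load data in parallel: it pulls one example at a time from TNEWSDataset until it has a list `examples` of length batch_size (with num_workers=0, as used in this notebook, loading runs in the main process)
+ Format of examples: [dict1, dict2, ...]
+ collate_fn() merges the data in examples into Tensors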
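
The core of the merging step is padding every sequence in the batch to the length of the longest one. A minimal sketch on plain lists, without torch; `pad_batch` is a hypothetical helper and the pad value 0 is an assumption that matches the zero-initialized tensors used in this notebook:

```python
def pad_batch(sequences, pad_value=0):
    """Right-pad each sequence to the longest length in the batch.

    Returns the padded sequences and a mask with 1 for real tokens and 0 for padding.
    """
    max_length = max(len(seq) for seq in sequences)
    padded = [list(seq) + [pad_value] * (max_length - len(seq)) for seq in sequences]
    mask = [[1] * len(seq) + [0] * (max_length - len(seq)) for seq in sequences]
    return padded, mask


batch = [[101, 776, 691, 102], [101, 677, 102]]
padded, mask = pad_batch(batch)
print(padded)
print(mask)
```

Padding per batch rather than to a global maximum keeps short batches small, which matters because attention cost grows with the square of the sequence length.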

In [8]:
def collate_fn(examples):
    input_ids_lst = []
    labels = []
    # ------ differs from the TextCNN version ------
    token_type_ids_lst = []
    attention_mask_lst = []
    # ------ differs from the TextCNN version ------

    for example in examples:
        input_ids_lst.append(example['input_ids'])
        labels.append(example['label'])
        # ------ differs from the TextCNN version ------
        token_type_ids_lst.append(example['token_type_ids'])
        attention_mask_lst.append(example['attention_mask'])
        # ------ differs from the TextCNN version ------
        
    # Find the longest sentence in the batch; every row is padded to this length
    max_length = max(len(input_ids) for input_ids in input_ids_lst)
    # Allocate zero-filled tensors (0 is used as the padding value)
    input_ids_tensor = torch.zeros((len(labels), max_length), dtype=torch.long)
    # ------ differs from the TextCNN version ------
    token_type_ids_tensor = torch.zeros_like(input_ids_tensor)
    attention_mask_tensor = torch.zeros_like(input_ids_tensor)
    # ------ differs from the TextCNN version ------
    
    for i, input_ids in enumerate(input_ids_lst):
        seq_len = len(input_ids)
        input_ids_tensor[i, :seq_len] = torch.tensor(input_ids, dtype=torch.long)
        # ------ differs from the TextCNN version ------
        token_type_ids_tensor[i, :seq_len] = torch.tensor(token_type_ids_lst[i], dtype=torch.long)
        attention_mask_tensor[i, :seq_len] = torch.tensor(attention_mask_lst[i], dtype=torch.long)
        # ------ differs from the TextCNN version ------
        
    return {
        'input_ids': input_ids_tensor,
        'labels': torch.tensor(labels, dtype=torch.long),
        # ------ differs from the TextCNN version ------
        'token_type_ids': token_type_ids_tensor,
        'attention_mask': attention_mask_tensor
        # ------ differs from the TextCNN version ------
    }

In [9]:
from torch.utils.data import DataLoader

def build_dataloader(config):
    X_train, y_train, X_val, y_val, label2id, id2label = read_data(config, mode='train')
    X_test, y_test = read_data(config, mode='test')
    
    train_dataset = TNEWSDataset(X_train, y_train)
    val_dataset = TNEWSDataset(X_val, y_val)
    test_dataset = TNEWSDataset(X_test, y_test)
    
    train_dataloader = DataLoader(dataset=train_dataset, batch_size=config['batch_size'], num_workers=0, shuffle=True, collate_fn=collate_fn)
    val_dataloader = DataLoader(dataset=val_dataset, batch_size=config['batch_size'], num_workers=0, shuffle=False, collate_fn=collate_fn)
    test_dataloader = DataLoader(dataset=test_dataset, batch_size=config['batch_size'], num_workers=0, shuffle=False, collate_fn=collate_fn)

    return train_dataloader, val_dataloader, test_dataloader, id2label

In [10]:
train_dataloader, val_dataloader, test_dataloader, id2label = build_dataloader(config)



In [11]:
for batch in train_dataloader:
    print(len(batch['input_ids']), len(batch['labels']), len(batch['token_type_ids']), len(batch['attention_mask']))
    print(batch)
    break

16 16 16 16
{'input_ids': tensor([[ 101,  776,  691, 2356,  966, 5892, 1355, 1283,  783, 8024, 2832, 6598,
         5442,  812,  711,  862, 2898, 5330,  976, 4958,  776,  691, 8043,  102,
            0,    0,    0,    0,    0,    0,    0,    0,    0],
        [ 101,  677, 5468, 8038, 7599, 3235, 1915, 7455, 3449,  771, 7676, 8024,
          678, 5468, 2582,  720, 2190, 8043,  102,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0],
        [ 101, 1266,  776, 1957, 2094, 1745, 7063, 8038, 4385, 2141, 5445, 3655,
         6999, 8024, 6821, 2218, 3221, 4495, 3833, 8013,  102,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0],
        [ 101,  100, 2207, 1285, 1159,  100, 3173, 3124, 5862, 1765, 2157, 7270,
         4193, 5991, 6421,  679, 6421, 5314, 2015, 2845,  702, 4408,  102,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0],
        [ 101, 2809, 3124,  671, 1453, 2399, 8024, 7716, 1046, 7987, 6

## Training and Validation

In [12]:
# NeZha + head part2
from NeZha import *

class NeZhaForTNEWS(NeZhaPreTrainedModel):
    # classifier -- head
    def __init__(self, config, model_path, classifier):
        super(NeZhaForTNEWS, self).__init__(config)

        self.bert = NeZhaModel.from_pretrained(model_path, config=config)
        self.classifier = classifier
    
    def forward(self, input_ids, token_type_ids, attention_mask, labels=None):

        outputs = self.bert(input_ids=input_ids,
                            attention_mask=attention_mask, 
                            token_type_ids=token_type_ids)
        
        # outputs[2] holds the hidden states of every layer; this relies on
        # config.output_hidden_states being set to True
        hidden_states = outputs[2]
        
        logits = self.classifier(hidden_states, input_ids)
        
        outputs = (logits, )
        # labels are supplied for the training and validation sets
        if labels is not None:
            loss_fct = nn.CrossEntropyLoss()
            loss = loss_fct(logits, labels.view(-1))
            outputs = (loss, ) + outputs
        
        return outputs
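
The title mentions focal loss, while the forward pass above uses `nn.CrossEntropyLoss`. As a minimal sketch of what a focal-loss variant would compute per example, assuming the common form FL = -(1 - p_t)^gamma * log(p_t) with an illustrative gamma of 2 and no class weighting, here is a plain-Python version. The names `softmax` and `focal_loss` are hypothetical and not part of this notebook.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0):
    """Focal loss for one example; gamma=0 reduces to plain cross-entropy."""
    p_t = softmax(logits)[target]  # predicted probability of the true class
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


easy = [5.0, 0.0, 0.0]  # confident and correct when the target is class 0
hard = [0.0, 5.0, 0.0]  # confident and wrong when the target is class 0
for name, logits in (("easy", easy), ("hard", hard)):
    ce = focal_loss(logits, 0, gamma=0.0)
    fl = focal_loss(logits, 0, gamma=2.0)
    print(f"{name}: cross-entropy={ce:.4f}  focal={fl:.4f}")
```

The factor (1 - p_t)^gamma shrinks the loss of well-classified examples much more than that of misclassified ones, which is why focal loss is often considered for imbalanced classification.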

In [13]:
import torch.nn.functional as F
import torch.nn as nn
class ConvClassifier(nn.Module):
    '''
    CNN + global max pool
    '''
    def __init__(self, config):
        super().__init__()
        self.conv = nn.Conv1d(in_channels=config.hidden_size, out_channels=config.hidden_size, kernel_size=3)
        self.global_max_pool = nn.AdaptiveMaxPool1d(1)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.fc = nn.Linear(config.hidden_size, config.num_labels)
    
    def forward(self, hidden_states, input_ids):
        hidden_states = self.dropout(hidden_states[-1])  # keep only the last layer
        # hidden_states shape (bs, seq_len, hidden_size) -> (bs, hidden_size, seq_len)
        hidden_states = hidden_states.permute(0, 2, 1)
        out = F.relu(self.conv(hidden_states))
        
        # after conv (kernel_size=3, no padding): (bs, hidden_size, seq_len - 2)
        # after global max pooling:               (bs, hidden_size, 1)
        # after squeeze:                          (bs, hidden_size)
        out = self.global_max_pool(out).squeeze(dim=2)
        out = self.fc(out)
        return out
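
To make the shapes in the head above concrete, here is a single-channel sketch of the same two steps in plain Python: a convolution without padding, then ReLU and a global max over the sequence. The helper names are hypothetical, and reducing hidden_size channels to one channel is a simplification for illustration.

```python
def conv1d_valid(signal, kernel):
    """1-D convolution without padding (cross-correlation, as nn.Conv1d computes it).

    Output length is len(signal) - len(kernel) + 1.
    """
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu_global_max_pool(values):
    """Apply ReLU, then keep only the largest activation over the sequence."""
    return max(max(v, 0.0) for v in values)


features = [1, 2, 3, 4, 5]                   # seq_len = 5
conv_out = conv1d_valid(features, [1, 1, 1])  # kernel_size = 3
print(len(conv_out))                          # seq_len - 2 positions remain
print(relu_global_max_pool(conv_out))
```

Because the pooled value does not depend on where in the sequence the maximum occurs, the head produces a fixed-size vector regardless of the padded length of each batch.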

In [14]:
def build_model(model_path, config, head):
    heads = {
        'cnn':ConvClassifier
    }
    model = NeZhaForTNEWS(config, model_path, heads[head](config))
    return model

In [15]:
from sklearn.metrics import f1_score, accuracy_score, precision_score, recall_score

def evaluation(config, model, val_dataloader):
    model.eval()
    preds = []
    labels = []
    val_loss = 0.
    val_iterator = tqdm(val_dataloader, desc='Evaluation...', total=len(val_dataloader))
    with torch.no_grad():
        for batch in val_iterator:
            labels.append(batch['labels'])
            batch = {item:value.to(config['device']) for item, value in batch.items()}
            
            # val output (loss, out)
            loss, logits = model(**batch)[:2]
            val_loss += loss.item()
            
            preds.append(logits.argmax(dim=-1).detach().cpu())
            
    avg_val_loss = val_loss/len(val_dataloader)
    labels = torch.cat(labels, dim=0).numpy()
    preds = torch.cat(preds, dim=0).numpy()
    
    precision = precision_score(labels, preds, average='macro')
    recall = recall_score(labels, preds, average='macro')
    f1 =f1_score(labels, preds, average='macro')
    accuracy = accuracy_score(labels, preds)
    
    return avg_val_loss, f1, precision, recall, accuracy
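
The evaluation above reports macro-averaged scores, meaning each class contributes equally regardless of how many examples it has. A minimal pure-Python sketch of macro F1 follows; `macro_f1` is a hypothetical helper, and treating an undefined precision or recall as 0 is an assumption chosen for this sketch.

```python
def macro_f1(labels, preds):
    """Unweighted mean of per-class F1 over all classes seen in labels or preds."""
    classes = sorted(set(labels) | set(preds))
    f1_scores = []
    for c in classes:
        tp = sum(1 for y, p in zip(labels, preds) if y == c and p == c)
        fp = sum(1 for y, p in zip(labels, preds) if y != c and p == c)
        fn = sum(1 for y, p in zip(labels, preds) if y == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    return sum(f1_scores) / len(f1_scores)


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")
```

A class that is never predicted gets an F1 of 0 here and pulls the macro average down, which is one reason macro F1 can sit below accuracy on imbalanced data.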

In [16]:
# NeZha model + Head train
from transformers import BertConfig, BertForSequenceClassification
from transformers import AdamW  # deprecated in recent transformers releases, which point to torch.optim.AdamW instead
from tqdm import trange

def train(config, train_dataloader, val_dataloader, model):

    optimizer = AdamW(model.parameters(), lr=config['learning_rate'])
    
    model.to(config['device'])
    
    epoches_iterator = trange(config['num_epochs'])
    global_steps = 0
    train_loss = 0.
    logging_loss = 0.
    
    best_f1 = 0.
    best_precision = 0.
    best_recall = 0.
    best_accuracy = 0.
    
    for epoch in epoches_iterator:
        train_iterator = tqdm(train_dataloader, desc='Training', total=len(train_dataloader))
        model.train()
        for batch in train_iterator:
            batch = {item:value.to(config['device']) for item, value in batch.items()}
            
            # train output (loss, out)
            loss = model(**batch)[0]
            
            model.zero_grad()  # reset the parameter gradients
            loss.backward()  # backpropagation
            optimizer.step()  # update the parameters
            train_loss += loss.item()  # accumulate the loss
            global_steps += 1
            
            if global_steps % config['logging_step'] == 0:
                print_train_loss = (train_loss - logging_loss) / config['logging_step']
                logging_loss = train_loss
                avg_val_loss, f1, precision, recall, accuracy = evaluation(config, model, val_dataloader)
                
                if best_f1 < f1:
                    best_f1 = f1
                    best_precision = precision
                    best_recall = recall
                    best_accuracy = accuracy
                    print_log = f'''>>> training loss: {print_train_loss: .4f}, valid loss: {avg_val_loss: .4f}\n
                            valid f1 score: {f1: .4f}, valid precision score: {precision: .4f},
                            valid recall score: {recall: .4f}, valid accuracy score: {accuracy: .4f}'''
                    print(print_log)
                    model.save_pretrained('../../../pt_tmp/nezha_head_base_chinese')
                    
                model.train()
                
    return best_f1, best_precision, best_recall, best_accuracy

In [18]:
# First run: start from the pre-trained NeZha checkpoint
# bert_config = NeZhaConfig.from_pretrained(config['model_path'])
# bert_config.output_hidden_states = True
# bert_config.num_labels = len(id2label)
# model = build_model(config['model_path'], bert_config, config['head'])
# f1, precision, recall, accuracy = train(config, train_dataloader, val_dataloader, model)
# print_log = f'''valid f1 score: {f1: .4f}, valid precision score: {precision: .4f},
#                 valid recall score: {recall: .4f}, valid accuracy score: {accuracy: .4f}'''
# print(print_log)

# Continued training: resume from the checkpoint saved by train()
# Note: loading a NeZha checkpoint through BertConfig produces the model-type warning
# shown in the output; NeZhaConfig, as in the first-run code, would match the checkpoint.
# NeZhaModel.from_pretrained() restores only the backbone, so the classifier head starts
# from fresh weights (see the "were not used" message in the output).
bert_config = BertConfig.from_pretrained('../../../pt_tmp/nezha_head_base_chinese')
bert_config.output_hidden_states = True
bert_config.num_labels = len(id2label)
model = build_model('../../../pt_tmp/nezha_head_base_chinese', bert_config, config['head'])
f1, precision, recall, accuracy = train(config, train_dataloader, val_dataloader, model)
print_log = f'''valid f1 score: {f1: .4f}, valid precision score: {precision: .4f},
                valid recall score: {recall: .4f}, valid accuracy score: {accuracy: .4f}'''
print(print_log)

You are using a model of type nezha to instantiate a model of type bert. This is not supported for all configurations of models and can yield errors.
Some weights of the model checkpoint at ../../../pt_tmp/nezha_head_base_chinese were not used when initializing NeZhaModel: ['classifier.conv.weight', 'classifier.fc.bias', 'classifier.conv.bias', 'classifier.fc.weight']
- This IS expected if you are initializing NeZhaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing NeZhaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
[tqdm progress output trimmed: 3002 training batches per epoch at roughly 4 s/it; each evaluation pass covers 334 validation batches at roughly 1 s/it]

>>> training loss:  1.3401, valid loss:  1.3036

                            valid f1 score:  0.5223, valid precision score:  0.5198,
                            valid recall score:  0.5371, valid accuracy score:  0.5566



>>> training loss:  1.0515, valid loss:  1.2484
valid f1 score:  0.5496, valid precision score:  0.5575,
valid recall score:  0.5497, valid accuracy score:  0.5560



Evaluation...: 100%|██████████████████████████████████████| 334/334 [06:01<00:00,  1.08s/it]

>>> training loss:  1.0400, valid loss:  1.2364
valid f1 score:  0.5524, valid precision score:  0.5774,
valid recall score:  0.5439, valid accuracy score:  0.5622



Training: 100%|██████████████████████████████████████| 3002/3002 [11:35:06<00:00, 13.89s/it]
100%|███████████████████████████████████████████████████| 1/1 [11:35:06<00:00, 41706.75s/it]


valid f1 score:  0.5524, valid precision score:  0.5774,
valid recall score:  0.5439, valid accuracy score:  0.5622
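
The validation scores above report recall (0.5439) separately from accuracy (0.5622). In single-label multiclass classification, micro-averaged and support-weighted recall both equal accuracy, so these numbers are consistent with macro averaging, although the metric code is not shown in this section. The following is a minimal pure-Python sketch under that assumption; `macro_scores` is a hypothetical helper name, not a function from the notebook.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return (precision, recall, f1, accuracy), macro-averaged over all classes.

    Assumption: macro averaging, i.e. each class contributes equally
    regardless of how many samples it has.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    accuracy = sum(tp.values()) / len(y_true)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n, accuracy


# Toy labels for illustration only (not taken from the dataset).
y_true = [100, 100, 101, 101, 102, 102]
y_pred = [100, 101, 101, 101, 102, 100]
precision, recall, f1, accuracy = macro_scores(y_true, y_pred)
print(f"f1: {f1:.4f}, precision: {precision:.4f}, "
      f"recall: {recall:.4f}, accuracy: {accuracy:.4f}")
```

With imbalanced classes, macro recall and accuracy diverge, which matches the pattern in the logged scores.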


## Predict and Save Results

In [18]:
def predict(config, id2label, model, test_dataloader):
    """Run inference on the test set and write the predicted labels to submission.csv."""
    test_iterator = tqdm(test_dataloader, desc='Testing', total=len(test_dataloader))
    model.eval()  # switch off dropout for inference
    test_preds = []

    with torch.no_grad():  # gradients are not needed at inference time
        for batch in test_iterator:
            # Move every tensor in the batch to the configured device.
            batch = {item: value.to(config['device']) for item, value in batch.items()}

            # The logits are taken from position 1 of the model output.
            logits = model(**batch)[1]
            test_preds.append(logits.argmax(dim=-1).detach().cpu())

    # Concatenate the per-batch class ids and map them back to the original labels.
    test_preds = torch.cat(test_preds, dim=0).numpy()
    test_preds = [id2label[id_] for id_ in test_preds]

    test_df = pd.read_csv(config['test_file_path'], sep=',')
    # test_df.insert(1, column=['label_pred'], value=test_preds)
    test_df['label_pred'] = test_preds
    # test_df.drop(columns=['sentence'], inplace=True)
    test_df.to_csv('submission.csv', index=False, encoding='utf8')

In [19]:
predict(config, id2label, best_model, test_dataloader)

Testing: 100%|████████████████████████████████████████████| 625/625 [10:44<00:00,  1.03s/it]
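
The decoding step inside `predict` (argmax per row, concatenate across batches, map ids through `id2label`) can be illustrated without a model. The sketch below uses plain Python lists in place of tensors; `decode_predictions` and the toy logits are hypothetical and only mirror the logic of the function above.

```python
def decode_predictions(batched_logits, id2label):
    """Map per-batch logit rows to original labels via argmax.

    batched_logits: list of batches, each a list of rows of class scores.
    id2label: dict from contiguous class id to original label.
    """
    preds = []
    for batch in batched_logits:
        for row in batch:
            # Index of the highest score in this row (argmax over classes).
            best = max(range(len(row)), key=row.__getitem__)
            preds.append(id2label[best])
    return preds


# Toy example: three classes, two batches (values are illustrative only).
toy_id2label = {0: 100, 1: 101, 2: 102}
toy_logits = [
    [[0.1, 2.0, -1.0], [3.0, 0.2, 0.1]],  # batch of 2 samples
    [[-0.5, -0.2, 0.9]],                  # batch of 1 sample
]
decoded = decode_predictions(toy_logits, toy_id2label)
print(decoded)
```

The output has one label per test sample in dataloader order, which is why the test dataloader should not shuffle if the predictions are written next to the rows of the test CSV.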


In [20]:
test_df = pd.read_csv(config['test_file_path'], sep=',')

In [21]:
train_df = pd.read_csv(config['train_file_path'], sep=',')

In [22]:
train_df.head(10)

| Unnamed: 0 | id | label | label_desc | sentence |
|---|---|---|---|---|
| 0 | 0 | 108 | news_edu | 上课时学生手机响个不停，老师一怒之下把手机摔了，家长拿发票让老师赔，大家怎么看待这种事？ |
| 1 | 1 | 104 | news_finance | 商赢环球股份有限公司关于延期回复上海证券交易所对公司2017年年度报告的事后审核问询函的公告 |
| 2 | 2 | 106 | news_house | 通过中介公司买了二手房，首付都付了，现在卖家不想卖了。怎么处理？ |
| 3 | 3 | 112 | news_travel | 2018年去俄罗斯看世界杯得花多少钱？ |
| 4 | 4 | 109 | news_tech | 剃须刀的个性革新，雷明登天猫定制版新品首发 |
| 5 | 5 | 103 | news_sports | 再次证明了“无敌是多么寂寞”——逆天的中国乒乓球队！ |
| 6 | 6 | 109 | news_tech | 三农盾SACC-全球首个推出：互联网+区块链+农产品的电商平台 |
| 7 | 7 | 116 | news_game | 重做or新英雄？其实重做对暴雪来说同样重要 |
| 8 | 8 | 103 | news_sports | 如何在商业活动中不受人欺骗？ |
| 9 | 9 | 101 | news_culture | 87版红楼梦最温柔的四个丫鬟，娶谁都是一生的福气 |


In [23]:
train_df['label'].unique()

array([108, 104, 106, 112, 109, 103, 116, 101, 107, 100, 102, 110, 115,
       113, 114])
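
The 15 label codes above are not contiguous (105 and 111 do not occur), so a classifier head with 15 outputs needs a mapping between the raw codes and class ids 0 to 14. The sketch below shows one way to build it; sorting the codes is an assumption made here for reproducibility, and the notebook's own `id2label` may use a different order.

```python
# Raw label codes as returned by train_df['label'].unique() above.
raw_labels = [108, 104, 106, 112, 109, 103, 116, 101,
              107, 100, 102, 110, 115, 113, 114]

# Assumption: ids are assigned in ascending order of the raw code.
label2id = {label: idx for idx, label in enumerate(sorted(raw_labels))}
id2label = {idx: label for label, idx in label2id.items()}

# Round trip: every raw code maps to an id and back to itself.
assert all(id2label[label2id[label]] == label for label in raw_labels)

print(len(label2id))  # number of classes
print(label2id[100], label2id[116])  # smallest and largest raw codes
```

Whatever order is chosen, the same mapping has to be used for training and for decoding predictions, otherwise the labels written to the submission file would be shifted.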