## 对抗训练
![对抗样本](https://img-blog.csdnimg.cn/20210713144115288.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80MTI4NzA2MA==,size_16,color_FFFFFF,t_70)



![对抗训练](https://img-blog.csdnimg.cn/20210716113626200.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80MTI4NzA2MA==,size_16,color_FFFFFF,t_70)

### 对抗训练方法
**Fast Gradient Method(FGM)**\
**Projected Gradient Descent(PGD)** 

In [1]:
!nvidia-smi

Mon May 30 06:32:36 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   41C    P8    10W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [2]:
from google.colab import drive
import sys
drive.mount('/content/drive')
#设置路径
sys.path.append('/content/drive/MyDrive/Colab Notebooks')

Mounted at /content/drive


In [3]:
! pip install transformers==4.0.1

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers==4.0.1
  Downloading transformers-4.0.1-py3-none-any.whl (1.4 MB)
[K     |████████████████████████████████| 1.4 MB 5.1 MB/s 
Collecting tokenizers==0.9.4
  Downloading tokenizers-0.9.4-cp37-cp37m-manylinux2010_x86_64.whl (2.9 MB)
[K     |████████████████████████████████| 2.9 MB 39.5 MB/s 
[?25hCollecting sacremoses
  Downloading sacremoses-0.0.53.tar.gz (880 kB)
[K     |████████████████████████████████| 880 kB 25.5 MB/s 
Building wheels for collected packages: sacremoses
  Building wheel for sacremoses (setup.py) ... [?25l[?25hdone
  Created wheel for sacremoses: filename=sacremoses-0.0.53-py3-none-any.whl size=895260 sha256=40a9d7bcf942cb2bb3b6105a011e86e2bc245818e74e7149e1b260fa19478e37
  Stored in directory: /root/.cache/pip/wheels/87/39/dd/a83eeef36d0bf98e7a4d1933a4ad2d660295a40613079bafc9
Successfully built sacremoses
Installing collected packages: token

In [4]:
! pip install torch==1.4.0

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting torch==1.4.0
  Downloading torch-1.4.0-cp37-cp37m-manylinux1_x86_64.whl (753.4 MB)
[K     |████████████████████████████████| 753.4 MB 6.6 kB/s 
[?25hInstalling collected packages: torch
  Attempting uninstall: torch
    Found existing installation: torch 1.11.0+cu113
    Uninstalling torch-1.11.0+cu113:
      Successfully uninstalled torch-1.11.0+cu113
[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchvision 0.12.0+cu113 requires torch==1.11.0, but you have torch 1.4.0 which is incompatible.
torchtext 0.12.0 requires torch==1.11.0, but you have torch 1.4.0 which is incompatible.
torchaudio 0.11.0+cu113 requires torch==1.11.0, but you have torch 1.4.0 which is incompatible.[0m
Successfully installed torch-1.4.0


In [5]:
import torch
import random
import numpy as np
import pandas as pd
from tqdm import tqdm

config = {
    'train_file_path':'/content/drive/MyDrive/Colab Notebooks/dataset/train.csv',
    'test_file_path':'/content/drive/MyDrive/Colab Notebooks/dataset/test.csv',
    'train_val_ratio':0.1,
    'model_path':'/content/drive/MyDrive/Colab Notebooks/dataset/NeZha_model',
    'batch_size':16,
    'head': 'CNN',
    'num_epochs':1,
    'warmup_ratio':0.1, # warm up
    'eps':0.1,    #对抗模型需要的参数
    'alpha':0.3,   #pgd需要的参数
    'adv':'fgm',   #对抗训练的方法
    'learning_rate':2e-5,
    'logging_step':500,
    'seed':2022
}

config['device'] = 'cuda' if torch.cuda.is_available() else 'cpu'

def seed_everything(seed):
  random.seed(seed)
  np.random.seed(seed)
  torch.manual_seed(seed)
  torch.cuda.manual_seed_all(seed)
  return seed

seed_everything(config['seed'])

2022

In [6]:
from collections import defaultdict
def read_data(config, tokenizer, mode = 'train'):
  data_df = pd.read_csv(config[f'{mode}_file_path'], sep=',')
  if mode == 'train':
    X_train, y_train = defaultdict(list),[]
    X_val, y_val = defaultdict(list),[]
    num_val = int(len(data_df) * config['train_val_ratio'])
  else:
    X_test, y_test = defaultdict(list),[]

  for i, row in tqdm(data_df.iterrows(), desc=f'preprocess {mode} data', colour = 'blue', total = len(data_df)):
    label = row[1] if mode == 'train' else 0
    sentence = row[-1]

    inputs = tokenizer.encode_plus(sentence, add_special_tokens = True, return_token_type_ids = True, return_attention_mask = True)

    if mode == 'train':
      if i < num_val:
        X_val['inputs_ids'].append(inputs['input_ids'])
        y_val.append(label)
        X_val['token_type_ids'].append(inputs['token_type_ids'])
        X_val['attention_mask'].append(inputs['attention_mask'])
      else:
        X_train['inputs_ids'].append(inputs['input_ids'])
        y_train.append(label)
        X_train['token_type_ids'].append(inputs['token_type_ids'])
        X_train['attention_mask'].append(inputs['attention_mask'])

    else:
        X_test['inputs_ids'].append(inputs['input_ids'])
        y_test.append(label)
        X_test['token_type_ids'].append(inputs['token_type_ids'])
        X_test['attention_mask'].append(inputs['attention_mask'])

  if mode == 'train':
    label2id = {label: i for i, label in enumerate(np.unique(y_train))}
    id2label = {i: label for label, i in label2id.items()}

    y_train = torch.tensor([label2id[i] for i in y_train], dtype =torch.long)

    y_val = torch.tensor([label2id[i] for i in y_val], dtype =torch.long)
    return X_train, y_train, X_val, y_val, label2id, id2label

  else:
    y_test = torch.tensor(y_test, dtype = torch.long)
    return X_test, y_test


In [7]:
from torch.utils.data import Dataset
class TNEWSData(Dataset):
  def __init__(self, X, y):
    self.x = X
    self.y = y
  
 
  def __getitem__(self, idx):
    return{
        'inputs_ids': self.x['inputs_ids'][idx],
        'label':self.y[idx],
        'token_type_ids':self.x['token_type_ids'][idx],
        'attention_mask':self.x['attention_mask'][idx]

    }

 
  def __len__(self):
    return self.y.size(0)

In [8]:
def collate_fn(example):
  input_ids_list = []
  labels = []
  token_type_ids_list = []
  attention_mask_list = []

  for ex in example:
    input_ids_list.append(ex['inputs_ids'])
    labels.append(ex['label'])
    token_type_ids_list.append(ex['token_type_ids'])
    attention_mask_list.append(ex['attention_mask'])

  max_len = max(len(input_ids) for input_ids in input_ids_list)
  input_ids_tensor = torch.zeros((len(labels), max_len),dtype=torch.long)
  token_type_ids_tensor = torch.zeros_like(input_ids_tensor)
  attention_mask_tensor = torch.zeros_like(input_ids_tensor)

  for i, input_ids in enumerate(input_ids_list):
    input_ids_tensor[i, :len(input_ids)] = torch.tensor(input_ids, dtype = torch.long)
    token_type_ids_tensor[i, :len(input_ids)] = torch.tensor(token_type_ids_list[i], dtype = torch.long)
    attention_mask_tensor[i, :len(input_ids)] = torch.tensor(attention_mask_list[i], dtype = torch.long)

  return {
      'input_ids': input_ids_tensor,
      'labels': torch.tensor(labels ,dtype= torch.long),
      'token_type_ids':token_type_ids_tensor,
      'attention_mask':attention_mask_tensor
  }  

In [9]:
from transformers import BertTokenizer
from torch.utils.data import DataLoader
def build_dataloader(config):
  tokenizer = BertTokenizer.from_pretrained(config['model_path'])
  X_train, y_train, X_val, y_val, label2id, id2label = read_data(config, tokenizer, mode='train')
  X_test, y_test = read_data(config, tokenizer, mode='test')

  train_dataset = TNEWSData(X_train, y_train)
  val_dataset = TNEWSData(X_val, y_val)
  test_dataset = TNEWSData(X_test, y_test)

  train_dataloader = DataLoader(train_dataset, batch_size=config['batch_size'], num_workers = 4, shuffle = True, collate_fn=collate_fn)
  val_dataloader = DataLoader(val_dataset, batch_size=config['batch_size'], num_workers = 4, shuffle = False, collate_fn=collate_fn)
  test_dataloader = DataLoader(test_dataset, batch_size=config['batch_size'], num_workers = 4, shuffle = False, collate_fn=collate_fn)

  return train_dataloader, val_dataloader, test_dataloader, id2label


In [10]:
train_dataloader, val_dataloader, test_dataloader, id2label = build_dataloader(config)

preprocess train data: 100%|[34m██████████[0m| 53360/53360 [00:33<00:00, 1611.89it/s]
preprocess test data: 100%|[34m██████████[0m| 10000/10000 [00:04<00:00, 2475.15it/s]


In [11]:
import torch.nn.functional as F
import torch.nn as nn
from NeZha import *
class NeZhaForTNEWS(NeZhaPreTrainedModel):
  def __init__(self, config, model_path, classifier):
    super(NeZhaForTNEWS, self).__init__(config)

    self.nezha = NeZhaModel.from_pretrained(model_path, config=config)
    self.classifier = classifier  # head
    self.config = config

  def forward(self, input_ids, token_type_ids, attention_mask, labels):
    outputs = self.nezha(input_ids = input_ids,
                token_type_ids = token_type_ids,
                attention_mask = attention_mask)
    hidden_states = outputs[2]

    logits = self.classifier(hidden_states, input_ids)

    outputs = (logits, )

    if labels is not None:
      loss_fct = FocalLoss(num_classes=self.config.num_labels)
      loss = loss_fct(logits, labels.view(-1))
      outputs = (loss, )+ outputs

    return outputs

In [12]:
from typing import List
class ConvClassifier(nn.Module):
  def __init__(self, config):
    super().__init__()
    self.conv = nn.Conv1d(in_channels = config.hidden_size, out_channels = config.hidden_size, kernel_size = 3, padding=(3 - 1) // 2)
    self.global_max_pool = nn.AdaptiveMaxPool1d(1)
    self.dropout = nn.Dropout(config.hidden_dropout_prob)
    self.fc = nn.Linear(config.hidden_size, config.num_labels)

  def forward(self, hidden_states: List[torch.Tensor], input_ids: torch.Tensor):
    hidden_states = self.dropout(hidden_states[-1])
    hidden_states = hidden_states.permute(0 ,2, 1)

    out = F.relu(self.conv(hidden_states))
    out = self.global_max_pool(out).squeeze(dim=2)
    out = self.fc(out)
    
    return out

In [13]:
def build_model(model_path, config, head):
  heads = {
      'CNN': ConvClassifier
  }
  assert head in heads ,"head must have been implemented"
  print(f'>>> You are using {head} head, please wait...')
  model = NeZhaForTNEWS(config, model_path, heads[head](config))
  return model

In [14]:
from sklearn.metrics import f1_score
def evaluation(config, model, val_dataloader):
  model.eval()
  preds = []
  labels = []
  val_loss = 0.
  val_iterator = tqdm(val_dataloader, desc='Evaluation', total=len(val_dataloader))

  with torch.no_grad():
    for batch in val_iterator:
      labels.append(batch['labels'])
      batch = {item: value.to(config['device']) for item, value in batch.items()}
      loss, logits = model(**batch)[:2]
      val_loss += loss.item()
      preds.append(logits.argmax(dim = -1).detach().cpu())

  avg_val_loss = val_loss / len(val_dataloader)
  labels = torch.cat(labels, dim = 0).numpy()
  preds = torch.cat(preds, dim = 0).numpy()
  f1 = f1_score(labels, preds, average='macro')
  return avg_val_loss, f1

### 对抗样本：对人类看起来一样，对模型来说预测结果却完全不一样的样本。

## FGM-Fast Gradient Method
对于每个x:
1. 计算x的前向loss, 反向传播得到梯度；
2. 根据embeddign矩阵计算的梯度计算出r, 并加到当前embedding上，相当于x+r
3. 计算x+r的前向loss, 反向传播得到梯度，然后累加到(1)的梯度上；
4. 将embedding恢复为（1）时的embedding；
5. 根据（3）的梯度对参数进行更新。

## PGD-Projected Gradient Descent
FGM是一下子算出了对抗扰动，这样得到的扰动不一定是最优的。因此PGD进行了改进，多迭代了K(t)次，慢慢找到最优的扰动
对于每个x:

1.计算x的前向loss, 反向传播得到梯度；
  对于每步t：
  
  2. 根据embeddign矩阵计算的梯度计算出r, 并加到当前embedding上，相当于x+r；
  
  3. t如果不是最后一步，将梯度归0， 根据（2）的x+r计算前后向并得到梯度
  
  4. t是最后一步，恢复1的梯度，计算最后的x+r并将梯度累加到(1)上

5.将embedding恢复为（1）时的embedding；

6.根据（4）的梯度对参数进行更新。

In [15]:
from extra_loss import *
from extra_optim import *
from extra_fgm import *
from extra_pgd import *
from transformers import AdamW
from tqdm import trange


def train(config, id2label, train_dataloader, val_dataloader):
  nezha_config = NeZhaConfig.from_pretrained(config['model_path'])
  nezha_config.output_hidden_states = True
  nezha_config.num_labels = len(id2label)

  model = build_model(config['model_path'], nezha_config, config['head'])                                    

  

  # 得到模型的参数
  optimizer_grouped_parameters = model.parameters()
  # 定义一个基优化器
  optimizer = AdamW(model.parameters(), lr= config['learning_rate'])
  # Lookahead要有一个基优化器， k=5, alpha=1
  optimizer = Lookahead(optimizer, 5, 1)
  total_steps = config['num_epochs'] * len(train_dataloader)
  # 每调用warmup_steps次， 对应的学习率就会调整一次
  lr_scheduler = WarmupLinearSchedule(optimizer, warmup_steps = int(config['learning_rate'] * total_steps), t_total = total_steps)
  
  model.to(config['device'])                                                                 


### ----- adversarial -------#
  if config['adv'] == 'fgm':
    fgm = FGM(model)
  else:
    pgd = PGD(model)
    K = 3
### ----- adversarial -------#

  epoch_iterator = trange(config['num_epochs'])
  global_steps = 0
  train_loss = 0.
  logging_loss = 0.

  for epoch in epoch_iterator:
    train_iterator = tqdm(train_dataloader, desc='Training', total=len(train_dataloader))
    model.train()
    for batch in train_iterator:
      batch = {item: value.to(config['device']) for item, value in batch.items()}
      # 计算x的前项loss
      loss = model(**batch)[0]
      model.zero_grad()
      # 反向传播得到梯度
      loss.backward()

### ----- adversarial -------#
      if config['adv'] == 'fgm':
        # 计算x+r的前向loss, 反向传播得到梯度，然后累加到(1)的梯度上；
        fgm.attack(epsilon = config['eps'])
        # 计算x+r的前向loss
        loss_adv = model(**batch)[0]
        # 反向传播得到梯度，然后累加到(1)的梯度上；
        loss_adv.backward()
         #将embedding恢复为（1）时的embedding；
        fgm.restore()
      else:
        pgd.backup_grad()
        for t in range(K):
          fgm.attack(epsilon=config(['eps'], alpha=config['alpha'], is_first_attack=(t == 0)))
          if t != K - 1:
            model.zero_grad()
          else:
            pgd.restore_grad()
          loss_adv = model(**batch)[0]
          loss_adv.backward()
        pgd.restore()
### ----- adversarial -------#

      optimizer.step()
      lr_scheduler.step()

      train_loss += loss.item()
      global_steps += 1

      if global_steps % config['logging_step'] == 0:
        print_train_loss = (train_loss - logging_loss) / config['logging_step']
        logging_loss = train_loss
        avg_val_loss, f1 = evaluation(config, model, val_dataloader)

        print_log = f'>>>traing loss:{print_train_loss: .5f}, valid loss:{avg_val_loss: .5f}, valid f1 score:{f1: .5f}'
        print(print_log)
        model.train()

  return model    

In [16]:
model = train(config, id2label, train_dataloader, val_dataloader)

>>> You are using CNN head, please wait...


Some weights of NeZhaModel were not initialized from the model checkpoint at /content/drive/MyDrive/Colab Notebooks/dataset/NeZha_model and are newly initialized: ['bert.encoder.layer.0.attention.self.relative_positions_encoding.positions_encoding', 'bert.encoder.layer.1.attention.self.relative_positions_encoding.positions_encoding', 'bert.encoder.layer.2.attention.self.relative_positions_encoding.positions_encoding', 'bert.encoder.layer.3.attention.self.relative_positions_encoding.positions_encoding', 'bert.encoder.layer.4.attention.self.relative_positions_encoding.positions_encoding', 'bert.encoder.layer.5.attention.self.relative_positions_encoding.positions_encoding', 'bert.encoder.layer.6.attention.self.relative_positions_encoding.positions_encoding', 'bert.encoder.layer.7.attention.self.relative_positions_encoding.positions_encoding', 'bert.encoder.layer.8.attention.self.relative_positions_encoding.positions_encoding', 'bert.encoder.layer.9.attention.self.relative_positions_encodi

>>>traing loss: 0.92146, valid loss: 0.71168, valid f1 score: 0.47703



Training:  17%|█▋        | 501/3002 [02:17<2:18:10,  3.31s/it][A
Training:  17%|█▋        | 502/3002 [02:17<1:39:27,  2.39s/it][A
Training:  17%|█▋        | 503/3002 [02:18<1:12:46,  1.75s/it][A
Training:  17%|█▋        | 504/3002 [02:18<54:11,  1.30s/it]  [A
Training:  17%|█▋        | 505/3002 [02:18<41:14,  1.01it/s][A
Training:  17%|█▋        | 506/3002 [02:18<32:12,  1.29it/s][A
Training:  17%|█▋        | 507/3002 [02:19<26:09,  1.59it/s][A
Training:  17%|█▋        | 508/3002 [02:19<21:24,  1.94it/s][A
Training:  17%|█▋        | 509/3002 [02:19<18:10,  2.29it/s][A
Training:  17%|█▋        | 510/3002 [02:19<15:54,  2.61it/s][A
Training:  17%|█▋        | 511/3002 [02:20<14:27,  2.87it/s][A
Training:  17%|█▋        | 512/3002 [02:20<13:22,  3.10it/s][A
Training:  17%|█▋        | 513/3002 [02:20<12:21,  3.36it/s][A
Training:  17%|█▋        | 514/3002 [02:20<11:37,  3.56it/s][A
Training:  17%|█▋        | 515/3002 [02:21<11:09,  3.71it/s][A
Training:  17%|█▋        | 516/

>>>traing loss: 0.70314, valid loss: 0.65117, valid f1 score: 0.49247



Training:  33%|███▎      | 1001/3002 [04:41<1:48:19,  3.25s/it][A
Training:  33%|███▎      | 1002/3002 [04:41<1:18:06,  2.34s/it][A
Training:  33%|███▎      | 1003/3002 [04:41<56:59,  1.71s/it]  [A
Training:  33%|███▎      | 1004/3002 [04:41<42:44,  1.28s/it][A
Training:  33%|███▎      | 1005/3002 [04:42<33:03,  1.01it/s][A
Training:  34%|███▎      | 1006/3002 [04:42<25:37,  1.30it/s][A
Training:  34%|███▎      | 1007/3002 [04:42<20:24,  1.63it/s][A
Training:  34%|███▎      | 1008/3002 [04:42<16:50,  1.97it/s][A
Training:  34%|███▎      | 1009/3002 [04:43<14:18,  2.32it/s][A
Training:  34%|███▎      | 1010/3002 [04:43<12:29,  2.66it/s][A
Training:  34%|███▎      | 1011/3002 [04:43<11:25,  2.90it/s][A
Training:  34%|███▎      | 1012/3002 [04:43<10:24,  3.19it/s][A
Training:  34%|███▎      | 1013/3002 [04:44<10:26,  3.17it/s][A
Training:  34%|███▍      | 1014/3002 [04:44<10:10,  3.25it/s][A
Training:  34%|███▍      | 1015/3002 [04:44<09:39,  3.43it/s][A
Training:  34%|███

>>>traing loss: 0.63732, valid loss: 0.61072, valid f1 score: 0.54065



Training:  50%|█████     | 1501/3002 [07:05<1:22:07,  3.28s/it][A
Training:  50%|█████     | 1502/3002 [07:05<59:12,  2.37s/it]  [A
Training:  50%|█████     | 1503/3002 [07:05<43:18,  1.73s/it][A
Training:  50%|█████     | 1504/3002 [07:06<32:19,  1.29s/it][A
Training:  50%|█████     | 1505/3002 [07:06<24:29,  1.02it/s][A
Training:  50%|█████     | 1506/3002 [07:06<19:26,  1.28it/s][A
Training:  50%|█████     | 1507/3002 [07:06<15:23,  1.62it/s][A
Training:  50%|█████     | 1508/3002 [07:07<12:47,  1.95it/s][A
Training:  50%|█████     | 1509/3002 [07:07<10:57,  2.27it/s][A
Training:  50%|█████     | 1510/3002 [07:07<09:26,  2.63it/s][A
Training:  50%|█████     | 1511/3002 [07:08<08:37,  2.88it/s][A
Training:  50%|█████     | 1512/3002 [07:08<08:13,  3.02it/s][A
Training:  50%|█████     | 1513/3002 [07:08<07:44,  3.20it/s][A
Training:  50%|█████     | 1514/3002 [07:08<07:11,  3.45it/s][A
Training:  50%|█████     | 1515/3002 [07:09<06:56,  3.57it/s][A
Training:  50%|█████

>>>traing loss: 0.60327, valid loss: 0.59581, valid f1 score: 0.52737



Training:  67%|██████▋   | 2001/3002 [09:29<54:26,  3.26s/it]  [A
Training:  67%|██████▋   | 2002/3002 [09:30<39:12,  2.35s/it][A
Training:  67%|██████▋   | 2003/3002 [09:30<28:30,  1.71s/it][A
Training:  67%|██████▋   | 2004/3002 [09:30<21:22,  1.29s/it][A
Training:  67%|██████▋   | 2005/3002 [09:30<16:07,  1.03it/s][A
Training:  67%|██████▋   | 2006/3002 [09:31<12:36,  1.32it/s][A
Training:  67%|██████▋   | 2007/3002 [09:31<10:02,  1.65it/s][A
Training:  67%|██████▋   | 2008/3002 [09:31<08:13,  2.01it/s][A
Training:  67%|██████▋   | 2009/3002 [09:31<07:05,  2.33it/s][A
Training:  67%|██████▋   | 2010/3002 [09:32<06:15,  2.64it/s][A
Training:  67%|██████▋   | 2011/3002 [09:32<05:38,  2.93it/s][A
Training:  67%|██████▋   | 2012/3002 [09:32<05:14,  3.15it/s][A
Training:  67%|██████▋   | 2013/3002 [09:32<04:55,  3.35it/s][A
Training:  67%|██████▋   | 2014/3002 [09:33<04:41,  3.51it/s][A
Training:  67%|██████▋   | 2015/3002 [09:33<04:44,  3.47it/s][A
Training:  67%|██████▋

>>>traing loss: 0.59270, valid loss: 0.59022, valid f1 score: 0.53475



Training:  83%|████████▎ | 2501/3002 [11:54<27:18,  3.27s/it][A
Training:  83%|████████▎ | 2502/3002 [11:55<19:48,  2.38s/it][A
Training:  83%|████████▎ | 2503/3002 [11:55<14:25,  1.73s/it][A
Training:  83%|████████▎ | 2504/3002 [11:55<10:47,  1.30s/it][A
Training:  83%|████████▎ | 2505/3002 [11:55<08:08,  1.02it/s][A
Training:  83%|████████▎ | 2506/3002 [11:56<06:19,  1.31it/s][A
Training:  84%|████████▎ | 2507/3002 [11:56<04:59,  1.65it/s][A
Training:  84%|████████▎ | 2508/3002 [11:56<04:10,  1.97it/s][A
Training:  84%|████████▎ | 2509/3002 [11:57<03:30,  2.35it/s][A
Training:  84%|████████▎ | 2510/3002 [11:57<03:10,  2.59it/s][A
Training:  84%|████████▎ | 2511/3002 [11:57<02:50,  2.88it/s][A
Training:  84%|████████▎ | 2512/3002 [11:57<02:35,  3.14it/s][A
Training:  84%|████████▎ | 2513/3002 [11:58<02:23,  3.41it/s][A
Training:  84%|████████▎ | 2514/3002 [11:58<02:16,  3.57it/s][A
Training:  84%|████████▍ | 2515/3002 [11:58<02:14,  3.62it/s][A
Training:  84%|████████▍

>>>traing loss: 0.56590, valid loss: 0.57600, valid f1 score: 0.54563



Training: 100%|█████████▉| 3001/3002 [14:18<00:03,  3.25s/it][A
Training: 100%|██████████| 3002/3002 [14:18<00:00,  3.50it/s]
100%|██████████| 1/1 [14:18<00:00, 858.66s/it]


In [17]:
def prediction(config, id2label, model, test_dataloader):
  test_iterator = tqdm(test_dataloader, desc='Prediction', total = len(test_dataloader))
  model.eval()
  test_preds = []

  with torch.no_grad():
    for batch in test_iterator:
      batch = {item: value.to(config['device']) for item, value in batch.items()}
      logits = model(**batch)[1]
      test_preds.append(logits.argmax(dim=-1).detach().cpu())
  
  test_preds = torch.cat(test_preds, dim=0).numpy()
  test_preds = [id2label[id_] for id_ in test_preds]

  test_df = pd.read_csv(config['test_file_path'], sep=',')
  test_df.insert(1, column='label', value=test_preds)
  test_df.drop(['sentence'], 1,inplace=True)#1表示按列删除
  test_df.to_csv('submission_Nezha_adversarial_training.csv', index=False, encoding= 'utf8')

In [18]:
prediction(config, id2label, model, test_dataloader)

Prediction: 100%|██████████| 625/625 [00:25<00:00, 24.84it/s]
