# Document-level Relation Extraction Tutorial

> Tutorial author: 黎洲波（zhoubo.li@zju.edu.cn）

In this tutorial, we use [DocuNet](http://arxiv.org/abs/2106.03618) to extract relational triples whose entities may appear in different sentences of a document. We hope it helps you understand the process of document-level relation extraction.

This tutorial uses `Python3`.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
!nvidia-smi

Sat Apr 22 08:44:50 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA A100-SXM...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   32C    P0    46W / 400W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               

## RE
**Relation extraction** (RE), a key task in information extraction, predicts semantic relations between pairs of entities in unstructured text.

## Document-level RE
Document-level RE extracts relations from multiple sentences within one document. An example is shown in the following picture, in which named entities are highlighted in color. Unlike sentence-level RE, document-level RE can extract both intra-sentence and inter-sentence relational triples.
![Document-level relation extraction](https://github.com/zjunlp/DeepKE/blob/main/tutorial-notebooks/re/document/img/img1.png?raw=true)

## Dataset

Several document-level RE datasets exist, including DocRED, CDR and GDA. This tutorial uses [DocRED](https://github.com/thunlp/DocRED/tree/master/). The structure of the dataset folder `./data/` is as follows:

```
.
├── dev.json                        # Validation Set
├── rel_info.json                   # Relation Label
├── rel2id.json                     # Relation Label - ID Map
├── test.json                       # Test Set
└── train_annotated.json            # Training Set
```

The data format of DocRED is as follows:

```
Data Format:
{
  'title': document title,
  'sents':     [
                  [words in sent 0],
                  [words in sent 1]
               ],
  'vertexSet': [
                  [
                    { 'name': mention_name,
                      'sent_id': index of the sentence that contains the mention,
                      'pos': [start, end) token position of the mention in that sentence,
                      'type': NER_type},
                    {another mention}
                  ],
                  [another entity]
                ],
  'labels':   [
                {
                  'h': idx of head entity in vertexSet,
                  't': idx of tail entity in vertexSet,
                  'r': relation,
                  'evidence': ids of the evidence sentences
                }
              ]
}
```
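
To make the format concrete, here is a small hand-written document in the same layout, together with a helper that lists its triples and marks each one as intra-sentence or inter-sentence. The sentences, the relation ids `P112` and `P159`, and the helper names `sentence_ids` and `describe_triples` are illustrative assumptions, not content taken from DocRED.

```python
# A toy document in the DocRED layout (hand-written, not from the dataset).
sample = {
    "title": "Toy example",
    "sents": [
        ["Alice", "founded", "Acme", "Corp", "."],
        ["The", "company", "is", "based", "in", "Paris", "."],
    ],
    "vertexSet": [
        [{"name": "Alice", "sent_id": 0, "pos": [0, 1], "type": "PER"}],
        [{"name": "Acme Corp", "sent_id": 0, "pos": [2, 4], "type": "ORG"}],
        [{"name": "Paris", "sent_id": 1, "pos": [5, 6], "type": "LOC"}],
    ],
    "labels": [
        {"h": 1, "t": 0, "r": "P112", "evidence": [0]},
        {"h": 1, "t": 2, "r": "P159", "evidence": [0, 1]},
    ],
}


def sentence_ids(entity):
    """Set of sentence indices in which any mention of the entity occurs."""
    return {mention["sent_id"] for mention in entity}


def describe_triples(doc):
    """Return (head, relation, tail, scope) for every labelled triple."""
    rows = []
    for label in doc["labels"]:
        head = doc["vertexSet"][label["h"]]
        tail = doc["vertexSet"][label["t"]]
        # A pair is intra-sentence if some sentence mentions both entities.
        shared = sentence_ids(head) & sentence_ids(tail)
        scope = "intra-sentence" if shared else "inter-sentence"
        rows.append((head[0]["name"], label["r"], tail[0]["name"], scope))
    return rows


for row in describe_triples(sample):
    print(row)
```

The second triple is inter-sentence because no single sentence mentions both `Acme Corp` and `Paris`; this is the case that sentence-level RE cannot cover.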

## DocuNet
- [DocuNet](http://arxiv.org/abs/2106.03618), the model used in DeepKE, treats document-level RE as a semantic segmentation problem borrowed from computer vision (CV): a Document U-shaped Network processes an entity-pair feature map. It performs strongly on the DocRED dataset.
- The framework of DocuNet is as follows:

![Architecture of document-level relation extraction](https://github.com/zjunlp/DeepKE/blob/main/tutorial-notebooks/re/document/img/img2.png?raw=true)
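
As a rough illustration of the U-shaped design, the sketch below traces channel counts and spatial sizes through the stages of the `AttentionUNet` class defined in the "Prepare the Model" section of this tutorial. It is plain arithmetic and does not run the network. The function name `trace_unet` is hypothetical, and the values used in the call (a 42 x 42 entity-pair map with 3 input channels and 256 output channels) are assumed settings, not values stated in this tutorial.

```python
def trace_unet(height, in_channels, down_channel=256, class_number=256):
    """Return (stage, channels, height) after each stage of the U-shaped module.

    The feature map is square, so only the height is tracked.
    """
    c1 = down_channel          # channels after InConv
    c2 = down_channel * 2      # channels after each DownLayer
    h1 = height                # 3x3 convs with padding=1 keep the size
    h2 = h1 // 2               # MaxPool2d(kernel_size=2) floors odd sizes
    h3 = h2 // 2
    return [
        ("input", in_channels, height),
        ("inc", c1, h1),
        ("down1", c2, h2),
        ("down2", c2, h3),
        # up1 upsamples x3, pads it to h2, and concatenates it with x2,
        # so it receives c2 + c2 channels and emits a quarter of that.
        ("up1", (c2 + c2) // 4, h2),
        # up2 concatenates x1 (c1 channels) with the up1 output (c1 channels).
        ("up2", (c1 + c1) // 4, h1),
        # outc is a 1x1 convolution onto the output dimension.
        ("outc", class_number, h1),
    ]


for stage in trace_unet(height=42, in_channels=3):
    print(stage)
```

With these assumed settings the map shrinks from 42 to 21 to 10 and is restored to 42, which is why `UpLayer` pads the upsampled tensor before concatenating: 10 doubles to 20, one short of 21.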

## Prepare the runtime environment

In [None]:
# !pip install deepke
# !wget 120.27.214.45/Data/re/document/data.tar.gz
# !tar -xzvf data.tar.gz

In [None]:
# ! cd /content/drive/MyDrive/Models; git clone https://github.com/zjunlp/DeepKE

Cloning into 'DeepKE'...
remote: Enumerating objects: 8949, done.
remote: Counting objects: 100% (743/743), done.
remote: Compressing objects: 100% (434/434), done.
remote: Total 8949 (delta 396), reused 594 (delta 294), pack-reused 8206
Receiving objects: 100% (8949/8949), 107.52 MiB | 16.87 MiB/s, done.
Resolving deltas: 100% (4510/4510), done.
Updating files: 100% (570/570), done.


In [None]:
# !pip install -r /content/drive/MyDrive/Models/DeepKE/requirements.txt

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting torch<=1.11,>=1.5
  Downloading torch-1.11.0-cp39-cp39-manylinux1_x86_64.whl (750.6 MB)
ERROR: Operation cancelled by user

In [None]:
!pip install -U torch ujson transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting ujson
  Downloading ujson-5.7.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (52 kB)
Collecting transformers
  Downloading transformers-4.28.1-py3-none-any.whl (7.0 MB)
Collecting huggingface-hub<1.0,>=0.11.0
  Downloading huggingface_hub-0.13.4-py3-none-any.whl (200 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)

## Import modules

In [None]:
import os
import pickle
import random
import time

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import ujson as json
from opt_einsum import contract
from torch.utils.data import DataLoader
from tqdm import tqdm
from transformers import AutoConfig, AutoModel, AutoTokenizer
from transformers.optimization import AdamW, get_linear_schedule_with_warmup

In [None]:
# os.chdir('../')

In [None]:
!pwd

/content


## Preprocess the dataset

In [None]:
# Load the relation-label-to-id map; the context manager closes the file afterwards.
with open('/content/drive/MyDrive/data/nerel_docred_ver/rel2id.json', 'r') as f:
    rel2id = json.load(f)

id2rel = {value: key for key, value in rel2id.items()}


def chunks(l, n):
    res = []
    for i in range(0, len(l), n):
        assert len(l[i:i + n]) == n
        res += [l[i:i + n]]
    return res

class ReadDataset:
    def __init__(self, dataset: str, tokenizer, max_seq_Length: int = 1024,
             transformers: str = 'bert') -> None:
        self.transformers = transformers
        self.dataset = dataset
        self.tokenizer = tokenizer
        self.max_seq_Length = max_seq_Length

    def read(self, file_in: str):
        save_file = file_in.split('.json')[0] + '_' + self.transformers + '_' \
                        + self.dataset + '.pkl'
        if self.dataset == 'docred':
            return read_docred(self.transformers, file_in, save_file, self.tokenizer, self.max_seq_Length, ents='nerel')
        # elif self.dataset == 'nerel':
        #     return read_docred(self.transformers, file_in, save_file, self.tokenizer, self.max_seq_Length, ents='nerel')
        else:
            raise RuntimeError("No read func for this dataset.")

def read_docred(transfermers, file_in, save_file, tokenizer, max_seq_length=1024, ents='docred'):
    if os.path.exists(save_file):
        # Reuse the cached features; the with-block closes the file on exit.
        with open(file=save_file, mode='rb') as fr:
            features = pickle.load(fr)
        print('load preprocessed data from {}.'.format(save_file))
        return features
    else:
        max_len = 0
        up512_num = 0
        i_line = 0
        pos_samples = 0
        neg_samples = 0
        features = []
        if file_in == "":
            return None
        with open(file_in, "r") as fh:
            data = json.load(fh)

        if transfermers == 'bert':
            # entity_type = ["ORG", "-",  "LOC", "-",  "TIME", "-",  "PER", "-", "MISC", "-", "NUM"]
            if ents == 'nerel':
                entity_type = ['-', 'IDEOLOGY', '-', 'PENALTY', '-', 'CITY', '-', 'CRIME', '-', 'DISEASE', '-', 'TIME', '-', 'PERSON', '-',
                 'RELIGION', '-', 'EVENT', '-', 'AWARD', '-', 'LANGUAGE', '-', 'ORDINAL', '-', 'ORGANIZATION', '-', 'DATE', '-',
                 'NATIONALITY', '-', 'PROFESSION', '-', 'AGE', '-', 'MONEY', '-', 'COUNTRY', '-', 'LAW', '-', 'WORK_OF_ART', '-',
                 'PERCENT', '-', 'PRODUCT', '-', 'FACILITY', '-', 'FAMILY', '-', 'NUMBER', '-', 'LOCATION', '-', 'DISTRICT', '-',
                 'STATE_OR_PROVINCE']
            else:
                entity_type = ["-", "ORG", "-",  "LOC", "-",  "TIME", "-",  "PER", "-", "MISC", "-", "NUM"]

        for sample in tqdm(data, desc="Example"):
            sents = []
            sent_map = []

            entities = sample['vertexSet']
            entity_start, entity_end = [], []
            mention_types = []
            for entity in entities:
                for mention in entity:
                    sent_id = mention["sent_id"]
                    pos = mention["pos"]
                    entity_start.append((sent_id, pos[0]))
                    entity_end.append((sent_id, pos[1] - 1))
                    mention_types.append(mention['type'])

            for i_s, sent in enumerate(sample['sents']):
                new_map = {}
                for i_t, token in enumerate(sent):
                    tokens_wordpiece = tokenizer.tokenize(token)
                    if (i_s, i_t) in entity_start:
                        t = entity_start.index((i_s, i_t))
                        if transfermers == 'bert':
                            mention_type = mention_types[t]
                            special_token_i = entity_type.index(mention_type)
                            special_token = ['[unused' + str(special_token_i) + ']']
                        else:
                            special_token = ['*']
                        tokens_wordpiece = special_token + tokens_wordpiece
                        # tokens_wordpiece = ["[unused0]"]+ tokens_wordpiece

                    if (i_s, i_t) in entity_end:
                        t = entity_end.index((i_s, i_t))
                        if transfermers == 'bert':
                            mention_type = mention_types[t]
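                            # NOTE: end markers reuse the start-marker ids shifted by 50. With the
                            # 58-entry 'nerel' list, ids 51, 53, 55 and 57 therefore serve as both
                            # start and end markers.
                            special_token_i = entity_type.index(mention_type) + 50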
                            special_token = ['[unused' + str(special_token_i) + ']']
                        else:
                            special_token = ['*']
                        tokens_wordpiece = tokens_wordpiece + special_token

                        # tokens_wordpiece = tokens_wordpiece + ["[unused1]"]
                        # print(tokens_wordpiece,tokenizer.convert_tokens_to_ids(tokens_wordpiece))

                    new_map[i_t] = len(sents)
                    sents.extend(tokens_wordpiece)
                new_map[i_t + 1] = len(sents)
                sent_map.append(new_map)

            if len(sents)>max_len:
                max_len=len(sents)
            if len(sents)>512:
                up512_num += 1

            train_triple = {}
            if "labels" in sample:
                for label in sample['labels']:
                    evidence = label['evidence']
                    r = int(rel2id[label['r']])

                    # h, t = sorted([label['h'], label['t']])
                    h, t = [label['h'], label['t']]

                    if (h, t) not in train_triple:
                        train_triple[(h, t)] = [
                            {'relation': r, 'evidence': evidence}]
                    else:
                        train_triple[(h, t)].append(
                            {'relation': r, 'evidence': evidence})

            entity_pos = []
            for e in entities:
                entity_pos.append([])
                mention_num = len(e)
                for m in e:
                    start = sent_map[m["sent_id"]][m["pos"][0]]
                    end = sent_map[m["sent_id"]][m["pos"][1]]
                    entity_pos[-1].append((start, end,))


            relations, hts = [], []
            # Get positive samples from dataset
            for h, t in train_triple.keys():
                relation = [0] * len(rel2id)
                for mention in train_triple[h, t]:
                    relation[mention["relation"]] = 1
                    evidence = mention["evidence"]
                relations.append(relation)
                hts.append([h, t])
                pos_samples += 1
            # print(len(relations))
            # print(entities)
            # print(sorted(list(train_triple.keys())))

            # Get negative samples from dataset
            for h in range(len(entities)):
                for t in range(len(entities)):
                    if h != t and [h, t] not in hts:
                        relation = [1] + [0] * (len(rel2id) - 1)
                        relations.append(relation)
                        hts.append([h, t])
                        neg_samples += 1

            assert len(relations) <= len(entities) * (len(entities) - 1) + 1, f'relations:{len(relations)} entities:{len(entities)}'
            assert len(relations) >= len(entities) * (len(entities) - 1), f'relations:{len(relations)} entities:{len(entities)}'

            if len(hts)==0:
                print(len(sent))
            sents = sents[:max_seq_length - 2]
            input_ids = tokenizer.convert_tokens_to_ids(sents)
            input_ids = tokenizer.build_inputs_with_special_tokens(input_ids)

            i_line += 1
            feature = {'input_ids': input_ids,
                       'entity_pos': entity_pos,
                       'labels': relations,
                       'hts': hts,
                       'title': sample['title'],
                       }
            features.append(feature)



        print("# of documents {}.".format(i_line))
        print("# of positive examples {}.".format(pos_samples))
        print("# of negative examples {}.".format(neg_samples))
        print("# {} examples len>512 and max len is {}.".format(up512_num, max_len))


        with open(file=save_file, mode='wb') as fw:
            pickle.dump(features, fw)
        print('finish reading {} and save preprocessed data to {}.'.format(file_in, save_file))
        # Return the freshly built features so the first run matches the cached path.
        return features
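
The marker-insertion step of `read_docred` can be sketched for a single sentence as follows. This is a simplified illustration: `toy_tokenize` is a hypothetical stand-in for `tokenizer.tokenize`, `mark_sentence` is a made-up helper name, and the real function accumulates pieces across all sentences of a document rather than one sentence at a time.

```python
def toy_tokenize(word):
    """Stand-in for tokenizer.tokenize: split long words into two pieces."""
    word = word.lower()
    return [word] if len(word) <= 4 else [word[:4], "##" + word[4:]]


def mark_sentence(sent, mentions, entity_type):
    """Wrap each mention in typed [unusedN] markers.

    mentions: list of (start, end_exclusive, type) for this sentence.
    Returns the word pieces and a map from word index to piece index.
    """
    starts = {start: ner for start, _, ner in mentions}
    ends = {end - 1: ner for _, end, ner in mentions}
    pieces, word_to_piece = [], {}
    for i, word in enumerate(sent):
        wp = toy_tokenize(word)
        if i in starts:
            wp = ["[unused%d]" % entity_type.index(starts[i])] + wp
        if i in ends:
            wp = wp + ["[unused%d]" % (entity_type.index(ends[i]) + 50)]
        word_to_piece[i] = len(pieces)   # offset where this word begins
        pieces.extend(wp)
    word_to_piece[len(sent)] = len(pieces)
    return pieces, word_to_piece


entity_type = ["-", "ORG", "-", "LOC", "-", "TIME", "-", "PER", "-", "MISC", "-", "NUM"]
sent = ["Alice", "founded", "Acme", "Corp", "."]
pieces, word_to_piece = mark_sentence(sent, [(0, 1, "PER"), (2, 4, "ORG")], entity_type)
print(pieces)
print(word_to_piece[2], word_to_piece[4])  # piece span of "Acme Corp"
```

The span recorded for a mention starts at its opening marker, so the model reads the entity representation from the marker position rather than from the first word piece.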


In [None]:
# ner2id_path = './data/for_docred/ner2id.json'
# with open(ner2id_path, 'r') as f:
#     ner2id = json.load(f)

# print([
#     x
#     for ent in ner2id.keys()
#     for x in ['-', ent]
# ])

In [None]:
# train_features = Dataset.read(train_file)
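
The features produced by preprocessing pair every ordered entity pair with a label vector. A minimal sketch of that construction, assuming relation id 0 stands for "no relation" as in the negative samples of `read_docred` (the helper name `build_pairs` and the toy inputs are hypothetical):

```python
def build_pairs(num_entities, triples, num_relations):
    """Build entity pairs and their label vectors.

    triples: dict mapping (head, tail) -> list of relation ids.
    Positive pairs get a multi-hot vector; every other ordered pair
    gets a vector with only index 0 set.
    """
    hts, labels = [], []
    for (h, t), rel_ids in triples.items():
        vec = [0] * num_relations
        for r in rel_ids:
            vec[r] = 1
        hts.append([h, t])
        labels.append(vec)
    for h in range(num_entities):
        for t in range(num_entities):
            if h != t and [h, t] not in hts:
                hts.append([h, t])
                labels.append([1] + [0] * (num_relations - 1))
    return hts, labels


hts, labels = build_pairs(3, {(1, 0): [2], (1, 2): [1, 3]}, num_relations=4)
for pair, vec in zip(hts, labels):
    print(pair, vec)
```

For `n` entities this always yields `n * (n - 1)` pairs, which is the count that the assertions in `read_docred` check.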

## Prepare the Model

In [None]:
class AttentionUNet(torch.nn.Module):
    """
    UNet, down sampling & up sampling for global reasoning
    """

    def __init__(self, input_channels, class_number, **kwargs):
        super(AttentionUNet, self).__init__()

        down_channel = kwargs['down_channel'] # default = 256

        down_channel_2 = down_channel * 2
        up_channel_1 = down_channel_2 * 2
        up_channel_2 = down_channel * 2

        self.inc = InConv(input_channels, down_channel)
        self.down1 = DownLayer(down_channel, down_channel_2)
        self.down2 = DownLayer(down_channel_2, down_channel_2)

        self.up1 = UpLayer(up_channel_1, up_channel_1 // 4)
        self.up2 = UpLayer(up_channel_2, up_channel_2 // 4)
        self.outc = OutConv(up_channel_2 // 4, class_number)

    def forward(self, attention_channels):
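        """
        Given a multi-channel entity-pair feature map, return a
        class_number-dimensional output vector for every (head, tail) cell.
        :param attention_channels: tensor of shape batch_size x channel x width x height
        :return: tensor of shape batch_size x width x height x class_number
        """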
        # attention_channels as the shape of: batch_size x channel x width x height
        x = attention_channels
        x1 = self.inc(x)
        x2 = self.down1(x1)
        x3 = self.down2(x2)
        x = self.up1(x3, x2)
        x = self.up2(x, x1)
        output = self.outc(x)
        # attn_map as the shape of: batch_size x width x height x class
        output = output.permute(0, 2, 3, 1).contiguous()
        return output


class DoubleConv(nn.Module):
    """(conv => [BN] => ReLU) * 2"""

    def __init__(self, in_ch, out_ch):
        super(DoubleConv, self).__init__()
        self.double_conv = nn.Sequential(nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                                         nn.BatchNorm2d(out_ch),
                                         nn.ReLU(inplace=True),
                                         nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
                                         nn.BatchNorm2d(out_ch),
                                         nn.ReLU(inplace=True))

    def forward(self, x):
        x = self.double_conv(x)
        return x


class InConv(nn.Module):

    def __init__(self, in_ch, out_ch):
        super(InConv, self).__init__()
        self.conv = DoubleConv(in_ch, out_ch)

    def forward(self, x):
        x = self.conv(x)
        return x


class DownLayer(nn.Module):

    def __init__(self, in_ch, out_ch):
        super(DownLayer, self).__init__()
        self.maxpool_conv = nn.Sequential(
            nn.MaxPool2d(kernel_size=2),
            DoubleConv(in_ch, out_ch)
        )

    def forward(self, x):
        x = self.maxpool_conv(x)
        return x


class UpLayer(nn.Module):

    def __init__(self, in_ch, out_ch, bilinear=True):
        super(UpLayer, self).__init__()
        if bilinear:
            self.up = nn.Upsample(scale_factor=2, mode='bilinear',
                                  align_corners=True)
        else:
            self.up = nn.ConvTranspose2d(in_ch // 2, in_ch // 2, 2, stride=2)
        self.conv = DoubleConv(in_ch, out_ch)

    def forward(self, x1, x2):
        x1 = self.up(x1)
        diffY = x2.size()[2] - x1.size()[2]
        diffX = x2.size()[3] - x1.size()[3]
        x1 = F.pad(x1, (diffX // 2, diffX - diffX // 2, diffY // 2, diffY -
                        diffY // 2))
        x = torch.cat([x2, x1], dim=1)
        x = self.conv(x)
        return x


class OutConv(nn.Module):

    def __init__(self, in_ch, out_ch):
        super(OutConv, self).__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        x = self.conv(x)
        return x

class DocREModel(nn.Module):
    def __init__(self, config, args, model, emb_size=768, block_size=64, num_labels=-1):
        super().__init__()
        self.config = config
        self.bert_model = model
        self.hidden_size = config.hidden_size
        self.loss_fnt = ATLoss()

        self.head_extractor = nn.Linear(1 * config.hidden_size + args.unet_out_dim, emb_size)
        self.tail_extractor = nn.Linear(1 * config.hidden_size + args.unet_out_dim, emb_size)
        # self.head_extractor = nn.Linear(1 * config.hidden_size , emb_size)
        # self.tail_extractor = nn.Linear(1 * config.hidden_size , emb_size)
        self.bilinear = nn.Linear(emb_size * block_size, config.num_labels)

        self.emb_size = emb_size
        self.block_size = block_size
        self.num_labels = num_labels

        self.bertdrop = nn.Dropout(0.6)
        self.unet_in_dim = args.unet_in_dim
        self.unet_out_dim = args.unet_out_dim
        self.liner = nn.Linear(config.hidden_size, args.unet_in_dim)
        self.min_height = args.max_height
        self.channel_type = args.channel_type
        self.segmentation_net = AttentionUNet(input_channels=args.unet_in_dim,
                                              class_number=args.unet_out_dim,
                                              down_channel=args.down_dim)


    def encode(self, input_ids, attention_mask,entity_pos):
        config = self.config
        if config.transformer_type == "bert":
            start_tokens = [config.cls_token_id]
            end_tokens = [config.sep_token_id]
        elif config.transformer_type == "roberta":
            start_tokens = [config.cls_token_id]
            end_tokens = [config.sep_token_id, config.sep_token_id]
        sequence_output, attention = process_long_input(self.bert_model, input_ids, attention_mask, start_tokens, end_tokens)
        return sequence_output, attention

    def get_hrt(self, sequence_output, attention, entity_pos, hts):
        offset = 1 if self.config.transformer_type in ["bert", "roberta"] else 0
        bs, h, _, c = attention.size()
        # ne = max([len(x) for x in entity_pos])  # maximum number of entities in this batch

        hss, tss, rss = [], [], []
        entity_es = []
        entity_as = []
        for i in range(len(entity_pos)):
            entity_embs, entity_atts = [], []
            for entity_num, e in enumerate(entity_pos[i]):
                if len(e) > 1:
                    e_emb, e_att = [], []
                    for start, end in e:
                        if start + offset < c:
                            # In case the entity mention is truncated due to limited max seq length.
                            e_emb.append(sequence_output[i, start + offset])
                            e_att.append(attention[i, :, start + offset])
                    if len(e_emb) > 0:
                        e_emb = torch.logsumexp(torch.stack(e_emb, dim=0), dim=0)
                        e_att = torch.stack(e_att, dim=0).mean(0)
                    else:
                        e_emb = torch.zeros(self.config.hidden_size).to(sequence_output)
                        e_att = torch.zeros(h, c).to(attention)
                else:
                    start, end = e[0]
                    if start + offset < c:
                        e_emb = sequence_output[i, start + offset]
                        e_att = attention[i, :, start + offset]
                    else:
                        e_emb = torch.zeros(self.config.hidden_size).to(sequence_output)
                        e_att = torch.zeros(h, c).to(attention)
                entity_embs.append(e_emb)
                entity_atts.append(e_att)
            for _ in range(self.min_height-entity_num-1):
                entity_atts.append(e_att)

            entity_embs = torch.stack(entity_embs, dim=0)  # [n_e, d]
            entity_atts = torch.stack(entity_atts, dim=0)  # [n_e, h, seq_len]


            entity_es.append(entity_embs)
            entity_as.append(entity_atts)
            ht_i = torch.LongTensor(hts[i]).to(sequence_output.device)
            hs = torch.index_select(entity_embs, 0, ht_i[:, 0])
            ts = torch.index_select(entity_embs, 0, ht_i[:, 1])

            hss.append(hs)
            tss.append(ts)
        hss = torch.cat(hss, dim=0)
        tss = torch.cat(tss, dim=0)
        return hss, tss, entity_es, entity_as

    def get_mask(self, ents, bs, ne, run_device):
        ent_mask = torch.zeros(bs, ne, device=run_device)
        rel_mask = torch.zeros(bs, ne, ne, device=run_device)
        for _b in range(bs):
            ent_mask[_b, :len(ents[_b])] = 1
            rel_mask[_b, :len(ents[_b]), :len(ents[_b])] = 1
        return ent_mask, rel_mask


    def get_ht(self, rel_enco, hts):
        htss = []
        for i in range(len(hts)):
            ht_index = hts[i]
            for (h_index, t_index) in ht_index:
                htss.append(rel_enco[i,h_index,t_index])
        htss = torch.stack(htss,dim=0)
        return htss

    def get_channel_map(self, sequence_output, entity_as):
        # sequence_output = sequence_output.to('cpu')
        # attention = attention.to('cpu')
        bs,_,d = sequence_output.size()
        # ne = max([len(x) for x in entity_as])  # maximum number of entities in this batch
        ne = self.min_height

        index_pair = []
        for i in range(ne):
            tmp = torch.cat((torch.ones((ne, 1), dtype=int) * i, torch.arange(0, ne).unsqueeze(1)), dim=-1)
            index_pair.append(tmp)
        index_pair = torch.stack(index_pair, dim=0).reshape(-1, 2).to(sequence_output.device)
        map_rss = []
        for b in range(bs):
            entity_atts = entity_as[b]
            h_att = torch.index_select(entity_atts, 0, index_pair[:, 0])
            t_att = torch.index_select(entity_atts, 0, index_pair[:, 1])
            ht_att = (h_att * t_att).mean(1)
            ht_att = ht_att / (ht_att.sum(1, keepdim=True) + 1e-5)
            rs = contract("ld,rl->rd", sequence_output[b], ht_att)
            map_rss.append(rs)
        map_rss = torch.cat(map_rss, dim=0).reshape(bs, ne, ne, d)
        return map_rss

    def forward(self,
                input_ids=None,
                attention_mask=None,
                labels=None,
                entity_pos=None,
                hts=None,
                instance_mask=None,
                ):

        sequence_output, attention = self.encode(input_ids, attention_mask,entity_pos)

        bs, sequen_len, d = sequence_output.shape
        # run_device = sequence_output.device.index
        run_device = sequence_output.device
        ne = max([len(x) for x in entity_pos])  # maximum number of entities in this batch
        ent_mask, rel_mask = self.get_mask(entity_pos, bs, ne, run_device)

        # get hs, ts and entity_embs >> entity_rs
        hs, ts, entity_embs, entity_as = self.get_hrt(sequence_output, attention, entity_pos, hts)


        if self.channel_type == 'context-based':
            feature_map = self.get_channel_map(sequence_output, entity_as)
            ##print('feature_map:', feature_map.shape)
            attn_input = self.liner(feature_map).permute(0, 3, 1, 2).contiguous()

        else:
            raise Exception("channel_type must be specified correctly")


        attn_map = self.segmentation_net(attn_input)
        h_t = self.get_ht (attn_map, hts)

        hs = torch.tanh(self.head_extractor(torch.cat([hs, h_t], dim=1)))
        ts = torch.tanh(self.tail_extractor(torch.cat([ts, h_t], dim=1)))


        b1 = hs.view(-1, self.emb_size // self.block_size, self.block_size)
        b2 = ts.view(-1, self.emb_size // self.block_size, self.block_size)
        bl = (b1.unsqueeze(3) * b2.unsqueeze(2)).view(-1, self.emb_size * self.block_size)
        logits = self.bilinear(bl)


        output = (self.loss_fnt.get_label(logits, num_labels=self.num_labels))
        if labels is not None:
            labels = [torch.tensor(label) for label in labels]
            labels = torch.cat(labels, dim=0).to(logits)
            loss = self.loss_fnt(logits.float(), labels.float())
            output = (loss.to(sequence_output), output)
        return output
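
The last step of `forward` builds a grouped bilinear feature: the head and tail vectors are cut into blocks of `block_size`, and an outer product is taken within each block. A pure-Python sketch with tiny vectors (the helper name `grouped_bilinear_features` and the example values are illustrative assumptions):

```python
def grouped_bilinear_features(head, tail, block_size):
    """Flattened per-block outer products of two equal-length vectors."""
    assert len(head) == len(tail) and len(head) % block_size == 0
    feats = []
    for start in range(0, len(head), block_size):
        head_block = head[start:start + block_size]
        tail_block = tail[start:start + block_size]
        # Same ordering as (b1.unsqueeze(3) * b2.unsqueeze(2)).view(...):
        # block index, then head position, then tail position.
        feats.extend(h * t for h in head_block for t in tail_block)
    return feats


feats = grouped_bilinear_features([1, 2, 3, 4], [5, 6, 7, 8], block_size=2)
print(feats)        # 2 blocks, each a 2 x 2 outer product
print(len(feats))   # emb_size * block_size = 4 * 2
```

With the defaults `emb_size=768` and `block_size=64`, each pair yields 768 * 64 = 49152 features, which matches the input size of `self.bilinear`. A full bilinear form over the same vectors would need 768 * 768 inputs per label.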

## Loss function

In [None]:
def multilabel_categorical_crossentropy(y_true, y_pred):
    y_pred = (1 - 2 * y_true) * y_pred
    y_pred_neg = y_pred - y_true * 1e30
    y_pred_pos = y_pred - (1 - y_true) * 1e30
    zeros = torch.zeros_like(y_pred[..., :1])
    y_pred_neg = torch.cat([y_pred_neg, zeros],dim=-1)
    y_pred_pos = torch.cat((y_pred_pos, zeros),dim=-1)
    neg_loss = torch.logsumexp(y_pred_neg, dim=-1)
    pos_loss = torch.logsumexp(y_pred_pos, dim=-1)
    return neg_loss + pos_loss


class ATLoss(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, logits, labels):

        loss = multilabel_categorical_crossentropy(labels,logits)
        loss = loss.mean()
        return loss

    def get_label(self, logits, num_labels=-1):
        th_logit = torch.zeros_like(logits[..., :1])
        output = torch.zeros_like(logits).to(logits)
        mask = (logits > th_logit)
        if num_labels > 0:
            top_v, _ = torch.topk(logits, num_labels, dim=1)
            top_v = top_v[:, -1]
            mask = (logits >= top_v.unsqueeze(1)) & mask
        output[mask] = 1.0
        output[:, 0] = (output[:,1:].sum(1) == 0.).to(logits)

        return output
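
For a single entity pair, the loss above reduces to `log(1 + sum of exp(s_i) over negative classes) + log(1 + sum of exp(-s_j) over positive classes)`, so a logit of 0 acts as the decision threshold. The sketch below restates the loss and the decoding rule of `get_label` in plain Python for one example. The names `multilabel_cce` and `decode` are hypothetical, and the batch mean taken in `ATLoss.forward` is omitted.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def multilabel_cce(y_true, logits):
    """Loss for one pair; the appended 0.0 plays the role of the threshold."""
    neg = [s for y, s in zip(y_true, logits) if y == 0] + [0.0]
    pos = [-s for y, s in zip(y_true, logits) if y == 1] + [0.0]
    return logsumexp(neg) + logsumexp(pos)


def decode(logits, num_labels=-1):
    """Keep classes with a positive logit, optionally only the top num_labels."""
    keep = [s > 0 for s in logits]
    if num_labels > 0:
        kth = sorted(logits, reverse=True)[num_labels - 1]
        keep = [k and s >= kth for k, s in zip(keep, logits)]
    out = [1.0 if k else 0.0 for k in keep]
    # Class 0 ("no relation") is set only when no other class is predicted.
    out[0] = 1.0 if sum(out[1:]) == 0 else 0.0
    return out


print(multilabel_cce([0, 1, 0], [-2.0, 3.0, -1.0]))
print(decode([0.5, 2.0, -1.0, 1.0]))
print(decode([0.5, -1.0, -2.0]))
```

Pushing positive logits above 0 and negative logits below 0 drives both terms toward `log(1) = 0`.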

## Preprocess the inputs

In [None]:
def process_long_input(model, input_ids, attention_mask, start_tokens, end_tokens):
    # Split inputs longer than 512 tokens into 2 overlapping chunks, so that BERT can encode inputs of up to 1024 tokens.
    n, c = input_ids.size()
    start_tokens = torch.tensor(start_tokens).to(input_ids)
    end_tokens = torch.tensor(end_tokens).to(input_ids)
    len_start = start_tokens.size(0)
    len_end = end_tokens.size(0)
    if c <= 512:
        output = model(
            input_ids=input_ids,
            attention_mask=attention_mask,
            output_attentions=True,
        )
        sequence_output = output[0]
        attention = output[-1][-1]
    else:
        new_input_ids, new_attention_mask, num_seg = [], [], []
        seq_len = attention_mask.sum(1).cpu().numpy().astype(np.int32).tolist()
        for i, l_i in enumerate(seq_len):
            if l_i <= 512:
                new_input_ids.append(input_ids[i, :512])
                new_attention_mask.append(attention_mask[i, :512])
                num_seg.append(1)
            else:
                input_ids1 = torch.cat([input_ids[i, :512 - len_end], end_tokens], dim=-1)
                input_ids2 = torch.cat([start_tokens, input_ids[i, (l_i - 512 + len_start): l_i]], dim=-1)
                attention_mask1 = attention_mask[i, :512]
                attention_mask2 = attention_mask[i, (l_i - 512): l_i]
                new_input_ids.extend([input_ids1, input_ids2])
                new_attention_mask.extend([attention_mask1, attention_mask2])
                num_seg.append(2)
        input_ids = torch.stack(new_input_ids, dim=0)
        attention_mask = torch.stack(new_attention_mask, dim=0)
        output = model(
            input_ids=input_ids,
            attention_mask=attention_mask,
            output_attentions=True,
        )
        sequence_output = output[0]
        attention = output[-1][-1]
        i = 0
        new_output, new_attention = [], []
        for (n_s, l_i) in zip(num_seg, seq_len):
            if n_s == 1:
                output = F.pad(sequence_output[i], (0, 0, 0, c - 512))
                att = F.pad(attention[i], (0, c - 512, 0, c - 512))
                new_output.append(output)
                new_attention.append(att)
            elif n_s == 2:
                output1 = sequence_output[i][:512 - len_end]
                mask1 = attention_mask[i][:512 - len_end]
                att1 = attention[i][:, :512 - len_end, :512 - len_end]
                output1 = F.pad(output1, (0, 0, 0, c - 512 + len_end))
                mask1 = F.pad(mask1, (0, c - 512 + len_end))
                att1 = F.pad(att1, (0, c - 512 + len_end, 0, c - 512 + len_end))

                output2 = sequence_output[i + 1][len_start:]
                mask2 = attention_mask[i + 1][len_start:]
                att2 = attention[i + 1][:, len_start:, len_start:]
                output2 = F.pad(output2, (0, 0, l_i - 512 + len_start, c - l_i))
                mask2 = F.pad(mask2, (l_i - 512 + len_start, c - l_i))
                att2 = F.pad(att2, [l_i - 512 + len_start, c - l_i, l_i - 512 + len_start, c - l_i])
                mask = mask1 + mask2 + 1e-10
                output = (output1 + output2) / mask.unsqueeze(-1)
                att = (att1 + att2)
                att = att / (att.sum(-1, keepdim=True) + 1e-10)
                new_output.append(output)
                new_attention.append(att)
            i += n_s
        sequence_output = torch.stack(new_output, dim=0)
        attention = torch.stack(new_attention, dim=0)
    return sequence_output, attention
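
The index arithmetic used for a long sequence in `process_long_input` can be checked in isolation. The sketch below only computes which original token ranges land in each 512-token window; `split_windows` and `overlap` are hypothetical helper names, and the defaults `len_start=1` and `len_end=1` correspond to the BERT branch of `encode`.

```python
def split_windows(seq_len, len_start=1, len_end=1, window=512):
    """Return (start, end_exclusive) ranges of original tokens per window."""
    if seq_len <= window:
        return [(0, seq_len)]
    # First window: leading tokens, leaving room for the appended end tokens.
    first = (0, window - len_end)
    # Second window: trailing tokens, leaving room for the prepended start tokens.
    second = (seq_len - window + len_start, seq_len)
    return [first, second]


def overlap(windows):
    """Number of original tokens covered by both windows."""
    if len(windows) < 2:
        return 0
    return max(0, windows[0][1] - windows[1][0])


windows = split_windows(700)
print(windows)
print(overlap(windows))
```

Tokens in the overlapping range are encoded twice; the function averages their hidden states through `mask1 + mask2` and renormalizes the summed attention.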

## Auxiliary functions

In [None]:
def set_seed(cfg):
    random.seed(cfg.seed)
    np.random.seed(cfg.seed)
    torch.manual_seed(cfg.seed)

def collate_fn(batch):
    # Right-pad every sample to the longest sequence in the batch;
    # padded positions get token id 0 and mask value 0.0.
    max_len = max(len(f["input_ids"]) for f in batch)
    input_ids = [f["input_ids"] + [0] * (max_len - len(f["input_ids"])) for f in batch]
    input_mask = [[1.0] * len(f["input_ids"]) + [0.0] * (max_len - len(f["input_ids"])) for f in batch]
    input_ids = torch.tensor(input_ids, dtype=torch.long)
    input_mask = torch.tensor(input_mask, dtype=torch.float)

    # Labels, entity positions and head/tail pairs vary in size per document,
    # so they stay as plain Python lists instead of being stacked into tensors.
    entity_pos = [f["entity_pos"] for f in batch]
    labels = [f["labels"] for f in batch]
    hts = [f["hts"] for f in batch]
    return (input_ids, input_mask, labels, entity_pos, hts)

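def to_official(args, preds, features):
    # Turn each predicted class index (other than 0) into a
    # (title, h_idx, t_idx, relation name) record in the DocRED submission format.
    with open(f'{args.data_dir}/rel2id.json', 'r') as rel_file:
        rel2id = json.load(rel_file)
    id2rel = {value: key for key, value in rel2id.items()}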

    h_idx, t_idx, title = [], [], []

    for f in features:
        hts = f["hts"]
        h_idx += [ht[0] for ht in hts]
        t_idx += [ht[1] for ht in hts]
        title += [f["title"] for ht in hts]

    res = []
    # print('h_idx, preds', len(h_idx), len(preds))
    # assert len(h_idx) == len(preds)


    for i in range(preds.shape[0]):
        pred = preds[i]
        pred = np.nonzero(pred)[0].tolist()
        for p in pred:
            if p != 0:
                res.append(
                    {
                        'title': title[i],
                        'h_idx': h_idx[i],
                        't_idx': t_idx[i],
                        'r': id2rel[p],
                    }
                )
    return res

def gen_train_facts(data_file_name, truth_dir):
    fact_file_name = data_file_name[data_file_name.find("train_"):]
    fact_file_name = os.path.join(truth_dir, fact_file_name.replace(".json", ".fact"))

    # Reuse the cached fact file if it has already been generated.
    if os.path.exists(fact_file_name):
        with open(fact_file_name) as fact_file:
            triples = json.load(fact_file)
        return set(tuple(x) for x in triples)

    # Otherwise collect every (head mention, tail mention, relation) triple
    # from the training data and cache it for later runs.
    fact_in_train = set()
    with open(data_file_name) as data_file:
        ori_data = json.load(data_file)
    for data in ori_data:
        vertexSet = data['vertexSet']
        for label in data['labels']:
            rel = label['r']
            for n1 in vertexSet[label['h']]:
                for n2 in vertexSet[label['t']]:
                    fact_in_train.add((n1['name'], n2['name'], rel))

    with open(fact_file_name, "w") as fact_file:
        json.dump(list(fact_in_train), fact_file)

    return fact_in_train

def official_evaluate(tmp, path):
    """Adapted from the official evaluation code."""
    truth_dir = os.path.join(path, 'ref')

    if not os.path.exists(truth_dir):
        os.makedirs(truth_dir)

    fact_in_train_annotated = gen_train_facts(os.path.join(path, "train_annotated.json"), truth_dir)

    # if not os.path.exists(os.path.join(path, "train_distant.json")):
    #     raise FileNotFoundError("Sorry, the file: 'train_distant.json' is too big to upload to github, \
    #         please manually download to 'data/' from DocRED GoogleDrive https://drive.google.com/drive/folders/1c5-0YwnoJx8NS6CV2f-NoTHR__BdkNqw")
    # fact_in_train_distant = gen_train_facts(os.path.join(path, "train_distant.json"), truth_dir)

    truth = json.load(open(os.path.join(path, "dev.json")))

    std = {}
    tot_evidences = 0
    titleset = set([])

    title2vectexSet = {}

    for x in truth:
        title = x['title']
        titleset.add(title)

        vertexSet = x['vertexSet']
        title2vectexSet[title] = vertexSet

        for label in x['labels']:
            r = label['r']
            h_idx = label['h']
            t_idx = label['t']
            std[(title, r, h_idx, t_idx)] = set(label['evidence'])
            tot_evidences += len(label['evidence'])

    tot_relations = len(std)
    tmp.sort(key=lambda x: (x['title'], x['h_idx'], x['t_idx'], x['r']))
    submission_answer = [tmp[0]]
    for i in range(1, len(tmp)):
        x = tmp[i]
        y = tmp[i - 1]
        if (x['title'], x['h_idx'], x['t_idx'], x['r']) != (y['title'], y['h_idx'], y['t_idx'], y['r']):
            submission_answer.append(tmp[i])

    correct_re = 0
    correct_evidence = 0
    pred_evi = 0

    correct_in_train_annotated = 0
    correct_in_train_distant = 0
    titleset2 = set([])
    for x in submission_answer:
        title = x['title']
        h_idx = x['h_idx']
        t_idx = x['t_idx']
        r = x['r']
        titleset2.add(title)
        if title not in title2vectexSet:
            continue
        vertexSet = title2vectexSet[title]

        if 'evidence' in x:
            evi = set(x['evidence'])
        else:
            evi = set([])
        pred_evi += len(evi)

        if (title, r, h_idx, t_idx) in std:
            correct_re += 1
            stdevi = std[(title, r, h_idx, t_idx)]
            correct_evidence += len(stdevi & evi)
            in_train_annotated = in_train_distant = False
            for n1 in vertexSet[h_idx]:
                for n2 in vertexSet[t_idx]:
                    if (n1['name'], n2['name'], r) in fact_in_train_annotated:
                        in_train_annotated = True
                    # if (n1['name'], n2['name'], r) in fact_in_train_distant:
                    #     in_train_distant = True

            if in_train_annotated:
                correct_in_train_annotated += 1
            # if in_train_distant:
            #     correct_in_train_distant += 1

    re_p = 1.0 * correct_re / len(submission_answer)
    re_r = 1.0 * correct_re / tot_relations
    if re_p + re_r == 0:
        re_f1 = 0
    else:
        re_f1 = 2.0 * re_p * re_r / (re_p + re_r)

    evi_p = 1.0 * correct_evidence / pred_evi if pred_evi > 0 else 0
    # Guard against datasets whose labels carry no evidence annotations.
    evi_r = 1.0 * correct_evidence / tot_evidences if tot_evidences > 0 else 0
    if evi_p + evi_r == 0:
        evi_f1 = 0
    else:
        evi_f1 = 2.0 * evi_p * evi_r / (evi_p + evi_r)

    re_p_ignore_train_annotated = 1.0 * (correct_re - correct_in_train_annotated) / (len(submission_answer) - correct_in_train_annotated + 1e-5)
    re_p_ignore_train = 1.0 * (correct_re - correct_in_train_distant) / (len(submission_answer) - correct_in_train_distant + 1e-5)

    if re_p_ignore_train_annotated + re_r == 0:
        re_f1_ignore_train_annotated = 0
    else:
        re_f1_ignore_train_annotated = 2.0 * re_p_ignore_train_annotated * re_r / (re_p_ignore_train_annotated + re_r)

    if re_p_ignore_train + re_r == 0:
        re_f1_ignore_train = 0
    else:
        re_f1_ignore_train = 2.0 * re_p_ignore_train * re_r / (re_p_ignore_train + re_r)

    return re_f1, evi_f1, re_f1_ignore_train_annotated, re_f1_ignore_train, re_p, re_r
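
The relation scores returned by `official_evaluate` come down to a handful of counts. The sketch below recomputes precision, recall, F1 and the "ignore train" F1 from such counts, using the same formulas as the function above, including the `1e-5` smoothing term in the ignore-train precision. The helper name `relation_scores`, its zero guards on precision and recall, and the counts in the example are made up for illustration.

```python
def relation_scores(correct, predicted, gold, correct_in_train):
    """Precision, recall, F1 and ignore-train F1 from raw counts (illustrative helper)."""
    # The zero guards on precision and recall are an addition of this sketch.
    precision = correct / predicted if predicted else 0.0
    recall = correct / gold if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # Correct predictions whose fact already appears in the training set are
    # removed from both the numerator and the denominator of the precision.
    precision_ign = (correct - correct_in_train) / (predicted - correct_in_train + 1e-5)
    if precision_ign + recall:
        f1_ign = 2 * precision_ign * recall / (precision_ign + recall)
    else:
        f1_ign = 0.0
    return precision, recall, f1, f1_ign


# Made-up counts: 10 predicted triples, 6 of them correct, 12 gold triples,
# and 2 of the correct ones also occur as facts in the training data.
p, r, f1, f1_ign = relation_scores(correct=6, predicted=10, gold=12, correct_in_train=2)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f} Ign F1={f1_ign:.3f}")
```

Because recall is left unchanged while the precision drops, the ignore-train F1 can never exceed the plain F1 for the same counts.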

## Train the model
### Config parameters

### DeepPavlov ruBert

In [None]:
class Config(object):
    adam_epsilon=1e-06
    bert_lr=3e-05
    channel_type='context-based'
    config_name=''
    data_dir='/content/drive/MyDrive/data/nerel_docred_ver'
    dataset='docred'
    dev_file='dev.json'
    down_dim=256
    evaluation_steps=-1
    gradient_accumulation_steps=2
    learning_rate=0.0003
    log_dir='/content/drive/MyDrive/Models/DeepKE_train/logs/train_roberta.log'
    max_grad_norm=1.0
    max_height=103
    max_seq_length=512
    # model_name_or_path='xlm-roberta-base'
    model_name_or_path='DeepPavlov/rubert-base-cased'
    num_class=50
    num_labels=29
    num_train_epochs=10
    save_path='/content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_roberta.pt'
    seed=111
    test_batch_size=2
    test_file='test.json'
    tokenizer_name=''
    train_batch_size=2
    train_file='train_annotated_filter.json'
    train_from_saved_model=''
    transformer_type='roberta'
    unet_in_dim=3
    unet_out_dim=256
    warmup_ratio=0.06
    load_path='/content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_roberta.pt'

cfg = Config()

In [None]:
# class Config(object):
#     adam_epsilon=1e-06
#     bert_lr=3e-05
#     channel_type='context-based'
#     config_name=''
#     data_dir='/Users/yandutov-alex/study/science_work/DocRED_original/data/'
#     dataset='docred'
#     dev_file='dev.json'
#     down_dim=256
#     evaluation_steps=-1
#     gradient_accumulation_steps=2
#     learning_rate=0.0004
#     log_dir='./logs/deepke/train_roberta_docred.log'
#     max_grad_norm=1.0
#     max_height=42
#     max_seq_length=1024
#     model_name_or_path='roberta-base'
#     num_class=97
#     num_labels=4
#     num_train_epochs=30
#     save_path='./model_roberta.pt'
#     seed=111
#     test_batch_size=2
#     test_file='test.json'
#     tokenizer_name=''
#     train_batch_size=2
#     train_file='train_annotated.json'
#     train_from_saved_model=''
#     transformer_type='roberta'
#     unet_in_dim=3
#     unet_out_dim=256
#     warmup_ratio=0.06
#     load_path='./model_roberta.pt'

# cfg = Config()

### Model Training

In [None]:
def train(args, model, train_features, dev_features, test_features):
    def logging(s, print_=True, log_=True):
        if print_:
            print(s)
        if log_ and args.log_dir != '':
            with open(args.log_dir, 'a+') as f_log:
                f_log.write(s + '\n')
    def finetune(features, optimizer, num_epoch, num_steps, model):
        cur_model = model.module if hasattr(model, 'module') else model
        # device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

        if torch.cuda.is_available():
            device = "cuda:0"
        # elif torch.backends.mps.is_available():
        #     device = "mps"
        else:
            device = "cpu"

        if args.train_from_saved_model != '':
            # Load the checkpoint once and read both resume values from it.
            checkpoint = torch.load(args.train_from_saved_model)
            best_score = checkpoint["best_f1"]
            epoch_delta = checkpoint["epoch"] + 1
        else:
            epoch_delta = 0
            best_score = -1
        train_dataloader = DataLoader(features, batch_size=args.train_batch_size, shuffle=True, collate_fn=collate_fn, drop_last=True)
        train_iterator = [epoch + epoch_delta for epoch in range(num_epoch)]
        total_steps = int(len(train_dataloader) * num_epoch // args.gradient_accumulation_steps)
        warmup_steps = int(total_steps * args.warmup_ratio)
        scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=warmup_steps, num_training_steps=total_steps)
        print("Total steps: {}".format(total_steps))
        print("Warmup steps: {}".format(warmup_steps))
        global_step = 0
        log_step = 100
        total_loss = 0



        #scaler = GradScaler()
        for epoch in train_iterator:
            start_time = time.time()
            optimizer.zero_grad()

            for step, batch in tqdm(enumerate(train_dataloader)):
                model.train()

                inputs = {'input_ids': batch[0].to(device),
                          'attention_mask': batch[1].to(device),
                          'labels': batch[2],
                          'entity_pos': batch[3],
                          'hts': batch[4],
                          }
                #with autocast():
                outputs = model(**inputs)
                loss = outputs[0] / args.gradient_accumulation_steps
                total_loss += loss.item()
                #    scaler.scale(loss).backward()


                loss.backward()

                if step % args.gradient_accumulation_steps == 0:
                    #scaler.unscale_(optimizer)
                    if args.max_grad_norm > 0:
                        # torch.nn.utils.clip_grad_norm_(amp.master_params(optimizer), args.max_grad_norm)
                        torch.nn.utils.clip_grad_norm_(cur_model.parameters(), args.max_grad_norm)
                    #scaler.step(optimizer)
                    #scaler.update()
                    #scheduler.step()
                    optimizer.step()
                    scheduler.step()
                    optimizer.zero_grad()
                    global_step += 1
                    num_steps += 1
                    if global_step % log_step == 0:
                        cur_loss = total_loss / log_step
                        elapsed = time.time() - start_time
                        logging(
                            '| epoch {:2d} | step {:4d} | min/b {:5.2f} | lr {} | train loss {:5.3f}'.format(
                                epoch, global_step, elapsed / 60, scheduler.get_last_lr(), cur_loss))
                        total_loss = 0
                        start_time = time.time()

                if (step + 1) == len(train_dataloader) - 1 or (args.evaluation_steps > 0 and num_steps % args.evaluation_steps == 0 and step % args.gradient_accumulation_steps == 0):
                # if step ==0:
                    logging('-' * 89)
                    eval_start_time = time.time()
                    dev_score, dev_output = evaluate(args, model, dev_features, tag="dev")

                    logging(
                        '| epoch {:3d} | time: {:5.2f}s | dev_result:{}'.format(epoch, time.time() - eval_start_time,
                                                                                dev_output))
                    logging('-' * 89)
                    if dev_score > best_score:
                        best_score = dev_score
                        logging(
                            '| epoch {:3d} | best_f1:{}'.format(epoch, best_score))
                        if args.save_path != "":
                            torch.save({
                                'epoch': epoch,
                                'checkpoint': cur_model.state_dict(),
                                'best_f1': best_score,
                                'optimizer': optimizer.state_dict()
                            }, args.save_path, _use_new_zipfile_serialization=False)
                            logging(
                                '| successfully save model at: {}'.format(args.save_path))
                            logging('-' * 89)
        return num_steps

    cur_model = model.module if hasattr(model, 'module') else model
    extract_layer = ["extractor", "bilinear"]
    bert_layer = ['bert_model']
    optimizer_grouped_parameters = [
        {"params": [p for n, p in cur_model.named_parameters() if any(nd in n for nd in bert_layer)], "lr": args.bert_lr},
        {"params": [p for n, p in cur_model.named_parameters() if any(nd in n for nd in extract_layer)], "lr": 1e-4},
        {"params": [p for n, p in cur_model.named_parameters() if not any(nd in n for nd in extract_layer + bert_layer)]},
    ]

    optimizer = AdamW(optimizer_grouped_parameters, lr=args.learning_rate, eps=args.adam_epsilon)
    if args.train_from_saved_model != '':
        optimizer.load_state_dict(torch.load(args.train_from_saved_model)["optimizer"])
        print("load saved optimizer from {}.".format(args.train_from_saved_model))


    num_steps = 0
    set_seed(args)
    model.zero_grad()
    finetune(train_features, optimizer, args.num_train_epochs, num_steps, model)

def evaluate(args, model, features, tag="dev"):
    # device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    if torch.cuda.is_available():
        device = "cuda:0"
    # elif torch.backends.mps.is_available():
    #     device = "mps"
    else:
        device = "cpu"

    dataloader = DataLoader(features, batch_size=args.test_batch_size, shuffle=False, collate_fn=collate_fn, drop_last=False)
    preds = []
    total_loss = 0
    for i, batch in enumerate(dataloader):
        model.eval()

        inputs = {'input_ids': batch[0].to(device),
                  'attention_mask': batch[1].to(device),
                  'labels': batch[2],
                  'entity_pos': batch[3],
                  'hts': batch[4],
                  }

        with torch.no_grad():
            output = model(**inputs)
            loss = output[0]
            pred = output[1].cpu().numpy()
            pred[np.isnan(pred)] = 0
            preds.append(pred)
            total_loss += loss.item()

    average_loss = total_loss / (i + 1)
    preds = np.concatenate(preds, axis=0).astype(np.float32)
    ans = to_official(args, preds, features)
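    # Start from zero scores so the report below is always defined,
    # even when the model predicts no relation at all.
    best_f1, best_f1_ign, re_p, re_r = 0.0, 0.0, 0.0, 0.0
    if len(ans) > 0:
        best_f1, _, best_f1_ign, _, re_p, re_r = official_evaluate(ans, args.data_dir)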
    output = {
        tag + "_F1": best_f1 * 100,
        tag + "_F1_ign": best_f1_ign * 100,
        tag + "_re_p": re_p * 100,
        tag + "_re_r": re_r * 100,
        tag + "_average_loss": average_loss
    }
    return best_f1, output
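
The learning rates printed by `finetune` follow the linear warmup-then-decay shape that `get_linear_schedule_with_warmup` produces. The pure-Python sketch below recomputes the step counts and the multiplier applied to each parameter group's base rate. It assumes 365 batches per epoch, which is what the progress bars of the ruRoberta-large run later in this notebook show. `schedule_steps` and `linear_multiplier` are names invented for this example, not library functions.

```python
def schedule_steps(batches_per_epoch, num_epochs, accumulation_steps, warmup_ratio):
    """Total and warmup scheduler steps, computed the same way as in `finetune`."""
    total_steps = int(batches_per_epoch * num_epochs // accumulation_steps)
    warmup_steps = int(total_steps * warmup_ratio)
    return total_steps, warmup_steps


def linear_multiplier(step, warmup_steps, total_steps):
    """Factor applied to the base learning rate after `step` scheduler steps."""
    if step < warmup_steps:
        # Linear ramp from 0 up to 1 during warmup.
        return step / max(1, warmup_steps)
    # Linear decay from 1 down to 0 over the remaining steps.
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Assumed run: 365 batches per epoch, 20 epochs, accumulation over 2 batches.
total, warmup = schedule_steps(365, 20, 2, 0.06)
print(total, warmup)

bert_lr = 3e-05  # base rate of the encoder parameter group
print(bert_lr * linear_multiplier(100, warmup, total))  # still warming up
print(bert_lr * linear_multiplier(300, warmup, total))  # already decaying
```

With these inputs the sketch gives 3650 total and 219 warmup steps, matching the `Total steps` and `Warmup steps` lines in the training output, and the two products agree with the first learning rate logged at steps 100 and 300.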

In [None]:
# if not os.path.exists(os.path.join(cfg.data_dir, "train_distant.json")):
#     raise FileNotFoundError("Sorry, the file: 'train_distant.json' is too big to upload to github, \
#         please manually download to 'data/' from DocRED GoogleDrive https://drive.google.com/drive/folders/1c5-0YwnoJx8NS6CV2f-NoTHR__BdkNqw")


if torch.cuda.is_available():
    device = "cuda:0"
# elif torch.backends.mps.is_available():
#     device = "mps"
else:
    device = "cpu"


config = AutoConfig.from_pretrained(
    cfg.config_name if cfg.config_name else cfg.model_name_or_path,
    num_labels=cfg.num_class,
)
tokenizer = AutoTokenizer.from_pretrained(
    cfg.tokenizer_name if cfg.tokenizer_name else cfg.model_name_or_path,
)

Dataset = ReadDataset(cfg.dataset, tokenizer, cfg.max_seq_length, cfg.transformer_type)

train_file = os.path.join(cfg.data_dir, cfg.train_file)
dev_file = os.path.join(cfg.data_dir, cfg.dev_file)
test_file = os.path.join(cfg.data_dir, cfg.test_file)
train_features = Dataset.read(train_file)
dev_features = Dataset.read(dev_file)
test_features = Dataset.read(test_file)

model = AutoModel.from_pretrained(
    cfg.model_name_or_path,
    from_tf=bool(".ckpt" in cfg.model_name_or_path),
    config=config,
)


config.cls_token_id = tokenizer.cls_token_id
config.sep_token_id = tokenizer.sep_token_id
config.transformer_type = cfg.transformer_type

set_seed(cfg)
model = DocREModel(config, cfg,  model, num_labels=cfg.num_labels)
if cfg.train_from_saved_model != '':
    model.load_state_dict(torch.load(cfg.train_from_saved_model)["checkpoint"])
    print("load saved model from {}.".format(cfg.train_from_saved_model))

#if torch.cuda.device_count() > 1:
#    print("Let's use", torch.cuda.device_count(), "GPUs!")
#    model = torch.nn.DataParallel(model, device_ids = list(range(torch.cuda.device_count())))
model.to(device)

train(cfg, model, train_features, dev_features, test_features)

### ai-forever ruRoberta-large

In [None]:
class Config(object):
    adam_epsilon=1e-06
    bert_lr=3e-05
    channel_type='context-based'
    config_name=''
    data_dir='/content/drive/MyDrive/data/nerel_docred_ver'
    dataset='docred'
    dev_file='dev.json'
    down_dim=256
    evaluation_steps=-1
    gradient_accumulation_steps=2
    learning_rate=0.0003
    log_dir='/content/drive/MyDrive/Models/DeepKE_train/logs/train_ru_roberta.log'
    max_grad_norm=1.0
    max_height=103
    max_seq_length=512
    # model_name_or_path='xlm-roberta-base'
    model_name_or_path='ai-forever/ruRoberta-large'
    num_class=50
    num_labels=29
    num_train_epochs=20
    save_path='/content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_roberta.pt'
    seed=111
    test_batch_size=2
    test_file='test.json'
    tokenizer_name=''
    train_batch_size=2
    train_file='train_annotated_filter.json'
    train_from_saved_model=''
    transformer_type='roberta'
    unet_in_dim=3
    unet_out_dim=256
    warmup_ratio=0.06
    load_path='/content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_roberta.pt'

cfg = Config()

In [None]:
# if not os.path.exists(os.path.join(cfg.data_dir, "train_distant.json")):
#     raise FileNotFoundError("Sorry, the file: 'train_distant.json' is too big to upload to github, \
#         please manually download to 'data/' from DocRED GoogleDrive https://drive.google.com/drive/folders/1c5-0YwnoJx8NS6CV2f-NoTHR__BdkNqw")


if torch.cuda.is_available():
    device = "cuda:0"
# elif torch.backends.mps.is_available():
#     device = "mps"
else:
    device = "cpu"


config = AutoConfig.from_pretrained(
    cfg.config_name if cfg.config_name else cfg.model_name_or_path,
    num_labels=cfg.num_class,
)
tokenizer = AutoTokenizer.from_pretrained(
    cfg.tokenizer_name if cfg.tokenizer_name else cfg.model_name_or_path,
)

Dataset = ReadDataset(cfg.dataset, tokenizer, cfg.max_seq_length, cfg.transformer_type)

train_file = os.path.join(cfg.data_dir, cfg.train_file)
dev_file = os.path.join(cfg.data_dir, cfg.dev_file)
test_file = os.path.join(cfg.data_dir, cfg.test_file)
train_features = Dataset.read(train_file)
dev_features = Dataset.read(dev_file)
test_features = Dataset.read(test_file)

model = AutoModel.from_pretrained(
    cfg.model_name_or_path,
    from_tf=bool(".ckpt" in cfg.model_name_or_path),
    config=config,
)


config.cls_token_id = tokenizer.cls_token_id
config.sep_token_id = tokenizer.sep_token_id
config.transformer_type = cfg.transformer_type

set_seed(cfg)
model = DocREModel(config, cfg,  model, num_labels=cfg.num_labels)
if cfg.train_from_saved_model != '':
    model.load_state_dict(torch.load(cfg.train_from_saved_model)["checkpoint"])
    print("load saved model from {}.".format(cfg.train_from_saved_model))

#if torch.cuda.device_count() > 1:
#    print("Let's use", torch.cuda.device_count(), "GPUs!")
#    model = torch.nn.DataParallel(model, device_ids = list(range(torch.cuda.device_count())))
model.to(device)

train(cfg, model, train_features, dev_features, test_features)

load preprocessed data from /content/drive/MyDrive/data/nerel_docred_ver/train_annotated_filter_roberta_docred.pkl.
load preprocessed data from /content/drive/MyDrive/data/nerel_docred_ver/dev_roberta_docred.pkl.
load preprocessed data from /content/drive/MyDrive/data/nerel_docred_ver/test_roberta_docred.pkl.


Some weights of the model checkpoint at ai-forever/ruRoberta-large were not used when initializing RobertaModel: ['lm_head.dense.weight', 'lm_head.decoder.bias', 'lm_head.bias', 'lm_head.dense.bias', 'lm_head.layer_norm.bias', 'lm_head.layer_norm.weight', 'lm_head.decoder.weight']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Total steps: 3650
Warmup steps: 219


199it [02:04,  1.88it/s]

| epoch  0 | step  100 | min/b  2.07 | lr [1.3698630136986302e-05, 4.5662100456621006e-05, 0.000136986301369863] | train loss 1.701


363it [03:48,  2.05it/s]

-----------------------------------------------------------------------------------------
| epoch   0 | time:  6.34s | dev_result:{'dev_F1': 6.653283624892457, 'dev_F1_ign': 6.62401520730752, 'dev_re_p': 35.80246913580247, 'dev_re_r': 3.66740436294657, 'dev_average_loss': 0.18175558286144378}
-----------------------------------------------------------------------------------------
| epoch   0 | best_f1:0.06653283624892457
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_roberta.pt


364it [04:07,  6.03s/it]

-----------------------------------------------------------------------------------------


365it [04:07,  1.47it/s]
33it [00:21,  1.25it/s]

| epoch  1 | step  200 | min/b  0.36 | lr [2.7397260273972603e-05, 9.132420091324201e-05, 0.000273972602739726] | train loss 0.215


233it [02:27,  2.08it/s]

| epoch  1 | step  300 | min/b  2.10 | lr [2.929175167589624e-05, 9.763917225298747e-05, 0.0002929175167589624] | train loss 0.166


363it [03:49,  1.61it/s]

-----------------------------------------------------------------------------------------
| epoch   1 | time:  6.33s | dev_result:{'dev_F1': 23.289207987435493, 'dev_F1_ign': 23.077578661367205, 'dev_re_p': 40.108191653786704, 'dev_re_r': 16.4084729687006, 'dev_average_loss': 0.1367814280885331}
-----------------------------------------------------------------------------------------
| epoch   1 | best_f1:0.23289207987435492
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_roberta.pt


364it [04:08,  6.09s/it]

-----------------------------------------------------------------------------------------


365it [04:08,  1.47it/s]
67it [00:43,  1.52it/s]

| epoch  2 | step  400 | min/b  0.72 | lr [2.841737102885456e-05, 9.472457009618187e-05, 0.00028417371028854557] | train loss 0.136


267it [02:49,  1.52it/s]

| epoch  2 | step  500 | min/b  2.09 | lr [2.7542990381812884e-05, 9.180996793937629e-05, 0.00027542990381812884] | train loss 0.132


363it [03:48,  2.09it/s]

-----------------------------------------------------------------------------------------
| epoch   2 | time:  6.34s | dev_result:{'dev_F1': 25.836005497022445, 'dev_F1_ign': 25.66180247634058, 'dev_re_p': 46.88279301745636, 'dev_re_r': 17.831172937085046, 'dev_average_loss': 0.1328606033261786}
-----------------------------------------------------------------------------------------
| epoch   2 | best_f1:0.25836005497022446
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_roberta.pt


364it [04:07,  6.13s/it]

-----------------------------------------------------------------------------------------


365it [04:08,  1.47it/s]
101it [01:05,  1.63it/s]

| epoch  3 | step  600 | min/b  1.10 | lr [2.6668609734771204e-05, 8.889536578257068e-05, 0.000266686097347712] | train loss 0.117


301it [03:11,  1.82it/s]

| epoch  3 | step  700 | min/b  2.10 | lr [2.5794229087729524e-05, 8.598076362576508e-05, 0.0002579422908772952] | train loss 0.115


363it [03:51,  1.41it/s]

-----------------------------------------------------------------------------------------
| epoch   3 | time:  6.31s | dev_result:{'dev_F1': 34.836427939876216, 'dev_F1_ign': 34.48295104130536, 'dev_re_p': 57.89860396767082, 'dev_re_r': 24.913057224154283, 'dev_average_loss': 0.11305942584542518}
-----------------------------------------------------------------------------------------
| epoch   3 | best_f1:0.34836427939876213
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_roberta.pt


364it [04:11,  6.73s/it]

-----------------------------------------------------------------------------------------


365it [04:12,  1.45it/s]
135it [01:30,  1.43it/s]

| epoch  4 | step  800 | min/b  1.51 | lr [2.4919848440687847e-05, 8.30661614689595e-05, 0.00024919848440687844] | train loss 0.100


335it [03:31,  1.71it/s]

| epoch  4 | step  900 | min/b  2.01 | lr [2.4045467793646167e-05, 8.01515593121539e-05, 0.00024045467793646163] | train loss 0.099


363it [03:49,  1.50it/s]

-----------------------------------------------------------------------------------------


364it [03:55,  2.48s/it]

| epoch   4 | time:  6.37s | dev_result:{'dev_F1': 34.68261269549218, 'dev_F1_ign': 34.460672368937715, 'dev_re_p': 63.628691983122366, 'dev_re_r': 23.8381283591527, 'dev_average_loss': 0.10693065417890853}
-----------------------------------------------------------------------------------------


365it [03:56,  1.54it/s]
169it [01:46,  1.42it/s]

| epoch  5 | step 1000 | min/b  1.78 | lr [2.317108714660449e-05, 7.72369571553483e-05, 0.00023171087146604488] | train loss 0.089


363it [03:48,  1.53it/s]

-----------------------------------------------------------------------------------------
| epoch   5 | time:  6.32s | dev_result:{'dev_F1': 39.28881179531656, 'dev_F1_ign': 39.020463194033795, 'dev_re_p': 62.52587991718427, 'dev_re_r': 28.643692696806827, 'dev_average_loss': 0.10317925736308098}
-----------------------------------------------------------------------------------------
| epoch   5 | best_f1:0.3928881179531656
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_roberta.pt


364it [04:07,  6.09s/it]

-----------------------------------------------------------------------------------------


365it [04:07,  1.47it/s]
3it [00:01,  2.35it/s]

| epoch  6 | step 1100 | min/b  0.02 | lr [2.229670649956281e-05, 7.43223549985427e-05, 0.00022296706499562807] | train loss 0.086


203it [02:05,  1.52it/s]

| epoch  6 | step 1200 | min/b  2.06 | lr [2.1422325852521133e-05, 7.140775284173711e-05, 0.0002142232585252113] | train loss 0.081


363it [03:50,  1.44it/s]

-----------------------------------------------------------------------------------------
| epoch   6 | time:  6.44s | dev_result:{'dev_F1': 40.184757505773675, 'dev_F1_ign': 39.81879179082311, 'dev_re_p': 59.8125, 'dev_re_r': 30.256085994309203, 'dev_average_loss': 0.10445394747434779}
-----------------------------------------------------------------------------------------
| epoch   6 | best_f1:0.4018475750577367
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_roberta.pt


364it [04:09,  6.15s/it]

-----------------------------------------------------------------------------------------


365it [04:09,  1.46it/s]
37it [00:25,  1.32it/s]

| epoch  7 | step 1300 | min/b  0.43 | lr [2.0547945205479453e-05, 6.84931506849315e-05, 0.00020547945205479448] | train loss 0.075


237it [02:29,  1.45it/s]

| epoch  7 | step 1400 | min/b  2.06 | lr [1.9673564558437776e-05, 6.557854852812592e-05, 0.00019673564558437772] | train loss 0.073


363it [03:48,  1.58it/s]

-----------------------------------------------------------------------------------------
| epoch   7 | time:  6.28s | dev_result:{'dev_F1': 42.682412484183885, 'dev_F1_ign': 42.41327742032348, 'dev_re_p': 64.09119696010133, 'dev_re_r': 31.994941511223523, 'dev_average_loss': 0.10216029804754764}
-----------------------------------------------------------------------------------------
| epoch   7 | best_f1:0.42682412484183885
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_roberta.pt


364it [04:07,  6.09s/it]

-----------------------------------------------------------------------------------------


365it [04:08,  1.47it/s]
71it [00:45,  1.14it/s]

| epoch  8 | step 1500 | min/b  0.76 | lr [1.8799183911396096e-05, 6.266394637132032e-05, 0.00018799183911396092] | train loss 0.070


271it [02:50,  1.62it/s]

| epoch  8 | step 1600 | min/b  2.08 | lr [1.7924803264354415e-05, 5.974934421451473e-05, 0.00017924803264354416] | train loss 0.062


363it [03:48,  1.30it/s]

-----------------------------------------------------------------------------------------
| epoch   8 | time:  6.31s | dev_result:{'dev_F1': 44.19688069677942, 'dev_F1_ign': 43.81288572470303, 'dev_re_p': 61.49943630214205, 'dev_re_r': 34.49257034460955, 'dev_average_loss': 0.10364125280621204}
-----------------------------------------------------------------------------------------
| epoch   8 | best_f1:0.44196880696779417
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_roberta.pt


364it [04:07,  6.18s/it]

-----------------------------------------------------------------------------------------


365it [04:08,  1.47it/s]
105it [01:06,  1.47it/s]

| epoch  9 | step 1700 | min/b  1.12 | lr [1.7050422617312735e-05, 5.683474205770912e-05, 0.00017050422617312735] | train loss 0.061


305it [03:10,  1.71it/s]

| epoch  9 | step 1800 | min/b  2.06 | lr [1.617604197027106e-05, 5.392013990090353e-05, 0.00016176041970271057] | train loss 0.060


363it [03:48,  1.57it/s]

-----------------------------------------------------------------------------------------


364it [03:56,  3.05s/it]

| epoch   9 | time:  8.05s | dev_result:{'dev_F1': 44.161707367124976, 'dev_F1_ign': 43.80724510054457, 'dev_re_p': 62.92397660818714, 'dev_re_r': 34.01833702181473, 'dev_average_loss': 0.10452688556719333}
-----------------------------------------------------------------------------------------


365it [03:57,  1.54it/s]
139it [01:27,  1.58it/s]

| epoch 10 | step 1900 | min/b  1.46 | lr [1.5301661323229378e-05, 5.100553774409793e-05, 0.00015301661323229376] | train loss 0.056


339it [03:32,  1.44it/s]

| epoch 10 | step 2000 | min/b  2.09 | lr [1.44272806761877e-05, 4.809093558729233e-05, 0.00014427280676187698] | train loss 0.056


363it [03:47,  2.01it/s]

-----------------------------------------------------------------------------------------
| epoch  10 | time:  6.34s | dev_result:{'dev_F1': 46.664167916041976, 'dev_F1_ign': 46.1784746606631, 'dev_re_p': 57.2940635066728, 'dev_re_r': 39.36136579196965, 'dev_average_loss': 0.10657669278852483}
-----------------------------------------------------------------------------------------
| epoch  10 | best_f1:0.46664167916041976


364it [04:06,  6.22s/it]

| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_roberta.pt
-----------------------------------------------------------------------------------------


365it [04:07,  1.47it/s]
173it [01:51,  1.44it/s]

| epoch 11 | step 2100 | min/b  1.86 | lr [1.3552900029146021e-05, 4.517633343048674e-05, 0.0001355290002914602] | train loss 0.052


363it [03:49,  1.40it/s]

-----------------------------------------------------------------------------------------


364it [03:55,  2.46s/it]

| epoch  11 | time:  6.32s | dev_result:{'dev_F1': 45.42496949979666, 'dev_F1_ign': 45.098710312769136, 'dev_re_p': 63.64672364672364, 'dev_re_r': 35.314574770787225, 'dev_average_loss': 0.10674690827727318}
-----------------------------------------------------------------------------------------


365it [03:56,  1.54it/s]
7it [00:03,  1.88it/s]

| epoch 12 | step 2200 | min/b  0.07 | lr [1.2678519382104343e-05, 4.226173127368114e-05, 0.00012678519382104342] | train loss 0.048


207it [02:10,  1.51it/s]

| epoch 12 | step 2300 | min/b  2.11 | lr [1.1804138735062664e-05, 3.9347129116875545e-05, 0.00011804138735062662] | train loss 0.050


363it [03:48,  1.70it/s]

-----------------------------------------------------------------------------------------
| epoch  12 | time:  6.32s | dev_result:{'dev_F1': 46.94944799535154, 'dev_F1_ign': 46.527347970570844, 'dev_re_p': 60.6, 'dev_re_r': 38.318052481821056, 'dev_average_loss': 0.11013508841712424}
-----------------------------------------------------------------------------------------
| epoch  12 | best_f1:0.4694944799535154
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_roberta.pt


364it [04:10,  7.09s/it]

-----------------------------------------------------------------------------------------


365it [04:11,  1.45it/s]
41it [00:25,  1.55it/s]

| epoch 13 | step 2400 | min/b  0.43 | lr [1.0929758088020986e-05, 3.643252696006995e-05, 0.00010929758088020984] | train loss 0.045


241it [02:31,  1.67it/s]

| epoch 13 | step 2500 | min/b  2.10 | lr [1.0055377440979307e-05, 3.3517924803264355e-05, 0.00010055377440979306] | train loss 0.044


363it [03:49,  1.73it/s]

-----------------------------------------------------------------------------------------


364it [03:56,  2.46s/it]

| epoch  13 | time:  6.30s | dev_result:{'dev_F1': 46.25796178343949, 'dev_F1_ign': 45.85520283593599, 'dev_re_p': 62.439548629768936, 'dev_re_r': 36.737274739171674, 'dev_average_loss': 0.11061595301044748}
-----------------------------------------------------------------------------------------


365it [03:57,  1.54it/s]
75it [00:46,  1.41it/s]

| epoch 14 | step 2600 | min/b  0.77 | lr [9.180996793937627e-06, 3.0603322646458756e-05, 9.180996793937626e-05] | train loss 0.042


275it [02:52,  1.40it/s]

| epoch 14 | step 2700 | min/b  2.10 | lr [8.306616146895948e-06, 2.7688720489653165e-05, 8.306616146895948e-05] | train loss 0.041


363it [03:47,  1.61it/s]

-----------------------------------------------------------------------------------------


364it [03:54,  2.52s/it]

| epoch  14 | time:  6.40s | dev_result:{'dev_F1': 46.35787806809184, 'dev_F1_ign': 45.91136710040775, 'dev_re_p': 61.99047114875596, 'dev_re_r': 37.02181473284856, 'dev_average_loss': 0.11203355865275606}
-----------------------------------------------------------------------------------------


365it [03:55,  1.55it/s]
109it [01:09,  1.78it/s]

| epoch 15 | step 2800 | min/b  1.16 | lr [7.43223549985427e-06, 2.4774118332847566e-05, 7.43223549985427e-05] | train loss 0.039


309it [03:17,  1.46it/s]

| epoch 15 | step 2900 | min/b  2.13 | lr [6.557854852812591e-06, 2.185951617604197e-05, 6.55785485281259e-05] | train loss 0.038


363it [03:48,  1.54it/s]

-----------------------------------------------------------------------------------------


364it [03:55,  2.53s/it]

| epoch  15 | time:  6.36s | dev_result:{'dev_F1': 46.48053480141565, 'dev_F1_ign': 46.04538620606252, 'dev_re_p': 61.46645865834633, 'dev_re_r': 37.36958583623143, 'dev_average_loss': 0.11466999787916528}
-----------------------------------------------------------------------------------------


365it [03:56,  1.55it/s]
143it [01:31,  1.44it/s]

| epoch 16 | step 3000 | min/b  1.53 | lr [5.683474205770913e-06, 1.8944914019236376e-05, 5.683474205770912e-05] | train loss 0.038


343it [03:38,  1.01s/it]

| epoch 16 | step 3100 | min/b  2.11 | lr [4.809093558729234e-06, 1.603031186243078e-05, 4.809093558729233e-05] | train loss 0.035


363it [03:50,  1.74it/s]

-----------------------------------------------------------------------------------------


364it [03:56,  2.45s/it]

| epoch  16 | time:  6.33s | dev_result:{'dev_F1': 46.63765822784809, 'dev_F1_ign': 46.26880305374207, 'dev_re_p': 62.28209191759112, 'dev_re_r': 37.27473917167246, 'dev_average_loss': 0.1152498592880178}
-----------------------------------------------------------------------------------------


365it [03:57,  1.54it/s]
177it [01:52,  1.63it/s]

| epoch 17 | step 3200 | min/b  1.87 | lr [3.934712911687554e-06, 1.3115709705625181e-05, 3.934712911687554e-05] | train loss 0.033


363it [03:49,  1.98it/s]

-----------------------------------------------------------------------------------------


364it [03:55,  2.32s/it]

| epoch  17 | time:  6.35s | dev_result:{'dev_F1': 46.90972899200624, 'dev_F1_ign': 46.46964940797855, 'dev_re_p': 61.190233977619535, 'dev_re_r': 38.03351248814417, 'dev_average_loss': 0.11621093298209474}
-----------------------------------------------------------------------------------------


365it [03:56,  1.54it/s]
11it [00:06,  1.56it/s]

| epoch 18 | step 3300 | min/b  0.12 | lr [3.060332264645876e-06, 1.0201107548819588e-05, 3.0603322646458756e-05] | train loss 0.034


211it [02:16,  1.54it/s]

| epoch 18 | step 3400 | min/b  2.16 | lr [2.185951617604197e-06, 7.286505392013991e-06, 2.185951617604197e-05] | train loss 0.032


363it [03:48,  1.69it/s]

-----------------------------------------------------------------------------------------


364it [03:55,  2.51s/it]

| epoch  18 | time:  6.38s | dev_result:{'dev_F1': 46.52544050683033, 'dev_F1_ign': 46.11928829232619, 'dev_re_p': 62.23516949152542, 'dev_re_r': 37.14827695226051, 'dev_average_loss': 0.11702844968184511}
-----------------------------------------------------------------------------------------


365it [03:56,  1.55it/s]
45it [00:27,  1.35it/s]

| epoch 19 | step 3500 | min/b  0.45 | lr [1.3115709705625184e-06, 4.371903235208394e-06, 1.3115709705625181e-05] | train loss 0.032


245it [02:31,  1.76it/s]

| epoch 19 | step 3600 | min/b  2.07 | lr [4.371903235208394e-07, 1.4573010784027982e-06, 4.371903235208393e-06] | train loss 0.030


363it [03:48,  1.35it/s]

-----------------------------------------------------------------------------------------
| epoch  19 | time:  8.07s | dev_result:{'dev_F1': 47.204724409448815, 'dev_F1_ign': 46.76990738756132, 'dev_re_p': 62.54564423578508, 'dev_re_r': 37.90705026873222, 'dev_average_loss': 0.1168784349364169}
-----------------------------------------------------------------------------------------
| epoch  19 | best_f1:0.47204724409448817
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_roberta.pt


364it [04:09,  6.88s/it]

-----------------------------------------------------------------------------------------


365it [04:10,  1.46it/s]
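
The `dev_F1` value in each `dev_result` line above is consistent with the harmonic mean of the logged precision (`dev_re_p`) and recall (`dev_re_r`). As a minimal sketch, assuming that is how the metric is derived, the epoch 19 numbers can be checked as follows. The helper `f1_from_pr` is hypothetical and is not part of DeepKE.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both on the same scale, here percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the epoch 19 dev_result line of the run above.
dev_re_p = 62.54564423578508
dev_re_r = 37.90705026873222
logged_dev_f1 = 47.204724409448815

computed_f1 = f1_from_pr(dev_re_p, dev_re_r)
print(f"computed F1 = {computed_f1:.4f}, logged F1 = {logged_dev_f1:.4f}")
```

At epoch 19 recall (about 38) is well below precision (about 63), so recall is the limiting factor for F1 in this run.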


Rerun with corrected parameters (ruRoberta-large):

In [None]:
class Config(object):
    adam_epsilon=1e-06
    bert_lr=3e-05
    channel_type='context-based'
    config_name=''
    data_dir='/content/drive/MyDrive/data/nerel_docred_ver'
    dataset='docred'
    dev_file='dev.json'
    down_dim=256
    evaluation_steps=-1
    gradient_accumulation_steps=2
    learning_rate=0.0003
    log_dir='/content/drive/MyDrive/Models/DeepKE_train/logs/train_ru_roberta2.log'
    max_grad_norm=1.0
    max_height=103
    max_seq_length=1024
    # model_name_or_path='xlm-roberta-base'
    model_name_or_path='ai-forever/ruRoberta-large'
    num_class=50
    num_labels=2
    num_train_epochs=20
    save_path='/content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_roberta2.pt'
    seed=111
    test_batch_size=2
    test_file='test.json'
    tokenizer_name=''
    train_batch_size=2
    train_file='train_annotated_filter.json'
    train_from_saved_model=''
    transformer_type='roberta'
    unet_in_dim=3
    unet_out_dim=256
    warmup_ratio=0.06
    load_path='/content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_roberta2.pt'

cfg = Config()

In [None]:
# if not os.path.exists(os.path.join(cfg.data_dir, "train_distant.json")):
#     raise FileNotFoundError("Sorry, the file: 'train_annotated.json' is too big to upload to github, \
#         please manually download to 'data/' from DocRED GoogleDrive https://drive.google.com/drive/folders/1c5-0YwnoJx8NS6CV2f-NoTHR__BdkNqw")


if torch.cuda.is_available():
    device = "cuda:0"
# elif torch.backends.mps.is_available():
#     device = "mps"
else:
    device = "cpu"


config = AutoConfig.from_pretrained(
    cfg.config_name if cfg.config_name else cfg.model_name_or_path,
    num_labels=cfg.num_class,
)
tokenizer = AutoTokenizer.from_pretrained(
    cfg.tokenizer_name if cfg.tokenizer_name else cfg.model_name_or_path,
)

Dataset = ReadDataset(cfg.dataset, tokenizer, cfg.max_seq_length, cfg.transformer_type)

train_file = os.path.join(cfg.data_dir, cfg.train_file)
dev_file = os.path.join(cfg.data_dir, cfg.dev_file)
test_file = os.path.join(cfg.data_dir, cfg.test_file)
train_features = Dataset.read(train_file)
dev_features = Dataset.read(dev_file)
test_features = Dataset.read(test_file)

model = AutoModel.from_pretrained(
    cfg.model_name_or_path,
    from_tf=bool(".ckpt" in cfg.model_name_or_path),
    config=config,
)


config.cls_token_id = tokenizer.cls_token_id
config.sep_token_id = tokenizer.sep_token_id
config.transformer_type = cfg.transformer_type

set_seed(cfg)
model = DocREModel(config, cfg, model, num_labels=cfg.num_labels)
if cfg.train_from_saved_model != '':
    # Resume from a saved checkpoint; map_location keeps loading working on CPU-only machines.
    model.load_state_dict(torch.load(cfg.train_from_saved_model, map_location=device)["checkpoint"])
    print("load saved model from {}.".format(cfg.train_from_saved_model))

#if torch.cuda.device_count() > 1:
#    print("Let's use", torch.cuda.device_count(), "GPUs!")
#    model = torch.nn.DataParallel(model, device_ids = list(range(torch.cuda.device_count())))
model.to(device)

train(cfg, model, train_features, dev_features, test_features)

load preprocessed data from /content/drive/MyDrive/data/nerel_docred_ver/train_annotated_filter_roberta_docred.pkl.
load preprocessed data from /content/drive/MyDrive/data/nerel_docred_ver/dev_roberta_docred.pkl.
load preprocessed data from /content/drive/MyDrive/data/nerel_docred_ver/test_roberta_docred.pkl.


Some weights of the model checkpoint at ai-forever/ruRoberta-large were not used when initializing RobertaModel: ['lm_head.bias', 'lm_head.dense.weight', 'lm_head.decoder.weight', 'lm_head.layer_norm.bias', 'lm_head.dense.bias', 'lm_head.layer_norm.weight', 'lm_head.decoder.bias']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Total steps: 3650
Warmup steps: 219


199it [02:26,  1.54it/s]

| epoch  0 | step  100 | min/b  2.44 | lr [1.3698630136986302e-05, 4.5662100456621006e-05, 0.000136986301369863] | train loss 1.544


363it [04:28,  1.76it/s]

-----------------------------------------------------------------------------------------
| epoch   0 | time:  8.34s | dev_result:{'dev_F1': 10.040485829959515, 'dev_F1_ign': 9.956565603864153, 'dev_re_p': 34.31734317343174, 'dev_re_r': 5.880493202655707, 'dev_average_loss': 0.16144240869486587}
-----------------------------------------------------------------------------------------
| epoch   0 | best_f1:0.10040485829959514


364it [04:47,  6.37s/it]

| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_roberta2.pt
-----------------------------------------------------------------------------------------


365it [04:48,  1.26it/s]
33it [00:25,  1.06it/s]

| epoch  1 | step  200 | min/b  0.43 | lr [2.7397260273972603e-05, 9.132420091324201e-05, 0.000273972602739726] | train loss 0.184


233it [02:53,  1.79it/s]

| epoch  1 | step  300 | min/b  2.46 | lr [2.929175167589624e-05, 9.763917225298747e-05, 0.0002929175167589624] | train loss 0.151


363it [04:29,  1.36it/s]

-----------------------------------------------------------------------------------------
| epoch   1 | time:  7.56s | dev_result:{'dev_F1': 20.631182289213378, 'dev_F1_ign': 20.435281317198623, 'dev_re_p': 40.443213296398895, 'dev_re_r': 13.847613025608599, 'dev_average_loss': 0.1325813894100646}
-----------------------------------------------------------------------------------------
| epoch   1 | best_f1:0.2063118228921338
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_roberta2.pt


364it [04:50,  6.70s/it]

-----------------------------------------------------------------------------------------


365it [04:51,  1.25it/s]
67it [00:50,  1.34it/s]

| epoch  2 | step  400 | min/b  0.85 | lr [2.841737102885456e-05, 9.472457009618187e-05, 0.00028417371028854557] | train loss 0.122


267it [03:18,  1.30it/s]

| epoch  2 | step  500 | min/b  2.46 | lr [2.7542990381812884e-05, 9.180996793937629e-05, 0.00027542990381812884] | train loss 0.119


363it [04:28,  1.81it/s]

-----------------------------------------------------------------------------------------
| epoch   2 | time:  7.54s | dev_result:{'dev_F1': 24.233716475095786, 'dev_F1_ign': 24.094418497094523, 'dev_re_p': 49.950641658440276, 'dev_re_r': 15.997470755611761, 'dev_average_loss': 0.11342079826491944}
-----------------------------------------------------------------------------------------
| epoch   2 | best_f1:0.24233716475095785
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_roberta2.pt


364it [04:48,  6.57s/it]

-----------------------------------------------------------------------------------------


365it [04:49,  1.26it/s]
101it [01:14,  1.35it/s]

| epoch  3 | step  600 | min/b  1.25 | lr [2.6668609734771204e-05, 8.889536578257068e-05, 0.000266686097347712] | train loss 0.100


301it [03:44,  1.56it/s]

| epoch  3 | step  700 | min/b  2.50 | lr [2.5794229087729524e-05, 8.598076362576508e-05, 0.0002579422908772952] | train loss 0.095


363it [04:30,  1.27it/s]

-----------------------------------------------------------------------------------------
| epoch   3 | time:  7.58s | dev_result:{'dev_F1': 36.59095644926026, 'dev_F1_ign': 36.21608138396909, 'dev_re_p': 53.66748166259169, 'dev_re_r': 27.75845716092317, 'dev_average_loss': 0.09945883578125467}
-----------------------------------------------------------------------------------------
| epoch   3 | best_f1:0.36590956449260265
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_roberta2.pt


364it [04:50,  6.65s/it]

-----------------------------------------------------------------------------------------


365it [04:51,  1.25it/s]
135it [01:46,  1.20it/s]

| epoch  4 | step  800 | min/b  1.77 | lr [2.4919848440687847e-05, 8.30661614689595e-05, 0.00024919848440687844] | train loss 0.080


335it [04:08,  1.41it/s]

| epoch  4 | step  900 | min/b  2.37 | lr [2.4045467793646167e-05, 8.01515593121539e-05, 0.00024045467793646163] | train loss 0.080


363it [04:29,  1.26it/s]

-----------------------------------------------------------------------------------------
| epoch   4 | time:  7.51s | dev_result:{'dev_F1': 42.027863777089784, 'dev_F1_ign': 41.534504554684396, 'dev_re_p': 54.1645885286783, 'dev_re_r': 34.33449257034461, 'dev_average_loss': 0.09532955415705417}
-----------------------------------------------------------------------------------------
| epoch   4 | best_f1:0.42027863777089786
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_roberta2.pt


364it [04:49,  6.59s/it]

-----------------------------------------------------------------------------------------


365it [04:50,  1.26it/s]
169it [02:06,  1.16it/s]

| epoch  5 | step 1000 | min/b  2.11 | lr [2.317108714660449e-05, 7.72369571553483e-05, 0.00023171087146604488] | train loss 0.069


363it [04:30,  1.28it/s]

-----------------------------------------------------------------------------------------


364it [04:38,  2.90s/it]

| epoch   5 | time:  7.53s | dev_result:{'dev_F1': 42.00836820083682, 'dev_F1_ign': 41.71615262737326, 'dev_re_p': 62.09029066171924, 'dev_re_r': 31.74201707239962, 'dev_average_loss': 0.09165363925251555}
-----------------------------------------------------------------------------------------


365it [04:39,  1.31it/s]
3it [00:01,  1.96it/s]

| epoch  6 | step 1100 | min/b  0.02 | lr [2.229670649956281e-05, 7.43223549985427e-05, 0.00022296706499562807] | train loss 0.066


203it [02:26,  1.22it/s]

| epoch  6 | step 1200 | min/b  2.41 | lr [2.1422325852521133e-05, 7.140775284173711e-05, 0.0002142232585252113] | train loss 0.061


363it [04:29,  1.26it/s]

-----------------------------------------------------------------------------------------
| epoch   6 | time:  7.50s | dev_result:{'dev_F1': 46.74624566807855, 'dev_F1_ign': 46.28049295700294, 'dev_re_p': 59.77351058591827, 'dev_re_r': 38.38128359152703, 'dev_average_loss': 0.08940465621491696}
-----------------------------------------------------------------------------------------
| epoch   6 | best_f1:0.4674624566807855
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_roberta2.pt


364it [04:49,  6.57s/it]

-----------------------------------------------------------------------------------------


365it [04:49,  1.26it/s]
37it [00:30,  1.10it/s]

| epoch  7 | step 1300 | min/b  0.50 | lr [2.0547945205479453e-05, 6.84931506849315e-05, 0.00020547945205479448] | train loss 0.054


237it [02:55,  1.25it/s]

| epoch  7 | step 1400 | min/b  2.42 | lr [1.9673564558437776e-05, 6.557854852812592e-05, 0.00019673564558437772] | train loss 0.051


363it [04:28,  1.29it/s]

-----------------------------------------------------------------------------------------
| epoch   7 | time:  8.70s | dev_result:{'dev_F1': 47.22168063950088, 'dev_F1_ign': 46.85582483571963, 'dev_re_p': 61.5971515768057, 'dev_re_r': 38.28643692696807, 'dev_average_loss': 0.09021336418834139}
-----------------------------------------------------------------------------------------
| epoch   7 | best_f1:0.4722168063950088
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_roberta2.pt


364it [04:50,  7.01s/it]

-----------------------------------------------------------------------------------------


365it [04:50,  1.25it/s]
71it [00:53,  1.06s/it]

| epoch  8 | step 1500 | min/b  0.90 | lr [1.8799183911396096e-05, 6.266394637132032e-05, 0.00018799183911396092] | train loss 0.049


271it [03:20,  1.35it/s]

| epoch  8 | step 1600 | min/b  2.44 | lr [1.7924803264354415e-05, 5.974934421451473e-05, 0.00017924803264354416] | train loss 0.044


363it [04:28,  1.09it/s]

-----------------------------------------------------------------------------------------
| epoch   8 | time:  7.56s | dev_result:{'dev_F1': 50.479117700235044, 'dev_F1_ign': 50.02202724446194, 'dev_re_p': 58.952702702702695, 'dev_re_r': 44.13531457477079, 'dev_average_loss': 0.09158211395620032}
-----------------------------------------------------------------------------------------
| epoch   8 | best_f1:0.5047911770023504
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_roberta2.pt
-----------------------------------------------------------------------------------------


365it [04:52,  1.25it/s]
105it [01:19,  1.23it/s]

| epoch  9 | step 1700 | min/b  1.32 | lr [1.7050422617312735e-05, 5.683474205770912e-05, 0.00017050422617312735] | train loss 0.042


305it [03:43,  1.46it/s]

| epoch  9 | step 1800 | min/b  2.41 | lr [1.617604197027106e-05, 5.392013990090353e-05, 0.00016176041970271057] | train loss 0.041


363it [04:27,  1.31it/s]

-----------------------------------------------------------------------------------------


364it [04:37,  3.36s/it]

| epoch   9 | time:  8.69s | dev_result:{'dev_F1': 49.73850315599639, 'dev_F1_ign': 49.176126034076354, 'dev_re_p': 57.892527287993275, 'dev_re_r': 43.59785014227, 'dev_average_loss': 0.09339467483632108}
-----------------------------------------------------------------------------------------


365it [04:38,  1.31it/s]
139it [01:43,  1.29it/s]

| epoch 10 | step 1900 | min/b  1.72 | lr [1.5301661323229378e-05, 5.100553774409793e-05, 0.00015301661323229376] | train loss 0.037


339it [04:09,  1.25it/s]

| epoch 10 | step 2000 | min/b  2.45 | lr [1.44272806761877e-05, 4.809093558729233e-05, 0.00014427280676187698] | train loss 0.036


363it [04:27,  1.70it/s]

-----------------------------------------------------------------------------------------
| epoch  10 | time:  7.60s | dev_result:{'dev_F1': 52.3306627822287, 'dev_F1_ign': 51.82131154123617, 'dev_re_p': 61.70030055817948, 'dev_re_r': 45.43155232374328, 'dev_average_loss': 0.09249015596318752}
-----------------------------------------------------------------------------------------
| epoch  10 | best_f1:0.523306627822287
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_roberta2.pt


364it [04:50,  7.53s/it]

-----------------------------------------------------------------------------------------


365it [04:51,  1.25it/s]
173it [02:11,  1.23it/s]

| epoch 11 | step 2100 | min/b  2.20 | lr [1.3552900029146021e-05, 4.517633343048674e-05, 0.0001355290002914602] | train loss 0.030


363it [04:29,  1.15it/s]

-----------------------------------------------------------------------------------------


364it [04:37,  2.91s/it]

| epoch  11 | time:  7.46s | dev_result:{'dev_F1': 49.47079576636612, 'dev_F1_ign': 49.0536874168269, 'dev_re_p': 65.08509541000515, 'dev_re_r': 39.89883022447044, 'dev_average_loss': 0.0967245489517425}
-----------------------------------------------------------------------------------------


365it [04:38,  1.31it/s]
7it [00:04,  1.65it/s]

| epoch 12 | step 2200 | min/b  0.08 | lr [1.2678519382104343e-05, 4.226173127368114e-05, 0.00012678519382104342] | train loss 0.030


207it [02:34,  1.29it/s]

| epoch 12 | step 2300 | min/b  2.51 | lr [1.1804138735062664e-05, 3.9347129116875545e-05, 0.00011804138735062662] | train loss 0.028


363it [04:29,  1.45it/s]

-----------------------------------------------------------------------------------------


364it [04:37,  2.93s/it]

| epoch  12 | time:  7.51s | dev_result:{'dev_F1': 51.75533840028954, 'dev_F1_ign': 51.2493610863367, 'dev_re_p': 60.516292848074485, 'dev_re_r': 45.21024343977237, 'dev_average_loss': 0.09697179385322205}
-----------------------------------------------------------------------------------------


365it [04:38,  1.31it/s]
41it [00:30,  1.35it/s]

| epoch 13 | step 2400 | min/b  0.50 | lr [1.0929758088020986e-05, 3.643252696006995e-05, 0.00010929758088020984] | train loss 0.027


241it [02:55,  1.47it/s]

| epoch 13 | step 2500 | min/b  2.43 | lr [1.0055377440979307e-05, 3.3517924803264355e-05, 0.00010055377440979306] | train loss 0.025


363it [04:28,  1.43it/s]

-----------------------------------------------------------------------------------------
| epoch  13 | time:  7.59s | dev_result:{'dev_F1': 52.48995282194653, 'dev_F1_ign': 51.92134738959847, 'dev_re_p': 58.67187499999999, 'dev_re_r': 47.48656338918748, 'dev_average_loss': 0.09791959909365532}
-----------------------------------------------------------------------------------------
| epoch  13 | best_f1:0.5248995282194653
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_roberta2.pt


364it [04:48,  6.59s/it]

-----------------------------------------------------------------------------------------


365it [04:49,  1.26it/s]
75it [00:54,  1.17it/s]

| epoch 14 | step 2600 | min/b  0.91 | lr [9.180996793937627e-06, 3.0603322646458756e-05, 9.180996793937626e-05] | train loss 0.023


275it [03:23,  1.19it/s]

| epoch 14 | step 2700 | min/b  2.48 | lr [8.306616146895948e-06, 2.7688720489653165e-05, 8.306616146895948e-05] | train loss 0.022


363it [04:28,  1.35it/s]

-----------------------------------------------------------------------------------------


364it [04:37,  2.97s/it]

| epoch  14 | time:  7.53s | dev_result:{'dev_F1': 51.60567587752054, 'dev_F1_ign': 51.148697997487105, 'dev_re_p': 63.01869585043319, 'dev_re_r': 43.69269680682896, 'dev_average_loss': 0.10028378047207569}
-----------------------------------------------------------------------------------------


365it [04:37,  1.31it/s]
109it [01:21,  1.57it/s]

| epoch 15 | step 2800 | min/b  1.35 | lr [7.43223549985427e-06, 2.4774118332847566e-05, 7.43223549985427e-05] | train loss 0.020


309it [03:51,  1.21it/s]

| epoch 15 | step 2900 | min/b  2.50 | lr [6.557854852812591e-06, 2.185951617604197e-05, 6.55785485281259e-05] | train loss 0.019


363it [04:27,  1.24it/s]

-----------------------------------------------------------------------------------------


364it [04:35,  3.03s/it]

| epoch  15 | time:  7.56s | dev_result:{'dev_F1': 51.545253863134654, 'dev_F1_ign': 51.08178460714902, 'dev_re_p': 61.63660360756709, 'dev_re_r': 44.29339234903573, 'dev_average_loss': 0.10383129405214432}
-----------------------------------------------------------------------------------------


365it [04:36,  1.32it/s]
143it [01:48,  1.24it/s]

| epoch 16 | step 3000 | min/b  1.80 | lr [5.683474205770913e-06, 1.8944914019236376e-05, 5.683474205770912e-05] | train loss 0.018


343it [04:15,  1.29it/s]

| epoch 16 | step 3100 | min/b  2.45 | lr [4.809093558729234e-06, 1.603031186243078e-05, 4.809093558729233e-05] | train loss 0.019


363it [04:29,  1.50it/s]

-----------------------------------------------------------------------------------------
| epoch  16 | time:  7.64s | dev_result:{'dev_F1': 52.72561531449408, 'dev_F1_ign': 52.204900858754364, 'dev_re_p': 62.273901808785524, 'dev_re_r': 45.71609231742017, 'dev_average_loss': 0.10287534017512139}
-----------------------------------------------------------------------------------------
| epoch  16 | best_f1:0.5272561531449408
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_roberta2.pt


364it [04:50,  6.68s/it]

-----------------------------------------------------------------------------------------


365it [04:50,  1.26it/s]
177it [02:12,  1.48it/s]

| epoch 17 | step 3200 | min/b  2.21 | lr [3.934712911687554e-06, 1.3115709705625181e-05, 3.934712911687554e-05] | train loss 0.017


363it [04:29,  1.65it/s]

-----------------------------------------------------------------------------------------
| epoch  17 | time:  7.48s | dev_result:{'dev_F1': 53.030852994555346, 'dev_F1_ign': 52.53501816445373, 'dev_re_p': 62.24968044311887, 'dev_re_r': 46.190325640214986, 'dev_average_loss': 0.10495920289070049}
-----------------------------------------------------------------------------------------
| epoch  17 | best_f1:0.5303085299455534
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_roberta2.pt


364it [04:48,  6.34s/it]

-----------------------------------------------------------------------------------------


365it [04:49,  1.26it/s]
11it [00:08,  1.30it/s]

| epoch 18 | step 3300 | min/b  0.14 | lr [3.060332264645876e-06, 1.0201107548819588e-05, 3.0603322646458756e-05] | train loss 0.016


211it [02:41,  1.22it/s]

| epoch 18 | step 3400 | min/b  2.55 | lr [2.185951617604197e-06, 7.286505392013991e-06, 2.185951617604197e-05] | train loss 0.015


363it [04:30,  1.42it/s]

-----------------------------------------------------------------------------------------
| epoch  18 | time:  7.53s | dev_result:{'dev_F1': 53.56887937187723, 'dev_F1_ign': 53.034872300048576, 'dev_re_p': 61.49119213437116, 'dev_re_r': 47.45494783433449, 'dev_average_loss': 0.1049495409143732}
-----------------------------------------------------------------------------------------
| epoch  18 | best_f1:0.5356887937187723
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_roberta2.pt


364it [04:50,  6.56s/it]

-----------------------------------------------------------------------------------------


365it [04:51,  1.25it/s]
45it [00:32,  1.10it/s]

| epoch 19 | step 3500 | min/b  0.54 | lr [1.3115709705625184e-06, 4.371903235208394e-06, 1.3115709705625181e-05] | train loss 0.016


245it [02:58,  1.52it/s]

| epoch 19 | step 3600 | min/b  2.44 | lr [4.371903235208394e-07, 1.4573010784027982e-06, 4.371903235208393e-06] | train loss 0.014


363it [04:29,  1.17it/s]

-----------------------------------------------------------------------------------------
| epoch  19 | time:  7.80s | dev_result:{'dev_F1': 53.72755305641211, 'dev_F1_ign': 53.21047837030704, 'dev_re_p': 63.02127659574468, 'dev_re_r': 46.822636737274735, 'dev_average_loss': 0.10549144724265058}
-----------------------------------------------------------------------------------------
| epoch  19 | best_f1:0.5372755305641211
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_roberta2.pt


364it [04:49,  6.80s/it]

-----------------------------------------------------------------------------------------


365it [04:50,  1.26it/s]
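
The `Total steps`, `Warmup steps`, and `lr` values printed in this run are consistent with a linear warmup followed by a linear decay. The sketch below reproduces them from the `Config` values under two assumptions: 365 batches per epoch (read from the `365it` progress lines), and base learning rates of 3e-5, 1e-4, and 3e-4 for the three parameter groups. The middle rate is inferred from the log rather than taken from `Config`, and `linear_schedule_lr` is a hypothetical helper, not DeepKE's own code.

```python
def linear_schedule_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup to base_lr, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0, total_steps - step) / max(1, total_steps - warmup_steps)


batches_per_epoch = 365          # assumption: read from the "365it" progress lines
num_train_epochs = 20            # cfg.num_train_epochs
gradient_accumulation_steps = 2  # cfg.gradient_accumulation_steps
warmup_ratio = 0.06              # cfg.warmup_ratio

# One optimizer step is taken every `gradient_accumulation_steps` batches.
total_steps = batches_per_epoch * num_train_epochs // gradient_accumulation_steps
warmup_steps = int(total_steps * warmup_ratio)
print("Total steps:", total_steps)
print("Warmup steps:", warmup_steps)

# Base rates per parameter group; the middle value is inferred from the log.
base_lrs = [3e-5, 1e-4, 3e-4]
for step in (100, 300, 3600):
    rates = [linear_schedule_lr(step, lr, warmup_steps, total_steps) for lr in base_lrs]
    print(step, rates)
```

Because `total_steps` depends on `num_train_epochs` and `gradient_accumulation_steps`, changing either one reshapes the whole decay curve, which is worth keeping in mind when comparing runs with different epoch counts.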


## ruBert-large

In [None]:
class Config(object):
    adam_epsilon=1e-06
    bert_lr=3e-05
    channel_type='context-based'
    config_name=''
    data_dir='/content/drive/MyDrive/data/nerel_docred_ver'
    dataset='docred'
    dev_file='dev.json'
    down_dim=256
    evaluation_steps=-1
    gradient_accumulation_steps=2
    learning_rate=0.0003
    log_dir='/content/drive/MyDrive/Models/DeepKE_train/logs/train_ru_bertL.log'
    max_grad_norm=1.0
    max_height=103
    max_seq_length=512
    # model_name_or_path='xlm-roberta-base'
    model_name_or_path='ai-forever/ruBert-large'
    num_class=50
    num_labels=29
    num_train_epochs=15
    save_path='/content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_bertL.pt'
    seed=111
    test_batch_size=2
    test_file='test.json'
    tokenizer_name=''
    train_batch_size=2
    train_file='train_annotated_filter.json'
    train_from_saved_model=''
    transformer_type='bert'
    unet_in_dim=3
    unet_out_dim=256
    warmup_ratio=0.06
    load_path='/content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_bertL.pt'

cfg = Config()

In [None]:
# if not os.path.exists(os.path.join(cfg.data_dir, "train_distant.json")):
#     raise FileNotFoundError("Sorry, the file: 'train_annotated.json' is too big to upload to github, \
#         please manually download to 'data/' from DocRED GoogleDrive https://drive.google.com/drive/folders/1c5-0YwnoJx8NS6CV2f-NoTHR__BdkNqw")


if torch.cuda.is_available():
    device = "cuda:0"
# elif torch.backends.mps.is_available():
#     device = "mps"
else:
    device = "cpu"


config = AutoConfig.from_pretrained(
    cfg.config_name if cfg.config_name else cfg.model_name_or_path,
    num_labels=cfg.num_class,
)
tokenizer = AutoTokenizer.from_pretrained(
    cfg.tokenizer_name if cfg.tokenizer_name else cfg.model_name_or_path,
)

Dataset = ReadDataset(cfg.dataset, tokenizer, cfg.max_seq_length, cfg.transformer_type)

train_file = os.path.join(cfg.data_dir, cfg.train_file)
dev_file = os.path.join(cfg.data_dir, cfg.dev_file)
test_file = os.path.join(cfg.data_dir, cfg.test_file)
train_features = Dataset.read(train_file)
dev_features = Dataset.read(dev_file)
test_features = Dataset.read(test_file)

model = AutoModel.from_pretrained(
    cfg.model_name_or_path,
    from_tf=bool(".ckpt" in cfg.model_name_or_path),
    config=config,
)


config.cls_token_id = tokenizer.cls_token_id
config.sep_token_id = tokenizer.sep_token_id
config.transformer_type = cfg.transformer_type

set_seed(cfg)
model = DocREModel(config, cfg,  model, num_labels=cfg.num_labels)
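if cfg.train_from_saved_model != '':
    # map_location keeps the load working when the checkpoint was saved on a different device.
    model.load_state_dict(torch.load(cfg.train_from_saved_model, map_location=device)["checkpoint"])
    print("load saved model from {}.".format(cfg.train_from_saved_model))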

#if torch.cuda.device_count() > 1:
#    print("Let's use", torch.cuda.device_count(), "GPUs!")
#    model = torch.nn.DataParallel(model, device_ids = list(range(torch.cuda.device_count())))
model.to(device)

train(cfg, model, train_features, dev_features, test_features)

load preprocessed data from /content/drive/MyDrive/data/nerel_docred_ver/train_annotated_filter_bert_docred.pkl.
load preprocessed data from /content/drive/MyDrive/data/nerel_docred_ver/dev_bert_docred.pkl.
load preprocessed data from /content/drive/MyDrive/data/nerel_docred_ver/test_bert_docred.pkl.


Some weights of the model checkpoint at ai-forever/ruBert-large were not used when initializing BertModel: ['cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Total steps: 2737
Warmup steps: 164
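
The two counts above can be reproduced from the configuration. The sketch below assumes the schedule is sized in optimizer updates, that is, batches per epoch times `num_train_epochs`, integer-divided by `gradient_accumulation_steps`, with `warmup_ratio` applied to that total. The 365 batches per epoch are read off the `365it` progress lines in this log. The helper name `schedule_sizes` is hypothetical; this is a reconstruction of the arithmetic, not DeepKE's own code.

```python
def schedule_sizes(batches_per_epoch, num_train_epochs,
                   gradient_accumulation_steps, warmup_ratio):
    """Return (total optimizer steps, warmup steps) under the stated assumptions."""
    # One optimizer update happens every `gradient_accumulation_steps` batches.
    total_steps = batches_per_epoch * num_train_epochs // gradient_accumulation_steps
    # Warmup covers a fixed fraction of all updates, rounded down.
    warmup_steps = int(total_steps * warmup_ratio)
    return total_steps, warmup_steps


total_steps, warmup_steps = schedule_sizes(
    batches_per_epoch=365,          # read from the progress bar, not from Config
    num_train_epochs=15,
    gradient_accumulation_steps=2,
    warmup_ratio=0.06,
)
print(f"Total steps: {total_steps}")
print(f"Warmup steps: {warmup_steps}")
```

With these inputs the helper gives 2737 total steps and 164 warmup steps, matching the log.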


199it [02:03,  2.00it/s]

| epoch  0 | step  100 | min/b  2.06 | lr [1.8292682926829268e-05, 6.097560975609756e-05, 0.00018292682926829266] | train loss 1.736
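
The three numbers in `lr [...]` are consistent with three parameter groups whose peak rates are `3e-05` (`bert_lr`), `1e-04`, and `3e-04` (`learning_rate`), following a linear warmup over 164 steps and then a linear decay to zero at step 2737. The middle peak of `1e-04` is inferred from the logged values, not taken from `Config`, and the function `linear_lr` is a hypothetical helper written for this illustration. A minimal sketch under those assumptions:

```python
def linear_lr(peak_lr, step, warmup_steps=164, total_steps=2737):
    """Learning rate after `step` optimizer updates: linear warmup, then linear decay."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))


# Peak rates assumed for the three parameter groups (inferred from the log).
peaks = [3e-05, 1e-04, 3e-04]

for step in (100, 200):
    print(step, [linear_lr(peak, step) for peak in peaks])
```

Step 100 falls inside the warmup, so each rate is 100/164 of its peak; step 200 is past the warmup, so each rate has already started to decay.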


363it [03:44,  2.24it/s]

-----------------------------------------------------------------------------------------
| epoch   0 | time:  5.99s | dev_result:{'dev_F1': 14.226754066521002, 'dev_F1_ign': 14.078428561475691, 'dev_re_p': 30.648535564853557, 'dev_re_r': 9.263357571925388, 'dev_average_loss': 0.17467387527861494}
-----------------------------------------------------------------------------------------
| epoch   0 | best_f1:0.14226754066521002
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_bertL.pt
-----------------------------------------------------------------------------------------
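
In each `dev_result`, `dev_F1` is the harmonic mean of `dev_re_p` (precision) and `dev_re_r` (recall), and `best_f1` is the same number on a 0 to 1 scale. The sketch below recomputes the epoch 0 values from the logged precision and recall; `f1_from_pr` is a hypothetical helper, and agreement with the log is expected only up to floating-point rounding.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in percent) copied from the epoch 0 dev_result above.
p, r = 30.648535564853557, 9.263357571925388
f1 = f1_from_pr(p, r)

print(f"dev_F1  ~ {f1:.4f}")
print(f"best_f1 ~ {f1 / 100:.6f}")
```

The low F1 in early epochs comes from recall: precision is already above 30%, but fewer than 10% of the gold relations are recovered.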


365it [04:48,  1.26it/s]
33it [00:21,  1.28it/s]

| epoch  1 | step  200 | min/b  0.35 | lr [2.9580256509910614e-05, 9.860085503303538e-05, 0.0002958025650991061] | train loss 0.197


233it [02:23,  2.18it/s]

| epoch  1 | step  300 | min/b  2.04 | lr [2.8414302370773418e-05, 9.47143412359114e-05, 0.0002841430237077342] | train loss 0.162


363it [03:43,  1.67it/s]

-----------------------------------------------------------------------------------------
| epoch   1 | time:  6.96s | dev_result:{'dev_F1': 16.334864726901476, 'dev_F1_ign': 16.26450946393917, 'dev_re_p': 42.384105960264904, 'dev_re_r': 10.116977552956055, 'dev_average_loss': 0.13162397149395436}
-----------------------------------------------------------------------------------------
| epoch   1 | best_f1:0.16334864726901477
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_bertL.pt


364it [04:06,  7.36s/it]

-----------------------------------------------------------------------------------------


365it [04:07,  1.48it/s]
67it [00:42,  1.56it/s]

| epoch  2 | step  400 | min/b  0.71 | lr [2.7248348231636226e-05, 9.082782743878741e-05, 0.0002724834823163622] | train loss 0.135


267it [02:44,  1.57it/s]

| epoch  2 | step  500 | min/b  2.04 | lr [2.608239409249903e-05, 8.694131364166343e-05, 0.00026082394092499025] | train loss 0.134


363it [03:42,  2.20it/s]

-----------------------------------------------------------------------------------------
| epoch   2 | time:  6.01s | dev_result:{'dev_F1': 24.494485294117645, 'dev_F1_ign': 24.39529209319269, 'dev_re_p': 44.827586206896555, 'dev_re_r': 16.851090736642426, 'dev_average_loss': 0.122829499555395}
-----------------------------------------------------------------------------------------
| epoch   2 | best_f1:0.24494485294117646
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_bertL.pt


364it [04:05,  7.27s/it]

-----------------------------------------------------------------------------------------


365it [04:05,  1.48it/s]
101it [01:02,  1.68it/s]

| epoch  3 | step  600 | min/b  1.04 | lr [2.4916439953361834e-05, 8.305479984453946e-05, 0.00024916439953361834] | train loss 0.116


301it [03:04,  1.90it/s]

| epoch  3 | step  700 | min/b  2.04 | lr [2.3750485814224642e-05, 7.916828604741548e-05, 0.0002375048581422464] | train loss 0.115


363it [03:44,  1.46it/s]

-----------------------------------------------------------------------------------------
| epoch   3 | time:  6.05s | dev_result:{'dev_F1': 30.321864594894564, 'dev_F1_ign': 30.09833562353737, 'dev_re_p': 50.894187779433686, 'dev_re_r': 21.593423964590578, 'dev_average_loss': 0.10980742147311251}
-----------------------------------------------------------------------------------------
| epoch   3 | best_f1:0.30321864594894565
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_bertL.pt


364it [04:07,  7.34s/it]

-----------------------------------------------------------------------------------------


365it [04:07,  1.47it/s]
135it [01:28,  1.45it/s]

| epoch  4 | step  800 | min/b  1.47 | lr [2.2584531675087446e-05, 7.528177225029149e-05, 0.00022584531675087446] | train loss 0.099


335it [03:25,  1.77it/s]

| epoch  4 | step  900 | min/b  1.96 | lr [2.1418577535950254e-05, 7.139525845316751e-05, 0.0002141857753595025] | train loss 0.101


363it [03:43,  1.50it/s]

-----------------------------------------------------------------------------------------
| epoch   4 | time:  5.92s | dev_result:{'dev_F1': 35.459587955626, 'dev_F1_ign': 35.00141694122964, 'dev_re_p': 47.48010610079575, 'dev_re_r': 28.295921593423966, 'dev_average_loss': 0.10734710628364949}
-----------------------------------------------------------------------------------------
| epoch   4 | best_f1:0.35459587955626
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_bertL.pt


364it [04:04,  6.72s/it]

-----------------------------------------------------------------------------------------


365it [04:04,  1.49it/s]
169it [01:44,  1.43it/s]

| epoch  5 | step 1000 | min/b  1.74 | lr [2.0252623396813058e-05, 6.750874465604354e-05, 0.00020252623396813056] | train loss 0.089


363it [03:44,  1.55it/s]

-----------------------------------------------------------------------------------------


364it [03:50,  2.32s/it]

| epoch   5 | time:  5.94s | dev_result:{'dev_F1': 33.79325944356481, 'dev_F1_ign': 33.562153809504004, 'dev_re_p': 59.37996820349761, 'dev_re_r': 23.61681947518179, 'dev_average_loss': 0.09776581546410601}
-----------------------------------------------------------------------------------------
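
Epoch 5 prints no `best_f1` or save line because its dev F1 (33.79) is lower than the 35.46 reached in epoch 4. A minimal sketch of that keep-the-best rule, assuming the checkpoint is written only when dev F1 strictly improves; the list holds the dev F1 values logged so far (rounded to four decimals), and `epochs_that_save` is a hypothetical helper, not part of DeepKE.

```python
def epochs_that_save(dev_f1_per_epoch):
    """Return the epochs at which a new best dev F1 would trigger a checkpoint save."""
    best_f1 = float("-inf")
    saved = []
    for epoch, f1 in enumerate(dev_f1_per_epoch):
        if f1 > best_f1:          # strict improvement is an assumption
            best_f1 = f1
            saved.append(epoch)
    return saved


# dev_F1 for epochs 0..5 of the ruBert-large run, copied from the log.
dev_f1 = [14.2268, 16.3349, 24.4945, 30.3219, 35.4596, 33.7933]
print(epochs_that_save(dev_f1))
```

Because `save_path` is overwritten on every improvement, the file on disk always holds the weights from the best epoch seen so far.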


365it [03:51,  1.58it/s]
3it [00:01,  2.49it/s]

| epoch  6 | step 1100 | min/b  0.02 | lr [1.9086669257675866e-05, 6.362223085891955e-05, 0.00019086669257675862] | train loss 0.084


203it [02:00,  1.55it/s]

| epoch  6 | step 1200 | min/b  1.99 | lr [1.792071511853867e-05, 5.973571706179557e-05, 0.00017920715118538669] | train loss 0.079


363it [03:42,  1.49it/s]

-----------------------------------------------------------------------------------------
| epoch   6 | time:  6.00s | dev_result:{'dev_F1': 39.30242450021268, 'dev_F1_ign': 39.02884935274212, 'dev_re_p': 60.03898635477582, 'dev_re_r': 29.21277268416061, 'dev_average_loss': 0.0973612239703219}
-----------------------------------------------------------------------------------------
| epoch   6 | best_f1:0.3930242450021268
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_bertL.pt


364it [04:06,  7.72s/it]

-----------------------------------------------------------------------------------------


365it [04:06,  1.48it/s]
38it [00:25,  1.73it/s]

| epoch  7 | step 1300 | min/b  0.42 | lr [1.6754760979401478e-05, 5.584920326467159e-05, 0.00016754760979401475] | train loss 0.072


237it [02:27,  1.49it/s]

| epoch  7 | step 1400 | min/b  2.04 | lr [1.5588806840264282e-05, 5.196268946754761e-05, 0.0001558880684026428] | train loss 0.067


363it [03:44,  1.59it/s]

-----------------------------------------------------------------------------------------
| epoch   7 | time:  5.99s | dev_result:{'dev_F1': 44.76971116315379, 'dev_F1_ign': 44.40760491385618, 'dev_re_p': 58.490566037735846, 'dev_re_r': 36.26304141637686, 'dev_average_loss': 0.09407690674700636}
-----------------------------------------------------------------------------------------
| epoch   7 | best_f1:0.44769711163153786
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_bertL.pt


364it [04:06,  6.98s/it]

-----------------------------------------------------------------------------------------


365it [04:07,  1.48it/s]
71it [00:44,  1.15it/s]

| epoch  8 | step 1500 | min/b  0.75 | lr [1.442285270112709e-05, 4.8076175670423634e-05, 0.00014422852701127087] | train loss 0.064


271it [02:46,  1.67it/s]

| epoch  8 | step 1600 | min/b  2.03 | lr [1.3256898561989895e-05, 4.418966187329965e-05, 0.00013256898561989894] | train loss 0.058


363it [03:43,  1.31it/s]

-----------------------------------------------------------------------------------------
| epoch   8 | time:  5.96s | dev_result:{'dev_F1': 45.33081285444234, 'dev_F1_ign': 44.886800398481164, 'dev_re_p': 56.37047484720264, 'dev_re_r': 37.90705026873222, 'dev_average_loss': 0.0958379843926176}
-----------------------------------------------------------------------------------------
| epoch   8 | best_f1:0.4533081285444234
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_bertL.pt


364it [04:05,  7.34s/it]

-----------------------------------------------------------------------------------------


365it [04:06,  1.48it/s]
105it [01:06,  1.49it/s]

| epoch  9 | step 1700 | min/b  1.11 | lr [1.2090944422852701e-05, 4.0303148076175674e-05, 0.000120909444228527] | train loss 0.054


305it [03:06,  1.76it/s]

| epoch  9 | step 1800 | min/b  2.01 | lr [1.0924990283715507e-05, 3.641663427905169e-05, 0.00010924990283715506] | train loss 0.053


363it [03:43,  1.59it/s]

-----------------------------------------------------------------------------------------
| epoch   9 | time:  5.98s | dev_result:{'dev_F1': 47.38955823293173, 'dev_F1_ign': 46.93025217009616, 'dev_re_p': 56.06911447084233, 'dev_re_r': 41.036990199178, 'dev_average_loss': 0.09506675648562452}
-----------------------------------------------------------------------------------------
| epoch   9 | best_f1:0.47389558232931733
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_bertL.pt


364it [04:06,  7.30s/it]

-----------------------------------------------------------------------------------------


365it [04:07,  1.48it/s]
139it [01:26,  1.59it/s]

| epoch 10 | step 1900 | min/b  1.43 | lr [9.759036144578313e-06, 3.253012048192771e-05, 9.759036144578313e-05] | train loss 0.047


339it [03:28,  1.49it/s]

| epoch 10 | step 2000 | min/b  2.04 | lr [8.59308200544112e-06, 2.864360668480373e-05, 8.593082005441118e-05] | train loss 0.045


363it [03:42,  2.16it/s]

-----------------------------------------------------------------------------------------
| epoch  10 | time:  6.95s | dev_result:{'dev_F1': 49.899026987332476, 'dev_F1_ign': 49.38146127544446, 'dev_re_p': 59.50087565674256, 'dev_re_r': 42.96553904521024, 'dev_average_loss': 0.09270291648646618}
-----------------------------------------------------------------------------------------
| epoch  10 | best_f1:0.49899026987332473
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_bertL.pt


364it [04:05,  7.35s/it]

-----------------------------------------------------------------------------------------


365it [04:06,  1.48it/s]
173it [01:49,  1.49it/s]

| epoch 11 | step 2100 | min/b  1.82 | lr [7.427127866303926e-06, 2.4757092887679753e-05, 7.427127866303925e-05] | train loss 0.039


363it [03:43,  1.41it/s]

-----------------------------------------------------------------------------------------
| epoch  11 | time:  5.88s | dev_result:{'dev_F1': 51.25815470643057, 'dev_F1_ign': 50.80000848999008, 'dev_re_p': 62.44323342415985, 'dev_re_r': 43.47138792285804, 'dev_average_loss': 0.09176979158469971}
-----------------------------------------------------------------------------------------
| epoch  11 | best_f1:0.5125815470643057
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_bertL.pt


364it [04:05,  7.16s/it]

-----------------------------------------------------------------------------------------


365it [04:06,  1.48it/s]
7it [00:03,  1.97it/s]

| epoch 12 | step 2200 | min/b  0.06 | lr [6.261173727166731e-06, 2.0870579090555773e-05, 6.261173727166732e-05] | train loss 0.037


207it [02:08,  1.57it/s]

| epoch 12 | step 2300 | min/b  2.07 | lr [5.095219588029537e-06, 1.6984065293431793e-05, 5.095219588029537e-05] | train loss 0.035


363it [03:43,  1.78it/s]

-----------------------------------------------------------------------------------------
| epoch  12 | time:  6.00s | dev_result:{'dev_F1': 52.63534036090762, 'dev_F1_ign': 52.08765338802692, 'dev_re_p': 60.51766639276911, 'dev_re_r': 46.56971229845084, 'dev_average_loss': 0.0924683133972452}
-----------------------------------------------------------------------------------------
| epoch  12 | best_f1:0.5263534036090762
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_bertL.pt


364it [04:06,  7.30s/it]

-----------------------------------------------------------------------------------------


365it [04:07,  1.47it/s]
41it [00:25,  1.61it/s]

| epoch 13 | step 2400 | min/b  0.42 | lr [3.929265448892344e-06, 1.3097551496307814e-05, 3.9292654488923434e-05] | train loss 0.034


241it [02:26,  1.74it/s]

| epoch 13 | step 2500 | min/b  2.02 | lr [2.76331130975515e-06, 9.211037699183832e-06, 2.7633113097551497e-05] | train loss 0.032


363it [03:43,  1.76it/s]

-----------------------------------------------------------------------------------------


364it [03:49,  2.36s/it]

| epoch  13 | time:  6.00s | dev_result:{'dev_F1': 52.46800731261426, 'dev_F1_ign': 51.98519721252901, 'dev_re_p': 62.20199393151279, 'dev_re_r': 45.3683212140373, 'dev_average_loss': 0.0931826355926534}
-----------------------------------------------------------------------------------------


365it [03:50,  1.59it/s]
75it [00:44,  1.42it/s]

| epoch 14 | step 2600 | min/b  0.75 | lr [1.5973571706179556e-06, 5.324523902059852e-06, 1.5973571706179557e-05] | train loss 0.030


275it [02:48,  1.40it/s]

| epoch 14 | step 2700 | min/b  2.06 | lr [4.3140303148076176e-07, 1.4380101049358726e-06, 4.314030314807617e-06] | train loss 0.029


363it [03:42,  1.64it/s]

-----------------------------------------------------------------------------------------
| epoch  14 | time:  6.04s | dev_result:{'dev_F1': 52.647058823529406, 'dev_F1_ign': 52.15498102099616, 'dev_re_p': 62.88976723759332, 'dev_re_r': 45.273474549478344, 'dev_average_loss': 0.09339573162984341}
-----------------------------------------------------------------------------------------
| epoch  14 | best_f1:0.526470588235294
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_bertL.pt


364it [04:05,  7.32s/it]

-----------------------------------------------------------------------------------------


365it [04:06,  1.48it/s]
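
To compare runs without reading the log by eye, the evaluation lines can be parsed into dictionaries. The sketch below uses only the standard library and two lines copied from the output above. The names `DEV_LINE` and `parse_dev_results` are hypothetical, and whether the file at `cfg.log_dir` has exactly the same line format as the printed output is an assumption.

```python
import ast
import re

# Matches evaluation lines such as "| epoch  13 | time:  6.00s | dev_result:{...}".
DEV_LINE = re.compile(r"\|\s*epoch\s+(\d+)\s*\|\s*time:.*?\|\s*dev_result:(\{.*\})")


def parse_dev_results(log_text):
    """Map epoch number -> dev_result dict for every evaluation line in log_text."""
    results = {}
    for line in log_text.splitlines():
        match = DEV_LINE.search(line)
        if match:
            # The dict is printed as a Python literal, so literal_eval can read it safely.
            results[int(match.group(1))] = ast.literal_eval(match.group(2))
    return results


sample_log = """
| epoch 13 | step 2500 | min/b  2.02 | lr [2.76331130975515e-06, 9.211037699183832e-06, 2.7633113097551497e-05] | train loss 0.032
| epoch  13 | time:  6.00s | dev_result:{'dev_F1': 52.46800731261426, 'dev_F1_ign': 51.98519721252901, 'dev_re_p': 62.20199393151279, 'dev_re_r': 45.3683212140373, 'dev_average_loss': 0.0931826355926534}
| epoch  14 | time:  6.04s | dev_result:{'dev_F1': 52.647058823529406, 'dev_F1_ign': 52.15498102099616, 'dev_re_p': 62.88976723759332, 'dev_re_r': 45.273474549478344, 'dev_average_loss': 0.09339573162984341}
"""

results = parse_dev_results(sample_log)
best_epoch = max(results, key=lambda epoch: results[epoch]["dev_F1"])
print(best_epoch, round(results[best_epoch]["dev_F1"], 2))
```

Training-progress lines are skipped because they contain `step` rather than `time:`, so only the per-epoch evaluation results end up in the dictionary.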


In [None]:
from google.colab import runtime
runtime.unassign()

## ruBert-base

In [None]:
class Config(object):
    """Hyperparameters and Google Drive paths for the ai-forever/ruBert-base run."""
    adam_epsilon=1e-06
    bert_lr=3e-05
    channel_type='context-based'
    config_name=''
    data_dir='/content/drive/MyDrive/data/nerel_docred_ver'
    dataset='docred'
    dev_file='dev.json'
    down_dim=256
    evaluation_steps=-1
    gradient_accumulation_steps=2
    learning_rate=0.0003
    log_dir='/content/drive/MyDrive/Models/DeepKE_train/logs/train_ru_bertB.log'
    max_grad_norm=1.0
    max_height=103
    max_seq_length=512
    # model_name_or_path='xlm-roberta-base'
    model_name_or_path='ai-forever/ruBert-base'
    num_class=50
    num_labels=29
    num_train_epochs=15
    save_path='/content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_bertB.pt'
    seed=111
    test_batch_size=2
    test_file='test.json'
    tokenizer_name=''
    train_batch_size=2
    train_file='train_annotated_filter.json'
    train_from_saved_model=''
    transformer_type='bert'
    unet_in_dim=3
    unet_out_dim=256
    warmup_ratio=0.06
    load_path='/content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_bertB.pt'

cfg = Config()
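
This configuration differs from the ruBert-large one only in `model_name_or_path`, `log_dir`, `save_path`, and `load_path`. One way to avoid copying the whole class for each model is to derive the three paths from a short run tag. The sketch below is an illustration under that assumption: `make_paths`, `CHECKPOINT_ROOT`, and the `tag` argument are hypothetical names, not part of DeepKE.

```python
import posixpath

# Root folder used for logs and checkpoints throughout this notebook.
CHECKPOINT_ROOT = '/content/drive/MyDrive/Models/DeepKE_train'


def make_paths(tag, root=CHECKPOINT_ROOT):
    """Build the log and checkpoint paths for a run identified by a short tag."""
    checkpoint = posixpath.join(root, 'checkpoints', f'model_{tag}.pt')
    return {
        'log_dir': posixpath.join(root, 'logs', f'train_{tag}.log'),
        'save_path': checkpoint,
        'load_path': checkpoint,   # the notebook loads from the same file it saves to
    }


for key, value in make_paths('ru_bertB').items():
    print(f'{key}={value!r}')
```

The returned values could then be assigned onto a shared configuration object, for example `cfg.save_path = make_paths('ru_bertB')['save_path']`, so that only the tag and `model_name_or_path` change between runs.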

In [None]:
# Disabled check for the DocRED distant-supervision file, which this run does not use (see cfg.train_file).
# if not os.path.exists(os.path.join(cfg.data_dir, "train_distant.json")):
#     raise FileNotFoundError("Sorry, the file: 'train_distant.json' is too big to upload to github, \
#         please manually download to 'data/' from DocRED GoogleDrive https://drive.google.com/drive/folders/1c5-0YwnoJx8NS6CV2f-NoTHR__BdkNqw")


if torch.cuda.is_available():
    device = "cuda:0"
# elif torch.backends.mps.is_available():
#     device = "mps"
else:
    device = "cpu"


config = AutoConfig.from_pretrained(
    cfg.config_name if cfg.config_name else cfg.model_name_or_path,
    num_labels=cfg.num_class,
)
tokenizer = AutoTokenizer.from_pretrained(
    cfg.tokenizer_name if cfg.tokenizer_name else cfg.model_name_or_path,
)

Dataset = ReadDataset(cfg.dataset, tokenizer, cfg.max_seq_length, cfg.transformer_type)

train_file = os.path.join(cfg.data_dir, cfg.train_file)
dev_file = os.path.join(cfg.data_dir, cfg.dev_file)
test_file = os.path.join(cfg.data_dir, cfg.test_file)
train_features = Dataset.read(train_file)
dev_features = Dataset.read(dev_file)
test_features = Dataset.read(test_file)

model = AutoModel.from_pretrained(
    cfg.model_name_or_path,
    from_tf=bool(".ckpt" in cfg.model_name_or_path),
    config=config,
)


config.cls_token_id = tokenizer.cls_token_id
config.sep_token_id = tokenizer.sep_token_id
config.transformer_type = cfg.transformer_type

set_seed(cfg)
model = DocREModel(config, cfg,  model, num_labels=cfg.num_labels)
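if cfg.train_from_saved_model != '':
    # map_location keeps the load working when the checkpoint was saved on a different device.
    model.load_state_dict(torch.load(cfg.train_from_saved_model, map_location=device)["checkpoint"])
    print("load saved model from {}.".format(cfg.train_from_saved_model))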

#if torch.cuda.device_count() > 1:
#    print("Let's use", torch.cuda.device_count(), "GPUs!")
#    model = torch.nn.DataParallel(model, device_ids = list(range(torch.cuda.device_count())))
model.to(device)

train(cfg, model, train_features, dev_features, test_features)

load preprocessed data from /content/drive/MyDrive/data/nerel_docred_ver/train_annotated_filter_bert_docred.pkl.
load preprocessed data from /content/drive/MyDrive/data/nerel_docred_ver/dev_bert_docred.pkl.
load preprocessed data from /content/drive/MyDrive/data/nerel_docred_ver/test_bert_docred.pkl.


Some weights of the model checkpoint at ai-forever/ruBert-base were not used when initializing BertModel: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight', 'cls.predictions.decoder.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Total steps: 2737
Warmup steps: 164


200it [01:41,  3.03it/s]

| epoch  0 | step  100 | min/b  1.70 | lr [1.8292682926829268e-05, 6.097560975609756e-05, 0.00018292682926829266] | train loss 1.729


363it [03:05,  2.82it/s]

-----------------------------------------------------------------------------------------
| epoch   0 | time:  4.74s | dev_result:{'dev_F1': 16.273972602739725, 'dev_F1_ign': 16.199838115389937, 'dev_re_p': 60.98562628336756, 'dev_re_r': 9.389819791337338, 'dev_average_loss': 0.14960071238431524}
-----------------------------------------------------------------------------------------
| epoch   0 | best_f1:0.16273972602739725


364it [03:18,  4.19s/it]

| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_bertB.pt
-----------------------------------------------------------------------------------------


365it [03:18,  1.84it/s]
33it [00:17,  1.53it/s]

| epoch  1 | step  200 | min/b  0.29 | lr [2.9580256509910614e-05, 9.860085503303538e-05, 0.0002958025650991061] | train loss 0.204


233it [01:57,  2.91it/s]

| epoch  1 | step  300 | min/b  1.68 | lr [2.8414302370773418e-05, 9.47143412359114e-05, 0.0002841430237077342] | train loss 0.126


363it [03:03,  2.07it/s]

-----------------------------------------------------------------------------------------
| epoch   1 | time:  4.51s | dev_result:{'dev_F1': 32.41106719367588, 'dev_F1_ign': 32.15081230884837, 'dev_re_p': 61.24780316344464, 'dev_re_r': 22.036041732532404, 'dev_average_loss': 0.10003781271107653}
-----------------------------------------------------------------------------------------
| epoch   1 | best_f1:0.32411067193675885
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_bertB.pt
-----------------------------------------------------------------------------------------


365it [03:14,  1.88it/s]
67it [00:36,  1.89it/s]

| epoch  2 | step  400 | min/b  0.60 | lr [2.7248348231636226e-05, 9.082782743878741e-05, 0.0002724834823163622] | train loss 0.100


267it [02:16,  1.94it/s]

| epoch  2 | step  500 | min/b  1.68 | lr [2.608239409249903e-05, 8.694131364166343e-05, 0.00026082394092499025] | train loss 0.095


363it [03:04,  2.95it/s]

-----------------------------------------------------------------------------------------
| epoch   2 | time:  4.47s | dev_result:{'dev_F1': 43.57523302263648, 'dev_F1_ign': 42.880594779576526, 'dev_re_p': 46.01054481546573, 'dev_re_r': 41.38476130256086, 'dev_average_loss': 0.09643397924113781}
-----------------------------------------------------------------------------------------
| epoch   2 | best_f1:0.43575233022636484
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_bertB.pt


364it [03:16,  3.27s/it]

-----------------------------------------------------------------------------------------


365it [03:16,  1.86it/s]
101it [00:51,  2.08it/s]

| epoch  3 | step  600 | min/b  0.86 | lr [2.4916439953361834e-05, 8.305479984453946e-05, 0.00024916439953361834] | train loss 0.082


301it [02:32,  2.43it/s]

| epoch  3 | step  700 | min/b  1.67 | lr [2.3750485814224642e-05, 7.916828604741548e-05, 0.0002375048581422464] | train loss 0.077


363it [03:03,  1.74it/s]

-----------------------------------------------------------------------------------------
| epoch   3 | time:  5.45s | dev_result:{'dev_F1': 44.417377182298004, 'dev_F1_ign': 44.02729316117645, 'dev_re_p': 62.05331820760068, 'dev_re_r': 34.58741700916851, 'dev_average_loss': 0.08371392906980311}
-----------------------------------------------------------------------------------------
| epoch   3 | best_f1:0.44417377182298007
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_bertB.pt


364it [03:16,  4.23s/it]

-----------------------------------------------------------------------------------------


365it [03:16,  1.86it/s]
135it [01:13,  1.79it/s]

| epoch  4 | step  800 | min/b  1.22 | lr [2.2584531675087446e-05, 7.528177225029149e-05, 0.00022584531675087446] | train loss 0.068


335it [02:48,  2.23it/s]

| epoch  4 | step  900 | min/b  1.60 | lr [2.1418577535950254e-05, 7.139525845316751e-05, 0.0002141857753595025] | train loss 0.070


363it [03:03,  1.87it/s]

-----------------------------------------------------------------------------------------
| epoch   4 | time:  4.49s | dev_result:{'dev_F1': 48.704268292682926, 'dev_F1_ign': 48.22227516061032, 'dev_re_p': 61.294964028776974, 'dev_re_r': 40.40467910211824, 'dev_average_loss': 0.08316165042367388}
-----------------------------------------------------------------------------------------
| epoch   4 | best_f1:0.4870426829268293
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_bertB.pt


364it [03:13,  3.48s/it]

-----------------------------------------------------------------------------------------


365it [03:14,  1.88it/s]
169it [01:26,  1.75it/s]

| epoch  5 | step 1000 | min/b  1.44 | lr [2.0252623396813058e-05, 6.750874465604354e-05, 0.00020252623396813056] | train loss 0.060


363it [03:03,  1.92it/s]

-----------------------------------------------------------------------------------------
| epoch   5 | time:  5.38s | dev_result:{'dev_F1': 49.3728847302409, 'dev_F1_ign': 49.000051322370645, 'dev_re_p': 66.66666666666666, 'dev_re_r': 39.203288017704715, 'dev_average_loss': 0.08083177381690512}
-----------------------------------------------------------------------------------------
| epoch   5 | best_f1:0.49372884730240896
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_bertB.pt


364it [03:17,  4.44s/it]

-----------------------------------------------------------------------------------------


365it [03:17,  1.85it/s]
3it [00:00,  3.51it/s]

| epoch  6 | step 1100 | min/b  0.01 | lr [1.9086669257675866e-05, 6.362223085891955e-05, 0.00019086669257675862] | train loss 0.058


203it [01:39,  1.91it/s]

| epoch  6 | step 1200 | min/b  1.65 | lr [1.792071511853867e-05, 5.973571706179557e-05, 0.00017920715118538669] | train loss 0.056


363it [03:03,  1.82it/s]

-----------------------------------------------------------------------------------------
| epoch   6 | time:  4.46s | dev_result:{'dev_F1': 52.83932228635263, 'dev_F1_ign': 52.40368866436841, 'dev_re_p': 64.2663043478261, 'dev_re_r': 44.8624723363895, 'dev_average_loss': 0.08222898278147617}
-----------------------------------------------------------------------------------------
| epoch   6 | best_f1:0.5283932228635263
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_bertB.pt


364it [03:16,  4.27s/it]

-----------------------------------------------------------------------------------------


365it [03:17,  1.85it/s]
38it [00:21,  2.08it/s]

| epoch  7 | step 1300 | min/b  0.35 | lr [1.6754760979401478e-05, 5.584920326467159e-05, 0.00016754760979401475] | train loss 0.051


237it [01:59,  1.83it/s]

| epoch  7 | step 1400 | min/b  1.65 | lr [1.5588806840264282e-05, 5.196268946754761e-05, 0.0001558880684026428] | train loss 0.049


363it [03:03,  2.02it/s]

-----------------------------------------------------------------------------------------
| epoch   7 | time:  4.51s | dev_result:{'dev_F1': 53.28241498454265, 'dev_F1_ign': 52.80567781855391, 'dev_re_p': 62.714041095890416, 'dev_re_r': 46.316787859626935, 'dev_average_loss': 0.08239634057625811}
-----------------------------------------------------------------------------------------
| epoch   7 | best_f1:0.5328241498454265
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_bertB.pt


364it [03:16,  4.22s/it]

-----------------------------------------------------------------------------------------


365it [03:17,  1.85it/s]
71it [00:37,  1.33it/s]

| epoch  8 | step 1500 | min/b  0.62 | lr [1.442285270112709e-05, 4.8076175670423634e-05, 0.00014422852701127087] | train loss 0.048


271it [02:17,  2.09it/s]

| epoch  8 | step 1600 | min/b  1.67 | lr [1.3256898561989895e-05, 4.418966187329965e-05, 0.00013256898561989894] | train loss 0.043


363it [03:03,  1.57it/s]

-----------------------------------------------------------------------------------------


364it [03:08,  1.87s/it]

| epoch   8 | time:  4.44s | dev_result:{'dev_F1': 53.05519660328595, 'dev_F1_ign': 52.56917685324384, 'dev_re_p': 63.753327417923686, 'dev_re_r': 45.43155232374328, 'dev_average_loss': 0.08734311702403616}
-----------------------------------------------------------------------------------------


365it [03:08,  1.93it/s]
105it [00:53,  1.85it/s]

| epoch  9 | step 1700 | min/b  0.89 | lr [1.2090944422852701e-05, 4.0303148076175674e-05, 0.000120909444228527] | train loss 0.041


305it [02:33,  2.09it/s]

| epoch  9 | step 1800 | min/b  1.66 | lr [1.0924990283715507e-05, 3.641663427905169e-05, 0.00010924990283715506] | train loss 0.041


363it [03:03,  1.92it/s]

-----------------------------------------------------------------------------------------
| epoch   9 | time:  4.43s | dev_result:{'dev_F1': 53.34346504559271, 'dev_F1_ign': 52.908926455398074, 'dev_re_p': 66.82532127558306, 'dev_re_r': 44.388239013594685, 'dev_average_loss': 0.09006904905780833}
-----------------------------------------------------------------------------------------
| epoch   9 | best_f1:0.5334346504559271
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_bertB.pt


364it [03:16,  4.21s/it]

-----------------------------------------------------------------------------------------


365it [03:17,  1.85it/s]
139it [01:10,  1.98it/s]

| epoch 10 | step 1900 | min/b  1.18 | lr [9.759036144578313e-06, 3.253012048192771e-05, 9.759036144578313e-05] | train loss 0.037


339it [02:51,  1.80it/s]

| epoch 10 | step 2000 | min/b  1.68 | lr [8.59308200544112e-06, 2.864360668480373e-05, 8.593082005441118e-05] | train loss 0.037


363it [03:03,  2.78it/s]

-----------------------------------------------------------------------------------------
| epoch  10 | time:  4.51s | dev_result:{'dev_F1': 54.26874536005939, 'dev_F1_ign': 53.78680934367207, 'dev_re_p': 65.70786516853933, 'dev_re_r': 46.22194119506797, 'dev_average_loss': 0.0902667614690801}
-----------------------------------------------------------------------------------------
| epoch  10 | best_f1:0.5426874536005939
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_bertB.pt


364it [03:16,  4.31s/it]

-----------------------------------------------------------------------------------------


365it [03:17,  1.85it/s]
173it [01:30,  1.79it/s]

| epoch 11 | step 2100 | min/b  1.50 | lr [7.427127866303926e-06, 2.4757092887679753e-05, 7.427127866303925e-05] | train loss 0.034


363it [03:04,  1.73it/s]

-----------------------------------------------------------------------------------------


364it [03:09,  1.77s/it]

| epoch  11 | time:  4.38s | dev_result:{'dev_F1': 54.0946656649136, 'dev_F1_ign': 53.633335436891784, 'dev_re_p': 66.63581675150392, 'dev_re_r': 45.52639898830225, 'dev_average_loss': 0.09356865278900937}
-----------------------------------------------------------------------------------------


365it [03:09,  1.92it/s]
7it [00:03,  2.56it/s]

| epoch 12 | step 2200 | min/b  0.05 | lr [6.261173727166731e-06, 2.0870579090555773e-05, 6.261173727166732e-05] | train loss 0.033


207it [01:44,  1.90it/s]

| epoch 12 | step 2300 | min/b  1.70 | lr [5.095219588029537e-06, 1.6984065293431793e-05, 5.095219588029537e-05] | train loss 0.033


363it [03:03,  2.24it/s]

-----------------------------------------------------------------------------------------


364it [03:08,  1.82s/it]

| epoch  12 | time:  4.52s | dev_result:{'dev_F1': 54.15569763068822, 'dev_F1_ign': 53.685885839581424, 'dev_re_p': 66.8213457076566, 'dev_re_r': 45.52639898830225, 'dev_average_loss': 0.09157118026880508}
-----------------------------------------------------------------------------------------


365it [03:08,  1.93it/s]
41it [00:20,  1.93it/s]

| epoch 13 | step 2400 | min/b  0.35 | lr [3.929265448892344e-06, 1.3097551496307814e-05, 3.9292654488923434e-05] | train loss 0.032


241it [02:01,  2.14it/s]

| epoch 13 | step 2500 | min/b  1.67 | lr [2.76331130975515e-06, 9.211037699183832e-06, 2.7633113097551497e-05] | train loss 0.030


363it [03:04,  2.21it/s]

-----------------------------------------------------------------------------------------


364it [03:09,  1.81s/it]

| epoch  13 | time:  4.50s | dev_result:{'dev_F1': 54.20183486238531, 'dev_F1_ign': 53.6901719471389, 'dev_re_p': 64.58242238740708, 'dev_re_r': 46.69617451786279, 'dev_average_loss': 0.09494627314679166}
-----------------------------------------------------------------------------------------


365it [03:10,  1.92it/s]
75it [00:36,  1.74it/s]

| epoch 14 | step 2600 | min/b  0.62 | lr [1.5973571706179556e-06, 5.324523902059852e-06, 1.5973571706179557e-05] | train loss 0.030


275it [02:18,  1.72it/s]

| epoch 14 | step 2700 | min/b  1.68 | lr [4.3140303148076176e-07, 1.4380101049358726e-06, 4.314030314807617e-06] | train loss 0.028


363it [03:02,  2.02it/s]

-----------------------------------------------------------------------------------------
| epoch  14 | time:  4.56s | dev_result:{'dev_F1': 54.39970171513796, 'dev_F1_ign': 53.926676781644325, 'dev_re_p': 66.28805088596093, 'dev_re_r': 46.12709453050901, 'dev_average_loss': 0.09713644264860356}
-----------------------------------------------------------------------------------------
| epoch  14 | best_f1:0.5439970171513796
| successfully save model at: /content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_ru_bertB.pt


364it [03:13,  3.49s/it]

-----------------------------------------------------------------------------------------


365it [03:14,  1.88it/s]
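
The `dev_F1` value in each `dev_result` line appears to be the harmonic mean of `dev_re_p` (precision) and `dev_re_r` (recall). The sketch below checks this with the epoch 14 numbers copied from the log above. The helper name `f1_from_pr` is hypothetical and is not part of DeepKE.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the epoch 14 dev_result line in the log above.
dev_re_p = 66.28805088596093
dev_re_r = 46.12709453050901

f1 = f1_from_pr(dev_re_p, dev_re_r)
print(f"recomputed dev_F1: {f1:.4f}")
```

Because recall stays near 46% while precision is around 66%, the F1 score is pulled toward the lower recall value. This is why the small recall gains between epochs move `best_f1` more than the precision changes do.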


### XLM-RoBERTa

In [None]:
class Config(object):
    adam_epsilon=1e-06
    bert_lr=3e-05
    channel_type='context-based'
    config_name=''
    data_dir='/content/drive/MyDrive/data/nerel_docred_ver'
    dataset='docred'
    dev_file='dev.json'
    down_dim=256
    evaluation_steps=-1
    gradient_accumulation_steps=2
    learning_rate=0.0004
    log_dir='/content/drive/MyDrive/Models/DeepKE_train/logs/model_xlm_roberta.log'
    max_grad_norm=1.0
    max_height=103
    max_seq_length=1024
    # model_name_or_path='xlm-roberta-base'
    model_name_or_path='xlm-roberta-large'
    num_class=50
    num_labels=29
    num_train_epochs=20
    save_path='/content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_xlm_roberta.pt'
    seed=111
    test_batch_size=2
    test_file='test.json'
    tokenizer_name=''
    train_batch_size=2
    train_file='train_annotated_filter.json'
    train_from_saved_model=''
    transformer_type='roberta'
    unet_in_dim=3
    unet_out_dim=256
    warmup_ratio=0.06
    load_path='/content/drive/MyDrive/Models/DeepKE_train/checkpoints/model_xlm_roberta.pt'

cfg = Config()

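The `warmup_ratio` and `bert_lr` settings control how the learning rate evolves during training. The values logged in the BERT run above are consistent with a linear warmup followed by a linear decay to zero, which is the same shape as `get_linear_schedule_with_warmup` in `transformers`. The sketch below reproduces two logged values under stated assumptions: a base rate of `3e-5`, and a total of 2737 optimizer steps, which is inferred from the log rather than read from the training code. The function name `linear_warmup_decay` is hypothetical.

```python
def linear_warmup_decay(step: int, base_lr: float, total_steps: int, warmup_steps: int) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Assumptions inferred from the BERT log above, not taken from the training code.
TOTAL_STEPS = 2737                       # assumed total number of optimizer steps
WARMUP_STEPS = int(TOTAL_STEPS * 0.06)   # warmup_ratio=0.06 gives 164 steps
BASE_LR = 3e-5                           # assumed base rate of the first parameter group

lr_2000 = linear_warmup_decay(2000, BASE_LR, TOTAL_STEPS, WARMUP_STEPS)
lr_2700 = linear_warmup_decay(2700, BASE_LR, TOTAL_STEPS, WARMUP_STEPS)
print(lr_2000, lr_2700)
```

The three learning rates printed on each log line keep a fixed ratio to one another, which suggests that separate parameter groups share the same schedule with different base rates.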
In [None]:
# if not os.path.exists(os.path.join(cfg.data_dir, "train_distant.json")):
#     raise FileNotFoundError("Sorry, the file: 'train_distant.json' is too big to upload to github, \
#         please manually download to 'data/' from DocRED GoogleDrive https://drive.google.com/drive/folders/1c5-0YwnoJx8NS6CV2f-NoTHR__BdkNqw")


if torch.cuda.is_available():
    device = "cuda:0"
# elif torch.backends.mps.is_available():
#     device = "mps"
else:
    device = "cpu"


config = AutoConfig.from_pretrained(
    cfg.config_name if cfg.config_name else cfg.model_name_or_path,
    num_labels=cfg.num_class,
)
tokenizer = AutoTokenizer.from_pretrained(
    cfg.tokenizer_name if cfg.tokenizer_name else cfg.model_name_or_path,
)

Dataset = ReadDataset(cfg.dataset, tokenizer, cfg.max_seq_length, cfg.transformer_type)

train_file = os.path.join(cfg.data_dir, cfg.train_file)
dev_file = os.path.join(cfg.data_dir, cfg.dev_file)
test_file = os.path.join(cfg.data_dir, cfg.test_file)
train_features = Dataset.read(train_file)
dev_features = Dataset.read(dev_file)
test_features = Dataset.read(test_file)

model = AutoModel.from_pretrained(
    cfg.model_name_or_path,
    from_tf=bool(".ckpt" in cfg.model_name_or_path),
    config=config,
)


config.cls_token_id = tokenizer.cls_token_id
config.sep_token_id = tokenizer.sep_token_id
config.transformer_type = cfg.transformer_type

set_seed(cfg)
model = DocREModel(config, cfg,  model, num_labels=cfg.num_labels)
if cfg.train_from_saved_model != '':
    model.load_state_dict(torch.load(cfg.train_from_saved_model)["checkpoint"])
    print("load saved model from {}.".format(cfg.train_from_saved_model))

#if torch.cuda.device_count() > 1:
#    print("Let's use", torch.cuda.device_count(), "GPUs!")
#    model = torch.nn.DataParallel(model, device_ids = list(range(torch.cuda.device_count())))
model.to(device)

train(cfg, model, train_features, dev_features, test_features)

 of documents 730.
 of positive examples 23353.
 of negative examples 1169361.
 352 examples len>512 and max len is 1060.
finish reading /content/drive/MyDrive/data/nerel_docred_ver/train_annotated_filter.json and save preprocessed data to /content/drive/MyDrive/data/nerel_docred_ver/train_annotated_filter_roberta_docred.pkl.
Example: 100%|██████████| 94/94 [00:06<00:00, 13.79it/s]
 of documents 94.
 of positive examples 3143.
 of negative examples 170057.
 44 examples len>512 and max len is 1295.
finish reading /content/drive/MyDrive/data/nerel_docred_ver/dev.json and save preprocessed data to /content/drive/MyDrive/data/nerel_docred_ver/dev_roberta_docred.pkl.
Example: 100%|██████████| 93/93 [00:08<00:00, 11.22it/s]
 of documents 93.
 of positive examples 3259.
 of negative examples 186208.
 52 examples len>512 and max len is 3101.

In [None]:
from google.colab import runtime
runtime.unassign()

### Model Prediction

In [None]:
def report(args, model, features):
    """Run inference over `features` and return the predictions converted by `to_official`."""
    if torch.cuda.is_available():
        device = "cuda:0"
    # elif torch.backends.mps.is_available():
    #     device = "mps"
    else:
        device = "cpu"

    dataloader = DataLoader(features, batch_size=args.test_batch_size, shuffle=False, collate_fn=collate_fn, drop_last=False)
    model.eval()  # switch to inference mode once, before the loop
    preds = []
    for batch in dataloader:
        # Labels (batch[2]) are not passed, so the model returns predictions only.
        inputs = {'input_ids': batch[0].to(device),
                  'attention_mask': batch[1].to(device),
                  'entity_pos': batch[3],
                  'hts': batch[4],
                  }

        with torch.no_grad():
            pred = model(**inputs)
            pred = pred.cpu().numpy()
            pred[np.isnan(pred)] = 0  # replace NaN scores with 0
            preds.append(pred)

    preds = np.concatenate(preds, axis=0).astype(np.float32)
    preds = to_official(args, preds, features)
    return preds

# Load the best checkpoint saved during training. Mapping to CPU first lets this
# also run on a machine without a GPU; load_state_dict copies the weights onto
# whatever device the model already lives on.
model.load_state_dict(torch.load(cfg.load_path, map_location="cpu")['checkpoint'])
T_features = test_features  # Testing on the test set
#T_score, T_output = evaluate(cfg, model, T_features, tag="test")
pred = report(cfg, model, T_features)
with open("./result.json", "w") as fh:
    json.dump(pred, fh)
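
Once `result.json` is written, a quick sanity check is to count predictions per document and per relation type. The sketch below assumes each prediction is a dict with the DocRED submission keys `title`, `h_idx`, `t_idx` and `r`. That layout is an assumption about the output of `to_official`, and the sample records and the `summarize` helper are hypothetical.

```python
import json
from collections import Counter

# Hypothetical records in the assumed DocRED submission layout.
sample_preds = [
    {"title": "doc_a", "h_idx": 0, "t_idx": 1, "r": "REL_A"},
    {"title": "doc_a", "h_idx": 2, "t_idx": 0, "r": "REL_B"},
    {"title": "doc_b", "h_idx": 1, "t_idx": 3, "r": "REL_A"},
]


def summarize(preds):
    """Count predicted triples per document title and per relation label."""
    per_doc = Counter(p["title"] for p in preds)
    per_rel = Counter(p["r"] for p in preds)
    return per_doc, per_rel


per_doc, per_rel = summarize(sample_preds)
print("triples per document:", dict(per_doc))
print("triples per relation:", dict(per_rel))

# To inspect the real output instead of the sample, load the file first:
# with open("./result.json") as fh:
#     per_doc, per_rel = summarize(json.load(fh))
```

A relation label that dominates the counts, or documents with no predictions at all, can be an early hint that the checkpoint or the label mapping is not the one intended.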