## Distant Supervision for Relation Extraction via Piecewise Convolutional Neural Networks EMNLP 2015

# 1 Preface

### 1.1 Course Review

<img src='imgs/overall_pcnn.png' width="800" height="800" align="bottom">

### 1.2 Model Architecture

<img src="./imgs/model.png"  width="600" height="600" align="bottom" />

### 1.3 Code Structure

<img src="./imgs/dir.png"  width="300" height="300" align="bottom" />

# 2 Preparation
### 2.1 Environment Setup

* Python3.8
* jupyter notebook
* torch            1.6.0+cu102
* numpy            1.18.5

We recommend running the code in Visual Studio Code (VS Code).

### 2.2 Dataset Download

Dataset download link: https://pan.baidu.com/s/1BaBYvvxWO8IwTSi-GEqUaA <br>
Extraction code: 0d23

# 3 Project Code Structure (demonstrated in VS Code)

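>1) What is it?

We first run the code in VS Code to get a direct feel for how the project trains, and show the output of forward inference so you can see how well the model performs.

>2) How is it organized?

Next we introduce how the project code is organized: which folders the project has, which files they contain, and which functional modules these files make up, such as data preprocessing, model design, the loss function, and inference and evaluation.

>3) Summary

Finally, we walk through the training launch flow once more in the main file.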

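# 4 Algorithm Modules and Details (demonstrated in Jupyter and VS Code)

We go through each module in detail in Jupyter Notebook.

With the goal of implementing each module's functionality, we walk through the execution flow of each function and show the intermediate data, to make the code easier to understand.

The content is divided into the following modules: **hyperparameter settings, data loading and processing, model definition, model training, and model evaluation**.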

In [1]:
import os
import models
import dataset
import torch
import numpy as np
import torch.nn as nn
from torch.utils.data import DataLoader
import torch.optim as optim
import torch.nn.functional as F
from utils import save_pr, now, eval_metric

### 4.1 Hyperparameter Settings

In [2]:
data_dic ={
    'NYT': {
        'data_root': './dataset/NYT/',
        'w2v_path': './dataset/NYT/w2v.npy',
        'p1_2v_path': './dataset/NYT/p1_2v.npy',
        'p2_2v_path': './dataset/NYT/p2_2v.npy',
        'vocab_size': 114043,
        'rel_num': 53
    },
    'FilterNYT': {
        'data_root': './dataset/FilterNYT/',
        'w2v_path': './dataset/FilterNYT/w2v.npy',
        'p1_2v_path': './dataset/FilterNYT/p1_2v.npy',
        'p2_2v_path': './dataset/FilterNYT/p2_2v.npy',
        'vocab_size': 160695 + 2,
        'rel_num': 27
    }
}

In [3]:
class DefaultConfig(object):

    model = 'PCNN_ONE'  # name of the model to use, as registered in <models/__init__.py>
    data = 'FilterNYT'  # dataset: NYT or FilterNYT (the keys of data_dic)

    result_dir = './out'
    data_root = data_dic[data]['data_root']  # the data dir
    w2v_path = data_dic[data]['w2v_path']
    p1_2v_path = data_dic[data]['p1_2v_path']
    p2_2v_path = data_dic[data]['p2_2v_path']
    load_model_path = 'checkpoints/model.pth'  # the trained model

    seed = 3435
    batch_size = 128  # batch size
    use_gpu = True  # use GPU or not
    gpu_id = 1
    num_workers = 0  # how many workers for loading data

    max_len = 80 + 2  # max_len for each sentence + two padding
    limit = 50  # the position range <-limit, limit>

    vocab_size = data_dic[data]['vocab_size']  # vocab + UNK + BLANK
    rel_num = data_dic[data]['rel_num']
    word_dim = 50
    pos_dim = 5
    pos_size = limit * 2 + 2

    norm_emb=True

    num_epochs = 16  # the number of epochs for training
    drop_out = 0.5
    lr = 0.0003  # initial learning rate
    lr_decay = 0.95  # when val_loss increases, lr = lr * lr_decay
    weight_decay = 0.0001  # optimizer parameter

    # Conv
    filters = [3]
    filters_num = 230
    sen_feature_dim = filters_num

    rel_dim = filters_num * len(filters)
    rel_filters_num = 100

    print_opt = 'DEF'
    use_pcnn=True


In [4]:
def parse(self, kwargs):
    '''
    Let the user override the default hyperparameters.
    '''
    for k, v in kwargs.items():
        if not hasattr(self, k):
            raise Exception('opt has No key: {}'.format(k))
        setattr(self, k, v)
    data_list = ['data_root', 'w2v_path', 'rel_num', 'vocab_size', 'p1_2v_path', 'p2_2v_path']
    for r in data_list:
        setattr(self, r, data_dic[self.data][r])

    print('*************************************************')
    print('user config:')
    for k, v in self.__class__.__dict__.items():
        if not k.startswith('__'):
            print("{} => {}".format(k, getattr(self, k)))

    print('*************************************************')

In [8]:
DefaultConfig.parse = parse

In [10]:
opt = DefaultConfig()

In [16]:
k = {}

In [17]:
opt.parse(k)

*************************************************
user config:
model => PCNN_ONE
data => FilterNYT
result_dir => ./out
data_root => ./dataset/FilterNYT/
w2v_path => ./dataset/FilterNYT/w2v.npy
p1_2v_path => ./dataset/FilterNYT/p1_2v.npy
p2_2v_path => ./dataset/FilterNYT/p2_2v.npy
load_model_path => checkpoints/model.pth
seed => 3435
batch_size => 128
use_gpu => True
gpu_id => 1
num_workers => 0
max_len => 82
limit => 50
vocab_size => 160697
rel_num => 27
word_dim => 50
pos_dim => 5
pos_size => 102
norm_emb => True
num_epochs => 16
drop_out => 0.5
lr => 0.0003
lr_decay => 0.95
weight_decay => 0.0001
filters => [3]
filters_num => 230
sen_feature_dim => 230
rel_dim => 230
rel_filters_num => 100
print_opt => DEF
use_pcnn => True
parse => <bound method parse of <__main__.DefaultConfig object at 0x7f55ccf07340>>
*************************************************
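`parse` does two things: it rejects unknown keys, and it re-reads the dataset-dependent fields from `data_dic`. The second step matters because class attributes such as `vocab_size` were computed only once, when the class body ran, from the default `data`. The sketch below is a trimmed, self-contained version (`DATA` and `Config` are illustrative stand-ins for `data_dic` and `DefaultConfig`, and it raises `KeyError` rather than a bare `Exception`) showing that switching `data` to `'NYT'` also switches `vocab_size` and `rel_num`.

```python
# Trimmed stand-ins for data_dic / DefaultConfig (illustrative only)
DATA = {
    'NYT': {'vocab_size': 114043, 'rel_num': 53},
    'FilterNYT': {'vocab_size': 160695 + 2, 'rel_num': 27},
}


class Config:
    data = 'FilterNYT'
    batch_size = 128
    # evaluated once, when the class body runs, using the default dataset
    vocab_size = DATA[data]['vocab_size']
    rel_num = DATA[data]['rel_num']

    def parse(self, kwargs):
        for k, v in kwargs.items():
            if not hasattr(self, k):
                raise KeyError('opt has no key: {}'.format(k))
            setattr(self, k, v)  # instance attribute shadows the class default
        # re-derive dataset-dependent fields from the (possibly changed) dataset name
        for k in ('vocab_size', 'rel_num'):
            setattr(self, k, DATA[self.data][k])


cfg = Config()
cfg.parse({'data': 'NYT', 'batch_size': 64})
print(cfg.vocab_size, cfg.rel_num, cfg.batch_size)  # 114043 53 64
```

Without the second loop, `cfg.vocab_size` would still be 160697 after switching to NYT, and the embedding layer would be built with the wrong size.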


### 4.2 Data Loading and Processing
* Data processing details
* Building the Dataset class

#### 4.2.1 Data Processing Details

##### Loading and Preprocessing the Raw Dataset

In [56]:
w2v_path = os.path.join(opt.data_root, 'vector.txt')
word_path = os.path.join(opt.data_root, 'dict.txt')
train_path = os.path.join(opt.data_root, 'train', 'train.txt')
test_path = os.path.join(opt.data_root, 'test', 'test.txt')

Load the word2vec vectors

In [57]:
wordlist = []
vecs = []

In [58]:
wordlist.append('BLANK')

In [59]:
wlist = [word.strip('\n') for word in open(word_path)]

In [60]:
wlist

[',',
 'the',
 '.',
 'of',
 'to',
 'a',
 'and',
 "''",
 'in',
 'that',
 "'s",
 'for',
 'is',
 'The',
 'said',
 'on',
 'was',
 'with',
 'at',
 'he',
 'Mr.',
 'it',
 'as',
 'by',
 'his',
 'from',
 'be',
 'are',
 'have',
 'not',
 'I',
 'an',
 'has',
 'who',
 '$',
 ':',
 'had',
 'they',
 '``',
 'or',
 'their',
 'would',
 '-RRB-',
 '-LRB-',
 'were',
 'will',
 'but',
 'this',
 '--',
 'about',
 'more',
 'which',
 'one',
 'been',
 'its',
 'But',
 ';',
 'In',
 'It',
 "n't",
 'He',
 'her',
 'than',
 'you',
 'when',
 'up',
 'out',
 'all',
 'she',
 'do',
 'two',
 'we',
 'like',
 'can',
 'years',
 'other',
 'last',
 'A',
 'also',
 'there',
 'year',
 'into',
 'people',
 "'",
 'new',
 'some',
 'first',
 'them',
 'after',
 'what',
 'time',
 'could',
 'no',
 'so',
 'over',
 'only',
 'if',
 'most',
 '?',
 'him',
 'percent',
 'did',
 'because',
 'million',
 'We',
 'many',
 'now',
 'And',
 'New_York',
 'just',
 'Ms.',
 'American',
 'company',
 'where',
 'made',
 'through',
 'They',
 'three',
 'before',
 ...]

In [61]:
wordlist.extend(wlist)

In [63]:
fw2v = open(w2v_path)

In [64]:
fw2v = [line for line in fw2v]

In [67]:
line1 = fw2v[0].strip('\n').split()
line1

['-0.221001',
 '1.358701',
 '-0.747914',
 '0.870552',
 '1.417721',
 '0.009870',
 '-0.614740',
 '1.618770',
 '0.648751',
 '0.390447',
 '0.478752',
 '-0.383416',
 '1.196540',
 '-0.644879',
 '-0.421608',
 '-0.923521',
 '0.361792',
 '-0.859179',
 '0.224276',
 '1.084490',
 '0.834912',
 '-0.257614',
 '-0.248996',
 '-1.236610',
 '1.638060',
 '0.720808',
 '1.066176',
 '-0.369189',
 '1.228255',
 '-0.155706',
 '0.748299',
 '-0.069667',
 '1.141663',
 '-0.488700',
 '2.208251',
 '-0.090331',
 '1.176398',
 '-0.632925',
 '0.334784',
 '-1.695715',
 '0.873845',
 '-0.801254',
 '-0.000435',
 '-0.600301',
 '2.363796',
 '-0.249681',
 '0.473764',
 '0.503697',
 '0.690691',
 '-0.513487']

In [68]:
list(map(float, line1))

[-0.221001,
 1.358701,
 -0.747914,
 0.870552,
 1.417721,
 0.00987,
 -0.61474,
 1.61877,
 0.648751,
 0.390447,
 0.478752,
 -0.383416,
 1.19654,
 -0.644879,
 -0.421608,
 -0.923521,
 0.361792,
 -0.859179,
 0.224276,
 1.08449,
 0.834912,
 -0.257614,
 -0.248996,
 -1.23661,
 1.63806,
 0.720808,
 1.066176,
 -0.369189,
 1.228255,
 -0.155706,
 0.748299,
 -0.069667,
 1.141663,
 -0.4887,
 2.208251,
 -0.090331,
 1.176398,
 -0.632925,
 0.334784,
 -1.695715,
 0.873845,
 -0.801254,
 -0.000435,
 -0.600301,
 2.363796,
 -0.249681,
 0.473764,
 0.503697,
 0.690691,
 -0.513487]

In [69]:
for line in open(w2v_path):
    line = line.strip('\n').split()
    vec = list(map(float, line))
    vecs.append(vec)

In [75]:
len(vecs)

160695

In [76]:
dim = len(vecs[0])
dim

50

In [77]:
vecs.insert(0, np.zeros(dim))

In [78]:
wordlist.append('UNK')

In [83]:
wordlist[-1]

'UNK'

In [85]:
vecs.append(np.random.uniform(low=-1.0, high=1.0, size=dim))

In [86]:
word2id = {j: i for i, j in enumerate(wordlist)}
id2word = {i: j for i, j in enumerate(wordlist)}

In [87]:
vecs = np.array(vecs, dtype=np.float32)
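The cells above produce a vocabulary in which `BLANK` (the padding token) has id 0 and an all-zero vector, while `UNK` has the last id and a random vector, so the matrix has 160695 + 2 rows, matching `vocab_size` in the config. A compact sketch of the same construction follows; `build_embeddings` is a hypothetical helper, and it uses a seeded NumPy generator instead of the global `np.random` only to make the result reproducible.

```python
import numpy as np


def build_embeddings(words, vectors, seed=0):
    """Prepend a zero BLANK row and append a random UNK row to pretrained vectors.

    Returns (word2id, matrix) with BLANK -> 0 and UNK -> len(words) + 1.
    """
    vectors = np.asarray(vectors, dtype=np.float32)
    dim = vectors.shape[1]
    rng = np.random.default_rng(seed)
    blank = np.zeros((1, dim), dtype=np.float32)                          # padding vector
    unk = rng.uniform(-1.0, 1.0, size=(1, dim)).astype(np.float32)        # unknown-word vector
    matrix = np.vstack([blank, vectors, unk])
    vocab = ['BLANK'] + list(words) + ['UNK']
    word2id = {w: i for i, w in enumerate(vocab)}
    return word2id, matrix


word2id, matrix = build_embeddings([',', 'the', '.'], np.ones((3, 4)))
print(matrix.shape, word2id['UNK'])  # (5, 4) 4
```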

Initialize the position embeddings (position2vector)

In [89]:
limit = 50
pos_dim = 5

In [90]:
pos1_vec = np.asarray(np.random.uniform(low=-1.0, high=1.0, size=(limit * 2 + 1, pos_dim)), dtype=np.float32)
pos1_vec = np.vstack((np.zeros((1, pos_dim)), pos1_vec))

In [91]:
pos2_vec = np.asarray(np.random.uniform(low=-1.0, high=1.0, size=(limit * 2 + 1, pos_dim)), dtype=np.float32)
pos2_vec = np.vstack((np.zeros((1, pos_dim)), pos2_vec))

In [93]:
pos1_vec.shape

(102, 5)
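Each position table has `2 * limit + 2 = 102` rows: row 0 is an all-zero padding row, and the other 101 rows cover relative distances from `-limit` to `limit`. The position features stored in the preprocessed files are indices into these tables. One plausible mapping from a token's distance to an entity onto a row index is sketched below; `relative_positions` is a hypothetical helper, and the preprocessed FilterNYT files may use a different offset.

```python
import numpy as np


def relative_positions(entity_idx, sent_len, max_len, limit=50):
    """Row indices into a (2*limit + 2)-row position table for one entity.

    Token i gets distance i - entity_idx, clipped to [-limit, limit] and shifted
    into [1, 2*limit + 1]; positions past the sentence stay 0 (the padding row).
    """
    idx = np.zeros(max_len, dtype=np.int64)
    dist = np.arange(sent_len) - entity_idx
    idx[:sent_len] = np.clip(dist, -limit, limit) + limit + 1
    return idx


print(relative_positions(entity_idx=2, sent_len=5, max_len=8, limit=2))
# [1 2 3 4 5 0 0 0]
```

Clipping keeps far-away tokens inside the table instead of growing it with sentence length.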

Save

In [94]:
np.save(os.path.join(opt.data_root, 'w2v.npy'), vecs)
np.save(os.path.join(opt.data_root, 'p1_2v.npy'), pos1_vec)
np.save(os.path.join(opt.data_root, 'p2_2v.npy'), pos2_vec)

Preprocess the training set

In [95]:
all_sens =[]
all_labels =[]

In [96]:
f = open(train_path)

In [97]:
line = f.readline()

In [98]:
line

'1122 53041\n'

In [100]:
entities = list(map(int, line.split(' ')))
entities

[1122, 53041]

In [101]:
line = f.readline()
line

'1 -1 -1 -1 10\n'

In [102]:
bagLabel = line.split(' ')
bagLabel

['1', '-1', '-1', '-1', '10\n']

In [103]:
rel = list(map(int, bagLabel[0:-1]))
rel

[1, -1, -1, -1]

In [105]:
num = int(bagLabel[-1])   # number of sentences (instances) in the bag
num

10

In [125]:
positions = []
sentences = []
entitiesPos = []
masks = []

In [111]:
sent = f.readline().strip().split(' ')
sent

['15',
 '13',
 '16842',
 '125741',
 '1',
 '2',
 '619',
 '4',
 '16297',
 '1005',
 '125741',
 '7',
 '3320',
 '125741',
 '4',
 '53041',
 '1',
 '1122',
 '1',
 '17',
 '1067',
 '173',
 '1144',
 '5',
 '10199',
 '968',
 '1421',
 '114169',
 '1',
 '6',
 '440',
 '4',
 '12045',
 '1354',
 '114169',
 '7',
 '2554',
 '15237',
 '4130',
 '737',
 '114169',
 '4',
 '17304',
 '1',
 '1278',
 '3']

In [121]:
sentence = list(map(lambda x: id2word[int(x)], sent[2:]))
sentence

['Rosemary',
 'Antonelle',
 ',',
 'the',
 'daughter',
 'of',
 'Teresa',
 'L.',
 'Antonelle',
 'and',
 'Patrick',
 'Antonelle',
 'of',
 'Belle_Harbor',
 ',',
 'Queens',
 ',',
 'was',
 'married',
 'yesterday',
 'afternoon',
 'to',
 'Lt.',
 'Thomas',
 'Joseph',
 'Quast',
 ',',
 'a',
 'son',
 'of',
 'Peggy',
 'B.',
 'Quast',
 'and',
 'Vice',
 'Adm.',
 'Philip',
 'M.',
 'Quast',
 'of',
 'Carmel',
 ',',
 'Calif.',
 '.']

In [109]:
positions.append(list(map(int, sent[0:2])))
positions

[[16, 14]]

In [115]:
epos = list(map(lambda x: int(x) + 1, sent[0:2]))
epos

[16, 14]

In [117]:
epos.sort()
epos

[14, 16]

In [118]:
mask = [1] * (epos[0] + 1)
mask

[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

In [119]:
mask += [2] * (epos[1] - epos[0])
mask

[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2]

In [120]:
mask += [3] * (len(sent[2:-1]) - epos[1])
mask

[1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 2,
 2,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3]
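The mask labels each position with the segment it falls into, relative to the two sorted, already `+1`-shifted entity positions: 1 up to and including the first entity, 2 up to and including the second, and 3 for the rest. With positions `[14, 16]` and 43 tokens this gives 15 ones, 2 twos and 27 threes, i.e. 44 entries, one more than the number of tokens, which is consistent with the `+1` shift. A hypothetical helper that also pads with 0 up to `max_len` (0 is the id that `mask_embedding` maps to an all-zero row) could look like this:

```python
def piece_mask(e1, e2, sent_len, max_len):
    """Segment ids (1, 2, 3) for piecewise pooling, padded with 0 to max_len.

    e1, e2: entity positions after the notebook's +1 shift (order does not matter).
    """
    lo, hi = sorted((e1, e2))
    mask = [1] * (lo + 1) + [2] * (hi - lo) + [3] * (sent_len - hi)
    return mask + [0] * (max_len - len(mask))


m = piece_mask(16, 14, sent_len=43, max_len=82)
print([m.count(k) for k in (1, 2, 3, 0)])  # [15, 2, 27, 38]
```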

In [126]:
entitiesPos.append(epos)
entitiesPos

[[14, 16]]

In [127]:
sentences.append(list(map(int, sent[2:-1])))
sentences

[[16842,
  125741,
  1,
  2,
  619,
  4,
  16297,
  1005,
  125741,
  7,
  3320,
  125741,
  4,
  53041,
  1,
  1122,
  1,
  17,
  1067,
  173,
  1144,
  5,
  10199,
  968,
  1421,
  114169,
  1,
  6,
  440,
  4,
  12045,
  1354,
  114169,
  7,
  2554,
  15237,
  4130,
  737,
  114169,
  4,
  17304,
  1,
  1278]]

In [128]:
masks.append(mask)
masks

[[1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  2,
  2,
  3,
  3,
  3,
  3,
  3,
  3,
  3,
  3,
  3,
  3,
  3,
  3,
  3,
  3,
  3,
  3,
  3,
  3,
  3,
  3,
  3,
  3,
  3,
  3,
  3,
  3,
  3]]
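Taken together, these cells suggest that `train.txt` is laid out bag by bag: a line with the two entity ids, a line with up to four relation ids padded with `-1` followed by the number of sentences, then one line per sentence holding the two entity positions and the token ids. The reader below is a sketch under that inferred layout (`read_bags` is a hypothetical name). It applies the same `+1` shift and the same `[2:-1]` token slice as the cells above and leaves mask construction out.

```python
def read_bags(lines):
    """Parse bags from an iterable of lines in the layout inferred above."""
    it = iter(lines)
    bags = []
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between bags
        entities = list(map(int, header.split()))
        *rels, num = map(int, next(it).split())  # relation ids (-1 padded), then sentence count
        epos, sentences = [], []
        for _ in range(num):
            fields = list(map(int, next(it).split()))
            epos.append(sorted(p + 1 for p in fields[:2]))  # same +1 shift as the notebook
            sentences.append(fields[2:-1])                  # same [2:-1] slice as the notebook
        bags.append({'entities': entities, 'labels': rels, 'epos': epos, 'sentences': sentences})
    return bags


sample = ['1122 53041\n', '1 -1 -1 -1 2\n', '2 0 5 6 7 8 3\n', '0 1 9 10 11 3\n']
print(read_bags(sample))
```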

##### Loading the Preprocessed Dataset

In [48]:
path = os.path.join(opt.data_root, 'train/')

In [None]:
data = np.load(path + 'bags_feature.npy', allow_pickle=True)

In [53]:
data[0][0]

[1122, 53041]

In [54]:
labels = np.load(path + 'labels.npy')

In [55]:
labels

array([[ 1, -1, -1, -1],
       [ 3, -1, -1, -1],
       [ 5,  2, -1, -1],
       ...,
       [ 0, -1, -1, -1],
       [ 0, -1, -1, -1],
       [ 0, -1, -1, -1]])

#### 4.2.2 Building the Dataset Class

In [29]:
DataModel = getattr(dataset, opt.data + 'Data')

In [31]:
DataModel

dataset.filternyt.FilterNYTData

In [34]:
from torch.utils.data import Dataset

In [35]:
class FilterNYTData(Dataset):

    def __init__(self, root_path, train=True):
        if train:
            path = os.path.join(root_path, 'train/')
            print('loading train data')
        else:
            path = os.path.join(root_path, 'test/')
            print('loading test data')

        self.labels = np.load(path + 'labels.npy')
        self.data = np.load(path + 'bags_feature.npy', allow_pickle=True)

        print('loading finish')

    def __getitem__(self, idx):
        assert idx < len(self.data)
        return self.data[idx], self.labels[idx]

    def __len__(self):
        return len(self.data)

In [32]:
train_data = DataModel(opt.data_root, train=True)

loading train data
loading finish


In [38]:
def collate_fn(batch):
    data, label = zip(*batch)
    return data, label
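A custom `collate_fn` is used because bags contain different numbers of sentences, so PyTorch's default collation, which tries to stack the samples of a batch into tensors, does not fit this data. This `collate_fn` just regroups a list of `(bag, label)` pairs into a tuple of bags and a tuple of labels and leaves them untouched. A toy run with hypothetical bags:

```python
from torch.utils.data import DataLoader

# Hypothetical bags: (sentences, padded labels); the bag sizes differ
bags = [
    ([[1, 2, 3]], [1, -1, -1, -1]),
    ([[4, 5], [6, 7, 8, 9]], [0, -1, -1, -1]),
    ([[10], [11], [12]], [2, 3, -1, -1]),
]


def collate_fn(batch):
    data, label = zip(*batch)  # [(bag, label), ...] -> (bags, labels)
    return data, label


loader = DataLoader(bags, batch_size=2, shuffle=False, collate_fn=collate_fn)
data, labels = next(iter(loader))
print([len(bag) for bag in data], labels)
```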

In [39]:
train_data_loader = DataLoader(train_data, opt.batch_size, shuffle=True, num_workers=opt.num_workers, collate_fn=collate_fn)

In [41]:
test_data = DataModel(opt.data_root, train=False)
test_data_loader = DataLoader(test_data, batch_size=opt.batch_size, shuffle=False, num_workers=opt.num_workers, collate_fn=collate_fn)
print('train data: {}; test data: {}'.format(len(train_data), len(test_data)))

loading test data
loading finish
train data: 65726; test data: 93574


### 4.3 Model Definition

In [25]:
import time

In [21]:
if opt.use_gpu:
    torch.cuda.set_device(opt.gpu_id)

In [22]:
model = getattr(models, 'PCNN_ONE')(opt)

In [24]:
class BasicModule(torch.nn.Module):
    '''
    Wraps nn.Module and mainly adds two methods: save and load.
    '''

    def __init__(self):
        super(BasicModule, self).__init__()
        self.model_name=str(type(self))  # model name

    def load(self, path):
        '''
        Load model weights from the given path.
        '''
        self.load_state_dict(torch.load(path))

    def save(self, name=None):
        '''
        Save the model; by default the file name is "model name + timestamp".
        '''
        prefix = 'checkpoints/'
        if name is None:
            name = prefix + self.model_name + '_'
            name = time.strftime(name + '%m%d_%H_%M_%S.pth')  # no ':' so the name is also valid on Windows
        else:
            name = prefix + self.model_name + '_' + str(name)+ '.pth'
        torch.save(self.state_dict(), name)
        return name

In [26]:
class PCNN_ONE(BasicModule):
    '''
    Zeng 2015 DS PCNN
    '''
    def __init__(self, opt):
        super(PCNN_ONE, self).__init__()

        self.opt = opt

        self.model_name = 'PCNN_ONE'

        self.word_embs = nn.Embedding(self.opt.vocab_size, self.opt.word_dim)
        self.pos1_embs = nn.Embedding(self.opt.pos_size, self.opt.pos_dim)
        self.pos2_embs = nn.Embedding(self.opt.pos_size, self.opt.pos_dim)

        feature_dim = self.opt.word_dim + self.opt.pos_dim * 2

        # for more filter size
        self.convs = nn.ModuleList([nn.Conv2d(1, self.opt.filters_num, (k, feature_dim), padding=(int(k / 2), 0)) for k in self.opt.filters])

        all_filter_num = self.opt.filters_num * len(self.opt.filters)

        if self.opt.use_pcnn:
            all_filter_num = all_filter_num * 3
            masks = torch.FloatTensor(([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]))
            if self.opt.use_gpu:
                masks = masks.cuda()
            self.mask_embedding = nn.Embedding(4, 3)
            self.mask_embedding.weight.data.copy_(masks)
            self.mask_embedding.weight.requires_grad = False

        self.linear = nn.Linear(all_filter_num, self.opt.rel_num)
        self.dropout = nn.Dropout(self.opt.drop_out)

        self.init_model_weight()
        self.init_word_emb()

    def init_model_weight(self):
        '''
        use xavier to init
        '''
        for conv in self.convs:
            nn.init.xavier_uniform_(conv.weight)
            nn.init.constant_(conv.bias, 0.0)

        nn.init.xavier_uniform_(self.linear.weight)
        nn.init.constant_(self.linear.bias, 0.0)

    def init_word_emb(self):

        def p_2norm(path):
            v = torch.from_numpy(np.load(path))
            if self.opt.norm_emb:
                v = torch.div(v, v.norm(2, 1).unsqueeze(1))
                v[v != v] = 0.0
            return v

        w2v = p_2norm(self.opt.w2v_path)
        p1_2v = p_2norm(self.opt.p1_2v_path)
        p2_2v = p_2norm(self.opt.p2_2v_path)

        if self.opt.use_gpu:
            self.word_embs.weight.data.copy_(w2v.cuda())
            self.pos1_embs.weight.data.copy_(p1_2v.cuda())
            self.pos2_embs.weight.data.copy_(p2_2v.cuda())
        else:
            self.pos1_embs.weight.data.copy_(p1_2v)
            self.pos2_embs.weight.data.copy_(p2_2v)
            self.word_embs.weight.data.copy_(w2v)

    def mask_piece_pooling(self, x, mask):
        '''
        refer: https://github.com/thunlp/OpenNRE
        A fast piecewise pooling using mask
        '''
        x = x.unsqueeze(-1).permute(0, 2, 1, -1)
        masks = self.mask_embedding(mask).unsqueeze(-2) * 100
        x = masks.float() + x
        x = torch.max(x, 1)[0] - 100  # undo the +100 offset; a plain scalar works on both CPU and GPU
        x = x.view(-1, x.size(1) * x.size(2))
        return x

    def piece_max_pooling(self, x, insPool):
        '''
        old version piecewise
        '''
        split_batch_x = torch.split(x, 1, 0)
        split_pool = torch.split(insPool, 1, 0)
        batch_res = []
        for i in range(len(split_pool)):
            ins = split_batch_x[i].squeeze()  # all_filter_num * max_len
            pool = split_pool[i].squeeze().data    # 2
            seg_1 = ins[:, :pool[0]].max(1)[0].unsqueeze(1)          # all_filter_num * 1
            seg_2 = ins[:, pool[0]: pool[1]].max(1)[0].unsqueeze(1)  # all_filter_num * 1
            seg_3 = ins[:, pool[1]:].max(1)[0].unsqueeze(1)
            piece_max_pool = torch.cat([seg_1, seg_2, seg_3], 1).view(1, -1)    # 1 * 3all_filter_num
            batch_res.append(piece_max_pool)

        out = torch.cat(batch_res, 0)
        assert out.size(1) == 3 * self.opt.filters_num
        return out

    def forward(self, x, train=False):

        insEnt, _, insX, insPFs, insPool, insMasks = x
        insPF1, insPF2 = [i.squeeze(1) for i in torch.split(insPFs, 1, 1)]

        word_emb = self.word_embs(insX)
        pf1_emb = self.pos1_embs(insPF1)
        pf2_emb = self.pos2_embs(insPF2)

        x = torch.cat([word_emb, pf1_emb, pf2_emb], 2)
        x = x.unsqueeze(1)
        x = self.dropout(x)

        x = [conv(x).squeeze(3) for conv in self.convs]
        if self.opt.use_pcnn:
            x = [self.mask_piece_pooling(i, insMasks) for i in x]
            # x = [self.piece_max_pooling(i, insPool) for i in x]
        else:
            x = [F.max_pool1d(i, i.size(2)).squeeze(2) for i in x]
        x = torch.cat(x, 1).tanh()
        x = self.dropout(x)
        x = self.linear(x)

        return x
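`mask_piece_pooling` replaces the per-sample loop of `piece_max_pooling` with one batched max. `mask_embedding` turns each segment id into a one-hot row (and 0 into all zeros); these rows are scaled by 100 and added to the convolution output, so in the k-th column only the tokens of segment k are lifted by 100. A max over the sentence axis then picks the maximum of that segment, and subtracting 100 removes the offset. This relies on the activations being much smaller than 100 in magnitude. The NumPy sketch below re-implements both versions (function names are illustrative, not part of the repository) and checks that they agree.

```python
import numpy as np

# Same table as mask_embedding's frozen weights: 0 -> padding, 1/2/3 -> one-hot segment
MASK_TABLE = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=np.float32)


def mask_piece_pooling(x, mask, big=100.0):
    """x: (B, C, L) conv outputs; mask: (B, L) ints in {0,1,2,3}. Returns (B, C*3)."""
    onehot = MASK_TABLE[mask]                                               # (B, L, 3)
    lifted = x.transpose(0, 2, 1)[..., None] + big * onehot[:, :, None, :]  # (B, L, C, 3)
    return (lifted.max(axis=1) - big).reshape(x.shape[0], -1)              # (B, C, 3) -> (B, C*3)


def slice_piece_pooling(x, mask):
    """Reference: explicit max over the tokens of each segment."""
    B, C, _ = x.shape
    out = np.empty((B, C, 3), dtype=x.dtype)
    for b in range(B):
        for k in range(3):
            out[b, :, k] = x[b][:, mask[b] == k + 1].max(axis=1)
    return out.reshape(B, -1)


rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(2, 4, 10)).astype(np.float32)
mask = np.array([[1, 1, 1, 2, 2, 3, 3, 3, 0, 0],
                 [1, 2, 3, 3, 3, 3, 3, 3, 3, 0]])
print(np.allclose(mask_piece_pooling(x, mask), slice_piece_pooling(x, mask), atol=1e-4))
```

Both versions lay the result out as `filter * 3 + segment`, which is why the linear layer can consume either one.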

In [28]:
if opt.use_gpu:
    model.cuda()

In [142]:
model

PCNN_ONE(
  (word_embs): Embedding(160697, 50)
  (pos1_embs): Embedding(102, 5)
  (pos2_embs): Embedding(102, 5)
  (convs): ModuleList(
    (0): Conv2d(1, 230, kernel_size=(3, 60), stride=(1, 1), padding=(1, 0))
  )
  (mask_embedding): Embedding(4, 3)
  (linear): Linear(in_features=690, out_features=27, bias=True)
  (dropout): Dropout(p=0.5, inplace=False)
)
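The sizes in this printout follow from the config. Each token is a 50-d word vector plus two 5-d position vectors (60 columns, the width of the convolution kernel). A height-3 kernel with padding 1 keeps all 82 positions. Piecewise pooling yields 3 values per filter, so the linear layer sees 230 * 3 = 690 features. A small helper (hypothetical, for checking your own settings) reproduces the arithmetic:

```python
def pcnn_shapes(max_len=82, word_dim=50, pos_dim=5, kernel=3,
                filters_num=230, n_kernel_sizes=1, use_pcnn=True):
    """Return (conv kernel width, conv output length, linear input size)."""
    feature_dim = word_dim + 2 * pos_dim           # word vector + two position vectors
    pad = int(kernel / 2)                          # same padding rule as the model code
    conv_len = max_len + 2 * pad - kernel + 1      # output length of a stride-1 convolution
    pieces = 3 if use_pcnn else 1                  # piecewise vs. plain max pooling
    return feature_dim, conv_len, filters_num * n_kernel_sizes * pieces


print(pcnn_shapes())                # (60, 82, 690)
print(pcnn_shapes(use_pcnn=False))  # (60, 82, 230)
```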

### 4.4 Model Training

In [130]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adadelta(filter(lambda p: p.requires_grad, model.parameters()), rho=1.0, eps=1e-6, weight_decay=opt.weight_decay)
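Two details of this optimizer call are worth knowing. First, `opt.lr` is not passed, so PyTorch's default Adadelta learning rate of 1.0 applies. Second, with `rho=1.0` the running averages of squared gradients and squared updates never move away from their initial zeros, so each update becomes `sqrt(eps) / sqrt(eps) * grad = grad`: training is effectively plain SGD with learning rate 1.0 plus weight decay. The hand-written step below follows the update rule documented for `torch.optim.Adadelta` (the function name is illustrative) and makes this visible.

```python
import numpy as np


def adadelta_step(param, grad, square_avg, acc_delta,
                  lr=1.0, rho=0.9, eps=1e-6, weight_decay=0.0):
    """One Adadelta step; returns (new_param, new_square_avg, new_acc_delta)."""
    grad = grad + weight_decay * param                       # L2 weight decay folded into the gradient
    square_avg = rho * square_avg + (1 - rho) * grad ** 2    # running average of g^2
    delta = np.sqrt(acc_delta + eps) / np.sqrt(square_avg + eps) * grad
    acc_delta = rho * acc_delta + (1 - rho) * delta ** 2     # running average of update^2
    return param - lr * delta, square_avg, acc_delta


p = np.array([1.0, -2.0])
g = np.array([0.5, 0.25])
state = (np.zeros(2), np.zeros(2))
for _ in range(2):
    p, *state = adadelta_step(p, g, *state, rho=1.0, eps=1e-6)
print(p)  # two plain gradient steps of size 1.0: [1.0, -2.0] -> [0.0, -2.5]
```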

In [131]:
print("start training...")
max_pre = -1.0
max_rec = -1.0

start training...


In [279]:
total_loss = 0

In [280]:
data, label_set = next(iter(train_data_loader))

In [281]:
len(data)

128

In [282]:
data[0]

array([list([135834, 26554]), 3,
       list([[26554, 11, 326, 303, 1, 135834, 1, 15, 9, 32, 683, 10, 176, 4, 2, 1865, 196, 46, 187, 5, 26554, 906, 34, 37, 1815, 65, 41, 810, 13723, 89, 2758, 17, 1987, 9, 1807, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [135, 987, 1865, 2129, 6, 308, 15011, 9, 26554, 11, 3478, 7, 6, 1424, 4219, 4, 2, 2018, 72, 29, 115, 23, 6, 113, 1, 8, 26554, 11, 326, 303, 1, 135834, 1, 15, 9, 6, 844, 535, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [4849, 15, 10, 85, 522, 4, 26554, 1, 18, 135834, 1, 6, 215, 521, 303, 19, 2, 51104, 1101, 2471, 1, 23, 326, 303, 1, 37, 115, 2018, 9, 9188, 5033, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]),
       list([[[47, 48, 49, 50, 51, 52,

In [283]:
len(label_set)

128

In [284]:
label_set[0]

array([ 4, -1, -1, -1])

In [285]:
label = [l[0] for l in label_set]

In [286]:
if opt.use_gpu:
    label = torch.LongTensor(label).cuda()
else:
    label = torch.LongTensor(label)

In [240]:
def select_instance(model, batch_data, labels):

    model.eval()
    select_ent = []
    select_num = []
    select_sen = []
    select_pf = []
    select_pool = []
    select_mask = []
    for idx, bag in enumerate(batch_data):
        insNum = bag[1]
        label = labels[idx]
        max_ins_id = 0
        if insNum > 1:
            model.batch_size = insNum
            if opt.use_gpu:
                data = map(lambda x: torch.LongTensor(x).cuda(), bag)
            else:
                data = map(lambda x: torch.LongTensor(x), bag)

            # scoring only selects an instance, so no gradients are needed here
            with torch.no_grad():
                out = model(data)

            # alternative (not used): highest score for any relation
            # max_ins_id = torch.max(torch.max(out, 1)[0], 0)[1]
            # the instance with the highest score for the gold relation represents the bag;
            # .item() turns the 0-dim index tensor into a Python int on both CPU and GPU
            max_ins_id = torch.max(out[:, label], 0)[1].item()

        max_sen = bag[2][max_ins_id]
        max_pf = bag[3][max_ins_id]
        max_pool = bag[4][max_ins_id]
        max_mask = bag[5][max_ins_id]

        select_ent.append(bag[0])
        select_num.append(bag[1])
        select_sen.append(max_sen)
        select_pf.append(max_pf)
        select_pool.append(max_pool)
        select_mask.append(max_mask)

    if opt.use_gpu:
        data = map(lambda x: torch.LongTensor(x).cuda(), [select_ent, select_num, select_sen, select_pf, select_pool, select_mask])
    else:
        data = map(lambda x: torch.LongTensor(x), [select_ent, select_num, select_sen, select_pf, select_pool, select_mask])

    model.train()
    return data
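`select_instance` implements the multi-instance selection step behind `PCNN_ONE`: for each bag, it scores every sentence with the current model and keeps only the sentence whose score for the bag's (first) gold relation is highest, and the loss is then computed on the selected sentences alone. Stripped of the model and tensors, the selection rule is a per-bag argmax, sketched here in NumPy with made-up scores (`select_ids` is an illustrative name).

```python
import numpy as np


def select_ids(bag_scores, gold_labels):
    """Index of the instance that scores highest on each bag's gold relation.

    bag_scores: one (num_instances, num_relations) array per bag
    gold_labels: the first gold relation id of each bag
    """
    return [int(np.argmax(scores[:, rel])) for scores, rel in zip(bag_scores, gold_labels)]


scores = [
    np.array([[0.1, 0.9], [0.3, 0.2]]),              # bag 0: 2 sentences
    np.array([[0.5, 0.1], [0.7, 0.0], [0.2, 0.3]]),  # bag 1: 3 sentences
    np.array([[0.4, 0.6]]),                          # bag 2: a single sentence
]
print(select_ids(scores, [1, 0, 0]))  # [0, 1, 0]
```

For single-sentence bags the original skips the forward pass and uses index 0 directly, which is what the argmax returns anyway.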


In [287]:
data_ = select_instance(model, data, label)

Step-by-step walkthrough of `select_instance` for a single bag:

In [242]:
model.eval()

PCNN_ONE(
  (word_embs): Embedding(160697, 50)
  (pos1_embs): Embedding(102, 5)
  (pos2_embs): Embedding(102, 5)
  (convs): ModuleList(
    (0): Conv2d(1, 230, kernel_size=(3, 60), stride=(1, 1), padding=(1, 0))
  )
  (mask_embedding): Embedding(4, 3)
  (linear): Linear(in_features=690, out_features=27, bias=True)
  (dropout): Dropout(p=0.5, inplace=False)
)

In [243]:
select_ent = []
select_num = []
select_sen = []
select_pf = []
select_pool = []
select_mask = []

In [244]:
bag  = data[1]
idx = 0

In [245]:
bag

array([list([1856, 4689]), 1,
       list([[1856, 17, 32, 18674, 396, 1090, 9, 2, 27330, 75, 4, 90, 72, 107, 197, 23, 181, 7163, 1, 8, 15, 24321, 40559, 1, 2, 180, 4, 4689, 694, 26, 4252, 5, 1362, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]),
       list([[[52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]]),
       list([[1, 28]]),
       list([[1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2

In [246]:
insNum = bag[1]
insNum     # number of instances in the bag

1

In [247]:
labels

array([[ 1, -1, -1, -1],
       [ 3, -1, -1, -1],
       [ 5,  2, -1, -1],
       ...,
       [ 0, -1, -1, -1],
       [ 0, -1, -1, -1],
       [ 0, -1, -1, -1]])

In [248]:
label = labels[idx]
label

array([ 1, -1, -1, -1])

In [249]:
max_ins_id = 0

In [250]:
model.batch_size

2

In [251]:
# in select_instance this branch runs only when insNum > 1
model.batch_size = insNum

In [252]:
if opt.use_gpu:
    data = map(lambda x: torch.LongTensor(x).cuda(), bag)
else:
    data = map(lambda x: torch.LongTensor(x), bag)


In [253]:
out = model(data)
out

tensor([[ 0.0147, -0.0539,  0.0280,  0.0557, -0.0702, -0.0361, -0.1082,  0.1344,
         -0.0356,  0.0322,  0.0777, -0.0261,  0.0630,  0.0317,  0.0254,  0.0164,
          0.0426, -0.0248,  0.0536, -0.0229, -0.0469, -0.0333,  0.0104, -0.0498,
          0.0359, -0.0025, -0.0397]], device='cuda:1', grad_fn=<AddmmBackward>)

In [254]:
out.shape    # (number of instances, 27 relations); this bag has a single instance

torch.Size([1, 27])

In [255]:
label

array([ 1, -1, -1, -1])

In [256]:
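out[:, label]    # label is the whole padded row here, so each -1 selects the last column; select_instance uses only the first label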

tensor([[-0.0539, -0.0397, -0.0397, -0.0397]], device='cuda:1',
       grad_fn=<IndexBackward>)

In [257]:
max_ins_id = torch.max(out[:, label], 0)[1]
max_ins_id    # index of the highest-scoring instance, which is the one used for training

tensor([0, 0, 0, 0], device='cuda:1')

In [258]:
max_ins_id = max_ins_id.data.cpu().numpy()[0]
max_ins_id

0

In [259]:
max_sen = bag[2][max_ins_id]
max_pf = bag[3][max_ins_id]
max_pool = bag[4][max_ins_id]
max_mask = bag[5][max_ins_id]

In [260]:
select_ent.append(bag[0])
select_num.append(bag[1])
select_sen.append(max_sen)
select_pf.append(max_pf)
select_pool.append(max_pool)
select_mask.append(max_mask)

In [261]:
if opt.use_gpu:
    data = map(lambda x: torch.LongTensor(x).cuda(), [select_ent, select_num, select_sen, select_pf, select_pool, select_mask])
else:
    data = map(lambda x: torch.LongTensor(x), [select_ent, select_num, select_sen, select_pf, select_pool, select_mask])

model.train()

PCNN_ONE(
  (word_embs): Embedding(160697, 50)
  (pos1_embs): Embedding(102, 5)
  (pos2_embs): Embedding(102, 5)
  (convs): ModuleList(
    (0): Conv2d(1, 230, kernel_size=(3, 60), stride=(1, 1), padding=(1, 0))
  )
  (mask_embedding): Embedding(4, 3)
  (linear): Linear(in_features=690, out_features=27, bias=True)
  (dropout): Dropout(p=0.5, inplace=False)
)

End of the `select_instance` walkthrough.

In [262]:
model.batch_size = opt.batch_size

In [263]:
optimizer.zero_grad()

In [288]:
out = model(data_, train=True)
out

tensor([[ 0.1526,  0.0134, -0.0232,  ...,  0.0203,  0.0284,  0.0130],
        [ 0.0147, -0.0521,  0.0131,  ...,  0.1512,  0.0764,  0.0140],
        [-0.7913,  0.2031,  0.7932,  ...,  0.0539,  1.5673, -1.9126],
        ...,
        [-0.0359, -0.1471, -0.1216,  ...,  0.1467,  0.0576, -0.0958],
        [ 0.0642, -0.1000, -0.0797,  ...,  0.0435, -0.0413, -0.0175],
        [ 0.1598, -0.0536,  0.0660,  ...,  0.1856,  0.0513, -0.1404]],
       device='cuda:1', grad_fn=<AddmmBackward>)
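The walkthrough stops after the forward pass. A training step would normally continue by comparing these logits with the first gold label of each bag (the `label` tensor built earlier), backpropagating, and updating the weights. The sketch below shows that continuation on a stand-in: a single `nn.Linear(690, 27)` plays the role of the model's final layer and random 690-d vectors play the role of the pooled features. Both are assumptions for illustration; the loss and optimizer settings mirror the cells above.

```python
import math

import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
features = torch.randn(4, 690)           # stand-in for pooled PCNN features of 4 selected sentences
label = torch.tensor([1, 0, 5, 26])      # hypothetical first gold relation of each bag
classifier = nn.Linear(690, 27)          # stand-in for PCNN_ONE's final linear layer
criterion = nn.CrossEntropyLoss()        # applies log-softmax to the logits internally
optimizer = optim.Adadelta(classifier.parameters(), rho=1.0, eps=1e-6, weight_decay=0.0001)

before = classifier.weight.detach().clone()
total_loss = 0.0
optimizer.zero_grad()
out = classifier(features)               # logits, shape (4, 27)
loss = criterion(out, label)
loss.backward()                          # gradients w.r.t. the classifier weights
optimizer.step()                         # weights move by -lr * grad (see the rho=1.0 note)
total_loss += loss.item()
print(math.isfinite(total_loss), not torch.equal(before, classifier.weight))
```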

### 4.5 Model Evaluation

#### 4.5.1 Prediction

In [271]:
model.eval()

PCNN_ONE(
  (word_embs): Embedding(160697, 50)
  (pos1_embs): Embedding(102, 5)
  (pos2_embs): Embedding(102, 5)
  (convs): ModuleList(
    (0): Conv2d(1, 230, kernel_size=(3, 60), stride=(1, 1), padding=(1, 0))
  )
  (mask_embedding): Embedding(4, 3)
  (linear): Linear(in_features=690, out_features=27, bias=True)
  (dropout): Dropout(p=0.5, inplace=False)
)

In [272]:
pred_y = []
true_y = []
pred_p = []

In [359]:
data, labels = next(iter(test_data_loader))

In [360]:
true_y.extend(labels)

In [321]:
len(data)

128

In [322]:
bag = data[0]

In [323]:
insNum = bag[1]
model.batch_size = insNum
if opt.use_gpu:
    data = map(lambda x: torch.LongTensor(x).cuda(), bag)
else:
    data = map(lambda x: torch.LongTensor(x), bag)

out = model(data)

In [324]:
out.shape

torch.Size([2, 27])

In [325]:
out = F.softmax(out, 1)

In [326]:
max_ins_prob, max_ins_label = map(lambda x: x.data.cpu().numpy(), torch.max(out, 1))

In [327]:
max_ins_prob

array([0.04105159, 0.04166167], dtype=float32)

In [328]:
max_ins_label

array([12,  7])

In [329]:
tmp_prob = -1.0
tmp_NA_prob = -1.0
pred_label = 0
pos_flag = False

In [330]:
max_ins_label[0]

12

In [331]:
for i in range(insNum):
    if pos_flag and max_ins_label[i] < 1:
        continue
    else:
        if max_ins_label[i] > 0:
            pos_flag = True
            if max_ins_prob[i] > tmp_prob:
                pred_label = max_ins_label[i]
                tmp_prob = max_ins_prob[i]
        else:
            if max_ins_prob[i] > tmp_NA_prob:
                tmp_NA_prob = max_ins_prob[i]

In [332]:
if pos_flag:
    pred_p.append(tmp_prob)
else:
    pred_p.append(tmp_NA_prob)

In [333]:
pred_y.append(pred_label)
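The loop above turns per-sentence predictions into one bag-level prediction. If any sentence predicts a relation other than NA (id 0), the bag gets the most confident such relation; only when every sentence predicts NA does the bag get NA, with the highest NA probability as its confidence. The same rule in a few lines of plain Python (`aggregate_bag` is an illustrative name), applied to the two sentences shown above:

```python
def aggregate_bag(probs, labels):
    """Bag-level (label, confidence) from each sentence's max probability and argmax label."""
    positives = [(p, l) for p, l in zip(probs, labels) if l > 0]
    if positives:
        p, l = max(positives, key=lambda t: t[0])  # most confident non-NA prediction
        return int(l), float(p)
    return 0, float(max(probs))                    # every sentence predicts NA


print(aggregate_bag([0.04105159, 0.04166167], [12, 7]))  # (7, 0.04166167)
```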

#### 4.5.2 Metric Computation

In [363]:
for idx, (data, labels) in enumerate(test_data_loader):
    true_y.extend(labels)  # in a full run, true_y should start empty here, or the batch added in In [360] is counted twice

In [364]:
positive_num = len([i for i in true_y if i[0] > 0])

In [365]:
positive_num

3464

In [346]:
pred_p

[0.04581885, 0.04166167]

In [347]:
np.argsort(pred_p)

array([1, 0])

In [351]:
index = np.argsort(pred_p)[::-1]
index

array([0, 1])

In [349]:
tp = 0
fp = 0
fn = 0
all_pre = [0]
all_rec = [0]
fp_res = []

In [350]:
idx = 0

In [352]:
i = true_y[index[idx]]
j = pred_y[index[idx]]

In [353]:
i

array([ 0, -1, -1, -1])

In [354]:
if i[0] == 0:  # NA relation
    if j > 0:
        fp_res.append((index[idx], j, pred_p[index[idx]]))
        fp += 1
else:
    if j == 0:
        fn += 1
    else:
        for k in i:
            if k == -1:
                break
            if k == j:
                tp += 1
                break

In [355]:
if fp + tp == 0:
    precision = 1.0
else:
    precision = tp * 1.0 / (tp + fp)

In [366]:
recall = tp * 1.0 / positive_num

In [367]:
if precision != all_pre[-1] or recall != all_rec[-1]:
    all_pre.append(precision)
    all_rec.append(recall)

In [370]:
precision

0.0

In [371]:
recall

0.0
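The cells above process a single prediction. The full evaluation repeats that step for every bag in order of decreasing confidence and records a (precision, recall) point whenever either value changes. Below is a self-contained sketch of that loop with a three-bag toy example (`pr_curve` is an illustrative name). Like the cells above, it counts a positive bag as a true positive when the prediction matches any of its gold labels; it omits the unused `fn` counter and the `fp_res` list.

```python
import itertools

import numpy as np


def pr_curve(true_y, pred_y, pred_p, positive_num):
    """Precision/recall points, walking predictions from most to least confident.

    true_y: gold label rows padded with -1 (row[0] == 0 means NA)
    pred_y: predicted relation id per bag (0 means NA)
    pred_p: confidence of each bag-level prediction
    positive_num: number of bags whose first gold label is not NA
    """
    tp = fp = 0
    all_pre, all_rec = [0], [0]
    for idx in np.argsort(pred_p)[::-1]:
        gold, pred = true_y[idx], pred_y[idx]
        if gold[0] == 0:                      # NA bag: any relation prediction is a false positive
            if pred > 0:
                fp += 1
        elif pred > 0:                        # positive bag: hit if pred matches a gold label
            if pred in itertools.takewhile(lambda k: k != -1, gold):
                tp += 1
        precision = 1.0 if tp + fp == 0 else tp / (tp + fp)
        recall = tp / positive_num
        if precision != all_pre[-1] or recall != all_rec[-1]:
            all_pre.append(precision)
            all_rec.append(recall)
    return all_pre, all_rec


true_y = [[1, -1, -1, -1], [0, -1, -1, -1], [2, 3, -1, -1]]
pre, rec = pr_curve(true_y, pred_y=[1, 4, 3], pred_p=[0.9, 0.8, 0.7], positive_num=2)
print(rec)  # [0, 0.5, 0.5, 1.0]
```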

# 5 Code Walkthrough and Review of Details (demonstrated in VS Code)

Review the training flow once more in the training script in VS Code.

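# 6 Assignments

`[Discussion]` What else could be improved in this paper's model, both in the convolutional neural network and in the multi-instance learning part (the loss function)?

`[Coding]` Reimplement the PCNN part of the model and the loss-function part (`select_instance`).

`[Drawing]` Without looking at the figure in the paper, draw the overall model architecture based on your own understanding.

`[Summary]` Review and summarize the paper, and study its overall writing structure, experimental design, and so on.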

---