<h2 id="80-id番号への変換">80. ID番号への変換</h2>
<p>問題51で構築した学習データ中の単語にユニークなID番号を付与したい．学習データ中で最も頻出する単語に<code class="language-plaintext highlighter-rouge">1</code>，2番目に頻出する単語に<code class="language-plaintext highlighter-rouge">2</code>，……といった方法で，学習データ中で2回以上出現する単語にID番号を付与せよ．そして，与えられた単語列に対して，ID番号の列を返す関数を実装せよ．ただし，出現頻度が2回未満の単語のID番号はすべて<code class="language-plaintext highlighter-rouge">0</code>とせよ．</p>


In [31]:
# データ分割
import pandas as pd
from sklearn.model_selection import train_test_split

# csvファイルを読み込む
path = "/Users/nyuton/Documents/100knock-2023/trainee_nyutonn/chapter08/data/newsCorpora.csv"
df = pd.read_table(path, header=None, sep='\\t', engine='python')
df.columns = ['ID', 'TITLE', 'URL', 'PUBLISHER', 'CATEGORY', 'STORY', 'HOSTNAME', 'TIMESTAMP']

# PUBLISHERが特定の行のみを取り出す
publishers = ['Reuters', 'Huffington Post', 'Businessweek', 'Contactmusic.com', 'Daily Mail']
daily_mails = df[df['PUBLISHER'].isin(publishers)]

# 訓練データ、検証データ、テストデータに分ける
train_data, non_train, train_target, non_train_target = train_test_split(daily_mails[['TITLE', 'CATEGORY']], daily_mails['CATEGORY'], train_size=0.8, random_state=10, stratify=daily_mails['CATEGORY'])
valid_data, test_data, valid_target, test_target = train_test_split(non_train, non_train_target, train_size=0.5, random_state=10,  stratify=non_train_target)
print(len(train_data), len(valid_data), len(test_data))

# テキストファイルに書き込む
train_data.to_csv('work/train.txt', header=None, index=None, sep='\t')
valid_data.to_csv('work/valid.txt', header=None, index=None, sep='\t')
test_data.to_csv('work/test.txt', header=None, index=None, sep='\t')
print(len(train_data))
train_data.head()

10684 1336 1336
10684


Unnamed: 0,TITLE,CATEGORY
409106,Selena Gomez exposes her derriere in VERY shor...,e
290867,Hillshire Says Tyson Foods Bid Superior to Pin...,b
36532,'Friends saw him hit me': Johnny Weir opens up...,e
358830,'As funny as a liver transplant!' Melissa McCa...,e
67622,Piers Morgan Delivers One Final Blow To Gun Vi...,e


In [32]:
# 単語の回数を数え上げ
from collections import Counter, defaultdict
from nltk.tokenize import word_tokenize

vocab = defaultdict(int)
for id, (title, category) in train_data.iterrows():
    # words = title.split()
    words = word_tokenize(title)
    for word in words:
        vocab[word] += 1
vocab = Counter(vocab)
# vocab.most_common()

In [33]:
from nltk.tokenize import word_tokenize

# 単語列から出現頻度インデックスを返す関数
def sentence2index(sentence):
    # 文を単語列に分割
    words = word_tokenize(sentence)
    # 単語のみのリストに分割する
    vocab_order, cnt_list = zip(*vocab.most_common())
    index_output = []
    for word in words:
        # 語彙にないときは 0
        if word not in vocab:
            index = 0
        # 回数が1のときも0
        elif cnt_list[vocab_order.index(word)] == 1:
            index = 0
        # 語彙にあるとき，0 インデックスなので +1 する
        else:  
            index = vocab_order.index(word) + 1

        index_output.append(index)
    return index_output


sentence = "Kim Kardashian Takes The Plunge In A Simple Black Tee"
print(sentence)
print(sentence2index(sentence))

Kim Kardashian Takes The Plunge In A Simple Black Tee
[39, 35, 581, 14, 4855, 20, 24, 8956, 794, 0]


<h2 id="81-rnnによる予測">81. RNNによる予測</h2>
<p>ID番号で表現された単語列\(\boldsymbol{x} = (x_1, x_2, \dots, x_T)\)がある．ただし，\(T\)は単語列の長さ，\(x_t \in \mathbb{R}^{V}\)は単語のID番号のone-hot表記である（\(V\)は単語の総数である）．再帰型ニューラルネットワーク（RNN: Recurrent Neural Network）を用い，単語列\(\boldsymbol{x}\)からカテゴリ\(y\)を予測するモデルとして，次式を実装せよ．</p>

<p>\[\overrightarrow{h}_0 = 0, \\
\overrightarrow{h}_t = {\rm \overrightarrow{RNN}}(\mathrm{emb}(x_t), \overrightarrow{h}_{t-1}), \\
y = {\rm softmax}(W^{(yh)} \overrightarrow{h}_T + b^{(y)}))\]</p>

<p>ただし，\(\mathrm{emb}(x) \in \mathbb{R}^{d_w}\)は単語埋め込み（単語のone-hot表記から単語ベクトルに変換する関数），\(\overrightarrow{h}_t \in \mathbb{R}^{d_h}\)は時刻\(t\)の隠れ状態ベクトル，\({\rm \overrightarrow{RNN}}(x,h)\)は入力\(x\)と前時刻の隠れ状態\(h\)から次状態を計算するRNNユニット，\(W^{(yh)} \in \mathbb{R}^{L \times d_h}\)は隠れ状態ベクトルからカテゴリを予測するための行列，\(b^{(y)} \in \mathbb{R}^{L}\)はバイアス項である（\(d_w, d_h, L\)はそれぞれ，単語埋め込みの次元数，隠れ状態ベクトルの次元数，ラベル数である）．RNNユニット\({\rm \overrightarrow{RNN}}(x,h)\)には様々な構成が考えられるが，典型例として次式が挙げられる．</p>

<p>\[{\rm \overrightarrow{RNN}}(x,h) = g(W^{(hx)} x + W^{(hh)}h + b^{(h)}))\]</p>

<p>ただし，\(W^{(hx)} \in \mathbb{R}^{d_h \times d_w}，W^{(hh)} \in \mathbb{R}^{d_h \times d_h}, b^{(h)} \in \mathbb{R}^{d_h}\)はRNNユニットのパラメータ，\(g\)は活性化関数（例えば\(\tanh\)やReLUなど）である．</p>
<p>なお，この問題ではパラメータの学習を行わず，ランダムに初期化されたパラメータで\(y\)を計算するだけでよい．次元数などのハイパーパラメータは，\(d_w = 300, d_h=50\)など，適当な値に設定せよ（以降の問題でも同様である）．</p>


In [34]:
import torch
import torch.nn as nn
class RNN(nn.Module):
    def __init__(self, vocab_size, padding_idx, emb_size=300, hidden_size=50, n_labels=4) -> None:
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        # 入力ベクトルの大きさが異なるので，emb層で形をそろえる
        self.emb = nn.Embedding(vocab_size, emb_size, padding_idx=padding_idx)
        # batch_first とは？
        # self.rnn = nn.RNN(emb_size, hidden_size, nonlinearity='tanh', batch_first=True)
        self.rnn = nn.RNN(emb_size, hidden_size, nonlinearity='tanh')
        self.func = nn.Linear(hidden_size, n_labels)

    def forward(self, x):
        self.batch_size = x.size()[0]
        # h0 = self.init_hidden(x.device)  # h0 の初期化 ゼロベクトル
        h0 = torch.zeros(1, self.hidden_size)
        # h0 = torch.zeros(1, self.batch_size, self.hidden_size)
        emb = self.emb(x)  # 入力サイズが異なるので統一する
        x_rnn, h_last = self.rnn(emb, h0)  # RNN

        # print(h_last.shape)
        # print(h_last[:, -1].shape)
        # print(h_last[:, -1])
        # out = self.func(x_rnn[:, -1, :]) # 最後の層だけ取り出す
        # out = self.func(x_rnn[:, -1]) # 最後の層だけ取り出す
        out = self.func(h_last) #現在のhだけ取り出す
        return out

    # 隠れ層の初期化
    def init_hidden(self, device):
        hidden = torch.zeros(1, self.batch_size, self.hidden_size, device=device)
        return hidden

In [35]:
from torch.utils.data import Dataset

class CreateDataset(Dataset):
    def __init__(self, X, y, tokenizer):
        self.X = X
        self.y = y
        self.tokenizer = tokenizer

    # len(Dataset) で返す値を指定
    def __len__(self):
        return len(self.y)
    
    # Dataset[index] で返す値を指定
    def __getitem__(self, index):
        text = self.X.iloc[index]
        input_features = self.tokenizer(text)
        label = self.y.iloc[index]
        return {
            'inputs': torch.tensor(input_features, dtype=torch.int64),
            'labels': torch.tensor(label, dtype=torch.int64)
        }

In [36]:
# ラベル
category_dict = {'b': 0, 't': 1, 'e': 2, 'm': 3}
y_train = train_data['CATEGORY'].map(category_dict)
y_valid = valid_data['CATEGORY'].map(category_dict)
y_test = test_data['CATEGORY'].map(category_dict)

# 特徴量データセット
X_train = CreateDataset(train_data['TITLE'], y_train, sentence2index)
X_valid = CreateDataset(valid_data['TITLE'], y_valid, sentence2index)
X_test = CreateDataset(test_data['TITLE'], y_test, sentence2index)

# 使い方の例
from pprint import pprint
print(len(X_train))
pprint(X_train[1])

10684
{'inputs': tensor([1824,   58, 1825, 1016,  556, 6650,    2, 2535,  117]),
 'labels': tensor(0)}


In [37]:
vocab_size = len(vocab) + 1  # padding の分 +1 する
padding_idx = len(vocab)  # 空き単語を埋めるときは最大値を入れる
emb_size = 300  # ハイパラ
hidden_size = 50  # ハイパラ
n_labels = 4  # ラベル数

model = RNN(vocab_size, padding_idx, emb_size, hidden_size, n_labels)
# model.eval()

# 先頭3件の入出力を表示
for i in range(3):
    print(f"{i}番目")
    print(f"入力ベクトル：{X_train[i]}")
    print(f"出力ベクトル：{model(X_train[i]['inputs'])}")
    print(f"予測ラベル　：{model(X_train[i]['inputs']).argmax()}")
    print(f"正解ラベル　：{X_train[i]['labels'].item()}")

0番目
入力ベクトル：{'inputs': tensor([ 145,  153, 6648,   59, 5079,    6, 2950, 1823,    0,  741, 6649,    2,
          23,    3]), 'labels': tensor(2)}
出力ベクトル：tensor([[-0.3896,  0.8900, -0.2229,  0.7795]], grad_fn=<AddmmBackward0>)
予測ラベル　：1
正解ラベル　：2
1番目
入力ベクトル：{'inputs': tensor([1824,   58, 1825, 1016,  556, 6650,    2, 2535,  117]), 'labels': tensor(0)}
出力ベクトル：tensor([[ 0.3870,  0.4830, -0.5279,  0.1314]], grad_fn=<AddmmBackward0>)
予測ラベル　：1
正解ラベル　：0
2番目
入力ベクトル：{'inputs': tensor([6651, 6652, 1263,  224, 1637,    5,    7,  324, 4066,  533,   43,  169,
         142,    0,    0,    5,   33,    3]), 'labels': tensor(2)}
出力ベクトル：tensor([[-0.1642,  0.7868, -0.0113,  0.8425]], grad_fn=<AddmmBackward0>)
予測ラベル　：3
正解ラベル　：2


In [38]:
# 確認 入力サイズがちがう．．．
print(X_train[0]['inputs'].shape)
print(X_train[1]['inputs'].shape)

torch.Size([14])
torch.Size([9])


<h2 id="82-確率的勾配降下法による学習">82. 確率的勾配降下法による学習</h2>
<p>確率的勾配降下法（SGD: Stochastic Gradient Descent）を用いて，問題81で構築したモデルを学習せよ．訓練データ上の損失と正解率，評価データ上の損失と正解率を表示しながらモデルを学習し，適当な基準（例えば10エポックなど）で終了させよ．</p>


In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
from torch.utils.data import DataLoader
from tqdm import tqdm
import time
from sklearn.metrics import accuracy_score


def train(model, output_path, total_epochs):
    optimizer = optim.SGD(model.parameters(), lr=0.01)
    loss_func = nn.CrossEntropyLoss()

    # 指定した epoch 数だけ学習
    for epoch in range(total_epochs):
        train_total_loss = 0.
        train_acc_cnt = 0

        # パラメータ更新
        model.train()
        for data in X_train:
            # print(data['inputs'])
            # print(model(data['inputs']))
            # print(data['labels'])
            x = data['inputs']
            y = data['labels']
            y_pred = model(x)[0]
            loss = loss_func(y_pred, y)  # 損失計算
            optimizer.zero_grad()  # 勾配の初期化
            loss.backward()  # 勾配計算
            optimizer.step()  # パラメータ修正
            train_total_loss += loss.item()

            # 正解率の計算  # ここで計算するのはまずいかも，学習エポックが終わってからやったほうがよさそう
            # 次の問題からは修正
            if y.item() == y_pred.argmax():
                train_acc_cnt += 1

        # valid のロスと正解率の計算
        model.eval()
        valid_acc_cnt = 0
        valid_total_loss = 0.
        with torch.no_grad():
            for data in X_valid:
                x = data['inputs']
                y = data['labels']
                y_pred = model(x)[0]
                loss = loss_func(y_pred, y)  # 損失計算
                optimizer.zero_grad()  # 勾配の初期化
                # loss.backward()  # 勾配計算
                # optimizer.step()  # パラメータ修正
                valid_total_loss += loss.item()

                # 正解率の計算
                if y.item() == y_pred.argmax():
                    valid_acc_cnt += 1

        # 表示
        train_ave_loss = train_total_loss / len(X_train)
        train_acc = train_acc_cnt / len(X_train)
        valid_ave_loss = valid_total_loss / len(X_valid)
        valid_acc = valid_acc_cnt / len(X_valid)
        print(f"epoch{epoch}: train_loss = {train_ave_loss}, train_acc = {train_acc}, valid_loss = {valid_ave_loss}, valid_acc = {valid_acc}")

    # パラメータを保存
    torch.save(model.state_dict(), output_path)


vocab_size = len(vocab) + 1  # padding の分 +1 する
padding_idx = len(vocab)  # 空き単語を埋めるときは最大値を入れる
emb_size = 300  # ハイパラ
hidden_size = 50  # ハイパラ
n_labels = 4  # ラベル数

model = RNN(vocab_size, padding_idx, emb_size, hidden_size, n_labels)
output_path = "./trained_param.npz"
total_epochs = 10
train(model, output_path, total_epochs)

epoch0: train_loss = 0.9860524150169131, train_acc = 0.6433919880194684, valid_loss = 0.9014563560207001, valid_acc = 0.687874251497006
epoch1: train_loss = 0.8265681704164572, train_acc = 0.7137776113815051, valid_loss = 0.8279604763203875, valid_acc = 0.7200598802395209
epoch2: train_loss = 0.7158381654228584, train_acc = 0.7523399475851741, valid_loss = 0.7523488583409891, valid_acc = 0.7372754491017964
epoch3: train_loss = 0.6457718780873205, train_acc = 0.7710595282665669, valid_loss = 0.7497719074974135, valid_acc = 0.7312874251497006
epoch4: train_loss = 0.6039787394605203, train_acc = 0.7903406963684013, valid_loss = 0.7173522087295043, valid_acc = 0.7574850299401198
epoch5: train_loss = 0.56852603803889, train_acc = 0.7982029202545863, valid_loss = 0.8042106931848928, valid_acc = 0.7095808383233533
epoch6: train_loss = 0.5483809195270353, train_acc = 0.8032572070385623, valid_loss = 0.8111276710339721, valid_acc = 0.7350299401197605
epoch7: train_loss = 0.5061867906124584, tra

<h2 id="83-ミニバッチ化gpu上での学習">83. ミニバッチ化・GPU上での学習</h2>
<p>問題82のコードを改変し，\(B\)事例ごとに損失・勾配を計算して学習を行えるようにせよ（\(B\)の値は適当に選べ）．また，GPU上で学習を実行せよ．</p>


In [None]:
# GPUにしても早くなってないような気がする．．．
# 走らせながら正答率を計測する valid acc と 最後にまとめて計算する valid acc2 の結果が異なるのがかなり気になる．．．
import torch
import torch.nn as nn
class RNN(nn.Module):
    def __init__(self, vocab_size, padding_idx, emb_size=300, hidden_size=50, n_labels=4, batch_size=64, device='cpu') -> None:
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.batch_size = batch_size
        self.device = device
        # 入力ベクトルの大きさが異なるので，emb層で形をそろえる
        self.emb = nn.Embedding(vocab_size, emb_size, padding_idx=padding_idx)
        # batch_first とは？ -> batch_size と emb の２次元目のサイズが異なるときに合わせている？？
        self.rnn = nn.RNN(emb_size, hidden_size, nonlinearity='tanh', batch_first=True)
        # self.rnn = nn.RNN(emb_size, hidden_size, nonlinearity='tanh')
        self.func = nn.Linear(hidden_size, n_labels)

    def forward(self, x):
        # バッチサイズを固定すると，一番最後の余りの分がおかしくなるので，動的に毎回決める！
        self.batch_size = x.size()[0]
        h0 = torch.zeros(1, self.batch_size, self.hidden_size, device=self.device) # ここを変更した
        emb = self.emb(x)  # 入力サイズが異なるので統一する
        x_rnn, h_last = self.rnn(emb, h0)  # RNN
        out = self.func(x_rnn[:, -1, :]) # 最後の層だけ取り出す # ここを変更した
        # out = self.func(x_rnn[:, -1]) # 最後の層だけ取り出す
        # out = self.func(h_last) #現在のhだけ取り出す
        return out

In [None]:
from torch.utils.data import Dataset
import numpy as np

class CreateDataset(Dataset):
    def __init__(self, X, y, tokenizer):
        self.X = X
        self.y = y
        self.tokenizer = tokenizer

    # len(Dataset) で返す値を指定
    def __len__(self):
        return len(self.y)
    
    # Dataset[index] で返す値を指定
    def __getitem__(self, index):
        titles = self.X.iloc[index]
        # スライス記法のとき
        if type(index) == slice:
            labels = self.y.iloc[index]
            input_features = []
            labels_tensor = []
            for title, label in zip(titles, labels):
                input_feature = torch.tensor(self.tokenizer(title))
                input_features.append(input_feature)
                labels_tensor.append(torch.tensor(label, dtype=torch.int64))
        else:
            text = self.X.iloc[index]
            input_features = torch.tensor(self.tokenizer(text), dtype=torch.int64)
            labels_tensor = torch.tensor(self.y.iloc[index], dtype=torch.int64)
        return {
            'inputs': input_features,
            'labels': labels_tensor
        }

In [None]:
X_train = CreateDataset(train_data['TITLE'], y_train, sentence2index)
X_valid = CreateDataset(valid_data['TITLE'], y_valid, sentence2index)
X_test = CreateDataset(test_data['TITLE'], y_test, sentence2index)

In [None]:
# 確認
pprint(X_train[0])
pprint(X_train[:3])

{'inputs': tensor([ 145,  153, 6648,   59, 5079,    6, 2950, 1823,    0,  741, 6649,    2,
          23,    3]),
 'labels': tensor(2)}
{'inputs': [tensor([ 145,  153, 6648,   59, 5079,    6, 2950, 1823,    0,  741, 6649,    2,
          23,    3]),
            tensor([1824,   58, 1825, 1016,  556, 6650,    2, 2535,  117]),
            tensor([6651, 6652, 1263,  224, 1637,    5,    7,  324, 4066,  533,   43,  169,
         142,    0,    0,    5,   33,    3])],
 'labels': [tensor(2), tensor(0), tensor(2)]}


In [29]:
# サーバ上で動かなかった問題解決 -> model.to('cpu') を model に入れていなかった -> 正） model = model.to('cpu')
# 今度はhiddenとinputのdeviceが違うというエラーが発生 -> xiをcudaに戻したら解決
from sklearn.metrics import accuracy_score

def measure_acc(model, X, y, device):
    model.eval()
    model = model.to(device)
    with torch.no_grad():
        pred_y = []
        for xi in X:
            xi = xi.to(device)
            pred_yi = model(xi[None]).argmax()
            pred_yi = pred_yi.to('cpu')
            pred_y.append(pred_yi)
    return accuracy_score(pred_y, y)

In [819]:
# 修正版
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
from torch.utils.data import DataLoader # データローダ使ってみる
from tqdm import tqdm
import time
from sklearn.metrics import accuracy_score

def train(model, train_loader, valid_loader, output_path, total_epochs, device, lr=0.01):
    optimizer = optim.SGD(model.parameters(), lr=lr)
    loss_func = nn.CrossEntropyLoss()

    model = model.to(device)
    # 指定した epoch 数だけ学習
    for epoch in range(total_epochs):
        train_total_loss = 0.
        # train_acc_cnt = 0

        # パラメータ更新
        model.train()
        for data in tqdm(train_loader):
            x = data['inputs']
            x = x.to(device)
            y = data['labels']
            y = y.to(device)
            y_pred = model(x)

            # バッチの中で損失計算
            train_loss = 0.
            for yi, yi_pred in zip(y, y_pred):
                loss_i = loss_func(yi_pred, yi)
                train_loss += loss_i
            
            optimizer.zero_grad()  # 勾配の初期化
            train_loss.backward()  # 勾配計算
            optimizer.step()  # パラメータ修正
            train_total_loss += train_loss.item()

            # バッチの中で正解率の計算 # ここを修正
            # for yi, yi_pred in zip(y, y_pred):
            #     if yi.item() == yi_pred.argmax():
            #         train_acc_cnt += 1
                
        # train のロスと正解率の計算
        model.eval()
        train_acc = measure_acc(model, X_train[:]['inputs'], X_train[:]['labels'], device)


        # valid のロスと正解率の計算
        model.eval()
        valid_acc_cnt = 0
        valid_total_loss = 0.
        with torch.no_grad():
            for data in tqdm(valid_loader):
                x = data['inputs']
                x = x.to(device)
                y = data['labels']
                y = y.to(device)
                y_pred = model(x)

                # バッチの中で損失計算
                valid_loss = 0.
                for yi, yi_pred in zip(y, y_pred):
                    # print(yi)
                    # print(yi_pred)
                    loss_i = loss_func(yi_pred, yi)
                    valid_loss += loss_i

                optimizer.zero_grad()  # 勾配の初期化
                # valid_loss.backward()  # 勾配計算
                # optimizer.step()  # パラメータ修正
                valid_total_loss += valid_loss

                # バッチの中で正解率の計算  # ここを修正
                for yi, yi_pred in zip(y, y_pred):
                    if yi.item() == yi_pred.argmax():
                        valid_acc_cnt += 1

            # valid のロスと正解率の計算
            # valid_acc = measure_acc(model, X_valid[:]['inputs'], X_valid[:]['labels'], device)

        # 表示
        train_ave_loss = train_total_loss / len(X_train)
        # train_acc = train_acc_cnt / len(X_train)
        valid_ave_loss = valid_total_loss / len(X_valid)
        valid_acc = valid_acc_cnt / len(X_valid)
        print(f"epoch{epoch}: train_loss = {train_ave_loss}, train_acc = {train_acc}, valid_loss = {valid_ave_loss}, valid_acc = {valid_acc}")

    # パラメータを保存
    torch.save(model.state_dict(), output_path)

In [66]:
# 訓練データの最後で何故かエラーが起きる -> あまりの分がおかしくなっていた！
# ロスが下がっていない．．．なぜだ．．．
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
from torch.nn.utils.rnn import pad_sequence

#バッチサイズ
batch_size = 32
PADDING_IDX = vocab_size = len(vocab)

#ミニバッチを取り出して長さを揃える関数
def collate_fn(batch):
    sorted_batch = sorted(batch, key=lambda x: x['inputs'].shape[0], reverse=True)
    sequences = [x['inputs'] for x in sorted_batch]
    sequences_padded = torch.nn.utils.rnn.pad_sequence(sequences, batch_first=True, padding_value=PADDING_IDX)
    labels = torch.LongTensor([x['labels'] for x in sorted_batch])
    return {'inputs': sequences_padded, 'labels': labels}

vocab_size = len(vocab) + 1  # padding の分 +1 する
padding_idx = len(vocab)  # 空き単語を埋めるときは最大値を入れる
emb_size = 300  # ハイパラ
hidden_size = 50  # ハイパラ
n_labels = 4  # ラベル数
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"device: {device}")

train_loader = DataLoader(X_train, batch_size=batch_size, shuffle=True, collate_fn=collate_fn)
valid_loader = DataLoader(X_valid, batch_size=batch_size, shuffle=True, collate_fn=collate_fn)
test_loader = DataLoader(X_test, batch_size=batch_size, shuffle=True, collate_fn=collate_fn)

model = RNN(vocab_size, padding_idx, emb_size, hidden_size, n_labels, batch_size, device)
output_path = "./trained_param.npz"
total_epochs = 10
train(model, train_loader, valid_loader, output_path, total_epochs, device)

device: cpu


TypeError: __init__() takes from 3 to 6 positional arguments but 8 were given

<h2 id="84-単語ベクトルの導入">84. 単語ベクトルの導入</h2>
<p>事前学習済みの単語ベクトル（例えば，Google Newsデータセット（約1,000億単語）での<a href="https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit?usp=sharing">学習済み単語ベクトル</a>）で単語埋め込み\(\mathrm{emb}(x)\)を初期化し，学習せよ．</p>


In [722]:
import numpy as np
from tqdm import tqdm # 進捗表示
from gensim.models import KeyedVectors
from nltk.tokenize import word_tokenize

# 70からもってきた
word2vec_model = KeyedVectors.load_word2vec_format('./../chapter08/data/GoogleNews-vectors-negative300.bin', binary=True) 

vocab_size = len(vocab) + 1
emb_size = 300
padding_idx = len(vocab)

rnn_emb = nn.Embedding(vocab_size, emb_size, padding_idx=padding_idx)
word2vec_model = KeyedVectors.load_word2vec_format('./../chapter08/data/GoogleNews-vectors-negative300.bin', binary=True) 
embedding_weight_matrix = rnn_emb(torch.tensor([0])).detach().numpy().copy()

# ID持ちの単語が学習済み単語ベクトルを持っていればそれを行方向に足していき，なければnn.Embeddingのベクトルを行方向に足していく
for key, value in vocab.items():
    try:
        embedding_weight_matrix = np.vstack((embedding_weight_matrix, word2vec_model[key]))
    except KeyError:
        embedding_weight_matrix = np.vstack((embedding_weight_matrix, rnn_emb(torch.tensor([value])).detach().numpy().copy()))
    
embedding_weight_matrix = np.vstack((embedding_weight_matrix, np.zeros(emb_size, dtype=np.float32)))  # paddingの分
embedding_weight_matrix = torch.from_numpy(embedding_weight_matrix)

In [800]:
# やっぱりロスが下がらない．．．
# 埋め込み単語ベクトルを変更可能にする
# emb_weight に初期値を入れることで，モード変更をする
import torch
import torch.nn as nn
class RNN(nn.Module):
    def __init__(self, vocab_size, padding_idx, emb_size=300, hidden_size=50, n_labels=4, batch_size=64, device='cpu', emb_weight=None) -> None:
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.batch_size = batch_size
        self.device = device
        # 入力ベクトルの大きさが異なるので，emb層で形をそろえる
        if emb_weight is None:
            self.emb = nn.Embedding(vocab_size, emb_size, padding_idx=padding_idx)
        # 追加
        else:
            self.emb = nn.Embedding.from_pretrained(emb_weight, padding_idx=padding_idx)
        
        # batch_first とは？ -> batch_size と emb の２次元目のサイズが異なるときに合わせている？？
        self.rnn = nn.RNN(emb_size, hidden_size, nonlinearity='tanh', batch_first=True)
        # self.rnn = nn.RNN(emb_size, hidden_size, nonlinearity='tanh')
        self.func = nn.Linear(hidden_size, n_labels)

    def forward(self, x):
        print(x)
        # バッチサイズを固定すると，一番最後の余りの分がおかしくなるので，動的に毎回決める！
        self.batch_size = x.size()[0]
        h0 = torch.zeros(1, self.batch_size, self.hidden_size, device=self.device) # ここを変更した
        emb = self.emb(x)  # 入力サイズが異なるので統一する
        x_rnn, h_last = self.rnn(emb, h0)  # RNN
        out = self.func(x_rnn[:, -1, :]) # 最後の層だけ取り出す # ここを変更した
        # out = self.func(x_rnn[:, -1]) # 最後の層だけ取り出す
        # out = self.func(h_last) #現在のhだけ取り出す
        return out
    
    
vocab_size = len(vocab) + 1  # padding の分 +1 する
padding_idx = len(vocab)  # 空き単語を埋めるときは最大値を入れる
emb_size = 300  # ハイパラ
hidden_size = 50  # ハイパラ
n_labels = 4  # ラベル数
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"device: {device}")

train_loader = DataLoader(X_train, batch_size=batch_size, shuffle=True, collate_fn=collate_fn)
valid_loader = DataLoader(X_valid, batch_size=batch_size, shuffle=True, collate_fn=collate_fn)
test_loader = DataLoader(X_test, batch_size=batch_size, shuffle=True, collate_fn=collate_fn)

model = RNN(vocab_size, padding_idx, emb_size, hidden_size, n_labels, batch_size, device, embedding_weight_matrix)
output_path = "./trained_param.npz"
total_epochs = 10
train(model, train_loader, valid_loader, output_path, total_epochs, device)

device: cpu


  0%|          | 1/334 [00:01<05:47,  1.04s/it]

tensor([[    0,     0,     7,  1710,     2,    90,    23,  1018,     0,    95,
            94,     0,  6396,    90,  8910,    17,     3],
        [ 1767,    17,   927,   166,  2042,    50,    24,  1099,  3192,    19,
             0,  2484,  4319,  2201,   885,    18,     3],
        [ 3115,  1690,    46,   100,  4067,   389,  5787,     5,     2,    90,
            17,  5788,    13,    17,    34,   147,   707],
        [  133,     0,  4664,  4490,     2,  3612,     5,     9,  4753,  8946,
          1215,  2502,  1804,    22,    17,     3, 19690],
        [  173,   475,    24,   935,  5537,  3225,  1338,   141,  7454,     4,
           980,    49,   633,    64,   173,    32, 19690],
        [ 1860,     5,   795,  3653,     5,     2,  1034,     9,   196,    64,
            48,  1330,   162,    25,    48,    27, 19690],
        [ 4753,  3344,     2,  1608,   546,     0,   284,  2052,     2,   389,
             0,     7,   968,   621,  2779,     3, 19690],
        [   86,    73,    11,    8

  1%|          | 2/334 [00:01<05:05,  1.09it/s]

tensor([[   16,  3968,    23,     0,     7,  2696,   411,     0,     5,  2313,
            18,  9254,   731,  1461,     4,  4485,     2,     3],
        [   24,  2740,  2078,     7,    14,   336,  1030,  7756,  1410,  2476,
          6256,    43,    59,   139,  2740,  3966,    12,     3],
        [  324,   454,     6,  8230,  2920,    18,  3954,     6,  1539,    13,
             0,  2167,   214,  6096,     5,  6097,     9,     3],
        [ 1483,  1120,   640,  1158,  3280,    16,   190,    34,  2335,    20,
           165,   245,  6699,     1,     0,     3, 19690, 19690],
        [ 1330,   457,   518,  5417,     7,    38,  5418,    57,    41,  2543,
             1,    38,  4347,    41,  5419, 19690, 19690, 19690],
        [   39,    35,     1,    86,    73,   312,    16,  2742,     0,   107,
           316,    25,  9244,    27, 19690, 19690, 19690, 19690],
        [    0,  4827,    50,  2996,  2997,   171,     0,  4011,     6,  9432,
          4576,     9,    70,     3, 19690, 19690, 

  1%|          | 3/334 [00:02<04:02,  1.36it/s]

tensor([[  108,    62,   850,   505,    16,  1312,   130,    75,   973,  1291,
            69,   144,    14,  6285,  9081,     3],
        [ 2295,   863,   109,   161,   315,   104,     1,   607,   561,     8,
          1851,   341,     1,   258,   111, 19690],
        [    0,     5,   697,   698,   634,   106,     6,     0,  3460,   284,
            18,  4874,   914,  1228,  2163, 19690],
        [ 1419,  2396,  2438,     9,  3633,  1412,  4420,  1521,  1522,     6,
          1585,  4385,     0,     3, 19690, 19690],
        [   10,     0,  5748,     1,   249,     2,   449,  3740,  3976,   195,
          2523,  1091,     6,     3, 19690, 19690],
        [ 1821,   937,  3327,  1946,     1,  5464,   248,   265,   165,  2400,
            14,  1272,  3924,     3, 19690, 19690],
        [ 2478,  3364,  2948,   497,    22,    17,   509,   462,    29,    19,
             0,  1647,  2319, 19690, 19690, 19690],
        [   19,  8634,    31,    24,     0,  1742,    32,  1578,  1097,  1528,
    

  1%|          | 4/334 [00:03<04:05,  1.34it/s]

tensor([[  107,   304,    22,  1541,     1,  1741,  4040,  1176,   329,  1423,
             1,  4817,    93,  1190,  3227,    13,    17,  3998,    64],
        [ 1607,    55,  8757,    81,   921,    32,   998,  2237,  8758,    93,
            23,  8759,    13,    17,   827,     7,  8760,     3, 19690],
        [  986,   318,  1538,    93,     6,  2277,  1364,     1,   281,    18,
            68,  1516,    93,     6,   299,     2,  1172,     3, 19690],
        [   14,  1787,   619,    99,   253,  4683,   740,  7242,   596,   166,
          1012,     6,  3680,   323,   209,     3, 19690, 19690, 19690],
        [    0,  1814,    45,    17,  1068,     0,     0,     6,    17,   544,
             5,     7,  1122,  1853,     3, 19690, 19690, 19690, 19690],
        [    0,  3085,  3875,    40,  1579,   323,   171,   492,     2,     0,
           322,   427,    13,    17,     3, 19690, 19690, 19690, 19690],
        [  211,  5442,     7,  5398,    82,    69,    17,  4598,    34,     0,
         

  1%|▏         | 5/334 [00:03<03:32,  1.55it/s]

tensor([[ 7811,  5163,   171,     0,  2515,   597,     1,  1701,   349,    33,
             0,  9390,     5,     0,  1715,    59,     6,     3],
        [    0,     1,     0,    25,   471,    27,   472,   763,  1600,     0,
           158,    55,     6,    17,  8560,  6329,     3, 19690],
        [ 1192,   313,     0,     5,    22,    39,    18,    86,     4,     0,
             2,   646,   208,  3283,  1741,     3, 19690, 19690],
        [   14,  3363,  3265,  8148,  2589,  2292,  2291,   491,    22,  1968,
          3563,    33,  5194,    59,  1343,     3, 19690, 19690],
        [ 3115,  1690,  2626,   130,    69,    34,   147,   707,  5991,     1,
            58,   375,   559,    55,     3, 19690, 19690, 19690],
        [   28,  2231,  2232,     4,    34,   283,   167,    24,  4007,   392,
           283,    32,   553,   167,   603, 19690, 19690, 19690],
        [  681,  8429,    11,   681,  8429,  1482,   130,    16,   355,   344,
            44,  1261,   217,     4,  1691, 19690, 

  2%|▏         | 6/334 [00:04<03:13,  1.69it/s]

tensor([[ 3195,    93,   168,    33,  1819,     1,  4290,    93,    33,     0,
             5,     7,  4678,     4,  3378,  8231,  2358,   106,     2,     3],
        [  165,   551,   732,  1757,  1798,  7827,   491,    22,    17,  1968,
            13,  5866,     3,  2602,  2401,   142,     3, 19690, 19690, 19690],
        [  194,   216,    11,   194,   216,    30,   138,  6558,    37,  3939,
            16,    76,   379,     0,   455,  8226,     3, 19690, 19690, 19690],
        [  900,   291,     5,   277,   315,  3098,     7,  1066,  1976,     0,
            37,    14,  4229,     4,  3273,     3, 19690, 19690, 19690, 19690],
        [   61,    18,  3634,  3128,     5,  2403,     0,  1373,  2966,  3657,
             0,    20,    14,  7410,    26,     3, 19690, 19690, 19690, 19690],
        [  144,     2,     0,    23,  7976,   176,    26,  1127,     4,  7977,
             2,   604,     2,    17,  2043, 19690, 19690, 19690, 19690, 19690],
        [ 6177,     0,     5,  2881,     0,   

  2%|▏         | 7/334 [00:04<03:32,  1.54it/s]

tensor([[    5,    94,   928,   532,  1735,    12,   318,  2859,     5,     7,
          1416,  2085,  4163,    12,  1313,     8,  5023,  2536,     3],
        [   14,   340,  5332,    16,    15,  2427,     5,  8122,  1454,  4783,
           144,    75,   967,  2886, 19690, 19690, 19690, 19690, 19690],
        [ 9762,    13,   965,    50,  1074,  1485,    18,  1905,  1906,     0,
          8222,     0,     2,     3, 19690, 19690, 19690, 19690, 19690],
        [  149,   150,   138,  1515,   150,  1108,    30,   194,   216,    16,
           163,  3400,   455,    32, 19690, 19690, 19690, 19690, 19690],
        [ 2683,   146,   550,    67,  2824,   367,     1,   326,     0,  2563,
          1362,    21,  1317, 19690, 19690, 19690, 19690, 19690, 19690],
        [  170,   386,     4,   876,     0,   142,   688,     4,  5276,    45,
           389,  6662,     5, 19690, 19690, 19690, 19690, 19690, 19690],
        [ 6992,  6993,     5,   463,     0,     9,  4210,  1422,  1895,  1737,
         

  2%|▏         | 8/334 [00:05<03:16,  1.66it/s]

tensor([[    0,   638,  9552,    50,     5,     7,  1412,  1155,    45,     2,
          1172,    60,    61,     8,    59,  3234,   816,     6,    17,    15,
             2,     3],
        [ 2660,   465,   433,    18,  1677,     0,    13,     0,    18,     0,
            23,   336,   341,  6209,   618,    40,    23,  1265, 19690, 19690,
         19690, 19690],
        [  145,   153,   441,    34,  5743,  2822,     1,   240,   108,   476,
            62,  3046,    32,   553,  2715,   603, 19690, 19690, 19690, 19690,
         19690, 19690],
        [  345,    21,   148,   277,   184,     0,  1094,  2212,    11,   397,
             4,   254,  1730,   173,   248,     3, 19690, 19690, 19690, 19690,
         19690, 19690],
        [ 1600,  4004,     1,     0,  2401,  1797,     4,     0,    57,   639,
          3163,     1,   491,  1614,  3449, 19690, 19690, 19690, 19690, 19690,
         19690, 19690],
        [ 3928,  4781,    18,  1160,  3363,  1996,    19,     0,   879,     0,
           

  3%|▎         | 9/334 [00:05<03:01,  1.79it/s]

tensor([[ 1467,  3500,   136,  4661,  4662,     7,     5,    94,  6667,   551,
          3371,   343,    21,   108,  1546,  2830,    37,     5],
        [ 4284,  1075,     1,  1688,     1,   158,     0,    22,  1074,  1485,
          3783,     6,  3109,    13,  4735,    13,   761,     3],
        [   38,  1200,    41,  6808,    38,  1826,     1,    41,    38,  2618,
            41,     2,   972,  2319,   509,   462, 19690, 19690],
        [ 8364,   162,    28,  3955,    24,  1474,   326,     1,    30,   399,
            28,    62,     4,    24,  3895,   223, 19690, 19690],
        [  800,  1075,  2124,  1739,  9630,     8,   127,    13,  1735,     6,
          3253,  3954,    17,  4328,    33,     3, 19690, 19690],
        [  249,   673,   246,   300,   282,   393,  2220,   292,     1,  8261,
            58,    25,    48,    27, 19690, 19690, 19690, 19690],
        [   10,     0,     4,  4425,  7496,    43,   135,     2,   157,  4426,
            64,    24,    64,    98, 19690, 19690, 

  3%|▎         | 10/334 [00:06<03:22,  1.60it/s]

tensor([[ 1623,    54,  3318,     5,   160,    24,     0,     1,    30,   308,
          1007,   121,    16,    76,    37,  3883,     3],
        [ 1122,  1853,  1682,    64,  1123,  1683,    14,   674,    44,  2591,
            26,    38,  1382,    41,  5149,     3, 19690],
        [ 6275,    96,  9394,     5,  9584,  2782,     0,     1,  1818,  3398,
            66,    76,   312,    31,    24,     3, 19690],
        [  355,   344,  9093,    40,   688,  8153,    66,    18,  5000,  6114,
          1368,    12,  1702,     4,     3, 19690, 19690],
        [  938,  2504,   369,    34,   162,     1,  8701,  1374,  1507,     4,
             0,     0,     1,    31,     3, 19690, 19690],
        [   44,  2227,  1301,     1,  1010,  1312,  4476,   105,  5678,  8950,
            30,     0,  6417,    82,     3, 19690, 19690],
        [   86,    73,   369,    69,   547,  6443,    37,   118,    39,    35,
             1,    30,     0,   990,     3, 19690, 19690],
        [ 1486,   494,    12,  260

  3%|▎         | 11/334 [00:07<03:07,  1.72it/s]

tensor([[    5,    94,  1332,  1081,    17,   927,   358,     5,     7,  1694,
          2091,  7128,     0,    13,   537,    23,  5426,     5,   276,     3],
        [ 4754,    21,    14,  1363,    21,    14,  2897,     5,     7,  2356,
          1525,  9487,    62,     4,   873,  9421,     3, 19690, 19690, 19690],
        [  159,   152,     4,  2159,    31,    14,  4525,   286,  1483,   121,
          2607,   754,     5,    25,   713,    27, 19690, 19690, 19690, 19690],
        [ 2996,  2997,   171,  1598,   294,     4,     0,    45,  5013,   772,
          3796,     5,     3,     9,   116,     3, 19690, 19690, 19690, 19690],
        [ 1304,  2948,    13,     0,     0,     6,    17,   720,   496,     0,
            29,  2181,  2255,     3, 19690, 19690, 19690, 19690, 19690, 19690],
        [  668,  5569,   123,  7353,   128,   465,   895,    19,  5570,    36,
           929,   175,   601,   649, 19690, 19690, 19690, 19690, 19690, 19690],
        [  191,    42,   124,   219,  1491,   

  4%|▎         | 12/334 [00:08<03:42,  1.45it/s]

tensor([[  682,    50,  1460,   346,    34,     0,    54,     0,    18,     0,
             1,     5,     0,     5,  2785],
        [   14,  2947,   643,   990,     0,  3880,    78,    14,  3929,   371,
            20,    96,  2947,   181,     3],
        [  328,  2307,    46,  1265,   891,  5170,     1,  3021,  4152,     2,
           120,   250,   413,   644,     3],
        [    0,     0,    50,     0,     0,  3817,     6,  3310,     0,  3912,
           932,    22,    17,  8040,     3],
        [   14,     0,    13,    15,  6313,  3397,     0,    93,    22,    17,
          1670,   438,  1332,  1548,     3],
        [ 1663,  3781,     2,  5047,    19,  8634,  1308,    13,  2972,  1508,
           538,    25,    57,    27, 19690],
        [ 6632,   185,  2840,   618,    13,  2077,   596,     7,  2912,  4788,
          3556,    93,   114,     3, 19690],
        [  350,  1715,    29,     0,  5467,     6,     0,    18,   318,     0,
            33,     0,  2522,     3, 19690],
        

  4%|▍         | 13/334 [00:08<03:31,  1.52it/s]

tensor([[  852,   848,     0,   178,   934,  5250,     0,     8,   139,  4835,
            13,    94,  5955,  5956,  2742,   848],
        [  979,   355,   344,    11,  1261,   217,     4,  1506,    16,    76,
          4169,    20,  1082,  1083,    11,   187],
        [  384,  2027,    90,    71,    17,  3145,    50,  1388,  1455,   495,
            19,     0,    40,     0,     3, 19690],
        [   20,     0,    13,    17,  6283,  2750,    13,   180,   294,     4,
           268,     1,   397,     3, 19690, 19690],
        [ 2414,  2430,   913,   184,    84,  2301,    30,  4018,  4851,    54,
           466,     1,  9084,     3, 19690, 19690],
        [   34,  1686,  1027,  1691,  4700,    30,   268,  1969,  3748,  1237,
            78,  1368,    21,     3, 19690, 19690],
        [  103,  4834,    83,   373,     0,  4139,    36,   247,    18,    89,
             0,   644,   613, 19690, 19690, 19690],
        [   62,     4,   101,   200,    75,     1,  2403,  4395,    50,   853,
    

  4%|▍         | 14/334 [00:09<03:11,  1.67it/s]

tensor([[ 3698,     4,  5527,     2,   209,    23,  1564,    12,   349,     5,
             7,     0,  1692,    23,   726,  1944,    45,     0,     3],
        [ 6202,    24,  6203,  6204,     7,   527,  4901,  8457,     1,  1149,
          1603,  1437,   672,   258,     3, 19690, 19690, 19690, 19690],
        [ 1438,  3601,   442,    19,  1372,   385,     6,   109,    33,  1856,
           879,  4782,    12,     3, 19690, 19690, 19690, 19690, 19690],
        [    0,    26,    14,   371,     5,   622,  1158,   190,   165,    14,
             0,    30,     0, 19690, 19690, 19690, 19690, 19690, 19690],
        [  938,  2195,  1038,   107,   762,    24,  1966,    37,  2284,     4,
           900,   291,     5, 19690, 19690, 19690, 19690, 19690, 19690],
        [ 1208,    35,  3344,  8623,     2,  1761,   395,  1166,     9,    70,
           411,    51,     3, 19690, 19690, 19690, 19690, 19690, 19690],
        [ 1941,  5574,     4,    34,     0,  1607,    55,  6231,   131,    13,
         

  4%|▍         | 15/334 [00:09<03:26,  1.55it/s]

tensor([[ 1124,     4,    23,  3372,     2,    17,  1575,     1,     5,    46,
          2263,  4753,     9,  1435,  2466,     5,  2374,     2,  4322,   106,
            13,     3],
        [  107,   304,    22,  1541,     1,  1741,  4040,  1176,   329,  1423,
             1,  4817,    93,  1190,  3227,    13,    17,  3998,    64, 19690,
         19690, 19690],
        [  507,   231,     1,  5007,  1815,    18,   819,  9075,    87,   165,
            20,   569,     1,   240,    28,    62,    31,   136,     3, 19690,
         19690, 19690],
        [  231,   695,     2,  5366,  9606,     9,   100,   186,    92,  6629,
          6293,  6625,     8,   284,    13,     3, 19690, 19690, 19690, 19690,
         19690, 19690],
        [  180,   294,     4,  2791,  3404,  8490,     2,  3606,  1264,   388,
             4,   279,  2152,  1981,     3, 19690, 19690, 19690, 19690, 19690,
         19690, 19690],
        [ 2514,  1013,   186,    92,    59,  8139,  6043,  1876,     6,  8140,
          8

  5%|▍         | 16/334 [00:10<03:07,  1.70it/s]

tensor([[ 1058,    57,  1362,    28,    17,    34,   184,  1362,     1,  1540,
           967,    17,   586,   604,    54,  1503,     3],
        [ 3866,  3867,    11,  3866,  3867,    16,   521,  1805,     4,  1987,
          1623,    20,  2800,  2879,    24,   115, 19690],
        [   14,  2464,  8146,  7133,     7,    14,  7471,  8147,  3276,   178,
          6045,   114,   236,    19,     0,     3, 19690],
        [ 9583,  1317,  1429,   660,   218,    13,   742,  1103,     4,  1541,
           269,   661,    12,    19,     0,   132, 19690],
        [ 1452,     0,  7411,     7,   420,  2027,  7412,  1408,   100,   541,
            53,  5613,    29,    17,    15, 19690, 19690],
        [   85,    88,   782,  2218,   696,    13,  1263,     0,    29,   145,
           153,     2,   447,     3, 19690, 19690, 19690],
        [   26,   319,     0,   818,     1,   382,   360,    28,    55,  5928,
           245,     0,   578, 19690, 19690, 19690, 19690],
        [   39,    35,    18,    1

  5%|▌         | 17/334 [00:10<02:57,  1.79it/s]

tensor([[  420,    45,     8,  1624,  1401,   645,  2521,     1,   460,  1259,
            46,     7,  6549,  1424,     0,   618,     3],
        [ 3330,   925,     1,  2892,   323,   494,    92,   142,  2081,  5932,
           166,   168,   209,   100,     0, 19690, 19690],
        [ 9076,  6474,     0,     0,     1,     0,   253,    90,  1390,    12,
           473,     7,     0,  1965,     3, 19690, 19690],
        [ 1410,  2476,  1697,   173,   375,  4657,    82,   121,    74,  3843,
          1198,  2845,    16,    15,  1723, 19690, 19690],
        [  531,  1245,     0,    39,    35,     9,   116,     0,    17,   544,
           116,  3952,     0,    29,     3, 19690, 19690],
        [  354,   526,  9653,     0,     5,   676,    49,     0,  4304,   269,
           661,   553,  1948,   603, 19690, 19690, 19690],
        [  184,  2157,     2,   199,    24,  2128,   110,    25,    30,   407,
            27,     6,   154,  3964, 19690, 19690, 19690],
        [  174,   333,  3226,    1

  5%|▌         | 18/334 [00:11<03:13,  1.63it/s]

tensor([[   30,  9052,  1503,   973,   570,    11,   286,  1056,  5945,   343,
            38,   146,   961,    41,   107,   316,    32],
        [   34,  5795,    32,  1720,  2341,  6031,  6383,  1923,  1625,   388,
          1054,  2665,     5,     3,  1701,     3, 19690],
        [   28,   328,   169,     2,  1333,    23,  5578,  1734,  7362,    32,
           913,  2784,    17,  7363,     3, 19690, 19690],
        [ 3698,  2682,  1637,    70,  4257,     2,  2804,     0,  1166,     5,
             7,  4063,  4064,   171,     3, 19690, 19690],
        [ 1410,  8970,    18,     0,  8181,   285,    92,  4717,     0,     5,
          1604,    22,     0,  5922, 19690, 19690, 19690],
        [ 8635,     0,     7,   409,  1034,  2922,     2,  7584,     4,  8636,
            69,  3696,  1218, 19690, 19690, 19690, 19690],
        [  320,  2167,  5737,  6197,     4,  8452,    13,   119,  4688,     2,
           646,   383,   135, 19690, 19690, 19690, 19690],
        [   44,  8505,     1,    1

  5%|▌         | 18/334 [00:11<03:29,  1.51it/s]


KeyboardInterrupt: 

<h2 id="85-双方向rnn多層化">85. 双方向RNN・多層化</h2>
<p>順方向と逆方向のRNNの両方を用いて入力テキストをエンコードし，モデルを学習せよ．</p>

<p>\[\overleftarrow{h}_{T+1} = 0, \\
\overleftarrow{h}_t = {\rm \overleftarrow{RNN}}(\mathrm{emb}(x_t), \overleftarrow{h}_{t+1}), \\
y = {\rm softmax}(W^{(yh)} [\overrightarrow{h}_T; \overleftarrow{h}_1] + b^{(y)}))\]</p>

<p>ただし，\(\overrightarrow{h}_t \in \mathbb{R}^{d_h}, \overleftarrow{h}_t \in \mathbb{R}^{d_h}\)はそれぞれ，順方向および逆方向のRNNで求めた時刻\(t\)の隠れ状態ベクトル，\({\rm \overleftarrow{RNN}}(x,h)\)は入力\(x\)と次時刻の隠れ状態\(h\)から前状態を計算するRNNユニット，\(W^{(yh)} \in \mathbb{R}^{L \times 2d_h}\)は隠れ状態ベクトルからカテゴリを予測するための行列，\(b^{(y)} \in \mathbb{R}^{L}\)はバイアス項である．また，\([a; b]\)はベクトル\(a\)と\(b\)の連結を表す。</p>
<p>さらに，双方向RNNを多層化して実験せよ．</p>


In [726]:
# 今度はロスがちょっと下がった！
# 双方向と多層化に拡張
import torch
import torch.nn as nn
class RNN(nn.Module):
    def __init__(self, vocab_size, padding_idx, emb_size=300, hidden_size=50, n_labels=4, batch_size=64, device='cpu', emb_weight=None, bidirectional=False, layers=1) -> None:
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.batch_size = batch_size
        self.device = device
        # 双方向と多層化に拡張
        self.bidirectional = bidirectional
        self.layers = layers
        self.directions = bidirectional + 1 # 単方向：１， 双方向：2
        # 入力ベクトルの大きさが異なるので，emb層で形をそろえる
        if emb_weight is None:
            self.emb = nn.Embedding(vocab_size, emb_size, padding_idx=padding_idx)
        else:
            self.emb = nn.Embedding.from_pretrained(emb_weight, padding_idx=padding_idx)
        
        # batch_first とは？ -> batch_size と emb の２次元目のサイズが異なるときに合わせている？？
        # 双方向と多層化に対応        
        self.rnn = nn.RNN(emb_size, hidden_size, self.layers, nonlinearity='tanh', batch_first=True, bidirectional=bidirectional)
        # self.rnn = nn.RNN(emb_size, hidden_size, nonlinearity='tanh')
        # 双方向に対応， 隠れ層に self.directions を掛ける
        self.func = nn.Linear(hidden_size * self.directions, n_labels)

    def forward(self, x):
        # バッチサイズを固定すると，一番最後の余りの分がおかしくなるので，動的に毎回決める！
        self.batch_size = x.size()[0]
        # 多層と双方向に対応， １次元目を 1 -> self.layers * self.directions
        h0 = torch.zeros(self.layers * self.directions, self.batch_size, self.hidden_size, device=self.device) # ここを変更した
        emb = self.emb(x)  # 入力サイズが異なるので統一する
        x_rnn, h_last = self.rnn(emb, h0)  # RNN
        # 双方向に対応
        if self.bidirectional:
            out = self.func(torch.cat([h_last[-2], h_last[-1]], dim=1))
        else:
            out = self.func(x_rnn[:, -1, :]) # 最後の層だけ取り出す # ここを変更した
        # out = self.func(x_rnn[:, -1]) # 最後の層だけ取り出す
        # out = self.func(h_last) #現在のhだけ取り出す
        return out
    
    
vocab_size = len(vocab) + 1  # padding の分 +1 する
padding_idx = len(vocab)  # 空き単語を埋めるときは最大値を入れる
emb_size = 300  # ハイパラ
hidden_size = 50  # ハイパラ
n_labels = 4  # ラベル数
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"device: {device}")

train_loader = DataLoader(X_train, batch_size=batch_size, shuffle=True, collate_fn=collate_fn)
valid_loader = DataLoader(X_valid, batch_size=batch_size, shuffle=True, collate_fn=collate_fn)
test_loader = DataLoader(X_test, batch_size=batch_size, shuffle=True, collate_fn=collate_fn)

model = RNN(vocab_size, padding_idx, emb_size, hidden_size, n_labels, batch_size, device, embedding_weight_matrix, bidirectional=True, layers=3)
output_path = "./trained_param.npz"
total_epochs = 10
train(model, train_loader, valid_loader, output_path, total_epochs, device)

device: cpu


100%|██████████| 334/334 [03:43<00:00,  1.49it/s]
100%|██████████| 42/42 [00:28<00:00,  1.48it/s]


epoch0: train_loss = 1.0233958778824086, train_acc = 0.6118494945713217, valid_loss = 0.9286081194877625, valid_acc = 0.5883233532934131


100%|██████████| 334/334 [03:26<00:00,  1.62it/s]
100%|██████████| 42/42 [00:24<00:00,  1.74it/s]


epoch1: train_loss = 0.878185788686246, train_acc = 0.649475851740921, valid_loss = 0.8094227313995361, valid_acc = 0.6190119760479041


100%|██████████| 334/334 [03:17<00:00,  1.69it/s]
100%|██████████| 42/42 [00:23<00:00,  1.80it/s]


epoch2: train_loss = 0.808253781262608, train_acc = 0.6994571321602396, valid_loss = 0.805242657661438, valid_acc = 0.6601796407185628


100%|██████████| 334/334 [03:11<00:00,  1.75it/s]
100%|██████████| 42/42 [00:24<00:00,  1.69it/s]


epoch3: train_loss = 0.7563907426200703, train_acc = 0.7318420067390491, valid_loss = 0.8188757300376892, valid_acc = 0.6699101796407185


100%|██████████| 334/334 [03:21<00:00,  1.66it/s]
100%|██████████| 42/42 [00:25<00:00,  1.68it/s]


epoch4: train_loss = 0.7158668217075224, train_acc = 0.7485960314488955, valid_loss = 0.8426365852355957, valid_acc = 0.6676646706586826


100%|██████████| 334/334 [03:40<00:00,  1.51it/s]
100%|██████████| 42/42 [00:30<00:00,  1.37it/s]


epoch5: train_loss = 0.7093842308068713, train_acc = 0.6381505054286783, valid_loss = 0.8171107172966003, valid_acc = 0.5965568862275449


100%|██████████| 334/334 [03:30<00:00,  1.59it/s]
100%|██████████| 42/42 [00:24<00:00,  1.72it/s]


epoch6: train_loss = 0.6515607636147159, train_acc = 0.713964807188319, valid_loss = 0.7950844168663025, valid_acc = 0.624251497005988


100%|██████████| 334/334 [03:33<00:00,  1.56it/s]
100%|██████████| 42/42 [00:29<00:00,  1.40it/s]


epoch7: train_loss = 0.6121267506978968, train_acc = 0.7616997379258704, valid_loss = 0.8064600825309753, valid_acc = 0.6803892215568862


100%|██████████| 334/334 [03:45<00:00,  1.48it/s]
100%|██████████| 42/42 [00:23<00:00,  1.82it/s]


epoch8: train_loss = 0.5837842783469112, train_acc = 0.780700112317484, valid_loss = 0.8587974309921265, valid_acc = 0.6699101796407185


100%|██████████| 334/334 [03:11<00:00,  1.75it/s]
100%|██████████| 42/42 [00:21<00:00,  1.93it/s]


epoch9: train_loss = 0.5564715132111693, train_acc = 0.752527143391988, valid_loss = 0.782258927822113, valid_acc = 0.6616766467065869


<h2 id="86-畳み込みニューラルネットワーク-cnn">86. 畳み込みニューラルネットワーク (CNN)</h2>
<p>ID番号で表現された単語列\(\boldsymbol{x} = (x_1, x_2, \dots, x_T)\)がある．ただし，\(T\)は単語列の長さ，\(x_t \in \mathbb{R}^{V}\)は単語のID番号のone-hot表記である（\(V\)は単語の総数である）．畳み込みニューラルネットワーク（CNN: Convolutional Neural Network）を用い，単語列\(\boldsymbol{x}\)からカテゴリ\(y\)を予測するモデルを実装せよ．</p>
<p>ただし，畳み込みニューラルネットワークの構成は以下の通りとする．</p>
<ul>
<li>単語埋め込みの次元数: \(d_w\)</li>
<li>畳み込みのフィルターのサイズ: 3 トークン</li>
<li>畳み込みのストライド: 1 トークン</li>
<li>畳み込みのパディング: あり</li>
<li>畳み込み演算後の各時刻のベクトルの次元数: \(d_h\)</li>
<li>畳み込み演算後に最大値プーリング（max pooling）を適用し，入力文を\(d_h\)次元の隠れベクトルで表現</li>
</ul>
<p>すなわち，時刻\(t\)の特徴ベクトル\(p_t \in \mathbb{R}^{d_h}\)は次式で表される．</p>

<p>\[p_t = g(W^{(px)} [\mathrm{emb}(x_{t-1}); \mathrm{emb}(x_t); \mathrm{emb}(x_{t+1})] + b^{(p)}))\]</p>

<p>ただし，\(W^{(px)} \in \mathbb{R}^{d_h \times 3d_w}, b^{(p)} \in \mathbb{R}^{d_h}\)はCNNのパラメータ，\(g\)は活性化関数（例えば\(\tanh\)やReLUなど），\([a; b; c]\)はベクトル\(a, b, c\)の連結である．なお，行列\(W^{(px)}\)の列数が\(3d_w\)になるのは，3個のトークンの単語埋め込みを連結したものに対して，線形変換を行うためである．</p>
<p>最大値プーリングでは，特徴ベクトルの次元毎に全時刻における最大値を取り，入力文書の特徴ベクトル\(c \in \mathbb{R}^{d_h}\)を求める．\(c[i]\)でベクトル\(c\)の\(i\)番目の次元の値を表すことにすると，最大値プーリングは次式で表される．</p>

<p>\[c[i] = \max_{1 \leq t \leq T} p_t[i]]\]</p>

<p>最後に，入力文書の特徴ベクトル\(c\)に行列\(W^{(yc)} \in \mathbb{R}^{L \times d_h}\)とバイアス項\(b^{(y)} \in \mathbb{R}^{L}\)による線形変換とソフトマックス関数を適用し，カテゴリ\(y\)を予測する．</p>

<p>\[y = {\rm softmax}(W^{(yc)} c + b^{(y)}))\]</p>

<p>なお，この問題ではモデルの学習を行わず，ランダムに初期化された重み行列で\(y\)を計算するだけでよい．</p>


In [743]:
from torch.nn import functional as F

class CNN(nn.Module):
    def __init__(self, vocab_size, padding_idx, out_channels,  emb_size=300, kernel_heights=3, stride=1, n_labels=4, device="cpu", emb_weight=None) -> None:
        """
        stride: 動かす単位（小さいほど細かい）
        kenel_height: 窓の大きさ
        out_channels: 
        conv2d: convolution 層（次元を維持しつつ畳み込み）
        max_pool1d: pooling層（最大値を取り，ダウンサンプリングする）
        """
        super(CNN, self).__init__()
        # 入力ベクトルの大きさが異なるので，emb層で形をそろえる
        if emb_weight is None:
            self.emb = nn.Embedding(vocab_size, emb_size, padding_idx=padding_idx)
        else:
            self.emb = nn.Embedding.from_pretrained(emb_weight, padding_idx=padding_idx)
        self.conv = nn.Conv2d(1, out_channels, (kernel_heights, emb_size), stride, (padding_idx, 0))
        self.drop = nn.Dropout(0.3)
        self.func = nn.Linear(out_channels, n_labels)
        
    def forward(self, x):
        emb = self.emb(x).unsqueeze(1)
        conv = self.conv(emb)  # 畳み込み層
        act = F.relu(conv.squeeze(3))  # 活性化関数
        max_pool = F.max_pool1d(act, act.size()[2])  # pooling 層
        out = self.func(self.drop(max_pool.squeeze(2)))  # 全結合層？
        return out

In [744]:
vocab_size = len(vocab) + 1  # padding の分 +1 する
padding_idx = len(vocab)  # 空き単語を埋めるときは最大値を入れる
out_channels = 50 # ハイパラ？
emb_size = 300  # ハイパラ
kernel_height = 3
stride = 1
n_labels = 4  # ラベル数

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"device: {device}")

model = CNN(vocab_size, padding_idx, out_channels, emb_size, kernel_height, stride, n_labels, device)

for i in range(5):
    xi = X_train[i]['inputs']
    yi = X_train[i]['labels']
    pred_probs = model(xi.unsqueeze(0))
    print(f"予測値：{pred_probs}")
    print(f"予測ラベル：{pred_probs.argmax()}")
    print(f"正解ラベル：{yi}")

device: cpu
予測値：tensor([[ 0.2797, -0.3276, -0.5477, -0.3352]], grad_fn=<AddmmBackward0>)
予測ラベル：0
正解ラベル：2
予測値：tensor([[ 0.5948, -0.0891, -0.8027,  0.3534]], grad_fn=<AddmmBackward0>)
予測ラベル：0
正解ラベル：0
予測値：tensor([[ 0.9635, -0.1160, -0.5544,  0.9841]], grad_fn=<AddmmBackward0>)
予測ラベル：3
正解ラベル：2
予測値：tensor([[ 0.6579,  0.1459, -0.9274,  0.9395]], grad_fn=<AddmmBackward0>)
予測ラベル：3
正解ラベル：2
予測値：tensor([[ 0.8712,  0.4642, -0.8398,  0.2624]], grad_fn=<AddmmBackward0>)
予測ラベル：0
正解ラベル：2


<h2 id="87-確率的勾配降下法によるcnnの学習">87. 確率的勾配降下法によるCNNの学習</h2>
<p>確率的勾配降下法（SGD: Stochastic Gradient Descent）を用いて，問題86で構築したモデルを学習せよ．訓練データ上の損失と正解率，評価データ上の損失と正解率を表示しながらモデルを学習し，適当な基準（例えば10エポックなど）で終了させよ．</p>


In [729]:
# ちゃんとロスが下がった！！
# cpuで回すと10時間かかったらしい
vocab_size = len(vocab) + 1  # padding の分 +1 する
padding_idx = len(vocab)  # 空き単語を埋めるときは最大値を入れる
emb_size = 300  # ハイパラ
hidden_size = 50  # ハイパラ
n_labels = 4  # ラベル数
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"device: {device}")

train_loader = DataLoader(X_train, batch_size=batch_size, shuffle=True, collate_fn=collate_fn)
valid_loader = DataLoader(X_valid, batch_size=batch_size, shuffle=True, collate_fn=collate_fn)
test_loader = DataLoader(X_test, batch_size=batch_size, shuffle=True, collate_fn=collate_fn)

model = CNN(vocab_size, padding_idx, out_channels, emb_size, kernel_height, stride, n_labels, device)
output_path = "./trained_param.npz"
total_epochs = 10
train(model, train_loader, valid_loader, output_path, total_epochs, device)

device: cpu


100%|██████████| 334/334 [34:31<00:00,  6.20s/it]
100%|██████████| 42/42 [02:08<00:00,  3.05s/it]


epoch0: train_loss = 11.279748118267396, train_acc = 0.582366154998128, valid_loss = 5.261847972869873, valid_acc = 0.5546407185628742


100%|██████████| 334/334 [35:28<00:00,  6.37s/it]
100%|██████████| 42/42 [02:09<00:00,  3.10s/it]


epoch1: train_loss = 5.102930523312882, train_acc = 0.7336203669037814, valid_loss = 1.967748761177063, valid_acc = 0.6706586826347305


100%|██████████| 334/334 [35:17<00:00,  6.34s/it]
100%|██████████| 42/42 [02:08<00:00,  3.07s/it]


epoch2: train_loss = 1.6193601711998715, train_acc = 0.7936166229876451, valid_loss = 1.1482020616531372, valid_acc = 0.6976047904191617


100%|██████████| 334/334 [35:02<00:00,  6.30s/it]
100%|██████████| 42/42 [02:05<00:00,  3.00s/it]


epoch3: train_loss = 0.8907247123000596, train_acc = 0.8742980157244478, valid_loss = 1.0152617692947388, valid_acc = 0.7410179640718563


100%|██████████| 334/334 [35:03<00:00,  6.30s/it]
100%|██████████| 42/42 [02:06<00:00,  3.01s/it]


epoch4: train_loss = 0.6636096884007313, train_acc = 0.8873081242980158, valid_loss = 0.9940643310546875, valid_acc = 0.7477544910179641


100%|██████████| 334/334 [34:56<00:00,  6.28s/it]
100%|██████████| 42/42 [02:06<00:00,  3.01s/it]


epoch5: train_loss = 0.4966894327947059, train_acc = 0.9120179707974542, valid_loss = 1.0128010511398315, valid_acc = 0.7604790419161677


100%|██████████| 334/334 [35:06<00:00,  6.31s/it]
100%|██████████| 42/42 [02:06<00:00,  3.01s/it]


epoch6: train_loss = 0.42624359978178117, train_acc = 0.9087420441782104, valid_loss = 1.1438573598861694, valid_acc = 0.7612275449101796


100%|██████████| 334/334 [35:08<00:00,  6.31s/it]
100%|██████████| 42/42 [02:09<00:00,  3.07s/it]


epoch7: train_loss = 0.3629707233350583, train_acc = 0.9338262822912766, valid_loss = 1.040395736694336, valid_acc = 0.7597305389221557


100%|██████████| 334/334 [35:27<00:00,  6.37s/it]
100%|██████████| 42/42 [02:11<00:00,  3.12s/it]


epoch8: train_loss = 0.33030957762934393, train_acc = 0.9378509921377761, valid_loss = 1.100417137145996, valid_acc = 0.7604790419161677


100%|██████████| 334/334 [35:26<00:00,  6.37s/it]
100%|██████████| 42/42 [02:11<00:00,  3.14s/it]


epoch9: train_loss = 0.28721515234881956, train_acc = 0.9463684013478099, valid_loss = 1.1618684530258179, valid_acc = 0.7791916167664671


<h2 id="88-パラメータチューニング">88. パラメータチューニング</h2>
<p>問題85や問題87のコードを改変し，ニューラルネットワークの形状やハイパーパラメータを調整しながら，高性能なカテゴリ分類器を構築せよ．</p>


一番性能が良かったCNNを採用する

optuna でパラメータを自動最適化する

あまりにも時間がかかるので，src/q88.py で実行した
ここに関しては，GPUのほうがCPUよりも3倍くらい早かった気がする
running.log に実行履歴が残っている

In [58]:
# パラメータチューニング用に引数を変更
from torch.nn import functional as F

class CNN(nn.Module):
    def __init__(self, vocab_size, padding_idx, out_channels,  emb_size=300, kernel_heights=3, stride=1, n_labels=4, device="cpu", emb_weight=None, active_func='relu', dropout=0.3) -> None:
        """
        stride: 動かす単位（小さいほど細かい）
        kenel_height: 窓の大きさ
        out_channels: 
        conv2d: convolution 層（次元を維持しつつ畳み込み）
        max_pool1d: pooling層（最大値を取り，ダウンサンプリングする）
        """
        super(CNN, self).__init__()
        # 入力ベクトルの大きさが異なるので，emb層で形をそろえる
        if emb_weight is None:
            self.emb = nn.Embedding(vocab_size, emb_size, padding_idx=padding_idx)
        else:
            self.emb = nn.Embedding.from_pretrained(emb_weight, padding_idx=padding_idx)
        self.conv = nn.Conv2d(1, out_channels, (kernel_heights, emb_size), stride, (padding_idx, 0))
        self.drop = nn.Dropout(dropout)
        self.func = nn.Linear(out_channels, n_labels)
        self.active_func = active_func # 活性化関数をパラメータにする
        
    def forward(self, x):
        emb = self.emb(x).unsqueeze(1)
        conv = self.conv(emb)  # 畳み込み層

        # 活性化関数の最適化を行う
        if self.active_func == 'relu':
            act = F.relu(conv.squeeze(3))
        elif self.active_func == 'tanh':
            act = torch.tanh(conv.squeeze(3))
        elif self.active_func == 'mish':
            act = F.mish(conv.squeeze(3))
        else:
            act = F.relu(conv.squeeze(3))

        max_pool = F.max_pool1d(act, act.size()[2])  # pooling 層
        out = self.func(self.drop(max_pool.squeeze(2)))  # 全結合層？
        return out

In [59]:
# early stoppingを差し込む
class EarlyStopping:
    """earlystoppingクラス"""

    def __init__(self, patience=5, verbose=False, path='checkpoint_model.pth'):
        """引数：最小値の非更新数カウンタ、表示設定、モデル格納path"""

        self.patience = patience    #設定ストップカウンタ
        self.verbose = verbose      #表示の有無
        self.counter = 0            #現在のカウンタ値
        self.best_score = None      #ベストスコア
        self.early_stop = False     #ストップフラグ
        self.val_loss_min = np.Inf   #前回のベストスコア記憶用
        self.path = path             #ベストモデル格納path

    def __call__(self, val_loss, model):
        """
        特殊(call)メソッド
        実際に学習ループ内で最小lossを更新したか否かを計算させる部分
        """
        score = -val_loss

        if self.best_score is None:  #1Epoch目の処理
            self.best_score = score   #1Epoch目はそのままベストスコアとして記録する
            self.checkpoint(val_loss, model)  #記録後にモデルを保存してスコア表示する
        elif score < self.best_score:  # ベストスコアを更新できなかった場合
            self.counter += 1   #ストップカウンタを+1
            if self.verbose:  #表示を有効にした場合は経過を表示
                print(f'EarlyStopping counter: {self.counter} out of {self.patience}')  #現在のカウンタを表示する 
            if self.counter >= self.patience:  #設定カウントを上回ったらストップフラグをTrueに変更
                self.early_stop = True
        else:  #ベストスコアを更新した場合
            self.best_score = score  #ベストスコアを上書き
            self.checkpoint(val_loss, model)  #モデルを保存してスコア表示
            self.counter = 0  #ストップカウンタリセット

    def checkpoint(self, val_loss, model):
        '''ベストスコア更新時に実行されるチェックポイント関数'''
        if self.verbose:  #表示を有効にした場合は、前回のベストスコアからどれだけ更新したか？を表示
            print(f'Validation loss decreased ({self.val_loss_min:.6f} --> {val_loss:.6f}).  Saving model ...')
        torch.save(model.state_dict(), self.path)  #ベストモデルを指定したpathに保存
        self.val_loss_min = val_loss  #その時のlossを記録する


In [67]:
# パラメータチューニング用に引数と返り値を変更
# early stopping の機構を追加
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
from torch.utils.data import DataLoader # データローダ使ってみる
from tqdm import tqdm
import time
from sklearn.metrics import accuracy_score

# 学習率を引数に追加
def train(model, train_loader, valid_loader, output_path, total_epochs, device, lr=0.01, op='sgd'):
    earlystopping = EarlyStopping(patience=3, verbose=True)
    
    # 最適化手法を変更
    if op == 'sgd':  
        optimizer = optim.SGD(model.parameters(), lr=lr)
    elif op == 'adam':
        optimizer = optim.Adam(model.parameters(), lr=lr)
    elif op == 'rmsprop':
        optimizer = optim.RMSprop(model.parameters(), lr=lr)
    else:
        optimizer = optim.SGD(model.parameters(), lr=lr)
        
    loss_func = nn.CrossEntropyLoss()

    model = model.to(device)
    # 指定した epoch 数だけ学習
    for epoch in range(total_epochs):
        train_total_loss = 0.
        train_acc_cnt = 0

        # パラメータ更新
        model.train()
        for data in tqdm(train_loader):
            x = data['inputs']
            x = x.to(device)
            y = data['labels']
            y = y.to(device)
            y_pred = model(x)

            # バッチの中で損失計算
            train_loss = 0.
            for yi, yi_pred in zip(y, y_pred):
                loss_i = loss_func(yi_pred, yi)
                train_loss += loss_i
            
            optimizer.zero_grad()  # 勾配の初期化
            train_loss.backward()  # 勾配計算
            optimizer.step()  # パラメータ修正
            train_total_loss += train_loss.item()

            # バッチの中で正解率の計算
            for yi, yi_pred in zip(y, y_pred):
                if yi.item() == yi_pred.argmax():
                    train_acc_cnt += 1
        
        #★毎エポックearlystoppingの判定をさせる★
        train_ave_loss = train_total_loss / len(X_train)
        
        earlystopping(train_ave_loss, model) #callメソッド呼び出し
        if earlystopping.early_stop: #ストップフラグがTrueの場合、breakでforループを抜ける
            print(f"epoch{epoch}: train_loss = {train_ave_loss}")
            print("Early Stopping!")
            break
                
        # train のロスと正解率の計算
        model.eval()
        train_acc = measure_acc(model, X_train[:]['inputs'], X_train[:]['labels'], device)


        # valid のロスと正解率の計算
        model.eval()
        valid_acc_cnt = 0
        valid_total_loss = 0.
        with torch.no_grad():
            for data in tqdm(valid_loader):
                x = data['inputs']
                x = x.to(device)
                y = data['labels']
                y = y.to(device)
                y_pred = model(x)

                # バッチの中で損失計算
                valid_loss = 0.
                for yi, yi_pred in zip(y, y_pred):
                    # print(yi)
                    # print(yi_pred)
                    loss_i = loss_func(yi_pred, yi)
                    valid_loss += loss_i

                optimizer.zero_grad()  # 勾配の初期化
                # valid_loss.backward()  # 勾配計算
                # optimizer.step()  # パラメータ修正
                valid_total_loss += valid_loss

                # バッチの中で正解率の計算
                for yi, yi_pred in zip(y, y_pred):
                    if yi.item() == yi_pred.argmax():
                        valid_acc_cnt += 1

            # valid のロスと正解率の計算
            valid_acc = measure_acc(model, X_valid[:]['inputs'], X_valid[:]['labels'], device)

        # 表示
        train_ave_loss = train_total_loss / len(X_train)
        # train_acc = train_acc_cnt / len(X_train)
        valid_ave_loss = valid_total_loss / len(X_valid)
        # valid_acc = valid_acc_cnt / len(X_valid)
        print(f"epoch{epoch}: train_loss = {train_ave_loss}, train_acc = {train_acc}, valid_loss = {valid_ave_loss}, valid_acc = {valid_acc}")

    # パラメータを保存
    torch.save(model.state_dict(), output_path)
    
    # valid loss を返り値とする
    return valid_ave_loss

In [68]:
# optuna でパラメータの自動最適化
from typing import Any
import optuna

def objective(trial):
    # 固定のもの
    vocab_size = len(vocab) + 1  # padding の分 +1 する
    padding_idx = len(vocab)  # 空き単語を埋めるときは最大値を入れる
    n_labels = 4  # ラベル数
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    train_loader = DataLoader(X_train, batch_size=batch_size, shuffle=True, collate_fn=collate_fn)
    valid_loader = DataLoader(X_valid, batch_size=batch_size, shuffle=True, collate_fn=collate_fn)
    test_loader = DataLoader(X_test, batch_size=batch_size, shuffle=True, collate_fn=collate_fn)
    output_path = "./trained_param.npz"
    total_epochs = 10

    # ハイパラを変更させる
    out_channels = trial.suggest_categorical('out_channels', [16, 32, 64, 128])  # これだけよくわかっていない
    emb_size = trial.suggest_categorical('emb_size', [50, 100, 200, 300])  # 特徴ベクトルの次元数
    kernel_height = trial.suggest_int('kernel_height', 1, 5, step=1)  # 窓の大きさ
    stride = trial.suggest_int('stride', 1, 2, step=1)  # 窓を動かす単位
    active_func = trial.suggest_categorical('active_func', ['relu', 'tanh', 'mish'])  # 活性化関数
    lr = trial.suggest_float('lr', 1e-3, 1e-2, log=True)  # 学習率
    dropout = trial.suggest_float('dropout', 0.2, 0.5)  # ドロップアウト
    op = trial.suggest_categorical('optimizer', ['rmsprop', 'adam', 'sgd'])  # 最適化手法 

    print(f"device: {device}")
    model = CNN(vocab_size, padding_idx, out_channels, emb_size, kernel_height, stride, n_labels, device, active_func=active_func, dropout=dropout)
    valid_loss = train(model, train_loader, valid_loader, output_path, total_epochs, device, lr, op)

    # 訓練の最後で得られた valid_loss でパラメータチューニングを行う
    return valid_loss

In [69]:
# 流石に時間がかかりすぎるのでサーバで実行 (100時間くらいかかる？)
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=10)

[I 2023-11-08 21:26:36,072] A new study created in memory with name: no-name-640055c7-92dc-42e9-930c-ccce990283a0


device: cpu


  1%|          | 2/334 [00:09<27:00,  4.88s/it]
[W 2023-11-08 21:26:45,886] Trial 0 failed with parameters: {'out_channels': 16, 'emb_size': 200, 'kernel_height': 5, 'stride': 2, 'active_func': 'relu', 'lr': 0.004388631453821411, 'dropout': 0.4163492420037371, 'optimizer': 'sgd'} because of the following error: KeyboardInterrupt().
Traceback (most recent call last):
  File "/Users/nyuton/.pyenv/versions/anaconda3-2022.10/lib/python3.9/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
    value_or_values = func(trial)
  File "/var/folders/bw/n738lb8d773382cjs4qszbbh0000gn/T/ipykernel_2658/4258090099.py", line 29, in objective
    valid_loss = train(model, train_loader, valid_loader, output_path, total_epochs, device, lr, op)
  File "/var/folders/bw/n738lb8d773382cjs4qszbbh0000gn/T/ipykernel_2658/4177580950.py", line 50, in train
    train_loss.backward()  # 勾配計算
  File "/Users/nyuton/.pyenv/versions/anaconda3-2022.10/lib/python3.9/site-packages/torch/_tensor.py", line 48

KeyboardInterrupt: 

In [65]:
print(f"最高精度のACC：{study.best_value}")
print('最高精度のパラメータ')
pprint(study.best_params)

ValueError: No trials are completed yet.

<h2 id="89-事前学習済み言語モデルからの転移学習">89. 事前学習済み言語モデルからの転移学習</h2>
<p>事前学習済み言語モデル（例えば<a href="https://github.com/google-research/bert">BERT</a>など）を出発点として，ニュース記事見出しをカテゴリに分類するモデルを構築せよ．</p>


こちらもかなり時間がかかるので，src/q89.pyを用いてサーバ上で回した

出力ログを q89.running.log ファイルに残した

In [46]:
import tensorflow as tf
import transformers
import torch
import torch.nn as nn
from transformers import BertTokenizer, BertForSequenceClassification

class BERTmodel(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.bert_sc = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=4)

    def forward(self, encoding):
        outputs = self.bert_sc(**encoding)
        return outputs
    
class CreateDataset(torch.utils.data.Dataset):
    def __init__(self, X, y, transform=None):
        self.X = X
        self.y = y
    
    def __len__(self):
        return len(self.y)
    
    def __getitem__(self, index):
        return {
            'inputs': self.X[index],
            'labels': self.y[index]
        }

In [47]:
# BERT用
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
from torch.utils.data import DataLoader # データローダ使ってみる
from tqdm import tqdm
import time
from sklearn.metrics import accuracy_score

def bert_train(model, train_loader, valid_loader, output_path, total_epochs, device, lr=0.01):
    optimizer = optim.SGD(model.parameters(), lr=lr)
    loss_func = nn.CrossEntropyLoss()

    # BERTモデルのエンコード用
    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')


    model = model.to(device)
    # 指定した epoch 数だけ学習
    for epoch in range(total_epochs):
        train_total_loss = 0.
        train_acc_cnt = 0

        # パラメータ更新
        model.train()
        for batch in tqdm(train_loader):
            x_texts = batch['inputs']
            x_encordings = tokenizer(
                list(x_texts), 
                max_length=128, 
                padding='max_length', 
                truncation=True, 
                return_tensors='pt', 
                return_attention_mask=True, 
                return_token_type_ids=True
            )
            x_encordings = x_encordings.to(device)
            y = batch['labels']
            y = y.to(device)
            y_pred = model(x_encordings).logits

            # バッチの中で損失計算
            train_loss = loss_func(y_pred, y)

            # train_loss = 0.
            # for yi, yi_pred in zip(y, y_pred):
            #     loss_i = loss_func(yi_pred, yi)
            #     train_loss += loss_i
            
            optimizer.zero_grad() # 勾配の初期化
            train_loss.backward()  # 勾配計算
            optimizer.step()  # パラメータ修正
            train_total_loss += train_loss.item()

            # バッチの中で正解率の計算 # ここを修正
            for yi, yi_pred in zip(y, y_pred):
                if yi.item() == yi_pred.argmax():
                    train_acc_cnt += 1
                
        # train のロスと正解率の計算
        model.eval()
        # train_acc = measure_acc(model, X_train[:]['inputs'], X_train[:]['labels'], device)


        # valid のロスと正解率の計算
        model.eval()
        valid_acc_cnt = 0
        valid_total_loss = 0.
        with torch.no_grad():
            for batch in tqdm(valid_loader):
                x_texts = batch['inputs']
                x_encordings = tokenizer(
                    list(x_texts), 
                    max_length=128, 
                    padding='max_length', 
                    truncation=True, 
                    return_tensors='pt', 
                    return_attention_mask=True, 
                    return_token_type_ids=True
                )
                x_encordings = x_encordings.to(device)
                y = batch['labels']
                y = y.to(device)
                y_pred = model(x_encordings).logits

                # バッチの中で損失計算
                valid_loss = loss_func(y_pred, y)
                # valid_loss = 0.
                # for yi, yi_pred in zip(y, y_pred):
                #     # print(yi)
                #     # print(yi_pred)
                #     loss_i = loss_func(yi_pred, yi)
                #     valid_loss += loss_i

                optimizer.zero_grad()  # 勾配の初期化
                # valid_loss.backward()  # 勾配計算
                # optimizer.step()  # パラメータ修正
                valid_total_loss += valid_loss

                # バッチの中で正解率の計算  # ここを修正
                for yi, yi_pred in zip(y, y_pred):
                    if yi.item() == yi_pred.argmax():
                        valid_acc_cnt += 1

            # valid のロスと正解率の計算
            # valid_acc = measure_acc(model, X_valid[:]['inputs'], X_valid[:]['labels'], device)

        # 表示
        train_ave_loss = train_total_loss / len(X_train)
        train_acc = train_acc_cnt / len(X_train)
        valid_ave_loss = valid_total_loss / len(X_valid)
        valid_acc = valid_acc_cnt / len(X_valid)
        print(f"epoch{epoch}: train_loss = {train_ave_loss}, train_acc = {train_acc}, valid_loss = {valid_ave_loss}, valid_acc = {valid_acc}")

    # パラメータを保存
    torch.save(model.state_dict(), output_path)

In [54]:
# BERTモデルに入れるためのデータセットの作成
category_dict = {'b': 0, 't': 1, 'e': 2, 'm': 3}
batch_size = 32

y_train = torch.tensor(train_data['CATEGORY'].map(category_dict).values, dtype=torch.int64)
y_valid = torch.tensor(valid_data['CATEGORY'].map(category_dict).values, dtype=torch.int64)
y_test = torch.tensor(test_data['CATEGORY'].map(category_dict).values, dtype=torch.int64)

train_set = CreateDataset(train_data['TITLE'].to_list(), y_train)
valid_set = CreateDataset(valid_data['TITLE'].to_list(), y_valid)
test_set = CreateDataset(test_data['TITLE'].to_list(), y_test)

train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
valid_loader = DataLoader(valid_set, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=True)

In [55]:
model = BERTmodel()
total_epochs = 10
lr = 0.01
device = 'cpu'

bert_train(model, train_loader, valid_loader, 'bert_param.npz', total_epochs, device, lr)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

epoch0: train_loss = 0.0036510499323275775, train_acc = 0.08152377386746537, valid_loss = 0.025179985910654068, valid_acc = 0.7155688622754491


100%|██████████| 42/42 [04:49<00:00,  6.89s/it]
100%|██████████| 42/42 [01:30<00:00,  2.16s/it]


epoch1: train_loss = 0.0021149115124386108, train_acc = 0.1012729314863347, valid_loss = 0.011974153108894825, valid_acc = 0.8720059880239521


100%|██████████| 42/42 [05:07<00:00,  7.33s/it]
100%|██████████| 42/42 [01:38<00:00,  2.35s/it]


epoch2: train_loss = 0.0016038271140701892, train_acc = 0.10922875327592661, valid_loss = 0.011902498081326485, valid_acc = 0.8787425149700598


  5%|▍         | 2/42 [00:22<07:37, 11.45s/it]


KeyboardInterrupt: 

In [77]:
import logging

logger = logging.getLogger('testtest')
logger.setLevel(logging.INFO)

logger.info('できてる？')