# Chapter 8: Neural Networks


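Taking the positive/negative sentiment classification task from Chapter 7 as the subject, implement classification models with neural networks. In this chapter, make use of a deep learning framework such as PyTorch, TensorFlow, or JAX.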

```{warning}
In this chapter the code is written in Markdown code blocks rather than `code-cell` directives, so it cannot be run directly on Google Colab.
```

## 70. Loading word embeddings


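Using pretrained word embeddings, build a $|V|\times d_{\text{emb}}$ word embedding matrix $\pmb{E}$. Here, $|V|$ is the vocabulary size of the word embeddings and $d_{\text{emb}}$ is their dimensionality. Reserve the first row vector $\pmb{E}_{0,:}$ as a zero vector, because it will later serve as the embedding of the padding (\<PAD\>) token. Consequently, the pretrained word embeddings are loaded into the second and subsequent rows of $\pmb{E}$.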

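If you load all of the [pretrained word vectors](https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit?usp=sharing) trained on the Google News dataset (3 million words and phrases, 300 dimensions), you should get $|V|=3000001$ and $d_{\text{emb}}=300$ (note, however, that the 3 million entries include rare words that are hardly ever used, so reducing the vocabulary saves memory).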


While building the word embedding matrix, also keep a bidirectional mapping between the row indices of the matrix (token IDs) and the words (tokens).

```python
import numpy as np
from gensim.models import KeyedVectors

model = KeyedVectors.load_word2vec_format("../ch06/GoogleNews-vectors-negative300.bin.gz", binary=True)

vocab_size = len(model.key_to_index) + 1
embedding_dim = model.vector_size

embedding_matrix = np.zeros((vocab_size, embedding_dim), dtype=np.float32)

token2id = {"<PAD>": 0}
id2token = {0: "<PAD>"}

for idx, word in enumerate(model.key_to_index, start=1):
    embedding_matrix[idx] = model[word]
    token2id[word] = idx
    id2token[idx] = word

print(f"Embedding matrix shape: {embedding_matrix.shape}")
```

```bash
Embedding matrix shape: (3000001, 300)
```
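The memory remark above can be made concrete: the full matrix holds 3,000,001 × 300 `float32` values, which is about 3.6 GB. The following is a minimal sketch of building a smaller matrix, assuming the vectors arrive as a plain word-to-vector mapping ordered from most to least frequent. The toy `vectors` dictionary and the `build_embedding_matrix` helper are hypothetical and not part of the solution above.

```python
import numpy as np

def build_embedding_matrix(vectors, max_vocab=None):
    """Build (matrix, token2id, id2token) with row 0 reserved for <PAD>."""
    words = list(vectors)[:max_vocab]  # keep only the first max_vocab words
    dim = len(next(iter(vectors.values())))
    matrix = np.zeros((len(words) + 1, dim), dtype=np.float32)
    token2id = {"<PAD>": 0}
    id2token = {0: "<PAD>"}
    for idx, word in enumerate(words, start=1):
        matrix[idx] = vectors[word]
        token2id[word] = idx
        id2token[idx] = word
    return matrix, token2id, id2token

# Toy 3-dimensional vectors standing in for the real 300-dimensional ones.
vectors = {
    "the": [0.1, 0.2, 0.3],
    "movie": [0.4, 0.5, 0.6],
    "labored": [0.7, 0.8, 0.9],
}
matrix, token2id, id2token = build_embedding_matrix(vectors, max_vocab=2)
print(matrix.shape)  # (3, 3): <PAD> plus the two retained words
print(token2id)
```

With gensim, passing `limit=` to `load_word2vec_format` should have a similar effect, since it reads only the first entries of the file; whether those are the most frequent words depends on how the file was sorted.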

## 71. Loading the dataset

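Download the [Stanford Sentiment Treebank (SST)](https://dl.fbaipublicfiles.com/glue/data/SST-2.zip) distributed with the [General Language Understanding Evaluation (GLUE)](https://gluebenchmark.com/) benchmark, read the texts and polarity labels of the training set (train.tsv) and the development set (dev.tsv), and convert every text into a sequence of token IDs. Ignore words that are not covered by the word embedding vocabulary and leave them out of the token sequence. Also remove from the training and development sets any example whose tokens are all missing from the vocabulary, since it would become an empty token sequence (note that, because of this, the accuracy can no longer be compared with the results of the Chapter 7 experiments).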

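Examples may be represented in any way. For instance, the example in which "contains no wit , only labored gags" is classified as negative can be represented by a dictionary object like the following.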

```py
{'text': 'contains no wit , only labored gags',
 'label': tensor([0.]),
 'input_ids': tensor([ 3475,    87, 15888,    90, 27695, 42637])}
```

In this example, `text` is the text, `label` is the class label (`tensor([1.])` for positive, `tensor([0.])` for negative), and `input_ids` is the token sequence of the text expressed as a sequence of IDs.

```python
import pandas as pd
import torch
from tqdm import tqdm
from gensim.models import KeyedVectors

model = KeyedVectors.load_word2vec_format("../ch06/GoogleNews-vectors-negative300.bin.gz", binary=True)

token2id = {"<PAD>": 0}
id2token = {0: "<PAD>"}
for idx, word in enumerate(model.key_to_index, start=1):
    token2id[word] = idx
    id2token[idx] = word
    
def load_sst(path):
    df = pd.read_csv(path, sep="\t")
    examples = []
    for _, row in tqdm(df.iterrows(), total=len(df)):
        text = row["sentence"]
        label = float(row["label"])
        tokens = text.split()
        input_ids = [token2id[t] for t in tokens if t in token2id]
        if len(input_ids) == 0:
            continue
        examples.append({
            "text": text,
            "label": torch.tensor([label], dtype=torch.float32),
            "input_ids": torch.tensor(input_ids, dtype=torch.long)
        })
    return examples

train_data = load_sst("../ch07/SST-2/train.tsv")
dev_data   = load_sst("../ch07/SST-2/dev.tsv")

print(f"#train: {len(train_data)}, #dev: {len(dev_data)}")
print("Example: ", train_data[0])
```

```bash
#train: 66650, #dev: 872
Example:  {'text': 'hide new secretions from the parental units ', 'label': tensor([0.]), 'input_ids': tensor([  5785,     66, 113845,     18,     12,  15095,   1594])}
```
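The filtering rule is easy to check in isolation. Below is a small sketch of the conversion step using a hypothetical toy vocabulary in place of the real `token2id`; the `encode` helper is an illustrative name, not part of the solution above.

```python
def encode(text, token2id):
    """Map whitespace-separated tokens to IDs, skipping unknown tokens."""
    return [token2id[t] for t in text.split() if t in token2id]

# Hypothetical toy vocabulary; ID 0 is reserved for <PAD>.
token2id = {"<PAD>": 0, "no": 1, "wit": 2, "only": 3, "gags": 4}

texts = ["contains no wit , only labored gags", "zzz qqq"]
encoded = [(t, encode(t, token2id)) for t in texts]
kept = [(t, ids) for t, ids in encoded if ids]  # drop empty sequences

print(kept)
```

The second text contains no known token, so it produces an empty ID list and is dropped, which mirrors the `continue` branch in `load_sst`.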

## 72. Building a bag-of-words model


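Design a neural network (a logistic regression model) that represents a text by the mean of its word embeddings and classifies it as positive or negative via the inner product of this feature vector with a weight vector.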

```python
import torch
import torch.nn as nn

class MeanEmbeddingClassifier(nn.Module):
    def __init__(self, embedding_matrix, freeze_embedding=True):
        super().__init__()
        self.embedding = nn.Embedding.from_pretrained(
            torch.tensor(embedding_matrix), freeze=freeze_embedding
        )
        self.linear = nn.Linear(embedding_matrix.shape[1], 1)

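    def forward(self, input_ids):
        # input_ids: 1-D tensor of token IDs for a single (unbatched) text
        embedded = self.embedding(input_ids)   # (seq_len, d_emb)
        mean_embed = embedded.mean(dim=0)      # (d_emb,)
        return self.linear(mean_embed)         # (1,) logit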
```
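Written out, the model above computes the following for a text with token IDs $x_1, \dots, x_T$. The symbols $\pmb{h}$, $\pmb{w}$, $b$, and $\sigma$ are notation introduced here for illustration; $\pmb{w}$ and $b$ correspond to the weight and bias of `self.linear`.

$$
\pmb{h} = \frac{1}{T}\sum_{t=1}^{T} \pmb{E}_{x_t,:}, \qquad
P(y=1 \mid x) = \sigma\left(\pmb{w}^\top \pmb{h} + b\right), \qquad
\sigma(z) = \frac{1}{1+e^{-z}}
$$

The module itself returns only the logit $\pmb{w}^\top \pmb{h} + b$; the sigmoid is left to the loss function or to the prediction step.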

## 73. Training the model

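Train the weight vector of the model designed in Problem 72 on the training set. Keep the values of the word embedding matrix fixed during training (do not fine-tune the word embeddings). Also make it possible to monitor training progress, for example by printing the loss during training.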

```python
import pandas as pd
import torch
import torch.nn as nn
import numpy as np
from tqdm import tqdm
from gensim.models import KeyedVectors
from torch.utils.data import DataLoader

w2v = KeyedVectors.load_word2vec_format("../ch06/GoogleNews-vectors-negative300.bin.gz", binary=True)
vocab_size = len(w2v.key_to_index) + 1
embedding_dim = w2v.vector_size

embedding_matrix = np.zeros((vocab_size, embedding_dim), dtype=np.float32)
token2id = {"<PAD>": 0}
for idx, word in enumerate(w2v.key_to_index, start=1):
    embedding_matrix[idx] = w2v[word]
    token2id[word] = idx

def load_sst(path):
    df = pd.read_csv(path, sep="\t")
    examples = []
    for _, row in tqdm(df.iterrows(), total=len(df)):
        tokens = row["sentence"].split()
        input_ids = [token2id[t] for t in tokens if t in token2id]
        if not input_ids:
            continue
        examples.append({
            "label": torch.tensor(float(row["label"]), dtype=torch.float32),
            "input_ids": torch.tensor(input_ids, dtype=torch.long)
        })
    return examples

train_data = load_sst("../ch07/SST-2/train.tsv")
dev_data   = load_sst("../ch07/SST-2/dev.tsv")

class MeanEmbeddingClassifier(nn.Module):
    def __init__(self, embedding_matrix, freeze_embedding=True):
        super().__init__()
        self.embedding = nn.Embedding.from_pretrained(
            torch.tensor(embedding_matrix), freeze=freeze_embedding
        )
        self.linear = nn.Linear(embedding_matrix.shape[1], 1)

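    def forward(self, input_ids):
        # input_ids: 1-D tensor of token IDs for a single (unbatched) text
        embedded = self.embedding(input_ids)   # (seq_len, d_emb)
        mean_embed = embedded.mean(dim=0)      # (d_emb,)
        return self.linear(mean_embed)         # (1,) logit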

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = MeanEmbeddingClassifier(embedding_matrix).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
criterion = nn.BCEWithLogitsLoss()

def train_model(model, train_data, dev_data):
    model.train()
    train_loss = []
    for ex in train_data:
        optimizer.zero_grad()
        input_ids = ex["input_ids"].to(device)
        label = ex["label"].to(device)
        logits = model(input_ids)
        loss = criterion(logits.squeeze(), label)
        loss.backward()
        optimizer.step()
        train_loss.append(loss.item())

    model.eval()
    dev_loss = []
    with torch.no_grad():
        for ex in dev_data:
            input_ids = ex["input_ids"].to(device)
            label = ex["label"].to(device)
            logits = model(input_ids)
            loss = criterion(logits.squeeze(), label)
            dev_loss.append(loss.item())

    return np.mean(train_loss), np.mean(dev_loss)

epochs = 10
for epoch in tqdm(range(epochs)):
    train_l, dev_l = train_model(model, train_data, dev_data)
    if (epoch + 1) % 2 == 0:
        print(f"[Epoch {epoch + 1}] Train loss: {train_l:.4f}, Dev loss: {dev_l:.4f}")

torch.save(model.state_dict(), "./models/model_ex73.pt")
```

```bash
 10%|████████████                                                                                                            | 1/10 [01:39<14:55, 99.51s/it][Epoch 2] Train loss: 0.5946, Dev loss: 0.6274
 30%|████████████████████████████████████                                                                                    | 3/10 [04:34<10:30, 90.13s/it][Epoch 4] Train loss: 0.5255, Dev loss: 0.5876
 50%|████████████████████████████████████████████████████████████                                                            | 5/10 [07:31<07:20, 88.08s/it][Epoch 6] Train loss: 0.4878, Dev loss: 0.5629
 70%|████████████████████████████████████████████████████████████████████████████████████                                    | 7/10 [10:23<04:19, 86.41s/it][Epoch 8] Train loss: 0.4647, Dev loss: 0.5461
 90%|████████████████████████████████████████████████████████████████████████████████████████████████████████████            | 9/10 [13:22<01:27, 87.97s/it][Epoch 10] Train loss: 0.4491, Dev loss: 0.5341
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [14:34<00:00, 87.42s/it]
```
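`nn.BCEWithLogitsLoss` combines a sigmoid with binary cross-entropy. The following is a minimal pure-Python sketch of the per-example loss it is based on, written in a numerically stable form. The helper name `bce_with_logits` is hypothetical, and the code illustrates the formula rather than PyTorch's actual implementation.

```python
import math

def bce_with_logits(logit, label):
    """Binary cross-entropy of sigmoid(logit) against a 0/1 label."""
    # Equivalent to -[y*log(s) + (1-y)*log(1-s)] with s = sigmoid(logit),
    # rearranged so that exp() never receives a large positive argument.
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

print(bce_with_logits(0.0, 1.0))  # log(2): an uninformative logit
print(bce_with_logits(5.0, 1.0))  # small loss: confident and correct
print(bce_with_logits(5.0, 0.0))  # large loss: confident and wrong
```

A loss near $\log 2 \approx 0.693$ corresponds to predicting 0.5 for every example, which is a useful reference point when reading a training log.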

## 74. Evaluating the model

Compute the accuracy on the development set of the model trained in Problem 73.

```python
import pandas as pd
import torch
import torch.nn as nn
import numpy as np
from gensim.models import KeyedVectors
from torch.utils.data import DataLoader
from sklearn.metrics import accuracy_score
from tqdm import tqdm

w2v = KeyedVectors.load_word2vec_format("../ch06/GoogleNews-vectors-negative300.bin.gz", binary=True)

vocab_size = len(w2v.key_to_index) + 1
embedding_dim = w2v.vector_size
embedding_matrix = np.zeros((vocab_size, embedding_dim), dtype=np.float32)
token2id = {"<PAD>": 0}
for idx, word in enumerate(w2v.key_to_index, start=1):
    embedding_matrix[idx] = w2v[word]
    token2id[word] = idx

def load_sst(path):
    df = pd.read_csv(path, sep="\t")
    examples = []
    for _, row in tqdm(df.iterrows(), total=len(df)):
        tokens = row["sentence"].split()
        input_ids = [token2id[t] for t in tokens if t in token2id]
        if input_ids:
            examples.append({
                "label": torch.tensor(float(row["label"]), dtype=torch.float32),
                "input_ids": torch.tensor(input_ids, dtype=torch.long)
            })
    return examples

dev_data = load_sst("../ch07/SST-2/dev.tsv")
dev_loader = DataLoader(dev_data, batch_size=1, shuffle=False)

class MeanEmbeddingClassifier(nn.Module):
    def __init__(self, embedding_matrix, freeze_embedding=True):
        super().__init__()
        self.embedding = nn.Embedding.from_pretrained(
            torch.tensor(embedding_matrix), freeze=freeze_embedding)
        self.linear = nn.Linear(embedding_matrix.shape[1], 1)

    def forward(self, input_ids):
        embedded = self.embedding(input_ids)
        mean_embed = embedded.mean(dim=1)
        return self.linear(mean_embed).squeeze(1)

def eval_model(model, data_loader):
    model.eval()
    all_preds, all_labels = [], []
    with torch.no_grad():
        for batch in data_loader:
            input_ids = batch["input_ids"].to(device)
            labels = batch["label"].to(device)
            preds = (torch.sigmoid(model(input_ids)) > 0.5).float()
            all_preds.append(preds.item())
            all_labels.append(labels.item())
    return accuracy_score(all_labels, all_preds)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = MeanEmbeddingClassifier(embedding_matrix).to(device)
model.load_state_dict(torch.load("./models/model_ex73.pt", map_location=device))

dev_acc = eval_model(model, dev_loader)
print(f"Dev ACC: {dev_acc:.4f}")
```

```bash
Dev ACC: 0.7752
```
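Thresholding the sigmoid output at 0.5 is the same as checking whether the logit is positive, because $\sigma(z) > 0.5$ exactly when $z > 0$. Here is a small sketch of the accuracy computation on plain Python lists; the logits and labels are made-up values for illustration only.

```python
def accuracy_from_logits(logits, labels):
    """Fraction of examples whose thresholded logit matches the 0/1 label."""
    preds = [1.0 if z > 0.0 else 0.0 for z in logits]
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)

logits = [2.3, -0.7, 0.4, -1.9]   # made-up model outputs
labels = [1.0, 0.0, 0.0, 0.0]     # made-up gold labels
print(accuracy_from_logits(logits, labels))  # 0.75
```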

## 75. Padding


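Given multiple examples, implement a function `collate` that packs them into a single tensor object. When the token sequences of the given examples differ in length, pad them with token ID 0 to match the longest sequence. In addition, sort the examples in descending order of token sequence length.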

For example, when the first four examples of the training set are represented as follows,

```py
[{'text': 'hide new secretions from the parental units',
  'label': tensor([0.]),
  'input_ids': tensor([  5785,     66, 113845,     18,     12,  15095,   1594])},
 {'text': 'contains no wit , only labored gags',
  'label': tensor([0.]),
  'input_ids': tensor([ 3475,    87, 15888,    90, 27695, 42637])},
 {'text': 'that loves its characters and communicates something rather beautiful about human nature',
  'label': tensor([1.]),
  'input_ids': tensor([    4,  5053,    45,  3305, 31647,   348,   904,  2815,    47,  1276,  1964])},
 {'text': 'remains utterly satisfied to remain the same throughout',
  'label': tensor([0.]),
  'input_ids': tensor([  987, 14528,  4941,   873,    12,   208,   898])}]
```

the result of passing them through the `collate` function is expected to look like this.

```py
{'input_ids': tensor([
    [     4,   5053,     45,   3305,  31647,    348,    904,   2815,     47,   1276,   1964],
    [  5785,     66, 113845,     18,     12,  15095,   1594,      0,      0,      0,      0],
    [   987,  14528,   4941,    873,     12,    208,    898,      0,      0,      0,      0],
    [  3475,     87,  15888,     90,  27695,  42637,      0,      0,      0,      0,      0]]),
 'label': tensor([
    [1.],
    [0.],
    [0.],
    [0.]])}
```

```bash
[{'text': 'hide new secretions from the parental units ', 'label': tensor([0.]), 'input_ids': tensor([  5785,     66, 113845,     18,     12,  15095,   1594])}, {'text': 'contains no wit , only labored gags ', 'label': tensor([0.]), 'input_ids': tensor([ 3475,    87, 15888,    90, 27695, 42637])}, {'text': 'that loves its characters and communicates something rather beautiful about human nature ', 'label': tensor([1.]), 'input_ids': tensor([    4,  5053,    45,  3305, 31647,   348,   904,  2815,    47,  1276,
         1964])}]
{'input_ids': tensor([[     75,    6355,     639,     165,     481,      75,       5],
        [1057396,   12147,   10894,      66,     202,    1270,       0],
        [    628,    1490,       0,       0,       0,       0,       0],
        [  20839,       0,       0,       0,       0,       0,       0]]), 'label': tensor([[1.],
        [0.],
        [0.],
        [0.]])}
```
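The logic of `collate` can be sketched without any framework. The version below works on plain Python lists instead of tensors (an assumption made to keep the example dependency-free) and uses the four examples listed in this section as input; the solution to Problem 76 contains a tensor-based version built on `pad_sequence`.

```python
def collate(batch, pad_id=0):
    """Sort examples by length (longest first) and pad input_ids with pad_id."""
    ordered = sorted(batch, key=lambda ex: len(ex["input_ids"]), reverse=True)
    max_len = len(ordered[0]["input_ids"])
    input_ids = [
        ex["input_ids"] + [pad_id] * (max_len - len(ex["input_ids"]))
        for ex in ordered
    ]
    labels = [ex["label"] for ex in ordered]
    return {"input_ids": input_ids, "label": labels}

batch = [
    {"input_ids": [5785, 66, 113845, 18, 12, 15095, 1594], "label": [0.0]},
    {"input_ids": [3475, 87, 15888, 90, 27695, 42637], "label": [0.0]},
    {"input_ids": [4, 5053, 45, 3305, 31647, 348, 904, 2815, 47, 1276, 1964],
     "label": [1.0]},
    {"input_ids": [987, 14528, 4941, 873, 12, 208, 898], "label": [0.0]},
]
out = collate(batch)
for row in out["input_ids"]:
    print(row)
print(out["label"])
```

Because Python's sort is stable, the two length-7 examples keep their original relative order, which matches the expected result shown earlier in this section.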

## 76. Mini-batch training


Using the padding procedure from Problem 75, train the model with mini-batches. Also compute the accuracy of the trained model on the development set.

```python
import pandas as pd
import torch
import torch.nn as nn
import numpy as np
from tqdm import tqdm
from gensim.models import KeyedVectors
from torch.utils.data import DataLoader
from torch.nn.utils.rnn import pad_sequence
from sklearn.metrics import accuracy_score

w2v = KeyedVectors.load_word2vec_format("../ch06/GoogleNews-vectors-negative300.bin.gz", binary=True)

vocab_size = len(w2v.key_to_index) + 1
embedding_dim = w2v.vector_size

embedding_matrix = np.zeros((vocab_size, embedding_dim), dtype=np.float32)

token2id = {"<PAD>": 0}
id2token = {0: "<PAD>"}
for idx, word in enumerate(w2v.key_to_index, start=1):
    embedding_matrix[idx] = w2v[word]
    token2id[word] = idx
    id2token[idx] = word
    
def load_sst(path):
    df = pd.read_csv(path, sep="\t")
    examples = []
    for _, row in tqdm(df.iterrows(), total=len(df)):
        text = row["sentence"]
        label = float(row["label"])
        tokens = text.split()
        input_ids = [token2id[t] for t in tokens if t in token2id]
        if len(input_ids) == 0:
            continue
        examples.append({
            "text": text,
            "label": torch.tensor([label], dtype=torch.float32),
            "input_ids": torch.tensor(input_ids, dtype=torch.long)
        })
    return examples

train_data = load_sst("../ch07/SST-2/train.tsv")
dev_data   = load_sst("../ch07/SST-2/dev.tsv")

class MeanEmbeddingClassifier(nn.Module):
    def __init__(self, embedding_matrix, freeze_embedding=True):
        super().__init__()
        self.embedding = nn.Embedding.from_pretrained(
            torch.tensor(embedding_matrix), freeze=freeze_embedding
        )
        self.linear = nn.Linear(embedding_matrix.shape[1], 1)

    def forward(self, input_ids):
        mask = (input_ids != 0).unsqueeze(-1)
        embedded = self.embedding(input_ids) * mask
        mean_embed = embedded.sum(1) / mask.sum(1).clamp(min=1)
        return self.linear(mean_embed).squeeze(1)
    
def collate(batch):
    batch.sort(key=lambda x: len(x["input_ids"]), reverse=True)

    input_ids = [item["input_ids"] for item in batch]
    labels = [item["label"] for item in batch]

    padded_ids = pad_sequence(input_ids, batch_first=True, padding_value=0)
    labels = torch.stack(labels)

    return {"input_ids": padded_ids, "labels": labels}

train_loader = DataLoader(train_data, batch_size=64, shuffle=True, collate_fn=collate)
dev_loader = DataLoader(dev_data, batch_size=64, shuffle=False, collate_fn=collate)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = MeanEmbeddingClassifier(embedding_matrix).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
criterion = nn.BCEWithLogitsLoss()

def train_model(model, train_loader, dev_loader):
    model.train()
    train_batch_loss = []
    for batch in train_loader:
        input_ids = batch["input_ids"].to(device)
        labels = batch["labels"].to(device).squeeze(1)
        optimizer.zero_grad()
        output = model(input_ids)
        loss = criterion(output, labels)
        loss.backward()
        optimizer.step()
        train_batch_loss.append(loss.item())

    model.eval()
    dev_batch_loss = []
    with torch.no_grad():
        for batch in dev_loader:
            input_ids = batch["input_ids"].to(device)
            labels = batch["labels"].to(device).squeeze(1)
            output = model(input_ids)
            loss = criterion(output, labels)
            dev_batch_loss.append(loss.item())

    train_acc = eval_model(model, train_loader)
    dev_acc = eval_model(model, dev_loader)

    return model, np.mean(train_batch_loss), np.mean(dev_batch_loss), train_acc, dev_acc

def eval_model(model, data_loader):
    model.eval()
    all_preds = []
    all_labels = []

    with torch.no_grad():
        for batch in data_loader:
            input_ids = batch["input_ids"].to(device)
            labels = batch["labels"].to(device)
            logits = model(input_ids)
            probs = torch.sigmoid(logits)
            preds = (probs > 0.5).float()

            all_preds.extend(preds.cpu().numpy())
            all_labels.extend(labels.cpu().numpy())

    acc = accuracy_score(all_labels, all_preds)
    return acc

epochs = 100
train_loss = []
dev_loss = []
train_acc = []
dev_acc = []

for epoch in tqdm(range(epochs)):
    model, train_l, dev_l, train_a, dev_a = train_model(model, train_loader, dev_loader)
    train_loss.append(train_l)
    dev_loss.append(dev_l)
    train_acc.append(train_a)
    dev_acc.append(dev_a)

    if (epoch + 1) % 10 == 0:
        print(f"[Epoch {epoch + 1}]")
        print(f"Train loss: {train_l:.4f}, Dev loss: {dev_l:.4f}")
        print(f"Train acc : {train_a:.4f}, Dev acc : {dev_a:.4f}")
        
path_saved_model = "./models/model_ex76.pt"
torch.save(model.state_dict(), path_saved_model)
```

```bash
[Epoch 10]
Train loss: 0.6315, Dev loss: 0.6625
Train acc : 0.6616, Dev acc : 0.5436
[Epoch 20]
Train loss: 0.5855, Dev loss: 0.6358
Train acc : 0.7450, Dev acc : 0.6239
[Epoch 30]
Train loss: 0.5503, Dev loss: 0.6132
Train acc : 0.7857, Dev acc : 0.6984
[Epoch 40]
Train loss: 0.5231, Dev loss: 0.5952
Train acc : 0.8018, Dev acc : 0.7339
[Epoch 50]
Train loss: 0.5019, Dev loss: 0.5782
Train acc : 0.8100, Dev acc : 0.7569
[Epoch 60]
Train loss: 0.4852, Dev loss: 0.5662
Train acc : 0.8145, Dev acc : 0.7569
[Epoch 70]
Train loss: 0.4717, Dev loss: 0.5562
Train acc : 0.8177, Dev acc : 0.7649
[Epoch 80]
Train loss: 0.4608, Dev loss: 0.5459
Train acc : 0.8206, Dev acc : 0.7706
[Epoch 90]
Train loss: 0.4516, Dev loss: 0.5410
Train acc : 0.8221, Dev acc : 0.7775
[Epoch 100]
Train loss: 0.4439, Dev loss: 0.5328
Train acc : 0.8237, Dev acc : 0.7775
```
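The `forward` method above divides by the number of non-padding tokens rather than by the padded length. A small NumPy sketch with a made-up two-dimensional embedding matrix shows why: a plain mean over the padded sequence shrinks the feature vector of short texts.

```python
import numpy as np

# Toy embedding matrix: row 0 is the all-zero <PAD> vector.
E = np.array([[0.0, 0.0],
              [1.0, 3.0],
              [3.0, 5.0]], dtype=np.float32)

# Two sequences padded to length 4; the second has only two real tokens.
input_ids = np.array([[1, 2, 1, 2],
                      [1, 2, 0, 0]])

embedded = E[input_ids]                 # (batch, length, dim)
mask = (input_ids != 0)[..., None]      # (batch, length, 1)

naive_mean = embedded.mean(axis=1)
masked_mean = (embedded * mask).sum(axis=1) / np.maximum(mask.sum(axis=1), 1)

print(naive_mean)   # second row is halved by the padding positions
print(masked_mean)  # both rows equal the mean over real tokens
```

Both texts contain the same mix of tokens, so their masked means agree, while the naive mean of the padded text is half as large.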

## 77. Training on a GPU

Run the model training of Problem 76 on a GPU. Also compute the accuracy of the trained model on the development set.

```{note}
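Same as the solution to Problem 76, which already moves the model and every batch to `device` (a GPU whenever one is available).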
```

## 78. Fine-tuning the word embeddings


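Extend the training of Problem 77 with fine-tuning, so that the word embedding parameters are updated at the same time. Also compute the accuracy of the trained model on the development set.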

```python
import pandas as pd
import torch
import torch.nn as nn
import numpy as np
from tqdm import tqdm
from gensim.models import KeyedVectors
from torch.utils.data import DataLoader
from torch.nn.utils.rnn import pad_sequence
from sklearn.metrics import accuracy_score

w2v = KeyedVectors.load_word2vec_format("../ch06/GoogleNews-vectors-negative300.bin.gz", binary=True)

vocab_size = len(w2v.key_to_index) + 1
embedding_dim = w2v.vector_size

embedding_matrix = np.zeros((vocab_size, embedding_dim), dtype=np.float32)

token2id = {"<PAD>": 0}
id2token = {0: "<PAD>"}
for idx, word in enumerate(w2v.key_to_index, start=1):
    embedding_matrix[idx] = w2v[word]
    token2id[word] = idx
    id2token[idx] = word
    
def load_sst(path):
    df = pd.read_csv(path, sep="\t")
    examples = []
    for _, row in tqdm(df.iterrows(), total=len(df)):
        text = row["sentence"]
        label = float(row["label"])
        tokens = text.split()
        input_ids = [token2id[t] for t in tokens if t in token2id]
        if len(input_ids) == 0:
            continue
        examples.append({
            "text": text,
            "label": torch.tensor([label], dtype=torch.float32),
            "input_ids": torch.tensor(input_ids, dtype=torch.long)
        })
    return examples

train_data = load_sst("../ch07/SST-2/train.tsv")
dev_data   = load_sst("../ch07/SST-2/dev.tsv")

class MeanEmbeddingClassifier(nn.Module):
    def __init__(self, embedding_matrix, freeze_embedding=True):
        super().__init__()
        self.embedding = nn.Embedding.from_pretrained(
            torch.tensor(embedding_matrix), freeze=freeze_embedding
        )
        self.linear = nn.Linear(embedding_matrix.shape[1], 1)

    def forward(self, input_ids):
        mask = (input_ids != 0).unsqueeze(-1)
        embedded = self.embedding(input_ids) * mask
        mean_embed = embedded.sum(1) / mask.sum(1).clamp(min=1)
        return self.linear(mean_embed).squeeze(1)
    
def collate(batch):
    batch.sort(key=lambda x: len(x["input_ids"]), reverse=True)

    input_ids = [item["input_ids"] for item in batch]
    labels = [item["label"] for item in batch]

    padded_ids = pad_sequence(input_ids, batch_first=True, padding_value=0)
    labels = torch.stack(labels)

    return {"input_ids": padded_ids, "labels": labels}

train_loader = DataLoader(train_data, batch_size=64, shuffle=True, collate_fn=collate)
dev_loader = DataLoader(dev_data, batch_size=64, shuffle=False, collate_fn=collate)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = MeanEmbeddingClassifier(embedding_matrix, freeze_embedding=False).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
criterion = nn.BCEWithLogitsLoss()

def train_model(model, train_loader, dev_loader):
    model.train()
    train_batch_loss = []
    for batch in train_loader:
        input_ids = batch["input_ids"].to(device)
        labels = batch["labels"].to(device).squeeze(1)
        optimizer.zero_grad()
        output = model(input_ids)
        loss = criterion(output, labels)
        loss.backward()
        optimizer.step()
        train_batch_loss.append(loss.item())

    model.eval()
    dev_batch_loss = []
    with torch.no_grad():
        for batch in dev_loader:
            input_ids = batch["input_ids"].to(device)
            labels = batch["labels"].to(device).squeeze(1)
            output = model(input_ids)
            loss = criterion(output, labels)
            dev_batch_loss.append(loss.item())

    train_acc = eval_model(model, train_loader)
    dev_acc = eval_model(model, dev_loader)

    return model, np.mean(train_batch_loss), np.mean(dev_batch_loss), train_acc, dev_acc

def eval_model(model, data_loader):
    model.eval()
    all_preds = []
    all_labels = []

    with torch.no_grad():
        for batch in data_loader:
            input_ids = batch["input_ids"].to(device)
            labels = batch["labels"].to(device)
            logits = model(input_ids)
            probs = torch.sigmoid(logits)
            preds = (probs > 0.5).float()

            all_preds.extend(preds.cpu().numpy())
            all_labels.extend(labels.cpu().numpy())

    acc = accuracy_score(all_labels, all_preds)
    return acc

epochs = 100
train_loss = []
dev_loss = []
train_acc = []
dev_acc = []

for epoch in tqdm(range(epochs)):
    model, train_l, dev_l, train_a, dev_a = train_model(model, train_loader, dev_loader)
    train_loss.append(train_l)
    dev_loss.append(dev_l)
    train_acc.append(train_a)
    dev_acc.append(dev_a)

    if (epoch + 1) % 10 == 0:
        print(f"[Epoch {epoch + 1}]")
        print(f"Train loss: {train_l:.4f}, Dev loss: {dev_l:.4f}")
        print(f"Train acc : {train_a:.4f}, Dev acc : {dev_a:.4f}")
        
path_saved_model = "./models/model_ex78.pt"
torch.save(model.state_dict(), path_saved_model)
```

```bash
[Epoch 10]
Train loss: 0.5800, Dev loss: 0.6252
Train acc : 0.7714, Dev acc : 0.6548
[Epoch 20]
Train loss: 0.4502, Dev loss: 0.5404
Train acc : 0.8561, Dev acc : 0.7603
[Epoch 30]
Train loss: 0.3604, Dev loss: 0.4802
Train acc : 0.8805, Dev acc : 0.8073
[Epoch 40]
Train loss: 0.3066, Dev loss: 0.4413
Train acc : 0.8945, Dev acc : 0.8211
[Epoch 50]
Train loss: 0.2724, Dev loss: 0.4249
Train acc : 0.9039, Dev acc : 0.8222
[Epoch 60]
Train loss: 0.2485, Dev loss: 0.4159
Train acc : 0.9121, Dev acc : 0.8257
[Epoch 70]
Train loss: 0.2308, Dev loss: 0.4122
Train acc : 0.9179, Dev acc : 0.8234
[Epoch 80]
Train loss: 0.2167, Dev loss: 0.4100
Train acc : 0.9229, Dev acc : 0.8211
[Epoch 90]
Train loss: 0.2054, Dev loss: 0.4107
Train acc : 0.9265, Dev acc : 0.8177
[Epoch 100]
Train loss: 0.1960, Dev loss: 0.4144
Train acc : 0.9297, Dev acc : 0.8131
```
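In the log above, development accuracy peaks around epoch 60 and then declines while training accuracy keeps rising, which suggests overfitting. Below is a sketch of picking a checkpoint from the logged values. The `log` list is transcribed by hand from the output above, and since only every tenth epoch was printed, the true best epoch may lie between the logged ones.

```python
# (epoch, dev loss, dev accuracy) copied from the log above.
log = [
    (10, 0.6252, 0.6548), (20, 0.5404, 0.7603), (30, 0.4802, 0.8073),
    (40, 0.4413, 0.8211), (50, 0.4249, 0.8222), (60, 0.4159, 0.8257),
    (70, 0.4122, 0.8234), (80, 0.4100, 0.8211), (90, 0.4107, 0.8177),
    (100, 0.4144, 0.8131),
]

best_by_acc = max(log, key=lambda row: row[2])   # highest dev accuracy
best_by_loss = min(log, key=lambda row: row[1])  # lowest dev loss

print(best_by_acc)
print(best_by_loss)
```

The two criteria select different epochs here (60 by accuracy, 80 by loss), so which checkpoint to keep is a judgment call; saving the model whenever the chosen development metric improves would avoid ending up with the epoch-100 weights.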

## 79. Changing the architecture


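Freely modify the architecture of the neural network and train the model. Also compute the accuracy of the trained model on the development set. For example, you might pass the text feature vector (the mean of the word embeddings) through a multi-layer neural network, or try training models such as a convolutional neural network (CNN) or a recurrent neural network (RNN).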

```python
import pandas as pd
import torch
import torch.nn as nn
import numpy as np
from tqdm import tqdm
from gensim.models import KeyedVectors
from torch.utils.data import DataLoader
from torch.nn.utils.rnn import pad_sequence
from sklearn.metrics import accuracy_score
import torch.nn.functional as F

w2v = KeyedVectors.load_word2vec_format("../ch06/GoogleNews-vectors-negative300.bin.gz", binary=True)

vocab_size = len(w2v.key_to_index) + 1
embedding_dim = w2v.vector_size

embedding_matrix = np.zeros((vocab_size, embedding_dim), dtype=np.float32)

token2id = {"<PAD>": 0}
id2token = {0: "<PAD>"}
for idx, word in enumerate(w2v.key_to_index, start=1):
    embedding_matrix[idx] = w2v[word]
    token2id[word] = idx
    id2token[idx] = word
    
def load_sst(path):
    df = pd.read_csv(path, sep="\t")
    examples = []
    for _, row in tqdm(df.iterrows(), total=len(df)):
        text = row["sentence"]
        label = float(row["label"])
        tokens = text.split()
        input_ids = [token2id[t] for t in tokens if t in token2id]
        if len(input_ids) == 0:
            continue
        examples.append({
            "text": text,
            "label": torch.tensor([label], dtype=torch.float32),
            "input_ids": torch.tensor(input_ids, dtype=torch.long)
        })
    return examples

train_data = load_sst("../ch07/SST-2/train.tsv")
dev_data   = load_sst("../ch07/SST-2/dev.tsv")

class CNNClassifier(nn.Module):
    def __init__(self, embedding_matrix, freeze_embedding=True, num_filters=100, filter_sizes=(3, 4, 5), dropout=0.5):
        super().__init__()
        vocab_size, embedding_dim = embedding_matrix.shape
        
        self.embedding = nn.Embedding.from_pretrained(
            torch.tensor(embedding_matrix, dtype=torch.float32),
            freeze=freeze_embedding
        )
        self.convs = nn.ModuleList([
            nn.Conv1d(in_channels=embedding_dim, 
                     out_channels=num_filters, 
                     kernel_size=fs)
            for fs in filter_sizes
        ])
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(len(filter_sizes) * num_filters, 1)
        
    def forward(self, input_ids):
        embedded = self.embedding(input_ids)
        embedded = embedded.permute(0, 2, 1)
        
        conv_outputs = []
        for conv in self.convs:
            conv_output = F.relu(conv(embedded))
            pooled = F.max_pool1d(conv_output, conv_output.size(2))
            conv_outputs.append(pooled.squeeze(2))

        cat = torch.cat(conv_outputs, dim=1)
        logits = self.fc(self.dropout(cat)).squeeze(1)
        
        return logits
    
def collate(batch):
    batch.sort(key=lambda x: len(x["input_ids"]), reverse=True)

    input_ids = [item["input_ids"] for item in batch]
    labels = [item["label"] for item in batch]

    padded_ids = pad_sequence(input_ids, batch_first=True, padding_value=0)
    labels = torch.stack(labels)

    return {"input_ids": padded_ids, "labels": labels}

train_loader = DataLoader(train_data, batch_size=64, shuffle=True, collate_fn=collate)
dev_loader = DataLoader(dev_data, batch_size=64, shuffle=False, collate_fn=collate)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = CNNClassifier(embedding_matrix, freeze_embedding=False).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
criterion = nn.BCEWithLogitsLoss()

def train_model(model, train_loader, dev_loader):
    model.train()
    train_batch_loss = []
    for batch in train_loader:
        input_ids = batch["input_ids"].to(device)
        labels = batch["labels"].to(device).squeeze(1)
        optimizer.zero_grad()
        output = model(input_ids)
        loss = criterion(output, labels)
        loss.backward()
        optimizer.step()
        train_batch_loss.append(loss.item())

    model.eval()
    dev_batch_loss = []
    with torch.no_grad():
        for batch in dev_loader:
            input_ids = batch["input_ids"].to(device)
            labels = batch["labels"].to(device).squeeze(1)
            output = model(input_ids)
            loss = criterion(output, labels)
            dev_batch_loss.append(loss.item())

    train_acc = eval_model(model, train_loader)
    dev_acc = eval_model(model, dev_loader)

    return model, np.mean(train_batch_loss), np.mean(dev_batch_loss), train_acc, dev_acc

def eval_model(model, data_loader):
    model.eval()
    all_preds = []
    all_labels = []

    with torch.no_grad():
        for batch in data_loader:
            input_ids = batch["input_ids"].to(device)
            labels = batch["labels"].to(device)
            logits = model(input_ids)
            probs = torch.sigmoid(logits)
            preds = (probs > 0.5).float()

            all_preds.extend(preds.cpu().numpy())
            all_labels.extend(labels.cpu().numpy())

    acc = accuracy_score(all_labels, all_preds)
    return acc

epochs = 100
train_loss = []
dev_loss = []
train_acc = []
dev_acc = []

for epoch in tqdm(range(epochs)):
    model, train_l, dev_l, train_a, dev_a = train_model(model, train_loader, dev_loader)
    train_loss.append(train_l)
    dev_loss.append(dev_l)
    train_acc.append(train_a)
    dev_acc.append(dev_a)

    if (epoch + 1) % 10 == 0:
        print(f"[Epoch {epoch + 1}]")
        print(f"Train loss: {train_l:.4f}, Dev loss: {dev_l:.4f}")
        print(f"Train acc : {train_a:.4f}, Dev acc : {dev_a:.4f}")
        
path_saved_model = "./models/model_ex79.pt"
torch.save(model.state_dict(), path_saved_model)
```

```bash
[Epoch 10]
Train loss: 0.2846, Dev loss: 0.4100
Train acc : 0.8932, Dev acc : 0.8165
[Epoch 20]
Train loss: 0.2158, Dev loss: 0.4273
Train acc : 0.9208, Dev acc : 0.8119
[Epoch 30]
Train loss: 0.1761, Dev loss: 0.4495
Train acc : 0.9383, Dev acc : 0.8177
[Epoch 40]
Train loss: 0.1492, Dev loss: 0.4782
Train acc : 0.9488, Dev acc : 0.8131
[Epoch 50]
Train loss: 0.1298, Dev loss: 0.5264
Train acc : 0.9569, Dev acc : 0.8119
[Epoch 60]
Train loss: 0.1139, Dev loss: 0.5743
Train acc : 0.9627, Dev acc : 0.8085
[Epoch 70]
Train loss: 0.1015, Dev loss: 0.6285
Train acc : 0.9670, Dev acc : 0.8085
[Epoch 80]
Train loss: 0.0902, Dev loss: 0.6817
Train acc : 0.9704, Dev acc : 0.8028
[Epoch 90]
Train loss: 0.0823, Dev loss: 0.7463
Train acc : 0.9733, Dev acc : 0.7993
[Epoch 100]
Train loss: 0.0758, Dev loss: 0.7942
Train acc : 0.9756, Dev acc : 0.7924
```
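One detail of the CNN above is worth checking. A stride-1 `nn.Conv1d` without padding produces an output of length $L - k + 1$ for input length $L$ and kernel size $k$, so a batch whose longest sequence is shorter than the largest filter (5 here) cannot be convolved. The sketch below shows the length arithmetic and one possible workaround, right-padding a batch to a minimum length. The helper names are hypothetical, and whether such short batches actually occur depends on the data and the batch size.

```python
def conv_output_length(seq_len, kernel_size):
    """Output length of a stride-1, unpadded 1D convolution."""
    return seq_len - kernel_size + 1

def pad_to_min_length(rows, min_len, pad_id=0):
    """Right-pad every row with pad_id to at least min_len tokens."""
    target = max(min_len, max(len(r) for r in rows))
    return [r + [pad_id] * (target - len(r)) for r in rows]

filter_sizes = (3, 4, 5)
print([conv_output_length(4, k) for k in filter_sizes])  # [2, 1, 0]

batch = [[628, 1490], [20839]]
padded = pad_to_min_length(batch, min_len=max(filter_sizes))
print(padded)
```

An output length of zero or less means the filter does not fit into the sequence; padding every batch to at least the largest filter size keeps all three convolutions applicable.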