  <img src="https://s8.hostingkartinok.com/uploads/images/2018/08/308b49fcfbc619d629fe4604bceb67ac.jpg" width=500, height=450>
  <h3 style="text-align: center;"><b>Физтех-Школа Прикладной математики и информатики (ФПМИ) МФТИ</b></h3>

  ---

Student: Oleg Navolotsky / Наволоцкий Олег  
Stepik: https://stepik.org/users/2403189  
Telegram: [@mehwhatever0](https://t.me/mehwhatever0)  

**Note**: reproducibility depends on [different things](https://pytorch.org/docs/stable/notes/randomness.html):
>Completely reproducible results are not guaranteed across PyTorch releases, individual commits, or different platforms. Furthermore, results may not be reproducible between CPU and GPU executions, even when using identical seeds.

Some used software versions:
- PyTorch 1.8.0
- torchtext 0.9.0
- captum 0.3.1
- NumPy 1.19.2
- Python 3.8.8 (default, Feb 24 2021, 15:54:32) \[MSC v.1928 64 bit (AMD64)] :: Anaconda, Inc. on win32
- NVIDIA Driver 461.33
- NVIDIA CUDA 11.2
- Windows 10 Pro 1909, build 18363.535


Hardware:
- i5 2500 8 gb
- GTX 1060 6 gb

In [1]:
import os
import random

import numpy as np
import torch


SEED = 1234


def enable_reproducibility(
        seed=SEED, raise_if_no_deterministic=True,
        cudnn_deterministic=True, disable_cudnn_benchmarking=True):
    # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
    torch.use_deterministic_algorithms(raise_if_no_deterministic)

    # https://docs.nvidia.com/cuda/cublas/index.html#cublasApi_reproducibility
    os.environ['CUBLAS_WORKSPACE_CONFIG'] = ":4096:8"
    
    torch.backends.cudnn.benchmark = not disable_cudnn_benchmarking
    torch.backends.cudnn.deterministic = cudnn_deterministic

    torch.manual_seed(seed)
    np.random.seed(seed)
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)

In [2]:
enable_reproducibility()

  and should_run_async(code)


# Задание 3

## Классификация текстов

В этом задании вам предстоит попробовать несколько методов, используемых в задаче классификации, а также понять насколько хорошо модель понимает смысл слов и какие слова в примере влияют на результат.

В этом задании мы будем использовать библиотеку torchtext. Она довольна проста в использовании и поможет нам сконцентрироваться на задаче, а не на написании Dataloader-а.

Датасет, на котором мы будем проводить эксперименты, — это комментарии к фильмам с сайта IMDB.

In [3]:
import shelve
import time

import torchtext
from torch.utils.data import random_split

start = time.time()
with shelve.open('imdb_dataset_fast_cache') as imdb_dataset_fast_cache:
    if any(split not in imdb_dataset_fast_cache for split in ('train', 'valid', 'test') ):
        print("Loading dataset from slow torchtext files...")
        train_valid, test = torchtext.datasets.IMDB(split=('train', 'test'))
        train_valid, test = list(train_valid), list(test)
        # default value of the argument split_ratio in torchtext.legacy.data.Data.split()
        split_ratio = 0.7
        num_train = int(len(train_valid) * split_ratio)  
        train, valid = random_split(train_valid, [num_train, len(train_valid) - num_train])
        train = list(train)
        valid = list(valid)
        imdb_dataset_fast_cache['train'] = train
        imdb_dataset_fast_cache['valid'] = valid
        imdb_dataset_fast_cache['test'] = test
        print("Dataset cached.")
    else:
        train = imdb_dataset_fast_cache['train']
        valid = imdb_dataset_fast_cache['valid']
        test = imdb_dataset_fast_cache['test']
        print("Dataset loaded from cache.")
print(f"Dataset downloaded. Time spent: {time.time() - start}")

Dataset loaded from cache.
Dataset downloaded. Time spent: 0.3427903652191162


In [4]:
from collections import Counter
from itertools import chain

from torchtext.data.utils import get_tokenizer
from torchtext.vocab import Vocab

# get tokenizer as used in torchtext.legacy.data.Field by default (string.split)
tokenizer = get_tokenizer(None) 
counter = Counter(chain.from_iterable(tokenizer(line) for _, line in train))
vocab = Vocab(counter, min_freq=1)
PAD_TOKEN = '<pad>'  # default special padding token in Vocab

  and should_run_async(code)


In [5]:
labels = set([label for (label, _) in chain(train, valid)])
num_classes = len(labels)
num_classes, labels

(2, {'neg', 'pos'})

In [6]:
def label_transform(label):
    if label == 'pos':
        return 1
    elif label == 'neg':
        return 0
    raise ValueError(f"unknown label {label}")

def label_inverse_transform(idx):
    if idx == 1:
        return 'pos'
    elif idx == 0:
        return 'neg'
    raise ValueError(f"unknown idx {idx}")

def text_transform(text, lower=True):
    if lower:
        text = text.lower()
    return [vocab[token] for token in tokenizer(text)]

In [7]:
from torch.nn.utils.rnn import pad_sequence

def collate_batch(batch):
    label_list, text_list, texts_lengths = [], [], []
    for (label, text) in batch:
        label_list.append(label_transform(label))
        token_indices = text_transform(text)
        texts_lengths.append(len(token_indices))
        processed_text = torch.tensor(token_indices)
        text_list.append(processed_text)
    return torch.tensor(label_list), pad_sequence(text_list, batch_first=True, padding_value=vocab[PAD_TOKEN]), texts_lengths

In [8]:
import math
import random

from torch.utils.data import DataLoader, Sampler


class BatchSamplerMimickingBucketIterator(Sampler):
    def __init__(self, raw_dataset_list, tokenizer, batch_size, drop_last=False, pool_size_multiplier=1, decreasing_order_within_batch=True):
        self._batch_size = batch_size
        self._drop_last = drop_last
        self._pool_size_multiplier = pool_size_multiplier
        self._indices_and_lengths = [(i, len(tokenizer(text))) for i, (_, text) in enumerate(raw_dataset_list)]
        self._decreasing_order_within_batch = decreasing_order_within_batch
    
    def __len__(self):
        round_ = math.floor if self._drop_last else math.ceil
        return round_(len(self._indices_and_lengths) / self._batch_size)
    
    def __iter__(self):
        batch_size = self._batch_size
        drop_last = self._drop_last
        pool_size = batch_size * self._pool_size_multiplier
        indices = self._indices_and_lengths
        reverse = self._decreasing_order_within_batch
        random.shuffle(indices)
        pooled_indices = []
        # create pool of indices with similar lengths
        for i in range(0, len(indices), batch_size * pool_size):
            pooled_indices.extend(sorted(indices[i:i + batch_size * pool_size], key=lambda x: x[1], reverse=reverse))

        pooled_indices = [x[0] for x in pooled_indices]

        # yield indices for current batch
        last_index = len(pooled_indices) - len(pooled_indices) % batch_size
        for i in range(0, len(pooled_indices), batch_size):
            if drop_last and i == last_index:
                break
            yield pooled_indices[i:i + batch_size]

In [9]:
from torch.utils.data import DataLoader

batch_size = 64
# 8 * 100 is taken from here:
# https://github.com/pytorch/text/blob/master/examples/legacy_tutorial/migration_tutorial.ipynb
pool_size_multiplier = 8 * 100 // batch_size

batch_sampler = BatchSamplerMimickingBucketIterator(train, tokenizer, batch_size, pool_size_multiplier=pool_size_multiplier, drop_last=True)
train_loader = DataLoader(train, batch_sampler=batch_sampler, collate_fn=collate_batch)
valid_loader = DataLoader(valid, batch_size=batch_size, collate_fn=collate_batch)
test_loader = DataLoader(test, batch_size=batch_size, collate_fn=collate_batch)

In [10]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

  ## RNN

  Для начала попробуем использовать рекурентные нейронные сети. На семинаре вы познакомились с GRU, вы можете также попробовать LSTM. Можно использовать для классификации как hidden_state, так и output последнего токена.

In [11]:
from torch import nn

class RNNBaseline(nn.Module):
    def __init__(
            self, vocab_size, embedding_dim, hidden_dim, output_dim,
            n_layers, bidirectional, dropout, pad_idx):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=pad_idx)
        self.rnn = nn.GRU(
            input_size=embedding_dim,
            hidden_size=hidden_dim,
            num_layers=n_layers, dropout=dropout, bidirectional=bidirectional)
        self.fc = nn.Linear((bidirectional + 1) * hidden_dim, output_dim)

    def forward(self, texts, texts_lengths):
        embedded = self.embedding(texts)
        packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, texts_lengths, batch_first=True, enforce_sorted=False)
        _, hidden = self.rnn(packed_embedded)
        features = torch.hstack((hidden[-2], hidden[-1]))
        return self.fc(features)

Обучите сетку! Используйте любые вам удобные инструменты, Catalyst, PyTorch Lightning или свои велосипеды.

In [12]:
from copy import deepcopy

import numpy as np
from tqdm.notebook import tqdm

def training(model, train_loader, valid_loader, patience):
    min_loss = np.inf
    cur_patience = 0
    for epoch in range(1, max_epochs + 1):
        train_loss = 0.0
        model.train()
        pbar = tqdm(enumerate(train_loader), total=len(train_loader), leave=True)
        pbar.set_description(f"epoch {epoch}, training")
        for it, batch in pbar: 
            labels, texts, texts_lengths = batch
            labels, texts = labels.to(device), texts.to(device)
            if labels.ndim == 1:
                labels = labels.unsqueeze(1)
            opt.zero_grad()
            output = model(texts, texts_lengths)
            labels = labels.type_as(output)
            loss = loss_func(output, labels)
            loss.backward()
            opt.step()
            train_loss += loss.item()

        train_loss /= len(train_loader)
        val_loss = 0.0
        model.eval()
        pbar = tqdm(enumerate(valid_loader), total=len(valid_loader), leave=True)
        pbar.set_description(f"epoch {epoch}, validation")
        with torch.no_grad():
            for it, batch in pbar:
                labels, texts, texts_lengths = batch
                labels, texts = labels.to(device), texts.to(device)
                if labels.ndim == 1:
                    labels = labels.unsqueeze(1)
                output = model(texts, texts_lengths)
                labels = labels.type_as(output)
                loss = loss_func(output, labels)
                val_loss += loss.item()
        val_loss /= len(valid_loader)
        spam = False
        if val_loss < min_loss:
            min_loss = val_loss
            best_model_state_dict = deepcopy(model.state_dict())
            cur_patience = 0
        else:
            cur_patience += 1
            if cur_patience > patience:
                spam = True
        print('Epoch: {}, Training Loss: {}, Validation Loss: {}'.format(epoch, train_loss, val_loss))
        if spam:
            print(f"Patience is over. Training stopped after {patience + 1} epochs "
                  "without decreasing validation loss.")
            break
    return best_model_state_dict

Поиграйтесь с гиперпараметрами

In [13]:
enable_reproducibility()

vocab_size = len(vocab)
pad_idx = vocab[PAD_TOKEN]
emb_dim = 100
hidden_dim = 256
output_dim = 1
n_layers = 2
bidirectional = True
dropout = 0.2
patience = 3

model = RNNBaseline(
    vocab_size=vocab_size,
    embedding_dim=emb_dim,
    hidden_dim=hidden_dim,
    output_dim=output_dim,
    n_layers=n_layers,
    bidirectional=bidirectional,
    dropout=dropout,
    pad_idx=pad_idx
)
model = model.to(device)

opt = torch.optim.Adam(model.parameters())
loss_func = nn.BCEWithLogitsLoss()

max_epochs = 20

In [14]:
%%time
enable_reproducibility(raise_if_no_deterministic=False)
best_model_state_dict = training(model, train_loader, valid_loader, patience)
enable_reproducibility()

  0%|          | 0/273 [00:00<?, ?it/s]

  0%|          | 0/118 [00:00<?, ?it/s]

Epoch: 1, Training Loss: 0.6146579961200337, Validation Loss: 0.6222612438565593


  0%|          | 0/273 [00:00<?, ?it/s]

  0%|          | 0/118 [00:00<?, ?it/s]

Epoch: 2, Training Loss: 0.415011859529621, Validation Loss: 0.43982345686625623


  0%|          | 0/273 [00:00<?, ?it/s]

  0%|          | 0/118 [00:00<?, ?it/s]

Epoch: 3, Training Loss: 0.2542561568019591, Validation Loss: 0.5177739997536449


  0%|          | 0/273 [00:00<?, ?it/s]

  0%|          | 0/118 [00:00<?, ?it/s]

Epoch: 4, Training Loss: 0.15477233025289716, Validation Loss: 0.4227705445708865


  0%|          | 0/273 [00:00<?, ?it/s]

  0%|          | 0/118 [00:00<?, ?it/s]

Epoch: 5, Training Loss: 0.08658311832969115, Validation Loss: 0.5345852810700061


  0%|          | 0/273 [00:00<?, ?it/s]

  0%|          | 0/118 [00:00<?, ?it/s]

Epoch: 6, Training Loss: 0.038926231528584584, Validation Loss: 0.630939403832969


  0%|          | 0/273 [00:00<?, ?it/s]

  0%|          | 0/118 [00:00<?, ?it/s]

Epoch: 7, Training Loss: 0.02319928084098437, Validation Loss: 0.6858035110063472


  0%|          | 0/273 [00:00<?, ?it/s]

  0%|          | 0/118 [00:00<?, ?it/s]

Epoch: 8, Training Loss: 0.01369139640509951, Validation Loss: 0.7156649151722253
Patience is over. Training stopped after 4 epochs without decreasing validation loss.
Wall time: 6min 57s


Посчитайте f1-score вашего классификатора на тестовом датасете.

In [15]:
from sklearn.metrics import f1_score as sk_f1_score

@torch.no_grad()
def testing(model, test_loader, device):
    all_results = []
    all_labels = []
    model.eval()
    for labels, texts, texts_lengths in tqdm(test_loader, desc="testing"):
        all_labels.append(labels)
        texts = texts.to(device)
        all_results.append(model(texts, texts_lengths))
    all_results = torch.cat(all_results)
    all_labels = torch.cat(all_labels).view(all_results.shape)
    return all_results, all_labels

def binary_predict(input, output_type=torch.long):
    return (torch.sigmoid(input) > 0.5).type(output_type)

def f1_score(y_pred, y_true):
    y_pred = y_pred.cpu().numpy()
    y_true = y_true.cpu().numpy()
    return sk_f1_score(y_true, y_pred)

**Ответ**:

In [16]:
model.load_state_dict(best_model_state_dict)
outputs, labels = testing(model, test_loader, device)
preds = binary_predict(outputs)
print(f"f1-score of the RNNBaseline: {f1_score(preds, labels)}")

  and should_run_async(code)


testing:   0%|          | 0/391 [00:00<?, ?it/s]

f1-score of the RNNBaseline: 0.8428285366524009


  Вы можете использовать Conv2d с `in_channels=1, kernel_size=(kernel_sizes[0], emb_dim))` или Conv1d c `in_channels=emb_dim, kernel_size=kernel_size[0]`. Но хорошенько подумайте над shape в обоих случаях.

In [17]:
import torch
from torch import nn

class CNN(nn.Module):
    def __init__(
        self,
        vocab_size, emb_dim,
        out_channels, kernel_sizes,
        dropout=0.5, classes_num=1
    ):
        super().__init__()
        self._init_embeddings(vocab_size, emb_dim)
        self._init_convs(emb_dim, out_channels, kernel_sizes)
        self._init_fc(len(kernel_sizes) * out_channels, classes_num)

    def _init_embeddings(self, vocab_size, emb_dim):
        self.embedding = nn.Embedding(vocab_size, emb_dim)

    def _init_convs(self, in_channels, out_channels, kernel_sizes):
        convs = []
        for i, kernel_size in enumerate(kernel_sizes):
            convs.append(
                nn.Sequential(
                    nn.Conv1d(
                        in_channels=in_channels,
                        out_channels=out_channels,
                        kernel_size=kernel_size),
                    nn.ReLU(inplace=True),
                    nn.AdaptiveMaxPool1d(1),
                    nn.Dropout(dropout)
                )
            )
        self.convs = nn.ModuleList(convs)

    def _init_fc(self, features_num, classes_num):
        self.fc = nn.Linear(features_num, classes_num)
        
    def forward(self, texts, *_): 
        embedded = self.embedding(texts)
        embedded = embedded.permute(0, 2, 1)
        conved = [conv(embedded) for conv in self.convs]
        features = torch.cat(conved, dim=1).squeeze(-1)
        return self.fc(features)

  and should_run_async(code)


In [18]:
kernel_sizes = [3, 4, 5]
vocab_size = len(vocab)
out_channels = 64
dropout = 0.5
dim = 300
patience = 3

model = CNN(vocab_size=vocab_size, emb_dim=dim, out_channels=out_channels,
            kernel_sizes=kernel_sizes, dropout=dropout)
model.to(device)
opt = torch.optim.Adam(model.parameters())
loss_func = nn.BCEWithLogitsLoss()
max_epochs = 30

  Обучите!

In [19]:
%%time
enable_reproducibility(raise_if_no_deterministic=False)
best_model_state_dict = training(model, train_loader, valid_loader, patience)
enable_reproducibility()

  0%|          | 0/273 [00:00<?, ?it/s]

  0%|          | 0/118 [00:00<?, ?it/s]

Epoch: 1, Training Loss: 0.6446519849937914, Validation Loss: 0.48709957705715956


  0%|          | 0/273 [00:00<?, ?it/s]

  0%|          | 0/118 [00:00<?, ?it/s]

Epoch: 2, Training Loss: 0.4992998676858979, Validation Loss: 0.4255752631668317


  0%|          | 0/273 [00:00<?, ?it/s]

  0%|          | 0/118 [00:00<?, ?it/s]

Epoch: 3, Training Loss: 0.4284234596041096, Validation Loss: 0.3751080806477595


  0%|          | 0/273 [00:00<?, ?it/s]

  0%|          | 0/118 [00:00<?, ?it/s]

Epoch: 4, Training Loss: 0.35162894094819985, Validation Loss: 0.36326472372826885


  0%|          | 0/273 [00:00<?, ?it/s]

  0%|          | 0/118 [00:00<?, ?it/s]

Epoch: 5, Training Loss: 0.2756578110895314, Validation Loss: 0.3750240553991269


  0%|          | 0/273 [00:00<?, ?it/s]

  0%|          | 0/118 [00:00<?, ?it/s]

Epoch: 6, Training Loss: 0.2077772510357392, Validation Loss: 0.4353838670304266


  0%|          | 0/273 [00:00<?, ?it/s]

  0%|          | 0/118 [00:00<?, ?it/s]

Epoch: 7, Training Loss: 0.1490694847940416, Validation Loss: 0.458261256121983


  0%|          | 0/273 [00:00<?, ?it/s]

  0%|          | 0/118 [00:00<?, ?it/s]

Epoch: 8, Training Loss: 0.09036829796766405, Validation Loss: 0.528672018041045
Patience is over. Training stopped after 4 epochs without decreasing validation loss.
Wall time: 3min 38s


  Посчитайте f1-score вашего классификатора.

  **Ответ**:

In [20]:
model.load_state_dict(best_model_state_dict)
outputs, labels = testing(model, test_loader, device)
preds = binary_predict(outputs)
print(f"f1-score of the CNN: {f1_score(preds, labels)}")

testing:   0%|          | 0/391 [00:00<?, ?it/s]

f1-score of the CNN: 0.8434916282354897


  ## Интерпретируемость

  Посмотрим, куда смотрит наша модель. Достаточно запустить код ниже.

In [21]:
# !conda install captum -c pytorch

  and should_run_async(code)


In [22]:
from captum.attr import LayerIntegratedGradients, TokenReferenceBase, visualization


def get_sentence_min_len():
    kernel_sizes = globals().get('kernel_sizes')
    return max(kernel_sizes) if kernel_sizes is not None else 7

def forward_with_softmax(model, input):
    logits = model(input)
    return torch.softmax(logits, 0)[0][1]

def forward_with_sigmoid(model, input):
    return torch.sigmoid(model(input))

# accumalate couple samples in this array for visualization purposes
vis_data_records_ig = []

def interpret_sentence(model, sentence, label=0, min_len=None, print_result=False):
    model.eval()
    text = [tok for tok in tokenizer(sentence)]
    if min_len is None:
        min_len = get_sentence_min_len()
    if len(text) < min_len:
        text += [PAD_TOKEN] * (min_len - len(text))
    indexed = [vocab[t] for t in text]

    model.zero_grad()

    input_indices = torch.tensor(indexed, device=device)
    input_indices = input_indices.unsqueeze(0)
    
    # input_indices dim: [sequence_length]
    seq_length = len(text)

    # predict
    pred = forward_with_sigmoid(model, input_indices).item()
    pred_ind = round(pred)

    # generate reference indices for each sample
    reference_indices = token_reference.generate_reference(seq_length, device=device).unsqueeze(0)

    # compute attributions and approximation delta using layer integrated gradients
    attributions_ig, delta = lig.attribute(input_indices, reference_indices, n_steps=5000, return_convergence_delta=True)
    if print_result:
        print("text: ", sentence)
        print(f"true: {label_inverse_transform(label)}, pred: {label_inverse_transform(pred_ind)} ({pred:.2f}), delta: {abs(delta.item())}")

    add_attributions_to_visualizer(attributions_ig, text, pred, pred_ind, label, delta, vis_data_records_ig)
    
def add_attributions_to_visualizer(attributions, text, pred, pred_ind, label, delta, vis_data_records):
    attributions = attributions.sum(dim=2).squeeze(0)
    attributions = attributions / torch.norm(attributions)
    attributions = attributions.cpu().detach().numpy()

    # storing couple samples in an array for visualization purposes
    vis_data_records.append(
        visualization.VisualizationDataRecord(
            attributions,
            pred,
            label_inverse_transform(pred_ind),
            label_inverse_transform(label),
            label_inverse_transform(1),
            attributions.sum(),       
            text,
            delta
        )
    )

  def _figure_formats_changed(self, name, old, new):


In [23]:
predefined_sentences = [
    ('It was a fantastic performance !', 1),
    ('Best film ever', 1),
    ('Such a great show!', 1),
    ('It was a horrible movie', 0),
    ('I\'ve never watched something as bad', 0),
    ('It is a disgusting movie!', 0),
]

  and should_run_async(code)


Попробуйте добавить свои примеры!

In [24]:
my_sentences = [
    ("It is definitely worth watching", 1),
    ("This movie made my evening", 1),
    ("You'll want to advice your enemies to watch it", 0),
    ("No way you find more useless thing to spend your time", 0)
]

def show_visualization(model, sentences):
    global token_reference, lig, vis_data_records_ig
    token_reference = TokenReferenceBase(reference_token_idx=vocab[PAD_TOKEN])
    lig = LayerIntegratedGradients(model, model.embedding)
    vis_data_records_ig = []
    enable_reproducibility(raise_if_no_deterministic=False)
    for sentence, label in sentences:
        interpret_sentence(model, sentence, label)
    enable_reproducibility()
    print('Visualize attributions based on Integrated Gradients')
    visualization.visualize_text(vis_data_records_ig)

In [25]:
show_visualization(model, predefined_sentences + my_sentences)

Visualize attributions based on Integrated Gradients


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
pos,pos (0.97),pos,1.3,It was a fantastic performance !
,,,,
pos,pos (0.65),pos,1.19,Best film ever #pad #pad
,,,,
pos,pos (0.87),pos,1.07,Such a great show! #pad
,,,,
neg,neg (0.18),pos,-0.56,It was a horrible movie
,,,,
neg,neg (0.39),pos,-0.23,I've never watched something as bad
,,,,


  ## Эмбэдинги слов

  Вы ведь не забыли, как мы можем применить знания о word2vec и GloVe. Давайте попробуем!

In [26]:
import numpy as np
import torch
from torch import nn

class CNNUsingPretrainedEmbeddings(CNN):
    def __init__(
        self,
        embeddings,
        out_channels, kernel_sizes,
        dropout=0.5, classes_num=1
    ):
        nn.Module.__init__(self)
        self._init_embeddings(embeddings)
        self._init_convs(self.embedding.embedding_dim, out_channels, kernel_sizes)
        self._init_fc(len(kernel_sizes) * out_channels, classes_num)

    def _init_embeddings(self, embeddings):
        if isinstance(embeddings, np.ndarray):
            embeddings = torch.from_numpy(embeddings)
        self.embedding = nn.Embedding.from_pretrained(embeddings)

In [27]:
from torchtext.vocab import GloVe

vocab = Vocab(counter, vectors=GloVe(name='6B', dim=300), min_freq=1)  # text_transform() will find it in the global scope

In [28]:
kernel_sizes = [3, 4, 5]
vocab_size = len(vocab)
out_channels = 64
dropout = 0.5
dim = 300
patience = 3

model = CNNUsingPretrainedEmbeddings(embeddings=vocab.vectors, out_channels=out_channels,
            kernel_sizes=kernel_sizes, dropout=dropout)
model.to(device)
opt = torch.optim.Adam(model.parameters())
loss_func = nn.BCEWithLogitsLoss()
max_epochs = 30

Вы знаете, что делать.

In [29]:
%%time
enable_reproducibility(raise_if_no_deterministic=False)
best_model_state_dict = training(model, train_loader, valid_loader, patience)
enable_reproducibility()

  0%|          | 0/273 [00:00<?, ?it/s]

  0%|          | 0/118 [00:00<?, ?it/s]

Epoch: 1, Training Loss: 0.496580944393144, Validation Loss: 0.3799072001445091


  0%|          | 0/273 [00:00<?, ?it/s]

  0%|          | 0/118 [00:00<?, ?it/s]

Epoch: 2, Training Loss: 0.3741347910392852, Validation Loss: 0.35343898188764766


  0%|          | 0/273 [00:00<?, ?it/s]

  0%|          | 0/118 [00:00<?, ?it/s]

Epoch: 3, Training Loss: 0.32974767553937306, Validation Loss: 0.34037109464406967


  0%|          | 0/273 [00:00<?, ?it/s]

  0%|          | 0/118 [00:00<?, ?it/s]

Epoch: 4, Training Loss: 0.2910369736698521, Validation Loss: 0.33756355777130287


  0%|          | 0/273 [00:00<?, ?it/s]

  0%|          | 0/118 [00:00<?, ?it/s]

Epoch: 5, Training Loss: 0.26507175285300927, Validation Loss: 0.3368449385388423


  0%|          | 0/273 [00:00<?, ?it/s]

  0%|          | 0/118 [00:00<?, ?it/s]

Epoch: 6, Training Loss: 0.23151345802095782, Validation Loss: 0.34697062054933125


  0%|          | 0/273 [00:00<?, ?it/s]

  0%|          | 0/118 [00:00<?, ?it/s]

Epoch: 7, Training Loss: 0.21334638419277938, Validation Loss: 0.35341593118037207


  0%|          | 0/273 [00:00<?, ?it/s]

  0%|          | 0/118 [00:00<?, ?it/s]

Epoch: 8, Training Loss: 0.19667354294341124, Validation Loss: 0.37053701660390626


  0%|          | 0/273 [00:00<?, ?it/s]

  0%|          | 0/118 [00:00<?, ?it/s]

Epoch: 9, Training Loss: 0.17398141480081683, Validation Loss: 0.4084903116701013
Patience is over. Training stopped after 4 epochs without decreasing validation loss.
Wall time: 1min 52s


Посчитайте f1-score вашего классификатора.

**Ответ**:

In [30]:
model.load_state_dict(best_model_state_dict)
outputs, labels = testing(model, test_loader, device)
preds = binary_predict(outputs)
print(f"f1-score of the CNNUsingPretrainedEmbeddings: {f1_score(preds, labels)}")

testing:   0%|          | 0/391 [00:00<?, ?it/s]

f1-score of the CNNUsingPretrainedEmbeddings: 0.8589970016839857


Проверим насколько все хорошо!

In [31]:
show_visualization(model, predefined_sentences + my_sentences)

  and should_run_async(code)
Visualize attributions based on Integrated Gradients


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
pos,pos (0.95),pos,1.0,It was a fantastic performance !
,,,,
pos,pos (0.54),pos,0.27,Best film ever #pad #pad
,,,,
pos,pos (0.87),pos,1.06,Such a great show! #pad
,,,,
neg,neg (0.17),pos,-0.8,It was a horrible movie
,,,,
neg,neg (0.49),pos,-0.18,I've never watched something as bad
,,,,
