<h1>Intermediate Sequence Modeling for Natural Language Processing<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Introduction" data-toc-modified-id="Introduction-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Introduction</a></span></li><li><span><a href="#The-Problem-with-Vanilla/Elman-RNNs" data-toc-modified-id="The-Problem-with-Vanilla/Elman-RNNs-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>The Problem with Vanilla/Elman RNNs</a></span></li><li><span><a href="#Gating-as-a-Solution-to-a-Vanilla-RNNs-Problems" data-toc-modified-id="Gating-as-a-Solution-to-a-Vanilla-RNNs-Problems-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Gating as a Solution to a Vanilla RNNs Problems</a></span></li><li><span><a href="#Tips-and-Tricks-for-training-sequence-models" data-toc-modified-id="Tips-and-Tricks-for-training-sequence-models-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Tips and Tricks for training sequence models</a></span></li><li><span><a href="#Example:-A-Character-RNN-for-Generating-Surnames" data-toc-modified-id="Example:-A-Character-RNN-for-Generating-Surnames-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Example: A Character RNN for Generating Surnames</a></span><ul class="toc-item"><li><span><a href="#Vocabbulary,-Vectorizer-and-Dataset" data-toc-modified-id="Vocabbulary,-Vectorizer-and-Dataset-5.1"><span class="toc-item-num">5.1&nbsp;&nbsp;</span>Vocabbulary, Vectorizer and Dataset</a></span></li><li><span><a href="#Unconditioned-Surname-Generation-Model" data-toc-modified-id="Unconditioned-Surname-Generation-Model-5.2"><span class="toc-item-num">5.2&nbsp;&nbsp;</span>Unconditioned Surname Generation Model</a></span></li></ul></li></ul></div>

## Introduction

- _Sequence Prediction_ task requires to label each item of a sequence. Examples include language modeling, part of speech tagging, name entity recognition.
- Sequence prediction is also referred as Sequence Labeling.

![Figure 7.1](../images/figure_7_1.png)

## The Problem with Vanilla/Elman RNNs

Elman RNNs suffers from two problems:

- Inability to retain information for long range predictions
    - At each time step we simply update the hidden state vector regardless of whether it made sense. Due to this, RNN has no control over which values are retained and which are discarded in the hidden state. However what is desired is some way for the RNN to decide of the update is optional or if the update happens by how much and what parts of the state vector and so on.
- Gradient Stability
    - Vanilla RNNs also suffers from vanishing gradients or exploding gradients.

Some solutions that can be address these problems are:
- ReLUs
- Gradient Clipping
- Careful Initialization
- Gating(Most reliable)

## Gating as a Solution to a Vanilla RNNs Problems

To understand gating solution, lets suppose that we are adding two numberss, $a$ and $b$ and we want to control how much of $b$ gets into the sum. So we can write this as:

$$ a + \lambda b $$
    
where $\lambda$ is a value between 0 and 1. So if $\lambda = 0$, these is no contribution from b and if $\lambda = 1$ b contributes fully.

In above example, we can interpret $\lambda$ as a _switch_ or a _gate_ in controlling the amount of $b$ that gets into the sum. This is the intuition behind the gating mechanism.

In case of Elman RNN, the previous hidden state was $h_{t-1}$ and the current input is $x_t$, the recurrent update in Elman RNN would look something like:

$$ h_t = h_{t-1} + F(h_{t-1}, x_t) $$

where $F$ is the recurrent computation of the RNN. This is unconditioned sum and has the vanilla RNN problems mentioned above.

This can be updated with gating function by making $\lambda$ a function of previous hidden state vector $h_{t-1}$ then the RNN update equation would look like:

$$ h_t = h_{t-1} + \lambda(h_{t-1}, x_t) F(h_{t-1}, x_t) $$

Now $\lambda$ function controls how much of the current input gets to update the state $h_{t-1}$ and now function $\lambda$ is context dependent. The function $\lambda$ is usually a sigmoid function in gated networks.

In case of the _long short term memory network_(LSTM), above intuition is extended to incorporate not only conditional updated but also intentional forgetting of the values in the previous hidden state $h_{t-1}$. This forgetting happens by multiplying the previous hidden state and value $h_{t-1}$ with another function $\mu$ that also produces values between 0 and 1.

$$ h_t = \mu(h_{t-1}, x_t)h_{t-1} + \lambda(h_{t-1}, x_t) F(h_{t-1}, x_t) $$

## Tips and Tricks for training sequence models

- When possible use the gated variants
- When possible, prefer GRUs over LSTMs
- Use Adam as your optimizer
- Gradient Clipping
- Early Stopping

## Example: A Character RNN for Generating Surnames

In [27]:
%load_ext nb_black

import os
from argparse import Namespace
from collections import Counter
import json
import re
import string

import numpy as np
import pandas as pd
import torch
import torch.nn as nn
from torch.nn import functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from tqdm import notebook

import utils

The nb_black extension is already loaded. To reload it, use:
  %reload_ext nb_black


<IPython.core.display.Javascript object>

### Vocabbulary, Vectorizer and Dataset

In [4]:
class Vocabulary(object):
    def __init__(self, token_to_idx=None):
        if token_to_idx is None:
            token_to_idx = {}
        self._token_to_idx = token_to_idx
        self._idx_to_token = {idk: token for token, idx in self._token_to_idx.items()}

    def to_serializable(self):
        return {"token_to_idx": self._token_to_idx}

    @classmethod
    def from_serializable(cls, contents):
        return cls(**contents)

    def add_token(self, token):
        if token in self._token_to_idx:
            index = self._token_to_idx[token]
        else:
            index = len(self._token_to_idx)
            self._token_to_idx[token] = index
            self._idx_to_token[index] = token
        return index

    def add_many(self, tokens):
        return [self.add_token(token) for token in tokens]

    def lookup_token(self, token):
        return self._token_to_idx[token]

    def lookup_index(self, index):
        if index not in self._idx_to_token:
            raise KeyError(f"The index {index} is not in the Vocab.")
        return self._idx_to_token[index]

    def __str__(self):
        return f"<Vocabulary(size={len(self)})>"

    def __len__(self):
        return len(self._token_to_idx)

<IPython.core.display.Javascript object>

In [5]:
class SequenceVocabulary(Vocabulary):
    def __init__(
        self,
        token_to_idx=None,
        unk_token="<UNK>",
        mask_token="<MASK>",
        begin_seq_token="<BEGIN>",
        end_seq_token="<ENF>",
    ):
        super(SequenceVocabulary, self).__init__(token_to_idx)
        self._mask_token = mask_token
        self._unk_token = unk_token
        self._begin_seq_token = begin_seq_token
        self._end_seq_token = end_seq_token

        self.mask_index = self.add_token(self._mask_token)
        self.unk_index = self.add_token(self._unk_token)
        self.begin_seq_index = self.add_token(self._begin_seq_token)
        self.end_seq_index = self.add_token(self._end_seq_token)

    def to_serializable(self):
        contents = super(SequenceVocabulary, self).to_serializable()
        contents.update(
            {
                "unk_token": self._unk_token,
                "mask_token": self._mask_token,
                "begin_seq_token": self._begin_seq_token,
                "end_seq_token": self._end_seq_token,
            }
        )
        return contents

    def lookup_token(self, token):
        if self.unk_index >= 0:
            return self._token_to_idx.get(token, self.unk_index)
        else:
            return self._token_to_idx[token]

<IPython.core.display.Javascript object>

In [9]:
class SurnameVectorizer(object):
    def __init__(self, char_vocab, nationality_vocab):
        self.char_vocab = char_vocab
        self.nationality_vocab = nationality_vocab

    def vectorize(self, surname, vector_length=-1):
        indices = [self.char_vocab.begin_seq_index]
        indices.extend(self.char_vocab.lookup_token(token) for token in surname)
        indices.append(self.char_vocab.end_seq_index)
        if vector_length < 0:
            vector_length = len(indices) - 1
        from_vector = np.zeros(vector_length, dtype=np.int64)
        from_indices = indices[:-1]
        from_vector[: len(from_indices)] = from_indices
        from_vector[len(from_indices) :] = self.char_vocab.mask_index

        to_vector = np.empty(vector_length, dtype=np.int64)
        to_indices = indices[1:]
        to_vector[: len(to_indices)] = to_indices
        to_vector[len(to_indices) :] = self.char_vocab.mask_index

        return from_vector, to_vector

    @classmethod
    def from_dataframe(cls, surname_df):
        char_vocab = SequenceVocabulary()
        nationality_vocab = Vocabulary()
        for index, row in surname_df.iterrows():
            for char in row.surname:
                char_vocab.add_token(char)
            nationality_vocab.add_token(row.nationality)
        return cls(char_vocab=char_vocab, nationality_vocab=nationality_vocab)

    @classmethod
    def from_serializable(cls, contents):
        char_vocab = SequenceVocabulary.from_serializable(contents["char_vocab"])
        nat_vocab = Vocabulary.from_serializable(contents["nationality_vocab"])
        return cls(char_vocab=char_vocab, nationality_vocab=nationality_vocab)

    def to_serializable(self):
        return {
            "char_vocab": self.char_vocab.to_serializable(),
            "nationality_vocab": self.nationality_vocab.to_serializable(),
        }

<IPython.core.display.Javascript object>

In [18]:
class SurnameDataset(Dataset):
    def __init__(self, surname_df, vectorizer):
        self.surname_df = surname_df
        self._vectorizer = vectorizer
        self._max_seq_length = max(map(len, self.surname_df.surname)) + 2

        self.train_df = self.surname_df[self.surname_df.split == "train"]
        self.train_size = len(self.train_df)

        self.val_df = self.surname_df[self.surname_df.split == "val"]
        self.val_size = len(self.val_df)

        self.test_df = self.surname_df[self.surname_df.split == "test"]
        self.test_size = len(self.test_df)

        self._lookup_dict = {
            "train": (self.train_df, self.train_size),
            "val": (self.val_df, self.val_size),
            "test": (self.test_df, self.test_size),
        }
        self.set_split("train")

    @classmethod
    def load_dataset_and_make_vectorizer(cls, surname_csv):
        surname_df = pd.read_csv(surname_csv)
        return cls(surname_df, SurnameVectorizer.from_dataframe(surname_df))

    @classmethod
    def load_dataset_and_load_vectorizer(cls, surname_csv, vectorizer_filepath):
        surname_df = pd.read_csb(surname_csv)
        vectorizer = cls.load_vectorizer_only(vectorizer_filepath)
        return cls(surname_df, vectorizer)

    @staticmethod
    def load_vectorizer_only(vectorizer_filepath):
        with open(vectorizer_filepath) as fp:
            return SurnameVectorizer.from_serializable(json.load(fp))

    def save_vectorizer(self, vectorizer_filepath):
        with open(vectorizer_filepath, "w") as fp:
            json.dump(self._vectorizer.to_serializable(), fp)

    def get_vectorizer(self):
        return self._vectorizer

    def set_split(self, split="train"):
        self._train_split = split
        self._target_df, self._target_size = self._lookup_dict[split]

    def __len__(self):
        return self._target_size

    def __getitem__(self, index):
        row = self._target_df.iloc[index]
        from_vector, to_vector = self._vectorizer.vectorize(
            row.surname, self._max_seq_length
        )
        nationality_index = self._vectorizer.nationality_vocab.lookup_token(
            row.nationality
        )
        return {
            "x_data": from_vector,
            "y_target": to_vector,
            "class_index": nationality_index,
        }

    def get_num_batches(self, batch_size):
        return len(self) // batch_size

<IPython.core.display.Javascript object>

### Unconditioned Surname Generation Model

In [36]:
class SurnameGenerationModel(nn.Module):
    def __init__(
        self,
        char_embedding_size,
        char_vocab_size,
        rnn_hidden_size,
        batch_first=True,
        padding_idx=0,
        dropout_p=0.5,
    ):
        super(SurnameGenerationModel, self).__init__()
        self.char_emb = nn.Embedding(
            num_embeddings=char_vocab_size,
            embedding_dim=char_embedding_size,
            padding_idx=padding_idx,
        )
        self.rnn = nn.GRU(
            input_size=char_embedding_size,
            hidden_size=rnn_hidden_size,
            batch_first=batch_first,
        )
        self.fc = nn.Linear(in_features=rnn_hidden_size, out_features=char_vocab_size)
        self._dropout_p = dropout_p

    def forward(self, x_in, apply_softmax=False):
        x_embedded = self.char_emb(x_in)
        y_out, _ = self.rnn(x_embedded)
        batch_size, seq_size, feat_size = y_out.shape
        y_out = y_out.contiguous().view(batch_size * seq_size, feat_size)
        y_out = self.fc(F.dropout(y_out, p=self._dropout_p))
        if apply_softmax:
            y_out = F.softmax(y_out, dim=1)
        new_feat_size = y_out.shape[-1]
        y_out = y_out.view(batch_size, seq_size, new_feat_size)
        return y_out

<IPython.core.display.Javascript object>

In [65]:
def sample_from_model(
    model, vectorizer, num_samples=1, sample_size=20, temperature=1.0
):
    begin_seq_index = [
        vectorizer.char_vocab.begin_seq_index for _ in range(num_samples)
    ]
    begin_seq_index = torch.tensor(begin_seq_index, dtype=torch.int64).unsqueeze(dim=1)
    indices = [begin_seq_index]
    h_t = None

    for time_step in range(sample_size):
        x_t = indices[time_step]
        x_emb_t = model.char_emb(x_t)
        rnn_out_t, h_t = model.rnn(x_emb_t, h_t)
        prediction_vector = model.fc(rnn_out_t.squeeze(dim=1))
        probability_vector = F.softmax(prediction_vector / temperature, dim=1)
        indices.append(torch.multinomial(probability_vector, num_samples=1))
    indices = torch.stack(indices).squeeze().permute(1, 0)
    return indices


def decode_samples(sampled_indices, vectorizer):
    decoded_surnames = []
    vocab = vectorizer.char_vocab

    for sample_index in range(sampled_indices.shape[0]):
        surname = ""
        for time_step in range(sampled_indices.shape[1]):
            sample_item = sampled_indices[sample_index, time_step].item()
            if sample_item == vocab.begin_seq_index:
                continue
            elif sample_item == vocab.end_seq_index:
                break
            else:
                surname += vocab.lookup_index(sample_item)
        decoded_surnames.append(surname)
    return decoded_surnames

<IPython.core.display.Javascript object>

In [50]:
def normalize_sizes(y_pred, y_true):
    if len(y_pred.size()) == 3:
        y_pred = y_pred.contiguous().view(-1, y_pred.size(2))
    if len(y_true.size()) == 2:
        y_true = y_true.contiguous().view(-1)
    return y_pred, y_true


def compute_accuracy(y_pred, y_true, mask_index):
    y_pred, y_true = normalize_sizes(y_pred, y_true)
    _, y_pred_indices = y_pred.max(dim=1)
    correct_indices = torch.eq(y_pred_indices, y_true).float()
    valid_indices = torch.ne(y_true, mask_index).float()
    n_correct = (correct_indices * valid_indices).sum().item()
    n_valid = valid_indices.sum().item()

    return n_correct / n_valid * 100


def sequence_loss(y_pred, y_true, mask_index):
    y_pred, y_true = normalize_sizes(y_pred, y_true)
    return F.cross_entropy(y_pred, y_true, ignore_index=mask_index)

<IPython.core.display.Javascript object>

In [51]:
args = Namespace(
    # Data and path information
    surname_csv="../data/surnames/surnames_with_splits.csv",
    vectorizer_file="vectorizer.json",
    model_state_file="model.pth",
    save_dir="models/chapter07/model1_unconditioned_surname_generation",
    # Model hyper parameter
    char_embedding_size=32,
    rnn_hidden_size=32,
    # Training hyper parameter
    num_epochs=100,
    learning_rate=0.001,
    batch_size=127,
    seed=1337,
    early_stopping_criteria=5,
    # Runtime hyper parameter
    cuda=True,
    catch_keyboard_interrupt=True,
    reload_from_files=False,
    expand_filepaths_to_save_dir=True,
)

if not torch.cuda.is_available():
    args.cuda = False

args.device = torch.device("cuda" if args.cuda else "cpu")

print("Using CUDA: {}".format(args.cuda))


if args.expand_filepaths_to_save_dir:
    args.vectorizer_file = os.path.join(args.save_dir, args.vectorizer_file)

    args.model_state_file = os.path.join(args.save_dir, args.model_state_file)

# Set seed for reproducibility
utils.set_seed_everywhere(args.seed, args.cuda)

# handle dirs
utils.handle_dirs(args.save_dir)

Using CUDA: False


<IPython.core.display.Javascript object>

In [52]:
dataset = SurnameDataset.load_dataset_and_make_vectorizer(args.surname_csv)
dataset.save_vectorizer(args.vectorizer_file)
vectorizer = dataset.get_vectorizer()
classifier = SurnameGenerationModel(
    char_embedding_size=args.char_embedding_size,
    char_vocab_size=len(vectorizer.char_vocab),
    rnn_hidden_size=args.rnn_hidden_size,
    padding_idx=vectorizer.char_vocab.mask_index,
)
print(classifier)
classifer = classifier.to(args.device)
optimizer = optim.Adam(classifier.parameters(), lr=args.learning_rate)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(
    optimizer=optimizer, mode="min", factor=0.5, patience=1
)

SurnameGenerationModel(
  (char_emb): Embedding(88, 32, padding_idx=0)
  (rnn): GRU(32, 32, batch_first=True)
  (fc): Linear(in_features=32, out_features=88, bias=True)
)


<IPython.core.display.Javascript object>

In [54]:
mask_index = vectorizer.char_vocab.mask_index
train_state = utils.make_train_state(args)
epoch_bar = notebook.tqdm(desc="Training Routine", total=args.num_epochs, position=0)
dataset.set_split("train")
train_bar = notebook.tqdm(
    desc="split=train",
    total=dataset.get_num_batches(args.batch_size),
    position=1,
    leave=True,
)
dataset.set_split("val")
val_bar = notebook.tqdm(
    desc="split=val",
    total=dataset.get_num_batches(args.batch_size),
    position=1,
    leave=True,
)

for epoch_index in range(args.num_epochs):
    train_state["epoch_index"] = epoch_index
    # Iterate Over Training Dataset
    # Setup: Batch Generator, set loss & acc to 0, set train mode on
    dataset.set_split("train")
    if epoch_index == 0:
        print(
            f"============ Split={dataset._train_split}, Size={len(dataset)} ============"
        )
    batch_generator = utils.generate_batches(
        dataset, batch_size=args.batch_size, device=args.device
    )
    training_running_loss, training_running_acc = 0.0, 0.0
    classifier.train()

    for batch_index, batch_dict in enumerate(batch_generator):
        # 5 Step Training Routine

        # Step 1. Zero the Gradients
        optimizer.zero_grad()

        # Step 2. Compute the gradients
        y_pred = classifier(x_in=batch_dict["x_data"])

        # Step 3. Compute the Output
        loss = sequence_loss(y_pred, batch_dict["y_target"], mask_index=mask_index)

        # Step 4. Use loss to produce gradients
        loss.backward()

        # Step 5. Use Optimizer to take gradient step
        optimizer.step()

        # Compute the running loss and accuracy
        loss_batch = loss.item()
        training_running_loss += (loss_batch - training_running_loss) / (
            batch_index + 1
        )
        acc_batch = compute_accuracy(y_pred, batch_dict["y_target"], mask_index)
        training_running_acc += (acc_batch - training_running_acc) / (batch_index + 1)

        # Update the bar
        train_bar.set_postfix(
            loss=training_running_loss, acc=training_running_acc, epoch=epoch_index
        )
        train_bar.update()
    train_state["train_loss"].append(training_running_loss)
    train_state["train_acc"].append(training_running_acc)

    # Iterate Over Val Dataset
    # Setup: Batch Generator, set loss and acc to 0, set eval mode on
    dataset.set_split("val")
    val_running_loss, val_running_acc = 0.0, 0.0
    if len(dataset) > 0:
        if epoch_index == 0:
            print(
                f"============ Split={dataset._train_split}, Size={len(dataset)} ============"
            )
        batch_generator = utils.generate_batches(
            dataset, batch_size=args.batch_size, device=args.device
        )
        classifier.eval()

        for batch_index, batch_dict in enumerate(batch_generator):
            # Step 1. Compute the Output
            y_pred = classifier(x_in=batch_dict["x_data"])

            # Step 2. Compute the loss
            loss = sequence_loss(y_pred, batch_dict["y_target"], mask_index)
            loss_batch = loss.item()
            val_running_loss += (loss_batch - val_running_loss) / (batch_index + 1)

            # Step 3. Compute the accuracy
            acc_batch = compute_accuracy(y_pred, batch_dict["y_target"], mask_index)
            val_running_acc += (acc_batch - val_running_acc) / (batch_index + 1)
            val_bar.set_postfix(
                loss=val_running_loss, acc=val_running_acc, epoch=epoch_index
            )
            val_bar.update()
        train_state["val_loss"].append(val_running_loss)
        train_state["val_acc"].append(val_running_acc)
        scheduler.step(train_state["val_loss"][-1])
    else:
        if epoch_index == 0:
            print(f"============ Skipping Validation Pass ============")
        train_state["val_loss"].append(val_running_loss)
        train_state["val_acc"].append(val_running_acc)
        scheduler.step(train_state["train_loss"][-1])
    train_state = utils.update_train_state(
        args=args, model=classifier, train_state=train_state
    )

    train_bar.n, val_bar.n = 0, 0
    epoch_bar.update()

    if train_state["stop_early"]:
        print("Stopping early....")
        break

    if epoch_index % 10 == 0:
        print(
            f"--------------- {epoch_index}th Epoch Stats---------------\n"
            f"Training Loss={training_running_loss}, "
            f"Training Accuracy={training_running_acc}\n"
            f"Validation Loss={val_running_loss}, "
            f"Validation Accuracy={val_running_acc}.\n"
            "------------------------------------------------------------"
        )

Training Routine:   0%|          | 0/100 [00:00<?, ?it/s]

split=train:   0%|          | 0/60 [00:00<?, ?it/s]

split=val:   0%|          | 0/12 [00:00<?, ?it/s]

--------------- 0th Epoch Stats---------------
Training Loss=3.191117993990579, Training Accuracy=16.067885805767308
Validation Loss=3.0385606686274214, Validation Accuracy=18.485038965492375.
------------------------------------------------------------
--------------- 10th Epoch Stats---------------
Training Loss=2.629402653376261, Training Accuracy=23.857159282774653
Validation Loss=2.632326662540436, Validation Accuracy=23.492179186107773.
------------------------------------------------------------
--------------- 20th Epoch Stats---------------
Training Loss=2.571993613243103, Training Accuracy=24.511945207309108
Validation Loss=2.5790288845698037, Validation Accuracy=24.43093176767602.
------------------------------------------------------------
--------------- 30th Epoch Stats---------------
Training Loss=2.546698451042175, Training Accuracy=25.0946113805576
Validation Loss=2.5679496328035993, Validation Accuracy=24.807333548733393.
----------------------------------------------

<IPython.core.display.Javascript object>

In [55]:
np.random.choice(np.arange(len(vectorizer.nationality_vocab)), replace=True, size=2)

array([8, 7])

<IPython.core.display.Javascript object>

In [58]:
classifier.load_state_dict(torch.load(train_state["model_filename"]))

model = classifier.to(args.device)

dataset.set_split("test")

batch_generator = utils.generate_batches(
    dataset, batch_size=args.batch_size, device=args.device
)
running_acc, running_loss = 0, 0
model.eval()

for batch_index, batch_dict in enumerate(batch_generator):
    y_pred = model(x_in=batch_dict["x_data"])
    loss = sequence_loss(y_pred, batch_dict["y_target"], mask_index)
    running_loss += (loss.item() - running_loss) / (batch_index + 1)
    acc_batch = compute_accuracy(y_pred, batch_dict["y_target"], mask_index)
    running_acc += (acc_batch - running_acc) / (batch_index + 1)
train_state["test_loss"] = running_loss
train_state["test_acc"] = running_acc

<IPython.core.display.Javascript object>

In [59]:
print("Test loss: {};".format(train_state["test_loss"]))
print("Test Accuracy: {}".format(train_state["test_acc"]))

Test loss: 2.551867961883545;
Test Accuracy: 25.28460472071609


<IPython.core.display.Javascript object>

In [68]:
num_names = 10
model = model.cpu()
samples_surnames = decode_samples(
    sample_from_model(model, vectorizer, num_samples=num_names), vectorizer
)
print("-" * 15)
for i in range(num_names):
    print(samples_surnames[i])

---------------
Afleen
Punuoir
Tufles
Celog
Meas
Kurn
Lasselar
Dardskgre
Kamtas
Jreano


<IPython.core.display.Javascript object>