# Transducer implementation in PyTorch

*by Loren Lugosch*



In this notebook, we will implement a Transducer sequence-to-sequence model for inserting missing vowels into a sentence.

Example: ("W wll mplmnt sm cd." --> "We will implement some code.")

*Idea: we can change the target sentences to be specific to a domain.*


Default: ("Hll, Wrld" --> "Hello, World").
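
As a minimal sketch of how the input/output pairs for this task can be produced, the snippet below strips the vowels from a target sentence to obtain the model input. `strip_vowels` is a hypothetical helper name used only for illustration; the same character filter appears later in `TextDataset.__getitem__`.

```python
def strip_vowels(line):
    # Hypothetical helper: drop ASCII vowels (both cases), keep every other character.
    return "".join(c for c in line if c not in "AEIOUaeiou")

target = "Hello, World"
source = strip_vowels(target)
print(source)  # Hll, Wrld
```

The model is then trained to map `source` back to `target`.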

In [82]:
import torch
import string
import numpy as np
import itertools
from collections import Counter
from tqdm import tqdm
!pip install unidecode
import unidecode


"""
DIRECTIONS: Aside from the given training data, find one other open-source dataset.
<15 min>
"""
# 1. Default training data.
!wget https://raw.githubusercontent.com/lorenlugosch/infer_missing_vowels/master/data/train/war_and_peace.txt
!pwd

# 2. Find a second training dataset.
!wget https://www.gutenberg.org/ebooks/20228.txt.utf-8
!mv 20228.txt.utf-8 rizal.txt
!pwd

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
--2022-09-30 23:13:43--  https://raw.githubusercontent.com/lorenlugosch/infer_missing_vowels/master/data/train/war_and_peace.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3196229 (3.0M) [text/plain]
Saving to: ‘war_and_peace.txt.11’


2022-09-30 23:13:44 (300 MB/s) - ‘war_and_peace.txt.11’ saved [3196229/3196229]

/content
--2022-09-30 23:13:44--  https://www.gutenberg.org/ebooks/20228.txt.utf-8
Resolving www.gutenberg.org (www.gutenberg.org)... 152.19.134.47, 2610:28:3090:3000:0:bad:cafe:47
Connecting to www.gutenberg.org (www.gutenberg.org)|152.19.134.47|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://www.gutenberg

# Building blocks

First, we will define the encoder, predictor, and joiner using standard neural nets.

<img src="https://lorenlugosch.github.io/images/transducer/transducer-model.png" width="25%">

In [83]:
"""
RHETORICAL Q: How does changing these numbers affect the performance of the model?
In training? In testing? In after-paper performance?
"""
NULL_INDEX = 0

encoder_dim = 1024
predictor_dim = 1024
joiner_dim = 1024

The encoder is any network that can take as input a variable-length sequence: so, RNNs, CNNs, and self-attention/Transformer encoders will all work.


In [84]:
class Encoder(torch.nn.Module):
  def __init__(self, num_inputs):
    """
    @num_inputs: number of distinct input symbols (vocabulary size, including the null/padding index)

    DIRECTIONS: complete the variables for the input_size, hidden_size, and bidirectional arguments in self.rnn
    <3 min>
    """
    super(Encoder, self).__init__()
    self.embed = torch.nn.Embedding(num_inputs, encoder_dim)
    self.rnn = torch.nn.GRU(input_size=encoder_dim, hidden_size=encoder_dim, num_layers=3, batch_first=True, bidirectional=True, dropout=0.1)
    self.linear = torch.nn.Linear(encoder_dim*2, joiner_dim)

  def forward(self, x):
    out = x
    out = self.embed(out)
    out = self.rnn(out)[0]
    out = self.linear(out)
    return out
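
The `encoder_dim*2` in the encoder's linear layer comes from the bidirectional GRU, which concatenates the forward and backward hidden states at each timestep. A minimal shape check, assuming small stand-in sizes (`hidden` and `vocab` are illustrative values, not the notebook's settings):

```python
import torch

torch.manual_seed(0)
hidden = 8   # small stand-in for encoder_dim
vocab = 5    # stand-in for num_inputs

embed = torch.nn.Embedding(vocab, hidden)
rnn = torch.nn.GRU(input_size=hidden, hidden_size=hidden, num_layers=1,
                   batch_first=True, bidirectional=True)

x = torch.randint(0, vocab, (2, 7))   # (B=2, T=7) symbol indices
out, _ = rnn(embed(x))                # forward and backward states are concatenated
print(out.shape)                      # torch.Size([2, 7, 16])
```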

The predictor is any _causal_ network (= can't look at the future): in other words, unidirectional RNNs, causal convolutions, or masked self-attention. 
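
To make "causal" concrete, here is a small sketch (the sizes are arbitrary assumptions) that changes only the last timestep of an input and checks whether the outputs at earlier timesteps change. For the unidirectional GRU they should stay identical; for the bidirectional GRU they should not, because the backward direction reads the future.

```python
import torch

torch.manual_seed(0)
dim = 4
uni = torch.nn.GRU(dim, dim, batch_first=True)                      # causal
bi = torch.nn.GRU(dim, dim, batch_first=True, bidirectional=True)   # not causal

a = torch.randn(1, 5, dim)
b = a.clone()
b[0, -1] += 1.0   # modify only the final timestep

with torch.no_grad():
    # Compare the outputs at all timesteps except the last one.
    uni_same = torch.allclose(uni(a)[0][:, :-1], uni(b)[0][:, :-1])
    bi_same = torch.allclose(bi(a)[0][:, :-1], bi(b)[0][:, :-1])

print(uni_same, bi_same)
```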

In [85]:
class Predictor(torch.nn.Module):
  def __init__(self, num_outputs):
    """
    @num_outputs: number of distinct output labels (vocabulary size, including the null index)

    DIRECTIONS: complete the variables for the input_size and hidden_size arguments in self.rnn
                complete the arguments to torch.nn.Linear in self.linear (hint: 2 required)
    <3 min>
    """
    super(Predictor, self).__init__()
    self.embed = torch.nn.Embedding(num_outputs, predictor_dim)
    self.rnn = torch.nn.GRUCell(input_size=predictor_dim, hidden_size=predictor_dim)
    self.linear = torch.nn.Linear(predictor_dim, joiner_dim)  # input is the predictor's hidden state
    
    self.initial_state = torch.nn.Parameter(torch.randn(predictor_dim))
    self.start_symbol = NULL_INDEX # In the original paper, a vector of 0s is used; just using the null index instead is easier when using an Embedding layer.


  def forward_one_step(self, input, previous_state):
    """
    Runs the predictor for a single output step. Used by forward (one call per position u) and by greedy_search.
    @input: index of the previous output symbol, shape (B,)
    @previous_state: GRU hidden state before this step, shape (B, predictor_dim)
    """
    embedding = self.embed(input)
    state = self.rnn.forward(embedding, previous_state)
    out = self.linear(state)
    return out, state


  def forward(self, y):
    """
    @y: label tensor of shape (B, U_max)

    DIRECTIONS: complete the variables for batch_size and U (hint: utilize how y is formatted)
                complete the variable in the for loop, i.e. replace the 'None'
    <5 min>
    """
    batch_size = y.shape[0]
    U = y.shape[1]
    outs = []
    state = torch.stack([self.initial_state] * batch_size).to(y.device)
    for u in range(U+1): # hint: we want to get the NULL output for the final timestep 
      if u == 0:
        decoder_input = torch.tensor([self.start_symbol] * batch_size).to(y.device)
      else:
        decoder_input = y[:,u-1]
      out, state = self.forward_one_step(decoder_input, state)
      outs.append(out)
    out = torch.stack(outs, dim=1)
    return out

The joiner is a feedforward network/MLP with one hidden layer applied independently to each $(t,u)$ index.

(The linear part of the hidden layer is contained in the encoder and predictor, so we just do the nonlinearity here and then the output layer.)
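
Applying the joiner to every $(t,u)$ index relies on broadcasting: the encoder output gets a singleton axis for $u$, the predictor output gets a singleton axis for $t$, and their sum fills the whole grid. A minimal sketch with made-up sizes (`D` stands in for `joiner_dim`):

```python
import torch

B, T, U, D = 2, 5, 3, 8   # hypothetical sizes
encoder_out = torch.randn(B, T, D)        # one vector per input step t
predictor_out = torch.randn(B, U + 1, D)  # one vector per output position u = 0..U

# (B, T, 1, D) + (B, 1, U+1, D) broadcasts to (B, T, U+1, D)
grid = encoder_out.unsqueeze(2) + predictor_out.unsqueeze(1)
print(grid.shape)  # torch.Size([2, 5, 4, 8])

# Entry (t, u) is the sum of the t-th encoder vector and the u-th predictor vector.
same = torch.equal(grid[0, 3, 2], encoder_out[0, 3] + predictor_out[0, 2])
print(same)  # True
```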

In [86]:
class Joiner(torch.nn.Module):
  def __init__(self, num_outputs):
    """
    @num_outputs: size of softmax output over all labels
    """
    super(Joiner, self).__init__()
    self.linear = torch.nn.Linear(joiner_dim, num_outputs)

  def forward(self, encoder_out, predictor_out):
    """
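    @encoder_out: encoder output, broadcastable against predictor_out, e.g. (B, T, 1, joiner_dim)
    @predictor_out: predictor output, broadcastable against encoder_out, e.g. (B, 1, U+1, joiner_dim)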

    DIRECTIONS:    choose and apply a nonlinear function of your choice
    RHETORICAL Q:  why do we add nonlinearity in a neural network?
    <5 min>
    """
    out = encoder_out + predictor_out
    out = torch.nn.functional.relu(out)
    out = self.linear(out)
    return out

In [87]:


print(torch.cuda.is_available())
print(torch.cuda.current_device())
print(torch.cuda.get_device_name(0))



True
0
Tesla T4


# Transducer model + loss function

Using the encoder, predictor, and joiner, we will implement the Transducer model and its loss function.

<img src="https://lorenlugosch.github.io/images/transducer/forward-messages.png" width="25%">

We can use a simple PyTorch implementation of the loss function, relying on automatic differentiation to give us gradients.
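
As a sketch of what the loss computes (the notation here is an assumption for illustration): write $\varnothing(t,u)$ for the probability of emitting null at grid position $(t,u)$ and $y(t,u)$ for the probability of emitting the target label at index $u$ there (`y[:, u]` in the code), with 0-indexed $t$ and $u$. The forward variable $\alpha(t,u)$ then satisfies

$$
\alpha(0,0) = 1, \qquad \alpha(t,u) = \alpha(t-1,u)\,\varnothing(t-1,u) + \alpha(t,u-1)\,y(t,u-1)
$$

where a term with a negative index is dropped, and the probability of the whole output sequence includes the final null emission:

$$
p(\mathbf{y} \mid \mathbf{x}) = \alpha(T-1,U)\,\varnothing(T-1,U)
$$

In log space the sum of the two terms becomes a logsumexp:

$$
\log\alpha(t,u) = \operatorname{logsumexp}\big(\log\alpha(t-1,u) + \log\varnothing(t-1,u),\; \log\alpha(t,u-1) + \log y(t,u-1)\big)
$$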

In [96]:
class Transducer(torch.nn.Module):
  def __init__(self, num_inputs, num_outputs):
    super(Transducer, self).__init__()
    self.encoder = Encoder(num_inputs)
    self.predictor = Predictor(num_outputs)
    self.joiner = Joiner(num_outputs)

    if torch.cuda.is_available(): self.device = torch.cuda.current_device()
    else: self.device = "cpu"
    self.to(self.device)

  def compute_forward_prob(self, joiner_out, T, U, y):
    """
    @joiner_out: tensor of shape (B, T_max, U_max+1, num_labels)
    @T: list of input lengths
    @U: list of output lengths 
    @y: label tensor (B, U_max)

    DIRECTIONS: draw out a couple iterations of the nested for loop below
    <15 min>
    """
    B = joiner_out.shape[0]  # batch size
    T_max = joiner_out.shape[1]
    U_max = joiner_out.shape[2] - 1
    log_alpha = torch.zeros(B, T_max, U_max+1).to(joiner_out.device)  # avoid relying on the global `model`
    for t in range(T_max):
      for u in range(U_max+1):
          if u == 0:
            if t == 0:
              log_alpha[:, t, u] = 0.

            else: #t > 0
              log_alpha[:, t, u] = log_alpha[:, t-1, u] + joiner_out[:, t-1, 0, NULL_INDEX] 
                  
          else: #u > 0
            if t == 0:
              log_alpha[:, t, u] = log_alpha[:, t,u-1] + torch.gather(joiner_out[:, t, u-1], dim=1, index=y[:,u-1].view(-1,1) ).reshape(-1)
            
            else: #t > 0
              log_alpha[:, t, u] = torch.logsumexp(torch.stack([
                  log_alpha[:, t-1, u] + joiner_out[:, t-1, u, NULL_INDEX],
                  log_alpha[:, t, u-1] + torch.gather(joiner_out[:, t, u-1], dim=1, index=y[:,u-1].view(-1,1) ).reshape(-1)
              ]), dim=0)
    
    log_probs = []
    for b in range(B):
      log_prob = log_alpha[b, T[b]-1, U[b]] + joiner_out[b, T[b]-1, U[b], NULL_INDEX]
      log_probs.append(log_prob)
    log_probs = torch.stack(log_probs) 
    return log_probs # log-probability of each (x, y) pair in the batch, shape (B,)

  def compute_loss(self, x, y, T, U):
    """
    @x: input/training tensor
    @y: label tensor
    @T: list of the length of input sequences
    @U: list of the length of output sequences
    """
    encoder_out = self.encoder.forward(x)
    predictor_out = self.predictor.forward(y)
    joiner_out = self.joiner.forward(encoder_out.unsqueeze(2), predictor_out.unsqueeze(1)).log_softmax(3)
    loss = -self.compute_forward_prob(joiner_out, T, U, y).mean()
    return loss

Let's first verify that the forward algorithm actually correctly computes the sum (in log space, the [logsumexp](https://lorenlugosch.github.io/posts/2020/06/logsumexp/)) of all possible alignments, using a short input/output pair for which computing all possible alignments is feasible.

<img src="https://lorenlugosch.github.io/images/transducer/cat-align-1.png" width="25%">
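
Enumeration is only feasible for tiny examples because the number of alignments grows combinatorially. A minimal sketch, assuming the same sizes as the "CAT" example (4 input steps, 3 labels), that counts the distinct alignments in two ways:

```python
import itertools
import math

T, U = 4, 3   # 4 input steps, 3 labels ("CAT")

# Each alignment interleaves T-1 "right" moves (0 = null) with U "down" moves (1 = label).
# The final null emitted at (T-1, U) is forced, so it is not part of the interleaving.
moves = [0] * (T - 1) + [1] * U
distinct = set(itertools.permutations(moves))

print(len(distinct))            # 20
print(math.comb(T - 1 + U, U))  # 20
```

The closed form is the binomial coefficient $\binom{T-1+U}{U}$, which is why the forward algorithm is needed for realistic sequence lengths.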

In [89]:
def compute_single_alignment_prob(self, encoder_out, predictor_out, T, U, z, y):
    """
    Computes the probability of one alignment, z.
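    @encoder_out: encoder output for one example, shape (T, joiner_dim)
    @predictor_out: predictor output for one example, shape (U+1, joiner_dim)
    @T: length of the input sequence (int)
    @U: length of the output sequence (int)
    @z: one alignment, a sequence of T-1 zeros (emit null, move to the next input step) and U ones (emit the next label)
    @y: label tensor of shape (U,)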

    DIRECTIONS: write a brief description of the argument 'z' above
                complete the variables for t_indices and u_indices
    <5 min>
    """
    t = 0; u = 0
    t_u_indices = []
    y_expanded = []
    for step in z:
      t_u_indices.append((t,u))
      if step == 0: # right (null)
        y_expanded.append(NULL_INDEX)
        t += 1
      if step == 1: # down (label)
        y_expanded.append(y[u])
        u += 1
    t_u_indices.append((T-1,U))
    y_expanded.append(NULL_INDEX)

    t_indices = [t for (t,u) in t_u_indices]
    u_indices = [u for (t,u) in t_u_indices]
    encoder_out_expanded = encoder_out[t_indices]
    predictor_out_expanded = predictor_out[u_indices]
    joiner_out = self.joiner.forward(encoder_out_expanded, predictor_out_expanded).log_softmax(1)
    logprob = -torch.nn.functional.nll_loss(input=joiner_out, target=torch.tensor(y_expanded).long().to(self.device), reduction="sum")
    return logprob

Transducer.compute_single_alignment_prob = compute_single_alignment_prob

In [90]:
# Generate example inputs/outputs
num_outputs = len(string.ascii_uppercase) + 1 # [null, A, B, ... Z]
model = Transducer(1, num_outputs)
y_letters = "CAT"
y = torch.tensor([string.ascii_uppercase.index(l) + 1 for l in y_letters]).unsqueeze(0).to(model.device)
T = torch.tensor([4]); U = torch.tensor([len(y_letters)]); B = 1

encoder_out = torch.randn(B, T, joiner_dim).to(model.device)
predictor_out = torch.randn(B, U+1, joiner_dim).to(model.device)
joiner_out = model.joiner.forward(encoder_out.unsqueeze(2), predictor_out.unsqueeze(1)).log_softmax(3)

#######################################################
# Compute loss by enumerating all possible alignments #
#######################################################
all_permutations = list(itertools.permutations([0]*(T-1) + [1]*U))
all_distinct_permutations = list(Counter(all_permutations).keys())
alignment_probs = []
for z in all_distinct_permutations:
  alignment_prob = model.compute_single_alignment_prob(encoder_out[0], predictor_out[0], T.item(), U.item(), z, y[0])
  alignment_probs.append(alignment_prob)
loss_enumerate = -torch.tensor(alignment_probs).logsumexp(0)

#######################################################
# Compute loss using the forward algorithm            #
#######################################################
loss_forward = -model.compute_forward_prob(joiner_out, T, U, y)

print("Loss computed by enumerating all possible alignments: ", loss_enumerate)
print("Loss computed using the forward algorithm: ", loss_forward)

Loss computed by enumerating all possible alignments:  tensor(19.7130)
Loss computed using the forward algorithm:  tensor(19.7130, device='cuda:0', grad_fn=<NegBackward0>)


Now let's add the greedy search algorithm for predicting an output sequence.

---



(Note that I've assumed we're using RNNs for the predictor here. You would have to modify this code a bit if you want to use convolutions/self-attention instead.) 
<br/><br/>
<img src="https://lorenlugosch.github.io/images/transducer/greedy-search.png" width="50%">
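
The control flow of greedy search can be sketched without any neural network. The toy below walks a hand-written score table; `greedy_walk` and `scores` are hypothetical names for illustration. In the real model the scores at $(t,u)$ depend on the labels emitted so far through the predictor state, so they cannot be precomputed like this.

```python
NULL = 0

def greedy_walk(scores, T, U_max):
    """Walk the (t, u) grid greedily; scores[t][u] is a made-up list of per-label scores."""
    t, u, labels = 0, 0, []
    while t < T and u < U_max:
        row = scores[t][u]
        best = max(range(len(row)), key=row.__getitem__)  # argmax over labels
        if best == NULL:
            t += 1                # null: move on to the next input step
        else:
            labels.append(best)   # label: emit it and stay on the same input step
            u += 1
    return labels

# scores[t][u][k]: score of label k at position (t, u); index 0 is null.
scores = [
    [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1], [0.8, 0.1, 0.1]],
    [[0.5, 0.25, 0.25], [0.2, 0.1, 0.7], [0.9, 0.05, 0.05]],
]
print(greedy_walk(scores, T=2, U_max=3))  # [1, 2]
```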

In [98]:
"""
DIRECTIONS: YOU DO NOT NEED TO IMPLEMENT BEAM SEARCH
This is an *opportunity* to implement beam search. The greedy search
code is provided here, but it can be improved algorithmically. For now,
use the greedy search code to check that everything is working.

<might take a while>
"""


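def greedy_search(self, x, T):
  y_batch = []
  B = len(x)
  encoder_out = self.encoder.forward(x)
  U_max = 200
  for b in range(B):
    t = 0; u = 0; y = [self.predictor.start_symbol]; predictor_state = self.predictor.initial_state.unsqueeze(0)
    # Compute g_0 from the start symbol. The predictor only advances when a label is emitted,
    # which matches training, where the predictor output at u depends only on y[:u].
    predictor_input = torch.tensor([ y[-1] ]).to(x.device)
    g_u, predictor_state = self.predictor.forward_one_step(predictor_input, predictor_state)
    while t < T[b] and u < U_max:
      f_t = encoder_out[b, t]
      h_t_u = self.joiner.forward(f_t, g_u)
      argmax = h_t_u.max(-1)[1].item()
      if argmax == NULL_INDEX:
        t += 1 # move to the next input step; keep g_u and the predictor state unchanged
      else: # argmax == a label
        u += 1
        y.append(argmax)
        predictor_input = torch.tensor([ y[-1] ]).to(x.device)
        g_u, predictor_state = self.predictor.forward_one_step(predictor_input, predictor_state)
    y_batch.append(y[1:]) # remove start symbol
  return y_batch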

Transducer.greedy_search = greedy_search



In [92]:
!pip install speechbrain
from speechbrain.nnet.loss.transducer_loss import TransducerLoss
transducer_loss = TransducerLoss(0)

def compute_loss(self, x, y, T, U):
    encoder_out = self.encoder.forward(x)
    predictor_out = self.predictor.forward(y)
    joiner_out = self.joiner.forward(encoder_out.unsqueeze(2), predictor_out.unsqueeze(1)).log_softmax(3)
    #loss = -self.compute_forward_prob(joiner_out, T, U, y).mean()
    T = T.to(joiner_out.device)
    U = U.to(joiner_out.device)
    loss = transducer_loss(joiner_out, y, T, U) #, blank_index=NULL_INDEX, reduction="mean")
    return loss

Transducer.compute_loss = compute_loss

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


# Some utilities

Here we will add a bit of boilerplate code for training and loading data.

In [93]:
class TextDataset(torch.utils.data.Dataset):
  def __init__(self, lines, batch_size):
    """
    @lines: list of strings
    """
    lines = list(filter(("\n").__ne__, lines))

    self.lines = lines 
    collate = Collate()
    self.loader = torch.utils.data.DataLoader(self, batch_size=batch_size, num_workers=1, shuffle=True, collate_fn=collate)

  def __len__(self):
    return len(self.lines)

  def __getitem__(self, idx):
    line = self.lines[idx].replace("\n", "")
    line = unidecode.unidecode(line) # transliterate non-ASCII characters to their closest ASCII equivalents
    x = "".join(c for c in line if c not in "AEIOUaeiou") # remove vowels from input
    y = line
    return (x,y)

def encode_string(s):
  """
  @s: string
  """
  for c in s:
    if c not in string.printable:
      print(s)
  return [string.printable.index(c) + 1 for c in s]

def decode_labels(l):
  """
  @l: list of labels
  """
  return "".join([string.printable[c - 1] for c in l])


class Collate:
  def __call__(self, batch):
    """
    Returns a minibatch of strings, encoded as labels and padded to have the same length.
    @batch: list of tuples (input string, output string)

    DIRECTIONS: after obtaining results from training on the default text, train on your second training text 
    <10 min>
    """
    x = []; y = []
    batch_size = len(batch)
    for index in range(batch_size):
      x_,y_ = batch[index]
      x.append(encode_string(x_))
      y.append(encode_string(y_))

    # pad all sequences to have same length
    T = [len(x_) for x_ in x]
    U = [len(y_) for y_ in y]
    T_max = max(T)
    U_max = max(U)
    for index in range(batch_size):
      x[index] += [NULL_INDEX] * (T_max - len(x[index]))
      x[index] = torch.tensor(x[index])
      y[index] += [NULL_INDEX] * (U_max - len(y[index]))
      y[index] = torch.tensor(y[index])

    # stack into single tensor
    x = torch.stack(x)
    y = torch.stack(y)
    T = torch.tensor(T)
    U = torch.tensor(U)

    return (x,y,T,U)

with open("rizal.txt", "r", encoding="utf-8") as f:
  lines = f.readlines()

end = round(0.9 * len(lines))
train_lines = lines[:end]
test_lines = lines[end:]
train_set = TextDataset(train_lines, batch_size=64) #8)
test_set = TextDataset(test_lines, batch_size=64) #8)
train_set[0]

('Th Prjct Gtnbrg Bk f Nl M Tngr, by Js Rzl',
 'The Project Gutenberg EBook of Noli Me Tangere, by Jose Rizal')
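
The padding step in `Collate` can also be expressed with PyTorch's `pad_sequence` utility. This is a sketch of an alternative, not the notebook's method; the label sequences below are made up.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

NULL_INDEX = 0
encoded = [torch.tensor([5, 3, 9]), torch.tensor([7]), torch.tensor([2, 2])]  # made-up label sequences

# True lengths are kept separately so the loss can ignore the padding.
lengths = torch.tensor([len(seq) for seq in encoded])
padded = pad_sequence(encoded, batch_first=True, padding_value=NULL_INDEX)

print(padded)
# tensor([[5, 3, 9],
#         [7, 0, 0],
#         [2, 2, 0]])
print(lengths)  # tensor([3, 1, 2])
```

Reusing the null index as the padding value is safe here only because the true lengths `T` and `U` are passed to the loss alongside the padded tensors.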

In [94]:
class Trainer:
  def __init__(self, model, lr):
    self.model = model
    self.lr = lr
    self.optimizer = torch.optim.Adam(model.parameters(), lr=self.lr)
  
  def train(self, dataset, print_interval = 20):
    train_loss = 0
    num_samples = 0
    self.model.train()
    pbar = tqdm(dataset.loader)
    for idx, batch in enumerate(pbar):
      x,y,T,U = batch
      x = x.to(self.model.device); y = y.to(self.model.device)
      batch_size = len(x)
      num_samples += batch_size
      loss = self.model.compute_loss(x,y,T,U)
      self.optimizer.zero_grad()
      pbar.set_description("%.2f" % loss.item())
      loss.backward()
      self.optimizer.step()
      train_loss += loss.item() * batch_size
      if idx % print_interval == 0:
        self.model.eval()
        guesses = self.model.greedy_search(x,T)
        self.model.train()
        print("\n")
        for b in range(min(2, batch_size)): # the last batch may hold fewer than 2 samples
          print("input:", decode_labels(x[b,:T[b]]))
          print("guess:", decode_labels(guesses[b]))
          print("truth:", decode_labels(y[b,:U[b]]))
          print("")
    train_loss /= num_samples
    return train_loss

  @torch.no_grad()  # evaluation needs no gradients; this saves memory and time
  def test(self, dataset, print_interval=1):
    test_loss = 0
    num_samples = 0
    self.model.eval()
    pbar = tqdm(dataset.loader)
    for idx, batch in enumerate(pbar):
      x,y,T,U = batch
      x = x.to(self.model.device); y = y.to(self.model.device)
      batch_size = len(x)
      num_samples += batch_size
      loss = self.model.compute_loss(x,y,T,U)
      pbar.set_description("%.2f" % loss.item())
      test_loss += loss.item() * batch_size
      if idx % print_interval == 0:
        print("\n")
        print("input:", decode_labels(x[0,:T[0]]))
        print("guess:", decode_labels(self.model.greedy_search(x,T)[0]))
        print("truth:", decode_labels(y[0,:U[0]]))
        print("")
    test_loss /= num_samples
    return test_loss
    

# Training the model

Now we will train the model. During training, sample output sequences are printed every 20 batches (the default `print_interval`).

In [99]:
num_chars = len(string.printable)
model = Transducer(num_inputs=num_chars+1, num_outputs=num_chars+1)
trainer = Trainer(model=model, lr=0.0003)

num_epochs = 1
train_losses=[]
test_losses=[]

for epoch in range(num_epochs):
    train_loss = trainer.train(train_set)
    test_loss = trainer.test(test_set)
    train_losses.append(train_loss)
    test_losses.append(test_loss)
    print("Epoch %d: train loss = %f, test loss = %f" % (epoch, train_loss, test_loss))

413.28:   0%|          | 1/270 [00:23<1:46:21, 23.72s/it]



input: knttln n~g dyng png-gyn s snggl n llk. ng
guess: 
truth: kinatatalian n~g duyang pinag-uuguyan sa sanggol na lalaki. Ang

input: Mpnglw t ng-sp-sp n~g sy'y msmpng n~g Cptn Gnrl.
guess: 
truth: Mapanglaw at nag-iisip-isip n~g siya'y masumpong n~g Capitan General.



190.40:   8%|▊         | 21/270 [07:59<1:38:29, 23.73s/it]



input: clgyn t hnd c mnglng n c'y mnglpypy: bnnt cng
guess: a
truth: calagayan at hindi co minagaling na aco'y manglupaypay: binanta cong

input: c rn ng cmcn!--ng sngt nt n~g gyn dn ny n~g
guess: a
truth: aco rin ang cumacain!--ang isinagot nito n~g gayon din anyo n~g



165.58:  15%|█▌        | 41/270 [15:39<1:33:26, 24.48s/it]



input: bng clptn n~g trbnl n~g nqscng ngprtng n sy'y
guess: n~g cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang c
truth: boong calupitan n~g tribunal n~g Inquisiciong nagparatang na siya'y

input: mcgy'y blgtd ng bbn~g n~g nyng lht n m~g pgppgl:
guess: nang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang 
truth: macagayo'y baligtad ang ibubun~ga n~g inyong lahat na m~ga pagpapagal:



157.31:  23%|██▎       | 61/270 [23:21<1:25:42, 24.60s/it]



input: Tnwg n~g Cptn Gnrl ng cnyng ydnt.
guess: nang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang cang 
truth: Tinawag n~g Capitan General ang canyang ayudante.

input: hmhdlng s pgsss, t spgc't mlk ng chgtn n~g tlsn
guess: --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
truth: humahadlang sa pagsisisi, at sapagca't malaki ang cahigtan n~g tulisan



165.39:  30%|███       | 81/270 [31:06<1:17:56, 24.75s/it]



input: s kn, cng d c sn ngsnn~glng n~g cw y tnttnng c.
guess: nang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang 
truth: sa akin, cung di ca sana nagsinun~galing n~g icaw ay tinatatanong co.

input: n tg bng lpn, n tntcpn ntn n~g pgc ndlntng t ng
guess: nang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang 
truth: na taga ibang lupain, na tinatacpan natin n~g pagca indolenteng ito ang



133.92:  37%|███▋      | 101/270 [38:51<1:09:00, 24.50s/it]



input: s cmy n Mr Cr, n pmsc n hls hnd mchcbng t kmng
guess: nang na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na 
truth: sa camay ni Maria Ciara, na pumasoc na halos hindi macahacbang at kiming

input: lp't hnpn nny rn ng bng gntng tng kncln~gn!
guess: nang na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na 
truth: lupai't hanapin ninyo roon ang ibang guintong ating kinacailan~gan!



128.79:  45%|████▍     | 121/270 [46:29<1:00:07, 24.21s/it]



input: pgsscpn n n~g yng m~g nc n hmrp t sspt ng slpng
guess: inagpagang pagpagang pagpagang pagpagang pagpagang pagpagang pagpagang pagpagang pagpagang pagpagang pagpagang pagpagang pagpagang pagpagang pagpagang pagpagang pagpagang pagpagang pagpagang pagpagang
truth: pagsisicapan na n~g iyong m~ga anac na humarap at isisipot ang salaping

input: knllgyn n~g spn t Prtgl.
guess: inang canyang canyang canyang canyang canyang canyang canyang canyang canyang canyang canyang canyang canyang canyang canyang canyang canyang canyang canyang canyang canyang canyang canyang canyang ca
truth: kinalalagyan n~g Espana at Portugal.



143.25:  52%|█████▏    | 141/270 [54:03<50:42, 23.59s/it]



input: =XLX.=
guess: ----ang m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~
truth: =XLIX.=

input: pnghgnt c cy s pmmg-tn n~g py, n~g dg t n~g kng
guess: n~g m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m
truth: ipanghiganti co cayo sa pamamag-itan n~g apoy, n~g dugo at n~g aking



102.54:  60%|█████▉    | 161/270 [1:01:42<43:43, 24.07s/it]



input: hnd tglg, hnd ltn, hnd nsc t b p. ?ng m~g frl cy
guess: nananang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang mang m
truth: hindi tagalog, hindi latin, hindi insic at iba pa. ?Ang m~ga fraile caya

input: --?S cl m cy?--ng tnng n~g mlt n ngttc.--Sbhn mng
guess: ----ang magang magang magang magang magang magang magang magang magang magang magang magang magang magang magang magang magang magang magang magang magang magang magang magang magang magang magang mag
truth: --?Sa acala mo caya?--ang tanong n~g maliit na nagtataca.--Sabihin mong



140.26:  67%|██████▋   | 181/270 [1:09:19<35:25, 23.88s/it]



input: dlg ng m~g tlbs n~g clbz, hnhmy ng m~g ptn t
guess: mang m~ga casang m~ga casang m~ga casang m~ga casang m~ga casang m~ga casang m~ga casang m~ga casang m~ga casang m~ga casang m~ga casang m~ga casang m~ga casang m~ga casang m~ga casang m~ga casang m~g
truth: dalaga ang m~ga talbos n~g calabaza, hinihimay ang m~ga patani at

input: ?bkt? Dyt't ?hnd n~g cy mngyyrng mgcys ng pgsnt s
guess: malalang m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m
truth: ?bakit? Diyata't ?hindi n~ga caya mangyayaring magcaayos ang pagsinta sa



128.69:  74%|███████▍  | 201/270 [1:17:05<27:48, 24.19s/it]



input: Hnpls n brr ng cnyng n.
guess: pagpagpagpagpagpagpagpagpagpagpagpagpagpagpagaling sa sa sa sa sa sa sa sa sa sa sa sa sa sa sa sa sa sa sa sa sa sa sa sa sa sa sa sa sa sa sa sa sa sa sa sa sa sa sa sa sa sa sa sa sa sa sa sa sa sa
truth: Hinaplos ni Ibarra ang canyang noo.

input: N~g mclmps ng sndlng hnd pg-mc n gnmt n ls s
guess: m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga 
truth: N~g macalampas ang sandaling hindi pag-imic na guinamit ni Elias sa



88.75:  82%|████████▏ | 221/270 [1:24:34<19:06, 23.41s/it]



input: --?Bkt cy myc? _?bnm gntm sms?_[264]
guess: ---ang ang ang ang ang ang ang ang ang ang ang ang ang ang ang ang ang ang ang ang ang ang ang ang ang ang ang ang ang ang ang ang ang ang ang ang ang ang ang ang ang ang ang ang ang ang ang ang ang a
truth: --?Bakit cayo umiiyac? _?Ubinam gentium sumus?_[264]

input: Smntlng nngyyr t'y sy nmng pgdtng n cptng Tg n
guess: ang canyang canyang canyang canyang canyang canyang canyang canyang canyang canyang canyang canyang canyang canyang canyang canyang canyang canyang canyang canyang canyang canyang canyang canyang cany
truth: Samantalang nangyayari ito'y siya namang pagdating ni capitang Tiago na



126.42:  89%|████████▉ | 241/270 [1:32:06<11:31, 23.85s/it]



input: bhl:
guess: --!an~ga
truth: bahala:

input: chlhlhng mgsysy. Bcd s rt'y ng m~g bgy n ssbhn
guess: --!ang cabala
truth: cahulihulihang magsaysay. Bucod sa rito'y ang m~ga bagay na sasabihin



137.00:  97%|█████████▋| 261/270 [1:39:38<03:24, 22.72s/it]



input: n nnggglng s mts, pgdtng s bb'y nwwl-ng cblhn,
guess: nan~gang nan~ganga
truth: na nanggagaling sa mataas, pagdating sa baba'y nawawal-ang cabuluhan,

input: m'y kklnln s cnyng tng n lb!"
guess: nalalang nalalang nalan
truth: ma'y kikilanlin sa canyang utang na loob!"



140.81: 100%|██████████| 270/270 [1:42:46<00:00, 22.84s/it]
20.41:   0%|          | 0/29 [00:01<?, ?it/s]



input: --?Sn cy ng hrj n s rw n~g fst'y ngccn~gn? Cy


20.41:   3%|▎         | 1/29 [00:03<01:43,  3.69s/it]

guess: --!--!--ang-angangan
truth: --?Sino caya ang hereje na sa araw n~g fiesta'y nagcacain~gin? Caya



114.20:   3%|▎         | 1/29 [00:04<01:43,  3.69s/it]



input: Hmhp n~g bng glt ng bgy hls s mgdmg; hnd smct ng


114.20:   7%|▋         | 2/29 [00:07<01:42,  3.78s/it]

guess: --ang m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga
truth: Humihip n~g boong galit ang bagyo halos sa magdamag; hindi sumicat ang



294.86:   7%|▋         | 2/29 [00:08<01:42,  3.78s/it]



input: mw s cnyng bhy s cptng Tnng n my skt, ptln t


294.86:  10%|█         | 3/29 [00:10<01:32,  3.56s/it]

guess: cananganga
truth: Umuwi sa canyang bahay si capitang Tinong na may sakit, putlain at



147.05:  10%|█         | 3/29 [00:11<01:32,  3.56s/it]



input: nng pltn c n~g yng slt ..., d mn'y n~g sy rw y


147.05:  14%|█▍        | 4/29 [00:14<01:29,  3.59s/it]

guess: nan~ga
truth: nang palitan co n~g iyong sulat ..., di umano'y n~g siya raw ay



119.84:  14%|█▍        | 4/29 [00:15<01:29,  3.59s/it]



input: Nggplng ng cdd, wlng nrrn~gg n mncnc cng d ng


119.84:  17%|█▋        | 5/29 [00:17<01:24,  3.54s/it]

guess: --!pagcapan~gan n~gan~ga
truth: Nagugupiling ang ciudad, walang naririn~gig na manacanaca cung di ang



117.24:  17%|█▋        | 5/29 [00:19<01:24,  3.54s/it]



input: cglhn.


117.24:  21%|██        | 6/29 [00:21<01:23,  3.62s/it]

guess: --angala
truth: caguluhan.



105.34:  21%|██        | 6/29 [00:22<01:23,  3.62s/it]



input: nlss y cmply wth prgrph 1..8 r 1..9.


105.34:  24%|██▍       | 7/29 [00:25<01:20,  3.64s/it]

guess: calalat at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at 
truth: unless you comply with paragraph 1.E.8 or 1.E.9.



88.99:  24%|██▍       | 7/29 [00:26<01:20,  3.64s/it] 



input: t snng ng m~g lbrng wlng cnn mng csmn, n snlt


88.99:  28%|██▊       | 8/29 [00:28<01:13,  3.52s/it]

guess: ang matang at nanganga
truth: At sinunog ang m~ga librong walang caanoano mang casamaan, na sinulat



187.94:  28%|██▊       | 8/29 [00:29<01:13,  3.52s/it]



input: cstl't nsc lmng ng cnyng m~g nnyyhn; tngcl s


187.94:  31%|███       | 9/29 [00:32<01:11,  3.59s/it]

guess: calalangangangang matamat at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at 
truth: castila't insic lamang ang canyang m~ga inanyayahan; tungcol sa



121.12:  31%|███       | 9/29 [00:33<01:11,  3.59s/it]



input: lmng c s pgtn~gs s pgdrlt nmng lht, n nsn n~g


121.12:  34%|███▍      | 10/29 [00:36<01:08,  3.61s/it]

guess: calalang calalang calalang calalang calalang calalang calalang calalang calalang calalang calalang calalang calalang calalang calalang calalang calalang calalang calalang calalang calalang calalang ca
truth: lamang aco sa pagtan~gis sa pagdaralita naming lahat, na inisin n~g



130.00:  34%|███▍      | 10/29 [00:37<01:08,  3.61s/it]



input: kng ps. kng pnhmc cw pnhmc c ng kng snt.... ?n


130.00:  38%|███▊      | 11/29 [00:39<01:03,  3.52s/it]

guess: caningan
truth: aking puso. Aking ipinahamac icaw ipinahamac co ang aking sinta.... ?ano



98.59:  38%|███▊      | 11/29 [00:40<01:03,  3.52s/it] 



input: Gtnbrg Ltrry rchv Fndtn, th wnr f th Prjct


98.59:  41%|████▏     | 12/29 [00:42<00:59,  3.49s/it]

guess: at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at
truth: Gutenberg Literary Archive Foundation, the owner of the Project



326.73:  41%|████▏     | 12/29 [00:43<00:59,  3.49s/it]



input: llkng ncscy s bngcng yn y pmnhc s hgdnng bt,


326.73:  45%|████▍     | 13/29 [00:46<00:56,  3.52s/it]

guess: calalalanganganga
truth: lalaking nacasacay sa bangcang iyon ay pumanhic sa hagdanang bato,



78.27:  45%|████▍     | 13/29 [00:47<00:56,  3.52s/it] 



input: Sctn  2.  nfrmtn bt th Mssn f Prjct Gtnbrg-tm


78.27:  48%|████▊     | 14/29 [00:49<00:52,  3.47s/it]

guess: catang atanga
truth: Section  2.  Information about the Mission of Project Gutenberg-tm



103.86:  48%|████▊     | 14/29 [00:50<00:52,  3.47s/it]



input: cpnglwn.--Knh c s blnggng png bsn~gn s kn n~g kng


103.86:  52%|█████▏    | 15/29 [00:52<00:47,  3.40s/it]

guess: calangan~gang
truth: capanglawan.--Kinuha aco sa bilangguang pinag absan~gan sa akin n~g aking



115.21:  52%|█████▏    | 15/29 [00:53<00:47,  3.40s/it]



input: sy'y cnyng nkt s cnyng hrp ng sng t n pngmmsdn


115.21:  55%|█████▌    | 16/29 [00:56<00:44,  3.41s/it]

guess: cangangangan
truth: siya'y canyang nakita sa canyang harap ang isang tao na pinagmamasdan



120.55:  55%|█████▌    | 16/29 [00:57<00:44,  3.41s/it]



input: nbg nny ng knmltn nnyng byn, cwn~gs n~g tng


120.55:  59%|█████▊    | 17/29 [00:59<00:41,  3.44s/it]

guess: nan~gan n~g caningan~gangan~gan~gan
truth: iniibig ninyo ang kinamulatan ninyong bayan, cawan~gis n~g ating



285.72:  59%|█████▊    | 17/29 [01:01<00:41,  3.44s/it]



input: ntpl, n~g Vrgn dl Rsr,  cng hnd m'y n~g Vrgn dl


285.72:  62%|██████▏   | 18/29 [01:03<00:37,  3.37s/it]

guess: capatang at nan~ga
truth: Antipolo, n~g Virgen del Rosario, o cung hindi ma'y n~g Virgen del



114.55:  62%|██████▏   | 18/29 [01:04<00:37,  3.37s/it]



input: lmn~gn t ng-clng tmcs, n~gn't ngpthlg s chy ng


114.55:  66%|██████▌   | 19/29 [01:06<00:33,  3.38s/it]

guess: nan~gangangan
truth: lumin~gon at nag-acalang tumacas, n~guni't nagpatihulog sa cahoy ang



107.40:  66%|██████▌   | 19/29 [01:07<00:33,  3.38s/it]



input: m~g tn~gng gnng tmtwg s cnlng srl n~g <<cvlzd>>.


107.40:  69%|██████▉   | 20/29 [01:10<00:30,  3.43s/it]

guess: --ang m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga m~ga
truth: m~ga tan~ging guinoong tumatawag sa canilang sarili n~g <<civilizado>>.



104.90:  69%|██████▉   | 20/29 [01:11<00:30,  3.43s/it]



input: --Hl n ty,--ng bnlng n nmmtl.


104.90:  72%|███████▏  | 21/29 [01:13<00:27,  3.41s/it]

guess: --!angalangan
truth: --Huli na tayo,--ang ibinulong na namumutla.



125.01:  72%|███████▏  | 21/29 [01:14<00:27,  3.41s/it]



input: mt, n smsnd s ny n~g rg nyng n.


125.01:  76%|███████▌  | 22/29 [01:17<00:24,  3.49s/it]

guess: ang matan nan nan~gangan
truth: mata, na sumusunod sa anyo n~g irog niyang ina.



305.58:  76%|███████▌  | 22/29 [01:18<00:24,  3.49s/it]



input:      wd t th wnr f th Prjct Gtnbrg-tm trdmrk, bt h


305.58:  79%|███████▉  | 23/29 [01:20<00:20,  3.43s/it]

guess: --ang at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at
truth:      owed to the owner of the Project Gutenberg-tm trademark, but he



283.78:  79%|███████▉  | 23/29 [01:21<00:20,  3.43s/it]



input: cphmcn; t cng cncnn ng m~g ht t m~g cmy n smbg


283.78:  83%|████████▎ | 24/29 [01:23<00:17,  3.46s/it]

guess: --!canganga
truth: capahamacan; at cung canicanino ang m~ga hita at m~ga camay na sumabog



99.87:  83%|████████▎ | 24/29 [01:24<00:17,  3.46s/it] 



input: cmtyn? !Wl n n~gng nllb s kn cng hnd ng pgtts,


99.87:  86%|████████▌ | 25/29 [01:27<00:13,  3.42s/it]

guess: canganganganga
truth: camatayan? !Wala na n~gang nalalabi sa akin cung hindi ang pagtitiis,



13.57:  86%|████████▌ | 25/29 [01:28<00:13,  3.42s/it]



input: lmn ng bngc, t wlng n mng dndlt n sct nyng


13.57:  90%|████████▉ | 26/29 [01:30<00:10,  3.50s/it]

guess: calalang nangan~gan
truth: laman ang bangca, at walang ano mang idinudulot na sucat niyang



123.75:  90%|████████▉ | 26/29 [01:31<00:10,  3.50s/it]



input: --?Nkt n nny?--n ls, t nlgy s bngc ng


123.75:  93%|█████████▎| 27/29 [01:34<00:07,  3.60s/it]

guess: --!--!at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at
truth: --?Nakita na ninyo?--ani Elias, at inilagay sa bangca ang



100.07:  93%|█████████▎| 27/29 [01:35<00:07,  3.60s/it]



input: Tnngnn sy n t sbl n ngglt.


100.07:  97%|█████████▋| 28/29 [01:38<00:03,  3.54s/it]

guess: nan~gan nan~gan nanga
truth: Tiningnan siya ni tia Isabel na nagugulat.



227.64:  97%|█████████▋| 28/29 [01:39<00:03,  3.54s/it]



input: --yn n~g ng smssp c; tnmn nny ng sgt.


227.64: 100%|██████████| 29/29 [01:41<00:00,  3.51s/it]

guess: --!--!--anganganga
truth: --Iyan n~ga ang sumasaisip co; tinamaan ninyo ang sugat.

Epoch 0: train loss = 134.123707, test loss = 144.209132





In [100]:
print(train_losses)
print(test_losses)

[134.1237073446313]
[144.20913219451904]


Let's test the model on a new sentence:

In [103]:
"""
DIRECTIONS: Experiment with different test outputs. What are some things to keep in mind when changing the test outputs?
<5 min>
"""

# The target sentence; the input is the same sentence with all vowels removed.
test_output = "Umakyat ako sa jeep ni Tatay at umupo sa tabi niya."
test_input = "".join(c for c in test_output if c not in "AEIOUaeiou")
print("input: " + test_input)

# Encode both strings as label tensors with a batch dimension of 1.
x = torch.tensor(encode_string(test_input)).unsqueeze(0).to(model.device)
y = torch.tensor(encode_string(test_output)).unsqueeze(0).to(model.device)
# T and U hold the input and output lengths of each batch element.
T = torch.tensor([x.shape[1]]).to(model.device)
U = torch.tensor([y.shape[1]]).to(model.device)

# Decode greedily and compare against the ground truth.
guess = model.greedy_search(x, T)[0]
print("truth: " + test_output)
print("guess: " + decode_labels(guess))
print("")

# Score the greedy guess with the same loss used for the truth.
y_guess = torch.tensor(guess).unsqueeze(0).to(model.device)
U_guess = torch.tensor([len(guess)]).to(model.device)

print("NLL of truth: " + str(model.compute_loss(x, y, T, U)))
print("NLL of guess: " + str(model.compute_loss(x, y_guess, T, U_guess)))

input: mkyt k s jp n Tty t mp s tb ny.
truth: Umakyat ako sa jeep ni Tatay at umupo sa tabi niya.
guess: at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at at

NLL of truth: tensor(127.5484, device='cuda:0', grad_fn=<NegBackward0>)
NLL of guess: tensor(147.4962, device='cuda:0', grad_fn=<NegBackward0>)
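
One thing to keep in mind when changing `test_output` is whether every character is one the model can encode. The sketch below assumes the vocabulary is `string.printable`, which this part of the notebook does not confirm; `strip_vowels` and `unsupported_chars` are hypothetical helpers introduced for illustration, not functions defined earlier.

```python
import string

VOWELS = "AEIOUaeiou"


def strip_vowels(sentence):
    """Build the model input by dropping every ASCII vowel."""
    return "".join(c for c in sentence if c not in VOWELS)


def unsupported_chars(sentence, vocab=string.printable):
    """List characters that are not in the (assumed) vocabulary."""
    return sorted({c for c in sentence if c not in vocab})


print(strip_vowels("Hello, World"))        # Hll, Wrld
print(unsupported_chars("Señor Ibarra"))   # ['ñ']
```

If `unsupported_chars` returns anything, the sentence would need to be transliterated first (for example with `unidecode`, which the notebook installs) before it is encoded.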


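Observe that the negative log-likelihood of the guess (147.50) is actually higher, and therefore worse, than that of the true label sequence (127.55). In other words, the model assigns more probability to the truth than to the sequence that the greedy search returned, which is known as a "[search error](https://www.aclweb.org/anthology/D19-1331.pdf)". This suggests that we could get better results by using a beam search instead of the greedy search.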
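
To make the idea of a search error concrete, here is a minimal sketch of beam search on a toy next-token model. It is not a Transducer beam search (which also has to handle blank emissions and the time axis) and does not use the `model` object from this notebook; `TOY_MODEL`, `next_log_probs`, and `beam_search` are hypothetical names made up for this example. With a beam width of 1 the procedure reduces to greedy search.

```python
import math

# Hypothetical toy model (not the Transducer above): maps a prefix to
# next-token probabilities. Prefixes missing from the table are complete.
TOY_MODEL = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.5, "y": 0.5},
    ("b",): {"x": 0.9, "y": 0.1},
}


def next_log_probs(prefix):
    """Return {token: log-probability} for extending `prefix`, or {} if complete."""
    return {tok: math.log(p) for tok, p in TOY_MODEL.get(prefix, {}).items()}


def beam_search(step_fn, beam_width, max_len):
    """Keep the `beam_width` best-scoring prefixes at each step."""
    beams = [((), 0.0)]  # (prefix, total log-probability)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            options = step_fn(prefix)
            if not options:  # finished hypothesis: carry it over unchanged
                candidates.append((prefix, score))
            for tok, log_p in options.items():
                candidates.append((prefix + (tok,), score + log_p))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0]


greedy_seq, greedy_score = beam_search(next_log_probs, beam_width=1, max_len=2)
beam_seq, beam_score = beam_search(next_log_probs, beam_width=2, max_len=2)
print("greedy:", greedy_seq, "NLL = %.3f" % -greedy_score)
print("beam:  ", beam_seq, "NLL = %.3f" % -beam_score)
```

Greedy search commits to `"a"` because it is the most likely first token, and ends up with a sequence of probability 0.30. Keeping two hypotheses lets the search recover `("b", "x")`, whose probability is 0.36, so its NLL is lower. The trade-off is cost: each step scores `beam_width` times as many candidates.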