The next few code cells define the two parts of the sequence-to-sequence (seq2seq) model: the encoder and the decoder. Each part is a recurrent neural network (RNN), a network that processes a sequence one element at a time and feeds the state produced at each step back in as input to the next step (recurrence). For these RNNs we use the **Gated Recurrent Unit** (GRU) architecture rather than the more commonly used **Long Short-Term Memory** (LSTM) architecture. Although GRU is the newer of the two, it works similarly to LSTM and has been shown to yield comparable results while being slightly cheaper to compute. [This paper](https://arxiv.org/pdf/1412.3555v1.pdf) compares the two architectures in more depth.
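One concrete way to see the efficiency difference is to count trainable parameters. The sketch below is illustrative only: `count_parameters` is a hypothetical helper and the hidden size of 16 is arbitrary. A GRU layer holds three weight blocks (reset gate, update gate, candidate state) where an LSTM layer holds four, so at equal sizes the GRU has three quarters as many parameters.

```python
import torch.nn as nn

def count_parameters(module):
    """Hypothetical helper: total number of trainable parameters in a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

hidden_size = 16  # arbitrary size, chosen only for illustration
gru = nn.GRU(hidden_size, hidden_size)
lstm = nn.LSTM(hidden_size, hidden_size)

gru_params = count_parameters(gru)    # three weight blocks per layer
lstm_params = count_parameters(lstm)  # four weight blocks per layer
print(gru_params, lstm_params, gru_params / lstm_params)
```

Parameter count is only a rough proxy for compute cost; actual speed depends on the hardware and on sequence length.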
# Encoder
In a seq2seq model built from an encoder and a decoder, the encoder's job is to encode, or condense, the input sequence into a single vector while retaining the meaning of that sequence. For each packet in the input sequence, the encoder looks up the packet's token in the embedding layer and passes the result through the GRU, which produces two things:

1.   A **vector** (called `output_vector` in the following code)
2.   A **hidden state** (called `hidden_state` in the following code)

The hidden state is then fed back in, together with the next packet in the sequence, to produce an updated output vector and a new hidden state. This repeats until the last packet has been processed. The final hidden state, which for a single-layer GRU holds the same values as the final output vector, serves as the **context vector** that is handed to the decoder later on. The `forward` function carries out one such step in our implementation.
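The recurrence described above can be sketched with bare `nn.Embedding` and `nn.GRU` layers before wrapping them in a class. The vocabulary size, hidden size, and token ids here are hypothetical values chosen for illustration, not taken from our data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # reproducible random weights

vocab_size, hidden_size = 10, 8  # hypothetical sizes
embedding = nn.Embedding(vocab_size, hidden_size)
gru = nn.GRU(hidden_size, hidden_size)

tokens = torch.tensor([3, 7, 2])               # hypothetical packet token ids
hidden_state = torch.zeros(1, 1, hidden_size)  # initial hidden state

for token in tokens:
    # Shape (seq_len=1, batch=1, hidden_size), as nn.GRU expects by default
    embedded = embedding(token).view(1, 1, -1)
    # The hidden state from this step is carried into the next one
    output_vector, hidden_state = gru(embedded, hidden_state)

context_vector = hidden_state  # what the decoder will start from
print(context_vector.shape)
print(torch.allclose(output_vector, hidden_state))
```

The last line checks that, for a single-layer GRU processing one step at a time, the output vector and the hidden state carry the same values.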

In [0]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Recurrent neural network for Encoder of the seq2seq model
class Encoder(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(Encoder, self).__init__()
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(num_embeddings=input_size, embedding_dim=hidden_size) # Embedding layer
        self.gru = nn.GRU(hidden_size, hidden_size) # Applies Gated Recurrent Unit (GRU) to input sequence

    def forward(self, input_token, hidden_state):
        embedded = self.embedding(input_token).view(1, 1, -1)
        output_vector = embedded
        output_vector, hidden_state = self.gru(output_vector, hidden_state)
        return output_vector, hidden_state

    def init_hidden(self):
        return torch.zeros(1, 1, self.hidden_size, device=device)

# Decoder

The decoder, like the encoder, is a recurrent neural network using the GRU architecture. It takes the context vector as its initial hidden state. As before, the `forward` function carries out one step: it takes an input token and a hidden state, then produces an output vector and a new hidden state. Unlike the encoder, however, the decoder passes the GRU output through a linear layer followed by a log-softmax (`nn.LogSoftmax`), so each step yields normalized log-probabilities over the output tokens, which is the form the `NLLLoss` criterion used during training expects.
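To make the normalization step concrete, the sketch below applies `nn.LogSoftmax` to a random score vector that stands in for the linear layer's output. The vocabulary size and target id are hypothetical. It checks that the exponentiated outputs sum to 1 and that log-softmax followed by `nn.NLLLoss` matches `nn.CrossEntropyLoss` on the raw scores.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

output_size = 5                       # hypothetical number of output tokens
scores = torch.randn(1, output_size)  # stand-in for the linear layer's output
target = torch.tensor([2])            # hypothetical correct token id

log_probs = nn.LogSoftmax(dim=1)(scores)  # log-probabilities, all <= 0
prob_sum = log_probs.exp().sum().item()   # probabilities sum to 1

nll = nn.NLLLoss()(log_probs, target)                  # loss on log-probabilities
cross_entropy = nn.CrossEntropyLoss()(scores, target)  # same loss from raw scores
print(prob_sum, nll.item(), cross_entropy.item())
```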

In [0]:
# Recurrent neural network for Decoder of seq2seq model
# MIGHT HAVE TO USE ATTN DECODER FROM TUTORIAL
class Decoder(nn.Module):
    def __init__(self, hidden_size, output_size):
        super(Decoder, self).__init__()
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(output_size, hidden_size) # Embedding layer
        self.gru = nn.GRU(hidden_size, hidden_size) # Applies GRU
        self.out = nn.Linear(hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input_token, hidden_state):
        output_vector = self.embedding(input_token).view(1, 1, -1)
        output_vector = F.relu(output_vector)
        output_vector, hidden_state = self.gru(output_vector, hidden_state)
        output_vector = self.softmax(self.out(output_vector[0]))
        return output_vector, hidden_state

    def init_hidden(self):
        return torch.zeros(1, 1, self.hidden_size, device=device)


In [0]:
# Start and end of connection tokens
SOC_token = 0
EOC_token = 1

# Training function
# Criterion = negative log likelihood loss (NLLLoss)
def train(input_tensor, target_tensor, encoder, decoder, encoder_optimizer, decoder_optimizer, criterion, max_length=320):
    encoder_hidden = encoder.init_hidden() # Initialize hidden state of the encoder
    encoder_optimizer.zero_grad()
    decoder_optimizer.zero_grad()
    input_length = input_tensor.size(0) # length of the input sequence
    target_length = target_tensor.size(0) # length of the target sequence
    #encoder_outputs = torch.zeros(max_length, encoder.hidden_size, device=device)
    loss = 0

    # Loop through the input tokens with the encoder to get the final vector/hidden state
    for ei in range(input_length):
        encoder_output, encoder_hidden = encoder(input_tensor[ei], encoder_hidden)
        # encoder_outputs[ei] = encoder_output[0, 0]

    decoder_input = torch.tensor([[SOC_token]], device=device)
    decoder_hidden = encoder_hidden # Initialize the hidden state of the decoder
    # Run the decoder for each element of the target sequence
    for di in range(target_length):
        decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden)
        topv, topi = decoder_output.topk(1)
        decoder_input = topi.squeeze().detach()
        loss = loss + criterion(decoder_output, target_tensor[di]) # Compute the loss
        if decoder_input.item() == EOC_token: # break if end of connection token is reached
            break

    loss.backward()
    encoder_optimizer.step()
    decoder_optimizer.step()
    return loss.item() / target_length
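The greedy feedback step inside the decoder loop can be examined in isolation. In this sketch the log-probabilities are made-up values for a single decoder step over four tokens. `topk(1)` picks the most likely token, and `squeeze().detach()` turns its index into a 0-dim tensor that is cut off from the autograd graph before being fed back in as the next input.

```python
import torch

# Hypothetical log-probabilities for one decoder step over 4 tokens
decoder_output = torch.log(torch.tensor([[0.1, 0.6, 0.2, 0.1]]))

# Highest log-probability and its index, each of shape (1, 1)
topv, topi = decoder_output.topk(1)

# 0-dim tensor holding the predicted token id, detached from the graph
decoder_input = topi.squeeze().detach()

print(topi.shape, decoder_input.dim(), decoder_input.item())
```

Because the loop always feeds back the decoder's own prediction rather than the target token, the training function as written does not use teacher forcing.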

In [0]:
import math
import time
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

# Helper functions to keep track of the time elapsed and time remaining
def as_minutes(sec):
    mins = math.floor(sec / 60)
    sec = sec - (mins * 60)
    return '%dm %ds' % (mins, sec)

def time_since(since, percent):
    now = time.time()
    sec = now - since    # time elapsed so far
    es = sec / percent   # estimated total time
    rs = es - sec        # estimated time remaining
    return '%s (- %s)' % (as_minutes(sec), as_minutes(rs))

# Plot loss vs number of iterations
#plt.switch_backend('agg')
def show_plot(points):
    fig, ax = plt.subplots() # Creates the figure and its axes in one call
    loc = ticker.MultipleLocator(base=0.2) # Put ticks at intervals of 0.2
    ax.yaxis.set_major_locator(loc)
    ax.plot(points)

In [0]:
# Repeatedly run the train function and print evaluation info as it goes
def train_iterations(encoder, decoder, n_iterations, print_every=1000, plot_every=100, learning_rate=0.01):
    start = time.time()
    plot_losses = []
    print_loss_total = 0  # Reset every print_every
    plot_loss_total = 0  # Reset every plot_every

    encoder_optimizer = optim.SGD(encoder.parameters(), lr=learning_rate)
    decoder_optimizer = optim.SGD(decoder.parameters(), lr=learning_rate)
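    # training_pairs must be defined before this function runs: a list of at least
    # n_iterations [input_tensor, target_tensor] pairs. The tutorial line below
    # still needs to be adapted to our data.
    #training_pairs = [tensorsFromPair(random.choice(pairs)) for i in range(n_iterations)]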
    criterion = nn.NLLLoss()

    # Loop to train the model with the specified number of iterations
    for iteration in range(1, n_iterations + 1):
        training_pair = training_pairs[iteration - 1]
        # Tutorial sequences are in pairs with format [input, target]
        input_tensor = training_pair[0]
        target_tensor = training_pair[1]
        loss = train(input_tensor, target_tensor, encoder, decoder, encoder_optimizer, decoder_optimizer, criterion)
        print_loss_total = print_loss_total + loss
        plot_loss_total = plot_loss_total + loss

        # If it has reached the print interval, print progress information
        if iteration % print_every == 0:
            print_loss_avg = print_loss_total / print_every
            print_loss_total = 0
            print('%s (%d %d%%) %.4f' % (time_since(start, iteration / n_iterations), iteration, iteration / n_iterations * 100, print_loss_avg))

        # If it has reached the plot interval, add info to the plot_losses array
        if iteration % plot_every == 0:
            plot_loss_avg = plot_loss_total / plot_every
            plot_losses.append(plot_loss_avg)
            plot_loss_total = 0

    show_plot(plot_losses)
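`train_iterations` reads from a `training_pairs` list that still has to be built from our data. One possible shape for that step is sketched below, assuming each connection is already a list of integer token ids that does not yet include the end-of-connection token. The helpers `tensor_from_tokens` and `make_training_pairs` and the toy pairs are hypothetical and only illustrate the tensor layout that `train` indexes into.

```python
import random
import torch

EOC_token = 1  # same value as in the training cell above

def tensor_from_tokens(token_ids):
    """Hypothetical helper: token ids -> column tensor of shape (len + 1, 1)."""
    # Append the end-of-connection token, then make one row per token
    return torch.tensor(token_ids + [EOC_token], dtype=torch.long).view(-1, 1)

def make_training_pairs(pairs, n_iterations):
    """Hypothetical helper: sample n_iterations [input, target] tensor pairs."""
    sampled = [random.choice(pairs) for _ in range(n_iterations)]
    return [[tensor_from_tokens(src), tensor_from_tokens(tgt)] for src, tgt in sampled]

# Toy data only: each pair is (input token ids, target token ids)
toy_pairs = [([2, 3, 4], [4, 3, 2]), ([5, 6], [6, 5])]
training_pairs = make_training_pairs(toy_pairs, n_iterations=4)
print(len(training_pairs), training_pairs[0][0].shape)
```

With this layout, `input_tensor[ei]` and `target_tensor[di]` each have shape `(1,)`, which is what the embedding lookup and `NLLLoss` in `train` expect. When training on a GPU, the tensors would also need to be created on or moved to `device`.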