# Overview

In this notebook I will train a character-level LSTM. The model is trained on a text one character at a time so that afterwards it can generate completely new text, also character by character. The example uses the text of Anna Karenina. The goal is a network that can generate a chunk of text in the same style as Anna Karenina.<br>

The structure of the network.<br>
<img src="images/LSTM4.jpeg" width="500"><br>
Credits: Udacity Computer vision Nanodegree


# Loading and preparing the data for training

In [1]:
# load resources
import numpy as np
import torch
from torch import nn
import torch.nn.functional as F

In [2]:
# open text file and read it as a text
with open("data/anna.txt", "r") as stream:
    text = stream.read()

# show sample
text[:100]

'Chapter 1\n\n\nHappy families are all alike; every unhappy family is unhappy in its own\nway.\n\nEverythin'

Now we need to map the text to integers. This takes four steps:
1. Create a tuple of all distinct characters present in the text
2. From that tuple, create a dictionary `int : char` in which the integers are the keys
3. Invert that mapping to obtain `char : int`
4. Use the dictionary from step 3 to map the whole text to an array of the corresponding integers

In [27]:
# step 1: use set constructor to get unique chars from whole text. Next make it immutable using tuple
chars = tuple(set(text))
# step 2: create dictionary `int : char`
int2char = dict(enumerate(chars))
# step 3: now inverse the mapping
char2int = {char : integer for integer, char in int2char.items()}
# step 4: map every char in text to the corresponding value in char2int dict, save it as numpy array
encoded = np.array([char2int[char] for char in text])

Let's look again at the first 100 characters of the text and at their encoded version

In [4]:
text[:100]

'Chapter 1\n\n\nHappy families are all alike; every unhappy family is unhappy in its own\nway.\n\nEverythin'

In [5]:
encoded[:100]

array([11, 12,  3, 50, 16, 27, 14, 80, 68, 75, 75, 75, 42,  3, 50, 50, 34,
       80, 81,  3, 20, 23, 63, 23, 27, 45, 80,  3, 14, 27, 80,  3, 63, 63,
       80,  3, 63, 23, 47, 27, 46, 80, 27, 77, 27, 14, 34, 80, 57, 28, 12,
        3, 50, 50, 34, 80, 81,  3, 20, 23, 63, 34, 80, 23, 45, 80, 57, 28,
       12,  3, 50, 50, 34, 80, 23, 28, 80, 23, 16, 45, 80,  5, 65, 28, 75,
       65,  3, 34, 25, 75, 75, 24, 77, 27, 14, 34, 16, 12, 23, 28])

In [6]:
# the largest integer code; codes start at 0, so the vocabulary size is max(encoded) + 1
max(encoded)

82

Everything is working perfectly :)
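
A quick way to convince ourselves that the mapping loses nothing is to decode the integers back into text. The sketch below repeats the four steps on a short stand-in string (`sample_text` and the `sample_*` names are made up for this example). One deliberate difference: the characters are sorted before enumeration, which is my assumption for making the mapping reproducible between Python sessions; the cell above uses the unsorted set.

```python
import numpy as np

# A short stand-in string; in the notebook this would be the full `text`.
sample_text = "Happy families are all alike"

# Same four steps as above, applied to the sample.
# Assumption: sorting makes the integer codes reproducible across runs.
sample_chars = tuple(sorted(set(sample_text)))
sample_int2char = dict(enumerate(sample_chars))
sample_char2int = {ch: i for i, ch in sample_int2char.items()}
sample_encoded = np.array([sample_char2int[ch] for ch in sample_text])

# Decode the integers back into characters and compare with the input.
decoded = "".join(sample_int2char[int(i)] for i in sample_encoded)

vocab_size = len(sample_chars)
largest_code = int(sample_encoded.max())
print(decoded == sample_text)
print(vocab_size, largest_code)
```

Because the codes start at 0, the vocabulary size is always one larger than the largest code.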

### One-hot encoding
As can be seen in the image above, the LSTM expects one-hot encoded characters. So for every character we get a vector of length `len(chars) = max(encoded) + 1 = 83`, in which a single element is 1 and marks that particular character.

In [7]:
def one_hot_encode(arr, n_labels):
    
    # Initialize the encoded array (arr.size also works when arr is not 2-D)
    one_hot = np.zeros((arr.size, n_labels), dtype=np.float32)
    
    # Fill the appropriate elements with ones
    one_hot[np.arange(one_hot.shape[0]), arr.flatten()] = 1.
    
    # Finally reshape it to get back to the original array
    one_hot = one_hot.reshape((*arr.shape, n_labels))
    
    return one_hot
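
To see what the function produces, here is a small usage sketch on a toy batch. The function body is repeated only so that the snippet runs on its own; the toy array and the vocabulary size of 4 are made up for illustration.

```python
import numpy as np

def one_hot_encode(arr, n_labels):
    # Same logic as the function above, repeated so this snippet is self-contained.
    one_hot = np.zeros((arr.size, n_labels), dtype=np.float32)
    one_hot[np.arange(one_hot.shape[0]), arr.flatten()] = 1.
    return one_hot.reshape((*arr.shape, n_labels))

# Toy batch: 2 sequences of 3 integer codes, with a vocabulary of 4 symbols.
toy = np.array([[0, 2, 1],
                [3, 3, 0]])
encoded_toy = one_hot_encode(toy, 4)

print(encoded_toy.shape)   # (2, 3, 4)
print(encoded_toy[0, 1])   # [0. 0. 1. 0.]
```

Taking `argmax` over the last axis recovers the original integer codes, so the encoding can always be undone.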

### Making training mini-batches
When making mini-batches from sequence data it is important to be clear whether we are talking about **batch size** or **sequence length**. The image below illustrates the difference.<br>

<img src="images/LSTM5.png" width="500"><br>
Credits: Udacity Computer vision Nanodegree<br>

### Creating batches: step-by-step guide
Legend:<br>
`N - batch size`<br>
`M - Sequence length`<br>
`K - Total number of completely full batches of size N`<br>
`arr - sequence of encoded characters (encoded by a dictionary, not one-hot encoded)`<br>
`n - number of all characters in arr, simply len(arr)`

1. Discard the data that does not fit into complete batches<br>
To do that we need to compute `K`: the number of characters in `arr`, `n`, integer-divided by the number of characters in a single batch, `N * M`. Multiplying `K` by `N * M` then gives the number of characters from `arr` that we keep.

2. Split the trimmed `arr` into `N` sequences<br>
`arr` is reshaped into a matrix of size `(N, M * K)`.

3. Lastly, iterate through that matrix to get the batches<br>
Iterating through the matrix can be seen as moving a window of size `(N, M)` across its columns with a step of `M`.
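
The three steps can be traced by hand on a tiny array. The values below (23 characters, `N = 2`, `M = 3`) are toy numbers chosen for this sketch, not the ones used for training.

```python
import numpy as np

# Toy values (not the notebook's): 23 encoded characters, N = 2, M = 3.
arr = np.arange(23)
N, M = 2, 3

K = len(arr) // (N * M)          # step 1: number of complete batches
kept = arr[:K * N * M]           # step 1: drop the characters that do not fit
matrix = kept.reshape(N, -1)     # step 2: shape (N, M * K)

# Step 3: slide an (N, M) window across the columns with step M.
windows = [matrix[:, start:start + M] for start in range(0, matrix.shape[1], M)]

print(K, kept.size, matrix.shape)   # 3 18 (2, 9)
print(windows[0])
```

Five of the 23 characters are discarded, and each of the three windows holds `N` rows of `M` consecutive characters.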

In [8]:
def get_batches(arr, n_seqs, n_steps):
    """
    
    Generator that returns batches of size (n_seqs, n_steps) from arr
    
    Parameters:
    -----------
    arr: numpy array from which the batches are created
    n_seqs: batch size (N)
    n_steps: sequence length (M)
    
    """
    # compute number of characters in a batch (N * M)
    num_char_batch = n_seqs * n_steps
    
    # get the number of complete batches that fit in arr (K); // is integer division
    n_batches = len(arr)//num_char_batch
    
    # keep only enough characters to make full batches
    arr = arr[:n_batches*num_char_batch]
    
    # reshape arr in order to get shape (N, M*K)
    arr = arr.reshape(n_seqs, -1)
    
    # get batches from prepared array
    for n in range(0, arr.shape[1], n_steps):
        
        # batch with features
        x = arr[ : , n : n + n_steps]
        
        # batch with targets
        y = np.zeros_like(x)
        
        # shift feature batch by one,
        try:
            y[:, :-1], y[:, -1] = x[:, 1:], arr[:, n + n_steps]
        except IndexError:
            # at the end of the array there is no next column, so wrap around to the first column of arr
            y[:, :-1], y[:, -1] = x[:, 1:], arr[:, 0]
        yield x, y        
        
# Let's test it
batches = get_batches(encoded, 10, 10)
x, y = next(batches)
print("Feature batch\n",x)
print("Target batch\n",y)

Feature batch
 [[11 12  3 50 16 27 14 80 68 75]
 [34 72 55 80  3 28 45 65 27 14]
 [ 3 35 28 23 81 23 26 27 28 16]
 [80  3 80 50 14 27 77 23  5 57]
 [27 80 16 12  3 28 47 27 67 80]
 [80 45  3 65 80 12 27 14 80 16]
 [26 16 27 67 80 65 12  3 16 75]
 [ 3 45 80 45 57 81 81 27 14 23]
 [27 67 80 23 16 45 80 16 23 20]
 [26 27 80  5 81 80 20 23 28 67]]
Target batch
 [[12  3 50 16 27 14 80 68 75 75]
 [72 55 80  3 28 45 65 27 14 27]
 [35 28 23 81 23 26 27 28 16 46]
 [ 3 80 50 14 27 77 23  5 57 45]
 [80 16 12  3 28 47 27 67 80 51]
 [45  3 65 80 12 27 14 80 16 27]
 [16 27 67 80 65 12  3 16 75 12]
 [45 80 45 57 81 81 27 14 23 28]
 [67 80 23 16 45 80 16 23 20 27]
 [27 80  5 81 80 20 23 28 67 80]]


The function seems to work as expected. We can see that the second column of the `Feature batch` is the first column of the `Target batch`, and that the last column of the `Feature batch` is the second-to-last column of the `Target batch`.
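
The same property can be checked programmatically instead of by eye. The sketch below uses a condensed copy of the generator and a made-up array of consecutive integers (`toy`), so the shift by one and the wrap-around in the final batch are easy to read off.

```python
import numpy as np

def get_batches(arr, n_seqs, n_steps):
    # Condensed copy of the generator above so the snippet is self-contained.
    chars_per_batch = n_seqs * n_steps
    n_batches = len(arr) // chars_per_batch
    arr = arr[:n_batches * chars_per_batch].reshape(n_seqs, -1)
    for n in range(0, arr.shape[1], n_steps):
        x = arr[:, n:n + n_steps]
        y = np.zeros_like(x)
        try:
            y[:, :-1], y[:, -1] = x[:, 1:], arr[:, n + n_steps]
        except IndexError:
            y[:, :-1], y[:, -1] = x[:, 1:], arr[:, 0]
        yield x, y

toy = np.arange(40)                     # stand-in for `encoded`
batches = list(get_batches(toy, 2, 5))  # 4 batches of shape (2, 5)

for x, y in batches:
    # Every target column except the last is the next feature column.
    assert np.array_equal(y[:, :-1], x[:, 1:])

first_x, first_y = batches[0]
last_x, last_y = batches[-1]
print(first_x)
print(first_y)
print(last_y[:, -1])   # the wrap-around: first column of the reshaped array
```

In the final batch the last target of each row is taken from the first column of the same row, which is what the `except IndexError` branch does.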

# Defining the network structure

<img src="images/LSTM6.png" width="500"><br>
Credits: Udacity Computer vision Nanodegree<br>


### Model structure


In [9]:
class CharRNN(nn.Module):
    
    def __init__(self, tokens, n_steps=100, n_hidden=256, n_layers=2,
                               drop_prob=0.5, lr=0.001):
        super().__init__()
        self.drop_prob = drop_prob
        self.n_layers = n_layers
        self.n_hidden = n_hidden
        self.lr = lr
        
        # creating character dictionaries
        self.chars = tokens
        self.int2char = dict(enumerate(self.chars))
        self.char2int = {ch: ii for ii, ch in self.int2char.items()}
        
        # LSTM layer
        self.lstm = nn.LSTM(len(self.chars), self.n_hidden, self.n_layers, dropout=self.drop_prob, batch_first=True)
        
        # Dropout layer
        self.dropout = nn.Dropout(self.drop_prob)
        
        # Fully-connected layer
        self.fc = nn.Linear(n_hidden, len(self.chars))
        
        # initialize the weights
        self.init_weights()
      
    
    def forward(self, x, hc):
        '''
        
        Forward pass through the network. 
            These inputs are x, and the hidden/cell state `hc`. 
            
        '''
        # Get x, and the new hidden state (h, c) from the lstm
        x, (h, c) = self.lstm(x, hc)
        
        # Apply dropout
        x = self.dropout(x)
        
        # Stack up LSTM outputs using view
        x = x.view(x.size()[0]*x.size()[1], self.n_hidden)
        
        # put x through the fully-connected layer
        x = self.fc(x)
        
        # return x and the hidden state (h, c)
        return x, (h, c)
    
    
    def predict(self, char, h=None, cuda=False, top_k=None):
        ''' Given a character, predict the next character.
        
            Returns the predicted character and the hidden state.
        '''
        if cuda:
            self.cuda()
        else:
            self.cpu()
        
        if h is None:
            h = self.init_hidden(1)
        
        x = np.array([[self.char2int[char]]])
        x = one_hot_encode(x, len(self.chars))
        inputs = torch.from_numpy(x)
        if cuda:
            inputs = inputs.cuda()
        
        h = tuple([each.data for each in h])
        out, h = self.forward(inputs, h)

        p = F.softmax(out, dim=1).data
        if cuda:
            p = p.cpu()
        
        if top_k is None:
            top_ch = np.arange(len(self.chars))
        else:
            p, top_ch = p.topk(top_k)
            top_ch = top_ch.numpy().squeeze()
        
        p = p.numpy().squeeze()
        char = np.random.choice(top_ch, p=p/p.sum())
            
        return self.int2char[char], h
    
    def init_weights(self):
        ''' Initialize weights for fully connected layer '''
        initrange = 0.1
        
        # Set bias tensor to all zeros
        self.fc.bias.data.fill_(0)
        # FC weights drawn uniformly from [-1, 1); `initrange` above is defined but not used
        self.fc.weight.data.uniform_(-1, 1)
        
    def init_hidden(self, n_seqs):
        ''' Initializes hidden state '''
        # Create two new tensors with sizes n_layers x n_seqs x n_hidden,
        # initialized to zero, for hidden state and cell state of LSTM
        weight = next(self.parameters()).data
        return (weight.new(self.n_layers, n_seqs, self.n_hidden).zero_(),
                weight.new(self.n_layers, n_seqs, self.n_hidden).zero_())
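
The sampling step inside `predict` (softmax, optional top-k filtering, then a random draw) can be illustrated without the network. This is a NumPy sketch of the idea, not the method above: `sample_top_k`, the `rng` argument and the five scores are all made up for the example.

```python
import numpy as np

def sample_top_k(logits, top_k, rng):
    """Sample an index from the `top_k` most likely entries of `logits`."""
    # Softmax with the usual max-subtraction for numerical stability.
    shifted = logits - logits.max()
    probs = np.exp(shifted) / np.exp(shifted).sum()

    # Indices of the k largest probabilities (their order does not matter).
    top_idx = np.argsort(probs)[-top_k:]

    # Renormalise so the kept probabilities sum to 1, then draw one index.
    top_probs = probs[top_idx] / probs[top_idx].sum()
    return int(rng.choice(top_idx, p=top_probs))

rng = np.random.default_rng(0)
logits = np.array([2.0, 0.1, -1.0, 1.5, 0.3])   # made-up scores for 5 characters
draws = [sample_top_k(logits, top_k=2, rng=rng) for _ in range(200)]
print(sorted(set(draws)))
```

With `top_k=2` only the two highest-scoring characters (indices 0 and 3 here) can ever be drawn, which keeps unlikely characters out of the generated text while still leaving some randomness.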
        

In order to fully understand how exactly the LSTM model works, I will manually perform one iteration of the training loop presented below. The goal is to show the steps taken while processing one batch of the training set.

In [10]:
def train(net, data, epochs=10, n_seqs=10, n_steps=50, lr=0.001, clip=5, val_frac=0.1, cuda=False, print_every=10):
    ''' Training a network 
    
        Arguments
        ---------
        
        net: CharRNN network
        data: text data to train the network
        epochs: Number of epochs to train
        n_seqs: Number of mini-sequences per mini-batch, aka batch size
        n_steps: Number of character steps per mini-batch
        lr: learning rate
        clip: gradient clipping
        val_frac: Fraction of data to hold out for validation
        cuda: Train with CUDA on a GPU
        print_every: Number of steps for printing training and validation loss
    
    '''
    
    net.train()
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    
    # create training and validation data
    val_idx = int(len(data)*(1-val_frac))
    data, val_data = data[:val_idx], data[val_idx:]
    
    if cuda:
        net.cuda()
    
    counter = 0
    n_chars = len(net.chars)
    for e in range(epochs):
        h = net.init_hidden(n_seqs)
        for x, y in get_batches(data, n_seqs, n_steps):
            counter += 1
            
            # One-hot encode our data and make them Torch tensors
            x = one_hot_encode(x, n_chars)
            inputs, targets = torch.from_numpy(x), torch.from_numpy(y)
            
            if cuda:
                inputs, targets = inputs.cuda(), targets.cuda()

            # Creating new variables for the hidden state, otherwise
            # we'd backprop through the entire training history
            h = tuple([each.data for each in h])

            net.zero_grad()
            
            output, h = net.forward(inputs, h)
            loss = criterion(output, targets.view(n_seqs*n_steps))

            loss.backward()
            
            # `clip_grad_norm_` helps prevent the exploding gradient problem in RNNs / LSTMs.
            nn.utils.clip_grad_norm_(net.parameters(), clip)

            opt.step()
            
            if counter % print_every == 0:
                
                # Get validation loss
                val_h = net.init_hidden(n_seqs)
                val_losses = []
                for x, y in get_batches(val_data, n_seqs, n_steps):
                    # One-hot encode our data and make them Torch tensors
                    x = one_hot_encode(x, n_chars)
                    x, y = torch.from_numpy(x), torch.from_numpy(y)
                    
                    # Creating new variables for the hidden state, otherwise
                    # we'd backprop through the entire training history
                    val_h = tuple([each.data for each in val_h])
                    
                    inputs, targets = x, y
                    if cuda:
                        inputs, targets = inputs.cuda(), targets.cuda()

                    output, val_h = net.forward(inputs, val_h)
                    val_loss = criterion(output, targets.view(n_seqs*n_steps))
                
                    val_losses.append(val_loss.item())
                
                print("Epoch: {}/{}...".format(e+1, epochs),
                      "Step: {}...".format(counter),
                      "Loss: {:.4f}...".format(loss.item()),
                      "Val Loss: {:.4f}".format(np.mean(val_losses)))

# Training of one batch: step-by-step
First we have to define the network with its hyperparameters

In [11]:
if 'net' in locals():
    del net

# define and print the net
net = CharRNN(chars, n_hidden=512, n_layers=2)
print(net)

CharRNN(
  (lstm): LSTM(83, 512, num_layers=2, batch_first=True, dropout=0.5)
  (dropout): Dropout(p=0.5)
  (fc): Linear(in_features=512, out_features=83, bias=True)
)


So the structure is as follows:
1. We have an input layer of size 83, because that is the size of our vocabulary and every character is one-hot encoded into a vector of size 83
2. Next, the inputs go through 2 LSTM layers with 512 hidden units each
3. Finally, the second LSTM layer is connected to a fully-connected layer, which takes an input of size 512 and outputs a vector of length 83<br>
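
To get a feeling for the size of this network, the number of trainable parameters can be worked out from the three sizes alone. The sketch below is plain arithmetic; the helper `lstm_layer_params` is a name I made up, and it assumes PyTorch's layout of four gates per LSTM layer with separate input and recurrent bias vectors.

```python
# Sizes taken from the printed model above.
n_chars, n_hidden, n_layers = 83, 512, 2

def lstm_layer_params(input_size, hidden_size):
    # Four gates, each with input weights, recurrent weights and two bias vectors
    # (assumption: separate input and recurrent biases, as PyTorch stores them).
    return 4 * hidden_size * (input_size + hidden_size) + 2 * 4 * hidden_size

layer_1 = lstm_layer_params(n_chars, n_hidden)     # first layer reads one-hot vectors
layer_2 = lstm_layer_params(n_hidden, n_hidden)    # second layer reads layer 1 output
fc = n_hidden * n_chars + n_chars                  # weights plus biases

total = layer_1 + layer_2 + fc
print(layer_1, layer_2, fc, total)   # 1222656 2101248 42579 3366483
```

Most of the roughly 3.4 million parameters sit in the second LSTM layer, because its input is the 512-dimensional output of the first layer rather than an 83-dimensional one-hot vector.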

#### First we turn on train mode in network, declare optimizer and our loss function

In [12]:
net.train()
opt = torch.optim.Adam(net.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

Then we have our encoded data, in which every character is mapped to its value in the `char2int` dictionary.<br>
Overall we have nearly 2 million characters to process during one epoch.

In [13]:
len(encoded)

1985223

The next step is to divide the data set into training and validation sets

In [14]:
# hold out the last 10% of the data as the validation set
val_frac = 0.1
data = encoded
val_idx = int(len(data)*(1-val_frac))
data, val_data = data[:val_idx], data[val_idx:]
print(len(data), len(val_data))

1786700 198523
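
From the size of the training split we can already work out how many optimizer steps one epoch contains. The helper name `batches_per_epoch` is made up for this sketch; the two settings are the ones used in this notebook (10 x 50 for the step-by-step walkthrough, 128 x 100 for the full training run).

```python
n_train = 1786700   # length of the training split printed above

def batches_per_epoch(n_chars_total, n_seqs, n_steps):
    # Only complete batches of n_seqs * n_steps characters are used.
    return n_chars_total // (n_seqs * n_steps)

small = batches_per_epoch(n_train, n_seqs=10, n_steps=50)
large = batches_per_epoch(n_train, n_seqs=128, n_steps=100)
print(small, large)   # 3573 139
```

The value of 139 steps per epoch for the larger batches agrees with the copied training log, where step 140 is the first one reported for epoch 2.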


#### Next we are entering training loop

In [15]:
# We need to define some variables to emulate the train process

# batch_size
n_seqs = 10

# sequence length
n_steps = 50

# number of distinct characters
n_chars = len(net.chars)
print(n_chars)

83


Before entering the batch loop (the loop in which all batches are fed to the network) we have to initialize the hidden state, so that the memory is cleared at the start of every epoch.<br>After that we get our batch

In [16]:
x, y = next(get_batches(data, n_seqs, n_steps))
print(x.shape)

(10, 50)


So our batch is a matrix of shape (10, 50), meaning that we have 10 sequences of 50 characters each. After getting our batch we have to one-hot encode it.

In [17]:
x = one_hot_encode(x, n_chars)
print(x.shape)

(10, 50, 83)


After that, each character in x is transformed into a vector of size 83 containing a single 1, with zeros everywhere else.<br>
Having done that, we have to transform x and y into torch tensors.

In [18]:
inputs, targets = torch.from_numpy(x), torch.from_numpy(y)
print(inputs.shape, targets.shape)

torch.Size([10, 50, 83]) torch.Size([10, 50])


Before processing the first batch we initialize the hidden layer, which is a tuple of `(hidden state, cell state)`, so roughly `(short term memory, long term memory)`.<br>
Each of these tensors has the dimensions `(number of LSTM layers, batch size/number of sequences, number of hidden nodes) = (2, 10, 512)`

In [19]:
h = net.init_hidden(n_seqs)
print(h[0].shape)

torch.Size([2, 10, 512])


We create new variables for the hidden state; otherwise we would backpropagate through the entire training history

In [20]:
h = tuple([each.data for each in h])

After that we conduct the forward pass: `output, h = net.forward(inputs, h)`. Instead of calling it directly, I will emulate what the forward method does, step by step.

In [21]:
# just before forward pass we zero the gradients
net.zero_grad()

# first our input has to go through the 2 LSTM layers:
x, (h,c) = net.lstm(inputs, h)

# for each character in our batch we get 512 hidden-unit values, as the shape printed below shows
print(x.shape)

# shapes of the short term memory (h) and the long term memory (c)
print(h.shape, c.shape)

torch.Size([10, 50, 512])
torch.Size([2, 10, 512]) torch.Size([2, 10, 512])


In [22]:
# after that we go through dropout
x = net.dropout(x)
print(x.shape)

torch.Size([10, 50, 512])


After that we stack up the LSTM outputs (the output of the last LSTM layer at every time step), meaning that we flatten the batch of shape `(10, 50)` into one sequence of `500` characters in order to put it through the fully connected layer

In [23]:
x = x.view(x.size()[0]*x.size()[1], net.n_hidden)
print(x.shape)

torch.Size([500, 512])
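
It is worth checking that this flattening keeps every output row aligned with its target, since the targets are flattened the same way before the loss is computed. The sketch below uses NumPy's `reshape` as a stand-in for `view` (both are row-major for a contiguous array) and toy sizes instead of 10, 50 and 512; the fill values are made up so each row can be identified.

```python
import numpy as np

n_seqs, n_steps, n_hidden = 2, 3, 4   # toy sizes instead of 10, 50, 512

# Fake LSTM output in which every vector is filled with seq * 10 + step,
# so we can tell where each row ends up after flattening.
out = np.zeros((n_seqs, n_steps, n_hidden))
for s in range(n_seqs):
    for t in range(n_steps):
        out[s, t, :] = s * 10 + t

flat = out.reshape(n_seqs * n_steps, n_hidden)

# Targets laid out the same way: one label per (sequence, step) pair.
targets = np.array([[0, 1, 2],
                    [10, 11, 12]])
flat_targets = targets.reshape(n_seqs * n_steps)

print(flat[:, 0])      # [ 0.  1.  2. 10. 11. 12.]
print(flat_targets)    # [ 0  1  2 10 11 12]
```

All steps of the first sequence come first, followed by all steps of the second, and the flattened targets follow exactly the same order.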


Finally we put that through fully connected layer

In [24]:
x = net.fc(x)
print(x.shape)

torch.Size([500, 83])
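
Each of the 500 rows now holds 83 unnormalised scores, one per character. Cross-entropy turns these scores into a single number; the NumPy function below is a sketch of that computation for illustration (the name `cross_entropy` and the random labels are made up, and this is not PyTorch's implementation).

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean negative log-likelihood of `targets` under softmax(`logits`)."""
    # Log-softmax, computed stably by subtracting the row-wise maximum.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the correct class in every row and average.
    return -log_probs[np.arange(len(targets)), targets].mean()

rng = np.random.default_rng(0)
targets = rng.integers(0, 83, size=500)   # made-up labels, one per character

# A model that scores all 83 characters equally has loss ln(83).
uniform_loss = cross_entropy(np.zeros((500, 83)), targets)
print(uniform_loss, np.log(83))
```

The value ln(83), about 4.42, is a useful reference point: a network that has learned nothing about the text and predicts every character with equal probability would sit at this loss.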


#### Having that matrix we compute loss

In [25]:
# x has dimensions (500, 83), so we need to flatten the targets from (10, 50) to (10 * 50) = (500)
loss = criterion(x, targets.view(n_seqs * n_steps))

# calculate gradients
loss.backward()

# `clip_grad_norm_` helps prevent the exploding gradient problem in RNNs / LSTMs.
clip = 5
nn.utils.clip_grad_norm_(net.parameters(), clip)

# use Adam optimizer to update weights
opt.step()

# THE END
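
The clipping step in the cell above rescales all gradients together whenever their joint norm exceeds `clip`. Here is a NumPy sketch of that idea; `clip_by_global_norm` and the two gradient arrays are made up for the example, and the small constant in the denominator is my assumption about how division by zero is usually avoided.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Scale a list of gradient arrays so their joint L2 norm is at most `max_norm`."""
    total_norm = np.sqrt(sum(float((g ** 2).sum()) for g in grads))
    # Only shrink, never enlarge; the 1e-6 guards against division by zero.
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    return [g * scale for g in grads], total_norm

grads = [np.array([3.0, 4.0]), np.array([[12.0]])]   # made-up gradients
clipped, norm_before = clip_by_global_norm(grads, max_norm=5)
norm_after = np.sqrt(sum(float((g ** 2).sum()) for g in clipped))
print(norm_before, norm_after)
```

The joint norm of the toy gradients is 13, so every entry is scaled by about 5/13. The direction of the update is preserved and only its length is limited.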

# Actual training
It is finally time to actually train the network. I will start with some given hyperparameters and then introduce a strategy for finding the best combination of them. Training will be performed on a GPU.

In [26]:
if 'net' in locals():
    del net

# define and print the net
net = CharRNN(chars, n_hidden=512, n_layers=2)
print(net)

CharRNN(
  (lstm): LSTM(83, 512, num_layers=2, batch_first=True, dropout=0.5)
  (dropout): Dropout(p=0.5)
  (fc): Linear(in_features=512, out_features=83, bias=True)
)


The training was performed in a Udacity workspace with the GPU enabled.<br>Below I have simply copied the log of the loss over 25 epochs.

In [None]:
n_seqs, n_steps = 128, 100

train(net, encoded, epochs=25, n_seqs=n_seqs, n_steps=n_steps, lr=0.001, cuda=True, print_every=10)

Epoch: 1/25... Step: 10... Loss: 3.3159... Val Loss: 3.3093<br>Epoch: 1/25... Step: 20... Loss: 3.1873... Val Loss: 3.2018<br>Epoch: 1/25... Step: 30... Loss: 3.1056... Val Loss: 3.0856<br>Epoch: 1/25... Step: 40... Loss: 2.9297... Val Loss: 2.9325<br>Epoch: 1/25... Step: 50... Loss: 2.7908... Val Loss: 2.7526<br>Epoch: 1/25... Step: 60... Loss: 2.6358... Val Loss: 2.6394<br>Epoch: 1/25... Step: 70... Loss: 2.5551... Val Loss: 2.5683<br>Epoch: 1/25... Step: 80... Loss: 2.4930... Val Loss: 2.5113<br>Epoch: 1/25... Step: 90... Loss: 2.4619... Val Loss: 2.4654<br>Epoch: 1/25... Step: 100... Loss: 2.4012... Val Loss: 2.4249<br>Epoch: 1/25... Step: 110... Loss: 2.3536... Val Loss: 2.3902<br>Epoch: 1/25... Step: 120... Loss: 2.2958... Val Loss: 2.3639<br>Epoch: 1/25... Step: 130... Loss: 2.3073... Val Loss: 2.3310<br>Epoch: 2/25... Step: 140... Loss: 2.2821... Val Loss: 2.3068<br>Epoch: 2/25... Step: 150... Loss: 2.2514... Val Loss: 2.2862<br>Epoch: 2/25... Step: 160... Loss: 2.2345... Val Loss: 2.2568<br>Epoch: 2/25... Step: 170... Loss: 2.1947... Val Loss: 2.2368<br>Epoch: 2/25... Step: 180... Loss: 2.1444... Val Loss: 2.2165<br>Epoch: 2/25... Step: 190... Loss: 2.0891... Val Loss: 2.1917<br>Epoch: 2/25... Step: 200... Loss: 2.1047... Val Loss: 2.1715<br>Epoch: 2/25... Step: 210... Loss: 2.0788... Val Loss: 2.1480<br>Epoch: 2/25... Step: 220... Loss: 2.0384... Val Loss: 2.1330<br>Epoch: 2/25... Step: 230... Loss: 2.0574... Val Loss: 2.1126<br>Epoch: 2/25... Step: 240... Loss: 2.0574... Val Loss: 2.0972<br>Epoch: 2/25... Step: 250... Loss: 1.9791... Val Loss: 2.0772<br>Epoch: 2/25... Step: 260... Loss: 1.9577... Val Loss: 2.0597<br>Epoch: 2/25... Step: 270... Loss: 1.9817... Val Loss: 2.0538<br>Epoch: 3/25... Step: 280... Loss: 1.9690... Val Loss: 2.0336<br>Epoch: 3/25... Step: 290... Loss: 1.9669... Val Loss: 2.0236<br>Epoch: 3/25... Step: 300... Loss: 1.9260... Val Loss: 2.0012<br>Epoch: 3/25... Step: 310... Loss: 1.9150... Val Loss: 1.9907<br>Epoch: 3/25... 
Step: 320... Loss: 1.8774... Val Loss: 1.9850<br>Epoch: 3/25... Step: 330... Loss: 1.8600... Val Loss: 1.9686<br>Epoch: 3/25... Step: 340... Loss: 1.9092... Val Loss: 1.9688<br>Epoch: 3/25... Step: 350... Loss: 1.8587... Val Loss: 1.9501<br>Epoch: 3/25... Step: 360... Loss: 1.8098... Val Loss: 1.9428<br>Epoch: 3/25... Step: 370... Loss: 1.8453... Val Loss: 1.9274<br>Epoch: 3/25... Step: 380... Loss: 1.8386... Val Loss: 1.9228<br>Epoch: 3/25... Step: 390... Loss: 1.8054... Val Loss: 1.9142<br>Epoch: 3/25... Step: 400... Loss: 1.7830... Val Loss: 1.9039<br>Epoch: 3/25... Step: 410... Loss: 1.7997... Val Loss: 1.8935<br>Epoch: 4/25... Step: 420... Loss: 1.8049... Val Loss: 1.8995<br>Epoch: 4/25... Step: 430... Loss: 1.7970... Val Loss: 1.8935<br>Epoch: 4/25... Step: 440... Loss: 1.7874... Val Loss: 1.8746<br>Epoch: 4/25... Step: 450... Loss: 1.7329... Val Loss: 1.8633<br>Epoch: 4/25... Step: 460... Loss: 1.7154... Val Loss: 1.8649<br>Epoch: 4/25... Step: 470... Loss: 1.7659... Val Loss: 1.8441<br>Epoch: 4/25... Step: 480... Loss: 1.7373... Val Loss: 1.8343<br>Epoch: 4/25... Step: 490... Loss: 1.7382... Val Loss: 1.8269<br>Epoch: 4/25... Step: 500... Loss: 1.7347... Val Loss: 1.8096<br>Epoch: 4/25... Step: 510... Loss: 1.7092... Val Loss: 1.8068<br>Epoch: 4/25... Step: 520... Loss: 1.7303... Val Loss: 1.8002<br>Epoch: 4/25... Step: 530... Loss: 1.6954... Val Loss: 1.7920<br>Epoch: 4/25... Step: 540... Loss: 1.6561... Val Loss: 1.7857<br>Epoch: 4/25... Step: 550... Loss: 1.7250... Val Loss: 1.7773<br>Epoch: 5/25... Step: 560... Loss: 1.6785... Val Loss: 1.7746<br>Epoch: 5/25... Step: 570... Loss: 1.6542... Val Loss: 1.7725<br>Epoch: 5/25... Step: 580... Loss: 1.6477... Val Loss: 1.7600<br>Epoch: 5/25... Step: 590... Loss: 1.6425... Val Loss: 1.7561<br>Epoch: 5/25... Step: 600... Loss: 1.6433... Val Loss: 1.7512<br>Epoch: 5/25... Step: 610... Loss: 1.6155... Val Loss: 1.7479<br>Epoch: 5/25... Step: 620... Loss: 1.6287... Val Loss: 1.7419<br>Epoch: 5/25... Step: 630... 
Loss: 1.6535... Val Loss: 1.7396<br>Epoch: 5/25... Step: 640... Loss: 1.6032... Val Loss: 1.7283<br>Epoch: 5/25... Step: 650... Loss: 1.6183... Val Loss: 1.7290<br>Epoch: 5/25... Step: 660... Loss: 1.6003... Val Loss: 1.7176<br>Epoch: 5/25... Step: 670... Loss: 1.6108... Val Loss: 1.7121<br>Epoch: 5/25... Step: 680... Loss: 1.6111... Val Loss: 1.7075<br>Epoch: 5/25... Step: 690... Loss: 1.5933... Val Loss: 1.6998<br>Epoch: 6/25... Step: 700... Loss: 1.5820... Val Loss: 1.6995<br>Epoch: 6/25... Step: 710... Loss: 1.5780... Val Loss: 1.6966<br>Epoch: 6/25... Step: 720... Loss: 1.5630... Val Loss: 1.6890<br>Epoch: 6/25... Step: 730... Loss: 1.5711... Val Loss: 1.6838<br>Epoch: 6/25... Step: 740... Loss: 1.5351... Val Loss: 1.6778<br>Epoch: 6/25... Step: 750... Loss: 1.5393... Val Loss: 1.6732<br>Epoch: 6/25... Step: 760... Loss: 1.5733... Val Loss: 1.6745<br>Epoch: 6/25... Step: 770... Loss: 1.5439... Val Loss: 1.6674<br>Epoch: 6/25... Step: 780... Loss: 1.5302... Val Loss: 1.6678<br>Epoch: 6/25... Step: 790... Loss: 1.5147... Val Loss: 1.6688<br>Epoch: 6/25... Step: 800... Loss: 1.5488... Val Loss: 1.6506<br>Epoch: 6/25... Step: 810... Loss: 1.5204... Val Loss: 1.6464<br>Epoch: 6/25... Step: 820... Loss: 1.4854... Val Loss: 1.6457<br>Epoch: 6/25... Step: 830... Loss: 1.5307... Val Loss: 1.6456<br>Epoch: 7/25... Step: 840... Loss: 1.4946... Val Loss: 1.6413<br>Epoch: 7/25... Step: 850... Loss: 1.5114... Val Loss: 1.6342<br>Epoch: 7/25... Step: 860... Loss: 1.4910... Val Loss: 1.6290<br>Epoch: 7/25... Step: 870... Loss: 1.4871... Val Loss: 1.6307<br>Epoch: 7/25... Step: 880... Loss: 1.4952... Val Loss: 1.6224<br>Epoch: 7/25... Step: 890... Loss: 1.4992... Val Loss: 1.6254<br>Epoch: 7/25... Step: 900... Loss: 1.4963... Val Loss: 1.6277<br>Epoch: 7/25... Step: 910... Loss: 1.4472... Val Loss: 1.6142<br>Epoch: 7/25... Step: 920... Loss: 1.4770... Val Loss: 1.6144<br>Epoch: 7/25... Step: 930... Loss: 1.4561... Val Loss: 1.6112<br>Epoch: 7/25... Step: 940... Loss: 1.4617... 
Val Loss: 1.6131<br>Epoch: 7/25... Step: 950... Loss: 1.4788... Val Loss: 1.6054<br>Epoch: 7/25... Step: 960... Loss: 1.4930... Val Loss: 1.6024<br>Epoch: 7/25... Step: 970... Loss: 1.4810... Val Loss: 1.6031<br>Epoch: 8/25... Step: 980... Loss: 1.4477... Val Loss: 1.6051<br>Epoch: 8/25... Step: 990... Loss: 1.4636... Val Loss: 1.5951<br>Epoch: 8/25... Step: 1000... Loss: 1.4532... Val Loss: 1.5906<br>Epoch: 8/25... Step: 1010... Loss: 1.4911... Val Loss: 1.5973<br>Epoch: 8/25... Step: 1020... Loss: 1.4695... Val Loss: 1.5870<br>Epoch: 8/25... Step: 1030... Loss: 1.4444... Val Loss: 1.5850<br>Epoch: 8/25... Step: 1040... Loss: 1.4483... Val Loss: 1.5780<br>Epoch: 8/25... Step: 1050... Loss: 1.4211... Val Loss: 1.5773<br>Epoch: 8/25... Step: 1060... Loss: 1.4228... Val Loss: 1.5742<br>Epoch: 8/25... Step: 1070... Loss: 1.4290... Val Loss: 1.5729<br>Epoch: 8/25... Step: 1080... Loss: 1.4320... Val Loss: 1.5685<br>Epoch: 8/25... Step: 1090... Loss: 1.4218... Val Loss: 1.5744<br>Epoch: 8/25... Step: 1100... Loss: 1.4132... Val Loss: 1.5695<br>Epoch: 8/25... Step: 1110... Loss: 1.4295... Val Loss: 1.5660<br>Epoch: 9/25... Step: 1120... Loss: 1.4236... Val Loss: 1.5612<br>Epoch: 9/25... Step: 1130... Loss: 1.4288... Val Loss: 1.5617<br>Epoch: 9/25... Step: 1140... Loss: 1.4319... Val Loss: 1.5658<br>Epoch: 9/25... Step: 1150... Loss: 1.4372... Val Loss: 1.5612<br>Epoch: 9/25... Step: 1160... Loss: 1.4037... Val Loss: 1.5563<br>Epoch: 9/25... Step: 1170... Loss: 1.4041... Val Loss: 1.5523<br>Epoch: 9/25... Step: 1180... Loss: 1.3926... Val Loss: 1.5528<br>Epoch: 9/25... Step: 1190... Loss: 1.4313... Val Loss: 1.5456<br>Epoch: 9/25... Step: 1200... Loss: 1.3838... Val Loss: 1.5454<br>Epoch: 9/25... Step: 1210... Loss: 1.3912... Val Loss: 1.5534<br>Epoch: 9/25... Step: 1220... Loss: 1.4083... Val Loss: 1.5469<br>Epoch: 9/25... Step: 1230... Loss: 1.3862... Val Loss: 1.5406<br>Epoch: 9/25... Step: 1240... Loss: 1.3935... Val Loss: 1.5354<br>Epoch: 9/25... Step: 1250... 
Loss: 1.3859... Val Loss: 1.5359<br>Epoch: 10/25... Step: 1260... Loss: 1.4033... Val Loss: 1.5369<br>Epoch: 10/25... Step: 1270... Loss: 1.3965... Val Loss: 1.5346<br>Epoch: 10/25... Step: 1280... Loss: 1.4034... Val Loss: 1.5365<br>Epoch: 10/25... Step: 1290... Loss: 1.3915... Val Loss: 1.5305<br>Epoch: 10/25... Step: 1300... Loss: 1.3837... Val Loss: 1.5237<br>Epoch: 10/25... Step: 1310... Loss: 1.3917... Val Loss: 1.5241<br>Epoch: 10/25... Step: 1320... Loss: 1.3571... Val Loss: 1.5268<br>Epoch: 10/25... Step: 1330... Loss: 1.3556... Val Loss: 1.5149<br>Epoch: 10/25... Step: 1340... Loss: 1.3505... Val Loss: 1.5204<br>Epoch: 10/25... Step: 1350... Loss: 1.3432... Val Loss: 1.5282<br>Epoch: 10/25... Step: 1360... Loss: 1.3681... Val Loss: 1.5201<br>Epoch: 10/25... Step: 1370... Loss: 1.3438... Val Loss: 1.5191<br>Epoch: 10/25... Step: 1380... Loss: 1.3807... Val Loss: 1.5114<br>Epoch: 10/25... Step: 1390... Loss: 1.3951... Val Loss: 1.5117<br>Epoch: 11/25... Step: 1400... Loss: 1.3943... Val Loss: 1.5132<br>Epoch: 11/25... Step: 1410... Loss: 1.3973... Val Loss: 1.5121<br>Epoch: 11/25... Step: 1420... Loss: 1.3931... Val Loss: 1.5089<br>Epoch: 11/25... Step: 1430... Loss: 1.3529... Val Loss: 1.5090<br>Epoch: 11/25... Step: 1440... Loss: 1.3750... Val Loss: 1.5089<br>Epoch: 11/25... Step: 1450... Loss: 1.3117... Val Loss: 1.5087<br>Epoch: 11/25... Step: 1460... Loss: 1.3378... Val Loss: 1.5021<br>Epoch: 11/25... Step: 1470... Loss: 1.3335... Val Loss: 1.4986<br>Epoch: 11/25... Step: 1480... Loss: 1.3478... Val Loss: 1.4991<br>Epoch: 11/25... Step: 1490... Loss: 1.3293... Val Loss: 1.4990<br>Epoch: 11/25... Step: 1500... Loss: 1.3224... Val Loss: 1.5034<br>Epoch: 11/25... Step: 1510... Loss: 1.3007... Val Loss: 1.5053<br>Epoch: 11/25... Step: 1520... Loss: 1.3443... Val Loss: 1.4996<br>Epoch: 12/25... Step: 1530... Loss: 1.3903... Val Loss: 1.4923<br>Epoch: 12/25... Step: 1540... Loss: 1.3428... Val Loss: 1.4921<br>Epoch: 12/25... Step: 1550... Loss: 1.3674... 
Val Loss: 1.5067<br>Epoch: 12/25... Step: 1560... Loss: 1.3717... Val Loss: 1.4891<br>Epoch: 12/25... Step: 1570... Loss: 1.3109... Val Loss: 1.4941<br>Epoch: 12/25... Step: 1580... Loss: 1.3049... Val Loss: 1.4908<br>Epoch: 12/25... Step: 1590... Loss: 1.2866... Val Loss: 1.4870<br>Epoch: 12/25... Step: 1600... Loss: 1.3148... Val Loss: 1.4815<br>Epoch: 12/25... Step: 1610... Loss: 1.3099... Val Loss: 1.4816<br>Epoch: 12/25... Step: 1620... Loss: 1.3000... Val Loss: 1.4939<br>Epoch: 12/25... Step: 1630... Loss: 1.3458... Val Loss: 1.4979<br>Epoch: 12/25... Step: 1640... Loss: 1.3063... Val Loss: 1.4900<br>Epoch: 12/25... Step: 1650... Loss: 1.2868... Val Loss: 1.4868<br>Epoch: 12/25... Step: 1660... Loss: 1.3420... Val Loss: 1.4797<br>Epoch: 13/25... Step: 1670... Loss: 1.3007... Val Loss: 1.4791<br>Epoch: 13/25... Step: 1680... Loss: 1.3180... Val Loss: 1.4844<br>Epoch: 13/25... Step: 1690... Loss: 1.2990... Val Loss: 1.4728<br>Epoch: 13/25... Step: 1700... Loss: 1.2951... Val Loss: 1.4825<br>Epoch: 13/25... Step: 1710... Loss: 1.2754... Val Loss: 1.4770<br>Epoch: 13/25... Step: 1720... Loss: 1.2911... Val Loss: 1.4742<br>Epoch: 13/25... Step: 1730... Loss: 1.3257... Val Loss: 1.4724<br>Epoch: 13/25... Step: 1740... Loss: 1.2895... Val Loss: 1.4690<br>Epoch: 13/25... Step: 1750... Loss: 1.2667... Val Loss: 1.4689<br>Epoch: 13/25... Step: 1760... Loss: 1.2911... Val Loss: 1.4685<br>Epoch: 13/25... Step: 1770... Loss: 1.3091... Val Loss: 1.4718<br>Epoch: 13/25... Step: 1780... Loss: 1.2802... Val Loss: 1.4713<br>Epoch: 13/25... Step: 1790... Loss: 1.2763... Val Loss: 1.4728<br>Epoch: 13/25... Step: 1800... Loss: 1.2970... Val Loss: 1.4659<br>Epoch: 14/25... Step: 1810... Loss: 1.3146... Val Loss: 1.4687<br>Epoch: 14/25... Step: 1820... Loss: 1.2907... Val Loss: 1.4791<br>Epoch: 14/25... Step: 1830... Loss: 1.3057... Val Loss: 1.4598<br>Epoch: 14/25... Step: 1840... Loss: 1.2567... Val Loss: 1.4582<br>Epoch: 14/25... Step: 1850... Loss: 1.2383... 
Val Loss: 1.4598<br>Epoch: 14/25... Step: 1860... Loss: 1.3005... Val Loss: 1.4655<br>Epoch: 14/25... Step: 1870... Loss: 1.2972... Val Loss: 1.4575<br>Epoch: 14/25... Step: 1880... Loss: 1.2919... Val Loss: 1.4687<br>Epoch: 14/25... Step: 1890... Loss: 1.3076... Val Loss: 1.4660<br>Epoch: 14/25... Step: 1900... Loss: 1.2872... Val Loss: 1.4543<br>Epoch: 14/25... Step: 1910... Loss: 1.2714... Val Loss: 1.4595<br>Epoch: 14/25... Step: 1920... Loss: 1.2844... Val Loss: 1.4638<br>Epoch: 14/25... Step: 1930... Loss: 1.2511... Val Loss: 1.4607<br>Epoch: 14/25... Step: 1940... Loss: 1.2981... Val Loss: 1.4574<br>Epoch: 15/25... Step: 1950... Loss: 1.2744... Val Loss: 1.4683<br>Epoch: 15/25... Step: 1960... Loss: 1.2715... Val Loss: 1.4550<br>Epoch: 15/25... Step: 1970... Loss: 1.2739... Val Loss: 1.4469<br>Epoch: 15/25... Step: 1980... Loss: 1.2490... Val Loss: 1.4507<br>Epoch: 15/25... Step: 1990... Loss: 1.2513... Val Loss: 1.4609<br>Epoch: 15/25... Step: 2000... Loss: 1.2359... Val Loss: 1.4561<br>Epoch: 15/25... Step: 2010... Loss: 1.2736... Val Loss: 1.4578<br>Epoch: 15/25... Step: 2020... Loss: 1.2806... Val Loss: 1.4551<br>Epoch: 15/25... Step: 2030... Loss: 1.2510... Val Loss: 1.4521<br>Epoch: 15/25... Step: 2040... Loss: 1.2701... Val Loss: 1.4490<br>Epoch: 15/25... Step: 2050... Loss: 1.2614... Val Loss: 1.4533<br>Epoch: 15/25... Step: 2060... Loss: 1.2566... Val Loss: 1.4474<br>Epoch: 15/25... Step: 2070... Loss: 1.2693... Val Loss: 1.4444<br>Epoch: 15/25... Step: 2080... Loss: 1.2649... Val Loss: 1.4406<br>Epoch: 16/25... Step: 2090... Loss: 1.2665... Val Loss: 1.4460<br>Epoch: 16/25... Step: 2100... Loss: 1.2546... Val Loss: 1.4525<br>Epoch: 16/25... Step: 2110... Loss: 1.2442... Val Loss: 1.4526<br>Epoch: 16/25... Step: 2120... Loss: 1.2638... Val Loss: 1.4473<br>Epoch: 16/25... Step: 2130... Loss: 1.2401... Val Loss: 1.4415<br>Epoch: 16/25... Step: 2140... Loss: 1.2537... Val Loss: 1.4457<br>Epoch: 16/25... Step: 2150... Loss: 1.2769... 
Val Loss: 1.4383<br>Epoch: 16/25... Step: 2160... Loss: 1.2489... Val Loss: 1.4389<br>Epoch: 16/25... Step: 2170... Loss: 1.2383... Val Loss: 1.4406<br>Epoch: 16/25... Step: 2180... Loss: 1.2312... Val Loss: 1.4310<br>Epoch: 16/25... Step: 2190... Loss: 1.2602... Val Loss: 1.4316<br>Epoch: 16/25... Step: 2200... Loss: 1.2356... Val Loss: 1.4425<br>Epoch: 16/25... Step: 2210... Loss: 1.2032... Val Loss: 1.4251<br>Epoch: 16/25... Step: 2220... Loss: 1.2519... Val Loss: 1.4344<br>Epoch: 17/25... Step: 2230... Loss: 1.2266... Val Loss: 1.4337<br>Epoch: 17/25... Step: 2240... Loss: 1.2406... Val Loss: 1.4349<br>Epoch: 17/25... Step: 2250... Loss: 1.2180... Val Loss: 1.4317<br>Epoch: 17/25... Step: 2260... Loss: 1.2314... Val Loss: 1.4355<br>Epoch: 17/25... Step: 2270... Loss: 1.2425... Val Loss: 1.4373<br>Epoch: 17/25... Step: 2280... Loss: 1.2526... Val Loss: 1.4278<br>Epoch: 17/25... Step: 2290... Loss: 1.2497... Val Loss: 1.4296<br>Epoch: 17/25... Step: 2300... Loss: 1.2055... Val Loss: 1.4337<br>Epoch: 17/25... Step: 2310... Loss: 1.2305... Val Loss: 1.4362<br>Epoch: 17/25... Step: 2320... Loss: 1.2166... Val Loss: 1.4241<br>Epoch: 17/25... Step: 2330... Loss: 1.2081... Val Loss: 1.4306<br>Epoch: 17/25... Step: 2340... Loss: 1.2462... Val Loss: 1.4281<br>Epoch: 17/25... Step: 2350... Loss: 1.2305... Val Loss: 1.4277<br>Epoch: 17/25... Step: 2360... Loss: 1.2385... Val Loss: 1.4270<br>Epoch: 18/25... Step: 2370... Loss: 1.2096... Val Loss: 1.4187<br>Epoch: 18/25... Step: 2380... Loss: 1.2114... Val Loss: 1.4218<br>Epoch: 18/25... Step: 2390... Loss: 1.2315... Val Loss: 1.4169<br>Epoch: 18/25... Step: 2400... Loss: 1.2432... Val Loss: 1.4329<br>Epoch: 18/25... Step: 2410... Loss: 1.2376... Val Loss: 1.4322<br>Epoch: 18/25... Step: 2420... Loss: 1.2207... Val Loss: 1.4278<br>Epoch: 18/25... Step: 2430... Loss: 1.2163... Val Loss: 1.4174<br>Epoch: 18/25... Step: 2440... Loss: 1.2131... Val Loss: 1.4222<br>Epoch: 18/25... Step: 2450... Loss: 1.2042... 
Val Loss: 1.4207<br>Epoch: 18/25... Step: 2460... Loss: 1.2079... Val Loss: 1.4238<br>Epoch: 18/25... Step: 2470... Loss: 1.2153... Val Loss: 1.4208<br>Epoch: 18/25... Step: 2480... Loss: 1.1968... Val Loss: 1.4229<br>Epoch: 18/25... Step: 2490... Loss: 1.1980... Val Loss: 1.4055<br>Epoch: 18/25... Step: 2500... Loss: 1.2011... Val Loss: 1.4180<br>Epoch: 19/25... Step: 2510... Loss: 1.2024... Val Loss: 1.4171<br>Epoch: 19/25... Step: 2520... Loss: 1.2278... Val Loss: 1.4091<br>Epoch: 19/25... Step: 2530... Loss: 1.2273... Val Loss: 1.4123<br>Epoch: 19/25... Step: 2540... Loss: 1.2346... Val Loss: 1.4162<br>Epoch: 19/25... Step: 2550... Loss: 1.2017... Val Loss: 1.4152<br>Epoch: 19/25... Step: 2560... Loss: 1.2127... Val Loss: 1.4294<br>Epoch: 19/25... Step: 2570... Loss: 1.2070... Val Loss: 1.4064<br>Epoch: 19/25... Step: 2580... Loss: 1.2283... Val Loss: 1.4084<br>Epoch: 19/25... Step: 2590... Loss: 1.1904... Val Loss: 1.4187<br>Epoch: 19/25... Step: 2600... Loss: 1.1897... Val Loss: 1.4105<br>Epoch: 19/25... Step: 2610... Loss: 1.2033... Val Loss: 1.4099<br>Epoch: 19/25... Step: 2620... Loss: 1.1718... Val Loss: 1.4165<br>Epoch: 19/25... Step: 2630... Loss: 1.1813... Val Loss: 1.4095<br>Epoch: 19/25... Step: 2640... Loss: 1.1980... Val Loss: 1.4104<br>Epoch: 20/25... Step: 2650... Loss: 1.2074... Val Loss: 1.4076<br>Epoch: 20/25... Step: 2660... Loss: 1.2030... Val Loss: 1.4070<br>Epoch: 20/25... Step: 2670... Loss: 1.2150... Val Loss: 1.4029<br>Epoch: 20/25... Step: 2680... Loss: 1.2049... Val Loss: 1.4089<br>Epoch: 20/25... Step: 2690... Loss: 1.1914... Val Loss: 1.4067<br>Epoch: 20/25... Step: 2700... Loss: 1.2063... Val Loss: 1.4005<br>Epoch: 20/25... Step: 2710... Loss: 1.1731... Val Loss: 1.4072<br>Epoch: 20/25... Step: 2720... Loss: 1.1674... Val Loss: 1.4124<br>Epoch: 20/25... Step: 2730... Loss: 1.1641... Val Loss: 1.4052<br>Epoch: 20/25... Step: 2740... Loss: 1.1678... Val Loss: 1.4027<br>Epoch: 20/25... Step: 2750... Loss: 1.1806... 
Val Loss: 1.4015<br>Epoch: 20/25... Step: 2760... Loss: 1.1636... Val Loss: 1.4095<br>Epoch: 20/25... Step: 2770... Loss: 1.2028... Val Loss: 1.4045<br>Epoch: 20/25... Step: 2780... Loss: 1.2315... Val Loss: 1.4034<br>Epoch: 21/25... Step: 2790... Loss: 1.2162... Val Loss: 1.4008<br>Epoch: 21/25... Step: 2800... Loss: 1.2212... Val Loss: 1.4046<br>Epoch: 21/25... Step: 2810... Loss: 1.2181... Val Loss: 1.4030<br>Epoch: 21/25... Step: 2820... Loss: 1.1858... Val Loss: 1.4028<br>Epoch: 21/25... Step: 2830... Loss: 1.2059... Val Loss: 1.4084<br>Epoch: 21/25... Step: 2840... Loss: 1.1486... Val Loss: 1.4002<br>Epoch: 21/25... Step: 2850... Loss: 1.1739... Val Loss: 1.4011<br>Epoch: 21/25... Step: 2860... Loss: 1.1581... Val Loss: 1.4108<br>Epoch: 21/25... Step: 2870... Loss: 1.1800... Val Loss: 1.4096<br>Epoch: 21/25... Step: 2880... Loss: 1.1703... Val Loss: 1.4025<br>Epoch: 21/25... Step: 2890... Loss: 1.1635... Val Loss: 1.3980<br>Epoch: 21/25... Step: 2900... Loss: 1.1407... Val Loss: 1.4059<br>Epoch: 21/25... Step: 2910... Loss: 1.2229... Val Loss: 1.4270<br>Epoch: 22/25... Step: 2920... Loss: 1.2787... Val Loss: 1.4140<br>Epoch: 22/25... Step: 2930... Loss: 1.2063... Val Loss: 1.4057<br>Epoch: 22/25... Step: 2940... Loss: 1.1982... Val Loss: 1.4058<br>Epoch: 22/25... Step: 2950... Loss: 1.2099... Val Loss: 1.3976<br>Epoch: 22/25... Step: 2960... Loss: 1.1736... Val Loss: 1.4033<br>Epoch: 22/25... Step: 2970... Loss: 1.1567... Val Loss: 1.4017<br>Epoch: 22/25... Step: 2980... Loss: 1.1463... Val Loss: 1.4027<br>Epoch: 22/25... Step: 2990... Loss: 1.1713... Val Loss: 1.4008<br>Epoch: 22/25... Step: 3000... Loss: 1.1607... Val Loss: 1.4054<br>Epoch: 22/25... Step: 3010... Loss: 1.1530... Val Loss: 1.4167<br>Epoch: 22/25... Step: 3020... Loss: 1.1811... Val Loss: 1.4064<br>Epoch: 22/25... Step: 3030... Loss: 1.1526... Val Loss: 1.3998<br>Epoch: 22/25... Step: 3040... Loss: 1.1459... Val Loss: 1.4009<br>Epoch: 22/25... Step: 3050... Loss: 1.1944... 
Val Loss: 1.3943<br>Epoch: 23/25... Step: 3060... Loss: 1.1658... Val Loss: 1.3946<br>Epoch: 23/25... Step: 3070... Loss: 1.1700... Val Loss: 1.3895<br>Epoch: 23/25... Step: 3080... Loss: 1.1574... Val Loss: 1.3829<br>Epoch: 23/25... Step: 3090... Loss: 1.1556... Val Loss: 1.3820<br>Epoch: 23/25... Step: 3100... Loss: 1.1351... Val Loss: 1.3933<br>Epoch: 23/25... Step: 3110... Loss: 1.1485... Val Loss: 1.3930<br>Epoch: 23/25... Step: 3120... Loss: 1.1766... Val Loss: 1.3924<br>Epoch: 23/25... Step: 3130... Loss: 1.1540... Val Loss: 1.3909<br>Epoch: 23/25... Step: 3140... Loss: 1.1190... Val Loss: 1.3997<br>Epoch: 23/25... Step: 3150... Loss: 1.1522... Val Loss: 1.4046<br>Epoch: 23/25... Step: 3160... Loss: 1.1726... Val Loss: 1.3917<br>Epoch: 23/25... Step: 3170... Loss: 1.1372... Val Loss: 1.3902<br>Epoch: 23/25... Step: 3180... Loss: 1.1235... Val Loss: 1.4004<br>Epoch: 23/25... Step: 3190... Loss: 1.1626... Val Loss: 1.3864<br>Epoch: 24/25... Step: 3200... Loss: 1.1751... Val Loss: 1.3870<br>Epoch: 24/25... Step: 3210... Loss: 1.1394... Val Loss: 1.3920<br>Epoch: 24/25... Step: 3220... Loss: 1.1766... Val Loss: 1.3808<br>Epoch: 24/25... Step: 3230... Loss: 1.1254... Val Loss: 1.3886<br>Epoch: 24/25... Step: 3240... Loss: 1.1046... Val Loss: 1.4000<br>Epoch: 24/25... Step: 3250... Loss: 1.1602... Val Loss: 1.4034<br>Epoch: 24/25... Step: 3260... Loss: 1.1723... Val Loss: 1.3969<br>Epoch: 24/25... Step: 3270... Loss: 1.1700... Val Loss: 1.3906<br>Epoch: 24/25... Step: 3280... Loss: 1.1739... Val Loss: 1.3903<br>Epoch: 24/25... Step: 3290... Loss: 1.1553... Val Loss: 1.3923<br>Epoch: 24/25... Step: 3300... Loss: 1.1445... Val Loss: 1.3922<br>Epoch: 24/25... Step: 3310... Loss: 1.1489... Val Loss: 1.3919<br>Epoch: 24/25... Step: 3320... Loss: 1.1202... Val Loss: 1.3979<br>Epoch: 24/25... Step: 3330... Loss: 1.1774... Val Loss: 1.3878<br>Epoch: 25/25... Step: 3340... Loss: 1.1549... Val Loss: 1.3821<br>Epoch: 25/25... Step: 3350... Loss: 1.1438... 
Val Loss: 1.3912<br>Epoch: 25/25... Step: 3360... Loss: 1.1271... Val Loss: 1.3932<br>Epoch: 25/25... Step: 3370... Loss: 1.1367... Val Loss: 1.3955<br>Epoch: 25/25... Step: 3380... Loss: 1.1297... Val Loss: 1.3961<br>Epoch: 25/25... Step: 3390... Loss: 1.1254... Val Loss: 1.3958<br>Epoch: 25/25... Step: 3400... Loss: 1.1462... Val Loss: 1.4001<br>Epoch: 25/25... Step: 3410... Loss: 1.1525... Val Loss: 1.3992<br>Epoch: 25/25... Step: 3420... Loss: 1.1297... Val Loss: 1.3957<br>Epoch: 25/25... Step: 3430... Loss: 1.1515... Val Loss: 1.3939<br>Epoch: 25/25... Step: 3440... Loss: 1.1242... Val Loss: 1.3945<br>Epoch: 25/25... Step: 3450... Loss: 1.1464... Val Loss: 1.4006<br>Epoch: 25/25... Step: 3460... Loss: 1.1599... Val Loss: 1.3969<br>Epoch: 25/25... Step: 3470... Loss: 1.1557... Val Loss: 1.3970

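The model is probably overfitting: the gap between training loss and validation loss widens as training goes on (from roughly 0.12 at step 1560 to roughly 0.24 at step 3470), and the validation loss has flattened out near 1.39 to 1.40 by the last epochs.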
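To make the widening gap concrete, here is a minimal sketch that parses two entries copied from the training log above and prints the difference between validation and training loss. The regular expression and the variable names (`log`, `pattern`, `gaps`) are my own illustration, not part of the original training code.

```python
import re

# Two entries copied verbatim from the training log above
# (the first and the last complete entries shown in this part of the output).
log = (
    "Epoch: 12/25... Step: 1560... Loss: 1.3717... Val Loss: 1.4891\n"
    "Epoch: 25/25... Step: 3470... Loss: 1.1557... Val Loss: 1.3970\n"
)

# Capture the step number, the training loss and the validation loss.
pattern = re.compile(r"Step: (\d+)\.\.\. Loss: (\d+\.\d+)\.\.\. Val Loss: (\d+\.\d+)")

gaps = []
for step, train_loss, val_loss in pattern.findall(log):
    # Gap = validation loss minus training loss; a growing gap hints at overfitting.
    gaps.append((int(step), float(val_loss) - float(train_loss)))

for step, gap in gaps:
    print(f"step {step}: gap = {gap:.4f}")
```

The same loop could be run over the full log to plot the gap per step, which would show the trend more reliably than two points.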

#### Saving trained model
These commands were used in the Udacity workspace to save the trained model. I downloaded the checkpoint and load it here to make predictions and add a few lines to Anna Karenina :)

In [None]:
# change the name, for saving multiple files
model_name = 'rnn_1_epoch.net'

checkpoint = {'n_hidden': net.n_hidden,
              'n_layers': net.n_layers,
              'state_dict': net.state_dict(),
              'tokens': net.chars}

with open(model_name, 'wb') as f:
    torch.save(checkpoint, f)

#### Loading trained model


In [39]:
# Load the checkpoint of the model trained for 25 epochs (`LSTM_25_epoch.net`)
with open('models/LSTM_25_epoch.net', 'rb') as f:
    checkpoint = torch.load(f, map_location='cpu')
    
loaded = CharRNN(checkpoint['tokens'], n_hidden=checkpoint['n_hidden'], n_layers=checkpoint['n_layers'])
loaded.load_state_dict(checkpoint['state_dict'])
print(loaded)

CharRNN(
  (lstm): LSTM(83, 512, num_layers=2, batch_first=True, dropout=0.5)
  (dropout): Dropout(p=0.5)
  (fc): Linear(in_features=512, out_features=83, bias=True)
)


# Sampling

Now that the model is trained, we'll want to sample from it. To sample, we pass in a character and have the network predict the next character. Then we take that character, pass it back in, and get another predicted character. Just keep doing this and you'll generate a bunch of text!

## Top K sampling

Our predictions come from a categorical probability distribution over all the possible characters. We can make the sampled text more reasonable by only considering the $K$ most probable characters at each step. This prevents the network from giving us completely absurd characters while still allowing some noise and randomness in the sampled text.
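The idea can be sketched without any framework. The snippet below is only an illustration using Python's standard `random` module; the function name `top_k_sample` and the toy distribution `probs` are hypothetical and are not the notebook's `net.predict` implementation.

```python
import random

def top_k_sample(probs, k, rng=random):
    """Sample an index from `probs`, restricted to the k most probable entries."""
    # Indices of the k largest probabilities.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Weights of the surviving candidates; random.choices treats them as
    # relative weights, so no explicit renormalization is needed.
    weights = [probs[i] for i in top]
    return rng.choices(top, weights=weights, k=1)[0]

# Toy distribution over 5 "characters" (made-up numbers for illustration).
probs = [0.05, 0.40, 0.10, 0.30, 0.15]
rng = random.Random(0)  # seeded generator so the draws are repeatable
draws = [top_k_sample(probs, k=2, rng=rng) for _ in range(1000)]

# Only the two most probable indices (1 and 3) can ever be drawn.
print(sorted(set(draws)))
```

With `k=1` this reduces to always picking the most probable character, and with `k` equal to the vocabulary size it samples from the full distribution.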

Typically you'll want to prime the network so you can build up a hidden state. Otherwise the network will start out generating characters at random. In general the first bunch of characters will be a little rough since it hasn't built up a long history of characters to predict from.

In [37]:
def sample(net, size, prime='The', top_k=None, cuda=False):
        
    if cuda:
        net.cuda()
    else:
        net.cpu()

    net.eval()
    
    # First off, run through the prime characters to build up the hidden state
    chars = [ch for ch in prime]
    h = net.init_hidden(1)
    for ch in prime:
        char, h = net.predict(ch, h, cuda=cuda, top_k=top_k)

    # Keep only the prediction made after the last prime character
    chars.append(char)
    
    # Now pass in the previous character and get a new one
    for ii in range(size):
        char, h = net.predict(chars[-1], h, cuda=cuda, top_k=top_k)
        chars.append(char)

    return ''.join(chars)

In [41]:
print(sample(loaded, 2000, prime="Anna", top_k=5, cuda=False))

Anna, there was something ever
thinking of in suppositions. Anna thanks that him was not to be
attone on the stay, but he could not help hing, and so much that
that impression that were such ansolucient. This is true it was so
life as to the conversation in this passionate. All the same as he would have a
significance were as handed a still strees, as he had been in
the previous of the conversations on the same smile of supprusting on his
cale. And she was a conviction. Though they would have long ago to drive
on that husband had been saw such intension on a position of their
mind. To their point out the staties and handsome
creatures to all the completion that is what I have been to be an
instance," said Simply to him on the crowd, and a cease of the clears, went in it,
and so that there was something to seath. The muscle of the propinest came
about at the sick man that she was saying through them,
that though to be confidently asking herself that were not simply
that the corruct, bet

Well, maybe it doesn't make much sense yet, but the network learned to generate real words, some punctuation, paragraphs and even quoted speech. Adding more regularization (for example a higher dropout probability) would probably reduce the overfitting. On the architecture side, decreasing the number of hidden units might help, and a model with a single LSTM layer might also be worth checking.