# Anna KaRNNa

In this notebook, we'll build a character-wise RNN trained on Anna Karenina, one of my all-time favorite books. It'll be able to generate new text based on the text from the book.

This network is based on Andrej Karpathy's [post on RNNs](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) and [implementation in Torch](https://github.com/karpathy/char-rnn). It also draws on material [here at r2rt](http://r2rt.com/recurrent-neural-networks-in-tensorflow-ii.html) and from [Sherjil Ozair](https://github.com/sherjilozair/char-rnn-tensorflow) on GitHub. Below is the general architecture of the character-wise RNN.

<img src="assets/charseq.jpeg" width="500">

In [98]:
import time
from collections import namedtuple

import numpy as np
import tensorflow as tf

First we'll load the text file and convert it into integers for our network to use. Here I'm creating a couple of dictionaries to convert the characters to and from integers. Encoding the characters as integers makes them easier to use as input to the network.

In [99]:
with open('anna.txt', 'r') as f:
    text=f.read()
vocab = sorted(set(text))

vocab_to_int = {c: i for i, c in enumerate(vocab)}
int_to_vocab = dict(enumerate(vocab))
encoded = np.array([vocab_to_int[c] for c in text], dtype=np.int32)
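
As a quick sanity check of the encoding, here is the same dictionary construction applied to a short toy string. The `sample` string and the `sample_*` names are illustrative only, chosen so they don't overwrite the notebook's variables. Decoding the integers should give back the original string exactly.

```python
import numpy as np

# Toy stand-in for the novel; the notebook reads anna.txt instead.
sample = "Happy families are all alike"

# Same construction as the cell above: sorted unique characters give stable ids.
sample_vocab = sorted(set(sample))
sample_vocab_to_int = {c: i for i, c in enumerate(sample_vocab)}
sample_int_to_vocab = dict(enumerate(sample_vocab))

sample_encoded = np.array([sample_vocab_to_int[c] for c in sample], dtype=np.int32)

# Decoding maps each integer back to its character and rebuilds the string.
decoded = "".join(sample_int_to_vocab[int(i)] for i in sample_encoded)
print(sample_encoded[:5])
print(decoded == sample)
```

Because `vocab` is sorted, the same text always produces the same mapping, which matters when you later reload a checkpoint and need the integers to mean the same characters.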

Let's check out the first 100 characters to make sure everything is peachy. According to the [American Book Review](http://americanbookreview.org/100bestlines.asp), this is the 6th best first line of a book ever.

In [100]:
text[:100]

'Chapter 1\n\n\nHappy families are all alike; every unhappy family is unhappy in its own\nway.\n\nEverythin'

And we can see the characters encoded as integers.

In [101]:
encoded[:100]

array([31, 64, 57, 72, 76, 61, 74,  1, 16,  0,  0,  0, 36, 57, 72, 72, 81,
        1, 62, 57, 69, 65, 68, 65, 61, 75,  1, 57, 74, 61,  1, 57, 68, 68,
        1, 57, 68, 65, 67, 61, 26,  1, 61, 78, 61, 74, 81,  1, 77, 70, 64,
       57, 72, 72, 81,  1, 62, 57, 69, 65, 68, 81,  1, 65, 75,  1, 77, 70,
       64, 57, 72, 72, 81,  1, 65, 70,  1, 65, 76, 75,  1, 71, 79, 70,  0,
       79, 57, 81, 13,  0,  0, 33, 78, 61, 74, 81, 76, 64, 65, 70], dtype=int32)

Since the network is working with individual characters, it's similar to a classification problem in which we are trying to predict the next character from the previous text.  Here's how many 'classes' our network has to pick from.

In [102]:
len(vocab)

83

## Making training mini-batches

Here is where we'll make our mini-batches for training. Remember that we want our batches to be multiple sequences of some desired number of sequence steps. Considering a simple example, our batches would look like this:

<img src="assets/sequence_batching@1x.png" width=500px>


<br>
We have our text encoded as integers as one long array in `encoded`. Let's create a function that will give us an iterator for our batches. I like using [generator functions](https://jeffknupp.com/blog/2013/04/07/improve-your-python-yield-and-generators-explained/) to do this. Then we can pass `encoded` into this function and get our batch generator.

The first thing we need to do is discard some of the text so we only have completely full batches. Each batch contains $N \times M$ characters, where $N$ is the batch size (the number of sequences) and $M$ is the number of steps. To get the number of batches $K$ we can make from some array `arr`, divide the length of `arr` by the number of characters per batch, $N \times M$, and round down. Once you know the number of batches and the characters per batch, the total number of characters to keep is their product, $N \times M \times K$.

After that, we need to split `arr` into $N$ sequences. You can do this using `arr.reshape(size)`, where `size` is a tuple containing the dimension sizes of the reshaped array. We know we want $N$ sequences (`n_seqs` below), so let's make that the size of the first dimension. For the second dimension, you can use `-1` as a placeholder in the size; NumPy will infer the correct length from the number of elements. After this, you should have an array that is $N \times (M * K)$, where $K$ is the number of batches.
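
Here is the arithmetic from the last two paragraphs on a tiny array, as a sketch with made-up sizes ($N = 2$ sequences, $M = 3$ steps, 25 characters) rather than the real data. The variable `toy` is illustrative.

```python
import numpy as np

# Toy sizes (hypothetical): 25 "characters", N = 2 sequences, M = 3 steps.
toy = np.arange(25)
n_seqs, n_steps = 2, 3

chars_per_batch = n_seqs * n_steps          # N * M = 6
n_batches = len(toy) // chars_per_batch     # K = 25 // 6 = 4 full batches
toy = toy[:chars_per_batch * n_batches]     # keep N * M * K = 24, drop the last one

# -1 lets NumPy infer the second dimension: 24 / 2 = 12 = M * K
toy = toy.reshape((n_seqs, -1))
print(toy.shape)
print(toy)
```

Each row is now one contiguous stretch of the text, so sliding a window of width `n_steps` along the columns gives batches whose rows continue where the previous batch left off.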

Now that we have this array, we can iterate through it to get our batches. The idea is that each batch is an $N \times M$ window on the array. For each subsequent batch, the window moves over by `n_steps`. We also want to create both the input and target arrays. Remember that the targets are the inputs shifted over one character. You'll usually see the first input character used as the last target character, so something like this:
```python
y[:, :-1], y[:, -1] = x[:, 1:], x[:, 0]
```
where `x` is the input batch and `y` is the target batch.
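
To see what that one-liner does, here it is applied to a small hypothetical $2 \times 4$ input batch (`x_demo` and `y_demo` are illustrative names). Every target is the next input character, and the last column wraps around to each row's first character.

```python
import numpy as np

# A hypothetical 2 x 4 input batch; each row is one sequence.
x_demo = np.array([[10, 11, 12, 13],
                   [20, 21, 22, 23]])

y_demo = np.zeros_like(x_demo)
# Shift left by one; the last column wraps around to the first input character.
y_demo[:, :-1], y_demo[:, -1] = x_demo[:, 1:], x_demo[:, 0]
print(y_demo)
```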

The way I like to build this window is to use `range` to take steps of size `n_steps` from $0$ to `arr.shape[1]`, the total number of steps in each sequence. That way, the integers you get from `range` always point to the start of a batch, and each window is `n_steps` wide.

> **Exercise:** Write the code for creating batches in the function below. The exercises in this notebook _will not be easy_. I've provided a notebook with solutions alongside this notebook. If you get stuck, check out the solutions. The most important thing is that you don't copy and paste the code into here, **type out the solution code yourself.**

In [103]:
def get_batches(arr, n_seqs, n_steps):
    '''Create a generator that returns batches of size
       n_seqs x n_steps from arr.
       
       Arguments
       ---------
       arr: Array you want to make batches from
       n_seqs: Batch size, the number of sequences per batch
       n_steps: Number of sequence steps per batch
    '''
    # Get the number of characters per batch and the number of full batches we can make
    batch_size = n_seqs * n_steps
    n_batches = len(arr) // batch_size

    # Keep only enough characters to make full batches
    arr = arr[:batch_size * n_batches]

    # Reshape into n_seqs rows
    arr = arr.reshape((n_seqs, -1))

    for n in range(0, arr.shape[1], n_steps):
        # The features
        x = arr[:, n:n + n_steps]
        # The targets, shifted by one
        y = np.zeros_like(x)
        y[:, :-1], y[:, -1] = x[:, 1:], x[:, 0]
        yield x, y

Now I'll make my data sets and we can check out what's going on here. Here I'm going to use a batch size of 10 and 50 sequence steps.

In [104]:
batches = get_batches(encoded, 10, 50)
x, y = next(batches)


In [105]:
print('x\n', x[:10, :10])
print('\ny\n', y[:10, :10])


x
 [[31 64 57 72 76 61 74  1 16  0]
 [ 1 57 69  1 70 71 76  1 63 71]
 [78 65 70 13  0  0  3 53 61 75]
 [70  1 60 77 74 65 70 63  1 64]
 [ 1 65 76  1 65 75 11  1 75 65]
 [ 1 37 76  1 79 57 75  0 71 70]
 [64 61 70  1 59 71 69 61  1 62]
 [26  1 58 77 76  1 70 71 79  1]
 [76  1 65 75 70  7 76 13  1 48]
 [ 1 75 57 65 60  1 76 71  1 64]]

y
 [[64 57 72 76 61 74  1 16  0  0]
 [57 69  1 70 71 76  1 63 71 65]
 [65 70 13  0  0  3 53 61 75 11]
 [ 1 60 77 74 65 70 63  1 64 65]
 [65 76  1 65 75 11  1 75 65 74]
 [37 76  1 79 57 75  0 71 70 68]
 [61 70  1 59 71 69 61  1 62 71]
 [ 1 58 77 76  1 70 71 79  1 75]
 [ 1 65 75 70  7 76 13  1 48 64]
 [75 57 65 60  1 76 71  1 64 61]]


If you implemented `get_batches` correctly, the above output should look something like 
```
x
 [[55 63 69 22  6 76 45  5 16 35]
 [ 5 69  1  5 12 52  6  5 56 52]
 [48 29 12 61 35 35  8 64 76 78]
 [12  5 24 39 45 29 12 56  5 63]
 [ 5 29  6  5 29 78 28  5 78 29]
 [ 5 13  6  5 36 69 78 35 52 12]
 [63 76 12  5 18 52  1 76  5 58]
 [34  5 73 39  6  5 12 52 36  5]
 [ 6  5 29 78 12 79  6 61  5 59]
 [ 5 78 69 29 24  5  6 52  5 63]]

y
 [[63 69 22  6 76 45  5 16 35 35]
 [69  1  5 12 52  6  5 56 52 29]
 [29 12 61 35 35  8 64 76 78 28]
 [ 5 24 39 45 29 12 56  5 63 29]
 [29  6  5 29 78 28  5 78 29 45]
 [13  6  5 36 69 78 35 52 12 43]
 [76 12  5 18 52  1 76  5 58 52]
 [ 5 73 39  6  5 12 52 36  5 78]
 [ 5 29 78 12 79  6 61  5 59 63]
 [78 69 29 24  5  6 52  5 63 76]]
```
although the exact numbers will be different. Check to make sure the data is shifted over one step for `y`.
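
If you'd rather check the shift programmatically than by eye, a small helper like the one below works. `is_shifted_by_one` is a hypothetical name, not part of the notebook; in the notebook you could call it on the `x, y` pair returned by `next(batches)`. It compares only the first `n_steps - 1` columns, because the last target column comes from the wrap-around rather than the true next character.

```python
import numpy as np

def is_shifted_by_one(x, y):
    """Return True if every target row is its input row moved one step left.

    The last column is skipped because get_batches fills it by wrap-around.
    """
    return x.shape == y.shape and np.array_equal(x[:, 1:], y[:, :-1])

# Toy batch built the same way get_batches builds its targets.
x_check = np.arange(12).reshape(3, 4)
y_check = np.zeros_like(x_check)
y_check[:, :-1], y_check[:, -1] = x_check[:, 1:], x_check[:, 0]

print(is_shifted_by_one(x_check, y_check))
print(is_shifted_by_one(x_check, x_check))  # an unshifted "target" fails the check
```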

## Building the model

Below is where you'll build the network. We'll break it up into parts so it's easier to reason about each bit. Then we can connect them up into the whole network.

<img src="assets/charRNN.png" width=500px>


### Inputs

First off we'll create our input placeholders. As usual, we need placeholders for the training data and the targets. We'll also create a placeholder for the dropout keep probability called `keep_prob`. This will be a scalar, that is, a 0-D tensor. To make a scalar, you create a placeholder without giving it a shape.

> **Exercise:** Create the input placeholders in the function below.

In [106]:
def build_inputs(batch_size, num_steps):
    ''' Define placeholders for inputs, targets, and dropout 
    
        Arguments
        ---------
        batch_size: Batch size, number of sequences per batch
        num_steps: Number of sequence steps in a batch
        
    '''
    # Declare placeholders we'll feed into the graph
    inputs = tf.placeholder(tf.int32, shape=(batch_size, num_steps))
    targets = tf.placeholder(tf.int32, shape=(batch_size, num_steps))
    
    # Keep probability placeholder for dropout layers
    keep_prob = tf.placeholder(tf.float32)
    
    return inputs, targets, keep_prob

### LSTM Cell

Here we will create the LSTM cell we'll use in the hidden layer. We'll use this cell as a building block for the RNN. So we aren't actually defining the RNN here, just the type of cell we'll use in the hidden layer.

We first create a basic LSTM cell with

```python
lstm = tf.contrib.rnn.BasicLSTMCell(num_units)
```

where `num_units` is the number of units in the hidden layers in the cell. Then we can add dropout by wrapping it with 

```python
tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)
```
You pass in a cell and it will automatically add dropout to the inputs or outputs. Finally, we can stack up the LSTM cells into layers with [`tf.contrib.rnn.MultiRNNCell`](https://www.tensorflow.org/versions/r1.0/api_docs/python/tf/contrib/rnn/MultiRNNCell). With this, you pass in a list of cells and it will send the output of one cell into the next cell. Previously with TensorFlow 1.0, you could do this

```python
tf.contrib.rnn.MultiRNNCell([cell]*num_layers)
```

This might look a little weird if you know Python well because this will create a list of the same `cell` object. However, TensorFlow 1.0 will create different weight matrices for all `cell` objects. But, starting with TensorFlow 1.1 you actually need to create new cell objects in the list. To get it to work in TensorFlow 1.1, it should look like

```python
def build_cell(num_units, keep_prob):
    lstm = tf.contrib.rnn.BasicLSTMCell(num_units)
    drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)
    
    return drop
    
tf.contrib.rnn.MultiRNNCell([build_cell(num_units, keep_prob) for _ in range(num_layers)])
```
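
The TensorFlow 1.1 change comes down to ordinary Python list semantics: `[obj] * n` repeats a reference to one object, while a list comprehension calls the constructor `n` times. The sketch below shows this with a hypothetical `Cell` placeholder class (not a TensorFlow class), counting distinct objects by `id`.

```python
class Cell:
    """Hypothetical stand-in for an RNN cell object (not a TensorFlow class)."""
    pass

n_layers_demo = 3

shared_cells = [Cell()] * n_layers_demo               # one object, referenced 3 times
fresh_cells = [Cell() for _ in range(n_layers_demo)]  # three distinct objects

print(len({id(c) for c in shared_cells}))
print(len({id(c) for c in fresh_cells}))
```

With real cells, the shared version means every layer points at the same cell object, which is why building a new cell per layer is the safer pattern.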

Even though this is actually multiple LSTM cells stacked on each other, you can treat the multiple layers as one cell.

We also need to create an initial cell state of all zeros. This can be done like so

```python
initial_state = cell.zero_state(batch_size, tf.float32)
```

Below, we implement the `build_lstm` function to create these LSTM cells and the initial state.

In [118]:
def build_lstm(lstm_size, num_layers, batch_size, keep_prob):
    ''' Build LSTM cell.
    
        Arguments
        ---------
        keep_prob: Scalar tensor (tf.placeholder) for the dropout keep probability
        lstm_size: Size of the hidden layers in the LSTM cells
        num_layers: Number of LSTM layers
        batch_size: Batch size

    '''
    ### Build the LSTM Cell
    def build_cell(lstm_size, keep_prob):
        # Use a basic LSTM cell
        lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)

        # Add dropout to the cell outputs
        drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)
        return drop

    # Stack up multiple LSTM layers, creating a new cell object for each layer
    cell = tf.contrib.rnn.MultiRNNCell([build_cell(lstm_size, keep_prob) for _ in range(num_layers)])
    # Initial state of all zeros for every layer
    initial_state = cell.zero_state(batch_size, tf.float32)

    return cell, initial_state

### RNN Output

Here we'll create the output layer. We need to connect the output of the RNN cells to a fully connected layer with a softmax output. The softmax output gives us a probability distribution we can use to predict the next character, so we want this layer to have size $C$, the number of classes/characters we have in our text.

If our input has batch size $N$, number of steps $M$, and the hidden layer has $L$ hidden units, then the output is a 3D tensor with size $N \times M \times L$. The output of each LSTM cell has size $L$, we have $M$ of them, one for each sequence step, and we have $N$ sequences. So the total size is $N \times M \times L$. 

We are using the same fully connected layer, the same weights, for each of the outputs. To make things easier, we should reshape the outputs into a 2D tensor with shape $(M * N) \times L$. That is, one row for each sequence and step, where the values of each row are the output from the LSTM cells. If the LSTM output `lstm_output` arrives as a list of per-step tensors, first concatenate the list into one array with [`tf.concat`](https://www.tensorflow.org/api_docs/python/tf/concat). (`tf.nn.dynamic_rnn`, used below, already returns a single $N \times M \times L$ tensor, so the concatenation leaves it unchanged.) Then reshape it with `tf.reshape` to size $(M * N) \times L$.
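
A NumPy sketch of why this reshape keeps everything lined up, using toy sizes and an `arange` array in place of real LSTM outputs: after reshaping $N \times M \times L$ to $(N * M) \times L$, row `i * M + j` holds sequence `i` at step `j`. Flattening the $N \times M$ targets in the same row-major order puts each target on the same row as its output.

```python
import numpy as np

N, M, L = 2, 3, 4   # toy sizes: sequences, steps, hidden units
outputs_demo = np.arange(N * M * L).reshape(N, M, L)   # stands in for the LSTM output

flat = outputs_demo.reshape(-1, L)                     # shape (N * M, L)
# Row i * M + j holds the output of sequence i at step j.
i, j = 1, 2
print(flat.shape)
print(np.array_equal(flat[i * M + j], outputs_demo[i, j]))

# Targets of shape (N, M), flattened the same way, line up row-for-row.
targets_demo = np.arange(N * M).reshape(N, M)
print(targets_demo.reshape(-1)[i * M + j] == targets_demo[i, j])
```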

Once we have the outputs reshaped, we can do the matrix multiplication with the weights. We need to wrap the weight and bias variables in a variable scope with `tf.variable_scope(scope_name)` because weights are also created in the LSTM cells. TensorFlow will throw an error if the weights created here have the same names as the weights created in the LSTM cells, which they will by default. To avoid this, we wrap the variables in a variable scope so they get unique names.

> **Exercise:** Implement the output layer in the function below.

In [119]:
def build_output(lstm_output, in_size, out_size):
    ''' Build a softmax layer, return the softmax output and logits.
    
        Arguments
        ---------
        
        lstm_output: List of output tensors from the LSTM layer
        in_size: Size of the input tensor, for example, size of the LSTM cells
        out_size: Size of this softmax layer
    
    '''

    # Reshape output so it's a bunch of rows, one row for each step for each sequence.
    # Concatenate lstm_output over axis 1 (the columns)
    seq_output = tf.concat(lstm_output,1)
    # Reshape seq_output to a 2D tensor with lstm_size columns
    x = tf.reshape(seq_output,[-1,in_size])
    
    # Connect the RNN outputs to a softmax layer
    with tf.variable_scope('softmax'):
        # Create the weight and bias variables here
        softmax_w = tf.Variable(tf.truncated_normal((in_size,out_size),stddev=0.1))
        softmax_b = tf.Variable(tf.zeros(out_size))
    
    # Since output is a bunch of rows of RNN cell outputs, logits will be a bunch
    # of rows of logit outputs, one for each step and sequence
    logits = tf.matmul(x, softmax_w)+softmax_b
    
    # Use softmax to get the probabilities for predicted characters
    out = tf.nn.softmax(logits)
    
    return out, logits

### Training loss

Next up is the training loss. We get the logits and targets and calculate the softmax cross-entropy loss. First we need to one-hot encode the targets, since we're getting them as integer-encoded characters. Then, reshape the one-hot targets into a 2D tensor with size $(M*N) \times C$, where $C$ is the number of classes/characters we have. Remember that we reshaped the LSTM outputs and ran them through a fully connected layer with $C$ units. So our logits will also have size $(M*N) \times C$.

Then we run the logits and targets through `tf.nn.softmax_cross_entropy_with_logits` and find the mean to get the loss.
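
To make the loss concrete, here is a NumPy sketch of the same computation on made-up logits (three rows, four classes). It is not TensorFlow's implementation, just the formula: softmax each row, take the cross-entropy against the one-hot target, then average. With one-hot labels this reduces to the mean negative log-probability of each row's correct class, which the last lines confirm.

```python
import numpy as np

def softmax(z):
    # Subtract the row max for numerical stability before exponentiating.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Toy logits for (M * N) = 3 rows and C = 4 classes.
logits_demo = np.array([[2.0, 1.0, 0.1, -1.0],
                        [0.5, 0.5, 0.5, 0.5],
                        [-1.0, 3.0, 0.0, 0.0]])
targets_demo = np.array([0, 2, 1])              # integer-encoded target per row

one_hot = np.eye(logits_demo.shape[1])[targets_demo]  # shape (M * N, C), like tf.one_hot
probs = softmax(logits_demo)

# Cross-entropy per row, then the mean over rows, as in build_loss.
ce_loss = np.mean(-np.sum(one_hot * np.log(probs), axis=1))

# Equivalent: pick out each row's target log-probability directly.
ce_loss_direct = np.mean(-np.log(probs[np.arange(len(targets_demo)), targets_demo]))
print(ce_loss, np.isclose(ce_loss, ce_loss_direct))
```

The second row has equal logits, so its probabilities are uniform and it contributes $\log 4$ to the sum regardless of the target.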

>**Exercise:** Implement the loss calculation in the function below.

In [120]:
def build_loss(logits, targets, lstm_size, num_classes):
    ''' Calculate the loss from the logits and the targets.
    
        Arguments
        ---------
        logits: Logits from final fully connected layer
        targets: Targets for supervised learning
        lstm_size: Number of LSTM hidden units
        num_classes: Number of classes in targets
        
    '''
    
    # One-hot encode targets and reshape to match logits, one row per sequence per step
    y_one_hot = tf.one_hot(targets,num_classes)
    y_reshaped =  tf.reshape(y_one_hot, logits.get_shape())
    
    # Softmax cross entropy loss
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits,labels=y_reshaped))
    
    return loss

### Optimizer

Here we build the optimizer. Plain RNNs have issues with exploding and vanishing gradients. LSTMs mitigate the vanishing-gradient problem, but the gradients can still grow without bound. To fix this, we clip the gradients above some threshold. The code below uses `tf.clip_by_global_norm`: it computes the combined (global) L2 norm of all the gradients, and if that norm exceeds the threshold `grad_clip`, it scales every gradient down by the same factor so the global norm equals the threshold. This keeps the gradients from growing overly large without changing their direction. Then we use an AdamOptimizer for the learning step.
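
The optimizer code relies on `tf.clip_by_global_norm`, which rescales all gradients together rather than capping each value individually. Below is a NumPy sketch of that rule (the function name mirrors TensorFlow's, but this is an illustration, not the library code): every gradient is multiplied by `clip_norm / max(global_norm, clip_norm)`, so nothing changes when the norm is already small enough.

```python
import numpy as np

def clip_by_global_norm(grads, clip_norm):
    """NumPy sketch of global-norm clipping.

    Returns the rescaled gradients and the global norm measured before clipping.
    """
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    # The scale factor is 1 when the norm is already within the threshold.
    scale = clip_norm / max(global_norm, clip_norm)
    return [g * scale for g in grads], global_norm

grads = [np.array([3.0, 4.0]), np.array([12.0])]  # global norm = sqrt(9 + 16 + 144) = 13
clipped, norm = clip_by_global_norm(grads, clip_norm=5.0)
print(norm)
print(clipped)
```

Scaling all gradients by one shared factor preserves the direction of the overall update, which per-value clipping would not.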

In [121]:
def build_optimizer(loss, learning_rate, grad_clip):
    ''' Build optimizer for training, using gradient clipping.
    
        Arguments:
        loss: Network loss
        learning_rate: Learning rate for optimizer
        grad_clip: Threshold for the global norm of the gradients
    
    '''
    
    # Optimizer for training, using gradient clipping to control exploding gradients
    tvars = tf.trainable_variables()
    grads, _ = tf.clip_by_global_norm(tf.gradients(loss, tvars), grad_clip)
    adam = tf.train.AdamOptimizer(learning_rate)
    # apply_gradients returns the op that performs one training step
    optimizer = adam.apply_gradients(zip(grads, tvars))
    
    return optimizer

### Build the network

Now we can put all the pieces together and build a class for the network. To actually run data through the LSTM cells, we will use [`tf.nn.dynamic_rnn`](https://www.tensorflow.org/versions/r1.0/api_docs/python/tf/nn/dynamic_rnn). This function will pass the hidden and cell states across LSTM cells appropriately for us. It returns the outputs for each LSTM cell at each step for each sequence in the mini-batch. It also gives us the final LSTM state. We want to save this state as `final_state` so we can pass it to the first LSTM cell in the next mini-batch run. For `tf.nn.dynamic_rnn`, we pass in the cell and initial state we get from `build_lstm`, as well as our input sequences. Also, we need to one-hot encode the inputs before they go into the RNN.
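
Why carry `final_state` into the next mini-batch? The sketch below uses a plain tanh RNN in NumPy as a simplified stand-in for the LSTM (the weights, sizes, and `run_rnn` helper are all hypothetical). Like `tf.nn.dynamic_rnn`, `run_rnn` returns per-step outputs and a final state. Running one long sequence in two windows gives the same outputs as a single pass, but only if the second window starts from the first window's final state.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, D, H = 2, 4, 3, 5   # sequences, steps per window, input size, hidden size (toy values)

# Random weights for a plain tanh RNN (a simplification; not an LSTM).
W_x = rng.normal(scale=0.5, size=(D, H))
W_h = rng.normal(scale=0.5, size=(H, H))
b = np.zeros(H)

def run_rnn(inputs, state):
    """Run the RNN over inputs (N, steps, D) from state (N, H).

    Returns per-step outputs (N, steps, H) and the final state (N, H).
    """
    outputs = []
    for t in range(inputs.shape[1]):
        state = np.tanh(inputs[:, t] @ W_x + state @ W_h + b)
        outputs.append(state)
    return np.stack(outputs, axis=1), state

seq = rng.normal(size=(N, 2 * M, D))   # one long sequence per row
zero_state = np.zeros((N, H))

# Process the whole sequence in one pass ...
full_out, _ = run_rnn(seq, zero_state)

# ... or as two consecutive windows, feeding the first window's final state in.
out1, final_state = run_rnn(seq[:, :M], zero_state)
out2, _ = run_rnn(seq[:, M:], final_state)

print(np.allclose(np.concatenate([out1, out2], axis=1), full_out))
```

Restarting the second window from `zero_state` instead would lose the context from the first window, which is what happens if `final_state` is not fed back in during training.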

> **Exercise:** Use the functions you've implemented previously and `tf.nn.dynamic_rnn` to build the network.

In [122]:
class CharRNN:
    
    def __init__(self, num_classes, batch_size=64, num_steps=50, 
                       lstm_size=128, num_layers=2, learning_rate=0.001, 
                       grad_clip=5, sampling=False):
    
        # When we're using this network for sampling later, we'll be passing in
        # one character at a time, so providing an option for that
        if sampling:
            batch_size, num_steps = 1, 1

        tf.reset_default_graph()
        
        # Build the input placeholder tensors
        self.inputs, self.targets, self.keep_prob = build_inputs(batch_size, num_steps)

        # Build the LSTM cell
        cell, self.initial_state = build_lstm(lstm_size, num_layers, batch_size, self.keep_prob)

        ### Run the data through the RNN layers
        # First, one-hot encode the input tokens
        x_one_hot = tf.one_hot(self.inputs, num_classes)
        
        # Run each sequence step through the RNN with tf.nn.dynamic_rnn
        outputs, state = tf.nn.dynamic_rnn(cell, x_one_hot, initial_state=self.initial_state)
        self.final_state = state
        
        # Get softmax predictions and logits
        self.prediction, self.logits = build_output(outputs, lstm_size, num_classes)
        
        # Loss and optimizer (with gradient clipping)
        self.loss = build_loss(self.logits, self.targets, lstm_size, num_classes)
        self.optimizer = build_optimizer(self.loss, learning_rate, grad_clip)

## Hyperparameters

Here are the hyperparameters for the network.

* `batch_size` - Number of sequences running through the network in one pass.
* `num_steps` - Number of characters in the sequence the network is trained on. Larger is typically better, since the network can learn longer-range dependencies, but it takes longer to train. 100 is typically a good number here.
* `lstm_size` - The number of units in the hidden layers.
* `num_layers` - Number of hidden LSTM layers to use
* `learning_rate` - Learning rate for training
* `keep_prob` - The dropout keep probability when training. If your network is overfitting, try decreasing this.

Here's some good advice from Andrej Karpathy on training the network. I'm going to copy it in here for your benefit, but also link to [where it originally came from](https://github.com/karpathy/char-rnn#tips-and-tricks).

> ## Tips and Tricks

>### Monitoring Validation Loss vs. Training Loss
>If you're somewhat new to Machine Learning or Neural Networks it can take a bit of expertise to get good models. The most important quantity to keep track of is the difference between your training loss (printed during training) and the validation loss (printed once in a while when the RNN is run on the validation data (by default every 1000 iterations)). In particular:

> - If your training loss is much lower than validation loss then this means the network might be **overfitting**. Solutions to this are to decrease your network size, or to increase dropout. For example you could try dropout of 0.5 and so on.
> - If your training/validation loss are about equal then your model is **underfitting**. Increase the size of your model (either number of layers or the raw number of neurons per layer)

> ### Approximate number of parameters

> The two most important parameters that control the model are `lstm_size` and `num_layers`. I would advise that you always use `num_layers` of either 2/3. The `lstm_size` can be adjusted based on how much data you have. The two important quantities to keep track of here are:

> - The number of parameters in your model. This is printed when you start training.
> - The size of your dataset. 1MB file is approximately 1 million characters.

>These two should be about the same order of magnitude. It's a little tricky to tell. Here are some examples:

> - I have a 100MB dataset and I'm using the default parameter settings (which currently print 150K parameters). My data size is significantly larger (100 mil >> 0.15 mil), so I expect to heavily underfit. I am thinking I can comfortably afford to make `lstm_size` larger.
> - I have a 10MB dataset and running a 10 million parameter model. I'm slightly nervous and I'm carefully monitoring my validation loss. If it's larger than my training loss then I may want to try to increase dropout a bit and see if that helps the validation loss.

> ### Best models strategy

>The winning strategy to obtaining very good models (if you have the compute time) is to always err on making the network larger (as large as you're willing to wait for it to compute) and then try different dropout values (between 0,1). Whatever model has the best validation performance (the loss, written in the checkpoint filename, low is good) is the one you should use in the end.

>It is very common in deep learning to run many different models with many different hyperparameter settings, and in the end take whatever checkpoint gave the best validation performance.

>By the way, the size of your training and validation splits are also parameters. Make sure you have a decent amount of data in your validation set or otherwise the validation performance will be noisy and not very informative.

In [123]:
batch_size = 10         # Sequences per batch
num_steps = 50          # Number of sequence steps per batch
lstm_size = 128         # Size of hidden layers in LSTMs
num_layers = 2          # Number of LSTM layers
learning_rate = 0.01    # Learning rate
keep_prob = 0.5         # Dropout keep probability
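
Karpathy's advice above is to compare the parameter count with the dataset size. Here is a rough counter for this `CharRNN`, assuming TensorFlow's `BasicLSTMCell` layout: a kernel of shape `(input_size + lstm_size, 4 * lstm_size)` plus a `4 * lstm_size` bias per layer, where the first layer's input is the one-hot vector of size `num_classes`. The softmax layer adds `lstm_size * num_classes` weights and `num_classes` biases. `count_params` is a hypothetical helper, not part of the notebook.

```python
def count_params(num_classes, lstm_size, num_layers):
    """Rough parameter count for the CharRNN (assumes the BasicLSTMCell layout)."""
    total = 0
    input_size = num_classes               # first layer sees the one-hot characters
    for _ in range(num_layers):
        # Kernel covers the four gates for [input, previous hidden state], plus bias.
        total += (input_size + lstm_size) * 4 * lstm_size + 4 * lstm_size
        input_size = lstm_size             # later layers take the previous layer's output
    total += lstm_size * num_classes + num_classes   # softmax weights and biases
    return total

# 83 classes, matching len(vocab) above, with the hyperparameters in this cell.
print(count_params(num_classes=83, lstm_size=128, num_layers=2))
```

You can compare the result with `len(encoded)`, the number of training characters, to see which side of Karpathy's rule of thumb your settings fall on.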

## Time for training

This is typical training code, passing inputs and targets into the network, then running the optimizer. Here we also get back the final LSTM state for the mini-batch. Then, we pass that state back into the network so the next batch can continue the state from the previous batch. And every so often (set by `save_every_n`) I save a checkpoint.

Here I'm saving checkpoints with the format

`i{iteration number}_l{# hidden layer units}.ckpt`
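
When resuming training, you need the right checkpoint prefix for `saver.restore`. Sorting these names as plain strings misorders them (`i400` sorts after `i1000`), so the sketch below parses the iteration number instead. `parse_checkpoint_name` and the example names are hypothetical. TensorFlow typically writes several files per checkpoint that share the prefix (for example `.index` and `.meta`), so the pattern only anchors at the start of the name. `tf.train.latest_checkpoint('checkpoints')` is the built-in alternative for simply finding the most recent save.

```python
import re

# Matches names like "i200_l128.ckpt", with or without a trailing file suffix.
CKPT_PATTERN = re.compile(r"i(\d+)_l(\d+)\.ckpt")

def parse_checkpoint_name(name):
    """Return (iteration, lstm_size) for a checkpoint name, or None if it doesn't match."""
    match = CKPT_PATTERN.match(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

names = ["i200_l128.ckpt", "i1000_l128.ckpt", "i400_l128.ckpt"]
latest = max(names, key=lambda n: parse_checkpoint_name(n)[0])
print(latest)
```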

> **Exercise:** Set the hyperparameters above to train the network. Watch the training loss; it should drop steadily. Also, I highly advise running this on a GPU.

In [None]:
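import os

epochs = 20
# Save every N iterations
save_every_n = 200
# tf.train.Saver.save fails if the target directory does not exist
os.makedirs('checkpoints', exist_ok=True)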

model = CharRNN(len(vocab), batch_size=batch_size, num_steps=num_steps,
                lstm_size=lstm_size, num_layers=num_layers, 
                learning_rate=learning_rate)

saver = tf.train.Saver(max_to_keep=100)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    
    # Use the line below to load a checkpoint and resume training
    #saver.restore(sess, 'checkpoints/______.ckpt')
    counter = 0
    for e in range(epochs):
        # Train network
        new_state = sess.run(model.initial_state)
        loss = 0
        for x, y in get_batches(encoded, batch_size, num_steps):
            counter += 1
            start = time.time()
            feed = {model.inputs: x,
                    model.targets: y,
                    model.keep_prob: keep_prob,
                    model.initial_state: new_state}
            batch_loss, new_state, _ = sess.run([model.loss, 
                                                 model.final_state, 
                                                 model.optimizer], 
                                                 feed_dict=feed)
            
            end = time.time()
            print('Epoch: {}/{}... '.format(e+1, epochs),
                  'Training Step: {}... '.format(counter),
                  'Training loss: {:.4f}... '.format(batch_loss),
                  '{:.4f} sec/batch'.format((end-start)))
        
            if (counter % save_every_n == 0):
                saver.save(sess, "checkpoints/i{}_l{}.ckpt".format(counter, lstm_size))
    
    saver.save(sess, "checkpoints/i{}_l{}.ckpt".format(counter, lstm_size))

Epoch: 1/20...  Training Step: 1...  Training loss: 4.4182...  0.4473 sec/batch
Epoch: 1/20...  Training Step: 2...  Training loss: 4.3191...  0.3569 sec/batch
Epoch: 1/20...  Training Step: 3...  Training loss: 3.5670...  0.2751 sec/batch
Epoch: 1/20...  Training Step: 4...  Training loss: 3.4060...  0.2871 sec/batch
Epoch: 1/20...  Training Step: 5...  Training loss: 3.2374...  0.3197 sec/batch
Epoch: 1/20...  Training Step: 6...  Training loss: 3.3240...  0.2544 sec/batch
Epoch: 1/20...  Training Step: 7...  Training loss: 3.2064...  0.2217 sec/batch
Epoch: 1/20...  Training Step: 8...  Training loss: 3.2148...  0.2753 sec/batch
Epoch: 1/20...  Training Step: 9...  Training loss: 3.2799...  0.2578 sec/batch
Epoch: 1/20...  Training Step: 10...  Training loss: 3.2422...  0.3073 sec/batch
Epoch: 1/20...  Training Step: 11...  Training loss: 3.2458...  0.1913 sec/batch
Epoch: 1/20...  Training Step: 12...  Training loss: 3.1641...  0.3403 sec/batch

Epoch: 1/20...  Training Step: 103...  Training loss: 2.7486...  0.1711 sec/batch
Epoch: 1/20...  Training Step: 104...  Training loss: 2.5126...  0.2899 sec/batch
Epoch: 1/20...  Training Step: 105...  Training loss: 2.7361...  0.2569 sec/batch
Epoch: 1/20...  Training Step: 106...  Training loss: 2.8731...  0.3064 sec/batch
Epoch: 1/20...  Training Step: 107...  Training loss: 2.7336...  0.2923 sec/batch
Epoch: 1/20...  Training Step: 108...  Training loss: 2.7124...  0.2261 sec/batch
Epoch: 1/20...  Training Step: 109...  Training loss: 2.6918...  0.2408 sec/batch
Epoch: 1/20...  Training Step: 110...  Training loss: 2.7697...  0.3028 sec/batch
Epoch: 1/20...  Training Step: 111...  Training loss: 2.6581...  0.2487 sec/batch
Epoch: 1/20...  Training Step: 112...  Training loss: 2.6954...  0.2424 sec/batch
Epoch: 1/20...  Training Step: 113...  Training loss: 2.6593...  0.1551 sec/batch
Epoch: 1/20...  Training Step: 114...  Training loss: 2.6196...  0.1816 sec/batch

Epoch: 1/20...  Training Step: 203...  Training loss: 2.4954...  0.3220 sec/batch
Epoch: 1/20...  Training Step: 204...  Training loss: 2.5090...  0.2889 sec/batch
Epoch: 1/20...  Training Step: 205...  Training loss: 2.4512...  0.3150 sec/batch
Epoch: 1/20...  Training Step: 206...  Training loss: 2.4502...  0.1847 sec/batch
Epoch: 1/20...  Training Step: 207...  Training loss: 2.5050...  0.3106 sec/batch
Epoch: 1/20...  Training Step: 208...  Training loss: 2.5034...  0.2812 sec/batch
Epoch: 1/20...  Training Step: 209...  Training loss: 2.4269...  0.2622 sec/batch
Epoch: 1/20...  Training Step: 210...  Training loss: 2.3720...  0.2477 sec/batch
Epoch: 1/20...  Training Step: 211...  Training loss: 2.4021...  0.2687 sec/batch
Epoch: 1/20...  Training Step: 212...  Training loss: 2.4786...  0.2511 sec/batch
Epoch: 1/20...  Training Step: 213...  Training loss: 2.4652...  0.2710 sec/batch
Epoch: 1/20...  Training Step: 214...  Training loss: 2.4755...  0.2324 sec/batch

Epoch: 1/20...  Training Step: 303...  Training loss: 2.2525...  0.3299 sec/batch
Epoch: 1/20...  Training Step: 304...  Training loss: 2.3097...  0.2654 sec/batch
Epoch: 1/20...  Training Step: 305...  Training loss: 2.3816...  0.3084 sec/batch
Epoch: 1/20...  Training Step: 306...  Training loss: 2.2215...  0.1926 sec/batch
Epoch: 1/20...  Training Step: 307...  Training loss: 2.3340...  0.2003 sec/batch
Epoch: 1/20...  Training Step: 308...  Training loss: 2.3332...  0.2561 sec/batch
Epoch: 1/20...  Training Step: 309...  Training loss: 2.3365...  0.1855 sec/batch
Epoch: 1/20...  Training Step: 310...  Training loss: 2.3722...  0.2489 sec/batch
Epoch: 1/20...  Training Step: 311...  Training loss: 2.3959...  0.2887 sec/batch
Epoch: 1/20...  Training Step: 312...  Training loss: 2.2810...  0.2700 sec/batch
Epoch: 1/20...  Training Step: 313...  Training loss: 2.2488...  0.2723 sec/batch
Epoch: 1/20...  Training Step: 314...  Training loss: 2.2825...  0.2857 sec/batch

Epoch: 1/20...  Training Step: 403...  Training loss: 2.2607...  0.2201 sec/batch
Epoch: 1/20...  Training Step: 404...  Training loss: 2.3022...  0.1917 sec/batch
Epoch: 1/20...  Training Step: 405...  Training loss: 2.2569...  0.3376 sec/batch
Epoch: 1/20...  Training Step: 406...  Training loss: 2.2279...  0.3321 sec/batch
Epoch: 1/20...  Training Step: 407...  Training loss: 2.2933...  0.2930 sec/batch
Epoch: 1/20...  Training Step: 408...  Training loss: 2.2779...  0.2628 sec/batch
Epoch: 1/20...  Training Step: 409...  Training loss: 2.2386...  0.3098 sec/batch
Epoch: 1/20...  Training Step: 410...  Training loss: 2.2257...  0.2176 sec/batch
Epoch: 1/20...  Training Step: 411...  Training loss: 2.2532...  0.2673 sec/batch
Epoch: 1/20...  Training Step: 412...  Training loss: 2.2142...  0.3082 sec/batch
Epoch: 1/20...  Training Step: 413...  Training loss: 2.3600...  0.2410 sec/batch
Epoch: 1/20...  Training Step: 414...  Training loss: 2.1643...  0.2620 sec/batch

Epoch: 1/20...  Training Step: 503...  Training loss: 2.1980...  0.2283 sec/batch
Epoch: 1/20...  Training Step: 504...  Training loss: 2.0369...  0.2960 sec/batch
Epoch: 1/20...  Training Step: 505...  Training loss: 2.2350...  0.2187 sec/batch
Epoch: 1/20...  Training Step: 506...  Training loss: 2.1819...  0.2251 sec/batch
Epoch: 1/20...  Training Step: 507...  Training loss: 2.1598...  0.2669 sec/batch
Epoch: 1/20...  Training Step: 508...  Training loss: 2.2164...  0.2895 sec/batch
Epoch: 1/20...  Training Step: 509...  Training loss: 2.1367...  0.2491 sec/batch
Epoch: 1/20...  Training Step: 510...  Training loss: 2.1853...  0.2815 sec/batch
Epoch: 1/20...  Training Step: 511...  Training loss: 2.1936...  0.2843 sec/batch
Epoch: 1/20...  Training Step: 512...  Training loss: 2.4033...  0.3330 sec/batch
Epoch: 1/20...  Training Step: 513...  Training loss: 2.1944...  0.2155 sec/batch
Epoch: 1/20...  Training Step: 514...  Training loss: 2.0957...  0.2886 sec/batch

Epoch: 1/20...  Training Step: 603...  Training loss: 2.2210...  0.3038 sec/batch
Epoch: 1/20...  Training Step: 604...  Training loss: 2.0535...  0.1162 sec/batch
Epoch: 1/20...  Training Step: 605...  Training loss: 2.0745...  0.2617 sec/batch
Epoch: 1/20...  Training Step: 606...  Training loss: 2.1008...  0.2720 sec/batch
Epoch: 1/20...  Training Step: 607...  Training loss: 2.2414...  0.2845 sec/batch
Epoch: 1/20...  Training Step: 608...  Training loss: 2.1304...  0.2800 sec/batch
Epoch: 1/20...  Training Step: 609...  Training loss: 2.3313...  0.2644 sec/batch
Epoch: 1/20...  Training Step: 610...  Training loss: 2.1650...  0.2889 sec/batch
Epoch: 1/20...  Training Step: 611...  Training loss: 2.2522...  0.2218 sec/batch
Epoch: 1/20...  Training Step: 612...  Training loss: 2.1360...  0.2639 sec/batch
Epoch: 1/20...  Training Step: 613...  Training loss: 2.1476...  0.2883 sec/batch
Epoch: 1/20...  Training Step: 614...  Training loss: 2.2176...  0.2601 sec/batch

Epoch: 1/20...  Training Step: 703...  Training loss: 2.1776...  0.2723 sec/batch
Epoch: 1/20...  Training Step: 704...  Training loss: 2.1367...  0.3050 sec/batch
Epoch: 1/20...  Training Step: 705...  Training loss: 2.1682...  0.1999 sec/batch
Epoch: 1/20...  Training Step: 706...  Training loss: 2.1791...  0.2407 sec/batch
Epoch: 1/20...  Training Step: 707...  Training loss: 2.1321...  0.2788 sec/batch
Epoch: 1/20...  Training Step: 708...  Training loss: 2.0249...  0.2162 sec/batch
Epoch: 1/20...  Training Step: 709...  Training loss: 2.1533...  0.2194 sec/batch
Epoch: 1/20...  Training Step: 710...  Training loss: 2.1284...  0.3165 sec/batch
Epoch: 1/20...  Training Step: 711...  Training loss: 2.0515...  0.2250 sec/batch
Epoch: 1/20...  Training Step: 712...  Training loss: 2.1547...  0.1972 sec/batch
Epoch: 1/20...  Training Step: 713...  Training loss: 2.1503...  0.2151 sec/batch
Epoch: 1/20...  Training Step: 714...  Training loss: 2.1030...  0.2979 sec/batch

Epoch: 1/20...  Training Step: 803...  Training loss: 2.0860...  0.2622 sec/batch
Epoch: 1/20...  Training Step: 804...  Training loss: 2.1569...  0.2247 sec/batch
Epoch: 1/20...  Training Step: 805...  Training loss: 2.1448...  0.2249 sec/batch
Epoch: 1/20...  Training Step: 806...  Training loss: 2.1432...  0.2977 sec/batch
Epoch: 1/20...  Training Step: 807...  Training loss: 2.0223...  0.2528 sec/batch
Epoch: 1/20...  Training Step: 808...  Training loss: 1.9629...  0.3344 sec/batch
Epoch: 1/20...  Training Step: 809...  Training loss: 2.0663...  0.1980 sec/batch
Epoch: 1/20...  Training Step: 810...  Training loss: 2.0421...  0.1750 sec/batch
Epoch: 1/20...  Training Step: 811...  Training loss: 2.0937...  0.2334 sec/batch
Epoch: 1/20...  Training Step: 812...  Training loss: 2.1179...  0.2921 sec/batch
Epoch: 1/20...  Training Step: 813...  Training loss: 2.0386...  0.2941 sec/batch
Epoch: 1/20...  Training Step: 814...  Training loss: 2.1233...  0.2416 sec/batch

Epoch: 1/20...  Training Step: 903...  Training loss: 2.0298...  0.3028 sec/batch
Epoch: 1/20...  Training Step: 904...  Training loss: 2.0791...  0.2376 sec/batch
Epoch: 1/20...  Training Step: 905...  Training loss: 2.0769...  0.2469 sec/batch
Epoch: 1/20...  Training Step: 906...  Training loss: 2.1106...  0.2848 sec/batch
Epoch: 1/20...  Training Step: 907...  Training loss: 2.1554...  0.2591 sec/batch
Epoch: 1/20...  Training Step: 908...  Training loss: 2.1380...  0.2553 sec/batch
Epoch: 1/20...  Training Step: 909...  Training loss: 2.1297...  0.1501 sec/batch
Epoch: 1/20...  Training Step: 910...  Training loss: 2.1197...  0.1741 sec/batch
Epoch: 1/20...  Training Step: 911...  Training loss: 1.9428...  0.2527 sec/batch
Epoch: 1/20...  Training Step: 912...  Training loss: 2.0138...  0.1928 sec/batch
Epoch: 1/20...  Training Step: 913...  Training loss: 2.1019...  0.2271 sec/batch
Epoch: 1/20...  Training Step: 914...  Training loss: 2.0862...  0.1622 sec/batch

Epoch: 1/20...  Training Step: 1003...  Training loss: 2.0487...  0.2253 sec/batch
Epoch: 1/20...  Training Step: 1004...  Training loss: 1.9587...  0.2040 sec/batch
Epoch: 1/20...  Training Step: 1005...  Training loss: 1.9242...  0.1880 sec/batch
Epoch: 1/20...  Training Step: 1006...  Training loss: 2.0018...  0.2961 sec/batch
Epoch: 1/20...  Training Step: 1007...  Training loss: 2.0404...  0.2672 sec/batch
Epoch: 1/20...  Training Step: 1008...  Training loss: 2.1243...  0.2381 sec/batch
Epoch: 1/20...  Training Step: 1009...  Training loss: 2.1375...  0.2200 sec/batch
Epoch: 1/20...  Training Step: 1010...  Training loss: 2.0731...  0.2467 sec/batch
Epoch: 1/20...  Training Step: 1011...  Training loss: 2.0165...  0.1847 sec/batch
Epoch: 1/20...  Training Step: 1012...  Training loss: 2.0927...  0.2241 sec/batch
Epoch: 1/20...  Training Step: 1013...  Training loss: 2.0364...  0.3049 sec/batch
Epoch: 1/20...  Training Step: 1014...  Training loss: 2.1821...  0.2353 sec/batch

Epoch: 1/20...  Training Step: 1102...  Training loss: 1.8938...  0.2384 sec/batch
Epoch: 1/20...  Training Step: 1103...  Training loss: 1.8354...  0.2253 sec/batch
Epoch: 1/20...  Training Step: 1104...  Training loss: 2.1176...  0.2736 sec/batch
Epoch: 1/20...  Training Step: 1105...  Training loss: 1.9994...  0.2924 sec/batch
Epoch: 1/20...  Training Step: 1106...  Training loss: 1.9884...  0.2367 sec/batch
Epoch: 1/20...  Training Step: 1107...  Training loss: 2.0283...  0.2930 sec/batch
Epoch: 1/20...  Training Step: 1108...  Training loss: 2.0469...  0.2503 sec/batch
Epoch: 1/20...  Training Step: 1109...  Training loss: 1.9683...  0.3037 sec/batch
Epoch: 1/20...  Training Step: 1110...  Training loss: 2.0464...  0.3005 sec/batch
Epoch: 1/20...  Training Step: 1111...  Training loss: 2.0247...  0.2446 sec/batch
Epoch: 1/20...  Training Step: 1112...  Training loss: 2.0113...  0.2328 sec/batch
Epoch: 1/20...  Training Step: 1113...  Training loss: 1.9168...  0.2355 sec/batch

Epoch: 1/20...  Training Step: 1201...  Training loss: 1.9100...  0.1476 sec/batch
Epoch: 1/20...  Training Step: 1202...  Training loss: 2.0793...  0.3265 sec/batch
Epoch: 1/20...  Training Step: 1203...  Training loss: 1.9462...  0.2764 sec/batch
Epoch: 1/20...  Training Step: 1204...  Training loss: 2.0201...  0.2733 sec/batch
Epoch: 1/20...  Training Step: 1205...  Training loss: 2.0212...  0.2867 sec/batch
Epoch: 1/20...  Training Step: 1206...  Training loss: 1.8910...  0.2099 sec/batch
Epoch: 1/20...  Training Step: 1207...  Training loss: 1.9404...  0.2591 sec/batch
Epoch: 1/20...  Training Step: 1208...  Training loss: 2.0517...  0.2688 sec/batch
Epoch: 1/20...  Training Step: 1209...  Training loss: 2.1020...  0.2506 sec/batch
Epoch: 1/20...  Training Step: 1210...  Training loss: 2.0389...  0.2428 sec/batch
Epoch: 1/20...  Training Step: 1211...  Training loss: 1.9469...  0.2507 sec/batch
Epoch: 1/20...  Training Step: 1212...  Training loss: 2.0510...  0.2701 sec/batch

Epoch: 1/20...  Training Step: 1300...  Training loss: 1.9849...  0.2321 sec/batch
Epoch: 1/20...  Training Step: 1301...  Training loss: 1.8990...  0.2712 sec/batch
Epoch: 1/20...  Training Step: 1302...  Training loss: 2.0229...  0.2726 sec/batch
Epoch: 1/20...  Training Step: 1303...  Training loss: 1.9595...  0.2763 sec/batch
Epoch: 1/20...  Training Step: 1304...  Training loss: 2.0483...  0.2439 sec/batch
Epoch: 1/20...  Training Step: 1305...  Training loss: 2.0427...  0.2701 sec/batch
Epoch: 1/20...  Training Step: 1306...  Training loss: 2.1650...  0.2119 sec/batch
Epoch: 1/20...  Training Step: 1307...  Training loss: 2.2334...  0.2137 sec/batch
Epoch: 1/20...  Training Step: 1308...  Training loss: 2.0268...  0.3100 sec/batch
Epoch: 1/20...  Training Step: 1309...  Training loss: 2.0482...  0.2419 sec/batch
Epoch: 1/20...  Training Step: 1310...  Training loss: 2.0639...  0.2744 sec/batch
Epoch: 1/20...  Training Step: 1311...  Training loss: 1.9481...  0.2688 sec/batch

Epoch: 1/20...  Training Step: 1399...  Training loss: 2.0168...  0.2242 sec/batch
Epoch: 1/20...  Training Step: 1400...  Training loss: 2.0925...  0.2956 sec/batch
Epoch: 1/20...  Training Step: 1401...  Training loss: 1.8256...  0.1755 sec/batch
Epoch: 1/20...  Training Step: 1402...  Training loss: 2.0201...  0.3343 sec/batch
Epoch: 1/20...  Training Step: 1403...  Training loss: 1.9332...  0.2458 sec/batch
Epoch: 1/20...  Training Step: 1404...  Training loss: 2.0708...  0.2956 sec/batch
Epoch: 1/20...  Training Step: 1405...  Training loss: 2.0919...  0.2791 sec/batch
Epoch: 1/20...  Training Step: 1406...  Training loss: 1.9411...  0.3031 sec/batch
Epoch: 1/20...  Training Step: 1407...  Training loss: 1.9846...  0.2466 sec/batch
Epoch: 1/20...  Training Step: 1408...  Training loss: 2.0215...  0.3124 sec/batch
Epoch: 1/20...  Training Step: 1409...  Training loss: 1.9776...  0.2695 sec/batch
Epoch: 1/20...  Training Step: 1410...  Training loss: 1.9302...  0.2804 sec/batch

Epoch: 1/20...  Training Step: 1498...  Training loss: 1.9420...  0.3165 sec/batch
Epoch: 1/20...  Training Step: 1499...  Training loss: 1.9858...  0.2585 sec/batch
Epoch: 1/20...  Training Step: 1500...  Training loss: 1.9649...  0.4759 sec/batch
Epoch: 1/20...  Training Step: 1501...  Training loss: 1.9584...  0.2353 sec/batch
Epoch: 1/20...  Training Step: 1502...  Training loss: 1.8971...  0.3171 sec/batch
Epoch: 1/20...  Training Step: 1503...  Training loss: 2.0726...  0.2710 sec/batch
Epoch: 1/20...  Training Step: 1504...  Training loss: 2.1564...  0.2440 sec/batch
Epoch: 1/20...  Training Step: 1505...  Training loss: 1.9125...  0.2337 sec/batch
Epoch: 1/20...  Training Step: 1506...  Training loss: 1.9207...  0.2699 sec/batch
Epoch: 1/20...  Training Step: 1507...  Training loss: 1.9558...  0.3045 sec/batch
Epoch: 1/20...  Training Step: 1508...  Training loss: 1.8967...  0.2232 sec/batch
Epoch: 1/20...  Training Step: 1509...  Training loss: 2.0067...  0.2917 sec/batch

Epoch: 1/20...  Training Step: 1597...  Training loss: 1.8691...  0.2479 sec/batch
Epoch: 1/20...  Training Step: 1598...  Training loss: 2.0165...  0.2890 sec/batch
Epoch: 1/20...  Training Step: 1599...  Training loss: 1.9240...  0.2688 sec/batch
Epoch: 1/20...  Training Step: 1600...  Training loss: 1.9969...  0.2893 sec/batch
Epoch: 1/20...  Training Step: 1601...  Training loss: 2.0022...  0.2064 sec/batch
Epoch: 1/20...  Training Step: 1602...  Training loss: 1.9951...  0.2631 sec/batch
Epoch: 1/20...  Training Step: 1603...  Training loss: 2.0605...  0.3313 sec/batch
Epoch: 1/20...  Training Step: 1604...  Training loss: 2.0404...  0.2213 sec/batch
Epoch: 1/20...  Training Step: 1605...  Training loss: 2.0576...  0.3407 sec/batch
Epoch: 1/20...  Training Step: 1606...  Training loss: 1.9400...  0.2544 sec/batch
Epoch: 1/20...  Training Step: 1607...  Training loss: 2.1098...  0.2278 sec/batch
Epoch: 1/20...  Training Step: 1608...  Training loss: 2.0937...  0.2252 sec/batch

Epoch: 1/20...  Training Step: 1696...  Training loss: 1.9161...  0.2630 sec/batch
Epoch: 1/20...  Training Step: 1697...  Training loss: 1.9177...  0.2025 sec/batch
Epoch: 1/20...  Training Step: 1698...  Training loss: 1.9839...  0.2128 sec/batch
Epoch: 1/20...  Training Step: 1699...  Training loss: 2.0311...  0.2480 sec/batch
Epoch: 1/20...  Training Step: 1700...  Training loss: 2.0138...  0.2328 sec/batch
Epoch: 1/20...  Training Step: 1701...  Training loss: 1.7716...  0.2759 sec/batch
Epoch: 1/20...  Training Step: 1702...  Training loss: 1.9561...  0.1828 sec/batch
Epoch: 1/20...  Training Step: 1703...  Training loss: 1.9554...  0.2547 sec/batch
Epoch: 1/20...  Training Step: 1704...  Training loss: 1.9772...  0.2147 sec/batch
Epoch: 1/20...  Training Step: 1705...  Training loss: 1.9850...  0.2353 sec/batch
Epoch: 1/20...  Training Step: 1706...  Training loss: 2.0478...  0.2124 sec/batch
Epoch: 1/20...  Training Step: 1707...  Training loss: 2.0029...  0.2313 sec/batch

Epoch: 1/20...  Training Step: 1795...  Training loss: 1.7213...  0.2378 sec/batch
Epoch: 1/20...  Training Step: 1796...  Training loss: 1.9291...  0.2596 sec/batch
Epoch: 1/20...  Training Step: 1797...  Training loss: 1.7290...  0.2769 sec/batch
Epoch: 1/20...  Training Step: 1798...  Training loss: 1.8616...  0.2632 sec/batch
Epoch: 1/20...  Training Step: 1799...  Training loss: 1.8993...  0.2500 sec/batch
Epoch: 1/20...  Training Step: 1800...  Training loss: 1.9724...  0.2226 sec/batch
Epoch: 1/20...  Training Step: 1801...  Training loss: 1.9161...  0.1755 sec/batch
Epoch: 1/20...  Training Step: 1802...  Training loss: 2.0513...  0.2538 sec/batch
Epoch: 1/20...  Training Step: 1803...  Training loss: 1.9607...  0.2080 sec/batch
Epoch: 1/20...  Training Step: 1804...  Training loss: 1.8369...  0.2320 sec/batch
Epoch: 1/20...  Training Step: 1805...  Training loss: 1.7688...  0.2830 sec/batch
Epoch: 1/20...  Training Step: 1806...  Training loss: 1.8823...  0.2633 sec/batch

Epoch: 1/20...  Training Step: 1894...  Training loss: 1.9307...  0.3030 sec/batch
Epoch: 1/20...  Training Step: 1895...  Training loss: 1.7604...  0.3310 sec/batch
Epoch: 1/20...  Training Step: 1896...  Training loss: 1.9189...  0.2927 sec/batch
Epoch: 1/20...  Training Step: 1897...  Training loss: 1.9583...  0.2882 sec/batch
Epoch: 1/20...  Training Step: 1898...  Training loss: 2.0016...  0.2247 sec/batch
Epoch: 1/20...  Training Step: 1899...  Training loss: 1.9047...  0.3051 sec/batch
Epoch: 1/20...  Training Step: 1900...  Training loss: 1.7744...  0.2516 sec/batch
Epoch: 1/20...  Training Step: 1901...  Training loss: 2.0009...  0.2710 sec/batch
Epoch: 1/20...  Training Step: 1902...  Training loss: 1.9540...  0.2559 sec/batch
Epoch: 1/20...  Training Step: 1903...  Training loss: 1.9256...  0.3001 sec/batch
Epoch: 1/20...  Training Step: 1904...  Training loss: 1.7866...  0.2432 sec/batch
Epoch: 1/20...  Training Step: 1905...  Training loss: 1.8032...  0.2775 sec/batch

Epoch: 1/20...  Training Step: 1993...  Training loss: 2.1661...  0.2743 sec/batch
Epoch: 1/20...  Training Step: 1994...  Training loss: 2.0533...  0.2517 sec/batch
Epoch: 1/20...  Training Step: 1995...  Training loss: 1.9062...  0.2782 sec/batch
Epoch: 1/20...  Training Step: 1996...  Training loss: 2.0185...  0.2087 sec/batch
Epoch: 1/20...  Training Step: 1997...  Training loss: 1.8644...  0.2946 sec/batch
Epoch: 1/20...  Training Step: 1998...  Training loss: 1.9184...  0.2476 sec/batch
Epoch: 1/20...  Training Step: 1999...  Training loss: 1.8535...  0.1244 sec/batch
Epoch: 1/20...  Training Step: 2000...  Training loss: 1.9238...  0.3096 sec/batch
Epoch: 1/20...  Training Step: 2001...  Training loss: 1.8550...  0.2446 sec/batch
Epoch: 1/20...  Training Step: 2002...  Training loss: 1.7631...  0.3135 sec/batch
Epoch: 1/20...  Training Step: 2003...  Training loss: 1.7627...  0.2564 sec/batch
Epoch: 1/20...  Training Step: 2004...  Training loss: 1.7742...  0.3318 sec/batch

Epoch: 1/20...  Training Step: 2092...  Training loss: 1.8662...  0.2353 sec/batch
Epoch: 1/20...  Training Step: 2093...  Training loss: 1.9007...  0.2784 sec/batch
Epoch: 1/20...  Training Step: 2094...  Training loss: 1.8338...  0.2342 sec/batch
Epoch: 1/20...  Training Step: 2095...  Training loss: 1.8163...  0.2548 sec/batch
Epoch: 1/20...  Training Step: 2096...  Training loss: 1.9844...  0.1865 sec/batch
Epoch: 1/20...  Training Step: 2097...  Training loss: 1.9043...  0.3203 sec/batch
Epoch: 1/20...  Training Step: 2098...  Training loss: 1.8603...  0.2731 sec/batch
Epoch: 1/20...  Training Step: 2099...  Training loss: 1.7558...  0.3322 sec/batch
Epoch: 1/20...  Training Step: 2100...  Training loss: 1.8471...  0.2080 sec/batch
Epoch: 1/20...  Training Step: 2101...  Training loss: 1.8010...  0.2648 sec/batch
Epoch: 1/20...  Training Step: 2102...  Training loss: 1.9553...  0.2879 sec/batch
Epoch: 1/20...  Training Step: 2103...  Training loss: 1.9174...  0.3404 sec/batch

Epoch: 1/20...  Training Step: 2191...  Training loss: 2.0093...  0.2732 sec/batch
Epoch: 1/20...  Training Step: 2192...  Training loss: 1.8758...  0.1873 sec/batch
Epoch: 1/20...  Training Step: 2193...  Training loss: 1.7905...  0.2076 sec/batch
Epoch: 1/20...  Training Step: 2194...  Training loss: 1.7463...  0.2215 sec/batch
Epoch: 1/20...  Training Step: 2195...  Training loss: 1.8945...  0.2787 sec/batch
Epoch: 1/20...  Training Step: 2196...  Training loss: 2.0570...  0.2570 sec/batch
Epoch: 1/20...  Training Step: 2197...  Training loss: 1.9305...  0.2616 sec/batch
Epoch: 1/20...  Training Step: 2198...  Training loss: 1.9587...  0.2118 sec/batch
Epoch: 1/20...  Training Step: 2199...  Training loss: 1.8598...  0.2454 sec/batch
Epoch: 1/20...  Training Step: 2200...  Training loss: 1.8473...  0.3046 sec/batch
Epoch: 1/20...  Training Step: 2201...  Training loss: 1.9703...  0.2245 sec/batch
Epoch: 1/20...  Training Step: 2202...  Training loss: 2.0178...  0.2826 sec/batch

Epoch: 1/20...  Training Step: 2290...  Training loss: 1.8864...  0.2181 sec/batch
Epoch: 1/20...  Training Step: 2291...  Training loss: 1.7654...  0.2038 sec/batch
Epoch: 1/20...  Training Step: 2292...  Training loss: 1.9607...  0.2165 sec/batch
Epoch: 1/20...  Training Step: 2293...  Training loss: 1.7936...  0.2487 sec/batch
Epoch: 1/20...  Training Step: 2294...  Training loss: 1.8141...  0.2502 sec/batch
Epoch: 1/20...  Training Step: 2295...  Training loss: 1.8787...  0.2728 sec/batch
Epoch: 1/20...  Training Step: 2296...  Training loss: 1.7279...  0.2138 sec/batch
Epoch: 1/20...  Training Step: 2297...  Training loss: 1.8656...  0.1995 sec/batch
Epoch: 1/20...  Training Step: 2298...  Training loss: 1.7464...  0.2333 sec/batch
Epoch: 1/20...  Training Step: 2299...  Training loss: 1.9754...  0.2371 sec/batch
Epoch: 1/20...  Training Step: 2300...  Training loss: 1.7609...  0.2612 sec/batch
Epoch: 1/20...  Training Step: 2301...  Training loss: 2.0059...  0.2217 sec/batch

Epoch: 1/20...  Training Step: 2389...  Training loss: 1.9584...  0.2981 sec/batch
Epoch: 1/20...  Training Step: 2390...  Training loss: 1.9391...  0.2059 sec/batch
Epoch: 1/20...  Training Step: 2391...  Training loss: 2.0474...  0.2570 sec/batch
Epoch: 1/20...  Training Step: 2392...  Training loss: 1.8656...  0.2707 sec/batch
Epoch: 1/20...  Training Step: 2393...  Training loss: 1.9984...  0.2474 sec/batch
Epoch: 1/20...  Training Step: 2394...  Training loss: 2.0442...  0.2620 sec/batch
Epoch: 1/20...  Training Step: 2395...  Training loss: 1.9499...  0.2588 sec/batch
Epoch: 1/20...  Training Step: 2396...  Training loss: 1.9175...  0.2487 sec/batch
Epoch: 1/20...  Training Step: 2397...  Training loss: 1.8798...  0.2244 sec/batch
Epoch: 1/20...  Training Step: 2398...  Training loss: 1.8618...  0.2925 sec/batch
Epoch: 1/20...  Training Step: 2399...  Training loss: 1.9818...  0.2580 sec/batch
Epoch: 1/20...  Training Step: 2400...  Training loss: 1.9766...  0.2712 sec/batch

Epoch: 1/20...  Training Step: 2488...  Training loss: 1.7544...  0.2191 sec/batch
Epoch: 1/20...  Training Step: 2489...  Training loss: 1.9381...  0.2122 sec/batch
Epoch: 1/20...  Training Step: 2490...  Training loss: 1.9371...  0.1952 sec/batch
Epoch: 1/20...  Training Step: 2491...  Training loss: 2.0008...  0.2863 sec/batch
Epoch: 1/20...  Training Step: 2492...  Training loss: 1.9362...  0.3782 sec/batch
Epoch: 1/20...  Training Step: 2493...  Training loss: 1.9136...  0.2849 sec/batch
Epoch: 1/20...  Training Step: 2494...  Training loss: 1.8684...  0.2910 sec/batch
Epoch: 1/20...  Training Step: 2495...  Training loss: 1.8523...  0.2735 sec/batch
Epoch: 1/20...  Training Step: 2496...  Training loss: 1.8280...  0.2511 sec/batch
Epoch: 1/20...  Training Step: 2497...  Training loss: 1.8920...  0.2600 sec/batch
Epoch: 1/20...  Training Step: 2498...  Training loss: 1.9156...  0.2343 sec/batch
Epoch: 1/20...  Training Step: 2499...  Training loss: 1.9103...  0.1807 sec/batch

Epoch: 1/20...  Training Step: 2587...  Training loss: 1.8507...  0.2681 sec/batch
Epoch: 1/20...  Training Step: 2588...  Training loss: 2.0218...  0.2870 sec/batch
Epoch: 1/20...  Training Step: 2589...  Training loss: 1.8557...  0.3035 sec/batch
Epoch: 1/20...  Training Step: 2590...  Training loss: 1.9848...  0.3309 sec/batch
Epoch: 1/20...  Training Step: 2591...  Training loss: 1.8867...  0.3224 sec/batch
Epoch: 1/20...  Training Step: 2592...  Training loss: 1.8944...  0.3024 sec/batch
Epoch: 1/20...  Training Step: 2593...  Training loss: 1.9420...  0.2087 sec/batch
Epoch: 1/20...  Training Step: 2594...  Training loss: 1.8650...  0.2833 sec/batch
Epoch: 1/20...  Training Step: 2595...  Training loss: 1.7639...  0.2654 sec/batch
Epoch: 1/20...  Training Step: 2596...  Training loss: 1.8865...  0.2979 sec/batch
Epoch: 1/20...  Training Step: 2597...  Training loss: 1.7425...  0.2684 sec/batch
Epoch: 1/20...  Training Step: 2598...  Training loss: 1.8067...  0.2803 sec/batch

Epoch: 1/20...  Training Step: 2686...  Training loss: 1.7883...  0.2371 sec/batch
Epoch: 1/20...  Training Step: 2687...  Training loss: 1.8290...  0.2400 sec/batch
Epoch: 1/20...  Training Step: 2688...  Training loss: 1.8349...  0.1639 sec/batch
Epoch: 1/20...  Training Step: 2689...  Training loss: 1.8244...  0.1952 sec/batch
Epoch: 1/20...  Training Step: 2690...  Training loss: 1.9546...  0.2542 sec/batch
Epoch: 1/20...  Training Step: 2691...  Training loss: 1.9914...  0.2591 sec/batch
Epoch: 1/20...  Training Step: 2692...  Training loss: 1.8550...  0.3253 sec/batch
Epoch: 1/20...  Training Step: 2693...  Training loss: 1.8308...  0.2980 sec/batch
Epoch: 1/20...  Training Step: 2694...  Training loss: 1.9813...  0.2372 sec/batch
Epoch: 1/20...  Training Step: 2695...  Training loss: 2.0938...  0.1802 sec/batch
Epoch: 1/20...  Training Step: 2696...  Training loss: 1.9008...  0.2598 sec/batch
Epoch: 1/20...  Training Step: 2697...  Training loss: 1.7831...  0.2834 sec/batch

Epoch: 1/20...  Training Step: 2785...  Training loss: 1.7869...  0.2925 sec/batch
Epoch: 1/20...  Training Step: 2786...  Training loss: 1.9785...  0.2198 sec/batch
Epoch: 1/20...  Training Step: 2787...  Training loss: 1.8210...  0.1968 sec/batch
Epoch: 1/20...  Training Step: 2788...  Training loss: 1.8155...  0.2454 sec/batch
Epoch: 1/20...  Training Step: 2789...  Training loss: 1.7928...  0.2521 sec/batch
Epoch: 1/20...  Training Step: 2790...  Training loss: 1.7892...  0.2151 sec/batch
Epoch: 1/20...  Training Step: 2791...  Training loss: 1.8581...  0.2240 sec/batch
Epoch: 1/20...  Training Step: 2792...  Training loss: 1.7417...  0.2381 sec/batch
Epoch: 1/20...  Training Step: 2793...  Training loss: 1.7569...  0.1950 sec/batch
Epoch: 1/20...  Training Step: 2794...  Training loss: 1.7084...  0.3542 sec/batch
Epoch: 1/20...  Training Step: 2795...  Training loss: 1.7689...  0.2805 sec/batch
Epoch: 1/20...  Training Step: 2796...  Training loss: 1.9452...  0.3269 sec/batch

Epoch: 1/20...  Training Step: 2884...  Training loss: 1.8663...  0.2731 sec/batch
Epoch: 1/20...  Training Step: 2885...  Training loss: 1.8243...  0.2496 sec/batch
Epoch: 1/20...  Training Step: 2886...  Training loss: 1.6756...  0.2646 sec/batch
Epoch: 1/20...  Training Step: 2887...  Training loss: 1.7951...  0.2132 sec/batch
Epoch: 1/20...  Training Step: 2888...  Training loss: 1.9531...  0.2896 sec/batch
Epoch: 1/20...  Training Step: 2889...  Training loss: 1.9187...  0.1783 sec/batch
Epoch: 1/20...  Training Step: 2890...  Training loss: 1.7955...  0.2698 sec/batch
Epoch: 1/20...  Training Step: 2891...  Training loss: 1.9370...  0.2680 sec/batch
Epoch: 1/20...  Training Step: 2892...  Training loss: 1.7272...  0.2092 sec/batch
Epoch: 1/20...  Training Step: 2893...  Training loss: 2.0160...  0.2326 sec/batch
Epoch: 1/20...  Training Step: 2894...  Training loss: 1.8612...  0.2139 sec/batch
Epoch: 1/20...  Training Step: 2895...  Training loss: 1.7924...  0.2971 sec/batch

Epoch: 1/20...  Training Step: 2983...  Training loss: 1.7834...  0.2517 sec/batch
Epoch: 1/20...  Training Step: 2984...  Training loss: 1.7616...  0.1956 sec/batch
Epoch: 1/20...  Training Step: 2985...  Training loss: 1.8769...  0.2827 sec/batch
Epoch: 1/20...  Training Step: 2986...  Training loss: 1.8823...  0.2605 sec/batch
Epoch: 1/20...  Training Step: 2987...  Training loss: 1.8330...  0.2985 sec/batch
Epoch: 1/20...  Training Step: 2988...  Training loss: 2.0227...  0.2966 sec/batch
Epoch: 1/20...  Training Step: 2989...  Training loss: 1.9326...  0.1842 sec/batch
Epoch: 1/20...  Training Step: 2990...  Training loss: 1.9157...  0.1883 sec/batch
Epoch: 1/20...  Training Step: 2991...  Training loss: 1.8884...  0.2098 sec/batch
Epoch: 1/20...  Training Step: 2992...  Training loss: 1.8843...  0.2270 sec/batch
Epoch: 1/20...  Training Step: 2993...  Training loss: 1.6918...  0.2372 sec/batch
Epoch: 1/20...  Training Step: 2994...  Training loss: 1.8979...  0.2734 sec/batch

Epoch: 1/20...  Training Step: 3082...  Training loss: 1.7492...  0.2438 sec/batch
Epoch: 1/20...  Training Step: 3083...  Training loss: 1.9724...  0.2532 sec/batch
Epoch: 1/20...  Training Step: 3084...  Training loss: 1.8048...  0.2834 sec/batch
Epoch: 1/20...  Training Step: 3085...  Training loss: 1.8837...  0.3182 sec/batch
Epoch: 1/20...  Training Step: 3086...  Training loss: 1.8113...  0.2874 sec/batch
Epoch: 1/20...  Training Step: 3087...  Training loss: 1.7781...  0.2876 sec/batch
Epoch: 1/20...  Training Step: 3088...  Training loss: 1.8727...  0.2663 sec/batch
Epoch: 1/20...  Training Step: 3089...  Training loss: 1.7779...  0.2379 sec/batch
Epoch: 1/20...  Training Step: 3090...  Training loss: 1.8394...  0.3218 sec/batch
Epoch: 1/20...  Training Step: 3091...  Training loss: 1.6878...  0.2112 sec/batch
Epoch: 1/20...  Training Step: 3092...  Training loss: 1.7713...  0.2478 sec/batch
Epoch: 1/20...  Training Step: 3093...  Training loss: 1.9083...  0.2855 sec/batch

Epoch: 1/20...  Training Step: 3181...  Training loss: 1.8159...  0.2245 sec/batch
Epoch: 1/20...  Training Step: 3182...  Training loss: 1.8031...  0.2307 sec/batch
Epoch: 1/20...  Training Step: 3183...  Training loss: 1.8495...  0.1791 sec/batch
Epoch: 1/20...  Training Step: 3184...  Training loss: 1.8598...  0.2737 sec/batch
Epoch: 1/20...  Training Step: 3185...  Training loss: 1.9237...  0.1870 sec/batch
Epoch: 1/20...  Training Step: 3186...  Training loss: 1.8407...  0.2054 sec/batch
Epoch: 1/20...  Training Step: 3187...  Training loss: 1.7944...  0.2329 sec/batch
Epoch: 1/20...  Training Step: 3188...  Training loss: 1.9657...  0.2528 sec/batch
Epoch: 1/20...  Training Step: 3189...  Training loss: 1.7875...  0.3152 sec/batch
Epoch: 1/20...  Training Step: 3190...  Training loss: 1.8076...  0.3245 sec/batch
Epoch: 1/20...  Training Step: 3191...  Training loss: 1.8020...  0.3209 sec/batch
Epoch: 1/20...  Training Step: 3192...  Training loss: 1.7714...  0.2441 sec/batch

Epoch: 1/20...  Training Step: 3280...  Training loss: 1.7447...  0.2915 sec/batch
Epoch: 1/20...  Training Step: 3281...  Training loss: 1.8568...  0.2905 sec/batch
Epoch: 1/20...  Training Step: 3282...  Training loss: 1.7741...  0.2409 sec/batch
Epoch: 1/20...  Training Step: 3283...  Training loss: 1.8468...  0.3053 sec/batch
Epoch: 1/20...  Training Step: 3284...  Training loss: 1.8490...  0.2724 sec/batch
Epoch: 1/20...  Training Step: 3285...  Training loss: 1.7443...  0.2355 sec/batch
Epoch: 1/20...  Training Step: 3286...  Training loss: 1.8143...  0.2858 sec/batch
Epoch: 1/20...  Training Step: 3287...  Training loss: 1.8130...  0.2047 sec/batch
Epoch: 1/20...  Training Step: 3288...  Training loss: 1.8038...  0.1936 sec/batch
Epoch: 1/20...  Training Step: 3289...  Training loss: 1.9199...  0.3346 sec/batch
Epoch: 1/20...  Training Step: 3290...  Training loss: 1.9716...  0.1955 sec/batch
Epoch: 1/20...  Training Step: 3291...  Training loss: 1.9172...  0.2141 sec/batch

Epoch: 1/20...  Training Step: 3379...  Training loss: 1.7262...  0.2397 sec/batch
Epoch: 1/20...  Training Step: 3380...  Training loss: 1.7120...  0.2089 sec/batch
Epoch: 1/20...  Training Step: 3381...  Training loss: 1.7335...  0.1896 sec/batch
Epoch: 1/20...  Training Step: 3382...  Training loss: 1.7880...  0.2735 sec/batch
Epoch: 1/20...  Training Step: 3383...  Training loss: 1.7089...  0.3087 sec/batch
Epoch: 1/20...  Training Step: 3384...  Training loss: 1.8192...  0.2420 sec/batch
Epoch: 1/20...  Training Step: 3385...  Training loss: 1.8128...  0.2253 sec/batch
Epoch: 1/20...  Training Step: 3386...  Training loss: 1.6971...  0.2840 sec/batch
Epoch: 1/20...  Training Step: 3387...  Training loss: 1.8772...  0.2220 sec/batch
Epoch: 1/20...  Training Step: 3388...  Training loss: 1.9415...  0.2795 sec/batch
Epoch: 1/20...  Training Step: 3389...  Training loss: 1.7997...  0.2876 sec/batch
Epoch: 1/20...  Training Step: 3390...  Training loss: 1.8942...  0.2631 sec/batch

Epoch: 1/20...  Training Step: 3478...  Training loss: 1.8356...  0.2128 sec/batch
Epoch: 1/20...  Training Step: 3479...  Training loss: 1.7660...  0.2604 sec/batch
Epoch: 1/20...  Training Step: 3480...  Training loss: 1.6480...  0.1848 sec/batch
Epoch: 1/20...  Training Step: 3481...  Training loss: 1.8321...  0.2898 sec/batch
Epoch: 1/20...  Training Step: 3482...  Training loss: 1.8587...  0.2805 sec/batch
Epoch: 1/20...  Training Step: 3483...  Training loss: 1.9333...  0.2773 sec/batch
Epoch: 1/20...  Training Step: 3484...  Training loss: 1.9010...  0.2987 sec/batch
Epoch: 1/20...  Training Step: 3485...  Training loss: 1.8891...  0.2592 sec/batch
Epoch: 1/20...  Training Step: 3486...  Training loss: 2.1088...  0.3352 sec/batch
Epoch: 1/20...  Training Step: 3487...  Training loss: 1.8182...  0.2951 sec/batch
Epoch: 1/20...  Training Step: 3488...  Training loss: 1.9589...  0.2181 sec/batch
Epoch: 1/20...  Training Step: 3489...  Training loss: 1.8110...  0.2046 sec/batch

Epoch: 1/20...  Training Step: 3577...  Training loss: 1.7489...  0.2603 sec/batch
Epoch: 1/20...  Training Step: 3578...  Training loss: 1.7060...  0.2467 sec/batch
Epoch: 1/20...  Training Step: 3579...  Training loss: 1.7377...  0.2642 sec/batch
Epoch: 1/20...  Training Step: 3580...  Training loss: 1.7094...  0.3041 sec/batch
Epoch: 1/20...  Training Step: 3581...  Training loss: 1.8192...  0.2604 sec/batch
Epoch: 1/20...  Training Step: 3582...  Training loss: 1.8074...  0.2222 sec/batch
Epoch: 1/20...  Training Step: 3583...  Training loss: 1.6726...  0.2853 sec/batch
Epoch: 1/20...  Training Step: 3584...  Training loss: 1.8555...  0.3425 sec/batch
Epoch: 1/20...  Training Step: 3585...  Training loss: 1.8198...  0.3011 sec/batch
Epoch: 1/20...  Training Step: 3586...  Training loss: 1.9594...  0.2723 sec/batch
Epoch: 1/20...  Training Step: 3587...  Training loss: 1.9129...  0.2098 sec/batch
Epoch: 1/20...  Training Step: 3588...  Training loss: 1.8296...  0.2374 sec/batch

Epoch: 1/20...  Training Step: 3676...  Training loss: 2.0151...  0.2451 sec/batch
Epoch: 1/20...  Training Step: 3677...  Training loss: 1.9883...  0.2460 sec/batch
Epoch: 1/20...  Training Step: 3678...  Training loss: 1.8634...  0.3194 sec/batch
Epoch: 1/20...  Training Step: 3679...  Training loss: 1.9903...  0.3031 sec/batch
Epoch: 1/20...  Training Step: 3680...  Training loss: 1.9814...  0.3357 sec/batch
Epoch: 1/20...  Training Step: 3681...  Training loss: 1.9218...  0.2808 sec/batch
Epoch: 1/20...  Training Step: 3682...  Training loss: 1.8479...  0.2306 sec/batch
Epoch: 1/20...  Training Step: 3683...  Training loss: 1.7935...  0.2641 sec/batch
Epoch: 1/20...  Training Step: 3684...  Training loss: 1.8160...  0.2696 sec/batch
Epoch: 1/20...  Training Step: 3685...  Training loss: 1.8153...  0.1877 sec/batch
Epoch: 1/20...  Training Step: 3686...  Training loss: 1.9009...  0.2445 sec/batch
Epoch: 1/20...  Training Step: 3687...  Training loss: 1.9271...  0.2483 sec/batch

Epoch: 1/20...  Training Step: 3775...  Training loss: 1.9201...  0.2881 sec/batch
Epoch: 1/20...  Training Step: 3776...  Training loss: 1.8662...  0.2686 sec/batch
Epoch: 1/20...  Training Step: 3777...  Training loss: 1.8489...  0.2160 sec/batch
Epoch: 1/20...  Training Step: 3778...  Training loss: 1.8657...  0.2315 sec/batch
Epoch: 1/20...  Training Step: 3779...  Training loss: 1.9069...  0.2679 sec/batch
Epoch: 1/20...  Training Step: 3780...  Training loss: 2.0124...  0.2735 sec/batch
Epoch: 1/20...  Training Step: 3781...  Training loss: 1.8078...  0.1812 sec/batch
Epoch: 1/20...  Training Step: 3782...  Training loss: 1.7941...  0.1775 sec/batch
Epoch: 1/20...  Training Step: 3783...  Training loss: 1.8096...  0.2081 sec/batch
Epoch: 1/20...  Training Step: 3784...  Training loss: 1.8740...  0.2854 sec/batch
Epoch: 1/20...  Training Step: 3785...  Training loss: 1.7481...  0.1914 sec/batch
Epoch: 1/20...  Training Step: 3786...  Training loss: 1.7523...  0.2375 sec/batch

Epoch: 1/20...  Training Step: 3874...  Training loss: 1.7689...  0.2518 sec/batch
Epoch: 1/20...  Training Step: 3875...  Training loss: 1.7566...  0.2547 sec/batch
Epoch: 1/20...  Training Step: 3876...  Training loss: 1.8462...  0.2628 sec/batch
Epoch: 1/20...  Training Step: 3877...  Training loss: 1.9595...  0.1479 sec/batch
Epoch: 1/20...  Training Step: 3878...  Training loss: 1.8127...  0.2670 sec/batch
Epoch: 1/20...  Training Step: 3879...  Training loss: 1.7994...  0.2882 sec/batch
Epoch: 1/20...  Training Step: 3880...  Training loss: 1.8921...  0.1894 sec/batch
Epoch: 1/20...  Training Step: 3881...  Training loss: 1.9227...  0.2991 sec/batch
Epoch: 1/20...  Training Step: 3882...  Training loss: 1.9297...  0.2050 sec/batch
Epoch: 1/20...  Training Step: 3883...  Training loss: 1.9375...  0.2157 sec/batch
Epoch: 1/20...  Training Step: 3884...  Training loss: 1.8678...  0.2535 sec/batch
Epoch: 1/20...  Training Step: 3885...  Training loss: 1.8811...  0.1756 sec/batch

Epoch: 2/20...  Training Step: 3973...  Training loss: 1.7594...  0.3008 sec/batch
Epoch: 2/20...  Training Step: 3974...  Training loss: 1.7155...  0.3163 sec/batch
Epoch: 2/20...  Training Step: 3975...  Training loss: 1.8949...  0.2467 sec/batch
Epoch: 2/20...  Training Step: 3976...  Training loss: 1.7696...  0.1799 sec/batch
Epoch: 2/20...  Training Step: 3977...  Training loss: 1.7581...  0.2853 sec/batch
Epoch: 2/20...  Training Step: 3978...  Training loss: 1.7882...  0.3103 sec/batch
Epoch: 2/20...  Training Step: 3979...  Training loss: 1.9328...  0.2114 sec/batch
Epoch: 2/20...  Training Step: 3980...  Training loss: 1.8201...  0.1909 sec/batch
Epoch: 2/20...  Training Step: 3981...  Training loss: 1.7609...  0.3274 sec/batch
Epoch: 2/20...  Training Step: 3982...  Training loss: 1.6906...  0.2326 sec/batch
Epoch: 2/20...  Training Step: 3983...  Training loss: 1.7809...  0.2519 sec/batch
Epoch: 2/20...  Training Step: 3984...  Training loss: 1.8268...  0.3225 sec/batch

Epoch: 2/20...  Training Step: 4072...  Training loss: 1.7389...  0.2472 sec/batch
Epoch: 2/20...  Training Step: 4073...  Training loss: 1.8112...  0.2712 sec/batch
Epoch: 2/20...  Training Step: 4074...  Training loss: 1.6967...  0.3384 sec/batch
Epoch: 2/20...  Training Step: 4075...  Training loss: 1.8035...  0.2882 sec/batch
Epoch: 2/20...  Training Step: 4076...  Training loss: 1.7902...  0.2535 sec/batch
Epoch: 2/20...  Training Step: 4077...  Training loss: 1.7946...  0.3239 sec/batch
Epoch: 2/20...  Training Step: 4078...  Training loss: 1.7828...  0.2934 sec/batch
Epoch: 2/20...  Training Step: 4079...  Training loss: 1.6639...  0.2919 sec/batch
Epoch: 2/20...  Training Step: 4080...  Training loss: 1.7574...  0.3572 sec/batch
Epoch: 2/20...  Training Step: 4081...  Training loss: 1.6836...  0.2354 sec/batch
Epoch: 2/20...  Training Step: 4082...  Training loss: 1.8134...  0.2604 sec/batch
Epoch: 2/20...  Training Step: 4083...  Training loss: 1.7414...  0.3383 sec/batch

Epoch: 2/20...  Training Step: 4171...  Training loss: 1.7319...  0.3360 sec/batch
Epoch: 2/20...  Training Step: 4172...  Training loss: 1.9448...  0.2624 sec/batch
Epoch: 2/20...  Training Step: 4173...  Training loss: 1.8131...  0.3221 sec/batch
Epoch: 2/20...  Training Step: 4174...  Training loss: 1.8732...  0.2083 sec/batch
Epoch: 2/20...  Training Step: 4175...  Training loss: 1.8190...  0.2026 sec/batch
Epoch: 2/20...  Training Step: 4176...  Training loss: 1.9064...  0.3216 sec/batch
Epoch: 2/20...  Training Step: 4177...  Training loss: 1.7893...  0.2894 sec/batch
Epoch: 2/20...  Training Step: 4178...  Training loss: 1.7793...  0.2580 sec/batch
Epoch: 2/20...  Training Step: 4179...  Training loss: 1.9081...  0.2497 sec/batch
Epoch: 2/20...  Training Step: 4180...  Training loss: 1.7584...  0.2442 sec/batch
Epoch: 2/20...  Training Step: 4181...  Training loss: 1.7077...  0.2167 sec/batch
Epoch: 2/20...  Training Step: 4182...  Training loss: 1.8572...  0.2738 sec/batch

Epoch: 2/20...  Training Step: 4270...  Training loss: 1.7709...  0.2804 sec/batch
Epoch: 2/20...  Training Step: 4271...  Training loss: 1.8643...  0.2997 sec/batch
Epoch: 2/20...  Training Step: 4272...  Training loss: 1.8544...  0.2661 sec/batch
Epoch: 2/20...  Training Step: 4273...  Training loss: 1.7934...  0.2388 sec/batch
Epoch: 2/20...  Training Step: 4274...  Training loss: 1.8133...  0.1997 sec/batch
Epoch: 2/20...  Training Step: 4275...  Training loss: 1.9104...  0.2494 sec/batch
Epoch: 2/20...  Training Step: 4276...  Training loss: 1.7228...  0.2851 sec/batch
Epoch: 2/20...  Training Step: 4277...  Training loss: 1.9409...  0.3175 sec/batch
Epoch: 2/20...  Training Step: 4278...  Training loss: 1.8520...  0.2225 sec/batch
Epoch: 2/20...  Training Step: 4279...  Training loss: 1.8282...  0.2799 sec/batch
Epoch: 2/20...  Training Step: 4280...  Training loss: 1.9312...  0.2303 sec/batch
Epoch: 2/20...  Training Step: 4281...  Training loss: 1.8476...  0.3389 sec/batch

Epoch: 2/20...  Training Step: 4369...  Training loss: 1.8489...  0.3327 sec/batch
Epoch: 2/20...  Training Step: 4370...  Training loss: 1.8068...  0.2412 sec/batch
Epoch: 2/20...  Training Step: 4371...  Training loss: 1.7722...  0.2727 sec/batch
Epoch: 2/20...  Training Step: 4372...  Training loss: 1.7755...  0.2376 sec/batch
Epoch: 2/20...  Training Step: 4373...  Training loss: 1.7888...  0.2678 sec/batch
Epoch: 2/20...  Training Step: 4374...  Training loss: 1.7773...  0.3527 sec/batch
Epoch: 2/20...  Training Step: 4375...  Training loss: 1.8968...  0.3737 sec/batch
Epoch: 2/20...  Training Step: 4376...  Training loss: 1.8213...  0.3519 sec/batch
Epoch: 2/20...  Training Step: 4377...  Training loss: 1.8677...  0.3029 sec/batch
Epoch: 2/20...  Training Step: 4378...  Training loss: 1.8649...  0.3167 sec/batch
Epoch: 2/20...  Training Step: 4379...  Training loss: 1.7639...  0.2787 sec/batch
Epoch: 2/20...  Training Step: 4380...  Training loss: 1.8736...  0.1976 sec/batch

Epoch: 2/20...  Training Step: 4468...  Training loss: 1.8473...  0.2489 sec/batch
Epoch: 2/20...  Training Step: 4469...  Training loss: 1.7441...  0.2392 sec/batch
Epoch: 2/20...  Training Step: 4470...  Training loss: 1.9065...  0.2784 sec/batch
Epoch: 2/20...  Training Step: 4471...  Training loss: 1.7332...  0.2193 sec/batch
Epoch: 2/20...  Training Step: 4472...  Training loss: 1.8664...  0.2132 sec/batch
Epoch: 2/20...  Training Step: 4473...  Training loss: 1.8566...  0.3017 sec/batch
Epoch: 2/20...  Training Step: 4474...  Training loss: 1.7784...  0.3188 sec/batch
Epoch: 2/20...  Training Step: 4475...  Training loss: 1.7967...  0.2753 sec/batch
Epoch: 2/20...  Training Step: 4476...  Training loss: 1.8414...  0.2923 sec/batch
Epoch: 2/20...  Training Step: 4477...  Training loss: 1.8731...  0.2062 sec/batch
Epoch: 2/20...  Training Step: 4478...  Training loss: 1.8987...  0.2856 sec/batch
Epoch: 2/20...  Training Step: 4479...  Training loss: 1.8808...  0.2358 sec/batch

Epoch: 2/20...  Training Step: 4567...  Training loss: 1.6907...  0.1994 sec/batch
Epoch: 2/20...  Training Step: 4568...  Training loss: 1.9103...  0.2946 sec/batch
Epoch: 2/20...  Training Step: 4569...  Training loss: 1.8559...  0.2645 sec/batch
Epoch: 2/20...  Training Step: 4570...  Training loss: 1.6919...  0.2402 sec/batch
Epoch: 2/20...  Training Step: 4571...  Training loss: 1.9458...  0.2324 sec/batch
Epoch: 2/20...  Training Step: 4572...  Training loss: 1.8040...  0.1949 sec/batch
Epoch: 2/20...  Training Step: 4573...  Training loss: 1.8191...  0.2758 sec/batch
Epoch: 2/20...  Training Step: 4574...  Training loss: 1.7069...  0.2008 sec/batch
Epoch: 2/20...  Training Step: 4575...  Training loss: 1.7926...  0.2771 sec/batch
Epoch: 2/20...  Training Step: 4576...  Training loss: 1.7870...  0.2428 sec/batch
Epoch: 2/20...  Training Step: 4577...  Training loss: 1.7160...  0.2641 sec/batch
Epoch: 2/20...  Training Step: 4578...  Training loss: 1.8565...  0.3033 sec/batch

Epoch: 2/20...  Training Step: 4666...  Training loss: 1.9340...  0.2536 sec/batch
Epoch: 2/20...  Training Step: 4667...  Training loss: 1.8665...  0.2295 sec/batch
Epoch: 2/20...  Training Step: 4668...  Training loss: 1.8903...  0.1859 sec/batch
Epoch: 2/20...  Training Step: 4669...  Training loss: 1.8371...  0.2774 sec/batch
Epoch: 2/20...  Training Step: 4670...  Training loss: 1.7888...  0.2372 sec/batch
Epoch: 2/20...  Training Step: 4671...  Training loss: 1.8278...  0.2983 sec/batch
Epoch: 2/20...  Training Step: 4672...  Training loss: 1.8353...  0.2409 sec/batch
Epoch: 2/20...  Training Step: 4673...  Training loss: 1.8876...  0.2175 sec/batch
Epoch: 2/20...  Training Step: 4674...  Training loss: 1.8047...  0.2752 sec/batch
Epoch: 2/20...  Training Step: 4675...  Training loss: 1.8122...  0.2204 sec/batch
Epoch: 2/20...  Training Step: 4676...  Training loss: 1.8485...  0.2895 sec/batch
Epoch: 2/20...  Training Step: 4677...  Training loss: 1.8684...  0.2219 sec/batch

Epoch: 2/20...  Training Step: 4765...  Training loss: 1.6979...  0.2109 sec/batch
Epoch: 2/20...  Training Step: 4766...  Training loss: 1.7083...  0.2868 sec/batch
Epoch: 2/20...  Training Step: 4767...  Training loss: 1.8611...  0.2378 sec/batch
Epoch: 2/20...  Training Step: 4768...  Training loss: 1.7059...  0.2130 sec/batch
Epoch: 2/20...  Training Step: 4769...  Training loss: 1.8120...  0.2677 sec/batch
Epoch: 2/20...  Training Step: 4770...  Training loss: 1.7673...  0.2378 sec/batch
Epoch: 2/20...  Training Step: 4771...  Training loss: 1.8107...  0.3143 sec/batch
Epoch: 2/20...  Training Step: 4772...  Training loss: 1.6673...  0.2199 sec/batch
Epoch: 2/20...  Training Step: 4773...  Training loss: 1.8984...  0.2905 sec/batch
Epoch: 2/20...  Training Step: 4774...  Training loss: 1.9279...  0.2845 sec/batch
Epoch: 2/20...  Training Step: 4775...  Training loss: 1.9260...  0.2849 sec/batch
Epoch: 2/20...  Training Step: 4776...  Training loss: 1.9400...  0.1852 sec/batch

Epoch: 2/20...  Training Step: 4864...  Training loss: 1.8215...  0.2256 sec/batch
Epoch: 2/20...  Training Step: 4865...  Training loss: 1.8109...  0.2867 sec/batch
Epoch: 2/20...  Training Step: 4866...  Training loss: 1.8130...  0.2046 sec/batch
Epoch: 2/20...  Training Step: 4867...  Training loss: 1.9882...  0.2350 sec/batch
Epoch: 2/20...  Training Step: 4868...  Training loss: 1.7377...  0.2355 sec/batch
Epoch: 2/20...  Training Step: 4869...  Training loss: 1.8149...  0.3109 sec/batch
Epoch: 2/20...  Training Step: 4870...  Training loss: 1.7911...  0.2501 sec/batch
Epoch: 2/20...  Training Step: 4871...  Training loss: 1.6884...  0.2727 sec/batch
Epoch: 2/20...  Training Step: 4872...  Training loss: 1.7333...  0.2969 sec/batch
Epoch: 2/20...  Training Step: 4873...  Training loss: 1.7491...  0.2969 sec/batch
Epoch: 2/20...  Training Step: 4874...  Training loss: 1.7444...  0.2201 sec/batch
Epoch: 2/20...  Training Step: 4875...  Training loss: 1.7558...  0.2260 sec/batch

Epoch: 2/20...  Training Step: 4963...  Training loss: 1.8320...  0.2668 sec/batch
Epoch: 2/20...  Training Step: 4964...  Training loss: 1.8658...  0.2642 sec/batch
Epoch: 2/20...  Training Step: 4965...  Training loss: 1.8619...  0.3233 sec/batch
Epoch: 2/20...  Training Step: 4966...  Training loss: 1.9051...  0.2371 sec/batch
Epoch: 2/20...  Training Step: 4967...  Training loss: 1.7118...  0.2072 sec/batch
Epoch: 2/20...  Training Step: 4968...  Training loss: 1.8921...  0.2224 sec/batch
Epoch: 2/20...  Training Step: 4969...  Training loss: 1.7671...  0.3177 sec/batch
Epoch: 2/20...  Training Step: 4970...  Training loss: 1.7968...  0.3058 sec/batch
Epoch: 2/20...  Training Step: 4971...  Training loss: 1.7711...  0.2629 sec/batch
Epoch: 2/20...  Training Step: 4972...  Training loss: 1.6417...  0.2329 sec/batch
Epoch: 2/20...  Training Step: 4973...  Training loss: 1.7655...  0.2088 sec/batch
Epoch: 2/20...  Training Step: 4974...  Training loss: 1.6603...  0.3213 sec/batch

Epoch: 2/20...  Training Step: 5062...  Training loss: 1.9494...  0.2062 sec/batch
Epoch: 2/20...  Training Step: 5063...  Training loss: 1.7592...  0.3010 sec/batch
Epoch: 2/20...  Training Step: 5064...  Training loss: 1.8381...  0.2312 sec/batch
Epoch: 2/20...  Training Step: 5065...  Training loss: 1.7474...  0.3002 sec/batch
Epoch: 2/20...  Training Step: 5066...  Training loss: 1.8470...  0.3143 sec/batch
Epoch: 2/20...  Training Step: 5067...  Training loss: 1.6759...  0.2723 sec/batch
Epoch: 2/20...  Training Step: 5068...  Training loss: 1.6243...  0.2396 sec/batch
Epoch: 2/20...  Training Step: 5069...  Training loss: 1.7897...  0.2267 sec/batch
Epoch: 2/20...  Training Step: 5070...  Training loss: 1.7786...  0.2322 sec/batch
Epoch: 2/20...  Training Step: 5071...  Training loss: 1.7149...  0.2725 sec/batch
Epoch: 2/20...  Training Step: 5072...  Training loss: 1.6983...  0.2614 sec/batch
Epoch: 2/20...  Training Step: 5073...  Training loss: 1.5970...  0.2916 sec/batch

Epoch: 2/20...  Training Step: 5161...  Training loss: 1.7326...  0.2885 sec/batch
Epoch: 2/20...  Training Step: 5162...  Training loss: 1.8144...  0.3155 sec/batch
Epoch: 2/20...  Training Step: 5163...  Training loss: 1.7794...  0.3283 sec/batch
Epoch: 2/20...  Training Step: 5164...  Training loss: 1.8101...  0.2190 sec/batch
Epoch: 2/20...  Training Step: 5165...  Training loss: 1.9869...  0.2837 sec/batch
Epoch: 2/20...  Training Step: 5166...  Training loss: 1.7973...  0.2943 sec/batch
Epoch: 2/20...  Training Step: 5167...  Training loss: 1.6801...  0.2461 sec/batch
Epoch: 2/20...  Training Step: 5168...  Training loss: 1.7427...  0.2631 sec/batch
Epoch: 2/20...  Training Step: 5169...  Training loss: 1.7373...  0.2671 sec/batch
Epoch: 2/20...  Training Step: 5170...  Training loss: 1.7195...  0.2981 sec/batch
Epoch: 2/20...  Training Step: 5171...  Training loss: 1.7012...  0.1962 sec/batch
Epoch: 2/20...  Training Step: 5172...  Training loss: 1.8385...  0.2630 sec/batch

Epoch: 2/20...  Training Step: 5260...  Training loss: 1.8797...  0.2846 sec/batch
Epoch: 2/20...  Training Step: 5261...  Training loss: 1.8582...  0.2831 sec/batch
Epoch: 2/20...  Training Step: 5262...  Training loss: 1.9099...  0.3023 sec/batch
Epoch: 2/20...  Training Step: 5263...  Training loss: 1.7797...  0.2506 sec/batch
Epoch: 2/20...  Training Step: 5264...  Training loss: 1.8350...  0.2835 sec/batch
Epoch: 2/20...  Training Step: 5265...  Training loss: 1.8806...  0.2321 sec/batch
Epoch: 2/20...  Training Step: 5266...  Training loss: 1.8467...  0.2466 sec/batch
Epoch: 2/20...  Training Step: 5267...  Training loss: 1.7653...  0.2345 sec/batch
Epoch: 2/20...  Training Step: 5268...  Training loss: 1.5719...  0.2183 sec/batch
Epoch: 2/20...  Training Step: 5269...  Training loss: 1.7074...  0.2138 sec/batch
Epoch: 2/20...  Training Step: 5270...  Training loss: 1.7426...  0.1940 sec/batch
Epoch: 2/20...  Training Step: 5271...  Training loss: 1.8071...  0.1948 sec/batch

Epoch: 2/20...  Training Step: 5359...  Training loss: 1.7799...  0.2675 sec/batch
Epoch: 2/20...  Training Step: 5360...  Training loss: 1.7647...  0.2822 sec/batch
Epoch: 2/20...  Training Step: 5361...  Training loss: 1.8072...  0.2040 sec/batch
Epoch: 2/20...  Training Step: 5362...  Training loss: 1.7527...  0.1817 sec/batch
Epoch: 2/20...  Training Step: 5363...  Training loss: 1.8931...  0.3054 sec/batch
Epoch: 2/20...  Training Step: 5364...  Training loss: 1.7762...  0.3021 sec/batch
Epoch: 2/20...  Training Step: 5365...  Training loss: 1.7884...  0.2673 sec/batch
Epoch: 2/20...  Training Step: 5366...  Training loss: 1.6785...  0.2911 sec/batch
Epoch: 2/20...  Training Step: 5367...  Training loss: 1.7932...  0.2712 sec/batch
Epoch: 2/20...  Training Step: 5368...  Training loss: 1.7348...  0.2341 sec/batch
Epoch: 2/20...  Training Step: 5369...  Training loss: 1.8517...  0.2883 sec/batch
Epoch: 2/20...  Training Step: 5370...  Training loss: 1.9339...  0.2905 sec/batch

Epoch: 2/20...  Training Step: 5458...  Training loss: 1.8606...  0.2113 sec/batch
Epoch: 2/20...  Training Step: 5459...  Training loss: 1.8631...  0.2796 sec/batch
Epoch: 2/20...  Training Step: 5460...  Training loss: 1.7962...  0.2218 sec/batch
Epoch: 2/20...  Training Step: 5461...  Training loss: 1.8849...  0.1928 sec/batch
Epoch: 2/20...  Training Step: 5462...  Training loss: 1.7024...  0.2901 sec/batch
Epoch: 2/20...  Training Step: 5463...  Training loss: 1.9646...  0.3157 sec/batch
Epoch: 2/20...  Training Step: 5464...  Training loss: 1.6755...  0.2313 sec/batch
Epoch: 2/20...  Training Step: 5465...  Training loss: 1.7215...  0.2574 sec/batch
Epoch: 2/20...  Training Step: 5466...  Training loss: 1.7174...  0.2154 sec/batch
Epoch: 2/20...  Training Step: 5467...  Training loss: 1.7412...  0.2943 sec/batch
Epoch: 2/20...  Training Step: 5468...  Training loss: 1.7815...  0.3018 sec/batch
Epoch: 2/20...  Training Step: 5469...  Training loss: 1.8994...  0.2957 sec/batch

Epoch: 2/20...  Training Step: 5557...  Training loss: 1.7683...  0.2249 sec/batch
Epoch: 2/20...  Training Step: 5558...  Training loss: 1.7597...  0.3186 sec/batch
Epoch: 2/20...  Training Step: 5559...  Training loss: 1.8829...  0.2732 sec/batch
Epoch: 2/20...  Training Step: 5560...  Training loss: 1.7856...  0.2164 sec/batch
Epoch: 2/20...  Training Step: 5561...  Training loss: 1.7696...  0.2345 sec/batch
Epoch: 2/20...  Training Step: 5562...  Training loss: 1.8957...  0.2507 sec/batch
Epoch: 2/20...  Training Step: 5563...  Training loss: 1.7904...  0.2356 sec/batch
Epoch: 2/20...  Training Step: 5564...  Training loss: 1.8718...  0.3007 sec/batch
Epoch: 2/20...  Training Step: 5565...  Training loss: 1.8684...  0.1883 sec/batch
Epoch: 2/20...  Training Step: 5566...  Training loss: 1.7804...  0.2366 sec/batch
Epoch: 2/20...  Training Step: 5567...  Training loss: 1.6540...  0.2352 sec/batch
Epoch: 2/20...  Training Step: 5568...  Training loss: 1.7994...  0.2942 sec/batch

Epoch: 2/20...  Training Step: 5656...  Training loss: 1.7970...  0.2873 sec/batch
Epoch: 2/20...  Training Step: 5657...  Training loss: 1.8200...  0.3230 sec/batch
Epoch: 2/20...  Training Step: 5658...  Training loss: 1.8840...  0.2552 sec/batch
Epoch: 2/20...  Training Step: 5659...  Training loss: 1.8530...  0.2439 sec/batch
Epoch: 2/20...  Training Step: 5660...  Training loss: 1.7734...  0.2872 sec/batch
Epoch: 2/20...  Training Step: 5661...  Training loss: 1.8115...  0.2744 sec/batch
Epoch: 2/20...  Training Step: 5662...  Training loss: 1.7702...  0.2199 sec/batch
Epoch: 2/20...  Training Step: 5663...  Training loss: 1.6238...  0.2755 sec/batch
Epoch: 2/20...  Training Step: 5664...  Training loss: 1.8069...  0.2744 sec/batch
Epoch: 2/20...  Training Step: 5665...  Training loss: 1.9030...  0.2184 sec/batch
Epoch: 2/20...  Training Step: 5666...  Training loss: 1.7606...  0.2071 sec/batch
Epoch: 2/20...  Training Step: 5667...  Training loss: 1.6832...  0.2045 sec/batch

Epoch: 2/20...  Training Step: 5755...  Training loss: 1.7421...  0.2034 sec/batch
Epoch: 2/20...  Training Step: 5756...  Training loss: 1.7431...  0.2152 sec/batch
Epoch: 2/20...  Training Step: 5757...  Training loss: 1.8320...  0.2363 sec/batch
Epoch: 2/20...  Training Step: 5758...  Training loss: 1.7443...  0.2154 sec/batch
Epoch: 2/20...  Training Step: 5759...  Training loss: 1.8109...  0.3070 sec/batch
Epoch: 2/20...  Training Step: 5760...  Training loss: 1.7320...  0.2693 sec/batch
Epoch: 2/20...  Training Step: 5761...  Training loss: 1.7801...  0.2532 sec/batch
Epoch: 2/20...  Training Step: 5762...  Training loss: 1.9966...  0.2992 sec/batch
Epoch: 2/20...  Training Step: 5763...  Training loss: 1.6959...  0.2143 sec/batch
Epoch: 2/20...  Training Step: 5764...  Training loss: 1.5583...  0.2406 sec/batch
Epoch: 2/20...  Training Step: 5765...  Training loss: 1.6078...  0.2700 sec/batch
Epoch: 2/20...  Training Step: 5766...  Training loss: 1.8169...  0.2112 sec/batch

Epoch: 2/20...  Training Step: 5854...  Training loss: 1.8799...  0.2929 sec/batch
Epoch: 2/20...  Training Step: 5855...  Training loss: 1.6758...  0.2624 sec/batch
Epoch: 2/20...  Training Step: 5856...  Training loss: 1.8072...  0.1958 sec/batch
Epoch: 2/20...  Training Step: 5857...  Training loss: 1.8870...  0.2445 sec/batch
Epoch: 2/20...  Training Step: 5858...  Training loss: 1.8685...  0.2337 sec/batch
Epoch: 2/20...  Training Step: 5859...  Training loss: 1.7242...  0.2497 sec/batch
Epoch: 2/20...  Training Step: 5860...  Training loss: 1.7799...  0.2295 sec/batch
Epoch: 2/20...  Training Step: 5861...  Training loss: 1.8197...  0.2135 sec/batch
Epoch: 2/20...  Training Step: 5862...  Training loss: 1.8935...  0.3296 sec/batch
Epoch: 2/20...  Training Step: 5863...  Training loss: 1.7619...  0.2095 sec/batch
Epoch: 2/20...  Training Step: 5864...  Training loss: 1.8111...  0.2921 sec/batch
Epoch: 2/20...  Training Step: 5865...  Training loss: 1.7383...  0.2693 sec/batch

Epoch: 2/20...  Training Step: 5953...  Training loss: 1.7158...  0.2287 sec/batch
Epoch: 2/20...  Training Step: 5954...  Training loss: 2.0083...  0.3199 sec/batch
Epoch: 2/20...  Training Step: 5955...  Training loss: 2.0131...  0.1838 sec/batch
Epoch: 2/20...  Training Step: 5956...  Training loss: 1.7732...  0.1970 sec/batch
Epoch: 2/20...  Training Step: 5957...  Training loss: 1.7018...  0.2671 sec/batch
Epoch: 2/20...  Training Step: 5958...  Training loss: 1.8050...  0.2619 sec/batch
Epoch: 2/20...  Training Step: 5959...  Training loss: 1.6793...  0.2388 sec/batch
Epoch: 2/20...  Training Step: 5960...  Training loss: 1.7535...  0.3094 sec/batch
Epoch: 2/20...  Training Step: 5961...  Training loss: 1.6427...  0.2578 sec/batch
Epoch: 2/20...  Training Step: 5962...  Training loss: 1.7903...  0.1827 sec/batch
Epoch: 2/20...  Training Step: 5963...  Training loss: 2.0361...  0.2112 sec/batch
Epoch: 2/20...  Training Step: 5964...  Training loss: 1.9085...  0.1492 sec/batch

Epoch: 2/20...  Training Step: 6052...  Training loss: 1.7341...  0.3469 sec/batch
Epoch: 2/20...  Training Step: 6053...  Training loss: 1.7992...  0.3036 sec/batch
Epoch: 2/20...  Training Step: 6054...  Training loss: 1.8008...  0.2084 sec/batch
Epoch: 2/20...  Training Step: 6055...  Training loss: 1.7179...  0.3412 sec/batch
Epoch: 2/20...  Training Step: 6056...  Training loss: 1.7087...  0.3160 sec/batch
Epoch: 2/20...  Training Step: 6057...  Training loss: 1.7530...  0.2587 sec/batch
Epoch: 2/20...  Training Step: 6058...  Training loss: 1.9590...  0.3736 sec/batch
Epoch: 2/20...  Training Step: 6059...  Training loss: 1.7528...  0.2336 sec/batch
Epoch: 2/20...  Training Step: 6060...  Training loss: 1.7359...  0.2256 sec/batch
Epoch: 2/20...  Training Step: 6061...  Training loss: 1.8547...  0.3624 sec/batch
Epoch: 2/20...  Training Step: 6062...  Training loss: 1.7374...  0.2789 sec/batch
Epoch: 2/20...  Training Step: 6063...  Training loss: 1.7952...  0.2751 sec/batch

Epoch: 2/20...  Training Step: 6151...  Training loss: 1.7887...  0.2873 sec/batch
Epoch: 2/20...  Training Step: 6152...  Training loss: 2.1184...  0.3090 sec/batch
Epoch: 2/20...  Training Step: 6153...  Training loss: 1.6568...  0.2436 sec/batch
Epoch: 2/20...  Training Step: 6154...  Training loss: 1.8704...  0.1946 sec/batch
Epoch: 2/20...  Training Step: 6155...  Training loss: 1.8508...  0.2050 sec/batch
Epoch: 2/20...  Training Step: 6156...  Training loss: 1.7228...  0.2387 sec/batch
Epoch: 2/20...  Training Step: 6157...  Training loss: 1.8934...  0.3101 sec/batch
Epoch: 2/20...  Training Step: 6158...  Training loss: 1.6221...  0.3635 sec/batch
Epoch: 2/20...  Training Step: 6159...  Training loss: 1.8129...  0.3348 sec/batch
Epoch: 2/20...  Training Step: 6160...  Training loss: 1.7582...  0.2757 sec/batch
Epoch: 2/20...  Training Step: 6161...  Training loss: 1.8426...  0.3028 sec/batch
Epoch: 2/20...  Training Step: 6162...  Training loss: 1.7693...  0.2263 sec/batch

Epoch: 2/20...  Training Step: 6250...  Training loss: 1.7281...  0.2688 sec/batch
Epoch: 2/20...  Training Step: 6251...  Training loss: 1.7628...  0.2621 sec/batch
Epoch: 2/20...  Training Step: 6252...  Training loss: 1.7491...  0.3073 sec/batch
Epoch: 2/20...  Training Step: 6253...  Training loss: 1.7786...  0.2879 sec/batch
Epoch: 2/20...  Training Step: 6254...  Training loss: 1.6355...  0.2179 sec/batch
Epoch: 2/20...  Training Step: 6255...  Training loss: 1.6455...  0.2678 sec/batch
Epoch: 2/20...  Training Step: 6256...  Training loss: 1.6814...  0.2609 sec/batch
Epoch: 2/20...  Training Step: 6257...  Training loss: 1.6683...  0.2719 sec/batch
Epoch: 2/20...  Training Step: 6258...  Training loss: 1.6914...  0.2590 sec/batch
Epoch: 2/20...  Training Step: 6259...  Training loss: 1.7058...  0.2085 sec/batch
Epoch: 2/20...  Training Step: 6260...  Training loss: 1.7691...  0.1736 sec/batch
Epoch: 2/20...  Training Step: 6261...  Training loss: 1.6549...  0.1877 sec/batch

Epoch: 2/20...  Training Step: 6350...  Training loss: 1.9183...  0.2304 sec/batch
Epoch: 2/20...  Training Step: 6351...  Training loss: 1.7366...  0.2728 sec/batch
Epoch: 2/20...  Training Step: 6352...  Training loss: 1.8488...  0.2814 sec/batch
Epoch: 2/20...  Training Step: 6353...  Training loss: 1.8908...  0.2583 sec/batch
Epoch: 2/20...  Training Step: 6354...  Training loss: 1.7758...  0.2729 sec/batch
Epoch: 2/20...  Training Step: 6355...  Training loss: 1.8109...  0.2806 sec/batch
Epoch: 2/20...  Training Step: 6356...  Training loss: 1.6676...  0.3009 sec/batch
Epoch: 2/20...  Training Step: 6357...  Training loss: 1.6394...  0.3245 sec/batch
Epoch: 2/20...  Training Step: 6358...  Training loss: 1.8985...  0.2201 sec/batch
Epoch: 2/20...  Training Step: 6359...  Training loss: 1.7982...  0.2309 sec/batch
Epoch: 2/20...  Training Step: 6360...  Training loss: 1.8189...  0.2268 sec/batch
Epoch: 2/20...  Training Step: 6361...  Training loss: 1.8611...  0.2780 sec/batch

Epoch: 2/20...  Training Step: 6449...  Training loss: 1.8252...  0.2609 sec/batch
Epoch: 2/20...  Training Step: 6450...  Training loss: 1.8008...  0.2788 sec/batch
Epoch: 2/20...  Training Step: 6451...  Training loss: 1.6660...  0.3424 sec/batch
Epoch: 2/20...  Training Step: 6452...  Training loss: 1.7612...  0.2630 sec/batch
Epoch: 2/20...  Training Step: 6453...  Training loss: 1.9641...  0.2544 sec/batch
Epoch: 2/20...  Training Step: 6454...  Training loss: 1.7451...  0.2532 sec/batch
Epoch: 2/20...  Training Step: 6455...  Training loss: 1.7768...  0.1926 sec/batch
Epoch: 2/20...  Training Step: 6456...  Training loss: 1.6599...  0.2709 sec/batch
Epoch: 2/20...  Training Step: 6457...  Training loss: 1.7186...  0.3241 sec/batch
Epoch: 2/20...  Training Step: 6458...  Training loss: 1.6733...  0.2089 sec/batch
Epoch: 2/20...  Training Step: 6459...  Training loss: 1.9090...  0.2211 sec/batch
Epoch: 2/20...  Training Step: 6460...  Training loss: 1.8020...  0.3291 sec/batch

Epoch: 2/20...  Training Step: 6548...  Training loss: 1.7941...  0.3052 sec/batch
Epoch: 2/20...  Training Step: 6549...  Training loss: 1.7668...  0.2574 sec/batch
Epoch: 2/20...  Training Step: 6550...  Training loss: 1.8188...  0.2749 sec/batch
Epoch: 2/20...  Training Step: 6551...  Training loss: 1.7589...  0.2806 sec/batch
Epoch: 2/20...  Training Step: 6552...  Training loss: 1.9078...  0.3346 sec/batch
Epoch: 2/20...  Training Step: 6553...  Training loss: 1.7026...  0.2224 sec/batch
Epoch: 2/20...  Training Step: 6554...  Training loss: 1.7351...  0.2290 sec/batch
Epoch: 2/20...  Training Step: 6555...  Training loss: 1.7966...  0.2518 sec/batch
Epoch: 2/20...  Training Step: 6556...  Training loss: 1.9294...  0.3284 sec/batch
Epoch: 2/20...  Training Step: 6557...  Training loss: 1.7816...  0.2975 sec/batch
Epoch: 2/20...  Training Step: 6558...  Training loss: 1.9583...  0.3579 sec/batch
Epoch: 2/20...  Training Step: 6559...  Training loss: 1.7843...  0.3421 sec/batch

Epoch: 2/20...  Training Step: 6647...  Training loss: 1.6760...  0.2767 sec/batch
Epoch: 2/20...  Training Step: 6648...  Training loss: 1.8027...  0.1645 sec/batch
Epoch: 2/20...  Training Step: 6649...  Training loss: 1.8141...  0.2267 sec/batch
Epoch: 2/20...  Training Step: 6650...  Training loss: 1.6646...  0.2357 sec/batch
Epoch: 2/20...  Training Step: 6651...  Training loss: 1.8187...  0.2249 sec/batch
Epoch: 2/20...  Training Step: 6652...  Training loss: 1.7038...  0.1990 sec/batch
Epoch: 2/20...  Training Step: 6653...  Training loss: 1.8026...  0.2460 sec/batch
Epoch: 2/20...  Training Step: 6654...  Training loss: 1.7332...  0.2644 sec/batch
Epoch: 2/20...  Training Step: 6655...  Training loss: 1.7499...  0.2124 sec/batch
Epoch: 2/20...  Training Step: 6656...  Training loss: 1.6828...  0.2240 sec/batch
Epoch: 2/20...  Training Step: 6657...  Training loss: 1.7373...  0.3126 sec/batch
Epoch: 2/20...  Training Step: 6658...  Training loss: 1.7338...  0.2649 sec/batch

Epoch: 2/20...  Training Step: 6746...  Training loss: 1.7667...  0.2150 sec/batch
Epoch: 2/20...  Training Step: 6747...  Training loss: 1.7830...  0.2811 sec/batch
Epoch: 2/20...  Training Step: 6748...  Training loss: 1.6415...  0.2919 sec/batch
Epoch: 2/20...  Training Step: 6749...  Training loss: 1.6482...  0.2379 sec/batch
Epoch: 2/20...  Training Step: 6750...  Training loss: 1.7113...  0.3240 sec/batch
Epoch: 2/20...  Training Step: 6751...  Training loss: 1.6728...  0.2043 sec/batch
Epoch: 2/20...  Training Step: 6752...  Training loss: 1.9403...  0.2555 sec/batch
Epoch: 2/20...  Training Step: 6753...  Training loss: 1.6756...  0.2401 sec/batch
Epoch: 2/20...  Training Step: 6754...  Training loss: 1.6405...  0.2084 sec/batch
Epoch: 2/20...  Training Step: 6755...  Training loss: 1.6602...  0.2417 sec/batch
Epoch: 2/20...  Training Step: 6756...  Training loss: 1.8536...  0.2579 sec/batch
Epoch: 2/20...  Training Step: 6757...  Training loss: 1.6173...  0.2424 sec/batch

Epoch: 2/20...  Training Step: 6845...  Training loss: 1.7840...  0.2536 sec/batch
Epoch: 2/20...  Training Step: 6846...  Training loss: 1.9815...  0.2789 sec/batch
Epoch: 2/20...  Training Step: 6847...  Training loss: 1.7709...  0.2005 sec/batch
Epoch: 2/20...  Training Step: 6848...  Training loss: 1.6985...  0.2712 sec/batch
Epoch: 2/20...  Training Step: 6849...  Training loss: 1.9354...  0.2600 sec/batch
Epoch: 2/20...  Training Step: 6850...  Training loss: 1.7414...  0.2559 sec/batch
Epoch: 2/20...  Training Step: 6851...  Training loss: 1.9080...  0.2540 sec/batch
Epoch: 2/20...  Training Step: 6852...  Training loss: 1.8826...  0.2553 sec/batch
Epoch: 2/20...  Training Step: 6853...  Training loss: 1.8381...  0.2990 sec/batch
Epoch: 2/20...  Training Step: 6854...  Training loss: 1.7151...  0.2616 sec/batch
Epoch: 2/20...  Training Step: 6855...  Training loss: 1.7500...  0.2904 sec/batch
Epoch: 2/20...  Training Step: 6856...  Training loss: 1.5857...  0.2317 sec/batch
[...]

Epoch: 2/20...  Training Step: 6944...  Training loss: 1.9167...  0.2163 sec/batch
Epoch: 2/20...  Training Step: 6945...  Training loss: 1.8031...  0.2033 sec/batch
Epoch: 2/20...  Training Step: 6946...  Training loss: 1.7411...  0.3189 sec/batch
Epoch: 2/20...  Training Step: 6947...  Training loss: 1.6804...  0.1777 sec/batch
Epoch: 2/20...  Training Step: 6948...  Training loss: 1.8565...  0.3020 sec/batch
Epoch: 2/20...  Training Step: 6949...  Training loss: 1.8270...  0.2425 sec/batch
Epoch: 2/20...  Training Step: 6950...  Training loss: 1.9879...  0.2157 sec/batch
Epoch: 2/20...  Training Step: 6951...  Training loss: 1.9266...  0.2104 sec/batch
Epoch: 2/20...  Training Step: 6952...  Training loss: 1.8747...  0.2180 sec/batch
Epoch: 2/20...  Training Step: 6953...  Training loss: 1.6521...  0.3243 sec/batch
Epoch: 2/20...  Training Step: 6954...  Training loss: 1.6314...  0.3296 sec/batch
Epoch: 2/20...  Training Step: 6955...  Training loss: 1.7444...  0.2806 sec/batch
[...]

Epoch: 2/20...  Training Step: 7043...  Training loss: 1.8539...  0.2765 sec/batch
Epoch: 2/20...  Training Step: 7044...  Training loss: 1.5496...  0.2948 sec/batch
Epoch: 2/20...  Training Step: 7045...  Training loss: 1.8344...  0.2683 sec/batch
Epoch: 2/20...  Training Step: 7046...  Training loss: 1.7858...  0.3104 sec/batch
Epoch: 2/20...  Training Step: 7047...  Training loss: 1.7628...  0.1883 sec/batch
Epoch: 2/20...  Training Step: 7048...  Training loss: 1.7277...  0.2700 sec/batch
Epoch: 2/20...  Training Step: 7049...  Training loss: 1.7621...  0.2621 sec/batch
Epoch: 2/20...  Training Step: 7050...  Training loss: 1.7578...  0.2868 sec/batch
Epoch: 2/20...  Training Step: 7051...  Training loss: 1.8196...  0.2354 sec/batch
Epoch: 2/20...  Training Step: 7052...  Training loss: 1.6856...  0.1716 sec/batch
Epoch: 2/20...  Training Step: 7053...  Training loss: 1.8385...  0.3102 sec/batch
Epoch: 2/20...  Training Step: 7054...  Training loss: 1.6921...  0.2759 sec/batch
[...]

Epoch: 2/20...  Training Step: 7142...  Training loss: 1.7909...  0.2229 sec/batch
Epoch: 2/20...  Training Step: 7143...  Training loss: 1.8246...  0.2469 sec/batch
Epoch: 2/20...  Training Step: 7144...  Training loss: 1.6566...  0.2779 sec/batch
Epoch: 2/20...  Training Step: 7145...  Training loss: 1.7496...  0.2864 sec/batch
Epoch: 2/20...  Training Step: 7146...  Training loss: 1.8390...  0.2027 sec/batch
Epoch: 2/20...  Training Step: 7147...  Training loss: 1.9063...  0.2545 sec/batch
Epoch: 2/20...  Training Step: 7148...  Training loss: 1.6880...  0.3048 sec/batch
Epoch: 2/20...  Training Step: 7149...  Training loss: 1.8292...  0.3095 sec/batch
Epoch: 2/20...  Training Step: 7150...  Training loss: 1.6622...  0.3155 sec/batch
Epoch: 2/20...  Training Step: 7151...  Training loss: 1.7761...  0.3544 sec/batch
Epoch: 2/20...  Training Step: 7152...  Training loss: 1.6828...  0.2984 sec/batch
Epoch: 2/20...  Training Step: 7153...  Training loss: 1.7678...  0.2056 sec/batch
[...]

Epoch: 2/20...  Training Step: 7241...  Training loss: 1.7171...  0.2680 sec/batch
Epoch: 2/20...  Training Step: 7242...  Training loss: 1.7418...  0.3068 sec/batch
Epoch: 2/20...  Training Step: 7243...  Training loss: 1.8049...  0.2428 sec/batch
Epoch: 2/20...  Training Step: 7244...  Training loss: 1.6935...  0.3021 sec/batch
Epoch: 2/20...  Training Step: 7245...  Training loss: 1.6902...  0.3134 sec/batch
Epoch: 2/20...  Training Step: 7246...  Training loss: 1.8206...  0.2959 sec/batch
Epoch: 2/20...  Training Step: 7247...  Training loss: 1.7486...  0.3023 sec/batch
Epoch: 2/20...  Training Step: 7248...  Training loss: 1.7488...  0.2428 sec/batch
Epoch: 2/20...  Training Step: 7249...  Training loss: 1.7317...  0.2328 sec/batch
Epoch: 2/20...  Training Step: 7250...  Training loss: 1.6406...  0.3199 sec/batch
Epoch: 2/20...  Training Step: 7251...  Training loss: 1.7435...  0.3175 sec/batch
Epoch: 2/20...  Training Step: 7252...  Training loss: 1.6188...  0.3074 sec/batch
[...]

Epoch: 2/20...  Training Step: 7340...  Training loss: 1.7159...  0.3102 sec/batch
Epoch: 2/20...  Training Step: 7341...  Training loss: 1.8220...  0.2210 sec/batch
Epoch: 2/20...  Training Step: 7342...  Training loss: 1.8369...  0.2687 sec/batch
Epoch: 2/20...  Training Step: 7343...  Training loss: 1.7743...  0.2943 sec/batch
Epoch: 2/20...  Training Step: 7344...  Training loss: 1.7517...  0.2615 sec/batch
Epoch: 2/20...  Training Step: 7345...  Training loss: 1.8492...  0.2877 sec/batch
Epoch: 2/20...  Training Step: 7346...  Training loss: 1.7666...  0.3142 sec/batch
Epoch: 2/20...  Training Step: 7347...  Training loss: 1.9128...  0.3301 sec/batch
Epoch: 2/20...  Training Step: 7348...  Training loss: 1.6282...  0.3104 sec/batch
Epoch: 2/20...  Training Step: 7349...  Training loss: 1.6623...  0.2963 sec/batch
Epoch: 2/20...  Training Step: 7350...  Training loss: 1.6651...  0.2819 sec/batch
Epoch: 2/20...  Training Step: 7351...  Training loss: 1.6593...  0.2052 sec/batch
[...]

Epoch: 2/20...  Training Step: 7439...  Training loss: 1.8752...  0.3201 sec/batch
Epoch: 2/20...  Training Step: 7440...  Training loss: 1.7926...  0.2742 sec/batch
Epoch: 2/20...  Training Step: 7441...  Training loss: 1.7579...  0.2335 sec/batch
Epoch: 2/20...  Training Step: 7442...  Training loss: 1.7236...  0.2561 sec/batch
Epoch: 2/20...  Training Step: 7443...  Training loss: 1.8315...  0.3120 sec/batch
Epoch: 2/20...  Training Step: 7444...  Training loss: 1.7491...  0.2657 sec/batch
Epoch: 2/20...  Training Step: 7445...  Training loss: 1.7725...  0.1647 sec/batch
Epoch: 2/20...  Training Step: 7446...  Training loss: 1.6483...  0.2481 sec/batch
Epoch: 2/20...  Training Step: 7447...  Training loss: 1.8723...  0.2946 sec/batch
Epoch: 2/20...  Training Step: 7448...  Training loss: 1.8474...  0.2110 sec/batch
Epoch: 2/20...  Training Step: 7449...  Training loss: 1.7155...  0.2325 sec/batch
Epoch: 2/20...  Training Step: 7450...  Training loss: 1.5664...  0.2665 sec/batch
[...]

Epoch: 2/20...  Training Step: 7538...  Training loss: 1.6782...  0.2722 sec/batch
Epoch: 2/20...  Training Step: 7539...  Training loss: 1.7839...  0.2225 sec/batch
Epoch: 2/20...  Training Step: 7540...  Training loss: 1.5317...  0.2695 sec/batch
Epoch: 2/20...  Training Step: 7541...  Training loss: 1.7652...  0.3384 sec/batch
Epoch: 2/20...  Training Step: 7542...  Training loss: 1.8660...  361.2972 sec/batch
Epoch: 2/20...  Training Step: 7543...  Training loss: 1.7501...  0.2280 sec/batch
Epoch: 2/20...  Training Step: 7544...  Training loss: 1.6776...  0.3430 sec/batch
Epoch: 2/20...  Training Step: 7545...  Training loss: 1.5902...  0.5636 sec/batch
Epoch: 2/20...  Training Step: 7546...  Training loss: 1.6648...  0.2458 sec/batch
Epoch: 2/20...  Training Step: 7547...  Training loss: 1.7597...  0.3334 sec/batch
Epoch: 2/20...  Training Step: 7548...  Training loss: 1.6776...  0.3177 sec/batch
Epoch: 2/20...  Training Step: 7549...  Training loss: 1.7039...  0.3504 sec/batch
[...]

Epoch: 2/20...  Training Step: 7637...  Training loss: 1.6701...  0.2768 sec/batch
Epoch: 2/20...  Training Step: 7638...  Training loss: 1.9079...  0.2171 sec/batch
Epoch: 2/20...  Training Step: 7639...  Training loss: 1.8428...  0.2006 sec/batch
Epoch: 2/20...  Training Step: 7640...  Training loss: 1.7732...  0.2876 sec/batch
Epoch: 2/20...  Training Step: 7641...  Training loss: 1.7362...  0.2230 sec/batch
Epoch: 2/20...  Training Step: 7642...  Training loss: 1.8309...  0.2556 sec/batch
Epoch: 2/20...  Training Step: 7643...  Training loss: 1.7826...  0.2697 sec/batch
Epoch: 2/20...  Training Step: 7644...  Training loss: 1.8466...  0.2669 sec/batch
Epoch: 2/20...  Training Step: 7645...  Training loss: 1.7316...  0.2691 sec/batch
Epoch: 2/20...  Training Step: 7646...  Training loss: 1.9898...  0.2400 sec/batch
Epoch: 2/20...  Training Step: 7647...  Training loss: 1.8961...  0.2660 sec/batch
Epoch: 2/20...  Training Step: 7648...  Training loss: 1.8029...  0.2253 sec/batch
[...]

Epoch: 2/20...  Training Step: 7737...  Training loss: 1.7613...  0.2525 sec/batch
Epoch: 2/20...  Training Step: 7738...  Training loss: 1.8860...  0.2404 sec/batch
Epoch: 2/20...  Training Step: 7739...  Training loss: 1.9186...  0.2588 sec/batch
Epoch: 2/20...  Training Step: 7740...  Training loss: 1.7850...  0.2714 sec/batch
Epoch: 2/20...  Training Step: 7741...  Training loss: 1.8225...  0.2361 sec/batch
Epoch: 2/20...  Training Step: 7742...  Training loss: 1.7446...  0.3026 sec/batch
Epoch: 2/20...  Training Step: 7743...  Training loss: 1.8585...  0.2814 sec/batch
Epoch: 2/20...  Training Step: 7744...  Training loss: 1.7976...  0.2277 sec/batch
Epoch: 2/20...  Training Step: 7745...  Training loss: 1.8662...  0.3153 sec/batch
Epoch: 2/20...  Training Step: 7746...  Training loss: 1.7886...  0.2987 sec/batch
Epoch: 2/20...  Training Step: 7747...  Training loss: 1.8029...  0.2850 sec/batch
Epoch: 2/20...  Training Step: 7748...  Training loss: 1.8184...  0.2701 sec/batch
[...]

Epoch: 2/20...  Training Step: 7836...  Training loss: 1.8059...  0.2451 sec/batch
Epoch: 2/20...  Training Step: 7837...  Training loss: 1.8592...  0.3030 sec/batch
Epoch: 2/20...  Training Step: 7838...  Training loss: 1.7396...  0.2534 sec/batch
Epoch: 2/20...  Training Step: 7839...  Training loss: 1.8316...  0.3150 sec/batch
Epoch: 2/20...  Training Step: 7840...  Training loss: 1.9721...  0.2553 sec/batch
Epoch: 2/20...  Training Step: 7841...  Training loss: 1.8009...  0.3015 sec/batch
Epoch: 2/20...  Training Step: 7842...  Training loss: 1.8814...  0.2813 sec/batch
Epoch: 2/20...  Training Step: 7843...  Training loss: 1.8324...  0.1999 sec/batch
Epoch: 2/20...  Training Step: 7844...  Training loss: 1.7877...  0.3146 sec/batch
Epoch: 2/20...  Training Step: 7845...  Training loss: 1.7227...  0.1993 sec/batch
Epoch: 2/20...  Training Step: 7846...  Training loss: 1.8270...  0.2386 sec/batch
Epoch: 2/20...  Training Step: 7847...  Training loss: 1.8826...  0.2484 sec/batch
[...]

Epoch: 2/20...  Training Step: 7935...  Training loss: 1.8136...  0.3039 sec/batch
Epoch: 2/20...  Training Step: 7936...  Training loss: 1.8173...  0.3120 sec/batch
Epoch: 2/20...  Training Step: 7937...  Training loss: 1.7595...  0.2890 sec/batch
Epoch: 2/20...  Training Step: 7938...  Training loss: 1.8347...  0.2302 sec/batch
Epoch: 2/20...  Training Step: 7939...  Training loss: 1.9207...  0.2579 sec/batch
Epoch: 2/20...  Training Step: 7940...  Training loss: 2.0188...  0.2250 sec/batch
Epoch: 3/20...  Training Step: 7941...  Training loss: 1.7537...  0.2426 sec/batch
Epoch: 3/20...  Training Step: 7942...  Training loss: 1.7635...  0.2681 sec/batch
Epoch: 3/20...  Training Step: 7943...  Training loss: 1.7041...  0.2703 sec/batch
Epoch: 3/20...  Training Step: 7944...  Training loss: 1.6588...  0.2785 sec/batch
Epoch: 3/20...  Training Step: 7945...  Training loss: 1.8302...  0.2470 sec/batch
Epoch: 3/20...  Training Step: 7946...  Training loss: 1.7346...  0.2784 sec/batch
[...]

Epoch: 3/20...  Training Step: 8034...  Training loss: 1.7988...  0.2687 sec/batch
Epoch: 3/20...  Training Step: 8035...  Training loss: 1.6867...  0.3232 sec/batch
Epoch: 3/20...  Training Step: 8036...  Training loss: 1.7180...  0.3059 sec/batch
Epoch: 3/20...  Training Step: 8037...  Training loss: 1.8013...  0.2830 sec/batch
Epoch: 3/20...  Training Step: 8038...  Training loss: 1.7751...  0.2833 sec/batch
Epoch: 3/20...  Training Step: 8039...  Training loss: 1.8012...  0.2784 sec/batch
Epoch: 3/20...  Training Step: 8040...  Training loss: 1.7551...  0.2598 sec/batch
Epoch: 3/20...  Training Step: 8041...  Training loss: 1.8839...  0.2597 sec/batch
Epoch: 3/20...  Training Step: 8042...  Training loss: 1.7295...  0.3131 sec/batch
Epoch: 3/20...  Training Step: 8043...  Training loss: 1.7784...  0.2840 sec/batch
Epoch: 3/20...  Training Step: 8044...  Training loss: 1.6759...  0.2869 sec/batch
Epoch: 3/20...  Training Step: 8045...  Training loss: 1.7978...  0.3265 sec/batch
[...]

Epoch: 3/20...  Training Step: 8133...  Training loss: 1.6961...  0.2957 sec/batch
Epoch: 3/20...  Training Step: 8134...  Training loss: 1.8826...  0.2808 sec/batch
Epoch: 3/20...  Training Step: 8135...  Training loss: 1.7577...  0.3340 sec/batch
Epoch: 3/20...  Training Step: 8136...  Training loss: 1.6541...  0.3379 sec/batch
Epoch: 3/20...  Training Step: 8137...  Training loss: 1.6061...  0.2632 sec/batch
Epoch: 3/20...  Training Step: 8138...  Training loss: 1.8586...  0.3131 sec/batch
Epoch: 3/20...  Training Step: 8139...  Training loss: 1.7404...  0.3117 sec/batch
Epoch: 3/20...  Training Step: 8140...  Training loss: 1.8886...  0.2697 sec/batch
Epoch: 3/20...  Training Step: 8141...  Training loss: 1.6300...  0.3134 sec/batch
Epoch: 3/20...  Training Step: 8142...  Training loss: 1.8348...  0.2979 sec/batch
Epoch: 3/20...  Training Step: 8143...  Training loss: 1.7433...  0.2662 sec/batch
Epoch: 3/20...  Training Step: 8144...  Training loss: 1.8168...  0.2097 sec/batch
[...]

Epoch: 3/20...  Training Step: 8232...  Training loss: 1.6793...  0.5442 sec/batch
Epoch: 3/20...  Training Step: 8233...  Training loss: 1.7061...  0.2548 sec/batch
Epoch: 3/20...  Training Step: 8234...  Training loss: 1.6205...  0.3433 sec/batch
Epoch: 3/20...  Training Step: 8235...  Training loss: 1.8210...  0.2911 sec/batch
Epoch: 3/20...  Training Step: 8236...  Training loss: 1.7386...  0.2900 sec/batch
Epoch: 3/20...  Training Step: 8237...  Training loss: 1.6405...  0.3204 sec/batch
Epoch: 3/20...  Training Step: 8238...  Training loss: 1.7786...  0.2876 sec/batch
Epoch: 3/20...  Training Step: 8239...  Training loss: 1.6439...  0.2647 sec/batch
Epoch: 3/20...  Training Step: 8240...  Training loss: 1.7466...  0.3008 sec/batch
Epoch: 3/20...  Training Step: 8241...  Training loss: 1.7945...  0.2337 sec/batch
Epoch: 3/20...  Training Step: 8242...  Training loss: 1.7381...  0.2362 sec/batch
Epoch: 3/20...  Training Step: 8243...  Training loss: 1.7196...  0.2950 sec/batch
[...]

Epoch: 3/20...  Training Step: 8331...  Training loss: 1.7220...  0.3057 sec/batch
Epoch: 3/20...  Training Step: 8332...  Training loss: 1.8886...  0.2474 sec/batch
Epoch: 3/20...  Training Step: 8333...  Training loss: 1.7023...  0.2974 sec/batch
Epoch: 3/20...  Training Step: 8334...  Training loss: 1.7373...  0.3150 sec/batch
Epoch: 3/20...  Training Step: 8335...  Training loss: 1.8062...  0.2147 sec/batch
Epoch: 3/20...  Training Step: 8336...  Training loss: 1.7573...  0.2796 sec/batch
Epoch: 3/20...  Training Step: 8337...  Training loss: 1.7533...  0.2720 sec/batch
Epoch: 3/20...  Training Step: 8338...  Training loss: 1.7711...  0.2929 sec/batch
Epoch: 3/20...  Training Step: 8339...  Training loss: 1.7757...  0.3048 sec/batch
Epoch: 3/20...  Training Step: 8340...  Training loss: 1.7571...  0.2134 sec/batch
Epoch: 3/20...  Training Step: 8341...  Training loss: 1.7199...  0.2791 sec/batch
Epoch: 3/20...  Training Step: 8342...  Training loss: 1.6971...  0.3174 sec/batch
[...]

Epoch: 3/20...  Training Step: 8430...  Training loss: 1.7828...  0.2237 sec/batch
Epoch: 3/20...  Training Step: 8431...  Training loss: 1.7406...  0.2601 sec/batch
Epoch: 3/20...  Training Step: 8432...  Training loss: 1.8163...  0.3142 sec/batch
Epoch: 3/20...  Training Step: 8433...  Training loss: 1.9283...  0.2713 sec/batch
Epoch: 3/20...  Training Step: 8434...  Training loss: 1.8983...  0.1908 sec/batch
Epoch: 3/20...  Training Step: 8435...  Training loss: 1.7036...  0.2239 sec/batch
Epoch: 3/20...  Training Step: 8436...  Training loss: 1.6815...  0.2359 sec/batch
Epoch: 3/20...  Training Step: 8437...  Training loss: 1.6031...  0.2439 sec/batch
Epoch: 3/20...  Training Step: 8438...  Training loss: 1.7685...  0.3228 sec/batch
Epoch: 3/20...  Training Step: 8439...  Training loss: 1.6819...  0.2376 sec/batch
Epoch: 3/20...  Training Step: 8440...  Training loss: 1.8900...  0.2247 sec/batch
Epoch: 3/20...  Training Step: 8441...  Training loss: 1.6946...  0.3147 sec/batch
[...]

Epoch: 3/20...  Training Step: 8529...  Training loss: 2.0936...  0.2345 sec/batch
Epoch: 3/20...  Training Step: 8530...  Training loss: 1.8376...  0.2976 sec/batch
Epoch: 3/20...  Training Step: 8531...  Training loss: 1.8818...  0.2992 sec/batch
Epoch: 3/20...  Training Step: 8532...  Training loss: 1.6955...  0.2391 sec/batch
Epoch: 3/20...  Training Step: 8533...  Training loss: 1.7246...  0.2820 sec/batch
Epoch: 3/20...  Training Step: 8534...  Training loss: 1.8885...  0.3176 sec/batch
Epoch: 3/20...  Training Step: 8535...  Training loss: 1.8972...  0.3321 sec/batch
Epoch: 3/20...  Training Step: 8536...  Training loss: 1.8306...  0.2823 sec/batch
Epoch: 3/20...  Training Step: 8537...  Training loss: 1.6234...  0.3000 sec/batch
Epoch: 3/20...  Training Step: 8538...  Training loss: 1.7799...  0.2260 sec/batch
Epoch: 3/20...  Training Step: 8539...  Training loss: 1.8079...  0.2889 sec/batch
Epoch: 3/20...  Training Step: 8540...  Training loss: 1.6492...  0.2597 sec/batch
[...]

Epoch: 3/20...  Training Step: 8628...  Training loss: 1.6075...  0.2262 sec/batch
Epoch: 3/20...  Training Step: 8629...  Training loss: 1.6648...  0.1728 sec/batch
Epoch: 3/20...  Training Step: 8630...  Training loss: 1.6811...  0.2478 sec/batch
Epoch: 3/20...  Training Step: 8631...  Training loss: 1.8175...  0.3074 sec/batch
Epoch: 3/20...  Training Step: 8632...  Training loss: 1.8798...  0.2234 sec/batch
Epoch: 3/20...  Training Step: 8633...  Training loss: 1.8095...  0.2908 sec/batch
Epoch: 3/20...  Training Step: 8634...  Training loss: 1.8122...  0.3077 sec/batch
Epoch: 3/20...  Training Step: 8635...  Training loss: 1.6799...  0.2418 sec/batch
Epoch: 3/20...  Training Step: 8636...  Training loss: 1.8507...  0.2175 sec/batch
Epoch: 3/20...  Training Step: 8637...  Training loss: 1.8138...  0.2618 sec/batch
Epoch: 3/20...  Training Step: 8638...  Training loss: 1.8222...  0.2686 sec/batch
Epoch: 3/20...  Training Step: 8639...  Training loss: 1.8066...  0.1843 sec/batch
[...]

Epoch: 3/20...  Training Step: 8727...  Training loss: 1.7585...  0.2884 sec/batch
Epoch: 3/20...  Training Step: 8728...  Training loss: 1.5438...  0.3352 sec/batch
Epoch: 3/20...  Training Step: 8729...  Training loss: 1.7849...  0.3528 sec/batch
Epoch: 3/20...  Training Step: 8730...  Training loss: 1.7208...  0.3156 sec/batch
Epoch: 3/20...  Training Step: 8731...  Training loss: 1.7697...  0.2747 sec/batch
Epoch: 3/20...  Training Step: 8732...  Training loss: 1.5532...  0.3183 sec/batch
Epoch: 3/20...  Training Step: 8733...  Training loss: 1.7869...  0.2390 sec/batch
Epoch: 3/20...  Training Step: 8734...  Training loss: 1.7565...  0.3277 sec/batch
Epoch: 3/20...  Training Step: 8735...  Training loss: 1.6199...  0.3240 sec/batch
Epoch: 3/20...  Training Step: 8736...  Training loss: 1.6291...  0.2663 sec/batch
Epoch: 3/20...  Training Step: 8737...  Training loss: 1.7810...  0.2406 sec/batch
Epoch: 3/20...  Training Step: 8738...  Training loss: 1.6415...  0.2587 sec/batch
[...]

Epoch: 3/20...  Training Step: 8826...  Training loss: 1.7447...  0.2409 sec/batch
Epoch: 3/20...  Training Step: 8827...  Training loss: 1.7765...  0.2883 sec/batch
Epoch: 3/20...  Training Step: 8828...  Training loss: 1.7307...  0.2823 sec/batch
Epoch: 3/20...  Training Step: 8829...  Training loss: 1.6863...  0.2960 sec/batch
Epoch: 3/20...  Training Step: 8830...  Training loss: 1.7862...  0.3258 sec/batch
Epoch: 3/20...  Training Step: 8831...  Training loss: 1.7560...  0.3154 sec/batch
Epoch: 3/20...  Training Step: 8832...  Training loss: 1.7679...  0.2984 sec/batch
Epoch: 3/20...  Training Step: 8833...  Training loss: 1.9145...  0.2986 sec/batch
Epoch: 3/20...  Training Step: 8834...  Training loss: 1.7531...  0.2682 sec/batch
Epoch: 3/20...  Training Step: 8835...  Training loss: 1.7517...  0.2748 sec/batch
Epoch: 3/20...  Training Step: 8836...  Training loss: 1.7797...  0.2977 sec/batch
Epoch: 3/20...  Training Step: 8837...  Training loss: 1.9440...  0.2403 sec/batch
[...]

Epoch: 3/20...  Training Step: 8925...  Training loss: 1.6415...  0.2384 sec/batch
Epoch: 3/20...  Training Step: 8926...  Training loss: 1.6752...  0.2347 sec/batch
Epoch: 3/20...  Training Step: 8927...  Training loss: 1.6608...  0.2859 sec/batch
Epoch: 3/20...  Training Step: 8928...  Training loss: 1.6939...  0.2245 sec/batch
Epoch: 3/20...  Training Step: 8929...  Training loss: 1.7278...  0.2595 sec/batch
Epoch: 3/20...  Training Step: 8930...  Training loss: 1.5836...  0.2451 sec/batch
Epoch: 3/20...  Training Step: 8931...  Training loss: 1.5775...  0.3044 sec/batch
Epoch: 3/20...  Training Step: 8932...  Training loss: 1.7199...  0.2064 sec/batch
Epoch: 3/20...  Training Step: 8933...  Training loss: 1.7595...  0.2351 sec/batch
Epoch: 3/20...  Training Step: 8934...  Training loss: 1.7800...  0.2908 sec/batch
Epoch: 3/20...  Training Step: 8935...  Training loss: 1.7655...  0.2810 sec/batch
Epoch: 3/20...  Training Step: 8936...  Training loss: 1.7884...  0.2052 sec/batch
[...]

Epoch: 3/20...  Training Step: 9024...  Training loss: 1.4790...  0.3017 sec/batch
Epoch: 3/20...  Training Step: 9025...  Training loss: 1.6218...  0.2760 sec/batch
Epoch: 3/20...  Training Step: 9026...  Training loss: 1.7149...  0.2788 sec/batch
Epoch: 3/20...  Training Step: 9027...  Training loss: 1.6278...  0.1892 sec/batch
Epoch: 3/20...  Training Step: 9028...  Training loss: 1.5350...  0.2881 sec/batch
Epoch: 3/20...  Training Step: 9029...  Training loss: 1.7026...  0.2855 sec/batch
Epoch: 3/20...  Training Step: 9030...  Training loss: 1.9637...  0.2581 sec/batch
Epoch: 3/20...  Training Step: 9031...  Training loss: 1.7733...  0.3189 sec/batch
Epoch: 3/20...  Training Step: 9032...  Training loss: 1.8449...  0.2910 sec/batch
Epoch: 3/20...  Training Step: 9033...  Training loss: 1.6782...  0.2895 sec/batch
Epoch: 3/20...  Training Step: 9034...  Training loss: 1.7593...  0.2974 sec/batch
Epoch: 3/20...  Training Step: 9035...  Training loss: 1.6942...  0.2753 sec/batch
[...]

Epoch: 3/20...  Training Step: 9123...  Training loss: 1.7008...  0.2958 sec/batch
Epoch: 3/20...  Training Step: 9124...  Training loss: 1.8211...  0.2982 sec/batch
Epoch: 3/20...  Training Step: 9125...  Training loss: 1.7091...  0.2630 sec/batch
Epoch: 3/20...  Training Step: 9126...  Training loss: 1.7110...  0.2654 sec/batch
Epoch: 3/20...  Training Step: 9127...  Training loss: 1.7891...  0.2867 sec/batch
Epoch: 3/20...  Training Step: 9128...  Training loss: 1.8521...  0.2700 sec/batch
Epoch: 3/20...  Training Step: 9129...  Training loss: 1.6402...  0.2068 sec/batch
Epoch: 3/20...  Training Step: 9130...  Training loss: 1.7594...  0.3122 sec/batch
Epoch: 3/20...  Training Step: 9131...  Training loss: 1.7284...  0.2756 sec/batch
Epoch: 3/20...  Training Step: 9132...  Training loss: 1.7360...  0.3090 sec/batch
Epoch: 3/20...  Training Step: 9133...  Training loss: 1.7313...  0.2725 sec/batch
Epoch: 3/20...  Training Step: 9134...  Training loss: 1.7654...  0.2517 sec/batch
[...]

Epoch: 3/20...  Training Step: 9222...  Training loss: 1.8490...  0.3126 sec/batch
Epoch: 3/20...  Training Step: 9223...  Training loss: 1.7539...  0.3089 sec/batch
Epoch: 3/20...  Training Step: 9224...  Training loss: 1.8873...  0.2513 sec/batch
Epoch: 3/20...  Training Step: 9225...  Training loss: 1.9860...  0.2275 sec/batch
Epoch: 3/20...  Training Step: 9226...  Training loss: 1.6631...  0.2647 sec/batch
Epoch: 3/20...  Training Step: 9227...  Training loss: 1.8168...  0.3180 sec/batch
Epoch: 3/20...  Training Step: 9228...  Training loss: 1.6385...  0.3259 sec/batch
Epoch: 3/20...  Training Step: 9229...  Training loss: 1.7352...  0.2626 sec/batch
Epoch: 3/20...  Training Step: 9230...  Training loss: 1.8573...  0.2793 sec/batch
Epoch: 3/20...  Training Step: 9231...  Training loss: 1.8511...  0.2710 sec/batch
Epoch: 3/20...  Training Step: 9232...  Training loss: 1.9148...  0.2647 sec/batch
Epoch: 3/20...  Training Step: 9233...  Training loss: 1.7783...  0.2713 sec/batch
[...]

Epoch: 3/20...  Training Step: 9321...  Training loss: 1.8779...  0.2708 sec/batch
Epoch: 3/20...  Training Step: 9322...  Training loss: 1.7969...  0.2932 sec/batch
Epoch: 3/20...  Training Step: 9323...  Training loss: 1.8336...  0.2730 sec/batch
Epoch: 3/20...  Training Step: 9324...  Training loss: 1.7645...  0.2934 sec/batch
Epoch: 3/20...  Training Step: 9325...  Training loss: 1.7689...  0.3144 sec/batch
Epoch: 3/20...  Training Step: 9326...  Training loss: 2.0412...  0.2901 sec/batch
Epoch: 3/20...  Training Step: 9327...  Training loss: 1.7078...  0.2770 sec/batch
Epoch: 3/20...  Training Step: 9328...  Training loss: 1.7518...  0.2560 sec/batch
Epoch: 3/20...  Training Step: 9329...  Training loss: 1.7643...  0.3086 sec/batch
Epoch: 3/20...  Training Step: 9330...  Training loss: 1.7365...  0.2440 sec/batch
Epoch: 3/20...  Training Step: 9331...  Training loss: 1.7110...  0.2355 sec/batch
Epoch: 3/20...  Training Step: 9332...  Training loss: 1.7238...  0.2850 sec/batch
[...]

Epoch: 3/20...  Training Step: 9420...  Training loss: 1.7930...  0.2748 sec/batch
Epoch: 3/20...  Training Step: 9421...  Training loss: 1.7125...  0.2646 sec/batch
Epoch: 3/20...  Training Step: 9422...  Training loss: 1.6610...  0.3139 sec/batch
Epoch: 3/20...  Training Step: 9423...  Training loss: 1.6588...  0.3018 sec/batch
Epoch: 3/20...  Training Step: 9424...  Training loss: 1.8934...  0.2352 sec/batch
Epoch: 3/20...  Training Step: 9425...  Training loss: 1.7112...  0.2670 sec/batch
Epoch: 3/20...  Training Step: 9426...  Training loss: 1.8804...  0.2910 sec/batch
Epoch: 3/20...  Training Step: 9427...  Training loss: 1.7856...  0.2759 sec/batch
Epoch: 3/20...  Training Step: 9428...  Training loss: 1.7912...  0.2699 sec/batch
Epoch: 3/20...  Training Step: 9429...  Training loss: 1.8291...  0.2961 sec/batch
Epoch: 3/20...  Training Step: 9430...  Training loss: 1.7392...  0.3333 sec/batch
Epoch: 3/20...  Training Step: 9431...  Training loss: 1.8392...  0.3260 sec/batch
[...]

Epoch: 3/20...  Training Step: 9519...  Training loss: 1.8186...  0.2537 sec/batch
Epoch: 3/20...  Training Step: 9520...  Training loss: 1.7438...  0.2595 sec/batch
Epoch: 3/20...  Training Step: 9521...  Training loss: 1.9301...  0.2888 sec/batch
Epoch: 3/20...  Training Step: 9522...  Training loss: 1.8259...  0.2566 sec/batch
Epoch: 3/20...  Training Step: 9523...  Training loss: 1.7577...  0.2918 sec/batch
Epoch: 3/20...  Training Step: 9524...  Training loss: 1.6586...  0.1994 sec/batch
Epoch: 3/20...  Training Step: 9525...  Training loss: 1.8482...  0.3004 sec/batch
Epoch: 3/20...  Training Step: 9526...  Training loss: 1.6753...  0.3193 sec/batch
Epoch: 3/20...  Training Step: 9527...  Training loss: 1.6940...  0.3050 sec/batch
Epoch: 3/20...  Training Step: 9528...  Training loss: 1.6770...  0.2809 sec/batch
Epoch: 3/20...  Training Step: 9529...  Training loss: 1.7986...  0.2925 sec/batch
Epoch: 3/20...  Training Step: 9530...  Training loss: 1.7624...  0.2871 sec/batch
[...]

Epoch: 3/20...  Training Step: 9618...  Training loss: 1.8385...  0.2609 sec/batch
Epoch: 3/20...  Training Step: 9619...  Training loss: 1.7845...  0.2980 sec/batch
Epoch: 3/20...  Training Step: 9620...  Training loss: 1.6892...  0.3203 sec/batch
Epoch: 3/20...  Training Step: 9621...  Training loss: 1.8119...  0.2733 sec/batch
Epoch: 3/20...  Training Step: 9622...  Training loss: 1.7288...  0.1942 sec/batch
Epoch: 3/20...  Training Step: 9623...  Training loss: 1.8126...  0.2355 sec/batch
Epoch: 3/20...  Training Step: 9624...  Training loss: 1.7278...  0.3057 sec/batch
Epoch: 3/20...  Training Step: 9625...  Training loss: 1.7057...  0.2173 sec/batch
Epoch: 3/20...  Training Step: 9626...  Training loss: 1.6903...  0.2675 sec/batch
Epoch: 3/20...  Training Step: 9627...  Training loss: 1.7370...  0.2376 sec/batch
Epoch: 3/20...  Training Step: 9628...  Training loss: 1.8699...  0.2532 sec/batch
Epoch: 3/20...  Training Step: 9629...  Training loss: 1.8395...  0.2890 sec/batch
[...]

Epoch: 3/20...  Training Step: 9717...  Training loss: 1.7234...  0.2960 sec/batch
Epoch: 3/20...  Training Step: 9718...  Training loss: 1.6395...  0.2664 sec/batch
Epoch: 3/20...  Training Step: 9719...  Training loss: 1.6512...  0.2271 sec/batch
Epoch: 3/20...  Training Step: 9720...  Training loss: 1.7942...  0.3023 sec/batch
Epoch: 3/20...  Training Step: 9721...  Training loss: 1.7482...  0.2685 sec/batch
Epoch: 3/20...  Training Step: 9722...  Training loss: 1.6379...  0.2778 sec/batch
Epoch: 3/20...  Training Step: 9723...  Training loss: 1.7866...  0.2284 sec/batch
Epoch: 3/20...  Training Step: 9724...  Training loss: 1.6892...  0.3121 sec/batch
Epoch: 3/20...  Training Step: 9725...  Training loss: 1.7385...  0.3343 sec/batch
Epoch: 3/20...  Training Step: 9726...  Training loss: 1.7501...  0.2740 sec/batch
Epoch: 3/20...  Training Step: 9727...  Training loss: 1.7473...  0.2928 sec/batch
Epoch: 3/20...  Training Step: 9728...  Training loss: 1.7077...  0.2178 sec/batch
[...]

Epoch: 3/20...  Training Step: 9816...  Training loss: 1.8948...  0.2724 sec/batch
Epoch: 3/20...  Training Step: 9817...  Training loss: 1.7180...  0.3082 sec/batch
Epoch: 3/20...  Training Step: 9818...  Training loss: 1.7562...  0.2836 sec/batch
Epoch: 3/20...  Training Step: 9819...  Training loss: 1.6407...  0.2405 sec/batch
Epoch: 3/20...  Training Step: 9820...  Training loss: 1.7907...  0.2764 sec/batch
Epoch: 3/20...  Training Step: 9821...  Training loss: 1.8849...  0.2800 sec/batch
Epoch: 3/20...  Training Step: 9822...  Training loss: 1.5905...  0.2568 sec/batch
Epoch: 3/20...  Training Step: 9823...  Training loss: 1.8641...  0.2626 sec/batch
Epoch: 3/20...  Training Step: 9824...  Training loss: 1.8923...  0.2427 sec/batch
Epoch: 3/20...  Training Step: 9825...  Training loss: 1.6268...  0.2692 sec/batch
Epoch: 3/20...  Training Step: 9826...  Training loss: 1.8060...  0.2417 sec/batch
Epoch: 3/20...  Training Step: 9827...  Training loss: 1.8297...  0.2486 sec/batch
[...]

Epoch: 3/20...  Training Step: 9915...  Training loss: 1.8968...  0.3084 sec/batch
Epoch: 3/20...  Training Step: 9916...  Training loss: 1.6355...  0.2503 sec/batch
Epoch: 3/20...  Training Step: 9917...  Training loss: 1.7965...  0.2389 sec/batch
Epoch: 3/20...  Training Step: 9918...  Training loss: 1.7277...  0.2628 sec/batch
Epoch: 3/20...  Training Step: 9919...  Training loss: 1.7444...  0.2147 sec/batch
Epoch: 3/20...  Training Step: 9920...  Training loss: 1.6768...  0.2419 sec/batch
Epoch: 3/20...  Training Step: 9921...  Training loss: 1.8268...  0.3354 sec/batch
Epoch: 3/20...  Training Step: 9922...  Training loss: 1.6510...  0.2943 sec/batch
Epoch: 3/20...  Training Step: 9923...  Training loss: 1.7449...  0.2179 sec/batch
Epoch: 3/20...  Training Step: 9924...  Training loss: 1.9478...  0.2228 sec/batch
Epoch: 3/20...  Training Step: 9925...  Training loss: 1.9597...  0.3149 sec/batch
Epoch: 3/20...  Training Step: 9926...  Training loss: 1.6863...  0.2060 sec/batch
[...]

Epoch: 3/20...  Training Step: 10014...  Training loss: 1.8696...  0.2600 sec/batch
Epoch: 3/20...  Training Step: 10015...  Training loss: 1.6609...  0.3158 sec/batch
Epoch: 3/20...  Training Step: 10016...  Training loss: 1.6816...  0.3181 sec/batch
Epoch: 3/20...  Training Step: 10017...  Training loss: 1.6796...  0.3056 sec/batch
Epoch: 3/20...  Training Step: 10018...  Training loss: 1.7808...  0.2896 sec/batch
Epoch: 3/20...  Training Step: 10019...  Training loss: 1.7583...  0.2708 sec/batch
Epoch: 3/20...  Training Step: 10020...  Training loss: 1.6776...  0.2979 sec/batch
Epoch: 3/20...  Training Step: 10021...  Training loss: 1.7555...  0.2387 sec/batch
Epoch: 3/20...  Training Step: 10022...  Training loss: 1.7241...  0.3056 sec/batch
Epoch: 3/20...  Training Step: 10023...  Training loss: 1.7533...  0.2911 sec/batch
Epoch: 3/20...  Training Step: 10024...  Training loss: 1.6928...  0.2998 sec/batch
Epoch: 3/20...  Training Step: 10025...  Training loss: 1.6525...  0.3355 sec/batch

Epoch: 3/20...  Training Step: 10112...  Training loss: 1.7246...  0.3055 sec/batch
Epoch: 3/20...  Training Step: 10113...  Training loss: 1.8413...  0.2426 sec/batch
Epoch: 3/20...  Training Step: 10114...  Training loss: 1.6486...  0.3134 sec/batch
Epoch: 3/20...  Training Step: 10115...  Training loss: 1.8239...  0.3015 sec/batch
Epoch: 3/20...  Training Step: 10116...  Training loss: 1.7971...  0.2195 sec/batch
Epoch: 3/20...  Training Step: 10117...  Training loss: 1.7731...  0.2927 sec/batch
Epoch: 3/20...  Training Step: 10118...  Training loss: 1.8381...  0.2895 sec/batch
Epoch: 3/20...  Training Step: 10119...  Training loss: 1.7536...  0.2131 sec/batch
Epoch: 3/20...  Training Step: 10120...  Training loss: 1.7947...  0.2803 sec/batch
Epoch: 3/20...  Training Step: 10121...  Training loss: 1.7209...  0.3148 sec/batch
Epoch: 3/20...  Training Step: 10122...  Training loss: 2.0786...  0.2999 sec/batch
Epoch: 3/20...  Training Step: 10123...  Training loss: 1.6108...  0.3032 sec/batch

Epoch: 3/20...  Training Step: 10210...  Training loss: 1.6605...  0.2496 sec/batch
Epoch: 3/20...  Training Step: 10211...  Training loss: 1.7142...  0.2813 sec/batch
Epoch: 3/20...  Training Step: 10212...  Training loss: 1.7304...  0.2625 sec/batch
Epoch: 3/20...  Training Step: 10213...  Training loss: 1.5789...  0.2155 sec/batch
Epoch: 3/20...  Training Step: 10214...  Training loss: 1.5913...  0.2614 sec/batch
Epoch: 3/20...  Training Step: 10215...  Training loss: 1.5451...  0.2270 sec/batch
Epoch: 3/20...  Training Step: 10216...  Training loss: 1.6925...  0.2385 sec/batch
Epoch: 3/20...  Training Step: 10217...  Training loss: 1.7458...  0.2802 sec/batch
Epoch: 3/20...  Training Step: 10218...  Training loss: 1.7974...  0.3134 sec/batch
Epoch: 3/20...  Training Step: 10219...  Training loss: 1.7929...  0.2500 sec/batch
Epoch: 3/20...  Training Step: 10220...  Training loss: 1.6719...  0.3096 sec/batch
Epoch: 3/20...  Training Step: 10221...  Training loss: 1.7555...  0.2809 sec/batch

Epoch: 3/20...  Training Step: 10309...  Training loss: 1.8096...  0.2250 sec/batch
Epoch: 3/20...  Training Step: 10310...  Training loss: 1.7221...  0.2468 sec/batch
Epoch: 3/20...  Training Step: 10311...  Training loss: 1.8238...  0.2557 sec/batch
Epoch: 3/20...  Training Step: 10312...  Training loss: 1.7328...  0.1640 sec/batch
Epoch: 3/20...  Training Step: 10313...  Training loss: 1.6545...  0.2055 sec/batch
Epoch: 3/20...  Training Step: 10314...  Training loss: 1.7550...  0.2391 sec/batch
Epoch: 3/20...  Training Step: 10315...  Training loss: 1.7497...  0.2343 sec/batch
Epoch: 3/20...  Training Step: 10316...  Training loss: 1.7893...  0.1415 sec/batch
Epoch: 3/20...  Training Step: 10317...  Training loss: 1.8638...  0.1979 sec/batch
Epoch: 3/20...  Training Step: 10318...  Training loss: 1.7504...  0.2133 sec/batch
Epoch: 3/20...  Training Step: 10319...  Training loss: 1.7504...  0.2520 sec/batch
Epoch: 3/20...  Training Step: 10320...  Training loss: 1.7861...  0.2198 sec/batch

Epoch: 3/20...  Training Step: 10407...  Training loss: 1.7857...  0.2766 sec/batch
Epoch: 3/20...  Training Step: 10408...  Training loss: 1.7189...  0.2175 sec/batch
Epoch: 3/20...  Training Step: 10409...  Training loss: 1.7321...  0.1584 sec/batch
Epoch: 3/20...  Training Step: 10410...  Training loss: 1.7448...  0.1845 sec/batch
Epoch: 3/20...  Training Step: 10411...  Training loss: 1.7090...  0.2633 sec/batch
Epoch: 3/20...  Training Step: 10412...  Training loss: 1.7234...  0.2005 sec/batch
Epoch: 3/20...  Training Step: 10413...  Training loss: 1.5996...  0.2210 sec/batch
Epoch: 3/20...  Training Step: 10414...  Training loss: 1.7097...  0.1787 sec/batch
Epoch: 3/20...  Training Step: 10415...  Training loss: 1.6027...  0.1678 sec/batch
Epoch: 3/20...  Training Step: 10416...  Training loss: 1.6506...  0.1690 sec/batch
Epoch: 3/20...  Training Step: 10417...  Training loss: 1.6122...  0.1897 sec/batch
Epoch: 3/20...  Training Step: 10418...  Training loss: 1.7097...  0.2513 sec/batch

Epoch: 3/20...  Training Step: 10505...  Training loss: 1.5475...  0.1752 sec/batch
Epoch: 3/20...  Training Step: 10506...  Training loss: 1.7089...  0.2498 sec/batch
Epoch: 3/20...  Training Step: 10507...  Training loss: 1.6091...  0.2133 sec/batch
Epoch: 3/20...  Training Step: 10508...  Training loss: 1.7995...  0.1619 sec/batch
Epoch: 3/20...  Training Step: 10509...  Training loss: 1.6409...  0.2159 sec/batch
Epoch: 3/20...  Training Step: 10510...  Training loss: 1.7028...  0.2528 sec/batch
Epoch: 3/20...  Training Step: 10511...  Training loss: 1.6640...  0.1445 sec/batch
Epoch: 3/20...  Training Step: 10512...  Training loss: 1.7914...  0.2970 sec/batch
Epoch: 3/20...  Training Step: 10513...  Training loss: 1.7664...  0.1925 sec/batch
Epoch: 3/20...  Training Step: 10514...  Training loss: 1.7785...  0.1951 sec/batch
Epoch: 3/20...  Training Step: 10515...  Training loss: 1.6648...  0.1890 sec/batch
Epoch: 3/20...  Training Step: 10516...  Training loss: 1.6730...  0.2430 sec/batch

Epoch: 3/20...  Training Step: 10603...  Training loss: 1.6733...  0.2446 sec/batch
Epoch: 3/20...  Training Step: 10604...  Training loss: 1.6704...  0.1883 sec/batch
Epoch: 3/20...  Training Step: 10605...  Training loss: 1.8850...  0.2854 sec/batch
Epoch: 3/20...  Training Step: 10606...  Training loss: 1.7164...  0.1995 sec/batch
Epoch: 3/20...  Training Step: 10607...  Training loss: 1.7253...  0.2200 sec/batch
Epoch: 3/20...  Training Step: 10608...  Training loss: 1.6119...  0.2129 sec/batch
Epoch: 3/20...  Training Step: 10609...  Training loss: 1.8571...  0.2262 sec/batch
Epoch: 3/20...  Training Step: 10610...  Training loss: 1.7107...  0.1826 sec/batch
Epoch: 3/20...  Training Step: 10611...  Training loss: 1.9690...  0.2350 sec/batch
Epoch: 3/20...  Training Step: 10612...  Training loss: 1.7132...  0.2025 sec/batch
Epoch: 3/20...  Training Step: 10613...  Training loss: 1.7985...  0.2003 sec/batch
Epoch: 3/20...  Training Step: 10614...  Training loss: 1.7831...  0.2295 sec/batch

Epoch: 3/20...  Training Step: 10702...  Training loss: 1.7193...  0.2048 sec/batch
Epoch: 3/20...  Training Step: 10703...  Training loss: 1.7367...  0.2093 sec/batch
Epoch: 3/20...  Training Step: 10704...  Training loss: 1.6544...  0.1831 sec/batch
Epoch: 3/20...  Training Step: 10705...  Training loss: 1.8452...  0.2050 sec/batch
Epoch: 3/20...  Training Step: 10706...  Training loss: 1.6890...  0.1922 sec/batch
Epoch: 3/20...  Training Step: 10707...  Training loss: 1.6379...  0.1910 sec/batch
Epoch: 3/20...  Training Step: 10708...  Training loss: 1.8096...  0.2194 sec/batch
Epoch: 3/20...  Training Step: 10709...  Training loss: 1.8631...  0.1678 sec/batch
Epoch: 3/20...  Training Step: 10710...  Training loss: 1.8296...  0.2470 sec/batch
Epoch: 3/20...  Training Step: 10711...  Training loss: 1.7595...  0.2263 sec/batch
Epoch: 3/20...  Training Step: 10712...  Training loss: 1.8193...  0.1909 sec/batch
Epoch: 3/20...  Training Step: 10713...  Training loss: 1.7008...  0.2196 sec/batch

Epoch: 3/20...  Training Step: 10800...  Training loss: 1.6717...  0.2146 sec/batch
Epoch: 3/20...  Training Step: 10801...  Training loss: 1.8020...  0.2071 sec/batch
Epoch: 3/20...  Training Step: 10802...  Training loss: 1.6172...  0.1470 sec/batch
Epoch: 3/20...  Training Step: 10803...  Training loss: 1.7289...  0.2041 sec/batch
Epoch: 3/20...  Training Step: 10804...  Training loss: 1.6912...  0.2163 sec/batch
Epoch: 3/20...  Training Step: 10805...  Training loss: 1.6480...  0.1709 sec/batch
Epoch: 3/20...  Training Step: 10806...  Training loss: 1.6595...  0.2344 sec/batch
Epoch: 3/20...  Training Step: 10807...  Training loss: 1.6448...  0.1543 sec/batch
Epoch: 3/20...  Training Step: 10808...  Training loss: 1.7057...  0.2139 sec/batch
Epoch: 3/20...  Training Step: 10809...  Training loss: 1.8023...  0.1761 sec/batch
Epoch: 3/20...  Training Step: 10810...  Training loss: 1.6928...  0.1966 sec/batch
Epoch: 3/20...  Training Step: 10811...  Training loss: 1.6041...  0.2128 sec/batch

Epoch: 3/20...  Training Step: 10899...  Training loss: 1.6641...  0.1773 sec/batch
Epoch: 3/20...  Training Step: 10900...  Training loss: 1.7692...  0.1773 sec/batch
Epoch: 3/20...  Training Step: 10901...  Training loss: 1.7131...  0.1765 sec/batch
Epoch: 3/20...  Training Step: 10902...  Training loss: 1.8508...  0.2097 sec/batch
Epoch: 3/20...  Training Step: 10903...  Training loss: 1.7023...  0.2198 sec/batch
Epoch: 3/20...  Training Step: 10904...  Training loss: 1.5677...  0.1875 sec/batch
Epoch: 3/20...  Training Step: 10905...  Training loss: 1.6647...  0.1873 sec/batch
Epoch: 3/20...  Training Step: 10906...  Training loss: 1.6639...  0.2236 sec/batch
Epoch: 3/20...  Training Step: 10907...  Training loss: 1.6603...  0.2045 sec/batch
Epoch: 3/20...  Training Step: 10908...  Training loss: 1.7729...  0.1683 sec/batch
Epoch: 3/20...  Training Step: 10909...  Training loss: 1.7195...  0.1903 sec/batch
Epoch: 3/20...  Training Step: 10910...  Training loss: 1.6755...  0.2191 sec/batch

Epoch: 3/20...  Training Step: 10997...  Training loss: 1.8837...  0.2224 sec/batch
Epoch: 3/20...  Training Step: 10998...  Training loss: 1.6885...  0.2284 sec/batch
Epoch: 3/20...  Training Step: 10999...  Training loss: 1.9071...  0.2465 sec/batch
Epoch: 3/20...  Training Step: 11000...  Training loss: 1.7137...  0.1744 sec/batch
Epoch: 3/20...  Training Step: 11001...  Training loss: 1.8410...  0.1747 sec/batch
Epoch: 3/20...  Training Step: 11002...  Training loss: 1.8189...  0.1616 sec/batch
Epoch: 3/20...  Training Step: 11003...  Training loss: 1.8410...  0.1729 sec/batch
Epoch: 3/20...  Training Step: 11004...  Training loss: 1.6883...  0.2327 sec/batch
Epoch: 3/20...  Training Step: 11005...  Training loss: 1.6914...  0.1621 sec/batch
Epoch: 3/20...  Training Step: 11006...  Training loss: 1.6992...  0.2147 sec/batch
Epoch: 3/20...  Training Step: 11007...  Training loss: 1.8820...  0.2077 sec/batch
Epoch: 3/20...  Training Step: 11008...  Training loss: 1.7693...  0.1995 sec/batch

Epoch: 3/20...  Training Step: 11096...  Training loss: 1.7929...  0.2634 sec/batch
Epoch: 3/20...  Training Step: 11097...  Training loss: 1.6046...  0.1902 sec/batch
Epoch: 3/20...  Training Step: 11098...  Training loss: 1.7025...  0.2109 sec/batch
Epoch: 3/20...  Training Step: 11099...  Training loss: 1.6290...  0.1877 sec/batch
Epoch: 3/20...  Training Step: 11100...  Training loss: 1.7503...  0.1978 sec/batch
Epoch: 3/20...  Training Step: 11101...  Training loss: 1.8163...  0.2280 sec/batch
Epoch: 3/20...  Training Step: 11102...  Training loss: 1.7313...  0.1845 sec/batch
Epoch: 3/20...  Training Step: 11103...  Training loss: 1.8363...  0.1406 sec/batch
Epoch: 3/20...  Training Step: 11104...  Training loss: 1.6507...  0.2491 sec/batch
Epoch: 3/20...  Training Step: 11105...  Training loss: 1.6109...  0.2104 sec/batch
Epoch: 3/20...  Training Step: 11106...  Training loss: 1.6516...  0.1791 sec/batch
Epoch: 3/20...  Training Step: 11107...  Training loss: 1.6380...  0.1944 sec/batch

Epoch: 3/20...  Training Step: 11194...  Training loss: 1.7251...  0.2145 sec/batch
Epoch: 3/20...  Training Step: 11195...  Training loss: 1.7347...  0.1732 sec/batch
Epoch: 3/20...  Training Step: 11196...  Training loss: 1.7199...  0.1798 sec/batch
Epoch: 3/20...  Training Step: 11197...  Training loss: 1.6223...  0.2004 sec/batch
Epoch: 3/20...  Training Step: 11198...  Training loss: 1.7202...  0.1603 sec/batch
Epoch: 3/20...  Training Step: 11199...  Training loss: 1.7972...  0.2588 sec/batch
Epoch: 3/20...  Training Step: 11200...  Training loss: 1.6580...  0.1833 sec/batch
Epoch: 3/20...  Training Step: 11201...  Training loss: 1.7661...  0.1705 sec/batch
Epoch: 3/20...  Training Step: 11202...  Training loss: 1.7142...  0.1800 sec/batch
Epoch: 3/20...  Training Step: 11203...  Training loss: 1.7249...  0.1860 sec/batch
Epoch: 3/20...  Training Step: 11204...  Training loss: 1.6110...  0.2112 sec/batch
Epoch: 3/20...  Training Step: 11205...  Training loss: 1.6824...  0.1957 sec/batch

Epoch: 3/20...  Training Step: 11292...  Training loss: 1.6637...  0.2252 sec/batch
Epoch: 3/20...  Training Step: 11293...  Training loss: 1.6558...  0.1968 sec/batch
Epoch: 3/20...  Training Step: 11294...  Training loss: 1.6725...  0.1924 sec/batch
Epoch: 3/20...  Training Step: 11295...  Training loss: 1.6691...  0.1798 sec/batch
Epoch: 3/20...  Training Step: 11296...  Training loss: 1.6591...  0.2259 sec/batch
Epoch: 3/20...  Training Step: 11297...  Training loss: 1.7330...  0.1929 sec/batch
Epoch: 3/20...  Training Step: 11298...  Training loss: 1.8563...  0.1902 sec/batch
Epoch: 3/20...  Training Step: 11299...  Training loss: 1.7884...  0.2126 sec/batch
Epoch: 3/20...  Training Step: 11300...  Training loss: 1.7035...  0.1654 sec/batch
Epoch: 3/20...  Training Step: 11301...  Training loss: 1.6302...  0.2690 sec/batch
Epoch: 3/20...  Training Step: 11302...  Training loss: 1.6291...  0.1925 sec/batch
Epoch: 3/20...  Training Step: 11303...  Training loss: 1.6803...  0.1577 sec/batch

Epoch: 3/20...  Training Step: 11390...  Training loss: 1.6762...  0.2163 sec/batch
Epoch: 3/20...  Training Step: 11391...  Training loss: 1.8610...  0.1754 sec/batch
Epoch: 3/20...  Training Step: 11392...  Training loss: 1.7227...  0.1979 sec/batch
Epoch: 3/20...  Training Step: 11393...  Training loss: 1.6740...  0.1988 sec/batch
Epoch: 3/20...  Training Step: 11394...  Training loss: 1.6597...  0.2015 sec/batch
Epoch: 3/20...  Training Step: 11395...  Training loss: 1.6526...  0.2216 sec/batch
Epoch: 3/20...  Training Step: 11396...  Training loss: 1.7954...  0.1830 sec/batch
Epoch: 3/20...  Training Step: 11397...  Training loss: 1.6023...  0.2446 sec/batch
Epoch: 3/20...  Training Step: 11398...  Training loss: 1.5933...  0.1882 sec/batch
Epoch: 3/20...  Training Step: 11399...  Training loss: 1.6498...  0.1894 sec/batch
Epoch: 3/20...  Training Step: 11400...  Training loss: 1.6874...  0.2139 sec/batch
Epoch: 3/20...  Training Step: 11401...  Training loss: 1.7219...  0.2028 sec/batch

Epoch: 3/20...  Training Step: 11488...  Training loss: 1.5584...  0.2055 sec/batch
Epoch: 3/20...  Training Step: 11489...  Training loss: 1.6902...  0.1593 sec/batch
Epoch: 3/20...  Training Step: 11490...  Training loss: 1.8246...  0.2321 sec/batch
Epoch: 3/20...  Training Step: 11491...  Training loss: 1.6519...  0.2007 sec/batch
Epoch: 3/20...  Training Step: 11492...  Training loss: 1.6640...  0.1712 sec/batch
Epoch: 3/20...  Training Step: 11493...  Training loss: 1.7383...  0.2255 sec/batch
Epoch: 3/20...  Training Step: 11494...  Training loss: 1.7827...  0.1613 sec/batch
Epoch: 3/20...  Training Step: 11495...  Training loss: 1.6232...  0.2174 sec/batch
Epoch: 3/20...  Training Step: 11496...  Training loss: 1.7992...  0.2225 sec/batch
Epoch: 3/20...  Training Step: 11497...  Training loss: 1.6339...  0.2038 sec/batch
Epoch: 3/20...  Training Step: 11498...  Training loss: 1.5489...  0.2086 sec/batch
Epoch: 3/20...  Training Step: 11499...  Training loss: 1.8117...  0.2109 sec/batch

Epoch: 3/20...  Training Step: 11587...  Training loss: 1.7627...  0.2193 sec/batch
Epoch: 3/20...  Training Step: 11588...  Training loss: 1.8631...  0.1908 sec/batch
Epoch: 3/20...  Training Step: 11589...  Training loss: 1.7276...  0.2178 sec/batch
Epoch: 3/20...  Training Step: 11590...  Training loss: 1.8781...  0.2466 sec/batch
Epoch: 3/20...  Training Step: 11591...  Training loss: 1.8055...  0.1644 sec/batch
Epoch: 3/20...  Training Step: 11592...  Training loss: 1.7017...  0.2405 sec/batch
Epoch: 3/20...  Training Step: 11593...  Training loss: 1.7458...  0.1958 sec/batch
Epoch: 3/20...  Training Step: 11594...  Training loss: 1.7471...  0.1984 sec/batch
Epoch: 3/20...  Training Step: 11595...  Training loss: 1.9732...  0.2192 sec/batch
Epoch: 3/20...  Training Step: 11596...  Training loss: 1.9959...  0.1810 sec/batch
Epoch: 3/20...  Training Step: 11597...  Training loss: 1.7917...  0.2734 sec/batch
Epoch: 3/20...  Training Step: 11598...  Training loss: 1.8672...  0.1747 sec/batch

Epoch: 3/20...  Training Step: 11685...  Training loss: 1.7237...  0.2426 sec/batch
Epoch: 3/20...  Training Step: 11686...  Training loss: 1.9638...  0.1879 sec/batch
Epoch: 3/20...  Training Step: 11687...  Training loss: 1.8159...  0.1935 sec/batch
Epoch: 3/20...  Training Step: 11688...  Training loss: 2.0391...  0.2046 sec/batch
Epoch: 3/20...  Training Step: 11689...  Training loss: 1.7188...  0.2164 sec/batch
Epoch: 3/20...  Training Step: 11690...  Training loss: 1.8123...  0.2112 sec/batch
Epoch: 3/20...  Training Step: 11691...  Training loss: 1.6942...  0.2105 sec/batch
Epoch: 3/20...  Training Step: 11692...  Training loss: 2.0127...  0.1629 sec/batch
Epoch: 3/20...  Training Step: 11693...  Training loss: 1.7845...  0.2741 sec/batch
Epoch: 3/20...  Training Step: 11694...  Training loss: 1.7456...  0.1734 sec/batch
Epoch: 3/20...  Training Step: 11695...  Training loss: 1.7657...  0.2171 sec/batch
Epoch: 3/20...  Training Step: 11696...  Training loss: 1.7459...  0.1941 sec/batch

Epoch: 3/20...  Training Step: 11783...  Training loss: 2.0779...  0.2125 sec/batch
Epoch: 3/20...  Training Step: 11784...  Training loss: 2.0020...  0.1864 sec/batch
Epoch: 3/20...  Training Step: 11785...  Training loss: 1.8481...  0.2077 sec/batch
Epoch: 3/20...  Training Step: 11786...  Training loss: 1.8140...  0.2301 sec/batch
Epoch: 3/20...  Training Step: 11787...  Training loss: 1.7283...  0.2096 sec/batch
Epoch: 3/20...  Training Step: 11788...  Training loss: 1.7973...  0.2098 sec/batch
Epoch: 3/20...  Training Step: 11789...  Training loss: 1.6919...  0.1733 sec/batch
Epoch: 3/20...  Training Step: 11790...  Training loss: 1.6728...  0.2367 sec/batch
Epoch: 3/20...  Training Step: 11791...  Training loss: 1.9531...  0.2166 sec/batch
Epoch: 3/20...  Training Step: 11792...  Training loss: 1.8801...  0.1543 sec/batch
Epoch: 3/20...  Training Step: 11793...  Training loss: 1.9545...  0.2161 sec/batch
Epoch: 3/20...  Training Step: 11794...  Training loss: 1.7036...  0.1739 sec/batch

Epoch: 3/20...  Training Step: 11881...  Training loss: 1.9141...  0.2024 sec/batch
Epoch: 3/20...  Training Step: 11882...  Training loss: 1.8531...  0.2039 sec/batch
Epoch: 3/20...  Training Step: 11883...  Training loss: 1.6804...  0.1910 sec/batch
Epoch: 3/20...  Training Step: 11884...  Training loss: 1.7433...  0.1988 sec/batch
Epoch: 3/20...  Training Step: 11885...  Training loss: 1.7323...  0.2132 sec/batch
Epoch: 3/20...  Training Step: 11886...  Training loss: 1.9748...  0.1899 sec/batch
Epoch: 3/20...  Training Step: 11887...  Training loss: 1.9118...  0.2022 sec/batch
Epoch: 3/20...  Training Step: 11888...  Training loss: 1.7627...  0.1940 sec/batch
Epoch: 3/20...  Training Step: 11889...  Training loss: 1.7883...  0.2313 sec/batch
Epoch: 3/20...  Training Step: 11890...  Training loss: 1.7014...  0.2423 sec/batch
Epoch: 3/20...  Training Step: 11891...  Training loss: 1.7635...  0.1783 sec/batch
Epoch: 3/20...  Training Step: 11892...  Training loss: 1.6928...  0.1942 sec/batch

Epoch: 4/20...  Training Step: 11979...  Training loss: 1.9631...  0.2267 sec/batch
Epoch: 4/20...  Training Step: 11980...  Training loss: 1.7161...  0.1967 sec/batch
Epoch: 4/20...  Training Step: 11981...  Training loss: 1.8219...  0.2156 sec/batch
Epoch: 4/20...  Training Step: 11982...  Training loss: 1.7526...  0.2089 sec/batch
Epoch: 4/20...  Training Step: 11983...  Training loss: 1.6414...  0.1989 sec/batch
Epoch: 4/20...  Training Step: 11984...  Training loss: 1.7546...  0.2374 sec/batch
Epoch: 4/20...  Training Step: 11985...  Training loss: 1.5983...  0.1553 sec/batch
Epoch: 4/20...  Training Step: 11986...  Training loss: 1.7585...  0.2914 sec/batch
Epoch: 4/20...  Training Step: 11987...  Training loss: 1.6689...  0.1625 sec/batch
Epoch: 4/20...  Training Step: 11988...  Training loss: 1.6277...  0.1888 sec/batch
Epoch: 4/20...  Training Step: 11989...  Training loss: 1.6454...  0.2139 sec/batch
Epoch: 4/20...  Training Step: 11990...  Training loss: 1.7289...  0.2318 sec/batch

Epoch: 4/20...  Training Step: 12077...  Training loss: 1.7020...  0.1917 sec/batch
Epoch: 4/20...  Training Step: 12078...  Training loss: 1.9134...  0.2301 sec/batch
Epoch: 4/20...  Training Step: 12079...  Training loss: 1.7274...  0.1998 sec/batch
Epoch: 4/20...  Training Step: 12080...  Training loss: 1.7834...  0.1629 sec/batch
Epoch: 4/20...  Training Step: 12081...  Training loss: 1.7088...  0.2134 sec/batch
Epoch: 4/20...  Training Step: 12082...  Training loss: 1.7644...  0.1613 sec/batch
Epoch: 4/20...  Training Step: 12083...  Training loss: 1.7570...  0.2388 sec/batch
Epoch: 4/20...  Training Step: 12084...  Training loss: 1.7119...  0.1689 sec/batch
Epoch: 4/20...  Training Step: 12085...  Training loss: 1.7633...  0.2023 sec/batch
Epoch: 4/20...  Training Step: 12086...  Training loss: 1.7136...  0.2007 sec/batch
Epoch: 4/20...  Training Step: 12087...  Training loss: 1.6958...  0.2003 sec/batch
Epoch: 4/20...  Training Step: 12088...  Training loss: 1.6228...  0.1831 sec/batch

Epoch: 4/20...  Training Step: 12175...  Training loss: 1.5965...  0.2442 sec/batch
Epoch: 4/20...  Training Step: 12176...  Training loss: 1.7487...  0.1970 sec/batch
Epoch: 4/20...  Training Step: 12177...  Training loss: 1.6842...  0.2340 sec/batch
Epoch: 4/20...  Training Step: 12178...  Training loss: 1.7316...  0.1698 sec/batch
Epoch: 4/20...  Training Step: 12179...  Training loss: 1.7714...  0.2269 sec/batch
Epoch: 4/20...  Training Step: 12180...  Training loss: 1.7861...  0.1938 sec/batch
Epoch: 4/20...  Training Step: 12181...  Training loss: 1.7028...  0.2129 sec/batch
Epoch: 4/20...  Training Step: 12182...  Training loss: 1.6968...  0.1985 sec/batch
Epoch: 4/20...  Training Step: 12183...  Training loss: 1.8953...  0.1841 sec/batch
Epoch: 4/20...  Training Step: 12184...  Training loss: 1.8166...  0.2175 sec/batch
Epoch: 4/20...  Training Step: 12185...  Training loss: 1.8201...  0.2361 sec/batch
Epoch: 4/20...  Training Step: 12186...  Training loss: 1.6877...  0.1980 sec/batch

Epoch: 4/20...  Training Step: 12273...  Training loss: 1.8569...  0.2533 sec/batch
Epoch: 4/20...  Training Step: 12274...  Training loss: 1.5950...  0.2305 sec/batch
Epoch: 4/20...  Training Step: 12275...  Training loss: 1.8054...  0.1954 sec/batch
Epoch: 4/20...  Training Step: 12276...  Training loss: 1.6886...  0.2650 sec/batch
Epoch: 4/20...  Training Step: 12277...  Training loss: 1.8456...  0.2857 sec/batch
Epoch: 4/20...  Training Step: 12278...  Training loss: 1.7940...  0.1828 sec/batch
Epoch: 4/20...  Training Step: 12279...  Training loss: 1.7197...  0.2190 sec/batch
Epoch: 4/20...  Training Step: 12280...  Training loss: 1.7964...  0.2515 sec/batch
Epoch: 4/20...  Training Step: 12281...  Training loss: 1.6765...  0.2381 sec/batch
Epoch: 4/20...  Training Step: 12282...  Training loss: 1.7264...  0.2239 sec/batch
Epoch: 4/20...  Training Step: 12283...  Training loss: 1.7615...  0.2679 sec/batch
Epoch: 4/20...  Training Step: 12284...  Training loss: 1.6418...  0.2155 sec/batch

Epoch: 4/20...  Training Step: 12371...  Training loss: 1.6817...  0.2396 sec/batch
Epoch: 4/20...  Training Step: 12372...  Training loss: 1.7410...  0.1857 sec/batch
Epoch: 4/20...  Training Step: 12373...  Training loss: 1.7298...  0.2416 sec/batch
Epoch: 4/20...  Training Step: 12374...  Training loss: 1.6873...  0.2387 sec/batch
Epoch: 4/20...  Training Step: 12375...  Training loss: 1.6873...  0.2609 sec/batch
Epoch: 4/20...  Training Step: 12376...  Training loss: 1.7482...  0.2315 sec/batch
Epoch: 4/20...  Training Step: 12377...  Training loss: 1.7553...  0.1705 sec/batch
Epoch: 4/20...  Training Step: 12378...  Training loss: 1.6919...  0.2593 sec/batch
Epoch: 4/20...  Training Step: 12379...  Training loss: 1.5781...  0.1814 sec/batch
Epoch: 4/20...  Training Step: 12380...  Training loss: 1.7627...  0.2178 sec/batch
Epoch: 4/20...  Training Step: 12381...  Training loss: 1.6694...  0.1681 sec/batch
Epoch: 4/20...  Training Step: 12382...  Training loss: 1.7393...  0.2406 sec/batch

Epoch: 4/20...  Training Step: 12469...  Training loss: 1.6898...  0.2104 sec/batch
Epoch: 4/20...  Training Step: 12470...  Training loss: 1.7480...  0.1878 sec/batch
Epoch: 4/20...  Training Step: 12471...  Training loss: 1.7193...  0.2000 sec/batch
Epoch: 4/20...  Training Step: 12472...  Training loss: 1.7608...  0.1879 sec/batch
Epoch: 4/20...  Training Step: 12473...  Training loss: 1.6282...  0.2100 sec/batch
Epoch: 4/20...  Training Step: 12474...  Training loss: 1.6491...  0.1713 sec/batch
Epoch: 4/20...  Training Step: 12475...  Training loss: 1.6810...  0.2092 sec/batch
Epoch: 4/20...  Training Step: 12476...  Training loss: 1.6942...  0.2178 sec/batch
Epoch: 4/20...  Training Step: 12477...  Training loss: 1.6349...  0.2068 sec/batch
Epoch: 4/20...  Training Step: 12478...  Training loss: 1.7573...  0.2147 sec/batch
Epoch: 4/20...  Training Step: 12479...  Training loss: 1.7629...  0.2026 sec/batch
Epoch: 4/20...  Training Step: 12480...  Training loss: 1.7816...  0.1885 sec/batch

Epoch: 4/20...  Training Step: 12568...  Training loss: 1.7481...  0.1782 sec/batch
Epoch: 4/20...  Training Step: 12569...  Training loss: 1.7876...  0.2025 sec/batch
Epoch: 4/20...  Training Step: 12570...  Training loss: 1.6996...  0.2131 sec/batch
Epoch: 4/20...  Training Step: 12571...  Training loss: 1.7513...  0.1872 sec/batch
Epoch: 4/20...  Training Step: 12572...  Training loss: 1.7011...  0.2249 sec/batch
Epoch: 4/20...  Training Step: 12573...  Training loss: 1.7536...  0.1640 sec/batch
Epoch: 4/20...  Training Step: 12574...  Training loss: 1.8696...  0.2286 sec/batch
Epoch: 4/20...  Training Step: 12575...  Training loss: 1.7863...  0.2112 sec/batch
Epoch: 4/20...  Training Step: 12576...  Training loss: 1.7575...  0.1801 sec/batch
Epoch: 4/20...  Training Step: 12577...  Training loss: 1.7293...  0.2548 sec/batch
Epoch: 4/20...  Training Step: 12578...  Training loss: 1.5768...  0.1772 sec/batch
Epoch: 4/20...  Training Step: 12579...  Training loss: 1.6384...  0.2250 sec/batch

Epoch: 4/20...  Training Step: 12667...  Training loss: 1.6792...  0.1892 sec/batch
Epoch: 4/20...  Training Step: 12668...  Training loss: 1.6254...  0.1803 sec/batch
Epoch: 4/20...  Training Step: 12669...  Training loss: 1.7222...  0.1868 sec/batch
Epoch: 4/20...  Training Step: 12670...  Training loss: 1.6567...  0.1727 sec/batch
Epoch: 4/20...  Training Step: 12671...  Training loss: 1.6299...  0.2253 sec/batch
Epoch: 4/20...  Training Step: 12672...  Training loss: 1.7834...  0.2188 sec/batch
Epoch: 4/20...  Training Step: 12673...  Training loss: 1.7919...  0.1784 sec/batch
Epoch: 4/20...  Training Step: 12674...  Training loss: 1.7321...  0.1959 sec/batch
Epoch: 4/20...  Training Step: 12675...  Training loss: 1.7529...  0.2279 sec/batch
Epoch: 4/20...  Training Step: 12676...  Training loss: 1.7732...  0.1922 sec/batch
Epoch: 4/20...  Training Step: 12677...  Training loss: 1.7169...  0.1991 sec/batch
Epoch: 4/20...  Training Step: 12678...  Training loss: 1.7273...  0.1743 sec/batch

Epoch: 4/20...  Training Step: 12766...  Training loss: 1.7340...  0.2242 sec/batch
Epoch: 4/20...  Training Step: 12767...  Training loss: 1.8027...  0.2135 sec/batch
Epoch: 4/20...  Training Step: 12768...  Training loss: 1.7070...  0.1674 sec/batch
Epoch: 4/20...  Training Step: 12769...  Training loss: 1.7094...  0.2123 sec/batch
Epoch: 4/20...  Training Step: 12770...  Training loss: 1.8454...  0.2025 sec/batch
Epoch: 4/20...  Training Step: 12771...  Training loss: 1.7428...  0.2036 sec/batch
Epoch: 4/20...  Training Step: 12772...  Training loss: 1.7820...  0.1840 sec/batch
Epoch: 4/20...  Training Step: 12773...  Training loss: 1.8270...  0.1790 sec/batch
Epoch: 4/20...  Training Step: 12774...  Training loss: 1.7264...  0.2256 sec/batch
Epoch: 4/20...  Training Step: 12775...  Training loss: 1.8208...  0.1771 sec/batch
Epoch: 4/20...  Training Step: 12776...  Training loss: 1.9246...  0.2056 sec/batch
Epoch: 4/20...  Training Step: 12777...  Training loss: 1.8306...  0.2193 sec/batch

Epoch: 4/20...  Training Step: 12865...  Training loss: 1.7276...  0.1950 sec/batch
Epoch: 4/20...  Training Step: 12866...  Training loss: 1.6299...  0.1956 sec/batch
Epoch: 4/20...  Training Step: 12867...  Training loss: 1.8387...  0.2202 sec/batch
Epoch: 4/20...  Training Step: 12868...  Training loss: 1.7703...  0.1809 sec/batch
Epoch: 4/20...  Training Step: 12869...  Training loss: 1.6157...  0.1987 sec/batch
Epoch: 4/20...  Training Step: 12870...  Training loss: 1.7849...  0.2138 sec/batch
Epoch: 4/20...  Training Step: 12871...  Training loss: 1.5772...  0.2139 sec/batch
Epoch: 4/20...  Training Step: 12872...  Training loss: 1.6451...  0.1854 sec/batch
Epoch: 4/20...  Training Step: 12873...  Training loss: 1.5974...  0.1989 sec/batch
Epoch: 4/20...  Training Step: 12874...  Training loss: 1.6118...  0.2067 sec/batch
Epoch: 4/20...  Training Step: 12875...  Training loss: 1.7896...  0.2412 sec/batch
Epoch: 4/20...  Training Step: 12876...  Training loss: 1.7344...  0.1697 sec/batch

Epoch: 4/20...  Training Step: 12963...  Training loss: 1.6335...  0.2016 sec/batch
Epoch: 4/20...  Training Step: 12964...  Training loss: 1.6481...  0.1931 sec/batch
Epoch: 4/20...  Training Step: 12965...  Training loss: 1.6995...  0.1783 sec/batch
Epoch: 4/20...  Training Step: 12966...  Training loss: 1.6582...  0.1898 sec/batch
Epoch: 4/20...  Training Step: 12967...  Training loss: 1.5331...  0.1702 sec/batch
Epoch: 4/20...  Training Step: 12968...  Training loss: 1.7226...  0.2106 sec/batch
Epoch: 4/20...  Training Step: 12969...  Training loss: 1.7337...  0.1670 sec/batch
Epoch: 4/20...  Training Step: 12970...  Training loss: 1.7178...  0.1952 sec/batch
Epoch: 4/20...  Training Step: 12971...  Training loss: 1.7139...  0.1759 sec/batch
Epoch: 4/20...  Training Step: 12972...  Training loss: 1.6999...  0.2253 sec/batch
Epoch: 4/20...  Training Step: 12973...  Training loss: 1.7173...  0.1888 sec/batch
Epoch: 4/20...  Training Step: 12974...  Training loss: 1.6870...  0.1881 sec/batch

Epoch: 4/20...  Training Step: 13061...  Training loss: 1.7599...  0.2079 sec/batch
Epoch: 4/20...  Training Step: 13062...  Training loss: 1.8407...  0.1655 sec/batch
Epoch: 4/20...  Training Step: 13063...  Training loss: 1.8227...  0.2221 sec/batch
Epoch: 4/20...  Training Step: 13064...  Training loss: 1.7682...  0.1884 sec/batch
Epoch: 4/20...  Training Step: 13065...  Training loss: 1.7473...  0.1748 sec/batch
Epoch: 4/20...  Training Step: 13066...  Training loss: 1.5805...  0.1920 sec/batch
Epoch: 4/20...  Training Step: 13067...  Training loss: 1.6848...  0.1868 sec/batch
Epoch: 4/20...  Training Step: 13068...  Training loss: 1.5734...  0.1598 sec/batch
Epoch: 4/20...  Training Step: 13069...  Training loss: 1.7349...  0.2223 sec/batch
Epoch: 4/20...  Training Step: 13070...  Training loss: 1.5874...  0.1800 sec/batch
Epoch: 4/20...  Training Step: 13071...  Training loss: 1.6981...  0.1991 sec/batch
Epoch: 4/20...  Training Step: 13072...  Training loss: 1.7294...  0.2188 sec/batch

Epoch: 4/20...  Training Step: 13159...  Training loss: 1.6967...  0.1906 sec/batch
Epoch: 4/20...  Training Step: 13160...  Training loss: 1.5472...  0.1920 sec/batch
Epoch: 4/20...  Training Step: 13161...  Training loss: 1.7183...  0.2340 sec/batch
Epoch: 4/20...  Training Step: 13162...  Training loss: 1.6291...  0.1794 sec/batch
Epoch: 4/20...  Training Step: 13163...  Training loss: 1.5801...  0.1888 sec/batch
Epoch: 4/20...  Training Step: 13164...  Training loss: 1.7689...  0.2187 sec/batch
Epoch: 4/20...  Training Step: 13165...  Training loss: 1.6257...  0.2233 sec/batch
Epoch: 4/20...  Training Step: 13166...  Training loss: 1.7011...  0.1501 sec/batch
Epoch: 4/20...  Training Step: 13167...  Training loss: 1.7920...  0.1771 sec/batch
Epoch: 4/20...  Training Step: 13168...  Training loss: 1.6080...  0.1774 sec/batch
Epoch: 4/20...  Training Step: 13169...  Training loss: 1.6595...  0.2291 sec/batch
Epoch: 4/20...  Training Step: 13170...  Training loss: 1.7055...  0.1660 sec/batch

Epoch: 4/20...  Training Step: 13257...  Training loss: 1.6303...  0.2171 sec/batch
Epoch: 4/20...  Training Step: 13258...  Training loss: 1.6890...  0.1694 sec/batch
Epoch: 4/20...  Training Step: 13259...  Training loss: 1.7676...  0.1920 sec/batch
Epoch: 4/20...  Training Step: 13260...  Training loss: 1.5571...  0.2236 sec/batch
Epoch: 4/20...  Training Step: 13261...  Training loss: 1.5494...  0.1706 sec/batch
Epoch: 4/20...  Training Step: 13262...  Training loss: 1.6288...  0.1892 sec/batch
Epoch: 4/20...  Training Step: 13263...  Training loss: 1.7119...  0.1758 sec/batch
Epoch: 4/20...  Training Step: 13264...  Training loss: 1.7319...  0.2067 sec/batch
Epoch: 4/20...  Training Step: 13265...  Training loss: 1.7269...  0.1707 sec/batch
Epoch: 4/20...  Training Step: 13266...  Training loss: 1.6643...  0.2282 sec/batch
Epoch: 4/20...  Training Step: 13267...  Training loss: 1.7656...  0.1704 sec/batch
Epoch: 4/20...  Training Step: 13268...  Training loss: 1.5746...  0.2356 sec/batch

Epoch: 4/20...  Training Step: 13356...  Training loss: 1.8591...  0.2552 sec/batch
Epoch: 4/20...  Training Step: 13357...  Training loss: 1.9027...  0.1547 sec/batch
Epoch: 4/20...  Training Step: 13358...  Training loss: 1.7112...  0.1986 sec/batch
Epoch: 4/20...  Training Step: 13359...  Training loss: 1.7706...  0.2137 sec/batch
Epoch: 4/20...  Training Step: 13360...  Training loss: 1.7456...  0.2126 sec/batch
Epoch: 4/20...  Training Step: 13361...  Training loss: 1.6933...  0.1851 sec/batch
Epoch: 4/20...  Training Step: 13362...  Training loss: 1.6610...  0.2071 sec/batch
Epoch: 4/20...  Training Step: 13363...  Training loss: 1.8632...  0.1902 sec/batch
Epoch: 4/20...  Training Step: 13364...  Training loss: 1.8775...  0.1907 sec/batch
Epoch: 4/20...  Training Step: 13365...  Training loss: 1.8326...  0.1816 sec/batch
Epoch: 4/20...  Training Step: 13366...  Training loss: 1.6535...  0.2152 sec/batch
Epoch: 4/20...  Training Step: 13367...  Training loss: 1.7289...  0.1772 sec/batch

Epoch: 4/20...  Training Step: 13454...  Training loss: 1.8128...  0.2021 sec/batch
Epoch: 4/20...  Training Step: 13455...  Training loss: 1.5455...  0.2349 sec/batch
Epoch: 4/20...  Training Step: 13456...  Training loss: 1.7868...  0.2105 sec/batch
Epoch: 4/20...  Training Step: 13457...  Training loss: 1.6508...  0.1674 sec/batch
Epoch: 4/20...  Training Step: 13458...  Training loss: 1.5423...  0.1987 sec/batch
Epoch: 4/20...  Training Step: 13459...  Training loss: 1.5099...  0.2356 sec/batch
Epoch: 4/20...  Training Step: 13460...  Training loss: 1.6660...  0.1955 sec/batch
Epoch: 4/20...  Training Step: 13461...  Training loss: 1.5648...  0.1965 sec/batch
Epoch: 4/20...  Training Step: 13462...  Training loss: 1.7138...  0.2086 sec/batch
Epoch: 4/20...  Training Step: 13463...  Training loss: 1.8626...  0.1488 sec/batch
Epoch: 4/20...  Training Step: 13464...  Training loss: 1.8368...  0.2349 sec/batch
Epoch: 4/20...  Training Step: 13465...  Training loss: 1.6970...  0.1930 sec/batch

Epoch: 4/20...  Training Step: 13552...  Training loss: 1.6720...  0.2092 sec/batch
Epoch: 4/20...  Training Step: 13553...  Training loss: 1.7282...  0.2079 sec/batch
Epoch: 4/20...  Training Step: 13554...  Training loss: 1.6987...  0.1502 sec/batch
Epoch: 4/20...  Training Step: 13555...  Training loss: 1.6401...  0.2089 sec/batch
Epoch: 4/20...  Training Step: 13556...  Training loss: 1.6761...  0.2109 sec/batch
Epoch: 4/20...  Training Step: 13557...  Training loss: 1.6549...  0.2347 sec/batch
Epoch: 4/20...  Training Step: 13558...  Training loss: 1.8067...  0.1710 sec/batch
Epoch: 4/20...  Training Step: 13559...  Training loss: 1.6078...  0.2274 sec/batch
Epoch: 4/20...  Training Step: 13560...  Training loss: 1.6428...  0.1828 sec/batch
Epoch: 4/20...  Training Step: 13561...  Training loss: 1.6543...  0.1890 sec/batch
Epoch: 4/20...  Training Step: 13562...  Training loss: 1.8442...  0.1788 sec/batch
Epoch: 4/20...  Training Step: 13563...  Training loss: 1.7346...  0.2456 sec/batch

Epoch: 4/20...  Training Step: 13650...  Training loss: 1.7436...  0.2437 sec/batch
Epoch: 4/20...  Training Step: 13651...  Training loss: 1.6978...  0.1877 sec/batch
Epoch: 4/20...  Training Step: 13652...  Training loss: 1.7238...  0.2029 sec/batch
Epoch: 4/20...  Training Step: 13653...  Training loss: 1.8142...  0.1677 sec/batch
Epoch: 4/20...  Training Step: 13654...  Training loss: 1.6445...  0.1995 sec/batch
Epoch: 4/20...  Training Step: 13655...  Training loss: 1.6586...  0.1987 sec/batch
Epoch: 4/20...  Training Step: 13656...  Training loss: 1.5563...  0.2273 sec/batch
Epoch: 4/20...  Training Step: 13657...  Training loss: 1.7811...  0.2023 sec/batch
Epoch: 4/20...  Training Step: 13658...  Training loss: 1.7404...  0.1906 sec/batch
Epoch: 4/20...  Training Step: 13659...  Training loss: 1.6352...  0.2116 sec/batch
Epoch: 4/20...  Training Step: 13660...  Training loss: 1.7832...  0.2135 sec/batch
Epoch: 4/20...  Training Step: 13661...  Training loss: 1.6777...  0.1937 sec/batch

Epoch: 4/20...  Training Step: 13748...  Training loss: 1.6448...  0.2097 sec/batch
Epoch: 4/20...  Training Step: 13749...  Training loss: 1.6740...  0.1867 sec/batch
Epoch: 4/20...  Training Step: 13750...  Training loss: 1.6587...  0.2175 sec/batch
Epoch: 4/20...  Training Step: 13751...  Training loss: 1.6210...  0.2049 sec/batch
Epoch: 4/20...  Training Step: 13752...  Training loss: 1.7242...  0.2163 sec/batch
Epoch: 4/20...  Training Step: 13753...  Training loss: 1.5338...  0.2165 sec/batch
Epoch: 4/20...  Training Step: 13754...  Training loss: 1.5565...  0.1880 sec/batch
Epoch: 4/20...  Training Step: 13755...  Training loss: 1.7386...  0.1584 sec/batch
Epoch: 4/20...  Training Step: 13756...  Training loss: 1.8032...  0.2456 sec/batch
Epoch: 4/20...  Training Step: 13757...  Training loss: 1.7552...  0.1905 sec/batch
Epoch: 4/20...  Training Step: 13758...  Training loss: 1.6980...  0.2341 sec/batch
Epoch: 4/20...  Training Step: 13759...  Training loss: 1.5838...  0.1723 sec/batch

Epoch: 4/20...  Training Step: 13846...  Training loss: 1.6965...  0.2062 sec/batch
Epoch: 4/20...  Training Step: 13847...  Training loss: 1.8359...  0.1671 sec/batch
Epoch: 4/20...  Training Step: 13848...  Training loss: 2.1390...  0.2122 sec/batch
Epoch: 4/20...  Training Step: 13849...  Training loss: 1.7937...  0.2066 sec/batch
Epoch: 4/20...  Training Step: 13850...  Training loss: 1.9169...  0.1810 sec/batch
Epoch: 4/20...  Training Step: 13851...  Training loss: 1.6569...  0.2070 sec/batch
Epoch: 4/20...  Training Step: 13852...  Training loss: 1.6813...  0.1655 sec/batch
Epoch: 4/20...  Training Step: 13853...  Training loss: 1.7991...  0.2156 sec/batch
Epoch: 4/20...  Training Step: 13854...  Training loss: 1.6333...  0.1724 sec/batch
Epoch: 4/20...  Training Step: 13855...  Training loss: 1.8194...  0.2057 sec/batch
Epoch: 4/20...  Training Step: 13856...  Training loss: 1.5665...  0.1994 sec/batch
Epoch: 4/20...  Training Step: 13857...  Training loss: 1.7252...  0.1914 sec/batch

Epoch: 4/20...  Training Step: 13944...  Training loss: 1.7204...  0.2125 sec/batch
Epoch: 4/20...  Training Step: 13945...  Training loss: 1.7162...  0.1785 sec/batch
Epoch: 4/20...  Training Step: 13946...  Training loss: 1.6764...  0.1883 sec/batch
Epoch: 4/20...  Training Step: 13947...  Training loss: 1.5623...  0.2856 sec/batch
Epoch: 4/20...  Training Step: 13948...  Training loss: 1.6423...  0.1673 sec/batch
Epoch: 4/20...  Training Step: 13949...  Training loss: 1.6014...  0.1665 sec/batch
Epoch: 4/20...  Training Step: 13950...  Training loss: 1.7363...  0.2218 sec/batch
Epoch: 4/20...  Training Step: 13951...  Training loss: 1.6306...  0.2150 sec/batch
Epoch: 4/20...  Training Step: 13952...  Training loss: 1.7606...  0.2339 sec/batch
Epoch: 4/20...  Training Step: 13953...  Training loss: 1.7715...  0.2067 sec/batch
Epoch: 4/20...  Training Step: 13954...  Training loss: 1.7446...  0.1944 sec/batch
Epoch: 4/20...  Training Step: 13955...  Training loss: 1.7863...  0.1824 sec/batch

Epoch: 4/20...  Training Step: 14042...  Training loss: 1.6308...  0.2217 sec/batch
Epoch: 4/20...  Training Step: 14043...  Training loss: 1.8050...  0.2525 sec/batch
Epoch: 4/20...  Training Step: 14044...  Training loss: 1.6628...  0.2464 sec/batch
Epoch: 4/20...  Training Step: 14045...  Training loss: 1.6637...  0.1846 sec/batch
Epoch: 4/20...  Training Step: 14046...  Training loss: 1.7141...  0.2192 sec/batch
Epoch: 4/20...  Training Step: 14047...  Training loss: 1.6940...  0.1723 sec/batch
Epoch: 4/20...  Training Step: 14048...  Training loss: 1.7091...  0.2204 sec/batch
Epoch: 4/20...  Training Step: 14049...  Training loss: 1.7271...  0.2067 sec/batch
Epoch: 4/20...  Training Step: 14050...  Training loss: 1.7400...  0.1944 sec/batch
Epoch: 4/20...  Training Step: 14051...  Training loss: 1.6989...  0.2044 sec/batch
Epoch: 4/20...  Training Step: 14052...  Training loss: 1.7700...  0.2052 sec/batch
Epoch: 4/20...  Training Step: 14053...  Training loss: 1.7601...  0.2002 sec/batch

Epoch: 4/20...  Training Step: 14140...  Training loss: 1.7201...  0.2094 sec/batch
Epoch: 4/20...  Training Step: 14141...  Training loss: 1.7507...  0.1841 sec/batch
Epoch: 4/20...  Training Step: 14142...  Training loss: 1.6406...  0.2050 sec/batch
Epoch: 4/20...  Training Step: 14143...  Training loss: 1.6428...  0.1988 sec/batch
Epoch: 4/20...  Training Step: 14144...  Training loss: 1.6990...  0.2125 sec/batch
Epoch: 4/20...  Training Step: 14145...  Training loss: 1.5988...  0.2074 sec/batch
Epoch: 4/20...  Training Step: 14146...  Training loss: 1.4994...  0.1934 sec/batch
Epoch: 4/20...  Training Step: 14147...  Training loss: 1.6672...  0.2398 sec/batch
Epoch: 4/20...  Training Step: 14148...  Training loss: 1.5843...  0.2598 sec/batch
Epoch: 4/20...  Training Step: 14149...  Training loss: 1.7563...  0.1909 sec/batch
Epoch: 4/20...  Training Step: 14150...  Training loss: 1.9601...  0.1655 sec/batch
Epoch: 4/20...  Training Step: 14151...  Training loss: 1.7827...  0.2210 sec/batch

Epoch: 4/20...  Training Step: 14237...  Training loss: 1.7159...  0.1986 sec/batch
Epoch: 4/20...  Training Step: 14238...  Training loss: 1.6924...  0.2032 sec/batch
Epoch: 4/20...  Training Step: 14239...  Training loss: 1.8534...  0.2232 sec/batch
Epoch: 4/20...  Training Step: 14240...  Training loss: 1.6115...  0.2118 sec/batch
Epoch: 4/20...  Training Step: 14241...  Training loss: 1.6078...  0.1921 sec/batch
Epoch: 4/20...  Training Step: 14242...  Training loss: 1.7523...  0.2338 sec/batch
Epoch: 4/20...  Training Step: 14243...  Training loss: 1.6191...  0.2060 sec/batch
Epoch: 4/20...  Training Step: 14244...  Training loss: 1.5925...  0.1973 sec/batch
Epoch: 4/20...  Training Step: 14245...  Training loss: 1.6916...  0.1864 sec/batch
Epoch: 4/20...  Training Step: 14246...  Training loss: 1.7132...  0.2313 sec/batch
Epoch: 4/20...  Training Step: 14247...  Training loss: 1.8506...  0.1415 sec/batch
Epoch: 4/20...  Training Step: 14248...  Training loss: 1.8834...  0.2155 sec/batch

Epoch: 4/20...  Training Step: 14335...  Training loss: 1.6223...  0.1785 sec/batch
Epoch: 4/20...  Training Step: 14336...  Training loss: 1.7322...  0.2114 sec/batch
Epoch: 4/20...  Training Step: 14337...  Training loss: 1.6944...  0.1906 sec/batch
Epoch: 4/20...  Training Step: 14338...  Training loss: 1.7941...  0.1888 sec/batch
Epoch: 4/20...  Training Step: 14339...  Training loss: 1.8619...  0.2053 sec/batch
Epoch: 4/20...  Training Step: 14340...  Training loss: 1.7000...  0.2176 sec/batch
Epoch: 4/20...  Training Step: 14341...  Training loss: 1.8120...  0.1917 sec/batch
Epoch: 4/20...  Training Step: 14342...  Training loss: 1.8172...  0.1918 sec/batch
Epoch: 4/20...  Training Step: 14343...  Training loss: 1.7532...  0.2303 sec/batch
Epoch: 4/20...  Training Step: 14344...  Training loss: 1.6487...  0.1928 sec/batch
Epoch: 4/20...  Training Step: 14345...  Training loss: 1.5676...  0.2014 sec/batch
Epoch: 4/20...  Training Step: 14346...  Training loss: 1.7163...  0.2056 sec/batch

Epoch: 4/20...  Training Step: 14434...  Training loss: 1.8211...  0.2194 sec/batch
Epoch: 4/20...  Training Step: 14435...  Training loss: 1.7096...  0.1948 sec/batch
Epoch: 4/20...  Training Step: 14436...  Training loss: 1.6403...  0.2194 sec/batch
Epoch: 4/20...  Training Step: 14437...  Training loss: 1.8812...  0.1847 sec/batch
Epoch: 4/20...  Training Step: 14438...  Training loss: 1.6676...  0.2028 sec/batch
Epoch: 4/20...  Training Step: 14439...  Training loss: 1.7769...  0.1968 sec/batch
Epoch: 4/20...  Training Step: 14440...  Training loss: 1.6759...  0.2283 sec/batch
Epoch: 4/20...  Training Step: 14441...  Training loss: 1.6496...  0.1933 sec/batch
Epoch: 4/20...  Training Step: 14442...  Training loss: 1.6334...  0.1645 sec/batch
Epoch: 4/20...  Training Step: 14443...  Training loss: 1.8004...  0.2443 sec/batch
Epoch: 4/20...  Training Step: 14444...  Training loss: 1.6370...  0.1885 sec/batch
Epoch: 4/20...  Training Step: 14445...  Training loss: 1.6404...  0.2244 sec/batch

Epoch: 4/20...  Training Step: 14533...  Training loss: 1.6474...  0.2334 sec/batch
Epoch: 4/20...  Training Step: 14534...  Training loss: 1.8190...  0.1973 sec/batch
Epoch: 4/20...  Training Step: 14535...  Training loss: 1.8104...  0.1948 sec/batch
Epoch: 4/20...  Training Step: 14536...  Training loss: 1.7010...  0.2111 sec/batch
Epoch: 4/20...  Training Step: 14537...  Training loss: 1.7409...  0.1866 sec/batch
Epoch: 4/20...  Training Step: 14538...  Training loss: 1.7299...  0.2248 sec/batch
Epoch: 4/20...  Training Step: 14539...  Training loss: 1.6055...  0.2111 sec/batch
Epoch: 4/20...  Training Step: 14540...  Training loss: 1.8931...  0.1615 sec/batch
Epoch: 4/20...  Training Step: 14541...  Training loss: 1.8152...  0.2065 sec/batch
Epoch: 4/20...  Training Step: 14542...  Training loss: 1.5808...  0.2137 sec/batch
Epoch: 4/20...  Training Step: 14543...  Training loss: 1.6999...  0.1746 sec/batch
Epoch: 4/20...  Training Step: 14544...  Training loss: 1.7709...  0.2002 sec/batch

Epoch: 4/20...  Training Step: 14631...  Training loss: 1.6897...  0.2247 sec/batch
Epoch: 4/20...  Training Step: 14632...  Training loss: 1.6149...  0.2245 sec/batch
Epoch: 4/20...  Training Step: 14633...  Training loss: 1.6795...  0.2489 sec/batch
Epoch: 4/20...  Training Step: 14634...  Training loss: 1.6962...  0.2398 sec/batch
Epoch: 4/20...  Training Step: 14635...  Training loss: 1.7047...  0.2045 sec/batch
Epoch: 4/20...  Training Step: 14636...  Training loss: 1.8438...  0.2361 sec/batch
Epoch: 4/20...  Training Step: 14637...  Training loss: 1.7939...  0.1935 sec/batch
Epoch: 4/20...  Training Step: 14638...  Training loss: 1.8224...  0.1916 sec/batch
Epoch: 4/20...  Training Step: 14639...  Training loss: 1.8012...  0.1924 sec/batch
Epoch: 4/20...  Training Step: 14640...  Training loss: 1.6507...  0.2375 sec/batch
Epoch: 4/20...  Training Step: 14641...  Training loss: 1.8283...  0.2574 sec/batch
Epoch: 4/20...  Training Step: 14642...  Training loss: 1.6495...  0.2270 sec/batch

Epoch: 4/20...  Training Step: 14730...  Training loss: 1.8138...  0.2257 sec/batch
Epoch: 4/20...  Training Step: 14731...  Training loss: 1.8425...  0.1896 sec/batch
Epoch: 4/20...  Training Step: 14732...  Training loss: 1.7009...  0.2185 sec/batch
Epoch: 4/20...  Training Step: 14733...  Training loss: 1.8240...  0.1815 sec/batch
Epoch: 4/20...  Training Step: 14734...  Training loss: 1.5632...  0.1925 sec/batch
Epoch: 4/20...  Training Step: 14735...  Training loss: 1.9095...  0.2102 sec/batch
Epoch: 4/20...  Training Step: 14736...  Training loss: 1.7981...  0.1744 sec/batch
Epoch: 4/20...  Training Step: 14737...  Training loss: 1.7613...  0.2293 sec/batch
Epoch: 4/20...  Training Step: 14738...  Training loss: 1.8269...  0.2051 sec/batch
Epoch: 4/20...  Training Step: 14739...  Training loss: 1.7357...  0.2464 sec/batch
Epoch: 4/20...  Training Step: 14740...  Training loss: 1.8122...  0.1749 sec/batch
Epoch: 4/20...  Training Step: 14741...  Training loss: 1.7399...  0.1750 sec/batch

Epoch: 4/20...  Training Step: 14828...  Training loss: 1.7516...  0.2465 sec/batch
Epoch: 4/20...  Training Step: 14829...  Training loss: 1.7800...  0.1803 sec/batch
Epoch: 4/20...  Training Step: 14830...  Training loss: 1.8353...  0.2143 sec/batch
Epoch: 4/20...  Training Step: 14831...  Training loss: 1.7172...  0.2159 sec/batch
Epoch: 4/20...  Training Step: 14832...  Training loss: 1.6774...  0.2016 sec/batch
Epoch: 4/20...  Training Step: 14833...  Training loss: 1.6348...  0.2242 sec/batch
Epoch: 4/20...  Training Step: 14834...  Training loss: 1.7948...  0.1806 sec/batch
Epoch: 4/20...  Training Step: 14835...  Training loss: 1.6096...  0.1980 sec/batch
Epoch: 4/20...  Training Step: 14836...  Training loss: 1.6263...  0.1881 sec/batch
Epoch: 4/20...  Training Step: 14837...  Training loss: 1.6834...  0.2356 sec/batch
Epoch: 4/20...  Training Step: 14838...  Training loss: 1.7832...  0.1901 sec/batch
Epoch: 4/20...  Training Step: 14839...  Training loss: 1.6173...  0.1865 sec/batch

Epoch: 4/20...  Training Step: 14926...  Training loss: 1.6454...  0.1748 sec/batch
Epoch: 4/20...  Training Step: 14927...  Training loss: 1.7221...  0.1451 sec/batch
Epoch: 4/20...  Training Step: 14928...  Training loss: 1.7761...  0.2002 sec/batch
Epoch: 4/20...  Training Step: 14929...  Training loss: 1.6001...  0.1721 sec/batch
Epoch: 4/20...  Training Step: 14930...  Training loss: 1.8427...  0.2322 sec/batch
Epoch: 4/20...  Training Step: 14931...  Training loss: 1.7318...  0.1757 sec/batch
Epoch: 4/20...  Training Step: 14932...  Training loss: 1.7336...  0.2160 sec/batch
Epoch: 4/20...  Training Step: 14933...  Training loss: 1.7004...  0.2252 sec/batch
Epoch: 4/20...  Training Step: 14934...  Training loss: 1.6766...  0.1899 sec/batch
Epoch: 4/20...  Training Step: 14935...  Training loss: 1.6647...  0.1969 sec/batch
Epoch: 4/20...  Training Step: 14936...  Training loss: 1.6863...  0.2361 sec/batch
Epoch: 4/20...  Training Step: 14937...  Training loss: 1.6951...  0.1793 sec/batch

Epoch: 4/20...  Training Step: 15024...  Training loss: 1.6591...  0.2058 sec/batch
Epoch: 4/20...  Training Step: 15025...  Training loss: 1.6956...  0.2156 sec/batch
Epoch: 4/20...  Training Step: 15026...  Training loss: 1.6340...  0.1907 sec/batch
Epoch: 4/20...  Training Step: 15027...  Training loss: 1.5703...  0.2093 sec/batch
Epoch: 4/20...  Training Step: 15028...  Training loss: 1.7192...  0.2144 sec/batch
Epoch: 4/20...  Training Step: 15029...  Training loss: 1.7192...  0.2464 sec/batch
Epoch: 4/20...  Training Step: 15030...  Training loss: 1.6571...  0.2159 sec/batch
Epoch: 4/20...  Training Step: 15031...  Training loss: 1.7022...  0.2454 sec/batch
Epoch: 4/20...  Training Step: 15032...  Training loss: 1.6181...  0.1871 sec/batch
Epoch: 4/20...  Training Step: 15033...  Training loss: 1.7931...  0.1854 sec/batch
Epoch: 4/20...  Training Step: 15034...  Training loss: 1.7128...  0.1901 sec/batch
Epoch: 4/20...  Training Step: 15035...  Training loss: 1.7019...  0.1976 sec/batch

Epoch: 4/20...  Training Step: 15122...  Training loss: 1.7388...  0.2089 sec/batch
Epoch: 4/20...  Training Step: 15123...  Training loss: 1.8638...  0.1923 sec/batch
Epoch: 4/20...  Training Step: 15124...  Training loss: 1.7429...  0.1915 sec/batch
Epoch: 4/20...  Training Step: 15125...  Training loss: 1.7040...  0.1827 sec/batch
Epoch: 4/20...  Training Step: 15126...  Training loss: 1.7157...  0.1924 sec/batch
Epoch: 4/20...  Training Step: 15127...  Training loss: 1.6881...  0.2273 sec/batch
Epoch: 4/20...  Training Step: 15128...  Training loss: 1.5982...  0.2082 sec/batch
Epoch: 4/20...  Training Step: 15129...  Training loss: 1.6273...  0.1858 sec/batch
Epoch: 4/20...  Training Step: 15130...  Training loss: 1.6886...  0.1872 sec/batch
Epoch: 4/20...  Training Step: 15131...  Training loss: 1.6166...  0.1798 sec/batch
Epoch: 4/20...  Training Step: 15132...  Training loss: 1.7569...  0.2034 sec/batch
Epoch: 4/20...  Training Step: 15133...  Training loss: 1.6993...  0.1936 sec/batch

Epoch: 4/20...  Training Step: 15221...  Training loss: 1.6198...  0.1956 sec/batch
Epoch: 4/20...  Training Step: 15222...  Training loss: 1.6614...  0.2092 sec/batch
Epoch: 4/20...  Training Step: 15223...  Training loss: 1.6525...  0.1940 sec/batch
Epoch: 4/20...  Training Step: 15224...  Training loss: 1.7182...  0.2034 sec/batch
Epoch: 4/20...  Training Step: 15225...  Training loss: 1.7738...  0.1982 sec/batch
Epoch: 4/20...  Training Step: 15226...  Training loss: 1.8116...  0.2008 sec/batch
Epoch: 4/20...  Training Step: 15227...  Training loss: 1.7263...  0.2078 sec/batch
Epoch: 4/20...  Training Step: 15228...  Training loss: 1.7238...  0.2163 sec/batch
Epoch: 4/20...  Training Step: 15229...  Training loss: 1.7710...  0.1946 sec/batch
Epoch: 4/20...  Training Step: 15230...  Training loss: 1.7515...  0.2460 sec/batch
Epoch: 4/20...  Training Step: 15231...  Training loss: 1.6509...  0.1964 sec/batch
Epoch: 4/20...  Training Step: 15232...  Training loss: 1.6135...  0.1854 sec/batch

Epoch: 4/20...  Training Step: 15319...  Training loss: 1.7348...  0.2199 sec/batch
Epoch: 4/20...  Training Step: 15320...  Training loss: 1.6740...  0.1899 sec/batch
Epoch: 4/20...  Training Step: 15321...  Training loss: 1.6520...  0.1783 sec/batch
Epoch: 4/20...  Training Step: 15322...  Training loss: 1.5927...  0.2735 sec/batch
Epoch: 4/20...  Training Step: 15323...  Training loss: 1.5955...  0.2230 sec/batch
Epoch: 4/20...  Training Step: 15324...  Training loss: 1.6151...  0.2088 sec/batch
Epoch: 4/20...  Training Step: 15325...  Training loss: 1.6560...  0.2032 sec/batch
Epoch: 4/20...  Training Step: 15326...  Training loss: 1.7105...  0.2092 sec/batch
Epoch: 4/20...  Training Step: 15327...  Training loss: 1.7974...  0.2148 sec/batch
Epoch: 4/20...  Training Step: 15328...  Training loss: 1.7437...  0.1758 sec/batch
Epoch: 4/20...  Training Step: 15329...  Training loss: 1.7649...  0.2286 sec/batch
Epoch: 4/20...  Training Step: 15330...  Training loss: 1.8069...  0.1798 sec/batch

Epoch: 4/20...  Training Step: 15417...  Training loss: 1.6258...  0.2081 sec/batch
Epoch: 4/20...  Training Step: 15418...  Training loss: 1.6656...  0.2252 sec/batch
Epoch: 4/20...  Training Step: 15419...  Training loss: 1.6072...  0.1628 sec/batch
Epoch: 4/20...  Training Step: 15420...  Training loss: 1.6900...  0.2051 sec/batch
Epoch: 4/20...  Training Step: 15421...  Training loss: 1.7808...  0.1719 sec/batch
Epoch: 4/20...  Training Step: 15422...  Training loss: 1.6545...  0.2561 sec/batch
Epoch: 4/20...  Training Step: 15423...  Training loss: 1.7668...  0.1742 sec/batch
Epoch: 4/20...  Training Step: 15424...  Training loss: 1.6105...  0.2028 sec/batch
Epoch: 4/20...  Training Step: 15425...  Training loss: 1.5300...  0.2342 sec/batch
Epoch: 4/20...  Training Step: 15426...  Training loss: 1.7390...  0.1965 sec/batch
Epoch: 4/20...  Training Step: 15427...  Training loss: 1.6340...  0.1960 sec/batch
Epoch: 4/20...  Training Step: 15428...  Training loss: 1.7544...  0.2219 sec/batch

Epoch: 4/20...  Training Step: 15514...  Training loss: 1.8210...  0.1974 sec/batch
Epoch: 4/20...  Training Step: 15515...  Training loss: 1.7006...  0.1840 sec/batch
Epoch: 4/20...  Training Step: 15516...  Training loss: 1.6824...  0.2127 sec/batch
Epoch: 4/20...  Training Step: 15517...  Training loss: 1.7732...  0.2062 sec/batch
Epoch: 4/20...  Training Step: 15518...  Training loss: 1.7116...  0.2032 sec/batch
Epoch: 4/20...  Training Step: 15519...  Training loss: 1.7878...  0.1974 sec/batch
Epoch: 4/20...  Training Step: 15520...  Training loss: 1.7497...  0.2207 sec/batch
Epoch: 4/20...  Training Step: 15521...  Training loss: 1.7367...  0.2233 sec/batch
Epoch: 4/20...  Training Step: 15522...  Training loss: 1.7207...  0.1971 sec/batch
Epoch: 4/20...  Training Step: 15523...  Training loss: 1.6623...  0.1826 sec/batch
Epoch: 4/20...  Training Step: 15524...  Training loss: 1.7021...  0.2223 sec/batch
Epoch: 4/20...  Training Step: 15525...  Training loss: 1.7308...  0.2007 sec/batch

Epoch: 4/20...  Training Step: 15612...  Training loss: 1.7379...  0.2170 sec/batch
Epoch: 4/20...  Training Step: 15613...  Training loss: 1.7086...  0.1971 sec/batch
Epoch: 4/20...  Training Step: 15614...  Training loss: 1.8306...  0.1816 sec/batch
Epoch: 4/20...  Training Step: 15615...  Training loss: 1.6458...  0.2283 sec/batch
Epoch: 4/20...  Training Step: 15616...  Training loss: 1.5893...  0.1720 sec/batch
Epoch: 4/20...  Training Step: 15617...  Training loss: 1.6117...  0.2079 sec/batch
Epoch: 4/20...  Training Step: 15618...  Training loss: 1.7506...  0.2065 sec/batch
Epoch: 4/20...  Training Step: 15619...  Training loss: 1.6997...  0.2210 sec/batch
Epoch: 4/20...  Training Step: 15620...  Training loss: 1.7892...  0.1979 sec/batch
Epoch: 4/20...  Training Step: 15621...  Training loss: 1.8165...  0.1757 sec/batch
Epoch: 4/20...  Training Step: 15622...  Training loss: 1.8814...  0.1988 sec/batch
Epoch: 4/20...  Training Step: 15623...  Training loss: 1.7958...  0.1996 sec/batch

Epoch: 4/20...  Training Step: 15710...  Training loss: 1.6390...  0.2151 sec/batch
Epoch: 4/20...  Training Step: 15711...  Training loss: 1.7422...  0.2425 sec/batch
Epoch: 4/20...  Training Step: 15712...  Training loss: 1.6765...  0.1859 sec/batch
Epoch: 4/20...  Training Step: 15713...  Training loss: 1.6666...  0.2095 sec/batch
Epoch: 4/20...  Training Step: 15714...  Training loss: 1.6550...  0.1863 sec/batch
Epoch: 4/20...  Training Step: 15715...  Training loss: 1.7648...  0.2142 sec/batch
Epoch: 4/20...  Training Step: 15716...  Training loss: 1.5789...  0.2055 sec/batch
Epoch: 4/20...  Training Step: 15717...  Training loss: 1.7208...  0.1889 sec/batch
Epoch: 4/20...  Training Step: 15718...  Training loss: 2.0994...  0.2295 sec/batch
Epoch: 4/20...  Training Step: 15719...  Training loss: 1.7676...  0.1628 sec/batch
Epoch: 4/20...  Training Step: 15720...  Training loss: 1.7847...  0.2743 sec/batch
Epoch: 4/20...  Training Step: 15721...  Training loss: 1.6194...  0.1919 sec/batch

Epoch: 4/20...  Training Step: 15809...  Training loss: 1.8176...  0.2346 sec/batch
Epoch: 4/20...  Training Step: 15810...  Training loss: 1.8100...  0.1961 sec/batch
Epoch: 4/20...  Training Step: 15811...  Training loss: 1.7676...  0.1776 sec/batch
Epoch: 4/20...  Training Step: 15812...  Training loss: 2.0141...  0.1805 sec/batch
Epoch: 4/20...  Training Step: 15813...  Training loss: 1.8822...  0.2209 sec/batch
Epoch: 4/20...  Training Step: 15814...  Training loss: 1.8474...  0.1864 sec/batch
Epoch: 4/20...  Training Step: 15815...  Training loss: 2.0884...  0.2327 sec/batch
Epoch: 4/20...  Training Step: 15816...  Training loss: 2.0020...  0.1826 sec/batch
Epoch: 4/20...  Training Step: 15817...  Training loss: 1.8563...  0.2204 sec/batch
Epoch: 4/20...  Training Step: 15818...  Training loss: 1.8102...  0.1951 sec/batch
Epoch: 4/20...  Training Step: 15819...  Training loss: 1.9432...  0.1725 sec/batch
Epoch: 4/20...  Training Step: 15820...  Training loss: 1.9913...  0.2141 sec/batch

Epoch: 5/20...  Training Step: 15907...  Training loss: 1.8478...  0.2180 sec/batch
Epoch: 5/20...  Training Step: 15908...  Training loss: 1.9273...  0.1992 sec/batch
Epoch: 5/20...  Training Step: 15909...  Training loss: 1.8852...  0.2260 sec/batch
Epoch: 5/20...  Training Step: 15910...  Training loss: 1.6613...  0.2024 sec/batch
Epoch: 5/20...  Training Step: 15911...  Training loss: 1.8136...  0.1868 sec/batch
Epoch: 5/20...  Training Step: 15912...  Training loss: 1.8669...  0.2208 sec/batch
Epoch: 5/20...  Training Step: 15913...  Training loss: 1.6558...  0.1937 sec/batch
Epoch: 5/20...  Training Step: 15914...  Training loss: 1.8059...  0.2219 sec/batch
Epoch: 5/20...  Training Step: 15915...  Training loss: 1.6937...  0.1752 sec/batch
Epoch: 5/20...  Training Step: 15916...  Training loss: 1.6053...  0.2193 sec/batch
Epoch: 5/20...  Training Step: 15917...  Training loss: 1.6728...  0.1972 sec/batch
Epoch: 5/20...  Training Step: 15918...  Training loss: 1.7979...  0.1985 sec/batch

Epoch: 5/20...  Training Step: 16005...  Training loss: 1.6662...  0.2383 sec/batch
Epoch: 5/20...  Training Step: 16006...  Training loss: 1.7173...  0.2026 sec/batch
Epoch: 5/20...  Training Step: 16007...  Training loss: 1.6838...  0.2029 sec/batch
Epoch: 5/20...  Training Step: 16008...  Training loss: 1.6058...  0.2558 sec/batch
Epoch: 5/20...  Training Step: 16009...  Training loss: 1.7836...  0.1825 sec/batch
Epoch: 5/20...  Training Step: 16010...  Training loss: 1.6237...  0.1826 sec/batch
Epoch: 5/20...  Training Step: 16011...  Training loss: 1.6655...  0.2223 sec/batch
Epoch: 5/20...  Training Step: 16012...  Training loss: 1.7104...  0.1824 sec/batch
Epoch: 5/20...  Training Step: 16013...  Training loss: 1.7538...  0.1873 sec/batch
Epoch: 5/20...  Training Step: 16014...  Training loss: 1.7006...  0.2214 sec/batch
Epoch: 5/20...  Training Step: 16015...  Training loss: 1.7759...  0.1863 sec/batch
Epoch: 5/20...  Training Step: 16016...  Training loss: 1.5607...  0.2316 sec/batch

Epoch: 5/20...  Training Step: 16104...  Training loss: 1.8074...  0.2067 sec/batch
Epoch: 5/20...  Training Step: 16105...  Training loss: 1.6803...  0.1582 sec/batch
Epoch: 5/20...  Training Step: 16106...  Training loss: 1.7742...  0.2050 sec/batch
Epoch: 5/20...  Training Step: 16107...  Training loss: 1.8461...  0.1823 sec/batch
Epoch: 5/20...  Training Step: 16108...  Training loss: 1.8377...  0.2090 sec/batch
Epoch: 5/20...  Training Step: 16109...  Training loss: 1.6361...  0.2081 sec/batch
Epoch: 5/20...  Training Step: 16110...  Training loss: 1.6069...  0.2165 sec/batch
Epoch: 5/20...  Training Step: 16111...  Training loss: 1.6646...  0.2001 sec/batch
Epoch: 5/20...  Training Step: 16112...  Training loss: 1.7313...  0.1507 sec/batch
Epoch: 5/20...  Training Step: 16113...  Training loss: 1.7179...  0.2036 sec/batch
Epoch: 5/20...  Training Step: 16114...  Training loss: 1.6789...  0.1716 sec/batch
Epoch: 5/20...  Training Step: 16115...  Training loss: 1.7492...  0.2029 sec/batch

Epoch: 5/20...  Training Step: 16203...  Training loss: 1.8230...  0.1655 sec/batch
Epoch: 5/20...  Training Step: 16204...  Training loss: 1.6993...  0.2014 sec/batch
Epoch: 5/20...  Training Step: 16205...  Training loss: 1.6595...  0.1666 sec/batch
Epoch: 5/20...  Training Step: 16206...  Training loss: 1.8594...  0.1889 sec/batch
Epoch: 5/20...  Training Step: 16207...  Training loss: 1.7589...  0.1950 sec/batch
Epoch: 5/20...  Training Step: 16208...  Training loss: 1.6731...  0.2057 sec/batch
Epoch: 5/20...  Training Step: 16209...  Training loss: 1.8133...  0.1935 sec/batch
Epoch: 5/20...  Training Step: 16210...  Training loss: 1.6999...  0.2245 sec/batch
Epoch: 5/20...  Training Step: 16211...  Training loss: 1.8071...  0.1741 sec/batch
Epoch: 5/20...  Training Step: 16212...  Training loss: 1.7474...  0.2163 sec/batch
Epoch: 5/20...  Training Step: 16213...  Training loss: 1.6787...  0.2009 sec/batch
Epoch: 5/20...  Training Step: 16214...  Training loss: 1.8105...  0.1693 sec/batch

Epoch: 5/20...  Training Step: 16302...  Training loss: 1.7359...  0.2441 sec/batch
Epoch: 5/20...  Training Step: 16303...  Training loss: 1.6432...  0.2139 sec/batch
Epoch: 5/20...  Training Step: 16304...  Training loss: 1.6832...  0.2137 sec/batch
Epoch: 5/20...  Training Step: 16305...  Training loss: 1.7354...  0.1729 sec/batch
Epoch: 5/20...  Training Step: 16306...  Training loss: 1.7111...  0.2377 sec/batch
Epoch: 5/20...  Training Step: 16307...  Training loss: 1.7606...  0.1782 sec/batch
Epoch: 5/20...  Training Step: 16308...  Training loss: 1.7076...  0.2624 sec/batch
Epoch: 5/20...  Training Step: 16309...  Training loss: 1.6828...  0.2025 sec/batch
Epoch: 5/20...  Training Step: 16310...  Training loss: 1.6420...  0.2246 sec/batch
Epoch: 5/20...  Training Step: 16311...  Training loss: 1.6486...  0.1623 sec/batch
Epoch: 5/20...  Training Step: 16312...  Training loss: 1.7070...  0.2168 sec/batch
Epoch: 5/20...  Training Step: 16313...  Training loss: 1.6167...  0.2143 sec/batch

Epoch: 5/20...  Training Step: 16401...  Training loss: 1.6822...  0.2308 sec/batch
Epoch: 5/20...  Training Step: 16402...  Training loss: 1.8058...  0.1913 sec/batch
Epoch: 5/20...  Training Step: 16403...  Training loss: 1.7394...  0.2478 sec/batch
Epoch: 5/20...  Training Step: 16404...  Training loss: 1.7078...  0.1944 sec/batch
Epoch: 5/20...  Training Step: 16405...  Training loss: 1.7488...  0.1989 sec/batch
Epoch: 5/20...  Training Step: 16406...  Training loss: 1.6718...  0.1921 sec/batch
Epoch: 5/20...  Training Step: 16407...  Training loss: 1.6974...  0.2316 sec/batch
Epoch: 5/20...  Training Step: 16408...  Training loss: 1.7340...  0.2087 sec/batch
Epoch: 5/20...  Training Step: 16409...  Training loss: 1.7509...  0.1828 sec/batch
Epoch: 5/20...  Training Step: 16410...  Training loss: 1.7281...  0.1956 sec/batch
Epoch: 5/20...  Training Step: 16411...  Training loss: 1.8184...  0.1645 sec/batch
Epoch: 5/20...  Training Step: 16412...  Training loss: 1.6846...  0.2152 sec/batch

Epoch: 5/20...  Training Step: 16499...  Training loss: 1.6551...  0.2137 sec/batch
Epoch: 5/20...  Training Step: 16500...  Training loss: 1.5725...  0.1700 sec/batch
Epoch: 5/20...  Training Step: 16501...  Training loss: 1.5953...  0.2226 sec/batch
Epoch: 5/20...  Training Step: 16502...  Training loss: 1.6695...  0.1593 sec/batch
Epoch: 5/20...  Training Step: 16503...  Training loss: 1.7245...  0.2108 sec/batch
Epoch: 5/20...  Training Step: 16504...  Training loss: 1.6780...  0.1796 sec/batch
Epoch: 5/20...  Training Step: 16505...  Training loss: 1.5767...  0.2294 sec/batch
Epoch: 5/20...  Training Step: 16506...  Training loss: 1.6815...  0.1959 sec/batch
Epoch: 5/20...  Training Step: 16507...  Training loss: 1.6358...  0.2057 sec/batch
Epoch: 5/20...  Training Step: 16508...  Training loss: 1.5903...  0.2065 sec/batch
Epoch: 5/20...  Training Step: 16509...  Training loss: 1.6467...  0.2274 sec/batch
Epoch: 5/20...  Training Step: 16510...  Training loss: 1.8464...  0.1720 sec/batch

Epoch: 5/20...  Training Step: 16597...  Training loss: 1.6697...  0.2369 sec/batch
Epoch: 5/20...  Training Step: 16598...  Training loss: 1.5900...  0.1910 sec/batch
Epoch: 5/20...  Training Step: 16599...  Training loss: 1.6085...  0.1692 sec/batch
Epoch: 5/20...  Training Step: 16600...  Training loss: 1.7557...  0.2207 sec/batch
Epoch: 5/20...  Training Step: 16601...  Training loss: 1.6884...  0.1550 sec/batch
Epoch: 5/20...  Training Step: 16602...  Training loss: 1.5633...  0.1516 sec/batch
Epoch: 5/20...  Training Step: 16603...  Training loss: 1.6645...  0.1723 sec/batch
Epoch: 5/20...  Training Step: 16604...  Training loss: 1.7161...  0.2053 sec/batch
Epoch: 5/20...  Training Step: 16605...  Training loss: 1.6365...  0.1929 sec/batch
Epoch: 5/20...  Training Step: 16606...  Training loss: 1.6394...  0.2143 sec/batch
Epoch: 5/20...  Training Step: 16607...  Training loss: 1.7181...  0.1970 sec/batch
Epoch: 5/20...  Training Step: 16608...  Training loss: 1.6580...  0.2066 sec/batch

Epoch: 5/20...  Training Step: 16696...  Training loss: 1.6238...  0.2012 sec/batch
Epoch: 5/20...  Training Step: 16697...  Training loss: 1.5494...  0.1693 sec/batch
Epoch: 5/20...  Training Step: 16698...  Training loss: 1.6217...  0.1800 sec/batch
Epoch: 5/20...  Training Step: 16699...  Training loss: 1.6662...  0.2375 sec/batch
Epoch: 5/20...  Training Step: 16700...  Training loss: 1.7795...  0.1677 sec/batch
Epoch: 5/20...  Training Step: 16701...  Training loss: 1.7781...  0.2091 sec/batch
Epoch: 5/20...  Training Step: 16702...  Training loss: 1.7873...  0.1997 sec/batch
Epoch: 5/20...  Training Step: 16703...  Training loss: 1.6806...  0.1831 sec/batch
Epoch: 5/20...  Training Step: 16704...  Training loss: 1.5703...  0.1910 sec/batch
Epoch: 5/20...  Training Step: 16705...  Training loss: 1.6019...  0.1853 sec/batch
Epoch: 5/20...  Training Step: 16706...  Training loss: 1.6733...  0.1804 sec/batch
Epoch: 5/20...  Training Step: 16707...  Training loss: 1.7018...  0.1710 sec/batch

Epoch: 5/20...  Training Step: 16795...  Training loss: 1.6880...  0.2111 sec/batch
Epoch: 5/20...  Training Step: 16796...  Training loss: 1.7699...  0.1815 sec/batch
Epoch: 5/20...  Training Step: 16797...  Training loss: 1.6857...  0.1830 sec/batch
Epoch: 5/20...  Training Step: 16798...  Training loss: 1.6100...  0.1788 sec/batch
Epoch: 5/20...  Training Step: 16799...  Training loss: 1.5816...  0.2077 sec/batch
Epoch: 5/20...  Training Step: 16800...  Training loss: 1.6194...  0.1704 sec/batch
Epoch: 5/20...  Training Step: 16801...  Training loss: 1.6128...  0.2013 sec/batch
Epoch: 5/20...  Training Step: 16802...  Training loss: 1.7768...  0.1490 sec/batch
Epoch: 5/20...  Training Step: 16803...  Training loss: 1.7953...  0.1847 sec/batch
Epoch: 5/20...  Training Step: 16804...  Training loss: 1.7248...  0.1749 sec/batch
Epoch: 5/20...  Training Step: 16805...  Training loss: 1.7949...  0.2095 sec/batch
Epoch: 5/20...  Training Step: 16806...  Training loss: 1.7446...  0.2245 sec/batch

Epoch: 5/20...  Training Step: 16893...  Training loss: 1.6635...  0.2351 sec/batch
Epoch: 5/20...  Training Step: 16894...  Training loss: 1.7392...  0.1825 sec/batch
Epoch: 5/20...  Training Step: 16895...  Training loss: 1.5412...  0.1930 sec/batch
Epoch: 5/20...  Training Step: 16896...  Training loss: 1.7548...  0.2198 sec/batch
Epoch: 5/20...  Training Step: 16897...  Training loss: 1.7534...  0.1968 sec/batch
Epoch: 5/20...  Training Step: 16898...  Training loss: 1.7260...  0.1835 sec/batch
Epoch: 5/20...  Training Step: 16899...  Training loss: 1.7635...  0.1907 sec/batch
Epoch: 5/20...  Training Step: 16900...  Training loss: 1.5062...  0.1898 sec/batch
Epoch: 5/20...  Training Step: 16901...  Training loss: 1.6220...  0.1913 sec/batch
Epoch: 5/20...  Training Step: 16902...  Training loss: 1.6766...  0.1719 sec/batch
Epoch: 5/20...  Training Step: 16903...  Training loss: 1.6141...  0.1617 sec/batch
Epoch: 5/20...  Training Step: 16904...  Training loss: 1.6791...  0.1940 sec/batch

Epoch: 5/20...  Training Step: 16991...  Training loss: 1.6829...  0.2024 sec/batch
Epoch: 5/20...  Training Step: 16992...  Training loss: 1.7019...  0.2209 sec/batch
Epoch: 5/20...  Training Step: 16993...  Training loss: 1.5055...  0.1689 sec/batch
Epoch: 5/20...  Training Step: 16994...  Training loss: 1.6564...  0.1973 sec/batch
Epoch: 5/20...  Training Step: 16995...  Training loss: 1.6885...  0.2817 sec/batch
Epoch: 5/20...  Training Step: 16996...  Training loss: 1.7962...  0.1797 sec/batch
Epoch: 5/20...  Training Step: 16997...  Training loss: 1.6470...  0.1741 sec/batch
Epoch: 5/20...  Training Step: 16998...  Training loss: 1.7177...  0.2177 sec/batch
Epoch: 5/20...  Training Step: 16999...  Training loss: 1.5924...  0.2016 sec/batch
Epoch: 5/20...  Training Step: 17000...  Training loss: 1.5069...  0.2278 sec/batch
Epoch: 5/20...  Training Step: 17001...  Training loss: 1.7759...  0.2003 sec/batch
Epoch: 5/20...  Training Step: 17002...  Training loss: 1.6942...  0.1599 sec/batch

Epoch: 5/20...  Training Step: 17089...  Training loss: 1.8027...  0.2322 sec/batch
Epoch: 5/20...  Training Step: 17090...  Training loss: 1.7288...  0.1492 sec/batch
Epoch: 5/20...  Training Step: 17091...  Training loss: 1.6752...  0.2024 sec/batch
Epoch: 5/20...  Training Step: 17092...  Training loss: 1.7484...  0.1702 sec/batch
Epoch: 5/20...  Training Step: 17093...  Training loss: 1.9067...  0.2235 sec/batch
Epoch: 5/20...  Training Step: 17094...  Training loss: 1.6558...  0.2163 sec/batch
Epoch: 5/20...  Training Step: 17095...  Training loss: 1.6129...  0.1777 sec/batch
Epoch: 5/20...  Training Step: 17096...  Training loss: 1.8178...  0.2132 sec/batch
Epoch: 5/20...  Training Step: 17097...  Training loss: 1.8433...  0.1876 sec/batch
Epoch: 5/20...  Training Step: 17098...  Training loss: 1.7636...  0.2291 sec/batch
Epoch: 5/20...  Training Step: 17099...  Training loss: 1.6975...  0.1867 sec/batch
Epoch: 5/20...  Training Step: 17100...  Training loss: 1.7100...  0.1983 sec/batch

Epoch: 5/20...  Training Step: 17188...  Training loss: 1.7522...  0.2153 sec/batch
Epoch: 5/20...  Training Step: 17189...  Training loss: 1.7046...  0.2051 sec/batch
Epoch: 5/20...  Training Step: 17190...  Training loss: 1.7795...  0.1933 sec/batch
Epoch: 5/20...  Training Step: 17191...  Training loss: 1.7014...  0.1739 sec/batch
Epoch: 5/20...  Training Step: 17192...  Training loss: 1.7300...  0.1960 sec/batch
Epoch: 5/20...  Training Step: 17193...  Training loss: 1.6356...  0.2294 sec/batch
Epoch: 5/20...  Training Step: 17194...  Training loss: 1.7724...  0.1744 sec/batch
Epoch: 5/20...  Training Step: 17195...  Training loss: 1.8332...  0.2238 sec/batch
Epoch: 5/20...  Training Step: 17196...  Training loss: 1.7738...  0.1674 sec/batch
Epoch: 5/20...  Training Step: 17197...  Training loss: 1.8018...  0.2577 sec/batch
Epoch: 5/20...  Training Step: 17198...  Training loss: 1.6543...  0.1598 sec/batch
Epoch: 5/20...  Training Step: 17199...  Training loss: 1.6537...  0.2695 sec/batch

Epoch: 5/20...  Training Step: 17286...  Training loss: 1.5544...  0.1992 sec/batch
Epoch: 5/20...  Training Step: 17287...  Training loss: 1.7200...  0.1999 sec/batch
Epoch: 5/20...  Training Step: 17288...  Training loss: 1.7398...  0.2788 sec/batch
Epoch: 5/20...  Training Step: 17289...  Training loss: 1.7321...  0.2120 sec/batch
Epoch: 5/20...  Training Step: 17290...  Training loss: 1.7423...  0.1789 sec/batch
Epoch: 5/20...  Training Step: 17291...  Training loss: 1.5941...  0.1613 sec/batch
Epoch: 5/20...  Training Step: 17292...  Training loss: 1.7413...  0.2327 sec/batch
Epoch: 5/20...  Training Step: 17293...  Training loss: 1.6549...  0.2012 sec/batch
Epoch: 5/20...  Training Step: 17294...  Training loss: 1.9899...  0.2330 sec/batch
Epoch: 5/20...  Training Step: 17295...  Training loss: 1.7857...  0.1950 sec/batch
Epoch: 5/20...  Training Step: 17296...  Training loss: 1.7596...  0.1940 sec/batch
Epoch: 5/20...  Training Step: 17297...  Training loss: 1.8229...  0.1981 sec/batch

Epoch: 5/20...  Training Step: 17384...  Training loss: 1.9322...  0.2093 sec/batch
Epoch: 5/20...  Training Step: 17385...  Training loss: 1.5723...  0.1620 sec/batch
Epoch: 5/20...  Training Step: 17386...  Training loss: 1.5906...  0.2069 sec/batch
Epoch: 5/20...  Training Step: 17387...  Training loss: 1.7086...  0.2098 sec/batch
Epoch: 5/20...  Training Step: 17388...  Training loss: 1.6321...  0.1959 sec/batch
Epoch: 5/20...  Training Step: 17389...  Training loss: 1.7327...  0.2104 sec/batch
Epoch: 5/20...  Training Step: 17390...  Training loss: 1.6501...  0.1542 sec/batch
Epoch: 5/20...  Training Step: 17391...  Training loss: 1.6610...  0.2145 sec/batch
Epoch: 5/20...  Training Step: 17392...  Training loss: 1.5694...  0.1795 sec/batch
Epoch: 5/20...  Training Step: 17393...  Training loss: 1.6195...  0.1940 sec/batch
Epoch: 5/20...  Training Step: 17394...  Training loss: 1.6231...  0.1705 sec/batch
Epoch: 5/20...  Training Step: 17395...  Training loss: 1.6569...  0.1945 sec/batch

Epoch: 5/20...  Training Step: 17482...  Training loss: 1.6652...  0.1890 sec/batch
Epoch: 5/20...  Training Step: 17483...  Training loss: 1.7259...  0.2121 sec/batch
Epoch: 5/20...  Training Step: 17484...  Training loss: 1.8325...  0.2218 sec/batch
Epoch: 5/20...  Training Step: 17485...  Training loss: 1.9012...  0.1907 sec/batch
Epoch: 5/20...  Training Step: 17486...  Training loss: 1.7623...  0.2320 sec/batch
Epoch: 5/20...  Training Step: 17487...  Training loss: 1.9134...  0.2084 sec/batch
Epoch: 5/20...  Training Step: 17488...  Training loss: 1.9254...  0.2562 sec/batch
Epoch: 5/20...  Training Step: 17489...  Training loss: 1.8086...  0.1822 sec/batch
Epoch: 5/20...  Training Step: 17490...  Training loss: 1.7513...  0.1668 sec/batch
Epoch: 5/20...  Training Step: 17491...  Training loss: 1.9008...  0.2078 sec/batch
Epoch: 5/20...  Training Step: 17492...  Training loss: 1.7131...  0.2032 sec/batch
Epoch: 5/20...  Training Step: 17493...  Training loss: 1.7738...  0.1887 sec/batch

Epoch: 5/20...  Training Step: 17580...  Training loss: 1.7566...  0.1878 sec/batch
Epoch: 5/20...  Training Step: 17581...  Training loss: 1.6261...  0.1889 sec/batch
Epoch: 5/20...  Training Step: 17582...  Training loss: 1.6749...  0.1950 sec/batch
Epoch: 5/20...  Training Step: 17583...  Training loss: 1.7241...  0.1919 sec/batch
Epoch: 5/20...  Training Step: 17584...  Training loss: 1.7800...  0.2524 sec/batch
Epoch: 5/20...  Training Step: 17585...  Training loss: 1.8359...  0.2098 sec/batch
Epoch: 5/20...  Training Step: 17586...  Training loss: 1.8488...  0.2212 sec/batch
Epoch: 5/20...  Training Step: 17587...  Training loss: 1.7084...  0.1695 sec/batch
Epoch: 5/20...  Training Step: 17588...  Training loss: 1.6653...  0.1984 sec/batch
Epoch: 5/20...  Training Step: 17589...  Training loss: 1.8104...  0.2019 sec/batch
Epoch: 5/20...  Training Step: 17590...  Training loss: 1.7024...  0.1763 sec/batch
Epoch: 5/20...  Training Step: 17591...  Training loss: 1.6956...  0.1749 sec/batch

Epoch: 5/20...  Training Step: 17678...  Training loss: 1.6300...  0.1926 sec/batch
Epoch: 5/20...  Training Step: 17679...  Training loss: 1.6150...  0.1922 sec/batch
Epoch: 5/20...  Training Step: 17680...  Training loss: 1.6866...  0.2192 sec/batch
Epoch: 5/20...  Training Step: 17681...  Training loss: 1.6889...  0.2051 sec/batch
Epoch: 5/20...  Training Step: 17682...  Training loss: 1.7058...  0.1964 sec/batch
Epoch: 5/20...  Training Step: 17683...  Training loss: 1.7454...  0.1838 sec/batch
Epoch: 5/20...  Training Step: 17684...  Training loss: 1.6194...  0.1845 sec/batch
Epoch: 5/20...  Training Step: 17685...  Training loss: 1.5358...  0.2330 sec/batch
Epoch: 5/20...  Training Step: 17686...  Training loss: 1.6207...  0.1548 sec/batch
Epoch: 5/20...  Training Step: 17687...  Training loss: 1.5815...  0.2176 sec/batch
Epoch: 5/20...  Training Step: 17688...  Training loss: 1.7073...  0.1659 sec/batch
Epoch: 5/20...  Training Step: 17689...  Training loss: 1.6366...  0.2486 sec/batch

Epoch: 5/20...  Training Step: 17776...  Training loss: 1.6659...  0.2336 sec/batch
Epoch: 5/20...  Training Step: 17777...  Training loss: 1.7688...  0.2277 sec/batch
Epoch: 5/20...  Training Step: 17778...  Training loss: 1.8055...  0.2045 sec/batch
Epoch: 5/20...  Training Step: 17779...  Training loss: 1.6778...  0.1947 sec/batch
Epoch: 5/20...  Training Step: 17780...  Training loss: 1.5449...  0.1450 sec/batch
Epoch: 5/20...  Training Step: 17781...  Training loss: 1.7502...  0.2311 sec/batch
Epoch: 5/20...  Training Step: 17782...  Training loss: 1.7828...  0.1938 sec/batch
Epoch: 5/20...  Training Step: 17783...  Training loss: 1.6626...  0.2219 sec/batch
Epoch: 5/20...  Training Step: 17784...  Training loss: 1.5271...  0.1677 sec/batch
Epoch: 5/20...  Training Step: 17785...  Training loss: 1.5934...  0.1874 sec/batch
Epoch: 5/20...  Training Step: 17786...  Training loss: 1.7545...  0.1840 sec/batch
Epoch: 5/20...  Training Step: 17787...  Training loss: 1.7002...  0.2498 sec/batch

Epoch: 5/20...  Training Step: 17875...  Training loss: 1.7463...  0.1744 sec/batch
Epoch: 5/20...  Training Step: 17876...  Training loss: 1.7433...  0.2219 sec/batch
Epoch: 5/20...  Training Step: 17877...  Training loss: 1.7302...  0.2172 sec/batch
Epoch: 5/20...  Training Step: 17878...  Training loss: 1.6929...  0.2195 sec/batch
Epoch: 5/20...  Training Step: 17879...  Training loss: 1.6914...  0.2132 sec/batch
Epoch: 5/20...  Training Step: 17880...  Training loss: 1.7438...  0.2019 sec/batch
Epoch: 5/20...  Training Step: 17881...  Training loss: 1.6070...  0.2281 sec/batch
Epoch: 5/20...  Training Step: 17882...  Training loss: 1.6000...  0.1918 sec/batch
Epoch: 5/20...  Training Step: 17883...  Training loss: 1.5729...  0.1688 sec/batch
Epoch: 5/20...  Training Step: 17884...  Training loss: 1.5626...  0.1893 sec/batch
Epoch: 5/20...  Training Step: 17885...  Training loss: 1.7275...  0.2112 sec/batch
Epoch: 5/20...  Training Step: 17886...  Training loss: 1.5713...  0.1863 sec/batch

Epoch: 5/20...  Training Step: 17973...  Training loss: 1.6763...  0.2042 sec/batch
Epoch: 5/20...  Training Step: 17974...  Training loss: 1.6473...  0.1939 sec/batch
Epoch: 5/20...  Training Step: 17975...  Training loss: 1.6850...  0.1909 sec/batch
Epoch: 5/20...  Training Step: 17976...  Training loss: 1.7178...  0.2150 sec/batch
Epoch: 5/20...  Training Step: 17977...  Training loss: 1.7236...  0.1995 sec/batch
Epoch: 5/20...  Training Step: 17978...  Training loss: 1.7034...  0.1686 sec/batch
Epoch: 5/20...  Training Step: 17979...  Training loss: 1.5539...  0.1966 sec/batch
Epoch: 5/20...  Training Step: 17980...  Training loss: 1.6637...  0.1926 sec/batch
Epoch: 5/20...  Training Step: 17981...  Training loss: 1.5417...  0.2076 sec/batch
Epoch: 5/20...  Training Step: 17982...  Training loss: 1.6737...  0.2336 sec/batch
Epoch: 5/20...  Training Step: 17983...  Training loss: 1.8777...  0.1671 sec/batch
Epoch: 5/20...  Training Step: 17984...  Training loss: 1.7381...  0.1975 sec/batch

Epoch: 5/20...  Training Step: 18072...  Training loss: 1.6388...  0.1707 sec/batch
Epoch: 5/20...  Training Step: 18073...  Training loss: 1.5649...  0.1789 sec/batch
Epoch: 5/20...  Training Step: 18074...  Training loss: 1.6227...  0.2358 sec/batch
Epoch: 5/20...  Training Step: 18075...  Training loss: 1.6565...  0.1707 sec/batch
Epoch: 5/20...  Training Step: 18076...  Training loss: 1.7521...  0.2607 sec/batch
Epoch: 5/20...  Training Step: 18077...  Training loss: 1.6642...  0.1859 sec/batch
Epoch: 5/20...  Training Step: 18078...  Training loss: 1.7808...  0.1700 sec/batch
Epoch: 5/20...  Training Step: 18079...  Training loss: 1.6135...  0.1977 sec/batch
Epoch: 5/20...  Training Step: 18080...  Training loss: 1.6674...  0.1806 sec/batch
Epoch: 5/20...  Training Step: 18081...  Training loss: 1.8078...  0.2192 sec/batch
Epoch: 5/20...  Training Step: 18082...  Training loss: 1.8150...  0.2076 sec/batch
Epoch: 5/20...  Training Step: 18083...  Training loss: 1.7612...  0.1771 sec/batch

Epoch: 5/20...  Training Step: 18170...  Training loss: 1.7140...  0.1595 sec/batch
Epoch: 5/20...  Training Step: 18171...  Training loss: 1.6176...  0.2071 sec/batch
Epoch: 5/20...  Training Step: 18172...  Training loss: 1.7045...  0.2053 sec/batch
Epoch: 5/20...  Training Step: 18173...  Training loss: 1.5686...  0.2092 sec/batch
Epoch: 5/20...  Training Step: 18174...  Training loss: 1.6763...  0.2341 sec/batch
Epoch: 5/20...  Training Step: 18175...  Training loss: 1.7128...  0.2469 sec/batch
Epoch: 5/20...  Training Step: 18176...  Training loss: 1.5531...  0.1422 sec/batch
Epoch: 5/20...  Training Step: 18177...  Training loss: 1.6651...  0.2150 sec/batch
Epoch: 5/20...  Training Step: 18178...  Training loss: 1.6015...  0.1674 sec/batch
Epoch: 5/20...  Training Step: 18179...  Training loss: 1.8626...  0.2620 sec/batch
Epoch: 5/20...  Training Step: 18180...  Training loss: 1.6416...  0.1634 sec/batch
Epoch: 5/20...  Training Step: 18181...  Training loss: 1.7437...  0.2882 sec/batch

Epoch: 5/20...  Training Step: 18268...  Training loss: 1.7609...  0.2042 sec/batch
Epoch: 5/20...  Training Step: 18269...  Training loss: 1.7406...  0.1658 sec/batch
Epoch: 5/20...  Training Step: 18270...  Training loss: 1.7281...  0.2111 sec/batch
Epoch: 5/20...  Training Step: 18271...  Training loss: 1.7851...  0.1655 sec/batch
Epoch: 5/20...  Training Step: 18272...  Training loss: 1.7634...  0.1964 sec/batch
Epoch: 5/20...  Training Step: 18273...  Training loss: 1.8206...  0.2019 sec/batch
Epoch: 5/20...  Training Step: 18274...  Training loss: 1.8091...  0.2178 sec/batch
Epoch: 5/20...  Training Step: 18275...  Training loss: 1.7181...  0.2124 sec/batch
Epoch: 5/20...  Training Step: 18276...  Training loss: 1.6924...  0.1628 sec/batch
Epoch: 5/20...  Training Step: 18277...  Training loss: 1.6720...  0.2038 sec/batch
Epoch: 5/20...  Training Step: 18278...  Training loss: 1.6430...  0.1982 sec/batch
Epoch: 5/20...  Training Step: 18279...  Training loss: 1.6754...  0.2037 sec/batch

Epoch: 5/20...  Training Step: 18366...  Training loss: 1.6372...  0.2052 sec/batch
Epoch: 5/20...  Training Step: 18367...  Training loss: 1.5760...  0.1743 sec/batch
Epoch: 5/20...  Training Step: 18368...  Training loss: 1.5770...  0.1757 sec/batch
Epoch: 5/20...  Training Step: 18369...  Training loss: 1.8356...  0.1878 sec/batch
Epoch: 5/20...  Training Step: 18370...  Training loss: 1.7419...  0.2379 sec/batch
Epoch: 5/20...  Training Step: 18371...  Training loss: 1.7719...  0.2746 sec/batch
Epoch: 5/20...  Training Step: 18372...  Training loss: 1.7166...  0.1708 sec/batch
Epoch: 5/20...  Training Step: 18373...  Training loss: 1.6392...  0.2480 sec/batch
Epoch: 5/20...  Training Step: 18374...  Training loss: 1.7140...  0.1950 sec/batch
Epoch: 5/20...  Training Step: 18375...  Training loss: 1.6163...  0.1571 sec/batch
Epoch: 5/20...  Training Step: 18376...  Training loss: 1.6310...  0.2063 sec/batch
Epoch: 5/20...  Training Step: 18377...  Training loss: 1.7009...  0.2078 sec/batch

Epoch: 5/20...  Training Step: 18464...  Training loss: 1.6171...  0.2242 sec/batch
Epoch: 5/20...  Training Step: 18465...  Training loss: 1.6804...  0.2406 sec/batch
Epoch: 5/20...  Training Step: 18466...  Training loss: 1.8508...  0.1736 sec/batch
Epoch: 5/20...  Training Step: 18467...  Training loss: 1.7869...  0.2410 sec/batch
Epoch: 5/20...  Training Step: 18468...  Training loss: 1.8843...  0.1884 sec/batch
Epoch: 5/20...  Training Step: 18469...  Training loss: 1.7620...  0.2082 sec/batch
Epoch: 5/20...  Training Step: 18470...  Training loss: 1.8017...  0.2549 sec/batch
Epoch: 5/20...  Training Step: 18471...  Training loss: 1.6542...  0.1723 sec/batch
Epoch: 5/20...  Training Step: 18472...  Training loss: 1.7008...  0.2104 sec/batch
Epoch: 5/20...  Training Step: 18473...  Training loss: 1.7416...  0.1771 sec/batch
Epoch: 5/20...  Training Step: 18474...  Training loss: 1.6959...  0.2369 sec/batch
Epoch: 5/20...  Training Step: 18475...  Training loss: 1.6568...  0.1974 sec/batch

Epoch: 5/20...  Training Step: 18563...  Training loss: 1.7756...  0.1564 sec/batch
Epoch: 5/20...  Training Step: 18564...  Training loss: 1.5989...  0.2782 sec/batch
Epoch: 5/20...  Training Step: 18565...  Training loss: 1.6687...  0.1820 sec/batch
Epoch: 5/20...  Training Step: 18566...  Training loss: 1.5873...  0.2115 sec/batch
Epoch: 5/20...  Training Step: 18567...  Training loss: 1.6788...  0.2482 sec/batch
Epoch: 5/20...  Training Step: 18568...  Training loss: 1.6827...  0.2418 sec/batch
Epoch: 5/20...  Training Step: 18569...  Training loss: 1.6746...  0.1927 sec/batch
Epoch: 5/20...  Training Step: 18570...  Training loss: 1.7419...  0.1801 sec/batch
Epoch: 5/20...  Training Step: 18571...  Training loss: 1.7746...  0.1944 sec/batch
Epoch: 5/20...  Training Step: 18572...  Training loss: 1.6963...  0.2198 sec/batch
Epoch: 5/20...  Training Step: 18573...  Training loss: 1.5846...  0.1818 sec/batch
Epoch: 5/20...  Training Step: 18574...  Training loss: 1.7443...  0.1695 sec/batch

Epoch: 5/20...  Training Step: 18661...  Training loss: 1.5903...  0.2132 sec/batch
Epoch: 5/20...  Training Step: 18662...  Training loss: 1.9040...  0.2113 sec/batch
Epoch: 5/20...  Training Step: 18663...  Training loss: 1.5939...  0.1540 sec/batch
Epoch: 5/20...  Training Step: 18664...  Training loss: 1.5793...  0.2087 sec/batch
Epoch: 5/20...  Training Step: 18665...  Training loss: 1.6195...  0.1796 sec/batch
Epoch: 5/20...  Training Step: 18666...  Training loss: 1.8284...  0.2201 sec/batch
Epoch: 5/20...  Training Step: 18667...  Training loss: 1.5998...  0.1698 sec/batch
Epoch: 5/20...  Training Step: 18668...  Training loss: 1.6234...  0.1988 sec/batch
Epoch: 5/20...  Training Step: 18669...  Training loss: 1.5699...  0.1791 sec/batch
Epoch: 5/20...  Training Step: 18670...  Training loss: 1.6719...  0.2271 sec/batch
Epoch: 5/20...  Training Step: 18671...  Training loss: 1.6556...  0.2287 sec/batch
Epoch: 5/20...  Training Step: 18672...  Training loss: 1.5897...  0.1757 sec/batch

Epoch: 5/20...  Training Step: 18759...  Training loss: 1.9076...  0.1708 sec/batch
Epoch: 5/20...  Training Step: 18760...  Training loss: 1.6386...  0.2165 sec/batch
Epoch: 5/20...  Training Step: 18761...  Training loss: 1.8053...  0.1948 sec/batch
Epoch: 5/20...  Training Step: 18762...  Training loss: 1.8669...  0.2003 sec/batch
Epoch: 5/20...  Training Step: 18763...  Training loss: 1.7454...  0.1857 sec/batch
Epoch: 5/20...  Training Step: 18764...  Training loss: 1.7222...  0.2353 sec/batch
Epoch: 5/20...  Training Step: 18765...  Training loss: 1.6209...  0.1877 sec/batch
Epoch: 5/20...  Training Step: 18766...  Training loss: 1.5687...  0.2290 sec/batch
Epoch: 5/20...  Training Step: 18767...  Training loss: 1.6848...  0.1844 sec/batch
Epoch: 5/20...  Training Step: 18768...  Training loss: 1.8006...  0.1996 sec/batch
Epoch: 5/20...  Training Step: 18769...  Training loss: 1.7102...  0.2226 sec/batch
Epoch: 5/20...  Training Step: 18770...  Training loss: 1.6187...  0.1503 sec/batch

Epoch: 5/20...  Training Step: 18858...  Training loss: 1.7619...  0.2086 sec/batch
Epoch: 5/20...  Training Step: 18859...  Training loss: 1.6985...  0.1692 sec/batch
Epoch: 5/20...  Training Step: 18860...  Training loss: 1.8109...  0.2335 sec/batch
Epoch: 5/20...  Training Step: 18861...  Training loss: 1.8332...  0.2207 sec/batch
Epoch: 5/20...  Training Step: 18862...  Training loss: 1.8098...  0.1749 sec/batch
Epoch: 5/20...  Training Step: 18863...  Training loss: 1.5312...  0.2442 sec/batch
Epoch: 5/20...  Training Step: 18864...  Training loss: 1.5326...  0.1523 sec/batch
Epoch: 5/20...  Training Step: 18865...  Training loss: 1.7095...  0.1853 sec/batch
Epoch: 5/20...  Training Step: 18866...  Training loss: 1.7670...  0.2098 sec/batch
Epoch: 5/20...  Training Step: 18867...  Training loss: 1.6244...  0.2355 sec/batch
Epoch: 5/20...  Training Step: 18868...  Training loss: 1.7436...  0.2058 sec/batch
Epoch: 5/20...  Training Step: 18869...  Training loss: 1.7375...  0.2007 sec/batch

Epoch: 5/20...  Training Step: 18956...  Training loss: 1.6981...  0.2156 sec/batch
Epoch: 5/20...  Training Step: 18957...  Training loss: 1.7060...  0.1878 sec/batch
Epoch: 5/20...  Training Step: 18958...  Training loss: 1.6085...  0.1625 sec/batch
Epoch: 5/20...  Training Step: 18959...  Training loss: 1.5926...  0.2038 sec/batch
Epoch: 5/20...  Training Step: 18960...  Training loss: 1.6705...  0.1715 sec/batch
Epoch: 5/20...  Training Step: 18961...  Training loss: 1.6506...  0.1931 sec/batch
Epoch: 5/20...  Training Step: 18962...  Training loss: 1.5714...  0.1918 sec/batch
Epoch: 5/20...  Training Step: 18963...  Training loss: 1.7860...  0.1748 sec/batch
Epoch: 5/20...  Training Step: 18964...  Training loss: 1.6280...  0.1843 sec/batch
Epoch: 5/20...  Training Step: 18965...  Training loss: 1.7516...  0.2558 sec/batch
Epoch: 5/20...  Training Step: 18966...  Training loss: 1.5781...  0.1832 sec/batch
Epoch: 5/20...  Training Step: 18967...  Training loss: 1.6230...  0.2017 sec/batch

Epoch: 5/20...  Training Step: 19054...  Training loss: 1.5368...  0.2198 sec/batch
Epoch: 5/20...  Training Step: 19055...  Training loss: 1.6278...  0.1676 sec/batch
Epoch: 5/20...  Training Step: 19056...  Training loss: 1.7225...  0.1761 sec/batch
Epoch: 5/20...  Training Step: 19057...  Training loss: 1.8677...  0.2036 sec/batch
Epoch: 5/20...  Training Step: 19058...  Training loss: 1.5943...  0.1984 sec/batch
Epoch: 5/20...  Training Step: 19059...  Training loss: 1.7356...  0.1697 sec/batch
Epoch: 5/20...  Training Step: 19060...  Training loss: 1.5341...  0.2209 sec/batch
Epoch: 5/20...  Training Step: 19061...  Training loss: 1.6896...  0.1897 sec/batch
Epoch: 5/20...  Training Step: 19062...  Training loss: 1.6565...  0.2471 sec/batch
Epoch: 5/20...  Training Step: 19063...  Training loss: 1.6922...  0.1608 sec/batch
Epoch: 5/20...  Training Step: 19064...  Training loss: 1.7166...  0.2110 sec/batch
Epoch: 5/20...  Training Step: 19065...  Training loss: 1.7462...  0.1841 sec/batch

Epoch: 5/20...  Training Step: 19153...  Training loss: 1.7099...  0.1712 sec/batch
Epoch: 5/20...  Training Step: 19154...  Training loss: 1.5820...  0.1972 sec/batch
Epoch: 5/20...  Training Step: 19155...  Training loss: 1.6443...  0.2002 sec/batch
Epoch: 5/20...  Training Step: 19156...  Training loss: 1.7108...  0.2135 sec/batch
Epoch: 5/20...  Training Step: 19157...  Training loss: 1.7115...  0.1750 sec/batch
Epoch: 5/20...  Training Step: 19158...  Training loss: 1.6164...  0.1978 sec/batch
Epoch: 5/20...  Training Step: 19159...  Training loss: 1.6580...  0.1844 sec/batch
Epoch: 5/20...  Training Step: 19160...  Training loss: 1.5342...  0.2259 sec/batch
Epoch: 5/20...  Training Step: 19161...  Training loss: 1.6951...  0.2068 sec/batch
Epoch: 5/20...  Training Step: 19162...  Training loss: 1.5816...  0.1766 sec/batch
Epoch: 5/20...  Training Step: 19163...  Training loss: 1.7332...  0.1978 sec/batch
Epoch: 5/20...  Training Step: 19164...  Training loss: 1.6440...  0.1656 sec/batch

Epoch: 5/20...  Training Step: 19251...  Training loss: 1.7566...  0.1745 sec/batch
Epoch: 5/20...  Training Step: 19252...  Training loss: 1.7905...  0.1973 sec/batch
Epoch: 5/20...  Training Step: 19253...  Training loss: 1.7217...  0.1749 sec/batch
Epoch: 5/20...  Training Step: 19254...  Training loss: 1.6562...  0.2014 sec/batch
Epoch: 5/20...  Training Step: 19255...  Training loss: 1.7174...  0.1655 sec/batch
Epoch: 5/20...  Training Step: 19256...  Training loss: 1.7294...  0.2330 sec/batch
Epoch: 5/20...  Training Step: 19257...  Training loss: 1.7702...  0.1644 sec/batch
Epoch: 5/20...  Training Step: 19258...  Training loss: 1.5404...  0.1820 sec/batch
Epoch: 5/20...  Training Step: 19259...  Training loss: 1.4430...  0.1646 sec/batch
Epoch: 5/20...  Training Step: 19260...  Training loss: 1.5544...  0.1886 sec/batch
Epoch: 5/20...  Training Step: 19261...  Training loss: 1.6331...  0.1766 sec/batch
Epoch: 5/20...  Training Step: 19262...  Training loss: 1.5899...  0.1926 sec/batch

Epoch: 5/20...  Training Step: 19350...  Training loss: 1.7247...  0.2180 sec/batch
Epoch: 5/20...  Training Step: 19351...  Training loss: 1.6873...  0.2201 sec/batch
Epoch: 5/20...  Training Step: 19352...  Training loss: 1.6561...  0.2149 sec/batch
Epoch: 5/20...  Training Step: 19353...  Training loss: 1.7959...  0.1882 sec/batch
Epoch: 5/20...  Training Step: 19354...  Training loss: 1.6966...  0.2087 sec/batch
Epoch: 5/20...  Training Step: 19355...  Training loss: 1.7113...  0.1788 sec/batch
Epoch: 5/20...  Training Step: 19356...  Training loss: 1.5379...  0.2075 sec/batch
Epoch: 5/20...  Training Step: 19357...  Training loss: 1.7902...  0.2225 sec/batch
Epoch: 5/20...  Training Step: 19358...  Training loss: 1.8136...  0.1951 sec/batch
Epoch: 5/20...  Training Step: 19359...  Training loss: 1.6366...  0.1697 sec/batch
Epoch: 5/20...  Training Step: 19360...  Training loss: 1.4489...  0.1991 sec/batch
Epoch: 5/20...  Training Step: 19361...  Training loss: 1.6517...  0.1701 sec/batch

Epoch: 5/20...  Training Step: 19448...  Training loss: 1.5987...  0.2077 sec/batch
Epoch: 5/20...  Training Step: 19449...  Training loss: 1.6510...  0.2041 sec/batch
Epoch: 5/20...  Training Step: 19450...  Training loss: 1.4787...  0.1912 sec/batch
Epoch: 5/20...  Training Step: 19451...  Training loss: 1.6865...  0.1915 sec/batch
Epoch: 5/20...  Training Step: 19452...  Training loss: 1.8324...  0.2429 sec/batch
Epoch: 5/20...  Training Step: 19453...  Training loss: 1.6269...  0.2158 sec/batch
Epoch: 5/20...  Training Step: 19454...  Training loss: 1.6249...  0.2099 sec/batch
Epoch: 5/20...  Training Step: 19455...  Training loss: 1.5236...  0.1600 sec/batch
Epoch: 5/20...  Training Step: 19456...  Training loss: 1.5783...  0.1906 sec/batch
Epoch: 5/20...  Training Step: 19457...  Training loss: 1.5739...  0.1853 sec/batch
Epoch: 5/20...  Training Step: 19458...  Training loss: 1.5988...  0.2091 sec/batch
Epoch: 5/20...  Training Step: 19459...  Training loss: 1.6440...  0.1563 sec/batch

Epoch: 5/20...  Training Step: 19547...  Training loss: 1.5668...  0.1877 sec/batch
Epoch: 5/20...  Training Step: 19548...  Training loss: 1.8428...  0.1735 sec/batch
Epoch: 5/20...  Training Step: 19549...  Training loss: 1.8271...  0.2445 sec/batch
Epoch: 5/20...  Training Step: 19550...  Training loss: 1.7000...  0.2026 sec/batch
Epoch: 5/20...  Training Step: 19551...  Training loss: 1.6511...  0.1860 sec/batch
Epoch: 5/20...  Training Step: 19552...  Training loss: 1.7512...  0.1652 sec/batch
Epoch: 5/20...  Training Step: 19553...  Training loss: 1.7779...  0.1945 sec/batch
Epoch: 5/20...  Training Step: 19554...  Training loss: 1.7481...  0.2034 sec/batch
Epoch: 5/20...  Training Step: 19555...  Training loss: 1.5927...  0.1954 sec/batch
Epoch: 5/20...  Training Step: 19556...  Training loss: 1.8873...  0.2112 sec/batch
Epoch: 5/20...  Training Step: 19557...  Training loss: 1.7800...  0.2079 sec/batch
Epoch: 5/20...  Training Step: 19558...  Training loss: 1.7117...  0.1986 sec/batch

Epoch: 5/20...  Training Step: 19645...  Training loss: 1.5947...  0.2064 sec/batch
Epoch: 5/20...  Training Step: 19646...  Training loss: 1.6907...  0.1893 sec/batch
Epoch: 5/20...  Training Step: 19647...  Training loss: 1.7038...  0.1988 sec/batch
Epoch: 5/20...  Training Step: 19648...  Training loss: 1.8035...  0.1773 sec/batch
Epoch: 5/20...  Training Step: 19649...  Training loss: 1.8705...  0.2052 sec/batch
Epoch: 5/20...  Training Step: 19650...  Training loss: 1.6859...  0.2377 sec/batch
Epoch: 5/20...  Training Step: 19651...  Training loss: 1.6895...  0.1799 sec/batch
Epoch: 5/20...  Training Step: 19652...  Training loss: 1.6665...  0.2038 sec/batch
Epoch: 5/20...  Training Step: 19653...  Training loss: 1.8430...  0.2075 sec/batch
Epoch: 5/20...  Training Step: 19654...  Training loss: 1.6757...  0.2167 sec/batch
Epoch: 5/20...  Training Step: 19655...  Training loss: 1.7749...  0.2057 sec/batch
Epoch: 5/20...  Training Step: 19656...  Training loss: 1.6774...  0.1921 sec/batch

Epoch: 5/20...  Training Step: 19743...  Training loss: 1.6984...  0.1724 sec/batch
Epoch: 5/20...  Training Step: 19744...  Training loss: 1.7127...  0.2011 sec/batch
Epoch: 5/20...  Training Step: 19745...  Training loss: 1.8480...  0.2185 sec/batch
Epoch: 5/20...  Training Step: 19746...  Training loss: 1.7515...  0.1984 sec/batch
Epoch: 5/20...  Training Step: 19747...  Training loss: 1.7609...  0.1979 sec/batch
Epoch: 5/20...  Training Step: 19748...  Training loss: 1.6855...  0.2222 sec/batch
Epoch: 5/20...  Training Step: 19749...  Training loss: 1.7180...  0.1946 sec/batch
Epoch: 5/20...  Training Step: 19750...  Training loss: 1.7712...  0.1875 sec/batch
Epoch: 5/20...  Training Step: 19751...  Training loss: 1.7149...  0.1749 sec/batch
Epoch: 5/20...  Training Step: 19752...  Training loss: 1.7466...  0.1738 sec/batch
Epoch: 5/20...  Training Step: 19753...  Training loss: 1.7316...  0.2069 sec/batch
Epoch: 5/20...  Training Step: 19754...  Training loss: 1.6951...  0.1897 sec/batch

Epoch: 5/20...  Training Step: 19842...  Training loss: 1.8195...  0.2254 sec/batch
Epoch: 5/20...  Training Step: 19843...  Training loss: 1.8101...  0.2300 sec/batch
Epoch: 5/20...  Training Step: 19844...  Training loss: 1.7087...  0.1699 sec/batch
Epoch: 5/20...  Training Step: 19845...  Training loss: 1.7432...  0.2037 sec/batch
Epoch: 5/20...  Training Step: 19846...  Training loss: 1.7221...  0.2123 sec/batch
Epoch: 5/20...  Training Step: 19847...  Training loss: 1.7244...  0.1943 sec/batch
Epoch: 5/20...  Training Step: 19848...  Training loss: 1.7635...  0.1809 sec/batch
Epoch: 5/20...  Training Step: 19849...  Training loss: 1.8338...  0.2537 sec/batch
Epoch: 5/20...  Training Step: 19850...  Training loss: 1.8760...  0.1621 sec/batch
Epoch: 6/20...  Training Step: 19851...  Training loss: 1.6770...  0.2186 sec/batch
Epoch: 6/20...  Training Step: 19852...  Training loss: 1.7039...  0.1914 sec/batch
Epoch: 6/20...  Training Step: 19853...  Training loss: 1.6388...  0.2078 sec/batch

Epoch: 6/20...  Training Step: 19941...  Training loss: 1.6650...  0.2094 sec/batch
Epoch: 6/20...  Training Step: 19942...  Training loss: 1.6678...  0.2041 sec/batch
Epoch: 6/20...  Training Step: 19943...  Training loss: 1.6690...  0.2082 sec/batch
Epoch: 6/20...  Training Step: 19944...  Training loss: 1.7505...  0.1934 sec/batch
Epoch: 6/20...  Training Step: 19945...  Training loss: 1.5615...  0.1843 sec/batch
Epoch: 6/20...  Training Step: 19946...  Training loss: 1.6262...  0.1837 sec/batch
Epoch: 6/20...  Training Step: 19947...  Training loss: 1.7274...  0.2019 sec/batch
Epoch: 6/20...  Training Step: 19948...  Training loss: 1.6587...  0.1760 sec/batch
Epoch: 6/20...  Training Step: 19949...  Training loss: 1.7166...  0.1905 sec/batch
Epoch: 6/20...  Training Step: 19950...  Training loss: 1.6802...  0.1957 sec/batch
Epoch: 6/20...  Training Step: 19951...  Training loss: 1.8032...  0.1768 sec/batch
Epoch: 6/20...  Training Step: 19952...  Training loss: 1.6192...  0.2195 sec/batch

Epoch: 6/20...  Training Step: 20039...  Training loss: 1.6674...  0.1941 sec/batch
Epoch: 6/20...  Training Step: 20040...  Training loss: 1.7445...  0.1854 sec/batch
Epoch: 6/20...  Training Step: 20041...  Training loss: 1.6443...  0.2105 sec/batch
Epoch: 6/20...  Training Step: 20042...  Training loss: 1.5761...  0.1420 sec/batch
Epoch: 6/20...  Training Step: 20043...  Training loss: 1.6494...  0.1895 sec/batch
Epoch: 6/20...  Training Step: 20044...  Training loss: 1.7544...  0.1803 sec/batch
Epoch: 6/20...  Training Step: 20045...  Training loss: 1.7088...  0.1833 sec/batch
Epoch: 6/20...  Training Step: 20046...  Training loss: 1.6805...  0.1688 sec/batch
Epoch: 6/20...  Training Step: 20047...  Training loss: 1.5518...  0.1671 sec/batch
Epoch: 6/20...  Training Step: 20048...  Training loss: 1.7359...  0.2278 sec/batch
Epoch: 6/20...  Training Step: 20049...  Training loss: 1.6299...  0.1602 sec/batch
Epoch: 6/20...  Training Step: 20050...  Training loss: 1.7707...  0.2485 sec/batch

Epoch: 6/20...  Training Step: 20138...  Training loss: 1.6935...  0.1640 sec/batch
Epoch: 6/20...  Training Step: 20139...  Training loss: 1.5711...  0.1539 sec/batch
Epoch: 6/20...  Training Step: 20140...  Training loss: 1.5754...  0.1522 sec/batch
Epoch: 6/20...  Training Step: 20141...  Training loss: 1.7154...  0.1996 sec/batch
Epoch: 6/20...  Training Step: 20142...  Training loss: 1.6470...  0.3411 sec/batch
Epoch: 6/20...  Training Step: 20143...  Training loss: 1.5965...  0.1979 sec/batch
Epoch: 6/20...  Training Step: 20144...  Training loss: 1.5139...  0.2707 sec/batch
Epoch: 6/20...  Training Step: 20145...  Training loss: 1.7187...  0.1387 sec/batch
Epoch: 6/20...  Training Step: 20146...  Training loss: 1.7614...  0.1806 sec/batch
Epoch: 6/20...  Training Step: 20147...  Training loss: 1.5415...  0.2236 sec/batch
Epoch: 6/20...  Training Step: 20148...  Training loss: 1.6879...  0.1670 sec/batch
Epoch: 6/20...  Training Step: 20149...  Training loss: 1.5302...  0.1616 sec/batch

Epoch: 6/20...  Training Step: 20236...  Training loss: 1.7000...  0.2201 sec/batch
Epoch: 6/20...  Training Step: 20237...  Training loss: 1.7381...  0.1883 sec/batch
Epoch: 6/20...  Training Step: 20238...  Training loss: 1.7569...  0.1837 sec/batch
Epoch: 6/20...  Training Step: 20239...  Training loss: 1.6353...  0.1563 sec/batch
Epoch: 6/20...  Training Step: 20240...  Training loss: 1.7031...  0.1525 sec/batch
Epoch: 6/20...  Training Step: 20241...  Training loss: 1.6403...  0.1531 sec/batch
Epoch: 6/20...  Training Step: 20242...  Training loss: 1.8280...  0.1701 sec/batch
Epoch: 6/20...  Training Step: 20243...  Training loss: 1.6777...  0.1645 sec/batch
Epoch: 6/20...  Training Step: 20244...  Training loss: 1.6823...  0.2118 sec/batch
Epoch: 6/20...  Training Step: 20245...  Training loss: 1.7228...  0.1646 sec/batch
Epoch: 6/20...  Training Step: 20246...  Training loss: 1.7240...  0.1696 sec/batch
Epoch: 6/20...  Training Step: 20247...  Training loss: 1.6749...  0.1521 sec/batch

Epoch: 6/20...  Training Step: 20335...  Training loss: 1.7430...  0.1573 sec/batch
Epoch: 6/20...  Training Step: 20336...  Training loss: 1.8040...  0.2730 sec/batch
Epoch: 6/20...  Training Step: 20337...  Training loss: 1.8176...  0.1746 sec/batch
Epoch: 6/20...  Training Step: 20338...  Training loss: 1.6971...  0.1727 sec/batch
Epoch: 6/20...  Training Step: 20339...  Training loss: 1.6952...  0.1400 sec/batch
Epoch: 6/20...  Training Step: 20340...  Training loss: 1.7586...  0.2031 sec/batch
Epoch: 6/20...  Training Step: 20341...  Training loss: 1.6229...  0.1380 sec/batch
Epoch: 6/20...  Training Step: 20342...  Training loss: 1.7115...  0.1843 sec/batch
Epoch: 6/20...  Training Step: 20343...  Training loss: 1.7408...  0.1576 sec/batch
Epoch: 6/20...  Training Step: 20344...  Training loss: 1.8089...  0.2333 sec/batch
Epoch: 6/20...  Training Step: 20345...  Training loss: 1.6109...  0.1469 sec/batch
Epoch: 6/20...  Training Step: 20346...  Training loss: 1.6442...  0.2041 sec/batch

Epoch: 6/20...  Training Step: 20434...  Training loss: 1.7953...  0.1549 sec/batch
Epoch: 6/20...  Training Step: 20435...  Training loss: 1.5495...  0.1639 sec/batch
Epoch: 6/20...  Training Step: 20436...  Training loss: 1.7395...  0.1832 sec/batch
Epoch: 6/20...  Training Step: 20437...  Training loss: 1.5774...  0.1905 sec/batch
Epoch: 6/20...  Training Step: 20438...  Training loss: 1.6411...  0.2001 sec/batch
Epoch: 6/20...  Training Step: 20439...  Training loss: 2.0220...  0.1485 sec/batch
Epoch: 6/20...  Training Step: 20440...  Training loss: 1.7704...  0.1713 sec/batch
Epoch: 6/20...  Training Step: 20441...  Training loss: 1.8110...  0.2729 sec/batch
Epoch: 6/20...  Training Step: 20442...  Training loss: 1.6538...  0.1834 sec/batch
Epoch: 6/20...  Training Step: 20443...  Training loss: 1.6431...  0.1696 sec/batch
Epoch: 6/20...  Training Step: 20444...  Training loss: 1.8436...  0.2274 sec/batch
Epoch: 6/20...  Training Step: 20445...  Training loss: 1.8206...  0.1534 sec/batch

Epoch: 6/20...  Training Step: 20533...  Training loss: 1.7588...  0.2015 sec/batch
Epoch: 6/20...  Training Step: 20534...  Training loss: 1.6793...  0.2188 sec/batch
Epoch: 6/20...  Training Step: 20535...  Training loss: 1.6879...  0.2032 sec/batch
Epoch: 6/20...  Training Step: 20536...  Training loss: 1.7306...  0.1482 sec/batch
Epoch: 6/20...  Training Step: 20537...  Training loss: 1.7387...  0.2096 sec/batch
Epoch: 6/20...  Training Step: 20538...  Training loss: 1.5731...  0.2370 sec/batch
Epoch: 6/20...  Training Step: 20539...  Training loss: 1.6401...  0.2215 sec/batch
Epoch: 6/20...  Training Step: 20540...  Training loss: 1.6598...  0.2127 sec/batch
Epoch: 6/20...  Training Step: 20541...  Training loss: 1.7553...  0.2433 sec/batch
Epoch: 6/20...  Training Step: 20542...  Training loss: 1.7744...  0.1973 sec/batch
Epoch: 6/20...  Training Step: 20543...  Training loss: 1.6870...  0.2299 sec/batch
Epoch: 6/20...  Training Step: 20544...  Training loss: 1.7667...  0.2176 sec/batch

#### Saved checkpoints

Read up on saving and loading checkpoints here: https://www.tensorflow.org/programmers_guide/variables

In [None]:
tf.train.get_checkpoint_state('checkpoints')
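
`tf.train.get_checkpoint_state` reads a small text file named `checkpoint` that TensorFlow 1.x's `Saver` keeps in the checkpoint directory. To show roughly what that file holds, here's a minimal pure-Python sketch that pulls out the same two fields, `model_checkpoint_path` (the most recent checkpoint) and `all_model_checkpoint_paths`. The helper name `parse_checkpoint_index` and the example file contents are made up for illustration, and the parser assumes one `key: "value"` pair per line; in practice, just use `get_checkpoint_state` itself.

```python
import re


def parse_checkpoint_index(text):
    """Return (latest_path, all_paths) from the text of a 'checkpoint' file.

    Hypothetical helper: assumes one `key: "value"` pair per line and does
    not handle escaped quotes inside paths.
    """
    latest = None
    all_paths = []
    for line in text.splitlines():
        match = re.match(r'\s*(\w+):\s*"(.*)"\s*$', line)
        if not match:
            continue  # skip blank or unrecognised lines
        key, value = match.groups()
        if key == "model_checkpoint_path":
            latest = value
        elif key == "all_model_checkpoint_paths":
            all_paths.append(value)
    return latest, all_paths


# Illustrative file contents (not real output from this notebook)
example = '''model_checkpoint_path: "checkpoints/i1200_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i200_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i600_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i1200_l512.ckpt"
'''
latest, all_paths = parse_checkpoint_index(example)
print(latest)     # checkpoints/i1200_l512.ckpt
print(all_paths)
```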

## Sampling

Now that the network is trained, we can use it to generate new text. The idea is that we pass in a character and the network predicts the next one. We then feed that prediction back in to predict the character after it, and keep going to generate all-new text. I also included some functionality to prime the network with some text: pass in a string, and the network runs through it to build up its state before generating.
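
The generation loop doesn't depend on the RNN itself, so here's a minimal sketch of its shape using a toy bigram model (next-character counts) in place of the trained network. Everything here, including `train_bigram` and `generate`, is hypothetical and only meant to show the control flow: prime with a string, then repeatedly feed each prediction back in as the next input. It picks greedily, which corresponds to `top_n=1` in `pick_top_n`, and its "state" is only the last character, whereas the LSTM state carries information from the whole primed string.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count which character follows each character (toy stand-in for the RNN)."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(text, text[1:]):
        counts[cur][nxt] += 1
    return counts


def generate(counts, prime, n_samples):
    """Prime with a string, then feed each prediction back in as the next input."""
    samples = list(prime)
    state = None
    # Priming: run the prime characters through the "model" to build up state.
    # For a bigram model the state is simply the last character seen.
    for c in prime:
        state = c
    for _ in range(n_samples):
        followers = counts.get(state)
        if not followers:
            break  # no known continuation for this character
        # Greedy pick of the most frequent follower (like top_n=1)
        nxt = followers.most_common(1)[0][0]
        samples.append(nxt)
        state = nxt  # the prediction becomes the next input
    return "".join(samples)


counts = train_bigram("abcabc")
print(generate(counts, "ca", 3))  # cabca
```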

The network outputs a probability for every character in the vocabulary. To reduce noise and make things a little less random, I'm only going to choose the next character from the top N most likely characters.

In [None]:
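def pick_top_n(preds, vocab_size, top_n=5):
    # Drop the batch/time dimensions so we have one probability per character
    p = np.squeeze(preds)
    # Zero out everything except the top_n most likely characters
    p[np.argsort(p)[:-top_n]] = 0
    # Renormalize so the remaining probabilities sum to 1
    p = p / np.sum(p)
    # Sample one character index from the truncated distribution
    c = np.random.choice(vocab_size, 1, p=p)[0]
    return c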
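
To see what `pick_top_n` does to the distribution before it samples, here's its deterministic half pulled out into a separate helper, `top_n_distribution` (a name made up for this illustration). With four characters and `top_n=2`, only the 0.4 and 0.3 entries survive, and they're rescaled to 4/7 and 3/7.

```python
import numpy as np


def top_n_distribution(preds, top_n=5):
    """Zero all but the top_n probabilities, then renormalize (no sampling)."""
    # astype() returns a copy, so the caller's preds array is left untouched
    p = np.squeeze(preds).astype(float)
    # argsort is ascending, so [:-top_n] selects every index except the top_n
    p[np.argsort(p)[:-top_n]] = 0
    return p / np.sum(p)


preds = np.array([[0.1, 0.4, 0.2, 0.3]])
dist = top_n_distribution(preds, top_n=2)
print(dist)  # probabilities 0, 4/7, 0, 3/7
```

One thing to watch: `np.squeeze` returns a view, so `pick_top_n` writes those zeros into `preds` itself. That's harmless in the sampling loop because `preds` is replaced on every step, but the copy above avoids it. The slice `[:-top_n]` also assumes `top_n >= 1`; with `top_n=0` the slice is empty and nothing would be zeroed.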

In [None]:
def sample(checkpoint, n_samples, lstm_size, vocab_size, prime="The "):
    samples = [c for c in prime]
    # Rebuild the graph in sampling mode (one sequence, one step at a time)
    model = CharRNN(vocab_size, lstm_size=lstm_size, sampling=True)
    saver = tf.train.Saver()
    with tf.Session() as sess:
        # Load the trained weights from the checkpoint
        saver.restore(sess, checkpoint)
        new_state = sess.run(model.initial_state)
        # Prime the network: feed each character of `prime` to build up the LSTM state
        for c in prime:
            x = np.zeros((1, 1), dtype=np.int32)
            x[0,0] = vocab_to_int[c]
            feed = {model.inputs: x,
                    model.keep_prob: 1.,  # no dropout while sampling
                    model.initial_state: new_state}
            preds, new_state = sess.run([model.prediction, model.final_state], 
                                         feed_dict=feed)

        # The first new character comes from the prediction after the last priming step
        c = pick_top_n(preds, vocab_size)
        samples.append(int_to_vocab[c])

        # Generate: feed each predicted character back in as the next input
        for i in range(n_samples):
            x[0,0] = c
            feed = {model.inputs: x,
                    model.keep_prob: 1.,
                    model.initial_state: new_state}
            preds, new_state = sess.run([model.prediction, model.final_state], 
                                         feed_dict=feed)

            c = pick_top_n(preds, vocab_size)
            samples.append(int_to_vocab[c])
        
    return ''.join(samples)

Here, pass in the path to a checkpoint and sample from the network.

In [None]:
tf.train.latest_checkpoint('checkpoints')
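
The checkpoint names used in the cells that follow look like `i200_l512.ckpt`. I'm assuming the number after `i` is the training iteration and the number after `l` is the LSTM size; that reading is inferred from the names, not stated anywhere here. Under that assumption, a small hypothetical helper can order a list of checkpoint prefixes (for example the `all_model_checkpoint_paths` field returned by `tf.train.get_checkpoint_state('checkpoints')`) by iteration, which makes it easy to loop over them and watch the samples improve.

```python
import re


def checkpoints_by_iteration(paths):
    """Sort checkpoint prefixes like 'checkpoints/i200_l512.ckpt' by iteration.

    Hypothetical helper: assumes the i<iteration>_l<lstm_size>.ckpt naming
    pattern; paths that don't match (e.g. '.index' files) are dropped.
    """
    pattern = re.compile(r"i(\d+)_l(\d+)\.ckpt$")
    parsed = []
    for path in paths:
        match = pattern.search(path)
        if match:
            iteration, lstm_size = map(int, match.groups())
            parsed.append((iteration, lstm_size, path))
    return sorted(parsed)


paths = ["checkpoints/i1200_l512.ckpt",
         "checkpoints/i200_l512.ckpt",
         "checkpoints/i600_l512.ckpt",
         "checkpoints/checkpoint"]
ordered = checkpoints_by_iteration(paths)
for iteration, size, path in ordered:
    print(iteration, size, path)
```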

In [None]:
checkpoint = tf.train.latest_checkpoint('checkpoints')
samp = sample(checkpoint, 2000, lstm_size, len(vocab), prime="Far")
print(samp)

In [None]:
checkpoint = 'checkpoints/i200_l512.ckpt'
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="Far")
print(samp)

In [None]:
checkpoint = 'checkpoints/i600_l512.ckpt'
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="Far")
print(samp)

In [None]:
checkpoint = 'checkpoints/i1200_l512.ckpt'
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="Far")
print(samp)