# Computer Vision Nanodegree

## Project: Image Captioning

---

In this notebook, you will train your CNN-RNN model.  

You are welcome and encouraged to try out many different architectures and hyperparameters when searching for a good model.

This does have the potential to make the project quite messy!  Before submitting your project, make sure that you clean up:
- the code you write in this notebook.  The notebook should describe how to train a single CNN-RNN architecture, corresponding to your final choice of hyperparameters.  You should structure the notebook so that the reviewer can replicate your results by running the code in this notebook.  
- the output of the code cell in **Step 2**.  The output should show the output obtained when training the model from scratch.

This notebook **will be graded**.  

Feel free to use the links below to navigate the notebook:
- [Step 1](#step1): Training Setup
- [Step 2](#step2): Train your Model
- [Step 3](#step3): (Optional) Validate your Model

<a id='step1'></a>
## Step 1: Training Setup

In this step of the notebook, you will customize the training of your CNN-RNN model by specifying hyperparameters and setting other options that are important to the training procedure.  The values you set now will be used when training your model in **Step 2** below.

You should only amend blocks of code that are preceded by a `TODO` statement.  **Any code blocks that are not preceded by a `TODO` statement should not be modified**.

### Task #1

Begin by setting the following variables:
- `batch_size` - the batch size of each training batch.  It is the number of image-caption pairs used to amend the model weights in each training step. 
- `vocab_threshold` - the minimum word count threshold.  Note that a larger threshold will result in a smaller vocabulary, whereas a smaller threshold will include rarer words and result in a larger vocabulary.  
- `vocab_from_file` - a Boolean that decides whether to load the vocabulary from file. 
- `embed_size` - the dimensionality of the image and word embeddings.  
- `hidden_size` - the number of features in the hidden state of the RNN decoder.  
- `num_epochs` - the number of epochs to train the model.  We recommend that you set `num_epochs=3`, but feel free to increase or decrease this number as you wish.  [This paper](https://arxiv.org/pdf/1502.03044.pdf) trained a captioning model on a single state-of-the-art GPU for 3 days, but you'll soon see that you can get reasonable results in a matter of a few hours!  (_But of course, if you want your model to compete with current research, you will have to train for much longer._)
- `save_every` - determines how often to save the model weights.  We recommend that you set `save_every=1`, to save the model weights after each epoch.  This way, after the `i`th epoch, the encoder and decoder weights will be saved in the `models/` folder as `encoder-i.pkl` and `decoder-i.pkl`, respectively.
- `print_every` - determines how often to print the batch loss to the Jupyter notebook while training.  Note that you **will not** observe a monotonic decrease in the loss function while training - this is perfectly fine and completely expected!  You are encouraged to keep this at its default value of `100` to avoid clogging the notebook, but feel free to change it.
- `log_file` - the name of the text file containing - for every step - how the loss and perplexity evolved during training.
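
To make the effect of `vocab_threshold` concrete, the sketch below builds a toy vocabulary at two thresholds. It is only an illustration and not the project's vocabulary code: the function name `build_vocab`, the toy captions, and the three special tokens are assumptions.

```python
from collections import Counter

def build_vocab(captions, vocab_threshold):
    """Return a word -> index map keeping words seen at least `vocab_threshold` times."""
    counts = Counter(word for caption in captions for word in caption.lower().split())
    # Special tokens are assumed here; the project's vocabulary may define them differently.
    vocab = {"<start>": 0, "<end>": 1, "<unk>": 2}
    for word, count in sorted(counts.items()):
        if count >= vocab_threshold:
            vocab[word] = len(vocab)
    return vocab

captions = [
    "a dog runs on the beach",
    "a dog sits on a sofa",
    "the cat sits on the sofa",
]

small = build_vocab(captions, vocab_threshold=3)  # higher threshold -> fewer words kept
large = build_vocab(captions, vocab_threshold=1)  # lower threshold -> rarer words kept too
print(len(small), len(large))
```

With the higher threshold, words such as "dog" or "beach" fall out of the vocabulary and would be mapped to the unknown token.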

If you're not sure where to begin to set some of the values above, you can peruse [this paper](https://arxiv.org/pdf/1502.03044.pdf) and [this paper](https://arxiv.org/pdf/1411.4555.pdf) for useful guidance!  **To avoid spending too long on this notebook**, you are encouraged to consult these suggested research papers to obtain a strong initial guess for which hyperparameters are likely to work best.  Then, train a single model, and proceed to the next notebook (**3_Inference.ipynb**).  If you are unhappy with your performance, you can return to this notebook to tweak the hyperparameters (and/or the architecture in **model.py**) and re-train your model.

### Question 1

**Question:** Describe your CNN-RNN architecture in detail.  With this architecture in mind, how did you select the values of the variables in Task 1?  If you consulted a research paper detailing a successful implementation of an image captioning model, please provide the reference.

**Answer:** My architecture starts with `EncoderCNN`, where a pretrained ResNet-50 is used to extract image features. With a batch of 10 images the features have shape (10, 2048, 1, 1); they are flattened to (10, 2048) and passed through a linear layer whose output has shape (10, embed_size), which is the input size that `DecoderRNN` expects.

The `DecoderRNN` takes both the features from `EncoderCNN` and the captions as input, combined as follows:
1. The last token (`<end>`) is dropped from every caption, because the decoder predicts each word from the inputs before it, so `<end>` is only ever a target and never an input.
2. The remaining tokens are passed through an embedding layer so that they have the same size as the image features (embed_size). After the embedding layer, the captions have shape (batch_size, caption_size-1, embed_size).
3. The features are 2D and the embedded captions are 3D, so they cannot be concatenated directly. I add a dimension to the features at dim=1 (the other two dimensions are batch_size and embed_size), which gives shape (batch_size, 1, embed_size).
4. After concatenation the input has shape (batch_size, caption_size, embed_size). It is sent to an LSTM with `batch_first=True`, which gives an output of shape (batch_size, caption_size, hidden_size).
5. Because the output has to be expressed in terms of the vocabulary, it is passed through a linear layer that converts it to shape (batch_size, caption_size, vocab_size).

With the above architecture in mind, I initially chose the following hyperparameters, based on the earlier lesson on tuning LSTM hyperparameters:
- `batch_size = 10`: all previous visualizations were done with a batch size of 10.
- `vocab_threshold = 4`: earlier notebooks used values of 4 and 5, and I assumed that keeping words that occur at least 4 times would neither exclude important words nor include unimportant ones.
- `vocab_from_file = True`: the previous notebook ended with a vocabulary built with a threshold of 4, so loading it from file is faster.
- `embed_size = 256`: from the lesson on tuning hyperparameters I learned that an embedding size of around 50 to 200 tends to work well, so I picked a value near that range to check the output.
- `hidden_size = 512`: also chosen as a first guess based on the earlier lessons.
- `num_epochs = 3`: the dataset is large, so 3 epochs should be enough, as the instructions suggest.
- `save_every = 1`
- `print_every = 100`
- `log_file = 'training_log.txt'`

Then, from the Show, Attend and Tell paper, I found that the model should perform better with a larger batch size, so I increased it to `batch_size = 64`.
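
The decoder data flow described in this answer can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the contents of **model.py**: the class name `DecoderSketch`, the random stand-in for the encoder features, and the toy sizes are all hypothetical.

```python
import torch
import torch.nn as nn

class DecoderSketch(nn.Module):
    """Hypothetical decoder used only to trace tensor shapes."""

    def __init__(self, embed_size, hidden_size, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # Drop the last token of each caption before embedding.
        embeddings = self.embed(captions[:, :-1])                 # (B, T-1, E)
        # Add a time dimension to the image features and prepend them.
        inputs = torch.cat((features.unsqueeze(1), embeddings), dim=1)  # (B, T, E)
        hidden, _ = self.lstm(inputs)                             # (B, T, H)
        return self.fc(hidden)                                    # (B, T, V)

batch_size, caption_len, embed_size, hidden_size, vocab_size = 4, 12, 256, 512, 100
features = torch.randn(batch_size, embed_size)  # stand-in for the encoder output
captions = torch.randint(0, vocab_size, (batch_size, caption_len))
outputs = DecoderSketch(embed_size, hidden_size, vocab_size)(features, captions)
print(outputs.shape)
```

Because the output keeps one score vector per caption position, it can be flattened to (batch_size * caption_size, vocab_size) and compared against the flattened captions in a cross-entropy loss.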

### (Optional) Task #2

Note that we have provided a recommended image transform `transform_train` for pre-processing the training images, but you are welcome (and encouraged!) to modify it as you wish.  When modifying this transform, keep in mind that:
- the images in the dataset have varying heights and widths, and 
- if using a pre-trained model, you must perform the corresponding appropriate normalization.
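
As a small worked example of the normalization step: `transforms.Normalize` computes `(value - mean) / std` for each channel, applied to pixel values that `transforms.ToTensor` has already scaled to the range [0, 1]. The plain-Python sketch below reproduces that arithmetic for a single pixel; the helper name `normalize_pixel` is hypothetical, and the constants are the ImageNet statistics used in the provided `transform_train`.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Apply (value - mean) / std to one RGB pixel with values in [0, 1]."""
    return tuple((v - m) / s for v, m, s in zip(rgb, mean, std))

mid_gray = normalize_pixel((0.5, 0.5, 0.5))  # slightly positive in every channel
at_mean = normalize_pixel(IMAGENET_MEAN)     # a pixel equal to the mean maps to zero
print(mid_gray)
print(at_mean)
```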

### Question 2

**Question:** How did you select the transform in `transform_train`?  If you left the transform at its provided value, why do you think that it is a good choice for your CNN architecture?

**Answer:** I left the transform as provided. The smaller edge of each image is resized to 256 and a random 224x224 crop is then taken, which matches the standard input size for ResNet-50. For normalization, the default values are the per-channel mean and standard deviation of ImageNet, the large dataset on which the pretrained ResNet-50 was trained.

### Task #3

Next, you will specify a Python list containing the learnable parameters of the model.  For instance, if you decide to make all weights in the decoder trainable, but only want to train the weights in the embedding layer of the encoder, then you should set `params` to something like:
```python
params = list(decoder.parameters()) + list(encoder.embed.parameters()) 
```

### Question 3

**Question:** How did you select the trainable parameters of your architecture?  Why do you think this is a good choice?

**Answer:** I selected `params` as shown above. In the decoder, the weights of every layer are trained so that each layer can reach the values that give the best predictions. The encoder uses a pretrained CNN whose weights are already well tuned for extracting image features, so I left them unchanged and trained only the weights of the encoder's embedding layer.

### Task #4

Finally, you will select an [optimizer](http://pytorch.org/docs/master/optim.html#torch.optim.Optimizer).

### Question 4

**Question:** How did you select the optimizer used to train your model?

**Answer:** I chose Adam because it tends to improve steadily over epochs and does not depend heavily on the choice of learning rate (lr). I first tried lr=0.01 with the initial hyperparameters, but later changed it to lr=0.001.
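
For reference, a single Adam update for one scalar parameter can be written in a few lines of plain Python. This is a sketch of the standard update rule, not the `torch.optim.Adam` source; the function name `adam_step` is hypothetical, and the defaults mirror the documented PyTorch defaults (`betas=(0.9, 0.999)`, `eps=1e-8`). It illustrates that the very first step has a magnitude close to `lr` whatever the scale of the gradient.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; returns (param, m, v)."""
    m = beta1 * m + (1 - beta1) * grad         # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad  # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)               # bias correction for step t
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Gradients that differ by four orders of magnitude give nearly the same first step.
for grad in (0.01, 1.0, 100.0):
    new_param, _, _ = adam_step(param=0.0, grad=grad, m=0.0, v=0.0, t=1)
    print(grad, new_param)
```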

In [None]:
import nltk
nltk.download('punkt')
import torch
import torch.nn as nn
from torchvision import transforms
import sys
sys.path.append('/opt/cocoapi/PythonAPI')
from pycocotools.coco import COCO
from data_loader import get_loader
from model import EncoderCNN, DecoderRNN
import math


## TODO #1: Select appropriate values for the Python variables below.
batch_size = 64         # batch size
vocab_threshold = 4        # minimum word count threshold
vocab_from_file = True    # if True, load existing vocab file
embed_size = 256           # dimensionality of image and word embeddings
hidden_size = 512          # number of features in hidden state of the RNN decoder
num_epochs = 3             # number of training epochs
save_every = 1             # determines frequency of saving model weights
print_every = 100          # determines window for printing average loss
log_file = 'training_log.txt'       # name of file with saved training loss and perplexity

# (Optional) TODO #2: Amend the image transform below.
transform_train = transforms.Compose([ 
    transforms.Resize(256),                          # smaller edge of image resized to 256
    transforms.RandomCrop(224),                      # get 224x224 crop from random location
    transforms.RandomHorizontalFlip(),               # horizontally flip image with probability=0.5
    transforms.ToTensor(),                           # convert the PIL Image to a tensor
    transforms.Normalize((0.485, 0.456, 0.406),      # normalize image for pre-trained model
                         (0.229, 0.224, 0.225))])

# Build data loader.
data_loader = get_loader(transform=transform_train,
                         mode='train',
                         batch_size=batch_size,
                         vocab_threshold=vocab_threshold,
                         vocab_from_file=vocab_from_file)

# The size of the vocabulary.
vocab_size = len(data_loader.dataset.vocab)

# Initialize the encoder and decoder. 
encoder = EncoderCNN(embed_size)
decoder = DecoderRNN(embed_size, hidden_size, vocab_size)

# Move models to GPU if CUDA is available. 
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
encoder.to(device)
decoder.to(device)

# Define the loss function. 
criterion = nn.CrossEntropyLoss().to(device)

# TODO #3: Specify the learnable parameters of the model.
params = list(decoder.parameters()) + list(encoder.embed.parameters())

# TODO #4: Define the optimizer.
optimizer = torch.optim.Adam(params, lr=0.001)

# Set the total number of training steps per epoch.
total_step = math.ceil(len(data_loader.dataset.caption_lengths) / data_loader.batch_sampler.batch_size)

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
Vocabulary successfully loaded from vocab.pkl file!
loading annotations into memory...
Done (t=1.04s)
creating index...


  0%|          | 789/414113 [00:00<01:49, 3781.39it/s]

index created!
Obtaining caption lengths...


 46%|████▋     | 192323/414113 [00:43<00:49, 4470.52it/s]

In [4]:
from workspace_utils import active_session


<a id='step2'></a>
## Step 2: Train your Model

Once you have executed the code cell in **Step 1**, the training procedure below should run without issue.  

It is completely fine to leave the code cell below as-is without modifications to train your model.  However, if you would like to modify the code used to train the model below, you must ensure that your changes are easily parsed by your reviewer.  In other words, make sure to provide appropriate comments to describe how your code works!  

You may find it useful to load saved weights to resume training.  In that case, note the names of the files containing the encoder and decoder weights that you'd like to load (`encoder_file` and `decoder_file`).  Then you can load the weights by using the lines below:

```python
# Load pre-trained weights before resuming training.
encoder.load_state_dict(torch.load(os.path.join('./models', encoder_file)))
decoder.load_state_dict(torch.load(os.path.join('./models', decoder_file)))
```
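
If several checkpoints have been saved, one way to pick `encoder_file` and `decoder_file` is to look for the highest epoch number in the file names. The sketch below assumes the `encoder-i.pkl` / `decoder-i.pkl` naming described in Step 1; the helper `latest_epoch` and the example file list are hypothetical.

```python
import re

def latest_epoch(filenames, prefix):
    """Return the highest epoch among names like '<prefix>-<epoch>.pkl', or None."""
    pattern = re.compile(r"^" + re.escape(prefix) + r"-(\d+)\.pkl$")
    epochs = [int(m.group(1)) for m in map(pattern.match, filenames) if m]
    return max(epochs) if epochs else None

# In the notebook this list could come from os.listdir('./models').
files = ["decoder-1.pkl", "encoder-1.pkl", "decoder-2.pkl", "encoder-2.pkl", "notes.txt"]
epoch = latest_epoch(files, "encoder")
encoder_file = "encoder-%d.pkl" % epoch
decoder_file = "decoder-%d.pkl" % epoch
print(encoder_file, decoder_file)
```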

While trying out parameters, make sure to take extensive notes and record the settings that you used in your various training runs.  In particular, you don't want to encounter a situation where you've trained a model for several hours but can't remember what settings you used :).
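
One lightweight way to keep such notes is to write one JSON line per training run. The sketch below is only a suggestion: the helper `format_run_record`, the run name, and the idea of appending to a file such as `runs.jsonl` are assumptions, and the values shown are the ones set in Step 1 of this notebook.

```python
import json

def format_run_record(run_name, settings):
    """Serialize one training run's settings as a single JSON line."""
    return json.dumps({"run": run_name, **settings}, sort_keys=True)

settings = {
    "batch_size": 64,
    "vocab_threshold": 4,
    "embed_size": 256,
    "hidden_size": 512,
    "num_epochs": 3,
    "lr": 0.001,
}
line = format_run_record("run-01", settings)
print(line)  # this line could be appended to a notes file, e.g. runs.jsonl
```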

### A Note on Tuning Hyperparameters

To figure out how well your model is doing, you can look at how the training loss and perplexity evolve during training - and for the purposes of this project, you are encouraged to amend the hyperparameters based on this information.  

However, this will not tell you if your model is overfitting to the training data, and, unfortunately, overfitting is a problem that is commonly encountered when training image captioning models.  

For this project, you need not worry about overfitting. **This project does not have strict requirements regarding the performance of your model**, and you just need to demonstrate that your model has learned **_something_** when you generate captions on the test data.  For now, we strongly encourage you to train your model for the suggested 3 epochs without worrying about performance; then, you should immediately transition to the next notebook in the sequence (**3_Inference.ipynb**) to see how your model performs on the test data.  If your model needs to be changed, you can come back to this notebook, amend hyperparameters (if necessary), and re-train the model.

That said, if you would like to go above and beyond in this project, you can read about some approaches to minimizing overfitting in section 4.3.1 of [this paper](http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7505636).  In the next (optional) step of this notebook, we provide some guidance for assessing the performance on the validation dataset.
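
As a reference for reading the training statistics discussed in this section: the perplexity printed by the training loop is the exponential of the cross-entropy loss, and a model that guesses uniformly over the vocabulary has a loss equal to the natural logarithm of the vocabulary size. The sketch below works through both relations; the function names and the 10,000-word vocabulary are hypothetical.

```python
import math

def perplexity(cross_entropy_loss):
    """Perplexity is the exponential of the mean cross-entropy loss (in nats)."""
    return math.exp(cross_entropy_loss)

def uniform_guess_loss(vocab_size):
    """Loss of a model that assigns equal probability to every word."""
    return math.log(vocab_size)

print(perplexity(2.5))            # about 12.18
print(uniform_guess_loss(10000))  # about 9.21 for a hypothetical 10,000-word vocabulary
```

A loss near `log(vocab_size)` at the first training step therefore means the model starts out close to uniform guessing, and a falling perplexity can be read as a shrinking effective number of candidate words per position.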

In [None]:
import torch.utils.data as data
import numpy as np
import os
import requests
import time

# Open the training log file.
f = open(log_file, 'w')

old_time = time.time()
response = requests.request("GET", 
                            "http://metadata.google.internal/computeMetadata/v1/instance/attributes/keep_alive_token", 
                            headers={"Metadata-Flavor":"Google"})
with active_session():
    
    for epoch in range(1, num_epochs+1):

        for i_step in range(1, total_step+1):

            # Ping the workspace keep-alive endpoint at most once per minute.
            if time.time() - old_time > 60:
                old_time = time.time()
                requests.request("POST",
                                 "https://nebula.udacity.com/api/v1/remote/keep-alive",
                                 headers={'Authorization': "STAR " + response.text})

            # Randomly sample a caption length, and sample indices with that length.
            indices = data_loader.dataset.get_train_indices()
            # Create and assign a batch sampler to retrieve a batch with the sampled indices.
            new_sampler = data.sampler.SubsetRandomSampler(indices=indices)
            data_loader.batch_sampler.sampler = new_sampler

            # Obtain the batch.
            images, captions = next(iter(data_loader))

            # Move batch of images and captions to GPU if CUDA is available.
            images = images.to(device)
            captions = captions.to(device)

            # Zero the gradients.
            decoder.zero_grad()
            encoder.zero_grad()

            # Pass the inputs through the CNN-RNN model.
            features = encoder(images)
            outputs = decoder(features, captions)

            # Calculate the batch loss.
            loss = criterion(outputs.view(-1, vocab_size), captions.view(-1))

            # Backward pass.
            loss.backward()

            # Update the parameters in the optimizer.
            optimizer.step()

            # Get training statistics.
            stats = 'Epoch [%d/%d], Step [%d/%d], Loss: %.4f, Perplexity: %5.4f' % (epoch, num_epochs, i_step, total_step, loss.item(), np.exp(loss.item()))

            # Print training statistics (on same line).
            print('\r' + stats, end="")
            sys.stdout.flush()

            # Print training statistics to file.
            f.write(stats + '\n')
            f.flush()

            # Print training statistics (on different line).
            if i_step % print_every == 0:
                print('\r' + stats)

        # Save the weights.
        if epoch % save_every == 0:
            torch.save(decoder.state_dict(), os.path.join('./models', 'decoder-%d.pkl' % epoch))
            torch.save(encoder.state_dict(), os.path.join('./models', 'encoder-%d.pkl' % epoch))

    # Close the training log file.
    f.close()

sel length 15
Epoch [1/3], Step [1/6471], Loss: 9.2116, Perplexity: 10012.4163sel length 12
Epoch [1/3], Step [2/6471], Loss: 9.0633, Perplexity: 8632.5924sel length 11
Epoch [1/3], Step [3/6471], Loss: 8.8782, Perplexity: 7173.7947sel length 10
Epoch [1/3], Step [4/6471], Loss: 8.6037, Perplexity: 5452.0106sel length 12
Epoch [1/3], Step [5/6471], Loss: 8.2817, Perplexity: 3950.8342sel length 11
Epoch [1/3], Step [6/6471], Loss: 7.6633, Perplexity: 2128.7150sel length 9
Epoch [1/3], Step [7/6471], Loss: 6.9253, Perplexity: 1017.6798sel length 9
Epoch [1/3], Step [8/6471], Loss: 6.3168, Perplexity: 553.7717sel length 10
Epoch [1/3], Step [9/6471], Loss: 5.5521, Perplexity: 257.7795sel length 15
Epoch [1/3], Step [10/6471], Loss: 5.4496, Perplexity: 232.6644sel length 10
Epoch [1/3], Step [11/6471], Loss: 5.1820, Perplexity: 178.0393sel length 7
Epoch [1/3], Step [12/6471], Loss: 5.4622, Perplexity: 235.6090sel length 9
Epoch [1/3], Step [13/6471], Loss: 4.7658, Perplexity: 117.4277sel 

Epoch [1/3], Step [214/6471], Loss: 3.3989, Perplexity: 29.9314sel length 13
Epoch [1/3], Step [215/6471], Loss: 3.2255, Perplexity: 25.1657sel length 12
Epoch [1/3], Step [216/6471], Loss: 3.4321, Perplexity: 30.9401sel length 11
Epoch [1/3], Step [217/6471], Loss: 3.5225, Perplexity: 33.8701sel length 9
Epoch [1/3], Step [218/6471], Loss: 3.7887, Perplexity: 44.2010sel length 9
Epoch [1/3], Step [219/6471], Loss: 3.5602, Perplexity: 35.1709sel length 13
Epoch [1/3], Step [220/6471], Loss: 3.5240, Perplexity: 33.9199sel length 9
Epoch [1/3], Step [221/6471], Loss: 3.7011, Perplexity: 40.4926sel length 10
Epoch [1/3], Step [222/6471], Loss: 3.5500, Perplexity: 34.8142sel length 10
Epoch [1/3], Step [223/6471], Loss: 3.3713, Perplexity: 29.1163sel length 13
Epoch [1/3], Step [224/6471], Loss: 3.5785, Perplexity: 35.8200sel length 11
Epoch [1/3], Step [225/6471], Loss: 3.6091, Perplexity: 36.9328sel length 10
Epoch [1/3], Step [226/6471], Loss: 3.2791, Perplexity: 26.5513sel length 10

Epoch [1/3], Step [426/6471], Loss: 3.2316, Perplexity: 25.3197sel length 9
Epoch [1/3], Step [427/6471], Loss: 3.2235, Perplexity: 25.1146sel length 11
Epoch [1/3], Step [428/6471], Loss: 3.2808, Perplexity: 26.5979sel length 14
Epoch [1/3], Step [429/6471], Loss: 3.2238, Perplexity: 25.1225sel length 17
Epoch [1/3], Step [430/6471], Loss: 3.7138, Perplexity: 41.0101sel length 9
Epoch [1/3], Step [431/6471], Loss: 3.5089, Perplexity: 33.4119sel length 29
Epoch [1/3], Step [432/6471], Loss: 4.6339, Perplexity: 102.9187sel length 13
Epoch [1/3], Step [433/6471], Loss: 3.1347, Perplexity: 22.9821sel length 13
Epoch [1/3], Step [434/6471], Loss: 3.3338, Perplexity: 28.0453sel length 11
Epoch [1/3], Step [435/6471], Loss: 2.9236, Perplexity: 18.6076sel length 14
Epoch [1/3], Step [436/6471], Loss: 3.3627, Perplexity: 28.8658sel length 7
Epoch [1/3], Step [437/6471], Loss: 3.9621, Perplexity: 52.5680sel length 10
Epoch [1/3], Step [438/6471], Loss: 3.1511, Perplexity: 23.3624sel length 14

Epoch [1/3], Step [638/6471], Loss: 3.2489, Perplexity: 25.7616sel length 20
Epoch [1/3], Step [639/6471], Loss: 3.8444, Perplexity: 46.7326sel length 12
Epoch [1/3], Step [640/6471], Loss: 2.9832, Perplexity: 19.7513sel length 10
Epoch [1/3], Step [641/6471], Loss: 3.2857, Perplexity: 26.7287sel length 9
Epoch [1/3], Step [642/6471], Loss: 3.0731, Perplexity: 21.6099sel length 11
Epoch [1/3], Step [643/6471], Loss: 3.1122, Perplexity: 22.4711sel length 11
Epoch [1/3], Step [644/6471], Loss: 2.9991, Perplexity: 20.0683sel length 19
Epoch [1/3], Step [645/6471], Loss: 3.5770, Perplexity: 35.7674sel length 13
Epoch [1/3], Step [646/6471], Loss: 3.0817, Perplexity: 21.7945sel length 9
Epoch [1/3], Step [647/6471], Loss: 3.1739, Perplexity: 23.9002sel length 20
Epoch [1/3], Step [648/6471], Loss: 4.0372, Perplexity: 56.6654sel length 8
Epoch [1/3], Step [649/6471], Loss: 3.2399, Perplexity: 25.5322sel length 10
Epoch [1/3], Step [650/6471], Loss: 3.2096, Perplexity: 24.7699sel length 12

Epoch [1/3], Step [850/6471], Loss: 2.7798, Perplexity: 16.1154sel length 18
Epoch [1/3], Step [851/6471], Loss: 3.4493, Perplexity: 31.4783sel length 16
Epoch [1/3], Step [852/6471], Loss: 3.1977, Perplexity: 24.4771sel length 9
Epoch [1/3], Step [853/6471], Loss: 2.9479, Perplexity: 19.0663sel length 12
Epoch [1/3], Step [854/6471], Loss: 2.8396, Perplexity: 17.1097sel length 12
Epoch [1/3], Step [855/6471], Loss: 2.8036, Perplexity: 16.5036sel length 11
Epoch [1/3], Step [856/6471], Loss: 2.9004, Perplexity: 18.1818sel length 11
Epoch [1/3], Step [857/6471], Loss: 2.8445, Perplexity: 17.1936sel length 9
Epoch [1/3], Step [858/6471], Loss: 3.0561, Perplexity: 21.2441sel length 11
Epoch [1/3], Step [859/6471], Loss: 2.7241, Perplexity: 15.2419sel length 14
Epoch [1/3], Step [860/6471], Loss: 2.8507, Perplexity: 17.2994sel length 17
Epoch [1/3], Step [861/6471], Loss: 3.3303, Perplexity: 27.9460sel length 12
Epoch [1/3], Step [862/6471], Loss: 2.9543, Perplexity: 19.1889sel length 13

Epoch [1/3], Step [1062/6471], Loss: 2.8588, Perplexity: 17.4399sel length 9
Epoch [1/3], Step [1063/6471], Loss: 2.8026, Perplexity: 16.4881sel length 12
Epoch [1/3], Step [1064/6471], Loss: 2.6420, Perplexity: 14.0414sel length 9
Epoch [1/3], Step [1065/6471], Loss: 2.6852, Perplexity: 14.6607sel length 11
Epoch [1/3], Step [1066/6471], Loss: 2.5359, Perplexity: 12.6276sel length 10
Epoch [1/3], Step [1067/6471], Loss: 2.6615, Perplexity: 14.3179sel length 10
Epoch [1/3], Step [1068/6471], Loss: 2.7633, Perplexity: 15.8519sel length 10
Epoch [1/3], Step [1069/6471], Loss: 2.7180, Perplexity: 15.1494sel length 10
Epoch [1/3], Step [1070/6471], Loss: 2.5320, Perplexity: 12.5792sel length 11
Epoch [1/3], Step [1071/6471], Loss: 2.7744, Perplexity: 16.0283sel length 9
Epoch [1/3], Step [1072/6471], Loss: 2.7908, Perplexity: 16.2940sel length 12
Epoch [1/3], Step [1073/6471], Loss: 2.7128, Perplexity: 15.0716sel length 11
Epoch [1/3], Step [1074/6471], Loss: 2.7405, Perplexity: 15.4941sel

Epoch [1/3], Step [1272/6471], Loss: 2.5994, Perplexity: 13.4562sel length 12
Epoch [1/3], Step [1273/6471], Loss: 2.6710, Perplexity: 14.4544sel length 12
Epoch [1/3], Step [1274/6471], Loss: 2.7601, Perplexity: 15.8016sel length 13
Epoch [1/3], Step [1275/6471], Loss: 2.8202, Perplexity: 16.7799sel length 9
Epoch [1/3], Step [1276/6471], Loss: 2.5000, Perplexity: 12.1830sel length 10
Epoch [1/3], Step [1277/6471], Loss: 2.6015, Perplexity: 13.4837sel length 10
Epoch [1/3], Step [1278/6471], Loss: 2.7710, Perplexity: 15.9750sel length 18
Epoch [1/3], Step [1279/6471], Loss: 3.3578, Perplexity: 28.7272sel length 15
Epoch [1/3], Step [1280/6471], Loss: 2.8684, Perplexity: 17.6090sel length 13
Epoch [1/3], Step [1281/6471], Loss: 2.8390, Perplexity: 17.0985sel length 13
Epoch [1/3], Step [1282/6471], Loss: 2.7936, Perplexity: 16.3389sel length 12
Epoch [1/3], Step [1283/6471], Loss: 2.5752, Perplexity: 13.1333sel length 11
Epoch [1/3], Step [1284/6471], Loss: 2.4607, Perplexity: 11.7134s

Epoch [1/3], Step [1482/6471], Loss: 2.4781, Perplexity: 11.9182sel length 15
Epoch [1/3], Step [1483/6471], Loss: 3.0056, Perplexity: 20.1984sel length 9
Epoch [1/3], Step [1484/6471], Loss: 2.5413, Perplexity: 12.6962sel length 9
Epoch [1/3], Step [1485/6471], Loss: 2.7048, Perplexity: 14.9520sel length 11
Epoch [1/3], Step [1486/6471], Loss: 2.6530, Perplexity: 14.1966sel length 11
Epoch [1/3], Step [1487/6471], Loss: 2.4651, Perplexity: 11.7647sel length 16
Epoch [1/3], Step [1488/6471], Loss: 3.0534, Perplexity: 21.1880sel length 9
Epoch [1/3], Step [1489/6471], Loss: 2.6516, Perplexity: 14.1773sel length 9
Epoch [1/3], Step [1490/6471], Loss: 2.8488, Perplexity: 17.2671sel length 8
Epoch [1/3], Step [1491/6471], Loss: 2.5931, Perplexity: 13.3714sel length 11
Epoch [1/3], Step [1492/6471], Loss: 2.5642, Perplexity: 12.9906sel length 11
Epoch [1/3], Step [1493/6471], Loss: 2.4790, Perplexity: 11.9293sel length 11
Epoch [1/3], Step [1494/6471], Loss: 2.5935, Perplexity: 13.3766sel l

Epoch [1/3], Step [1692/6471], Loss: 2.4451, Perplexity: 11.5315sel length 13
Epoch [1/3], Step [1693/6471], Loss: 2.7418, Perplexity: 15.5144sel length 9
Epoch [1/3], Step [1694/6471], Loss: 2.6038, Perplexity: 13.5154sel length 9
Epoch [1/3], Step [1695/6471], Loss: 2.5729, Perplexity: 13.1042sel length 14
Epoch [1/3], Step [1696/6471], Loss: 2.6905, Perplexity: 14.7392sel length 11
Epoch [1/3], Step [1697/6471], Loss: 2.4523, Perplexity: 11.6146sel length 11
Epoch [1/3], Step [1698/6471], Loss: 2.4961, Perplexity: 12.1356sel length 10
Epoch [1/3], Step [1699/6471], Loss: 2.5927, Perplexity: 13.3653sel length 14
Epoch [1/3], Step [1700/6471], Loss: 2.9360, Perplexity: 18.8395
sel length 14
Epoch [1/3], Step [1701/6471], Loss: 2.7329, Perplexity: 15.3776sel length 11
Epoch [1/3], Step [1702/6471], Loss: 2.1515, Perplexity: 8.5978sel length 9
Epoch [1/3], Step [1703/6471], Loss: 2.4699, Perplexity: 11.8208sel length 11
Epoch [1/3], Step [1704/6471], Loss: 2.6044, Perplexity: 13.5233sel

Epoch [1/3], Step [1902/6471], Loss: 2.6998, Perplexity: 14.8770sel length 9
Epoch [1/3], Step [1903/6471], Loss: 2.4715, Perplexity: 11.8403sel length 10
Epoch [1/3], Step [1904/6471], Loss: 2.4553, Perplexity: 11.6494sel length 11
Epoch [1/3], Step [1905/6471], Loss: 2.3844, Perplexity: 10.8526sel length 14
Epoch [1/3], Step [1906/6471], Loss: 2.7624, Perplexity: 15.8385sel length 12
Epoch [1/3], Step [1907/6471], Loss: 2.4404, Perplexity: 11.4778sel length 12
Epoch [1/3], Step [1908/6471], Loss: 2.4851, Perplexity: 12.0018sel length 10
Epoch [1/3], Step [1909/6471], Loss: 2.2396, Perplexity: 9.3893sel length 9
Epoch [1/3], Step [1910/6471], Loss: 2.4490, Perplexity: 11.5766sel length 11
Epoch [1/3], Step [1911/6471], Loss: 2.1759, Perplexity: 8.8102sel length 12
Epoch [1/3], Step [1912/6471], Loss: 2.5364, Perplexity: 12.6340sel length 17
Epoch [1/3], Step [1913/6471], Loss: 3.0973, Perplexity: 22.1381sel length 9
Epoch [1/3], Step [1914/6471], Loss: 2.4703, Perplexity: 11.8261sel l

Epoch [1/3], Step [2112/6471], Loss: 2.5040, Perplexity: 12.2312sel length 11
Epoch [1/3], Step [2113/6471], Loss: 2.3140, Perplexity: 10.1144sel length 11
Epoch [1/3], Step [2114/6471], Loss: 2.3608, Perplexity: 10.5992sel length 10
Epoch [1/3], Step [2115/6471], Loss: 2.3128, Perplexity: 10.1028sel length 9
Epoch [1/3], Step [2116/6471], Loss: 2.6887, Perplexity: 14.7128sel length 15
Epoch [1/3], Step [2117/6471], Loss: 2.7311, Perplexity: 15.3496sel length 11
Epoch [1/3], Step [2118/6471], Loss: 2.2015, Perplexity: 9.0388sel length 11
Epoch [1/3], Step [2119/6471], Loss: 2.4051, Perplexity: 11.0800sel length 11
Epoch [1/3], Step [2120/6471], Loss: 2.4825, Perplexity: 11.9715sel length 10
Epoch [1/3], Step [2121/6471], Loss: 2.3847, Perplexity: 10.8555sel length 11
Epoch [1/3], Step [2122/6471], Loss: 2.2587, Perplexity: 9.5710sel length 11
Epoch [1/3], Step [2123/6471], Loss: 2.3348, Perplexity: 10.3273sel length 12
Epoch [1/3], Step [2124/6471], Loss: 2.2900, Perplexity: 9.8751sel 

Epoch [1/3], Step [2322/6471], Loss: 2.4549, Perplexity: 11.6448sel length 12
Epoch [1/3], Step [2323/6471], Loss: 2.6295, Perplexity: 13.8671sel length 10
Epoch [1/3], Step [2324/6471], Loss: 2.4823, Perplexity: 11.9690sel length 11
Epoch [1/3], Step [2325/6471], Loss: 2.3773, Perplexity: 10.7754sel length 10
Epoch [1/3], Step [2326/6471], Loss: 2.5118, Perplexity: 12.3267sel length 16
Epoch [1/3], Step [2327/6471], Loss: 2.8687, Perplexity: 17.6139sel length 12
Epoch [1/3], Step [2328/6471], Loss: 2.3089, Perplexity: 10.0634sel length 8
Epoch [1/3], Step [2329/6471], Loss: 2.6857, Perplexity: 14.6685sel length 10
Epoch [1/3], Step [2330/6471], Loss: 2.3844, Perplexity: 10.8528sel length 11
Epoch [1/3], Step [2331/6471], Loss: 2.2864, Perplexity: 9.8393sel length 10
Epoch [1/3], Step [2332/6471], Loss: 2.3211, Perplexity: 10.1872sel length 15
Epoch [1/3], Step [2333/6471], Loss: 2.6765, Perplexity: 14.5337sel length 8
Epoch [1/3], Step [2334/6471], Loss: 2.6978, Perplexity: 14.8474sel

Epoch [1/3], Step [2532/6471], Loss: 2.5945, Perplexity: 13.3900sel length 14
Epoch [1/3], Step [2533/6471], Loss: 2.5925, Perplexity: 13.3633sel length 14
Epoch [1/3], Step [2534/6471], Loss: 2.6731, Perplexity: 14.4848sel length 12
Epoch [1/3], Step [2535/6471], Loss: 2.4836, Perplexity: 11.9840sel length 9
Epoch [1/3], Step [2536/6471], Loss: 2.3964, Perplexity: 10.9841sel length 20
Epoch [1/3], Step [2537/6471], Loss: 3.1563, Perplexity: 23.4829sel length 11
Epoch [1/3], Step [2538/6471], Loss: 2.2997, Perplexity: 9.9715sel length 11
Epoch [1/3], Step [2539/6471], Loss: 2.2600, Perplexity: 9.5834sel length 13
Epoch [1/3], Step [2540/6471], Loss: 2.5648, Perplexity: 12.9978sel length 10
Epoch [1/3], Step [2541/6471], Loss: 2.4067, Perplexity: 11.0975sel length 8
Epoch [1/3], Step [2542/6471], Loss: 2.6952, Perplexity: 14.8088sel length 10
Epoch [1/3], Step [2543/6471], Loss: 2.4365, Perplexity: 11.4330sel length 12
Epoch [1/3], Step [2544/6471], Loss: 2.1821, Perplexity: 8.8650sel l

Epoch [1/3], Step [2742/6471], Loss: 2.3680, Perplexity: 10.6755sel length 11
Epoch [1/3], Step [2743/6471], Loss: 2.3138, Perplexity: 10.1128sel length 10
Epoch [1/3], Step [2744/6471], Loss: 2.3495, Perplexity: 10.4806sel length 11
Epoch [1/3], Step [2745/6471], Loss: 2.3340, Perplexity: 10.3194sel length 11
Epoch [1/3], Step [2746/6471], Loss: 2.4281, Perplexity: 11.3368sel length 10
Epoch [1/3], Step [2747/6471], Loss: 2.3083, Perplexity: 10.0569sel length 11
Epoch [1/3], Step [2748/6471], Loss: 2.3347, Perplexity: 10.3267sel length 12
Epoch [1/3], Step [2749/6471], Loss: 2.3287, Perplexity: 10.2649sel length 12
Epoch [1/3], Step [2750/6471], Loss: 2.3288, Perplexity: 10.2660sel length 13
Epoch [1/3], Step [2751/6471], Loss: 2.4434, Perplexity: 11.5123sel length 8
Epoch [1/3], Step [2752/6471], Loss: 2.6454, Perplexity: 14.0894sel length 11
Epoch [1/3], Step [2753/6471], Loss: 2.2235, Perplexity: 9.2401sel length 11
Epoch [1/3], Step [2754/6471], Loss: 2.1777, Perplexity: 8.8256sel

Epoch [1/3], Step [2952/6471], Loss: 2.4533, Perplexity: 11.6266sel length 10
Epoch [1/3], Step [2953/6471], Loss: 2.3337, Perplexity: 10.3164sel length 8
Epoch [1/3], Step [2954/6471], Loss: 2.6485, Perplexity: 14.1323sel length 9
Epoch [1/3], Step [2955/6471], Loss: 2.2103, Perplexity: 9.1186sel length 11
Epoch [1/3], Step [2956/6471], Loss: 2.3130, Perplexity: 10.1049sel length 15
Epoch [1/3], Step [2957/6471], Loss: 2.9577, Perplexity: 19.2533sel length 10
Epoch [1/3], Step [2958/6471], Loss: 2.1074, Perplexity: 8.2269sel length 10
Epoch [1/3], Step [2959/6471], Loss: 2.4256, Perplexity: 11.3092sel length 10
Epoch [1/3], Step [2960/6471], Loss: 2.2156, Perplexity: 9.1668sel length 13
Epoch [1/3], Step [2961/6471], Loss: 2.3357, Perplexity: 10.3363sel length 13
Epoch [1/3], Step [2962/6471], Loss: 2.4397, Perplexity: 11.4698sel length 13
Epoch [1/3], Step [2963/6471], Loss: 2.4289, Perplexity: 11.3462sel length 12
Epoch [1/3], Step [2964/6471], Loss: 2.3325, Perplexity: 10.3034

Epoch [1/3], Step [3162/6471], Loss: 2.4529, Perplexity: 11.6217sel length 8
Epoch [1/3], Step [3163/6471], Loss: 2.6228, Perplexity: 13.7744sel length 8
Epoch [1/3], Step [3164/6471], Loss: 2.3239, Perplexity: 10.2151sel length 13
Epoch [1/3], Step [3165/6471], Loss: 2.4626, Perplexity: 11.7355sel length 9
Epoch [1/3], Step [3166/6471], Loss: 2.2707, Perplexity: 9.6858sel length 11
Epoch [1/3], Step [3167/6471], Loss: 2.2913, Perplexity: 9.8877sel length 9
Epoch [1/3], Step [3168/6471], Loss: 2.4672, Perplexity: 11.7891sel length 11
Epoch [1/3], Step [3169/6471], Loss: 2.3831, Perplexity: 10.8379sel length 15
Epoch [1/3], Step [3170/6471], Loss: 2.7795, Perplexity: 16.1113sel length 10
Epoch [1/3], Step [3171/6471], Loss: 2.3113, Perplexity: 10.0870sel length 12
Epoch [1/3], Step [3172/6471], Loss: 2.3438, Perplexity: 10.4205sel length 9
Epoch [1/3], Step [3173/6471], Loss: 2.4224, Perplexity: 11.2733sel length 9
Epoch [1/3], Step [3174/6471], Loss: 2.4985, Perplexity: 12.1642

Epoch [1/3], Step [3268/6471], Loss: 2.5774, Perplexity: 13.1634sel length 10
Epoch [1/3], Step [3269/6471], Loss: 2.0231, Perplexity: 7.5615sel length 9
Epoch [1/3], Step [3270/6471], Loss: 2.4886, Perplexity: 12.0444sel length 9
Epoch [1/3], Step [3271/6471], Loss: 2.2747, Perplexity: 9.7248sel length 7
Epoch [1/3], Step [3272/6471], Loss: 2.4316, Perplexity: 11.3766sel length 11
Epoch [1/3], Step [3273/6471], Loss: 2.2300, Perplexity: 9.3000sel length 11
Epoch [1/3], Step [3274/6471], Loss: 2.2542, Perplexity: 9.5275sel length 10
Epoch [1/3], Step [3275/6471], Loss: 2.1372, Perplexity: 8.4761sel length 16
Epoch [1/3], Step [3276/6471], Loss: 2.6641, Perplexity: 14.3550sel length 9
Epoch [1/3], Step [3277/6471], Loss: 2.2617, Perplexity: 9.5997sel length 11
Epoch [1/3], Step [3278/6471], Loss: 2.2357, Perplexity: 9.3528sel length 15
Epoch [1/3], Step [3279/6471], Loss: 2.5784, Perplexity: 13.1762sel length 11
Epoch [1/3], Step [3280/6471], Loss: 2.3289, Perplexity: 10.2671

Epoch [1/3], Step [3480/6471], Loss: 2.3610, Perplexity: 10.6017sel length 9
Epoch [1/3], Step [3481/6471], Loss: 2.3735, Perplexity: 10.7346sel length 13
Epoch [1/3], Step [3482/6471], Loss: 2.3183, Perplexity: 10.1583sel length 13
Epoch [1/3], Step [3483/6471], Loss: 2.4140, Perplexity: 11.1787sel length 13
Epoch [1/3], Step [3484/6471], Loss: 2.3051, Perplexity: 10.0253sel length 11
Epoch [1/3], Step [3485/6471], Loss: 2.1401, Perplexity: 8.5000sel length 11
Epoch [1/3], Step [3486/6471], Loss: 2.0792, Perplexity: 7.9981sel length 9
Epoch [1/3], Step [3487/6471], Loss: 2.1987, Perplexity: 9.0136sel length 12
Epoch [1/3], Step [3488/6471], Loss: 2.3411, Perplexity: 10.3928sel length 12
Epoch [1/3], Step [3489/6471], Loss: 2.1510, Perplexity: 8.5930sel length 17
Epoch [1/3], Step [3490/6471], Loss: 2.8750, Perplexity: 17.7262sel length 9
Epoch [1/3], Step [3491/6471], Loss: 2.1554, Perplexity: 8.6310sel length 9
Epoch [1/3], Step [3492/6471], Loss: 2.2299, Perplexity: 9.2985

Epoch [1/3], Step [3586/6471], Loss: 2.3514, Perplexity: 10.5007sel length 10
Epoch [1/3], Step [3587/6471], Loss: 2.2176, Perplexity: 9.1849sel length 10
Epoch [1/3], Step [3588/6471], Loss: 2.3493, Perplexity: 10.4785sel length 12
Epoch [1/3], Step [3589/6471], Loss: 2.1800, Perplexity: 8.8465sel length 9
Epoch [1/3], Step [3590/6471], Loss: 2.3107, Perplexity: 10.0810sel length 14
Epoch [1/3], Step [3591/6471], Loss: 2.4834, Perplexity: 11.9818sel length 9
Epoch [1/3], Step [3592/6471], Loss: 2.1418, Perplexity: 8.5147sel length 12
Epoch [1/3], Step [3593/6471], Loss: 2.2384, Perplexity: 9.3783sel length 10
Epoch [1/3], Step [3594/6471], Loss: 2.2154, Perplexity: 9.1648sel length 8
Epoch [1/3], Step [3595/6471], Loss: 2.4751, Perplexity: 11.8827sel length 9
Epoch [1/3], Step [3596/6471], Loss: 2.3453, Perplexity: 10.4360sel length 9
Epoch [1/3], Step [3597/6471], Loss: 2.2655, Perplexity: 9.6362sel length 20
Epoch [1/3], Step [3598/6471], Loss: 3.2186, Perplexity: 24.9939

Epoch [1/3], Step [3798/6471], Loss: 2.6050, Perplexity: 13.5308sel length 14
Epoch [1/3], Step [3799/6471], Loss: 2.3835, Perplexity: 10.8433sel length 9
Epoch [1/3], Step [3800/6471], Loss: 2.4710, Perplexity: 11.8346
sel length 13
Epoch [1/3], Step [3801/6471], Loss: 2.3060, Perplexity: 10.0343sel length 10
Epoch [1/3], Step [3802/6471], Loss: 2.1031, Perplexity: 8.1915sel length 10
Epoch [1/3], Step [3803/6471], Loss: 2.3655, Perplexity: 10.6498sel length 9
Epoch [1/3], Step [3804/6471], Loss: 2.4506, Perplexity: 11.5959sel length 11
Epoch [1/3], Step [3805/6471], Loss: 2.1271, Perplexity: 8.3908sel length 9
Epoch [1/3], Step [3806/6471], Loss: 2.1193, Perplexity: 8.3250sel length 10
Epoch [1/3], Step [3807/6471], Loss: 2.1362, Perplexity: 8.4673sel length 13
Epoch [1/3], Step [3808/6471], Loss: 2.4078, Perplexity: 11.1094sel length 11
Epoch [1/3], Step [3809/6471], Loss: 2.2203, Perplexity: 9.2098sel length 9
Epoch [1/3], Step [3810/6471], Loss: 2.3492, Perplexity: 10.4777

Epoch [1/3], Step [4010/6471], Loss: 2.3022, Perplexity: 9.9963sel length 12
Epoch [1/3], Step [4011/6471], Loss: 2.1072, Perplexity: 8.2256sel length 10
Epoch [1/3], Step [4012/6471], Loss: 2.3660, Perplexity: 10.6551sel length 8
Epoch [1/3], Step [4013/6471], Loss: 2.6699, Perplexity: 14.4380sel length 9
Epoch [1/3], Step [4014/6471], Loss: 2.0782, Perplexity: 7.9897sel length 11
Epoch [1/3], Step [4015/6471], Loss: 2.3366, Perplexity: 10.3455sel length 9
Epoch [1/3], Step [4016/6471], Loss: 2.2136, Perplexity: 9.1482sel length 8
Epoch [1/3], Step [4017/6471], Loss: 2.4827, Perplexity: 11.9736sel length 14
Epoch [1/3], Step [4018/6471], Loss: 2.4318, Perplexity: 11.3788sel length 14
Epoch [1/3], Step [4019/6471], Loss: 2.2825, Perplexity: 9.8015sel length 11
Epoch [1/3], Step [4020/6471], Loss: 2.4724, Perplexity: 11.8505sel length 14
Epoch [1/3], Step [4021/6471], Loss: 2.4654, Perplexity: 11.7683sel length 10
Epoch [1/3], Step [4022/6471], Loss: 2.3607, Perplexity: 10.5988

Epoch [1/3], Step [4222/6471], Loss: 2.4247, Perplexity: 11.2993sel length 12
Epoch [1/3], Step [4223/6471], Loss: 2.2043, Perplexity: 9.0635sel length 9
Epoch [1/3], Step [4224/6471], Loss: 2.2083, Perplexity: 9.1005sel length 18
Epoch [1/3], Step [4225/6471], Loss: 2.8874, Perplexity: 17.9474sel length 13
Epoch [1/3], Step [4226/6471], Loss: 2.3591, Perplexity: 10.5819sel length 9
Epoch [1/3], Step [4227/6471], Loss: 2.2409, Perplexity: 9.4022sel length 10
Epoch [1/3], Step [4228/6471], Loss: 2.1928, Perplexity: 8.9605sel length 11
Epoch [1/3], Step [4229/6471], Loss: 2.2954, Perplexity: 9.9284sel length 14
Epoch [1/3], Step [4230/6471], Loss: 2.4978, Perplexity: 12.1558sel length 11
Epoch [1/3], Step [4231/6471], Loss: 2.0330, Perplexity: 7.6372sel length 12
Epoch [1/3], Step [4232/6471], Loss: 2.3519, Perplexity: 10.5055sel length 10
Epoch [1/3], Step [4233/6471], Loss: 2.0284, Perplexity: 7.6018sel length 11
Epoch [1/3], Step [4234/6471], Loss: 2.1626, Perplexity: 8.6933

Epoch [1/3], Step [4434/6471], Loss: 2.1451, Perplexity: 8.5428sel length 10
Epoch [1/3], Step [4435/6471], Loss: 2.1979, Perplexity: 9.0061sel length 12
Epoch [1/3], Step [4436/6471], Loss: 2.4168, Perplexity: 11.2099sel length 11
Epoch [1/3], Step [4437/6471], Loss: 2.0981, Perplexity: 8.1510sel length 11
Epoch [1/3], Step [4438/6471], Loss: 2.1340, Perplexity: 8.4488sel length 10
Epoch [1/3], Step [4439/6471], Loss: 2.0931, Perplexity: 8.1102sel length 11
Epoch [1/3], Step [4440/6471], Loss: 2.1457, Perplexity: 8.5477sel length 15
Epoch [1/3], Step [4441/6471], Loss: 2.4861, Perplexity: 12.0143sel length 12
Epoch [1/3], Step [4442/6471], Loss: 2.0394, Perplexity: 7.6856sel length 12
Epoch [1/3], Step [4443/6471], Loss: 2.1252, Perplexity: 8.3746sel length 12
Epoch [1/3], Step [4444/6471], Loss: 2.1914, Perplexity: 8.9475sel length 9
Epoch [1/3], Step [4445/6471], Loss: 2.3758, Perplexity: 10.7597sel length 10
Epoch [1/3], Step [4446/6471], Loss: 2.1298, Perplexity: 8.4134

Epoch [1/3], Step [4646/6471], Loss: 2.1601, Perplexity: 8.6719sel length 14
Epoch [1/3], Step [4647/6471], Loss: 2.6159, Perplexity: 13.6788sel length 11
Epoch [1/3], Step [4648/6471], Loss: 2.1192, Perplexity: 8.3242sel length 14
Epoch [1/3], Step [4649/6471], Loss: 2.5426, Perplexity: 12.7128sel length 10
Epoch [1/3], Step [4650/6471], Loss: 2.1580, Perplexity: 8.6537sel length 10
Epoch [1/3], Step [4651/6471], Loss: 2.1435, Perplexity: 8.5292sel length 7
Epoch [1/3], Step [4652/6471], Loss: 2.3916, Perplexity: 10.9310sel length 13
Epoch [1/3], Step [4653/6471], Loss: 2.3700, Perplexity: 10.6976sel length 15
Epoch [1/3], Step [4654/6471], Loss: 2.5347, Perplexity: 12.6127sel length 17
Epoch [1/3], Step [4655/6471], Loss: 2.8100, Perplexity: 16.6097sel length 15
Epoch [1/3], Step [4656/6471], Loss: 2.3306, Perplexity: 10.2846sel length 11
Epoch [1/3], Step [4657/6471], Loss: 2.0787, Perplexity: 7.9942sel length 10
Epoch [1/3], Step [4658/6471], Loss: 2.2032, Perplexity: 9.0542

Epoch [1/3], Step [4933/6471], Loss: 2.2346, Perplexity: 9.3428sel length 11
Epoch [1/3], Step [4934/6471], Loss: 2.1341, Perplexity: 8.4496sel length 11
Epoch [1/3], Step [4935/6471], Loss: 2.0389, Perplexity: 7.6820sel length 10
Epoch [1/3], Step [4936/6471], Loss: 2.2528, Perplexity: 9.5139sel length 8
Epoch [1/3], Step [4937/6471], Loss: 2.5080, Perplexity: 12.2798sel length 9
Epoch [1/3], Step [4938/6471], Loss: 2.0145, Perplexity: 7.4969sel length 10
Epoch [1/3], Step [4939/6471], Loss: 2.0579, Perplexity: 7.8298sel length 9
Epoch [1/3], Step [4940/6471], Loss: 2.1412, Perplexity: 8.5096sel length 28
Epoch [1/3], Step [4941/6471], Loss: 3.6018, Perplexity: 36.6635sel length 11
Epoch [1/3], Step [4942/6471], Loss: 2.0276, Perplexity: 7.5959sel length 12
Epoch [1/3], Step [4943/6471], Loss: 2.2877, Perplexity: 9.8525sel length 9
Epoch [1/3], Step [4944/6471], Loss: 2.1279, Perplexity: 8.3972sel length 9
Epoch [1/3], Step [4945/6471], Loss: 2.0817, Perplexity: 8.0181sel length 15

Epoch [1/3], Step [5145/6471], Loss: 2.6152, Perplexity: 13.6703sel length 9
Epoch [1/3], Step [5146/6471], Loss: 2.2345, Perplexity: 9.3420sel length 11
Epoch [1/3], Step [5147/6471], Loss: 2.0332, Perplexity: 7.6385sel length 13
Epoch [1/3], Step [5148/6471], Loss: 2.2339, Perplexity: 9.3364sel length 10
Epoch [1/3], Step [5149/6471], Loss: 2.2771, Perplexity: 9.7481sel length 10
Epoch [1/3], Step [5150/6471], Loss: 2.1998, Perplexity: 9.0228sel length 11
Epoch [1/3], Step [5151/6471], Loss: 2.2223, Perplexity: 9.2285sel length 11
Epoch [1/3], Step [5152/6471], Loss: 2.1820, Perplexity: 8.8644sel length 10
Epoch [1/3], Step [5153/6471], Loss: 2.1723, Perplexity: 8.7786sel length 13
Epoch [1/3], Step [5154/6471], Loss: 2.3086, Perplexity: 10.0602sel length 9
Epoch [1/3], Step [5155/6471], Loss: 2.3293, Perplexity: 10.2708sel length 10
Epoch [1/3], Step [5156/6471], Loss: 2.0387, Perplexity: 7.6805sel length 12
Epoch [1/3], Step [5157/6471], Loss: 2.2523, Perplexity: 9.5099

Epoch [1/3], Step [5428/6471], Loss: 2.1046, Perplexity: 8.2038sel length 12
Epoch [1/3], Step [5429/6471], Loss: 2.2321, Perplexity: 9.3190sel length 10
Epoch [1/3], Step [5430/6471], Loss: 1.9388, Perplexity: 6.9504sel length 9
Epoch [1/3], Step [5431/6471], Loss: 2.3212, Perplexity: 10.1875sel length 8
Epoch [1/3], Step [5432/6471], Loss: 2.4533, Perplexity: 11.6269sel length 10
Epoch [1/3], Step [5433/6471], Loss: 2.0570, Perplexity: 7.8224sel length 9
Epoch [1/3], Step [5434/6471], Loss: 2.1509, Perplexity: 8.5925sel length 11
Epoch [1/3], Step [5435/6471], Loss: 2.0851, Perplexity: 8.0456sel length 12
Epoch [1/3], Step [5436/6471], Loss: 2.1005, Perplexity: 8.1706sel length 18
Epoch [1/3], Step [5437/6471], Loss: 2.7020, Perplexity: 14.9102sel length 12
Epoch [1/3], Step [5438/6471], Loss: 2.1914, Perplexity: 8.9474sel length 9
Epoch [1/3], Step [5439/6471], Loss: 2.2496, Perplexity: 9.4843sel length 14
Epoch [1/3], Step [5440/6471], Loss: 2.3731, Perplexity: 10.7306sel length 14

Epoch [1/3], Step [5717/6471], Loss: 2.4285, Perplexity: 11.3417sel length 11
Epoch [1/3], Step [5718/6471], Loss: 1.9984, Perplexity: 7.3774sel length 15
Epoch [1/3], Step [5719/6471], Loss: 2.4510, Perplexity: 11.5994sel length 8
Epoch [1/3], Step [5720/6471], Loss: 2.3190, Perplexity: 10.1656sel length 10
Epoch [1/3], Step [5721/6471], Loss: 2.0768, Perplexity: 7.9788sel length 9
Epoch [1/3], Step [5722/6471], Loss: 2.2807, Perplexity: 9.7833sel length 14
Epoch [1/3], Step [5723/6471], Loss: 2.4071, Perplexity: 11.1012sel length 10
Epoch [1/3], Step [5724/6471], Loss: 2.0935, Perplexity: 8.1130sel length 12
Epoch [1/3], Step [5725/6471], Loss: 2.0509, Perplexity: 7.7748sel length 11
Epoch [1/3], Step [5726/6471], Loss: 2.1787, Perplexity: 8.8346sel length 14
Epoch [1/3], Step [5727/6471], Loss: 2.4281, Perplexity: 11.3370sel length 10
Epoch [1/3], Step [5728/6471], Loss: 1.9484, Perplexity: 7.0176sel length 13
Epoch [1/3], Step [5729/6471], Loss: 2.5789, Perplexity: 13.1829

Epoch [1/3], Step [5929/6471], Loss: 2.0625, Perplexity: 7.8653sel length 13
Epoch [1/3], Step [5930/6471], Loss: 2.3764, Perplexity: 10.7658sel length 16
Epoch [1/3], Step [5931/6471], Loss: 2.6782, Perplexity: 14.5583sel length 18
Epoch [1/3], Step [5932/6471], Loss: 2.8526, Perplexity: 17.3322sel length 13
Epoch [1/3], Step [5933/6471], Loss: 2.3445, Perplexity: 10.4279sel length 11
Epoch [1/3], Step [5934/6471], Loss: 2.0782, Perplexity: 7.9897sel length 10
Epoch [1/3], Step [5935/6471], Loss: 2.0159, Perplexity: 7.5074sel length 10
Epoch [1/3], Step [5936/6471], Loss: 2.0134, Perplexity: 7.4891sel length 12
Epoch [1/3], Step [5937/6471], Loss: 2.3863, Perplexity: 10.8727sel length 11
Epoch [1/3], Step [5938/6471], Loss: 2.0883, Perplexity: 8.0708sel length 21
Epoch [1/3], Step [5939/6471], Loss: 2.8281, Perplexity: 16.9138sel length 9
Epoch [1/3], Step [5940/6471], Loss: 2.1911, Perplexity: 8.9452sel length 13
Epoch [1/3], Step [5941/6471], Loss: 2.2233, Perplexity: 9.2382

Epoch [1/3], Step [6141/6471], Loss: 2.1173, Perplexity: 8.3089sel length 15
Epoch [1/3], Step [6142/6471], Loss: 2.5009, Perplexity: 12.1938sel length 10
Epoch [1/3], Step [6143/6471], Loss: 2.0017, Perplexity: 7.4019sel length 11
Epoch [1/3], Step [6144/6471], Loss: 1.9997, Perplexity: 7.3866sel length 9
Epoch [1/3], Step [6145/6471], Loss: 2.1925, Perplexity: 8.9572sel length 9
Epoch [1/3], Step [6146/6471], Loss: 2.1745, Perplexity: 8.7974sel length 9
Epoch [1/3], Step [6147/6471], Loss: 2.0965, Perplexity: 8.1377sel length 14
Epoch [1/3], Step [6148/6471], Loss: 2.4360, Perplexity: 11.4269sel length 12
Epoch [1/3], Step [6149/6471], Loss: 1.9947, Perplexity: 7.3498sel length 11
Epoch [1/3], Step [6150/6471], Loss: 2.2085, Perplexity: 9.1018sel length 9
Epoch [1/3], Step [6151/6471], Loss: 2.0554, Perplexity: 7.8102sel length 11
Epoch [1/3], Step [6152/6471], Loss: 2.0026, Perplexity: 7.4083sel length 10
Epoch [1/3], Step [6153/6471], Loss: 2.0900, Perplexity: 8.0846sel length 12

Epoch [1/3], Step [6353/6471], Loss: 2.0619, Perplexity: 7.8610sel length 12
Epoch [1/3], Step [6354/6471], Loss: 2.1746, Perplexity: 8.7989sel length 14
Epoch [1/3], Step [6355/6471], Loss: 2.4440, Perplexity: 11.5191sel length 11
Epoch [1/3], Step [6356/6471], Loss: 2.1399, Perplexity: 8.4985sel length 13
Epoch [1/3], Step [6357/6471], Loss: 2.3026, Perplexity: 9.9999sel length 11
Epoch [1/3], Step [6358/6471], Loss: 1.9693, Perplexity: 7.1655sel length 9
Epoch [1/3], Step [6359/6471], Loss: 2.0497, Perplexity: 7.7656sel length 13
Epoch [1/3], Step [6360/6471], Loss: 2.0553, Perplexity: 7.8091sel length 9
Epoch [1/3], Step [6361/6471], Loss: 2.3466, Perplexity: 10.4502sel length 11
Epoch [1/3], Step [6362/6471], Loss: 1.8441, Perplexity: 6.3221sel length 9
Epoch [1/3], Step [6363/6471], Loss: 2.2932, Perplexity: 9.9064sel length 10
Epoch [1/3], Step [6364/6471], Loss: 2.1203, Perplexity: 8.3335sel length 10
Epoch [1/3], Step [6365/6471], Loss: 2.1112, Perplexity: 8.2584sel length 12


Epoch [2/3], Step [97/6471], Loss: 2.0935, Perplexity: 8.1134sel length 10
Epoch [2/3], Step [98/6471], Loss: 2.1260, Perplexity: 8.3811sel length 9
Epoch [2/3], Step [99/6471], Loss: 2.0855, Perplexity: 8.0486sel length 7
Epoch [2/3], Step [100/6471], Loss: 2.4708, Perplexity: 11.8315
sel length 10
Epoch [2/3], Step [101/6471], Loss: 2.1730, Perplexity: 8.7842sel length 16
Epoch [2/3], Step [102/6471], Loss: 2.7251, Perplexity: 15.2577sel length 8
Epoch [2/3], Step [103/6471], Loss: 2.2935, Perplexity: 9.9097sel length 10
Epoch [2/3], Step [104/6471], Loss: 2.0678, Perplexity: 7.9075sel length 12
Epoch [2/3], Step [105/6471], Loss: 2.2735, Perplexity: 9.7135sel length 10
Epoch [2/3], Step [106/6471], Loss: 2.2540, Perplexity: 9.5258sel length 11
Epoch [2/3], Step [107/6471], Loss: 2.2544, Perplexity: 9.5293sel length 17
Epoch [2/3], Step [108/6471], Loss: 2.6357, Perplexity: 13.9535sel length 9
Epoch [2/3], Step [109/6471], Loss: 2.1187, Perplexity: 8.3200sel length 9

Epoch [2/3], Step [205/6471], Loss: 2.2607, Perplexity: 9.5899sel length 9
Epoch [2/3], Step [206/6471], Loss: 2.0845, Perplexity: 8.0409sel length 10
Epoch [2/3], Step [207/6471], Loss: 2.0929, Perplexity: 8.1087sel length 10
Epoch [2/3], Step [208/6471], Loss: 2.0224, Perplexity: 7.5562sel length 9
Epoch [2/3], Step [209/6471], Loss: 2.1638, Perplexity: 8.7042sel length 13
Epoch [2/3], Step [210/6471], Loss: 2.1812, Perplexity: 8.8567sel length 9
Epoch [2/3], Step [211/6471], Loss: 2.0212, Perplexity: 7.5476sel length 10
Epoch [2/3], Step [212/6471], Loss: 2.0684, Perplexity: 7.9119sel length 9
Epoch [2/3], Step [213/6471], Loss: 2.2219, Perplexity: 9.2248sel length 11
Epoch [2/3], Step [214/6471], Loss: 2.1782, Perplexity: 8.8301sel length 11
Epoch [2/3], Step [215/6471], Loss: 2.1946, Perplexity: 8.9762sel length 9
Epoch [2/3], Step [216/6471], Loss: 2.1928, Perplexity: 8.9605sel length 9
Epoch [2/3], Step [217/6471], Loss: 2.2472, Perplexity: 9.4614sel length 8

Epoch [2/3], Step [313/6471], Loss: 2.1255, Perplexity: 8.3768sel length 12
Epoch [2/3], Step [314/6471], Loss: 1.9744, Perplexity: 7.2025sel length 14
Epoch [2/3], Step [315/6471], Loss: 2.3268, Perplexity: 10.2455sel length 11
Epoch [2/3], Step [316/6471], Loss: 2.1392, Perplexity: 8.4922sel length 8
Epoch [2/3], Step [317/6471], Loss: 2.4085, Perplexity: 11.1172sel length 12
Epoch [2/3], Step [318/6471], Loss: 1.9139, Perplexity: 6.7793sel length 12
Epoch [2/3], Step [319/6471], Loss: 2.1577, Perplexity: 8.6515sel length 9
Epoch [2/3], Step [320/6471], Loss: 2.2947, Perplexity: 9.9216sel length 17
Epoch [2/3], Step [321/6471], Loss: 2.8537, Perplexity: 17.3521sel length 9
Epoch [2/3], Step [322/6471], Loss: 2.3245, Perplexity: 10.2216sel length 9
Epoch [2/3], Step [323/6471], Loss: 2.3169, Perplexity: 10.1438sel length 10
Epoch [2/3], Step [324/6471], Loss: 2.1510, Perplexity: 8.5935sel length 11
Epoch [2/3], Step [325/6471], Loss: 2.0118, Perplexity: 7.4768sel length 13

Epoch [2/3], Step [421/6471], Loss: 2.1707, Perplexity: 8.7646sel length 11
Epoch [2/3], Step [422/6471], Loss: 2.1567, Perplexity: 8.6423sel length 13
Epoch [2/3], Step [423/6471], Loss: 2.2290, Perplexity: 9.2906sel length 13
Epoch [2/3], Step [424/6471], Loss: 2.0323, Perplexity: 7.6316sel length 13
Epoch [2/3], Step [425/6471], Loss: 2.1627, Perplexity: 8.6946sel length 12
Epoch [2/3], Step [426/6471], Loss: 2.0963, Perplexity: 8.1364sel length 11
Epoch [2/3], Step [427/6471], Loss: 1.9110, Perplexity: 6.7598sel length 9
Epoch [2/3], Step [428/6471], Loss: 2.4075, Perplexity: 11.1067sel length 12
Epoch [2/3], Step [429/6471], Loss: 2.2753, Perplexity: 9.7311sel length 14
Epoch [2/3], Step [430/6471], Loss: 2.3126, Perplexity: 10.1010sel length 11
Epoch [2/3], Step [431/6471], Loss: 2.1502, Perplexity: 8.5867sel length 10
Epoch [2/3], Step [432/6471], Loss: 1.9766, Perplexity: 7.2178sel length 8
Epoch [2/3], Step [433/6471], Loss: 2.2803, Perplexity: 9.7796sel length 11

Epoch [2/3], Step [592/6471], Loss: 2.1567, Perplexity: 8.6426sel length 12
Epoch [2/3], Step [593/6471], Loss: 2.0841, Perplexity: 8.0371sel length 12
Epoch [2/3], Step [594/6471], Loss: 2.0391, Perplexity: 7.6840sel length 10
Epoch [2/3], Step [595/6471], Loss: 2.1035, Perplexity: 8.1944sel length 8
Epoch [2/3], Step [596/6471], Loss: 2.0767, Perplexity: 7.9782sel length 14
Epoch [2/3], Step [597/6471], Loss: 2.4322, Perplexity: 11.3837sel length 11
Epoch [2/3], Step [598/6471], Loss: 1.9949, Perplexity: 7.3517sel length 11
Epoch [2/3], Step [599/6471], Loss: 2.2678, Perplexity: 9.6582sel length 22
Epoch [2/3], Step [600/6471], Loss: 2.8080, Perplexity: 16.5769
sel length 12
Epoch [2/3], Step [601/6471], Loss: 2.0643, Perplexity: 7.8796sel length 12
Epoch [2/3], Step [602/6471], Loss: 2.2420, Perplexity: 9.4125sel length 12
Epoch [2/3], Step [603/6471], Loss: 2.0731, Perplexity: 7.9497sel length 11
Epoch [2/3], Step [604/6471], Loss: 1.9450, Perplexity: 6.9935sel length 14

Epoch [2/3], Step [700/6471], Loss: 2.1758, Perplexity: 8.8092
sel length 11
Epoch [2/3], Step [701/6471], Loss: 2.1804, Perplexity: 8.8495sel length 45
Epoch [2/3], Step [702/6471], Loss: 4.4978, Perplexity: 89.8170sel length 14
Epoch [2/3], Step [703/6471], Loss: 2.2497, Perplexity: 9.4844sel length 11
Epoch [2/3], Step [704/6471], Loss: 1.9839, Perplexity: 7.2707sel length 11
Epoch [2/3], Step [705/6471], Loss: 2.2545, Perplexity: 9.5302sel length 11
Epoch [2/3], Step [706/6471], Loss: 2.1999, Perplexity: 9.0245sel length 9
Epoch [2/3], Step [707/6471], Loss: 2.0894, Perplexity: 8.0799sel length 11
Epoch [2/3], Step [708/6471], Loss: 2.0487, Perplexity: 7.7579sel length 11
Epoch [2/3], Step [709/6471], Loss: 2.0952, Perplexity: 8.1267sel length 14
Epoch [2/3], Step [710/6471], Loss: 2.2871, Perplexity: 9.8463sel length 11
Epoch [2/3], Step [711/6471], Loss: 2.0464, Perplexity: 7.7398sel length 15
Epoch [2/3], Step [712/6471], Loss: 2.3197, Perplexity: 10.1726sel length 14

Epoch [2/3], Step [895/6471], Loss: 2.1372, Perplexity: 8.4758sel length 13
Epoch [2/3], Step [896/6471], Loss: 2.1742, Perplexity: 8.7951sel length 10
Epoch [2/3], Step [897/6471], Loss: 2.0265, Perplexity: 7.5873sel length 12
Epoch [2/3], Step [898/6471], Loss: 2.1886, Perplexity: 8.9227sel length 12
Epoch [2/3], Step [899/6471], Loss: 2.1124, Perplexity: 8.2678sel length 12
Epoch [2/3], Step [900/6471], Loss: 2.1734, Perplexity: 8.7880
sel length 12
Epoch [2/3], Step [901/6471], Loss: 2.1299, Perplexity: 8.4143sel length 10
Epoch [2/3], Step [902/6471], Loss: 2.1673, Perplexity: 8.7348sel length 13
Epoch [2/3], Step [903/6471], Loss: 2.2306, Perplexity: 9.3055sel length 11
Epoch [2/3], Step [904/6471], Loss: 2.1120, Perplexity: 8.2645sel length 11
Epoch [2/3], Step [905/6471], Loss: 1.9305, Perplexity: 6.8926sel length 12
Epoch [2/3], Step [906/6471], Loss: 2.1704, Perplexity: 8.7615sel length 9
Epoch [2/3], Step [907/6471], Loss: 2.3413, Perplexity: 10.3950sel length 9

Epoch [2/3], Step [1108/6471], Loss: 2.0499, Perplexity: 7.7668sel length 11
Epoch [2/3], Step [1109/6471], Loss: 1.9474, Perplexity: 7.0102sel length 13
Epoch [2/3], Step [1110/6471], Loss: 2.3991, Perplexity: 11.0135sel length 11
Epoch [2/3], Step [1111/6471], Loss: 2.1335, Perplexity: 8.4443sel length 10
Epoch [2/3], Step [1112/6471], Loss: 2.1700, Perplexity: 8.7582sel length 10
Epoch [2/3], Step [1113/6471], Loss: 2.0063, Perplexity: 7.4355sel length 18
Epoch [2/3], Step [1114/6471], Loss: 2.6961, Perplexity: 14.8224sel length 12
Epoch [2/3], Step [1115/6471], Loss: 2.0510, Perplexity: 7.7754sel length 13
Epoch [2/3], Step [1116/6471], Loss: 2.1764, Perplexity: 8.8149sel length 11
Epoch [2/3], Step [1117/6471], Loss: 1.9965, Perplexity: 7.3636sel length 11
Epoch [2/3], Step [1118/6471], Loss: 2.0339, Perplexity: 7.6437sel length 10
Epoch [2/3], Step [1119/6471], Loss: 2.0113, Perplexity: 7.4731sel length 14
Epoch [2/3], Step [1120/6471], Loss: 2.3141, Perplexity: 10.1162

Epoch [2/3], Step [1320/6471], Loss: 1.8384, Perplexity: 6.2862sel length 11
Epoch [2/3], Step [1321/6471], Loss: 1.9024, Perplexity: 6.7020sel length 7
Epoch [2/3], Step [1322/6471], Loss: 2.3722, Perplexity: 10.7213sel length 13
Epoch [2/3], Step [1323/6471], Loss: 2.0293, Perplexity: 7.6086sel length 15
Epoch [2/3], Step [1324/6471], Loss: 2.4811, Perplexity: 11.9539sel length 14
Epoch [2/3], Step [1325/6471], Loss: 2.2747, Perplexity: 9.7249sel length 9
Epoch [2/3], Step [1326/6471], Loss: 2.1647, Perplexity: 8.7122sel length 11
Epoch [2/3], Step [1327/6471], Loss: 2.1179, Perplexity: 8.3139sel length 10
Epoch [2/3], Step [1328/6471], Loss: 2.2413, Perplexity: 9.4051sel length 11
Epoch [2/3], Step [1329/6471], Loss: 2.1741, Perplexity: 8.7945sel length 12
Epoch [2/3], Step [1330/6471], Loss: 1.9938, Perplexity: 7.3433sel length 12
Epoch [2/3], Step [1331/6471], Loss: 2.2709, Perplexity: 9.6880sel length 13
Epoch [2/3], Step [1332/6471], Loss: 2.2551, Perplexity: 9.5358sel length 13

Epoch [2/3], Step [1532/6471], Loss: 2.1472, Perplexity: 8.5613sel length 12
Epoch [2/3], Step [1533/6471], Loss: 2.1270, Perplexity: 8.3901sel length 9
Epoch [2/3], Step [1534/6471], Loss: 2.1375, Perplexity: 8.4780sel length 11
Epoch [2/3], Step [1535/6471], Loss: 2.0373, Perplexity: 7.6701sel length 15
Epoch [2/3], Step [1536/6471], Loss: 2.3717, Perplexity: 10.7153sel length 11
Epoch [2/3], Step [1537/6471], Loss: 2.1351, Perplexity: 8.4578sel length 13
Epoch [2/3], Step [1538/6471], Loss: 2.2677, Perplexity: 9.6571sel length 10
Epoch [2/3], Step [1539/6471], Loss: 2.1834, Perplexity: 8.8761sel length 11
Epoch [2/3], Step [1540/6471], Loss: 1.9564, Perplexity: 7.0740sel length 12
Epoch [2/3], Step [1541/6471], Loss: 2.0248, Perplexity: 7.5749sel length 8
Epoch [2/3], Step [1542/6471], Loss: 2.3089, Perplexity: 10.0635sel length 10
Epoch [2/3], Step [1543/6471], Loss: 1.9864, Perplexity: 7.2892sel length 10
Epoch [2/3], Step [1544/6471], Loss: 1.9808, Perplexity: 7.2487sel length 10

Epoch [2/3], Step [1744/6471], Loss: 2.0771, Perplexity: 7.9812sel length 9
Epoch [2/3], Step [1745/6471], Loss: 1.9106, Perplexity: 6.7568sel length 20
Epoch [2/3], Step [1746/6471], Loss: 2.8730, Perplexity: 17.6904sel length 9
Epoch [2/3], Step [1747/6471], Loss: 2.0324, Perplexity: 7.6323sel length 11
Epoch [2/3], Step [1748/6471], Loss: 1.9566, Perplexity: 7.0753sel length 12
Epoch [2/3], Step [1749/6471], Loss: 2.1203, Perplexity: 8.3340sel length 13
Epoch [2/3], Step [1750/6471], Loss: 2.1494, Perplexity: 8.5795sel length 9
Epoch [2/3], Step [1751/6471], Loss: 2.0412, Perplexity: 7.6996sel length 10
Epoch [2/3], Step [1752/6471], Loss: 2.1950, Perplexity: 8.9799sel length 13
Epoch [2/3], Step [1753/6471], Loss: 2.1596, Perplexity: 8.6675sel length 16
Epoch [2/3], Step [1754/6471], Loss: 2.4530, Perplexity: 11.6234sel length 10
Epoch [2/3], Step [1755/6471], Loss: 2.0021, Perplexity: 7.4042sel length 10
Epoch [2/3], Step [1756/6471], Loss: 1.8760, Perplexity: 6.5273sel length 11


Epoch [2/3], Step [1956/6471], Loss: 1.8902, Perplexity: 6.6206sel length 12
Epoch [2/3], Step [1957/6471], Loss: 2.0157, Perplexity: 7.5059sel length 9
Epoch [2/3], Step [1958/6471], Loss: 2.0995, Perplexity: 8.1619sel length 12
Epoch [2/3], Step [1959/6471], Loss: 2.2590, Perplexity: 9.5737sel length 11
Epoch [2/3], Step [1960/6471], Loss: 1.9285, Perplexity: 6.8790sel length 10
Epoch [2/3], Step [1961/6471], Loss: 2.0402, Perplexity: 7.6918sel length 11
Epoch [2/3], Step [1962/6471], Loss: 1.9729, Perplexity: 7.1915sel length 16
Epoch [2/3], Step [1963/6471], Loss: 2.5137, Perplexity: 12.3509sel length 11
Epoch [2/3], Step [1964/6471], Loss: 1.9835, Perplexity: 7.2679sel length 9
Epoch [2/3], Step [1965/6471], Loss: 2.1353, Perplexity: 8.4594sel length 8
Epoch [2/3], Step [1966/6471], Loss: 2.2768, Perplexity: 9.7454sel length 12
Epoch [2/3], Step [1967/6471], Loss: 2.1156, Perplexity: 8.2949sel length 9
Epoch [2/3], Step [1968/6471], Loss: 2.1543, Perplexity: 8.6216sel length 18

Epoch [2/3], Step [2168/6471], Loss: 2.3089, Perplexity: 10.0634sel length 9
Epoch [2/3], Step [2169/6471], Loss: 1.9620, Perplexity: 7.1138sel length 10
Epoch [2/3], Step [2170/6471], Loss: 1.8967, Perplexity: 6.6640sel length 9
Epoch [2/3], Step [2171/6471], Loss: 2.0848, Perplexity: 8.0429sel length 13
Epoch [2/3], Step [2172/6471], Loss: 2.0343, Perplexity: 7.6466sel length 12
Epoch [2/3], Step [2173/6471], Loss: 2.1454, Perplexity: 8.5451sel length 17
Epoch [2/3], Step [2174/6471], Loss: 2.5895, Perplexity: 13.3225sel length 12
Epoch [2/3], Step [2175/6471], Loss: 1.9890, Perplexity: 7.3081sel length 11
Epoch [2/3], Step [2176/6471], Loss: 1.8844, Perplexity: 6.5822sel length 9
Epoch [2/3], Step [2177/6471], Loss: 2.1821, Perplexity: 8.8648sel length 10
Epoch [2/3], Step [2178/6471], Loss: 2.0315, Perplexity: 7.6253sel length 10
Epoch [2/3], Step [2179/6471], Loss: 2.0812, Perplexity: 8.0139sel length 12
Epoch [2/3], Step [2180/6471], Loss: 2.0608, Perplexity: 7.8520sel length 13


Epoch [2/3], Step [2380/6471], Loss: 2.0141, Perplexity: 7.4940sel length 12
Epoch [2/3], Step [2381/6471], Loss: 2.1805, Perplexity: 8.8511sel length 10
Epoch [2/3], Step [2382/6471], Loss: 1.9763, Perplexity: 7.2159sel length 13
Epoch [2/3], Step [2383/6471], Loss: 2.1695, Perplexity: 8.7537sel length 13
Epoch [2/3], Step [2384/6471], Loss: 2.1552, Perplexity: 8.6294sel length 10
Epoch [2/3], Step [2385/6471], Loss: 1.8849, Perplexity: 6.5857sel length 10
Epoch [2/3], Step [2386/6471], Loss: 2.0712, Perplexity: 7.9345sel length 9
Epoch [2/3], Step [2387/6471], Loss: 2.1240, Perplexity: 8.3649sel length 13
Epoch [2/3], Step [2388/6471], Loss: 2.2074, Perplexity: 9.0921sel length 13
Epoch [2/3], Step [2389/6471], Loss: 2.1810, Perplexity: 8.8556sel length 9
Epoch [2/3], Step [2390/6471], Loss: 2.1064, Perplexity: 8.2183sel length 17
Epoch [2/3], Step [2391/6471], Loss: 2.5629, Perplexity: 12.9735sel length 9
Epoch [2/3], Step [2392/6471], Loss: 2.1666, Perplexity: 8.7286sel length 9

Epoch [2/3], Step [2592/6471], Loss: 2.0851, Perplexity: 8.0454sel length 12
Epoch [2/3], Step [2593/6471], Loss: 1.9801, Perplexity: 7.2437sel length 19
Epoch [2/3], Step [2594/6471], Loss: 2.7966, Perplexity: 16.3895sel length 9
Epoch [2/3], Step [2595/6471], Loss: 2.0559, Perplexity: 7.8142sel length 12
Epoch [2/3], Step [2596/6471], Loss: 2.1292, Perplexity: 8.4077sel length 13
Epoch [2/3], Step [2597/6471], Loss: 2.2201, Perplexity: 9.2081sel length 10
Epoch [2/3], Step [2598/6471], Loss: 1.9130, Perplexity: 6.7737sel length 11
Epoch [2/3], Step [2599/6471], Loss: 1.9062, Perplexity: 6.7276sel length 9
Epoch [2/3], Step [2600/6471], Loss: 2.1161, Perplexity: 8.2984
sel length 13
Epoch [2/3], Step [2601/6471], Loss: 2.2215, Perplexity: 9.2210sel length 9
Epoch [2/3], Step [2602/6471], Loss: 1.9795, Perplexity: 7.2390sel length 9
Epoch [2/3], Step [2603/6471], Loss: 2.2082, Perplexity: 9.0992sel length 9
Epoch [2/3], Step [2604/6471], Loss: 1.9516, Perplexity: 7.0397sel length 10

Epoch [2/3], Step [2804/6471], Loss: 3.6404, Perplexity: 38.1054sel length 16
Epoch [2/3], Step [2805/6471], Loss: 2.3476, Perplexity: 10.4602sel length 9
Epoch [2/3], Step [2806/6471], Loss: 1.9797, Perplexity: 7.2405sel length 9
Epoch [2/3], Step [2807/6471], Loss: 1.9998, Perplexity: 7.3878sel length 15
Epoch [2/3], Step [2808/6471], Loss: 2.5539, Perplexity: 12.8567sel length 10
Epoch [2/3], Step [2809/6471], Loss: 1.9815, Perplexity: 7.2537sel length 11
Epoch [2/3], Step [2810/6471], Loss: 2.1480, Perplexity: 8.5673sel length 9
Epoch [2/3], Step [2811/6471], Loss: 2.1646, Perplexity: 8.7110sel length 16
Epoch [2/3], Step [2812/6471], Loss: 2.4973, Perplexity: 12.1501sel length 11
Epoch [2/3], Step [2813/6471], Loss: 1.9306, Perplexity: 6.8935sel length 13
Epoch [2/3], Step [2814/6471], Loss: 2.2268, Perplexity: 9.2698sel length 10
Epoch [2/3], Step [2815/6471], Loss: 2.1717, Perplexity: 8.7732sel length 12
Epoch [2/3], Step [2816/6471], Loss: 2.0384, Perplexity: 7.6784sel length 1

Epoch [2/3], Step [3016/6471], Loss: 2.1334, Perplexity: 8.4436sel length 17
Epoch [2/3], Step [3017/6471], Loss: 2.4715, Perplexity: 11.8407sel length 11
Epoch [2/3], Step [3018/6471], Loss: 2.0000, Perplexity: 7.3890sel length 10
Epoch [2/3], Step [3019/6471], Loss: 2.0489, Perplexity: 7.7595sel length 10
Epoch [2/3], Step [3020/6471], Loss: 2.0535, Perplexity: 7.7948sel length 13
Epoch [2/3], Step [3021/6471], Loss: 2.1611, Perplexity: 8.6809sel length 10
Epoch [2/3], Step [3022/6471], Loss: 2.1912, Perplexity: 8.9456sel length 12
Epoch [2/3], Step [3023/6471], Loss: 2.2172, Perplexity: 9.1818sel length 17
Epoch [2/3], Step [3024/6471], Loss: 2.4268, Perplexity: 11.3225sel length 11
Epoch [2/3], Step [3025/6471], Loss: 1.9675, Perplexity: 7.1529sel length 9
Epoch [2/3], Step [3026/6471], Loss: 2.2326, Perplexity: 9.3244sel length 13
Epoch [2/3], Step [3027/6471], Loss: 2.1724, Perplexity: 8.7793sel length 9
Epoch [2/3], Step [3028/6471], Loss: 2.0709, Perplexity: 7.9322sel length 12

Epoch [2/3], Step [3228/6471], Loss: 2.0791, Perplexity: 7.9972sel length 11
Epoch [2/3], Step [3229/6471], Loss: 1.9564, Perplexity: 7.0739sel length 13
Epoch [2/3], Step [3230/6471], Loss: 2.1507, Perplexity: 8.5909sel length 9
Epoch [2/3], Step [3231/6471], Loss: 2.2907, Perplexity: 9.8823sel length 9
Epoch [2/3], Step [3232/6471], Loss: 1.8721, Perplexity: 6.5020sel length 9
Epoch [2/3], Step [3233/6471], Loss: 2.0396, Perplexity: 7.6879sel length 12
Epoch [2/3], Step [3234/6471], Loss: 1.8644, Perplexity: 6.4520sel length 9
Epoch [2/3], Step [3235/6471], Loss: 2.0995, Perplexity: 8.1622sel length 9
Epoch [2/3], Step [3236/6471], Loss: 2.1006, Perplexity: 8.1709sel length 11
Epoch [2/3], Step [3237/6471], Loss: 1.9816, Perplexity: 7.2545sel length 9
Epoch [2/3], Step [3238/6471], Loss: 2.2097, Perplexity: 9.1134sel length 11
Epoch [2/3], Step [3239/6471], Loss: 1.8398, Perplexity: 6.2954sel length 10
Epoch [2/3], Step [3240/6471], Loss: 1.9914, Perplexity: 7.3257sel length 10

Epoch [2/3], Step [3440/6471], Loss: 2.0280, Perplexity: 7.5988sel length 9
Epoch [2/3], Step [3441/6471], Loss: 2.0029, Perplexity: 7.4104sel length 9
Epoch [2/3], Step [3442/6471], Loss: 1.9751, Perplexity: 7.2077sel length 9
Epoch [2/3], Step [3443/6471], Loss: 2.0993, Perplexity: 8.1608sel length 12
Epoch [2/3], Step [3444/6471], Loss: 2.0917, Perplexity: 8.0986sel length 9
Epoch [2/3], Step [3445/6471], Loss: 1.9178, Perplexity: 6.8062sel length 17
Epoch [2/3], Step [3446/6471], Loss: 2.6533, Perplexity: 14.2011sel length 8
Epoch [2/3], Step [3447/6471], Loss: 2.1877, Perplexity: 8.9146sel length 10
Epoch [2/3], Step [3448/6471], Loss: 1.8523, Perplexity: 6.3746sel length 12
Epoch [2/3], Step [3449/6471], Loss: 1.9198, Perplexity: 6.8194sel length 10
Epoch [2/3], Step [3450/6471], Loss: 1.9431, Perplexity: 6.9802sel length 8
Epoch [2/3], Step [3451/6471], Loss: 2.2965, Perplexity: 9.9390sel length 11
Epoch [2/3], Step [3452/6471], Loss: 1.9764, Perplexity: 7.2167sel length 8

Epoch [2/3], Step [3652/6471], Loss: 2.2360, Perplexity: 9.3560sel length 13
Epoch [2/3], Step [3653/6471], Loss: 2.1554, Perplexity: 8.6315sel length 9
Epoch [2/3], Step [3654/6471], Loss: 1.9993, Perplexity: 7.3837sel length 11
Epoch [2/3], Step [3655/6471], Loss: 1.8588, Perplexity: 6.4161sel length 18
Epoch [2/3], Step [3656/6471], Loss: 2.6322, Perplexity: 13.9041sel length 18
Epoch [2/3], Step [3657/6471], Loss: 2.6279, Perplexity: 13.8446sel length 12
Epoch [2/3], Step [3658/6471], Loss: 2.0257, Perplexity: 7.5812sel length 11
Epoch [2/3], Step [3659/6471], Loss: 2.1793, Perplexity: 8.8402sel length 10
Epoch [2/3], Step [3660/6471], Loss: 2.0710, Perplexity: 7.9325sel length 13
Epoch [2/3], Step [3661/6471], Loss: 1.9967, Perplexity: 7.3645sel length 8
Epoch [2/3], Step [3662/6471], Loss: 2.3732, Perplexity: 10.7319sel length 11
Epoch [2/3], Step [3663/6471], Loss: 1.9913, Perplexity: 7.3249sel length 13
Epoch [2/3], Step [3664/6471], Loss: 2.2356, Perplexity: 9.3521sel length 1

Epoch [2/3], Step [3864/6471], Loss: 1.9357, Perplexity: 6.9290sel length 11
Epoch [2/3], Step [3865/6471], Loss: 1.9879, Perplexity: 7.2999sel length 15
Epoch [2/3], Step [3866/6471], Loss: 2.3164, Perplexity: 10.1394sel length 11
Epoch [2/3], Step [3867/6471], Loss: 2.0474, Perplexity: 7.7480sel length 10
Epoch [2/3], Step [3868/6471], Loss: 2.0671, Perplexity: 7.9020sel length 14
Epoch [2/3], Step [3869/6471], Loss: 2.2987, Perplexity: 9.9608sel length 9
Epoch [2/3], Step [3870/6471], Loss: 2.2909, Perplexity: 9.8839sel length 13
Epoch [2/3], Step [3871/6471], Loss: 1.9442, Perplexity: 6.9881sel length 12
Epoch [2/3], Step [3872/6471], Loss: 2.0782, Perplexity: 7.9902sel length 12
Epoch [2/3], Step [3873/6471], Loss: 2.0098, Perplexity: 7.4617sel length 10
Epoch [2/3], Step [3874/6471], Loss: 1.9322, Perplexity: 6.9046sel length 14
Epoch [2/3], Step [3875/6471], Loss: 2.2405, Perplexity: 9.3976sel length 10
Epoch [2/3], Step [3876/6471], Loss: 1.8998, Perplexity: 6.6849sel length 10

Epoch [2/3], Step [4076/6471], Loss: 1.9617, Perplexity: 7.1118sel length 11
Epoch [2/3], Step [4077/6471], Loss: 1.8367, Perplexity: 6.2759sel length 13
Epoch [2/3], Step [4078/6471], Loss: 2.0870, Perplexity: 8.0605sel length 8
Epoch [2/3], Step [4079/6471], Loss: 2.3924, Perplexity: 10.9400sel length 8
Epoch [2/3], Step [4080/6471], Loss: 2.2578, Perplexity: 9.5623sel length 11
Epoch [2/3], Step [4081/6471], Loss: 2.0371, Perplexity: 7.6681sel length 9
Epoch [2/3], Step [4082/6471], Loss: 1.9117, Perplexity: 6.7643sel length 13
Epoch [2/3], Step [4083/6471], Loss: 2.1736, Perplexity: 8.7901sel length 10
Epoch [2/3], Step [4084/6471], Loss: 1.9717, Perplexity: 7.1829sel length 10
Epoch [2/3], Step [4085/6471], Loss: 2.0646, Perplexity: 7.8825sel length 14
Epoch [2/3], Step [4086/6471], Loss: 2.3087, Perplexity: 10.0613sel length 12
Epoch [2/3], Step [4087/6471], Loss: 2.1246, Perplexity: 8.3698sel length 11
Epoch [2/3], Step [4088/6471], Loss: 2.0476, Perplexity: 7.7494sel length 9

Epoch [2/3], Step [4288/6471], Loss: 2.0685, Perplexity: 7.9133sel length 10
Epoch [2/3], Step [4289/6471], Loss: 1.9129, Perplexity: 6.7727sel length 9
Epoch [2/3], Step [4290/6471], Loss: 1.8716, Perplexity: 6.4984sel length 14
Epoch [2/3], Step [4291/6471], Loss: 2.3652, Perplexity: 10.6457sel length 11
Epoch [2/3], Step [4292/6471], Loss: 2.0109, Perplexity: 7.4700sel length 16
Epoch [2/3], Step [4293/6471], Loss: 2.4851, Perplexity: 12.0019sel length 9
Epoch [2/3], Step [4294/6471], Loss: 1.9516, Perplexity: 7.0397sel length 11
Epoch [2/3], Step [4295/6471], Loss: 1.9147, Perplexity: 6.7850sel length 11
Epoch [2/3], Step [4296/6471], Loss: 1.9029, Perplexity: 6.7052sel length 10
Epoch [2/3], Step [4297/6471], Loss: 1.9850, Perplexity: 7.2788sel length 11
Epoch [2/3], Step [4298/6471], Loss: 1.9717, Perplexity: 7.1827sel length 11
Epoch [2/3], Step [4299/6471], Loss: 1.9576, Perplexity: 7.0822sel length 11
Epoch [2/3], Step [4300/6471], Loss: 2.0263, Perplexity: 7.5856
sel length 1

Epoch [2/3], Step [4500/6471], Loss: 2.2137, Perplexity: 9.1499
sel length 10
Epoch [2/3], Step [4501/6471], Loss: 1.9247, Perplexity: 6.8528sel length 8
Epoch [2/3], Step [4502/6471], Loss: 2.5245, Perplexity: 12.4847sel length 9
Epoch [2/3], Step [4503/6471], Loss: 2.1214, Perplexity: 8.3424sel length 13
Epoch [2/3], Step [4504/6471], Loss: 2.2576, Perplexity: 9.5600sel length 15
Epoch [2/3], Step [4505/6471], Loss: 2.3925, Perplexity: 10.9405sel length 9
Epoch [2/3], Step [4506/6471], Loss: 1.8884, Perplexity: 6.6086sel length 16
Epoch [2/3], Step [4507/6471], Loss: 2.3754, Perplexity: 10.7558sel length 11
Epoch [2/3], Step [4508/6471], Loss: 2.0418, Perplexity: 7.7047sel length 10
Epoch [2/3], Step [4509/6471], Loss: 2.0395, Perplexity: 7.6868sel length 12
Epoch [2/3], Step [4510/6471], Loss: 2.1778, Perplexity: 8.8264sel length 11
Epoch [2/3], Step [4511/6471], Loss: 2.0546, Perplexity: 7.8040sel length 10
Epoch [2/3], Step [4512/6471], Loss: 1.9225, Perplexity: 6.8380sel length 1

Epoch [2/3], Step [4712/6471], Loss: 1.9066, Perplexity: 6.7299sel length 13
Epoch [2/3], Step [4713/6471], Loss: 1.9441, Perplexity: 6.9876sel length 11
Epoch [2/3], Step [4714/6471], Loss: 1.9152, Perplexity: 6.7886sel length 14
Epoch [2/3], Step [4715/6471], Loss: 2.0944, Perplexity: 8.1205sel length 11
Epoch [2/3], Step [4716/6471], Loss: 2.0146, Perplexity: 7.4979sel length 10
Epoch [2/3], Step [4717/6471], Loss: 2.0365, Perplexity: 7.6638sel length 16
Epoch [2/3], Step [4718/6471], Loss: 2.7194, Perplexity: 15.1719sel length 13
Epoch [2/3], Step [4719/6471], Loss: 1.9966, Perplexity: 7.3637sel length 9
Epoch [2/3], Step [4720/6471], Loss: 2.0374, Perplexity: 7.6710sel length 9
Epoch [2/3], Step [4721/6471], Loss: 2.0390, Perplexity: 7.6827sel length 13
Epoch [2/3], Step [4722/6471], Loss: 1.8981, Perplexity: 6.6733sel length 12
Epoch [2/3], Step [4723/6471], Loss: 2.1491, Perplexity: 8.5768sel length 8
Epoch [2/3], Step [4724/6471], Loss: 2.1342, Perplexity: 8.4507sel length 9

Epoch [2/3], Step [4924/6471], Loss: 1.9153, Perplexity: 6.7891sel length 9
Epoch [2/3], Step [4925/6471], Loss: 2.2005, Perplexity: 9.0295sel length 10
Epoch [2/3], Step [4926/6471], Loss: 1.9163, Perplexity: 6.7958sel length 9
Epoch [2/3], Step [4927/6471], Loss: 1.9710, Perplexity: 7.1778sel length 8
Epoch [2/3], Step [4928/6471], Loss: 2.3563, Perplexity: 10.5517sel length 13
Epoch [2/3], Step [4929/6471], Loss: 2.0968, Perplexity: 8.1400sel length 24
Epoch [2/3], Step [4930/6471], Loss: 3.0002, Perplexity: 20.0894sel length 9
Epoch [2/3], Step [4931/6471], Loss: 2.0344, Perplexity: 7.6477sel length 9
Epoch [2/3], Step [4932/6471], Loss: 2.0708, Perplexity: 7.9313sel length 13
Epoch [2/3], Step [4933/6471], Loss: 2.2550, Perplexity: 9.5355sel length 10
Epoch [2/3], Step [4934/6471], Loss: 2.0741, Perplexity: 7.9572sel length 11
Epoch [2/3], Step [4935/6471], Loss: 1.9414, Perplexity: 6.9685sel length 16
Epoch [2/3], Step [4936/6471], Loss: 2.4899, Perplexity: 12.0598sel length 11

Epoch [2/3], Step [5136/6471], Loss: 1.9660, Perplexity: 7.1419sel length 12
Epoch [2/3], Step [5137/6471], Loss: 2.1839, Perplexity: 8.8812sel length 12
Epoch [2/3], Step [5138/6471], Loss: 1.9154, Perplexity: 6.7899sel length 8
Epoch [2/3], Step [5139/6471], Loss: 2.2676, Perplexity: 9.6566sel length 11
Epoch [2/3], Step [5140/6471], Loss: 1.8842, Perplexity: 6.5809sel length 12
Epoch [2/3], Step [5141/6471], Loss: 1.9514, Perplexity: 7.0384sel length 10
Epoch [2/3], Step [5142/6471], Loss: 1.8598, Perplexity: 6.4226sel length 9
Epoch [2/3], Step [5143/6471], Loss: 1.9206, Perplexity: 6.8248sel length 13
Epoch [2/3], Step [5144/6471], Loss: 2.0264, Perplexity: 7.5871sel length 11
Epoch [2/3], Step [5145/6471], Loss: 1.8460, Perplexity: 6.3343sel length 11
Epoch [2/3], Step [5146/6471], Loss: 1.9496, Perplexity: 7.0259sel length 9
Epoch [2/3], Step [5147/6471], Loss: 1.9828, Perplexity: 7.2634sel length 9
Epoch [2/3], Step [5148/6471], Loss: 2.0832, Perplexity: 8.0299sel length 13

Epoch [2/3], Step [5348/6471], Loss: 1.8700, Perplexity: 6.4886sel length 16
Epoch [2/3], Step [5349/6471], Loss: 2.4906, Perplexity: 12.0680sel length 13
Epoch [2/3], Step [5350/6471], Loss: 2.0712, Perplexity: 7.9340sel length 13
Epoch [2/3], Step [5351/6471], Loss: 2.1982, Perplexity: 9.0091sel length 13
Epoch [2/3], Step [5352/6471], Loss: 2.1394, Perplexity: 8.4946sel length 11
Epoch [2/3], Step [5353/6471], Loss: 1.8443, Perplexity: 6.3238sel length 10
Epoch [2/3], Step [5354/6471], Loss: 1.9527, Perplexity: 7.0479sel length 9
Epoch [2/3], Step [5355/6471], Loss: 1.9470, Perplexity: 7.0076sel length 8
Epoch [2/3], Step [5356/6471], Loss: 2.2951, Perplexity: 9.9259sel length 9
Epoch [2/3], Step [5357/6471], Loss: 1.9190, Perplexity: 6.8143sel length 11
Epoch [2/3], Step [5358/6471], Loss: 2.0161, Perplexity: 7.5089sel length 11
Epoch [2/3], Step [5359/6471], Loss: 1.8723, Perplexity: 6.5030sel length 12
Epoch [2/3], Step [5360/6471], Loss: 1.9332, Perplexity: 6.9118sel length 10

Epoch [2/3], Step [5560/6471], Loss: 1.7908, Perplexity: 5.9940sel length 10
Epoch [2/3], Step [5561/6471], Loss: 1.7092, Perplexity: 5.5244sel length 11
Epoch [2/3], Step [5562/6471], Loss: 2.0627, Perplexity: 7.8673sel length 16
Epoch [2/3], Step [5563/6471], Loss: 2.3179, Perplexity: 10.1540sel length 11
Epoch [2/3], Step [5564/6471], Loss: 1.9228, Perplexity: 6.8399sel length 9
Epoch [2/3], Step [5565/6471], Loss: 2.0678, Perplexity: 7.9076sel length 13
Epoch [2/3], Step [5566/6471], Loss: 2.1438, Perplexity: 8.5320sel length 11
Epoch [2/3], Step [5567/6471], Loss: 2.0801, Perplexity: 8.0049sel length 10
Epoch [2/3], Step [5568/6471], Loss: 1.9975, Perplexity: 7.3703sel length 12
Epoch [2/3], Step [5569/6471], Loss: 1.9728, Perplexity: 7.1905sel length 9
Epoch [2/3], Step [5570/6471], Loss: 1.9312, Perplexity: 6.8978sel length 8
Epoch [2/3], Step [5571/6471], Loss: 2.3784, Perplexity: 10.7878sel length 10
Epoch [2/3], Step [5572/6471], Loss: 2.0936, Perplexity: 8.1138sel length 17


Epoch [2/3], Step [5772/6471], Loss: 1.7017, Perplexity: 5.4832sel length 11
Epoch [2/3], Step [5773/6471], Loss: 1.8576, Perplexity: 6.4081sel length 12
Epoch [2/3], Step [5774/6471], Loss: 2.0425, Perplexity: 7.7100sel length 15
Epoch [2/3], Step [5775/6471], Loss: 2.2894, Perplexity: 9.8689sel length 11
Epoch [2/3], Step [5776/6471], Loss: 1.8096, Perplexity: 6.1081sel length 11
Epoch [2/3], Step [5777/6471], Loss: 1.7577, Perplexity: 5.7992sel length 9
Epoch [2/3], Step [5778/6471], Loss: 1.9568, Perplexity: 7.0769sel length 13
Epoch [2/3], Step [5779/6471], Loss: 1.8562, Perplexity: 6.3995sel length 11
Epoch [2/3], Step [5780/6471], Loss: 2.0840, Perplexity: 8.0362sel length 9
Epoch [2/3], Step [5781/6471], Loss: 2.0081, Perplexity: 7.4490sel length 12
Epoch [2/3], Step [5782/6471], Loss: 1.8764, Perplexity: 6.5301sel length 19
Epoch [2/3], Step [5783/6471], Loss: 2.7661, Perplexity: 15.8972sel length 12
Epoch [2/3], Step [5784/6471], Loss: 1.9588, Perplexity: 7.0911sel length 11


Epoch [2/3], Step [5984/6471], Loss: 1.9726, Perplexity: 7.1890sel length 26
Epoch [2/3], Step [5985/6471], Loss: 3.1030, Perplexity: 22.2642sel length 12
Epoch [2/3], Step [5986/6471], Loss: 2.0481, Perplexity: 7.7529sel length 13
Epoch [2/3], Step [5987/6471], Loss: 2.0859, Perplexity: 8.0518sel length 9
Epoch [2/3], Step [5988/6471], Loss: 2.1152, Perplexity: 8.2915sel length 11
Epoch [2/3], Step [5989/6471], Loss: 1.9702, Perplexity: 7.1723sel length 9
Epoch [2/3], Step [5990/6471], Loss: 1.9635, Perplexity: 7.1240sel length 12
Epoch [2/3], Step [5991/6471], Loss: 2.0021, Perplexity: 7.4045sel length 10
Epoch [2/3], Step [5992/6471], Loss: 1.9887, Perplexity: 7.3062sel length 9
Epoch [2/3], Step [5993/6471], Loss: 1.9690, Perplexity: 7.1632sel length 11
Epoch [2/3], Step [5994/6471], Loss: 1.9289, Perplexity: 6.8819sel length 10
Epoch [2/3], Step [5995/6471], Loss: 2.1903, Perplexity: 8.9377sel length 10
Epoch [2/3], Step [5996/6471], Loss: 1.8129, Perplexity: 6.1281sel length 11

Epoch [2/3], Step [6196/6471], Loss: 2.0353, Perplexity: 7.6546sel length 10
Epoch [2/3], Step [6197/6471], Loss: 1.8506, Perplexity: 6.3637sel length 14
Epoch [2/3], Step [6198/6471], Loss: 2.4026, Perplexity: 11.0515sel length 12
Epoch [2/3], Step [6199/6471], Loss: 1.9917, Perplexity: 7.3279sel length 10
Epoch [2/3], Step [6200/6471], Loss: 1.8475, Perplexity: 6.3441
sel length 11
Epoch [2/3], Step [6201/6471], Loss: 1.9696, Perplexity: 7.1678sel length 12
Epoch [2/3], Step [6202/6471], Loss: 2.0406, Perplexity: 7.6949sel length 9
Epoch [2/3], Step [6203/6471], Loss: 1.9370, Perplexity: 6.9382sel length 13
Epoch [2/3], Step [6204/6471], Loss: 2.0234, Perplexity: 7.5638sel length 11
Epoch [2/3], Step [6205/6471], Loss: 1.9870, Perplexity: 7.2937sel length 11
Epoch [2/3], Step [6206/6471], Loss: 2.0186, Perplexity: 7.5281sel length 12
Epoch [2/3], Step [6207/6471], Loss: 2.0039, Perplexity: 7.4180sel length 9
Epoch [2/3], Step [6208/6471], Loss: 2.1922, Perplexity: 8.9546sel length 12

Epoch [2/3], Step [6408/6471], Loss: 1.8146, Perplexity: 6.1385sel length 12
Epoch [2/3], Step [6409/6471], Loss: 1.8752, Perplexity: 6.5224sel length 9
Epoch [2/3], Step [6410/6471], Loss: 1.8996, Perplexity: 6.6834sel length 9
Epoch [2/3], Step [6411/6471], Loss: 2.0342, Perplexity: 7.6458sel length 12
Epoch [2/3], Step [6412/6471], Loss: 2.1186, Perplexity: 8.3194sel length 10
Epoch [2/3], Step [6413/6471], Loss: 1.8676, Perplexity: 6.4724sel length 11
Epoch [2/3], Step [6414/6471], Loss: 1.9442, Perplexity: 6.9883sel length 11
Epoch [2/3], Step [6415/6471], Loss: 1.9680, Perplexity: 7.1567sel length 13
Epoch [2/3], Step [6416/6471], Loss: 2.1925, Perplexity: 8.9579sel length 11
Epoch [2/3], Step [6417/6471], Loss: 1.8402, Perplexity: 6.2978sel length 10
Epoch [2/3], Step [6418/6471], Loss: 1.9830, Perplexity: 7.2644sel length 13
Epoch [2/3], Step [6419/6471], Loss: 2.0485, Perplexity: 7.7564sel length 11
Epoch [2/3], Step [6420/6471], Loss: 1.9080, Perplexity: 6.7399sel length 11

Epoch [3/3], Step [45/6471], Loss: 2.0272, Perplexity: 7.5926sel length 13
Epoch [3/3], Step [46/6471], Loss: 1.9548, Perplexity: 7.0625sel length 12
Epoch [3/3], Step [47/6471], Loss: 2.0067, Perplexity: 7.4390sel length 12
Epoch [3/3], Step [48/6471], Loss: 1.8882, Perplexity: 6.6075sel length 11
Epoch [3/3], Step [49/6471], Loss: 1.8751, Perplexity: 6.5215sel length 8
Epoch [3/3], Step [50/6471], Loss: 2.1608, Perplexity: 8.6777sel length 10
Epoch [3/3], Step [51/6471], Loss: 1.8209, Perplexity: 6.1772sel length 11
Epoch [3/3], Step [52/6471], Loss: 1.8177, Perplexity: 6.1575sel length 10
Epoch [3/3], Step [53/6471], Loss: 2.0494, Perplexity: 7.7629sel length 10
Epoch [3/3], Step [54/6471], Loss: 2.0157, Perplexity: 7.5061sel length 8
Epoch [3/3], Step [55/6471], Loss: 2.2357, Perplexity: 9.3530sel length 10
Epoch [3/3], Step [56/6471], Loss: 2.0132, Perplexity: 7.4870sel length 11
Epoch [3/3], Step [57/6471], Loss: 2.1775, Perplexity: 8.8242sel length 10

Epoch [3/3], Step [261/6471], Loss: 2.0927, Perplexity: 8.1071sel length 13
Epoch [3/3], Step [262/6471], Loss: 2.1073, Perplexity: 8.2261sel length 10
Epoch [3/3], Step [263/6471], Loss: 1.8527, Perplexity: 6.3772sel length 8
Epoch [3/3], Step [264/6471], Loss: 2.1566, Perplexity: 8.6416sel length 10
Epoch [3/3], Step [265/6471], Loss: 2.0894, Perplexity: 8.0797sel length 11
Epoch [3/3], Step [266/6471], Loss: 1.9017, Perplexity: 6.6970sel length 10
Epoch [3/3], Step [267/6471], Loss: 1.9340, Perplexity: 6.9168sel length 12
Epoch [3/3], Step [268/6471], Loss: 2.0229, Perplexity: 7.5604sel length 13
Epoch [3/3], Step [269/6471], Loss: 2.0258, Perplexity: 7.5825sel length 13
Epoch [3/3], Step [270/6471], Loss: 2.2106, Perplexity: 9.1211sel length 11
Epoch [3/3], Step [271/6471], Loss: 1.9104, Perplexity: 6.7561sel length 12
Epoch [3/3], Step [272/6471], Loss: 1.9253, Perplexity: 6.8571sel length 12
Epoch [3/3], Step [273/6471], Loss: 1.9346, Perplexity: 6.9213sel length 15

Epoch [3/3], Step [477/6471], Loss: 2.1696, Perplexity: 8.7546sel length 13
Epoch [3/3], Step [478/6471], Loss: 1.9053, Perplexity: 6.7216sel length 12
Epoch [3/3], Step [479/6471], Loss: 1.8259, Perplexity: 6.2084sel length 11
Epoch [3/3], Step [480/6471], Loss: 2.1545, Perplexity: 8.6237sel length 17
Epoch [3/3], Step [481/6471], Loss: 2.4976, Perplexity: 12.1534sel length 12
Epoch [3/3], Step [482/6471], Loss: 1.8347, Perplexity: 6.2630sel length 16
Epoch [3/3], Step [483/6471], Loss: 2.3561, Perplexity: 10.5502sel length 8
Epoch [3/3], Step [484/6471], Loss: 2.3439, Perplexity: 10.4216sel length 11
Epoch [3/3], Step [485/6471], Loss: 1.8385, Perplexity: 6.2871sel length 10
Epoch [3/3], Step [486/6471], Loss: 2.0823, Perplexity: 8.0228sel length 16
Epoch [3/3], Step [487/6471], Loss: 2.4402, Perplexity: 11.4753sel length 10
Epoch [3/3], Step [488/6471], Loss: 1.9595, Perplexity: 7.0957sel length 12
Epoch [3/3], Step [489/6471], Loss: 1.9932, Perplexity: 7.3389sel length 12

Epoch [3/3], Step [585/6471], Loss: 2.0509, Perplexity: 7.7747sel length 11
Epoch [3/3], Step [586/6471], Loss: 1.8251, Perplexity: 6.2032sel length 10
Epoch [3/3], Step [587/6471], Loss: 1.9280, Perplexity: 6.8760sel length 12
Epoch [3/3], Step [588/6471], Loss: 1.9388, Perplexity: 6.9502sel length 13
Epoch [3/3], Step [589/6471], Loss: 2.1588, Perplexity: 8.6610sel length 11
Epoch [3/3], Step [590/6471], Loss: 2.0750, Perplexity: 7.9647sel length 12
Epoch [3/3], Step [591/6471], Loss: 2.0391, Perplexity: 7.6834sel length 9
Epoch [3/3], Step [592/6471], Loss: 2.0970, Perplexity: 8.1419sel length 12
Epoch [3/3], Step [593/6471], Loss: 2.0104, Perplexity: 7.4663sel length 9
Epoch [3/3], Step [594/6471], Loss: 2.2500, Perplexity: 9.4873sel length 13
Epoch [3/3], Step [595/6471], Loss: 2.0176, Perplexity: 7.5205sel length 11
Epoch [3/3], Step [596/6471], Loss: 1.9899, Perplexity: 7.3146sel length 11
Epoch [3/3], Step [597/6471], Loss: 1.9600, Perplexity: 7.0992sel length 9

Epoch [3/3], Step [801/6471], Loss: 1.9815, Perplexity: 7.2537sel length 13
Epoch [3/3], Step [802/6471], Loss: 2.0684, Perplexity: 7.9123sel length 12
Epoch [3/3], Step [803/6471], Loss: 2.0478, Perplexity: 7.7509sel length 11
Epoch [3/3], Step [804/6471], Loss: 1.8854, Perplexity: 6.5888sel length 9
Epoch [3/3], Step [805/6471], Loss: 1.9294, Perplexity: 6.8856sel length 15
Epoch [3/3], Step [806/6471], Loss: 2.4407, Perplexity: 11.4810sel length 12
Epoch [3/3], Step [807/6471], Loss: 1.8306, Perplexity: 6.2374sel length 13
Epoch [3/3], Step [808/6471], Loss: 2.2293, Perplexity: 9.2934sel length 13
Epoch [3/3], Step [809/6471], Loss: 2.1681, Perplexity: 8.7412sel length 10
Epoch [3/3], Step [810/6471], Loss: 2.0063, Perplexity: 7.4354sel length 9
Epoch [3/3], Step [811/6471], Loss: 1.9094, Perplexity: 6.7494sel length 11
Epoch [3/3], Step [812/6471], Loss: 1.9726, Perplexity: 7.1892sel length 11
Epoch [3/3], Step [813/6471], Loss: 1.8567, Perplexity: 6.4023sel length 10

Epoch [3/3], Step [1017/6471], Loss: 2.0831, Perplexity: 8.0293sel length 11
Epoch [3/3], Step [1018/6471], Loss: 1.8013, Perplexity: 6.0577sel length 13
Epoch [3/3], Step [1019/6471], Loss: 2.1560, Perplexity: 8.6365sel length 10
Epoch [3/3], Step [1020/6471], Loss: 1.9296, Perplexity: 6.8865sel length 9
Epoch [3/3], Step [1021/6471], Loss: 2.0956, Perplexity: 8.1299sel length 10
Epoch [3/3], Step [1022/6471], Loss: 1.8423, Perplexity: 6.3108sel length 12
Epoch [3/3], Step [1023/6471], Loss: 1.9739, Perplexity: 7.1990sel length 9
Epoch [3/3], Step [1024/6471], Loss: 1.8774, Perplexity: 6.5367sel length 9
Epoch [3/3], Step [1025/6471], Loss: 1.9440, Perplexity: 6.9866sel length 7
Epoch [3/3], Step [1026/6471], Loss: 2.3292, Perplexity: 10.2695sel length 9
Epoch [3/3], Step [1027/6471], Loss: 1.8316, Perplexity: 6.2436sel length 13
Epoch [3/3], Step [1028/6471], Loss: 1.9685, Perplexity: 7.1600sel length 12
Epoch [3/3], Step [1029/6471], Loss: 1.9160, Perplexity: 6.7937sel length 10

Epoch [3/3], Step [1229/6471], Loss: 1.8798, Perplexity: 6.5521sel length 9
Epoch [3/3], Step [1230/6471], Loss: 1.9587, Perplexity: 7.0899sel length 9
Epoch [3/3], Step [1231/6471], Loss: 2.0870, Perplexity: 8.0604sel length 9
Epoch [3/3], Step [1232/6471], Loss: 2.0563, Perplexity: 7.8169sel length 13
Epoch [3/3], Step [1233/6471], Loss: 2.1431, Perplexity: 8.5255sel length 10
Epoch [3/3], Step [1234/6471], Loss: 1.8449, Perplexity: 6.3272sel length 14
Epoch [3/3], Step [1235/6471], Loss: 2.1458, Perplexity: 8.5487sel length 11
Epoch [3/3], Step [1236/6471], Loss: 2.0077, Perplexity: 7.4458sel length 11
Epoch [3/3], Step [1237/6471], Loss: 2.1955, Perplexity: 8.9848sel length 16
Epoch [3/3], Step [1238/6471], Loss: 2.2895, Perplexity: 9.8698sel length 12
Epoch [3/3], Step [1239/6471], Loss: 2.0332, Perplexity: 7.6385sel length 10
Epoch [3/3], Step [1240/6471], Loss: 1.8891, Perplexity: 6.6133sel length 11
Epoch [3/3], Step [1241/6471], Loss: 1.7933, Perplexity: 6.0092sel length 13

Epoch [3/3], Step [1518/6471], Loss: 2.3214, Perplexity: 10.1900sel length 11
Epoch [3/3], Step [1519/6471], Loss: 1.9015, Perplexity: 6.6960sel length 11
Epoch [3/3], Step [1520/6471], Loss: 1.9415, Perplexity: 6.9694sel length 17
Epoch [3/3], Step [1521/6471], Loss: 2.6088, Perplexity: 13.5822sel length 9
Epoch [3/3], Step [1522/6471], Loss: 2.1918, Perplexity: 8.9515sel length 13
Epoch [3/3], Step [1523/6471], Loss: 1.9365, Perplexity: 6.9346sel length 14
Epoch [3/3], Step [1524/6471], Loss: 2.1602, Perplexity: 8.6732sel length 9
Epoch [3/3], Step [1525/6471], Loss: 2.2002, Perplexity: 9.0271sel length 11
Epoch [3/3], Step [1526/6471], Loss: 1.8696, Perplexity: 6.4858sel length 11
Epoch [3/3], Step [1527/6471], Loss: 1.9038, Perplexity: 6.7117sel length 10
Epoch [3/3], Step [1528/6471], Loss: 1.9428, Perplexity: 6.9780sel length 9
Epoch [3/3], Step [1529/6471], Loss: 2.0386, Perplexity: 7.6801sel length 12
Epoch [3/3], Step [1530/6471], Loss: 1.8839, Perplexity: 6.5790sel length 11


Epoch [3/3], Step [1730/6471], Loss: 1.9418, Perplexity: 6.9716sel length 14
Epoch [3/3], Step [1731/6471], Loss: 2.1787, Perplexity: 8.8348sel length 13
Epoch [3/3], Step [1732/6471], Loss: 1.9777, Perplexity: 7.2263sel length 8
Epoch [3/3], Step [1733/6471], Loss: 2.0738, Perplexity: 7.9548sel length 12
Epoch [3/3], Step [1734/6471], Loss: 2.0577, Perplexity: 7.8281sel length 13
Epoch [3/3], Step [1735/6471], Loss: 2.0034, Perplexity: 7.4146sel length 16
Epoch [3/3], Step [1736/6471], Loss: 2.4780, Perplexity: 11.9179sel length 9
Epoch [3/3], Step [1737/6471], Loss: 1.9728, Perplexity: 7.1904sel length 11
Epoch [3/3], Step [1738/6471], Loss: 1.9095, Perplexity: 6.7500sel length 11
Epoch [3/3], Step [1739/6471], Loss: 1.9548, Perplexity: 7.0626sel length 11
Epoch [3/3], Step [1740/6471], Loss: 1.8809, Perplexity: 6.5597sel length 12
Epoch [3/3], Step [1741/6471], Loss: 2.1670, Perplexity: 8.7323sel length 10
Epoch [3/3], Step [1742/6471], Loss: 1.9335, Perplexity: 6.9134sel length 9

Epoch [3/3], Step [1942/6471], Loss: 2.1013, Perplexity: 8.1770sel length 10
Epoch [3/3], Step [1943/6471], Loss: 1.9288, Perplexity: 6.8810sel length 10
Epoch [3/3], Step [1944/6471], Loss: 1.8321, Perplexity: 6.2468sel length 10
Epoch [3/3], Step [1945/6471], Loss: 1.7468, Perplexity: 5.7363sel length 13
Epoch [3/3], Step [1946/6471], Loss: 2.0697, Perplexity: 7.9222sel length 20
Epoch [3/3], Step [1947/6471], Loss: 2.7197, Perplexity: 15.1754sel length 13
Epoch [3/3], Step [1948/6471], Loss: 1.8805, Perplexity: 6.5567sel length 12
Epoch [3/3], Step [1949/6471], Loss: 2.0398, Perplexity: 7.6889sel length 10
Epoch [3/3], Step [1950/6471], Loss: 1.8976, Perplexity: 6.6698sel length 12
Epoch [3/3], Step [1951/6471], Loss: 1.9570, Perplexity: 7.0780sel length 10
Epoch [3/3], Step [1952/6471], Loss: 1.8581, Perplexity: 6.4115sel length 9
Epoch [3/3], Step [1953/6471], Loss: 2.0971, Perplexity: 8.1427sel length 12
Epoch [3/3], Step [1954/6471], Loss: 1.9164, Perplexity: 6.7963sel length 10

Epoch [3/3], Step [2154/6471], Loss: 2.1005, Perplexity: 8.1706sel length 19
Epoch [3/3], Step [2155/6471], Loss: 2.6419, Perplexity: 14.0398sel length 10
Epoch [3/3], Step [2156/6471], Loss: 1.8393, Perplexity: 6.2919sel length 18
Epoch [3/3], Step [2157/6471], Loss: 2.5634, Perplexity: 12.9803sel length 10
Epoch [3/3], Step [2158/6471], Loss: 1.6981, Perplexity: 5.4634sel length 9
Epoch [3/3], Step [2159/6471], Loss: 2.0514, Perplexity: 7.7788sel length 12
Epoch [3/3], Step [2160/6471], Loss: 1.8258, Perplexity: 6.2076sel length 12
Epoch [3/3], Step [2161/6471], Loss: 2.0119, Perplexity: 7.4778sel length 9
Epoch [3/3], Step [2162/6471], Loss: 1.9194, Perplexity: 6.8166sel length 10
Epoch [3/3], Step [2163/6471], Loss: 1.8018, Perplexity: 6.0608sel length 9
Epoch [3/3], Step [2164/6471], Loss: 1.8699, Perplexity: 6.4878sel length 9
Epoch [3/3], Step [2165/6471], Loss: 2.0073, Perplexity: 7.4430sel length 11
Epoch [3/3], Step [2166/6471], Loss: 1.7881, Perplexity: 5.9780sel length 9

Epoch [3/3], Step [2366/6471], Loss: 1.9791, Perplexity: 7.2363sel length 9
Epoch [3/3], Step [2367/6471], Loss: 1.8156, Perplexity: 6.1448sel length 11
Epoch [3/3], Step [2368/6471], Loss: 1.8604, Perplexity: 6.4263sel length 13
Epoch [3/3], Step [2369/6471], Loss: 2.0091, Perplexity: 7.4566sel length 8
Epoch [3/3], Step [2370/6471], Loss: 2.0339, Perplexity: 7.6438sel length 11
Epoch [3/3], Step [2371/6471], Loss: 1.7649, Perplexity: 5.8412sel length 12
Epoch [3/3], Step [2372/6471], Loss: 1.9840, Perplexity: 7.2718sel length 11
Epoch [3/3], Step [2373/6471], Loss: 1.8496, Perplexity: 6.3574sel length 17
Epoch [3/3], Step [2374/6471], Loss: 2.5121, Perplexity: 12.3305sel length 8
Epoch [3/3], Step [2375/6471], Loss: 2.1592, Perplexity: 8.6646sel length 12
Epoch [3/3], Step [2376/6471], Loss: 2.0490, Perplexity: 7.7600sel length 9
Epoch [3/3], Step [2377/6471], Loss: 2.0053, Perplexity: 7.4285sel length 10
Epoch [3/3], Step [2378/6471], Loss: 2.0355, Perplexity: 7.6563sel length 12

Epoch [3/3], Step [2578/6471], Loss: 2.3107, Perplexity: 10.0812sel length 12
Epoch [3/3], Step [2579/6471], Loss: 2.0757, Perplexity: 7.9701sel length 12
Epoch [3/3], Step [2580/6471], Loss: 1.8351, Perplexity: 6.2660sel length 9
Epoch [3/3], Step [2581/6471], Loss: 1.9681, Perplexity: 7.1567sel length 10
Epoch [3/3], Step [2582/6471], Loss: 1.9092, Perplexity: 6.7476sel length 10
Epoch [3/3], Step [2583/6471], Loss: 1.9412, Perplexity: 6.9673sel length 10
Epoch [3/3], Step [2584/6471], Loss: 1.9206, Perplexity: 6.8253sel length 8
Epoch [3/3], Step [2585/6471], Loss: 1.9954, Perplexity: 7.3555sel length 11
Epoch [3/3], Step [2586/6471], Loss: 1.9912, Perplexity: 7.3243sel length 12
Epoch [3/3], Step [2587/6471], Loss: 2.1033, Perplexity: 8.1928sel length 12
Epoch [3/3], Step [2588/6471], Loss: 1.9571, Perplexity: 7.0786sel length 12
Epoch [3/3], Step [2589/6471], Loss: 1.8933, Perplexity: 6.6415sel length 11
Epoch [3/3], Step [2590/6471], Loss: 1.8584, Perplexity: 6.4133sel length 9

Epoch [3/3], Step [2790/6471], Loss: 2.0309, Perplexity: 7.6213sel length 12
Epoch [3/3], Step [2791/6471], Loss: 1.9324, Perplexity: 6.9064sel length 13
Epoch [3/3], Step [2792/6471], Loss: 2.1354, Perplexity: 8.4607sel length 13
Epoch [3/3], Step [2793/6471], Loss: 1.9845, Perplexity: 7.2756sel length 14
Epoch [3/3], Step [2794/6471], Loss: 2.1880, Perplexity: 8.9171sel length 11
Epoch [3/3], Step [2795/6471], Loss: 1.9190, Perplexity: 6.8140sel length 9
Epoch [3/3], Step [2796/6471], Loss: 1.9659, Perplexity: 7.1414sel length 9
Epoch [3/3], Step [2797/6471], Loss: 1.9333, Perplexity: 6.9121sel length 9
Epoch [3/3], Step [2798/6471], Loss: 1.7586, Perplexity: 5.8041sel length 7
Epoch [3/3], Step [2799/6471], Loss: 2.1013, Perplexity: 8.1766sel length 19
Epoch [3/3], Step [2800/6471], Loss: 2.6979, Perplexity: 14.8485
sel length 12
Epoch [3/3], Step [2801/6471], Loss: 2.0028, Perplexity: 7.4101sel length 11
Epoch [3/3], Step [2802/6471], Loss: 1.9825, Perplexity: 7.2607sel length 11

Epoch [3/3], Step [3002/6471], Loss: 1.8459, Perplexity: 6.3335sel length 11
Epoch [3/3], Step [3003/6471], Loss: 1.9314, Perplexity: 6.8988sel length 15
Epoch [3/3], Step [3004/6471], Loss: 2.2432, Perplexity: 9.4234sel length 12
Epoch [3/3], Step [3005/6471], Loss: 1.8991, Perplexity: 6.6796sel length 14
Epoch [3/3], Step [3006/6471], Loss: 2.0548, Perplexity: 7.8049sel length 18
Epoch [3/3], Step [3007/6471], Loss: 2.3789, Perplexity: 10.7935sel length 12
Epoch [3/3], Step [3008/6471], Loss: 1.8886, Perplexity: 6.6101sel length 10
Epoch [3/3], Step [3009/6471], Loss: 1.9911, Perplexity: 7.3236sel length 15
Epoch [3/3], Step [3010/6471], Loss: 2.2297, Perplexity: 9.2969sel length 17
Epoch [3/3], Step [3011/6471], Loss: 2.5619, Perplexity: 12.9602sel length 12
Epoch [3/3], Step [3012/6471], Loss: 1.9506, Perplexity: 7.0329sel length 10
Epoch [3/3], Step [3013/6471], Loss: 1.9344, Perplexity: 6.9196sel length 8
Epoch [3/3], Step [3014/6471], Loss: 2.1718, Perplexity: 8.7738sel length 1

Epoch [3/3], Step [3214/6471], Loss: 2.0475, Perplexity: 7.7482sel length 12
Epoch [3/3], Step [3215/6471], Loss: 1.9025, Perplexity: 6.7029sel length 13
Epoch [3/3], Step [3216/6471], Loss: 2.2292, Perplexity: 9.2920sel length 13
Epoch [3/3], Step [3217/6471], Loss: 1.9807, Perplexity: 7.2475sel length 8
Epoch [3/3], Step [3218/6471], Loss: 2.0957, Perplexity: 8.1310sel length 9
Epoch [3/3], Step [3219/6471], Loss: 1.9208, Perplexity: 6.8264sel length 11
Epoch [3/3], Step [3220/6471], Loss: 1.8672, Perplexity: 6.4700sel length 12
Epoch [3/3], Step [3221/6471], Loss: 1.7889, Perplexity: 5.9830sel length 11
Epoch [3/3], Step [3222/6471], Loss: 1.8652, Perplexity: 6.4574sel length 11
Epoch [3/3], Step [3223/6471], Loss: 1.5693, Perplexity: 4.8034sel length 12
Epoch [3/3], Step [3224/6471], Loss: 2.0974, Perplexity: 8.1446sel length 11
Epoch [3/3], Step [3225/6471], Loss: 1.7386, Perplexity: 5.6894sel length 12
Epoch [3/3], Step [3226/6471], Loss: 1.9896, Perplexity: 7.3125sel length 14

Epoch [3/3], Step [3426/6471], Loss: 1.9522, Perplexity: 7.0441sel length 16
Epoch [3/3], Step [3427/6471], Loss: 2.3272, Perplexity: 10.2495sel length 13
Epoch [3/3], Step [3428/6471], Loss: 1.9686, Perplexity: 7.1605sel length 17
Epoch [3/3], Step [3429/6471], Loss: 2.4912, Perplexity: 12.0762sel length 12
Epoch [3/3], Step [3430/6471], Loss: 1.9835, Perplexity: 7.2683sel length 14
Epoch [3/3], Step [3431/6471], Loss: 2.1580, Perplexity: 8.6539sel length 14
Epoch [3/3], Step [3432/6471], Loss: 2.3608, Perplexity: 10.5993sel length 11
Epoch [3/3], Step [3433/6471], Loss: 1.9800, Perplexity: 7.2426sel length 11
Epoch [3/3], Step [3434/6471], Loss: 1.9612, Perplexity: 7.1078sel length 11
Epoch [3/3], Step [3435/6471], Loss: 1.7426, Perplexity: 5.7120sel length 10
Epoch [3/3], Step [3436/6471], Loss: 1.9141, Perplexity: 6.7811sel length 9
Epoch [3/3], Step [3437/6471], Loss: 1.9435, Perplexity: 6.9834sel length 9
Epoch [3/3], Step [3438/6471], Loss: 2.0049, Perplexity: 7.4251sel length 1

Epoch [3/3], Step [3638/6471], Loss: 2.0876, Perplexity: 8.0657sel length 10
Epoch [3/3], Step [3639/6471], Loss: 1.8593, Perplexity: 6.4192sel length 9
Epoch [3/3], Step [3640/6471], Loss: 1.9253, Perplexity: 6.8575sel length 13
Epoch [3/3], Step [3641/6471], Loss: 2.0591, Perplexity: 7.8387sel length 10
Epoch [3/3], Step [3642/6471], Loss: 1.8139, Perplexity: 6.1342sel length 11
Epoch [3/3], Step [3643/6471], Loss: 1.9340, Perplexity: 6.9172sel length 11
Epoch [3/3], Step [3644/6471], Loss: 2.0660, Perplexity: 7.8933sel length 10
Epoch [3/3], Step [3645/6471], Loss: 1.8893, Perplexity: 6.6145sel length 10
Epoch [3/3], Step [3646/6471], Loss: 2.0214, Perplexity: 7.5489sel length 9
Epoch [3/3], Step [3647/6471], Loss: 1.7945, Perplexity: 6.0164sel length 13
Epoch [3/3], Step [3648/6471], Loss: 1.9683, Perplexity: 7.1588sel length 11
Epoch [3/3], Step [3649/6471], Loss: 1.7611, Perplexity: 5.8186sel length 14
Epoch [3/3], Step [3650/6471], Loss: 2.2145, Perplexity: 9.1567sel length 13

Epoch [3/3], Step [3850/6471], Loss: 1.8205, Perplexity: 6.1748sel length 10
Epoch [3/3], Step [3851/6471], Loss: 1.9156, Perplexity: 6.7907sel length 9
Epoch [3/3], Step [3852/6471], Loss: 1.8763, Perplexity: 6.5290sel length 11
Epoch [3/3], Step [3853/6471], Loss: 2.0027, Perplexity: 7.4091sel length 13
Epoch [3/3], Step [3854/6471], Loss: 1.9490, Perplexity: 7.0213sel length 12
Epoch [3/3], Step [3855/6471], Loss: 2.0192, Perplexity: 7.5325sel length 13
Epoch [3/3], Step [3856/6471], Loss: 2.0873, Perplexity: 8.0634sel length 15
Epoch [3/3], Step [3857/6471], Loss: 2.3097, Perplexity: 10.0710sel length 11
Epoch [3/3], Step [3858/6471], Loss: 2.0030, Perplexity: 7.4113sel length 9
Epoch [3/3], Step [3859/6471], Loss: 1.8986, Perplexity: 6.6768sel length 15
Epoch [3/3], Step [3860/6471], Loss: 2.2801, Perplexity: 9.7780sel length 15
Epoch [3/3], Step [3861/6471], Loss: 2.1584, Perplexity: 8.6575sel length 15
Epoch [3/3], Step [3862/6471], Loss: 2.3455, Perplexity: 10.4383sel length 10

Epoch [3/3], Step [4062/6471], Loss: 1.7277, Perplexity: 5.6278sel length 14
Epoch [3/3], Step [4063/6471], Loss: 2.0609, Perplexity: 7.8534sel length 10
Epoch [3/3], Step [4064/6471], Loss: 1.8029, Perplexity: 6.0673sel length 10
Epoch [3/3], Step [4065/6471], Loss: 1.8240, Perplexity: 6.1967sel length 15
Epoch [3/3], Step [4066/6471], Loss: 2.3512, Perplexity: 10.4985sel length 9
Epoch [3/3], Step [4067/6471], Loss: 1.9649, Perplexity: 7.1340sel length 11
Epoch [3/3], Step [4068/6471], Loss: 1.8998, Perplexity: 6.6844sel length 11
Epoch [3/3], Step [4069/6471], Loss: 1.7758, Perplexity: 5.9052sel length 9
Epoch [3/3], Step [4070/6471], Loss: 1.9797, Perplexity: 7.2405sel length 18
Epoch [3/3], Step [4071/6471], Loss: 2.6534, Perplexity: 14.2022sel length 16
Epoch [3/3], Step [4072/6471], Loss: 2.3967, Perplexity: 10.9867sel length 9
Epoch [3/3], Step [4073/6471], Loss: 1.8293, Perplexity: 6.2295sel length 11
Epoch [3/3], Step [4074/6471], Loss: 1.9672, Perplexity: 7.1508sel length 11

Epoch [3/3], Step [4274/6471], Loss: 2.4399, Perplexity: 11.4714sel length 11
Epoch [3/3], Step [4275/6471], Loss: 1.8091, Perplexity: 6.1050sel length 9
Epoch [3/3], Step [4276/6471], Loss: 2.0046, Perplexity: 7.4232sel length 9
Epoch [3/3], Step [4277/6471], Loss: 1.9607, Perplexity: 7.1045sel length 8
Epoch [3/3], Step [4278/6471], Loss: 2.1949, Perplexity: 8.9792sel length 10
Epoch [3/3], Step [4279/6471], Loss: 1.7625, Perplexity: 5.8268sel length 9
Epoch [3/3], Step [4280/6471], Loss: 2.0009, Perplexity: 7.3958sel length 9
Epoch [3/3], Step [4281/6471], Loss: 1.8957, Perplexity: 6.6571sel length 10
Epoch [3/3], Step [4282/6471], Loss: 1.8468, Perplexity: 6.3393sel length 10
Epoch [3/3], Step [4283/6471], Loss: 1.8248, Perplexity: 6.2014sel length 11
Epoch [3/3], Step [4284/6471], Loss: 1.8211, Perplexity: 6.1787sel length 9
Epoch [3/3], Step [4285/6471], Loss: 1.8755, Perplexity: 6.5242sel length 11
Epoch [3/3], Step [4286/6471], Loss: 1.9692, Perplexity: 7.1651sel length 9

Epoch [3/3], Step [4486/6471], Loss: 1.8571, Perplexity: 6.4051sel length 10
Epoch [3/3], Step [4487/6471], Loss: 1.6931, Perplexity: 5.4365sel length 9
Epoch [3/3], Step [4488/6471], Loss: 2.0089, Perplexity: 7.4555sel length 10
Epoch [3/3], Step [4489/6471], Loss: 2.0840, Perplexity: 8.0365sel length 11
Epoch [3/3], Step [4490/6471], Loss: 1.9647, Perplexity: 7.1326sel length 9
Epoch [3/3], Step [4491/6471], Loss: 1.9660, Perplexity: 7.1422sel length 10
Epoch [3/3], Step [4492/6471], Loss: 1.8222, Perplexity: 6.1857sel length 11
Epoch [3/3], Step [4493/6471], Loss: 1.9196, Perplexity: 6.8183sel length 9
Epoch [3/3], Step [4494/6471], Loss: 2.0991, Perplexity: 8.1587sel length 11
Epoch [3/3], Step [4495/6471], Loss: 1.8276, Perplexity: 6.2188sel length 11
Epoch [3/3], Step [4496/6471], Loss: 1.8070, Perplexity: 6.0922sel length 12
Epoch [3/3], Step [4497/6471], Loss: 1.9450, Perplexity: 6.9934sel length 9
Epoch [3/3], Step [4498/6471], Loss: 2.0642, Perplexity: 7.8794sel length 8

Epoch [3/3], Step [4698/6471], Loss: 1.7785, Perplexity: 5.9207sel length 8
Epoch [3/3], Step [4699/6471], Loss: 2.1419, Perplexity: 8.5153sel length 12
Epoch [3/3], Step [4700/6471], Loss: 1.8130, Perplexity: 6.1291
sel length 12
Epoch [3/3], Step [4701/6471], Loss: 1.9020, Perplexity: 6.6991sel length 9
Epoch [3/3], Step [4702/6471], Loss: 2.0096, Perplexity: 7.4604sel length 13
Epoch [3/3], Step [4703/6471], Loss: 2.0149, Perplexity: 7.4998sel length 12
Epoch [3/3], Step [4704/6471], Loss: 2.1092, Perplexity: 8.2419sel length 10
Epoch [3/3], Step [4705/6471], Loss: 1.7593, Perplexity: 5.8083sel length 10
Epoch [3/3], Step [4706/6471], Loss: 2.0706, Perplexity: 7.9297sel length 15
Epoch [3/3], Step [4707/6471], Loss: 2.2348, Perplexity: 9.3449sel length 15
Epoch [3/3], Step [4708/6471], Loss: 2.2906, Perplexity: 9.8806sel length 11
Epoch [3/3], Step [4709/6471], Loss: 1.8613, Perplexity: 6.4320sel length 13
Epoch [3/3], Step [4710/6471], Loss: 1.9688, Perplexity: 7.1622sel length 10


Epoch [3/3], Step [4910/6471], Loss: 2.2848, Perplexity: 9.8238sel length 11
Epoch [3/3], Step [4911/6471], Loss: 1.8144, Perplexity: 6.1374sel length 11
Epoch [3/3], Step [4912/6471], Loss: 1.9206, Perplexity: 6.8251sel length 13
Epoch [3/3], Step [4913/6471], Loss: 2.0385, Perplexity: 7.6794sel length 10
Epoch [3/3], Step [4914/6471], Loss: 1.7829, Perplexity: 5.9472sel length 13
Epoch [3/3], Step [4915/6471], Loss: 2.0587, Perplexity: 7.8360sel length 16
Epoch [3/3], Step [4916/6471], Loss: 2.3547, Perplexity: 10.5351sel length 10
Epoch [3/3], Step [4917/6471], Loss: 1.8287, Perplexity: 6.2255sel length 11
Epoch [3/3], Step [4918/6471], Loss: 1.7872, Perplexity: 5.9728sel length 9
Epoch [3/3], Step [4919/6471], Loss: 1.9862, Perplexity: 7.2880sel length 9
Epoch [3/3], Step [4920/6471], Loss: 2.0106, Perplexity: 7.4679sel length 9
Epoch [3/3], Step [4921/6471], Loss: 1.9803, Perplexity: 7.2447sel length 10
Epoch [3/3], Step [4922/6471], Loss: 1.7417, Perplexity: 5.7069sel length 16

Epoch [3/3], Step [5122/6471], Loss: 1.9222, Perplexity: 6.8360sel length 11
Epoch [3/3], Step [5123/6471], Loss: 1.8925, Perplexity: 6.6356sel length 14
Epoch [3/3], Step [5124/6471], Loss: 2.0216, Perplexity: 7.5506sel length 12
Epoch [3/3], Step [5125/6471], Loss: 1.9007, Perplexity: 6.6904sel length 11
Epoch [3/3], Step [5126/6471], Loss: 1.8193, Perplexity: 6.1674sel length 10
Epoch [3/3], Step [5127/6471], Loss: 1.8977, Perplexity: 6.6702sel length 10
Epoch [3/3], Step [5128/6471], Loss: 1.8348, Perplexity: 6.2636sel length 14
Epoch [3/3], Step [5129/6471], Loss: 2.1340, Perplexity: 8.4483sel length 11
Epoch [3/3], Step [5130/6471], Loss: 1.8867, Perplexity: 6.5974sel length 11
Epoch [3/3], Step [5131/6471], Loss: 1.9407, Perplexity: 6.9633sel length 11
Epoch [3/3], Step [5132/6471], Loss: 1.8783, Perplexity: 6.5421sel length 8
Epoch [3/3], Step [5133/6471], Loss: 2.0940, Perplexity: 8.1171sel length 9
Epoch [3/3], Step [5134/6471], Loss: 1.8115, Perplexity: 6.1194sel length 9
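
Each line of the training log above reports a loss and a perplexity, and the values are consistent with the perplexity being the exponential of the loss (for example, exp(2.0757) ≈ 7.9701). The sketch below parses such lines and checks that relationship. The names `LOG_PATTERN` and `parse_log_line` are hypothetical helpers, not part of the project code, and the pattern simply ignores any trailing debug text such as `sel length 12`.

```python
import math
import re

# Hypothetical pattern for the log format shown in this notebook's output.
LOG_PATTERN = re.compile(
    r"Epoch \[(\d+)/(\d+)\], Step \[(\d+)/(\d+)\], "
    r"Loss: (\d+\.\d+), Perplexity: (\d+\.\d+)"
)


def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    epoch, _, step, _, loss, perplexity = match.groups()
    return {
        "epoch": int(epoch),
        "step": int(step),
        "loss": float(loss),
        "perplexity": float(perplexity),
    }


# A few lines copied from the log above, including the trailing debug text.
sample_lines = [
    "Epoch [3/3], Step [2579/6471], Loss: 2.0757, Perplexity: 7.9701sel length 12",
    "Epoch [3/3], Step [2800/6471], Loss: 2.6979, Perplexity: 14.8485",
    "Epoch [3/3], Step [3223/6471], Loss: 1.5693, Perplexity: 4.8034sel length 12",
]

records = [parse_log_line(line) for line in sample_lines]

# Check that perplexity matches exp(loss) up to the rounding used in the log.
consistent = all(
    math.isclose(math.exp(r["loss"]), r["perplexity"], rel_tol=2e-4)
    for r in records
)
```

Parsing the log this way makes it easy to plot the loss over training steps or to compare runs with different hyperparameters.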

<a id='step3'></a>
## Step 3: (Optional) Validate your Model

One way to check for overfitting is to evaluate the model on a validation set.  If you decide to do this **optional** task, you must first complete all of the steps in the next notebook in the sequence (**3_Inference.ipynb**); as part of that notebook, you will write and test code (specifically, the `sample` method in the `DecoderRNN` class) that uses your RNN decoder to generate captions.  That code will prove very useful here. 

If you decide to validate your model, please do not edit the data loader in **data_loader.py**.  Instead, create a new file named **data_loader_val.py** containing the code for obtaining the data loader for the validation data.  You can access:
- the validation images at filepath `'/opt/cocoapi/images/train2014/'`, and
- the validation image caption annotation file at filepath `'/opt/cocoapi/annotations/captions_val2014.json'`.

The suggested approach to validating your model involves creating a JSON file such as [this one](https://github.com/cocodataset/cocoapi/blob/master/results/captions_val2014_fakecap_results.json) containing your model's predicted captions for the validation images.  Then, you can write your own script or use one that you [find online](https://github.com/tylin/coco-caption) to calculate the BLEU score of your model.  You can read more about the BLEU score, along with other evaluation metrics (such as METEOR and CIDEr), in section 4.1 of [this paper](https://arxiv.org/pdf/1411.4555.pdf).  For more information about how to use the annotation file, check out the [website](http://cocodataset.org/#download) for the COCO dataset.
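
As a minimal sketch of the two steps described above, the code below writes predicted captions to a JSON results file (a list of objects with `image_id` and `caption` keys, as in the linked example file) and computes a simplified sentence-level BLEU-1 score. The function names `write_results` and `bleu1` are hypothetical, and the BLEU-1 shown here uses only clipped unigram precision with a brevity penalty; it is not a substitute for the corpus-level BLEU computed by the evaluation scripts linked above.

```python
import json
import math
from collections import Counter


def write_results(predictions, path):
    """Write {image_id: caption} predictions as a list of result records."""
    results = [
        {"image_id": int(image_id), "caption": caption}
        for image_id, caption in sorted(predictions.items())
    ]
    with open(path, "w") as f:
        json.dump(results, f)
    return results


def bleu1(candidate, references):
    """Simplified sentence-level BLEU-1: clipped unigram precision times brevity penalty."""
    cand = candidate.lower().split()
    refs = [ref.lower().split() for ref in references]
    if not cand:
        return 0.0

    # Clip each candidate word count by its maximum count in any single reference.
    max_ref_counts = Counter()
    for ref in refs:
        for word, count in Counter(ref).items():
            max_ref_counts[word] = max(max_ref_counts[word], count)
    clipped = sum(
        min(count, max_ref_counts[word]) for word, count in Counter(cand).items()
    )
    precision = clipped / len(cand)

    # Brevity penalty uses the reference length closest to the candidate length
    # (ties are broken in favor of the shorter reference).
    ref_len = min((len(ref) for ref in refs), key=lambda n: (abs(n - len(cand)), n))
    penalty = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / len(cand))
    return penalty * precision
```

In practice, `predictions` would be filled by running the `sample` method on each validation image and converting the resulting word indices back to text; the image IDs come from the annotation file.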

In [None]:
# (Optional) TODO: Validate your model.