# Computer Vision Nanodegree

## Project: Image Captioning

---

In this notebook, you will train your CNN-RNN model.  

You are welcome and encouraged to try out many different architectures and hyperparameters when searching for a good model.

This does have the potential to make the project quite messy!  Before submitting your project, make sure that you clean up:
- the code you write in this notebook.  The notebook should describe how to train a single CNN-RNN architecture, corresponding to your final choice of hyperparameters.  You should structure the notebook so that the reviewer can replicate your results by running the code in this notebook.  
- the output of the code cell in **Step 2**.  The output should show the output obtained when training the model from scratch.

This notebook **will be graded**.  

Feel free to use the links below to navigate the notebook:
- [Step 1](#step1): Training Setup
- [Step 2](#step2): Train your Model
- [Step 3](#step3): (Optional) Validate your Model

<a id='step1'></a>
## Step 1: Training Setup

In this step of the notebook, you will customize the training of your CNN-RNN model by specifying hyperparameters and setting other options that are important to the training procedure.  The values you set now will be used when training your model in **Step 2** below.

You should only amend blocks of code that are preceded by a `TODO` statement.  **Any code blocks that are not preceded by a `TODO` statement should not be modified**.

### Task #1

Begin by setting the following variables:
- `batch_size` - the batch size of each training batch.  It is the number of image-caption pairs used to amend the model weights in each training step. 
- `vocab_threshold` - the minimum word count threshold.  Note that a larger threshold will result in a smaller vocabulary, whereas a smaller threshold will include rarer words and result in a larger vocabulary.  
- `vocab_from_file` - a Boolean that decides whether to load the vocabulary from file. 
- `embed_size` - the dimensionality of the image and word embeddings.  
- `hidden_size` - the number of features in the hidden state of the RNN decoder.  
- `num_epochs` - the number of epochs to train the model.  We recommend that you set `num_epochs=3`, but feel free to increase or decrease this number as you wish.  [This paper](https://arxiv.org/pdf/1502.03044.pdf) trained a captioning model on a single state-of-the-art GPU for 3 days, but you'll soon see that you can get reasonable results in a matter of a few hours!  (_But of course, if you want your model to compete with current research, you will have to train for much longer._)
- `save_every` - determines how often to save the model weights.  We recommend that you set `save_every=1`, to save the model weights after each epoch.  This way, after the `i`th epoch, the encoder and decoder weights will be saved in the `models/` folder as `encoder-i.pkl` and `decoder-i.pkl`, respectively.
- `print_every` - determines how often to print the batch loss to the Jupyter notebook while training.  Note that you **will not** observe a monotonic decrease in the loss function while training - this is perfectly fine and completely expected!  You are encouraged to keep this at its default value of `100` to avoid clogging the notebook, but feel free to change it.
- `log_file` - the name of the text file containing - for every step - how the loss and perplexity evolved during training.

If you're not sure where to begin to set some of the values above, you can peruse [this paper](https://arxiv.org/pdf/1502.03044.pdf) and [this paper](https://arxiv.org/pdf/1411.4555.pdf) for useful guidance!  **To avoid spending too long on this notebook**, you are encouraged to consult these suggested research papers to obtain a strong initial guess for which hyperparameters are likely to work best.  Then, train a single model, and proceed to the next notebook (**3_Inference.ipynb**).  If you are unhappy with your performance, you can return to this notebook to tweak the hyperparameters (and/or the architecture in **model.py**) and re-train your model.

### Question 1

**Question:** Describe your CNN-RNN architecture in detail.  With this architecture in mind, how did you select the values of the variables in Task 1?  If you consulted a research paper detailing a successful implementation of an image captioning model, please provide the reference.

**Answer:** 


### (Optional) Task #2

Note that we have provided a recommended image transform `transform_train` for pre-processing the training images, but you are welcome (and encouraged!) to modify it as you wish.  When modifying this transform, keep in mind that:
- the images in the dataset have varying heights and widths, and 
- if using a pre-trained model, you must perform the corresponding appropriate normalization.

### Question 2

**Question:** How did you select the transform in `transform_train`?  If you left the transform at its provided value, why do you think that it is a good choice for your CNN architecture?

**Answer:** 

### Task #3

Next, you will specify a Python list containing the learnable parameters of the model.  For instance, if you decide to make all weights in the decoder trainable, but only want to train the weights in the embedding layer of the encoder, then you should set `params` to something like:
```
params = list(decoder.parameters()) + list(encoder.embed.parameters()) 
```

### Question 3

**Question:** How did you select the trainable parameters of your architecture?  Why do you think this is a good choice?

**Answer:** 

### Task #4

Finally, you will select an [optimizer](http://pytorch.org/docs/master/optim.html#torch.optim.Optimizer).

### Question 4

**Question:** How did you select the optimizer used to train your model?

**Answer:** 

# Answer 1:
CNN architecure is the resnet50 model as provided in the EncoderRNN. The output of the resnet50 is connected to a fully connected layer whose output dimension is the same as the embedding size of the captions.

RNN is a LSTM whose parameters are chosen after running some simulations on my local machine.
See Answer 3 below on how some of the training parameters were chosen

# Answer 2:
The transform in transform_train is choosen to be the same as provided in code. The input size of 224 X 224 is a standard choice in extracting features from images. Random crop also helps in generating non-perfect images thus making the model more robust.

# Answer 3:

The list of trainable parameters consists of the weights and bias of the final layer in the encoder and all the parameters of the decoder. 


In the encoder, the parameters for the resnet50 architecture are not trained, since they have pretrained to produce feature sets. The final fully connected layer that has has added in this project need to be optimized for this problem and hance considered trainable.


All parameters in the decoder are treated as trainable since these are not optimized for this problem and hance need to trained.  Hence the trainable parameters include all the parameters from and the parameters involved in converting the hidden to output tag.


# Answer 4:
Adam optimizer was chosen. This was based on the discussion in Slack and the papers above. I would like to experiment with other flavors of Optimizer, but is not done yet due to lack of time

In [4]:
import torch
import torch.nn as nn
from torchvision import transforms
import sys
sys.path.append('/opt/cocoapi/PythonAPI')
from pycocotools.coco import COCO
from data_loader import get_loader
from model import EncoderCNN, DecoderRNN
import math


## TODO #1: Select appropriate values for the Python variables below.
batch_size = 64            # batch size
vocab_threshold = 2        # minimum word count threshold
vocab_from_file = True     # if True, load existing vocab file
embed_size = 512           # dimensionality of image and word embeddings
hidden_size = 512          # number of features in hidden state of the RNN decoder
num_epochs = 3             # number of training epochs
save_every = 1             # determines frequency of saving model weights
print_every = 100          # determines window for printing average loss
log_file = 'training_log.txt'       # name of file with saved training loss and perplexity

# (Optional) TODO #2: Amend the image transform below.
transform_train = transforms.Compose([ 
    transforms.Resize(256),                          # smaller edge of image resized to 256
    transforms.RandomCrop(224),                      # get 224x224 crop from random location
    transforms.RandomHorizontalFlip(),               # horizontally flip image with probability=0.5
    transforms.ToTensor(),                           # convert the PIL Image to a tensor
    transforms.Normalize((0.485, 0.456, 0.406),      # normalize image for pre-trained model
                         (0.229, 0.224, 0.225))])

# Build data loader.
data_loader = get_loader(transform=transform_train,
                         mode='train',
                         batch_size=batch_size,
                         vocab_threshold=vocab_threshold,
                         vocab_from_file=vocab_from_file)

# The size of the vocabulary.
vocab_size = len(data_loader.dataset.vocab)

# Initialize the encoder and decoder. 
encoder = EncoderCNN(embed_size)
decoder = DecoderRNN(embed_size, hidden_size, vocab_size)

# Move models to GPU if CUDA is available. 
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
encoder.to(device)
decoder.to(device)

# Define the loss function. 
criterion = nn.CrossEntropyLoss().cuda() if torch.cuda.is_available() else nn.CrossEntropyLoss()

# TODO #3: Specify the learnable parameters of the model.
params = list(encoder.embed.parameters()) + list(decoder.lstm.parameters()) + list(decoder.hidden2Tag.parameters())

# TODO #4: Define the optimizer.
optimizer = torch.optim.Adam(params, lr = 0.001)

# Set the total number of training steps per epoch.
total_step = math.ceil(len(data_loader.dataset.caption_lengths) / data_loader.batch_sampler.batch_size)

Vocabulary successfully loaded from vocab.pkl file!
loading annotations into memory...


  0%|          | 1091/414113 [00:00<01:14, 5518.86it/s]

Done (t=0.81s)
creating index...
index created!
Obtaining caption lengths...


100%|██████████| 414113/414113 [01:04<00:00, 6467.22it/s]


<a id='step2'></a>
## Step 2: Train your Model

Once you have executed the code cell in **Step 1**, the training procedure below should run without issue.  

It is completely fine to leave the code cell below as-is without modifications to train your model.  However, if you would like to modify the code used to train the model below, you must ensure that your changes are easily parsed by your reviewer.  In other words, make sure to provide appropriate comments to describe how your code works!  

You may find it useful to load saved weights to resume training.  In that case, note the names of the files containing the encoder and decoder weights that you'd like to load (`encoder_file` and `decoder_file`).  Then you can load the weights by using the lines below:

```python
# Load pre-trained weights before resuming training.
encoder.load_state_dict(torch.load(os.path.join('./models', encoder_file)))
decoder.load_state_dict(torch.load(os.path.join('./models', decoder_file)))
```

While trying out parameters, make sure to take extensive notes and record the settings that you used in your various training runs.  In particular, you don't want to encounter a situation where you've trained a model for several hours but can't remember what settings you used :).

### A Note on Tuning Hyperparameters

To figure out how well your model is doing, you can look at how the training loss and perplexity evolve during training - and for the purposes of this project, you are encouraged to amend the hyperparameters based on this information.  

However, this will not tell you if your model is overfitting to the training data, and, unfortunately, overfitting is a problem that is commonly encountered when training image captioning models.  

For this project, you need not worry about overfitting. **This project does not have strict requirements regarding the performance of your model**, and you just need to demonstrate that your model has learned **_something_** when you generate captions on the test data.  For now, we strongly encourage you to train your model for the suggested 3 epochs without worrying about performance; then, you should immediately transition to the next notebook in the sequence (**3_Inference.ipynb**) to see how your model performs on the test data.  If your model needs to be changed, you can come back to this notebook, amend hyperparameters (if necessary), and re-train the model.

That said, if you would like to go above and beyond in this project, you can read about some approaches to minimizing overfitting in section 4.3.1 of [this paper](http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7505636).  In the next (optional) step of this notebook, we provide some guidance for assessing the performance on the validation dataset.

In [5]:
import torch.utils.data as data
import numpy as np
import os
import requests
import time

# Open the training log file.
f = open(log_file, 'w')

old_time = time.time()
response = requests.request("GET", 
                            "http://metadata.google.internal/computeMetadata/v1/instance/attributes/keep_alive_token", 
                            headers={"Metadata-Flavor":"Google"})

for epoch in range(1, num_epochs+1):
    
    for i_step in range(1, total_step+1):
        
        if time.time() - old_time > 60:
            old_time = time.time()
            requests.request("POST", 
                             "https://nebula.udacity.com/api/v1/remote/keep-alive", 
                             headers={'Authorization': "STAR " + response.text})
        
        # Randomly sample a caption length, and sample indices with that length.
        indices = data_loader.dataset.get_train_indices()
        # Create and assign a batch sampler to retrieve a batch with the sampled indices.
        new_sampler = data.sampler.SubsetRandomSampler(indices=indices)
        data_loader.batch_sampler.sampler = new_sampler
        
        # Obtain the batch.
        images, captions = next(iter(data_loader))

        # Move batch of images and captions to GPU if CUDA is available.
        images = images.to(device)
        captions = captions.to(device)
        
        # Zero the gradients.
        decoder.zero_grad()
        encoder.zero_grad()
        
        # Pass the inputs through the CNN-RNN model.
        features = encoder(images)
        outputs = decoder(features, captions)
        
        # Calculate the batch loss.
        loss = criterion(outputs.view(-1, vocab_size), captions.view(-1))
        
        # Backward pass.
        loss.backward()
        
        # Update the parameters in the optimizer.
        optimizer.step()
            
        # Get training statistics.
        stats = 'Epoch [%d/%d], Step [%d/%d], Loss: %.4f, Perplexity: %5.4f' % (epoch, num_epochs, i_step, total_step, loss.item(), np.exp(loss.item()))
        
        # Print training statistics (on same line).
        print('\r' + stats, end="")
        sys.stdout.flush()
        
        # Print training statistics to file.
        f.write(stats + '\n')
        f.flush()
        
        # Print training statistics (on different line).
        if i_step % print_every == 0:
            print('\r' + stats)
            
    # Save the weights.
    if epoch % save_every == 0:
        torch.save(decoder.state_dict(), os.path.join('./models', 'decoder-%d.pkl' % epoch))
        torch.save(encoder.state_dict(), os.path.join('./models', 'encoder-%d.pkl' % epoch))

# Close the training log file.
f.close()

 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1/6471], Loss: 9.2224, Perplexity: 10121.2366 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2/6471], Loss: 9.1285, Perplexity: 9214.4186 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3/6471], Loss: 9.0528, Perplexity: 8542.4166 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4/6471], Loss: 8.9684, Perplexity: 7851.1872 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features sha

Epoch [1/3], Step [37/6471], Loss: 6.9101, Perplexity: 1002.3384 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [38/6471], Loss: 6.8298, Perplexity: 924.9865 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [39/6471], Loss: 6.8594, Perplexity: 952.8018 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [40/6471], Loss: 6.6754, Perplexity: 792.6662 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [41/6471], Loss: 6.5484, Perplexity: 698.1236 types **##    torch.cuda.FloatTenso

Epoch [1/3], Step [74/6471], Loss: 4.9051, Perplexity: 134.9790 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [75/6471], Loss: 5.0022, Perplexity: 148.7448 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [76/6471], Loss: 5.0771, Perplexity: 160.3120 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [77/6471], Loss: 4.9186, Perplexity: 136.8094 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [78/6471], Loss: 4.8791, Perplexity: 131.5167 types **##    torch.cuda.FloatTensor

Epoch [1/3], Step [111/6471], Loss: 4.4359, Perplexity: 84.4319 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [112/6471], Loss: 4.5656, Perplexity: 96.1210 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [113/6471], Loss: 4.6223, Perplexity: 101.7299 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [114/6471], Loss: 4.6494, Perplexity: 104.5205 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [115/6471], Loss: 4.4107, Perplexity: 82.3241 types **##    torch.cuda.FloatTens

Epoch [1/3], Step [148/6471], Loss: 4.3657, Perplexity: 78.7049 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [149/6471], Loss: 4.3943, Perplexity: 80.9847 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [150/6471], Loss: 4.4288, Perplexity: 83.8339 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [151/6471], Loss: 4.5018, Perplexity: 90.1799 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 18, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [152/6471], Loss: 4.4069, Perplexity: 82.0156 types **##    torch.cuda.FloatTensor

Epoch [1/3], Step [185/6471], Loss: 4.1327, Perplexity: 62.3479 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [186/6471], Loss: 4.2052, Perplexity: 67.0317 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [187/6471], Loss: 4.1746, Perplexity: 65.0123 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [188/6471], Loss: 4.3014, Perplexity: 73.8006 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [189/6471], Loss: 4.0747, Perplexity: 58.8316 types **##    torch.cuda.FloatTensor

Epoch [1/3], Step [222/6471], Loss: 4.2773, Perplexity: 72.0462 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [223/6471], Loss: 3.8991, Perplexity: 49.3571 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [224/6471], Loss: 4.1570, Perplexity: 63.8780 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [225/6471], Loss: 4.3187, Perplexity: 75.0947 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [226/6471], Loss: 4.2716, Perplexity: 71.6362 types **##    torch.cuda.FloatTensor

Epoch [1/3], Step [259/6471], Loss: 4.0358, Perplexity: 56.5869 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [260/6471], Loss: 4.0809, Perplexity: 59.2002 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 24, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [261/6471], Loss: 4.8325, Perplexity: 125.5234 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [262/6471], Loss: 4.4199, Perplexity: 83.0907 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [263/6471], Loss: 3.9841, Perplexity: 53.7376 types **##    torch.cuda.FloatTenso

Epoch [1/3], Step [296/6471], Loss: 3.9328, Perplexity: 51.0515 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [297/6471], Loss: 4.1496, Perplexity: 63.4098 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [298/6471], Loss: 4.1127, Perplexity: 61.1117 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [299/6471], Loss: 4.0468, Perplexity: 57.2115 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [300/6471], Loss: 4.0850, Perplexity: 59.4432
 types **##    torch.cuda.FloatTenso

Epoch [1/3], Step [333/6471], Loss: 4.0004, Perplexity: 54.6214 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [334/6471], Loss: 4.0057, Perplexity: 54.9086 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [335/6471], Loss: 3.9647, Perplexity: 52.7059 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [336/6471], Loss: 4.0899, Perplexity: 59.7356 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [337/6471], Loss: 4.0172, Perplexity: 55.5430 types **##    torch.cuda.FloatTensor

Epoch [1/3], Step [370/6471], Loss: 3.8672, Perplexity: 47.8083 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [371/6471], Loss: 3.7613, Perplexity: 43.0027 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [372/6471], Loss: 4.2118, Perplexity: 67.4769 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [373/6471], Loss: 3.7392, Perplexity: 42.0645 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [374/6471], Loss: 3.7030, Perplexity: 40.5707 types **##    torch.cuda.FloatTensor

Epoch [1/3], Step [407/6471], Loss: 3.7131, Perplexity: 40.9800 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [408/6471], Loss: 3.9082, Perplexity: 49.8080 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 21, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [409/6471], Loss: 4.3021, Perplexity: 73.8565 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [410/6471], Loss: 3.6598, Perplexity: 38.8533 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [411/6471], Loss: 3.7598, Perplexity: 42.9405 types **##    torch.cuda.FloatTensor

Epoch [1/3], Step [444/6471], Loss: 3.7885, Perplexity: 44.1890 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [445/6471], Loss: 3.4289, Perplexity: 30.8433 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [446/6471], Loss: 4.2163, Perplexity: 67.7792 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [447/6471], Loss: 3.5716, Perplexity: 35.5745 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [448/6471], Loss: 3.7207, Perplexity: 41.2916 types **##    torch.cuda.FloatTensor

Epoch [1/3], Step [481/6471], Loss: 4.4581, Perplexity: 86.3199 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [482/6471], Loss: 4.0492, Perplexity: 57.3529 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [483/6471], Loss: 3.7584, Perplexity: 42.8813 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [484/6471], Loss: 3.7542, Perplexity: 42.6991 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [485/6471], Loss: 3.7973, Perplexity: 44.5811 types **##    torch.cuda.FloatTensor

Epoch [1/3], Step [518/6471], Loss: 3.6154, Perplexity: 37.1659 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [519/6471], Loss: 3.4596, Perplexity: 31.8057 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [520/6471], Loss: 3.6188, Perplexity: 37.2929 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [521/6471], Loss: 3.9199, Perplexity: 50.3961 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [522/6471], Loss: 3.4959, Perplexity: 32.9788 types **##    torch.cuda.FloatTensor

Epoch [1/3], Step [555/6471], Loss: 3.7265, Perplexity: 41.5333 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [556/6471], Loss: 3.7248, Perplexity: 41.4640 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [557/6471], Loss: 3.7228, Perplexity: 41.3803 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [558/6471], Loss: 3.5708, Perplexity: 35.5454 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [559/6471], Loss: 4.0265, Perplexity: 56.0651 types **##    torch.cuda.FloatTensor

Epoch [1/3], Step [592/6471], Loss: 3.8374, Perplexity: 46.4033 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [593/6471], Loss: 3.7497, Perplexity: 42.5085 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [594/6471], Loss: 3.7134, Perplexity: 40.9942 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [595/6471], Loss: 3.7451, Perplexity: 42.3150 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [596/6471], Loss: 3.5880, Perplexity: 36.1606 types **##    torch.cuda.FloatTensor

Epoch [1/3], Step [629/6471], Loss: 3.9886, Perplexity: 53.9787 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [630/6471], Loss: 3.6920, Perplexity: 40.1235 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [631/6471], Loss: 3.6787, Perplexity: 39.5942 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 9, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [632/6471], Loss: 3.8951, Perplexity: 49.1609 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [633/6471], Loss: 3.6607, Perplexity: 38.8888 types **##    torch.cuda.FloatTensor 

Epoch [1/3], Step [666/6471], Loss: 3.5003, Perplexity: 33.1265 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [667/6471], Loss: 3.6320, Perplexity: 37.7871 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [668/6471], Loss: 3.5167, Perplexity: 33.6732 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 18, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [669/6471], Loss: 4.0028, Perplexity: 54.7512 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [670/6471], Loss: 3.6922, Perplexity: 40.1315 types **##    torch.cuda.FloatTensor

Epoch [1/3], Step [703/6471], Loss: 3.8614, Perplexity: 47.5326 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 18, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [704/6471], Loss: 3.9010, Perplexity: 49.4528 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [705/6471], Loss: 3.5282, Perplexity: 34.0642 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [706/6471], Loss: 3.3936, Perplexity: 29.7744 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [707/6471], Loss: 3.4842, Perplexity: 32.5967 types **##    torch.cuda.FloatTensor

Epoch [1/3], Step [740/6471], Loss: 3.5104, Perplexity: 33.4633 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [741/6471], Loss: 3.6716, Perplexity: 39.3132 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [742/6471], Loss: 3.6012, Perplexity: 36.6408 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [743/6471], Loss: 3.4001, Perplexity: 29.9683 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [744/6471], Loss: 3.6007, Perplexity: 36.6232 types **##    torch.cuda.FloatTensor

Epoch [1/3], Step [777/6471], Loss: 3.7893, Perplexity: 44.2247 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [778/6471], Loss: 3.5738, Perplexity: 35.6513 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [779/6471], Loss: 3.6321, Perplexity: 37.7937 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [780/6471], Loss: 3.8160, Perplexity: 45.4211 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [781/6471], Loss: 3.4667, Perplexity: 32.0304 types **##    torch.cuda.FloatTensor

Epoch [1/3], Step [814/6471], Loss: 3.4575, Perplexity: 31.7371 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [815/6471], Loss: 3.4570, Perplexity: 31.7210 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 20, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [816/6471], Loss: 4.1861, Perplexity: 65.7643 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [817/6471], Loss: 3.1090, Perplexity: 22.3979 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [818/6471], Loss: 3.4333, Perplexity: 30.9773 types **##    torch.cuda.FloatTensor

Epoch [1/3], Step [851/6471], Loss: 3.7429, Perplexity: 42.2223 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [852/6471], Loss: 3.4157, Perplexity: 30.4371 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [853/6471], Loss: 3.5475, Perplexity: 34.7249 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [854/6471], Loss: 3.8084, Perplexity: 45.0771 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [855/6471], Loss: 3.4862, Perplexity: 32.6626 types **##    torch.cuda.FloatTensor

Epoch [1/3], Step [888/6471], Loss: 3.3612, Perplexity: 28.8233 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [889/6471], Loss: 3.7292, Perplexity: 41.6472 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [890/6471], Loss: 3.4138, Perplexity: 30.3796 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [891/6471], Loss: 3.3313, Perplexity: 27.9734 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 20, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [892/6471], Loss: 4.0448, Perplexity: 57.1020 types **##    torch.cuda.FloatTensor

Epoch [1/3], Step [925/6471], Loss: 4.0307, Perplexity: 56.3001 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 19, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [926/6471], Loss: 3.7605, Perplexity: 42.9679 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [927/6471], Loss: 3.4425, Perplexity: 31.2655 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [928/6471], Loss: 3.5537, Perplexity: 34.9427 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [929/6471], Loss: 3.2380, Perplexity: 25.4815 types **##    torch.cuda.FloatTensor

Epoch [1/3], Step [962/6471], Loss: 3.6216, Perplexity: 37.3975 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [963/6471], Loss: 3.6107, Perplexity: 36.9923 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [964/6471], Loss: 3.4048, Perplexity: 30.1075 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [965/6471], Loss: 3.2892, Perplexity: 26.8222 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [966/6471], Loss: 3.4559, Perplexity: 31.6881 types **##    torch.cuda.FloatTensor

Epoch [1/3], Step [999/6471], Loss: 3.5648, Perplexity: 35.3314 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1000/6471], Loss: 3.4741, Perplexity: 32.2675
 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1001/6471], Loss: 3.2974, Perplexity: 27.0422 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1002/6471], Loss: 3.5316, Perplexity: 34.1800 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1003/6471], Loss: 3.5807, Perplexity: 35.8969 types **##    torch.cuda.FloatT

Epoch [1/3], Step [1071/6471], Loss: 3.2597, Perplexity: 26.0423 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1072/6471], Loss: 3.7401, Perplexity: 42.1018 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1073/6471], Loss: 3.3650, Perplexity: 28.9341 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1074/6471], Loss: 3.3966, Perplexity: 29.8618 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1075/6471], Loss: 3.4105, Perplexity: 30.2817 types **##    torch.cuda.FloatT

Epoch [1/3], Step [1143/6471], Loss: 3.3683, Perplexity: 29.0279 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1144/6471], Loss: 3.6333, Perplexity: 37.8360 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1145/6471], Loss: 3.3810, Perplexity: 29.3992 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1146/6471], Loss: 3.4102, Perplexity: 30.2711 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 19, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1147/6471], Loss: 3.9179, Perplexity: 50.2955 types **##    torch.cuda.FloatT

Epoch [1/3], Step [1215/6471], Loss: 3.2261, Perplexity: 25.1820 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1216/6471], Loss: 3.2341, Perplexity: 25.3828 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1217/6471], Loss: 3.0970, Perplexity: 22.1309 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1218/6471], Loss: 3.4803, Perplexity: 32.4701 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 19, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1219/6471], Loss: 3.9526, Perplexity: 52.0718 types **##    torch.cuda.FloatT

Epoch [1/3], Step [1287/6471], Loss: 3.1829, Perplexity: 24.1161 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1288/6471], Loss: 3.2696, Perplexity: 26.3004 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1289/6471], Loss: 3.5042, Perplexity: 33.2537 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1290/6471], Loss: 3.2455, Perplexity: 25.6734 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1291/6471], Loss: 3.3230, Perplexity: 27.7442 types **##    torch.cuda.FloatT

Epoch [1/3], Step [1359/6471], Loss: 3.4150, Perplexity: 30.4158 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1360/6471], Loss: 3.4216, Perplexity: 30.6181 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1361/6471], Loss: 3.3491, Perplexity: 28.4763 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1362/6471], Loss: 3.3855, Perplexity: 29.5317 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 19, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1363/6471], Loss: 3.5852, Perplexity: 36.0615 types **##    torch.cuda.FloatT

Epoch [1/3], Step [1431/6471], Loss: 3.3783, Perplexity: 29.3196 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1432/6471], Loss: 3.2069, Perplexity: 24.7033 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1433/6471], Loss: 3.1140, Perplexity: 22.5109 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1434/6471], Loss: 3.4056, Perplexity: 30.1332 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1435/6471], Loss: 3.2126, Perplexity: 24.8429 types **##    torch.cuda.FloatT

Epoch [1/3], Step [1503/6471], Loss: 3.5292, Perplexity: 34.0967 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1504/6471], Loss: 3.5643, Perplexity: 35.3136 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1505/6471], Loss: 3.2758, Perplexity: 26.4645 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1506/6471], Loss: 3.1676, Perplexity: 23.7514 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1507/6471], Loss: 3.4615, Perplexity: 31.8661 types **##    torch.cuda.FloatT

Epoch [1/3], Step [1575/6471], Loss: 3.1261, Perplexity: 22.7856 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1576/6471], Loss: 3.3823, Perplexity: 29.4370 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1577/6471], Loss: 3.2680, Perplexity: 26.2584 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1578/6471], Loss: 3.1585, Perplexity: 23.5362 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1579/6471], Loss: 3.1465, Perplexity: 23.2544 types **##    torch.cuda.FloatT

Epoch [1/3], Step [1647/6471], Loss: 3.1236, Perplexity: 22.7284 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1648/6471], Loss: 3.2280, Perplexity: 25.2280 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1649/6471], Loss: 3.1896, Perplexity: 24.2780 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1650/6471], Loss: 3.7377, Perplexity: 42.0009 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1651/6471], Loss: 3.1294, Perplexity: 22.8593 types **##    torch.cuda.FloatT

Epoch [1/3], Step [1719/6471], Loss: 3.4927, Perplexity: 32.8743 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1720/6471], Loss: 3.1877, Perplexity: 24.2330 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1721/6471], Loss: 3.2365, Perplexity: 25.4441 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1722/6471], Loss: 3.3802, Perplexity: 29.3752 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1723/6471], Loss: 3.2157, Perplexity: 24.9201 types **##    torch.cuda.FloatT

Epoch [1/3], Step [1791/6471], Loss: 3.9552, Perplexity: 52.2058 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1792/6471], Loss: 3.1441, Perplexity: 23.1981 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1793/6471], Loss: 3.3195, Perplexity: 27.6459 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1794/6471], Loss: 3.4999, Perplexity: 33.1107 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1795/6471], Loss: 3.0657, Perplexity: 21.4499 types **##    torch.cuda.FloatT

Epoch [1/3], Step [1863/6471], Loss: 3.4144, Perplexity: 30.4001 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1864/6471], Loss: 3.1617, Perplexity: 23.6099 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1865/6471], Loss: 3.2185, Perplexity: 24.9911 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1866/6471], Loss: 3.5365, Perplexity: 34.3455 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1867/6471], Loss: 3.1378, Perplexity: 23.0521 types **##    torch.cuda.FloatT

Epoch [1/3], Step [1935/6471], Loss: 3.1423, Perplexity: 23.1562 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1936/6471], Loss: 3.2056, Perplexity: 24.6714 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1937/6471], Loss: 3.0593, Perplexity: 21.3135 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1938/6471], Loss: 3.1007, Perplexity: 22.2125 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [1939/6471], Loss: 3.2298, Perplexity: 25.2747 types **##    torch.cuda.FloatT

Epoch [1/3], Step [2007/6471], Loss: 3.1203, Perplexity: 22.6540 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2008/6471], Loss: 3.6274, Perplexity: 37.6145 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2009/6471], Loss: 3.3564, Perplexity: 28.6871 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2010/6471], Loss: 2.9570, Perplexity: 19.2404 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2011/6471], Loss: 3.0819, Perplexity: 21.7995 types **##    torch.cuda.FloatT

Epoch [1/3], Step [2079/6471], Loss: 3.2696, Perplexity: 26.3016 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2080/6471], Loss: 3.2838, Perplexity: 26.6782 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2081/6471], Loss: 2.9434, Perplexity: 18.9799 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2082/6471], Loss: 3.1962, Perplexity: 24.4399 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2083/6471], Loss: 2.9948, Perplexity: 19.9812 types **##    torch.cuda.FloatT

Epoch [1/3], Step [2151/6471], Loss: 3.0659, Perplexity: 21.4544 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2152/6471], Loss: 3.1894, Perplexity: 24.2746 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2153/6471], Loss: 3.2947, Perplexity: 26.9689 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2154/6471], Loss: 3.2339, Perplexity: 25.3776 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2155/6471], Loss: 3.2554, Perplexity: 25.9302 types **##    torch.cuda.FloatT

Epoch [1/3], Step [2223/6471], Loss: 3.0467, Perplexity: 21.0449 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2224/6471], Loss: 3.5483, Perplexity: 34.7544 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 22, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2225/6471], Loss: 3.9285, Perplexity: 50.8309 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2226/6471], Loss: 3.1917, Perplexity: 24.3296 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2227/6471], Loss: 3.0547, Perplexity: 21.2142 types **##    torch.cuda.FloatT

Epoch [1/3], Step [2295/6471], Loss: 3.1362, Perplexity: 23.0170 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2296/6471], Loss: 2.8916, Perplexity: 18.0228 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2297/6471], Loss: 3.2305, Perplexity: 25.2919 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2298/6471], Loss: 3.2093, Perplexity: 24.7623 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2299/6471], Loss: 3.1935, Perplexity: 24.3738 types **##    torch.cuda.FloatT

Epoch [1/3], Step [2367/6471], Loss: 3.0401, Perplexity: 20.9065 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2368/6471], Loss: 3.2007, Perplexity: 24.5502 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2369/6471], Loss: 2.9914, Perplexity: 19.9132 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2370/6471], Loss: 3.0991, Perplexity: 22.1783 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2371/6471], Loss: 3.6095, Perplexity: 36.9476 types **##    torch.cuda.FloatT

Epoch [1/3], Step [2439/6471], Loss: 3.2319, Perplexity: 25.3281 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2440/6471], Loss: 3.1627, Perplexity: 23.6342 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2441/6471], Loss: 2.9647, Perplexity: 19.3887 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2442/6471], Loss: 3.2266, Perplexity: 25.1949 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2443/6471], Loss: 3.1908, Perplexity: 24.3085 types **##    torch.cuda.FloatT

Epoch [1/3], Step [2511/6471], Loss: 3.2714, Perplexity: 26.3473 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2512/6471], Loss: 3.2153, Perplexity: 24.9100 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2513/6471], Loss: 2.9427, Perplexity: 18.9679 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2514/6471], Loss: 3.0072, Perplexity: 20.2312 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2515/6471], Loss: 3.1120, Perplexity: 22.4661 types **##    torch.cuda.FloatT

Epoch [1/3], Step [2583/6471], Loss: 3.2479, Perplexity: 25.7370 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2584/6471], Loss: 3.1722, Perplexity: 23.8606 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2585/6471], Loss: 3.1676, Perplexity: 23.7494 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2586/6471], Loss: 2.8159, Perplexity: 16.7089 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2587/6471], Loss: 3.3350, Perplexity: 28.0796 types **##    torch.cuda.FloatT

Epoch [1/3], Step [2655/6471], Loss: 2.8849, Perplexity: 17.9018 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2656/6471], Loss: 2.9344, Perplexity: 18.8096 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2657/6471], Loss: 2.9317, Perplexity: 18.7603 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2658/6471], Loss: 3.0256, Perplexity: 20.6060 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2659/6471], Loss: 3.1731, Perplexity: 23.8803 types **##    torch.cuda.FloatT

Epoch [1/3], Step [2727/6471], Loss: 3.1409, Perplexity: 23.1241 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2728/6471], Loss: 2.9995, Perplexity: 20.0760 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2729/6471], Loss: 3.0973, Perplexity: 22.1383 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2730/6471], Loss: 3.0011, Perplexity: 20.1086 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2731/6471], Loss: 2.9651, Perplexity: 19.3966 types **##    torch.cuda.FloatT

Epoch [1/3], Step [2799/6471], Loss: 3.2102, Perplexity: 24.7834 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2800/6471], Loss: 2.8455, Perplexity: 17.2106
 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2801/6471], Loss: 3.2824, Perplexity: 26.6399 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2802/6471], Loss: 3.3968, Perplexity: 29.8691 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2803/6471], Loss: 3.0634, Perplexity: 21.4010 types **##    torch.cuda.Float

Epoch [1/3], Step [2871/6471], Loss: 3.0330, Perplexity: 20.7596 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2872/6471], Loss: 3.0061, Perplexity: 20.2080 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2873/6471], Loss: 2.8426, Perplexity: 17.1611 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2874/6471], Loss: 3.0911, Perplexity: 22.0016 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2875/6471], Loss: 3.1206, Perplexity: 22.6603 types **##    torch.cuda.FloatT

Epoch [1/3], Step [2943/6471], Loss: 3.0431, Perplexity: 20.9709 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2944/6471], Loss: 3.2583, Perplexity: 26.0060 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2945/6471], Loss: 3.1778, Perplexity: 23.9949 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2946/6471], Loss: 3.1330, Perplexity: 22.9437 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [2947/6471], Loss: 3.2078, Perplexity: 24.7242 types **##    torch.cuda.FloatT

Epoch [1/3], Step [3015/6471], Loss: 2.8573, Perplexity: 17.4148 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3016/6471], Loss: 2.9069, Perplexity: 18.2995 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3017/6471], Loss: 3.0311, Perplexity: 20.7190 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3018/6471], Loss: 3.2858, Perplexity: 26.7315 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3019/6471], Loss: 3.1238, Perplexity: 22.7330 types **##    torch.cuda.FloatT

Epoch [1/3], Step [3087/6471], Loss: 3.1708, Perplexity: 23.8255 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3088/6471], Loss: 3.1764, Perplexity: 23.9607 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3089/6471], Loss: 2.9931, Perplexity: 19.9467 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3090/6471], Loss: 2.9462, Perplexity: 19.0330 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3091/6471], Loss: 3.2477, Perplexity: 25.7299 types **##    torch.cuda.FloatT

Epoch [1/3], Step [3159/6471], Loss: 3.2110, Perplexity: 24.8047 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 21, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3160/6471], Loss: 3.8006, Perplexity: 44.7281 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3161/6471], Loss: 3.5310, Perplexity: 34.1586 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3162/6471], Loss: 3.3634, Perplexity: 28.8860 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3163/6471], Loss: 2.9616, Perplexity: 19.3282 types **##    torch.cuda.FloatT

Epoch [1/3], Step [3231/6471], Loss: 3.5666, Perplexity: 35.3946 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3232/6471], Loss: 3.4451, Perplexity: 31.3476 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3233/6471], Loss: 3.2582, Perplexity: 26.0028 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3234/6471], Loss: 2.9016, Perplexity: 18.2035 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 19, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3235/6471], Loss: 3.4552, Perplexity: 31.6636 types **##    torch.cuda.FloatT

Epoch [1/3], Step [3303/6471], Loss: 3.0927, Perplexity: 22.0359 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3304/6471], Loss: 2.9286, Perplexity: 18.7019 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3305/6471], Loss: 3.0490, Perplexity: 21.0944 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3306/6471], Loss: 3.2613, Perplexity: 26.0822 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3307/6471], Loss: 2.9092, Perplexity: 18.3421 types **##    torch.cuda.FloatT

Epoch [1/3], Step [3375/6471], Loss: 3.0470, Perplexity: 21.0529 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3376/6471], Loss: 3.1559, Perplexity: 23.4733 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3377/6471], Loss: 3.0868, Perplexity: 21.9060 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3378/6471], Loss: 3.2362, Perplexity: 25.4366 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3379/6471], Loss: 2.8296, Perplexity: 16.9387 types **##    torch.cuda.FloatT

Epoch [1/3], Step [3447/6471], Loss: 2.9805, Perplexity: 19.6968 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3448/6471], Loss: 2.7069, Perplexity: 14.9826 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3449/6471], Loss: 3.0042, Perplexity: 20.1698 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3450/6471], Loss: 3.1328, Perplexity: 22.9375 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3451/6471], Loss: 2.8805, Perplexity: 17.8231 types **##    torch.cuda.FloatT

Epoch [1/3], Step [3519/6471], Loss: 3.0169, Perplexity: 20.4288 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3520/6471], Loss: 3.0247, Perplexity: 20.5884 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3521/6471], Loss: 2.8555, Perplexity: 17.3825 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3522/6471], Loss: 2.6744, Perplexity: 14.5032 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3523/6471], Loss: 2.9497, Perplexity: 19.0993 types **##    torch.cuda.FloatT

Epoch [1/3], Step [3591/6471], Loss: 3.2578, Perplexity: 25.9911 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3592/6471], Loss: 2.8604, Perplexity: 17.4689 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3593/6471], Loss: 3.1170, Perplexity: 22.5786 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3594/6471], Loss: 2.9452, Perplexity: 19.0148 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3595/6471], Loss: 2.6847, Perplexity: 14.6545 types **##    torch.cuda.FloatT

Epoch [1/3], Step [3663/6471], Loss: 2.9650, Perplexity: 19.3951 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3664/6471], Loss: 3.0355, Perplexity: 20.8115 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3665/6471], Loss: 2.8777, Perplexity: 17.7730 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3666/6471], Loss: 2.8797, Perplexity: 17.8085 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3667/6471], Loss: 3.0450, Perplexity: 21.0110 types **##    torch.cuda.FloatT

Epoch [1/3], Step [3735/6471], Loss: 2.8303, Perplexity: 16.9506 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3736/6471], Loss: 2.7613, Perplexity: 15.8197 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3737/6471], Loss: 2.9330, Perplexity: 18.7839 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3738/6471], Loss: 3.0248, Perplexity: 20.5901 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 20, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3739/6471], Loss: 3.4598, Perplexity: 31.8107 types **##    torch.cuda.FloatT

Epoch [1/3], Step [3807/6471], Loss: 2.6522, Perplexity: 14.1851 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3808/6471], Loss: 3.0372, Perplexity: 20.8471 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3809/6471], Loss: 2.8481, Perplexity: 17.2554 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3810/6471], Loss: 2.9544, Perplexity: 19.1907 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3811/6471], Loss: 2.8960, Perplexity: 18.1021 types **##    torch.cuda.FloatT

Epoch [1/3], Step [3879/6471], Loss: 2.9923, Perplexity: 19.9322 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3880/6471], Loss: 2.9127, Perplexity: 18.4062 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3881/6471], Loss: 2.8478, Perplexity: 17.2497 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3882/6471], Loss: 2.8034, Perplexity: 16.5005 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 19, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3883/6471], Loss: 3.3991, Perplexity: 29.9376 types **##    torch.cuda.FloatT

Epoch [1/3], Step [3951/6471], Loss: 2.8769, Perplexity: 17.7584 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3952/6471], Loss: 3.2843, Perplexity: 26.6902 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3953/6471], Loss: 2.9213, Perplexity: 18.5645 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3954/6471], Loss: 3.4409, Perplexity: 31.2153 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [3955/6471], Loss: 2.8474, Perplexity: 17.2420 types **##    torch.cuda.FloatT

Epoch [1/3], Step [4023/6471], Loss: 2.9641, Perplexity: 19.3776 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4024/6471], Loss: 2.9477, Perplexity: 19.0624 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4025/6471], Loss: 3.0063, Perplexity: 20.2127 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4026/6471], Loss: 2.8308, Perplexity: 16.9587 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4027/6471], Loss: 2.8773, Perplexity: 17.7662 types **##    torch.cuda.FloatT

Epoch [1/3], Step [4095/6471], Loss: 2.9325, Perplexity: 18.7742 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4096/6471], Loss: 2.7689, Perplexity: 15.9410 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4097/6471], Loss: 3.4685, Perplexity: 32.0893 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4098/6471], Loss: 3.0085, Perplexity: 20.2570 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4099/6471], Loss: 2.9337, Perplexity: 18.7977 types **##    torch.cuda.FloatT

Epoch [1/3], Step [4167/6471], Loss: 2.8676, Perplexity: 17.5951 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4168/6471], Loss: 3.1443, Perplexity: 23.2029 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4169/6471], Loss: 3.0379, Perplexity: 20.8620 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4170/6471], Loss: 3.4247, Perplexity: 30.7121 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4171/6471], Loss: 3.1645, Perplexity: 23.6778 types **##    torch.cuda.FloatT

Epoch [1/3], Step [4239/6471], Loss: 2.8927, Perplexity: 18.0418 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4240/6471], Loss: 2.8065, Perplexity: 16.5526 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4241/6471], Loss: 2.9383, Perplexity: 18.8846 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4242/6471], Loss: 3.0876, Perplexity: 21.9254 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4243/6471], Loss: 2.7464, Perplexity: 15.5870 types **##    torch.cuda.FloatT

Epoch [1/3], Step [4311/6471], Loss: 3.2438, Perplexity: 25.6316 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4312/6471], Loss: 3.0185, Perplexity: 20.4606 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4313/6471], Loss: 2.9217, Perplexity: 18.5727 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4314/6471], Loss: 3.0465, Perplexity: 21.0408 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4315/6471], Loss: 2.7620, Perplexity: 15.8308 types **##    torch.cuda.FloatT

Epoch [1/3], Step [4383/6471], Loss: 2.6873, Perplexity: 14.6921 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4384/6471], Loss: 3.1885, Perplexity: 24.2522 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4385/6471], Loss: 2.8241, Perplexity: 16.8450 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4386/6471], Loss: 2.8522, Perplexity: 17.3255 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4387/6471], Loss: 2.8838, Perplexity: 17.8829 types **##    torch.cuda.FloatT

Epoch [1/3], Step [4455/6471], Loss: 3.0416, Perplexity: 20.9389 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4456/6471], Loss: 2.8504, Perplexity: 17.2944 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4457/6471], Loss: 2.8579, Perplexity: 17.4242 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4458/6471], Loss: 3.0542, Perplexity: 21.2051 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4459/6471], Loss: 3.0799, Perplexity: 21.7567 types **##    torch.cuda.FloatT

Epoch [1/3], Step [4527/6471], Loss: 3.1621, Perplexity: 23.6209 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4528/6471], Loss: 2.8720, Perplexity: 17.6723 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4529/6471], Loss: 2.9685, Perplexity: 19.4627 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4530/6471], Loss: 3.0744, Perplexity: 21.6370 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4531/6471], Loss: 3.0210, Perplexity: 20.5120 types **##    torch.cuda.FloatT

Epoch [1/3], Step [4599/6471], Loss: 2.7524, Perplexity: 15.6799 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4600/6471], Loss: 2.7071, Perplexity: 14.9858
 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4601/6471], Loss: 3.0905, Perplexity: 21.9887 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4602/6471], Loss: 3.0068, Perplexity: 20.2217 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4603/6471], Loss: 2.8793, Perplexity: 17.8016 types **##    torch.cuda.Float

Epoch [1/3], Step [4671/6471], Loss: 2.7305, Perplexity: 15.3400 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4672/6471], Loss: 2.5839, Perplexity: 13.2488 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4673/6471], Loss: 3.1078, Perplexity: 22.3708 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4674/6471], Loss: 2.9058, Perplexity: 18.2804 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4675/6471], Loss: 2.8928, Perplexity: 18.0439 types **##    torch.cuda.FloatT

Epoch [1/3], Step [4743/6471], Loss: 2.9804, Perplexity: 19.6958 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4744/6471], Loss: 2.8216, Perplexity: 16.8045 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4745/6471], Loss: 2.8078, Perplexity: 16.5730 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4746/6471], Loss: 2.8173, Perplexity: 16.7316 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4747/6471], Loss: 2.9936, Perplexity: 19.9581 types **##    torch.cuda.FloatT

Epoch [1/3], Step [4815/6471], Loss: 3.4727, Perplexity: 32.2233 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4816/6471], Loss: 2.7296, Perplexity: 15.3273 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4817/6471], Loss: 2.8083, Perplexity: 16.5815 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4818/6471], Loss: 2.7791, Perplexity: 16.1053 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4819/6471], Loss: 2.8541, Perplexity: 17.3594 types **##    torch.cuda.FloatT

Epoch [1/3], Step [4887/6471], Loss: 2.9250, Perplexity: 18.6338 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4888/6471], Loss: 2.6907, Perplexity: 14.7426 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4889/6471], Loss: 2.7848, Perplexity: 16.1964 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4890/6471], Loss: 2.9624, Perplexity: 19.3448 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 19, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4891/6471], Loss: 3.3364, Perplexity: 28.1165 types **##    torch.cuda.FloatT

Epoch [1/3], Step [4959/6471], Loss: 2.8185, Perplexity: 16.7511 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 18, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4960/6471], Loss: 3.1860, Perplexity: 24.1912 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4961/6471], Loss: 3.2388, Perplexity: 25.5037 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4962/6471], Loss: 3.1582, Perplexity: 23.5284 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [4963/6471], Loss: 3.0694, Perplexity: 21.5293 types **##    torch.cuda.FloatT

Epoch [1/3], Step [5031/6471], Loss: 2.7387, Perplexity: 15.4673 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5032/6471], Loss: 2.7639, Perplexity: 15.8617 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5033/6471], Loss: 2.8482, Perplexity: 17.2564 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5034/6471], Loss: 3.3227, Perplexity: 27.7348 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5035/6471], Loss: 2.9276, Perplexity: 18.6823 types **##    torch.cuda.FloatT

Epoch [1/3], Step [5103/6471], Loss: 2.9119, Perplexity: 18.3917 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5104/6471], Loss: 2.9149, Perplexity: 18.4471 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5105/6471], Loss: 2.8121, Perplexity: 16.6442 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5106/6471], Loss: 3.0369, Perplexity: 20.8413 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5107/6471], Loss: 2.8949, Perplexity: 18.0816 types **##    torch.cuda.FloatT

Epoch [1/3], Step [5175/6471], Loss: 3.1959, Perplexity: 24.4311 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5176/6471], Loss: 2.7178, Perplexity: 15.1463 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5177/6471], Loss: 2.7877, Perplexity: 16.2435 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5178/6471], Loss: 2.7814, Perplexity: 16.1415 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5179/6471], Loss: 2.7197, Perplexity: 15.1764 types **##    torch.cuda.FloatT

Epoch [1/3], Step [5247/6471], Loss: 3.0100, Perplexity: 20.2884 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5248/6471], Loss: 2.8712, Perplexity: 17.6578 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5249/6471], Loss: 2.8657, Perplexity: 17.5620 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5250/6471], Loss: 2.8908, Perplexity: 18.0075 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5251/6471], Loss: 2.8745, Perplexity: 17.7169 types **##    torch.cuda.FloatT

Epoch [1/3], Step [5319/6471], Loss: 2.7632, Perplexity: 15.8512 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5320/6471], Loss: 2.8937, Perplexity: 18.0593 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5321/6471], Loss: 2.8732, Perplexity: 17.6933 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5322/6471], Loss: 2.6966, Perplexity: 14.8298 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5323/6471], Loss: 2.9017, Perplexity: 18.2053 types **##    torch.cuda.FloatT

Epoch [1/3], Step [5391/6471], Loss: 2.9433, Perplexity: 18.9783 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5392/6471], Loss: 2.7675, Perplexity: 15.9191 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 18, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5393/6471], Loss: 3.0930, Perplexity: 22.0434 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5394/6471], Loss: 2.9273, Perplexity: 18.6763 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5395/6471], Loss: 2.6587, Perplexity: 14.2777 types **##    torch.cuda.FloatT

Epoch [1/3], Step [5463/6471], Loss: 2.8209, Perplexity: 16.7912 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5464/6471], Loss: 3.2601, Perplexity: 26.0513 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 22, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5465/6471], Loss: 3.7380, Perplexity: 42.0148 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5466/6471], Loss: 2.8361, Perplexity: 17.0485 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5467/6471], Loss: 2.9114, Perplexity: 18.3818 types **##    torch.cuda.FloatT

Epoch [1/3], Step [5535/6471], Loss: 3.1670, Perplexity: 23.7350 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5536/6471], Loss: 2.9408, Perplexity: 18.9304 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5537/6471], Loss: 3.1598, Perplexity: 23.5661 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5538/6471], Loss: 2.6682, Perplexity: 14.4137 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5539/6471], Loss: 2.9650, Perplexity: 19.3949 types **##    torch.cuda.FloatT

Epoch [1/3], Step [5607/6471], Loss: 2.8581, Perplexity: 17.4276 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5608/6471], Loss: 2.7323, Perplexity: 15.3686 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5609/6471], Loss: 2.7541, Perplexity: 15.7067 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5610/6471], Loss: 2.7374, Perplexity: 15.4472 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5611/6471], Loss: 2.6803, Perplexity: 14.5899 types **##    torch.cuda.FloatT

Epoch [1/3], Step [5679/6471], Loss: 2.7369, Perplexity: 15.4397 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5680/6471], Loss: 2.6474, Perplexity: 14.1171 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5681/6471], Loss: 2.9428, Perplexity: 18.9685 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5682/6471], Loss: 3.2133, Perplexity: 24.8610 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5683/6471], Loss: 2.8825, Perplexity: 17.8586 types **##    torch.cuda.FloatT

Epoch [1/3], Step [5751/6471], Loss: 2.9875, Perplexity: 19.8367 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5752/6471], Loss: 2.7351, Perplexity: 15.4112 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5753/6471], Loss: 2.8395, Perplexity: 17.1076 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5754/6471], Loss: 2.9527, Perplexity: 19.1578 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5755/6471], Loss: 2.7958, Perplexity: 16.3751 types **##    torch.cuda.FloatT

Epoch [1/3], Step [5823/6471], Loss: 2.8501, Perplexity: 17.2901 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5824/6471], Loss: 3.1001, Perplexity: 22.1995 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5825/6471], Loss: 3.0979, Perplexity: 22.1510 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5826/6471], Loss: 2.8765, Perplexity: 17.7526 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5827/6471], Loss: 2.8182, Perplexity: 16.7469 types **##    torch.cuda.FloatT

Epoch [1/3], Step [5895/6471], Loss: 2.7659, Perplexity: 15.8927 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5896/6471], Loss: 3.1170, Perplexity: 22.5796 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5897/6471], Loss: 2.8672, Perplexity: 17.5882 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5898/6471], Loss: 2.6680, Perplexity: 14.4118 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5899/6471], Loss: 2.5525, Perplexity: 12.8395 types **##    torch.cuda.FloatT

Epoch [1/3], Step [5967/6471], Loss: 2.8015, Perplexity: 16.4698 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5968/6471], Loss: 2.5779, Perplexity: 13.1690 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5969/6471], Loss: 3.0485, Perplexity: 21.0843 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5970/6471], Loss: 2.7879, Perplexity: 16.2462 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [5971/6471], Loss: 2.8915, Perplexity: 18.0199 types **##    torch.cuda.FloatT

Epoch [1/3], Step [6039/6471], Loss: 2.8672, Perplexity: 17.5885 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [6040/6471], Loss: 2.8126, Perplexity: 16.6528 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 18, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [6041/6471], Loss: 3.1158, Perplexity: 22.5519 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [6042/6471], Loss: 2.7916, Perplexity: 16.3067 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [6043/6471], Loss: 2.6889, Perplexity: 14.7156 types **##    torch.cuda.FloatT

Epoch [1/3], Step [6111/6471], Loss: 3.3466, Perplexity: 28.4048 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [6112/6471], Loss: 2.4427, Perplexity: 11.5036 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [6113/6471], Loss: 2.8311, Perplexity: 16.9642 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [6114/6471], Loss: 3.1847, Perplexity: 24.1598 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [6115/6471], Loss: 2.7882, Perplexity: 16.2511 types **##    torch.cuda.FloatT

Epoch [1/3], Step [6183/6471], Loss: 2.7468, Perplexity: 15.5928 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [6184/6471], Loss: 2.7407, Perplexity: 15.4985 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [6185/6471], Loss: 2.7656, Perplexity: 15.8890 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [6186/6471], Loss: 2.8762, Perplexity: 17.7471 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [6187/6471], Loss: 2.9870, Perplexity: 19.8265 types **##    torch.cuda.FloatT

Epoch [1/3], Step [6255/6471], Loss: 2.7892, Perplexity: 16.2673 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [6256/6471], Loss: 2.5654, Perplexity: 13.0065 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [6257/6471], Loss: 2.9563, Perplexity: 19.2266 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [6258/6471], Loss: 2.8279, Perplexity: 16.9093 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [6259/6471], Loss: 2.8382, Perplexity: 17.0845 types **##    torch.cuda.FloatT

Epoch [1/3], Step [6327/6471], Loss: 2.6902, Perplexity: 14.7343 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [6328/6471], Loss: 2.8319, Perplexity: 16.9771 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [6329/6471], Loss: 2.7638, Perplexity: 15.8608 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [6330/6471], Loss: 2.6979, Perplexity: 14.8479 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [6331/6471], Loss: 2.6986, Perplexity: 14.8589 types **##    torch.cuda.FloatT

Epoch [1/3], Step [6399/6471], Loss: 2.7855, Perplexity: 16.2085 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [6400/6471], Loss: 2.9089, Perplexity: 18.3374
 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [6401/6471], Loss: 3.0887, Perplexity: 21.9481 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [6402/6471], Loss: 2.9195, Perplexity: 18.5324 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [1/3], Step [6403/6471], Loss: 2.8765, Perplexity: 17.7522 types **##    torch.cuda.Float

Epoch [1/3], Step [6471/6471], Loss: 2.6893, Perplexity: 14.7217 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1/6471], Loss: 2.6839, Perplexity: 14.6416 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2/6471], Loss: 3.2072, Perplexity: 24.7092 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3/6471], Loss: 2.5611, Perplexity: 12.9497 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4/6471], Loss: 2.8345, Perplexity: 17.0225 types **##    torch.cuda.FloatTensor torch.

Epoch [2/3], Step [37/6471], Loss: 2.7692, Perplexity: 15.9454 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [38/6471], Loss: 2.9245, Perplexity: 18.6246 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [39/6471], Loss: 2.6956, Perplexity: 14.8150 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [40/6471], Loss: 2.8268, Perplexity: 16.8913 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [41/6471], Loss: 2.6633, Perplexity: 14.3432 types **##    torch.cuda.FloatTensor torc

Epoch [2/3], Step [74/6471], Loss: 2.7304, Perplexity: 15.3394 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [75/6471], Loss: 2.7305, Perplexity: 15.3413 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [76/6471], Loss: 3.2188, Perplexity: 24.9978 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [77/6471], Loss: 2.8186, Perplexity: 16.7530 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [78/6471], Loss: 2.8118, Perplexity: 16.6404 types **##    torch.cuda.FloatTensor torc

Epoch [2/3], Step [111/6471], Loss: 2.5629, Perplexity: 12.9731 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [112/6471], Loss: 2.5710, Perplexity: 13.0786 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [113/6471], Loss: 2.8396, Perplexity: 17.1096 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [114/6471], Loss: 2.7009, Perplexity: 14.8937 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [115/6471], Loss: 2.6159, Perplexity: 13.6796 types **##    torch.cuda.FloatTensor

Epoch [2/3], Step [148/6471], Loss: 2.7065, Perplexity: 14.9763 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [149/6471], Loss: 2.7920, Perplexity: 16.3132 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [150/6471], Loss: 2.8177, Perplexity: 16.7391 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [151/6471], Loss: 2.9693, Perplexity: 19.4779 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [152/6471], Loss: 2.7905, Perplexity: 16.2893 types **##    torch.cuda.FloatTensor

Epoch [2/3], Step [185/6471], Loss: 2.8702, Perplexity: 17.6408 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [186/6471], Loss: 2.8645, Perplexity: 17.5394 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [187/6471], Loss: 2.8259, Perplexity: 16.8762 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [188/6471], Loss: 2.9116, Perplexity: 18.3859 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [189/6471], Loss: 3.0150, Perplexity: 20.3886 types **##    torch.cuda.FloatTensor

Epoch [2/3], Step [222/6471], Loss: 2.8967, Perplexity: 18.1146 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [223/6471], Loss: 2.9326, Perplexity: 18.7771 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [224/6471], Loss: 2.7651, Perplexity: 15.8812 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [225/6471], Loss: 2.7803, Perplexity: 16.1240 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [226/6471], Loss: 2.7339, Perplexity: 15.3926 types **##    torch.cuda.FloatTensor

Epoch [2/3], Step [259/6471], Loss: 2.8191, Perplexity: 16.7624 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [260/6471], Loss: 2.7374, Perplexity: 15.4470 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [261/6471], Loss: 2.6866, Perplexity: 14.6813 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [262/6471], Loss: 2.8083, Perplexity: 16.5815 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [263/6471], Loss: 2.9959, Perplexity: 20.0038 types **##    torch.cuda.FloatTensor

Epoch [2/3], Step [296/6471], Loss: 2.6753, Perplexity: 14.5168 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [297/6471], Loss: 2.7919, Perplexity: 16.3119 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [298/6471], Loss: 2.9374, Perplexity: 18.8672 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [299/6471], Loss: 2.7459, Perplexity: 15.5794 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [300/6471], Loss: 2.7201, Perplexity: 15.1817
 types **##    torch.cuda.FloatTenso

Epoch [2/3], Step [333/6471], Loss: 2.7975, Perplexity: 16.4032 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [334/6471], Loss: 2.8443, Perplexity: 17.1888 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [335/6471], Loss: 2.7064, Perplexity: 14.9758 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [336/6471], Loss: 2.8738, Perplexity: 17.7045 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [337/6471], Loss: 2.8478, Perplexity: 17.2501 types **##    torch.cuda.FloatTensor

Epoch [2/3], Step [370/6471], Loss: 2.6466, Perplexity: 14.1055 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [371/6471], Loss: 2.8427, Perplexity: 17.1621 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [372/6471], Loss: 3.0371, Perplexity: 20.8444 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [373/6471], Loss: 2.8603, Perplexity: 17.4666 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [374/6471], Loss: 2.8190, Perplexity: 16.7601 types **##    torch.cuda.FloatTensor

Epoch [2/3], Step [407/6471], Loss: 2.7332, Perplexity: 15.3819 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [408/6471], Loss: 2.9010, Perplexity: 18.1920 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [409/6471], Loss: 2.7270, Perplexity: 15.2864 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [410/6471], Loss: 2.9609, Perplexity: 19.3146 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [411/6471], Loss: 2.8341, Perplexity: 17.0155 types **##    torch.cuda.FloatTensor

Epoch [2/3], Step [444/6471], Loss: 2.6245, Perplexity: 13.7971 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [445/6471], Loss: 2.7091, Perplexity: 15.0155 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [446/6471], Loss: 2.8381, Perplexity: 17.0839 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [447/6471], Loss: 2.6806, Perplexity: 14.5935 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [448/6471], Loss: 2.7986, Perplexity: 16.4212 types **##    torch.cuda.FloatTensor

Epoch [2/3], Step [481/6471], Loss: 2.7085, Perplexity: 15.0064 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [482/6471], Loss: 3.0165, Perplexity: 20.4205 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [483/6471], Loss: 2.7212, Perplexity: 15.1981 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [484/6471], Loss: 2.5517, Perplexity: 12.8292 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [485/6471], Loss: 3.0839, Perplexity: 21.8424 types **##    torch.cuda.FloatTensor

Epoch [2/3], Step [518/6471], Loss: 2.9273, Perplexity: 18.6765 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [519/6471], Loss: 2.7631, Perplexity: 15.8492 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [520/6471], Loss: 3.0175, Perplexity: 20.4392 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [521/6471], Loss: 2.8647, Perplexity: 17.5446 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [522/6471], Loss: 2.7511, Perplexity: 15.6597 types **##    torch.cuda.FloatTensor

Epoch [2/3], Step [555/6471], Loss: 2.6801, Perplexity: 14.5867 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [556/6471], Loss: 2.5960, Perplexity: 13.4096 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [557/6471], Loss: 2.9193, Perplexity: 18.5279 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 18, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [558/6471], Loss: 3.1364, Perplexity: 23.0216 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [559/6471], Loss: 2.6725, Perplexity: 14.4758 types **##    torch.cuda.FloatTensor

Epoch [2/3], Step [592/6471], Loss: 3.5664, Perplexity: 35.3896 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [593/6471], Loss: 2.7763, Perplexity: 16.0589 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [594/6471], Loss: 3.0608, Perplexity: 21.3445 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [595/6471], Loss: 2.8834, Perplexity: 17.8747 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [596/6471], Loss: 2.8777, Perplexity: 17.7732 types **##    torch.cuda.FloatTensor

Epoch [2/3], Step [629/6471], Loss: 2.7453, Perplexity: 15.5697 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [630/6471], Loss: 3.0212, Perplexity: 20.5165 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [631/6471], Loss: 2.8807, Perplexity: 17.8273 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [632/6471], Loss: 2.7527, Perplexity: 15.6846 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [633/6471], Loss: 2.7211, Perplexity: 15.1970 types **##    torch.cuda.FloatTensor

Epoch [2/3], Step [666/6471], Loss: 2.6701, Perplexity: 14.4420 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [667/6471], Loss: 3.0108, Perplexity: 20.3042 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [668/6471], Loss: 2.9919, Perplexity: 19.9232 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [669/6471], Loss: 2.6515, Perplexity: 14.1758 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [670/6471], Loss: 2.7921, Perplexity: 16.3158 types **##    torch.cuda.FloatTensor

Epoch [2/3], Step [703/6471], Loss: 2.7369, Perplexity: 15.4395 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [704/6471], Loss: 2.5870, Perplexity: 13.2892 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [705/6471], Loss: 2.8187, Perplexity: 16.7545 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [706/6471], Loss: 2.6518, Perplexity: 14.1791 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [707/6471], Loss: 3.1782, Perplexity: 24.0033 types **##    torch.cuda.FloatTensor

Epoch [2/3], Step [740/6471], Loss: 2.7823, Perplexity: 16.1564 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [741/6471], Loss: 3.0600, Perplexity: 21.3270 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [742/6471], Loss: 2.8802, Perplexity: 17.8183 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 27, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [743/6471], Loss: 3.6094, Perplexity: 36.9447 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [744/6471], Loss: 2.7557, Perplexity: 15.7319 types **##    torch.cuda.FloatTensor

Epoch [2/3], Step [777/6471], Loss: 2.7753, Perplexity: 16.0427 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [778/6471], Loss: 2.6730, Perplexity: 14.4831 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [779/6471], Loss: 2.7434, Perplexity: 15.5393 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [780/6471], Loss: 2.8168, Perplexity: 16.7228 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [781/6471], Loss: 2.6877, Perplexity: 14.6974 types **##    torch.cuda.FloatTensor

Epoch [2/3], Step [814/6471], Loss: 2.6077, Perplexity: 13.5677 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [815/6471], Loss: 2.4588, Perplexity: 11.6905 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [816/6471], Loss: 2.6128, Perplexity: 13.6368 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [817/6471], Loss: 2.6960, Perplexity: 14.8206 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [818/6471], Loss: 3.0032, Perplexity: 20.1508 types **##    torch.cuda.FloatTensor

Epoch [2/3], Step [851/6471], Loss: 2.6093, Perplexity: 13.5896 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [852/6471], Loss: 2.7298, Perplexity: 15.3293 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [853/6471], Loss: 2.6523, Perplexity: 14.1873 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [854/6471], Loss: 2.7669, Perplexity: 15.9099 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [855/6471], Loss: 2.5626, Perplexity: 12.9694 types **##    torch.cuda.FloatTensor

Epoch [2/3], Step [888/6471], Loss: 2.8667, Perplexity: 17.5795 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [889/6471], Loss: 2.7118, Perplexity: 15.0562 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [890/6471], Loss: 2.4535, Perplexity: 11.6294 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [891/6471], Loss: 2.6263, Perplexity: 13.8228 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [892/6471], Loss: 2.6488, Perplexity: 14.1367 types **##    torch.cuda.FloatTensor

Epoch [2/3], Step [925/6471], Loss: 2.9097, Perplexity: 18.3520 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [926/6471], Loss: 2.6701, Perplexity: 14.4411 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [927/6471], Loss: 2.6598, Perplexity: 14.2940 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [928/6471], Loss: 2.7720, Perplexity: 15.9898 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [929/6471], Loss: 2.5751, Perplexity: 13.1331 types **##    torch.cuda.FloatTensor

Epoch [2/3], Step [962/6471], Loss: 3.0751, Perplexity: 21.6514 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [963/6471], Loss: 2.8519, Perplexity: 17.3205 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 20, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [964/6471], Loss: 3.5383, Perplexity: 34.4086 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [965/6471], Loss: 2.9742, Perplexity: 19.5748 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [966/6471], Loss: 2.6508, Perplexity: 14.1649 types **##    torch.cuda.FloatTensor

Epoch [2/3], Step [999/6471], Loss: 2.5380, Perplexity: 12.6540 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1000/6471], Loss: 2.5716, Perplexity: 13.0863
 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1001/6471], Loss: 2.8974, Perplexity: 18.1270 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1002/6471], Loss: 2.7441, Perplexity: 15.5513 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1003/6471], Loss: 2.6723, Perplexity: 14.4731 types **##    torch.cuda.FloatT

Epoch [2/3], Step [1071/6471], Loss: 2.7647, Perplexity: 15.8747 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1072/6471], Loss: 2.8950, Perplexity: 18.0840 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1073/6471], Loss: 2.9772, Perplexity: 19.6319 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1074/6471], Loss: 2.8365, Perplexity: 17.0557 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1075/6471], Loss: 2.6417, Perplexity: 14.0369 types **##    torch.cuda.FloatT

Epoch [2/3], Step [1143/6471], Loss: 2.5916, Perplexity: 13.3518 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1144/6471], Loss: 3.1458, Perplexity: 23.2393 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1145/6471], Loss: 2.6891, Perplexity: 14.7187 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1146/6471], Loss: 2.7991, Perplexity: 16.4293 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1147/6471], Loss: 2.9522, Perplexity: 19.1474 types **##    torch.cuda.FloatT

Epoch [2/3], Step [1215/6471], Loss: 2.7971, Perplexity: 16.3962 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1216/6471], Loss: 2.7013, Perplexity: 14.8985 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1217/6471], Loss: 2.6676, Perplexity: 14.4049 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1218/6471], Loss: 2.9912, Perplexity: 19.9105 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1219/6471], Loss: 3.0160, Perplexity: 20.4099 types **##    torch.cuda.FloatT

Epoch [2/3], Step [1287/6471], Loss: 2.7634, Perplexity: 15.8541 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1288/6471], Loss: 2.9866, Perplexity: 19.8184 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1289/6471], Loss: 2.7820, Perplexity: 16.1517 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1290/6471], Loss: 2.5793, Perplexity: 13.1877 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1291/6471], Loss: 2.8514, Perplexity: 17.3128 types **##    torch.cuda.FloatT

Epoch [2/3], Step [1359/6471], Loss: 2.6791, Perplexity: 14.5726 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1360/6471], Loss: 2.7166, Perplexity: 15.1291 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1361/6471], Loss: 2.9005, Perplexity: 18.1830 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1362/6471], Loss: 2.7572, Perplexity: 15.7552 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1363/6471], Loss: 2.5927, Perplexity: 13.3654 types **##    torch.cuda.FloatT

Epoch [2/3], Step [1431/6471], Loss: 2.6544, Perplexity: 14.2166 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1432/6471], Loss: 2.6591, Perplexity: 14.2837 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1433/6471], Loss: 2.9550, Perplexity: 19.2009 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1434/6471], Loss: 2.6802, Perplexity: 14.5877 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1435/6471], Loss: 2.6189, Perplexity: 13.7200 types **##    torch.cuda.FloatT

Epoch [2/3], Step [1503/6471], Loss: 2.6506, Perplexity: 14.1627 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1504/6471], Loss: 2.8493, Perplexity: 17.2750 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1505/6471], Loss: 2.7798, Perplexity: 16.1161 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1506/6471], Loss: 2.7327, Perplexity: 15.3740 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1507/6471], Loss: 2.8512, Perplexity: 17.3077 types **##    torch.cuda.FloatT

Epoch [2/3], Step [1575/6471], Loss: 2.6637, Perplexity: 14.3494 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1576/6471], Loss: 2.8963, Perplexity: 18.1069 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1577/6471], Loss: 2.6586, Perplexity: 14.2766 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 21, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1578/6471], Loss: 3.1844, Perplexity: 24.1529 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1579/6471], Loss: 2.7442, Perplexity: 15.5527 types **##    torch.cuda.FloatT

Epoch [2/3], Step [1647/6471], Loss: 2.6222, Perplexity: 13.7658 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1648/6471], Loss: 2.7738, Perplexity: 16.0201 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1649/6471], Loss: 2.6903, Perplexity: 14.7361 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1650/6471], Loss: 2.5510, Perplexity: 12.8194 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1651/6471], Loss: 2.8251, Perplexity: 16.8633 types **##    torch.cuda.FloatT

Epoch [2/3], Step [1719/6471], Loss: 2.8839, Perplexity: 17.8847 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1720/6471], Loss: 2.7377, Perplexity: 15.4518 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1721/6471], Loss: 2.6876, Perplexity: 14.6964 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1722/6471], Loss: 2.6849, Perplexity: 14.6568 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1723/6471], Loss: 3.0008, Perplexity: 20.1014 types **##    torch.cuda.FloatT

Epoch [2/3], Step [1791/6471], Loss: 2.6851, Perplexity: 14.6602 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1792/6471], Loss: 2.8012, Perplexity: 16.4652 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1793/6471], Loss: 2.5727, Perplexity: 13.1018 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1794/6471], Loss: 2.7277, Perplexity: 15.2979 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1795/6471], Loss: 2.7117, Perplexity: 15.0548 types **##    torch.cuda.FloatT

Epoch [2/3], Step [1863/6471], Loss: 2.8814, Perplexity: 17.8394 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1864/6471], Loss: 2.5306, Perplexity: 12.5605 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1865/6471], Loss: 2.7097, Perplexity: 15.0242 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1866/6471], Loss: 2.8684, Perplexity: 17.6094 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1867/6471], Loss: 2.8079, Perplexity: 16.5743 types **##    torch.cuda.FloatT

Epoch [2/3], Step [1935/6471], Loss: 2.7757, Perplexity: 16.0493 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 19, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1936/6471], Loss: 3.2023, Perplexity: 24.5902 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1937/6471], Loss: 2.7533, Perplexity: 15.6947 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1938/6471], Loss: 2.5027, Perplexity: 12.2160 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [1939/6471], Loss: 2.8285, Perplexity: 16.9202 types **##    torch.cuda.FloatT

Epoch [2/3], Step [2007/6471], Loss: 2.6994, Perplexity: 14.8713 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2008/6471], Loss: 2.8134, Perplexity: 16.6667 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2009/6471], Loss: 2.6096, Perplexity: 13.5930 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2010/6471], Loss: 2.6427, Perplexity: 14.0514 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2011/6471], Loss: 2.9367, Perplexity: 18.8534 types **##    torch.cuda.FloatT

Epoch [2/3], Step [2079/6471], Loss: 2.7072, Perplexity: 14.9875 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2080/6471], Loss: 3.0462, Perplexity: 21.0342 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2081/6471], Loss: 3.0221, Perplexity: 20.5350 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2082/6471], Loss: 2.6439, Perplexity: 14.0684 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2083/6471], Loss: 2.4580, Perplexity: 11.6813 types **##    torch.cuda.FloatT

Epoch [2/3], Step [2151/6471], Loss: 2.6483, Perplexity: 14.1296 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2152/6471], Loss: 2.6341, Perplexity: 13.9302 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2153/6471], Loss: 2.6601, Perplexity: 14.2981 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 18, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2154/6471], Loss: 3.1788, Perplexity: 24.0183 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2155/6471], Loss: 2.9494, Perplexity: 19.0948 types **##    torch.cuda.FloatT

Epoch [2/3], Step [2223/6471], Loss: 2.5065, Perplexity: 12.2624 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2224/6471], Loss: 2.4928, Perplexity: 12.0957 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2225/6471], Loss: 2.4830, Perplexity: 11.9766 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2226/6471], Loss: 2.5264, Perplexity: 12.5084 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2227/6471], Loss: 2.9132, Perplexity: 18.4160 types **##    torch.cuda.FloatT

Epoch [2/3], Step [2295/6471], Loss: 3.0928, Perplexity: 22.0395 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2296/6471], Loss: 2.6738, Perplexity: 14.4944 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2297/6471], Loss: 2.9152, Perplexity: 18.4519 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2298/6471], Loss: 2.7830, Perplexity: 16.1669 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2299/6471], Loss: 2.7577, Perplexity: 15.7634 types **##    torch.cuda.FloatT

Epoch [2/3], Step [2367/6471], Loss: 2.7642, Perplexity: 15.8660 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2368/6471], Loss: 2.7646, Perplexity: 15.8729 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2369/6471], Loss: 2.6831, Perplexity: 14.6310 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2370/6471], Loss: 2.6565, Perplexity: 14.2462 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2371/6471], Loss: 2.7222, Perplexity: 15.2142 types **##    torch.cuda.FloatT

Epoch [2/3], Step [2439/6471], Loss: 2.5751, Perplexity: 13.1331 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2440/6471], Loss: 2.6909, Perplexity: 14.7445 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2441/6471], Loss: 2.6017, Perplexity: 13.4870 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2442/6471], Loss: 2.7527, Perplexity: 15.6842 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2443/6471], Loss: 2.6845, Perplexity: 14.6512 types **##    torch.cuda.FloatT

Epoch [2/3], Step [2511/6471], Loss: 2.8964, Perplexity: 18.1096 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2512/6471], Loss: 2.6622, Perplexity: 14.3273 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2513/6471], Loss: 2.9676, Perplexity: 19.4448 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2514/6471], Loss: 2.5019, Perplexity: 12.2061 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2515/6471], Loss: 2.5334, Perplexity: 12.5962 types **##    torch.cuda.FloatT

Epoch [2/3], Step [2583/6471], Loss: 2.5177, Perplexity: 12.4000 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2584/6471], Loss: 2.7661, Perplexity: 15.8970 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2585/6471], Loss: 2.8812, Perplexity: 17.8358 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2586/6471], Loss: 2.9255, Perplexity: 18.6444 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2587/6471], Loss: 2.4817, Perplexity: 11.9617 types **##    torch.cuda.FloatT

Epoch [2/3], Step [2655/6471], Loss: 2.7112, Perplexity: 15.0466 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2656/6471], Loss: 2.6996, Perplexity: 14.8734 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2657/6471], Loss: 2.6429, Perplexity: 14.0544 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2658/6471], Loss: 2.5952, Perplexity: 13.3986 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2659/6471], Loss: 2.8803, Perplexity: 17.8188 types **##    torch.cuda.FloatT

Epoch [2/3], Step [2727/6471], Loss: 2.8637, Perplexity: 17.5265 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2728/6471], Loss: 2.8805, Perplexity: 17.8241 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2729/6471], Loss: 2.8779, Perplexity: 17.7770 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2730/6471], Loss: 2.6773, Perplexity: 14.5465 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2731/6471], Loss: 2.6487, Perplexity: 14.1356 types **##    torch.cuda.FloatT

Epoch [2/3], Step [2799/6471], Loss: 2.8772, Perplexity: 17.7651 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2800/6471], Loss: 2.5782, Perplexity: 13.1738
 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2801/6471], Loss: 2.4705, Perplexity: 11.8287 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 9, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2802/6471], Loss: 3.1281, Perplexity: 22.8309 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2803/6471], Loss: 2.6038, Perplexity: 13.5145 types **##    torch.cuda.FloatT

Epoch [2/3], Step [2871/6471], Loss: 2.7282, Perplexity: 15.3056 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2872/6471], Loss: 2.6253, Perplexity: 13.8080 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2873/6471], Loss: 2.4561, Perplexity: 11.6598 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2874/6471], Loss: 2.6950, Perplexity: 14.8054 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2875/6471], Loss: 2.6399, Perplexity: 14.0122 types **##    torch.cuda.FloatT

Epoch [2/3], Step [2943/6471], Loss: 2.8586, Perplexity: 17.4366 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2944/6471], Loss: 2.7773, Perplexity: 16.0755 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2945/6471], Loss: 2.6503, Perplexity: 14.1587 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2946/6471], Loss: 2.6694, Perplexity: 14.4308 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [2947/6471], Loss: 2.4892, Perplexity: 12.0512 types **##    torch.cuda.FloatT

Epoch [2/3], Step [3015/6471], Loss: 2.7362, Perplexity: 15.4284 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3016/6471], Loss: 2.5689, Perplexity: 13.0512 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3017/6471], Loss: 2.6525, Perplexity: 14.1895 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3018/6471], Loss: 2.4608, Perplexity: 11.7139 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3019/6471], Loss: 2.6615, Perplexity: 14.3176 types **##    torch.cuda.FloatT

Epoch [2/3], Step [3087/6471], Loss: 2.6570, Perplexity: 14.2534 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3088/6471], Loss: 2.8723, Perplexity: 17.6776 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3089/6471], Loss: 2.6775, Perplexity: 14.5491 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3090/6471], Loss: 2.5720, Perplexity: 13.0924 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3091/6471], Loss: 2.5829, Perplexity: 13.2353 types **##    torch.cuda.FloatT

Epoch [2/3], Step [3159/6471], Loss: 2.6101, Perplexity: 13.6008 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3160/6471], Loss: 2.3622, Perplexity: 10.6147 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3161/6471], Loss: 2.5692, Perplexity: 13.0552 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3162/6471], Loss: 2.6539, Perplexity: 14.2097 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3163/6471], Loss: 2.6538, Perplexity: 14.2074 types **##    torch.cuda.FloatT

Epoch [2/3], Step [3231/6471], Loss: 2.4365, Perplexity: 11.4324 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3232/6471], Loss: 2.7426, Perplexity: 15.5276 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3233/6471], Loss: 2.6322, Perplexity: 13.9046 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3234/6471], Loss: 2.9689, Perplexity: 19.4709 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 18, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3235/6471], Loss: 2.8641, Perplexity: 17.5326 types **##    torch.cuda.FloatT

Epoch [2/3], Step [3303/6471], Loss: 3.0935, Perplexity: 22.0535 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3304/6471], Loss: 2.7025, Perplexity: 14.9168 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3305/6471], Loss: 2.4981, Perplexity: 12.1592 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3306/6471], Loss: 2.4860, Perplexity: 12.0131 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3307/6471], Loss: 2.4426, Perplexity: 11.5032 types **##    torch.cuda.FloatT

Epoch [2/3], Step [3375/6471], Loss: 2.7781, Perplexity: 16.0885 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3376/6471], Loss: 2.5284, Perplexity: 12.5329 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3377/6471], Loss: 2.7627, Perplexity: 15.8423 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3378/6471], Loss: 2.5715, Perplexity: 13.0848 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3379/6471], Loss: 2.5589, Perplexity: 12.9214 types **##    torch.cuda.FloatT

Epoch [2/3], Step [3447/6471], Loss: 2.5332, Perplexity: 12.5938 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3448/6471], Loss: 2.7440, Perplexity: 15.5489 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3449/6471], Loss: 2.8002, Perplexity: 16.4471 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3450/6471], Loss: 2.6179, Perplexity: 13.7069 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3451/6471], Loss: 2.6403, Perplexity: 14.0172 types **##    torch.cuda.FloatT

Epoch [2/3], Step [3519/6471], Loss: 2.6842, Perplexity: 14.6458 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3520/6471], Loss: 2.5914, Perplexity: 13.3487 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3521/6471], Loss: 2.7102, Perplexity: 15.0322 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 18, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3522/6471], Loss: 3.0075, Perplexity: 20.2376 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3523/6471], Loss: 2.8311, Perplexity: 16.9643 types **##    torch.cuda.FloatT

Epoch [2/3], Step [3591/6471], Loss: 2.5502, Perplexity: 12.8100 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3592/6471], Loss: 2.9807, Perplexity: 19.7024 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3593/6471], Loss: 2.6128, Perplexity: 13.6377 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3594/6471], Loss: 2.5880, Perplexity: 13.3038 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3595/6471], Loss: 2.7526, Perplexity: 15.6835 types **##    torch.cuda.FloatT

Epoch [2/3], Step [3663/6471], Loss: 2.5427, Perplexity: 12.7141 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3664/6471], Loss: 2.6111, Perplexity: 13.6136 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3665/6471], Loss: 2.2935, Perplexity: 9.9099 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3666/6471], Loss: 2.6903, Perplexity: 14.7360 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3667/6471], Loss: 2.5345, Perplexity: 12.6100 types **##    torch.cuda.FloatTe

Epoch [2/3], Step [3735/6471], Loss: 2.4726, Perplexity: 11.8532 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3736/6471], Loss: 2.6630, Perplexity: 14.3397 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3737/6471], Loss: 2.6771, Perplexity: 14.5433 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3738/6471], Loss: 2.7851, Perplexity: 16.2018 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3739/6471], Loss: 2.5472, Perplexity: 12.7712 types **##    torch.cuda.FloatT

Epoch [2/3], Step [3807/6471], Loss: 2.6599, Perplexity: 14.2954 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3808/6471], Loss: 2.6600, Perplexity: 14.2964 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3809/6471], Loss: 2.5468, Perplexity: 12.7657 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3810/6471], Loss: 2.6198, Perplexity: 13.7326 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3811/6471], Loss: 2.7007, Perplexity: 14.8901 types **##    torch.cuda.FloatT

Epoch [2/3], Step [3879/6471], Loss: 2.8310, Perplexity: 16.9629 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 23, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3880/6471], Loss: 3.4476, Perplexity: 31.4242 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3881/6471], Loss: 2.7604, Perplexity: 15.8063 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3882/6471], Loss: 2.7139, Perplexity: 15.0875 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3883/6471], Loss: 2.4892, Perplexity: 12.0519 types **##    torch.cuda.FloatT

Epoch [2/3], Step [3951/6471], Loss: 2.5131, Perplexity: 12.3436 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3952/6471], Loss: 2.6449, Perplexity: 14.0818 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3953/6471], Loss: 2.7727, Perplexity: 16.0024 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3954/6471], Loss: 2.6717, Perplexity: 14.4641 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [3955/6471], Loss: 2.6142, Perplexity: 13.6565 types **##    torch.cuda.FloatT

Epoch [2/3], Step [4023/6471], Loss: 2.5683, Perplexity: 13.0441 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4024/6471], Loss: 2.5706, Perplexity: 13.0737 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4025/6471], Loss: 2.6316, Perplexity: 13.8955 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4026/6471], Loss: 2.4709, Perplexity: 11.8329 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4027/6471], Loss: 2.6892, Perplexity: 14.7194 types **##    torch.cuda.FloatT

Epoch [2/3], Step [4095/6471], Loss: 2.7438, Perplexity: 15.5464 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4096/6471], Loss: 2.8360, Perplexity: 17.0479 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4097/6471], Loss: 2.6862, Perplexity: 14.6758 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4098/6471], Loss: 2.6775, Perplexity: 14.5482 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4099/6471], Loss: 2.6181, Perplexity: 13.7093 types **##    torch.cuda.FloatT

Epoch [2/3], Step [4167/6471], Loss: 2.8699, Perplexity: 17.6360 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4168/6471], Loss: 2.6261, Perplexity: 13.8196 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4169/6471], Loss: 2.8347, Perplexity: 17.0258 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4170/6471], Loss: 2.4220, Perplexity: 11.2688 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4171/6471], Loss: 2.6235, Perplexity: 13.7838 types **##    torch.cuda.FloatT

Epoch [2/3], Step [4239/6471], Loss: 2.5663, Perplexity: 13.0176 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4240/6471], Loss: 2.6574, Perplexity: 14.2585 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4241/6471], Loss: 2.8561, Perplexity: 17.3942 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4242/6471], Loss: 2.6900, Perplexity: 14.7315 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4243/6471], Loss: 2.8060, Perplexity: 16.5441 types **##    torch.cuda.FloatT

Epoch [2/3], Step [4311/6471], Loss: 2.5811, Perplexity: 13.2117 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4312/6471], Loss: 2.5705, Perplexity: 13.0719 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4313/6471], Loss: 2.4718, Perplexity: 11.8438 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4314/6471], Loss: 2.5496, Perplexity: 12.8024 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4315/6471], Loss: 2.5690, Perplexity: 13.0528 types **##    torch.cuda.FloatT

Epoch [2/3], Step [4383/6471], Loss: 2.4612, Perplexity: 11.7186 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4384/6471], Loss: 2.6169, Perplexity: 13.6929 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4385/6471], Loss: 2.5771, Perplexity: 13.1583 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4386/6471], Loss: 2.9226, Perplexity: 18.5895 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4387/6471], Loss: 2.6055, Perplexity: 13.5382 types **##    torch.cuda.FloatT

Epoch [2/3], Step [4455/6471], Loss: 2.4395, Perplexity: 11.4672 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4456/6471], Loss: 2.7918, Perplexity: 16.3111 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4457/6471], Loss: 2.4581, Perplexity: 11.6825 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4458/6471], Loss: 2.4569, Perplexity: 11.6681 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4459/6471], Loss: 2.5987, Perplexity: 13.4469 types **##    torch.cuda.FloatT

Epoch [2/3], Step [4527/6471], Loss: 3.0388, Perplexity: 20.8811 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4528/6471], Loss: 2.6974, Perplexity: 14.8418 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4529/6471], Loss: 2.7478, Perplexity: 15.6084 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4530/6471], Loss: 2.5073, Perplexity: 12.2715 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4531/6471], Loss: 2.7034, Perplexity: 14.9297 types **##    torch.cuda.FloatT

Epoch [2/3], Step [4599/6471], Loss: 2.7440, Perplexity: 15.5484 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4600/6471], Loss: 2.9183, Perplexity: 18.5091
 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4601/6471], Loss: 2.6430, Perplexity: 14.0549 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4602/6471], Loss: 2.8051, Perplexity: 16.5285 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4603/6471], Loss: 3.0861, Perplexity: 21.8905 types **##    torch.cuda.Float

Epoch [2/3], Step [4671/6471], Loss: 2.7354, Perplexity: 15.4158 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4672/6471], Loss: 2.4365, Perplexity: 11.4328 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 20, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4673/6471], Loss: 3.0517, Perplexity: 21.1518 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4674/6471], Loss: 2.5861, Perplexity: 13.2780 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4675/6471], Loss: 2.5612, Perplexity: 12.9508 types **##    torch.cuda.FloatT

Epoch [2/3], Step [4743/6471], Loss: 2.5629, Perplexity: 12.9737 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4744/6471], Loss: 2.3280, Perplexity: 10.2571 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4745/6471], Loss: 2.7943, Perplexity: 16.3511 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4746/6471], Loss: 2.5019, Perplexity: 12.2062 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4747/6471], Loss: 2.7212, Perplexity: 15.1986 types **##    torch.cuda.FloatT

Epoch [2/3], Step [4815/6471], Loss: 2.5131, Perplexity: 12.3432 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4816/6471], Loss: 2.5561, Perplexity: 12.8858 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4817/6471], Loss: 2.6462, Perplexity: 14.0997 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4818/6471], Loss: 2.6977, Perplexity: 14.8462 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4819/6471], Loss: 2.8838, Perplexity: 17.8823 types **##    torch.cuda.FloatT

Epoch [2/3], Step [4887/6471], Loss: 2.5460, Perplexity: 12.7555 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4888/6471], Loss: 2.6074, Perplexity: 13.5643 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4889/6471], Loss: 2.5053, Perplexity: 12.2477 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4890/6471], Loss: 2.5698, Perplexity: 13.0629 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4891/6471], Loss: 2.8972, Perplexity: 18.1226 types **##    torch.cuda.FloatT

Epoch [2/3], Step [4959/6471], Loss: 2.7208, Perplexity: 15.1918 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4960/6471], Loss: 2.7716, Perplexity: 15.9847 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4961/6471], Loss: 2.5963, Perplexity: 13.4145 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4962/6471], Loss: 2.9450, Perplexity: 19.0106 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [4963/6471], Loss: 2.6996, Perplexity: 14.8734 types **##    torch.cuda.FloatT

Epoch [2/3], Step [5031/6471], Loss: 2.7072, Perplexity: 14.9880 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5032/6471], Loss: 2.7529, Perplexity: 15.6873 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5033/6471], Loss: 2.6155, Perplexity: 13.6742 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5034/6471], Loss: 2.4104, Perplexity: 11.1382 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5035/6471], Loss: 2.4589, Perplexity: 11.6923 types **##    torch.cuda.FloatT

Epoch [2/3], Step [5103/6471], Loss: 2.4382, Perplexity: 11.4528 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5104/6471], Loss: 2.7032, Perplexity: 14.9279 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5105/6471], Loss: 2.7728, Perplexity: 16.0036 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5106/6471], Loss: 2.6233, Perplexity: 13.7813 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5107/6471], Loss: 2.8928, Perplexity: 18.0441 types **##    torch.cuda.FloatT

Epoch [2/3], Step [5175/6471], Loss: 2.7303, Perplexity: 15.3369 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 18, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5176/6471], Loss: 2.8575, Perplexity: 17.4174 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5177/6471], Loss: 2.3973, Perplexity: 10.9938 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5178/6471], Loss: 2.6078, Perplexity: 13.5685 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5179/6471], Loss: 2.4486, Perplexity: 11.5721 types **##    torch.cuda.FloatT

Epoch [2/3], Step [5247/6471], Loss: 2.4761, Perplexity: 11.8953 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5248/6471], Loss: 2.5652, Perplexity: 13.0034 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5249/6471], Loss: 2.6831, Perplexity: 14.6303 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5250/6471], Loss: 2.5618, Perplexity: 12.9589 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5251/6471], Loss: 2.5213, Perplexity: 12.4450 types **##    torch.cuda.FloatT

Epoch [2/3], Step [5319/6471], Loss: 2.6511, Perplexity: 14.1693 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5320/6471], Loss: 2.5320, Perplexity: 12.5785 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 18, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5321/6471], Loss: 3.0205, Perplexity: 20.5016 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5322/6471], Loss: 2.5958, Perplexity: 13.4068 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5323/6471], Loss: 2.5579, Perplexity: 12.9084 types **##    torch.cuda.FloatT

Epoch [2/3], Step [5391/6471], Loss: 2.3724, Perplexity: 10.7236 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5392/6471], Loss: 2.7273, Perplexity: 15.2923 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 20, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5393/6471], Loss: 3.2247, Perplexity: 25.1453 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5394/6471], Loss: 2.4154, Perplexity: 11.1940 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5395/6471], Loss: 2.3515, Perplexity: 10.5017 types **##    torch.cuda.FloatT

Epoch [2/3], Step [5463/6471], Loss: 3.1674, Perplexity: 23.7446 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 18, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5464/6471], Loss: 2.8368, Perplexity: 17.0603 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5465/6471], Loss: 2.4867, Perplexity: 12.0219 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5466/6471], Loss: 3.0788, Perplexity: 21.7313 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5467/6471], Loss: 2.4964, Perplexity: 12.1383 types **##    torch.cuda.FloatT

Epoch [2/3], Step [5535/6471], Loss: 2.7085, Perplexity: 15.0071 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5536/6471], Loss: 2.8304, Perplexity: 16.9530 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5537/6471], Loss: 2.6164, Perplexity: 13.6868 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5538/6471], Loss: 2.7171, Perplexity: 15.1360 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5539/6471], Loss: 2.6891, Perplexity: 14.7191 types **##    torch.cuda.FloatT

Epoch [2/3], Step [5607/6471], Loss: 2.8688, Perplexity: 17.6156 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5608/6471], Loss: 2.5530, Perplexity: 12.8462 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5609/6471], Loss: 2.6320, Perplexity: 13.9009 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5610/6471], Loss: 2.5436, Perplexity: 12.7258 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5611/6471], Loss: 2.7929, Perplexity: 16.3277 types **##    torch.cuda.FloatT

Epoch [2/3], Step [5679/6471], Loss: 2.5436, Perplexity: 12.7254 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5680/6471], Loss: 2.8391, Perplexity: 17.1006 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5681/6471], Loss: 2.5725, Perplexity: 13.0986 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5682/6471], Loss: 2.5418, Perplexity: 12.7027 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5683/6471], Loss: 2.6884, Perplexity: 14.7084 types **##    torch.cuda.FloatT

Epoch [2/3], Step [5751/6471], Loss: 2.4507, Perplexity: 11.5969 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5752/6471], Loss: 2.8054, Perplexity: 16.5344 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5753/6471], Loss: 2.4695, Perplexity: 11.8169 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5754/6471], Loss: 2.6425, Perplexity: 14.0484 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 21, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5755/6471], Loss: 3.3013, Perplexity: 27.1470 types **##    torch.cuda.FloatT

Epoch [2/3], Step [5823/6471], Loss: 2.6325, Perplexity: 13.9090 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5824/6471], Loss: 2.7642, Perplexity: 15.8666 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5825/6471], Loss: 2.6053, Perplexity: 13.5355 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5826/6471], Loss: 2.5134, Perplexity: 12.3464 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5827/6471], Loss: 2.5191, Perplexity: 12.4174 types **##    torch.cuda.FloatT

Epoch [2/3], Step [5895/6471], Loss: 3.9917, Perplexity: 54.1444 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5896/6471], Loss: 2.8074, Perplexity: 16.5674 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5897/6471], Loss: 2.5992, Perplexity: 13.4531 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5898/6471], Loss: 2.7067, Perplexity: 14.9796 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5899/6471], Loss: 2.7845, Perplexity: 16.1917 types **##    torch.cuda.FloatT

Epoch [2/3], Step [5967/6471], Loss: 2.7170, Perplexity: 15.1347 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5968/6471], Loss: 2.2625, Perplexity: 9.6069 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5969/6471], Loss: 2.6721, Perplexity: 14.4707 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5970/6471], Loss: 2.3805, Perplexity: 10.8108 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [5971/6471], Loss: 2.6134, Perplexity: 13.6449 types **##    torch.cuda.FloatTe

Epoch [2/3], Step [6039/6471], Loss: 2.5240, Perplexity: 12.4784 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [6040/6471], Loss: 2.4746, Perplexity: 11.8768 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 28, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [6041/6471], Loss: 3.8397, Perplexity: 46.5133 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [6042/6471], Loss: 2.4423, Perplexity: 11.4993 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [6043/6471], Loss: 2.6820, Perplexity: 14.6143 types **##    torch.cuda.FloatT

Epoch [2/3], Step [6111/6471], Loss: 2.4403, Perplexity: 11.4768 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [6112/6471], Loss: 2.5230, Perplexity: 12.4662 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [6113/6471], Loss: 2.5564, Perplexity: 12.8896 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [6114/6471], Loss: 2.6609, Perplexity: 14.3091 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [6115/6471], Loss: 2.6264, Perplexity: 13.8236 types **##    torch.cuda.FloatT

Epoch [2/3], Step [6183/6471], Loss: 2.7549, Perplexity: 15.7197 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [6184/6471], Loss: 2.6291, Perplexity: 13.8612 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [6185/6471], Loss: 2.6875, Perplexity: 14.6951 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [6186/6471], Loss: 2.3250, Perplexity: 10.2262 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 23, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [6187/6471], Loss: 3.4773, Perplexity: 32.3716 types **##    torch.cuda.FloatT

Epoch [2/3], Step [6255/6471], Loss: 2.6616, Perplexity: 14.3192 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [6256/6471], Loss: 2.6201, Perplexity: 13.7375 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [6257/6471], Loss: 2.6885, Perplexity: 14.7089 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [6258/6471], Loss: 2.5332, Perplexity: 12.5934 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [6259/6471], Loss: 2.9139, Perplexity: 18.4283 types **##    torch.cuda.FloatT

Epoch [2/3], Step [6327/6471], Loss: 2.7980, Perplexity: 16.4112 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [6328/6471], Loss: 2.5208, Perplexity: 12.4387 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [6329/6471], Loss: 2.7607, Perplexity: 15.8114 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [6330/6471], Loss: 2.4949, Perplexity: 12.1201 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 19, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [6331/6471], Loss: 3.1459, Perplexity: 23.2395 types **##    torch.cuda.FloatT

Epoch [2/3], Step [6399/6471], Loss: 2.4731, Perplexity: 11.8587 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [6400/6471], Loss: 2.4699, Perplexity: 11.8208
 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [6401/6471], Loss: 2.8104, Perplexity: 16.6157 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [6402/6471], Loss: 2.6887, Perplexity: 14.7132 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [2/3], Step [6403/6471], Loss: 2.8339, Perplexity: 17.0110 types **##    torch.cuda.Float

Epoch [2/3], Step [6471/6471], Loss: 2.8242, Perplexity: 16.8473 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1/6471], Loss: 2.3906, Perplexity: 10.9201 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2/6471], Loss: 2.7662, Perplexity: 15.8977 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3/6471], Loss: 2.4854, Perplexity: 12.0061 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4/6471], Loss: 2.5021, Perplexity: 12.2078 types **##    torch.cuda.FloatTensor torch.

Epoch [3/3], Step [37/6471], Loss: 2.6583, Perplexity: 14.2717 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [38/6471], Loss: 2.5535, Perplexity: 12.8514 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [39/6471], Loss: 2.5258, Perplexity: 12.5004 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [40/6471], Loss: 2.7803, Perplexity: 16.1236 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [41/6471], Loss: 2.4623, Perplexity: 11.7313 types **##    torch.cuda.FloatTensor torc

Epoch [3/3], Step [74/6471], Loss: 2.4139, Perplexity: 11.1776 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [75/6471], Loss: 2.3257, Perplexity: 10.2339 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [76/6471], Loss: 2.8370, Perplexity: 17.0643 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [77/6471], Loss: 2.4915, Perplexity: 12.0793 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 19, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [78/6471], Loss: 3.1834, Perplexity: 24.1296 types **##    torch.cuda.FloatTensor torc

Epoch [3/3], Step [111/6471], Loss: 2.4726, Perplexity: 11.8529 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [112/6471], Loss: 2.7260, Perplexity: 15.2716 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 21, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [113/6471], Loss: 3.3188, Perplexity: 27.6286 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [114/6471], Loss: 2.7742, Perplexity: 16.0260 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 18, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [115/6471], Loss: 2.9728, Perplexity: 19.5474 types **##    torch.cuda.FloatTensor

Epoch [3/3], Step [148/6471], Loss: 2.4765, Perplexity: 11.8993 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [149/6471], Loss: 2.6101, Perplexity: 13.6002 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [150/6471], Loss: 2.6229, Perplexity: 13.7757 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [151/6471], Loss: 2.3829, Perplexity: 10.8359 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [152/6471], Loss: 2.7017, Perplexity: 14.9057 types **##    torch.cuda.FloatTensor

Epoch [3/3], Step [185/6471], Loss: 2.4789, Perplexity: 11.9287 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [186/6471], Loss: 2.4962, Perplexity: 12.1368 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [187/6471], Loss: 2.5906, Perplexity: 13.3372 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [188/6471], Loss: 2.5934, Perplexity: 13.3749 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [189/6471], Loss: 2.5756, Perplexity: 13.1392 types **##    torch.cuda.FloatTensor

Epoch [3/3], Step [222/6471], Loss: 2.5652, Perplexity: 13.0028 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [223/6471], Loss: 2.1839, Perplexity: 8.8805 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [224/6471], Loss: 2.5063, Perplexity: 12.2598 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [225/6471], Loss: 2.4225, Perplexity: 11.2743 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [226/6471], Loss: 2.9906, Perplexity: 19.8980 types **##    torch.cuda.FloatTensor 

Epoch [3/3], Step [259/6471], Loss: 2.7842, Perplexity: 16.1871 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [260/6471], Loss: 2.5676, Perplexity: 13.0342 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [261/6471], Loss: 2.3639, Perplexity: 10.6322 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [262/6471], Loss: 2.5709, Perplexity: 13.0780 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [263/6471], Loss: 2.6515, Perplexity: 14.1759 types **##    torch.cuda.FloatTensor

Epoch [3/3], Step [296/6471], Loss: 2.4428, Perplexity: 11.5054 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [297/6471], Loss: 2.4797, Perplexity: 11.9382 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [298/6471], Loss: 2.5113, Perplexity: 12.3211 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [299/6471], Loss: 2.5142, Perplexity: 12.3567 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [300/6471], Loss: 2.7431, Perplexity: 15.5344
 types **##    torch.cuda.FloatTenso

Epoch [3/3], Step [333/6471], Loss: 2.5143, Perplexity: 12.3580 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 25, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [334/6471], Loss: 3.4229, Perplexity: 30.6570 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [335/6471], Loss: 2.6280, Perplexity: 13.8456 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [336/6471], Loss: 2.5260, Perplexity: 12.5037 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [337/6471], Loss: 2.6846, Perplexity: 14.6524 types **##    torch.cuda.FloatTensor

Epoch [3/3], Step [370/6471], Loss: 2.5446, Perplexity: 12.7378 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [371/6471], Loss: 2.6314, Perplexity: 13.8936 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [372/6471], Loss: 2.5945, Perplexity: 13.3905 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [373/6471], Loss: 2.6679, Perplexity: 14.4092 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [374/6471], Loss: 2.2806, Perplexity: 9.7824 types **##    torch.cuda.FloatTensor 

Epoch [3/3], Step [407/6471], Loss: 2.4263, Perplexity: 11.3171 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [408/6471], Loss: 2.4272, Perplexity: 11.3266 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [409/6471], Loss: 2.5805, Perplexity: 13.2031 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [410/6471], Loss: 2.4364, Perplexity: 11.4318 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [411/6471], Loss: 2.3467, Perplexity: 10.4508 types **##    torch.cuda.FloatTensor

Epoch [3/3], Step [444/6471], Loss: 2.7095, Perplexity: 15.0213 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [445/6471], Loss: 2.3774, Perplexity: 10.7764 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 18, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [446/6471], Loss: 2.9720, Perplexity: 19.5318 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 19, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [447/6471], Loss: 3.0253, Perplexity: 20.6002 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [448/6471], Loss: 2.6925, Perplexity: 14.7686 types **##    torch.cuda.FloatTensor

Epoch [3/3], Step [481/6471], Loss: 2.5910, Perplexity: 13.3436 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [482/6471], Loss: 2.4082, Perplexity: 11.1136 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [483/6471], Loss: 2.4836, Perplexity: 11.9838 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [484/6471], Loss: 2.5143, Perplexity: 12.3577 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [485/6471], Loss: 2.5751, Perplexity: 13.1329 types **##    torch.cuda.FloatTensor

Epoch [3/3], Step [518/6471], Loss: 2.4282, Perplexity: 11.3380 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 22, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [519/6471], Loss: 3.3338, Perplexity: 28.0459 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [520/6471], Loss: 2.3997, Perplexity: 11.0201 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [521/6471], Loss: 2.5810, Perplexity: 13.2106 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [522/6471], Loss: 2.4096, Perplexity: 11.1292 types **##    torch.cuda.FloatTensor

Epoch [3/3], Step [555/6471], Loss: 2.6004, Perplexity: 13.4693 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [556/6471], Loss: 2.7706, Perplexity: 15.9686 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [557/6471], Loss: 2.5705, Perplexity: 13.0722 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [558/6471], Loss: 2.6804, Perplexity: 14.5908 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [559/6471], Loss: 2.7825, Perplexity: 16.1588 types **##    torch.cuda.FloatTensor

Epoch [3/3], Step [592/6471], Loss: 2.4192, Perplexity: 11.2364 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [593/6471], Loss: 2.3028, Perplexity: 10.0021 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [594/6471], Loss: 2.5533, Perplexity: 12.8490 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [595/6471], Loss: 2.5051, Perplexity: 12.2449 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [596/6471], Loss: 2.8948, Perplexity: 18.0795 types **##    torch.cuda.FloatTensor

Epoch [3/3], Step [629/6471], Loss: 2.5080, Perplexity: 12.2798 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [630/6471], Loss: 2.7826, Perplexity: 16.1611 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [631/6471], Loss: 2.6634, Perplexity: 14.3456 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [632/6471], Loss: 2.7745, Perplexity: 16.0309 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [633/6471], Loss: 2.7252, Perplexity: 15.2598 types **##    torch.cuda.FloatTensor

Epoch [3/3], Step [666/6471], Loss: 2.6556, Perplexity: 14.2332 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [667/6471], Loss: 2.5359, Perplexity: 12.6274 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [668/6471], Loss: 2.5040, Perplexity: 12.2317 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [669/6471], Loss: 2.4355, Perplexity: 11.4219 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [670/6471], Loss: 2.6163, Perplexity: 13.6845 types **##    torch.cuda.FloatTensor

Epoch [3/3], Step [703/6471], Loss: 2.2833, Perplexity: 9.8089 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [704/6471], Loss: 2.5932, Perplexity: 13.3727 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [705/6471], Loss: 2.4585, Perplexity: 11.6870 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 9, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [706/6471], Loss: 2.7144, Perplexity: 15.0959 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [707/6471], Loss: 2.4612, Perplexity: 11.7188 types **##    torch.cuda.FloatTensor t

Epoch [3/3], Step [740/6471], Loss: 2.5597, Perplexity: 12.9317 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [741/6471], Loss: 2.8715, Perplexity: 17.6633 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [742/6471], Loss: 2.6806, Perplexity: 14.5943 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [743/6471], Loss: 2.5652, Perplexity: 13.0033 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [744/6471], Loss: 2.5536, Perplexity: 12.8527 types **##    torch.cuda.FloatTensor

Epoch [3/3], Step [777/6471], Loss: 2.6566, Perplexity: 14.2475 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [778/6471], Loss: 2.4551, Perplexity: 11.6471 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [779/6471], Loss: 2.5001, Perplexity: 12.1835 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [780/6471], Loss: 2.5274, Perplexity: 12.5207 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 18, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [781/6471], Loss: 2.8253, Perplexity: 16.8656 types **##    torch.cuda.FloatTensor

Epoch [3/3], Step [814/6471], Loss: 2.7708, Perplexity: 15.9720 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [815/6471], Loss: 2.3375, Perplexity: 10.3549 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [816/6471], Loss: 2.3895, Perplexity: 10.9077 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [817/6471], Loss: 2.5708, Perplexity: 13.0761 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 21, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [818/6471], Loss: 3.2316, Perplexity: 25.3198 types **##    torch.cuda.FloatTensor

Epoch [3/3], Step [851/6471], Loss: 2.6750, Perplexity: 14.5129 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [852/6471], Loss: 2.6370, Perplexity: 13.9705 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [853/6471], Loss: 2.7698, Perplexity: 15.9549 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [854/6471], Loss: 2.4512, Perplexity: 11.6024 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 21, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [855/6471], Loss: 3.3271, Perplexity: 27.8579 types **##    torch.cuda.FloatTensor

Epoch [3/3], Step [888/6471], Loss: 2.4002, Perplexity: 11.0252 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [889/6471], Loss: 2.7210, Perplexity: 15.1955 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [890/6471], Loss: 2.7064, Perplexity: 14.9758 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [891/6471], Loss: 2.4360, Perplexity: 11.4272 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [892/6471], Loss: 2.6420, Perplexity: 14.0415 types **##    torch.cuda.FloatTensor

Epoch [3/3], Step [925/6471], Loss: 2.3581, Perplexity: 10.5710 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [926/6471], Loss: 2.4403, Perplexity: 11.4759 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [927/6471], Loss: 2.5826, Perplexity: 13.2312 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [928/6471], Loss: 2.8398, Perplexity: 17.1127 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [929/6471], Loss: 2.5640, Perplexity: 12.9883 types **##    torch.cuda.FloatTensor

Epoch [3/3], Step [962/6471], Loss: 3.0918, Perplexity: 22.0158 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [963/6471], Loss: 2.4463, Perplexity: 11.5459 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [964/6471], Loss: 2.6678, Perplexity: 14.4078 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [965/6471], Loss: 2.5928, Perplexity: 13.3672 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [966/6471], Loss: 2.4937, Perplexity: 12.1064 types **##    torch.cuda.FloatTensor

Epoch [3/3], Step [999/6471], Loss: 2.5793, Perplexity: 13.1883 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1000/6471], Loss: 2.6450, Perplexity: 14.0836
 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1001/6471], Loss: 2.8242, Perplexity: 16.8482 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1002/6471], Loss: 2.7386, Perplexity: 15.4657 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1003/6471], Loss: 2.8595, Perplexity: 17.4525 types **##    torch.cuda.FloatT

Epoch [3/3], Step [1071/6471], Loss: 2.6881, Perplexity: 14.7033 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1072/6471], Loss: 2.7194, Perplexity: 15.1719 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1073/6471], Loss: 2.5887, Perplexity: 13.3120 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1074/6471], Loss: 2.5205, Perplexity: 12.4347 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1075/6471], Loss: 2.6535, Perplexity: 14.2036 types **##    torch.cuda.FloatT

Epoch [3/3], Step [1143/6471], Loss: 2.5744, Perplexity: 13.1236 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1144/6471], Loss: 2.5525, Perplexity: 12.8396 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1145/6471], Loss: 2.6571, Perplexity: 14.2548 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1146/6471], Loss: 2.6124, Perplexity: 13.6316 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1147/6471], Loss: 2.5526, Perplexity: 12.8408 types **##    torch.cuda.FloatT

Epoch [3/3], Step [1215/6471], Loss: 2.5751, Perplexity: 13.1326 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1216/6471], Loss: 2.9154, Perplexity: 18.4566 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1217/6471], Loss: 2.5281, Perplexity: 12.5299 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1218/6471], Loss: 2.4144, Perplexity: 11.1827 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1219/6471], Loss: 2.7948, Perplexity: 16.3597 types **##    torch.cuda.FloatT

Epoch [3/3], Step [1287/6471], Loss: 2.5900, Perplexity: 13.3296 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1288/6471], Loss: 2.6107, Perplexity: 13.6089 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1289/6471], Loss: 2.5280, Perplexity: 12.5284 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1290/6471], Loss: 2.4447, Perplexity: 11.5272 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1291/6471], Loss: 2.6565, Perplexity: 14.2468 types **##    torch.cuda.FloatT

Epoch [3/3], Step [1359/6471], Loss: 2.8541, Perplexity: 17.3589 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1360/6471], Loss: 2.5007, Perplexity: 12.1913 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1361/6471], Loss: 2.5170, Perplexity: 12.3909 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1362/6471], Loss: 2.5776, Perplexity: 13.1658 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1363/6471], Loss: 2.4528, Perplexity: 11.6207 types **##    torch.cuda.FloatT

Epoch [3/3], Step [1431/6471], Loss: 2.4710, Perplexity: 11.8346 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1432/6471], Loss: 2.3837, Perplexity: 10.8444 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1433/6471], Loss: 2.6152, Perplexity: 13.6696 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1434/6471], Loss: 2.7503, Perplexity: 15.6470 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1435/6471], Loss: 2.5112, Perplexity: 12.3197 types **##    torch.cuda.FloatT

Epoch [3/3], Step [1503/6471], Loss: 2.4605, Perplexity: 11.7109 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1504/6471], Loss: 2.4886, Perplexity: 12.0441 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1505/6471], Loss: 2.7085, Perplexity: 15.0069 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1506/6471], Loss: 2.6900, Perplexity: 14.7324 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1507/6471], Loss: 2.2453, Perplexity: 9.4433 types **##    torch.cuda.FloatTe

Epoch [3/3], Step [1575/6471], Loss: 2.6014, Perplexity: 13.4821 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1576/6471], Loss: 2.5293, Perplexity: 12.5452 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1577/6471], Loss: 2.4586, Perplexity: 11.6886 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1578/6471], Loss: 2.7182, Perplexity: 15.1530 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 21, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1579/6471], Loss: 3.1858, Perplexity: 24.1863 types **##    torch.cuda.FloatT

Epoch [3/3], Step [1647/6471], Loss: 2.3145, Perplexity: 10.1203 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1648/6471], Loss: 2.5514, Perplexity: 12.8248 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1649/6471], Loss: 2.2281, Perplexity: 9.2822 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1650/6471], Loss: 2.6268, Perplexity: 13.8292 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1651/6471], Loss: 3.0327, Perplexity: 20.7528 types **##    torch.cuda.FloatTe

Epoch [3/3], Step [1719/6471], Loss: 2.6604, Perplexity: 14.3027 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1720/6471], Loss: 2.5164, Perplexity: 12.3839 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1721/6471], Loss: 2.3272, Perplexity: 10.2489 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 19, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1722/6471], Loss: 3.0812, Perplexity: 21.7842 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1723/6471], Loss: 2.6353, Perplexity: 13.9473 types **##    torch.cuda.FloatT

Epoch [3/3], Step [1791/6471], Loss: 2.3911, Perplexity: 10.9255 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1792/6471], Loss: 2.5798, Perplexity: 13.1938 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1793/6471], Loss: 2.4877, Perplexity: 12.0338 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1794/6471], Loss: 2.5872, Perplexity: 13.2929 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1795/6471], Loss: 2.6013, Perplexity: 13.4815 types **##    torch.cuda.FloatT

Epoch [3/3], Step [1863/6471], Loss: 2.8799, Perplexity: 17.8124 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1864/6471], Loss: 2.6428, Perplexity: 14.0518 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1865/6471], Loss: 2.5811, Perplexity: 13.2117 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1866/6471], Loss: 2.6531, Perplexity: 14.1985 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1867/6471], Loss: 2.5790, Perplexity: 13.1841 types **##    torch.cuda.FloatT

Epoch [3/3], Step [1935/6471], Loss: 2.8681, Perplexity: 17.6036 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1936/6471], Loss: 2.5783, Perplexity: 13.1748 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1937/6471], Loss: 2.4179, Perplexity: 11.2227 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1938/6471], Loss: 2.4820, Perplexity: 11.9648 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [1939/6471], Loss: 2.7026, Perplexity: 14.9184 types **##    torch.cuda.FloatT

Epoch [3/3], Step [2007/6471], Loss: 3.1316, Perplexity: 22.9114 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2008/6471], Loss: 2.4692, Perplexity: 11.8127 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2009/6471], Loss: 2.8718, Perplexity: 17.6683 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2010/6471], Loss: 2.5344, Perplexity: 12.6085 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2011/6471], Loss: 2.5454, Perplexity: 12.7478 types **##    torch.cuda.FloatT

Epoch [3/3], Step [2079/6471], Loss: 2.4682, Perplexity: 11.8010 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2080/6471], Loss: 2.5468, Perplexity: 12.7663 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2081/6471], Loss: 2.4858, Perplexity: 12.0108 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2082/6471], Loss: 2.5066, Perplexity: 12.2632 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2083/6471], Loss: 2.5281, Perplexity: 12.5292 types **##    torch.cuda.FloatT

Epoch [3/3], Step [2151/6471], Loss: 2.7181, Perplexity: 15.1509 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 24, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2152/6471], Loss: 3.5583, Perplexity: 35.1039 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2153/6471], Loss: 2.4405, Perplexity: 11.4792 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2154/6471], Loss: 2.3408, Perplexity: 10.3891 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2155/6471], Loss: 2.6861, Perplexity: 14.6748 types **##    torch.cuda.FloatT

Epoch [3/3], Step [2223/6471], Loss: 2.3820, Perplexity: 10.8266 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2224/6471], Loss: 2.6453, Perplexity: 14.0875 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2225/6471], Loss: 2.6882, Perplexity: 14.7049 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2226/6471], Loss: 2.7905, Perplexity: 16.2885 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2227/6471], Loss: 2.4548, Perplexity: 11.6445 types **##    torch.cuda.FloatT

Epoch [3/3], Step [2295/6471], Loss: 2.5570, Perplexity: 12.8968 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2296/6471], Loss: 2.3781, Perplexity: 10.7839 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2297/6471], Loss: 2.5978, Perplexity: 13.4337 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2298/6471], Loss: 2.4921, Perplexity: 12.0871 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2299/6471], Loss: 2.6882, Perplexity: 14.7046 types **##    torch.cuda.FloatT

Epoch [3/3], Step [2367/6471], Loss: 2.5932, Perplexity: 13.3731 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2368/6471], Loss: 2.3914, Perplexity: 10.9284 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2369/6471], Loss: 2.4195, Perplexity: 11.2399 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2370/6471], Loss: 2.6346, Perplexity: 13.9371 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2371/6471], Loss: 2.4910, Perplexity: 12.0738 types **##    torch.cuda.FloatT

Epoch [3/3], Step [2439/6471], Loss: 2.4343, Perplexity: 11.4084 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2440/6471], Loss: 2.4918, Perplexity: 12.0827 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2441/6471], Loss: 2.3811, Perplexity: 10.8167 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2442/6471], Loss: 3.0107, Perplexity: 20.3008 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2443/6471], Loss: 2.3591, Perplexity: 10.5818 types **##    torch.cuda.FloatT

Epoch [3/3], Step [2511/6471], Loss: 2.3223, Perplexity: 10.1992 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2512/6471], Loss: 2.6723, Perplexity: 14.4737 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2513/6471], Loss: 2.6558, Perplexity: 14.2358 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2514/6471], Loss: 2.5900, Perplexity: 13.3299 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2515/6471], Loss: 2.5672, Perplexity: 13.0296 types **##    torch.cuda.FloatT

Epoch [3/3], Step [2583/6471], Loss: 3.0196, Perplexity: 20.4832 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2584/6471], Loss: 2.5716, Perplexity: 13.0867 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2585/6471], Loss: 2.6247, Perplexity: 13.8006 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2586/6471], Loss: 2.6187, Perplexity: 13.7184 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2587/6471], Loss: 2.3145, Perplexity: 10.1195 types **##    torch.cuda.FloatT

Epoch [3/3], Step [2655/6471], Loss: 2.3793, Perplexity: 10.7971 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2656/6471], Loss: 2.9756, Perplexity: 19.6015 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2657/6471], Loss: 2.5648, Perplexity: 12.9987 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2658/6471], Loss: 2.7522, Perplexity: 15.6778 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2659/6471], Loss: 2.5998, Perplexity: 13.4604 types **##    torch.cuda.FloatT

Epoch [3/3], Step [2727/6471], Loss: 2.5311, Perplexity: 12.5667 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2728/6471], Loss: 2.4495, Perplexity: 11.5827 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2729/6471], Loss: 2.5362, Perplexity: 12.6314 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2730/6471], Loss: 2.5193, Perplexity: 12.4200 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2731/6471], Loss: 2.3744, Perplexity: 10.7442 types **##    torch.cuda.FloatT

Epoch [3/3], Step [2799/6471], Loss: 2.3241, Perplexity: 10.2171 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2800/6471], Loss: 2.8605, Perplexity: 17.4705
 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2801/6471], Loss: 2.5136, Perplexity: 12.3497 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2802/6471], Loss: 2.4560, Perplexity: 11.6580 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2803/6471], Loss: 2.4189, Perplexity: 11.2331 types **##    torch.cuda.Float

Epoch [3/3], Step [2871/6471], Loss: 2.4849, Perplexity: 11.9998 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2872/6471], Loss: 2.3556, Perplexity: 10.5440 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 18, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2873/6471], Loss: 2.9128, Perplexity: 18.4087 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2874/6471], Loss: 2.4241, Perplexity: 11.2923 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2875/6471], Loss: 2.2603, Perplexity: 9.5856 types **##    torch.cuda.FloatTe

Epoch [3/3], Step [2943/6471], Loss: 2.3958, Perplexity: 10.9767 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2944/6471], Loss: 2.6316, Perplexity: 13.8957 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2945/6471], Loss: 2.5603, Perplexity: 12.9391 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2946/6471], Loss: 2.3853, Perplexity: 10.8624 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [2947/6471], Loss: 2.3802, Perplexity: 10.8071 types **##    torch.cuda.FloatT

Epoch [3/3], Step [3015/6471], Loss: 2.5080, Perplexity: 12.2803 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3016/6471], Loss: 2.4885, Perplexity: 12.0433 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3017/6471], Loss: 2.4754, Perplexity: 11.8859 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3018/6471], Loss: 2.4476, Perplexity: 11.5604 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3019/6471], Loss: 2.8627, Perplexity: 17.5094 types **##    torch.cuda.FloatT

Epoch [3/3], Step [3087/6471], Loss: 2.6357, Perplexity: 13.9528 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3088/6471], Loss: 2.3987, Perplexity: 11.0093 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3089/6471], Loss: 2.8422, Perplexity: 17.1538 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3090/6471], Loss: 2.7179, Perplexity: 15.1491 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3091/6471], Loss: 2.4334, Perplexity: 11.3980 types **##    torch.cuda.FloatT

Epoch [3/3], Step [3159/6471], Loss: 2.8116, Perplexity: 16.6359 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3160/6471], Loss: 2.5585, Perplexity: 12.9164 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 20, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3161/6471], Loss: 3.2155, Perplexity: 24.9164 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 20, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3162/6471], Loss: 3.0835, Perplexity: 21.8345 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3163/6471], Loss: 2.5913, Perplexity: 13.3475 types **##    torch.cuda.FloatT

Epoch [3/3], Step [3231/6471], Loss: 2.3071, Perplexity: 10.0449 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3232/6471], Loss: 2.6272, Perplexity: 13.8344 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3233/6471], Loss: 2.4345, Perplexity: 11.4098 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3234/6471], Loss: 2.6032, Perplexity: 13.5075 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3235/6471], Loss: 2.7432, Perplexity: 15.5373 types **##    torch.cuda.FloatT

Epoch [3/3], Step [3303/6471], Loss: 2.5618, Perplexity: 12.9587 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3304/6471], Loss: 2.7343, Perplexity: 15.3986 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3305/6471], Loss: 2.4814, Perplexity: 11.9584 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3306/6471], Loss: 2.4499, Perplexity: 11.5873 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3307/6471], Loss: 2.5181, Perplexity: 12.4053 types **##    torch.cuda.FloatT

Epoch [3/3], Step [3375/6471], Loss: 2.3478, Perplexity: 10.4625 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3376/6471], Loss: 2.6644, Perplexity: 14.3591 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3377/6471], Loss: 2.7144, Perplexity: 15.0950 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3378/6471], Loss: 2.4975, Perplexity: 12.1525 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3379/6471], Loss: 2.3715, Perplexity: 10.7137 types **##    torch.cuda.FloatT

Epoch [3/3], Step [3447/6471], Loss: 2.3467, Perplexity: 10.4513 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3448/6471], Loss: 2.7152, Perplexity: 15.1072 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3449/6471], Loss: 2.4402, Perplexity: 11.4759 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3450/6471], Loss: 2.5123, Perplexity: 12.3332 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3451/6471], Loss: 2.4485, Perplexity: 11.5706 types **##    torch.cuda.FloatT

Epoch [3/3], Step [3519/6471], Loss: 2.3504, Perplexity: 10.4900 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3520/6471], Loss: 2.4718, Perplexity: 11.8433 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3521/6471], Loss: 2.3702, Perplexity: 10.6998 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3522/6471], Loss: 2.5317, Perplexity: 12.5747 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3523/6471], Loss: 2.3032, Perplexity: 10.0066 types **##    torch.cuda.FloatT

Epoch [3/3], Step [3591/6471], Loss: 2.5099, Perplexity: 12.3033 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3592/6471], Loss: 2.5447, Perplexity: 12.7396 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 18, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3593/6471], Loss: 2.9399, Perplexity: 18.9131 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3594/6471], Loss: 2.4176, Perplexity: 11.2191 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3595/6471], Loss: 2.6001, Perplexity: 13.4649 types **##    torch.cuda.FloatT

Epoch [3/3], Step [3663/6471], Loss: 2.5377, Perplexity: 12.6507 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3664/6471], Loss: 2.4631, Perplexity: 11.7408 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3665/6471], Loss: 2.6668, Perplexity: 14.3941 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3666/6471], Loss: 2.6838, Perplexity: 14.6410 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3667/6471], Loss: 2.3972, Perplexity: 10.9922 types **##    torch.cuda.FloatT

Epoch [3/3], Step [3735/6471], Loss: 2.5973, Perplexity: 13.4274 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3736/6471], Loss: 2.5926, Perplexity: 13.3649 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3737/6471], Loss: 2.4601, Perplexity: 11.7056 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3738/6471], Loss: 2.5441, Perplexity: 12.7320 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3739/6471], Loss: 2.4541, Perplexity: 11.6360 types **##    torch.cuda.FloatT

Epoch [3/3], Step [3807/6471], Loss: 2.7687, Perplexity: 15.9377 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3808/6471], Loss: 2.6738, Perplexity: 14.4953 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3809/6471], Loss: 2.7625, Perplexity: 15.8398 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3810/6471], Loss: 2.8421, Perplexity: 17.1520 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3811/6471], Loss: 2.6823, Perplexity: 14.6192 types **##    torch.cuda.FloatT

Epoch [3/3], Step [3879/6471], Loss: 2.5753, Perplexity: 13.1348 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3880/6471], Loss: 2.7393, Perplexity: 15.4761 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3881/6471], Loss: 2.6469, Perplexity: 14.1098 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3882/6471], Loss: 2.7680, Perplexity: 15.9269 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3883/6471], Loss: 2.7949, Perplexity: 16.3603 types **##    torch.cuda.FloatT

Epoch [3/3], Step [3951/6471], Loss: 2.3413, Perplexity: 10.3944 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3952/6471], Loss: 2.4872, Perplexity: 12.0274 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3953/6471], Loss: 2.6455, Perplexity: 14.0904 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3954/6471], Loss: 2.8218, Perplexity: 16.8063 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [3955/6471], Loss: 2.3401, Perplexity: 10.3818 types **##    torch.cuda.FloatT

Epoch [3/3], Step [4023/6471], Loss: 2.7557, Perplexity: 15.7321 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4024/6471], Loss: 2.7911, Perplexity: 16.2982 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4025/6471], Loss: 2.6003, Perplexity: 13.4683 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4026/6471], Loss: 2.4656, Perplexity: 11.7707 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4027/6471], Loss: 2.6611, Perplexity: 14.3121 types **##    torch.cuda.FloatT

Epoch [3/3], Step [4095/6471], Loss: 2.3843, Perplexity: 10.8517 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4096/6471], Loss: 2.5468, Perplexity: 12.7662 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4097/6471], Loss: 2.8909, Perplexity: 18.0100 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4098/6471], Loss: 2.8756, Perplexity: 17.7369 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4099/6471], Loss: 2.6380, Perplexity: 13.9857 types **##    torch.cuda.FloatT

Epoch [3/3], Step [4167/6471], Loss: 2.2964, Perplexity: 9.9384 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4168/6471], Loss: 2.5949, Perplexity: 13.3951 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4169/6471], Loss: 2.6536, Perplexity: 14.2051 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4170/6471], Loss: 2.6948, Perplexity: 14.8020 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4171/6471], Loss: 2.4685, Perplexity: 11.8049 types **##    torch.cuda.FloatTe

Epoch [3/3], Step [4239/6471], Loss: 2.6921, Perplexity: 14.7632 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4240/6471], Loss: 2.3627, Perplexity: 10.6192 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4241/6471], Loss: 2.5727, Perplexity: 13.1011 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4242/6471], Loss: 2.5842, Perplexity: 13.2523 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4243/6471], Loss: 2.4328, Perplexity: 11.3905 types **##    torch.cuda.FloatT

Epoch [3/3], Step [4311/6471], Loss: 2.6220, Perplexity: 13.7632 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4312/6471], Loss: 2.6232, Perplexity: 13.7802 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4313/6471], Loss: 2.4985, Perplexity: 12.1644 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4314/6471], Loss: 2.4327, Perplexity: 11.3892 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4315/6471], Loss: 2.4724, Perplexity: 11.8506 types **##    torch.cuda.FloatT

Epoch [3/3], Step [4383/6471], Loss: 2.5961, Perplexity: 13.4109 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4384/6471], Loss: 2.3229, Perplexity: 10.2053 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4385/6471], Loss: 2.3541, Perplexity: 10.5289 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4386/6471], Loss: 2.4846, Perplexity: 11.9967 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4387/6471], Loss: 2.6298, Perplexity: 13.8713 types **##    torch.cuda.FloatT

Epoch [3/3], Step [4455/6471], Loss: 2.6562, Perplexity: 14.2425 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4456/6471], Loss: 2.4370, Perplexity: 11.4383 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4457/6471], Loss: 2.5302, Perplexity: 12.5566 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4458/6471], Loss: 2.5731, Perplexity: 13.1070 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4459/6471], Loss: 2.5137, Perplexity: 12.3504 types **##    torch.cuda.FloatT

Epoch [3/3], Step [4527/6471], Loss: 2.5172, Perplexity: 12.3936 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4528/6471], Loss: 2.4572, Perplexity: 11.6724 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4529/6471], Loss: 2.6487, Perplexity: 14.1361 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4530/6471], Loss: 2.7117, Perplexity: 15.0543 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4531/6471], Loss: 2.6175, Perplexity: 13.7015 types **##    torch.cuda.FloatT

Epoch [3/3], Step [4599/6471], Loss: 2.3119, Perplexity: 10.0941 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4600/6471], Loss: 2.3382, Perplexity: 10.3625
 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4601/6471], Loss: 2.5277, Perplexity: 12.5243 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4602/6471], Loss: 2.5829, Perplexity: 13.2361 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4603/6471], Loss: 2.5943, Perplexity: 13.3879 types **##    torch.cuda.Float

Epoch [3/3], Step [4671/6471], Loss: 2.5499, Perplexity: 12.8054 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4672/6471], Loss: 2.4866, Perplexity: 12.0205 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4673/6471], Loss: 2.4123, Perplexity: 11.1595 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4674/6471], Loss: 2.1640, Perplexity: 8.7061 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4675/6471], Loss: 2.4305, Perplexity: 11.3648 types **##    torch.cuda.FloatTe

Epoch [3/3], Step [4743/6471], Loss: 2.8331, Perplexity: 16.9981 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 17, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4744/6471], Loss: 2.7414, Perplexity: 15.5089 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4745/6471], Loss: 2.2898, Perplexity: 9.8727 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4746/6471], Loss: 2.5731, Perplexity: 13.1065 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4747/6471], Loss: 2.7400, Perplexity: 15.4863 types **##    torch.cuda.FloatTe

Epoch [3/3], Step [4815/6471], Loss: 2.5646, Perplexity: 12.9954 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4816/6471], Loss: 2.5590, Perplexity: 12.9234 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4817/6471], Loss: 2.5187, Perplexity: 12.4119 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4818/6471], Loss: 2.5795, Perplexity: 13.1900 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4819/6471], Loss: 2.4099, Perplexity: 11.1332 types **##    torch.cuda.FloatT

Epoch [3/3], Step [4887/6471], Loss: 2.4574, Perplexity: 11.6744 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 24, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4888/6471], Loss: 3.6567, Perplexity: 38.7347 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4889/6471], Loss: 2.5997, Perplexity: 13.4595 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4890/6471], Loss: 2.4175, Perplexity: 11.2179 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4891/6471], Loss: 2.5077, Perplexity: 12.2763 types **##    torch.cuda.FloatT

Epoch [3/3], Step [4959/6471], Loss: 2.4730, Perplexity: 11.8581 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4960/6471], Loss: 2.5255, Perplexity: 12.4977 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4961/6471], Loss: 2.4692, Perplexity: 11.8129 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4962/6471], Loss: 2.4217, Perplexity: 11.2647 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [4963/6471], Loss: 2.5708, Perplexity: 13.0768 types **##    torch.cuda.FloatT

Epoch [3/3], Step [5031/6471], Loss: 2.6379, Perplexity: 13.9844 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5032/6471], Loss: 2.4741, Perplexity: 11.8714 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5033/6471], Loss: 2.5053, Perplexity: 12.2471 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 25, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5034/6471], Loss: 3.5390, Perplexity: 34.4327 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5035/6471], Loss: 2.7044, Perplexity: 14.9447 types **##    torch.cuda.FloatT

Epoch [3/3], Step [5103/6471], Loss: 2.4402, Perplexity: 11.4750 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5104/6471], Loss: 2.3474, Perplexity: 10.4578 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5105/6471], Loss: 2.5193, Perplexity: 12.4199 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5106/6471], Loss: 2.6966, Perplexity: 14.8295 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5107/6471], Loss: 2.5886, Perplexity: 13.3105 types **##    torch.cuda.FloatT

Epoch [3/3], Step [5175/6471], Loss: 2.5017, Perplexity: 12.2030 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5176/6471], Loss: 2.7288, Perplexity: 15.3147 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5177/6471], Loss: 2.4026, Perplexity: 11.0515 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5178/6471], Loss: 2.5258, Perplexity: 12.5003 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5179/6471], Loss: 2.4690, Perplexity: 11.8104 types **##    torch.cuda.FloatT

Epoch [3/3], Step [5247/6471], Loss: 2.7516, Perplexity: 15.6671 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5248/6471], Loss: 2.4222, Perplexity: 11.2703 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5249/6471], Loss: 2.5553, Perplexity: 12.8747 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5250/6471], Loss: 2.4016, Perplexity: 11.0405 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5251/6471], Loss: 2.6675, Perplexity: 14.4045 types **##    torch.cuda.FloatT

Epoch [3/3], Step [5319/6471], Loss: 2.5490, Perplexity: 12.7937 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5320/6471], Loss: 2.4807, Perplexity: 11.9494 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5321/6471], Loss: 2.4364, Perplexity: 11.4318 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5322/6471], Loss: 2.4952, Perplexity: 12.1239 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5323/6471], Loss: 2.5491, Perplexity: 12.7953 types **##    torch.cuda.FloatT

Epoch [3/3], Step [5391/6471], Loss: 2.5145, Perplexity: 12.3600 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5392/6471], Loss: 2.3974, Perplexity: 10.9943 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5393/6471], Loss: 2.5602, Perplexity: 12.9385 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5394/6471], Loss: 2.5660, Perplexity: 13.0137 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5395/6471], Loss: 2.4297, Perplexity: 11.3553 types **##    torch.cuda.FloatT

Epoch [3/3], Step [5463/6471], Loss: 2.5159, Perplexity: 12.3776 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5464/6471], Loss: 2.4102, Perplexity: 11.1362 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5465/6471], Loss: 2.6026, Perplexity: 13.4994 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5466/6471], Loss: 2.4071, Perplexity: 11.1012 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5467/6471], Loss: 2.4534, Perplexity: 11.6283 types **##    torch.cuda.FloatT

Epoch [3/3], Step [5535/6471], Loss: 2.1499, Perplexity: 8.5839 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5536/6471], Loss: 2.2695, Perplexity: 9.6742 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 18, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5537/6471], Loss: 2.8808, Perplexity: 17.8281 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5538/6471], Loss: 2.4489, Perplexity: 11.5758 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5539/6471], Loss: 2.4955, Perplexity: 12.1279 types **##    torch.cuda.FloatTen

Epoch [3/3], Step [5607/6471], Loss: 2.3943, Perplexity: 10.9609 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 19, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5608/6471], Loss: 3.2275, Perplexity: 25.2155 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5609/6471], Loss: 2.5964, Perplexity: 13.4150 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5610/6471], Loss: 2.4949, Perplexity: 12.1203 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5611/6471], Loss: 2.4128, Perplexity: 11.1654 types **##    torch.cuda.FloatT

Epoch [3/3], Step [5679/6471], Loss: 2.4404, Perplexity: 11.4772 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5680/6471], Loss: 2.5709, Perplexity: 13.0774 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5681/6471], Loss: 2.9525, Perplexity: 19.1530 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5682/6471], Loss: 2.5814, Perplexity: 13.2161 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5683/6471], Loss: 2.3816, Perplexity: 10.8222 types **##    torch.cuda.FloatT

Epoch [3/3], Step [5751/6471], Loss: 2.5081, Perplexity: 12.2813 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5752/6471], Loss: 2.7350, Perplexity: 15.4100 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5753/6471], Loss: 2.4656, Perplexity: 11.7700 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5754/6471], Loss: 2.4062, Perplexity: 11.0917 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5755/6471], Loss: 2.5530, Perplexity: 12.8450 types **##    torch.cuda.FloatT

Epoch [3/3], Step [5823/6471], Loss: 2.5959, Perplexity: 13.4092 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5824/6471], Loss: 2.2260, Perplexity: 9.2625 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5825/6471], Loss: 2.6864, Perplexity: 14.6791 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5826/6471], Loss: 2.4203, Perplexity: 11.2492 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5827/6471], Loss: 2.4640, Perplexity: 11.7518 types **##    torch.cuda.FloatTe

Epoch [3/3], Step [5895/6471], Loss: 2.5347, Perplexity: 12.6128 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5896/6471], Loss: 2.2989, Perplexity: 9.9631 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5897/6471], Loss: 2.5758, Perplexity: 13.1422 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5898/6471], Loss: 2.7179, Perplexity: 15.1483 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5899/6471], Loss: 2.4661, Perplexity: 11.7770 types **##    torch.cuda.FloatTe

Epoch [3/3], Step [5967/6471], Loss: 2.5616, Perplexity: 12.9562 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5968/6471], Loss: 2.5368, Perplexity: 12.6389 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5969/6471], Loss: 2.5998, Perplexity: 13.4617 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5970/6471], Loss: 2.6918, Perplexity: 14.7582 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [5971/6471], Loss: 2.4991, Perplexity: 12.1717 types **##    torch.cuda.FloatT

Epoch [3/3], Step [6039/6471], Loss: 2.5658, Perplexity: 13.0109 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 16, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [6040/6471], Loss: 2.5756, Perplexity: 13.1391 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [6041/6471], Loss: 2.3955, Perplexity: 10.9741 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [6042/6471], Loss: 2.6613, Perplexity: 14.3156 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [6043/6471], Loss: 2.5519, Perplexity: 12.8314 types **##    torch.cuda.FloatT

Epoch [3/3], Step [6111/6471], Loss: 2.2824, Perplexity: 9.7997 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [6112/6471], Loss: 2.7209, Perplexity: 15.1947 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [6113/6471], Loss: 2.5393, Perplexity: 12.6706 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [6114/6471], Loss: 2.4511, Perplexity: 11.6012 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 11, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [6115/6471], Loss: 2.6101, Perplexity: 13.5998 types **##    torch.cuda.FloatTe

Epoch [3/3], Step [6183/6471], Loss: 2.4881, Perplexity: 12.0384 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [6184/6471], Loss: 2.4022, Perplexity: 11.0475 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [6185/6471], Loss: 2.6222, Perplexity: 13.7663 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [6186/6471], Loss: 2.4424, Perplexity: 11.5010 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [6187/6471], Loss: 2.3457, Perplexity: 10.4407 types **##    torch.cuda.FloatT

Epoch [3/3], Step [6255/6471], Loss: 2.5312, Perplexity: 12.5689 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [6256/6471], Loss: 2.4917, Perplexity: 12.0819 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [6257/6471], Loss: 2.5568, Perplexity: 12.8947 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 28, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [6258/6471], Loss: 3.8307, Perplexity: 46.0943 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [6259/6471], Loss: 2.3108, Perplexity: 10.0826 types **##    torch.cuda.FloatT

Epoch [3/3], Step [6327/6471], Loss: 2.3104, Perplexity: 10.0789 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [6328/6471], Loss: 2.1357, Perplexity: 8.4627 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 14, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [6329/6471], Loss: 2.5481, Perplexity: 12.7829 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [6330/6471], Loss: 2.4182, Perplexity: 11.2254 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 13, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [6331/6471], Loss: 2.4764, Perplexity: 11.8986 types **##    torch.cuda.FloatTe

Epoch [3/3], Step [6399/6471], Loss: 2.5346, Perplexity: 12.6116 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [6400/6471], Loss: 2.7437, Perplexity: 15.5439
 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 15, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [6401/6471], Loss: 2.5459, Perplexity: 12.7542 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 10, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [6402/6471], Loss: 2.5602, Perplexity: 12.9385 types **##    torch.cuda.FloatTensor torch.cuda.FloatTensor
Embed shape and unsqueeze features shape = *****  torch.Size([64, 12, 512]) torch.Size([64, 1, 512])
Epoch [3/3], Step [6403/6471], Loss: 2.3224, Perplexity: 10.2005 types **##    torch.cuda.Float

<a id='step3'></a>
## Step 3: (Optional) Validate your Model

To assess potential overfitting, one approach is to assess performance on a validation set.  If you decide to do this **optional** task, you are required to first complete all of the steps in the next notebook in the sequence (**3_Inference.ipynb**); as part of that notebook, you will write and test code (specifically, the `sample` method in the `DecoderRNN` class) that uses your RNN decoder to generate captions.  That code will prove incredibly useful here. 

If you decide to validate your model, please do not edit the data loader in **data_loader.py**.  Instead, create a new file named **data_loader_val.py** containing the code for obtaining the data loader for the validation data.  You can access:
- the validation images at filepath `'/opt/cocoapi/images/train2014/'`, and
- the validation image caption annotation file at filepath `'/opt/cocoapi/annotations/captions_val2014.json'`.

The suggested approach to validating your model involves creating a json file such as [this one](https://github.com/cocodataset/cocoapi/blob/master/results/captions_val2014_fakecap_results.json) containing your model's predicted captions for the validation images.  Then, you can write your own script or use one that you [find online](https://github.com/tylin/coco-caption) to calculate the BLEU score of your model.  You can read more about the BLEU score, along with other evaluation metrics (such as TEOR and Cider) in section 4.1 of [this paper](https://arxiv.org/pdf/1411.4555.pdf).  For more information about how to use the annotation file, check out the [website](http://cocodataset.org/#download) for the COCO dataset.

In [None]:
# (Optional) TODO: Validate your model.