# **Sequence Models Workshop for OxML 2020**

---

### **Prepared by *Asmita Poddar* & *Piotr Kozakowski***


In [None]:
#!pip install torch torchtext
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## **Language Modeling**

Language modeling is the core problem behind a number of Natural Language Processing (NLP) tasks, such as speech-to-text, conversational systems, and text summarization.

In this tutorial, you will learn how to create a language model for ***natural language text generation*** by implementing and training **Long Short Term Memory (LSTM)** networks and **Transformers**. 

A trained language model learns the likelihood of occurrence of a word based on the previous sequence of words used in the text. Language models can be operated at character level, n-gram level, sentence level or even paragraph level.
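
To make "the likelihood of a word given the previous words" concrete, here is a minimal count-based sketch. It is not the model trained in this tutorial: it conditions only on the single previous word (a bigram model), and the toy corpus and the function name `bigram_probs` are made up for illustration.

```python
from collections import Counter, defaultdict

def bigram_probs(tokens):
    """Estimate P(next_word | previous_word) from raw bigram counts."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return {
        prev: {w: c / sum(following.values()) for w, c in following.items()}
        for prev, following in counts.items()
    }

# Toy corpus (hypothetical, for illustration only).
corpus = "the cat sat on the mat the cat ran".split()
probs = bigram_probs(corpus)
print(probs["the"])  # distribution over the words that followed "the"
```

The neural models below play the same role, but they condition on a much longer history and share statistical strength between similar words through embeddings.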

After completing this tutorial, you will know:


*   How to frame a word-based language model for a corpus of text using *PyTorch*
*   How to generate sequences using a trained language model.





In [None]:
import math
import pandas as pd
import re
import numpy as np
import random
import time
import tqdm

import torch
from torch import nn
from torch import optim
import torch.nn.functional as F
from torchtext import data
from torchtext import datasets

## Data

The [WikiText language modeling dataset](https://blog.einstein.ai/the-wikitext-long-term-dependency-language-modeling-dataset/) is a collection of over 100 million tokens extracted from the set of verified `Good` and `Featured` articles on Wikipedia.

PyTorch provides many inbuilt datasets for many applications as well as tools for data processing. 

In this tutorial, we will use the torchtext dataset [wikitext2](https://torchtext.readthedocs.io/en/latest/datasets.html#wikitext-2). 

The texts in this dataset have already been tokenized to word level, and split into train, validation, test sets. We further split the data into batches for model training.
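
As a rough illustration of this batching, the sketch below cuts one long token stream into parallel columns and then slices fixed-length chunks whose targets are the same tokens shifted by one time step. It is similar in spirit to the iterator used in this notebook, but the helper names `batchify` and `get_batch` are assumptions for this example and the library's exact behaviour may differ.

```python
def batchify(tokens, batch_size):
    """Cut a token stream into `batch_size` parallel columns, dropping the remainder."""
    n_rows = len(tokens) // batch_size
    columns = [tokens[b * n_rows:(b + 1) * n_rows] for b in range(batch_size)]
    # Transpose so that rows are time steps: shape [n_rows x batch_size].
    return [list(row) for row in zip(*columns)]

def get_batch(rows, start, seq_len):
    """Return (text, target) chunks; target is text shifted by one time step."""
    end = min(start + seq_len, len(rows) - 1)
    return rows[start:end], rows[start + 1:end + 1]

tokens = list(range(20))            # stand-in for word indices
rows = batchify(tokens, batch_size=4)
text, target = get_batch(rows, start=0, seq_len=3)
print(text)    # 3 time steps x 4 columns
print(target)  # same layout, one step later
```

Each column is a contiguous stretch of the corpus, so reading down a column gives consecutive words.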

In [None]:
field = data.Field()
(train, val, test) = datasets.language_modeling.WikiText2.splits(text_field=field)

train_data = next(iter(train))
val_data = next(iter(val))
test_data = next(iter(test))

print('# Train tokens: ', len(train_data.text))  
print('# Val tokens: ', len(val_data.text))      
print('# Test tokens: ', len(test_data.text))    

# Train tokens:  2088628
# Val tokens:  217646
# Test tokens:  245569


In [None]:
# We will need to know the size of the vocabulary later for
#  both defining the word embedding layer in the model,
# and for encoding output words.
field.build_vocab(train, vectors=None)
print('Vocabulary size:', len(field.vocab))

Vocabulary size: 33279


In [None]:
BATCH_SIZE = 128
SEQ_LEN = 40
# Split the data into batches of size BATCH_SIZE 
(train, val, test) = data.BPTTIterator.splits((train, val, test), batch_size=BATCH_SIZE, bptt_len=SEQ_LEN)

In [None]:
batch = next(iter(train))
print('Batch text shape: ', batch.text.shape)     # [seq_len x batch_size]
print('Batch text: \n', batch.text)               #  consecutive words are in rows of the tensor
print('Batch target: \n', batch.target)           # target is the same as text, but shifted one position up

Batch text shape:  torch.Size([40, 128])
Batch text: 
 tensor([[    9,  1080,    38,  ..., 20496,     5,    28],
        [   11,   343, 24830,  ...,    21,     2,     2],
        [ 3932,     5,     3,  ...,  7154,  4917,   508],
        ...,
        [  955,  2531,    22,  ...,     6,    42,    27],
        [    3,  1500,    27,  ...,  2661,   247,  1530],
        [   25,   637, 22950,  ...,  1689,  1509,     0]])
Batch target: 
 tensor([[   11,   343, 24830,  ...,    21,     2,     2],
        [ 3932,     5,     3,  ...,  7154,  4917,   508],
        [ 4429,     2,    56,  ...,     0,  7129,     5],
        ...,
        [    3,  1500,    27,  ...,  2661,   247,  1530],
        [   25,   637, 22950,  ...,  1689,  1509,     0],
        [   10,   586,     3,  ...,    45,    11,    13]])


# Long Short Term Memory (LSTM) Networks
Recurrent Neural Networks suffer from two problems: 
* vanishing gradient and exploding gradient
* inability to capture long-term dependencies

LSTMs address these issues by explicitly introducing a memory unit, called the cell, into the network. Read more at: https://colah.github.io/posts/2015-08-Understanding-LSTMs/

Each LSTM unit therefore makes its decision by considering the current input, the previous output and the previous memory. It then generates a new output and updates its memory.
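
The update described above can be sketched for a single scalar unit in plain Python. This is a simplified illustration under stated assumptions, not the `nn.LSTM` implementation: real layers use weight matrices over vectors, and the weight dictionary `w` and its gate names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One step of a scalar LSTM cell. `w` maps gate name -> (w_x, w_h, bias)."""
    def pre(gate):
        w_x, w_h, b = w[gate]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("input"))        # how much new information to write
    f = sigmoid(pre("forget"))       # how much old memory to keep
    o = sigmoid(pre("output"))       # how much memory to expose
    g = math.tanh(pre("candidate"))  # candidate value to write
    c = f * c_prev + i * g           # updated memory (cell state)
    h = o * math.tanh(c)             # new output (hidden state)
    return h, c

# Hypothetical weights, chosen only to make the example run.
weights = {
    "input": (0.5, 0.1, 0.0),
    "forget": (0.3, 0.2, 1.0),
    "output": (0.4, 0.1, 0.0),
    "candidate": (0.8, 0.3, 0.0),
}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=0.0, w=weights)
print(h, c)
```

Because the cell state is updated additively (`f * c_prev + i * g`), information can persist over many steps when the forget gate stays close to 1.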

Layers in the LSTM model:
1. **Input Layer**: Takes the sequence of words as input
2. **Embedding Layer**: Creates embeddings from the input sequence using a simple lookup table that stores embeddings of a fixed dictionary and size
3. **LSTM Layer**: Computes the output using LSTM units. This notebook uses 128 hidden units per layer, but this number can be tuned later.
4. **Dropout Layer**: A regularisation layer that randomly switches off the activations of some neurons in the LSTM layer, which helps prevent overfitting.
5. **Output Layer**: Computes a probability distribution over the vocabulary for the next word


### Model


In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
VOCAB_SIZE = len(field.vocab)

## TODO: Experiment with these hyperparameters to tune the model
EMB_SIZE = 256
NHIDDEN = 128
NLAYERS = 3
DROPOUT = 0.5
EPOCHS = 100
LR = 0.001

![](https://drive.google.com/uc?export=view&id=1JMR622gBqx5U5jZPQLxM1FxAKmscKeM-)



In [None]:
class LSTMModel(nn.Module):
    """Container module with an encoder, a recurrent module, and a decoder."""

    def __init__(self, vocab_size, emb_size, nhidden, nlayers, dropout=0.5):
        super(LSTMModel, self).__init__()
        
        self.vocab_size = vocab_size
        self.drop = nn.Dropout(dropout)
        self.encoder = nn.Embedding(vocab_size, emb_size)
        self.lstm = nn.LSTM(emb_size, nhidden, nlayers, dropout=dropout)
        self.decoder = nn.Linear(nhidden, vocab_size)

        self.init_weights()
        self.nhidden = nhidden
        self.nlayers = nlayers

    # Initialise weights for encoder and decoder (uniform initialisation)
    def init_weights(self):
        initrange = 0.1
        nn.init.uniform_(self.encoder.weight, -initrange, initrange)
        # Zero the decoder bias; the decoder weights are drawn uniformly below.
        nn.init.zeros_(self.decoder.bias)
        nn.init.uniform_(self.decoder.weight, -initrange, initrange)

    # Initialise the hidden and cell states with zeros
    def init_hidden(self, bsz):
        weight = next(self.parameters())
        return (weight.new_zeros(self.nlayers, bsz, self.nhidden),
                    weight.new_zeros(self.nlayers, bsz, self.nhidden))

    def forward(self, input, hidden):
        emb = self.drop(self.encoder(input))
        output, hidden = self.lstm(emb, hidden)
        output = self.drop(output)
        decoded = self.decoder(output)
        decoded = decoded.view(-1, self.vocab_size)
        return F.log_softmax(decoded, dim=1), hidden

   

### Model Training

##### **Model:** LSTM

##### **Loss Function:** Negative Log-Likelihood (NLL):  $L(y) = -\log(y)$

![](https://ljvmiranda921.github.io/assets/png/cs231n-ann/neg_log.png)

##### **Optimizer:** Adam Optimizer
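
To see what the negative log-likelihood measures, the sketch below computes it by hand for one prediction. It is a framework-free illustration, not how `nn.NLLLoss` is implemented internally, and the helper names `log_softmax` and `nll` are chosen for this example. A model that assigns equal probability to every word has a loss of `log(vocab_size)`, which is about 10.41 for the 33279-word vocabulary used here, so an untrained model should start near that value.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]

def nll(log_probs, target):
    """Negative log-likelihood of the correct class index."""
    return -log_probs[target]

# A model that knows nothing assigns equal scores to every word.
vocab_size = 33279                       # vocabulary size printed earlier
uniform = log_softmax([0.0] * vocab_size)
loss = nll(uniform, target=0)
print(loss)            # equals log(vocab_size)
print(math.exp(loss))  # perplexity of the uniform model
```

Exponentiating the loss gives the perplexity, which can be read as the effective number of equally likely words the model is choosing between.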

In [None]:
model = LSTMModel(VOCAB_SIZE, EMB_SIZE, NHIDDEN, NLAYERS, DROPOUT).to(device)
loss_fn = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=LR)

print(model)

LSTMModel(
  (drop): Dropout(p=0.5, inplace=False)
  (encoder): Embedding(33279, 256)
  (lstm): LSTM(256, 128, num_layers=3, dropout=0.5)
  (decoder): Linear(in_features=128, out_features=33279, bias=True)
)


In [None]:
eval_it = iter(val)


# Compute the loss on one batch (used for both training and evaluation)
def run_on_batch(batch):
  text = batch.text.to(device)
  target = batch.target.to(device)
  target = target.view(-1)
  # A fresh zero hidden state is created for every batch, so gradients never
  # flow back into earlier batches (no backpropagation to the start of the dataset).
  hidden = model.init_hidden(BATCH_SIZE)
  pred, hidden = model(text, hidden)
  return loss_fn(pred, target)

for epoch in range(EPOCHS):
  print('**** Running Epoch {} ****'.format(epoch))

  for (i, batch) in enumerate(train):
    
    if batch.text.shape[1] != BATCH_SIZE:
      #Skip an incomplete batch.
      print('Skipping incomplete batch {} ...'.format(i))
      continue

    # Turn on training mode which enables dropout.
    model.train()
    model.zero_grad()

    train_loss = run_on_batch(batch)
    train_loss.backward()

    # `clip_grad_norm_` helps prevent the exploding gradient problem in LSTMs.
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)   # max norm of the gradients = 1.0
    optimizer.step()
    
    if i % 600 == 0:
      print('Batch {} ----'.format(i) )
      print('Train loss: ', float(train_loss) )

      # Turn on evaluation mode which disables dropout; gradients are not needed here.
      model.eval()
      with torch.no_grad():
        eval_loss = float(run_on_batch(next(eval_it)))
      print('Eval loss: ',eval_loss )

torch.save(model, 'model.pt')

# Download model from Colab
from google.colab import files
files.download("model.pt")

**** Running Epoch 0 ****
Batch 0 ----
Train loss:  10.413809776306152
Eval loss:  10.403104782104492
**** Running Epoch 1 ****
Batch 0 ----
Train loss:  7.098013877868652
Eval loss:  6.8576483726501465
**** Running Epoch 2 ****
Batch 0 ----
Train loss:  6.746264457702637
Eval loss:  6.595510005950928
**** Running Epoch 3 ****
Batch 0 ----
Train loss:  6.412356376647949
Eval loss:  6.255554676055908
**** Running Epoch 4 ****
Batch 0 ----
Train loss:  6.243910789489746
Eval loss:  6.158896446228027
**** Running Epoch 5 ****
Batch 0 ----
Train loss:  6.098635673522949
Eval loss:  5.807308197021484
**** Running Epoch 6 ****
Batch 0 ----
Train loss:  5.983107566833496
Eval loss:  5.741576194763184
**** Running Epoch 7 ****
Batch 0 ----
Train loss:  5.895272254943848
Eval loss:  5.67717981338501
**** Running Epoch 8 ****
Batch 0 ----
Train loss:  5.817322731018066
Eval loss:  5.605138301849365
**** Running Epoch 9 ****
Batch 0 ----
Train loss:  5.768476486206055
Eval loss:  5.69232463836669
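
The gradient clipping step used in the training loop above rescales all gradients together when their joint norm exceeds a threshold. The following is a minimal sketch of that idea on a flat list of numbers; the function name `clip_by_global_norm` is an assumption for this example, and the real utility operates on parameter tensors.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Scale all gradients so that their joint L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    # Only ever shrink the gradients; small gradients are left untouched.
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    return [g * scale for g in grads], total_norm

grads = [3.0, 4.0]                       # joint norm is 5.0
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
print(norm_before, clipped)
```

Scaling every gradient by the same factor keeps the update direction unchanged and only limits its length.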

### Text Generation 
 

Now, we are ready to generate text. 

We begin with some input (seed) word and use the trained model to predict the next word based on the input words. Then, we append the predicted word into the input, and have the model predict the next word and so on. We continue the process until we obtain a sequence with the length we want (here we generate 20 words). 

The multiple generated words can be appended together to get the generated sequence!
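
The loop described above can be sketched without a neural network. Here a hand-written table of next-word probabilities stands in for the trained model; the table `TOY_MODEL` and the function `generate` are hypothetical. Unlike the LSTM, this toy conditions only on the last word rather than on a hidden state.

```python
import random

# Hypothetical next-word distributions standing in for a trained model.
TOY_MODEL = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"ran": 1.0},
    "sat": {"the": 1.0},
    "ran": {"the": 1.0},
}

def generate(seed_word, n_words, rng):
    """Sample n_words one at a time, feeding each sampled word back in."""
    words = [seed_word]
    for _ in range(n_words):
        dist = TOY_MODEL[words[-1]]
        next_word = rng.choices(list(dist), weights=list(dist.values()))[0]
        words.append(next_word)
    return words

print(" ".join(generate("the", n_words=8, rng=random.Random(0))))
```

Sampling from the distribution, rather than always taking the most likely word, is what makes repeated runs produce different text.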

 

In [None]:
example = next(eval_it).text.to(device)
buffer = example.clone()
input = torch.tensor([[example[0][0]]]).to(device)

### TODO: Your own input
#input = torch.randint(VOCAB_SIZE, (1, 1), dtype=torch.long).to(device)  # generate a random starting word
### TODO: Specify no. of words to be generated 
NO_WORDS = 20

In [None]:
# Load saved trained model
with open('model.pt', 'rb') as f:
    model = torch.load(f).to(device)
    
generated_text = ''

# Turn on evaluation mode which disables dropout.
model.eval()
hidden = model.init_hidden(1)

# No gradients are needed for generation.
with torch.no_grad():
  for i in range(NO_WORDS):
    output, hidden = model(input, hidden)
    # exp() turns log-probabilities back into probabilities for sampling
    word_weights = output.squeeze().exp().cpu()
    word_idx = torch.multinomial(word_weights, 1)[0]
    input.fill_(word_idx)   # the sampled word becomes the next input
    generated_text = generated_text + field.vocab.itos[word_idx]+ ' '

dataset_text = [field.vocab.itos[idx] for idx in buffer[:, 0]]
print('INPUT WORD: ', field.vocab.itos[buffer[0][0]])
print('DATASET PORTION:', dataset_text[: NO_WORDS])
print('GENERATED TEXT: ', generated_text)

INPUT WORD:  before
DATASET PORTION: ['before', 'hatching', 'into', '<unk>', 'larvae', '.', 'Homarus', 'gammarus', 'is', 'a', 'highly', 'esteemed', 'food', ',', 'and', 'is', 'widely', 'caught', 'using', 'lobster']
GENERATED TEXT:  rest via least of Arena single such the City with westward Hartford in where real supporters Security ) portable far 


# Transformers

Define the model:

In [None]:
# from https://github.com/pytorch/examples/blob/master/word_language_model/model.py

class PositionalEncoding(nn.Module):

    def __init__(self, d_model, dropout=0.1, max_len=5000):
        super().__init__()
        self.dropout = nn.Dropout(p=dropout)

        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        pe = pe.unsqueeze(0).transpose(0, 1)
        self.register_buffer('pe', pe)

    def forward(self, x):
        x = x + self.pe[:x.size(0), :]
        return self.dropout(x)
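
The table built in `PositionalEncoding` can be reproduced in plain Python to inspect its values. This sketch follows the same formula (sine on even feature indices, cosine on odd ones, with frequencies decreasing geometrically), but the function name `positional_encoding` and the tiny sizes are chosen only for illustration.

```python
import math

def positional_encoding(max_len, d_model):
    """Return a max_len x d_model table of sinusoidal position features."""
    table = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):
            # Same frequency term as exp(i * (-log(10000) / d_model)).
            angle = pos * math.exp(-math.log(10000.0) * i / d_model)
            table[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                table[pos][i + 1] = math.cos(angle)
    return table

pe = positional_encoding(max_len=4, d_model=4)
for row in pe:
    print([round(v, 3) for v in row])
```

Every position gets a distinct pattern, which is what lets the otherwise order-agnostic attention layers tell positions apart.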

In [None]:
class TransformerLM(nn.Module):

  def __init__(
      self,
      vocab_size,
      n_layers=3,
      n_heads=4,
      d_model=256,
      d_ff=512,
      dropout=0.5,
      activation='relu',
  ):
    super().__init__()
    
    self.embedding = nn.Embedding(vocab_size, d_model)
    self.pos_encoder = PositionalEncoding(d_model, dropout)
    encoder_layer = nn.TransformerEncoderLayer(d_model, n_heads, d_ff, dropout, activation)
    encoder_norm = nn.LayerNorm(d_model)
    self.encoder = nn.TransformerEncoder(encoder_layer, n_layers, encoder_norm)
    self.output = nn.Linear(d_model, vocab_size)
    self.output.weight = self.embedding.weight
    self.log_softmax = nn.LogSoftmax(dim=-1)

    self.init_weights()

    self._d_model = d_model
    self._mask = None
    self._memory = None

  def init_weights(self):
    initrange = 0.1
    nn.init.uniform_(self.embedding.weight, -initrange, initrange)
    nn.init.zeros_(self.output.bias)
    nn.init.uniform_(self.output.weight, -initrange, initrange)

  def _generate_square_subsequent_mask(self, sz):
    mask = (torch.triu(torch.ones(sz, sz)) == 1).transpose(0, 1)
    mask = mask.float().masked_fill(mask == 0, float('-inf')).masked_fill(mask == 1, float(0.0))
    return mask

  def forward(self, x):
    # Rebuild the causal mask whenever the sequence length changes.
    if self._mask is None or self._mask.size(0) != x.size(0):
      self._mask = self._generate_square_subsequent_mask(x.size(0)).to(x.device)

    x = self.embedding(x) * math.sqrt(self._d_model)
    x = self.pos_encoder(x)
    x = self.encoder(x, self._mask)
    x = self.output(x)
    return self.log_softmax(x)
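
The mask produced by `_generate_square_subsequent_mask` is additive: it holds 0 where attention is allowed and negative infinity where a position would look at the future. The sketch below rebuilds that pattern in plain Python and shows its effect on a softmax; the helper names `causal_mask` and `masked_softmax` are assumptions for this example.

```python
import math

NEG_INF = float("-inf")

def causal_mask(size):
    """mask[i][j] is 0.0 if position i may attend to j (j <= i), else -inf."""
    return [[0.0 if j <= i else NEG_INF for j in range(size)] for i in range(size)]

def masked_softmax(scores, mask_row):
    """Softmax over attention scores after adding the additive mask row."""
    shifted = [s + m for s, m in zip(scores, mask_row)]
    top = max(shifted)
    exps = [math.exp(s - top) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]

mask = causal_mask(3)
weights = masked_softmax([1.0, 1.0, 1.0], mask[1])  # position 1 sees 0 and 1
print(weights)
```

Without this mask the model could read the very word it is asked to predict, and the training loss would no longer measure next-word prediction.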

Prepare the data:

In [None]:
batch_size = 128
seq_len = 32

field = data.Field()
(train, val, test) = datasets.language_modeling.WikiText2.splits(text_field=field)
field.build_vocab(train, max_size=None)
(train, val, test) = data.BPTTIterator.splits(
    (train, val, test), batch_size=batch_size, bptt_len=seq_len, repeat=True, shuffle=True
)

Train the model:

In [None]:
model = TransformerLM(vocab_size=len(field.vocab)).cuda()
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

eval_it = iter(val)

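def run_on_batch(batch):
  text = batch.text.cuda()
  target = batch.target.cuda()
  # The model output is [seq_len, batch, vocab]; CrossEntropyLoss expects the class
  # dimension second, i.e. [seq_len, vocab, batch], with targets of shape [seq_len, batch].
  pred = model(text).permute(0, 2, 1)
  return loss_fn(pred, target)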

for (i, batch) in tqdm.tqdm(enumerate(train)):
  if batch.text.size() != (seq_len, batch_size):
    # Skip an incomplete batch.
    continue

  optimizer.zero_grad()
  loss = run_on_batch(batch)
  loss.backward()
  torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
  optimizer.step()

  if i % (len(train) // 10) == 0:
    print('train loss:', float(loss))
    model.eval()
    print('eval loss:', float(run_on_batch(next(eval_it))))
    model.train()

  if i == 3000:
    break


0it [00:00, ?it/s][A

train loss: 10.744112014770508



1it [00:01,  1.19s/it][A

eval loss: 9.901606559753418



2it [00:01,  1.05it/s][A
3it [00:01,  1.27it/s][A
4it [00:02,  1.49it/s][A
5it [00:02,  1.70it/s][A
6it [00:03,  1.88it/s][A
7it [00:03,  2.03it/s][A
8it [00:03,  2.15it/s][A
9it [00:04,  2.25it/s][A
10it [00:04,  2.32it/s][A
11it [00:05,  2.38it/s][A
12it [00:05,  2.42it/s][A
13it [00:05,  2.46it/s][A
14it [00:06,  2.48it/s][A
15it [00:06,  2.51it/s][A
16it [00:07,  2.53it/s][A
17it [00:07,  2.54it/s][A
18it [00:07,  2.54it/s][A
19it [00:08,  2.54it/s][A
20it [00:08,  2.54it/s][A
21it [00:09,  2.54it/s][A
22it [00:09,  2.54it/s][A
23it [00:09,  2.53it/s][A
24it [00:10,  2.52it/s][A
25it [00:10,  2.52it/s][A
26it [00:11,  2.52it/s][A
27it [00:11,  2.54it/s][A
28it [00:11,  2.55it/s][A
29it [00:12,  2.55it/s][A
30it [00:12,  2.55it/s][A
31it [00:13,  2.55it/s][A
32it [00:13,  2.53it/s][A
33it [00:13,  2.53it/s][A
34it [00:14,  2.53it/s][A
35it [00:14,  2.52it/s][A
36it [00:15,  2.53it/s][A
37it [00:15,  2.53it/s][A
38it [00:15,  2.53it/s][A
39it [00

train loss: 7.180344104766846
eval loss: 6.867320537567139



53it [00:21,  2.34it/s][A
54it [00:22,  2.39it/s][A
55it [00:22,  2.43it/s][A
56it [00:23,  2.46it/s][A
57it [00:23,  2.48it/s][A
58it [00:23,  2.49it/s][A
59it [00:24,  2.50it/s][A
60it [00:24,  2.50it/s][A
61it [00:25,  2.51it/s][A
62it [00:25,  2.52it/s][A
63it [00:25,  2.53it/s][A
64it [00:26,  2.54it/s][A
65it [00:26,  2.55it/s][A
66it [00:27,  2.55it/s][A
67it [00:27,  2.54it/s][A
68it [00:27,  2.53it/s][A
69it [00:28,  2.53it/s][A
70it [00:28,  2.53it/s][A
71it [00:28,  2.53it/s][A
72it [00:29,  2.52it/s][A
73it [00:29,  2.52it/s][A
74it [00:30,  2.51it/s][A
75it [00:30,  2.51it/s][A
76it [00:30,  2.51it/s][A
77it [00:31,  2.52it/s][A
78it [00:31,  2.53it/s][A
79it [00:32,  2.54it/s][A
80it [00:32,  2.55it/s][A
81it [00:32,  2.55it/s][A
82it [00:33,  2.55it/s][A
83it [00:33,  2.55it/s][A
84it [00:34,  2.54it/s][A
85it [00:34,  2.54it/s][A
86it [00:34,  2.55it/s][A
87it [00:35,  2.53it/s][A
88it [00:35,  2.52it/s][A
89it [00:36,  2.51it/s][A


train loss: 6.884403228759766
eval loss: 6.732758045196533



104it [00:42,  2.32it/s][A
105it [00:42,  2.39it/s][A
106it [00:42,  2.44it/s][A
107it [00:43,  2.47it/s][A
108it [00:43,  2.48it/s][A
109it [00:44,  2.49it/s][A
110it [00:44,  2.50it/s][A
111it [00:44,  2.50it/s][A
112it [00:45,  2.51it/s][A
113it [00:45,  2.52it/s][A
114it [00:46,  2.52it/s][A
115it [00:46,  2.53it/s][A
116it [00:46,  2.52it/s][A
117it [00:47,  2.52it/s][A
118it [00:47,  2.52it/s][A
119it [00:48,  2.53it/s][A
120it [00:48,  2.54it/s][A
121it [00:48,  2.55it/s][A
122it [00:49,  2.55it/s][A
123it [00:49,  2.55it/s][A
124it [00:50,  2.55it/s][A
125it [00:50,  2.54it/s][A
126it [00:50,  2.54it/s][A
127it [00:51,  2.53it/s][A
128it [00:51,  2.52it/s][A
129it [00:52,  2.51it/s][A
130it [00:52,  2.52it/s][A
131it [00:52,  2.52it/s][A
132it [00:53,  2.52it/s][A
133it [00:53,  2.52it/s][A
134it [00:54,  2.53it/s][A
135it [00:54,  2.54it/s][A
136it [00:54,  2.54it/s][A
137it [00:55,  2.55it/s][A
138it [00:55,  2.55it/s][A
139it [00:56,  2.53

train loss: 6.691217422485352
eval loss: 6.485075950622559



155it [01:02,  2.33it/s][A
156it [01:02,  2.39it/s][A
157it [01:03,  2.44it/s][A
158it [01:03,  2.48it/s][A
159it [01:04,  2.50it/s][A
160it [01:04,  2.51it/s][A
161it [01:04,  2.52it/s][A
162it [01:05,  2.52it/s][A
163it [01:05,  2.52it/s][A
164it [01:06,  2.52it/s][A
165it [01:06,  2.52it/s][A
166it [01:06,  2.52it/s][A
167it [01:07,  2.52it/s][A
168it [01:07,  2.51it/s][A
169it [01:08,  2.52it/s][A
170it [01:08,  2.53it/s][A
171it [01:08,  2.54it/s][A
172it [01:09,  2.54it/s][A
173it [01:09,  2.55it/s][A
174it [01:10,  2.54it/s][A
175it [01:10,  2.54it/s][A
176it [01:10,  2.53it/s][A
177it [01:11,  2.52it/s][A
178it [01:11,  2.51it/s][A
179it [01:11,  2.51it/s][A
180it [01:12,  2.51it/s][A
181it [01:12,  2.51it/s][A
182it [01:13,  2.49it/s][A
183it [01:13,  2.50it/s][A
184it [01:13,  2.51it/s][A
185it [01:14,  2.52it/s][A
186it [01:14,  2.52it/s][A
187it [01:15,  2.52it/s][A
188it [01:15,  2.53it/s][A
189it [01:15,  2.53it/s][A
190it [01:16,  2.53

train loss: 6.525059700012207
eval loss: 6.398266315460205



206it [01:22,  2.31it/s][A
207it [01:23,  2.38it/s][A
208it [01:23,  2.43it/s][A
209it [01:24,  2.45it/s][A
210it [01:24,  2.46it/s][A
211it [01:24,  2.48it/s][A
212it [01:25,  2.50it/s][A
213it [01:25,  2.51it/s][A
214it [01:26,  2.51it/s][A
215it [01:26,  2.51it/s][A
216it [01:26,  2.51it/s][A
217it [01:27,  2.51it/s][A
218it [01:27,  2.51it/s][A
219it [01:28,  2.51it/s][A
220it [01:28,  2.51it/s][A
221it [01:28,  2.50it/s][A
222it [01:29,  2.49it/s][A
223it [01:29,  2.51it/s][A
224it [01:30,  2.52it/s][A
225it [01:30,  2.53it/s][A
226it [01:30,  2.54it/s][A
227it [01:31,  2.54it/s][A
228it [01:31,  2.55it/s][A
229it [01:31,  2.54it/s][A
230it [01:32,  2.53it/s][A
231it [01:32,  2.52it/s][A
232it [01:33,  2.51it/s][A
233it [01:33,  2.50it/s][A
234it [01:33,  2.50it/s][A
235it [01:34,  2.51it/s][A
236it [01:34,  2.52it/s][A
237it [01:35,  2.52it/s][A
238it [01:35,  2.52it/s][A
239it [01:35,  2.52it/s][A
240it [01:36,  2.52it/s][A
241it [01:36,  2.52

train loss: 6.504654407501221
eval loss: 6.262438774108887



257it [01:43,  2.31it/s][A
258it [01:43,  2.38it/s][A
259it [01:44,  2.43it/s][A
260it [01:44,  2.44it/s][A
261it [01:44,  2.47it/s][A
262it [01:45,  2.48it/s][A
263it [01:45,  2.50it/s][A
264it [01:46,  2.51it/s][A
265it [01:46,  2.51it/s][A
266it [01:46,  2.52it/s][A
267it [01:47,  2.52it/s][A
268it [01:47,  2.53it/s][A
269it [01:47,  2.53it/s][A
270it [01:48,  2.54it/s][A
271it [01:48,  2.54it/s][A
272it [01:49,  2.54it/s][A
273it [01:49,  2.54it/s][A
274it [01:49,  2.54it/s][A
275it [01:50,  2.54it/s][A
276it [01:50,  2.52it/s][A
277it [01:51,  2.52it/s][A
278it [01:51,  2.52it/s][A
279it [01:51,  2.53it/s][A
280it [01:52,  2.53it/s][A
281it [01:52,  2.52it/s][A
282it [01:53,  2.52it/s][A
283it [01:53,  2.54it/s][A
284it [01:53,  2.54it/s][A
285it [01:54,  2.55it/s][A
286it [01:54,  2.55it/s][A
287it [01:55,  2.56it/s][A
288it [01:55,  2.54it/s][A
289it [01:55,  2.53it/s][A
290it [01:56,  2.52it/s][A
291it [01:56,  2.52it/s][A
292it [01:57,  2.52

train loss: 6.48200798034668
eval loss: 6.059976100921631



308it [02:03,  2.32it/s][A
309it [02:03,  2.37it/s][A
310it [02:04,  2.41it/s][A
311it [02:04,  2.44it/s][A
312it [02:05,  2.45it/s][A
313it [02:05,  2.46it/s][A
314it [02:05,  2.49it/s][A
315it [02:06,  2.51it/s][A
316it [02:06,  2.51it/s][A
317it [02:07,  2.53it/s][A
318it [02:07,  2.54it/s][A
319it [02:07,  2.54it/s][A
320it [02:08,  2.53it/s][A
321it [02:08,  2.52it/s][A
322it [02:09,  2.52it/s][A
323it [02:09,  2.52it/s][A
324it [02:09,  2.52it/s][A
325it [02:10,  2.52it/s][A
326it [02:10,  2.53it/s][A
327it [02:11,  2.53it/s][A
328it [02:11,  2.52it/s][A
329it [02:11,  2.53it/s][A
330it [02:12,  2.53it/s][A
331it [02:12,  2.54it/s][A
332it [02:13,  2.54it/s][A
333it [02:13,  2.54it/s][A
334it [02:13,  2.55it/s][A
335it [02:14,  2.53it/s][A
336it [02:14,  2.52it/s][A
337it [02:15,  2.52it/s][A
338it [02:15,  2.53it/s][A
339it [02:15,  2.52it/s][A
340it [02:16,  2.51it/s][A
341it [02:16,  2.52it/s][A
342it [02:17,  2.53it/s][A
343it [02:17,  2.52

train loss: 6.279682159423828
eval loss: 5.93257474899292



359it [02:23,  2.33it/s][A
360it [02:24,  2.39it/s][A
361it [02:24,  2.43it/s][A
362it [02:25,  2.46it/s][A
363it [02:25,  2.47it/s][A
364it [02:25,  2.49it/s][A
365it [02:26,  2.49it/s][A
366it [02:26,  2.51it/s][A
367it [02:27,  2.51it/s][A
368it [02:27,  2.50it/s][A
369it [02:27,  2.51it/s][A
370it [02:28,  2.50it/s][A
371it [02:28,  2.52it/s][A
372it [02:29,  2.53it/s][A
373it [02:29,  2.54it/s][A
374it [02:29,  2.54it/s][A
375it [02:30,  2.54it/s][A
376it [02:30,  2.54it/s][A
377it [02:31,  2.52it/s][A
378it [02:31,  2.51it/s][A
379it [02:31,  2.51it/s][A
380it [02:32,  2.51it/s][A
381it [02:32,  2.51it/s][A
382it [02:33,  2.51it/s][A
383it [02:33,  2.51it/s][A
384it [02:33,  2.50it/s][A
385it [02:34,  2.51it/s][A
386it [02:34,  2.52it/s][A
387it [02:35,  2.52it/s][A
388it [02:35,  2.53it/s][A
389it [02:35,  2.54it/s][A
390it [02:36,  2.54it/s][A
391it [02:36,  2.55it/s][A
392it [02:36,  2.55it/s][A
393it [02:37,  2.55it/s][A
394it [02:37,  2.53

train loss: 6.24365234375
eval loss: 6.002635955810547



410it [02:44,  2.32it/s][A
411it [02:44,  2.38it/s][A
412it [02:45,  2.41it/s][A
413it [02:45,  2.44it/s][A
414it [02:45,  2.46it/s][A
415it [02:46,  2.47it/s][A
416it [02:46,  2.50it/s][A
417it [02:47,  2.51it/s][A
418it [02:47,  2.52it/s][A
419it [02:47,  2.53it/s][A
420it [02:48,  2.53it/s][A
421it [02:48,  2.53it/s][A
422it [02:49,  2.54it/s][A
423it [02:49,  2.54it/s][A
424it [02:49,  2.54it/s][A
425it [02:50,  2.54it/s][A
426it [02:50,  2.53it/s][A
427it [02:50,  2.53it/s][A
428it [02:51,  2.53it/s][A
429it [02:51,  2.52it/s][A
430it [02:52,  2.51it/s][A
431it [02:52,  2.50it/s][A
432it [02:52,  2.51it/s][A
433it [02:53,  2.51it/s][A
434it [02:53,  2.52it/s][A
435it [02:54,  2.53it/s][A
436it [02:54,  2.55it/s][A
437it [02:54,  2.55it/s][A
438it [02:55,  2.54it/s][A
439it [02:55,  2.52it/s][A
440it [02:56,  2.53it/s][A
441it [02:56,  2.53it/s][A
442it [02:56,  2.52it/s][A
443it [02:57,  2.53it/s][A
444it [02:57,  2.52it/s][A
445it [02:58,  2.53

train loss: 6.204072952270508
eval loss: 5.843438625335693



461it [03:04,  2.33it/s][A
462it [03:04,  2.39it/s][A
463it [03:05,  2.42it/s][A
464it [03:05,  2.45it/s][A
465it [03:06,  2.47it/s][A
466it [03:06,  2.48it/s][A
467it [03:06,  2.50it/s][A
468it [03:07,  2.50it/s][A
469it [03:07,  2.50it/s][A
470it [03:08,  2.51it/s][A
471it [03:08,  2.52it/s][A
472it [03:08,  2.53it/s][A
473it [03:09,  2.53it/s][A
474it [03:09,  2.54it/s][A
475it [03:10,  2.55it/s][A
476it [03:10,  2.54it/s][A
477it [03:10,  2.54it/s][A
478it [03:11,  2.53it/s][A
479it [03:11,  2.53it/s][A
480it [03:12,  2.52it/s][A
481it [03:12,  2.52it/s][A
482it [03:12,  2.52it/s][A
483it [03:13,  2.53it/s][A
484it [03:13,  2.52it/s][A
485it [03:14,  2.52it/s][A
486it [03:14,  2.53it/s][A
487it [03:14,  2.53it/s][A
488it [03:15,  2.52it/s][A
489it [03:15,  2.53it/s][A
490it [03:16,  2.54it/s][A
491it [03:16,  2.54it/s][A
492it [03:16,  2.54it/s][A
493it [03:17,  2.54it/s][A
494it [03:17,  2.54it/s][A
495it [03:18,  2.54it/s][A
496it [03:18,  2.53

train loss: 6.03971004486084
eval loss: 5.802977085113525



512it [03:24,  2.68it/s][A
513it [03:24,  2.63it/s][A
514it [03:25,  2.59it/s][A
515it [03:25,  2.58it/s][A
516it [03:26,  2.57it/s][A
517it [03:26,  2.56it/s][A
518it [03:26,  2.55it/s][A
519it [03:27,  2.55it/s][A
520it [03:27,  2.55it/s][A
521it [03:28,  2.53it/s][A
522it [03:28,  2.53it/s][A
523it [03:28,  2.53it/s][A
524it [03:29,  2.52it/s][A
525it [03:29,  2.52it/s][A
526it [03:30,  2.52it/s][A
527it [03:30,  2.53it/s][A
528it [03:30,  2.53it/s][A
529it [03:31,  2.52it/s][A
530it [03:31,  2.52it/s][A
531it [03:32,  2.51it/s][A
532it [03:32,  2.52it/s][A
533it [03:32,  2.52it/s][A
534it [03:33,  2.53it/s][A
535it [03:33,  2.54it/s][A
536it [03:34,  2.54it/s][A
537it [03:34,  2.55it/s][A
538it [03:34,  2.54it/s][A
539it [03:35,  2.54it/s][A
540it [03:35,  2.54it/s][A
541it [03:36,  2.53it/s][A
542it [03:36,  2.53it/s][A
543it [03:36,  2.53it/s][A
544it [03:37,  2.53it/s][A
545it [03:37,  2.53it/s][A
546it [03:37,  2.53it/s][A
547it [03:38,  2.54

train loss: 6.087571620941162
eval loss: 5.901940822601318



563it [03:44,  2.32it/s][A
564it [03:45,  2.38it/s][A
565it [03:45,  2.41it/s][A
566it [03:46,  2.45it/s][A
567it [03:46,  2.46it/s][A
568it [03:46,  2.48it/s][A
569it [03:47,  2.49it/s][A
570it [03:47,  2.50it/s][A
571it [03:48,  2.51it/s][A
572it [03:48,  2.51it/s][A
573it [03:48,  2.51it/s][A
574it [03:49,  2.53it/s][A
575it [03:49,  2.53it/s][A
576it [03:50,  2.54it/s][A
577it [03:50,  2.54it/s][A
578it [03:50,  2.54it/s][A
579it [03:51,  2.54it/s][A
580it [03:51,  2.54it/s][A
581it [03:51,  2.54it/s][A
582it [03:52,  2.53it/s][A
583it [03:52,  2.53it/s][A
584it [03:53,  2.53it/s][A
585it [03:53,  2.52it/s][A
586it [03:53,  2.52it/s][A
587it [03:54,  2.51it/s][A
588it [03:54,  2.51it/s][A
589it [03:55,  2.51it/s][A
590it [03:55,  2.52it/s][A
591it [03:55,  2.53it/s][A
592it [03:56,  2.52it/s][A
593it [03:56,  2.54it/s][A
594it [03:57,  2.54it/s][A
595it [03:57,  2.54it/s][A
596it [03:57,  2.54it/s][A
597it [03:58,  2.54it/s][A
598it [03:58,  2.53

train loss: 6.049144268035889
eval loss: 5.858266830444336



614it [04:05,  2.32it/s][A
615it [04:05,  2.38it/s][A
616it [04:05,  2.43it/s][A
617it [04:06,  2.47it/s][A
618it [04:06,  2.48it/s][A
619it [04:07,  2.49it/s][A
620it [04:07,  2.51it/s][A
621it [04:07,  2.51it/s][A
622it [04:08,  2.51it/s][A
623it [04:08,  2.51it/s][A
624it [04:09,  2.52it/s][A
625it [04:09,  2.52it/s][A
626it [04:09,  2.52it/s][A
627it [04:10,  2.52it/s][A
628it [04:10,  2.50it/s][A
629it [04:11,  2.50it/s][A
630it [04:11,  2.51it/s][A
631it [04:11,  2.51it/s][A
632it [04:12,  2.53it/s][A
633it [04:12,  2.52it/s][A
634it [04:13,  2.53it/s][A
635it [04:13,  2.53it/s][A
636it [04:13,  2.54it/s][A
637it [04:14,  2.55it/s][A
638it [04:14,  2.54it/s][A
639it [04:15,  2.54it/s][A
640it [04:15,  2.53it/s][A
641it [04:15,  2.52it/s][A
642it [04:16,  2.51it/s][A
643it [04:16,  2.51it/s][A
644it [04:17,  2.52it/s][A
645it [04:17,  2.52it/s][A
646it [04:17,  2.52it/s][A
647it [04:18,  2.51it/s][A
648it [04:18,  2.51it/s][A
649it [04:19,  2.52

train loss: 5.964940071105957
eval loss: 5.697671890258789



665it [04:25,  2.32it/s][A
666it [04:25,  2.37it/s][A
667it [04:26,  2.42it/s][A
668it [04:26,  2.46it/s][A
669it [04:27,  2.48it/s][A
670it [04:27,  2.49it/s][A
671it [04:27,  2.50it/s][A
672it [04:28,  2.52it/s][A
673it [04:28,  2.51it/s][A
674it [04:29,  2.52it/s][A
675it [04:29,  2.51it/s][A
676it [04:29,  2.52it/s][A
677it [04:30,  2.52it/s][A
678it [04:30,  2.51it/s][A
679it [04:31,  2.52it/s][A
680it [04:31,  2.54it/s][A
681it [04:31,  2.54it/s][A
682it [04:32,  2.53it/s][A
683it [04:32,  2.54it/s][A
684it [04:33,  2.53it/s][A
685it [04:33,  2.53it/s][A
686it [04:33,  2.53it/s][A
687it [04:34,  2.53it/s][A
688it [04:34,  2.53it/s][A
689it [04:35,  2.53it/s][A
690it [04:35,  2.53it/s][A
691it [04:35,  2.54it/s][A
692it [04:36,  2.54it/s][A
693it [04:36,  2.53it/s][A
694it [04:37,  2.54it/s][A
695it [04:37,  2.54it/s][A
696it [04:37,  2.55it/s][A
697it [04:38,  2.52it/s][A
698it [04:38,  2.52it/s][A
699it [04:39,  2.53it/s][A
700it [04:39,  2.52

train loss: 5.901734352111816
eval loss: 5.778346061706543



train loss: 5.963535308837891
eval loss: 5.748756408691406



train loss: 6.009275436401367
eval loss: 5.692862510681152



train loss: 5.835292339324951
eval loss: 5.6659464836120605



train loss: 5.820431232452393
eval loss: 5.605086326599121



train loss: 5.8187408447265625
eval loss: 5.55510950088501



train loss: 5.717184543609619
eval loss: 5.708514213562012



train loss: 5.770957946777344
eval loss: 5.588698387145996



train loss: 5.714615345001221
eval loss: 5.644379138946533



train loss: 5.656882286071777
eval loss: 5.538461208343506



train loss: 5.6443257331848145
eval loss: 5.55673885345459



train loss: 5.702193260192871
eval loss: 5.554844379425049



train loss: 5.752259254455566
eval loss: 5.394573211669922



train loss: 5.612976551055908
eval loss: 5.4909844398498535



train loss: 5.606976509094238
eval loss: 5.51396369934082



train loss: 5.608867645263672
eval loss: 5.621983051300049



train loss: 5.515800952911377
eval loss: 5.511759281158447



train loss: 5.570403099060059
eval loss: 5.483953475952148



train loss: 5.497532844543457
eval loss: 5.497312068939209



train loss: 5.473233699798584
eval loss: 5.531450271606445



train loss: 5.515231132507324
eval loss: 5.629790306091309



train loss: 5.523017883300781
eval loss: 5.502058506011963



train loss: 5.60330057144165
eval loss: 5.472036838531494



train loss: 5.465083599090576
eval loss: 5.492844581604004



train loss: 5.454954147338867
eval loss: 5.440124034881592
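
The losses logged above are easier to interpret as perplexities. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats (the default reduction for PyTorch's cross-entropy and NLL losses); the training cell is not shown here, so treat that as an assumption. The helper name `perplexity` is hypothetical and not part of the notebook.

```python
import math


def perplexity(mean_nll_nats: float) -> float:
    """Convert a mean per-token negative log-likelihood (in nats) to perplexity."""
    return math.exp(mean_nll_nats)


# Two eval losses copied from the training log above.
earlier_eval_loss = 5.697671890258789
latest_eval_loss = 5.440124034881592

print("earlier eval perplexity:", round(perplexity(earlier_eval_loss), 1))
print("latest eval perplexity:", round(perplexity(latest_eval_loss), 1))
```

Under that assumption, a lower perplexity means the model is, on average, choosing among fewer equally likely candidate tokens at each position.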



In [None]:
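# Take one evaluation batch; the first half of each sequence serves as the prompt.
example = next(eval_it).text.cuda()
buffer = example.clone()
model.eval()
half = example.size(0) // 2
with torch.no_grad():  # no gradients are needed for generation
  for i in range(half, example.size(0)):
    pred = model(buffer)
    # pred[i - 1, 0] scores the token at position i for the first sequence in the batch;
    # exp() converts the (assumed) log-probabilities into sampling weights.
    buffer[i] = torch.multinomial(torch.exp(pred[i - 1, 0]), 1)

# Map the token ids of the first sequence back to strings.
text = [field.vocab.itos[idx] for idx in buffer[:, 0]]
print('dataset portion:', text[:half])
print('predicted portion:', text[half:])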

dataset portion: ['is', 'a', 'highly', '<unk>', 'food', ',', 'and', 'is', 'widely', 'caught', 'using', 'lobster', '<unk>', ',', 'mostly', 'around']
predicted portion: ['any', '<unk>', 'and', 'other', 'methods', 'of', 'poor', 'and', 'hold', 'that', 'are', 'treated', 'to', 'a', 'person', '.']
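
The generation cell draws each next token from the model's predicted distribution. A common variation, not used in this notebook, is to rescale the log-probabilities by a temperature before sampling: values below 1 make the output more conservative, values above 1 make it more varied. The following is a minimal pure-Python sketch of that idea; the function name `sample_from_log_probs` and the toy distribution are illustrative assumptions, and in the notebook the sampling itself is done with `torch.multinomial`.

```python
import math
import random


def sample_from_log_probs(log_probs, temperature=1.0, rng=random):
    """Draw one index from a categorical distribution given as log-probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Dividing log-probabilities by the temperature sharpens (T < 1)
    # or flattens (T > 1) the distribution.
    scaled = [lp / temperature for lp in log_probs]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    # random.choices accepts unnormalised weights.
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


# Toy three-token vocabulary with probabilities 0.7, 0.2 and 0.1.
toy_log_probs = [math.log(p) for p in (0.7, 0.2, 0.1)]
rng = random.Random(0)
draws = [sample_from_log_probs(toy_log_probs, 1.0, rng) for _ in range(1000)]
print("empirical frequencies:", [draws.count(i) / len(draws) for i in range(3)])
```

With a temperature of 1 the empirical frequencies should be close to the original probabilities; with a very small temperature the sampler almost always returns the most likely token.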


In [None]:

['Hutchings',
 '(',
 'Aniston',
 'of',
 'University',
 'with',
 'in',
 'of',
 'urgent',
 'signal',
 '<unk>',
 'seller',
 ',',
 ',',
 'force',
 "'s",
 'she',
 'television',
 '!',
 'could',
 '.',
 ',',
 'package',
 'and',
 'Disciplina',
 ')',
 ',',
 '"',
 'had',
 ',',
 ',',
 '.',
 'founded',
 'in',
 '<unk>']

Adapted from the [PyTorch word-level language model example](https://github.com/pytorch/examples/tree/master/word_language_model).