# Before You Start

I am not installing any packages in this notebook, so make sure to create a Python environment (Conda preferred) that contains the required packages:

- numpy
- [more_itertools](https://pypi.org/project/more-itertools/)
- PyTorch (including torchtext)
- [sentencepiece](https://github.com/google/sentencepiece)
- [tqdm](https://tqdm.github.io/)

If you want to run this notebook in Visual Studio Code, you also need the Jupyter packages installed.

# Initialization

In [1]:
import logging
import torch

NOTEBOOK_NAME = 'transformer-enfr-pytorch'

# Training device.
DEVICE = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

# Tokenization configuration.
SOURCE_VOCAB_SIZE = 32000
TARGET_VOCAB_SIZE = 32000
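PAD = 0 # padding id; note that the SentencePiece models trained below also use id 0 for <unk> (unk_id: 0, pad_id: -1 in the trainer log)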
BOS = 1 # beginning of statement
EOS = 2 # end of statement

# Transformer parameters.
FEEDFORWARD_DIM = 2048
EMBEDDING_DIM = 512
MAX_SENTENCE = 88

# Batching parameters.
BUCKETING_BOUNDARIES = [8, 16, 24, 32, 48, 64, 72, 80, 88, 96, 104, 112, 120, 128]
BUCKETING_SIZES = [196, 196, 196, 196, 132, 132, 132, 100, 100, 100, 64, 32, 32, 32, 32]

def configure_logging():
    logging.basicConfig(
        handlers=[logging.StreamHandler()],
        level=logging.INFO,
        format='%(asctime)s %(levelname)s %(message)s'
    )

    return logging.getLogger()


log = configure_logging()

# Load Training Data

In [2]:
from torchtext.datasets import IWSLT2016

TRAIN_DATA_CACHE = None
TEST_DATA_CACHE = None

SOURCE_LANG = 'en'
TARGET_LANG = 'fr'

def train_data_gen():
    global TRAIN_DATA_CACHE
    if TRAIN_DATA_CACHE is None:
        TRAIN_DATA_CACHE = list(IWSLT2016(split='train', language_pair=(SOURCE_LANG, TARGET_LANG)))
    def gen(dummy=None):
        return iter(TRAIN_DATA_CACHE)
    return gen

def test_data_gen():
    global TEST_DATA_CACHE
    if TEST_DATA_CACHE is None:
        TEST_DATA_CACHE = list(IWSLT2016(split='valid', language_pair=(SOURCE_LANG, TARGET_LANG)))
    def gen(dummy=None):
        return iter(TEST_DATA_CACHE)
    return gen

print(next(iter(train_data_gen()())))
print(next(iter(test_data_gen()())))



("David Gallo: This is Bill Lange. I'm Dave Gallo.\n", 'David Gallo: Voici Bill Lange. Je suis Dave Gallo.\n')
('Everything I do, and everything I do professionally -- my life -- has been shaped by seven years of work as a young man in Africa.\n', "Chaque chose que je fais, chaque chose que je fais professionnellement -- ma vie -- a été façonnée par sept ans de travail en Afrique quand j'étais un jeune.\n")


# Train a SentencePiece Text Tokenizer

See the [SentencePiece repository](https://github.com/google/sentencepiece) for more information.

In [3]:
# Train one SentencePiece model per language on the IWSLT2016 training and validation text.
import os
import itertools
import sentencepiece as spm

FILENAME_PREFIX = f'iwslt2016-{SOURCE_LANG}{TARGET_LANG}'

def data_filepath(filename):
    return os.path.join('./data', filename)

def build_sp_model(train, test, lang, vocab_size):
    prefix = data_filepath(f'{FILENAME_PREFIX}-{lang}-{vocab_size//1000}k')
    if os.path.isfile(f'{prefix}.model') and os.path.isfile(f'{prefix}.vocab'):
        # Model is already built; no need to do anything.
        return

    # Dump the training and test data to a text file.
    filename = data_filepath(f'{FILENAME_PREFIX}-{lang}-textdump.txt')
    if not os.path.isfile(filename):
        # Generate a unified text file containing the training and test data.
        os.makedirs(os.path.dirname(filename), exist_ok=True)
        with open(filename, 'w', encoding='utf-8') as f:
            for stat in itertools.chain(train, test):
                f.write(stat.strip() + '\n')

    # Use the vocab_size argument (not SOURCE_VOCAB_SIZE) so that each language gets its own setting.
    spm.SentencePieceTrainer.train(input=filename, model_prefix=prefix, vocab_size=vocab_size)
            
build_sp_model(
    map(lambda x: x[0], train_data_gen()(None)), # Use 0 for source statement.
    map(lambda x: x[0], test_data_gen()(None)), # Use 0 for source statement.
    SOURCE_LANG,
    SOURCE_VOCAB_SIZE
)

build_sp_model(
    map(lambda x: x[1], train_data_gen()(None)), # Use 1 for target statement
    map(lambda x: x[1], test_data_gen()(None)),  # Use 1 for target statement
    TARGET_LANG,
    TARGET_VOCAB_SIZE
)

sentencepiece_trainer.cc(77) LOG(INFO) Starts training with : 
trainer_spec {
  input: ./data/iwslt2016-enfr-en-textdump.txt
  input_format: 
  model_prefix: ./data/iwslt2016-enfr-en-32k
  model_type: UNIGRAM
  vocab_size: 32000
  self_test_sample_size: 0
  character_coverage: 0.9995
  input_sentence_size: 0
  shuffle_input_sentence: 1
  seed_sentencepiece_size: 1000000
  shrinking_factor: 0.75
  max_sentence_length: 4192
  num_threads: 16
  num_sub_iterations: 2
  max_sentencepiece_length: 16
  split_by_unicode_script: 1
  split_by_number: 1
  split_by_whitespace: 1
  split_digits: 0
  treat_whitespace_as_suffix: 0
  allow_whitespace_only_pieces: 0
  required_chars: 
  byte_fallback: 0
  vocabulary_output_piece_score: 1
  train_extremely_large_corpus: 0
  hard_vocab_limit: 1
  use_all_vocab: 0
  unk_id: 0
  bos_id: 1
  eos_id: 2
  pad_id: -1
  unk_piece: <unk>
  bos_piece: <s>
  eos_piece: </s>
  pad_piece: <pad>
  unk_surface:  ⁇ 
  enable_differential_privacy: 0
  differential_priva

# Load and Test SP Model


Let's now load and test the SP models we just trained.

In [4]:
sp_source_model = spm.SentencePieceProcessor(model_file=data_filepath(
    f'{FILENAME_PREFIX}-{SOURCE_LANG}-{SOURCE_VOCAB_SIZE//1000}k.model'))
sp_target_model = spm.SentencePieceProcessor(model_file=data_filepath(
    f'{FILENAME_PREFIX}-{TARGET_LANG}-{TARGET_VOCAB_SIZE//1000}k.model'))
log.info(f"Test {SOURCE_LANG} tokenization with 'Hello, World!':")
log.info(f"{sp_source_model.encode('Hello, World!')}")
log.info(f"Test {TARGET_LANG} tokenization with 'Bonjour le monde':")
log.info(f"{sp_target_model.encode('Bonjour le monde')}")

2023-02-11 17:18:46,358 INFO Test en tokenization with 'Hello, World!':
2023-02-11 17:18:46,359 INFO [4832, 3, 827, 0]
2023-02-11 17:18:46,359 INFO Test fr tokenization with 'Bonjour le monde':
2023-02-11 17:18:46,359 INFO [2932, 10, 81]


# Preprocessing


Before we can use the text in the transformer, we have to convert it into batches of arrays
that the transformer can consume. In this section, we define a set of preprocessing layers
that do so.

### Tokenization

First, we need to be able to use the SentencePiece tokenizers we trained above to convert text into a sequence of token ids:

In [5]:
import numpy as np

def Tokenize(source_model, target_model):
    def tokenize(iterable):
        for source, target in iterable:
            yield np.array(source_model.encode(source)), np.array(target_model.encode(target))

    return lambda obj: tokenize(obj)

### Statement Boundary

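Additionally, to mark the boundaries of a statement, we define a preprocessing layer that adds BOS and EOS to each sequence: the encoder input ends with EOS, the decoder input starts with BOS, and the decoder target ends with EOS.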

In [10]:
import numpy as np

def StatementBoundary():
    def end_of_statement(iterable):
        for source, target in iterable:
            # Yield three fields:
            # 1. The sentence we use as the input to the encoder.
            # 2. The sentence we use as the input to the decoder.
            # 3. The sentence we use as the target for the decoder.
            yield (
                # Encoder source
                np.concatenate((source, [EOS])),
                # Decoder source
                np.concatenate(([BOS], target)),
                # Decoder target.
                np.concatenate((target, [EOS]))
            )

    return lambda obj: end_of_statement(obj)
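
To make the three fields concrete, here is a minimal sketch on a made-up sentence pair. `BOS = 1` and `EOS = 2` match the constants from the initialization cell; the token ids 11, 12, 13 and 21, 22 are invented purely for illustration.

```python
import numpy as np

BOS, EOS = 1, 2  # same ids as in the initialization cell

# Hypothetical token ids standing in for one tokenized sentence pair.
source = np.array([11, 12, 13])
target = np.array([21, 22])

encoder_source = np.concatenate((source, [EOS]))  # [11 12 13  2]
decoder_source = np.concatenate(([BOS], target))  # [ 1 21 22]
decoder_target = np.concatenate((target, [EOS]))  # [21 22  2]

# At position t the decoder sees decoder_source[:t + 1] and is trained
# to predict decoder_target[t], i.e. the next token of the translation.
for t in range(len(decoder_source)):
    print(decoder_source[: t + 1], '->', decoder_target[t])
```

The decoder input and the decoder target always have the same length, which is what lets the training loop compare predictions and targets position by position.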

### Bucketing

Bucket the training data by length to improve training speed. Bucketing by length groups samples of similar length into the same batch, which keeps the amount of padding per batch small.

In [11]:
import math
import numpy as np

def BucketByLength(bucket_boundaries, batch_sizes):
    def pad(iterable, padded_size):
        for el in iterable:
            enc_src, dec_src, dec_trg = el
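            if max(len(enc_src), len(dec_src), len(dec_trg)) > padded_size:
                raise ValueError('Source or target sentence is longer than required padding size.')
            # Right-pad each sequence with the PAD id up to padded_size.
            # np.pad takes (before, after) widths; the fill value goes in constant_values.
            yield (
                np.pad(enc_src, (0, padded_size - len(enc_src)), constant_values=PAD),
                np.pad(dec_src, (0, padded_size - len(dec_src)), constant_values=PAD),
                np.pad(dec_trg, (0, padded_size - len(dec_trg)), constant_values=PAD),
            )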

    bucket_boundaries = bucket_boundaries + [math.inf]  # Max boundary is unlimited.
    def gen(iterable):
        buckets = [[] for _ in bucket_boundaries]
        for el in iterable:
            enc_src, dec_src, dec_trg = el
            bucket_idx = min([
                idx
                for idx, boundary in enumerate(bucket_boundaries)
                if max(len(enc_src), len(dec_src), len(dec_trg)) < boundary
            ])
            buckets[bucket_idx].append(el)
            if len(buckets[bucket_idx]) == batch_sizes[bucket_idx]:
                bucket_boundary = bucket_boundaries[bucket_idx]
                if math.isinf(bucket_boundary):
                    # Find the maximum possible sentence and consider it to be
                    # the bucket size.
                    bucket_boundary = max([
                        max(len(es), len(ds), len(dt))
                        for es, ds, dt in buckets[bucket_idx]
                    ])
                padded = list(pad(buckets[bucket_idx], bucket_boundary))
                yield (
                    np.array(list(map(lambda x: x[0], padded))), # encoder source
                    np.array(list(map(lambda x: x[1], padded))), # decoder source
                    np.array(list(map(lambda x: x[2], padded))), # decoder target
                )
                buckets[bucket_idx] = []
        # TODO yield the remaining buckets.
    return gen
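
As a rough illustration of how a sample ends up in a bucket, the sketch below maps a few sequence lengths to their bucket index, padded length, and batch size, using the boundaries and sizes from the initialization cell. It uses `bisect` as an equivalent of the "first boundary strictly greater than the length" search in `BucketByLength`; the helper names `bucket_for` and `describe` are hypothetical and not part of the notebook.

```python
import bisect
import math

import numpy as np

PAD = 0
BOUNDARIES = [8, 16, 24, 32, 48, 64, 72, 80, 88, 96, 104, 112, 120, 128]
SIZES = [196, 196, 196, 196, 132, 132, 132, 100, 100, 100, 64, 32, 32, 32, 32]


def bucket_for(length):
    """Index of the first boundary strictly greater than `length`."""
    return bisect.bisect_right(BOUNDARIES, length)


def describe(length):
    """Return (bucket index, padded length, batch size) for a sequence length."""
    idx = bucket_for(length)
    boundary = BOUNDARIES[idx] if idx < len(BOUNDARIES) else math.inf
    return idx, boundary, SIZES[idx]


print(describe(7))    # (0, 8, 196)
print(describe(8))    # (1, 16, 196): 8 is not strictly below the first boundary
print(describe(50))   # (5, 64, 132)
print(describe(130))  # (14, inf, 32): the open-ended last bucket

# Right-padding a sequence of length 3 up to a bucket boundary of 8.
padded = np.pad(np.array([5, 6, 2]), (0, 8 - 3), constant_values=PAD)
print(padded)  # [5 6 2 0 0 0 0 0]
```

Note that the list of batch sizes has one more entry than the list of boundaries, because the last bucket has no upper bound.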


### Removing Long Sentences

The model only handles sequences up to a maximum length, so we define a layer that drops samples that are too long (or, optionally, too short).

In [12]:
def FilterByLength(max_length, min_length=0, length_keys=None):
    def _length_fn(x, length_keys):
        if length_keys is not None and isinstance(x, (list, tuple)):
            return max(len(x[idx]) for idx in length_keys)
        elif length_keys is not None and isinstance(x, dict):
            return max(len(x[key]) for key in length_keys)
        else:
            return len(x)

    def filtered(iterable):
        for el in iterable:
            el_len = _length_fn(el, length_keys)

            # Checking boundaries.
            if max_length is not None and (
                el_len > max_length or
                el_len < min_length
            ):
                continue
            # Within bounds.
            yield el
    return filtered


### Shuffle

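To improve training, we shuffle the training data. Items are shuffled within consecutive chunks of `queue_size` samples rather than across the whole dataset: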

In [13]:
import random 
from more_itertools import chunked

def Shuffle(queue_size, seed=None):
    def shuffle(iterable, queue_size, seed=None):
        rnd = random.Random(seed)

        for chunk in chunked(iterable, queue_size):
            chunk = list(chunk)
            rnd.shuffle(chunk)
            for item in chunk:
                yield item


    return lambda iterable: shuffle(iterable, queue_size, seed)
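
The sketch below shows what chunk-wise shuffling does to a small range of numbers. It is a standard-library variant written for illustration (it uses `itertools.islice` instead of `more_itertools.chunked`), and the name `windowed_shuffle` is hypothetical.

```python
import random
from itertools import islice


def windowed_shuffle(iterable, queue_size, seed=None):
    """Shuffle items within consecutive windows of `queue_size` items."""
    rnd = random.Random(seed)
    iterator = iter(iterable)
    while True:
        window = list(islice(iterator, queue_size))
        if not window:
            return
        rnd.shuffle(window)
        yield from window


shuffled = list(windowed_shuffle(range(10), queue_size=4, seed=10))
print(shuffled)

# Each window holds the same items as before, only in a different order.
print(sorted(shuffled[0:4]), sorted(shuffled[4:8]), sorted(shuffled[8:10]))
# [0, 1, 2, 3] [4, 5, 6, 7] [8, 9]
```

Because items never leave their window, a larger `queue_size` mixes the data more thoroughly at the cost of buffering more samples at once.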


### Combining Layers

Finally, having defined the layers above, we need a way to chain them so that each layer consumes the output of the previous one:

In [14]:
def Serial(*layers):
    """Combines data processing layers into a single serial layer."""
    def serial(iterable=None):
        for layer in layers:
            iterable = layer(iterable)
        return iterable
    return serial
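
A small usage sketch may help here. It repeats `Serial` so the block runs on its own and chains three toy layers; `source`, `keep_even`, and `square` are hypothetical stand-ins for the real data source and preprocessing layers.

```python
def Serial(*layers):
    """Combines data processing layers into a single serial layer."""
    def serial(iterable=None):
        for layer in layers:
            iterable = layer(iterable)
        return iterable
    return serial


def source(_=None):
    # The first layer ignores its input and produces the raw data.
    return iter(range(1, 7))


def keep_even(iterable):
    return (x for x in iterable if x % 2 == 0)


def square(iterable):
    return (x * x for x in iterable)


pipeline = Serial(source, keep_even, square)
print(list(pipeline()))  # [4, 16, 36]
print(list(pipeline()))  # [4, 16, 36] again: each call restarts from the source
```

Since every layer is a generator, nothing is computed until the result is iterated, and calling the pipeline again yields a fresh pass over the data.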


### Putting It All Together

Having defined all the required preprocessing layers, we can now use them for loading the training and test data.

In [15]:
data_loader = Serial(
    train_data_gen(),
    Tokenize(sp_source_model, sp_target_model),
    StatementBoundary(),
    FilterByLength(MAX_SENTENCE, length_keys=[0, 1, 2]),
    Shuffle(1024, seed=10),
    BucketByLength(BUCKETING_BOUNDARIES, BUCKETING_SIZES),
)

sample_enc_source_batch, sample_dec_source_batch, sample_dec_target_batch = next(data_loader())
sample_enc_source_batch, sample_dec_source_batch, sample_dec_target_batch 

(array([[   59,    74, 11141, ...,     0,     0,     0],
        [  363,    64,   148, ...,     0,     0,     0],
        [  116,    24,  3128, ...,     2,     0,     0],
        ...,
        [ 9799,    56,   320, ...,     0,     0,     0],
        [   98,    16,   106, ...,     0,     0,     0],
        [   98,    16,   418, ...,     0,     0,     0]]),
 array([[    1,    74,   577, ...,     0,     0,     0],
        [    1,  2638,    23, ...,     0,     0,     0],
        [    1,   132,    38, ...,     5,     0,     0],
        ...,
        [    1, 11235,   160, ...,     0,     0,     0],
        [    1,    45,     4, ...,     0,     0,     0],
        [    1,   106,   520, ...,     0,     0,     0]]),
 array([[   74,   577,    16, ...,     0,     0,     0],
        [ 2638,    23,   450, ...,     0,     0,     0],
        [  132,    38,    44, ...,     2,     0,     0],
        ...,
        [11235,   160,   420, ...,     0,     0,     0],
        [   45,     4,    18, ...,     0,    

Let's also print the original text for comparison purposes:

In [16]:
from itertools import islice

for enc_src, dec_src, dec_trg in islice(zip(sample_enc_source_batch, sample_dec_source_batch, sample_dec_target_batch), 3):
    print(sp_source_model.decode(enc_src.tolist()))
    print(sp_target_model.decode(dec_src.tolist()))
    print(sp_target_model.decode(dec_trg.tolist()))
    print()

We had marginal profit -- I did. ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇ 
Nous avions un bénéfice marginal - je l'ai fait. ⁇  ⁇  ⁇ 
Nous avions un bénéfice marginal - je l'ai fait. ⁇  ⁇  ⁇ 

Let me take you on a little tour. ⁇  ⁇  ⁇  ⁇  ⁇  ⁇ 
Permettez-moi de vous en donner un petit aperçu. ⁇  ⁇  ⁇  ⁇ 
Permettez-moi de vous en donner un petit aperçu. ⁇  ⁇  ⁇  ⁇ 

They are neither feral nor myopically self-absorbed. ⁇  ⁇ 
Ils ne sont ni des sauvages ni des myopes égocentriques. ⁇  ⁇ 
Ils ne sont ni des sauvages ni des myopes égocentriques. ⁇  ⁇ 



# Encoding

Now, we need to build PyTorch modules for:

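- **Positional encoding**: Since a transformer processes all tokens in parallel, it needs an explicit signal for the position of each token in the sequence. The module below learns one embedding vector per position and adds it to the token embedding.
- **Embedding**: This is the usual embedding that converts a token id into an n-dimensional vector, here scaled by the square root of the embedding size.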

For more information, see Vaswani et al. 2017.

In [17]:
import math
import torch
from torch import nn

class PositionalEncoding(nn.Module):
    def __init__(self,
                 emb_size: int,
                 dropout: float,
                 maxlen: int):
        super(PositionalEncoding, self).__init__()
        self.pos_embedding = nn.Embedding(maxlen, emb_size)
        self.dropout = nn.Dropout(dropout)
        self.register_buffer('indices', torch.arange(0, maxlen))

    def forward(self, token_embedding):
        return self.dropout(token_embedding + self.pos_embedding(self.indices[:token_embedding.size(1)]))

class TokenEmbedding(nn.Module):
    def __init__(self, vocab_size: int, emb_size):
        super(TokenEmbedding, self).__init__()
        self.embedding = nn.Embedding(vocab_size, emb_size)
        self.emb_size = emb_size

    def forward(self, tokens):
        return self.embedding(tokens) * math.sqrt(self.emb_size)
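
The `PositionalEncoding` module above learns its position vectors through `nn.Embedding`. Vaswani et al. also describe a fixed sinusoidal encoding; the NumPy sketch below shows that alternative for comparison only. It is not what this notebook uses, the function name `sinusoidal_encoding` is hypothetical, and it assumes an even `emb_size`.

```python
import numpy as np


def sinusoidal_encoding(maxlen, emb_size):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(same angle)."""
    positions = np.arange(maxlen)[:, None]          # shape (maxlen, 1)
    even_dims = np.arange(0, emb_size, 2)[None, :]  # shape (1, emb_size / 2)
    angles = positions / np.power(10000.0, even_dims / emb_size)

    encoding = np.zeros((maxlen, emb_size))
    encoding[:, 0::2] = np.sin(angles)  # even dimensions
    encoding[:, 1::2] = np.cos(angles)  # odd dimensions
    return encoding


pe = sinusoidal_encoding(maxlen=88, emb_size=512)
print(pe.shape)   # (88, 512)
print(pe[0, :4])  # [0. 1. 0. 1.]
```

A fixed encoding has no parameters to train, whereas the learned variant used here adds `maxlen * emb_size` parameters and cannot represent positions beyond `maxlen`.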


# Transformer

Now we can define our transformer model.

In [18]:
from torch.nn import Transformer

class Seq2SeqTransformer(nn.Module):
    def __init__(self,
                 emb_size,
                 source_vocab_size,
                 target_vocab_size,
                 max_len,
                 nhead = 8,
                 num_encoder_layers = 6,
                 num_decoder_layers = 6,
                 dim_feedforward = 2048,
                 dropout = 0.1):
        super().__init__()
        self.source_token_emb = TokenEmbedding(source_vocab_size, emb_size)
        self.target_token_emb = TokenEmbedding(target_vocab_size, emb_size)
        self.positional_encoding = PositionalEncoding(emb_size, dropout=dropout, maxlen=max_len)
        self.transformer = Transformer(d_model=emb_size,
                                       nhead=nhead,
                                       num_encoder_layers=num_encoder_layers,
                                       num_decoder_layers=num_decoder_layers,
                                       dim_feedforward=dim_feedforward,
                                       dropout=dropout,
                                       batch_first=True)
        self.generator = nn.Linear(emb_size, target_vocab_size)

    def __create_source_masks(self, source_batch):
        source_seq_len = source_batch.shape[1]

        
        # Build the mask on the same device as the input batch instead of a hard-coded one.
        source_mask = torch.zeros((source_seq_len, source_seq_len), device=source_batch.device).type(torch.bool)
        source_padding_mask = (source_batch == PAD)

        return source_mask, source_padding_mask

    def __create_target_masks(self, target_batch):
        target_seq_len = target_batch.shape[1]

        # Move the mask to the same device as the input batch instead of a hard-coded one.
        target_mask = self.transformer.generate_square_subsequent_mask(target_seq_len).to(target_batch.device)
        target_padding_mask = (target_batch == PAD)

        return target_mask, target_padding_mask

    def forward(self, source_batch, target_batch):                
        source_mask, source_padding_mask = self.__create_source_masks(source_batch)
        target_mask, target_padding_mask = self.__create_target_masks(target_batch)

        source_emb = self.positional_encoding(self.source_token_emb(source_batch))
        target_emb = self.positional_encoding(self.target_token_emb(target_batch))

        outs = self.transformer.forward(
            source_emb, target_emb, # source and target
            source_mask, target_mask, None, # source, target, and memory masks
            source_padding_mask, target_padding_mask, source_padding_mask # source, target, and memory padding mask
        )
        return self.generator(outs)

    def encode(self, source_batch):
        source_mask, _ = self.__create_source_masks(source_batch)
        source_encoded = self.positional_encoding(self.source_token_emb(source_batch))
        return self.transformer.encoder(source_encoded, source_mask)

    def decode(self, target_batch, memory):
        target_mask, _ = self.__create_target_masks(target_batch)
        target_encoded = self.positional_encoding(self.target_token_emb(target_batch))
        return self.transformer.decoder(target_encoded, memory, target_mask)
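
The two kinds of masks built above can be pictured with a small NumPy sketch. It mirrors the conventions the model relies on, as far as I understand them: the causal mask is a float matrix with `-inf` above the diagonal (which is what `generate_square_subsequent_mask` produces), and the padding mask is a boolean matrix where `True` marks positions to ignore. The helper names and the token ids in `batch` are made up for illustration.

```python
import numpy as np

PAD = 0


def causal_mask(seq_len):
    """Float mask: 0.0 where attention is allowed, -inf above the diagonal."""
    mask = np.zeros((seq_len, seq_len))
    mask[np.triu_indices(seq_len, k=1)] = -np.inf
    return mask


def padding_mask(batch):
    """Boolean mask: True marks padding positions that attention should skip."""
    return batch == PAD


# Row t is the query position; it may only attend to columns 0..t.
print(causal_mask(3))

# Hypothetical batch of two padded decoder inputs.
batch = np.array([[1, 45, 7, 0, 0],
                  [1, 9, 0, 0, 0]])
print(padding_mask(batch))
```

The causal mask is shared by all samples in a batch, while the padding mask has one row per sample.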

Let's try it out:

In [19]:
seq2seq = Seq2SeqTransformer(EMBEDDING_DIM, SOURCE_VOCAB_SIZE, TARGET_VOCAB_SIZE, max_len=MAX_SENTENCE)
seq2seq.to(DEVICE)

Seq2SeqTransformer(
  (source_token_emb): TokenEmbedding(
    (embedding): Embedding(32000, 512)
  )
  (target_token_emb): TokenEmbedding(
    (embedding): Embedding(32000, 512)
  )
  (positional_encoding): PositionalEncoding(
    (pos_embedding): Embedding(88, 512)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (transformer): Transformer(
    (encoder): TransformerEncoder(
      (layers): ModuleList(
        (0): TransformerEncoderLayer(
          (self_attn): MultiheadAttention(
            (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True)
          )
          (linear1): Linear(in_features=512, out_features=2048, bias=True)
          (dropout): Dropout(p=0.1, inplace=False)
          (linear2): Linear(in_features=2048, out_features=512, bias=True)
          (norm1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
          (norm2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
          (dropout1): Dropout(p=0.1, inplace=Fals

In [20]:
from torch import tensor
test_source_batch = torch.randint(0, SOURCE_VOCAB_SIZE, (10, MAX_SENTENCE), device=DEVICE)
test_target_batch = torch.randint(0, TARGET_VOCAB_SIZE, (10, MAX_SENTENCE), device=DEVICE)
seq2seq(test_source_batch, test_target_batch).shape

torch.Size([10, 88, 32000])

Notice that the output is of shape `(batch_size, seq_len, target_vocab_size)`.

Let's also compute the total number of parameters in our transformer.

In [21]:
sum(p.numel() for p in seq2seq.parameters())

93369600
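
The total can be reproduced by hand from the layer shapes printed above. The sketch below is plain arithmetic with no PyTorch dependency; it assumes the standard layout of `nn.Transformer` layers (a packed Q/K/V projection plus an output projection per attention block, two linear layers in the feed-forward block, and a final layer norm after each stack).

```python
d_model, d_ff = 512, 2048
vocab, max_len, layers = 32000, 88, 6


def linear(n_in, n_out):
    return n_in * n_out + n_out  # weights + bias


layer_norm = 2 * d_model  # scale + shift
# Packed Q/K/V projection plus the output projection.
attention = linear(d_model, 3 * d_model) + linear(d_model, d_model)
feed_forward = linear(d_model, d_ff) + linear(d_ff, d_model)

encoder_layer = attention + feed_forward + 2 * layer_norm
decoder_layer = 2 * attention + feed_forward + 3 * layer_norm  # self- and cross-attention

counts = {
    'token embeddings (source + target)': 2 * vocab * d_model,
    'positional embedding': max_len * d_model,
    'encoder (6 layers + final norm)': layers * encoder_layer + layer_norm,
    'decoder (6 layers + final norm)': layers * decoder_layer + layer_norm,
    'generator': linear(d_model, vocab),
}
for name, count in counts.items():
    print(f'{name}: {count:,}')
print(f'total: {sum(counts.values()):,}')  # total: 93,369,600
```

More than half of the parameters sit in the two token embeddings and the output projection, all of which scale with the vocabulary size.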

# Beam Search-based Translation

Greedy decoding with an auto-regressive model can commit to a poor token early on, so [beam search](https://en.wikipedia.org/wiki/Beam_search) is typically used to keep several candidate translations alive at each step.

In [22]:
from torch.nn import functional as F

@torch.no_grad()
def translate_sentence_beam(self, tokenized_source, sp_target_model, max_output_len, max_candidates=4):
    class Node(object):
        def __init__(self, word, previous_node, probability):
            self.word = word
            self.probability = probability
            self.previous_node = previous_node
            self.len = 1 if previous_node is None else previous_node.len + 1

        def word_list(self):
            if self.previous_node is None:
                return [self.word]
            else:
                return self.previous_node.word_list() + [self.word]

        def sentence(self):
            return sp_target_model.decode(self.word_list())
        
        def total_probability(self):
            # `probability` holds a log-probability, so the score of a path is a sum.
            if self.previous_node is None:
                return self.probability
            else:
                return self.probability + self.previous_node.total_probability()

    memory = self.encode(tokenized_source)

    queue = [Node(BOS, None, 0.0)]
    candidates = []
    
    while queue:
        new_queue = []
        
        for node in queue:
            out = self.decode(
                tensor([node.word_list()]).to(DEVICE),
                memory,
            )
            predictions = F.log_softmax(self.generator(out), dim=-1)
            
            best_guesses_idxs = predictions.argsort(dim=2).flip(2)[0, -1, :max_candidates]
            best_guesses_probs = predictions[0, -1, best_guesses_idxs]
            
            # Convert to a Python list for easier manipulation.
            best_guesses_idxs = best_guesses_idxs.cpu().detach().numpy().tolist()
            best_guesses_probs = best_guesses_probs.cpu().detach().numpy().tolist()
            
            for word, prob in zip(best_guesses_idxs, best_guesses_probs):
                new_queue.append(Node(
                    word,
                    node,
                    prob,
                ))
        
        queue = []
        
        for node in sorted(new_queue, key=lambda n: n.total_probability(), reverse=True):
            if node.word == EOS or node.len >= max_output_len:
                candidates.append(node)
            else:
                queue.append(node)
                if len(queue) == max_candidates:
                    break
        
        
        if len(candidates) >= 2*max_candidates:
            # If we have many candidates already, we break even though there are
            # still some paths to try.
            break        

    best_candidate = sorted(candidates, key=lambda n: n.total_probability(), reverse=True)[0]
    
    return best_candidate.sentence()
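
To see why keeping several candidates helps, here is a simplified, self-contained beam search over a tiny hand-written "model". It is a sketch of the general technique, not a copy of `translate_sentence_beam`: `TOY_MODEL`, `next_log_probs`, and `beam_search` are hypothetical, and the probabilities are invented so that the greedy choice is a trap. As in the function above, a path is scored by the sum of its log-probabilities.

```python
import math

EOS = '</s>'

# Hypothetical next-token probabilities, keyed by the prefix generated so far.
TOY_MODEL = {
    (): {'a': 0.6, 'b': 0.4},
    ('a',): {'a': 0.4, 'b': 0.3, EOS: 0.3},
    ('b',): {'a': 0.05, 'b': 0.05, EOS: 0.9},
}


def next_log_probs(prefix):
    # Any prefix that is not in the table ends the sentence with certainty.
    probs = TOY_MODEL.get(prefix, {EOS: 1.0})
    return {token: math.log(p) for token, p in probs.items()}


def beam_search(beam_width, max_len=4):
    beams = [((), 0.0)]  # (prefix, summed log-probability)
    finished = []
    while beams:
        # Extend every live prefix by every possible next token.
        expanded = [
            (prefix + (token,), score + log_p)
            for prefix, score in beams
            for token, log_p in next_log_probs(prefix).items()
        ]
        expanded.sort(key=lambda item: item[1], reverse=True)
        beams = []
        # Keep only the best `beam_width` extensions.
        for prefix, score in expanded[:beam_width]:
            if prefix[-1] == EOS or len(prefix) >= max_len:
                finished.append((prefix, score))
            else:
                beams.append((prefix, score))
    return max(finished, key=lambda item: item[1])


for width in (1, 2):
    prefix, score = beam_search(width)
    print(width, prefix, round(math.exp(score), 3))
# 1 ('a', 'a', '</s>') 0.24
# 2 ('b', '</s>') 0.36
```

With a beam width of 1 the search is greedy and follows the locally best token `a`; with a width of 2 it keeps `b` alive long enough to find the more probable complete sentence.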


# Training!!

In [23]:
from itertools import islice, product

import torch
from torch.nn import CrossEntropyLoss
from torch.optim import Adam 
from tqdm.notebook import tqdm
from torch.optim.lr_scheduler import ReduceLROnPlateau, StepLR

MODEL_NAME = NOTEBOOK_NAME.replace('.ipynb', '_model.pt')

def save_model(model, optimizer, epoch, batch_idx):
    training_state = {
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'epoch': epoch,
        'batch_idx': batch_idx,
    }
    torch.save(training_state, data_filepath(f'{MODEL_NAME}.state'))

def load_model(model, optimizer):
    try:
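        # map_location lets a checkpoint saved on a GPU be restored on a CPU-only machine.
        training_state = torch.load(data_filepath(f'{MODEL_NAME}.state'), map_location=DEVICE)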

        model.load_state_dict(training_state['model_state_dict'])
        optimizer.load_state_dict(training_state['optimizer_state_dict'])

        return (training_state['epoch'], training_state['batch_idx'])
    except Exception as e:
        log.info(f"Couldn't find a saved model. {e}")
        return (None, None)


def train():
    # Initialize the model.
    seq2seq = Seq2SeqTransformer(
        EMBEDDING_DIM,
        SOURCE_VOCAB_SIZE,
        TARGET_VOCAB_SIZE,
        max_len=128,
        dropout=0.1
    )
    for p in seq2seq.parameters():
        if p.dim() > 1:
            nn.init.xavier_uniform_(p)
    seq2seq.to(DEVICE)

    loss_fn = CrossEntropyLoss(ignore_index=PAD)
    optimizer = Adam(seq2seq.parameters(), lr=0.0001)
    scheduler = ReduceLROnPlateau(optimizer, factor=0.1, threshold=0.005, patience=0)

    # Check if we have a saved model to resume training from.
    start_epoch, start_batch_idx = load_model(seq2seq, optimizer)
    if start_epoch is None:
        log.info("Couldn't find a saved model. Starting from scratch.")
    else:
        log.info(f"Found a saved model. Starting from epoch {start_epoch} and batch index {start_batch_idx}.")

    num_batches = sum(1 for _ in data_loader())
    save_model_every = num_batches // 5 # save the model 5 times in each epoch.

    log.info("During training, we will test with the following two sentences:")
    log.info(f"Source Statement: ")
    log.info(f"{sp_source_model.decode(sample_enc_source_batch[0].tolist())}")
    log.info(f"Target Statement: ")
    log.info(f"{sp_target_model.decode(sample_dec_target_batch[0].tolist())}")
    batch_counter = 0

    try:
        for epoch in tqdm(range(30)):
            total_loss = 0
            iter_count = 0

            if start_epoch is not None:
                # Resuming an interrupted training.
                if epoch < start_epoch:
                    continue
                else:
                    start_epoch = None

            seq2seq.train()
            log.info(f"Starting epoch {epoch + 1}")
            for batch_idx, (enc_src, dec_src, dec_trg) in tqdm(enumerate(islice(data_loader(), num_batches)), total=num_batches):
                # Resume training from the last index if there was an interrupted training.
                if start_batch_idx is not None:
                    if batch_idx < start_batch_idx:
                        continue
                    else:
                        start_batch_idx = None

                enc_src = tensor(enc_src).to(DEVICE)
                dec_src = tensor(dec_src).to(DEVICE)
                dec_trg = tensor(dec_trg).to(DEVICE)

                # Use the model to make predictions.
                # Notice that the decoder input (dec_src) is the target statement prefixed with BOS,
                # while the expected output (dec_trg) is the same statement terminated with EOS.
                # At each position the model therefore learns to predict the next token.
                predictions = seq2seq(enc_src, dec_src)

                # Calculate loss and update the parameters.
                optimizer.zero_grad()
                loss = loss_fn(
                    predictions.reshape(-1, predictions.shape[-1]),
                    dec_trg.reshape(-1)
                )
                loss.backward()
                total_loss += loss.item()
                iter_count += 1
                
                torch.nn.utils.clip_grad_norm_(seq2seq.parameters(), max_norm=1)
                
                optimizer.step()

                # See whether it is time to save.
                batch_counter += 1
                if batch_counter >= save_model_every:
                    log.info('Saving training state...')
                    save_model(seq2seq, optimizer, epoch, batch_idx)
                    batch_counter = 0
                    
                    # Try the model.
                    seq2seq.eval()

                    # Try the model after training.
                    translation = translate_sentence_beam(
                        seq2seq,
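                        # Slice to keep the batch dimension; building a tensor from a list of
                        # NumPy arrays is slow and triggers a warning.
                        tensor(sample_enc_source_batch[:1]).to(DEVICE),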
                        sp_target_model,
                        # we can use dec_src or dec_trg here, they are of the same length, just
                        # dec_src has BOS at the beginning and dec_trg has EOS at the end.
                        max_output_len=dec_trg.shape[1]
                    )

                    log.info(f"Sample translation at epoch {epoch+1} and index {batch_idx+1}.")
                    log.info(f"Predicted Statement: {translation}.")

                    seq2seq.train()
            
            avg_loss = total_loss / iter_count
            scheduler.step(avg_loss)
            log.info(f"Epoch {epoch + 1} finished. Average loss is {avg_loss}. Current learning rate is {optimizer.param_groups[0]['lr']}")
                        

    except KeyboardInterrupt:
        log.info("Training stopped because the user pressed Ctrl-C. Returning the model in its current state.")


    return seq2seq, optimizer

seq2seq, optimizer = train()




2023-02-11 17:20:20,898 INFO Couldn't find a saved model. [Errno 2] No such file or directory: './data/transformer-enfr-pytorch.state'
2023-02-11 17:20:20,898 INFO Couldn't find a saved model. Starting from scratch.
2023-02-11 17:20:31,149 INFO During training, we will test with the following two sentences:
2023-02-11 17:20:31,150 INFO Source Statement: 
2023-02-11 17:20:31,150 INFO We had marginal profit -- I did. ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇ 
2023-02-11 17:20:31,150 INFO Target Statement: 
2023-02-11 17:20:31,150 INFO Nous avions un bénéfice marginal - je l'ai fait. ⁇  ⁇  ⁇ 


  0%|          | 0/30 [00:00<?, ?it/s]

2023-02-11 17:20:31,158 INFO Starting epoch 1


  0%|          | 0/1259 [00:00<?, ?it/s]

2023-02-11 17:21:11,972 INFO Saving training state...
2023-02-11 17:21:12,939 INFO Sample translation at epoch 1 and index 251.
2023-02-11 17:21:12,940 INFO Predicted Statement: C'est pas..
2023-02-11 17:21:55,702 INFO Saving training state...
2023-02-11 17:21:56,997 INFO Sample translation at epoch 1 and index 502.
2023-02-11 17:21:56,997 INFO Predicted Statement: Et j'ai dit que j'ai dit, je pense que nous.
2023-02-11 17:22:37,516 INFO Saving training state...
2023-02-11 17:22:39,186 INFO Sample translation at epoch 1 and index 753.
2023-02-11 17:22:39,186 INFO Predicted Statement: Et j'ai commencé à l'intérieur de nous..
2023-02-11 17:23:19,814 INFO Saving training state...
2023-02-11 17:23:21,139 INFO Sample translation at epoch 1 and index 1004.
2023-02-11 17:23:21,140 INFO Predicted Statement: Nous avons commencé à dire que nous avons commencé à l'école..
2023-02-11 17:24:03,396 INFO Saving training state...
2023-02-11 17:24:04,6

  0%|          | 0/1259 [00:00<?, ?it/s]

2023-02-11 17:24:47,443 INFO Saving training state...
2023-02-11 17:24:48,678 INFO Sample translation at epoch 2 and index 247.
2023-02-11 17:24:48,678 INFO Predicted Statement: Nous avons commencé à faire, j'ai été en train d'être.
2023-02-11 17:25:29,295 INFO Saving training state...
2023-02-11 17:25:30,577 INFO Sample translation at epoch 2 and index 498.
2023-02-11 17:25:30,577 INFO Predicted Statement: Nous avons eu l'idée d'utiliser -- j'ai fait un.
2023-02-11 17:26:09,130 INFO Saving training state...
2023-02-11 17:26:10,441 INFO Sample translation at epoch 2 and index 749.
2023-02-11 17:26:10,441 INFO Predicted Statement: Nous avons eu l'intérieur -- j'ai fait l'intérieur de l'intérieur de l'intérieur de.
2023-02-11 17:26:49,194 INFO Saving training state...
2023-02-11 17:26:50,653 INFO Sample translation at epoch 2 and index 1000.
2023-02-11 17:26:50,654 INFO Predicted Statement: Nous avons eu l'avenir -- j'ai fait une série d'art..
2023-02-11 17:27:29,582 INFO Saving training

  0%|          | 0/1259 [00:00<?, ?it/s]

2023-02-11 17:28:09,932 INFO Saving training state...
2023-02-11 17:28:11,317 INFO Sample translation at epoch 3 and index 243.
2023-02-11 17:28:11,317 INFO Predicted Statement: Nous avons eu un plan -- j'ai fait un moyen d'utiliser l'intérieur..
2023-02-11 17:28:48,576 INFO Saving training state...
2023-02-11 17:28:49,837 INFO Sample translation at epoch 3 and index 494.
2023-02-11 17:28:49,837 INFO Predicted Statement: Nous avons eu un outil d'action -- j'ai fait un .
2023-02-11 17:29:29,244 INFO Saving training state...
2023-02-11 17:29:30,494 INFO Sample translation at epoch 3 and index 745.
2023-02-11 17:29:30,494 INFO Predicted Statement: Nous avions un moyen d'utiliser -- j'ai fait l'ensemble.
2023-02-11 17:30:09,960 INFO Saving training state...
2023-02-11 17:30:11,595 INFO Sample translation at epoch 3 and index 996.
2023-02-11 17:30:11,595 INFO Predicted Statement: Nous avons eu un moyen de faire -- j'ai fait l'ensemble de l'environnement..
2023-02-11 17:30:50,591 INFO Saving

  0%|          | 0/1259 [00:00<?, ?it/s]

2023-02-11 17:31:32,117 INFO Saving training state...
2023-02-11 17:31:33,474 INFO Sample translation at epoch 4 and index 239.
2023-02-11 17:31:33,474 INFO Predicted Statement: Nous avons eu un moyen de faire -- j'ai fait l'ensemble de l'objet de l'a.
2023-02-11 17:32:12,033 INFO Saving training state...
2023-02-11 17:32:13,288 INFO Sample translation at epoch 4 and index 490.
2023-02-11 17:32:13,289 INFO Predicted Statement: Nous avons eu un bénéfice -- j'ai fait l'ensemble de l.
2023-02-11 17:32:52,566 INFO Saving training state...
2023-02-11 17:32:53,906 INFO Sample translation at epoch 4 and index 741.
2023-02-11 17:32:53,906 INFO Predicted Statement: Nous avions un profit -- j'ai fait un oxydeage de l'ensemble de l'ensemble de l'ensemble..
2023-02-11 17:33:30,901 INFO Saving training state...
2023-02-11 17:33:32,515 INFO Sample translation at epoch 4 and index 992.
2023-02-11 17:33:32,516 INFO Predicted Statement: Nous avons eu un profit, j'ai fait une part de l'ensemble de l'ens

  0%|          | 0/1259 [00:00<?, ?it/s]

2023-02-11 17:34:52,812 INFO Saving training state...
2023-02-11 17:34:54,234 INFO Sample translation at epoch 5 and index 235.
2023-02-11 17:34:54,234 INFO Predicted Statement: Nous avons eu le profit -- j'ai fait un impact sur l'.
2023-02-11 17:35:32,765 INFO Saving training state...
2023-02-11 17:35:34,201 INFO Sample translation at epoch 5 and index 486.
2023-02-11 17:35:34,201 INFO Predicted Statement: Nous avons eu un bénéfice professionnel -- j'ai fait le point d'.
2023-02-11 17:36:12,104 INFO Saving training state...
2023-02-11 17:36:13,471 INFO Sample translation at epoch 5 and index 737.
2023-02-11 17:36:13,472 INFO Predicted Statement: Nous avons eu un bénéfice -- j'ai fait un élément de l'ensemble de notre vie..
2023-02-11 17:36:50,343 INFO Saving training state...
2023-02-11 17:36:51,627 INFO Sample translation at epoch 5 and index 988.
2023-02-11 17:36:51,628 INFO Predicted Statement: Nous avons eu un profit de profit -- J'ai fait de l'.
2023-02-11 17:37:30,184 INFO Savin

  0%|          | 0/1259 [00:00<?, ?it/s]

2023-02-11 17:38:09,027 INFO Saving training state...
2023-02-11 17:38:10,447 INFO Sample translation at epoch 6 and index 231.
2023-02-11 17:38:10,448 INFO Predicted Statement: Nous avons eu un profit de profit -- j'ai fait un système de.
2023-02-11 17:38:50,830 INFO Saving training state...
2023-02-11 17:38:52,174 INFO Sample translation at epoch 6 and index 482.
2023-02-11 17:38:52,175 INFO Predicted Statement: Nous avons eu un bénéfice. J'ai fait un jour de l'ensemble de l'architecture..
2023-02-11 17:39:29,214 INFO Saving training state...
2023-02-11 17:39:30,442 INFO Sample translation at epoch 6 and index 733.
2023-02-11 17:39:30,442 INFO Predicted Statement: Nous avons eu un bénéfice. J.
2023-02-11 17:40:10,577 INFO Saving training state...
2023-02-11 17:40:12,022 INFO Sample translation at epoch 6 and index 984.
2023-02-11 17:40:12,023 INFO Predicted Statement: Nous avons eu un bénéfice. J'ai fait un individu de l'.
2023-02-11 17:40:52,010 INFO Saving training state...
2023-02

  0%|          | 0/1259 [00:00<?, ?it/s]

2023-02-11 17:41:33,003 INFO Saving training state...
2023-02-11 17:41:34,307 INFO Sample translation at epoch 7 and index 227.
2023-02-11 17:41:34,308 INFO Predicted Statement: Nous avons eu un profit vicieux -- j'ai fait un rat.
2023-02-11 17:42:16,016 INFO Saving training state...
2023-02-11 17:42:17,437 INFO Sample translation at epoch 7 and index 478.
2023-02-11 17:42:17,437 INFO Predicted Statement: Il y a eu un bénéfice. J'ai fait de l'ensemble de l'ensemble de l'architecture..
2023-02-11 17:42:56,494 INFO Saving training state...
2023-02-11 17:42:57,721 INFO Sample translation at epoch 7 and index 729.
2023-02-11 17:42:57,722 INFO Predicted Statement: Nous avons eu un bénéfice en échange. J'ai fait un système d.
2023-02-11 17:43:34,291 INFO Saving training state...
2023-02-11 17:43:35,549 INFO Sample translation at epoch 7 and index 980.
2023-02-11 17:43:35,550 INFO Predicted Statement: Nous avions un bénéfice marginal -- j'ai fait un individu d'un.
2023-02-11 17:44:14,406 INFO

  0%|          | 0/1259 [00:00<?, ?it/s]

2023-02-11 17:44:55,255 INFO Saving training state...
2023-02-11 17:44:56,737 INFO Sample translation at epoch 8 and index 223.
2023-02-11 17:44:56,738 INFO Predicted Statement: Nous avons eu un profit marginal -- j'ai fait un système d'.
2023-02-11 17:45:35,081 INFO Saving training state...
2023-02-11 17:45:36,566 INFO Sample translation at epoch 8 and index 474.
2023-02-11 17:45:36,566 INFO Predicted Statement: Nous avons eu un bénéfice marginal - j'ai fait un système d'ensembles de l'ensemble..
2023-02-11 17:46:15,273 INFO Saving training state...
2023-02-11 17:46:16,819 INFO Sample translation at epoch 8 and index 725.
2023-02-11 17:46:16,820 INFO Predicted Statement: Nous avions un bénéfice marginal -- j'ai fait un système d'intelligence..
2023-02-11 17:46:53,912 INFO Saving training state...
2023-02-11 17:46:55,347 INFO Sample translation at epoch 8 and index 976.
2023-02-11 17:46:55,348 INFO Predicted Statement: Nous avions un bénéfice marginal -- J'ai fait un épinement de l'un 

  0%|          | 0/1259 [00:00<?, ?it/s]

2023-02-11 17:48:15,915 INFO Saving training state...
2023-02-11 17:48:17,568 INFO Sample translation at epoch 9 and index 219.
2023-02-11 17:48:17,569 INFO Predicted Statement: Nous avions un bénéfice marginal -- j'ai fait un centre de fonctionnement d'un an de l'intert.
2023-02-11 17:48:56,845 INFO Saving training state...
2023-02-11 17:48:58,243 INFO Sample translation at epoch 9 and index 470.
2023-02-11 17:48:58,244 INFO Predicted Statement: Nous avons eu un bénéfice marginal -- j'ai fait de l'ensemble de l'un de l'a.
2023-02-11 17:49:37,403 INFO Saving training state...
2023-02-11 17:49:39,281 INFO Sample translation at epoch 9 and index 721.
2023-02-11 17:49:39,281 INFO Predicted Statement: Nous avions un bénéfice marginal -- j'ai fait un système d'intertilna..
2023-02-11 17:50:19,400 INFO Saving training state...
2023-02-11 17:50:20,641 INFO Sample translation at epoch 9 and index 972.
2023-02-11 17:50:20,641 INFO Predicted Statement: Nous avons eu un bénéfice marginal -- j'ai 

  0%|          | 0/1259 [00:00<?, ?it/s]

2023-02-11 17:51:38,260 INFO Saving training state...
2023-02-11 17:51:39,593 INFO Sample translation at epoch 10 and index 215.
2023-02-11 17:51:39,594 INFO Predicted Statement: Nous avions un profit marginal -- j'ai fait d'un fonctionnement de l'intertataire d'.
2023-02-11 17:52:17,654 INFO Saving training state...
2023-02-11 17:52:19,296 INFO Sample translation at epoch 10 and index 466.
2023-02-11 17:52:19,297 INFO Predicted Statement: Nous avions un bénéfice marginal - je l'ai fait de l'un de l'est..
2023-02-11 17:52:57,687 INFO Saving training state...
2023-02-11 17:52:58,971 INFO Sample translation at epoch 10 and index 717.
2023-02-11 17:52:58,971 INFO Predicted Statement: Nous avions un bénéfice marginal -- j'ai fait un système d'intelligence.
2023-02-11 17:53:36,164 INFO Saving training state...
2023-02-11 17:53:37,884 INFO Sample translation at epoch 10 and index 968.
2023-02-11 17:53:37,885 INFO Predicted Statement: Nous avions un bénéfice marginal -- je l'ai fait de l'un d

  0%|          | 0/1259 [00:00<?, ?it/s]

2023-02-11 17:54:52,286 INFO Saving training state...
2023-02-11 17:54:53,842 INFO Sample translation at epoch 11 and index 211.
2023-02-11 17:54:53,843 INFO Predicted Statement: Nous avions un profit marginal -- j'ai fait du fonctionnement de l'un de l'a transformé en l'allit d'un individu..
2023-02-11 17:55:30,372 INFO Saving training state...
2023-02-11 17:55:31,777 INFO Sample translation at epoch 11 and index 462.
2023-02-11 17:55:31,777 INFO Predicted Statement: Nous avions un bénéfice marginal -- je l'ai fait d'un centre de l'allit..
2023-02-11 17:56:08,083 INFO Saving training state...
2023-02-11 17:56:09,421 INFO Sample translation at epoch 11 and index 713.
2023-02-11 17:56:09,421 INFO Predicted Statement: Nous avons eu un bénéfice marginal -- j'ai fait un système d'intertacht..
2023-02-11 17:56:45,652 INFO Saving training state...
2023-02-11 17:56:47,446 INFO Sample translation at epoch 11 and index 964.
2023-02-11 17:56:47,447 INFO Predicted Statement: Nous avons eu un béné

  0%|          | 0/1259 [00:00<?, ?it/s]

2023-02-11 17:58:01,992 INFO Saving training state...
2023-02-11 17:58:03,394 INFO Sample translation at epoch 12 and index 207.
2023-02-11 17:58:03,394 INFO Predicted Statement: Nous avions un bénéfice marginal - j'ai fait un fonctionnement de l'évolution d'un individu à l'autre..
2023-02-11 17:58:39,906 INFO Saving training state...
2023-02-11 17:58:41,225 INFO Sample translation at epoch 12 and index 458.
2023-02-11 17:58:41,225 INFO Predicted Statement: Nous avions un bénéfice marginal -- j'ai fait un fonctionnement de l'amotalisme d'un individu.
2023-02-11 17:59:16,288 INFO Saving training state...
2023-02-11 17:59:17,501 INFO Sample translation at epoch 12 and index 709.
2023-02-11 17:59:17,502 INFO Predicted Statement: Nous avions un bénéfice marginal -- j'ai fait un système d'inter.
2023-02-11 17:59:51,097 INFO Saving training state...
2023-02-11 17:59:52,320 INFO Sample translation at epoch 12 and index 960.
2023-02-11 17:59:52,321 INFO Predicted Statement: Nous avions un béné

  0%|          | 0/1259 [00:00<?, ?it/s]

2023-02-11 18:01:01,833 INFO Saving training state...
2023-02-11 18:01:03,334 INFO Sample translation at epoch 13 and index 203.
2023-02-11 18:01:03,334 INFO Predicted Statement: Nous avions un profit marginal -- j'ai fait de l'un de l'at-il de l'aduction..
2023-02-11 18:01:37,310 INFO Saving training state...
2023-02-11 18:01:38,734 INFO Sample translation at epoch 13 and index 454.
2023-02-11 18:01:38,734 INFO Predicted Statement: Nous avions un bénéfice marginal -- j'ai fait un damot d'amotalisme qui a conduit à l'autre..
2023-02-11 18:02:12,553 INFO Saving training state...
2023-02-11 18:02:13,740 INFO Sample translation at epoch 13 and index 705.
2023-02-11 18:02:13,741 INFO Predicted Statement: Nous avions un bénéfice marginal -- j'ai fait un système qui s'.
2023-02-11 18:02:47,447 INFO Saving training state...
2023-02-11 18:02:48,807 INFO Sample translation at epoch 13 and index 956.
2023-02-11 18:02:48,807 INFO Predicted Statement: Nous avions un bénéfice marginal -- je l'ai fa

  0%|          | 0/1259 [00:00<?, ?it/s]

2023-02-11 18:03:58,334 INFO Saving training state...
2023-02-11 18:03:59,834 INFO Sample translation at epoch 14 and index 199.
2023-02-11 18:03:59,834 INFO Predicted Statement: Nous avions un bénéfice marginal - j'ai fait de l'un d'entre nous d'entre nous qui a conduit à l'est..
2023-02-11 18:04:33,759 INFO Saving training state...
2023-02-11 18:04:35,107 INFO Sample translation at epoch 14 and index 450.
2023-02-11 18:04:35,107 INFO Predicted Statement: Nous avions un bénéfice marginal - j'ai fait un dysfonctionnement d'amotalisme qui gouverne..
2023-02-11 18:05:08,968 INFO Saving training state...
2023-02-11 18:05:10,386 INFO Sample translation at epoch 14 and index 701.
2023-02-11 18:05:10,386 INFO Predicted Statement: Nous avions un bénéfice marginal - j'ai fait un stilt.
2023-02-11 18:05:44,146 INFO Saving training state...
2023-02-11 18:05:45,860 INFO Sample translation at epoch 14 and index 952.
2023-02-11 18:05:45,860 INFO Predicted Statement: Nous avions un bénéfice marginal

  0%|          | 0/1259 [00:00<?, ?it/s]

2023-02-11 18:06:55,229 INFO Saving training state...
2023-02-11 18:06:56,808 INFO Sample translation at epoch 15 and index 195.
2023-02-11 18:06:56,808 INFO Predicted Statement: Nous avons eu un bénéfice marginal - j'ai fait de l'actest d'amotomé d'une façon qui a.
2023-02-11 18:07:30,723 INFO Saving training state...
2023-02-11 18:07:32,120 INFO Sample translation at epoch 15 and index 446.
2023-02-11 18:07:32,120 INFO Predicted Statement: Nous avions un bénéfice marginal -- j'ai fait de l'amo.
2023-02-11 18:08:05,977 INFO Saving training state...
2023-02-11 18:08:07,494 INFO Sample translation at epoch 15 and index 697.
2023-02-11 18:08:07,495 INFO Predicted Statement: Nous avions un bénéfice marginal -- j'en ai fait l'automot influe génétique..
2023-02-11 18:08:41,149 INFO Saving training state...
2023-02-11 18:08:42,959 INFO Sample translation at epoch 15 and index 948.
2023-02-11 18:08:42,959 INFO Predicted Statement: Nous avions un bénéfice marginal -- j'ai fait d'un système d'o

  0%|          | 0/1259 [00:00<?, ?it/s]

2023-02-11 18:09:53,258 INFO Saving training state...
2023-02-11 18:09:54,925 INFO Sample translation at epoch 16 and index 191.
2023-02-11 18:09:54,926 INFO Predicted Statement: Nous avions un bénéfice marginal - j'ai fait un moteur d'inspiration pour l'atome d'intertétaché..
2023-02-11 18:10:28,851 INFO Saving training state...
2023-02-11 18:10:30,213 INFO Sample translation at epoch 16 and index 442.
2023-02-11 18:10:30,214 INFO Predicted Statement: Nous avions un bénéfice marginal - j'ai fait un moteur de l'amot neurologique..
2023-02-11 18:11:04,056 INFO Saving training state...
2023-02-11 18:11:05,541 INFO Sample translation at epoch 16 and index 693.
2023-02-11 18:11:05,541 INFO Predicted Statement: Nous avions un bénéfice marginal -- j'en ai fait un dommotom d'un pour l'autre..
2023-02-11 18:11:39,157 INFO Saving training state...
2023-02-11 18:11:40,522 INFO Sample translation at epoch 16 and index 944.
2023-02-11 18:11:40,522 INFO Predicted Statement: Nous avions un bénéfice 

  0%|          | 0/1259 [00:00<?, ?it/s]

2023-02-11 18:12:50,002 INFO Saving training state...
2023-02-11 18:12:51,282 INFO Sample translation at epoch 17 and index 187.
2023-02-11 18:12:51,282 INFO Predicted Statement: Nous avions un bénéfice marginal - je l'ai fait d'un moteur à l'autre..
2023-02-11 18:13:25,210 INFO Saving training state...
2023-02-11 18:13:26,529 INFO Sample translation at epoch 17 and index 438.
2023-02-11 18:13:26,530 INFO Predicted Statement: Nous avions un bénéfice marginal - j'ai fait un suivi de l'amotomotomoteur de.
2023-02-11 18:14:00,369 INFO Saving training state...
2023-02-11 18:14:01,662 INFO Sample translation at epoch 17 and index 689.
2023-02-11 18:14:01,663 INFO Predicted Statement: Nous avions un bénéfice marginal -- j'ai fait l'automoto diriger d'une génération d'auto.
2023-02-11 18:14:35,296 INFO Saving training state...
2023-02-11 18:14:36,630 INFO Sample translation at epoch 17 and index 940.
2023-02-11 18:14:36,631 INFO Predicted Statement: Nous avions un bénéfice marginal -- je l'ai

  0%|          | 0/1259 [00:00<?, ?it/s]

2023-02-11 18:15:46,540 INFO Saving training state...
2023-02-11 18:15:48,043 INFO Sample translation at epoch 18 and index 183.
2023-02-11 18:15:48,043 INFO Predicted Statement: Nous avions un bénéfice marginal -- je l'ai fait d'un moteur à l'autre..
2023-02-11 18:16:22,061 INFO Saving training state...
2023-02-11 18:16:23,449 INFO Sample translation at epoch 18 and index 434.
2023-02-11 18:16:23,450 INFO Predicted Statement: Nous avions un bénéfice marginal - j'ai fait un moteur de l'amotométrie..
2023-02-11 18:16:57,297 INFO Saving training state...
2023-02-11 18:16:58,616 INFO Sample translation at epoch 18 and index 685.
2023-02-11 18:16:58,616 INFO Predicted Statement: Nous avions un bénéfice marginal -- je l'ai fait d'un guide d'automote d'autre..
2023-02-11 18:17:32,292 INFO Saving training state...
2023-02-11 18:17:34,098 INFO Sample translation at epoch 18 and index 936.
2023-02-11 18:17:34,098 INFO Predicted Statement: Nous avions un bénéfice marginal -- je l'ai fait d'un di

  0%|          | 0/1259 [00:00<?, ?it/s]

2023-02-11 18:18:44,487 INFO Saving training state...
2023-02-11 18:18:46,706 INFO Sample translation at epoch 19 and index 179.
2023-02-11 18:18:46,707 INFO Predicted Statement: Nous avions un bénéfice marginal -- je l'ai fait d'un moteur à l'écart de notre personnalité..
2023-02-11 18:19:20,616 INFO Saving training state...
2023-02-11 18:19:21,911 INFO Sample translation at epoch 19 and index 430.
2023-02-11 18:19:21,912 INFO Predicted Statement: Nous avions un bénéfice marginal - je l'ai fait d'un guide d'automoteur et d'.
2023-02-11 18:19:55,723 INFO Saving training state...
2023-02-11 18:19:57,017 INFO Sample translation at epoch 19 and index 681.
2023-02-11 18:19:57,018 INFO Predicted Statement: Nous avions un bénéfice marginal -- j'ai fait l'automotance d'une génération d'un.
2023-02-11 18:20:30,822 INFO Saving training state...
2023-02-11 18:20:32,333 INFO Sample translation at epoch 19 and index 932.
2023-02-11 18:20:32,333 INFO Predicted Statement: Nous avions un bénéfice mar

  0%|          | 0/1259 [00:00<?, ?it/s]

2023-02-11 18:21:41,516 INFO Saving training state...
2023-02-11 18:21:43,395 INFO Sample translation at epoch 20 and index 175.
2023-02-11 18:21:43,396 INFO Predicted Statement: Nous avions un bénéfice marginal - je réussis d'automoteur moteur de gouvernance et d'automotomépendé..
2023-02-11 18:22:17,254 INFO Saving training state...
2023-02-11 18:22:18,693 INFO Sample translation at epoch 20 and index 426.
2023-02-11 18:22:18,694 INFO Predicted Statement: Nous avions un bénéfice marginal, je l'ai fait module d'a.
2023-02-11 18:22:52,627 INFO Saving training state...
2023-02-11 18:22:53,874 INFO Sample translation at epoch 20 and index 677.
2023-02-11 18:22:53,874 INFO Predicted Statement: Nous avions un bénéfice marginal -- j'ai fait un bourgeon de l'ommotom Sanmotom.
2023-02-11 18:23:27,669 INFO Saving training state...
2023-02-11 18:23:29,401 INFO Sample translation at epoch 20 and index 928.
2023-02-11 18:23:29,401 INFO Predicted Statement: Nous avions un bénéfice marginal -- je l

  0%|          | 0/1259 [00:00<?, ?it/s]

2023-02-11 18:24:38,681 INFO Saving training state...
2023-02-11 18:24:40,050 INFO Sample translation at epoch 21 and index 171.
2023-02-11 18:24:40,050 INFO Predicted Statement: Nous avions un bénéfice marginal - j'ai fait un moteur d'autonat d'un moteur à caractère moteur..
2023-02-11 18:25:13,963 INFO Saving training state...
2023-02-11 18:25:15,353 INFO Sample translation at epoch 21 and index 422.
2023-02-11 18:25:15,354 INFO Predicted Statement: Nous avions un bénéfice marginal - je l'ai fait fonctionner grâce à l.
2023-02-11 18:25:51,832 INFO Saving training state...
2023-02-11 18:25:53,272 INFO Sample translation at epoch 21 and index 673.
2023-02-11 18:25:53,273 INFO Predicted Statement: Nous avions un bénéfice marginal -- je l'ai fait Sante Central Imaginez notre fantastique édifice..
2023-02-11 18:26:31,240 INFO Saving training state...
2023-02-11 18:26:32,524 INFO Sample translation at epoch 21 and index 924.
2023-02-11 18:26:32,525 INFO Predicted Statement: Nous avions un 

  0%|          | 0/1259 [00:00<?, ?it/s]

2023-02-11 18:27:50,212 INFO Saving training state...
2023-02-11 18:27:52,118 INFO Sample translation at epoch 22 and index 167.
2023-02-11 18:27:52,118 INFO Predicted Statement: Nous avions un bénéfice marginal - je l'ai conduit moteur à notre gouvernance..
2023-02-11 18:28:29,823 INFO Saving training state...
2023-02-11 18:28:31,207 INFO Sample translation at epoch 22 and index 418.
2023-02-11 18:28:31,207 INFO Predicted Statement: Nous avions un bénéfice marginal -- j'ai fait de l'Agence d'automotateur d'.
2023-02-11 18:29:09,457 INFO Saving training state...
2023-02-11 18:29:10,828 INFO Sample translation at epoch 22 and index 669.
2023-02-11 18:29:10,828 INFO Predicted Statement: Nous avions un bénéfice marginal -- j'ai conduit l'élagage d'un bourgeon d'un produit spina.
2023-02-11 18:29:48,257 INFO Saving training state...
2023-02-11 18:29:49,556 INFO Sample translation at epoch 22 and index 920.
2023-02-11 18:29:49,556 INFO Predicted Statement: Nous avions un bénéfice marginal -

  0%|          | 0/1259 [00:00<?, ?it/s]

2023-02-11 18:31:06,797 INFO Saving training state...
2023-02-11 18:31:08,261 INFO Sample translation at epoch 23 and index 163.
2023-02-11 18:31:08,261 INFO Predicted Statement: Nous avions un bénéfice marginal - je m'a conduit d'un moteur à l'automotode..
2023-02-11 18:31:51,385 INFO Saving training state...
2023-02-11 18:31:53,194 INFO Sample translation at epoch 23 and index 414.
2023-02-11 18:31:53,195 INFO Predicted Statement: Nous avions un bénéfice marginal - je m'a conduit à diriger notre spin d'un spina moteur..
2023-02-11 18:32:33,922 INFO Saving training state...
2023-02-11 18:32:35,385 INFO Sample translation at epoch 23 and index 665.
2023-02-11 18:32:35,385 INFO Predicted Statement: Nous avions un bénéfice marginal -- j'ai fait un produit d'automoteur spécial d'un réacteur à.
2023-02-11 18:33:18,695 INFO Saving training state...
2023-02-11 18:33:20,706 INFO Sample translation at epoch 23 and index 916.
2023-02-11 18:33:20,706 INFO Predicted Statement: Nous avions un béné

  0%|          | 0/1259 [00:00<?, ?it/s]

2023-02-11 18:35:13,374 INFO Saving training state...
2023-02-11 18:35:15,955 INFO Sample translation at epoch 24 and index 159.
2023-02-11 18:35:15,955 INFO Predicted Statement: Nous avions un bénéfice marginal - je l'ai conduit spin d'un guide spina spinum..
2023-02-11 18:36:17,788 INFO Saving training state...
2023-02-11 18:36:19,562 INFO Sample translation at epoch 24 and index 410.
2023-02-11 18:36:19,562 INFO Predicted Statement: Nous avions un bénéfice marginal - je l'ai fait module Sanmoteur Sanmotherum génétiquement..
2023-02-11 18:37:21,113 INFO Saving training state...
2023-02-11 18:37:23,188 INFO Sample translation at epoch 24 and index 661.
2023-02-11 18:37:23,189 INFO Predicted Statement: Nous avions un bénéfice marginal - je l'ai fait influe génétique artificiel..
2023-02-11 18:38:25,238 INFO Saving training state...
2023-02-11 18:38:26,768 INFO Sample translation at epoch 24 and index 912.
2023-02-11 18:38:26,769 INFO Predicted Statement: Nous avions un bénéfice margina

  0%|          | 0/1259 [00:00<?, ?it/s]

2023-02-11 18:40:33,255 INFO Saving training state...
2023-02-11 18:40:34,868 INFO Sample translation at epoch 25 and index 155.
2023-02-11 18:40:34,869 INFO Predicted Statement: Nous avions un bénéfice marginal - je l’ai fait s’est transformé d’un guide d’automo.
2023-02-11 18:41:37,126 INFO Saving training state...
2023-02-11 18:41:38,591 INFO Sample translation at epoch 25 and index 406.
2023-02-11 18:41:38,591 INFO Predicted Statement: Nous avions un bénéfice marginal - je l'ai fait catalyseur de l'.
2023-02-11 18:42:40,779 INFO Saving training state...
2023-02-11 18:42:43,509 INFO Sample translation at epoch 25 and index 657.
2023-02-11 18:42:43,509 INFO Predicted Statement: Nous avions un bénéfice marginal - je l'ai fait idéal. Développer Central Centralfield..
2023-02-11 18:43:45,666 INFO Saving training state...
2023-02-11 18:43:47,220 INFO Sample translation at epoch 25 and index 908.
2023-02-11 18:43:47,221 INFO Predicted Statement: Nous avions un bénéfice marginal -- je l'ai

  0%|          | 0/1259 [00:00<?, ?it/s]

2023-02-11 18:45:50,788 INFO Saving training state...
2023-02-11 18:45:53,191 INFO Sample translation at epoch 26 and index 151.
2023-02-11 18:45:53,191 INFO Predicted Statement: Nous avions un bénéfice marginal - je l'ai fait fonctionner spina idéal pour diriger Centralment..
2023-02-11 18:46:39,151 INFO Saving training state...
2023-02-11 18:46:40,494 INFO Sample translation at epoch 26 and index 402.
2023-02-11 18:46:40,495 INFO Predicted Statement: Nous avions un bénéfice marginal - je l'ai dirigé spin d'un.
2023-02-11 18:47:23,421 INFO Saving training state...
2023-02-11 18:47:24,902 INFO Sample translation at epoch 26 and index 653.
2023-02-11 18:47:24,903 INFO Predicted Statement: Nous avions un bénéfice marginal - je l'ai fait module d'un génome Sanmotome..
2023-02-11 18:48:08,218 INFO Saving training state...
2023-02-11 18:48:09,587 INFO Sample translation at epoch 26 and index 904.
2023-02-11 18:48:09,587 INFO Predicted Statement: Nous avions un bénéfice marginal - je.
2023-0

  0%|          | 0/1259 [00:00<?, ?it/s]

2023-02-11 18:49:34,635 INFO Saving training state...
2023-02-11 18:49:36,051 INFO Sample translation at epoch 27 and index 147.
2023-02-11 18:49:36,051 INFO Predicted Statement: Nous avions un bénéfice marginal - je l'ai fait spin d'un guide spina conduit d'un moteur.
2023-02-11 18:50:17,857 INFO Saving training state...
2023-02-11 18:50:19,181 INFO Sample translation at epoch 27 and index 398.
2023-02-11 18:50:19,182 INFO Predicted Statement: Nous avions un bénéfice marginal - je l'ai fait spin d'un.
2023-02-11 18:51:07,709 INFO Saving training state...
2023-02-11 18:51:09,707 INFO Sample translation at epoch 27 and index 649.
2023-02-11 18:51:09,707 INFO Predicted Statement: Nous avions un bénéfice marginal - je l'ai conduit d'un guide d'automoteur spécial..
2023-02-11 18:52:00,121 INFO Saving training state...
2023-02-11 18:52:01,866 INFO Sample translation at epoch 27 and index 900.
2023-02-11 18:52:01,866 INFO Predicted Statement: Nous avions un bénéfice marginal - je l'ai fait d

  0%|          | 0/1259 [00:00<?, ?it/s]

2023-02-11 18:53:45,526 INFO Saving training state...
2023-02-11 18:53:46,961 INFO Sample translation at epoch 28 and index 143.
2023-02-11 18:53:46,961 INFO Predicted Statement: Nous avions un bénéfice marginal - je l'ai fait conduit d'un guide spina géologique..
2023-02-11 18:54:37,216 INFO Saving training state...
2023-02-11 18:54:38,828 INFO Sample translation at epoch 28 and index 394.
2023-02-11 18:54:38,829 INFO Predicted Statement: Nous avions un bénéfice marginal - je l'ai fait d'un guide.
2023-02-11 18:55:28,522 INFO Saving training state...
2023-02-11 18:55:30,178 INFO Sample translation at epoch 28 and index 645.
2023-02-11 18:55:30,179 INFO Predicted Statement: Nous avions un bénéfice marginal - je l'ai fait d'un guide.
2023-02-11 18:56:20,476 INFO Saving training state...
2023-02-11 18:56:22,224 INFO Sample translation at epoch 28 and index 896.
2023-02-11 18:56:22,225 INFO Predicted Statement: Nous avions un bénéfice marginal - je l'ai fait guide idéal pour Central Centr

  0%|          | 0/1259 [00:00<?, ?it/s]

2023-02-11 18:58:05,267 INFO Saving training state...
2023-02-11 18:58:07,012 INFO Sample translation at epoch 29 and index 139.
2023-02-11 18:58:07,013 INFO Predicted Statement: Nous avions un bénéfice marginal - je l'ai fait conduit d'un guide spina Central..
2023-02-11 18:58:56,718 INFO Saving training state...
2023-02-11 18:58:59,225 INFO Sample translation at epoch 29 and index 390.
2023-02-11 18:58:59,225 INFO Predicted Statement: Nous avions un bénéfice marginal - je l'ai fait de conduit d'un guide spina..
2023-02-11 18:59:50,342 INFO Saving training state...
2023-02-11 18:59:52,018 INFO Sample translation at epoch 29 and index 641.
2023-02-11 18:59:52,019 INFO Predicted Statement: Nous avions un bénéfice marginal - je l'ai fait. Développer Développ.
2023-02-11 19:00:43,016 INFO Saving training state...
2023-02-11 19:00:45,088 INFO Sample translation at epoch 29 and index 892.
2023-02-11 19:00:45,088 INFO Predicted Statement: Nous avions un bénéfice marginal - je l'ai fait artif

  0%|          | 0/1259 [00:00<?, ?it/s]

2023-02-11 19:02:30,717 INFO Saving training state...
2023-02-11 19:02:32,795 INFO Sample translation at epoch 30 and index 135.
2023-02-11 19:02:32,795 INFO Predicted Statement: Nous avions un bénéfice marginal - je l'ai fait d'un guide damoteur spécial d'allium..
2023-02-11 19:03:25,791 INFO Saving training state...
2023-02-11 19:03:27,618 INFO Sample translation at epoch 30 and index 386.
2023-02-11 19:03:27,619 INFO Predicted Statement: Nous avions un bénéfice marginal - je l'ai fait d'un guide d'amoteur guide spécial..
2023-02-11 19:04:19,469 INFO Saving training state...
2023-02-11 19:04:20,973 INFO Sample translation at epoch 30 and index 637.
2023-02-11 19:04:20,974 INFO Predicted Statement: Nous avions un bénéfice marginal - je.
2023-02-11 19:05:13,623 INFO Saving training state...
2023-02-11 19:05:15,482 INFO Sample translation at epoch 30 and index 888.
2023-02-11 19:05:15,482 INFO Predicted Statement: Nous avions un bénéfice marginal - je l'ai fait module spin d'un guide gé

# Testing the Model

In [24]:
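import torch
from torch import tensor

# Inference only: switch layers such as dropout to evaluation mode once, outside the loop.
seq2seq.eval()

for sentence in [
    'How are you?',
]:
    # Encode to token ids, append EOS, and wrap in a list to get a batch of size 1.
    t = tensor([sp_source_model.encode(sentence) + [EOS]])
    # Alternative input for debugging: reuse the first sentence of a sample batch.
    #t = tensor([sample_enc_source_batch[0].tolist()])
    with torch.no_grad():  # gradients are not needed while decoding
        trans = translate_sentence_beam(seq2seq,
            t.to(DEVICE),
            sp_target_model,
            MAX_SENTENCE,
            4
        )
    print(trans)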

Comment es-tu ?
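
To see what a beam width controls (assuming the `4` passed to `translate_sentence_beam` above is the beam width), here is a minimal, self-contained sketch of beam search over a toy next-token table. It is not the notebook's `translate_sentence_beam`: `beam_search` and `toy_log_probs` are hypothetical names, the probabilities are made up for illustration, and the sketch omits refinements such as length normalization. Only the `BOS` and `EOS` ids mirror the constants defined at the top of the notebook.

```python
import math

BOS, EOS = 1, 2  # same ids as the notebook's constants


def toy_log_probs(prefix):
    """Hypothetical stand-in for a decoder: next-token log-probabilities for a prefix."""
    if len(prefix) == 1:
        probs = {3: 0.6, 4: 0.4}
    elif len(prefix) == 2 and prefix[-1] == 3:
        probs = {EOS: 0.2, 3: 0.45, 4: 0.35}
    elif len(prefix) == 2:
        probs = {EOS: 0.9, 3: 0.05, 4: 0.05}
    else:
        probs = {EOS: 1.0}
    return {tok: math.log(p) for tok, p in probs.items()}


def beam_search(step_log_probs, bos, eos, beam_width, max_len):
    """Return the best-scoring token sequence found with the given beam width."""
    beams = [([bos], 0.0)]  # (tokens, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = []
        for tokens, score in beams:
            for tok, lp in step_log_probs(tokens).items():
                candidates.append((tokens + [tok], score + lp))
        # Keep only the `beam_width` best expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == eos:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    # Fall back to unfinished hypotheses if none reached EOS within max_len steps.
    pool = finished or beams
    return max(pool, key=lambda c: c[1])[0]


greedy = beam_search(toy_log_probs, BOS, EOS, beam_width=1, max_len=10)
wide = beam_search(toy_log_probs, BOS, EOS, beam_width=2, max_len=10)
print(greedy)  # [1, 3, 3, 2]
print(wide)    # [1, 4, 2]
```

With a width of 1 the search is greedy: it commits to token 3 because it looks best at the first step, and ends with a total probability of 0.27. A width of 2 keeps token 4 alive long enough to find the sequence with probability 0.36. A wider beam trades decoding time for a better chance of finding a higher-scoring translation.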
