

# Reference notebook

Name: Patrick de Carvalho Tavares Rezende Ferreira

## Instructions

In this Colab we will train a T5 model to translate from English to Portuguese, using the Paracrawl dataset.

- We will use the English-Portuguese Paracrawl dataset. The training set is truncated to only 100k pairs to speed up training. Anyone who wishes can train on more samples. If training takes too long, truncate the dataset further.

- We will use BLEU as the metric, computed with SacreBLEU because it always applies the same preprocessing (tokenization, lowercasing), which keeps scores comparable. We will not use torchnlp.metrics.bleu, torchtext.data.metrics.bleu_score, etc. SacreBLEU is slow, so use few validation samples (e.g., 5k).
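To make the metric more concrete, the sketch below computes the clipped n-gram precision that BLEU is built on. It is a simplified illustration and not SacreBLEU itself: the whitespace tokenizer and the helper names `ngrams` and `clipped_ngram_precision` are assumptions made for this example, and the brevity penalty and the geometric mean over n-gram orders are left out.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) found in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_ngram_precision(hypothesis, reference, n=1):
    """Fraction of hypothesis n-grams that also occur in the reference.

    Each n-gram is credited at most as many times as it appears in the
    reference (the "clipping" used by BLEU). Tokenization is a plain
    whitespace split, which is an assumption of this sketch.
    """
    hyp_counts = Counter(ngrams(hypothesis.split(), n))
    ref_counts = Counter(ngrams(reference.split(), n))
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return matched / total


hypothesis = "eu gosto de pizza pizza"
reference = "eu gosto de pizza"
print(clipped_ngram_precision(hypothesis, reference, n=1))  # 4 of 5 unigrams match
print(clipped_ngram_precision(hypothesis, reference, n=2))  # 3 of 4 bigrams match
```

The repeated word "pizza" is only credited once because the reference contains it once, which is why repeating correct words does not inflate the score.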


We will use the PTT5 model available on the HuggingFace model hub:

https://huggingface.co/unicamp-dl/ptt5-small-portuguese-vocab

This is a T5 model pre-trained on Portuguese text, with a Portuguese tokenizer.

It is advisable to save the model weights and the optimizer state, since training takes a long time.
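One possible way to store both the weights and the optimizer state is sketched below. The helper names `save_checkpoint` and `load_checkpoint`, the file name, and the tiny `nn.Linear` stand-in model are assumptions made so the example runs without downloading PTT5; the same pattern should apply to any `torch.nn.Module` and optimizer.

```python
import os
import tempfile

import torch
from torch import nn


def save_checkpoint(path, model, optimizer, step):
    """Store model weights, optimizer state and the step counter in one file."""
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "step": step,
        },
        path,
    )


def load_checkpoint(path, model, optimizer):
    """Restore both states in place and return the saved step counter."""
    checkpoint = torch.load(path, map_location="cpu")
    model.load_state_dict(checkpoint["model"])
    optimizer.load_state_dict(checkpoint["optimizer"])
    return checkpoint["step"]


# Tiny stand-in model so the sketch runs without downloading PTT5.
model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)

checkpoint_path = os.path.join(tempfile.mkdtemp(), "checkpoint.pt")
save_checkpoint(checkpoint_path, model, optimizer, step=200)

# A fresh model/optimizer pair picks up where the first one stopped.
restored_model = nn.Linear(4, 2)
restored_optimizer = torch.optim.Adam(restored_model.parameters(), lr=5e-4)
restored_step = load_checkpoint(checkpoint_path, restored_model, restored_optimizer)
print(restored_step)
```

Saving state dictionaries rather than the whole model object keeps the file independent of the exact class definition, at the cost of having to rebuild the model before loading.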


In [1]:
# General settings
from queue import Queue
from threading import Thread

import numpy as np
from torch import nn
from tqdm import tqdm

model_name = "unicamp-dl/ptt5-small-portuguese-vocab"
batch_size = 64
accumulate_grad_batches = 2
source_max_length = 128
target_max_length = 128
learning_rate = 5e-4
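The arithmetic implied by these settings can be sketched as follows. The value `max_examples = 500_000` is taken from the truncation size used further down in this notebook. Note that the training loop shown later appears to update the optimizer on every mini-batch, so `accumulate_grad_batches` may have no effect as written; the numbers below describe what accumulation would give if it were applied.

```python
import math

batch_size = 64
accumulate_grad_batches = 2
max_examples = 500_000  # number of training pairs used later in this notebook

# Examples contributing to one optimizer update when gradients are accumulated.
effective_batch_size = batch_size * accumulate_grad_batches

# Mini-batches needed to see every example once, and the resulting updates.
mini_batches = math.ceil(max_examples / batch_size)
optimizer_updates = math.ceil(mini_batches / accumulate_grad_batches)

print(effective_batch_size, mini_batches, optimizer_updates)
```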

In [2]:
! pip install sacrebleu
! pip install transformers
! pip install sentencepiece



In [3]:
# Import all packages in one place to avoid duplicate imports throughout the notebook.
import gzip
import os
import random
import sacrebleu
import torch
import torch.nn.functional as F

# from google.colab import drive

from transformers import T5ForConditionalGeneration
from transformers import T5Tokenizer
from torch.utils.data import DataLoader
from torch.utils.data import Dataset

from typing import Dict
from typing import List
from typing import Tuple

In [4]:
# Important: Fix seeds so we can replicate results
seed = 123
random.seed(seed)
torch.random.manual_seed(seed)
torch.cuda.manual_seed(seed)

if torch.cuda.is_available():
    dev = "cuda:0"
else:
    dev = "cpu"
device = torch.device(dev)
print('Using {}'.format(device))

Using cuda:0


We will save the checkpoints (model weights) to Google Drive so that training can be resumed from where it stopped.

In [5]:
# drive.mount('/content/drive')

## Preparing the Data

First, we download the dataset:

In [6]:
! wget -nc https://storage.googleapis.com/unicamp-dl/ia024a_2022s2/paracrawl_enpt_train.tsv.gz
! wget -nc https://storage.googleapis.com/unicamp-dl/ia024a_2022s2/paracrawl_enpt_test.tsv.gz

File ‘paracrawl_enpt_train.tsv.gz’ already there; not retrieving.

File ‘paracrawl_enpt_test.tsv.gz’ already there; not retrieving.



## Loading the dataset

We artificially create a training split (500k pairs in this run) and a validation split (5k pairs).

Note: Avoid looking at the test set as much as possible, so as not to become biased toward what will be tested. In real applications, the test data only becomes available in the future, that is, when users start testing your product.
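The shuffle-and-slice split described above can be sketched as a small reusable function. The name `split_train_val`, the size check, and the use of a local seeded `random.Random` are assumptions of this sketch; the notebook's own cell shuffles in place with the global RNG.

```python
import random


def split_train_val(pairs, train_size, val_size, seed=123):
    """Shuffle a copy of `pairs` and cut disjoint train and validation slices."""
    if train_size + val_size > len(pairs):
        raise ValueError("not enough pairs for the requested split sizes")
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # local RNG, global state untouched
    train = shuffled[:train_size]
    val = shuffled[train_size:train_size + val_size]
    return train, val


# Toy data standing in for the (source, target) pairs of the real dataset.
toy_pairs = [(f"source {i}", f"target {i}") for i in range(10)]
train_pairs, val_pairs = split_train_val(toy_pairs, train_size=7, val_size=2)
print(len(train_pairs), len(val_pairs))
```

Because the validation slice starts where the training slice ends, no pair can appear in both splits.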


In [7]:
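def load_text_pairs(path):
    """Read a gzipped TSV file and return a list of [source, target] pairs."""
    text_pairs = []
    # Use a context manager so the file is closed, and decode explicitly as UTF-8.
    with gzip.open(path, mode='rt', encoding='utf-8') as f:
        for line in f:
            text_pairs.append(line.strip().split('\t'))
    return text_pairs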

x_train = load_text_pairs('paracrawl_enpt_train.tsv.gz')
x_test = load_text_pairs('paracrawl_enpt_test.tsv.gz')

# Shuffle the training data before making the train/val split.
random.shuffle(x_train)

# Truncate the dataset to 500k training pairs and take the next 5k pairs for validation.
truncate_size = 500000
x_val = x_train[truncate_size:truncate_size + 5000]
x_train = x_train[:truncate_size]

for set_name, x in [('treino', x_train), ('validação', x_val), ('test', x_test)]:
    print(f'\n{len(x)} amostras de {set_name}')
    print(f'3 primeiras amostras {set_name}:')
    for i, (source, target) in enumerate(x[:3]):
        print(f'{i}: source: {source}\n   target: {target}')


500000 amostras de treino
3 primeiras amostras treino:
0: source: More Croatian words and phrases
   target: Mais palavras e frases em croata
1: source: Jerseys and pullovers, containing at least 50Â % by weight of wool and weighing 600Â g or more per article 6110 11 10 (PCE)
   target: Camisolas e pulôveres, com pelo menos 50 %, em peso, de lã e pesando 600g ou mais por unidade 6110 11 10 (PCE)
2: source: Atex Colombia SAS makes available its lead product, 100% natural liquid latex, excellent quality and price. ... Welding manizales caldas Colombia a DuckDuckGo
   target: Atex Colômbia SAS torna principal produto está disponível, látex líquido 100% natural, excelente qualidade e preço. ...

5000 amostras de validação
3 primeiras amostras validação:
0: source: Cum on face and Fisting watch online
   target: Gozada no rosto e Fisting assistir online
1: source: Cylinders in Huila, Colombia
   target: Cilindros em Cesar, Colômbia
2: source: Brooms and brushes in Santa Rita (Chalatenango,

## Creating the Dataset


In [8]:
tokenizer = T5Tokenizer.from_pretrained(model_name)

In [9]:
class MyDataset(Dataset):
    def __init__(self, text_pairs: List[Tuple[str, str]], tokenizer,
                 source_max_length: int = 32, target_max_length: int = 32):
        self.tokenizer = tokenizer
        self.text_pairs = text_pairs
        self.source_max_length = source_max_length
        self.target_max_length = target_max_length

        sources, targets = list(zip(*text_pairs))

        # Tokenize everything once up front. padding=True pads to the longest
        # sequence in the whole set; truncation caps the length at max_length.
        self.sources_tokenizadas = tokenizer(sources, padding=True, truncation=True, max_length=self.source_max_length, return_tensors="pt")
        # Targets are truncated with target_max_length (not source_max_length).
        self.targets_tokenizadas = tokenizer(targets, padding=True, truncation=True, max_length=self.target_max_length, return_tensors="pt")

    def __len__(self):
        return len(self.text_pairs)

    def __getitem__(self, idx):
        source, target = self.text_pairs[idx]

        # Look up the pre-tokenized ids and attention masks for this pair.
        source_token_ids = self.sources_tokenizadas.input_ids[idx]
        source_mask = self.sources_tokenizadas.attention_mask[idx]
        target_token_ids = self.targets_tokenizadas.input_ids[idx]
        target_mask = self.targets_tokenizadas.attention_mask[idx]

        return source_token_ids, source_mask, target_token_ids, target_mask, source, target
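Since `MyDataset` pads the target sequences, the padded positions end up in the labels that are later passed to the model. A possible refinement, which this notebook does not apply, is to replace padded label positions with `-100`, the value that the Hugging Face Transformers loss ignores. The helper name `mask_padding_in_labels` and the toy tensors below are assumptions made for illustration.

```python
import torch


def mask_padding_in_labels(target_token_ids, target_mask, ignore_index=-100):
    """Return a copy of the labels with padded positions set to `ignore_index`."""
    return target_token_ids.masked_fill(target_mask == 0, ignore_index)


# Two toy target sequences; the second one is padded with two zeros.
target_token_ids = torch.tensor([[2077, 6618, 4, 1241, 7531, 1],
                                 [2077, 1241, 7531, 1, 0, 0]])
target_mask = torch.tensor([[1, 1, 1, 1, 1, 1],
                            [1, 1, 1, 1, 0, 0]])

labels = mask_padding_in_labels(target_token_ids, target_mask)
print(labels)
```

With labels prepared this way, the loss would be averaged over real tokens only, so loss values would not be directly comparable with the ones logged in this notebook.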

## Testing the DataLoader

In [10]:
text_pairs = [('we like pizza', 'eu gosto de pizza')]
dataset_debug = MyDataset(
    text_pairs=text_pairs,
    tokenizer=tokenizer,
    source_max_length=source_max_length,
    target_max_length=target_max_length)

dataloader_debug = DataLoader(dataset_debug, batch_size=10, shuffle=True, 
                              num_workers=0)

source_token_ids, source_mask, target_token_ids, target_mask, _, _ = next(iter(dataloader_debug))
print('source_token_ids:\n', source_token_ids)
print('source_mask:\n', source_mask)
print('target_token_ids:\n', target_token_ids)
print('target_mask:\n', target_mask)

print('source_token_ids.shape:', source_token_ids.shape)
print('source_mask.shape:', source_mask.shape)
print('target_token_ids.shape:', target_token_ids.shape)
print('target_mask.shape:', target_mask.shape)

source_token_ids:
 tensor([[  31, 1528, 1079,  634, 1241, 7531,    1]])
source_mask:
 tensor([[1, 1, 1, 1, 1, 1, 1]])
target_token_ids:
 tensor([[2077, 6618,    4, 1241, 7531,    1]])
target_mask:
 tensor([[1, 1, 1, 1, 1, 1]])
source_token_ids.shape: torch.Size([1, 7])
source_mask.shape: torch.Size([1, 7])
target_token_ids.shape: torch.Size([1, 6])
target_mask.shape: torch.Size([1, 6])


## Creating the Train/Val/Test DataLoaders

In [11]:
dataset_train = MyDataset(text_pairs=x_train,
                          tokenizer=tokenizer,
                          source_max_length=source_max_length,
                          target_max_length=target_max_length)

dataset_val = MyDataset(text_pairs=x_val,
                        tokenizer=tokenizer,
                        source_max_length=source_max_length,
                        target_max_length=target_max_length)

dataset_test = MyDataset(text_pairs=x_test,
                         tokenizer=tokenizer,
                         source_max_length=source_max_length,
                         target_max_length=target_max_length)

train_dataloader = DataLoader(dataset_train, batch_size=batch_size,
                              shuffle=True, num_workers=4)

val_dataloader = DataLoader(dataset_val, batch_size=batch_size, shuffle=False, 
                            num_workers=4)

test_dataloader = DataLoader(dataset_test, batch_size=batch_size,
                             shuffle=False, num_workers=4)
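The datasets above are padded once to the longest sequence in each whole split. An alternative, sketched here only as an illustration, is to pad each batch to its own longest sequence, which could be wired in through the `collate_fn` argument of `DataLoader`. The helper name `pad_batch` is hypothetical, and `pad_id=0` is an assumption about the tokenizer's padding id.

```python
def pad_batch(sequences, pad_id=0):
    """Pad token-id lists to the longest sequence in this batch only.

    Returns the padded ids and a matching attention mask
    (1 for real tokens, 0 for padding).
    """
    longest = max(len(seq) for seq in sequences)
    padded, masks = [], []
    for seq in sequences:
        n_pad = longest - len(seq)
        padded.append(list(seq) + [pad_id] * n_pad)
        masks.append([1] * len(seq) + [0] * n_pad)
    return padded, masks


batch = [[31, 1528, 1079, 1], [2077, 1]]
ids, mask = pad_batch(batch)
print(ids)   # [[31, 1528, 1079, 1], [2077, 1, 0, 0]]
print(mask)  # [[1, 1, 1, 1], [1, 1, 0, 0]]
```

Short batches then carry fewer padding tokens, which may reduce the compute spent per step.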

### Utility to move data between devices in parallel
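The notebook does not show the utility itself. A hypothetical sketch of the idea, using the `Queue` and `Thread` classes imported in the first cell, is a generator that prepares the next items in a background thread while the consumer works on the current one. The name `prefetch` and its parameters are assumptions, and the `transform` argument stands in for whatever per-batch work (such as a device transfer) is needed.

```python
from queue import Queue
from threading import Thread

_END = object()  # sentinel marking the end of the stream


def prefetch(iterable, transform=lambda item: item, max_prefetch=2):
    """Yield `transform(item)` for each item, computed in a background thread."""
    queue = Queue(maxsize=max_prefetch)

    def worker():
        try:
            for item in iterable:
                queue.put(transform(item))
        finally:
            queue.put(_END)  # always unblock the consumer

    Thread(target=worker, daemon=True).start()

    while True:
        item = queue.get()
        if item is _END:
            return
        yield item


# Stand-in for "move the batch to the GPU": here we only double each number.
print(list(prefetch(range(5), transform=lambda x: x * 2)))  # [0, 2, 4, 6, 8]
```

The bounded queue keeps at most `max_prefetch` prepared items in memory. In this sketch an exception inside `transform` ends the stream early instead of being re-raised in the consumer, which a production version would want to handle.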


### Training

In [12]:
model = T5ForConditionalGeneration.from_pretrained(model_name).to(device)


In [13]:
from torch.optim.lr_scheduler import ReduceLROnPlateau
from torch.cuda.amp import GradScaler, autocast

max_examples = truncate_size
eval_every_steps = 200
lr = learning_rate
use_amp = True

# DataManager(train_dataloader, device=device, data_type=None)
# DataManager(val_dataloader, device=device, data_type=None)

train_loader = train_dataloader
validation_loader = val_dataloader


optimizer = torch.optim.Adam(model.parameters(), lr=lr)
scheduler = ReduceLROnPlateau(optimizer, 'min', factor=0.9, min_lr=3e-5, patience=200, threshold=1e-2, verbose=True)
scaler=GradScaler()

accumulated_grad_batches_until_now = 0

def train_step(source_tokens, source_mask, target_tokens, target_mask, original_source, original_target):
    model.train()
    model.zero_grad()
    with autocast(enabled=use_amp):
        loss = model(input_ids = source_tokens.to(device), attention_mask = source_mask.to(device), decoder_attention_mask = target_mask.to(device), labels = target_tokens.to(device)).loss
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()

    return loss.item()


def validation_step(source_tokens, source_mask, target_tokens, target_mask, original_source, original_target):
    model.eval()
    with torch.no_grad():
        with autocast(enabled=use_amp):
            loss = model(input_ids = source_tokens.to(device), attention_mask = source_mask.to(device), decoder_attention_mask = target_mask.to(device), labels = target_tokens.to(device)).loss
    return loss.item()


best_validation_loss = 9999
train_losses = []
n_examples = 0
step = 0
pbar = tqdm(total=max_examples)
while n_examples < max_examples:
    for mini_batch in train_dataloader:
        loss = train_step(*mini_batch)
        train_losses.append(loss)

        # LR scheduler
        scheduler.step(loss)

        if step % eval_every_steps == 0:
            train_loss = np.average(train_losses)

            with torch.no_grad():
                valid_loss = np.average([
                    validation_step(*mini_batch)
                    for mini_batch in val_dataloader])
                # Checkpoint to best models found.
                if best_validation_loss > valid_loss:
                    # Update the best validation loss found so far.
                    best_validation_loss = valid_loss
                    model.eval()
                    torch.save(model, "best_model.pth")

            print(f'{step} steps; {n_examples} examples so far; train loss: {train_loss:.2f}, valid loss: {valid_loss:.2f}')
            train_losses = []

        n_examples += mini_batch[0].shape[0]  # Increment of batch size
        step += 1
        pbar.update(mini_batch[0].shape[0])
        if n_examples >= max_examples:
            break

pbar.close()

# Restore best model (checkpoint) found
model = torch.load("best_model.pth")

  0%|          | 64/500000 [00:10<23:11:57,  5.99it/s]

0 steps; 0 examples so far; train loss: 23.53, valid loss: 30.86


  3%|▎         | 12864/500000 [01:29<6:34:16, 20.59it/s]

200 steps; 12800 examples so far; train loss: 1.38, valid loss: 0.83


  5%|▌         | 25664/500000 [02:47<6:39:06, 19.81it/s] 

400 steps; 25600 examples so far; train loss: 0.87, valid loss: 0.75


  8%|▊         | 38464/500000 [04:08<6:35:57, 19.43it/s] 

600 steps; 38400 examples so far; train loss: 0.81, valid loss: 0.71


  8%|▊         | 40064/500000 [04:17<42:39, 179.71it/s]  

Epoch 00626: reducing learning rate of group 0 to 4.5000e-04.


 10%|█         | 51264/500000 [05:29<6:23:12, 19.52it/s]

800 steps; 51200 examples so far; train loss: 0.77, valid loss: 0.67


 13%|█▎        | 64064/500000 [06:52<6:14:23, 19.41it/s] 

1000 steps; 64000 examples so far; train loss: 0.74, valid loss: 0.65


 15%|█▌        | 76864/500000 [08:13<5:51:17, 20.08it/s] 

1200 steps; 76800 examples so far; train loss: 0.72, valid loss: 0.62


 15%|█▌        | 76928/500000 [08:14<4:17:50, 27.35it/s]

Epoch 01202: reducing learning rate of group 0 to 4.0500e-04.


 18%|█▊        | 89664/500000 [09:33<5:35:15, 20.40it/s] 

1400 steps; 89600 examples so far; train loss: 0.70, valid loss: 0.61


 20%|██        | 101632/500000 [10:37<35:25, 187.44it/s] 

Epoch 01588: reducing learning rate of group 0 to 3.6450e-04.


 20%|██        | 102464/500000 [10:51<5:21:00, 20.64it/s]

1600 steps; 102400 examples so far; train loss: 0.68, valid loss: 0.59


 23%|██▎       | 114496/500000 [11:55<34:46, 184.76it/s] 

Epoch 01789: reducing learning rate of group 0 to 3.2805e-04.


 23%|██▎       | 115264/500000 [12:09<5:10:16, 20.67it/s]

1800 steps; 115200 examples so far; train loss: 0.66, valid loss: 0.58


 26%|██▌       | 128064/500000 [13:27<5:02:59, 20.46it/s]

2000 steps; 128000 examples so far; train loss: 0.66, valid loss: 0.57


 27%|██▋       | 132928/500000 [13:53<32:37, 187.50it/s] 

Epoch 02077: reducing learning rate of group 0 to 2.9525e-04.


 28%|██▊       | 140864/500000 [14:45<4:50:44, 20.59it/s]

2200 steps; 140800 examples so far; train loss: 0.65, valid loss: 0.56


 31%|███       | 153280/500000 [15:52<31:19, 184.44it/s] 

Epoch 02395: reducing learning rate of group 0 to 2.6572e-04.


 31%|███       | 153664/500000 [16:04<4:42:11, 20.45it/s]

2400 steps; 153600 examples so far; train loss: 0.64, valid loss: 0.55


 33%|███▎      | 166464/500000 [17:22<4:30:12, 20.57it/s]

2600 steps; 166400 examples so far; train loss: 0.63, valid loss: 0.54


 36%|███▌      | 179264/500000 [18:40<4:20:32, 20.52it/s]

2800 steps; 179200 examples so far; train loss: 0.62, valid loss: 0.53


 38%|███▊      | 188608/500000 [19:31<28:17, 183.46it/s] 

Epoch 02947: reducing learning rate of group 0 to 2.3915e-04.


 38%|███▊      | 192064/500000 [19:59<4:11:19, 20.42it/s]

3000 steps; 192000 examples so far; train loss: 0.60, valid loss: 0.53


 40%|████      | 201472/500000 [20:50<26:46, 185.77it/s] 

Epoch 03148: reducing learning rate of group 0 to 2.1523e-04.


 41%|████      | 204864/500000 [21:18<4:08:37, 19.78it/s]

3200 steps; 204800 examples so far; train loss: 0.60, valid loss: 0.52


 44%|████▎     | 217664/500000 [22:38<3:49:45, 20.48it/s]

3400 steps; 217600 examples so far; train loss: 0.60, valid loss: 0.52


 45%|████▍     | 224256/500000 [23:15<25:42, 178.76it/s] 

Epoch 03504: reducing learning rate of group 0 to 1.9371e-04.


 46%|████▌     | 230464/500000 [23:59<3:44:18, 20.03it/s]

3600 steps; 230400 examples so far; train loss: 0.59, valid loss: 0.51


 48%|████▊     | 240640/500000 [24:54<23:13, 186.09it/s] 

Epoch 03760: reducing learning rate of group 0 to 1.7434e-04.


 49%|████▊     | 243264/500000 [25:18<3:27:59, 20.57it/s]

3800 steps; 243200 examples so far; train loss: 0.58, valid loss: 0.51


 51%|█████     | 256064/500000 [26:36<3:22:06, 20.12it/s]

4000 steps; 256000 examples so far; train loss: 0.58, valid loss: 0.50


 52%|█████▏    | 260352/500000 [26:59<21:30, 185.73it/s] 

Epoch 04068: reducing learning rate of group 0 to 1.5691e-04.


 54%|█████▍    | 268864/500000 [27:55<3:07:25, 20.55it/s]

4200 steps; 268800 examples so far; train loss: 0.58, valid loss: 0.50


 55%|█████▍    | 273216/500000 [28:18<20:38, 183.05it/s] 

Epoch 04269: reducing learning rate of group 0 to 1.4121e-04.


 56%|█████▋    | 281664/500000 [29:14<3:00:35, 20.15it/s]

4400 steps; 281600 examples so far; train loss: 0.59, valid loss: 0.50


 57%|█████▋    | 286080/500000 [29:38<19:08, 186.31it/s] 

Epoch 04470: reducing learning rate of group 0 to 1.2709e-04.


 59%|█████▉    | 294464/500000 [30:33<2:47:04, 20.50it/s]

4600 steps; 294400 examples so far; train loss: 0.58, valid loss: 0.50


 60%|█████▉    | 298944/500000 [30:57<17:57, 186.68it/s] 

Epoch 04671: reducing learning rate of group 0 to 1.1438e-04.


 61%|██████▏   | 307264/500000 [31:51<2:39:10, 20.18it/s]

4800 steps; 307200 examples so far; train loss: 0.57, valid loss: 0.49


 62%|██████▏   | 311808/500000 [32:16<16:44, 187.34it/s] 

Epoch 04872: reducing learning rate of group 0 to 1.0295e-04.


 64%|██████▍   | 320064/500000 [33:09<2:26:39, 20.45it/s]

5000 steps; 320000 examples so far; train loss: 0.57, valid loss: 0.49


 66%|██████▌   | 327552/500000 [33:50<15:45, 182.33it/s] 

Epoch 05118: reducing learning rate of group 0 to 9.2651e-05.


 67%|██████▋   | 332864/500000 [34:28<2:16:47, 20.36it/s]

5200 steps; 332800 examples so far; train loss: 0.57, valid loss: 0.49


 68%|██████▊   | 340416/500000 [35:09<14:24, 184.65it/s] 

Epoch 05319: reducing learning rate of group 0 to 8.3386e-05.


 69%|██████▉   | 345664/500000 [35:46<2:07:10, 20.23it/s]

5400 steps; 345600 examples so far; train loss: 0.57, valid loss: 0.49


 72%|███████▏  | 358464/500000 [37:05<1:55:51, 20.36it/s]

5600 steps; 358400 examples so far; train loss: 0.56, valid loss: 0.49


 73%|███████▎  | 365184/500000 [37:41<12:37, 178.05it/s] 

Epoch 05706: reducing learning rate of group 0 to 7.5047e-05.


 74%|███████▍  | 371264/500000 [38:24<1:47:58, 19.87it/s]

5800 steps; 371200 examples so far; train loss: 0.56, valid loss: 0.48


 76%|███████▌  | 378048/500000 [39:01<10:56, 185.78it/s] 

Epoch 05907: reducing learning rate of group 0 to 6.7543e-05.


 77%|███████▋  | 384064/500000 [39:42<1:34:10, 20.52it/s]

6000 steps; 384000 examples so far; train loss: 0.55, valid loss: 0.48


 78%|███████▊  | 390912/500000 [40:19<09:43, 186.99it/s] 

Epoch 06108: reducing learning rate of group 0 to 6.0788e-05.


 79%|███████▉  | 396864/500000 [41:00<1:24:04, 20.44it/s]

6200 steps; 396800 examples so far; train loss: 0.57, valid loss: 0.48


 81%|████████  | 403776/500000 [41:38<08:41, 184.49it/s] 

Epoch 06309: reducing learning rate of group 0 to 5.4709e-05.


 82%|████████▏ | 409664/500000 [42:19<1:13:58, 20.35it/s]

6400 steps; 409600 examples so far; train loss: 0.56, valid loss: 0.48


 83%|████████▎ | 416640/500000 [42:57<07:26, 186.57it/s] 

Epoch 06510: reducing learning rate of group 0 to 4.9239e-05.


 84%|████████▍ | 422464/500000 [43:38<1:03:22, 20.39it/s]

6600 steps; 422400 examples so far; train loss: 0.55, valid loss: 0.48


 87%|████████▋ | 435264/500000 [44:57<53:43, 20.08it/s]  

6800 steps; 435200 examples so far; train loss: 0.56, valid loss: 0.48


 87%|████████▋ | 435968/500000 [45:01<06:39, 160.30it/s]

Epoch 06812: reducing learning rate of group 0 to 4.4315e-05.


 90%|████████▉ | 448064/500000 [46:15<39:18, 22.02it/s] 

7000 steps; 448000 examples so far; train loss: 0.55, valid loss: 0.48


 90%|████████▉ | 448832/500000 [46:20<05:15, 162.11it/s]

Epoch 07013: reducing learning rate of group 0 to 3.9883e-05.


 92%|█████████▏| 460864/500000 [47:35<32:13, 20.25it/s] 

7200 steps; 460800 examples so far; train loss: 0.56, valid loss: 0.48


 92%|█████████▏| 461696/500000 [47:40<03:50, 166.34it/s]

Epoch 07214: reducing learning rate of group 0 to 3.5895e-05.


 95%|█████████▍| 473664/500000 [48:55<21:46, 20.15it/s] 

7400 steps; 473600 examples so far; train loss: 0.55, valid loss: 0.48


 95%|█████████▍| 474560/500000 [48:59<02:23, 176.91it/s]

Epoch 07415: reducing learning rate of group 0 to 3.2305e-05.


 97%|█████████▋| 486464/500000 [50:13<10:59, 20.52it/s] 

7600 steps; 486400 examples so far; train loss: 0.56, valid loss: 0.48


 97%|█████████▋| 487424/500000 [50:18<01:12, 172.28it/s]

Epoch 07616: reducing learning rate of group 0 to 3.0000e-05.


100%|█████████▉| 499264/500000 [51:32<00:35, 20.60it/s] 

7800 steps; 499200 examples so far; train loss: 0.55, valid loss: 0.48


100%|██████████| 500000/500000 [51:36<00:00, 161.49it/s]
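The learning rates printed in the log follow directly from the scheduler settings (`factor=0.9`, `min_lr=3e-5`, starting at `5e-4`). The sketch below reproduces that arithmetic; the helper name `lr_after` is an assumption. Because the scheduler is stepped once per mini-batch with the training loss, the "Epoch" numbers in the log most likely count scheduler calls rather than passes over the data.

```python
import math

initial_lr = 5e-4
factor = 0.9
min_lr = 3e-5


def lr_after(n_reductions):
    """Learning rate after `n_reductions` plateau events, clamped at min_lr."""
    return max(initial_lr * factor ** n_reductions, min_lr)


# Smallest number of reductions that brings the rate down to the floor.
reductions_to_floor = math.ceil(math.log(min_lr / initial_lr) / math.log(factor))

print(f"{lr_after(1):.4e}")   # 4.5000e-04
print(f"{lr_after(15):.4e}")  # 1.0295e-04
print(reductions_to_floor)    # 27
```

This matches the log above, where the 27th reduction message is the first one to report the floor value of 3.0000e-05.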


### Evaluating the BLEU score

In [14]:
!pip install torchmetrics

# Evaluation routine inspired by Bruno da Silvia's notebook
from torchmetrics import SacreBLEUScore

def evaluate_bleu_score(model, target_dataloader):
    model.eval()

    pred_translations, targets = [], []

    for i, batch in tqdm(enumerate(target_dataloader), total=len(target_dataloader)):
        inputs = batch[0]
        inputs_mask = batch[1]
        targets += [[i] for i in batch[-1]]

        with torch.no_grad():
            model_output = model.generate(input_ids=inputs.to(device), attention_mask=inputs_mask.to(device), max_length=target_max_length)
            pred_translations += tokenizer.batch_decode(model_output, skip_special_tokens=True)

    metric = SacreBLEUScore()
    return metric(pred_translations, targets)
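The function above builds `targets` as one list of references per prediction, which is the layout `SacreBLEUScore` from torchmetrics takes. The `sacrebleu` package's own `corpus_bleu` function, as far as I know, takes the transposed layout: one list per reference set, each aligned with the predictions. The hypothetical helper `to_reference_streams` below sketches the conversion between the two, assuming every sentence has the same number of references.

```python
def to_reference_streams(per_sentence_refs):
    """Transpose [[ref_a, ref_b], ...] into [[ref_a, ...], [ref_b, ...]].

    Assumes every sentence has the same number of references.
    """
    return [list(stream) for stream in zip(*per_sentence_refs)]


predictions = ["eu gosto de pizza", "bom dia"]
# Layout built by evaluate_bleu_score: one list of references per prediction.
targets = [["eu gosto de pizza"], ["bom dia a todos"]]

streams = to_reference_streams(targets)
print(streams)  # [['eu gosto de pizza', 'bom dia a todos']]
```

Mixing up the two layouts tends to fail silently or produce meaningless scores, so it is worth checking that the number of references lines up with the number of predictions before scoring.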



In [15]:
# Restore best model (checkpoint) found
model = torch.load("best_model.pth")
model.eval()

bleu = evaluate_bleu_score(model, test_dataloader)
print(f'\nFinal BLEU score on test: {bleu.item()*100:.2f}')



100%|██████████| 313/313 [10:17<00:00,  1.97s/it]



Final BLEU score on test: 26.85


### Translating some examples


In [16]:
# Sentence generation routine inspired by Mateus Lindino's notebook

model.eval()
randomlist = random.sample(range(0, len(dataset_test)), 5)

for i in randomlist:
    item           = dataset_test[i]
    input_ids      = item[0]
    attention_mask = item[1]
    sample_en      = item[-2]
    sample_pt      = item[-1]

    pred = model.generate(input_ids=input_ids.reshape(1, -1).to(device), attention_mask=attention_mask.reshape(1, -1).to(device), max_length=target_max_length)[0]
    pred = tokenizer.decode(pred, skip_special_tokens=True)

    print('-'*200)
    print(f'{sample_en}\n\tPortuguese Target: {sample_pt}\n\tPortuguese Output: {pred}\n\n')

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
To have your own server, a company or professional, you may find different options, from buying the physical server, or hardware equipment, to hire a dedicated server on the Internet, through the possibility of hiring a VPS or even a Reseller service, depending on the actual needs that have or will have in the future. It also depends on the potential knowledge, professional or employee of the company, on the administration server.
	Portuguese Target: Para ter seu próprio servidor, uma empresa ou um profissional, você pode encontrar opções diferentes, desde a compra de servidores físicos, ou equipamento de hardware, para contratar os serviços de um servidor dedicado na Internet, através da possibilidade de contratar um VPS ou até mesmo um serviço de Revenda, dependendo das necessidades rea