<a href="https://colab.research.google.com/github/ferdinandrafols/IA_LLMs/blob/main/gsi073_aula0_seq2seq.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data preparation

The task is to reverse character sequences. Example: **aabcd** becomes **dcbaa**.


In [None]:
import torch
import torch.nn as nn
import random

# ===== 1. Vocabulary and basic functions =====
chars = list("abcde ")  # Allowed characters: a-e plus the space, used as padding
vocab = {ch: i for i, ch in enumerate(chars)}  # Map each character to a numeric index
inv_vocab = {i: ch for ch, i in vocab.items()} # Inverse mapping, to decode indices back into characters
vocab_size = len(vocab)  # Total number of possible tokens

def encode(s):  # Convert a string into a tensor of numeric indices
    return torch.tensor([vocab[c] for c in s], dtype=torch.long)

def decode(t):  # Convert a sequence of numeric indices back into a string
    return ''.join(inv_vocab[int(x)] for x in t)

def random_seq(n=5):  # Random sequence of length n using only 'abcde'
    return ''.join(random.choice(chars[:-1]) for _ in range(n))  # Excludes the space

# ===== 2. Generate data =====
pairs = [(encode(s), encode(s[::-1])) for s in [random_seq() for _ in range(50000)]]  # 50k (sequence, reversed sequence) pairs
max_len = max(len(x) for x, _ in pairs)  # Longest sequence length

def pad(x):  # Right-pad the sequence with spaces up to max_len
    # dtype=torch.long keeps the result integer even when no padding is added
    # (an empty torch.tensor([]) defaults to float32)
    return torch.cat([x, torch.tensor([vocab[' ']] * (max_len - len(x)), dtype=torch.long)], dim=0)

inputs = torch.stack([pad(x) for x, _ in pairs])  # Pad all inputs
targets = torch.stack([pad(y) for _, y in pairs])  # Pad all targets

train_ds = torch.utils.data.TensorDataset(inputs, targets)  # PyTorch dataset of inputs and targets
train_dl = torch.utils.data.DataLoader(train_ds, batch_size=128, shuffle=True)  # DataLoader with batches of 128

device = 'cuda' if torch.cuda.is_available() else 'cpu'  # Use the GPU if available

# ===== 3. Prints for inspection =====
print(f"Vocabulário: {vocab}")  # Show the token dictionary
print(f"Tamanho do vocabulário: {vocab_size}")  # Show how many tokens exist
print(f"Tamanho máximo das sequências (max_len): {max_len}")  # Show the maximum length

# Show 3 encoded/decoded examples
for i in range(3):
    s = random_seq()  # New random sequence
    encoded = encode(s)  # Encode to indices
    decoded = decode(encoded)  # Decode back
    reversed_decoded = decode(encoded.flip(0))  # Reverse and decode (expected target)
    print(f"\nExemplo {i+1}:")
    print(f"  Original: {s}")
    print(f"  Codificado: {encoded.tolist()}")
    print(f"  Decodificado: {decoded}")
    print(f"  Reverso (target esperado): {reversed_decoded}")

# Show the shapes of the input and output tensors
print("\nShapes:")
print(f"  inputs:  {inputs.shape}")  # Input dimensions
print(f"  targets: {targets.shape}")  # Target dimensions

# Show the first batch from the DataLoader
for xb, yb in train_dl:
    print("\nPrimeiro batch de treino:")
    print("  Entradas (xb):", xb.shape)  # Batch shape
    print("  Alvos (yb):", yb.shape)  # Target shape
    print("  Exemplo de entrada decodificada:", decode(xb[0]))  # First input of the batch as a string
    print("  Exemplo de alvo decodificado:", decode(yb[0]))  # Corresponding target
    break  # Only the first batch


Vocabulário: {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4, ' ': 5}
Tamanho do vocabulário: 6
Tamanho máximo das sequências (max_len): 5

Exemplo 1:
  Original: deeec
  Codificado: [3, 4, 4, 4, 2]
  Decodificado: deeec
  Reverso (target esperado): ceeed

Exemplo 2:
  Original: cebcc
  Codificado: [2, 4, 1, 2, 2]
  Decodificado: cebcc
  Reverso (target esperado): ccbec

Exemplo 3:
  Original: ceaea
  Codificado: [2, 4, 0, 4, 0]
  Decodificado: ceaea
  Reverso (target esperado): aeaec

Shapes:
  inputs:  torch.Size([50000, 5])
  targets: torch.Size([50000, 5])

Primeiro batch de treino:
  Entradas (xb): torch.Size([128, 5])
  Alvos (yb): torch.Size([128, 5])
  Exemplo de entrada decodificada: eebde
  Exemplo de alvo decodificado: edbee

Excellent: this output shows that you **understood and reproduced a complete data-preparation pipeline for Machine Learning**, on a simple **sequence-to-sequence (seq2seq)** problem.
Let's go through each part of the result, focusing on how it relates to machine learning 👇

---

## 🧩 1️⃣ Vocabulary and encoding

```
Vocabulário: {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4, ' ': 5}
Tamanho do vocabulário: 6
```

👉 This shows that you created a **discrete space of symbols**, a kind of *mini linguistic universe*.
Each character was converted into a **unique numeric index**, analogous to what happens in NLP (Natural Language Processing) when words or subwords are turned into IDs before being mapped to embeddings.

**ML interpretation:**

* This is the **preprocessing** (tokenization) step that any model operating on text needs.
* Instead of words, the tokens here are letters.
* This mapping (`char → int`) is what allows neural networks to operate on **numbers** instead of text.
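As a minimal sketch, the same mapping can be reproduced in plain Python. It mirrors the notebook's `vocab`, `inv_vocab`, `encode` and `decode`, but, as a simplification for illustration, it works on lists instead of `torch.tensor`s:

```python
# Pure-Python sketch of the char -> int mapping (lists instead of tensors).
chars = list("abcde ")                          # same symbols as the notebook
vocab = {ch: i for i, ch in enumerate(chars)}   # char -> index
inv_vocab = {i: ch for ch, i in vocab.items()}  # index -> char

def encode(s):
    """Map each character to its integer ID."""
    return [vocab[c] for c in s]

def decode(ids):
    """Map integer IDs back to characters."""
    return "".join(inv_vocab[i] for i in ids)

ids = encode("deeec")
print(ids)          # [3, 4, 4, 4, 2]
print(decode(ids))  # deeec
```

Decoding after encoding returns the original string, which is the round trip the notebook checks with its "Codificado"/"Decodificado" prints.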

---

## 🧠 2️⃣ Input and output data: a *seq2seq* problem

```
Exemplo 1:
  Original: acdba
  Codificado: [0, 2, 3, 1, 0]
  Decodificado: acdba
  Reverso (target esperado): abdca
```

Here you defined a classic **supervised problem**:

* Input: a sequence of letters (`acdba`)
* Expected output (target): the **reversed sequence** (`abdca`)

✅ This is a *toy problem* that helps test whether a neural network can **learn sequence patterns**.
Instead of translating between languages (as encoder-decoder models do in machine translation), here the network has to **learn to reverse the sequence**: a simple task, but well suited to studying sequence learning.
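The input/target pairs can be sketched without PyTorch as follows. `make_pairs` and its `seed` parameter are hypothetical additions (the notebook uses the global `random` module without a seed), included only to make the example reproducible:

```python
import random

def make_pairs(num_pairs, n=5, alphabet="abcde", seed=0):
    """Build (input, target) pairs where the target is the input reversed.
    Hypothetical helper: the notebook builds its pairs inline with random_seq()."""
    rng = random.Random(seed)  # local generator, only so the sketch is reproducible
    pairs = []
    for _ in range(num_pairs):
        s = "".join(rng.choice(alphabet) for _ in range(n))  # the pad ' ' is never drawn
        pairs.append((s, s[::-1]))  # target = input read backwards
    return pairs

for src, tgt in make_pairs(3):
    print(f"{src} -> {tgt}")
```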

---

## 🔢 3️⃣ Tensor shapes

```
inputs:  torch.Size([50000, 5])
targets: torch.Size([50000, 5])
```

This means:

* There are **50,000 training examples**.
* Each example is a **sequence of 5 tokens**.

📊 Each row is one example (sample) and each column is one position in the sequence (a character).
So the model sees the data as a **50,000 × 5 matrix** of token indices, with samples as rows and time steps as columns.

**From an ML perspective:**

* Each row = one observation.
* Each column = one time step (or "position" in the text).
* This `[batch, sequence]` layout is what `nn.Embedding` followed by an **RNN**, **GRU** or **LSTM** with `batch_first=True` expects; a **Transformer encoder-decoder** can consume it as well.
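`torch.stack` only works when every row has the same length, which is why `pad` exists even though every sequence in this notebook already has 5 characters (so padding adds nothing here). A plain-Python sketch with hypothetical variable-length sequences, using the pad index 5 that `vocab[' ']` has in the output above:

```python
PAD_ID = 5  # vocab[' '] in the notebook's output

def pad(ids, max_len, pad_id=PAD_ID):
    """Right-pad a list of token IDs with pad_id up to max_len."""
    return ids + [pad_id] * (max_len - len(ids))

# Hypothetical variable-length sequences (the notebook's are all length 5).
batch = [[3, 4, 4, 4, 2], [0, 1], [2, 2, 2]]
max_len = max(len(x) for x in batch)
matrix = [pad(x, max_len) for x in batch]

print(matrix)                       # [[3, 4, 4, 4, 2], [0, 1, 5, 5, 5], [2, 2, 2, 5, 5]]
print(len(matrix), len(matrix[0]))  # 3 5, i.e. shape [N, L]
```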

---

## 🧮 4️⃣ Training batch

```
Entradas (xb): torch.Size([128, 5])
Alvos (yb): torch.Size([128, 5])
```

The **DataLoader** splits the dataset into *batches* of 128 examples.
This is essential for **efficient training** and **vectorized computation on the GPU**.

**Why this matters:**

* Neural networks are trained on the *average gradient over each mini-batch*, not one sample at a time.
* This speeds up training and gives less noisy gradient estimates than single samples, which smooths the optimization (SGD, Adam, etc.).
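The batch count follows from simple arithmetic. Assuming the `DataLoader` defaults (in particular `drop_last=False`), 50,000 examples in batches of 128 give 391 batches, the last one holding 80 examples; that count is `len(train_dl)`, which the training loop divides `total_loss` by. `shuffle=True` changes which examples fall into each batch, not how many batches there are:

```python
import math

num_examples = 50_000
batch_size = 128

# With drop_last=False (the DataLoader default) the last, smaller batch is kept.
num_batches = math.ceil(num_examples / batch_size)
last_batch = num_examples - (num_batches - 1) * batch_size

# Batch boundaries as (start, end) index pairs, one per iteration over the loader.
bounds = [(i, min(i + batch_size, num_examples)) for i in range(0, num_examples, batch_size)]

print(num_batches, last_batch)  # 391 80
```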

---

## 🧩 5️⃣ Checking a real batch

```
Exemplo de entrada decodificada: caebd
Exemplo de alvo decodificado: dbeac
```

Here you confirmed that:

* The **input** is a random sequence.
* The **target** is that sequence reversed.

This shows the dataset is **consistent and clean**, ready for the model to learn the mapping.

---

## 🧠 6️⃣ Conceptual interpretation (ML view)

| Step                 | ML concept                         | Analogy                                                                 |
| -------------------- | ---------------------------------- | ----------------------------------------------------------------------- |
| Encoding             | Turning symbols into numbers       | Token dictionary                                                        |
| Padding              | Making sequence lengths uniform    | Filling with “space” (no effect here: every sequence has 5 characters) |
| Dataset + DataLoader | Supervised training structure      | Like “questions and answers” for the model                              |
| Input/Target         | Seq2seq learning                   | Input → expected output                                                 |
| Batch                | Gradient-based optimization        | Training in mini-groups                                                 |

---

## 🎯 Conclusion

✅ **The dataset is well built.**
Using only PyTorch and no tokenization library, you implemented a reduced, didactic version of the steps that NLP systems (including LLMs) run at scale: tokenization, numeric encoding, padding and batching. Production systems usually tokenize into subwords (for example with BPE) rather than single characters.

🚀 **Natural next step:**

* Build a simple model (for example, `nn.Embedding + nn.LSTM + nn.Linear`)
* Train it to learn the reversal task (seq2seq)
* Check whether the *loss* decreases and whether the model learns to generate the reversed sequence.



## Look at one pair

In [None]:
print(pairs[1])

(tensor([3, 0, 3, 3, 2]), tensor([2, 3, 3, 0, 3]))


# Defining the Seq2Seq model with GRU

In [None]:
class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_size, hidden_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_size)
        self.gru = nn.GRU(emb_size, hidden_size, batch_first=True)

    def forward(self, x):
        x = self.embed(x)
        _, h = self.gru(x)
        return h  # [1, B, H]

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_size, hidden_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_size)
        self.gru = nn.GRU(emb_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, h):
        """
        x: tensor que indica a parte pr√©via correta
        h: tensor que indica o estado do encoder da parte pr√©via
        """
        x = self.embed(x)
        out, h = self.gru(x, h)
        logits = self.fc(out)
        return logits, h # retorna o estado latente para atualizar o estado

class Seq2Seq(nn.Module):
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, src, tgt):
        h = self.encoder(src)
        # feed the correct previous tokens tgt[:, :-1] plus the encoder state; the loss compares the logits with tgt[:, 1:]
        logits, _ = self.decoder(tgt[:, :-1], h)
        return logits
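In `Seq2Seq.forward` the decoder receives `tgt[:, :-1]`, and in the training loop its logits are compared with `yb[:, 1:]`. On one target taken from the output above, the shift looks like this (a sketch with plain lists instead of tensors):

```python
# Plain-list sketch of the slicing used in training (targets have no SOS token yet).
tgt = list("ceeed")        # target = reversed input "deeec"

decoder_input = tgt[:-1]   # what the decoder receives: tgt[:, :-1]
labels = tgt[1:]           # what the loss compares against: yb[:, 1:]

for step, (inp, lab) in enumerate(zip(decoder_input, labels)):
    print(f"step {step}: input={inp!r} -> label={lab!r}")

# Only positions 1..4 of the target are ever labels:
# the first character of the reversal ('c' here) is never predicted.
print(len(tgt), len(labels))  # 5 4
```

Because position 0 of the target is never a label, this setup never trains the decoder to produce the first character of the reversal.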

# Code to use the trained model: inference

---



In [None]:
def decode_step(decoder, token, h):
    logits, h = decoder(token, h) # get the logits and the updated sequence state
    next_token = logits[:, -1, :].argmax(-1, keepdim=True)  # greedy choice of the most likely token, shape [1, 1]
    return next_token, h

def predict(model, seq, max_len=10):
    model.eval()
    with torch.no_grad():
        src = pad(encode(seq)).unsqueeze(0).to(device, dtype=torch.long)
        h = model.encoder(src) # encoder state after reading the whole input

        # 'token' holds the last generated token; generation starts from the pad token ' '
        # (the decoder never receives ' ' as input during training, since no target is padded)
        token = torch.tensor([[vocab[' ']]], dtype=torch.long, device=device)
        seq_invertida = []
        for _ in range(max_len):
            token, h = decode_step(model.decoder, token, h)
            seq_invertida.append(token.item())
        return decode(seq_invertida)
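The `predict` loop above is greedy autoregressive decoding: at each step the most likely token is chosen and fed back as the next input. A plain-Python sketch of the same control flow, where `toy_step` is a hypothetical stand-in for `decode_step` that already "knows" the answer, so only the loop structure is illustrated:

```python
def argmax(scores):
    """Index of the largest score (what .argmax(-1) does for a single row)."""
    return max(range(len(scores)), key=lambda i: scores[i])

def toy_step(token, state):
    """Hypothetical stand-in for decode_step: scores the correct next character
    highest. state = (source IDs, step index). Unlike the GRU decoder, it
    ignores `token`, so the choice of start token does not matter here."""
    src, t = state
    scores = [0.0] * 6                   # one score per vocabulary entry
    scores[src[len(src) - 1 - t]] = 1.0  # correct character of the reversal
    return argmax(scores), (src, t + 1)

def greedy_decode(step_fn, start_token, state, max_len):
    """Greedy loop: each prediction becomes the next input."""
    token, out = start_token, []
    for _ in range(max_len):
        token, state = step_fn(token, state)
        out.append(token)
    return out

src = [3, 4, 4, 4, 2]  # "deeec"
print(greedy_decode(toy_step, start_token=5, state=(src, 0), max_len=len(src)))
# [2, 4, 4, 4, 3], i.e. "ceeed"
```

A real decoder does condition on `token`, which is why starting from a token it never saw during training (here `' '`) hurts its predictions.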

# Training setup

In [None]:
emb_size = 32
hidden_size = 64
encoder = Encoder(vocab_size, emb_size, hidden_size)
decoder = Decoder(vocab_size, emb_size, hidden_size)
model = Seq2Seq(encoder, decoder).to(device)

loss_fn = nn.CrossEntropyLoss(ignore_index=vocab[' ']) # ignore the pad token ' '
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
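With the default `reduction='mean'`, `nn.CrossEntropyLoss(ignore_index=...)` averages the per-position loss only over positions whose target is not the ignored index. A plain-Python sketch of that computation (the function name is hypothetical, and class weights are not modeled) also makes one point about this notebook visible: every target has exactly five letters and no `' '`, so `ignore_index` never actually skips a position here.

```python
import math

def cross_entropy_ignore(logits_rows, targets, ignore_index):
    """Mean of -log softmax(logits)[target] over positions whose target
    is not ignore_index (like reduction='mean' without class weights)."""
    losses = []
    for logits, t in zip(logits_rows, targets):
        if t == ignore_index:
            continue  # padded position: contributes nothing
        log_sum_exp = math.log(sum(math.exp(z) for z in logits))
        losses.append(log_sum_exp - logits[t])
    if not losses:
        return float("nan")  # every position ignored: the mean is undefined
    return sum(losses) / len(losses)

PAD = 5  # vocab[' ']
logits = [[2.0, 0.0, 0.0, 0.0, 0.0, 0.0],   # target 0
          [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],   # target PAD -> ignored
          [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]]   # target 1
targets = [0, PAD, 1]
print(cross_entropy_ignore(logits, targets, ignore_index=PAD))
```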

# Running the training

In [None]:
for epoch in range(20):
    model.train()
    total_loss = 0
    for xb, yb in train_dl:
        xb, yb = xb.to(device, dtype=torch.long), yb.to(device, dtype=torch.long)
        opt.zero_grad()
        logits = model(xb, yb)
        loss = loss_fn(logits.reshape(-1, vocab_size), yb[:, 1:].reshape(-1))
        loss.backward()
        opt.step()
        total_loss += loss.item()
    print(f"Epoch {epoch+1}: loss={total_loss/len(train_dl):.4f}")

In [None]:
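# Note: running this cell after the training loop re-creates the model with fresh random
# weights, discarding the trained ones; predictions made afterwards come from an untrained model.
emb_size = 32
hidden_size = 64
encoder = Encoder(vocab_size, emb_size, hidden_size)
decoder = Decoder(vocab_size, emb_size, hidden_size)
model = Seq2Seq(encoder, decoder).to(device)

loss_fn = nn.CrossEntropyLoss(ignore_index=vocab[' ']) # ignore the pad token ' '
opt = torch.optim.Adam(model.parameters(), lr=1e-3)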

print(f"Model initialized and moved to device: {device}")
print(f"Model architecture:\n{model}")

# Let's test

In [None]:
for _ in range(10):
    s = random_seq()
    print(f"Input string: '{s}'") # Added print to show the generated input
    pred = predict(model, s, max_len=len(s))
    print(f"{s} -> {pred}")

# Task
Evaluate the accuracy of the trained sequence inversion model by generating 100 new random sequences, predicting their inversions, calculating the percentage of correct predictions, and displaying examples of both correct and incorrect predictions, along with the overall accuracy and a conclusion.

## Generate test data

### Subtask:
Generate a new set of 100 random sequences and their corresponding reversals to use as test data.


**Reasoning**:
I need to generate 100 random sequences and their reversed counterparts for testing, as instructed by the subtask. This will create a test dataset.



In [None]:
test_pairs = []
for _ in range(100):
    s = random_seq() # generate a random sequence
    s_reversed = s[::-1] # reverse it
    test_pairs.append((s, s_reversed))

print(f"Gerados {len(test_pairs)} pares de teste.")
print(f"Primeiros 3 exemplos de pares de teste: {test_pairs[:3]}")

## Evaluate model accuracy

### Subtask:
Use the trained model to predict the reversal of each sequence in the test set and compute the accuracy (percentage of correct predictions).


**Reasoning**:
I need to evaluate the model's performance by iterating through the test pairs, making predictions, comparing them with the expected outcomes, and then calculating and displaying the accuracy along with examples.



In [None]:
correct_predictions_count = 0
total_tests = 0
correct_examples = []
incorrect_examples = []

for original_seq, expected_reversed_seq in test_pairs:
    predicted_reversed_seq = predict(model, original_seq, max_len=len(original_seq))
    total_tests += 1

    if predicted_reversed_seq == expected_reversed_seq:
        correct_predictions_count += 1
        correct_examples.append((original_seq, predicted_reversed_seq, expected_reversed_seq))
    else:
        incorrect_examples.append((original_seq, predicted_reversed_seq, expected_reversed_seq))

accuracy = (correct_predictions_count / total_tests) * 100

print(f"\n--- Avalia√ß√£o da Acur√°cia ---")
print(f"Total de testes: {total_tests}")
print(f"Previs√µes corretas: {correct_predictions_count}")
print(f"Previs√µes incorretas: {total_tests - correct_predictions_count}")
print(f"Acur√°cia do modelo: {accuracy:.2f}%")

print(f"\n--- Exemplos de Previs√µes Corretas ({len(correct_examples)} amostras) ---")
for i, (original, predicted, expected) in enumerate(correct_examples[:5]): # Mostrar at√© 5 exemplos
    print(f"  Original: '{original}' -> Previsto: '{predicted}' (Esperado: '{expected}')")

print(f"\n--- Exemplos de Previs√µes Incorretas ({len(incorrect_examples)} amostras) ---")
for i, (original, predicted, expected) in enumerate(incorrect_examples[:5]): # Mostrar at√© 5 exemplos
    print(f"  Original: '{original}' -> Previsto: '{predicted}' (Esperado: '{expected}')")

print(f"\nConclus√£o: O modelo obteve uma acur√°cia de {accuracy:.2f}% na tarefa de invers√£o de sequ√™ncias.")

## Finalize the evaluation

### Subtask:
Present the model's overall accuracy and draw a conclusion about its performance, including a review of the error examples.


## Summary:

### Q&A
The model's overall accuracy is 0.00%.

### Data Analysis Key Findings
*   A test set of 100 random sequences and their inversions was generated.
*   The model was evaluated on these 100 test sequences.
*   The model achieved an accuracy of 0.00%, making 0 correct predictions out of 100 test cases.
*   All 100 predictions made by the model were incorrect.
*   Examples of incorrect predictions show that the model failed to invert sequences correctly (e.g., for 'cdbed', the model predicted 'bcbbb' while the expected inversion was 'debdc').
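Exact-match accuracy gives no partial credit. A per-character accuracy, which is not part of the notebook and is shown only as a hypothetical complementary metric, tells whether predictions are at least partly right. Applied to the failure example quoted above:

```python
def exact_match(pred, expected):
    """1 if the whole sequence matches, else 0 (the metric used above)."""
    return int(pred == expected)

def char_accuracy(pred, expected):
    """Fraction of positions where the characters agree (hypothetical extra metric)."""
    if not expected:
        return 0.0
    hits = sum(p == e for p, e in zip(pred, expected))
    return hits / len(expected)

print(exact_match("bcbbb", "debdc"))    # 0
print(char_accuracy("bcbbb", "debdc"))  # 0.2
```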

### Insights or Next Steps
*   The model's performance on the sequence inversion task is extremely poor, indicating a fundamental issue with its training or architecture.
*   The next step should be to investigate the reasons for this failure, such as reviewing the training data, the model's architecture, and the training process, followed by retraining the model.


# Task
The model's current accuracy is 0.00%, indicating a fundamental issue. The next step is to debug the model's logic, specifically by introducing 'Start Of Sequence' (SOS) and 'End Of Sequence' (EOS) tokens. This involves modifying the vocabulary, adjusting data preparation to include these tokens in target sequences, updating the `Seq2Seq` model's forward pass, and revising the `predict` function to use SOS for initiation and EOS for termination. After these changes, the model needs to be retrained, followed by a re-evaluation of its accuracy to determine whether the corrections improved performance.
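The Task also mentions an EOS token, which the code below does not add. A minimal sketch of the idea on plain token lists, where `'<E>'`, `make_target` and `strip_at_eos` are hypothetical names: the target gets an end marker, and generated output is cut at the first one.

```python
SOS, EOS = "<S>", "<E>"  # '<E>' is a hypothetical name; the notebook only adds '<S>'

def make_target(src):
    """Decoder token list: <S> + reversed input + <E>."""
    return [SOS] + list(src[::-1]) + [EOS]

def strip_at_eos(tokens):
    """Keep generated tokens up to (not including) the first <E>."""
    out = []
    for tok in tokens:
        if tok == EOS:
            break
        out.append(tok)
    return out

print(make_target("abcde"))                # ['<S>', 'e', 'd', 'c', 'b', 'a', '<E>']
print(strip_at_eos(["e", "d", EOS, "a"]))  # ['e', 'd']
```

With an end marker, the generation loop can stop on its own instead of relying on a length passed in by the caller.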

## Debug the model logic and training

### Subtask:
Review the `forward` method of the `Seq2Seq` model, paying special attention to how the decoder input (`tgt[:, :-1]`) and the loss target (`yb[:, 1:]`) are handled. Check the shapes and indexing passed to `loss_fn`. Also inspect the `predict` function, particularly the initial token passed to the decoder and the generation loop.


**Reasoning**:
The current model fails because the decoder is never trained to predict the first token of the target sequence. To fix this, I need to introduce a special Start-Of-Sequence (SOS) token into the vocabulary and data generation. This SOS token will be prepended to all target sequences during training, allowing the decoder to learn to predict the first actual character when given the SOS token as input. I also need to adjust the `max_len` for padding to accommodate the new token. Finally, the `predict` function must be updated to use this SOS token as the initial input to the decoder during inference.
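The effect of prepending SOS on the same slicing, sketched with plain lists (the token string `'<S>'` matches `SOS_TOKEN` in the code below):

```python
SOS = "<S>"
src = "abcde"
tgt = [SOS] + list(src[::-1])  # ['<S>', 'e', 'd', 'c', 'b', 'a']

decoder_input = tgt[:-1]       # ['<S>', 'e', 'd', 'c', 'b']  (tgt[:, :-1])
labels = tgt[1:]               # ['e', 'd', 'c', 'b', 'a']    (yb[:, 1:])

# Every character of the reversal is now a training label, and inference
# can start from <S>, a token the decoder does see during training.
for inp, lab in zip(decoder_input, labels):
    print(f"{inp!r} -> {lab!r}")
```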



In [None]:
import torch
import torch.nn as nn
import random

# ===== 1. Vocabulary and basic functions (updated) =====
# Adds a <SOS> (Start-Of-Sequence) token to the vocabulary
chars = list("abcde") # the original characters
SOS_TOKEN = '<S>'
PAD_TOKEN = ' '

# Append SOS, then the space (PAD) as the last token; this order lets chars[:-2] yield only 'abcde'
chars.append(SOS_TOKEN)
chars.append(PAD_TOKEN)

vocab = {ch: i for i, ch in enumerate(chars)}  # Map each token to a numeric index
inv_vocab = {i: ch for ch, i in vocab.items()} # Inverse mapping, to decode indices back into tokens
vocab_size = len(vocab)  # Total number of possible tokens

def encode(s):  # Convert a string into a tensor of numeric indices
    return torch.tensor([vocab[c] for c in s], dtype=torch.long)

def decode(t):  # Convert indices back into a string, skipping PAD and SOS
    return ''.join(inv_vocab[int(x)] for x in t if inv_vocab[int(x)] != PAD_TOKEN and inv_vocab[int(x)] != SOS_TOKEN)

def random_seq(n=5):  # Random sequence of length n using only 'abcde'
    return ''.join(random.choice(chars[:-2]) for _ in range(n))  # Excludes SOS_TOKEN and PAD_TOKEN

# ===== 2. Generate data (updated) =====
# Prepend <SOS> to the reversed target so the decoder can learn to predict the first token
# The target length becomes the original length + 1 (for the SOS token)

# Generate the original and reversed sequences, with <SOS> prepended to the target
raw_pairs = []
for _ in range(50000):
    s = random_seq()
    s_reversed_with_sos = SOS_TOKEN + s[::-1] # Prepend <SOS> to the reversed sequence
    raw_pairs.append((s, s_reversed_with_sos))

# Compute the maximum lengths *after* adding the SOS token to the target
max_len_src = max(len(s) for s, _ in raw_pairs)
max_len_tgt = max(len(s_rev) for _, s_rev in raw_pairs) # Intended to be max_len_src + 1 (but len('<S>') is 3 inside a string)

def pad_src(x):  # Pad the source sequence with spaces to a fixed length
    return torch.cat([encode(x), torch.tensor([vocab[PAD_TOKEN]] * (max_len_src - len(x)), dtype=torch.long)], dim=0)

def pad_tgt(y):  # Pad the target sequence (with SOS) with spaces
    return torch.cat([encode(y), torch.tensor([vocab[PAD_TOKEN]] * (max_len_tgt - len(y)), dtype=torch.long)], dim=0)

inputs = torch.stack([pad_src(s) for s, _ in raw_pairs])
targets = torch.stack([pad_tgt(s_rev) for _, s_rev in raw_pairs])

train_ds = torch.utils.data.TensorDataset(inputs, targets)
train_dl = torch.utils.data.DataLoader(train_ds, batch_size=128, shuffle=True)

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# ===== 3. Prints for inspection (updated) =====
print(f"Vocabulário: {vocab}")
print(f"Tamanho do vocabulário: {vocab_size}")
print(f"Tamanho máximo das sequências de origem (max_len_src): {max_len_src}")
print(f"Tamanho máximo das sequências alvo (max_len_tgt): {max_len_tgt}")

# Show 3 encoded/decoded examples
for i in range(3):
    s = random_seq()
    encoded_src = encode(s)
    encoded_tgt_with_sos = encode(SOS_TOKEN + s[::-1])
    decoded_src = decode(encoded_src)
    decoded_tgt_with_sos = decode(encoded_tgt_with_sos)
    print(f"\nExemplo {i+1}:")
    print(f"  Original (src): '{s}'")
    print(f"  Codificado (src): {encoded_src.tolist()}")
    print(f"  Decodificado (src): '{decoded_src}'")
    print(f"  Reverso com SOS (target esperado): '{SOS_TOKEN}{s[::-1]}'")
    print(f"  Codificado (target): {encoded_tgt_with_sos.tolist()}")
    print(f"  Decodificado (target): '{decoded_tgt_with_sos}'")

# Show the shapes of the input and output tensors
print("\nShapes:")
print(f"  inputs:  {inputs.shape}")
print(f"  targets: {targets.shape}")

# Show the first batch from the DataLoader
for xb, yb in train_dl:
    print("\nPrimeiro batch de treino:")
    print("  Entradas (xb):", xb.shape)
    print("  Alvos (yb):", yb.shape)
    print("  Exemplo de entrada decodificada:", decode(xb[0]))
    print("  Exemplo de alvo decodificado:", decode(yb[0]))
    break

# --- Updated predict function that uses SOS_TOKEN ---

def predict(model, seq, max_len_output): # max_len_output should be the original length of seq
    model.eval()
    with torch.no_grad():
        src = pad_src(seq).unsqueeze(0).to(device, dtype=torch.long)
        h = model.encoder(src)

        # The decoder's initial token is now SOS_TOKEN
        token = torch.tensor([[vocab[SOS_TOKEN]]], dtype=torch.long, device=device)
        seq_invertida = []

        for _ in range(max_len_output): # Generate the expected number of characters (without SOS)
            token, h = decode_step(model.decoder, token, h)
            # Stop early if the model happens to generate the padding token
            if inv_vocab[token.item()] == PAD_TOKEN:
                break
            seq_invertida.append(token.item())

        return decode(seq_invertida)

# Re-create the model and optimizer with the new vocab_size
emb_size = 32
hidden_size = 64
encoder = Encoder(vocab_size, emb_size, hidden_size)
decoder = Decoder(vocab_size, emb_size, hidden_size)
model = Seq2Seq(encoder, decoder).to(device)

# loss_fn stays the same, ignoring PAD_TOKEN (the ' ')
loss_fn = nn.CrossEntropyLoss(ignore_index=vocab[PAD_TOKEN])
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

print(f"Model initialized with updated vocab_size: {vocab_size} and moved to device: {device}")
print(f"Model architecture:\n{model}")

# Training run (copied from the earlier block, but using the new definitions)
print("\nIniciando treinamento...")
for epoch in range(20):
    model.train()
    total_loss = 0
    for xb, yb in train_dl:
        xb, yb = xb.to(device, dtype=torch.long), yb.to(device, dtype=torch.long)
        opt.zero_grad()
        logits = model(xb, yb) # yb is now [B, max_len_tgt] with SOS prepended

        # loss_fn expects logits of shape [N, C] and targets of shape [N]
        # logits is [B, max_len_tgt-1, vocab_size] => reshape to [B*(max_len_tgt-1), vocab_size]
        # yb[:, 1:] is the real target (without SOS) => reshape to [B*(max_len_tgt-1)]
        loss = loss_fn(logits.reshape(-1, vocab_size), yb[:, 1:].reshape(-1))
        loss.backward()
        opt.step()
        total_loss += loss.item()
    print(f"Epoch {epoch+1}: loss={total_loss/len(train_dl):.4f}")

**Reasoning**:
The previous error, `KeyError: '<'`, occurred because the `encode` function was processing the multi-character `SOS_TOKEN` (`'<S>'`) as individual characters (`'<'`, `'S'`, `'>'`), which are not in the vocabulary. To fix this, I need to modify the data generation to create target sequences as lists of tokens (e.g., `['<S>', 'a', 'b']`) instead of a concatenated string (e.g., `'<S>ab'`). The `encode` function needs to handle these token lists correctly. Additionally, the print statements in the inspection section need to reflect this change.
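The failure and the fix can be reproduced in a few lines of plain Python, using the same token order as the notebook's updated `chars`:

```python
vocab = {ch: i for i, ch in enumerate(["a", "b", "c", "d", "e", "<S>", " "])}

as_string = "<S>" + "edcba"          # iterating a str yields '<', 'S', '>', 'e', ...
as_tokens = ["<S>"] + list("edcba")  # iterating a list yields '<S>', 'e', 'd', ...

try:
    [vocab[c] for c in as_string]
except KeyError as err:
    print("KeyError:", err)  # KeyError: '<'

print([vocab[c] for c in as_tokens])  # [5, 4, 3, 2, 1, 0]
```

The same comprehension works for both inputs once multi-character tokens arrive as list elements, which is why the updated `encode` needs no special case.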



In [None]:
import torch
import torch.nn as nn
import random

# ===== 1. Vocabulary and basic functions (updated) =====
# Adds a <SOS> (Start-Of-Sequence) token to the vocabulary
chars = list("abcde") # the original characters
SOS_TOKEN = '<S>'
PAD_TOKEN = ' '

# Append SOS, then the space (PAD) as the last token; this order lets chars[:-2] yield only 'abcde'
chars.append(SOS_TOKEN)
chars.append(PAD_TOKEN)

vocab = {ch: i for i, ch in enumerate(chars)}  # Map each token to a numeric index
inv_vocab = {i: ch for ch, i in vocab.items()} # Inverse mapping, to decode indices back into tokens
vocab_size = len(vocab)  # Total number of possible tokens

def encode(s):  # Convert a string of characters or a list of tokens into a tensor of indices
    # Iterating a str yields single characters (source sequences);
    # iterating a list yields whole tokens such as SOS_TOKEN (target sequences).
    # Either way, every element must be a key of vocab.
    return torch.tensor([vocab[c] for c in s], dtype=torch.long)

def decode(t):  # Convert indices back into a string, skipping PAD and SOS
    return ''.join(inv_vocab[int(x)] for x in t if inv_vocab[int(x)] != PAD_TOKEN and inv_vocab[int(x)] != SOS_TOKEN)

def random_seq(n=5):  # Random sequence of length n using only 'abcde'
    return ''.join(random.choice(chars[:-2]) for _ in range(n))  # Excludes SOS_TOKEN and PAD_TOKEN

# ===== 2. Generate data (updated) =====
# Prepend <SOS> to the reversed target so the decoder can learn to predict the first token
# The target length becomes the original length + 1 (for the SOS token)

# Generate the original and reversed sequences, with <SOS> prepended to the target
raw_pairs = []
for _ in range(50000):
    s = random_seq()
    # Add <SOS> as a separate token at the start of the reversed token list
    s_reversed_tokens = [SOS_TOKEN] + list(s[::-1])
    raw_pairs.append((s, s_reversed_tokens))

# Compute the maximum lengths *after* adding the SOS token to the target
max_len_src = max(len(s) for s, _ in raw_pairs)
max_len_tgt = max(len(s_rev_tokens) for _, s_rev_tokens in raw_pairs) # Equal to max_len_src + 1

def pad_src(x):  # Pad the source sequence (a string) with spaces to a fixed length
    return torch.cat([encode(x), torch.tensor([vocab[PAD_TOKEN]] * (max_len_src - len(x)), dtype=torch.long)], dim=0)

def pad_tgt(y_tokens):  # Pad the target sequence (a list of tokens) with spaces
    return torch.cat([encode(y_tokens), torch.tensor([vocab[PAD_TOKEN]] * (max_len_tgt - len(y_tokens)), dtype=torch.long)], dim=0)

inputs = torch.stack([pad_src(s) for s, _ in raw_pairs])
targets = torch.stack([pad_tgt(s_rev_tokens) for _, s_rev_tokens in raw_pairs])

train_ds = torch.utils.data.TensorDataset(inputs, targets)
train_dl = torch.utils.data.DataLoader(train_ds, batch_size=128, shuffle=True)

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# ===== 3. Prints for inspection (updated) =====
print(f"Vocabulário: {vocab}")
print(f"Tamanho do vocabulário: {vocab_size}")
print(f"Tamanho máximo das sequências de origem (max_len_src): {max_len_src}")
print(f"Tamanho máximo das sequências alvo (max_len_tgt): {max_len_tgt}")

# Show 3 encoded/decoded examples
for i in range(3):
    s = random_seq()
    encoded_src = encode(s)
    # The target with SOS is now a list of tokens for encoding
    encoded_tgt_with_sos = encode([SOS_TOKEN] + list(s[::-1]))
    decoded_src = decode(encoded_src)
    decoded_tgt_with_sos = decode(encoded_tgt_with_sos)
    print(f"\nExemplo {i+1}:")
    print(f"  Original (src): '{s}'")
    print(f"  Codificado (src): {encoded_src.tolist()}")
    print(f"  Decodificado (src): '{decoded_src}'")
    print(f"  Reverso com SOS (target esperado): '{SOS_TOKEN}{s[::-1]}'")
    print(f"  Codificado (target): {encoded_tgt_with_sos.tolist()}")
    print(f"  Decodificado (target): '{decoded_tgt_with_sos}'")

# Show the shapes of the input and output tensors
print("\nShapes:")
print(f"  inputs:  {inputs.shape}")
print(f"  targets: {targets.shape}")

# Show the first batch from the DataLoader
for xb, yb in train_dl:
    print("\nPrimeiro batch de treino:")
    print("  Entradas (xb):", xb.shape)
    print("  Alvos (yb):", yb.shape)
    print("  Exemplo de entrada decodificada:", decode(xb[0]))
    print("  Exemplo de alvo decodificado:", decode(yb[0]))
    break

# --- Updated predict function that uses SOS_TOKEN ---

def predict(model, seq, max_len_output): # max_len_output should be the original length of seq
    model.eval()
    with torch.no_grad():
        src = pad_src(seq).unsqueeze(0).to(device, dtype=torch.long)
        h = model.encoder(src)

        # The decoder's initial token is now SOS_TOKEN
        token = torch.tensor([[vocab[SOS_TOKEN]]], dtype=torch.long, device=device)
        seq_invertida = []

        for _ in range(max_len_output): # Generate the expected number of characters (without SOS)
            token, h = decode_step(model.decoder, token, h)
            # Stop early if the model happens to generate the padding token
            if inv_vocab[token.item()] == PAD_TOKEN:
                break
            seq_invertida.append(token.item())

        return decode(seq_invertida)

# Re-create the model and optimizer with the new vocab_size
emb_size = 32
hidden_size = 64
encoder = Encoder(vocab_size, emb_size, hidden_size)
decoder = Decoder(vocab_size, emb_size, hidden_size)
model = Seq2Seq(encoder, decoder).to(device)

# Same loss as before, ignoring PAD_TOKEN (the ' ' character)
loss_fn = nn.CrossEntropyLoss(ignore_index=vocab[PAD_TOKEN])
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

print(f"Model initialized with updated vocab_size: {vocab_size} and moved to device: {device}")
print(f"Model architecture:\n{model}")

# Training run (copied from the previous block, but using the new definitions)
print("\nStarting training...")
for epoch in range(20):
    model.train()
    total_loss = 0
    for xb, yb in train_dl:
        xb, yb = xb.to(device, dtype=torch.long), yb.to(device, dtype=torch.long)
        opt.zero_grad()
        logits = model(xb, yb)  # yb is now [B, max_len_tgt], with SOS prepended

        # loss_fn expects logits of shape [N, C] and targets of shape [N]
        # logits is [B, max_len_tgt-1, vocab_size] -> reshape to [B*(max_len_tgt-1), vocab_size]
        # yb[:, 1:] is the real target (without SOS) -> reshape to [B*(max_len_tgt-1)]
        loss = loss_fn(logits.reshape(-1, vocab_size), yb[:, 1:].reshape(-1))
        loss.backward()
        opt.step()
        total_loss += loss.item()
    print(f"Epoch {epoch+1}: loss={total_loss/len(train_dl):.4f}")

**Reasoning**:
The previous code block successfully re-trained the model with the updated data preparation and SOS token handling. Now, I need to re-evaluate the model's accuracy on a test set to confirm if the debugging efforts improved its performance. This involves generating a new test set, making predictions with the updated `predict` function, and calculating the accuracy as requested in the overall task.



In [None]:
test_pairs = []
for _ in range(100):
    s = random_seq()  # Generate a random sequence
    s_reversed = s[::-1]  # Reverse it
    test_pairs.append((s, s_reversed))

print(f"Generated {len(test_pairs)} test pairs.")
print(f"First 3 test pairs: {test_pairs[:3]}")

correct_predictions_count = 0
total_tests = 0
correct_examples = []
incorrect_examples = []

for original_seq, expected_reversed_seq in test_pairs:
    # Use the updated predict function
    predicted_reversed_seq = predict(model, original_seq, max_len_output=len(original_seq))
    total_tests += 1

    if predicted_reversed_seq == expected_reversed_seq:
        correct_predictions_count += 1
        correct_examples.append((original_seq, predicted_reversed_seq, expected_reversed_seq))
    else:
        incorrect_examples.append((original_seq, predicted_reversed_seq, expected_reversed_seq))

accuracy = (correct_predictions_count / total_tests) * 100

print("\n--- Accuracy evaluation ---")
print(f"Total tests: {total_tests}")
print(f"Correct predictions: {correct_predictions_count}")
print(f"Incorrect predictions: {total_tests - correct_predictions_count}")
print(f"Model accuracy: {accuracy:.2f}%")

print(f"\n--- Examples of correct predictions ({len(correct_examples)} samples) ---")
for original, predicted, expected in correct_examples[:5]:  # Show at most 5 examples
    print(f"  Original: '{original}' -> Predicted: '{predicted}' (Expected: '{expected}')")

print(f"\n--- Examples of incorrect predictions ({len(incorrect_examples)} samples) ---")
for original, predicted, expected in incorrect_examples[:5]:  # Show at most 5 examples
    print(f"  Original: '{original}' -> Predicted: '{predicted}' (Expected: '{expected}')")

print(f"\nConclusion: the model reached {accuracy:.2f}% accuracy on the sequence inversion task.")

## Finish Task

### Subtask:
Summarize the new accuracy, analyze the improvements, and discuss the next steps.


## Summary:

### Q&A
1.  **What is the new accuracy of the model after the changes?**
    The model reached 100.00% exact-match accuracy on a test set of 100 randomly generated sequences.

2.  **How did the corrections improve performance?**
    The corrections, specifically the proper handling of the Start-Of-Sequence (SOS) token in the vocabulary, the data preparation, and the `predict` function, resolved a `KeyError` and allowed the model to learn the task. The average training loss fell from approximately 0.4125 in the first epoch to 0.0010 (0.0000 in some epochs), and the model ultimately reached perfect accuracy on the test set.

3.  **What are the next steps?**
    Future work could involve testing the model's robustness with more complex sequence patterns, longer sequences, or different types of sequence-to-sequence tasks to evaluate its generalization capabilities.

### Data Analysis Key Findings
*   The initial model had an accuracy of 0.00%, indicating a fundamental flaw in its logic or implementation.
*   An initial attempt to integrate `SOS_TOKEN` led to a `KeyError` because the `encode` function incorrectly processed the multi-character `SOS_TOKEN` as individual characters.
*   After changing `encode` to treat `SOS_TOKEN` as a single unit (by encoding a list of tokens rather than a character string) and updating the data generation to prepend `SOS_TOKEN` as a separate token, the model trained successfully.
*   Training loss decreased from an initial average of approximately 0.4125 to 0.0010 (with some epochs reaching 0.0000) over 20 epochs.
*   Following the corrections and retraining, the model achieved a perfect accuracy of 100.00% on a test set of 100 sequence inversion examples.
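The `KeyError` described above can be reproduced without PyTorch. The sketch below uses the same vocabulary as the updated cell and a list-returning stand-in for `encode` (`encode_ids`, a hypothetical helper so the example runs without torch). Iterating over the string `"<S>edcba"` yields the characters `'<'`, `'S'`, `'>'`, none of which are in the vocabulary. Iterating over the token list `["<S>", "e", ...]` keeps `"<S>"` as one unit.

```python
# Vocabulary as built in the updated cell: a-e, then SOS, then PAD (space).
SOS_TOKEN = "<S>"
PAD_TOKEN = " "
chars = list("abcde") + [SOS_TOKEN, PAD_TOKEN]
vocab = {ch: i for i, ch in enumerate(chars)}

def encode_ids(tokens):
    """Hypothetical list-returning stand-in for encode(): one index per element."""
    return [vocab[t] for t in tokens]

s = "abcde"

# Wrong: putting SOS inside a *string* splits it into '<', 'S', '>'.
try:
    encode_ids(SOS_TOKEN + s[::-1])
    broken = False
except KeyError as err:
    broken = True
    print("KeyError on", err)  # the first unknown character, '<'

# Right: keep SOS as a separate list element.
ids = encode_ids([SOS_TOKEN] + list(s[::-1]))
print(ids)  # [5, 4, 3, 2, 1, 0]
```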

### Insights or Next Steps
*   The successful implementation and correct handling of special tokens like SOS are critical for sequence generation models to learn and perform effectively.
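*   Given the 100.00% accuracy, the current architecture and training approach are effective for this simple task (fixed-length, 5-character sequences over the letters a-e). However, the test strings come from the same `random_seq()` generator as the 50,000 training strings, and there are only 5^5 = 3,125 distinct inputs. This result therefore shows that the mapping was learned on this distribution, not that it generalizes to unseen or longer sequences.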
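The following stdlib-only check estimates how much a 100-example test set overlaps the training data. It assumes, as in the cells above, that both sets are drawn uniformly by `random_seq()` with `n=5` over `'abcde'`. It computes the probability that a given test string never appears among 50,000 training draws, then confirms this with a simulation. The simulation uses its own seed, so its exact counts are illustrative only.

```python
import random

ALPHABET = "abcde"
SEQ_LEN = 5
N_TRAIN = 50_000
N_TEST = 100

n_possible = len(ALPHABET) ** SEQ_LEN  # 3125 distinct inputs

# Probability that one specific string never appears in N_TRAIN uniform draws.
p_unseen = (1 - 1 / n_possible) ** N_TRAIN
print(f"possible inputs: {n_possible}")
print(f"P(test string unseen in training) ~ {p_unseen:.2e}")

def random_seq(rng, n=SEQ_LEN):
    """Same sampling scheme as the notebook's random_seq, with an explicit RNG."""
    return "".join(rng.choice(ALPHABET) for _ in range(n))

rng = random.Random(0)
train_set = {random_seq(rng) for _ in range(N_TRAIN)}
test = [random_seq(rng) for _ in range(N_TEST)]
unseen = [s for s in test if s not in train_set]
print(f"distinct training strings: {len(train_set)} / {n_possible}")
print(f"test strings not in training set: {len(unseen)} / {N_TEST}")
```

A stricter evaluation would hold out part of the 3,125 strings and filter them out of the training pairs, or test on sequences longer than those seen in training.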


# Task
Modify the training loop to print debug examples every 5 epochs, retrain the model with this change, and summarize the observed results and next steps.

## Modify the Training Loop with Debug Output

### Subtask:
Update the training loop to print debug examples every 5 epochs, showing the original input, the model's prediction, and the expected target.


**Reasoning**:
To fulfill the subtask, I need to update the existing training loop by adding debug print statements every 5 epochs. This requires modifying the `for epoch in range(20):` loop to include the generation of a random sequence, its prediction by the model, and then printing the original, predicted, and expected reversed sequences. I will include the full code block containing the model definition, initialization, and the modified training loop to ensure all dependencies are met, as per the guidelines.



In [None]:
import torch
import torch.nn as nn
import random

# ===== 1. Vocabulary and basic functions (updated) =====
# Add an <SOS> (Start-Of-Sequence) token to the vocabulary
chars = list("abcde")  # The original characters
SOS_TOKEN = '<S>'
PAD_TOKEN = ' '

# Append SOS and then PAD at the end: PAD (the space) gets the last index, used by
# ignore_index, with SOS just before it, so chars[:-2] is exactly the letters a-e
chars.append(SOS_TOKEN)
chars.append(PAD_TOKEN)

vocab = {ch: i for i, ch in enumerate(chars)}  # Map each token to an integer index
inv_vocab = {i: ch for ch, i in vocab.items()}  # Inverse mapping, for decoding indices back to tokens
vocab_size = len(vocab)  # Total number of tokens

def encode(s):  # Convert a string of characters, or a list of tokens, into a tensor of indices
    # Iterating a list yields whole tokens (so '<S>' stays one unit); iterating a string
    # yields single characters, so SOS must never be passed inside a string.
    return torch.tensor([vocab[c] for c in s], dtype=torch.long)

def decode(t):  # Convert a sequence of indices back to a string, dropping PAD and SOS
    return ''.join(inv_vocab[int(x)] for x in t if inv_vocab[int(x)] not in (PAD_TOKEN, SOS_TOKEN))

def random_seq(n=5):  # Random sequence of length n over 'abcde'
    return ''.join(random.choice(chars[:-2]) for _ in range(n))  # Excludes SOS_TOKEN and PAD_TOKEN

# ===== 2. Generate data (updated) =====
# Prepend <SOS> to the reversed target so the decoder has an input from which to predict the first token
# The target length is therefore the original length + 1 (for the SOS token)

# Generate the original and reversed sequences, with <SOS> prepended to the target
raw_pairs = []
for _ in range(50000):
    s = random_seq()
    # Add <SOS> as a separate token at the start of the reversed token list
    s_reversed_tokens = [SOS_TOKEN] + list(s[::-1])
    raw_pairs.append((s, s_reversed_tokens))

# Compute the maximum lengths *after* adding SOS to the target
max_len_src = max(len(s) for s, _ in raw_pairs)
max_len_tgt = max(len(s_rev_tokens) for _, s_rev_tokens in raw_pairs)  # Equals max_len_src + 1

def pad_src(x):  # Pad a source string with PAD_TOKEN up to max_len_src
    # dtype=torch.long keeps the result integer even when no padding is needed
    # (an empty torch.tensor([]) defaults to float and can turn the concatenation into float)
    return torch.cat([encode(x), torch.full((max_len_src - len(x),), vocab[PAD_TOKEN], dtype=torch.long)])

def pad_tgt(y_tokens):  # Pad a target token list with PAD_TOKEN up to max_len_tgt
    return torch.cat([encode(y_tokens), torch.full((max_len_tgt - len(y_tokens),), vocab[PAD_TOKEN], dtype=torch.long)])

inputs = torch.stack([pad_src(s) for s, _ in raw_pairs])
targets = torch.stack([pad_tgt(s_rev_tokens) for _, s_rev_tokens in raw_pairs])

train_ds = torch.utils.data.TensorDataset(inputs, targets)
train_dl = torch.utils.data.DataLoader(train_ds, batch_size=128, shuffle=True)

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# ===== 3. Inspection prints (updated) =====
print(f"Vocabulary: {vocab}")
print(f"Vocabulary size: {vocab_size}")
print(f"Maximum source sequence length (max_len_src): {max_len_src}")
print(f"Maximum target sequence length (max_len_tgt): {max_len_tgt}")

# Show 3 encoded/decoded examples
for i in range(3):
    s = random_seq()
    encoded_src = encode(s)
    # The SOS target is now a list of tokens, so "<S>" is encoded as a single unit
    encoded_tgt_with_sos = encode([SOS_TOKEN] + list(s[::-1]))
    decoded_src = decode(encoded_src)
    decoded_tgt_with_sos = decode(encoded_tgt_with_sos)
    print(f"\nExample {i+1}:")
    print(f"  Original (src): '{s}'")
    print(f"  Encoded (src): {encoded_src.tolist()}")
    print(f"  Decoded (src): '{decoded_src}'")
    print(f"  Reversed with SOS (expected target): '{SOS_TOKEN}{s[::-1]}'")
    print(f"  Encoded (target): {encoded_tgt_with_sos.tolist()}")
    print(f"  Decoded (target, SOS/PAD dropped): '{decoded_tgt_with_sos}'")

# Show the shapes of the input and target tensors
print("\nShapes:")
print(f"  inputs:  {inputs.shape}")
print(f"  targets: {targets.shape}")

# Show the first batch from the DataLoader
for xb, yb in train_dl:
    print("\nFirst training batch:")
    print("  Inputs (xb):", xb.shape)
    print("  Targets (yb):", yb.shape)
    print("  Decoded input example:", decode(xb[0]))
    print("  Decoded target example:", decode(yb[0]))
    break

# --- Seq2Seq model (redefinition) ---
# The Encoder, Decoder and Seq2Seq classes are defined again here
# so that the full model context is available in this cell.
class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_size, hidden_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_size)
        self.gru = nn.GRU(emb_size, hidden_size, batch_first=True)

    def forward(self, x):
        x = self.embed(x)
        _, h = self.gru(x)
        return h  # [1, B, H]

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_size, hidden_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_size)
        self.gru = nn.GRU(emb_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, h):
        """
        x: decoder input tokens, i.e. the correct preceding tokens (teacher forcing), shape [B, T]
        h: hidden state from the encoder or from the previous decoding step, shape [1, B, H]
        """
        x = self.embed(x)
        out, h = self.gru(x, h)
        logits = self.fc(out)
        return logits, h  # also return the updated hidden state so decoding can continue from it

class Seq2Seq(nn.Module):
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, src, tgt):
        h = self.encoder(src)
        # Teacher forcing: feed tgt[:, :-1] (SOS + all but the last token) to predict tgt[:, 1:]
        logits, _ = self.decoder(tgt[:, :-1], h)
        return logits

# --- decode_step and predict (redefinition) ---
def decode_step(decoder, token, h):
    logits, h = decoder(token, h)  # get logits and the updated hidden state
    next_token = logits[:, -1, :].argmax(-1, keepdim=True)  # greedy choice at the last position, shape [B, 1]
    return next_token, h

def predict(model, seq, max_len_output):  # max_len_output should be the original length of seq
    model.eval()
    with torch.no_grad():
        src = pad_src(seq).unsqueeze(0).to(device, dtype=torch.long)
        h = model.encoder(src)

        # The decoder now starts from SOS_TOKEN
        token = torch.tensor([[vocab[SOS_TOKEN]]], dtype=torch.long, device=device)
        seq_invertida = []

        for _ in range(max_len_output):  # Generate the expected number of characters (SOS excluded)
            token, h = decode_step(model.decoder, token, h)
            # Stop early if the model emits a padding token
            if inv_vocab[token.item()] == PAD_TOKEN:
                break
            seq_invertida.append(token.item())

        return decode(seq_invertida)

# Re-create the model and optimizer with the new vocab_size
emb_size = 32
hidden_size = 64
encoder = Encoder(vocab_size, emb_size, hidden_size)
decoder = Decoder(vocab_size, emb_size, hidden_size)
model = Seq2Seq(encoder, decoder).to(device)

# Same loss as before, ignoring PAD_TOKEN (the ' ' character)
loss_fn = nn.CrossEntropyLoss(ignore_index=vocab[PAD_TOKEN])
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

print(f"Model initialized with updated vocab_size: {vocab_size} and moved to device: {device}")
print(f"Model architecture:\n{model}")

# Training run (with debug output)
print("\nStarting training...")
for epoch in range(20):
    model.train()
    total_loss = 0
    for xb, yb in train_dl:
        xb, yb = xb.to(device, dtype=torch.long), yb.to(device, dtype=torch.long)
        opt.zero_grad()
        logits = model(xb, yb)  # yb is now [B, max_len_tgt], with SOS prepended

        # loss_fn expects logits of shape [N, C] and targets of shape [N]
        # logits is [B, max_len_tgt-1, vocab_size] -> reshape to [B*(max_len_tgt-1), vocab_size]
        # yb[:, 1:] is the real target (without SOS) -> reshape to [B*(max_len_tgt-1)]
        loss = loss_fn(logits.reshape(-1, vocab_size), yb[:, 1:].reshape(-1))
        loss.backward()
        opt.step()
        total_loss += loss.item()

    avg_loss = total_loss/len(train_dl)
    print(f"Epoch {epoch+1}: loss={avg_loss:.4f}")

    # Print a debug example every 5 epochs
    if (epoch + 1) % 5 == 0:
        print(f"--- Debug at epoch {epoch+1} ---")
        # Generate a fresh random sequence
        s_debug = random_seq()
        seq_len_debug = len(s_debug)
        # Expected reversed sequence
        expected_reversed_seq_debug = s_debug[::-1]
        # Model prediction
        predicted_reversed_seq_debug = predict(model, s_debug, max_len_output=seq_len_debug)

        print(f"  Original: '{s_debug}'")
        print(f"  Previsto: '{predicted_reversed_seq_debug}'")
        print(f"  Esperado: '{expected_reversed_seq_debug}'")
        print("-------------------------")


## Finish Task

### Subtask:
Summarize the new accuracy, analyze the improvements, and discuss the next steps.

## Summary:

### Q&A
1.  **How did the debug outputs confirm model learning?**
    The debug outputs printed every 5 epochs showed that the model's predictions (`Previsto`) quickly aligned with the `Esperado` (expected) reversed sequences, even for randomly generated input strings. By epoch 5, the model was already making perfect predictions for the displayed examples, indicating successful learning.

2.  **What was the impact of the changes made (SOS token and debug prints)?**
    The introduction of the SOS token and the corrected data handling (from the previous subtask) led to the model successfully learning the sequence inversion task, achieving 100% accuracy on test examples. The debug prints confirmed this learning process visually throughout the training, showing perfect predictions for sampled sequences early in training.

3.  **What are the next steps?**
    Given the model achieved 100% accuracy on this specific task, the next steps could involve exploring more complex sequence-to-sequence problems, such as:
    *   Inverting longer sequences.
    *   Handling a larger and more varied vocabulary.
    *   Implementing an EOS (End-Of-Sequence) token for variable-length output generation.
    *   Exploring more advanced architectures (e.g., Transformers).
    *   Applying the principles to real-world NLP tasks like machine translation or text summarization.
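An EOS token would change how generation stops. Instead of `predict` running for a caller-supplied `max_len_output`, the decoder would loop until it emits an end token or hits a safety cap. The following is a framework-agnostic sketch. `greedy_decode`, `step_fn`, `EOS_ID` (7, a hypothetical index appended after the current SOS=5 and PAD=6), and the toy step function are all illustrative stand-ins for `decode_step` on a model trained with an EOS token appended to each target.

```python
from typing import Callable, List, Tuple, TypeVar

State = TypeVar("State")

def greedy_decode(
    step_fn: Callable[[int, State], Tuple[int, State]],
    state: State,
    sos_id: int,
    eos_id: int,
    max_steps: int = 50,
) -> List[int]:
    """Greedy decoding that stops at EOS instead of after a fixed number of steps.

    step_fn(token, state) -> (next_token, new_state) plays the role of decode_step.
    """
    token = sos_id
    output: List[int] = []
    for _ in range(max_steps):  # safety cap in case EOS is never produced
        token, state = step_fn(token, state)
        if token == eos_id:
            break
        output.append(token)
    return output

# Hypothetical indices: SOS as in the current vocab, EOS as a new token after PAD.
SOS_ID, EOS_ID = 5, 7

def toy_step(token: int, remaining: Tuple[int, ...]) -> Tuple[int, Tuple[int, ...]]:
    """Toy 'model': the state is the rest of the reversed input; emit EOS when empty."""
    if not remaining:
        return EOS_ID, remaining
    return remaining[0], remaining[1:]

src = [0, 1, 2]  # "abc"
print(greedy_decode(toy_step, tuple(reversed(src)), SOS_ID, EOS_ID))  # [2, 1, 0]
```

Because the output length is now decided by the model, the same loop handles inputs of any length, as long as the training targets end with EOS.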

### Data Analysis Key Findings
*   The training loss decreased consistently and rapidly, reaching near-zero values by the end of the 20 epochs.
*   The debug examples shown during training consistently matched the expected reversed sequences, demonstrating that the model effectively learned the underlying mapping.
*   This confirms that the previous architectural and data preparation fixes (especially regarding the SOS token) were critical and successful.
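The role of the SOS token shows up in how `Seq2Seq.forward` and the loss line slice the target. For a target row `[SOS, e, d, c, b, a]`, the decoder input is `tgt[:, :-1]` and the labels are `yb[:, 1:]`. Each position is therefore trained to predict the next token, and the first label `e` is predicted from SOS alone. The plain-Python sketch below shows that shift, with lists standing in for tensor rows; `teacher_forcing_pairs` is a hypothetical helper, not part of the notebook.

```python
SOS_TOKEN = "<S>"

def teacher_forcing_pairs(reversed_seq):
    """Split an SOS-prefixed target into decoder inputs and next-token labels."""
    target = [SOS_TOKEN] + list(reversed_seq)  # like one row of `targets`
    decoder_input = target[:-1]                # mirrors tgt[:, :-1] in Seq2Seq.forward
    labels = target[1:]                        # mirrors yb[:, 1:] in the loss
    return decoder_input, labels

inp, lab = teacher_forcing_pairs("edcba")
for x, y in zip(inp, lab):
    print(f"input {x!r:>5} -> label {y!r}")
```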

### Insights or Next Steps
*   The visual debugging during training is a powerful tool to understand model behavior and quickly identify if the model is learning as expected.
*   The current model is highly effective for simple, fixed-length sequence inversion. Its simplicity and effectiveness on this toy problem make it a good foundation for understanding more complex seq2seq models.


## Finish Task

### Subtask:
Summarize the results observed during training with debug output and discuss the next steps, if any.


## Summary:

### Q&A
1.  **How did the debug outputs confirm model learning?**
    The debug outputs, printed every 5 epochs, showed that the model's predictions (`Previsto`) quickly and accurately aligned with the `Esperado` (expected) reversed sequences for randomly generated input strings. By epoch 5, the model was already making perfect predictions for the displayed examples, indicating successful and rapid learning.

2.  **What was the impact of the changes made (SOS token and debug prints)?**
    The `SOS_TOKEN` (introduced in a previous step) was crucial for enabling the model to successfully learn the sequence inversion task. The debug prints visually confirmed this learning process by demonstrating perfect predictions for sampled sequences as early as epoch 5.

3.  **What are the next steps?**
    Given the model achieved 100% accuracy on this specific task, future work could involve exploring more complex sequence-to-sequence problems, such as:
    *   Inverting longer sequences.
    *   Handling a larger and more varied vocabulary.
    *   Implementing an EOS (End-Of-Sequence) token for variable-length output generation.
    *   Exploring more advanced architectures (e.g., Transformers).
    *   Applying these principles to real-world NLP tasks like machine translation or text summarization.

### Data Analysis Key Findings
*   The training loss consistently and rapidly decreased, reaching near-zero values by the end of the 20 epochs.
*   The debug examples shown during training (every 5 epochs) consistently matched the expected reversed sequences, demonstrating that the model effectively learned the underlying mapping and achieved perfect predictions for sampled inputs by epoch 5.
*   These results confirm that the previously implemented architectural and data preparation fixes, especially the inclusion of the Start-Of-Sequence (`SOS_TOKEN`), were critical and successful for the model to learn the task.

### Insights or Next Steps
*   Visual debugging during training, by printing examples of predictions, is a powerful tool to understand model behavior and quickly verify if the model is learning as expected.
*   The current model is highly effective for simple, fixed-length sequence inversion. Its success on this toy problem provides a strong foundation for understanding and tackling more complex sequence-to-sequence problems.
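A single debug example every 5 epochs can look perfect by chance. One cheap complement, sketched here as an assumption rather than as part of the original notebook, is to report exact-match and per-character accuracy over a small batch of (predicted, expected) pairs, such as those returned by `predict` at each debug step. `debug_metrics` is a hypothetical helper.

```python
def debug_metrics(pairs):
    """pairs: list of (predicted, expected) strings. Returns (exact_match, char_acc)."""
    if not pairs:
        return 0.0, 0.0
    exact = sum(p == e for p, e in pairs) / len(pairs)
    # Position-by-position comparison: characters missing from a short prediction
    # count as errors; extra characters beyond the expected length are ignored.
    correct_chars = sum(sum(pc == ec for pc, ec in zip(p, e)) for p, e in pairs)
    total_chars = sum(len(e) for _, e in pairs)
    char_acc = correct_chars / total_chars if total_chars else 0.0
    return exact, char_acc

sample = [("edcba", "edcba"), ("edcab", "edcba"), ("edc", "edcba")]
exact, char_acc = debug_metrics(sample)
print(f"exact match: {exact:.2%}, char accuracy: {char_acc:.2%}")
```

In the training loop, this could replace the single `s_debug` example with, for instance, 20 random sequences passed through `predict`.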
