## Seq2Seq baseline for integer addition

We train a baseline Seq2Seq model (without attention) on a toy integer-addition dataset. It will serve as the reference point for comparing attention variants later.

### Loading and generating the toy dataset

In [11]:
import os
import sys
sys.path.append(os.path.abspath(".."))
from src.utils import generate_toy_data, load_toy_data, save_toy_data

In [12]:
PROJECT_ROOT = os.path.abspath(os.path.join(os.getcwd(), ".."))
DATA_PATH = os.path.join(PROJECT_ROOT, "data", "toy_seq2seq.json")

if os.path.exists(DATA_PATH):
    train_data, val_data = load_toy_data(DATA_PATH)
    print(f"Toy data loaded (train={len(train_data)}, val={len(val_data)})")
else:
    train_data, val_data = generate_toy_data()
    os.makedirs(os.path.dirname(DATA_PATH), exist_ok=True)
    save_toy_data(train_data, val_data, DATA_PATH)
    print(f"Toy data generated and saved (train={len(train_data)}, val={len(val_data)})")

Toy data loaded (train=8000, val=2000)


In [13]:
print(f"Train data: {train_data[:5]}")
print(f"Val data: {val_data[:5]}")

Train data: [('33+91', '124'), ('88+91', '179'), ('24+32', '56'), ('61+41', '102'), ('63+82', '145')]
Val data: [('91+78', '169'), ('6+22', '28'), ('21+80', '101'), ('35+80', '115'), ('24+7', '31')]
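The samples above are `(source, target)` string pairs of the form `'a+b'` and the decimal sum. The implementation of `generate_toy_data` is not shown here, so the following is only a minimal sketch of how such pairs could be produced. The function name `sketch_generate_toy_data`, the operand range 0 to 99, the 80/20 split, and the seed are all assumptions chosen to match the sizes and examples printed above.

```python
import random

def sketch_generate_toy_data(n_samples=10000, max_operand=99, val_fraction=0.2, seed=0):
    """Hypothetical stand-in for generate_toy_data (not the project's code).

    Returns (train, val) lists of ('a+b', 'sum') string pairs.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_samples):
        a = rng.randint(0, max_operand)  # inclusive on both ends
        b = rng.randint(0, max_operand)
        pairs.append((f"{a}+{b}", str(a + b)))
    n_val = round(n_samples * val_fraction)
    # The first n_val pairs become validation, the rest training.
    return pairs[n_val:], pairs[:n_val]

train_sketch, val_sketch = sketch_generate_toy_data()
print(len(train_sketch), len(val_sketch))
print(train_sketch[:3])
```

In this sketch the pairs are sampled with replacement, so the same expression can land in both splits. Whether the real generator deduplicates is not stated in this notebook.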


### Training and evaluation

In [52]:
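# Reload the module so that edits to src/seq2seq_baseline.py take effect
# without restarting the kernel.
import importlib
import src.seq2seq_baseline
importlib.reload(src.seq2seq_baseline)
from src.seq2seq_baseline import Seq2SeqBaseline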

In [53]:
from src.utils import CharTokenizer  # Seq2SeqBaseline was imported in the previous cell

tokenizer = CharTokenizer()
print(f"Vocab size: {tokenizer.vocab_size}, pad: {tokenizer.pad_token_id}, sos: {tokenizer.sos_token_id}, eos: {tokenizer.eos_token_id}")

Vocab size: 14, pad: 0, sos: 1, eos: 2
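A vocabulary of 14 is consistent with three special tokens (pad, sos, eos) plus the ten digits and the `+` sign. The sketch below shows one way a character-level tokenizer with that layout could look. `SketchCharTokenizer`, its `encode`/`decode` methods, and the ordering of the characters after the special tokens are assumptions for illustration; the actual `CharTokenizer` in `src.utils` may differ.

```python
class SketchCharTokenizer:
    """Hypothetical character-level tokenizer (not the project's CharTokenizer)."""

    def __init__(self, chars="0123456789+"):
        specials = ["<pad>", "<sos>", "<eos>"]  # ids 0, 1, 2
        self.itos = specials + list(chars)
        self.stoi = {tok: i for i, tok in enumerate(self.itos)}
        self.pad_token_id, self.sos_token_id, self.eos_token_id = 0, 1, 2

    @property
    def vocab_size(self):
        return len(self.itos)

    def encode(self, text):
        # Wrap the character ids in start-of-sequence / end-of-sequence markers.
        return [self.sos_token_id] + [self.stoi[c] for c in text] + [self.eos_token_id]

    def decode(self, ids):
        # Skip the special tokens (ids 0, 1, 2) when rebuilding the string.
        return "".join(self.itos[i] for i in ids if i > self.eos_token_id)

tok = SketchCharTokenizer()
ids = tok.encode("33+91")
print(tok.vocab_size, ids, tok.decode(ids))
```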


In [54]:
model = Seq2SeqBaseline(tokenizer, emb_size=32, hidden_size=64, lr=1e-3)
print(f"Model on device: {model.device}")

Model on device: cpu


In [58]:
# Training
model.train(train_data, epochs=30, batch_size=32)

Epoch 1/30  Loss: 0.3101
Epoch 2/30  Loss: 0.3061
Epoch 3/30  Loss: 0.2991
Epoch 4/30  Loss: 0.2889
Epoch 5/30  Loss: 0.2818
Epoch 6/30  Loss: 0.2874
Epoch 7/30  Loss: 0.2694
Epoch 8/30  Loss: 0.2625
Epoch 9/30  Loss: 0.2721
Epoch 10/30  Loss: 0.2581
Epoch 11/30  Loss: 0.2566
Epoch 12/30  Loss: 0.2615
Epoch 13/30  Loss: 0.2531
Epoch 14/30  Loss: 0.2544
Epoch 15/30  Loss: 0.2303
Epoch 16/30  Loss: 0.2502
Epoch 17/30  Loss: 0.2350
Epoch 18/30  Loss: 0.2366
Epoch 19/30  Loss: 0.2184
Epoch 20/30  Loss: 0.2197
Epoch 21/30  Loss: 0.2100
Epoch 22/30  Loss: 0.2255
Epoch 23/30  Loss: 0.2319
Epoch 24/30  Loss: 0.2113
Epoch 25/30  Loss: 0.2215
Epoch 26/30  Loss: 0.2057
Epoch 27/30  Loss: 0.2082
Epoch 28/30  Loss: 0.1997
Epoch 29/30  Loss: 0.2060
Epoch 30/30  Loss: 0.2025
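Inputs such as `'6+22'` and `'91+78'` have different lengths, so a minibatch of 32 sequences generally has to be padded to a common length before it can be processed together. How `model.train` handles this internally is not shown in this notebook. The following is a minimal sketch, assuming right-padding with the pad id 0 reported by the tokenizer; the helper name `pad_batch` and the token ids in the example are illustrative only.

```python
def pad_batch(sequences, pad_id=0):
    """Right-pad every sequence to the length of the longest one.

    Returns the padded batch and a boolean mask that is True on real tokens,
    which a loss function could use to ignore the padded positions.
    """
    max_len = max(len(seq) for seq in sequences)
    padded = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    mask = [[tok != pad_id for tok in seq] for seq in padded]
    return padded, mask

# Illustrative token ids for two inputs of different length.
batch = [[1, 6, 6, 13, 12, 4, 2], [1, 9, 13, 5, 5, 2]]
padded, mask = pad_batch(batch)
print(padded)
print(mask[1])
```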


In [59]:
# Evaluation
model.evaluate(val_data)

Validation accuracy: 0.5115
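If the reported accuracy is an exact-match score over whole output strings (an assumption, since the body of `model.evaluate` is not shown), then 0.5115 on 2000 validation pairs corresponds to 1023 fully correct sums. A minimal sketch of that metric, with the hypothetical helper name `exact_match_accuracy` and made-up predictions:

```python
def exact_match_accuracy(predictions, targets):
    """Fraction of predictions that equal their target string exactly."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    correct = sum(pred == tgt for pred, tgt in zip(predictions, targets))
    return correct / len(targets)

# Made-up predictions for illustration; one of the four sums is wrong.
preds = ["169", "28", "111", "115"]
golds = ["169", "28", "101", "115"]
print(exact_match_accuracy(preds, golds))
```

Under this metric a single wrong digit makes the whole example count as an error, which is stricter than a per-character accuracy.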
