# Implementing Transformers for Natural Language Processing (NLP)


### Objective
In this assignment we implement a Transformer-based model for a natural language processing (NLP) task using the **DailyDialog** dataset. DailyDialog contains multi-turn conversations about everyday topics, which makes it suitable for training a model on response generation and on using conversational context.

We use TensorFlow to build a basic Transformer with the following components:
- **Encoder-decoder**: the encoder processes the input utterance and the decoder generates the output sequence token by token.
- **Multi-head attention**: to capture long-range dependencies within the dialogue.

Finally, we evaluate the model with NLP-specific metrics such as BLEU and ROUGE.


## 1. Loading and Exploring the Dataset: DailyDialog

In [None]:
!wget https://raw.githubusercontent.com/JaznaLaProfe/Deep-Learning/main/data/dialog/train.csv
!wget https://raw.githubusercontent.com/JaznaLaProfe/Deep-Learning/main/data/dialog/test.csv
!wget https://raw.githubusercontent.com/JaznaLaProfe/Deep-Learning/main/data/dialog/validation.csv

--2025-06-23 08:22:21--  https://raw.githubusercontent.com/JaznaLaProfe/Deep-Learning/main/data/dialog/train.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6233555 (5.9M) [text/plain]
Saving to: ‘train.csv.9’


2025-06-23 08:22:21 (104 MB/s) - ‘train.csv.9’ saved [6233555/6233555]

--2025-06-23 08:22:21--  https://raw.githubusercontent.com/JaznaLaProfe/Deep-Learning/main/data/dialog/test.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 561656 (548K) [text/plain]
Saving to: ‘test.csv.9’


2025-06-23 08:22:21 (14.2 MB/s) - ‘

In [None]:
import pandas as pd
import numpy as np
import tensorflow as tf
import ast
import re
import unicodedata
import keras_nlp
import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import text_to_word_sequence
from keras_nlp.tokenizers import BytePairTokenizer
from nltk.translate.bleu_score import sentence_bleu
from nltk.tokenize import sent_tokenize  # More robust than a simple split
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import TextVectorization


In [None]:
print(keras_nlp.__version__)

0.18.1


In [None]:
%pip install rouge-score



In [None]:
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
# To use ROUGE, install it first: pip install rouge-score
try:
    from rouge_score import rouge_scorer
    rouge_available = True
except ImportError:
    rouge_available = False

In [None]:
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')
validation = pd.read_csv('validation.csv')

In [None]:
train

Unnamed: 0,dialog,act,emotion
0,"['Say , Jim , how about going for a few beers ...",[3 4 2 2 2 3 4 1 3 4],[0 0 0 0 0 0 4 4 4 4]
1,"['Can you do push-ups ? '\n "" Of course I can ...",[2 1 2 2 1 1],[0 0 6 0 0 0]
2,"['Can you study with the radio on ? '\n ' No ,...",[2 1 2 1 1],[0 0 0 0 0]
3,['Are you all right ? '\n ' I will be all righ...,[2 1 1 1],[0 0 0 0]
4,"['Hey John , nice skates . Are they new ? '\n ...",[2 1 2 1 1 2 1 3 4],[0 0 0 0 0 6 0 6 0]
...,...,...,...
11113,"['Hello , I bought a pen in your shop just bef...",[1 1 1 2 3 2 1 4 1],[0 4 0 0 0 0 0 0 4]
11114,['Do you have any seats available ? ' ' Yes . ...,[2 1 2 1 3 4],[0 0 0 0 0 4]
11115,"['Uncle Ben , how did the Forbidden City get t...",[2 1 2 1 1 1 1 1 2 1 2 1 2 1 3 4],[0 0 6 0 6 0 0 0 0 0 0 0 0 0 4 0]
11116,"['May I help you , sir ? ' ' I want a pair of ...",[2 3 4 3],[0 0 0 0]


In [None]:
train['dialog'][0]

'[\'Say , Jim , how about going for a few beers after dinner ? \'\n \' You know that is tempting but is really not good for our fitness . \'\n \' What do you mean ? It will help us to relax . \'\n " Do you really think so ? I don\'t . It will just make us fat and act silly . Remember last time ? "\n " I guess you are right.But what shall we do ? I don\'t feel like sitting at home . "\n \' I suggest a walk over to the gym where we can play singsong and meet some of our friends . \'\n " That\'s a good idea . I hear Mary and Sally often go there to play pingpong.Perhaps we can make a foursome with them . "\n \' Sounds great to me ! If they are willing , we could ask them to go dancing with us.That is excellent exercise and fun , too . \'\n " Good.Let \' s go now . " \' All right . \']'

### 1. Normalization

In [None]:
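def normalize_text(text):
    # Lowercase, then strip accents: NFKD splits accented letters into base
    # letter + combining mark, and the ASCII round-trip drops the marks
    text = text.lower()
    text = unicodedata.normalize('NFKD', text).encode('ascii', 'ignore').decode('utf-8')
    # Surround punctuation with spaces so each mark becomes its own token
    text = re.sub(r'([.,;?!])', r' \1 ', text)
    # Drop everything else (quotes, apostrophes, brackets, newlines); the text is
    # already lowercase ASCII here, so accented or uppercase ranges are not needed
    text = re.sub(r'[^a-z0-9 .,;?!]', '', text)
    # Collapse runs of spaces
    text = re.sub(r'\s+', ' ', text)
    return text.strip()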
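
To make the effect of each step concrete, the sketch below reruns the same normalization on two made-up sentences (the sample strings are assumptions, not taken from DailyDialog). It shows that accents are stripped before the character filter runs, and that apostrophes are removed, which is why contractions such as "don't" appear as "dont" in the outputs.

```python
import re
import unicodedata

def normalize_text(text):
    # Same steps as the notebook's function, repeated here so the example runs on its own
    text = text.lower()
    text = unicodedata.normalize('NFKD', text).encode('ascii', 'ignore').decode('utf-8')
    text = re.sub(r'([.,;?!])', r' \1 ', text)
    text = re.sub(r'[^a-z0-9 .,;?!]', '', text)
    text = re.sub(r'\s+', ' ', text)
    return text.strip()

# Hypothetical inputs chosen to exercise accents, apostrophes and glued punctuation
sample_a = normalize_text("I don't know, señor!")
sample_b = normalize_text("Let's go.Now")
print(sample_a)  # i dont know , senor !
print(sample_b)  # lets go . now
```

If contractions matter for the task, one option would be to keep the apostrophe in the allowed character set, at the cost of a larger vocabulary.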

#### Train

In [None]:
texto_norm_train = [normalize_text(a) for a in train['dialog']]

In [None]:
texto_norm_train[0]

'say , jim , how about going for a few beers after dinner ? you know that is tempting but is really not good for our fitness . what do you mean ? it will help us to relax . do you really think so ? i dont . it will just make us fat and act silly . remember last time ? i guess you are right . but what shall we do ? i dont feel like sitting at home . i suggest a walk over to the gym where we can play singsong and meet some of our friends . thats a good idea . i hear mary and sally often go there to play pingpong . perhaps we can make a foursome with them . sounds great to me ! if they are willing , we could ask them to go dancing with us . that is excellent exercise and fun , too . good . let s go now . all right .'

In [None]:
texto_norm_train

['say , jim , how about going for a few beers after dinner ? you know that is tempting but is really not good for our fitness . what do you mean ? it will help us to relax . do you really think so ? i dont . it will just make us fat and act silly . remember last time ? i guess you are right . but what shall we do ? i dont feel like sitting at home . i suggest a walk over to the gym where we can play singsong and meet some of our friends . thats a good idea . i hear mary and sally often go there to play pingpong . perhaps we can make a foursome with them . sounds great to me ! if they are willing , we could ask them to go dancing with us . that is excellent exercise and fun , too . good . let s go now . all right .',
 'can you do pushups ? of course i can . its a piece of cake ! believe it or not , i can do 30 pushups a minute . really ? i think thats impossible ! you mean 30 pushups ? yeah ! its easy . if you do exercise everyday , you can make it , too .',
 'can you study with the rad

#### Validation

In [None]:
texto_norm_val = [normalize_text(a) for a in validation['dialog']]

In [None]:
texto_norm_val[0]

'good morning , sir . is there a bank near here ? there is one . 5 blocks away from here ? well , thats too far . can you change some money for me ? surely , of course . what kind of currency have you got ? rib . how much would you like to change ? 1000 yuan . here you are .'

#### Test

In [None]:
texto_norm_test = [normalize_text(a) for a in test['dialog']]

In [None]:
texto_norm_test[0]

'hey man , you wanna buy some weed ? some what ? weed ! you know ? pot , ganja , mary jane some chronic ! oh , umm , no thanks . i also have blow if you prefer to do a few lines . no , i am ok , really . come on man ! i even got dope and acid ! try some ! do you really have all of these drugs ? where do you get them from ? i got my connections ! just tell me what you want and i ll even give you one ounce for free . sounds good ! let s see , i want . yeah ? i want you to put your hands behind your head ! you are under arrest !'

### Grouping into pairs

In [None]:
def dialog_to_pairs(dialogs):
    # Build (input, target) pairs from consecutive turns of each dialog
    inputs = []
    targets = []
    for dialog in dialogs:
        # Turns are split on newlines only; turns separated by a plain space in
        # the raw string (as at the end of train['dialog'][0]) stay merged
        turns = [t.strip() for t in dialog.split('\n') if t.strip()]
        for i in range(len(turns) - 1):
            inputs.append(turns[i])
            targets.append(turns[i + 1])
    return inputs, targets

# Use the original column, not the normalized one
train_inputs_text, train_targets_text = dialog_to_pairs(train['dialog'])
val_inputs_text, val_targets_text = dialog_to_pairs(validation['dialog'])
test_inputs_text, test_targets_text = dialog_to_pairs(test['dialog'])

# Normalize after splitting into pairs
train_inputs_text = [normalize_text(t) for t in train_inputs_text]
train_targets_text = [normalize_text(t) for t in train_targets_text]
val_inputs_text = [normalize_text(t) for t in val_inputs_text]
val_targets_text = [normalize_text(t) for t in val_targets_text]
test_inputs_text = [normalize_text(t) for t in test_inputs_text]
test_targets_text = [normalize_text(t) for t in test_targets_text]

print(len(train_inputs_text), len(train_targets_text))
print(train_inputs_text[:3])
print(train_targets_text[:3])

64998 64998
['say , jim , how about going for a few beers after dinner ?', 'you know that is tempting but is really not good for our fitness .', 'what do you mean ? it will help us to relax .']
['you know that is tempting but is really not good for our fitness .', 'what do you mean ? it will help us to relax .', 'do you really think so ? i dont . it will just make us fat and act silly . remember last time ?']
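
The raw `dialog` strings printed earlier do not always place a newline between turns (the first training dialog ends with `" Good.Let ' s go now . " ' All right . '` on one line), so splitting on `'\n'` alone can leave two turns merged. A possible alternative, sketched under the assumption that every turn is wrapped in matching single or double quotes, is to extract each quoted span with a regular expression. `split_turns` is a hypothetical helper, not part of the notebook.

```python
import re

# One alternative per quote style; backslash-escaped characters are allowed inside
_TURN_RE = re.compile(r"'((?:[^'\\]|\\.)*)'|\"((?:[^\"\\]|\\.)*)\"")

def split_turns(raw_dialog):
    """Return the quoted turns found in a stringified list of utterances."""
    turns = []
    for single, double in _TURN_RE.findall(raw_dialog):
        turn = (single or double).strip()
        if turn:
            turns.append(turn)
    return turns

# Made-up string imitating the raw format: newline-separated and space-separated turns
raw = "[' Hi . '\n \" I don't know . \" ' All right . ']"
turns = split_turns(raw)
print(turns)
```

Using such a splitter in place of `dialog.split('\n')` would change the number of pairs, so the count printed above would no longer apply.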


### Tokenization

In [None]:
vectorizer = TextVectorization(
    max_tokens=8000,
    output_sequence_length=128,
    standardize=None  # Normalization was already applied above
)

In [None]:
vectorizer.adapt(train_inputs_text + train_targets_text)
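
With `TextVectorization` in integer mode, index 0 is reserved for padding and index 1 for out-of-vocabulary tokens, and the remaining tokens are ordered by frequency. Start and end markers only get IDs of their own if they appear in the adapted text. The sketch below is a pure-Python imitation of that ordering (not the Keras implementation); the marker strings `[start]` and `[end]` and the helpers `add_markers` and `build_vocab` are assumptions introduced for illustration.

```python
from collections import Counter

PAD, UNK, START, END = "", "[UNK]", "[start]", "[end]"

def add_markers(texts):
    # Wrap each target sentence so the decoder can learn where replies begin and end
    return [f"{START} {t} {END}" for t in texts]

def build_vocab(texts, max_tokens=8000):
    # Frequency-ordered vocabulary with two reserved slots, mimicking the layout above
    counts = Counter(tok for t in texts for tok in t.split())
    most_common = [tok for tok, _ in counts.most_common(max_tokens - 2)]
    return [PAD, UNK] + most_common

targets = ["i am fine .", "thanks ."]

plain_vocab = build_vocab(targets)
marked_vocab = build_vocab(add_markers(targets))

# Without markers, ID 2 is just the most frequent token of the corpus
print(plain_vocab[2])
# With markers, dedicated IDs exist and differ from the reserved 0 and 1
bos_id = marked_vocab.index(START)
eos_id = marked_vocab.index(END)
print(bos_id, eos_id)
```

With the real layer, and because `standardize=None` leaves the brackets intact, the marker IDs could be looked up after `adapt` with `vectorizer.get_vocabulary().index("[start]")`, provided the markers were added to the target texts before adapting.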


In [None]:
train_inputs = vectorizer(train_inputs_text).numpy()
train_targets = vectorizer(train_targets_text).numpy()
val_inputs = vectorizer(val_inputs_text).numpy()
val_targets = vectorizer(val_targets_text).numpy()

In [None]:
test_inputs = vectorizer(test_inputs_text).numpy()
test_targets = vectorizer(test_targets_text).numpy()

In [None]:
# Tokenization example: a Spanish sentence is almost entirely out of vocabulary
# for this English corpus, so most words map to ID 1 ([UNK])
ejemplo = "habia una vez un señor llamado pedro y se fue corriendo"
encoded = vectorizer([ejemplo])
print(f"Original text: {ejemplo}")
print(f"Token IDs: {encoded.numpy()}")

Original text: habia una vez un señor llamado pedro y se fue corriendo
Token IDs: [[   1    1    1    1    1    1    1 6035    1    1    1    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0]]


### Attention masks

In [None]:
##### THIS STEP FAILS BECAUSE IT EXCEEDS GPU MEMORY #####
# (likely cause: the tf.repeat at the end materializes one (L, L) mask per training
#  example, i.e. 64998 x 128 x 128 float32 values, roughly 4.3 GB)

# def create_padding_mask(inputs):
#     return tf.cast(inputs != 0, tf.float32)  # 0 is the padding ID in TextVectorization

# def create_look_ahead_mask(seq_len):
#     return 1 - tf.linalg.band_part(tf.ones((seq_len, seq_len)), -1, 0)

# # Padding masks
# train_padding_mask = create_padding_mask(train_inputs)
# val_padding_mask = create_padding_mask(val_inputs)

# # Look-ahead mask (static when the sequence length is fixed)
# seq_len = train_inputs.shape[1]
# look_ahead_mask = create_look_ahead_mask(seq_len)
# look_ahead_mask = tf.expand_dims(look_ahead_mask, axis=0)  # shape (1, L, L)
# look_ahead_mask = tf.repeat(look_ahead_mask, repeats=train_inputs.shape[0], axis=0)
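
The look-ahead mask is identical for every example, so it does not need to be copied per row: a single `(1, L, L)` array can be broadcast against a per-example padding mask. The NumPy sketch below illustrates the shapes and the memory arithmetic; the helper names are hypothetical. It uses the convention that `True` means "may attend", which is the convention of the `attention_mask` argument of Keras `MultiHeadAttention`, whereas the commented-out `create_look_ahead_mask` marks future positions with 1.

```python
import numpy as np

def padding_mask(token_ids, pad_id=0):
    # True where there is a real token, shape (batch, seq_len)
    return np.asarray(token_ids) != pad_id

def causal_mask(seq_len):
    # Lower-triangular: position i may attend to positions 0..i
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

batch = np.array([[5, 6, 7, 0],
                  [8, 9, 0, 0]])

pad = padding_mask(batch)                          # (2, 4)
causal = causal_mask(batch.shape[1])[np.newaxis]   # (1, 4, 4), shared by all examples
combined = causal & pad[:, np.newaxis, :]          # broadcasts to (2, 4, 4)

# Memory for float32 masks with the notebook's sizes (64998 examples, L = 128)
full_bytes = 64998 * 128 * 128 * 4    # one (L, L) mask per example
shared_bytes = 128 * 128 * 4          # one mask shared through broadcasting
print(combined.shape, full_bytes, shared_bytes)
```

Building masks per batch inside the model (or relying on `use_causal_mask=True`, as the decoder below does) avoids holding the full tensor in memory.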

### Padding

In [None]:
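# The vectorizer already pads/truncates to 128 tokens, so these calls leave the
# arrays unchanged; the padded_* variables are not used in the cells shown below
padded_train_inputs = pad_sequences(train_inputs, padding='post', maxlen=128)
padded_train_targets = pad_sequences(train_targets, padding='post', maxlen=128)
padded_val_inputs = pad_sequences(val_inputs, padding='post', maxlen=128)
padded_val_targets = pad_sequences(val_targets, padding='post', maxlen=128)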

## 2. Implementing the Transformer Model

In [None]:
# def positional_encoding(seq_len, d_model):
#     pos = np.arange(seq_len)[:, np.newaxis]
#     i = np.arange(d_model)[np.newaxis, :]
#     angle_rates = 1 / np.power(10000, (2 * (i//2)) / np.float32(d_model))
#     angle_rads = pos * angle_rates

#     # apply sin to even indices, cos to odd indices
#     pos_encoding = np.zeros(angle_rads.shape)
#     pos_encoding[:, 0::2] = np.sin(angle_rads[:, 0::2])
#     pos_encoding[:, 1::2] = np.cos(angle_rads[:, 1::2])

#     pos_encoding = pos_encoding[np.newaxis, ...]
#     return tf.cast(pos_encoding, dtype=tf.float32)
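
The cell above is commented out, and the model cells that follow pass the embeddings to the encoder and decoder without adding a positional signal. Self-attention by itself does not distinguish token order, so a positional term is normally added to the embeddings. The sketch below is a runnable NumPy version of the same sinusoidal formula; returning a NumPy array instead of a TensorFlow tensor is a simplification made here.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding with shape (1, seq_len, d_model)."""
    pos = np.arange(seq_len)[:, np.newaxis]        # (seq_len, 1)
    i = np.arange(d_model)[np.newaxis, :]          # (1, d_model)
    # Each pair of dimensions (2k, 2k+1) shares the frequency 1 / 10000^(2k / d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (i // 2)) / np.float32(d_model))
    angle_rads = pos * angle_rates                 # (seq_len, d_model)

    pos_encoding = np.zeros(angle_rads.shape, dtype=np.float32)
    pos_encoding[:, 0::2] = np.sin(angle_rads[:, 0::2])   # even dimensions
    pos_encoding[:, 1::2] = np.cos(angle_rads[:, 1::2])   # odd dimensions
    return pos_encoding[np.newaxis, ...]

pe = positional_encoding(seq_len=128, d_model=128)
print(pe.shape)
```

One way to use it, assuming the variable names of the model cell below, would be to add `positional_encoding(sequence_length, embed_dim)` to `enc_emb` and `dec_emb` before they enter the encoder and decoder.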

### Encoder

In [None]:
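def transformer_encoder(embed_dim, num_heads, ff_dim):
    # Single encoder block operating on already-embedded inputs: (batch, seq_len, embed_dim)
    inputs = tf.keras.Input(shape=(None, embed_dim))
    # Self-attention; note that key_dim is the size of each head, here set to embed_dim
    attention = tf.keras.layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)(inputs, inputs)
    # Dropout rate is fixed at 0.4 here (the decoder defaults to 0.1)
    attention = tf.keras.layers.Dropout(0.4)(attention)
    # Residual connection + layer normalization
    attention = tf.keras.layers.LayerNormalization(epsilon=1e-6)(attention + inputs)
    # Position-wise feed-forward network, followed by a second residual + normalization
    ffn = tf.keras.layers.Dense(ff_dim, activation='relu')(attention)
    ffn = tf.keras.layers.Dense(embed_dim)(ffn)
    outputs = tf.keras.layers.LayerNormalization(epsilon=1e-6)(ffn + attention)
    return tf.keras.Model(inputs=inputs, outputs=outputs)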


### Decoder

In [None]:
def transformer_decoder(embed_dim, num_heads, ff_dim, dropout=0.1):
    # Single decoder block: masked self-attention, cross-attention, feed-forward
    target_seq = tf.keras.Input(shape=(None, embed_dim), name="target_seq")
    enc_output = tf.keras.Input(shape=(None, embed_dim), name="enc_output")
    # Masked self-attention: each position only sees earlier target positions
    attn1 = tf.keras.layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)(
        target_seq, target_seq, use_causal_mask=True)
    attn1 = tf.keras.layers.Dropout(dropout)(attn1)
    out1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)(attn1 + target_seq)
    # Cross-attention: queries from the decoder, keys/values from the encoder output
    attn2 = tf.keras.layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)(
        out1, enc_output)
    attn2 = tf.keras.layers.Dropout(dropout)(attn2)
    out2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)(attn2 + out1)
    # Position-wise feed-forward network with residual + normalization
    ff = tf.keras.layers.Dense(ff_dim, activation='relu')(out2)
    ff = tf.keras.layers.Dense(embed_dim)(ff)
    out3 = tf.keras.layers.LayerNormalization(epsilon=1e-6)(ff + out2)
    return tf.keras.Model(inputs=[target_seq, enc_output], outputs=out3)
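
Both blocks rely on `MultiHeadAttention`. As an illustration of what a single head computes, the sketch below implements scaled dot-product attention in NumPy, with an optional boolean mask where `True` means "may attend". It is a simplified stand-in (no learned projections, no multiple heads, no dropout), not the Keras layer.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v, mask=None):
    """softmax(q k^T / sqrt(d_k)) v, with disallowed positions pushed to -1e9."""
    d_k = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d_k)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                      # 4 positions, 8 features
causal = np.tril(np.ones((4, 4), dtype=bool))    # same idea as use_causal_mask=True

out, weights = scaled_dot_product_attention(x, x, x, mask=causal)
print(np.round(weights, 3))
```

With the causal mask, the first position can only attend to itself, so its output equals its own value vector; later positions mix information from all earlier ones.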

In [None]:
def create_decoder_io(sequences, bos_token_id=1, seq_len=128):
    # Build teacher-forcing inputs/targets: the decoder input is the target shifted
    # right by one position. Note: ID 1 is also the [UNK] index in TextVectorization.
    decoder_inputs = []
    decoder_targets = []
    for seq in sequences:
        seq = list(seq)
        # truncate or pad to seq_len-1
        seq = seq[:seq_len-1] + [0]*(seq_len-1-len(seq))
        dec_in = [bos_token_id] + seq  # [BOS] + first seq_len-1 tokens
        dec_in = dec_in[:seq_len]
        dec_tgt = seq + [0]            # same tokens + [PAD]
        dec_tgt = dec_tgt[:seq_len]
        decoder_inputs.append(dec_in)
        decoder_targets.append(dec_tgt)
    return np.array(decoder_inputs), np.array(decoder_targets)
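
The shift performed by `create_decoder_io` can be seen more easily on a tiny example. The sketch below is a vectorized NumPy equivalent for inputs that are already padded to a fixed length; `shift_for_teacher_forcing` is a hypothetical name and the token IDs are made up.

```python
import numpy as np

def shift_for_teacher_forcing(sequences, bos_token_id, pad_token_id=0):
    """Return (decoder_inputs, decoder_targets) for fixed-length padded sequences."""
    sequences = np.asarray(sequences)
    # Decoder input: BOS followed by the sequence without its last position
    decoder_inputs = np.full_like(sequences, pad_token_id)
    decoder_inputs[:, 0] = bos_token_id
    decoder_inputs[:, 1:] = sequences[:, :-1]
    # Decoder target: the sequence itself, with the last slot forced to padding
    decoder_targets = sequences.copy()
    decoder_targets[:, -1] = pad_token_id
    return decoder_inputs, decoder_targets

seqs = np.array([[5, 6, 7, 0, 0]])
dec_in, dec_tgt = shift_for_teacher_forcing(seqs, bos_token_id=1)
print(dec_in)   # [[1 5 6 7 0]]
print(dec_tgt)  # [[5 6 7 0 0]]
```

At step `t` the decoder sees `dec_in[:, :t + 1]` and is trained to predict `dec_tgt[:, t]`, so it never receives the token it is asked to produce.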

In [None]:
# Parameters
vocab_size = 8000
embed_dim = 128
num_heads = 8
ff_dim = 512
sequence_length = 128

# Shared embedding (used by both encoder and decoder)
embedding_layer = tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embed_dim, mask_zero=True)

# Inputs
encoder_inputs = tf.keras.Input(shape=(sequence_length,), name="encoder_inputs")
decoder_inputs = tf.keras.Input(shape=(sequence_length,), name="decoder_inputs")

# Embedding
enc_emb = embedding_layer(encoder_inputs)
dec_emb = embedding_layer(decoder_inputs)

# Encoder and decoder
encoder = transformer_encoder(embed_dim, num_heads, ff_dim)
decoder = transformer_decoder(embed_dim, num_heads, ff_dim)

enc_out = encoder(enc_emb)
dec_out = decoder([dec_emb, enc_out])

# Final output: one logit per vocabulary entry at each position
outputs = tf.keras.layers.Dense(vocab_size)(dec_out)



In [None]:
# Full model
model = tf.keras.Model(inputs=[encoder_inputs, decoder_inputs], outputs=outputs)
model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
)

In [None]:
decoder_inputs_train, decoder_targets_train = create_decoder_io(train_targets, seq_len=128)
decoder_inputs_val, decoder_targets_val = create_decoder_io(val_targets, seq_len=128)

In [None]:
batch_size = 32

train_dataset = tf.data.Dataset.from_tensor_slices((
    {
        'encoder_inputs': train_inputs,
        'decoder_inputs': decoder_inputs_train
    },
    decoder_targets_train
)).shuffle(1000).batch(batch_size).prefetch(tf.data.AUTOTUNE)

val_dataset = tf.data.Dataset.from_tensor_slices((
    {
        'encoder_inputs': val_inputs,
        'decoder_inputs': decoder_inputs_val
    },
    decoder_targets_val
)).batch(batch_size).prefetch(tf.data.AUTOTUNE)

## 3. Training the Model

In [None]:
early_stop = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

In [None]:
model.fit(train_dataset, validation_data=val_dataset, epochs=10, callbacks=[early_stop])

Epoch 1/10
2032/2032 ━━━━━━━━━━━━━━━━━━━━ 117s 51ms/step - accuracy: 0.8912 - loss: 0.9880 - val_accuracy: 0.9060 - val_loss: 0.5176
Epoch 2/10
2032/2032 ━━━━━━━━━━━━━━━━━━━━ 93s 46ms/step - accuracy: 0.9031 - loss: 0.5321 - val_accuracy: 0.9081 - val_loss: 0.4926
Epoch 3/10
2032/2032 ━━━━━━━━━━━━━━━━━━━━ 93s 46ms/step - accuracy: 0.9055 - loss: 0.4966 - val_accuracy: 0.9092 - val_loss: 0.4815
Epoch 4/10
2032/2032 ━━━━━━━━━━━━━━━━━━━━ 95s 47ms/step - accuracy: 0.9073 - loss: 0.4714 - val_accuracy: 0.9096 - val_loss: 0.4770
Epoch 5/10
2032/2032 ━━━━━━━━━━━━━━━━━━━━ 96s 47ms/step - accuracy: 0.9092 - loss: 0.4508 - val_accuracy: 0.9100 - val_loss: 0.4756
Epoch 6/10
2032/2032 ━━━━━━━━━━━━━━━━━━━━ 143s 48ms/step - accuracy: 0.9109 - loss: 0.4342 - val_accuracy: 0.9103 - val_loss: 0.4766
Ep

<keras.src.callbacks.history.History at 0x781cb1b24cd0>
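
Every target row is padded to 128 positions, so a token-level accuracy that counts padding can look high even when few real words are predicted correctly. Whether the `accuracy` values above include padded positions depends on how Keras propagates the embedding mask, which is not verified here. The NumPy sketch below shows the difference on a made-up example; `masked_accuracy` is a hypothetical helper.

```python
import numpy as np

def masked_accuracy(y_true, y_pred_ids, pad_id=0):
    """Fraction of correctly predicted tokens, ignoring padded target positions."""
    y_true = np.asarray(y_true)
    y_pred_ids = np.asarray(y_pred_ids)
    real = y_true != pad_id
    if not real.any():
        return 0.0
    return float((y_true[real] == y_pred_ids[real]).mean())

# One target with 3 real tokens and 5 padding positions; only 1 real token is right
y_true = np.array([[5, 6, 7, 0, 0, 0, 0, 0]])
y_pred = np.array([[5, 9, 9, 0, 0, 0, 0, 0]])

plain = float((y_true == y_pred).mean())    # counts padding: 6 of 8 positions match
masked = masked_accuracy(y_true, y_pred)    # real tokens only: 1 of 3 matches
print(plain, masked)
```

Reporting the masked value alongside the loss would make the training curves easier to compare with the BLEU and ROUGE scores computed later.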

In [None]:
def decode_sequence(input_text, model, vectorizer, max_len_target=82, bos_token=1, eos_token=2, seq_len=128):
    # Greedy decoding. Note: no EOS marker was added to the training targets, so
    # eos_token=2 is simply the most frequent token in the vectorizer vocabulary.
    # Tokenize the input utterance
    input_ids = vectorizer([input_text]).numpy()[0]
    encoder_input = tf.expand_dims(input_ids, 0)  # (1, seq_len)
    decoder_input = [bos_token]
    for _ in range(max_len_target):
        # Pad decoder_input to length seq_len
        dec_input_padded = decoder_input + [0] * (seq_len - len(decoder_input))
        dec_input_tensor = tf.expand_dims(dec_input_padded, 0)  # (1, seq_len)
        predictions = model([encoder_input, dec_input_tensor])
        # Take the most likely token at the last filled position
        predicted_id = int(tf.argmax(predictions[0, len(decoder_input)-1]).numpy())
        if predicted_id == eos_token or len(decoder_input) >= seq_len:
            break
        decoder_input.append(predicted_id)
    # Convert token IDs back to text
    vocab = vectorizer.get_vocabulary()
    respuesta = " ".join([vocab[tok] for tok in decoder_input[1:] if tok < len(vocab) and tok != 0])
    return respuesta
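
`decode_sequence` performs greedy decoding: at each step it keeps the single most likely next token and stops at an end token or at a length limit. The sketch below isolates that loop from TensorFlow. `next_token_logits` is a hypothetical callable standing in for the model call, and `toy_model` is a made-up rule used only so the example runs.

```python
import numpy as np

def greedy_decode(next_token_logits, bos_id, eos_id, max_len):
    """Generate token IDs one at a time, always taking the argmax."""
    tokens = [bos_id]
    for _ in range(max_len):
        logits = next_token_logits(tokens)
        next_id = int(np.argmax(logits))
        if next_id == eos_id:
            break
        tokens.append(next_id)
    return tokens[1:]  # drop the BOS token

def toy_model(tokens):
    # Toy rule over a 6-token vocabulary: always predict (last token + 1) mod 6
    logits = np.zeros(6)
    logits[(tokens[-1] + 1) % 6] = 1.0
    return logits

stopped_by_eos = greedy_decode(toy_model, bos_id=2, eos_id=5, max_len=10)
stopped_by_len = greedy_decode(toy_model, bos_id=2, eos_id=99, max_len=3)
print(stopped_by_eos)  # [3, 4]
print(stopped_by_len)  # [3, 4, 5]
```

The second call shows why the length cap matters: if the end token is never produced, the loop still terminates after `max_len` steps.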

In [None]:
# Note: the prompt is passed as-is (without normalize_text), so "you?" is looked up as one token
response = decode_sequence("how are you?", model, vectorizer, max_len_target=82, seq_len=128)
print("Model response:", response)

Model response: well , i think its a bit chilly for the days , but i have got it


In [None]:
response = decode_sequence("hello", model, vectorizer, max_len_target=82, seq_len=128)
print("Model response:", response)

Model response: hello , mr


In [None]:
response = decode_sequence("how old are you?", model, vectorizer, max_len_target=82, seq_len=128)
print("Model response:", response)

Model response: how many are you going to be ?


In [None]:
response = decode_sequence("where are you from?", model, vectorizer, max_len_target=82, seq_len=128)
print("Model response:", response)

Model response: i am in the cabinet


In [None]:
response = decode_sequence("what are you going to do?", model, vectorizer, max_len_target=82, seq_len=128)
print("Model response:", response)

Model response: i am going to be able to get a movie about the dentist


In [None]:
response = decode_sequence("what is his name?", model, vectorizer, max_len_target=82, seq_len=128)
print("Model response:", response)

Model response: he is a [UNK]


In [None]:
response = decode_sequence("what is her name?", model, vectorizer, max_len_target=82, seq_len=128)
print("Model response:", response)

Model response: she is a [UNK] , [UNK]


## 4. Evaluating the Model

In [None]:
# Generate predictions for the first 100 test samples
num_samples = 100
bleu_scores = []
rouge_scores = []
smoothing = SmoothingFunction().method1
# Create the ROUGE scorer once instead of once per sample
scorer = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=True) if rouge_available else None

for i in range(num_samples):
    input_text = test_inputs_text[i]
    reference = test_targets_text[i]
    prediction = decode_sequence(input_text, model, vectorizer, max_len_target=82, seq_len=128)
    reference_tokens = reference.split()
    prediction_tokens = prediction.split()
    bleu = sentence_bleu([reference_tokens], prediction_tokens, smoothing_function=smoothing)
    bleu_scores.append(bleu)
    if rouge_available:
        scores = scorer.score(reference, prediction)
        rouge_scores.append(scores)

print(f"Average BLEU over {num_samples} samples: {np.mean(bleu_scores):.4f}")
if rouge_available:
    avg_rouge1 = np.mean([s['rouge1'].fmeasure for s in rouge_scores])
    avg_rougeL = np.mean([s['rougeL'].fmeasure for s in rouge_scores])
    print(f"Average ROUGE-1: {avg_rouge1:.4f}")
    print(f"Average ROUGE-L: {avg_rougeL:.4f}")
else:
    print("ROUGE not available. Install with: pip install rouge-score")

Average BLEU over 100 samples: 0.0084
Average ROUGE-1: 0.0915
Average ROUGE-L: 0.0838
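
To give a sense of what these scores measure, the sketch below computes a ROUGE-1 style F-measure by hand: precision is the share of predicted words that appear in the reference, recall is the share of reference words that appear in the prediction, and the F-measure is their harmonic mean. It is a simplified illustration using whitespace tokens and no stemming, so its values will not match the `rouge_score` package exactly. The reference sentence is made up; the prediction is one of the model outputs shown earlier.

```python
from collections import Counter

def rouge1_f(reference, prediction):
    """Unigram-overlap F-measure between two whitespace-tokenized strings."""
    ref = Counter(reference.split())
    pred = Counter(prediction.split())
    # Clipped overlap: each word counts at most as often as it appears in both
    overlap = sum((ref & pred).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# 2 shared words ("i", "am") out of 5 predicted and 5 reference tokens
score = rouge1_f("i am from china .", "i am in the cabinet")
print(score)
```

Because dialogue allows many valid replies to the same prompt, low overlap with a single reference is expected to some degree and does not by itself indicate an unusable reply.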


## 5. Hyperparameter Tuning

In [None]:
# --- Function that builds the Transformer model ---
def build_transformer_model(
    vocab_size=8000,
    embed_dim=128,
    num_heads=8,
    ff_dim=512,
    sequence_length=128,
    dropout=0.1
):
    # Shared embedding (used by both encoder and decoder)
    embedding_layer = tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embed_dim, mask_zero=True)

    # Inputs
    encoder_inputs = tf.keras.Input(shape=(sequence_length,), name="encoder_inputs")
    decoder_inputs = tf.keras.Input(shape=(sequence_length,), name="decoder_inputs")

    # Embedding
    enc_emb = embedding_layer(encoder_inputs)
    dec_emb = embedding_layer(decoder_inputs)

    # Encoder and decoder
    encoder = transformer_encoder(embed_dim, num_heads, ff_dim)
    decoder = transformer_decoder(embed_dim, num_heads, ff_dim, dropout=dropout)

    enc_out = encoder(enc_emb)
    dec_out = decoder([dec_emb, enc_out])

    # Final output: one logit per vocabulary entry at each position
    outputs = tf.keras.layers.Dense(vocab_size)(dec_out)

    model = tf.keras.Model(inputs=[encoder_inputs, decoder_inputs], outputs=outputs)
    model.compile(
        optimizer='adam',
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=['accuracy']
    )
    return model


In [None]:
decoder_inputs_test, decoder_targets_test = create_decoder_io(test_targets, seq_len=128)

In [None]:
def ids_to_text(ids, vocab):
    # Map token IDs back to words, skipping padding (ID 0)
    return " ".join(vocab[tok] for tok in ids if tok != 0)

def calcular_bleu(model, X, y_in, y_out):
    # Average sentence-level BLEU over the given samples.
    # decode_sequence expects text, so IDs are converted back to words first
    # (joining the raw IDs as strings would compare numbers against words).
    vocab = vectorizer.get_vocabulary()
    smoothing = SmoothingFunction().method1
    scores = []
    for input_seq, target_seq in zip(X, y_out):
        pred = decode_sequence(ids_to_text(input_seq, vocab), model, vectorizer, seq_len=MAXLEN)
        reference = ids_to_text(target_seq, vocab)
        score = sentence_bleu([reference.split()], pred.split(), smoothing_function=smoothing)
        scores.append(score)
    return np.mean(scores)

def calcular_rouge(model, X, y_in, y_out):
    # Average ROUGE-L F-measure over the given samples
    if not rouge_available:
        return None
    vocab = vectorizer.get_vocabulary()
    scorer = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=True)
    scores = []
    for input_seq, target_seq in zip(X, y_out):
        pred = decode_sequence(ids_to_text(input_seq, vocab), model, vectorizer, seq_len=MAXLEN)
        reference = ids_to_text(target_seq, vocab)
        scores.append(scorer.score(reference, pred)['rougeL'].fmeasure)
    return np.mean(scores)

In [None]:
EPOCHS = 10
VOCAB_SIZE = 8000
EMBED_DIM = 128
BATCH_SIZE = 32
MAXLEN = 128


experiments = [
    {"num_heads": 2, "ff_dim": 64},
    {"num_heads": 4, "ff_dim": 128},
    {"num_heads": 8, "ff_dim": 512}
]

results = []

for config in experiments:
    print(f"\n🔧 Training with configuration: {config}")

    model = build_transformer_model(
        vocab_size=VOCAB_SIZE,
        embed_dim=EMBED_DIM,
        num_heads=config["num_heads"],
        ff_dim=config["ff_dim"],
        sequence_length=MAXLEN,
        dropout=0.1
    )

    history = model.fit(
        x={
            "encoder_inputs": train_inputs,
            "decoder_inputs": decoder_inputs_train
        },
        y=decoder_targets_train,
        validation_data=(
            {
                "encoder_inputs": val_inputs,
                "decoder_inputs": decoder_inputs_val
            },
            decoder_targets_val
        ),
        epochs=EPOCHS,
        batch_size=BATCH_SIZE,
        callbacks=[early_stop],
        verbose=2
    )

    print("📈 Evaluating on test...")
    # Use the first 100 test examples to speed things up
    bleu_score = calcular_bleu(model, test_inputs[:100], decoder_inputs_test[:100], decoder_targets_test[:100])
    rouge_score = calcular_rouge(model, test_inputs[:100], decoder_inputs_test[:100], decoder_targets_test[:100]) if rouge_available else None

    print(f"✅ BLEU: {bleu_score:.4f}")
    if rouge_score is not None:
        print(f"✅ ROUGE-L: {rouge_score:.4f}")

    results.append({
        "num_heads": config["num_heads"],
        "ff_dim": config["ff_dim"],
        "bleu": bleu_score,
        "rougeL": rouge_score
    })

# Show results
results_df = pd.DataFrame(results)
display(results_df)



🔧 Training with configuration: {'num_heads': 2, 'ff_dim': 64}




Epoch 1/10
2032/2032 - 117s - 57ms/step - accuracy: 0.8992 - loss: 0.6528 - val_accuracy: 0.9053 - val_loss: 0.5258
Epoch 2/10
2032/2032 - 43s - 21ms/step - accuracy: 0.9042 - loss: 0.5186 - val_accuracy: 0.9071 - val_loss: 0.4995
Epoch 3/10
2032/2032 - 44s - 22ms/step - accuracy: 0.9062 - loss: 0.4878 - val_accuracy: 0.9078 - val_loss: 0.4882
Epoch 4/10
2032/2032 - 43s - 21ms/step - accuracy: 0.9076 - loss: 0.4664 - val_accuracy: 0.9086 - val_loss: 0.4816
Epoch 5/10
2032/2032 - 44s - 22ms/step - accuracy: 0.9091 - loss: 0.4494 - val_accuracy: 0.9089 - val_loss: 0.4790
Epoch 6/10
2032/2032 - 47s - 23ms/step - accuracy: 0.9105 - loss: 0.4355 - val_accuracy: 0.9094 - val_loss: 0.4771
Epoch 7/10
2032/2032 - 82s - 40ms/step - accuracy: 0.9117 - loss: 0.4240 - val_accuracy: 0.9097 - val_loss: 0.4769
Epoch 8/10
2032/2032 - 81s - 40ms/step - accuracy: 0.9129 - loss: 0.4143 - val_accuracy: 0.9099 - val_loss: 0.4782
Epoch 9/10
2032/2032 - 45s - 22ms/step - accuracy: 0.9139 - loss: 0.4057 - val_



Epoch 1/10
2032/2032 - 86s - 43ms/step - accuracy: 0.8994 - loss: 0.6479 - val_accuracy: 0.9055 - val_loss: 0.5237
Epoch 2/10
2032/2032 - 66s - 32ms/step - accuracy: 0.9047 - loss: 0.5156 - val_accuracy: 0.9075 - val_loss: 0.4972
Epoch 3/10
2032/2032 - 63s - 31ms/step - accuracy: 0.9067 - loss: 0.4847 - val_accuracy: 0.9084 - val_loss: 0.4839
Epoch 4/10
2032/2032 - 82s - 40ms/step - accuracy: 0.9083 - loss: 0.4626 - val_accuracy: 0.9090 - val_loss: 0.4777
Epoch 5/10
2032/2032 - 83s - 41ms/step - accuracy: 0.9099 - loss: 0.4451 - val_accuracy: 0.9097 - val_loss: 0.4749
Epoch 6/10


In [None]:
if 'history' in locals() and history is not None:
    plt.figure(figsize=(12, 6))

    # Plot loss
    plt.subplot(1, 2, 1)
    plt.plot(history.history['loss'], label='Train Loss')
    plt.plot(history.history['val_loss'], label='Validation Loss')
    plt.title('Loss vs. Epochs')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    plt.grid(True)

    # Plot accuracy
    plt.subplot(1, 2, 2)
    plt.plot(history.history['accuracy'], label='Train Accuracy')
    plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
    plt.title('Accuracy vs. Epochs')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.legend()
    plt.grid(True)

    plt.tight_layout()
    plt.show()

# Plot evaluation metrics for different configurations from the experiments
if 'results_df' in locals() and not results_df.empty:
    results_df['config_label'] = results_df.apply(lambda row: f"H:{int(row['num_heads'])}, FF:{int(row['ff_dim'])}", axis=1)

    plt.figure(figsize=(10, 6))

    # Plot BLEU scores
    plt.plot(results_df['config_label'], results_df['bleu'], marker='o', linestyle='-', label='BLEU Score')

    # Plot ROUGE-L scores if available
    if 'rougeL' in results_df.columns and results_df['rougeL'].isnull().sum() < len(results_df):
         plt.plot(results_df['config_label'], results_df['rougeL'], marker='x', linestyle='--', label='ROUGE-L Score')

    plt.title('Evaluation Metrics by Transformer Configuration')
    plt.xlabel('Configuration (Num Heads, FF Dim)')
    plt.ylabel('Score')
    plt.ylim(0, 1) # Scores are typically between 0 and 1
    plt.legend()
    plt.grid(True)
    plt.xticks(rotation=45, ha='right') # Rotate labels for readability
    plt.tight_layout()
    plt.show()

else:
    print("No training history or experiment results found to plot.")


## 6. Results and Conclusions


This section summarizes the results and shows how the hyperparameter settings affected model performance.
- **Final results**: comparison of BLEU, ROUGE, and other metrics for each configuration.
- **Conclusions**: reflection on the process, the difficulties encountered, and the lessons learned.

Thank you for reviewing our project! We hope this implementation demonstrates our command of Transformers for NLP.
