# HW04 - NLP
## Part III

You are going to build a classifier that identifies the most likely author of a set of input lines of text (I suggest using text segments of 150 to 250 words). It is a multinomial classification task (3 classes).
- Describe how you prepare the dataset. Create the training, validation, and test sets. Make a summary table with the dimensions (number of samples) per class for each of these data sets.
- Define three feed-forward (dense) neural network architectures in Keras that make use of the previously built embeddings.
    - Explain the dimensions of each layer of each architecture (model summary).
- Describe the results of combining the 3 architectures with the 3 types of embeddings in terms of accuracy, precision, and recall on the test set.

In [1]:
import pandas as pd
import os
import re
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras import losses
from tensorflow.keras import preprocessing

2025-10-18 18:04:00.266279: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2025-10-18 18:04:01.127831: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-10-18 18:04:04.002132: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.


We reuse the code from Part 1 to process the texts, and add the function create_text_samples to generate text samples of variable length.

In [2]:
def serialize_text(f) -> list[str]:
    begun = False
    full_text = []
    paragraph = ""

    for base_line in f:
        line = base_line.strip()

        if len(line) == 0:
            if len(paragraph) > 0:
                full_text.append(paragraph.strip())
                paragraph = ""
            continue

        if line.startswith("*** START OF THE PROJECT GUTENBERG EBOOK"):
            begun = True
            continue

        if line.startswith("*** END OF THE PROJECT GUTENBERG EBOOK"):
            break

        if begun:
            paragraph += line + " "

    return full_text

def create_text_samples(text, min_words=150, max_words=250) -> list[str]:
    """Split the text into segments of between min_words and max_words words.
    input: text (str): full text
           min_words (int): minimum number of words per sample
           max_words (int): maximum number of words per sample
    output: list of str: list of text samples
    """
    samples = []
    words = text.split()

    i = 0
    while i < len(words):
        # Take a segment of random length between min_words and max_words
        sample_size = np.random.randint(min_words, max_words + 1)
        if i + sample_size <= len(words):
            sample = ' '.join(words[i:i+sample_size])
            samples.append(sample)
            i += sample_size
        else:
            # Final segment, kept only if at least min_words words remain
            if len(words) - i >= min_words:
                sample = ' '.join(words[i:])
                samples.append(sample)
            break

    return samples
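
Because the segment length is drawn with the global `np.random.randint` and no seed is set, the number of samples can change from run to run. A minimal sketch of a seeded variant follows, assuming reproducibility is wanted; the name `create_text_samples_seeded` and the `seed` parameter are hypothetical and not part of the original notebook.

```python
import numpy as np

def create_text_samples_seeded(text, min_words=150, max_words=250, seed=42):
    """Hypothetical seeded variant of create_text_samples (same splitting logic)."""
    rng = np.random.default_rng(seed)  # local generator; global NumPy state is untouched
    words = text.split()
    samples = []
    i = 0
    while i < len(words):
        # rng.integers excludes the upper bound, hence max_words + 1
        size = int(rng.integers(min_words, max_words + 1))
        if i + size <= len(words):
            samples.append(" ".join(words[i:i + size]))
            i += size
        else:
            # Keep the tail only if it is long enough to be a valid sample
            if len(words) - i >= min_words:
                samples.append(" ".join(words[i:]))
            break
    return samples

# Toy input of 1000 artificial words (an assumption for illustration only)
toy_text = " ".join(f"w{k}" for k in range(1000))
run_a = create_text_samples_seeded(toy_text, seed=0)
run_b = create_text_samples_seeded(toy_text, seed=0)
lengths = [len(s.split()) for s in run_a]
print(len(run_a), min(lengths), max(lengths))
```

With the same seed, both runs produce identical segments, which keeps the dataset sizes reported further down stable across executions.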

We load the data and preprocess the texts with the functions defined above. Each text sample is associated with its author and its source book.

In [3]:
base_path = "./books"
books = os.listdir(base_path)

data = []
author_mapping = {
    'arthur': 'Arthur Conan Doyle',
    'lewis': 'Lewis Carroll',
    'shakespear': 'William Shakespeare'
}

for book in books:
    # Identify the author from the file-name prefix
    author_key = book.split('-')[0]
    author = author_mapping[author_key]

    path = os.path.join(base_path, book)
    with open(path, encoding="utf-8") as f:
        paragraphs = serialize_text(f)
        full_text = ' '.join(paragraphs)

        # Create samples of 150-250 words
        samples = create_text_samples(full_text, min_words=150, max_words=250)

        for sample in samples:
            data.append({
                'text': sample,
                'author': author,
                'book': book.replace('.txt', '')
            })

We create the DataFrame from the generated samples, map the authors to numeric IDs, and split the dataset into training, validation, and test sets. Finally, we build a summary table with the distribution of samples per author in each set.

In [4]:
df = pd.DataFrame(data)

# Map authors to numeric IDs
author_to_id = {author: idx for idx, author in enumerate(df['author'].unique())}
df['author_id'] = df['author'].map(author_to_id)

print(f"Total de muestras: {len(df)}")
print(f"\nDistribución por autor:")
print(df['author'].value_counts())
print(f"\nDistribución por libro:")
print(df['book'].value_counts())

# Split into train, validation, test (70%, 15%, 15%)
train_df, temp_df = train_test_split(df, test_size=0.3, stratify=df['author_id'], random_state=42)
val_df, test_df = train_test_split(temp_df, test_size=0.5, stratify=temp_df['author_id'], random_state=42)

print(f"\n=== DATASET SPLITS ===")
print(f"Train: {len(train_df)} samples")
print(f"Validation: {len(val_df)} samples")
print(f"Test: {len(test_df)} samples")

# Build the summary table
summary_data = []
for dataset_name, dataset in [('Train', train_df), ('Validation', val_df), ('Test', test_df)]:
    for author in df['author'].unique():
        count = len(dataset[dataset['author'] == author])
        summary_data.append({
            'Dataset': dataset_name,
            'Author': author,
            'Samples': count
        })

summary_df = pd.DataFrame(summary_data)
summary_pivot = summary_df.pivot(index='Author', columns='Dataset', values='Samples')
summary_pivot['Total'] = summary_pivot.sum(axis=1)

print("\n=== SUMMARY TABLE ===")
print(summary_pivot)
print(f"\nTotal samples: {summary_pivot['Total'].sum()}")

Total de muestras: 1794

Distribución por autor:
author
Arthur Conan Doyle     1073
William Shakespeare     412
Lewis Carroll           309
Name: count, dtype: int64

Distribución por libro:
book
arthur-return-sherlock      562
arthur-hound-baskerville    296
arthur-the-sign-of-four     215
shakespear-hamlet           160
lewis-glass                 148
shakespear-king-henry       134
lewis-alice-wonderland      132
shakespear-the-temptest     118
lewis-hunting                29
Name: count, dtype: int64

=== DATASET SPLITS ===
Train: 1255 samples
Validation: 269 samples
Test: 270 samples

=== SUMMARY TABLE ===
Dataset              Test  Train  Validation  Total
Author                                             
Arthur Conan Doyle    161    751         161   1073
Lewis Carroll          47    216          46    309
William Shakespeare    62    288          62    412

Total samples: 1794
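
As a sanity check on the stratified split, the per-class share in each set can be compared with the overall share. The sketch below uses only the counts printed in the summary table above; the variable names are illustrative.

```python
# Counts copied from the summary table above
counts = {
    "Arthur Conan Doyle":  {"Train": 751, "Validation": 161, "Test": 161},
    "Lewis Carroll":       {"Train": 216, "Validation": 46,  "Test": 47},
    "William Shakespeare": {"Train": 288, "Validation": 62,  "Test": 62},
}
splits = ["Train", "Validation", "Test"]

# Column totals (samples per split) and the grand total
split_totals = {s: sum(c[s] for c in counts.values()) for s in splits}
grand_total = sum(split_totals.values())

max_dev = 0.0
for author, c in counts.items():
    overall = sum(c.values()) / grand_total  # class share in the full dataset
    for s in splits:
        share = c[s] / split_totals[s]       # class share inside this split
        max_dev = max(max_dev, abs(share - overall))
        print(f"{author:<20} {s:<10} {share:.4f} (overall {overall:.4f})")

print(f"Largest deviation from the overall class share: {max_dev:.4f}")
```

Every class keeps nearly the same proportion in all three sets, so the class imbalance (Doyle accounts for roughly 60% of the samples) is the same in training and evaluation.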


We reuse the same text preprocessing from Part 1, which was used to generate the pre-trained embeddings. This step is required before using the Keras tokenizer: without it, the tokens would not match the Word2Vec vocabulary and the pre-trained embeddings would not be effective.

In [5]:
def tokenize(text: str):
    processed = text.lower()  # Lowercase only
    processed = re.sub(r'[^a-z\s\']', ' ', processed)  # Keep letters and apostrophes
    processed = re.sub(r'\s+', ' ', processed).strip()  # Normalize whitespace
    tokens = processed.split()
    tokens = [token for token in tokens if len(token) > 1]  # Drop single-character tokens
    return tokens

train_df['text_processed'] = train_df['text'].apply(tokenize)
val_df['text_processed'] = val_df['text'].apply(tokenize)
test_df['text_processed'] = test_df['text'].apply(tokenize)
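
To make the effect of this preprocessing concrete, here is a small self-contained check. It repeats the same regular expressions as `tokenize` in compact form; the example sentence is made up for illustration.

```python
import re

def tokenize(text: str) -> list[str]:
    processed = text.lower()
    processed = re.sub(r"[^a-z\s']", " ", processed)    # keep letters and apostrophes
    processed = re.sub(r"\s+", " ", processed).strip()  # normalize whitespace
    return [tok for tok in processed.split() if len(tok) > 1]

# Digits and punctuation become spaces; leftover single letters ("b", "i") are dropped
example = tokenize("Holmes said: \"It's 221B, I think!\"")
print(example)  # ['holmes', 'said', "it's", 'think']
```

Note that contractions such as "it's" survive as a single token, while numbers disappear entirely.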

In [6]:
# Load the pre-trained embeddings
from gensim.models import Word2Vec

w2v_models = {}
vector_sizes = [128, 512, 1024]

for vector_size in vector_sizes:
    path = f"./vectors/Books_{vector_size}_001.model"
    w2v_models[vector_size] = Word2Vec.load(path)
    print(f"Loaded model with vector size {vector_size}: vocab size = {len(w2v_models[vector_size].wv)}")

Loaded model with vector size 128: vocab size = 5148
Loaded model with vector size 512: vocab size = 5148
Loaded model with vector size 1024: vocab size = 5148


In [8]:
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Convert token lists back into strings
train_texts = [' '.join(tokens) for tokens in train_df['text_processed']]
val_texts = [' '.join(tokens) for tokens in val_df['text_processed']]
test_texts = [' '.join(tokens) for tokens in test_df['text_processed']]

# USE THE WORD2VEC VOCABULARY DIRECTLY
w2v_vocab = set(w2v_models[128].wv.key_to_index.keys())

print(f"Word2Vec vocabulary size: {len(w2v_vocab)}")

# Filter texts to keep ONLY words known to Word2Vec
def filter_by_w2v_vocab(tokens, vocab):
    return [token for token in tokens if token in vocab]

train_texts_filtered = [' '.join(filter_by_w2v_vocab(tokens, w2v_vocab)) 
                        for tokens in train_df['text_processed']]
val_texts_filtered = [' '.join(filter_by_w2v_vocab(tokens, w2v_vocab)) 
                      for tokens in val_df['text_processed']]
test_texts_filtered = [' '.join(filter_by_w2v_vocab(tokens, w2v_vocab)) 
                       for tokens in test_df['text_processed']]

# Create the tokenizer
tokenizer = Tokenizer(oov_token='<OOV>')
tokenizer.fit_on_texts(train_texts_filtered)

max_len = 250

# Convert texts to sequences
X_train = tokenizer.texts_to_sequences(train_texts_filtered)
X_val = tokenizer.texts_to_sequences(val_texts_filtered)
X_test = tokenizer.texts_to_sequences(test_texts_filtered)

# Padding
X_train = pad_sequences(X_train, maxlen=max_len, padding='post', truncating='post')
X_val = pad_sequences(X_val, maxlen=max_len, padding='post', truncating='post')
X_test = pad_sequences(X_test, maxlen=max_len, padding='post', truncating='post')

# Labels
y_train = train_df['author_id'].values
y_val = val_df['author_id'].values
y_test = test_df['author_id'].values

print(f"\nX_train shape: {X_train.shape}")
print(f"X_val shape: {X_val.shape}")
print(f"X_test shape: {X_test.shape}")
print(f"Vocabulary size (aligned with W2V): {len(tokenizer.word_index)}")
print(f"Max sequence length: {max_len}")

# Verify alignment
tokenizer_vocab = set(tokenizer.word_index.keys())
overlap = w2v_vocab.intersection(tokenizer_vocab)
print(f"\nWords in both vocabularies: {len(overlap)} / {len(tokenizer_vocab)}")
print(f"Coverage: {100 * len(overlap) / len(tokenizer_vocab):.1f}%")

Word2Vec vocabulary size: 5148

X_train shape: (1255, 250)
X_val shape: (269, 250)
X_test shape: (270, 250)
Vocabulary size (aligned with W2V): 5144
Max sequence length: 250

Words in both vocabularies: 5143 / 5144
Coverage: 100.0%


In [9]:
def create_embedding_matrix(tokenizer, w2v_model, embedding_dim):
    vocab_size = len(tokenizer.word_index) + 1 
    embedding_matrix = np.zeros((vocab_size, embedding_dim))
    
    found_words = 0
    for word, i in tokenizer.word_index.items():
        if word in w2v_model.wv:
            embedding_matrix[i] = w2v_model.wv[word]
            found_words += 1
        else:
            # Should NOT happen for real words if the filtering was correct (only '<OOV>' lands here)
            embedding_matrix[i] = np.random.normal(0, 0.1, embedding_dim)
    
    print(f"  Found {found_words}/{len(tokenizer.word_index)} words in Word2Vec ({100*found_words/len(tokenizer.word_index):.1f}%)")
    return embedding_matrix

embedding_matrices = {}
for size in vector_sizes:
    print(f"\nCreating embedding matrix for size {size}:")
    embedding_matrices[size] = create_embedding_matrix(tokenizer, w2v_models[size], size)
    print(f"  Shape: {embedding_matrices[size].shape}")


Creating embedding matrix for size 128:
  Found 5143/5144 words in Word2Vec (100.0%)
  Shape: (5145, 128)

Creating embedding matrix for size 512:
  Found 5143/5144 words in Word2Vec (100.0%)
  Shape: (5145, 512)

Creating embedding matrix for size 1024:
  Found 5143/5144 words in Word2Vec (100.0%)
  Shape: (5145, 1024)
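
The counts above (5143 of 5144 words found, 5145 rows) follow from how the indices are laid out: index 0 is reserved for padding, and the `<OOV>` token sits in the tokenizer vocabulary but has no Word2Vec vector. A minimal NumPy sketch of the same construction on toy data follows; `word_index` and `vectors` are hypothetical stand-ins for `tokenizer.word_index` and the Word2Vec model.

```python
import numpy as np

# Hypothetical toy stand-ins for tokenizer.word_index and the Word2Vec vectors
word_index = {"<OOV>": 1, "the": 2, "holmes": 3}
vectors = {"the": np.ones(4), "holmes": np.full(4, 2.0)}
dim = 4

rng = np.random.default_rng(0)
matrix = np.zeros((len(word_index) + 1, dim))  # +1 because row 0 is the padding index
missing = []
for word, idx in word_index.items():
    if word in vectors:
        matrix[idx] = vectors[word]             # copy the pre-trained vector
    else:
        matrix[idx] = rng.normal(0, 0.1, dim)   # small random vector for unknown tokens
        missing.append(word)

print(matrix.shape, missing)
```

Row 0 stays all zeros, so padded positions contribute zero vectors to whatever layer consumes the embeddings.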


In [10]:
# ARCHITECTURE 1: Shallow Network (simple)
def create_architecture_1(embedding_matrix, embedding_dim):
    model = tf.keras.Sequential([
        layers.Embedding(
            input_dim=embedding_matrix.shape[0],
            output_dim=embedding_dim,
            weights=[embedding_matrix],
            input_length=max_len,
            trainable=False  # Frozen embeddings
        ),
        layers.GlobalAveragePooling1D(),  # Average of the embeddings over the sequence
        layers.Dense(64, activation='relu'),
        layers.Dropout(0.3),
        layers.Dense(3, activation='softmax')
    ])
    return model

# ARCHITECTURE 2: Medium Network (medium depth)
def create_architecture_2(embedding_matrix, embedding_dim):
    model = tf.keras.Sequential([
        layers.Embedding(
            input_dim=embedding_matrix.shape[0],
            output_dim=embedding_dim,
            weights=[embedding_matrix],
            input_length=max_len,
            trainable=False
        ),
        layers.GlobalAveragePooling1D(),
        layers.Dense(256, activation='relu'),
        layers.Dropout(0.4),
        layers.Dense(128, activation='relu'),
        layers.Dropout(0.3),
        layers.Dense(3, activation='softmax')
    ])
    return model

# ARCHITECTURE 3: Deep Network (deeper)
def create_architecture_3(embedding_matrix, embedding_dim):
    model = tf.keras.Sequential([
        layers.Embedding(
            input_dim=embedding_matrix.shape[0],
            output_dim=embedding_dim,
            weights=[embedding_matrix],
            input_length=max_len,
            trainable=False
        ),
        layers.GlobalAveragePooling1D(),
        layers.Dense(512, activation='relu'),
        layers.BatchNormalization(),
        layers.Dropout(0.5),
        layers.Dense(256, activation='relu'),
        layers.BatchNormalization(),
        layers.Dropout(0.4),
        layers.Dense(128, activation='relu'),
        layers.Dropout(0.3),
        layers.Dense(3, activation='softmax')
    ])
    return model

In [12]:
# Show the architectures for each embedding size
print("=" * 80)
print("ARCHITECTURE SUMMARIES")
print("=" * 80)

for size in vector_sizes:
    print(f"\n{'='*80}")
    print(f"EMBEDDINGS: {size} dimensions")
    print(f"{'='*80}")
    
    print(f"\n--- Architecture 1: Shallow Network ---")
    model_1 = create_architecture_1(embedding_matrices[size], size)
    model_1.build(input_shape=(None, max_len))
    model_1.summary()
    
    print(f"\n--- Architecture 2: Medium Network ---")
    model_2 = create_architecture_2(embedding_matrices[size], size)
    model_2.build(input_shape=(None, max_len))
    model_2.summary()
    
    print(f"\n--- Architecture 3: Deep Network ---")
    model_3 = create_architecture_3(embedding_matrices[size], size)
    model_3.build(input_shape=(None, max_len))
    model_3.summary()

ARCHITECTURE SUMMARIES

EMBEDDINGS: 128 dimensions

--- Architecture 1: Shallow Network ---



--- Architecture 2: Medium Network ---



--- Architecture 3: Deep Network ---



EMBEDDINGS: 512 dimensions

--- Architecture 1: Shallow Network ---



--- Architecture 2: Medium Network ---



--- Architecture 3: Deep Network ---



EMBEDDINGS: 1024 dimensions

--- Architecture 1: Shallow Network ---



--- Architecture 2: Medium Network ---



--- Architecture 3: Deep Network ---
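
The layer dimensions can also be worked out by hand from the definitions above. For embedding size d, the Embedding layer outputs (None, 250, d), GlobalAveragePooling1D reduces that to (None, d), each Dense layer outputs (None, units), and Dropout leaves the shape unchanged. The sketch below computes parameter counts from the standard formulas (a Dense layer has inputs × units weights plus units biases; BatchNormalization keeps 4 values per feature). These are hand calculations under those assumptions, not the printed output of `model.summary()`.

```python
def dense_stack_params(input_dim, units):
    """Parameters of consecutive Dense layers: weights plus biases per layer."""
    total, prev = 0, input_dim
    for u in units:
        total += prev * u + u
        prev = u
    return total

vocab_rows = 5145  # rows of the embedding matrix reported earlier
dense_units = {
    "Arch_1_Shallow": [64, 3],
    "Arch_2_Medium": [256, 128, 3],
    "Arch_3_Deep": [512, 256, 128, 3],
}
# BatchNormalization: gamma, beta, moving mean, moving variance per feature
batchnorm_features = {"Arch_3_Deep": [512, 256]}

for d in (128, 512, 1024):
    frozen_embedding = vocab_rows * d  # non-trainable, since trainable=False
    for name, units in dense_units.items():
        dense = dense_stack_params(d, units)
        bn = sum(4 * f for f in batchnorm_features.get(name, []))
        print(f"d={d:<5} {name:<15} embedding={frozen_embedding:>9,} "
              f"dense={dense:>9,} batchnorm={bn:>6,}")
```

For every configuration the frozen embedding matrix holds most of the parameters, so the number of weights actually trained is governed by the dense stack and grows with d only through the first Dense layer.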


In [13]:
from sklearn.metrics import classification_report, confusion_matrix
import time

# Dictionary of architectures
architectures = {
    'Arch_1_Shallow': create_architecture_1,
    'Arch_2_Medium': create_architecture_2,
    'Arch_3_Deep': create_architecture_3
}

# Store results
results = []
trained_models = {}

print("\n" + "="*80)
print("TRAINING ALL COMBINATIONS: 3 Architectures × 3 Embedding Sizes")
print("="*80)

for arch_name, arch_func in architectures.items():
    for embedding_size in vector_sizes:
        print(f"\n{'='*80}")
        print(f"Training: {arch_name} with Embedding Size {embedding_size}")
        print(f"{'='*80}")
        
        # Create the model
        model = arch_func(embedding_matrices[embedding_size], embedding_size)
        
        # Compile
        model.compile(
            optimizer='adam',
            loss='sparse_categorical_crossentropy',
            metrics=['accuracy']
        )
        
        # Callbacks to improve training
        early_stop = tf.keras.callbacks.EarlyStopping(
            monitor='val_loss',
            patience=5,
            restore_best_weights=True,
            verbose=1
        )
        
        reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
            monitor='val_loss',
            factor=0.5,
            patience=3,
            min_lr=1e-7,
            verbose=1
        )
        
        # Train
        start_time = time.time()
        history = model.fit(
            X_train, y_train,
            validation_data=(X_val, y_val),
            epochs=50,
            batch_size=32,
            callbacks=[early_stop, reduce_lr],
            verbose=1
        )
        training_time = time.time() - start_time
        
        # Evaluate on the test set
        test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
        
        # Predictions
        y_pred = model.predict(X_test, verbose=0)
        y_pred_classes = np.argmax(y_pred, axis=1)
        
        # Detailed metrics (support-weighted averages over the 3 classes)
        precision = precision_score(y_test, y_pred_classes, average='weighted')
        recall = recall_score(y_test, y_pred_classes, average='weighted')
        f1 = f1_score(y_test, y_pred_classes, average='weighted')
        
        # Save results
        results.append({
            'Architecture': arch_name,
            'Embedding_Size': embedding_size,
            'Test_Loss': test_loss,
            'Test_Accuracy': test_acc,
            'Test_Precision': precision,
            'Test_Recall': recall,
            'Test_F1': f1,
            'Training_Time': training_time,
            'Epochs_Trained': len(history.history['loss'])
        })
        
        # Save the trained model
        model_key = f"{arch_name}_emb{embedding_size}"
        trained_models[model_key] = {
            'model': model,
            'history': history,
            'y_pred': y_pred_classes
        }
        
        # Show results
        print(f"\n{'='*80}")
        print(f"RESULTS: {arch_name} + Embedding {embedding_size}")
        print(f"{'='*80}")
        print(f"  Test Loss:     {test_loss:.4f}")
        print(f"  Test Accuracy: {test_acc:.4f}")
        print(f"  Precision:     {precision:.4f}")
        print(f"  Recall:        {recall:.4f}")
        print(f"  F1-Score:      {f1:.4f}")
        print(f"  Training Time: {training_time:.2f}s")
        print(f"  Epochs:        {len(history.history['loss'])}")
        
        # Detailed per-class report
        print(f"\nClassification Report:")
        target_names = [name for name, _ in sorted(author_to_id.items(), key=lambda x: x[1])]
        print(classification_report(y_test, y_pred_classes, target_names=target_names))
        
        print("\n")

print("\n" + "="*80)
print("TRAINING COMPLETED!")
print("="*80)
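
One consequence of `average='weighted'` is worth keeping in mind when reading the results: support-weighted recall is algebraically the same number as accuracy, which is why the Test_Recall and Test_Accuracy columns coincide. The NumPy sketch below demonstrates this on made-up labels (not the notebook's predictions) and contrasts it with the macro average, which weights every class equally.

```python
import numpy as np

# Hypothetical toy labels for illustration; not taken from the notebook's predictions
y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 0])

classes = np.unique(y_true)
support = np.array([(y_true == c).sum() for c in classes])
# Per-class recall: correctly predicted samples of class c / samples of class c
recall_per_class = np.array(
    [((y_pred == c) & (y_true == c)).sum() / (y_true == c).sum() for c in classes]
)

accuracy = (y_true == y_pred).mean()
weighted_recall = (recall_per_class * support).sum() / support.sum()
macro_recall = recall_per_class.mean()

print(f"accuracy={accuracy:.4f} weighted_recall={weighted_recall:.4f} "
      f"macro_recall={macro_recall:.4f}")
```

With an imbalanced dataset such as this one, the per-class values in the classification report, or a macro average, are more informative about the minority classes than the weighted figures.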


TRAINING ALL COMBINATIONS: 3 Architectures × 3 Embedding Sizes

Training: Arch_1_Shallow with Embedding Size 128
Epoch 1/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 10ms/step - accuracy: 0.6781 - loss: 0.8262 - val_accuracy: 0.7546 - val_loss: 0.6085 - learning_rate: 0.0010
Epoch 2/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 6ms/step - accuracy: 0.8765 - loss: 0.4677 - val_accuracy: 0.9405 - val_loss: 0.3434 - learning_rate: 0.0010
Epoch 3/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step - accuracy: 0.9410 - loss: 0.2682 - val_accuracy: 0.9591 - val_loss: 0.2163 - learning_rate: 0.0010
Epoch 4/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step - accuracy: 0.9602 - loss: 0.1861 - val_accuracy: 0.9591 - val_loss: 0.1588 - learning_rate: 0.0010
Epoch 5/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step - accuracy: 0.9657 - loss: 0.1478 - val_accuracy: 0.9665 - val_lo



[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 13ms/step - accuracy: 0.7108 - loss: 0.7087 - val_accuracy: 0.8996 - val_loss: 0.4469 - learning_rate: 0.0010
Epoch 2/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 9ms/step - accuracy: 0.9347 - loss: 0.3019 - val_accuracy: 0.9480 - val_loss: 0.2205 - learning_rate: 0.0010
Epoch 3/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 9ms/step - accuracy: 0.9602 - loss: 0.1772 - val_accuracy: 0.9628 - val_loss: 0.1434 - learning_rate: 0.0010
Epoch 4/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 9ms/step - accuracy: 0.9689 - loss: 0.1303 - val_accuracy: 0.9777 - val_loss: 0.1077 - learning_rate: 0.0010
Epoch 5/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 9ms/step - accuracy: 0.9777 - loss: 0.1044 - val_accuracy: 0.9777 - val_loss: 0.0882 - learning_rate: 0.0010
Epoch 6/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 9ms/step - accu



[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 19ms/step - accuracy: 0.7673 - loss: 0.6022 - val_accuracy: 0.9517 - val_loss: 0.3090 - learning_rate: 0.0010
Epoch 2/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 16ms/step - accuracy: 0.9602 - loss: 0.2123 - val_accuracy: 0.9591 - val_loss: 0.1589 - learning_rate: 0.0010
Epoch 3/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 16ms/step - accuracy: 0.9665 - loss: 0.1335 - val_accuracy: 0.9703 - val_loss: 0.1102 - learning_rate: 0.0010
Epoch 4/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 15ms/step - accuracy: 0.9705 - loss: 0.1026 - val_accuracy: 0.9777 - val_loss: 0.0862 - learning_rate: 0.0010
Epoch 5/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 15ms/step - accuracy: 0.9753 - loss: 0.0838 - val_accuracy: 0.9814 - val_loss: 0.0725 - learning_rate: 0.0010
Epoch 6/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 14ms/step -



[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 10ms/step - accuracy: 0.7386 - loss: 0.6585 - val_accuracy: 0.9517 - val_loss: 0.2435 - learning_rate: 0.0010
Epoch 2/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 7ms/step - accuracy: 0.9610 - loss: 0.1524 - val_accuracy: 0.9740 - val_loss: 0.0788 - learning_rate: 0.0010
Epoch 3/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 7ms/step - accuracy: 0.9729 - loss: 0.0852 - val_accuracy: 0.9814 - val_loss: 0.0541 - learning_rate: 0.0010
Epoch 4/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 6ms/step - accuracy: 0.9793 - loss: 0.0670 - val_accuracy: 0.9814 - val_loss: 0.0469 - learning_rate: 0.0010
Epoch 5/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 6ms/step - accuracy: 0.9817 - loss: 0.0546 - val_accuracy: 0.9814 - val_loss: 0.0498 - learning_rate: 0.0010
Epoch 6/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 6ms/step - accu



[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 14ms/step - accuracy: 0.7952 - loss: 0.4975 - val_accuracy: 0.9628 - val_loss: 0.1177 - learning_rate: 0.0010
Epoch 2/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 10ms/step - accuracy: 0.9705 - loss: 0.1064 - val_accuracy: 0.9777 - val_loss: 0.0618 - learning_rate: 0.0010
Epoch 3/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 10ms/step - accuracy: 0.9785 - loss: 0.0751 - val_accuracy: 0.9814 - val_loss: 0.0464 - learning_rate: 0.0010
Epoch 4/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 10ms/step - accuracy: 0.9849 - loss: 0.0472 - val_accuracy: 0.9851 - val_loss: 0.0427 - learning_rate: 0.0010
Epoch 5/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 10ms/step - accuracy: 0.9912 - loss: 0.0364 - val_accuracy: 0.9814 - val_loss: 0.0376 - learning_rate: 0.0010
Epoch 6/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 10ms/step -



[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 21ms/step - accuracy: 0.8279 - loss: 0.4534 - val_accuracy: 0.9703 - val_loss: 0.1048 - learning_rate: 0.0010
Epoch 2/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 17ms/step - accuracy: 0.9697 - loss: 0.0946 - val_accuracy: 0.9851 - val_loss: 0.0553 - learning_rate: 0.0010
Epoch 3/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 17ms/step - accuracy: 0.9817 - loss: 0.0526 - val_accuracy: 0.9888 - val_loss: 0.0512 - learning_rate: 0.0010
Epoch 4/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 17ms/step - accuracy: 0.9865 - loss: 0.0428 - val_accuracy: 0.9851 - val_loss: 0.0447 - learning_rate: 0.0010
Epoch 5/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 17ms/step - accuracy: 0.9825 - loss: 0.0456 - val_accuracy: 0.9703 - val_loss: 0.0737 - learning_rate: 0.0010
Epoch 6/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 17ms/step -



[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 13ms/step - accuracy: 0.8876 - loss: 0.2928 - val_accuracy: 0.6468 - val_loss: 0.7400 - learning_rate: 0.0010
Epoch 2/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 9ms/step - accuracy: 0.9665 - loss: 0.1022 - val_accuracy: 0.7138 - val_loss: 0.6611 - learning_rate: 0.0010
Epoch 3/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 8ms/step - accuracy: 0.9681 - loss: 0.0898 - val_accuracy: 0.6320 - val_loss: 0.6720 - learning_rate: 0.0010
Epoch 4/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 8ms/step - accuracy: 0.9737 - loss: 0.0799 - val_accuracy: 0.7584 - val_loss: 0.5182 - learning_rate: 0.0010
Epoch 5/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 8ms/step - accuracy: 0.9689 - loss: 0.0801 - val_accuracy: 0.7844 - val_loss: 0.4408 - learning_rate: 0.0010
Epoch 6/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 8ms/step - accu



[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 20ms/step - accuracy: 0.8781 - loss: 0.3198 - val_accuracy: 0.6580 - val_loss: 0.7138 - learning_rate: 0.0010
Epoch 2/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 16ms/step - accuracy: 0.9681 - loss: 0.1004 - val_accuracy: 0.6357 - val_loss: 0.6749 - learning_rate: 0.0010
Epoch 3/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 13ms/step - accuracy: 0.9657 - loss: 0.0958 - val_accuracy: 0.6431 - val_loss: 0.6506 - learning_rate: 0.0010
Epoch 4/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 13ms/step - accuracy: 0.9649 - loss: 0.0911 - val_accuracy: 0.6803 - val_loss: 0.6241 - learning_rate: 0.0010
Epoch 5/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 13ms/step - accuracy: 0.9729 - loss: 0.0776 - val_accuracy: 0.7212 - val_loss: 0.5073 - learning_rate: 0.0010
Epoch 6/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 14ms/step -



[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 27ms/step - accuracy: 0.9076 - loss: 0.2885 - val_accuracy: 0.6691 - val_loss: 0.6888 - learning_rate: 0.0010
Epoch 2/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 22ms/step - accuracy: 0.9618 - loss: 0.1153 - val_accuracy: 0.6468 - val_loss: 0.6428 - learning_rate: 0.0010
Epoch 3/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 21ms/step - accuracy: 0.9761 - loss: 0.0799 - val_accuracy: 0.6840 - val_loss: 0.5877 - learning_rate: 0.0010
Epoch 4/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 24ms/step - accuracy: 0.9729 - loss: 0.0815 - val_accuracy: 0.7584 - val_loss: 0.4471 - learning_rate: 0.0010
Epoch 5/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 21ms/step - accuracy: 0.9681 - loss: 0.1065 - val_accuracy: 0.8550 - val_loss: 0.3257 - learning_rate: 0.0010
Epoch 6/50
[1m40/40[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 23ms/step -

In [14]:
# Create the results DataFrame
results_df = pd.DataFrame(results)

print("\n" + "="*100)
print("COMPLETE RESULTS TABLE")
print("="*100)
print(results_df.to_string(index=False))

# Pivot tables
print("\n" + "="*100)
print("TEST ACCURACY BY ARCHITECTURE AND EMBEDDING SIZE")
print("="*100)
accuracy_pivot = results_df.pivot(index='Architecture', 
                                   columns='Embedding_Size', 
                                   values='Test_Accuracy')
print(accuracy_pivot.round(4))

print("\n" + "="*100)
print("TEST PRECISION BY ARCHITECTURE AND EMBEDDING SIZE")
print("="*100)
precision_pivot = results_df.pivot(index='Architecture', 
                                    columns='Embedding_Size', 
                                    values='Test_Precision')
print(precision_pivot.round(4))

print("\n" + "="*100)
print("TEST RECALL BY ARCHITECTURE AND EMBEDDING SIZE")
print("="*100)
recall_pivot = results_df.pivot(index='Architecture', 
                                 columns='Embedding_Size', 
                                 values='Test_Recall')
print(recall_pivot.round(4))

print("\n" + "="*100)
print("TEST F1-SCORE BY ARCHITECTURE AND EMBEDDING SIZE")
print("="*100)
f1_pivot = results_df.pivot(index='Architecture', 
                             columns='Embedding_Size', 
                             values='Test_F1')
print(f1_pivot.round(4))

# Best configurations
print("\n" + "="*100)
print("BEST CONFIGURATIONS")
print("="*100)

best_acc_idx = results_df['Test_Accuracy'].idxmax()
best_acc = results_df.loc[best_acc_idx]
print(f"\n Best Accuracy: {best_acc['Test_Accuracy']:.4f}")
print(f"   Architecture: {best_acc['Architecture']}")
print(f"   Embedding Size: {best_acc['Embedding_Size']}")
print(f"   Precision: {best_acc['Test_Precision']:.4f}")
print(f"   Recall: {best_acc['Test_Recall']:.4f}")
print(f"   F1-Score: {best_acc['Test_F1']:.4f}")

best_f1_idx = results_df['Test_F1'].idxmax()
best_f1 = results_df.loc[best_f1_idx]
print(f"\n Best F1-Score: {best_f1['Test_F1']:.4f}")
print(f"   Architecture: {best_f1['Architecture']}")
print(f"   Embedding Size: {best_f1['Embedding_Size']}")
print(f"   Accuracy: {best_f1['Test_Accuracy']:.4f}")


COMPLETE RESULTS TABLE
  Architecture  Embedding_Size  Test_Loss  Test_Accuracy  Test_Precision  Test_Recall  Test_F1  Training_Time  Epochs_Trained
Arch_1_Shallow             128   0.076763       0.981481        0.981480     0.981481 0.981382      11.182479              41
Arch_1_Shallow             512   0.078531       0.981481        0.981480     0.981481 0.981382      13.247131              32
Arch_1_Shallow            1024   0.078525       0.977778        0.977779     0.977778 0.977563      22.032479              32
 Arch_2_Medium             128   0.082183       0.970370        0.970802     0.970370 0.970284       4.676910              13
 Arch_2_Medium             512   0.095085       0.981481        0.981480     0.981481 0.981382       7.955924              16
 Arch_2_Medium            1024   0.089247       0.981481        0.981480     0.981481 0.981382      10.946823              14
   Arch_3_Deep             128   0.109325       0.985185        0.985721     0.985185 0.984993