# HW04 - NLP
## Part IV

Repeat Part III, but instead of using your own embeddings, use the pre-trained Google Word2Vec or GloVe embeddings with at least three different dimensionalities. These embeddings can be downloaded from several sources, such as the Gensim data repository.

In [1]:
import pandas as pd
import os
import re
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
import gensim.downloader as api
import time



We start by loading and processing the data exactly as in Part III.

In [2]:
def serialize_text(f) -> list[str]:
    begun = False
    full_text = []
    paragraph = ""

    for base_line in f:
        line = base_line.strip()

        if len(line) == 0:
            if len(paragraph) > 0:
                full_text.append(paragraph.strip())
                paragraph = ""
            continue

        if line.startswith("*** START OF THE PROJECT GUTENBERG EBOOK"):
            begun = True
            continue

        if line.startswith("*** END OF THE PROJECT GUTENBERG EBOOK"):
            break

        if begun:
            paragraph += line + " "

    return full_text

def create_text_samples(text, min_words=150, max_words=250) -> list[str]:
    """Split the text into segments of between min_words and max_words words."""
    samples = []
    words = text.split()

    i = 0
    while i < len(words):
        sample_size = np.random.randint(min_words, max_words + 1)
        if i + sample_size <= len(words):
            sample = ' '.join(words[i:i+sample_size])
            samples.append(sample)
            i += sample_size
        else:
            if len(words) - i >= min_words:
                sample = ' '.join(words[i:])
                samples.append(sample)
            break

    return samples

def tokenize(text: str):
    processed = text.lower()
    processed = re.sub(r'[^a-z\s\']', ' ', processed)
    processed = re.sub(r'\s+', ' ', processed).strip()
    tokens = processed.split()
    tokens = [token for token in tokens if len(token) > 1]
    return tokens
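
As a quick illustration of what `tokenize` keeps and drops, the sketch below repeats its rules on a made-up sentence (the sentence is an assumption chosen for illustration, not taken from the books). Punctuation and digits become spaces, apostrophes survive, and one-character tokens are discarded.

```python
import re


def tokenize(text: str) -> list[str]:
    """Same rules as the notebook's tokenize(): lowercase, keep letters and apostrophes."""
    processed = text.lower()
    processed = re.sub(r"[^a-z\s']", " ", processed)
    processed = re.sub(r"\s+", " ", processed).strip()
    # Drop one-character tokens such as "a" or a stray apostrophe
    return [token for token in processed.split() if len(token) > 1]


sample = "Hello, World! It's a test: 42 rabbits."
tokens = tokenize(sample)
print(tokens)
```

Note that the filter on token length also removes the article "a" and the pronoun "I", which may matter for a style-based task such as author identification.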

In [3]:
base_path = "./books"
books = os.listdir(base_path)

data = []
author_mapping = {
    'arthur': 'Arthur Conan Doyle',
    'lewis': 'Lewis Carroll',
    'shakespear': 'William Shakespeare'
}

for book in books:
    author_key = book.split('-')[0]
    author = author_mapping[author_key]

    path = os.path.join(base_path, book)
    with open(path, encoding="utf-8") as f:
        paragraphs = serialize_text(f)
        full_text = ' '.join(paragraphs)
        samples = create_text_samples(full_text, min_words=150, max_words=250)

        for sample in samples:
            data.append({
                'text': sample,
                'author': author,
                'book': book.replace('.txt', '')
            })

df = pd.DataFrame(data)
author_to_id = {author: idx for idx, author in enumerate(df['author'].unique())}
df['author_id'] = df['author'].map(author_to_id)

print(f"Total samples: {len(df)}")
print("\nDistribution by author:")
print(df['author'].value_counts())

# Train/validation/test split (70/15/15, stratified by author)
train_df, temp_df = train_test_split(df, test_size=0.3, stratify=df['author_id'], random_state=42)
val_df, test_df = train_test_split(temp_df, test_size=0.5, stratify=temp_df['author_id'], random_state=42)

print(f"\n=== DATASET SPLITS ===")
print(f"Train: {len(train_df)} samples")
print(f"Validation: {len(val_df)} samples")
print(f"Test: {len(test_df)} samples")

# Preprocess the texts
train_df['text_processed'] = train_df['text'].apply(tokenize)
val_df['text_processed'] = val_df['text'].apply(tokenize)
test_df['text_processed'] = test_df['text'].apply(tokenize)

Total samples: 1779

Distribution by author:
author
Arthur Conan Doyle     1061
William Shakespeare     412
Lewis Carroll           306
Name: count, dtype: int64

=== DATASET SPLITS ===
Train: 1245 samples
Validation: 267 samples
Test: 267 samples
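
The class counts above are imbalanced, so it helps to know what accuracy a trivial classifier would reach before reading any results. The sketch below reproduces the split sizes and computes the majority-class baseline from the reported counts. It assumes that `train_test_split` rounds the held-out share up, which matches the sizes printed above.

```python
import math

# Counts reported in the output above
total = 1779
class_counts = {
    "Arthur Conan Doyle": 1061,
    "William Shakespeare": 412,
    "Lewis Carroll": 306,
}

# Assumption: the held-out share is rounded up, the rest goes to training
n_temp = math.ceil(0.3 * total)
n_train = total - n_temp
# The held-out part is then halved into validation and test
n_test = math.ceil(0.5 * n_temp)
n_val = n_temp - n_test

# Accuracy of a classifier that always predicts the most frequent author
majority_author = max(class_counts, key=class_counts.get)
majority_share = class_counts[majority_author] / total

print(f"train={n_train}, val={n_val}, test={n_test}")
print(f"Always predicting {majority_author}: accuracy ~ {majority_share:.4f}")
```

A model scoring close to 0.60 is therefore doing no better than always answering "Arthur Conan Doyle". If stratification assigns about 159 of the 267 validation samples to Doyle (an estimate from the proportions, not a value printed by the notebook), that floor is 159/267 ≈ 0.5955 on the validation set.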


We download the pre-trained embeddings with Gensim, selecting models with different dimensionalities.

In [4]:
print("="*80)
print("DOWNLOADING PRE-TRAINED EMBEDDINGS")
print("="*80)

pretrained_models = {
    # Google Word2Vec (300 dim)
    'word2vec-google-news-300': 300,
    
    # GloVe embeddings
    'glove-wiki-gigaword-50': 50,
    'glove-wiki-gigaword-100': 100,
    'glove-wiki-gigaword-200': 200,
}

embeddings = {}
for model_name, dim in pretrained_models.items():
    print(f"\nDownloading {model_name} ({dim} dimensions)...")
    try:
        embeddings[model_name] = api.load(model_name)
        print(f"  ✓ Loaded: vocab size = {len(embeddings[model_name])}")
    except Exception as e:
        print(f"  ✗ Error loading {model_name}: {e}")

print(f"\n✓ Successfully loaded {len(embeddings)} embedding models")

DOWNLOADING PRE-TRAINED EMBEDDINGS

Downloading word2vec-google-news-300 (300 dimensions)...
  ✓ Loaded: vocab size = 3000000

Downloading glove-wiki-gigaword-50 (50 dimensions)...
  ✓ Loaded: vocab size = 400000

Downloading glove-wiki-gigaword-100 (100 dimensions)...
  ✓ Loaded: vocab size = 400000

Downloading glove-wiki-gigaword-200 (200 dimensions)...
  ✓ Loaded: vocab size = 400000

✓ Successfully loaded 4 embedding models
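
Loading four models at once has a memory cost that grows with both vocabulary size and dimensionality. The following back-of-the-envelope sketch estimates it from the vocabulary sizes printed above. It assumes the vectors are stored as 32-bit floats and ignores the overhead of the vocabulary dictionary itself, so the figures are lower bounds rather than measurements.

```python
# (vocabulary size, dimensions) as reported in the download output above
models = {
    "word2vec-google-news-300": (3_000_000, 300),
    "glove-wiki-gigaword-50": (400_000, 50),
    "glove-wiki-gigaword-100": (400_000, 100),
    "glove-wiki-gigaword-200": (400_000, 200),
}

BYTES_PER_FLOAT32 = 4  # assumption: vectors are held as float32


def footprint_mib(vocab_size: int, dim: int) -> float:
    """Size of a dense vocab_size x dim float32 matrix in MiB."""
    return vocab_size * dim * BYTES_PER_FLOAT32 / 2**20


for name, (vocab_size, dim) in models.items():
    print(f"{name:28s} ~ {footprint_mib(vocab_size, dim):8.1f} MiB")

total_mib = sum(footprint_mib(v, d) for v, d in models.values())
print(f"{'total':28s} ~ {total_mib:8.1f} MiB")
```

Under these assumptions the Word2Vec model dominates the total, so on a memory-constrained machine it may be worth loading and releasing the models one at a time.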


We prepare the data for Keras as in Part III, but without filtering the vocabulary against the pre-trained embeddings.

In [5]:
train_texts = [' '.join(tokens) for tokens in train_df['text_processed']]
val_texts = [' '.join(tokens) for tokens in val_df['text_processed']]
test_texts = [' '.join(tokens) for tokens in test_df['text_processed']]

tokenizer = Tokenizer(oov_token='<OOV>')
tokenizer.fit_on_texts(train_texts)

max_len = 250
vocab_size = len(tokenizer.word_index) + 1

print(f"Vocabulary size: {vocab_size}")
print(f"Max sequence length: {max_len}")

X_train = pad_sequences(tokenizer.texts_to_sequences(train_texts), maxlen=max_len, padding='post')
X_val = pad_sequences(tokenizer.texts_to_sequences(val_texts), maxlen=max_len, padding='post')
X_test = pad_sequences(tokenizer.texts_to_sequences(test_texts), maxlen=max_len, padding='post')

y_train = train_df['author_id'].values
y_val = val_df['author_id'].values
y_test = test_df['author_id'].values

print(f"\nX_train shape: {X_train.shape}")
print(f"X_val shape: {X_val.shape}")
print(f"X_test shape: {X_test.shape}")

Vocabulary size: 15198
Max sequence length: 250

X_train shape: (1245, 250)
X_val shape: (267, 250)
X_test shape: (267, 250)
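
To make the shapes above concrete, here is a minimal pure-Python sketch of what the tokenizer and `pad_sequences(..., padding='post')` do to a single text. The helper names `encode` and `pad_post` and the tiny word index are hypothetical, introduced only for illustration. Index 0 is reserved for padding and index 1 for the `<OOV>` token.

```python
# Hypothetical miniature word index: 0 = padding, 1 = out-of-vocabulary
word_index = {"<OOV>": 1, "the": 2, "rabbit": 3, "hole": 4}


def encode(tokens: list[str]) -> list[int]:
    """Map tokens to integer ids, sending unknown words to the OOV id."""
    return [word_index.get(token, word_index["<OOV>"]) for token in tokens]


def pad_post(sequence: list[int], maxlen: int, pad_value: int = 0) -> list[int]:
    """Pad at the end; sequences that are too long keep their last maxlen ids."""
    truncated = sequence[-maxlen:]
    return truncated + [pad_value] * (maxlen - len(truncated))


ids = encode(["the", "white", "rabbit"])
padded = pad_post(ids, maxlen=6)
print(ids)
print(padded)
```

Because the samples have at most 250 words and `max_len` is 250, truncation should not occur in this notebook; every row of `X_train` is a sequence of word ids followed by zeros.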


In [6]:
def create_embedding_matrix_from_pretrained(tokenizer, pretrained_model, embedding_dim):
    """
    Build an embedding matrix from a pre-trained model.
    Words missing from the model get small random vectors;
    row 0 (the padding index) stays all zeros.
    """
    vocab_size = len(tokenizer.word_index) + 1
    embedding_matrix = np.zeros((vocab_size, embedding_dim))

    found_words = 0
    missing_words = []

    for word, i in tokenizer.word_index.items():
        if word in pretrained_model:
            # Copy the pre-trained vector for this word
            embedding_matrix[i] = pretrained_model[word]
            found_words += 1
        else:
            # Words not found get a small random vector
            embedding_matrix[i] = np.random.normal(0, 0.1, embedding_dim)
            missing_words.append(word)

    coverage = 100 * found_words / len(tokenizer.word_index)
    print(f"  Found {found_words}/{len(tokenizer.word_index)} words ({coverage:.1f}% coverage)")
    print(f"  Missing {len(missing_words)} words")

    return embedding_matrix, coverage

# Build the embedding matrices
embedding_matrices = {}
coverage_stats = {}

print("\n" + "="*80)
print("CREATING EMBEDDING MATRICES")
print("="*80)

for model_name, pretrained_model in embeddings.items():
    dim = pretrained_models[model_name]
    print(f"\n{model_name} ({dim}D):")
    
    matrix, coverage = create_embedding_matrix_from_pretrained(
        tokenizer, pretrained_model, dim
    )
    
    embedding_matrices[model_name] = matrix
    coverage_stats[model_name] = coverage
    print(f"  Matrix shape: {matrix.shape}")

# Show coverage statistics
print("\n" + "="*80)
print("EMBEDDING COVERAGE SUMMARY")
print("="*80)
for model_name, coverage in sorted(coverage_stats.items(), key=lambda x: x[1], reverse=True):
    print(f"{model_name:40s} : {coverage:5.1f}%")


CREATING EMBEDDING MATRICES

word2vec-google-news-300 (300D):
  Found 12719/15197 words (83.7% coverage)
  Missing 2478 words
  Matrix shape: (15198, 300)

glove-wiki-gigaword-50 (50D):
  Found 13211/15197 words (86.9% coverage)
  Missing 1986 words
  Matrix shape: (15198, 50)

glove-wiki-gigaword-100 (100D):
  Found 13211/15197 words (86.9% coverage)
  Missing 1986 words
  Matrix shape: (15198, 100)

glove-wiki-gigaword-200 (200D):
  Found 13211/15197 words (86.9% coverage)
  Missing 1986 words
  Matrix shape: (15198, 200)

EMBEDDING COVERAGE SUMMARY
glove-wiki-gigaword-50                   :  86.9%
glove-wiki-gigaword-100                  :  86.9%
glove-wiki-gigaword-200                  :  86.9%
word2vec-google-news-300                 :  83.7%


In [None]:
def create_architecture_1(embedding_matrix: np.ndarray, embedding_dim: int):
    """
    Shallow Network with GlobalAveragePooling1D.
    inputs:
        - embedding_matrix: np.ndarray
        - embedding_dim: int
    outputs:
        - model: tf.keras.Model
    """
    model = tf.keras.Sequential([
        layers.Embedding(
            input_dim=embedding_matrix.shape[0],
            output_dim=embedding_dim,
            weights=[embedding_matrix],
            input_length=max_len,
            trainable=False
        ),
        layers.GlobalAveragePooling1D(),
        layers.Dense(64, activation='relu'),
        layers.Dropout(0.3),
        layers.Dense(3, activation='softmax')
    ])
    return model

def create_architecture_2(embedding_matrix: np.ndarray, embedding_dim: int):
    """Medium Network with GlobalAveragePooling1D.
    inputs:
        - embedding_matrix: np.ndarray
        - embedding_dim: int
    outputs:
        - model: tf.keras.Model
    """
    model = tf.keras.Sequential([
        layers.Embedding(
            input_dim=embedding_matrix.shape[0],
            output_dim=embedding_dim,
            weights=[embedding_matrix],
            input_length=max_len,
            trainable=False
        ),
        layers.GlobalAveragePooling1D(),
        layers.Dense(256, activation='relu'),
        layers.Dropout(0.4),
        layers.Dense(128, activation='relu'),
        layers.Dropout(0.3),
        layers.Dense(3, activation='softmax')
    ])
    return model

def create_architecture_3(embedding_matrix, embedding_dim):
    """Deep Network with GlobalAveragePooling1D.
    inputs:
        - embedding_matrix: np.ndarray
        - embedding_dim: int
    outputs:
        - model: tf.keras.Model
    """
    model = tf.keras.Sequential([
        layers.Embedding(
            input_dim=embedding_matrix.shape[0],
            output_dim=embedding_dim,
            weights=[embedding_matrix],
            input_length=max_len,
            trainable=False
        ),
        layers.GlobalAveragePooling1D(),
        layers.Dense(512, activation='relu'),
        layers.BatchNormalization(),
        layers.Dropout(0.5),
        layers.Dense(256, activation='relu'),
        layers.BatchNormalization(),
        layers.Dropout(0.4),
        layers.Dense(128, activation='relu'),
        layers.Dropout(0.3),
        layers.Dense(3, activation='softmax')
    ])
    return model

architectures = {
    'Arch_1_Shallow': create_architecture_1,
    'Arch_2_Medium': create_architecture_2,
    'Arch_3_Deep': create_architecture_3
}
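
All three architectures reduce a sequence to one vector with `GlobalAveragePooling1D`. Since the `Embedding` layers above do not set `mask_zero`, the average appears to run over every position, padding included. The NumPy sketch below, using a made-up two-dimensional embedding matrix, shows how that differs from an average over the real tokens only.

```python
import numpy as np

# Hypothetical 2-dimensional embeddings; row 0 is the all-zero padding row
embedding_matrix = np.array([
    [0.0, 0.0],
    [1.0, 3.0],
    [3.0, 1.0],
])

padded_ids = np.array([1, 2, 0, 0])     # two real tokens, two padding positions
vectors = embedding_matrix[padded_ids]  # shape (4, 2)

# Plain average over all positions, padding included
unmasked_mean = vectors.mean(axis=0)

# Average over real tokens only, which is what a padding mask would give
mask = padded_ids != 0
masked_mean = vectors[mask].mean(axis=0)

print("unmasked:", unmasked_mean)
print("masked:  ", masked_mean)
```

With zero padding rows, the unmasked average shrinks in proportion to how much padding a sample has, so text length leaks into the pooled vector. Whether that helps or hurts here is not something the notebook measures; passing `mask_zero=True` to the embedding layer would be one way to test it.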

In [8]:
print("=" * 80)
print("ARCHITECTURE SUMMARIES WITH PRE-TRAINED EMBEDDINGS")
print("=" * 80)

for model_name, embedding_matrix in embedding_matrices.items():
    dim = pretrained_models[model_name]
    print(f"\n{'='*80}")
    print(f"EMBEDDINGS: {model_name} ({dim} dimensions)")
    print(f"{'='*80}")
    
    for arch_name, arch_func in architectures.items():
        print(f"\n--- {arch_name} ---")
        model = arch_func(embedding_matrix, dim)
        model.build(input_shape=(None, max_len))
        model.summary()

ARCHITECTURE SUMMARIES WITH PRE-TRAINED EMBEDDINGS

EMBEDDINGS: word2vec-google-news-300 (300 dimensions)

--- Arch_1_Shallow ---





--- Arch_2_Medium ---



--- Arch_3_Deep ---



EMBEDDINGS: glove-wiki-gigaword-50 (50 dimensions)

--- Arch_1_Shallow ---



--- Arch_2_Medium ---



--- Arch_3_Deep ---



EMBEDDINGS: glove-wiki-gigaword-100 (100 dimensions)

--- Arch_1_Shallow ---



--- Arch_2_Medium ---



--- Arch_3_Deep ---



EMBEDDINGS: glove-wiki-gigaword-200 (200 dimensions)

--- Arch_1_Shallow ---



--- Arch_2_Medium ---



--- Arch_3_Deep ---
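
The summary tables themselves are not reproduced in the output above, but the trainable parameter counts of the dense layers can be recovered by formula. The sketch below is a hand calculation, not output from `model.summary()`. The helper names are hypothetical, and the counts cover only the `Dense` layers: the frozen embedding (15198 × dim weights) and the `BatchNormalization` layers of the deep network are left out.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def head_params(embedding_dim: int, hidden_units: list[int], n_classes: int = 3) -> int:
    """Trainable Dense parameters for embedding_dim -> hidden layers -> n_classes."""
    sizes = [embedding_dim, *hidden_units, n_classes]
    return sum(dense_params(a, b) for a, b in zip(sizes, sizes[1:]))


# Hidden layer widths taken from the three architecture functions
hidden_layers = {
    "Arch_1_Shallow": [64],
    "Arch_2_Medium": [256, 128],
    "Arch_3_Deep": [512, 256, 128],
}

for dim in (50, 100, 200, 300):
    counts = {name: head_params(dim, units) for name, units in hidden_layers.items()}
    print(dim, counts)
```

Even the shallow head has thousands of trainable weights for 1245 training samples, and the deep one has a few hundred thousand, which is worth keeping in mind when comparing how the architectures generalize.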


In [9]:
results = []
trained_models = {}

print("\n" + "="*80)
print("TRAINING ALL COMBINATIONS WITH PRE-TRAINED EMBEDDINGS")
print(f"Total: {len(architectures)} Architectures × {len(embeddings)} Embeddings = {len(architectures) * len(embeddings)} models")
print("="*80)

for arch_name, arch_func in architectures.items():
    for model_name, embedding_matrix in embedding_matrices.items():
        dim = pretrained_models[model_name]
        
        print(f"\n{'='*80}")
        print(f"Training: {arch_name} with {model_name}")
        print(f"{'='*80}")
        
        # Create and compile the model
        model = arch_func(embedding_matrix, dim)
        model.compile(
            optimizer='adam',
            loss='sparse_categorical_crossentropy',
            metrics=['accuracy']
        )
        
        # Callbacks
        early_stop = tf.keras.callbacks.EarlyStopping(
            monitor='val_loss',
            patience=5,
            restore_best_weights=True,
            verbose=1
        )
        
        reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
            monitor='val_loss',
            factor=0.5,
            patience=3,
            min_lr=1e-7,
            verbose=1
        )
        
        # Train
        start_time = time.time()
        history = model.fit(
            X_train, y_train,
            validation_data=(X_val, y_val),
            epochs=50,
            batch_size=32,
            callbacks=[early_stop, reduce_lr],
            verbose=1
        )
        training_time = time.time() - start_time
        
        # Evaluate
        test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
        
        # Predictions
        y_pred = model.predict(X_test, verbose=0)
        y_pred_classes = np.argmax(y_pred, axis=1)
        
        # Metrics
        precision = precision_score(y_test, y_pred_classes, average='weighted')
        recall = recall_score(y_test, y_pred_classes, average='weighted')
        f1 = f1_score(y_test, y_pred_classes, average='weighted')
        
        # Save results
        results.append({
            'Architecture': arch_name,
            'Embedding': model_name,
            'Embedding_Dim': dim,
            'Coverage': coverage_stats[model_name],
            'Test_Loss': test_loss,
            'Test_Accuracy': test_acc,
            'Test_Precision': precision,
            'Test_Recall': recall,
            'Test_F1': f1,
            'Training_Time': training_time,
            'Epochs_Trained': len(history.history['loss'])
        })
        
        # Save the model
        model_key = f"{arch_name}_{model_name}"
        trained_models[model_key] = {
            'model': model,
            'history': history,
            'y_pred': y_pred_classes
        }
        
        # Show results
        print(f"\n{'='*80}")
        print(f"RESULTS: {arch_name} + {model_name}")
        print(f"{'='*80}")
        print(f"  Embedding Coverage: {coverage_stats[model_name]:.1f}%")
        print(f"  Test Loss:          {test_loss:.4f}")
        print(f"  Test Accuracy:      {test_acc:.4f}")
        print(f"  Precision:          {precision:.4f}")
        print(f"  Recall:             {recall:.4f}")
        print(f"  F1-Score:           {f1:.4f}")
        print(f"  Training Time:      {training_time:.2f}s")
        print(f"  Epochs:             {len(history.history['loss'])}")
        
        # Classification report
        print(f"\nClassification Report:")
        target_names = [name for name, _ in sorted(author_to_id.items(), key=lambda x: x[1])]
        print(classification_report(y_test, y_pred_classes, target_names=target_names))
        
        print("\n")

print("\n" + "="*80)
print("TRAINING COMPLETED!")
print("="*80)


TRAINING ALL COMBINATIONS WITH PRE-TRAINED EMBEDDINGS
Total: 3 Architectures × 4 Embeddings = 12 models

Training: Arch_1_Shallow with word2vec-google-news-300
Epoch 1/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 1s 12ms/step - accuracy: 0.5968 - loss: 0.9601 - val_accuracy: 0.5955 - val_loss: 0.8973 - learning_rate: 0.0010
Epoch 2/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.6104 - loss: 0.8524 - val_accuracy: 0.6030 - val_loss: 0.8189 - learning_rate: 0.0010
Epoch 3/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.6771 - loss: 0.7573 - val_accuracy: 0.7079 - val_loss: 0.7279 - learning_rate: 0.0010
Epoch 4/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.7574 - loss: 0.6599 - val_accuracy: 0.7678 - val_loss: 0.6360 - learning_rate: 0.0010
Epoch 5/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.7952

39/39 ━━━━━━━━━━━━━━━━━━━━ 1s 9ms/step - accuracy: 0.5092 - loss: 1.0072 - val_accuracy: 0.5955 - val_loss: 0.9091 - learning_rate: 0.0010
Epoch 2/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.6024 - loss: 0.8849 - val_accuracy: 0.5955 - val_loss: 0.8528 - learning_rate: 0.0010
Epoch 3/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.6241 - loss: 0.8292 - val_accuracy: 0.6292 - val_loss: 0.8013 - learning_rate: 0.0010
Epoch 4/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.6715 - loss: 0.7677 - val_accuracy: 0.7004 - val_loss: 0.7469 - learning_rate: 0.0010
Epoch 5/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.7149 - loss: 0.7164 - val_accuracy: 0.7191 - val_loss: 0.6932 - learning_rate: 0.0010
Epoch 6/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accur

39/39 ━━━━━━━━━━━━━━━━━━━━ 1s 9ms/step - accuracy: 0.5960 - loss: 0.9067 - val_accuracy: 0.5955 - val_loss: 0.8616 - learning_rate: 0.0010
Epoch 2/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.6161 - loss: 0.8339 - val_accuracy: 0.6217 - val_loss: 0.8038 - learning_rate: 0.0010
Epoch 3/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.6610 - loss: 0.7715 - val_accuracy: 0.6854 - val_loss: 0.7408 - learning_rate: 0.0010
Epoch 4/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.7277 - loss: 0.6952 - val_accuracy: 0.7266 - val_loss: 0.6734 - learning_rate: 0.0010
Epoch 5/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.7631 - loss: 0.6347 - val_accuracy: 0.7453 - val_loss: 0.6132 - learning_rate: 0.0010
Epoch 6/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accur

39/39 ━━━━━━━━━━━━━━━━━━━━ 2s 10ms/step - accuracy: 0.6056 - loss: 0.8714 - val_accuracy: 0.6030 - val_loss: 0.8090 - learning_rate: 0.0010
Epoch 2/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.6578 - loss: 0.7573 - val_accuracy: 0.7004 - val_loss: 0.7070 - learning_rate: 0.0010
Epoch 3/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7454 - loss: 0.6463 - val_accuracy: 0.7228 - val_loss: 0.6087 - learning_rate: 0.0010
Epoch 4/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.8016 - loss: 0.5533 - val_accuracy: 0.7978 - val_loss: 0.5160 - learning_rate: 0.0010
Epoch 5/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.8466 - loss: 0.4667 - val_accuracy: 0.8315 - val_loss: 0.4509 - learning_rate: 0.0010
Epoch 6/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accu

39/39 ━━━━━━━━━━━━━━━━━━━━ 1s 13ms/step - accuracy: 0.6048 - loss: 0.9094 - val_accuracy: 0.6517 - val_loss: 0.7826 - learning_rate: 0.0010
Epoch 2/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.7671 - loss: 0.6114 - val_accuracy: 0.7828 - val_loss: 0.4910 - learning_rate: 0.0010
Epoch 3/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.8578 - loss: 0.3719 - val_accuracy: 0.9026 - val_loss: 0.3123 - learning_rate: 0.0010
Epoch 4/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.9301 - loss: 0.2380 - val_accuracy: 0.9363 - val_loss: 0.2173 - learning_rate: 0.0010
Epoch 5/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.9518 - loss: 0.1699 - val_accuracy: 0.9288 - val_loss: 0.2285 - learning_rate: 0.0010
Epoch 6/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - acc

39/39 ━━━━━━━━━━━━━━━━━━━━ 1s 9ms/step - accuracy: 0.5871 - loss: 0.9065 - val_accuracy: 0.5955 - val_loss: 0.8334 - learning_rate: 0.0010
Epoch 2/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.6940 - loss: 0.7399 - val_accuracy: 0.7566 - val_loss: 0.6050 - learning_rate: 0.0010
Epoch 3/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.8024 - loss: 0.5237 - val_accuracy: 0.8240 - val_loss: 0.4438 - learning_rate: 0.0010
Epoch 4/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.8361 - loss: 0.4047 - val_accuracy: 0.8390 - val_loss: 0.3860 - learning_rate: 0.0010
Epoch 5/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.8763 - loss: 0.3452 - val_accuracy: 0.8764 - val_loss: 0.3180 - learning_rate: 0.0010
Epoch 6/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accur

39/39 ━━━━━━━━━━━━━━━━━━━━ 1s 10ms/step - accuracy: 0.5936 - loss: 0.9065 - val_accuracy: 0.6067 - val_loss: 0.8170 - learning_rate: 0.0010
Epoch 2/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.6988 - loss: 0.7333 - val_accuracy: 0.7715 - val_loss: 0.5995 - learning_rate: 0.0010
Epoch 3/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7952 - loss: 0.5326 - val_accuracy: 0.7828 - val_loss: 0.4650 - learning_rate: 0.0010
Epoch 4/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.8522 - loss: 0.3925 - val_accuracy: 0.8727 - val_loss: 0.3429 - learning_rate: 0.0010
Epoch 5/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9004 - loss: 0.3060 - val_accuracy: 0.8839 - val_loss: 0.3084 - learning_rate: 0.0010
Epoch 6/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accu

39/39 ━━━━━━━━━━━━━━━━━━━━ 1s 11ms/step - accuracy: 0.5984 - loss: 0.8907 - val_accuracy: 0.6404 - val_loss: 0.7658 - learning_rate: 0.0010
Epoch 2/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.7430 - loss: 0.6444 - val_accuracy: 0.7828 - val_loss: 0.4920 - learning_rate: 0.0010
Epoch 3/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8394 - loss: 0.4241 - val_accuracy: 0.9101 - val_loss: 0.3222 - learning_rate: 0.0010
Epoch 4/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9149 - loss: 0.2677 - val_accuracy: 0.9026 - val_loss: 0.2543 - learning_rate: 0.0010
Epoch 5/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9181 - loss: 0.2379 - val_accuracy: 0.9326 - val_loss: 0.2196 - learning_rate: 0.0010
Epoch 6/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accu

39/39 ━━━━━━━━━━━━━━━━━━━━ 3s 17ms/step - accuracy: 0.8016 - loss: 0.5291 - val_accuracy: 0.5955 - val_loss: 0.9293 - learning_rate: 0.0010
Epoch 2/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.9309 - loss: 0.1934 - val_accuracy: 0.5955 - val_loss: 0.9354 - learning_rate: 0.0010
Epoch 3/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.9518 - loss: 0.1231 - val_accuracy: 0.5955 - val_loss: 1.0127 - learning_rate: 0.0010
Epoch 4/50
35/39 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.9641 - loss: 0.1012
Epoch 4: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.9639 - loss: 0.1021 - val_accuracy: 0.5955 - val_loss: 1.2463 - learning_rate: 0.0010
Epoch 5/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - a

39/39 ━━━━━━━━━━━━━━━━━━━━ 2s 13ms/step - accuracy: 0.6835 - loss: 0.8234 - val_accuracy: 0.5955 - val_loss: 0.9461 - learning_rate: 0.0010
Epoch 2/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.8490 - loss: 0.4461 - val_accuracy: 0.5955 - val_loss: 0.9910 - learning_rate: 0.0010
Epoch 3/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.8811 - loss: 0.3173 - val_accuracy: 0.5955 - val_loss: 1.1478 - learning_rate: 0.0010
Epoch 4/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.8893 - loss: 0.3465
Epoch 4: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8996 - loss: 0.2846 - val_accuracy: 0.5955 - val_loss: 1.1787 - learning_rate: 0.0010
Epoch 5/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accur

39/39 ━━━━━━━━━━━━━━━━━━━━ 2s 14ms/step - accuracy: 0.7470 - loss: 0.6758 - val_accuracy: 0.5955 - val_loss: 0.9321 - learning_rate: 0.0010
Epoch 2/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.8803 - loss: 0.3345 - val_accuracy: 0.5955 - val_loss: 1.0224 - learning_rate: 0.0010
Epoch 3/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.9012 - loss: 0.2747 - val_accuracy: 0.5955 - val_loss: 1.1783 - learning_rate: 0.0010
Epoch 4/50
38/39 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9297 - loss: 0.2079
Epoch 4: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.9213 - loss: 0.2245 - val_accuracy: 0.5955 - val_loss: 1.4843 - learning_rate: 0.0010
Epoch 5/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accur

39/39 ━━━━━━━━━━━━━━━━━━━━ 2s 16ms/step - accuracy: 0.7679 - loss: 0.6005 - val_accuracy: 0.5955 - val_loss: 0.8974 - learning_rate: 0.0010
Epoch 2/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.9149 - loss: 0.2320 - val_accuracy: 0.5955 - val_loss: 0.9195 - learning_rate: 0.0010
Epoch 3/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.9165 - loss: 0.2164 - val_accuracy: 0.5955 - val_loss: 1.1035 - learning_rate: 0.0010
Epoch 4/50
34/39 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9491 - loss: 0.1791
Epoch 4: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.9373 - loss: 0.1915 - val_accuracy: 0.5955 - val_loss: 1.1743 - learning_rate: 0.0010
Epoch 5/50
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accur
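
The training loop combines `ReduceLROnPlateau` (patience 3, factor 0.5) with `EarlyStopping` (patience 5), both watching `val_loss`. The pure-Python sketch below is a simplified model of how the two counters interact. It ignores details such as `min_delta`, cooldown and `min_lr`, and the function name `simulate_callbacks` is hypothetical. The first series is the validation loss of one run in the log above (0.9293, 0.9354, 1.0127, 1.2463); the second is synthetic.

```python
def simulate_callbacks(val_losses, lr=1e-3, lr_patience=3, factor=0.5, stop_patience=5):
    """Return (epoch, event, learning_rate) tuples for a sequence of validation losses."""
    best = float("inf")
    lr_wait = 0    # epochs without improvement since the last LR change
    stop_wait = 0  # epochs without improvement since the best epoch
    events = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            lr_wait = 0
            stop_wait = 0
            continue
        lr_wait += 1
        stop_wait += 1
        if lr_wait >= lr_patience:
            lr *= factor
            lr_wait = 0
            events.append((epoch, "reduce_lr", lr))
        if stop_wait >= stop_patience:
            events.append((epoch, "stop", lr))
            break
    return events


logged = [0.9293, 0.9354, 1.0127, 1.2463]    # from the training log
synthetic = [1.0, 1.1, 1.1, 1.1, 1.1, 1.1]   # made-up plateau
events = simulate_callbacks(logged)
print(events)
print(simulate_callbacks(synthetic))
```

For the logged series the best loss occurs at epoch 1, so the third epoch without improvement is epoch 4, which agrees with the "Epoch 4: ReduceLROnPlateau" message in the log. Under the same simplification, a run that never improves after epoch 1 would stop at epoch 6 and restore the epoch 1 weights.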


Summary of results and analysis.
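
The ranking relies on precision, recall and F1 computed with `average='weighted'`. As a reminder of what that average does, the pure-Python sketch below computes the three scores by hand on made-up labels where the classifier always predicts class 0. The function name and the toy labels are assumptions for illustration; per-class precision is set to zero when a class is never predicted.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Support-weighted precision, recall and F1 over the classes in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        p_c = tp / predicted if predicted else 0.0  # class never predicted
        r_c = tp / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = count / total
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return precision, recall, f1


# Toy labels: 60% of samples in class 0, classifier predicts class 0 every time
y_true = [0] * 6 + [1] * 2 + [2] * 2
y_pred = [0] * 10
precision, recall, f1 = weighted_scores(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Weighted recall equals accuracy, so a model that collapses onto the majority class still reports a recall near 0.60 on this dataset, while its weighted F1 is noticeably lower. That gap makes F1 a more informative sorting key than accuracy here.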

In [12]:
results_df = pd.DataFrame(results)

print("\n" + "="*100)
print("FULL SUMMARY - PRE-TRAINED EMBEDDINGS vs ARCHITECTURES")
print("="*100)

# Sort by F1-score
results_sorted = results_df.sort_values('Test_F1', ascending=False)

print("\n BEST MODELS (by F1-Score):")
print("-"*100)
top_10 = results_sorted.head(10)[['Architecture', 'Embedding', 'Embedding_Dim',
                                   'Coverage', 'Test_Accuracy', 'Test_F1']]
print(top_10.to_string(index=False))

print("\n" + "-"*100)
print("\nWORST MODELS:")
print("-"*100)
worst_5 = results_sorted.tail(5)[['Architecture', 'Embedding', 'Embedding_Dim',
                                   'Coverage', 'Test_Accuracy', 'Test_F1']]
print(worst_5.to_string(index=False))

# ============================================================================
# ANÁLISIS POR ARQUITECTURA
# ============================================================================
print("\n\n" + "="*100)
print("ANÁLISIS POR ARQUITECTURA")
print("="*100)

for arch_name in architectures.keys():
    arch_results = results_df[results_df['Architecture'] == arch_name]
    
    print(f"\n  {arch_name}")
    print("-"*100)
    print(f"  Accuracy promedio:  {arch_results['Test_Accuracy'].mean():.4f} ± {arch_results['Test_Accuracy'].std():.4f}")
    print(f"  F1-Score promedio:  {arch_results['Test_F1'].mean():.4f} ± {arch_results['Test_F1'].std():.4f}")
    print(f"  Mejor embedding:    {arch_results.loc[arch_results['Test_F1'].idxmax(), 'Embedding']}")
    print(f"  Peor embedding:     {arch_results.loc[arch_results['Test_F1'].idxmin(), 'Embedding']}")
    print(f"  Tiempo promedio:    {arch_results['Training_Time'].mean():.2f}s")

print("\n\n" + "="*100)
print("ANÁLISIS POR TIPO DE EMBEDDING")
print("="*100)

for model_name in embeddings.keys():
    emb_results = results_df[results_df['Embedding'] == model_name]
    
    print(f"\n {model_name} ({pretrained_models[model_name]}D)")
    print("-"*100)
    print(f"  Cobertura vocabulario: {coverage_stats[model_name]:.1f}%")
    print(f"  Accuracy promedio:     {emb_results['Test_Accuracy'].mean():.4f} ± {emb_results['Test_Accuracy'].std():.4f}")
    print(f"  F1-Score promedio:     {emb_results['Test_F1'].mean():.4f} ± {emb_results['Test_F1'].std():.4f}")
    print(f"  Mejor arquitectura:    {emb_results.loc[emb_results['Test_F1'].idxmax(), 'Architecture']}")

print("\n\n" + "="*100)
print("IMPACTO DE LA DIMENSIONALIDAD")
print("="*100)

dim_analysis = results_df.groupby('Embedding_Dim').agg({
    'Test_Accuracy': ['mean', 'std', 'max'],
    'Test_F1': ['mean', 'std', 'max'],
    'Training_Time': 'mean'
}).round(4)

print("\n", dim_analysis)

best_model = results_sorted.iloc[0]
worst_model = results_sorted.iloc[-1]

print(f"""
MEJOR MODELO:
   • Arquitectura:  {best_model['Architecture']}
   • Embedding:     {best_model['Embedding']} ({best_model['Embedding_Dim']}D)
   • Accuracy:      {best_model['Test_Accuracy']:.4f}
   • F1-Score:      {best_model['Test_F1']:.4f}
   • Cobertura:     {best_model['Coverage']:.1f}%

PEOR MODELO:
   • Arquitectura:  {worst_model['Architecture']}
   • Embedding:     {worst_model['Embedding']} ({worst_model['Embedding_Dim']}D)
   • Accuracy:      {worst_model['Test_Accuracy']:.4f}
   • F1-Score:      {worst_model['Test_F1']:.4f}
   • Cobertura:     {worst_model['Coverage']:.1f}%
""")




RESUMEN COMPLETO - EMBEDDINGS PRE-ENTRENADOS vs ARQUITECTURAS

 MEJORES MODELOS (por F1-Score):
----------------------------------------------------------------------------------------------------
  Architecture                Embedding  Embedding_Dim  Coverage  Test_Accuracy  Test_F1
 Arch_2_Medium word2vec-google-news-300            300 83.694150       0.992509 0.992495
Arch_1_Shallow word2vec-google-news-300            300 83.694150       0.988764 0.988721
Arch_1_Shallow  glove-wiki-gigaword-200            200 86.931631       0.981273 0.981160
 Arch_2_Medium  glove-wiki-gigaword-100            100 86.931631       0.977528 0.977361
 Arch_2_Medium   glove-wiki-gigaword-50             50 86.931631       0.973783 0.973715
 Arch_2_Medium  glove-wiki-gigaword-200            200 86.931631       0.973783 0.973407
Arch_1_Shallow  glove-wiki-gigaword-100            100 86.931631       0.970037 0.969659
Arch_1_Shallow   glove-wiki-gigaword-50             50 86.931631       0.966292 0.965955
 

## Conclusions - Point 4: Pre-trained Embeddings

### Best and Worst Model

**Best model:**
- Architecture: Arch_2_Medium
- Embedding: word2vec-google-news-300 (300D)
- Test accuracy: 0.9925
- F1-score: 0.9925
- Vocabulary coverage: 83.7%

**Worst model:**
- Architecture: Arch_3_Deep
- Embedding: glove-wiki-gigaword-200 (200D)
- Test accuracy: 0.5955
- F1-score: 0.4445
---

### Main Findings

#### 1. Impact of the Architecture
- Arch_1_Shallow and Arch_2_Medium reach test accuracies between 96.6% and 99.3%
- Arch_3_Deep collapses to ~59.6% accuracy, predicting only the majority class (Arthur Conan Doyle)
- Simpler architectures generalize better with 1260 training samples
- Arch_3's parameter-to-sample ratio (about 200:1) is excessive for this dataset
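The roughly 200:1 ratio can be checked with a couple of lines of arithmetic. This is only a sketch: `params_per_sample` is a hypothetical helper, and 250,000 is the lower bound of the "250K+" figure quoted in this notebook, not an exact parameter count.

```python
def params_per_sample(n_params: int, n_samples: int) -> float:
    """Return the number of trainable parameters per training sample."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return n_params / n_samples


# Assumption: 250_000 is a lower bound ("250K+"); 1260 is the training-set size.
ratio = params_per_sample(250_000, 1260)
print(f"~{ratio:.0f} trainable parameters per training sample")
```

In Keras, the exact numerator would come from `model.summary()` or `model.count_params()`; with frozen embeddings, only the trainable part is relevant here.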

#### 2. Embedding Dimensionality
- There is no linear relationship between dimensionality and performance
- Word2Vec-300D achieves the best result (99.25% accuracy)
- GloVe-50D and GloVe-100D also perform well (96.6-97.8% accuracy)
- Higher dimensionality does not guarantee better classification: with Arch_2_Medium, GloVe-200D (F1 0.9734) scores below GloVe-100D (F1 0.9774)
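One way to make the dimensionality claim concrete is to test whether F1 rises monotonically with embedding size within each architecture. The sketch below hard-codes the eight rows visible in the results table above (the Arch_3_Deep rows are not shown there, so they are left out); `f1_by_dim` and `is_monotonic_increasing` are illustrative helpers, not part of the notebook.

```python
# (architecture, embedding_dim, test_F1) copied from the printed results table.
rows = [
    ("Arch_2_Medium", 300, 0.992495),
    ("Arch_1_Shallow", 300, 0.988721),
    ("Arch_1_Shallow", 200, 0.981160),
    ("Arch_2_Medium", 100, 0.977361),
    ("Arch_2_Medium", 50, 0.973715),
    ("Arch_2_Medium", 200, 0.973407),
    ("Arch_1_Shallow", 100, 0.969659),
    ("Arch_1_Shallow", 50, 0.965955),
]


def f1_by_dim(arch: str) -> list[float]:
    """F1 scores of one architecture, ordered by increasing embedding dimension."""
    selected = sorted((r for r in rows if r[0] == arch), key=lambda r: r[1])
    return [f1 for _, _, f1 in selected]


def is_monotonic_increasing(values: list[float]) -> bool:
    """True if every value is at least as large as the previous one."""
    return all(a <= b for a, b in zip(values, values[1:]))


for arch in ("Arch_1_Shallow", "Arch_2_Medium"):
    print(arch, is_monotonic_increasing(f1_by_dim(arch)))
```

Note that the 300D row is Word2Vec while the others are GloVe, so dimensionality and embedding family are confounded in this comparison.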

#### 3. Vocabulary Coverage
- Word2Vec (Google News): 83.7% coverage
- GloVe (Wikipedia + Gigaword): 86.9% coverage
- Higher coverage does not imply higher accuracy: Word2Vec has the lower coverage yet the best result
- The 19th-century literary vocabulary differs from that of modern corpora (Google News, Wikipedia)
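For reference, coverage here is read as the percentage of distinct dataset words that have a pre-trained vector. The notebook's own coverage computation is not shown in this section, so the function below is a minimal sketch under that assumption; the toy vocabulary and key set are made up for illustration.

```python
def vocabulary_coverage(vocab, embedding_keys) -> float:
    """Percentage of distinct vocabulary words that have a pre-trained vector."""
    unique_words = set(vocab)
    if not unique_words:
        return 0.0
    covered = sum(1 for word in unique_words if word in embedding_keys)
    return 100.0 * covered / len(unique_words)


# Hypothetical toy data: one rare word is missing from the pre-trained keys.
toy_vocab = ["holmes", "watson", "hansom", "gasogene"]
toy_embedding_keys = {"holmes", "watson", "hansom"}

print(vocabulary_coverage(toy_vocab, toy_embedding_keys))  # 75.0
```

With a Gensim `KeyedVectors` object, its `key_to_index` mapping could play the role of `embedding_keys`, and the Keras tokenizer's `word_index` keys the role of `vocab`.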

#### 4. The Specific Problem with Arch_3_Deep
- 250K+ trainable parameters vs 1260 samples
- Overfitting from the first epoch
- Early stopping triggers within epochs 1-3
- BatchNormalization + high Dropout + little data leads to instability
- On the test set it predicts only "Arthur Conan Doyle" (the majority class)
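The reported metrics of the worst model (accuracy 0.5955, F1 0.4445) are what a classifier that always outputs the majority class would produce. The sketch below works this out by hand under two assumptions that the notebook does not state explicitly: the F1 is support-weighted (as with `average='weighted'` in scikit-learn), and the test set has 267 samples, 159 of them from the majority class. These counts are inferred from the reported accuracies (159/267 ≈ 0.5955, 265/267 ≈ 0.9925).

```python
def weighted_f1_majority_baseline(n_majority: int, n_total: int) -> float:
    """Support-weighted F1 when every sample is predicted as the majority class."""
    precision = n_majority / n_total  # every prediction is the majority label
    recall = 1.0                      # every majority sample is recovered
    f1_majority = 2 * precision * recall / (precision + recall)
    # Classes that are never predicted contribute an F1 of 0,
    # so only the majority class adds to the weighted average.
    return (n_majority / n_total) * f1_majority


# Assumed counts, inferred from the reported accuracies (not given in the notebook).
n_total, n_majority = 267, 159

accuracy = n_majority / n_total
f1 = weighted_f1_majority_baseline(n_majority, n_total)
print(round(accuracy, 4), round(f1, 4))  # 0.5955 0.4445
```

This would also be consistent with the `_warn_prf` lines in the training output, which scikit-learn emits when precision is undefined for a class that is never predicted.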

#### 5. Computational Efficiency
- Frozen embeddings (trainable=False) speed up training
- Training time: Arch_1 < Arch_2 < Arch_3
- Simple models converge faster and reach better results

---

### Comparison: Custom vs Pre-trained Embeddings

#### Pre-trained Embeddings
- Faster convergence
- Leverage knowledge from large corpora (Google News, Wikipedia)
- Require no embedding-training phase
- Better performance on this problem (99% vs 95-96%)

#### Custom Embeddings (Point 3)
- 100% coverage of the dataset vocabulary
- Capture the specific context of the literary authors
- Represent rare 19th-century words well
- No biases from modern corpora

---

**Overall conclusion:**
- Architecture matters more than the embedding source
- Simplicity beats complexity on small datasets
- Pre-trained embeddings work very well for this problem
- Transfer learning is effective even with historical literary vocabulary