# Lab 4: Transfer Learning with Pre-trained Models for MNIST

In this lab we implement transfer learning with pre-trained TensorFlow models to classify MNIST digits. We compare several architectures (VGG16, ResNet50, MobileNetV2) against a basic CNN trained from scratch to evaluate how effective transfer learning is.

## Objectives:
1. Implement transfer learning with models from tf.keras.applications
2. Adapt models pre-trained on ImageNet to MNIST
3. Compare performance against a CNN trained from scratch
4. Apply fine-tuning to improve results
5. Analyze the architecture and characteristics of each model

In [None]:
# ============================================================================================
# SECTION 1: LIBRARY IMPORTS AND CONFIGURATION
# ============================================================================================

import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.applications import VGG16, ResNet50, MobileNetV2
from tensorflow.keras.optimizers import Adam
from sklearn.metrics import classification_report, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# Seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

# Check for a GPU
print(f"GPU available: {tf.config.list_physical_devices('GPU')}")
print(f"TensorFlow version: {tf.__version__}")


In [None]:
# ============================================================================================
# SECTION 2: DATA LOADING AND PREPROCESSING
# ============================================================================================

# Load MNIST
print("Loading MNIST dataset...")
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Normalize pixels [0, 255] -> [0, 1]
# Note: the ImageNet weights were trained with each model's own preprocess_input
# scaling; feeding [0, 1] inputs still trains the new head but does not match it.
train_images = train_images.astype('float32') / 255.0
test_images = test_images.astype('float32') / 255.0

# Adaptation for the pre-trained models:
# 1. Convert to RGB (MNIST is grayscale) by repeating the single channel
# 2. Resize from 28x28 to 32x32 (the minimum input size these models accept)
train_images_rgb = np.stack([train_images] * 3, axis=-1)
test_images_rgb = np.stack([test_images] * 3, axis=-1)

train_images_resized = tf.image.resize(train_images_rgb, [32, 32]).numpy()
test_images_resized = tf.image.resize(test_images_rgb, [32, 32]).numpy()

# Convert labels to one-hot encoding
train_labels_categorical = to_categorical(train_labels, 10)
test_labels_categorical = to_categorical(test_labels, 10)

print(f"\nData shapes:")
print(f"  Train images: {train_images_resized.shape}")
print(f"  Test images: {test_images_resized.shape}")
print(f"  Train labels: {train_labels_categorical.shape}")
print(f"  Test labels: {test_labels_categorical.shape}")

# Fraction of the training data held out for validation (applied later in model.fit)
validation_split = 0.1
print(f"\nUsing {validation_split*100:.0f}% of the training data for validation")
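
The adaptation steps above (repeat the grayscale channel three times, then bring the 28x28 images up to 32x32) can also be sketched in plain NumPy. This is an illustration only: it zero-pads 2 pixels per side instead of the bilinear `tf.image.resize` used in this notebook, and `adapt_grayscale_batch` is a hypothetical helper name, not part of the lab code.

```python
import numpy as np

def adapt_grayscale_batch(images, target=32):
    """Turn (N, H, W) grayscale images into (N, target, target, 3) arrays.

    Assumption: zero-padding replaces the bilinear resize used in the notebook.
    """
    images = np.asarray(images, dtype=np.float32)
    n, h, w = images.shape
    if h > target or w > target:
        raise ValueError("images are larger than the target size")
    # Split the padding as evenly as possible between both sides
    pad_top = (target - h) // 2
    pad_left = (target - w) // 2
    padded = np.pad(
        images,
        ((0, 0), (pad_top, target - h - pad_top), (pad_left, target - w - pad_left)),
        mode="constant",
    )
    # Repeat the single channel three times along a new last axis
    return np.repeat(padded[..., np.newaxis], 3, axis=-1)

# Synthetic stand-in for MNIST: 4 random 28x28 images in [0, 1]
rng = np.random.default_rng(0)
fake_batch = rng.random((4, 28, 28), dtype=np.float32)
adapted = adapt_grayscale_batch(fake_batch)
print(adapted.shape)  # (4, 32, 32, 3)
```

Padding keeps the original pixels untouched and the digit centered, whereas resizing interpolates and slightly blurs the strokes; which one works better here is an empirical question.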

In [None]:
# ============================================================================================
# SECTION 3: MODEL ARCHITECTURE DEFINITIONS
# ============================================================================================

def create_transfer_vgg16():
    """
    VGG16 with transfer learning
    - Convolutional base only (13 conv layers); include_top=False drops the 3 dense layers
    - Pre-trained on ImageNet
    - Convolutional layers frozen
    """
    base_model = VGG16(
        weights='imagenet',
        include_top=False,
        input_shape=(32, 32, 3)
    )
    
    # Freeze the base layers
    base_model.trainable = False
    
    model = models.Sequential([
        base_model,
        layers.GlobalAveragePooling2D(),
        layers.Dropout(0.5),
        layers.Dense(256, activation='relu', name='dense_256'),
        layers.Dropout(0.3),
        layers.Dense(10, activation='softmax', name='output')
    ], name='VGG16_Transfer')
    
    return model

def create_transfer_resnet50():
    """
    ResNet50 with transfer learning
    - 50 layers with residual blocks
    - Pre-trained on ImageNet
    - Includes BatchNormalization
    """
    base_model = ResNet50(
        weights='imagenet',
        include_top=False,
        input_shape=(32, 32, 3)
    )
    
    base_model.trainable = False
    
    model = models.Sequential([
        base_model,
        layers.GlobalAveragePooling2D(),
        layers.Dropout(0.5),
        layers.Dense(256, activation='relu', name='dense_256'),
        layers.BatchNormalization(),
        layers.Dropout(0.3),
        layers.Dense(10, activation='softmax', name='output')
    ], name='ResNet50_Transfer')
    
    return model

def create_transfer_mobilenet():
    """
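    MobileNetV2 with transfer learning
    - Architecture optimized for mobile devices
    - Fewer parameters than VGG16 or ResNet50
    - Depthwise separable convolutions
    - Note: 32x32 is not one of the ImageNet training sizes, so Keras warns and loads the 224x224 weights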
    """
    base_model = MobileNetV2(
        weights='imagenet',
        include_top=False,
        input_shape=(32, 32, 3)
    )
    
    base_model.trainable = False
    
    model = models.Sequential([
        base_model,
        layers.GlobalAveragePooling2D(),
        layers.Dropout(0.4),
        layers.Dense(128, activation='relu', name='dense_128'),
        layers.Dropout(0.2),
        layers.Dense(10, activation='softmax', name='output')
    ], name='MobileNetV2_Transfer')
    
    return model

def create_simple_cnn():
    """
    Basic CNN used as the comparison baseline
    - Trained from scratch
    - Simple but effective architecture
    """
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation='relu', 
                      input_shape=(32, 32, 3), padding='same'),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        layers.Flatten(),
        layers.Dense(256, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(10, activation='softmax')
    ], name='CNN_Baseline')
    
    return model

print("Model architectures defined successfully")
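
To make the effect of freezing concrete, the trainable parameters of the classification head can be counted by hand. This sketch assumes the VGG16 head defined above (a 512-channel pooled output feeding `Dense(256)` and `Dense(10)`; Dropout adds no parameters) and uses the size of the VGG16 convolutional base, 14,714,688 parameters. `dense_params` is a hypothetical helper introduced for illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

VGG16_BASE_PARAMS = 14_714_688   # VGG16 with include_top=False
POOLED_FEATURES = 512            # channels after GlobalAveragePooling2D

head_params = dense_params(POOLED_FEATURES, 256) + dense_params(256, 10)
total_params = VGG16_BASE_PARAMS + head_params
frozen_share = VGG16_BASE_PARAMS / total_params

print(f"Head (trainable): {head_params:,}")     # 133,898
print(f"Total:            {total_params:,}")    # 14,848,586
print(f"Frozen share:     {frozen_share:.1%}")  # 99.1%
```

Under these assumptions, fewer than 1% of the parameters receive gradient updates while the base is frozen.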

In [None]:
# ============================================================================================
# SECTION 4: MODEL TRAINING AND EVALUATION
# ============================================================================================

# Training configuration
EPOCHS = 10
BATCH_SIZE = 128
PATIENCE = 3  # Epochs without val_loss improvement before early stopping

# Early stopping callback
early_stopping = keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=PATIENCE,
    restore_best_weights=True
)

# Dictionaries for storing results
models_dict = {}
histories = {}
training_times = {}
evaluation_results = {}

print("=" * 80)
print("TRAINING MODELS WITH TRANSFER LEARNING")
print("=" * 80)

# Models to train: (name, builder function, learning rate)
model_configs = [
    ('VGG16_Transfer', create_transfer_vgg16, 0.0001),
    ('ResNet50_Transfer', create_transfer_resnet50, 0.0001),
    ('MobileNetV2_Transfer', create_transfer_mobilenet, 0.0001),
    ('CNN_Baseline', create_simple_cnn, 0.001)
]

import time

for model_name, model_func, learning_rate in model_configs:
    print(f"\n{'-' * 60}")
    print(f"TRAINING: {model_name}")
    print(f"{'-' * 60}")
    
    # Build the model
    model = model_func()
    models_dict[model_name] = model
    
    # Compile with the learning rate chosen for this model
    model.compile(
        optimizer=Adam(learning_rate=learning_rate),
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )
    
    # Show model information
    total_params = model.count_params()
    trainable_params = sum([tf.size(w).numpy() for w in model.trainable_weights])
    non_trainable_params = total_params - trainable_params
    
    print(f"\nModel information:")
    print(f"  Total parameters: {total_params:,}")
    print(f"  Trainable parameters: {trainable_params:,}")
    print(f"  Frozen parameters: {non_trainable_params:,}")
    print(f"  % frozen: {(non_trainable_params/total_params)*100:.1f}%")
    
    # Train the model
    start_time = time.time()
    
    history = model.fit(
        train_images_resized,
        train_labels_categorical,
        epochs=EPOCHS,
        batch_size=BATCH_SIZE,
        validation_split=validation_split,
        callbacks=[early_stopping],
        verbose=1
    )
    
    elapsed_time = time.time() - start_time
    training_times[model_name] = elapsed_time
    
    print(f"\n⏱️  Training time: {elapsed_time/60:.2f} minutes")
    
    # Save the training history
    histories[model_name] = history.history
    
    # Evaluate on the test set
    test_loss, test_accuracy = model.evaluate(
        test_images_resized,
        test_labels_categorical,
        verbose=0
    )
    
    evaluation_results[model_name] = {
        'test_accuracy': test_accuracy,
        'test_loss': test_loss,
        'total_params': total_params,
        'trainable_params': trainable_params,
        'training_time': elapsed_time
    }
    
    print(f"✅ Results - {model_name}:")
    print(f"   Accuracy: {test_accuracy*100:.2f}%")
    print(f"   Loss: {test_loss:.4f}")

print("\n" + "=" * 80)
print("TRAINING COMPLETE")
print("=" * 80)
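
The training loop above relies on early stopping with `patience=3` on `val_loss`. The following pure-Python sketch shows the idea behind patience-based stopping. It is a simplified model written for this explanation, not Keras's actual implementation; `simulate_early_stopping` and the loss values are made up for illustration.

```python
def simulate_early_stopping(val_losses, patience=3):
    """Return (last_epoch_run, best_epoch), both 0-based.

    Simplified model of patience-based stopping; not Keras's actual code.
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    last_epoch = -1
    for epoch, loss in enumerate(val_losses):
        last_epoch = epoch
        if loss < best_loss:
            # Improvement: remember it and reset the patience counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                break  # training would stop here and restore best_epoch's weights
    return last_epoch, best_epoch

# Made-up validation losses
losses = [0.50, 0.40, 0.45, 0.42, 0.41, 0.30]
stopped_at, best = simulate_early_stopping(losses, patience=3)
print(stopped_at, best)  # 4 1
```

In this made-up run the loss of 0.30 at the last epoch is never reached, which shows the trade-off: a small patience saves time but can stop before a late improvement.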

In [None]:
# ============================================================================================
# SECTION 5: FINE-TUNING VGG16
# ============================================================================================

def create_finetuned_vgg16():
    """
    VGG16 with fine-tuning
    - Unfreeze the last 4 layers of the base (block5: 3 conv layers + pooling)
    - Very low learning rate
    """
    base_model = VGG16(
        weights='imagenet',
        include_top=False,
        input_shape=(32, 32, 3)
    )
    
    # Fine-tuning strategy
    base_model.trainable = True
    
    # Freeze every layer except the last 4
    for layer in base_model.layers[:-4]:
        layer.trainable = False
    
    model = models.Sequential([
        base_model,
        layers.GlobalAveragePooling2D(),
        layers.Dropout(0.5),
        layers.Dense(256, activation='relu', name='dense_256'),
        layers.Dropout(0.3),
        layers.Dense(10, activation='softmax', name='output')
    ], name='VGG16_FineTuned')
    
    return model

print("\n" + "=" * 80)
print("FINE-TUNING: VGG16 with unfrozen layers")
print("=" * 80)

model_finetuned = create_finetuned_vgg16()

# Compile with a VERY low learning rate
model_finetuned.compile(
    optimizer=Adam(learning_rate=0.00001),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Show model information
total_params_ft = model_finetuned.count_params()
trainable_params_ft = sum([tf.size(w).numpy() for w in model_finetuned.trainable_weights])
non_trainable_params_ft = total_params_ft - trainable_params_ft

print(f"\nFine-tuning information:")
print(f"  Trainable parameters: {trainable_params_ft:,}")
print(f"  Frozen parameters: {non_trainable_params_ft:,}")
print(f"  % trainable: {(trainable_params_ft/total_params_ft)*100:.1f}%")

# Train for fewer epochs
start_time = time.time()

history_finetuned = model_finetuned.fit(
    train_images_resized,
    train_labels_categorical,
    epochs=5,
    batch_size=BATCH_SIZE,
    validation_split=validation_split,
    verbose=1
)

elapsed_time = time.time() - start_time
print(f"\n⏱️  Fine-tuning time: {elapsed_time/60:.2f} minutes")

# Evaluate the fine-tuned model
test_loss_ft, test_accuracy_ft = model_finetuned.evaluate(
    test_images_resized,
    test_labels_categorical,
    verbose=0
)

print(f"✅ VGG16_FineTuned:")
print(f"   Accuracy: {test_accuracy_ft*100:.2f}%")
print(f"   Loss: {test_loss_ft:.4f}")

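# Add to the results, and to models_dict so Section 9 can look it up if it is the best model
models_dict['VGG16_FineTuned'] = model_finetuned
histories['VGG16_FineTuned'] = history_finetuned.history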
evaluation_results['VGG16_FineTuned'] = {
    'test_accuracy': test_accuracy_ft,
    'test_loss': test_loss_ft,
    'total_params': total_params_ft,
    'trainable_params': trainable_params_ft,
    'training_time': elapsed_time
}

In [None]:
# ============================================================================================
# SECTION 6: RESULTS COMPARISON
# ============================================================================================

import pandas as pd

print("\n" + "=" * 80)
print("RESULTS COMPARISON: TRANSFER LEARNING vs BASELINE")
print("=" * 80)

# Build the comparison table
comparison_data = []

for model_name, results in evaluation_results.items():
    comparison_data.append({
        'Model': model_name,
        'Accuracy (%)': f"{results['test_accuracy']*100:.2f}",
        'Loss': f"{results['test_loss']:.4f}",
        'Total Parameters': f"{results['total_params']:,}",
        'Trainable Parameters': f"{results['trainable_params']:,}",
        'Time (min)': f"{results['training_time']/60:.2f}"
    })

df_comparison = pd.DataFrame(comparison_data)

print("\nComparison table:")
print(df_comparison.to_string(index=False))

# Find the best model (highest test accuracy)
best_model_name = max(evaluation_results.keys(), 
                      key=lambda x: evaluation_results[x]['test_accuracy'])
best_accuracy = evaluation_results[best_model_name]['test_accuracy']

print(f"\n🏆 BEST MODEL: {best_model_name}")
print(f"   Accuracy: {best_accuracy*100:.2f}%")

# Efficiency analysis
print(f"\n📈 EFFICIENCY ANALYSIS:")
for model_name, results in evaluation_results.items():
    params_per_accuracy = results['trainable_params'] / (results['test_accuracy'] * 100)
    print(f"  {model_name}: {params_per_accuracy:,.0f} trainable params per accuracy point")

In [None]:
# ============================================================================================
# SECTION 7: RESULTS VISUALIZATION
# ============================================================================================

plt.style.use('seaborn-v0_8')

# Comparison plot - training curves
fig, axes = plt.subplots(2, 2, figsize=(16, 12))
fig.suptitle('Transfer Learning vs CNN Baseline - Training Curves', 
             fontsize=16, fontweight='bold')

colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd']

# Training Accuracy
ax = axes[0, 0]
for i, (name, history) in enumerate(histories.items()):
    epochs_range = range(1, len(history['accuracy']) + 1)
    ax.plot(epochs_range, history['accuracy'], 
           label=name, linewidth=2, color=colors[i % len(colors)])
ax.set_title('Training Accuracy', fontsize=14, fontweight='bold')
ax.set_xlabel('Epoch')
ax.set_ylabel('Accuracy')
ax.legend()
ax.grid(True, alpha=0.3)

# Validation Accuracy
ax = axes[0, 1]
for i, (name, history) in enumerate(histories.items()):
    epochs_range = range(1, len(history['val_accuracy']) + 1)
    ax.plot(epochs_range, history['val_accuracy'], 
           label=name, linewidth=2, color=colors[i % len(colors)])
ax.set_title('Validation Accuracy', fontsize=14, fontweight='bold')
ax.set_xlabel('Epoch')
ax.set_ylabel('Accuracy')
ax.legend()
ax.grid(True, alpha=0.3)

# Training Loss
ax = axes[1, 0]
for i, (name, history) in enumerate(histories.items()):
    epochs_range = range(1, len(history['loss']) + 1)
    ax.plot(epochs_range, history['loss'], 
           label=name, linewidth=2, color=colors[i % len(colors)])
ax.set_title('Training Loss', fontsize=14, fontweight='bold')
ax.set_xlabel('Epoch')
ax.set_ylabel('Loss')
ax.legend()
ax.grid(True, alpha=0.3)

# Validation Loss
ax = axes[1, 1]
for i, (name, history) in enumerate(histories.items()):
    epochs_range = range(1, len(history['val_loss']) + 1)
    ax.plot(epochs_range, history['val_loss'], 
           label=name, linewidth=2, color=colors[i % len(colors)])
ax.set_title('Validation Loss', fontsize=14, fontweight='bold')
ax.set_xlabel('Epoch')
ax.set_ylabel('Loss')
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('transfer_learning_comparison.png', dpi=300, bbox_inches='tight')
plt.show()

print("\n💾 Plot saved as 'transfer_learning_comparison.png'")

# Bar chart - accuracy comparison
plt.figure(figsize=(12, 8))

model_names = list(evaluation_results.keys())
accuracies = [results['test_accuracy'] * 100 for results in evaluation_results.values()]

bars = plt.bar(model_names, accuracies, color=colors[:len(model_names)], alpha=0.8)

# Add the values above the bars
for bar, acc in zip(bars, accuracies):
    plt.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.5, 
            f'{acc:.2f}%', ha='center', va='bottom', fontweight='bold')

plt.title('Test Accuracy Comparison - Transfer Learning vs Baseline', 
         fontsize=16, fontweight='bold')
plt.xlabel('Model')
plt.ylabel('Test Accuracy (%)')
plt.xticks(rotation=45, ha='right')
plt.grid(axis='y', alpha=0.3)
plt.ylim(0, 105)  # Headroom so labels above bars near 100% do not run into the title
plt.tight_layout()
plt.savefig('accuracy_comparison.png', dpi=300, bbox_inches='tight')
plt.show()

print("💾 Plot saved as 'accuracy_comparison.png'")

In [None]:
# ============================================================================================
# SECTION 8: FEATURE MAP ANALYSIS
# ============================================================================================

print("\n" + "=" * 80)
print("FEATURE MAP ANALYSIS - VGG16")
print("=" * 80)

# Select an example image
example_idx = 0
example_image = test_images_resized[example_idx:example_idx+1]
example_label = test_labels[example_idx]

print(f"Analyzing image of digit: {example_label}")

# Extract the VGG16 base model
vgg16_model = models_dict['VGG16_Transfer'].layers[0]

# Take the first 5 convolutional layers (fills the 2x3 grid next to the original image)
conv_layers = [layer for layer in vgg16_model.layers if 'conv' in layer.name][:5]
layer_names = [layer.name for layer in conv_layers]

print(f"Visualizing feature maps from: {layer_names}")

# Build a model that outputs the activations of those layers
layer_outputs = [layer.output for layer in conv_layers]
activation_model = models.Model(inputs=vgg16_model.input, outputs=layer_outputs)

# Compute the activations
activations = activation_model.predict(example_image, verbose=0)

# Visualize
fig, axes = plt.subplots(2, 3, figsize=(18, 12))
fig.suptitle(f'VGG16 Feature Maps for digit {example_label}', 
             fontsize=16, fontweight='bold')

# Original image
axes[0, 0].imshow(example_image[0, :, :, 0], cmap='gray')
axes[0, 0].set_title('Original Image', fontweight='bold')
axes[0, 0].axis('off')

# Feature maps
for i, (activation, layer_name) in enumerate(zip(activations, layer_names)):
    if i < 5:  # The 2x3 grid has room for 5 feature maps
        row = (i + 1) // 3
        col = (i + 1) % 3
        
        # Take the first channel of the activation
        feature_map = activation[0, :, :, 0]
        
        axes[row, col].imshow(feature_map, cmap='viridis')
        axes[row, col].set_title(f'{layer_name}\nShape: {activation.shape[1:]}', 
                                fontweight='bold')
        axes[row, col].axis('off')

plt.tight_layout()
plt.savefig('feature_maps_vgg16.png', dpi=300, bbox_inches='tight')
plt.show()

print("💾 Feature maps saved as 'feature_maps_vgg16.png'")
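
The shapes printed in the feature-map titles follow directly from the VGG16 layout: 3x3 convolutions with 'same' padding keep the spatial size, and each 2x2 max-pooling layer halves it. The sketch below reproduces that bookkeeping for a 32x32 input without loading the model. `VGG16_BLOCKS` and `vgg16_feature_shapes` are names introduced for this illustration; the layer names follow the Keras naming scheme.

```python
# (number of conv layers, output channels) for the five VGG16 blocks
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

def vgg16_feature_shapes(input_size=32):
    """List (layer_name, height, width, channels) for every conv and pool layer."""
    shapes = []
    size = input_size
    for block_idx, (n_convs, channels) in enumerate(VGG16_BLOCKS, start=1):
        for conv_idx in range(1, n_convs + 1):
            # 3x3 convolutions with 'same' padding keep the spatial size
            shapes.append((f"block{block_idx}_conv{conv_idx}", size, size, channels))
        size //= 2  # each 2x2 max-pool with stride 2 halves height and width
        shapes.append((f"block{block_idx}_pool", size, size, channels))
    return shapes

for name, h, w, c in vgg16_feature_shapes(32):
    print(f"{name:<14} {h:>2} x {w:>2} x {c}")
```

With a 32x32 input the last pooling layer outputs a 1x1x512 tensor, so the `GlobalAveragePooling2D` layer in the head averages over a single spatial position.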

In [None]:
# ============================================================================================
# SECTION 9: PREDICTIONS AND CONFUSION MATRIX
# ============================================================================================

print("\n" + "=" * 80)
print("DETAILED EVALUATION OF THE BEST MODEL")
print("=" * 80)

# Use the best model for the detailed analysis
best_model = models_dict[best_model_name]
print(f"Selected model: {best_model_name}")

# Predictions on the test set
predictions = best_model.predict(test_images_resized, verbose=0)
predicted_labels = np.argmax(predictions, axis=1)

# Classification report
print(f"\n📊 Classification Report:")
print(classification_report(test_labels, predicted_labels, digits=4))

# Confusion matrix
cm = confusion_matrix(test_labels, predicted_labels)

plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=range(10), yticklabels=range(10))
plt.title(f'Confusion Matrix - {best_model_name}', fontsize=16, fontweight='bold')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.tight_layout()
plt.savefig('confusion_matrix.png', dpi=300, bbox_inches='tight')
plt.show()

print("💾 Confusion matrix saved as 'confusion_matrix.png'")
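
For readers who want to see what the confusion matrix above contains, here is a minimal NumPy sketch of the same computation (rows are true labels, columns are predictions), plus per-class recall derived from it. The helper names and the toy labels are made up for illustration; the notebook itself uses `sklearn.metrics.confusion_matrix`.

```python
import numpy as np

def confusion_matrix_np(y_true, y_pred, n_classes):
    """Rows are true labels, columns are predicted labels."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    # np.add.at accumulates correctly when the same (row, col) pair repeats
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def per_class_recall(cm):
    """Fraction of each true class that was predicted correctly."""
    support = cm.sum(axis=1)
    return np.divide(np.diag(cm), support,
                     out=np.zeros(len(cm), dtype=float), where=support > 0)

# Toy labels (made up for illustration, not MNIST results)
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]
cm = confusion_matrix_np(y_true, y_pred, n_classes=3)
recall = per_class_recall(cm)
print(cm)
print(recall)
```

Off-diagonal entries show which digits get confused with which; the diagonal divided by the row sum is the recall that `classification_report` lists per class.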

In [None]:
# ============================================================================================
# SECTION 10: PREDICTION EXAMPLES
# ============================================================================================

print("\n" + "=" * 80)
print(f"PREDICTION EXAMPLES - {best_model_name}")
print("=" * 80)

# Select random examples
np.random.seed(42)
sample_indices = np.random.choice(len(test_images), 12, replace=False)

sample_images = test_images_resized[sample_indices]
sample_labels_true = test_labels[sample_indices]

# Predict
sample_predictions = best_model.predict(sample_images, verbose=0)
sample_labels_pred = np.argmax(sample_predictions, axis=1)

# Visualize
fig, axes = plt.subplots(3, 4, figsize=(16, 12))
fig.suptitle(f'Prediction Examples - {best_model_name}', 
             fontsize=16, fontweight='bold')

for i in range(12):
    row = i // 4
    col = i % 4
    
    # Show the image (first channel of the resized RGB input)
    axes[row, col].imshow(sample_images[i, :, :, 0], cmap='gray')
    axes[row, col].axis('off')
    
    # Title color: green if correct, red if wrong
    color = 'green' if sample_labels_pred[i] == sample_labels_true[i] else 'red'
    confidence = sample_predictions[i][sample_labels_pred[i]] * 100
    
    title = f"Pred: {sample_labels_pred[i]}\nTrue: {sample_labels_true[i]}\n{confidence:.1f}%"
    axes[row, col].set_title(title, color=color, fontweight='bold')

plt.tight_layout()
plt.savefig('prediction_examples.png', dpi=300, bbox_inches='tight')
plt.show()

print("💾 Examples saved as 'prediction_examples.png'")

## Conclusions and Analysis

### 🎯 Advantages of Transfer Learning:

1. **Knowledge reuse**: Models pre-trained on ImageNet have already learned basic features (edges, shapes, patterns)

2. **Fewer parameters to train**: Freezing the base layers sharply reduces the number of trainable parameters, although every step still runs a forward pass through the full base model, so wall-clock training time is not necessarily lower than for a small CNN

3. **Potentially better generalization**: The low-level features learned on ImageNet can transfer to MNIST, even though handwritten digits differ considerably from natural images

4. **Computational efficiency**: Fewer trainable parameters means less gradient computation and less optimizer state

### 📊 Architecture Comparison:

- **VGG16**: Simple, deep architecture with many parameters
- **ResNet50**: Residual blocks make it possible to train deeper networks
- **MobileNetV2**: Optimized for efficiency, with fewer parameters
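
One reason MobileNetV2 needs fewer parameters is its use of depthwise separable convolutions, which split a standard convolution into a per-channel spatial filter followed by a 1x1 pointwise convolution. The arithmetic below compares the two for a single layer. The layer sizes are hypothetical values chosen for illustration, and biases are omitted.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights of a standard kernel x kernel convolution (bias omitted)."""
    return kernel * kernel * c_in * c_out

def depthwise_separable_params(kernel, c_in, c_out):
    """One kernel x kernel filter per input channel, then a 1x1 pointwise conv."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

# Hypothetical layer sizes chosen for illustration
k, c_in, c_out = 3, 128, 128
standard = standard_conv_params(k, c_in, c_out)
separable = depthwise_separable_params(k, c_in, c_out)
print(f"standard:  {standard:,}")    # 147,456
print(f"separable: {separable:,}")   # 17,536
print(f"ratio:     {standard / separable:.1f}x")
```

The saving grows with the number of output channels, which is why the technique pays off most in the wide layers of a network.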

### 🔧 Fine-tuning:

Fine-tuning adjusts the top layers of the base model to the target domain, which can improve performance at the cost of longer training time.
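
To see how much of VGG16 the fine-tuning step in Section 5 actually unfreezes, the layer list and parameter counts can be rebuilt by hand. This sketch mirrors the `base_model.layers[:-4]` slicing from the notebook; the `"input"` entry is a placeholder for the input layer, whose real name depends on the Keras version, and the variable names are introduced for illustration.

```python
# (number of conv layers, output channels) for the five VGG16 blocks
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

layers = ["input"]  # placeholder name for the input layer
conv_params = {}
c_in = 3
for block, (n_convs, c_out) in enumerate(VGG16_BLOCKS, start=1):
    for i in range(1, n_convs + 1):
        name = f"block{block}_conv{i}"
        layers.append(name)
        conv_params[name] = 3 * 3 * c_in * c_out + c_out  # weights + biases
        c_in = c_out
    layers.append(f"block{block}_pool")

# Same slicing as base_model.layers[:-4]: all but the last 4 layers stay frozen
frozen, unfrozen = layers[:-4], layers[-4:]
trainable_base = sum(conv_params.get(name, 0) for name in unfrozen)
total_base = sum(conv_params.values())

print("Unfrozen:", unfrozen)
print(f"Trainable parameters in the base: {trainable_base:,}")  # 7,079,424
print(f"Total parameters in the base:     {total_base:,}")      # 14,714,688
```

Unfreezing only block5 therefore makes roughly 48% of the base trainable, which is one reason the notebook pairs it with a much lower learning rate.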

### 💡 Considerations:

- MNIST is relatively simple, so the gap between transfer learning and the basic CNN may be small
- On more complex datasets, transfer learning would likely show larger advantages
- The choice of model depends on the trade-off between accuracy and computational cost