# **LAB 3: CONVOLUTIONAL NEURAL NETWORK (CNN) FOR MNIST**

## **Lab Objectives:**
- Build a simple convolutional neural network (CNN) to classify MNIST
- Try different architectures that combine the layers covered in class
- Compare the performance and results of the proposed CNNs

---

In [None]:
# ====================================================================
# IMPORTS
# ====================================================================

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import classification_report, confusion_matrix
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
import time

# Fix the random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

# Matplotlib style (the 'seaborn-v0_8-*' style names require Matplotlib >= 3.6)
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('husl')

print(f"TensorFlow version: {tf.__version__}")
print(f"GPU available: {len(tf.config.list_physical_devices('GPU')) > 0}")

## **1. Data Loading and Preprocessing**

In [None]:
# Load MNIST
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Scale pixel values from [0, 255] to [0, 1]
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# Add a channel axis for the CNNs: (samples, height, width, channels)
x_train = x_train.reshape(-1, 28, 28, 1)
x_test = x_test.reshape(-1, 28, 28, 1)

# One-hot encode the labels
y_train_cat = to_categorical(y_train, 10)
y_test_cat = to_categorical(y_test, 10)

print(f"X_train shape: {x_train.shape}")
print(f"Y_train shape: {y_train_cat.shape}")
print(f"X_test shape: {x_test.shape}")
print(f"Y_test shape: {y_test_cat.shape}")
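The two preprocessing steps that change array shapes can be reproduced with plain NumPy, which is handy for checking what `to_categorical` and the reshape produce. A minimal sketch, assuming a toy batch of random pixels instead of the real MNIST images; `one_hot` and `add_channel_axis` are illustrative helper names, not Keras functions.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    # Row k of the identity matrix is the one-hot vector for class k
    labels = np.asarray(labels, dtype=np.int64)
    return np.eye(num_classes, dtype=np.float32)[labels]

def add_channel_axis(images):
    # (N, H, W) -> (N, H, W, 1): the channels-last layout expected by Conv2D
    return images.reshape(-1, images.shape[1], images.shape[2], 1)

# Hypothetical stand-in for MNIST: 3 random 28x28 uint8 images and their labels
rng = np.random.default_rng(0)
toy_images = rng.integers(0, 256, size=(3, 28, 28)).astype(np.uint8)
toy_labels = np.array([5, 0, 4])

x = add_channel_axis(toy_images.astype("float32") / 255.0)
y = one_hot(toy_labels)
print(x.shape, y.shape)  # (3, 28, 28, 1) (3, 10)
print(y[0])              # a single 1.0, at position 5
```

Applied to the notebook's arrays, `np.array_equal(one_hot(y_train), y_train_cat)` should return `True`.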

In [None]:
# Show a few training images
fig, axes = plt.subplots(2, 5, figsize=(12, 6))
fig.suptitle('Sample of the MNIST dataset', fontsize=14, fontweight='bold')

for i in range(10):
    ax = axes[i//5, i%5]
    ax.imshow(x_train[i].reshape(28, 28), cmap='gray')
    ax.set_title(f'Class: {y_train[i]}')
    ax.axis('off')

plt.tight_layout()
plt.show()

## **2. CNN Architecture Design**

We define four architectures of different depth and size and compare their performance:

In [None]:
# ====================================================================
# BASIC CNN (baseline)
# ====================================================================

def create_basic_cnn():
    """
    Basic CNN with 2 convolutional layers.
    Simple architecture used as the baseline.
    """
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation='relu', 
                      input_shape=(28, 28, 1), padding='same'),
        layers.MaxPooling2D((2, 2)),
        
        layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D((2, 2)),
        
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        layers.Dense(10, activation='softmax')
    ], name='CNN_Basic')
    
    return model

# ====================================================================
# INTERMEDIATE CNN (deeper)
# ====================================================================

def create_intermediate_cnn():
    """
    Intermediate CNN with 3 convolutional layers.
    Adds Dropout for regularization.
    """
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation='relu', 
                      input_shape=(28, 28, 1), padding='same'),
        layers.MaxPooling2D((2, 2)),
        
        layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D((2, 2)),
        
        layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D((2, 2)),
        
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(256, activation='relu'),
        layers.Dropout(0.3),
        layers.Dense(10, activation='softmax')
    ], name='CNN_Intermediate')
    
    return model

# ====================================================================
# ADVANCED CNN (with Batch Normalization)
# ====================================================================

def create_advanced_cnn():
    """
    Advanced CNN: same depth as the intermediate model, but with more filters
    and Batch Normalization after every convolution and the hidden Dense layer.
    """
    model = models.Sequential([
        layers.Conv2D(64, (3, 3), activation='relu', 
                      input_shape=(28, 28, 1), padding='same'),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        
        layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        
        layers.Conv2D(256, (3, 3), activation='relu', padding='same'),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(512, activation='relu'),
        layers.BatchNormalization(),
        layers.Dropout(0.3),
        layers.Dense(10, activation='softmax')
    ], name='CNN_Advanced')
    
    return model

# ====================================================================
# COMPACT CNN (fewer parameters, more efficient)
# ====================================================================

def create_compact_cnn():
    """
    Compact CNN with far fewer parameters.
    Uses depthwise-separable convolutions and GlobalAveragePooling2D instead of Flatten + Dense.
    """
    model = models.Sequential([
        layers.Conv2D(32, (5, 5), activation='relu', 
                      input_shape=(28, 28, 1), padding='same'),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        
        layers.SeparableConv2D(64, (3, 3), activation='relu', padding='same'),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        
        layers.SeparableConv2D(128, (3, 3), activation='relu', padding='same'),
        layers.BatchNormalization(),
        
        layers.GlobalAveragePooling2D(),
        layers.Dropout(0.4),
        layers.Dense(10, activation='softmax')
    ], name='CNN_Compact')
    
    return model
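Two numbers drive the size of these models: how far pooling shrinks the feature maps, and how many weights each convolution needs. The sketch below works both out by hand, assuming the settings used above (stride-1 convolutions with `padding='same'`, default `MaxPooling2D((2, 2))`, biases enabled, `depth_multiplier=1` for the separable layers); the helper names are illustrative only.

```python
def conv_same_then_pool(size, pool=2):
    # A padding='same', stride-1 convolution keeps the spatial size;
    # MaxPooling2D((2, 2)) defaults to stride 2 and padding='valid': floor(size / 2)
    return size // pool

def conv2d_params(k, c_in, c_out):
    # c_out filters of shape k x k x c_in, plus one bias per filter
    return k * k * c_in * c_out + c_out

def separable_conv2d_params(k, c_in, c_out):
    # One k x k depthwise filter per input channel, a 1x1 pointwise mix, then biases
    return k * k * c_in + c_in * c_out + c_out

# Spatial size through the three conv + pool blocks of the intermediate/advanced CNNs
sizes = [28]
for _ in range(3):
    sizes.append(conv_same_then_pool(sizes[-1]))
print("spatial sizes:", sizes)                       # spatial sizes: [28, 14, 7, 3]
print("Flatten units (3x3x128):", sizes[-1] ** 2 * 128)  # Flatten units (3x3x128): 1152

# Regular vs separable 3x3 convolutions with the channel counts of CNN_Compact
for c_in, c_out in [(32, 64), (64, 128)]:
    regular = conv2d_params(3, c_in, c_out)
    separable = separable_conv2d_params(3, c_in, c_out)
    print(f"{c_in}->{c_out}: Conv2D {regular:,} vs SeparableConv2D {separable:,} "
          f"({regular / separable:.1f}x fewer)")
```

The 7 → 3 step discards the last row and column of the 7x7 maps; `MaxPooling2D((2, 2), padding='same')` would produce 4x4 instead.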

## **3. Training the CNN Models**

In [None]:
# CNN models to train (name -> model)
cnn_models = {
    'CNN_Basic': create_basic_cnn(),
    'CNN_Intermediate': create_intermediate_cnn(),
    'CNN_Advanced': create_advanced_cnn(),
    'CNN_Compact': create_compact_cnn()
}

# Training histories, keyed by model name
histories = {}

# Training configuration
EPOCHS = 10
BATCH_SIZE = 128

print("=" * 80)
print("TRAINING CONVOLUTIONAL NEURAL NETWORKS")
print("=" * 80)

In [None]:
# Train each model
for name, model in cnn_models.items():
    print(f"\n{'-'*80}")
    print(f"TRAINING: {name}")
    print(f"{'-'*80}")
    
    # Compile the model
    model.compile(
        optimizer='adam',
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )
    
    # Print the model summary
    print(f"\nArchitecture of {name}:")
    model.summary()
    
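    # Callbacks: stop when val_accuracy stalls for 3 epochs (restoring the best weights)
    # and halve the learning rate when val_loss stalls for 2 epochs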
    callbacks = [
        EarlyStopping(monitor='val_accuracy', patience=3, restore_best_weights=True),
        ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=2)
    ]
    
    # Time the training run
    start_time = time.time()
    
    # Train, holding out the last 10% of the training set for validation
    history = model.fit(
        x_train, y_train_cat,
        validation_split=0.1,
        epochs=EPOCHS,
        batch_size=BATCH_SIZE,
        callbacks=callbacks,
        verbose=1
    )
    
    training_time = time.time() - start_time
    print(f"\n⏱️  Training time: {training_time/60:.2f} minutes")
    
    # Store the history
    histories[name] = history.history
    
    # Evaluate on the test set
    test_loss, test_acc = model.evaluate(x_test, y_test_cat, verbose=0)
    
    print(f"\n📊 FINAL RESULTS for {name}:")
    print(f"   Test Accuracy: {test_acc:.4f} ({test_acc*100:.2f}%)")
    print(f"   Test Loss: {test_loss:.4f}")
    print(f"   Total parameters: {model.count_params():,}")
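To see when `EarlyStopping(monitor='val_accuracy', patience=3, restore_best_weights=True)` stops a run, it helps to replay its rule on a list of validation scores. This is a simplified sketch: it ignores `min_delta` (0 by default), `baseline` and `start_from_epoch`, and the score sequence is hypothetical, not output from this notebook. `ReduceLROnPlateau` keeps a similar wait counter on `val_loss`, but multiplies the learning rate by `factor=0.5` instead of stopping.

```python
def simulate_early_stopping(val_scores, patience=3):
    """Return (last_epoch_run, best_epoch), both 0-based, for a metric to maximize."""
    best, best_epoch, wait = float("-inf"), 0, 0
    for epoch, score in enumerate(val_scores):
        if score > best:          # a strict improvement resets the counter
            best, best_epoch, wait = score, epoch, 0
        else:
            wait += 1
            if wait >= patience:  # patience exhausted: training stops after this epoch
                return epoch, best_epoch
    return len(val_scores) - 1, best_epoch

# Hypothetical val_accuracy values, one per epoch
val_acc = [0.975, 0.983, 0.987, 0.986, 0.987, 0.985, 0.988]
last, best = simulate_early_stopping(val_acc, patience=3)
print(f"stopped after epoch {last + 1}, weights restored from epoch {best + 1}")
```

The later 0.988 is never reached: three epochs without a strict improvement over 0.987 end the run first, which is why the curves in the next section can have different lengths.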

## **4. Analysis and Comparison of Results**

In [None]:
# Training curves. Curves can differ in length because EarlyStopping may stop a model early.
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
fig.suptitle('Comparison of CNN Architectures', fontsize=16, fontweight='bold')

# (history key, panel title, y-axis label), in the order of axes.flat
panels = [
    ('accuracy', 'Training Accuracy', 'Accuracy'),
    ('loss', 'Training Loss', 'Loss'),
    ('val_accuracy', 'Validation Accuracy', 'Accuracy'),
    ('val_loss', 'Validation Loss', 'Loss'),
]

for ax, (key, title, ylabel) in zip(axes.flat, panels):
    for name, history in histories.items():
        ax.plot(history[key], label=name)
    ax.set_title(title)
    ax.set_xlabel('Epoch')
    ax.set_ylabel(ylabel)
    ax.legend()
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
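Because each run can stop at a different epoch, reading the best epoch off the plots is error-prone. A small helper (the name `summarize_history` is hypothetical) can pull it from the `history.history` dicts stored in `histories`; it assumes the `accuracy` and `val_accuracy` keys that `metrics=['accuracy']` plus a validation split produce. The toy history below is made up.

```python
def summarize_history(history):
    """Summarize a Keras history dict by its epoch with the highest val_accuracy."""
    val_acc = history["val_accuracy"]
    best = max(range(len(val_acc)), key=val_acc.__getitem__)  # first epoch with the max
    return {
        "epochs_run": len(val_acc),
        "best_epoch": best + 1,  # 1-based, as in the Keras training log
        "best_val_accuracy": val_acc[best],
        "train_minus_val": history["accuracy"][best] - val_acc[best],  # overfitting gap
    }

# Hypothetical history with 4 epochs
toy_history = {"accuracy": [0.90, 0.97, 0.99, 0.995],
               "val_accuracy": [0.96, 0.98, 0.985, 0.984]}
print(summarize_history(toy_history))
```

In the notebook: `for name, h in histories.items(): print(name, summarize_history(h))`.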

In [None]:
# Comparison table (test metrics and size of every model)
comparison_data = []

for name, model in cnn_models.items():
    test_loss, test_acc = model.evaluate(x_test, y_test_cat, verbose=0)
    total_params = model.count_params()

    comparison_data.append({
        'Model': name,
        'Test Accuracy': f"{test_acc:.4f}",
        'Test Loss': f"{test_loss:.4f}",
        'Parameters': f"{total_params:,}",
        'Accuracy %': f"{test_acc*100:.2f}%"
    })

# Same column widths for the header and the rows so they line up
columns = ['Model', 'Test Accuracy', 'Test Loss', 'Parameters', 'Accuracy %']
header = f"{columns[0]:<18}" + "".join(f"{c:>15}" for c in columns[1:])

print("\n" + "=" * len(header))
print("COMPARISON TABLE - CONVOLUTIONAL NEURAL NETWORKS")
print("=" * len(header))
print(header)
print("-" * len(header))
for row in comparison_data:
    print(f"{row['Model']:<18}" + "".join(f"{row[c]:>15}" for c in columns[1:]))
print("-" * len(header))

## **5. Detailed Analysis of the Best Model**

In [None]:
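# Pick the best model by test accuracy (choosing on the test set makes its score
# slightly optimistic; validation accuracy would be the stricter criterion)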
test_accs = {name: model.evaluate(x_test, y_test_cat, verbose=0)[1] for name, model in cnn_models.items()}
best_model_name = max(test_accs, key=test_accs.get)
best_model = cnn_models[best_model_name]

print(f"Best model: {best_model_name} with accuracy: {test_accs[best_model_name]:.4f}")

# Class probabilities from the best model, then the predicted labels
y_pred = best_model.predict(x_test)
y_pred_classes = np.argmax(y_pred, axis=1)

# Classification Report
print(f"\nClassification Report for {best_model_name}:")
print(classification_report(y_test, y_pred_classes))

# Confusion matrix (rows = true class, columns = predicted class)
cm = confusion_matrix(y_test, y_pred_classes)
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=range(10), yticklabels=range(10))
plt.title(f'Confusion Matrix - {best_model_name}', fontsize=14, fontweight='bold')
plt.ylabel('True Class')
plt.xlabel('Predicted Class')
plt.tight_layout()
plt.show()
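The heatmap is easier to read alongside two numbers derived from it: per-class recall (the diagonal divided by each row total, since `confusion_matrix` puts true classes on the rows) and the most frequent confusion. A short NumPy sketch with hypothetical helper names, shown on a made-up 3-class matrix; with the notebook's `cm` it would be called as `per_class_recall(cm)` and `most_confused_pair(cm)`.

```python
import numpy as np

def per_class_recall(cm):
    # Row i counts the samples whose true class is i, so diag / row sum = recall of class i
    cm = np.asarray(cm, dtype=float)
    return np.diag(cm) / cm.sum(axis=1)

def most_confused_pair(cm):
    # Zero the diagonal so only errors remain, then locate the largest cell
    errors = np.array(cm, dtype=int, copy=True)
    np.fill_diagonal(errors, 0)
    true_cls, pred_cls = np.unravel_index(np.argmax(errors), errors.shape)
    return int(true_cls), int(pred_cls), int(errors[true_cls, pred_cls])

# Hypothetical 3-class confusion matrix (rows = true, columns = predicted)
toy_cm = np.array([[50, 0, 2],
                   [1, 45, 4],
                   [0, 6, 42]])
print(per_class_recall(toy_cm).round(3))
print(most_confused_pair(toy_cm))  # (2, 1, 6): class 2 predicted as 1, six times
```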

## **6. Feature Map Visualization**

Let us look at the activations (feature maps) that the convolutional layers produce for one test image:

In [None]:
# First test image, kept as a batch of one: shape (1, 28, 28, 1)
test_image = x_test[0:1]
test_label = y_test[0]

def visualize_feature_maps(model, test_image, test_label, max_layers=3, n_filters=8):
    """Plot the first `n_filters` feature maps of up to `max_layers` convolutional layers."""
    # Collect the outputs of the convolutional layers (regular and separable)
    conv_outputs = []
    layer_names = []
    for layer in model.layers:
        if isinstance(layer, (layers.Conv2D, layers.SeparableConv2D)):
            conv_outputs.append(layer.output)
            layer_names.append(layer.name)

    if not conv_outputs:
        print(f"{model.name} has no convolutional layers")
        return

    conv_outputs = conv_outputs[:max_layers]
    layer_names = layer_names[:max_layers]

    # Auxiliary model that returns the activations of those layers
    activation_model = models.Model(inputs=model.input, outputs=conv_outputs)
    activations = activation_model.predict(test_image, verbose=0)
    if not isinstance(activations, list):  # a single output comes back as one array
        activations = [activations]

    # One row per layer, one column per filter
    fig, axes = plt.subplots(len(activations), n_filters,
                             figsize=(2 * n_filters, 2.2 * len(activations)), squeeze=False)
    fig.suptitle(f'Feature Maps of {model.name} - Digit {test_label}',
                 fontsize=14, fontweight='bold')

    for row, (activation, name) in enumerate(zip(activations, layer_names)):
        activation = activation[0]  # drop the batch axis: (H, W, channels)
        for col in range(n_filters):
            ax = axes[row, col]
            ax.axis('off')
            if col < activation.shape[-1]:
                ax.imshow(activation[:, :, col], cmap='viridis')
                ax.set_title(f'{name}\nfilter {col}' if col == 0 else f'filter {col}',
                             fontsize=8)

    plt.tight_layout()
    plt.show()

print(f"\nVisualizing the feature maps of the best model: {best_model_name}")
visualize_feature_maps(best_model, test_image, test_label)
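Each panel above is the output of one filter. What a single filter computes can be written directly in NumPy, which also makes the translation equivariance of convolution concrete: shifting the input shifts the feature map by the same amount. A sketch under stated assumptions: one input channel, no bias, a hand-set vertical-edge kernel (a trained `Conv2D` learns its own values), and a toy 7x7 image rather than an MNIST digit; `conv2d_same` is a hypothetical helper name.

```python
import numpy as np

def conv2d_same(image, kernel):
    """One filter on one channel: 2D cross-correlation, zero 'same' padding, stride 1.

    Keras Conv2D computes cross-correlation (the kernel is not flipped). Padding follows
    the TensorFlow 'same' convention: any extra row/column goes at the bottom/right.
    """
    kh, kw = kernel.shape
    top, left = (kh - 1) // 2, (kw - 1) // 2
    padded = np.pad(image, ((top, kh - 1 - top), (left, kw - 1 - left)))
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

# Toy 7x7 image: a vertical stroke in column 2, rows 1-5
img = np.zeros((7, 7))
img[1:6, 2] = 1.0

# Hand-set vertical-edge detector (dark on the left, bright on the right)
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])
fmap = relu(conv2d_same(img, kernel))
print(fmap)  # only column 1, just left of the stroke, responds

# Equivariance: shift the input 2 columns right and the feature map shifts the same way
shifted = np.roll(img, 2, axis=1)  # safe here: nothing wraps around the border
fmap_shifted = relu(conv2d_same(shifted, kernel))
print(np.allclose(fmap_shifted, np.roll(fmap, 2, axis=1)))  # True

# An MLP sees the flattened pixels: the two strokes share no active input at all
print(img.ravel() @ shifted.ravel())  # 0.0
```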

## **7. Error Analysis**

In [None]:
# Error analysis of the best model
y_pred = best_model.predict(x_test)
y_pred_classes = np.argmax(y_pred, axis=1)

# Indices of the misclassified test images
incorrect_indices = np.where(y_pred_classes != y_test)[0]

print(f"Misclassified images: {len(incorrect_indices)} of {len(y_test)}")
print(f"Accuracy: {1 - len(incorrect_indices)/len(y_test):.4f}")

# Show the first 10 errors
fig, axes = plt.subplots(2, 5, figsize=(15, 6))
fig.suptitle(f'Incorrect Predictions - {best_model_name}', 
             fontsize=14, fontweight='bold')

for i, ax in enumerate(axes.flat):
    ax.axis('off')
    if i >= len(incorrect_indices):
        continue  # fewer than 10 errors: leave this panel empty
    idx = incorrect_indices[i]

    ax.imshow(x_test[idx].reshape(28, 28), cmap='gray')
    predicted_class = y_pred_classes[idx]
    true_class = y_test[idx]
    confidence = np.max(y_pred[idx]) * 100  # probability assigned to the predicted class

    ax.set_title(f'True: {true_class}\nPred: {predicted_class}\nConf: {confidence:.1f}%',
                 color='red')

plt.tight_layout()
plt.show()
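The cell above shows the first ten errors in test-set order. Sorting the errors by the model's confidence instead tends to surface the most instructive cases, such as ambiguous digits or possibly mislabeled ones. A NumPy sketch with the hypothetical helper `most_confident_errors`, shown on made-up probabilities; in the notebook it would be called as `most_confident_errors(y_pred, y_test, k=10)`.

```python
import numpy as np

def most_confident_errors(probs, y_true, k=5):
    """Indices, predictions and confidences of the k most confident misclassifications."""
    pred = probs.argmax(axis=1)
    wrong = np.flatnonzero(pred != y_true)        # indices of misclassified samples
    confidence = probs[wrong, pred[wrong]]        # probability given to the wrong class
    order = np.argsort(-confidence)[:k]           # highest confidence first
    return wrong[order], pred[wrong][order], confidence[order]

# Hypothetical softmax outputs for 4 samples and 3 classes
probs = np.array([[0.90, 0.05, 0.05],   # true 0, correct
                  [0.20, 0.70, 0.10],   # true 0, wrong with 0.7
                  [0.10, 0.10, 0.80],   # true 1, wrong with 0.8
                  [0.30, 0.40, 0.30]])  # true 0, wrong with 0.4
y_true = np.array([0, 0, 1, 0])
idx, pred, conf = most_confident_errors(probs, y_true, k=2)
print(idx, pred, conf)
```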

## **8. CNN vs. MLP Comparison**

We compare our CNNs with a simple MLP similar to those from the previous labs, trained for 10 epochs with batch size 128:

In [None]:
# Simple MLP baseline for comparison
mlp_model = models.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(128, activation='relu'),
    layers.Dense(10, activation='softmax')
], name='MLP_Comparison')

mlp_model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

print("Training the MLP baseline for comparison...")
mlp_history = mlp_model.fit(
    x_train, y_train_cat,
    validation_split=0.1,
    epochs=10,
    batch_size=128,
    verbose=0
)

mlp_test_loss, mlp_test_acc = mlp_model.evaluate(x_test, y_test_cat, verbose=0)
best_cnn_acc = max(test_accs.values())

print("\n📊 CNN vs MLP COMPARISON:")
print("-" * 50)
print(f"Best CNN accuracy:  {best_cnn_acc:.4f} ({best_cnn_acc*100:.2f}%)")
print(f"MLP accuracy:       {mlp_test_acc:.4f} ({mlp_test_acc*100:.2f}%)")
print(f"CNN improvement:    {(best_cnn_acc-mlp_test_acc)*100:+.2f} points")
print("-" * 50)
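Parameter sharing keeps each convolutional layer small, but it does not by itself make the whole model smaller: in `create_basic_cnn`, the `Dense(128)` layer after `Flatten` receives 7*7*64 = 3136 inputs. The sketch below counts parameters by hand for every model in this lab from the layer shapes defined above; the tuple-based spec format is made up for illustration. Keras `count_params()` also counts the non-trainable moving mean and variance of `BatchNormalization`, so these totals should match `model.count_params()`.

```python
def layer_params(spec):
    # spec: ("conv", k, c_in, c_out) | ("sep", k, c_in, c_out) | ("bn", channels) | ("dense", n_in, n_out)
    kind, *args = spec
    if kind == "conv":
        k, c_in, c_out = args
        return k * k * c_in * c_out + c_out
    if kind == "sep":
        k, c_in, c_out = args
        return k * k * c_in + c_in * c_out + c_out
    if kind == "bn":
        (channels,) = args
        return 4 * channels  # gamma, beta (trainable) + moving mean, moving variance
    if kind == "dense":
        n_in, n_out = args
        return n_in * n_out + n_out
    raise ValueError(f"unknown layer kind: {kind}")

# Pooling, Flatten, Dropout and GlobalAveragePooling2D have no parameters, so they are omitted
architectures = {
    "CNN_Basic": [("conv", 3, 1, 32), ("conv", 3, 32, 64),
                  ("dense", 7 * 7 * 64, 128), ("dense", 128, 10)],
    "CNN_Intermediate": [("conv", 3, 1, 32), ("conv", 3, 32, 64), ("conv", 3, 64, 128),
                         ("dense", 3 * 3 * 128, 256), ("dense", 256, 10)],
    "CNN_Advanced": [("conv", 3, 1, 64), ("bn", 64), ("conv", 3, 64, 128), ("bn", 128),
                     ("conv", 3, 128, 256), ("bn", 256),
                     ("dense", 3 * 3 * 256, 512), ("bn", 512), ("dense", 512, 10)],
    "CNN_Compact": [("conv", 5, 1, 32), ("bn", 32), ("sep", 3, 32, 64), ("bn", 64),
                    ("sep", 3, 64, 128), ("bn", 128), ("dense", 128, 10)],
    "MLP_Comparison": [("dense", 28 * 28, 256), ("dense", 256, 128), ("dense", 128, 10)],
}

totals = {name: sum(layer_params(s) for s in specs) for name, specs in architectures.items()}
for name, specs in architectures.items():
    largest = max(layer_params(s) for s in specs)
    print(f"{name:18} total={totals[name]:>10,}  largest layer share={largest / totals[name]:.0%}")
```

With these shapes the MLP (235,146 parameters) is smaller than CNN_Basic (421,642), CNN_Intermediate and CNN_Advanced; only CNN_Compact (14,314), which avoids the large Flatten + Dense head, is far smaller.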

## **9. Conclusions**

In [None]:
print("\n" + "=" * 80)
print("CONCLUSIONS OF LAB 3 - CNNs")
print("=" * 80)

print("🔍 ARCHITECTURE ANALYSIS:")
print("-" * 40)
print("✓ Basic CNN: simple architecture, solid baseline")
print("✓ Intermediate CNN: one more convolutional block, Dropout regularization")
print("✓ Advanced CNN: more filters, Batch Normalization for more stable convergence; the largest model")
print("✓ Compact CNN: separable convolutions + GlobalAveragePooling, fewest parameters")

print("📊 ADVANTAGES OF CNNs:")
print("-" * 30)
print("✓ Translation equivariance: the same filter detects a pattern at any position")
print("✓ Parameter sharing: a convolutional layer needs far fewer weights than a dense layer over the same pixels")
print("✓ Feature hierarchy: from simple edges to complex patterns")
print("✓ Pooling: reduces spatial size and adds some invariance to small shifts")

print("🏗️ KEY COMPONENTS:")
print("-" * 25)
print("✓ Convolutional layers: extract local features")
print("✓ MaxPooling: downsampling and local translation invariance")
print("✓ Batch Normalization: stabilizes training and usually speeds up convergence")
print("✓ Dropout: reduces overfitting, improves generalization")
print("✓ GlobalAveragePooling: lightweight alternative to Flatten + Dense")

print("🎯 FINAL RESULTS:")
print("-" * 25)
for name, acc in test_accs.items():
    print(f"✓ {name}: {acc*100:.2f}% accuracy")
print(f"✅ Best model: {best_model_name} ({test_accs[best_model_name]*100:.2f}% accuracy)")
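`GlobalAveragePooling2D` replaces each feature map by its mean, so the classifier's input size depends only on the number of channels, not on the spatial size. A NumPy sketch of the operation and of the parameter saving it gives, assuming the 7x7x128 output of the last block of `create_compact_cnn`; the random feature maps are placeholders.

```python
import numpy as np

def global_average_pool(feature_maps):
    # (batch, H, W, C) -> (batch, C): mean of each channel over all spatial positions
    return feature_maps.mean(axis=(1, 2))

def dense_params(n_in, n_out):
    return n_in * n_out + n_out

h, w, c = 7, 7, 128
maps = np.random.default_rng(0).random((2, h, w, c))  # placeholder activations
pooled = global_average_pool(maps)

print(pooled.shape)  # (2, 128)
print(f"Dense(10) after GAP:     {dense_params(c, 10):,}")          # 1,290
print(f"Dense(10) after Flatten: {dense_params(h * w * c, 10):,}")  # 62,730
```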

## **10. Additional Experiment: CNN with Data Augmentation**

In [None]:
# CNN with data augmentation layers in front of the convolutional stack
augmented_model = models.Sequential([
    # Data augmentation layers
    layers.RandomRotation(0.1, input_shape=(28, 28, 1)),
    layers.RandomTranslation(0.1, 0.1),
    layers.RandomZoom(0.1),
    
    # CNN layers
    layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    layers.BatchNormalization(),
    layers.MaxPooling2D((2, 2)),
    
    layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
    layers.BatchNormalization(),
    layers.MaxPooling2D((2, 2)),
    
    layers.Conv2D(256, (3, 3), activation='relu', padding='same'),
    layers.BatchNormalization(),
    
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax')
], name='CNN_Augmented')

augmented_model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

print("Training the CNN with data augmentation...")
print("(Note: the Random* layers only transform images during training; at inference they pass inputs through unchanged)")

aug_history = augmented_model.fit(
    x_train, y_train_cat,
    validation_split=0.1,
    epochs=8,
    batch_size=128,
    verbose=1
)

aug_test_loss, aug_test_acc = augmented_model.evaluate(x_test, y_test_cat, verbose=0)

print("\n📊 RESULTS WITH DATA AUGMENTATION:")
print(f"CNN with augmentation: {aug_test_acc:.4f} ({aug_test_acc*100:.2f}%)")
print(f"Change vs. best CNN:   {(aug_test_acc-best_cnn_acc)*100:+.2f} points")
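`RandomTranslation(0.1, 0.1)` moves each training image by up to 10% of its height and width, so the network sees slightly displaced digits every epoch. The sketch below imitates that with whole-pixel shifts and zero fill. It is a simplified stand-in, not the Keras implementation: the Keras layer samples fractional offsets, interpolates (bilinear by default) and, by default, fills uncovered pixels by reflection. `shift_image` and `random_translate` are hypothetical helper names.

```python
import numpy as np

def shift_image(img, dy, dx):
    """Shift a 2D image by (dy, dx) whole pixels; uncovered pixels become 0."""
    h, w = img.shape
    out = np.zeros_like(img)
    src_r = slice(max(0, -dy), max(0, min(h, h - dy)))
    dst_r = slice(max(0, dy), max(0, min(h, h + dy)))
    src_c = slice(max(0, -dx), max(0, min(w, w - dx)))
    dst_c = slice(max(0, dx), max(0, min(w, w + dx)))
    out[dst_r, dst_c] = img[src_r, src_c]
    return out

def random_translate(img, max_frac=0.1, rng=None):
    """Random whole-pixel shift of up to max_frac of each side (28 * 0.1 -> 2 px)."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape
    max_dy, max_dx = int(max_frac * h), int(max_frac * w)
    dy = int(rng.integers(-max_dy, max_dy + 1))
    dx = int(rng.integers(-max_dx, max_dx + 1))
    return shift_image(img, dy, dx)

# Toy 5x5 image with a single bright pixel at (1, 1)
img = np.zeros((5, 5))
img[1, 1] = 1.0
print(np.argwhere(shift_image(img, 2, 1)))  # [[3 2]]
```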

---

# **LAB 3 SUMMARY**

## ✅ **Objectives Completed:**

1. ✅ **Simple CNN**: implemented and tested with 2 convolutional layers
2. ✅ **Different Architectures**: 4 architectures compared
3. ✅ **Performance Comparison**: test accuracy, loss, and parameter count for each model
4. ✅ **Feature Maps**: visualization of the activations produced by the convolutional layers
5. ✅ **Data Augmentation**: additional experiment to try to improve results

## 🔍 **Main Findings:**

- **Batch Normalization** speeds up convergence and stabilizes training
- **Dropout** reduces overfitting and improves generalization
- **Data Augmentation** can improve robustness without adding parameters
- **CNNs outperform MLPs** on data with spatial structure, such as images
- **Separable Convolutions** reduce parameters while maintaining performance
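What `BatchNormalization` does during training fits in a few lines: each channel is normalized with the mean and variance of the current batch, then scaled by a learned `gamma` and shifted by a learned `beta`. At inference Keras uses moving averages of those statistics instead, which is why the layer stores 4 values per channel. A NumPy sketch of the training-mode computation, assuming channels-last inputs and the Keras default `epsilon=1e-3`; the activations are random placeholders.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Training-mode batch normalization over every axis except the last (channels)."""
    axes = tuple(range(x.ndim - 1))  # (batch, H, W) for conv feature maps
    mean = x.mean(axis=axes)
    var = x.var(axis=axes)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(42)
# Placeholder activations: batch of 16 maps of 7x7 with 4 channels on very different scales
acts = rng.normal(loc=[0.0, 5.0, -3.0, 10.0], scale=[1.0, 2.0, 0.5, 4.0], size=(16, 7, 7, 4))
out = batch_norm_train(acts, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=(0, 1, 2)).round(3))  # about 0 for every channel
print(out.std(axis=(0, 1, 2)).round(3))   # about 1 for every channel
```

Putting every channel on a comparable scale is the usual explanation for the more stable training observed with the advanced model.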

## 📊 **Final Metrics:**

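- Typical accuracy on MNIST: **98%+**
- Fewer parameters than dense MLPs only when the large Flatten + Dense head is avoided: CNN_Compact (GlobalAveragePooling2D) is far smaller than the MLP, while CNN_Basic, CNN_Intermediate, and CNN_Advanced are larger
- Better interpretability through feature maps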

---