# Topic 14: Big Data and AI Libraries - Part 2
## Theory and Examples

### Contents
5. Scikit-learn - Machine Learning
6. TensorFlow/Keras - Deep Learning
7. PyTorch - Advanced Deep Learning
8. Final Capstone Project

---

## 5. Scikit-learn - Machine Learning

**Scikit-learn** is one of the most widely used libraries for classical Machine Learning in Python.

### Why Scikit-learn?
- **Comprehensive:** Algorithms for classification, regression, clustering, and more
- **Consistent:** A uniform API (`fit`, `predict`, `transform`) across all algorithms
- **Documentation:** Excellent documentation and examples
- **Integration:** Works with NumPy, Pandas, and Matplotlib
- **Production:** Optimized, well-tested code ready for production use

### Installation
```bash
pip install scikit-learn
```

### 5.1. Machine Learning Workflow

```text
1. Load and explore the data
2. Prepare the data (cleaning, normalization)
3. Split into train/test sets
4. Train the model
5. Make predictions
6. Evaluate the model
7. Tune hyperparameters
8. Final model
```

### 5.2. Linear Regression

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

# Assumption: `tips` is seaborn's "tips" dataset (columns total_bill, tip) used in Part 1.
# It is reloaded here so this cell runs on its own (downloaded on first use).
tips = sns.load_dataset('tips')

# Prepare the data
X = tips[['total_bill']].values  # Independent variable (2-D: n_samples x 1 feature)
y = tips['tip'].values           # Dependent variable

# Split into train/test (80/20)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(f"Training samples: {X_train.shape[0]}")
print(f"Test samples: {X_test.shape[0]}")

# Create and train the model
modelo = LinearRegression()
modelo.fit(X_train, y_train)

print(f"\nCoefficient (slope): {modelo.coef_[0]:.4f}")
print(f"Intercept: {modelo.intercept_:.4f}")
```
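
For a single feature, the slope and intercept that `LinearRegression` estimates by ordinary least squares have a closed form: slope = cov(x, y) / var(x) and intercept = ȳ - slope · x̄. A minimal NumPy sketch of that formula on a tiny synthetic dataset (the helper `ols_fit` is hypothetical, not part of scikit-learn):

```python
import numpy as np

def ols_fit(x, y):
    """Closed-form simple linear regression: returns (slope, intercept)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # the fitted line always passes through the point of means
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Synthetic check: points lying exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1
slope, intercept = ols_fit(x, y)
print(f"slope={slope:.4f}, intercept={intercept:.4f}")  # slope=2.0000, intercept=1.0000
```

Calling `ols_fit(X_train[:, 0], y_train)` on the tips split should agree with `modelo.coef_[0]` and `modelo.intercept_` up to floating-point error, since both solve the same least-squares problem.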

```python
# Make predictions
y_pred = modelo.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("=== EVALUATION METRICS ===")
print(f"MSE (Mean Squared Error): {mse:.4f}")
print(f"RMSE (Root Mean Squared Error): {rmse:.4f}")
print(f"MAE (Mean Absolute Error): {mae:.4f}")
print(f"R² Score: {r2:.4f}")
print(f"\nInterpretation: the model explains {r2*100:.1f}% of the variance in the test set")
```
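
All four metrics above are computed from the residuals y - ŷ. A minimal NumPy sketch of their definitions on a small made-up example rather than the tips data (`regression_metrics` is a hypothetical helper, not a scikit-learn function):

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R² computed from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)                    # mean of the squared errors
    rmse = np.sqrt(mse)                              # same units as y
    mae = np.mean(np.abs(residuals))                 # mean of the absolute errors
    ss_res = np.sum(residuals ** 2)                  # variation the model leaves unexplained
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    r2 = 1 - ss_res / ss_tot
    return mse, rmse, mae, r2

mse, rmse, mae, r2 = regression_metrics([3, -0.5, 2, 7], [2.5, 0.0, 2, 8])
print(f"MSE={mse:.4f} RMSE={rmse:.4f} MAE={mae:.4f} R²={r2:.4f}")
```

Because R² compares the model with always predicting the mean, it is 0 for that constant prediction and can become negative on test data when a model does worse than it.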

```python
# Visualize the results
plt.figure(figsize=(14, 6))

# Plot 1: regression line
plt.subplot(1, 2, 1)
plt.scatter(X_test, y_test, alpha=0.6, label='Actual data')
orden = np.argsort(X_test[:, 0])  # sort by x so the line is drawn left to right
plt.plot(X_test[orden], y_pred[orden], color='red', linewidth=2, label='Prediction')
plt.xlabel('Total Bill ($)')
plt.ylabel('Tip ($)')
plt.title('Linear Regression: Total Bill vs Tip')
plt.legend()
plt.grid(True, alpha=0.3)

# Plot 2: predictions vs actual values
plt.subplot(1, 2, 2)
plt.scatter(y_test, y_pred, alpha=0.6)
plt.plot([y_test.min(), y_test.max()],
         [y_test.min(), y_test.max()],
         'r--', linewidth=2, label='Perfect prediction')  # the y = x diagonal
plt.xlabel('Actual Values')
plt.ylabel('Predictions')
plt.title('Actual vs Predicted Values')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
```

### 5.3. Classification - Logistic Regression

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()

# Build a DataFrame for easier inspection
iris_df = pd.DataFrame(
    data=iris.data,
    columns=iris.feature_names
)
iris_df['species'] = iris.target
iris_df['species_name'] = iris_df['species'].map(
    {0: 'setosa', 1: 'versicolor', 2: 'virginica'}
)

print("Iris dataset (first 10 rows):")
print(iris_df.head(10))

print(f"\nShape: {iris_df.shape}")
print("\nSpecies distribution:")
print(iris_df['species_name'].value_counts())
```

```python
# Prepare the data
X = iris.data
y = iris.target

# Split the data (stratify=y keeps the class proportions the same in train and test)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Train the model (max_iter raised from the default of 100 to give the solver room to converge)
modelo_clf = LogisticRegression(max_iter=200)
modelo_clf.fit(X_train, y_train)

# Predictions
y_pred = modelo_clf.predict(X_test)

# Evaluate
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f} ({accuracy*100:.2f}%)")

print("\n=== CLASSIFICATION REPORT ===")
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```

```python
# Confusion matrix (rows = actual class, columns = predicted class)
cm = confusion_matrix(y_test, y_pred)

plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=iris.target_names,
            yticklabels=iris.target_names)
plt.title('Confusion Matrix - Iris Classification',
          fontsize=14, fontweight='bold')
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()
```
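
The per-class precision and recall in the classification report can be read straight off the confusion matrix: with rows as actual classes and columns as predicted classes (the same convention as scikit-learn's `confusion_matrix`), precision divides each diagonal entry by its column sum and recall divides it by its row sum. A NumPy sketch on a toy 3-class example (`confusion_counts` and `per_class_scores` are hypothetical helpers):

```python
import numpy as np

def confusion_counts(y_true, y_pred, n_classes):
    """Confusion matrix with rows = actual class and columns = predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def per_class_scores(cm):
    """Accuracy plus per-class precision and recall taken from a confusion matrix."""
    tp = np.diag(cm).astype(float)     # correct predictions for each class
    precision = tp / cm.sum(axis=0)    # column sums: everything predicted as that class
    recall = tp / cm.sum(axis=1)       # row sums: everything that truly belongs to that class
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision, recall

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_counts(y_true, y_pred, 3)
accuracy, precision, recall = per_class_scores(cm)
print(cm)
print("accuracy:", round(accuracy, 3))
print("precision:", precision.round(3))
print("recall:", recall.round(3))
```

If a class is never predicted, its column sum is 0 and this sketch yields `nan` for that precision; scikit-learn instead reports 0 and emits a warning (controlled by `zero_division`).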

### 5.4. Decision Trees

```python
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Train a decision tree (max_depth=3 limits its size, which curbs overfitting)
tree_model = DecisionTreeClassifier(
    max_depth=3,
    random_state=42
)
tree_model.fit(X_train, y_train)

# Predictions
y_pred_tree = tree_model.predict(X_test)
accuracy_tree = accuracy_score(y_test, y_pred_tree)

print(f"Decision tree accuracy: {accuracy_tree:.4f} ({accuracy_tree*100:.2f}%)")

# Visualize the tree
plt.figure(figsize=(20, 10))
plot_tree(tree_model,
          feature_names=iris.feature_names,
          class_names=list(iris.target_names),
          filled=True,
          rounded=True,
          fontsize=10)
plt.title('Decision Tree - Iris Classification',
          fontsize=16, fontweight='bold')
plt.show()
```
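
Each node in the plotted tree shows a `gini` value. `DecisionTreeClassifier` uses Gini impurity by default (`criterion='gini'`) and, at each node, chooses the split that most reduces the weighted impurity of the two children. A simplified sketch for a single numeric feature, assuming candidate thresholds at midpoints between sorted values (`gini` and `best_threshold` are hypothetical helpers; scikit-learn's implementation is optimized and searches over all features):

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    labels = np.asarray(labels)
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size
    return 1.0 - np.sum(p ** 2)

def best_threshold(x, y):
    """Try midpoints between sorted unique x values; return (threshold, weighted child impurity)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y)
    values = np.unique(x)
    best = (None, np.inf)
    for thr in (values[:-1] + values[1:]) / 2:
        left, right = y[x <= thr], y[x > thr]
        # children's impurity, weighted by the number of samples on each side
        score = (left.size * gini(left) + right.size * gini(right)) / y.size
        if score < best[1]:
            best = (thr, score)
    return best

x = [1.0, 1.5, 2.0, 5.0, 5.5, 6.0]
y = [0, 0, 0, 1, 1, 1]
thr, score = best_threshold(x, y)
print(f"Gini of the parent node: {gini(y):.3f}")
print(f"Best threshold: {thr}, weighted child impurity: {score:.3f}")
```

Here the threshold 3.5 separates the two classes perfectly, so both children are pure and the weighted impurity drops from 0.5 to 0.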

### 5.5. Random Forest

```python
from sklearn.ensemble import RandomForestClassifier

# Train a Random Forest
rf_model = RandomForestClassifier(
    n_estimators=100,  # number of trees
    max_depth=5,       # maximum depth of each tree
    random_state=42
)
rf_model.fit(X_train, y_train)

# Predictions
y_pred_rf = rf_model.predict(X_test)
accuracy_rf = accuracy_score(y_test, y_pred_rf)

print(f"Random Forest accuracy: {accuracy_rf:.4f} ({accuracy_rf*100:.2f}%)")

# Feature importance (impurity-based; the values sum to 1)
importancias = rf_model.feature_importances_
indices = np.argsort(importancias)[::-1]  # from most to least important

print("\n=== FEATURE IMPORTANCE ===")
for i, idx in enumerate(indices):
    print(f"{i+1}. {iris.feature_names[idx]}: {importancias[idx]:.4f}")
```

```python
# Visualize feature importance
plt.figure(figsize=(10, 6))
plt.barh(range(len(importancias)), importancias[indices], color='steelblue')
plt.yticks(range(len(importancias)),
           [iris.feature_names[i] for i in indices])
plt.gca().invert_yaxis()  # show the most important feature at the top
plt.xlabel('Importance')
plt.title('Feature Importance - Random Forest',
          fontsize=14, fontweight='bold')
plt.grid(axis='x', alpha=0.3)
plt.tight_layout()
plt.show()
```
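
A Random Forest trains each tree on a bootstrap sample (rows drawn with replacement) and combines the trees' outputs. The sketch below illustrates two of those ideas in NumPy: roughly 1/e ≈ 36.8% of the rows are left out of each bootstrap sample ("out-of-bag"), and a hard majority vote combines per-tree predictions. Note that scikit-learn's `RandomForestClassifier` actually averages the trees' predicted class probabilities (soft voting); the hard vote here is the simpler textbook version, and `majority_vote` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
# Bootstrap sample: n row indices drawn with replacement
idx = rng.integers(0, n, size=n)
oob_fraction = 1 - np.unique(idx).size / n   # share of rows never drawn
print(f"Out-of-bag fraction: {oob_fraction:.3f} (theory: 1/e ≈ {np.exp(-1):.3f})")

def majority_vote(tree_predictions):
    """tree_predictions has shape (n_trees, n_samples); return the most common class per sample."""
    preds = np.asarray(tree_predictions)
    n_classes = preds.max() + 1
    # count the votes for each class, one column per sample
    counts = np.apply_along_axis(np.bincount, 0, preds, minlength=n_classes)
    return counts.argmax(axis=0)   # ties go to the lowest class index

votes = [[0, 1, 2],   # predictions of tree 1 for three samples
         [0, 1, 1],   # tree 2
         [1, 1, 2]]   # tree 3
print(majority_vote(votes))   # [0 1 2]
```

Because each tree sees a different bootstrap sample (and, in scikit-learn, a random subset of features at each split), the trees make partly independent errors, which is what makes combining them useful.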

### 5.6. Clustering - K-Means

```python
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Standardize the data (important for K-Means, which relies on Euclidean distances)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(iris.data)

# Elbow method to choose K
inertias = []
K_range = range(1, 11)

for k in K_range:
    kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
    kmeans.fit(X_scaled)
    inertias.append(kmeans.inertia_)  # sum of squared distances to the nearest centroid

# Plot the elbow curve: look for the K after which the inertia stops dropping sharply
plt.figure(figsize=(10, 6))
plt.plot(K_range, inertias, 'bo-', linewidth=2, markersize=8)
plt.xlabel('Number of Clusters (K)')
plt.ylabel('Inertia')
plt.title('Elbow Method for Choosing K',
          fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.show()
```

```python
# Apply K-Means with K=3
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
clusters = kmeans.fit_predict(X_scaled)

# Visualize the clusters on two pairs of (standardized) features
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=clusters,
            cmap='viridis', s=100, alpha=0.6, edgecolors='black')
plt.scatter(kmeans.cluster_centers_[:, 0],
            kmeans.cluster_centers_[:, 1],
            c='red', s=300, marker='X',
            edgecolors='black', linewidths=2,
            label='Centroids')
plt.xlabel(f"{iris.feature_names[0]} (standardized)")
plt.ylabel(f"{iris.feature_names[1]} (standardized)")
plt.title('K-Means Clustering (features 0-1)')
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(1, 2, 2)
plt.scatter(X_scaled[:, 2], X_scaled[:, 3], c=clusters,
            cmap='viridis', s=100, alpha=0.6, edgecolors='black')
plt.scatter(kmeans.cluster_centers_[:, 2],
            kmeans.cluster_centers_[:, 3],
            c='red', s=300, marker='X',
            edgecolors='black', linewidths=2,
            label='Centroids')
plt.xlabel(f"{iris.feature_names[2]} (standardized)")
plt.ylabel(f"{iris.feature_names[3]} (standardized)")
plt.title('K-Means Clustering (features 2-3)')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Compare with the actual species
from sklearn.metrics import adjusted_rand_score
ari = adjusted_rand_score(iris.target, clusters)
print(f"\nAdjusted Rand Index: {ari:.4f}")
print("(Measures agreement between the clustering and the true classes: 1 = perfect, about 0 = random)")
```
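
`KMeans` minimizes the inertia plotted in the elbow curve: the sum of squared distances from each point to its nearest centroid. The classic way to do this is Lloyd's algorithm, which alternates an assignment step and an update step until the centroids stop moving. A bare-bones NumPy version for illustration only (the function `lloyd_kmeans` is hypothetical; scikit-learn adds k-means++ initialization, several restarts via `n_init`, and faster variants):

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids, inertia)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize with k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # 1) Assignment step: squared distance from every point to every centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # 2) Update step: move each centroid to the mean of its points
        #    (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment and inertia with the final centroids
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    inertia = d2.min(axis=1).sum()
    return labels, centroids, inertia

X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10]])
labels, centroids, inertia = lloyd_kmeans(X, k=2)
print(labels, round(inertia, 4))
```

Each step can only lower (or keep) the inertia, so the algorithm always converges, but possibly to a local minimum; that is why scikit-learn runs it several times (`n_init=10` above) and keeps the best result.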

### 5.7. Cross-Validation

```python
from sklearn.model_selection import cross_val_score

# Cross-validation with several models
modelos = {
    'Logistic Regression': LogisticRegression(max_iter=200),
    'Decision Tree': DecisionTreeClassifier(max_depth=3, random_state=42),
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42)
}

print("=== 5-FOLD CROSS-VALIDATION ===")
print()

resultados = {}
for nombre, modelo in modelos.items():
    # cv=5: five train/validation splits over the full dataset, one accuracy score per split
    scores = cross_val_score(modelo, iris.data, iris.target,
                             cv=5, scoring='accuracy')
    resultados[nombre] = scores

    print(f"{nombre}:")
    print(f"  Scores: {scores}")
    print(f"  Mean: {scores.mean():.4f}")
    print(f"  Std. dev.: {scores.std():.4f}")
    print()
```
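
`cross_val_score(..., cv=5)` splits the data into 5 folds, trains on 4 and scores on the remaining one, rotating through all 5. For a classifier with an integer `cv`, scikit-learn uses `StratifiedKFold` (without shuffling) by default, which keeps the class proportions in every fold. The sketch below shows only the plain, unstratified splitting logic, assuming the same fold-size rule as `KFold` (the first `n % k` folds get one extra sample); `k_fold_indices` is a hypothetical helper.

```python
import numpy as np

def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, non-overlapping folds."""
    indices = np.arange(n_samples)
    fold_sizes = np.full(k, n_samples // k)
    fold_sizes[: n_samples % k] += 1          # spread the remainder over the first folds
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = np.concatenate([indices[:start], indices[start + size:]])
        yield train_idx, test_idx
        start += size

for train_idx, test_idx in k_fold_indices(10, 3):
    print("test:", test_idx, "| train size:", len(train_idx))
```

On Iris, whose 150 rows are sorted by species, contiguous folds like these would give a first test fold of 30 setosa flowers only; stratification (or shuffling) avoids such unrepresentative folds.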

```python
# Visualize the cross-validation results
plt.figure(figsize=(12, 6))

nombres = list(resultados.keys())
datos_box = [resultados[nombre] for nombre in nombres]

# Note: in Matplotlib 3.9+ this parameter is named `tick_labels` (`labels` is deprecated)
bp = plt.boxplot(datos_box, labels=nombres, patch_artist=True)
for patch, color in zip(bp['boxes'], ['lightblue', 'lightgreen', 'lightcoral']):
    patch.set_facecolor(color)

plt.ylabel('Accuracy')
plt.title('Model Comparison - 5-Fold Cross-Validation',
          fontsize=14, fontweight='bold')
plt.grid(axis='y', alpha=0.3)
plt.ylim(0.8, 1.05)
plt.show()
```

---

## 6. TensorFlow/Keras - Deep Learning

**TensorFlow** is Google's Deep Learning library. **Keras** is its high-level API.

### Why TensorFlow/Keras?
- **Power:** Deep and complex neural networks
- **Ease of use:** Keras makes Deep Learning accessible
- **Flexibility:** From quick prototypes to production
- **GPU:** Automatically uses an available GPU (requires a GPU-enabled installation)
- **Community:** Huge ecosystem and resources

### Installation
```bash
pip install tensorflow
```

### 6.1. Simple Neural Network

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models
from sklearn.preprocessing import StandardScaler

print(f"TensorFlow version: {tf.__version__}")

# Prepare the Iris data (X_train/X_test/y_train/y_test from the stratified split in 5.3)

# Standardize the features: fit on the training set only, then apply to the test set
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Convert the labels to one-hot format (3 classes)
y_train_cat = keras.utils.to_categorical(y_train, 3)
y_test_cat = keras.utils.to_categorical(y_test, 3)

print(f"X_train shape: {X_train_scaled.shape}")
print(f"y_train shape: {y_train_cat.shape}")
```
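
Both preprocessing steps above fit in a few lines of NumPy: the mean and standard deviation come from the training set only and are reused unchanged for the test set, and one-hot encoding simply picks rows of an identity matrix. A sketch on toy arrays (`standardize` and `one_hot` are hypothetical helpers, not the scikit-learn or Keras implementations):

```python
import numpy as np

def standardize(train, test):
    """Scale both arrays with the training set's mean and std, as StandardScaler does."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)              # population std (ddof=0), like StandardScaler
    std = np.where(std == 0, 1.0, std)   # leave constant features unscaled
    return (train - mean) / std, (test - mean) / std

def one_hot(labels, n_classes):
    """Row i of the identity matrix encodes class i."""
    return np.eye(n_classes)[np.asarray(labels)]

train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
test = np.array([[2.0, 40.0]])
train_s, test_s = standardize(train, test)
print(train_s.mean(axis=0), train_s.std(axis=0))  # means ~0, stds 1
print(test_s)
print(one_hot([0, 2, 1], 3))
```

Computing the statistics on the training data alone keeps any information about the test set out of the model, so the test score stays an honest estimate.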

```python
# Build the neural network: 4 inputs -> 16 -> 8 -> 3 classes
model = models.Sequential([
    keras.Input(shape=(4,)),               # 4 Iris features
    layers.Dense(16, activation='relu'),
    layers.Dense(8, activation='relu'),
    layers.Dense(3, activation='softmax')  # one probability per class
])

# Compile the model
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',  # matches the one-hot labels
    metrics=['accuracy']
)

# Model summary
print("=== MODEL ARCHITECTURE ===")
model.summary()
```
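
The parameter counts in `model.summary()` follow from the fact that a `Dense` layer with `n_in` inputs and `n_out` units has an `n_in × n_out` weight matrix plus `n_out` biases. A quick check for the 4 → 16 → 8 → 3 network above (`dense_param_counts` is a hypothetical helper):

```python
def dense_param_counts(layer_sizes):
    """Parameters of each fully connected layer: (inputs + 1 bias) * units."""
    return [(n_in + 1) * n_out for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

counts = dense_param_counts([4, 16, 8, 3])
print(counts, sum(counts))   # [80, 136, 27] 243
```

The same formula gives 131 parameters for the 4 → 16 → 3 PyTorch network in section 7.2 (80 + 51), since `nn.Linear` also has one weight per input-output pair plus one bias per output.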

```python
# Train the model
history = model.fit(
    X_train_scaled, y_train_cat,
    epochs=100,
    batch_size=16,
    validation_split=0.2,  # hold out the last 20% of the training data for validation
    verbose=0              # do not print per-epoch progress
)

print("Training complete!")

# Evaluate the model on the test set
test_loss, test_accuracy = model.evaluate(X_test_scaled, y_test_cat, verbose=0)
print(f"\nTest accuracy: {test_accuracy:.4f} ({test_accuracy*100:.2f}%)")
```
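
The final `softmax` layer turns the three raw scores into probabilities that sum to 1, and `categorical_crossentropy` compares them with the one-hot label: for one sample it is -Σ y_k log p_k, i.e. minus the log-probability given to the true class. A NumPy sketch of both formulas with made-up scores (the helper names are hypothetical; Keras computes these internally with its own numerical safeguards):

```python
import numpy as np

def softmax(z):
    """Row-wise softmax; subtracting the row max avoids overflow without changing the result."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def categorical_crossentropy(y_onehot, probs, eps=1e-12):
    """Mean over samples of -sum_k y_k * log(p_k)."""
    probs = np.clip(probs, eps, 1.0)   # avoid log(0)
    return -np.mean(np.sum(y_onehot * np.log(probs), axis=-1))

logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
y_true = np.array([[1, 0, 0],
                   [0, 1, 0]])
p = softmax(logits)
print(p.round(3))
print(categorical_crossentropy(y_true, p))
```

For the second sample all scores are equal, so each class gets probability 1/3 and that sample's loss is log 3 ≈ 1.099, the value an untrained 3-class model typically starts near.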

```python
# Plot the learning curves
plt.figure(figsize=(14, 5))

# Accuracy
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Training')
plt.plot(history.history['val_accuracy'], label='Validation')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.title('Model Accuracy')
plt.legend()
plt.grid(True, alpha=0.3)

# Loss
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Training')
plt.plot(history.history['val_loss'], label='Validation')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Model Loss')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
```

### 6.2. Convolutional Neural Network (CNN) - MNIST

```python
# Load the MNIST dataset (handwritten digits, 28x28 grayscale images)
from tensorflow.keras.datasets import mnist

(X_train_mnist, y_train_mnist), (X_test_mnist, y_test_mnist) = mnist.load_data()

print(f"Training set shape: {X_train_mnist.shape}")
print(f"Test set shape: {X_test_mnist.shape}")

# Show a few examples
plt.figure(figsize=(12, 4))
for i in range(10):
    plt.subplot(2, 5, i+1)
    plt.imshow(X_train_mnist[i], cmap='gray')
    plt.title(f'Label: {y_train_mnist[i]}')
    plt.axis('off')
plt.suptitle('MNIST Dataset Examples', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()
```

```python
# Preprocess the data
# Scale pixel values from 0-255 to the [0, 1] range
X_train_mnist = X_train_mnist.astype('float32') / 255.0
X_test_mnist = X_test_mnist.astype('float32') / 255.0

# Add the channel dimension: (N, 28, 28) -> (N, 28, 28, 1)
X_train_mnist = X_train_mnist.reshape(-1, 28, 28, 1)
X_test_mnist = X_test_mnist.reshape(-1, 28, 28, 1)

# One-hot encode the labels (10 digit classes)
y_train_mnist = keras.utils.to_categorical(y_train_mnist, 10)
y_test_mnist = keras.utils.to_categorical(y_test_mnist, 10)

print(f"Final X_train shape: {X_train_mnist.shape}")
print(f"Final y_train shape: {y_train_mnist.shape}")
```

```python
# Build the CNN
cnn_model = models.Sequential([
    keras.Input(shape=(28, 28, 1)),

    # First convolutional block
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),

    # Second convolutional block
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),

    # Third convolutional layer
    layers.Conv2D(64, (3, 3), activation='relu'),

    # Dense layers
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.5),  # helps prevent overfitting
    layers.Dense(10, activation='softmax')
])

# Compile
cnn_model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

print("=== CNN ARCHITECTURE ===")
cnn_model.summary()
```
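
The output shapes in the summary follow the standard size formula for convolution and pooling along each spatial dimension, out = ⌊(n + 2p - k) / s⌋ + 1, where n is the input size, k the kernel size, p the padding and s the stride. Keras `Conv2D` defaults to `padding='valid'` (p = 0) and stride 1, and `MaxPooling2D((2, 2))` uses a stride equal to its pool size. A quick check that this CNN flattens to 3 × 3 × 64 = 576 values (`conv_out` is a hypothetical helper):

```python
def conv_out(n, k, s=1, p=0):
    """Spatial output size of a convolution or pooling layer along one dimension."""
    return (n + 2 * p - k) // s + 1

size = 28
steps = [("Conv2D 3x3", 3, 1), ("MaxPool 2x2", 2, 2),
         ("Conv2D 3x3", 3, 1), ("MaxPool 2x2", 2, 2),
         ("Conv2D 3x3", 3, 1)]
for name, k, s in steps:
    size = conv_out(size, k, s)
    print(f"{name:12s} -> {size}x{size}")
print("Flatten ->", size * size * 64)   # 576
```

With `padding=1`, as in the PyTorch CNN of section 7.3, each 3×3 convolution keeps the size (28 → 28), so two poolings give 28 → 14 → 7 and a flattened size of 64 × 7 × 7.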

```python
# Train the CNN (only 5 epochs, for demonstration)
history_cnn = cnn_model.fit(
    X_train_mnist[:10000], y_train_mnist[:10000],  # 10,000-image subset for speed
    epochs=5,
    batch_size=128,
    validation_split=0.2,
    verbose=1
)

# Evaluate on the first 2,000 test images
test_loss, test_accuracy = cnn_model.evaluate(
    X_test_mnist[:2000], y_test_mnist[:2000],
    verbose=0
)
print(f"\nTest accuracy: {test_accuracy:.4f} ({test_accuracy*100:.2f}%)")
```

```python
# Make predictions
predicciones = cnn_model.predict(X_test_mnist[:10], verbose=0)
predicciones_clases = np.argmax(predicciones, axis=1)  # most probable digit
verdaderos = np.argmax(y_test_mnist[:10], axis=1)      # undo the one-hot encoding

# Visualize the predictions
plt.figure(figsize=(15, 6))
for i in range(10):
    plt.subplot(2, 5, i+1)
    plt.imshow(X_test_mnist[i].reshape(28, 28), cmap='gray')
    color = 'green' if predicciones_clases[i] == verdaderos[i] else 'red'
    plt.title(f'Pred: {predicciones_clases[i]}\nTrue: {verdaderos[i]}', color=color)
    plt.axis('off')
plt.suptitle('CNN Predictions (green = correct, red = wrong)',
             fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()
```

---

## 7. PyTorch - Deep Learning Avanzado

**PyTorch** is a Deep Learning library originally developed by Facebook (now Meta) and now governed by the PyTorch Foundation; it is especially popular in research.

### Why PyTorch?
- **Pythonic:** Intuitive syntax that is easy to debug
- **Dynamic:** Computational graphs are built on the fly as the code runs
- **Research:** Widely used in academic papers
- **GPU:** Easy to move tensors and models between CPU and GPU
- **Flexibility:** Full control over the training process

### Installation
```bash
pip install torch torchvision
```

### 7.1. Tensors in PyTorch

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

# Create tensors
tensor1 = torch.tensor([1, 2, 3, 4])  # integers -> dtype torch.int64
tensor2 = torch.zeros(3, 4)
tensor3 = torch.randn(2, 3)  # standard normal distribution

print("\nTensor 1:")
print(tensor1)
print(f"Shape: {tensor1.shape}, dtype: {tensor1.dtype}")

print("\nTensor 2 (zeros):")
print(tensor2)

print("\nTensor 3 (random):")
print(tensor3)
```

```python
# Tensor operations
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])

print("Tensor a:", a)
print("Tensor b:", b)
print(f"\na + b = {a + b}")          # element-wise sum
print(f"a * b = {a * b}")            # element-wise product
print(f"a @ b = {torch.dot(a, b)}")  # dot product: 1*4 + 2*5 + 3*6 = 32

# Matrix operations
matriz1 = torch.randn(3, 4)
matriz2 = torch.randn(4, 2)
producto = torch.mm(matriz1, matriz2)  # matrix multiplication: (3, 4) x (4, 2) -> (3, 2)

print(f"\nMatrix 1 shape: {matriz1.shape}")
print(f"Matrix 2 shape: {matriz2.shape}")
print(f"Product shape: {producto.shape}")
```
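
`torch.mm` multiplies a (3, 4) matrix by a (4, 2) matrix into a (3, 2) result: the inner dimensions must match, and each entry is the dot product of a row of the first matrix with a column of the second. The same rule written as explicit loops, for illustration only (far slower than `torch.mm` or `@`; NumPy is used to keep the sketch small, and `matmul_loops` is a hypothetical helper):

```python
import numpy as np

def matmul_loops(A, B):
    """C[i, j] = sum_k A[i, k] * B[k, j]; requires A.shape[1] == B.shape[0]."""
    n, m = A.shape
    m2, p = B.shape
    if m != m2:
        raise ValueError(f"inner dimensions differ: {m} vs {m2}")
    C = np.zeros((n, p))
    for i in range(n):          # each row of A
        for j in range(p):      # each column of B
            C[i, j] = sum(A[i, k] * B[k, j] for k in range(m))
    return C

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4))
B = rng.normal(size=(4, 2))
C = matmul_loops(A, B)
print(C.shape)                 # (3, 2)
print(np.allclose(C, A @ B))   # True
```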

### 7.2. Simple Neural Network in PyTorch

```python
# Define the model
class RedNeuronal(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)  # raw scores (logits): CrossEntropyLoss applies log-softmax internally
        return x

# Create the model
modelo_torch = RedNeuronal(input_size=4, hidden_size=16, output_size=3)
print("=== PYTORCH MODEL ===")
print(modelo_torch)

# Count the parameters
total_params = sum(p.numel() for p in modelo_torch.parameters())
print(f"\nTotal parameters: {total_params}")
```

```python
# Prepare the data for PyTorch (reuses the standardized Iris arrays from section 6.1)
X_train_tensor = torch.tensor(X_train_scaled, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.long)
X_test_tensor = torch.tensor(X_test_scaled, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.long)

# Create the DataLoader (reshuffles the samples every epoch)
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)

print(f"Number of batches: {len(train_loader)}")
print(f"Batch size: {train_loader.batch_size}")
```

```python
# Define the loss function and the optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(modelo_torch.parameters(), lr=0.01)

# Train the model
num_epochs = 100
losses = []

modelo_torch.train()  # training mode (matters for layers such as Dropout or BatchNorm)
for epoch in range(num_epochs):
    epoch_loss = 0.0
    for batch_X, batch_y in train_loader:
        # Forward pass
        outputs = modelo_torch(batch_X)
        loss = criterion(outputs, batch_y)

        # Backward pass and optimization step
        optimizer.zero_grad()  # clear the gradients from the previous step
        loss.backward()        # compute new gradients
        optimizer.step()       # update the weights

        epoch_loss += loss.item()

    losses.append(epoch_loss / len(train_loader))  # mean loss per batch

    if (epoch + 1) % 20 == 0:
        print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {losses[-1]:.4f}")

print("\nTraining complete!")
```
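
The three calls inside the loop (`zero_grad`, `backward`, `step`) implement gradient descent: compute the gradient of the loss with respect to each parameter, then move each parameter a small step against it. Adam, used above, adapts the step size per parameter, but the core idea is the same. A sketch of plain full-batch gradient descent for a one-variable linear model ŷ = w·x + b with MSE loss, with the gradients derived by hand (the data and learning rate are made up for illustration):

```python
import numpy as np

x = np.linspace(0, 1, 50)
y = 3 * x + 2                  # ground truth: w = 3, b = 2

w, b = 0.0, 0.0                # parameters to learn
lr = 0.5                       # learning rate
for step in range(2000):
    y_hat = w * x + b                    # forward pass
    error = y_hat - y
    loss = np.mean(error ** 2)           # MSE loss
    # Gradients are recomputed from scratch each step, so nothing accumulates
    # (in PyTorch, optimizer.zero_grad() is what clears the accumulated .grad values)
    grad_w = 2 * np.mean(error * x)      # dLoss/dw, what loss.backward() would compute
    grad_b = 2 * np.mean(error)          # dLoss/db
    w -= lr * grad_w                     # plain gradient-descent update (what optim.SGD does)
    b -= lr * grad_b

print(f"w={w:.4f}, b={b:.4f}, loss={loss:.2e}")
```

If the learning rate is too large the updates overshoot and the loss diverges; too small and convergence becomes very slow, which is the main motivation for adaptive optimizers such as Adam.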

```python
# Evaluate the model
modelo_torch.eval()  # evaluation mode
with torch.no_grad():  # no gradients needed for inference
    outputs = modelo_torch(X_test_tensor)
    _, predicted = torch.max(outputs, 1)  # index of the highest score = predicted class
    accuracy = (predicted == y_test_tensor).float().mean().item()

print(f"Test accuracy: {accuracy:.4f} ({accuracy*100:.2f}%)")

# Plot the loss curve
plt.figure(figsize=(10, 6))
plt.plot(losses, linewidth=2)
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Learning Curve - PyTorch', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.show()
```

### 7.3. CNN in PyTorch

```python
# Define the CNN
# Note: PyTorch expects images as (N, C, H, W), channels first, unlike the
# (N, 28, 28, 1) arrays used with Keras in section 6.2
class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        # Convolutional layers (padding=1 keeps the spatial size before pooling)
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)

        # Fully connected layers: two 2x2 poolings reduce 28x28 to 7x7
        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, 10)

        # Dropout
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        # Convolutions: (N, 1, 28, 28) -> (N, 32, 14, 14) -> (N, 64, 7, 7)
        x = self.pool(torch.relu(self.conv1(x)))
        x = self.pool(torch.relu(self.conv2(x)))

        # Flatten every dimension except the batch: (N, 64*7*7)
        x = torch.flatten(x, 1)

        # Fully connected
        x = torch.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x

# Create the model
cnn_torch = CNN()
print("=== PYTORCH CNN ===")
print(cnn_torch)

total_params = sum(p.numel() for p in cnn_torch.parameters())
print(f"\nTotal parameters: {total_params:,}")
```
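
The total printed above can be checked by hand: a `Conv2d` layer has `out_channels × in_channels × k × k` weights plus `out_channels` biases, a `Linear` layer has `in × out` weights plus `out` biases, and pooling, ReLU and dropout have no parameters. For this CNN (`conv_params` and `linear_params` are hypothetical helpers):

```python
def conv_params(in_ch, out_ch, k):
    """Weights (out_ch x in_ch x k x k) plus one bias per output channel."""
    return out_ch * in_ch * k * k + out_ch

def linear_params(n_in, n_out):
    """Weight matrix (n_out x n_in) plus one bias per output."""
    return n_in * n_out + n_out

param_counts = {
    "conv1": conv_params(1, 32, 3),          # 320
    "conv2": conv_params(32, 64, 3),         # 18,496
    "fc1": linear_params(64 * 7 * 7, 128),   # 401,536
    "fc2": linear_params(128, 10),           # 1,290
}
for name, count in param_counts.items():
    print(f"{name}: {count:,}")
print(f"Total: {sum(param_counts.values()):,}")  # 421,642
```

About 95% of the parameters sit in `fc1`, the first fully connected layer after flattening; the convolutional layers are cheap because their small kernels are shared across all image positions.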

---

## 8. Final Capstone Project

### Complete Analysis: House Price Prediction

```python
# Create a synthetic house-price dataset
np.random.seed(42)

n_samples = 1000

# Generate the features (the units in the comments are assumed, for readability)
superficie = np.random.randint(50, 300, n_samples)      # floor area, m² (50-299)
habitaciones = np.random.randint(1, 6, n_samples)       # rooms (1-5)
banos = np.random.randint(1, 4, n_samples)              # bathrooms (1-3)
antiguedad = np.random.randint(0, 50, n_samples)        # age in years (0-49)
distancia_centro = np.random.uniform(0, 20, n_samples)  # distance to the city centre, km (0-20)

# Generate the price: a linear function of the features plus Gaussian noise (std 20,000)
precio = (
    superficie * 1000 +
    habitaciones * 15000 +
    banos * 20000 -
    antiguedad * 500 -
    distancia_centro * 2000 +
    np.random.randn(n_samples) * 20000
)

# Create the DataFrame
casas_df = pd.DataFrame({
    'superficie': superficie,
    'habitaciones': habitaciones,
    'banos': banos,
    'antiguedad': antiguedad,
    'distancia_centro': distancia_centro,
    'precio': precio
})

print("=== HOUSE DATASET ===")
print(casas_df.head(10))
print(f"\nShape: {casas_df.shape}")
print("\nStatistics:")
print(casas_df.describe())
```

```python
# 1. EXPLORATION WITH SEABORN
print("=== PHASE 1: DATA EXPLORATION ===")

fig = plt.figure(figsize=(16, 10))

# Price distribution
ax1 = plt.subplot(2, 3, 1)
sns.histplot(data=casas_df, x='precio', kde=True, color='steelblue', ax=ax1)
ax1.set_title('Price Distribution', fontweight='bold')

# Price vs floor area
ax2 = plt.subplot(2, 3, 2)
sns.scatterplot(data=casas_df, x='superficie', y='precio',
                hue='habitaciones', palette='viridis', ax=ax2)
ax2.set_title('Price vs Floor Area', fontweight='bold')

# Correlation matrix
ax3 = plt.subplot(2, 3, 3)
corr_casas = casas_df.corr()
sns.heatmap(corr_casas, annot=True, fmt='.2f', cmap='coolwarm', center=0, ax=ax3,
            square=True)
ax3.set_title('Correlation Matrix', fontweight='bold')

# Price by number of rooms
ax4 = plt.subplot(2, 3, 4)
sns.boxplot(data=casas_df, x='habitaciones', y='precio', palette='Set2', ax=ax4)
ax4.set_title('Price by Number of Rooms', fontweight='bold')

# Price vs age, with a fitted regression line
ax5 = plt.subplot(2, 3, 5)
sns.scatterplot(data=casas_df, x='antiguedad', y='precio',
                alpha=0.5, color='coral', ax=ax5)
sns.regplot(data=casas_df, x='antiguedad', y='precio',
            scatter=False, color='darkred', ax=ax5)
ax5.set_title('Price vs Age', fontweight='bold')

# Sixth panel left empty: the pairplot is drawn as a separate figure
ax6 = plt.subplot(2, 3, 6)
ax6.axis('off')
ax6.text(0.5, 0.5, 'See pairplot\nbelow',
         ha='center', va='center', fontsize=14)

plt.suptitle('Exploratory Analysis - House Prices',
             fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()

# Pairplot on a random sample of 200 houses
sns.pairplot(casas_df.sample(200, random_state=42),
             vars=['superficie', 'habitaciones', 'precio'],
             height=3)
plt.suptitle('Relationships Between Key Variables', y=1.02, fontsize=14)
plt.show()
```
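
The next phase compares plain linear regression with its regularized variants. scikit-learn's `Ridge` minimizes ‖y - Xw‖² + α‖w‖², which has the closed-form solution w = (XᵀX + αI)⁻¹Xᵀy on centred data, while `Lasso` uses an L1 penalty that has no closed form and is solved iteratively (coordinate descent). A NumPy sketch of the ridge formula on made-up data, showing that a larger α shrinks the coefficients (`ridge_fit` is a hypothetical helper, not scikit-learn's solver):

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression on centred data; returns (weights, intercept)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean       # centring keeps the intercept out of the penalty
    n_features = X.shape[1]
    # Solve (XᵀX + αI) w = Xᵀy rather than forming the inverse explicitly
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    intercept = y_mean - x_mean @ w
    return w, intercept

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([3.0, -2.0, 0.5])
y = X @ true_w + 1.0 + rng.normal(scale=0.1, size=100)

for alpha in [0.0, 10.0, 1000.0]:
    w, b = ridge_fit(X, y, alpha)
    print(f"alpha={alpha:>7}: w={np.round(w, 3)}, |w|={np.linalg.norm(w):.3f}")
```

With α = 0 this reduces to ordinary least squares. Because the penalty treats all coefficients alike, it only makes sense when the features share a comparable scale, which is one reason the next cell standardizes them first.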

```python
# 2. MACHINE LEARNING WITH SCIKIT-LEARN
print("\n=== PHASE 2: MACHINE LEARNING MODELS ===")

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

# Prepare the data
X = casas_df.drop('precio', axis=1).values
y = casas_df['precio'].values

# Split the data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardize (fit on the training set only, so no test information leaks in)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train several models
modelos_ml = {
    'Linear Regression': LinearRegression(),
    'Ridge': Ridge(alpha=10),
    'Lasso': Lasso(alpha=10),
    'Random Forest': RandomForestRegressor(n_estimators=100, random_state=42),
    'Gradient Boosting': GradientBoostingRegressor(n_estimators=100, random_state=42)
}

resultados_ml = {}

print("\nTraining models...\n")
for nombre, modelo in modelos_ml.items():
    # Train
    modelo.fit(X_train_scaled, y_train)

    # Predict
    y_pred = modelo.predict(X_test_scaled)

    # Metrics
    mse = mean_squared_error(y_test, y_pred)
    rmse = np.sqrt(mse)
    mae = mean_absolute_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)

    resultados_ml[nombre] = {
        'RMSE': rmse,
        'MAE': mae,
        'R²': r2,
        'predicciones': y_pred
    }

    print(f"{nombre}:")
    print(f"  RMSE: ${rmse:,.2f}")
    print(f"  MAE:  ${mae:,.2f}")
    print(f"  R²:   {r2:.4f}")
    print()
```

```python
# Compare the models visually
fig, axes = plt.subplots(2, 3, figsize=(18, 10))
axes = axes.flatten()

for i, (nombre, resultados) in enumerate(resultados_ml.items()):
    ax = axes[i]
    y_pred = resultados['predicciones']

    # Scatter plot: actual vs predicted
    ax.scatter(y_test, y_pred, alpha=0.5)
    ax.plot([y_test.min(), y_test.max()],
            [y_test.min(), y_test.max()],
            'r--', linewidth=2)  # perfect-prediction diagonal

    ax.set_xlabel('Actual Price')
    ax.set_ylabel('Predicted Price')
    ax.set_title(f"{nombre}\nR² = {resultados['R²']:.3f}")
    ax.grid(True, alpha=0.3)

axes[5].axis('off')  # only 5 models: hide the sixth panel
plt.suptitle('ML Model Comparison - Price Prediction',
             fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()
```

```python
# Compare the metrics (lower RMSE/MAE is better; higher R² is better)
metricas_df = pd.DataFrame({
    nombre: {
        'RMSE': resultados['RMSE'],
        'MAE': resultados['MAE'],
        'R²': resultados['R²']
    }
    for nombre, resultados in resultados_ml.items()
}).T

print("\n=== MODEL COMPARISON ===")
print(metricas_df.sort_values('R²', ascending=False))

# Visualize the comparison
fig, axes = plt.subplots(1, 3, figsize=(16, 5))

for i, metrica in enumerate(['RMSE', 'MAE', 'R²']):
    ax = axes[i]
    metricas_df[metrica].sort_values().plot(kind='barh', ax=ax, color='steelblue')
    ax.set_title(f'{metrica} by Model', fontweight='bold')
    ax.grid(axis='x', alpha=0.3)

plt.tight_layout()
plt.show()
```

```python
# 3. DEEP LEARNING WITH TENSORFLOW/KERAS
print("\n=== PHASE 3: DEEP LEARNING ===")

# Standardize the target as well: prices are in the hundreds of thousands, and a network
# trained with MSE on raw values this large typically converges very slowly
y_scaler = StandardScaler()
y_train_scaled = y_scaler.fit_transform(y_train.reshape(-1, 1)).ravel()

# Build the neural network
dl_model = models.Sequential([
    keras.Input(shape=(X_train_scaled.shape[1],)),  # 5 features
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(32, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(16, activation='relu'),
    layers.Dense(1)  # Regression: a single linear output
])

dl_model.compile(
    optimizer='adam',
    loss='mse',
    metrics=['mae']
)

print("\n=== NEURAL NETWORK ARCHITECTURE ===")
dl_model.summary()

# Train on the standardized target
print("\nTraining the neural network...")
history_dl = dl_model.fit(
    X_train_scaled, y_train_scaled,
    epochs=100,
    batch_size=32,
    validation_split=0.2,
    verbose=0
)

print("Training complete!")

# Evaluate: convert the predictions back to the original price scale first
y_pred_dl = y_scaler.inverse_transform(
    dl_model.predict(X_test_scaled, verbose=0)
).ravel()
mse_dl = mean_squared_error(y_test, y_pred_dl)
rmse_dl = np.sqrt(mse_dl)
mae_dl = mean_absolute_error(y_test, y_pred_dl)
r2_dl = r2_score(y_test, y_pred_dl)

print("\n=== DEEP LEARNING RESULTS ===")
print(f"RMSE: ${rmse_dl:,.2f}")
print(f"MAE:  ${mae_dl:,.2f}")
print(f"R²:   {r2_dl:.4f}")
```

```python
# Plot the learning curves and results
fig = plt.figure(figsize=(16, 10))

# Loss curve (in standardized target units)
ax1 = plt.subplot(2, 2, 1)
ax1.plot(history_dl.history['loss'], label='Training')
ax1.plot(history_dl.history['val_loss'], label='Validation')
ax1.set_xlabel('Epoch')
ax1.set_ylabel('MSE loss (standardized target)')
ax1.set_title('Loss Curve', fontweight='bold')
ax1.legend()
ax1.grid(True, alpha=0.3)

# MAE curve (in standardized target units)
ax2 = plt.subplot(2, 2, 2)
ax2.plot(history_dl.history['mae'], label='Training')
ax2.plot(history_dl.history['val_mae'], label='Validation')
ax2.set_xlabel('Epoch')
ax2.set_ylabel('MAE (standardized target)')
ax2.set_title('Mean Absolute Error', fontweight='bold')
ax2.legend()
ax2.grid(True, alpha=0.3)

# Predictions vs actual values
ax3 = plt.subplot(2, 2, 3)
ax3.scatter(y_test, y_pred_dl, alpha=0.5)
ax3.plot([y_test.min(), y_test.max()],
         [y_test.min(), y_test.max()],
         'r--', linewidth=2)
ax3.set_xlabel('Actual Price')
ax3.set_ylabel('Predicted Price')
ax3.set_title(f'Deep Learning\nR² = {r2_dl:.3f}', fontweight='bold')
ax3.grid(True, alpha=0.3)

# Final comparison of all the models
ax4 = plt.subplot(2, 2, 4)
todos_modelos = list(resultados_ml.keys()) + ['Deep Learning']
todos_r2 = [resultados_ml[m]['R²'] for m in resultados_ml.keys()] + [r2_dl]
colores = ['steelblue'] * len(resultados_ml) + ['coral']
bars = ax4.barh(todos_modelos, todos_r2, color=colores)
ax4.set_xlabel('R² Score')
ax4.set_title('Final Comparison - All Models', fontweight='bold')
ax4.grid(axis='x', alpha=0.3)

# Write each value next to its bar
for i, (bar, valor) in enumerate(zip(bars, todos_r2)):
    ax4.text(valor, i, f' {valor:.3f}',
             va='center', fontweight='bold')

plt.suptitle('Final Results - Deep Learning',
             fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()
```

```python
# FINAL CONCLUSIONS
print("\n" + "="*60)
print("CAPSTONE PROJECT CONCLUSIONS")
print("="*60)

# Best ML model by R² on the test set
mejor_modelo_ml = max(resultados_ml.items(),
                      key=lambda x: x[1]['R²'])

print(f"\n📊 BEST ML MODEL: {mejor_modelo_ml[0]}")
print(f"   R²: {mejor_modelo_ml[1]['R²']:.4f}")
print(f"   RMSE: ${mejor_modelo_ml[1]['RMSE']:,.2f}")

print("\n🧠 DEEP LEARNING:")
print(f"   R²: {r2_dl:.4f}")
print(f"   RMSE: ${rmse_dl:,.2f}")

if r2_dl > mejor_modelo_ml[1]['R²']:
    print("\n✅ Deep Learning OUTPERFORMS traditional ML")
else:
    print(f"\n✅ {mejor_modelo_ml[0]} is the BEST model")

print("\n📈 LIBRARIES USED:")
print("   • NumPy: numerical computing")
print("   • Pandas: data manipulation")
print("   • Matplotlib: basic visualization")
print("   • Seaborn: statistical visualization")
print("   • Scikit-learn: Machine Learning")
print("   • TensorFlow/Keras: Deep Learning")

print("\n" + "="*60)
print("END OF PROJECT")
print("="*60)
```

---

## 📚 Final Summary

### Scikit-learn
- **Regression:** LinearRegression, Ridge, Lasso
- **Classification:** LogisticRegression, DecisionTree, RandomForest
- **Clustering:** K-Means, DBSCAN
- **Validation:** train_test_split, cross_val_score
- **Metrics:** accuracy, precision, recall, r2_score

### TensorFlow/Keras
- **Sequential:** models.Sequential for simple stacks of layers
- **Layers:** Dense, Conv2D, MaxPooling2D, Dropout
- **Compilation:** optimizer, loss, metrics
- **Training:** fit(), evaluate(), predict()
- **Callbacks:** EarlyStopping, ModelCheckpoint

### PyTorch
- **Tensors:** torch.tensor, GPU operations
- **Modules:** nn.Module, nn.Linear, nn.Conv2d
- **Optimization:** optim.Adam, optim.SGD
- **Training:** forward pass, backward pass, optimizer step
- **DataLoader:** efficient batching and shuffling of data

### Complete Workflow
```text
1. Load the data (Pandas)
2. Explore and visualize (Seaborn, Matplotlib)
3. Clean and prepare (NumPy, Pandas)
4. Split into train/test (Scikit-learn)
5. Train ML models (Scikit-learn)
6. Train Deep Learning models (TensorFlow/PyTorch)
7. Evaluate and compare (metrics)
8. Visualize the results (Matplotlib, Seaborn)
9. Select the best model
10. Deploy to production
```

### When to Use Each Library

**NumPy:** Numerical operations, linear algebra

**Pandas:** Tabular data, exploratory analysis

**Matplotlib:** Custom visualizations with full control

**Seaborn:** Quick, attractive statistical visualizations

**Scikit-learn:** Traditional ML on small-to-medium datasets

**TensorFlow/Keras:** Deep Learning, production, large datasets

**PyTorch:** Research, prototyping, fine-grained control

---
