<a href="https://colab.research.google.com/github/misanchz98/bitcoin-direction-prediction/blob/main/03_modeling/03_modeling.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 📊 Deep Learning Models for Time Series with Walk-Forward Validation

In this notebook we implement several **Deep Learning** models and validate them over time with a **Purged Walk-Forward Split**.

It includes:
- Preprocessing and sequence creation.
- Models: LSTM, GRU, CNN-LSTM, Transformer, and TCN.
- Custom metrics and evaluation.
- Feature importance.
- Training and validation pipeline.

## 🔹 1. Libraries
We install and import the libraries needed for data manipulation, visualization, machine learning, and deep learning.


In [44]:
!pip install boruta
!pip install keras-tcn --quiet



In [45]:
# =============================================================================
# LIBRARIES
# =============================================================================
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import random

from sklearn.preprocessing import RobustScaler, StandardScaler
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, matthews_corrcoef)

from sklearn.model_selection import BaseCrossValidator
from sklearn.ensemble import RandomForestClassifier

import tensorflow as tf
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import (Dense, Dropout, LSTM, GRU, Conv1D, MaxPooling1D,
                                     Flatten, Input, LayerNormalization)
from tensorflow.keras.layers import MultiHeadAttention
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint

import warnings

# Set the warnings filter to "ignore" to suppress all warnings
warnings.filterwarnings("ignore")

## 🔹 2. Seeds
To make the experiments **reproducible**, we fix the seeds of every library that generates random numbers:

- `os.environ['PYTHONHASHSEED']` → controls Python's hash randomization. Python reads this variable at interpreter startup, so setting it here mainly affects subprocesses launched afterwards.
- `numpy.random.seed` → makes NumPy operations reproducible.
- `random.seed` → fixes the seed of Python's built-in random number generator.
- `tf.random.set_seed` → fixes the seed for TensorFlow and Keras.

This helps the models train with consistent results across runs, although some GPU operations can still be non-deterministic.

In [46]:
# resetting the seeds for reproducibility
def reset_random_seeds():
    n = 42
    os.environ['PYTHONHASHSEED'] = str(n)
    tf.random.set_seed(n)
    np.random.seed(n)
    random.seed(n)

reset_random_seeds()

## 🔹 3. Dataset
We load the dataset into our working environment. It is stored in a CSV file named `btc_historical_data_eda.csv`; the *notebook* `02_data_analysis.ipynb` explains how it was obtained.

In [47]:
# Load the CSV
url = 'https://raw.githubusercontent.com/misanchz98/bitcoin-direction-prediction/main/02_data_analysis/data/btc_historical_data_eda.csv'
df_bitcoin = pd.read_csv(url, parse_dates=['Open time'])
df_bitcoin

Unnamed: 0,Open time,Close,Number of trades,Taker buy base asset volume,Taker buy quote asset volume,Range,Candle,Target,CMF_20,MFI_14,...,c2_ta_tendencia,c3_ta_tendencia,c1_ta_momentum,c2_ta_momentum,c3_ta_momentum,c4_ta_momentum,c5_ta_momentum,c1_ta_volatilidad,c2_ta_volatilidad,c3_ta_volatilidad
0,2017-10-05,4292.43,9158.0,351.042019,1.483037e+06,245.00,83.84,1,0.081329,56.225018,...,1.426309,-0.136861,1.317719,-0.126726,-1.676720,1.289633,0.019128,-4.644643,-0.938175,-0.259897
1,2017-10-06,4369.00,6546.0,226.148177,9.881066e+05,125.00,50.01,1,0.090972,62.048701,...,1.684975,-0.223654,1.843789,-0.313902,-0.766032,1.200655,0.108543,-4.656021,-1.404483,-0.298095
2,2017-10-07,4423.00,4804.0,145.313076,6.371469e+05,166.94,54.00,1,0.072898,60.780168,...,1.837639,-0.272101,1.714315,0.851111,-1.474281,-0.169404,-0.611932,-4.658149,-1.694457,-0.310605
3,2017-10-08,4640.00,7580.0,280.094854,1.268661e+06,233.00,215.00,1,0.064115,66.225272,...,2.655718,-0.595153,4.539268,-1.327677,-1.397994,0.344940,0.095779,-4.643444,-2.930102,-0.299058
4,2017-10-09,4786.95,10372.0,350.756559,1.654275e+06,339.98,146.95,0,0.105281,66.423592,...,3.068594,-0.711866,4.065484,0.284535,-0.496135,-0.373562,-0.383308,-4.619178,-3.116037,-0.259659
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2872,2025-08-16,117380.66,1179842.0,2995.228650,3.521588e+08,755.01,38.62,1,-0.078079,61.679789,...,-2.974279,-0.597347,-1.738575,-0.246759,0.045690,1.947962,-0.826190,11.130802,0.715864,-3.617415
2873,2025-08-17,117405.01,1177563.0,2804.731130,3.307994e+08,1402.79,24.35,0,-0.071478,61.441782,...,-3.056170,-0.520901,-2.679964,0.452454,-1.078307,1.833305,-0.382838,11.062656,0.708259,-3.717682
2874,2025-08-18,116227.05,3345487.0,7647.218200,8.850528e+08,2903.61,-1177.96,0,-0.058026,54.527915,...,-3.371373,-0.335897,-3.708156,1.216496,-0.767095,1.913662,1.633951,11.085307,1.427748,-3.651715
2875,2025-08-19,112872.94,3291170.0,8609.360780,9.840874e+08,3993.11,-3354.11,1,-0.133646,53.037041,...,-4.158761,0.112011,-5.328226,0.713262,0.105456,1.232647,-1.954862,11.229262,3.161925,-3.269720


## 🔹 4. Purged Walk-Forward Split

We define a time-aware validator with an **embargo** to avoid information leakage and simulate a *walk-forward* scenario.

### 📘 Purged Time Series Split

The **Purged Time Series Split** technique performs cross-validation on time series while preventing information leakage between training and validation.

#### 🧠 Why is it needed?

In time series (such as financial data), future data **must not influence** model training. Traditional cross-validation can let the model learn from data that occurs chronologically after the validation data, which produces misleading results.

#### 🔍 What does "purged" mean?

Observations close to the validation set are removed (purged) from the training set, so the model does not learn from samples that are so near in time that they are correlated with, or overlap, the validation samples.

#### ⏳ What is the "embargo"?

It is a temporal exclusion zone between the end of the training set and the start of the validation set. It keeps events close to the boundary from contaminating training.

|--- training ---| embargo |--- validation ---|

#### ✅ Advantages

- Avoids **data leakage**.
- Simulates real prediction conditions.
- Improves the **validity of the model** in temporal contexts such as financial markets.

This technique is especially useful with data such as Bitcoin prices, where temporal order and independence between sets are critical.
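
The train / embargo / validation layout described in this section can be sketched as plain index arithmetic. This is a minimal illustration, not the splitter defined in the next cell: the function name `walk_forward_embargo_splits` and its parameters are assumptions made for the example. Each fold validates on the next block of samples and trains only on samples that end at least `embargo` steps earlier.

```python
import numpy as np


def walk_forward_embargo_splits(n_samples, n_splits=3, val_size=10, embargo=2):
    """Yield (train_idx, val_idx) pairs for an expanding-window walk-forward split.

    Hypothetical helper: the validation blocks are the last `n_splits` blocks of
    `val_size` samples, and `embargo` samples are dropped before each block.
    """
    first_val_start = n_samples - n_splits * val_size
    if first_val_start - embargo <= 0:
        raise ValueError("Not enough samples for the requested splits")
    for k in range(n_splits):
        val_start = first_val_start + k * val_size
        val_end = val_start + val_size
        train_end = val_start - embargo  # gap between training and validation
        yield np.arange(0, train_end), np.arange(val_start, val_end)


splits = list(walk_forward_embargo_splits(n_samples=50, n_splits=3, val_size=10, embargo=2))
for train_idx, val_idx in splits:
    print(f"train 0..{train_idx[-1]} | embargo | val {val_idx[0]}..{val_idx[-1]}")
```

With sequence models, a natural choice is an embargo of at least `window_size + horizon - 1` steps, so that no training window shares rows with a validation window.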

In [48]:
# =======================================
# PURGED WALK-FORWARD SPLIT
# =======================================

#class PurgedTimeSeriesSplit(BaseCrossValidator):
#    def __init__(self, n_splits=4, embargo=0, test_size=0.2):
#        self.n_splits = n_splits
#        self.embargo = embargo
#        self.test_size = test_size
#
#    def split(self, X, y=None, groups=None):
#        n_samples = len(X)
#        test_size = int(n_samples * self.test_size)
#        n_trainval = n_samples - test_size
#        fold_sizes = np.linspace(0.5, 1.0, self.n_splits+1)
#
#        for i in range(self.n_splits):
#            end = int(fold_sizes[i+1] * n_trainval)
#            val_size = int(0.15 * end)
#            val_start = end - val_size
#            val_end = end
#            train_end = max(0, val_start - self.embargo)
#            train_idx = np.arange(0, train_end)
#            val_idx = np.arange(val_start, val_end)
#
#            yield train_idx, val_idx
#
#    def get_n_splits(self, X=None, y=None, groups=None):
#        return self.n_splits

class PurgedCV:
    def __init__(self, embargo_size=0):
        self.embargo_size = embargo_size

    def split(self, X, events):
        """
        Generate purged and embargoed cross-validation splits.
        - X: array of sequences (not used directly here; kept for compatibility)
        - events: Series indexed by date that holds the labels (e.g. Target)

        Each unique date is held out once as the test fold. Training uses every
        other date, before and after it, except the embargoed ones.
        """
        events = events.sort_index()
        unique_dates = events.index.unique()

        for test_date in unique_dates:
            # Test = all events on that date
            test_indices = events.index == test_date

            # Train = everything else, minus the embargo
            train_indices = self._get_train_indices(events, test_date, self.embargo_size)
            yield train_indices, test_indices

    @staticmethod
    def _get_train_indices(events, test_date, embargo_size):
        train_indices = events.index != test_date
        if embargo_size > 0:
            # Embargo = the test date plus the next `embargo_size` periods
            embargo_dates = pd.date_range(
                start=test_date,
                periods=embargo_size + 1,
                freq=events.index.freq  # if the index has no freq (None), pandas falls back to daily ("D")
            )
            train_indices &= ~events.index.isin(embargo_dates)
        return train_indices
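
To see which dates a single `PurgedCV` fold keeps, the mask logic can be reproduced on a toy index. This is a sketch with made-up dates and an illustrative `embargo_size`; it mirrors the boolean-mask approach of `_get_train_indices` rather than calling the class itself.

```python
import pandas as pd

# Toy daily index; the dates and sizes are illustrative only
idx = pd.date_range("2024-01-01", periods=10, freq="D")
test_date = idx[4]  # 2024-01-05
embargo_size = 2

# The test date plus the next `embargo_size` days are kept out of training
embargo_dates = pd.date_range(start=test_date, periods=embargo_size + 1, freq="D")
train_mask = (idx != test_date) & ~idx.isin(embargo_dates)
test_mask = idx == test_date

print("train days:", [d.day for d in idx[train_mask]])
print("test days: ", [d.day for d in idx[test_mask]])
```

Days 8 to 10 remain in the training mask even though they come after the test date, so this scheme behaves like a leave-one-date-out split with an embargo rather than a strictly forward-only split.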

## 🔹 4. Custom F1 Metric
We define an **F1-score** metric compatible with TensorFlow/Keras.

In [49]:
# =======================================
# CUSTOM F1 METRIC
# =======================================

def f1_score_metric(y_true, y_pred):
    y_true = tf.cast(y_true, tf.float32)
    y_pred = tf.round(tf.cast(y_pred, tf.float32))
    tp = tf.reduce_sum(y_true * y_pred)
    fp = tf.reduce_sum((1 - y_true) * y_pred)
    fn = tf.reduce_sum(y_true * (1 - y_pred))
    precision = tp / (tp + fp + tf.keras.backend.epsilon())
    recall = tp / (tp + fn + tf.keras.backend.epsilon())
    return 2 * (precision * recall) / (precision + recall + tf.keras.backend.epsilon())
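
The same formula can be checked by hand with NumPy. The sketch below is not part of the training code; `f1_numpy` and `EPS` are names introduced here for illustration, and `EPS` is assumed to mirror the usual default of `tf.keras.backend.epsilon()`. It also shows one caveat: Keras evaluates a function metric batch by batch and reports a running average, so the value logged during training can differ from the F1 computed over the whole set.

```python
import numpy as np

EPS = 1e-7  # assumption: mirrors the usual default of tf.keras.backend.epsilon()


def f1_numpy(y_true, y_prob):
    """NumPy mirror of the metric: round probabilities, then count TP, FP and FN."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.round(np.asarray(y_prob, dtype=np.float64))
    tp = np.sum(y_true * y_pred)
    fp = np.sum((1 - y_true) * y_pred)
    fn = np.sum(y_true * (1 - y_pred))
    precision = tp / (tp + fp + EPS)
    recall = tp / (tp + fn + EPS)
    return 2 * precision * recall / (precision + recall + EPS)


y_true = [1, 1, 0, 0, 1]
y_prob = [0.9, 0.2, 0.7, 0.1, 0.8]

# F1 over the whole set (TP=2, FP=1, FN=1)
f1_full = f1_numpy(y_true, y_prob)
# Mean of the per-batch F1 for two batches of sizes 3 and 2
f1_batches = np.mean([f1_numpy(y_true[:3], y_prob[:3]),
                      f1_numpy(y_true[3:], y_prob[3:])])
print(round(f1_full, 4), round(f1_batches, 4))
```

Here the full-set F1 is about 0.667 while the batch average is 0.75, which is why the final evaluation relies on scikit-learn's `f1_score` over all predictions.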

## 🔹 5. Sequence Creation

Function that creates sliding windows over multivariate time series.

In [50]:
# =======================================
# SEQUENCE CREATION
# =======================================

def create_windows_multivariate_np(data, target, window_size, horizon=1, shuffle=False):
    if isinstance(data, pd.DataFrame):
        data = data.values
    if isinstance(target, (pd.DataFrame, pd.Series)):
        target = target.values

    X, y = [], []
    for i in range(len(data) - window_size - horizon + 1):
        X.append(data[i:i+window_size, :])
        y.append(target[i+window_size+horizon-1])

    X, y = np.array(X), np.array(y)

    if shuffle:
        idx = np.arange(X.shape[0])
        np.random.shuffle(idx)
        X, y = X[idx], y[idx]

    return X, y
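
The indexing convention (`X[i]` covers rows `i` to `i + window_size - 1`, and `y[i]` is the label `horizon` steps after the window ends) is easier to verify on toy data. The sketch below uses a vectorized alternative based on NumPy's `sliding_window_view`; the helper name `make_windows` is an assumption for this example and is not used elsewhere in the notebook.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(data, target, window_size, horizon=1):
    """Vectorized windows: X[i] = data[i:i+window_size], y[i] = target[i+window_size+horizon-1]."""
    n_samples = len(data) - window_size - horizon + 1
    # sliding_window_view appends the window axis last: (n_windows, n_features, window_size)
    windows = sliding_window_view(data, window_size, axis=0)
    X = windows[:n_samples].transpose(0, 2, 1)  # -> (n_samples, window_size, n_features)
    y = target[window_size + horizon - 1:]
    return X, y


data = np.arange(20).reshape(10, 2)  # 10 time steps, 2 features
target = np.arange(10)               # the label at step t is simply t
X, y = make_windows(data, target, window_size=3, horizon=1)
print(X.shape, y.shape)
print(X[0].tolist(), y[0])
```

The result is a read-only view of `data`, so copy it (for example with `X.copy()`) before modifying it in place.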

## 🔹 6. Base Models

We define several architectures:
- **LSTM**
- **GRU**
- **CNN+LSTM**
- **CNN+GRU**

In [51]:
# =======================================
# BASE MODELS
# =======================================

def build_lstm_simple(input_shape):
    model = Sequential([
        LSTM(64, input_shape=input_shape, return_sequences=False),
        Dropout(0.2),
        Dense(32, activation="relu"),
        Dropout(0.2),
        Dense(1, activation="sigmoid")
    ])
    model.compile(optimizer=Adam(0.001), loss="binary_crossentropy", metrics=[f1_score_metric])
    return model

def build_gru_simple(input_shape):
    model = Sequential([
        GRU(64, input_shape=input_shape, return_sequences=False),
        Dropout(0.2),
        Dense(32, activation="relu"),
        Dropout(0.2),
        Dense(1, activation="sigmoid")
    ])
    model.compile(optimizer=Adam(0.001), loss="binary_crossentropy", metrics=[f1_score_metric])
    return model

def build_lstm_cnn_simple(input_shape):
    model = Sequential([
        Conv1D(32, kernel_size=3, activation="relu", input_shape=input_shape),
        MaxPooling1D(pool_size=2),
        LSTM(32, return_sequences=False),
        Dropout(0.2),
        Dense(16, activation="relu"),
        Dense(1, activation="sigmoid")
    ])
    model.compile(optimizer=Adam(0.001), loss="binary_crossentropy", metrics=[f1_score_metric])
    return model

def build_gru_cnn_simple(input_shape):
    model = Sequential([
        Conv1D(32, kernel_size=3, activation="relu", input_shape=input_shape),
        MaxPooling1D(pool_size=2),
        GRU(32, return_sequences=False),
        Dropout(0.2),
        Dense(16, activation="relu"),
        Dense(1, activation="sigmoid")
    ])
    model.compile(optimizer=Adam(0.001), loss="binary_crossentropy", metrics=[f1_score_metric])
    return model
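
In the two CNN hybrids, the convolution and pooling layers shorten the sequence before it reaches the recurrent layer. The arithmetic below is a small sketch assuming the Keras defaults used above (`padding="valid"`, convolution stride 1, pooling stride equal to `pool_size`); the helper names are introduced here for illustration only.

```python
def conv1d_valid_length(length, kernel_size, stride=1):
    """Output length of a 1-D convolution with 'valid' padding."""
    return (length - kernel_size) // stride + 1


def maxpool1d_valid_length(length, pool_size, stride=None):
    """Output length of 1-D max pooling with 'valid' padding (stride defaults to pool_size)."""
    stride = pool_size if stride is None else stride
    return (length - pool_size) // stride + 1


for window_size in (14, 30):
    after_conv = conv1d_valid_length(window_size, kernel_size=3)
    after_pool = maxpool1d_valid_length(after_conv, pool_size=2)
    print(f"window={window_size}: after Conv1D={after_conv}, after MaxPooling1D={after_pool}")
```

With a 14-step window the LSTM or GRU therefore sees only 6 time steps, which is worth keeping in mind when choosing `window_size`.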

## 🔹 7. New Models

We also include more recent architectures:
- **Transformer Encoder**
- **Temporal Convolutional Network (TCN)**

In [52]:
# =======================================
# NEW MODELS (Transformer + TCN)
# =======================================

def build_transformer_encoder(input_shape, num_heads=4, ff_dim=64):
    inputs = Input(shape=input_shape)
    x = MultiHeadAttention(num_heads=num_heads, key_dim=input_shape[-1])(inputs, inputs)
    x = LayerNormalization(epsilon=1e-6)(x)
    x = Dense(ff_dim, activation="relu")(x)
    x = Flatten()(x)
    x = Dropout(0.3)(x)
    outputs = Dense(1, activation="sigmoid")(x)
    model = Model(inputs, outputs)
    model.compile(optimizer=Adam(0.001), loss="binary_crossentropy", metrics=[f1_score_metric])
    return model

def build_tcn_simple(input_shape, nb_filters=32, kernel_size=3, nb_stacks=1, dilations=[1,2,4,8]):
    from tcn import TCN
    model = Sequential([
        TCN(nb_filters=nb_filters, kernel_size=kernel_size, dilations=dilations,
            nb_stacks=nb_stacks, dropout_rate=0.2, return_sequences=False,
            input_shape=input_shape),
        Dense(32, activation="relu"),
        Dropout(0.2),
        Dense(1, activation="sigmoid")
    ])
    model.compile(optimizer=Adam(0.001), loss="binary_crossentropy", metrics=[f1_score_metric])
    return model
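
A useful sanity check for the TCN is how many past time steps its output can see. The sketch below assumes the receptive-field formula given in the keras-tcn documentation, `1 + 2 * (kernel_size - 1) * nb_stacks * sum(dilations)`, where the factor 2 reflects two dilated convolutions per residual block; treat it as an estimate rather than a guarantee for every library version.

```python
def tcn_receptive_field(kernel_size, nb_stacks, dilations):
    """Receptive field, assuming two dilated causal convolutions per residual block."""
    return 1 + 2 * (kernel_size - 1) * nb_stacks * sum(dilations)


# Default arguments of build_tcn_simple
rf = tcn_receptive_field(kernel_size=3, nb_stacks=1, dilations=[1, 2, 4, 8])
print(f"receptive field: {rf} time steps")
```

Under that assumption the default configuration covers 61 steps, more than the 14-step or 30-step windows used in this notebook, so fewer dilations would probably be enough.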

## 🔹 8. Metric Evaluation

Function that computes classification metrics and financial metrics (return and Sharpe ratio).

In [53]:
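# =======================================
# METRIC EVALUATION
# =======================================

def evaluate_metrics(y_true, y_pred, returns=None):
    # Flatten to 1-D: model.predict returns shape (n, 1), which would
    # otherwise broadcast against `returns` (n,) into an (n, n) array
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    metrics = {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
        "mcc": matthews_corrcoef(y_true, y_pred)
    }
    if returns is not None:
        returns = np.asarray(returns).ravel()
        # Long when the prediction is 1, short when it is 0
        strat_returns = np.where(y_pred==1, returns, -returns)
        metrics["cum_return"] = strat_returns.cumsum()[-1]
        # Per-period Sharpe ratio (not annualized)
        metrics["sharpe"] = strat_returns.mean() / (strat_returns.std() + 1e-8)
    return metrics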
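
The financial metrics treat each prediction as a position: long when the model predicts 1 and short when it predicts 0. The sketch below works through that rule with made-up returns. It also shows why the predictions should be one-dimensional: `model.predict` returns a column vector of shape `(n, 1)`, which broadcasts against `(n,)` returns into an `(n, n)` matrix.

```python
import numpy as np

y_pred = np.array([1, 0, 1, 1])                 # 1 = long, 0 = short
returns = np.array([0.01, -0.02, -0.01, 0.03])  # made-up per-period returns

# A short position earns the negative of the period return
strat_returns = np.where(y_pred == 1, returns, -returns)
cum_return = strat_returns.sum()
sharpe = strat_returns.mean() / (strat_returns.std() + 1e-8)  # per period, not annualized
print(strat_returns, round(cum_return, 4), round(sharpe, 4))

# Shape pitfall: a column vector of predictions broadcasts to an (n, n) matrix
y_pred_col = y_pred.reshape(-1, 1)
broadcast_shape = np.where(y_pred_col == 1, returns, -returns).shape
print(broadcast_shape)
```

Note that a fold with a single validation sample has zero standard deviation, so its Sharpe ratio is not meaningful.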


## 🔹 9. *Feature* Importance

We evaluate feature importance through **permutation** and visualizations.

In [54]:
# =======================================
# FEATURE IMPORTANCE
# =======================================

def permutation_importance_seq(model, X_val, y_val, features, n_repeats=5):
    baseline_acc = accuracy_score(y_val, (model.predict(X_val) > 0.5).astype(int))
    importances = {}
    for j, feat in enumerate(features):
        scores = []
        X_val_permuted = X_val.copy()
        for _ in range(n_repeats):
            np.random.shuffle(X_val_permuted[:, :, j])
            acc = accuracy_score(y_val, (model.predict(X_val_permuted) > 0.5).astype(int))
            scores.append(baseline_acc - acc)
        importances[feat] = np.mean(scores)
    return pd.Series(importances).sort_values(ascending=False)
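
Permutation importance measures how much accuracy drops when one feature is shuffled across samples while the others stay fixed. The following self-contained sketch uses a hypothetical stand-in model (`MeanSignModel`) that only looks at feature 0, so the expected outcome is known in advance; it illustrates the idea and is not a replacement for the function above.

```python
import numpy as np


class MeanSignModel:
    """Hypothetical stand-in for a trained network: it uses only feature 0."""

    def predict(self, X):
        # Pseudo-probability: 1.0 when the window mean of feature 0 is positive
        return (X[:, :, 0].mean(axis=1) > 0).astype(float).reshape(-1, 1)


def permutation_drop(model, X, y, n_features, n_repeats=5, seed=0):
    """Mean accuracy drop per feature after shuffling that feature across samples."""
    rng = np.random.default_rng(seed)
    baseline = np.mean((model.predict(X).ravel() > 0.5) == y)
    drops = []
    for j in range(n_features):
        scores = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle feature j across samples, keeping each window intact
            X_perm[:, :, j] = X[rng.permutation(len(X)), :, j]
            acc = np.mean((model.predict(X_perm).ravel() > 0.5) == y)
            scores.append(baseline - acc)
        drops.append(float(np.mean(scores)))
    return drops


rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5, 2))  # 200 windows, 5 steps, 2 features
y = X[:, :, 0].mean(axis=1) > 0   # the label depends on feature 0 only
drops = permutation_drop(MeanSignModel(), X, y, n_features=2)
print(drops)
```

Shuffling feature 0 pushes accuracy towards chance, while shuffling feature 1 changes nothing. The estimate needs enough validation samples: with a single sample per fold, a permutation leaves the data unchanged and every importance is zero.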


## 🔹 10. Metrics Summary

Functions that summarize and visualize the model results.

In [55]:
# =======================================
# METRICS SUMMARY
# =======================================

def summarize_results(metrics_df):
    summary = metrics_df.groupby("model").agg(["mean", "std"])
    summary = summary.sort_values(("f1", "mean"), ascending=False)
    return summary.round(4)

def aggregate_feature_importances(feature_importances_all, top_n=10):
    agg_results = {}
    for model_name in set([d["model"] for d in feature_importances_all]):
        imps = [d["importances"] for d in feature_importances_all if d["model"] == model_name]
        if imps:
            imp_mean = pd.concat(imps, axis=1).mean(axis=1).sort_values(ascending=False)
            agg_results[model_name] = imp_mean.head(top_n)
            plt.figure(figsize=(8,5))
            imp_mean.head(top_n).plot(kind="barh")
            plt.gca().invert_yaxis()
            plt.title(f"Top-{top_n} Features ({model_name})")
            plt.show()
    return agg_results

def plot_feature_importance_heatmap(feature_importances_all, top_n=10):
    agg_results = aggregate_feature_importances(feature_importances_all, top_n)
    df_heatmap = pd.DataFrame(agg_results).fillna(0)
    plt.figure(figsize=(10,6))
    sns.heatmap(df_heatmap, annot=True, fmt=".3f", cmap="YlOrBr")
    plt.title(f"Feature vs. Model Heatmap (Top-{top_n})")
    plt.show()
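
The summary relies on a pandas group-by that produces two-level columns such as `("f1", "mean")`. The toy table below, with invented model names and scores, sketches what `summarize_results` receives and returns.

```python
import pandas as pd

# Invented per-fold metrics for two hypothetical models
metrics_df = pd.DataFrame({
    "model": ["A", "A", "B", "B"],
    "fold": [1, 2, 1, 2],
    "f1": [0.50, 0.60, 0.70, 0.80],
    "accuracy": [0.55, 0.65, 0.60, 0.70],
})

# Mean and standard deviation per model, best mean F1 first
summary = metrics_df.groupby("model").agg(["mean", "std"])
summary = summary.sort_values(("f1", "mean"), ascending=False).round(4)
print(summary["f1"])
```

Because `fold` is numeric it is aggregated as well; dropping it before the group-by keeps the table focused on the metrics.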


## 🔹 11. *Walk-Forward Pipeline*
*Pipeline* that trains multiple models, evaluates metrics, and computes feature importance.

In [56]:
# =======================================
# WALK-FORWARD PIPELINE
# =======================================

#def run_pipeline_walkforward(df, target_col="Target", return_col="Return",
#                             window_size=30, horizon=1, n_splits=6, scaler_type="robust"):
#    embargo = window_size + horizon - 1
#    test_size = int(len(df) * 0.2)
#
#    df_trainval = df.iloc[:-test_size]
#    features = [c for c in df.columns if c not in [target_col, return_col]]
#    print(f"\n📌 Usando todas las features: {features}")
#
#    X_all = df_trainval[features].values
#    y_all = df_trainval[target_col].values
#    returns_all = df_trainval[return_col].values
#    X_all_seq, y_all_seq = create_windows_multivariate_np(X_all, y_all, window_size, horizon)
#    ret_seq = returns_all[window_size+horizon-1:]
#
#    splitter = PurgedTimeSeriesSplit(n_splits=n_splits, embargo=embargo, test_size=0.2)
#    metrics_list, feature_importances_all = [], []
#
#    models = {
#        "LSTM": build_lstm_simple,
#        "GRU": build_gru_simple,
#        "LSTM+CNN": build_lstm_cnn_simple,
#        "GRU+CNN": build_gru_cnn_simple
#    }
#
#    for fold, (train_idx, val_idx) in enumerate(splitter.split(X_all_seq)):
#        print(f"\n=== Fold {fold+1}/{n_splits} ===")
#        X_train_raw, y_train = X_all_seq[train_idx], y_all_seq[train_idx]
#        X_val_raw, y_val = X_all_seq[val_idx], y_all_seq[val_idx]
#        r_val = ret_seq[val_idx]
#
#        scaler = RobustScaler() if scaler_type=="robust" else StandardScaler()
#        n_samples, seq_len, n_features = X_train_raw.shape
#        X_train_scaled = scaler.fit_transform(X_train_raw.reshape(-1, n_features)).reshape(n_samples, seq_len, n_features)
#        X_val_scaled = scaler.transform(X_val_raw.reshape(-1, n_features)).reshape(X_val_raw.shape[0], seq_len, n_features)
#
#        for name, fn in models.items():
#            print(f"\n--- Modelo: {name} ---")
#            model = fn(X_train_scaled.shape[1:])
#            es = EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)
#            rlrop = ReduceLROnPlateau(monitor="val_loss", patience=3, factor=0.5, min_lr=1e-5)
#            ckpt = ModelCheckpoint(f"best_{name}_fold{fold+1}.keras", save_best_only=True)
#
#            model.fit(X_train_scaled, y_train,
#                      validation_data=(X_val_scaled, y_val),
#                      epochs=50, batch_size=32,
#                      callbacks=[es, rlrop, ckpt], verbose=0)
#
#            y_pred_val = (model.predict(X_val_scaled) > 0.5).astype("int32")
#            metrics_val = evaluate_metrics(y_val, y_pred_val, returns=r_val)
#            metrics_val.update({"fold": fold+1, "model": name})
#            metrics_list.append(metrics_val)
#
#            imp = permutation_importance_seq(model, X_val_scaled.copy(), y_val, features)
#            feature_importances_all.append({"fold": fold+1, "model": name, "importances": imp})
#
#    return pd.DataFrame(metrics_list), feature_importances_all
#

# =======================================
# WALK-FORWARD PIPELINE USING PURGEDCV
# =======================================
def run_pipeline_walkforward(df, target_col="Target", return_col="Return",
                             window_size=30, horizon=1, embargo_size=5,
                             scaler_type="robust"):

    # Hold out a final block as the test set (out-of-sample)
    test_size = int(len(df) * 0.2)
    df_trainval = df.iloc[:-test_size]

    features = [c for c in df.columns if c not in [target_col, return_col]]
    print(f"\n📌 Using all features: {features}")

    # -----------------------------
    # Create sequences
    # -----------------------------
    X_all = df_trainval[features].values
    y_all = df_trainval[target_col].values
    returns_all = df_trainval[return_col].values

    X_all_seq, y_all_seq = create_windows_multivariate_np(
        X_all, y_all, window_size, horizon
    )
    ret_seq = returns_all[window_size+horizon-1:]

    # Time index for the sequences (the date of each label)
    seq_index = df_trainval.index[window_size+horizon-1:]
    events = pd.Series(y_all_seq, index=seq_index)

    # -----------------------------
    # PurgedCV splitter
    # -----------------------------
    purged_cv = PurgedCV(embargo_size=embargo_size)
    metrics_list, feature_importances_all = [], []

    # Models to evaluate
    models = {
        "LSTM": build_lstm_simple,
        "GRU": build_gru_simple,
        "LSTM+CNN": build_lstm_cnn_simple,
        "GRU+CNN": build_gru_cnn_simple
    }

    # -----------------------------
    # Cross-validation loop
    # -----------------------------
    for fold, (train_mask, val_mask) in enumerate(purged_cv.split(X_all_seq, events)):
        train_idx = np.where(train_mask)[0]
        val_idx = np.where(val_mask)[0]

        if len(val_idx) == 0 or len(train_idx) == 0:
            continue  # skip empty folds

        print(f"\n=== Fold {fold+1} ===")
        X_train_raw, y_train = X_all_seq[train_idx], y_all_seq[train_idx]
        X_val_raw, y_val = X_all_seq[val_idx], y_all_seq[val_idx]
        r_val = ret_seq[val_idx]

        # -----------------------------
        # Scaling (fit on the training fold only)
        # -----------------------------
        scaler = RobustScaler() if scaler_type == "robust" else StandardScaler()
        n_samples, seq_len, n_features = X_train_raw.shape
        X_train_scaled = scaler.fit_transform(
            X_train_raw.reshape(-1, n_features)
        ).reshape(n_samples, seq_len, n_features)
        X_val_scaled = scaler.transform(
            X_val_raw.reshape(-1, n_features)
        ).reshape(X_val_raw.shape[0], seq_len, n_features)

        # -----------------------------
        # Model training
        # -----------------------------
        for name, fn in models.items():
            print(f"\n--- Model: {name} ---")
            model = fn(X_train_scaled.shape[1:])

            es = EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)
            rlrop = ReduceLROnPlateau(monitor="val_loss", patience=3, factor=0.5, min_lr=1e-5)
            ckpt = ModelCheckpoint(f"best_{name}_fold{fold+1}.keras", save_best_only=True)

            model.fit(
                X_train_scaled, y_train,
                validation_data=(X_val_scaled, y_val),
                epochs=50, batch_size=32,
                callbacks=[es, rlrop, ckpt], verbose=0
            )

            # -----------------------------
            # Evaluation
            # -----------------------------
            y_pred_val = (model.predict(X_val_scaled) > 0.5).astype("int32")
            metrics_val = evaluate_metrics(y_val, y_pred_val, returns=r_val)
            metrics_val.update({"fold": fold+1, "model": name})
            metrics_list.append(metrics_val)

            # Permutation feature importance
            imp = permutation_importance_seq(
                model, X_val_scaled.copy(), y_val, features
            )
            feature_importances_all.append(
                {"fold": fold+1, "model": name, "importances": imp}
            )

    return pd.DataFrame(metrics_list), feature_importances_all
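
The scaling step inside the pipeline flattens the 3-D tensor to 2-D, fits the scaler on the training fold only, and restores the original shape. The sketch below isolates that step with random data; the array sizes are arbitrary and chosen only for illustration.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

rng = np.random.default_rng(0)
X_train = rng.normal(loc=100.0, scale=20.0, size=(50, 14, 3))  # (samples, steps, features)
X_val = rng.normal(loc=100.0, scale=20.0, size=(10, 14, 3))

n_features = X_train.shape[-1]
scaler = RobustScaler()

# Flatten samples and time steps so that each column is one feature
X_train_scaled = scaler.fit_transform(X_train.reshape(-1, n_features)).reshape(X_train.shape)
# Reuse the training statistics; never refit on validation data
X_val_scaled = scaler.transform(X_val.reshape(-1, n_features)).reshape(X_val.shape)

print(X_train_scaled.shape, X_val_scaled.shape)
print(np.median(X_train_scaled.reshape(-1, n_features), axis=0))
```

Fitting on the training fold alone keeps validation statistics out of the model inputs, which is the same leakage concern that motivates the purged split.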


## 🔹 12. *Benchmarking*
We test the models.

In [None]:
# =======================================
# BENCHMARKING
# =======================================

# === Prepare the data ===
df_bitcoin["Open time"] = pd.to_datetime(df_bitcoin["Open time"])
df_bitcoin = df_bitcoin.set_index("Open time")

# Log return; the first row has no previous close, so it is set to 0
df_bitcoin["Return"] = np.log(df_bitcoin["Close"] / df_bitcoin["Close"].shift(1)).fillna(0)

# === Run the pipeline ===
results, features_importance_all = run_pipeline_walkforward(
    df_bitcoin,
    target_col="Target",
    return_col="Return",
    window_size=14,
    horizon=1
)

# === Formatted metrics table ===
print("\n📊 Final summary:")
summary = summarize_results(results)
print(summary)

# === Feature importance ===
agg_importances = aggregate_feature_importances(features_importance_all, top_n=10)

# === Comparative heatmap ===
plot_feature_importance_heatmap(features_importance_all, top_n=10)
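
The `Return` column holds log returns, which add up over time. The short sketch below uses the first four closing prices shown in the dataset preview to illustrate this property; it is an illustration only and is not needed to run the benchmark.

```python
import numpy as np
import pandas as pd

# First four closing prices from the dataset preview
close = pd.Series([4292.43, 4369.00, 4423.00, 4640.00])
log_ret = np.log(close / close.shift(1)).fillna(0)

# Log returns are additive: their sum equals log(last / first)
total = log_ret.sum()
print(log_ret.round(5).tolist())
print(round(total, 5), round(float(np.log(close.iloc[-1] / close.iloc[0])), 5))
```

This is why `cum_return` in `evaluate_metrics` can be read as a cumulative log return; applying `np.exp(total) - 1` converts it to a simple return.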


In [None]:
# =======================================
# IMPORTS
# =======================================
#import pandas as pd
#import numpy as np
#import matplotlib.pyplot as plt
#import seaborn as sns
#
#from sklearn.preprocessing import RobustScaler, StandardScaler
#from sklearn.metrics import (accuracy_score, precision_score, recall_score,
#                             f1_score, matthews_corrcoef)
#
#from sklearn.model_selection import BaseCrossValidator
#from sklearn.ensemble import RandomForestClassifier
#from boruta import BorutaPy
#
#import tensorflow as tf
#from tensorflow.keras.models import Sequential, Model
#from tensorflow.keras.layers import (Dense, Dropout, LSTM, GRU, Conv1D, MaxPooling1D,
#                                     Flatten, Input, LayerNormalization)
#from tensorflow.keras.layers import MultiHeadAttention
#from tensorflow.keras.optimizers import Adam
#from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint
#
## =======================================
## SPLIT PURGADO WALK-FORWARD
## =======================================
#class PurgedTimeSeriesSplit(BaseCrossValidator):
#    def __init__(self, n_splits=4, embargo=0, test_size=0.2):
#        self.n_splits = n_splits
#        self.embargo = embargo
#        self.test_size = test_size
#
#    def split(self, X, y=None, groups=None):
#        n_samples = len(X)
#        test_size = int(n_samples * self.test_size)
#        n_trainval = n_samples - test_size
#        fold_sizes = np.linspace(0.5, 1.0, self.n_splits+1)
#
#        for i in range(self.n_splits):
#            end = int(fold_sizes[i+1] * n_trainval)
#            val_size = int(0.15 * end)
#            val_start = end - val_size
#            val_end = end
#            train_end = max(0, val_start - self.embargo)
#            train_idx = np.arange(0, train_end)
#            val_idx = np.arange(val_start, val_end)
#
#            yield train_idx, val_idx
#
#    def get_n_splits(self, X=None, y=None, groups=None):
#        return self.n_splits
#
## =======================================
## MÉTRICA F1 PERSONALIZADA
## =======================================
#def f1_score_metric(y_true, y_pred):
#    y_true = tf.cast(y_true, tf.float32)
#    y_pred = tf.round(tf.cast(y_pred, tf.float32))
#    tp = tf.reduce_sum(y_true * y_pred)
#    fp = tf.reduce_sum((1 - y_true) * y_pred)
#    fn = tf.reduce_sum(y_true * (1 - y_pred))
#    precision = tp / (tp + fp + tf.keras.backend.epsilon())
#    recall = tp / (tp + fn + tf.keras.backend.epsilon())
#    return 2 * (precision * recall) / (precision + recall + tf.keras.backend.epsilon())
#
## =======================================
## CREACIÓN DE SECUENCIAS
## =======================================
#def create_windows_multivariate_np(data, target, window_size, horizon=1, shuffle=False):
#    if isinstance(data, pd.DataFrame):
#        data = data.values
#    if isinstance(target, (pd.DataFrame, pd.Series)):
#        target = target.values
#
#    X, y = [], []
#    for i in range(len(data) - window_size - horizon + 1):
#        X.append(data[i:i+window_size, :])
#        y.append(target[i+window_size+horizon-1])
#
#    X, y = np.array(X), np.array(y)
#
#    if shuffle:
#        idx = np.arange(X.shape[0])
#        np.random.shuffle(idx)
#        X, y = X[idx], y[idx]
#
#    return X, y
#
## =======================================
## MODELOS BASE
## =======================================
#def build_lstm_simple(input_shape):
#    model = Sequential([
#        LSTM(64, input_shape=input_shape, return_sequences=False),
#        Dropout(0.2),
#        Dense(32, activation="relu"),
#        Dropout(0.2),
#        Dense(1, activation="sigmoid")
#    ])
#    model.compile(optimizer=Adam(0.001), loss="binary_crossentropy", metrics=[f1_score_metric])
#    return model
#
#def build_gru_simple(input_shape):
#    model = Sequential([
#        GRU(64, input_shape=input_shape, return_sequences=False),
#        Dropout(0.2),
#        Dense(32, activation="relu"),
#        Dropout(0.2),
#        Dense(1, activation="sigmoid")
#    ])
#    model.compile(optimizer=Adam(0.001), loss="binary_crossentropy", metrics=[f1_score_metric])
#    return model
#
#def build_lstm_cnn_simple(input_shape):
#    model = Sequential([
#        Conv1D(32, kernel_size=3, activation="relu", input_shape=input_shape),
#        MaxPooling1D(pool_size=2),
#        LSTM(32, return_sequences=False),
#        Dropout(0.2),
#        Dense(16, activation="relu"),
#        Dense(1, activation="sigmoid")
#    ])
#    model.compile(optimizer=Adam(0.001), loss="binary_crossentropy", metrics=[f1_score_metric])
#    return model
#
#def build_gru_cnn_simple(input_shape):
#    model = Sequential([
#        Conv1D(32, kernel_size=3, activation="relu", input_shape=input_shape),
#        MaxPooling1D(pool_size=2),
#        GRU(32, return_sequences=False),
#        Dropout(0.2),
#        Dense(16, activation="relu"),
#        Dense(1, activation="sigmoid")
#    ])
#    model.compile(optimizer=Adam(0.001), loss="binary_crossentropy", metrics=[f1_score_metric])
#    return model
#
## =======================================
## NUEVOS MODELOS (TRANSFORMER + TCN)
## =======================================
#def build_transformer_encoder(input_shape, num_heads=4, ff_dim=64):
#    inputs = Input(shape=input_shape)
#    x = MultiHeadAttention(num_heads=num_heads, key_dim=input_shape[-1])(inputs, inputs)
#    x = LayerNormalization(epsilon=1e-6)(x)
#    x = Dense(ff_dim, activation="relu")(x)
#    x = Flatten()(x)
#    x = Dropout(0.3)(x)
#    outputs = Dense(1, activation="sigmoid")(x)
#    model = Model(inputs, outputs)
#    model.compile(optimizer=Adam(0.001), loss="binary_crossentropy", metrics=[f1_score_metric])
#    return model
#
#def build_tcn_simple(input_shape, nb_filters=32, kernel_size=3, nb_stacks=1, dilations=(1, 2, 4, 8)):
#    from tcn import TCN
#    model = Sequential([
#        TCN(nb_filters=nb_filters, kernel_size=kernel_size, dilations=dilations,
#            nb_stacks=nb_stacks, dropout_rate=0.2, return_sequences=False,
#            input_shape=input_shape),
#        Dense(32, activation="relu"),
#        Dropout(0.2),
#        Dense(1, activation="sigmoid")
#    ])
#    model.compile(optimizer=Adam(0.001), loss="binary_crossentropy", metrics=[f1_score_metric])
#    return model
#
## =======================================
## EVALUATION
## =======================================
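#def evaluate_metrics(y_true, y_pred, returns=None):
#    # Flatten to 1-D: Keras predict() returns shape (n, 1), which would
#    # broadcast against 1-D returns into an (n, n) array in np.where below.
#    y_true = np.asarray(y_true).ravel()
#    y_pred = np.asarray(y_pred).ravel()
#    metrics = {
#        "accuracy": accuracy_score(y_true, y_pred),
#        "precision": precision_score(y_true, y_pred, zero_division=0),
#        "recall": recall_score(y_true, y_pred, zero_division=0),
#        "f1": f1_score(y_true, y_pred, zero_division=0),
#        "mcc": matthews_corrcoef(y_true, y_pred)
#    }
#    if returns is not None:
#        returns = np.asarray(returns).ravel()
#        # Long when the prediction is 1, short when it is 0.
#        strat_returns = np.where(y_pred == 1, returns, -returns)
#        # Simple (non-compounded) sum of per-period strategy returns.
#        metrics["cum_return"] = strat_returns.sum()
#        # Per-period Sharpe ratio, not annualized; epsilon avoids division by zero.
#        metrics["sharpe"] = strat_returns.mean() / (strat_returns.std() + 1e-8)
#    return metrics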
#
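# Illustrative sketch (not part of the original pipeline): the long/short return
# rule used by evaluate_metrics above, applied to made-up numbers. The helper
# name `strategy_returns_sketch` and the sample values are assumptions chosen
# only for illustration.
import numpy as np

def strategy_returns_sketch(y_pred, returns):
    """Go long (+return) when the prediction is 1, short (-return) when it is 0."""
    # Both inputs are flattened so that an (n, 1) prediction array lines up
    # element-wise with 1-D returns instead of broadcasting to (n, n).
    y_pred = np.asarray(y_pred).ravel()
    returns = np.asarray(returns, dtype=float).ravel()
    return np.where(y_pred == 1, returns, -returns)

# Usage: predictions [1, 0, 1] against returns [0.02, -0.01, 0.03] earn
# +0.02, +0.01 and +0.03, because the short position profits from the -0.01 move.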
## =======================================
## FEATURE IMPORTANCE
## =======================================
#def permutation_importance_seq(model, X_val, y_val, features, n_repeats=5):
#    # Importance of a feature = mean drop in accuracy after shuffling that
#    # feature across samples (all timesteps of a window move together).
#    baseline_acc = accuracy_score(y_val, (model.predict(X_val, verbose=0) > 0.5).astype(int).ravel())
#    importances = {}
#    for j, feat in enumerate(features):
#        scores = []
#        X_val_permuted = X_val.copy()
#        for _ in range(n_repeats):
#            # The slice is a view, so this shuffles in place along the sample axis.
#            np.random.shuffle(X_val_permuted[:, :, j])
#            acc = accuracy_score(y_val, (model.predict(X_val_permuted, verbose=0) > 0.5).astype(int).ravel())
#            scores.append(baseline_acc - acc)
#        importances[feat] = np.mean(scores)
#    return pd.Series(importances).sort_values(ascending=False)
#
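# Illustrative sketch (not part of the original pipeline): permutation importance
# for 3-D sequence inputs, written against a generic `predict_fn` rather than a
# Keras model so it can be tried without training anything. The names
# `permutation_importance_sketch` and `predict_fn` are assumptions. Unlike the
# function above, it draws a fresh permutation from the untouched array on every
# repeat and uses a seeded generator for reproducibility.
import numpy as np

def permutation_importance_sketch(predict_fn, X, y, n_repeats=5, seed=0):
    """Return the mean accuracy drop per feature for X of shape (n, timesteps, features)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y).ravel()
    baseline = np.mean(predict_fn(X) == y)
    drops = np.zeros(X.shape[2])
    for j in range(X.shape[2]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Reassign feature j across samples; each window's timesteps stay together.
            X_perm[:, :, j] = X[rng.permutation(len(X)), :, j]
            drops[j] += baseline - np.mean(predict_fn(X_perm) == y)
    return drops / n_repeats

# Usage: with a predictor that only reads feature 0, shuffling feature 1 leaves
# accuracy unchanged, so its importance is exactly zero.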
## =======================================
## METRICS SUMMARY
## =======================================
#def summarize_results(metrics_df):
#    summary = metrics_df.groupby("model").agg(["mean", "std"])
#    # sort by mean F1, best first
#    summary = summary.sort_values(("f1", "mean"), ascending=False)
#    return summary.round(4)
#
#def aggregate_feature_importances(feature_importances_all, top_n=10):
#    agg_results = {}
#    for model_name in sorted({d["model"] for d in feature_importances_all}):
#        imps = [d["importances"] for d in feature_importances_all if d["model"] == model_name]
#        if imps:
#            imp_mean = pd.concat(imps, axis=1).mean(axis=1).sort_values(ascending=False)
#            agg_results[model_name] = imp_mean.head(top_n)
#            plt.figure(figsize=(8,5))
#            imp_mean.head(top_n).plot(kind="barh")
#            plt.gca().invert_yaxis()
#            plt.title(f"Top-{top_n} Features ({model_name})")
#            plt.show()
#    return agg_results
#
#def plot_feature_importance_heatmap(feature_importances_all, top_n=10):
#    agg_results = aggregate_feature_importances(feature_importances_all, top_n)
#    df_heatmap = pd.DataFrame(agg_results).fillna(0)
#    plt.figure(figsize=(10,6))
#    sns.heatmap(df_heatmap, annot=True, fmt=".3f", cmap="YlOrBr")
#    plt.title(f"Feature vs. Model Heatmap (Top-{top_n})")
#    plt.show()
#
## =======================================
## PIPELINE WALK-FORWARD
## =======================================
#def run_pipeline_walkforward(df, target_col="Target", return_col="Return",
#                             window_size=30, horizon=1, n_splits=6, scaler_type="robust"):
#    embargo = window_size + horizon - 1
#    test_size = int(len(df) * 0.2)
#
#    df_trainval = df.iloc[:-test_size]
#    features = [c for c in df.columns if c not in [target_col, return_col]]
#    print(f"\n📌 Using all features: {features}")
#
#    X_all = df_trainval[features].values
#    y_all = df_trainval[target_col].values
#    returns_all = df_trainval[return_col].values
#    X_all_seq, y_all_seq = create_windows_multivariate_np(X_all, y_all, window_size, horizon)
#    ret_seq = returns_all[window_size+horizon-1:]
#
#    splitter = PurgedTimeSeriesSplit(n_splits=n_splits, embargo=embargo, test_size=0.2)
#    metrics_list, feature_importances_all = [], []
#
#    models = {
#        "LSTM": build_lstm_simple,
#        "GRU": build_gru_simple,
#        "LSTM+CNN": build_lstm_cnn_simple,
#        "GRU+CNN": build_gru_cnn_simple,
#        "Transformer": build_transformer_encoder,
#        "TCN": build_tcn_simple
#    }
#
#    for fold, (train_idx, val_idx) in enumerate(splitter.split(X_all_seq)):
#        print(f"\n=== Fold {fold+1}/{n_splits} ===")
#        X_train_raw, y_train = X_all_seq[train_idx], y_all_seq[train_idx]
#        X_val_raw, y_val = X_all_seq[val_idx], y_all_seq[val_idx]
#        r_val = ret_seq[val_idx]
#
#        scaler = RobustScaler() if scaler_type=="robust" else StandardScaler()
#        n_samples, seq_len, n_features = X_train_raw.shape
#        X_train_scaled = scaler.fit_transform(X_train_raw.reshape(-1, n_features)).reshape(n_samples, seq_len, n_features)
#        X_val_scaled = scaler.transform(X_val_raw.reshape(-1, n_features)).reshape(X_val_raw.shape[0], seq_len, n_features)
#
#        for name, fn in models.items():
#            print(f"\n--- Model: {name} ---")
#            model = fn(X_train_scaled.shape[1:])
#            es = EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)
#            rlrop = ReduceLROnPlateau(monitor="val_loss", patience=3, factor=0.5, min_lr=1e-5)
#            ckpt = ModelCheckpoint(f"best_{name}_fold{fold+1}.keras", save_best_only=True)
#
#            model.fit(X_train_scaled, y_train,
#                      validation_data=(X_val_scaled, y_val),
#                      epochs=50, batch_size=32,
#                      callbacks=[es, rlrop, ckpt], verbose=0)
#
#            # ravel() turns the (n, 1) Keras output into 1-D so it aligns with r_val.
#            y_pred_val = (model.predict(X_val_scaled) > 0.5).astype("int32").ravel()
#            metrics_val = evaluate_metrics(y_val, y_pred_val, returns=r_val)
#            metrics_val.update({"fold": fold+1, "model": name})
#            metrics_list.append(metrics_val)
#
#            imp = permutation_importance_seq(model, X_val_scaled.copy(), y_val, features)
#            feature_importances_all.append({"fold": fold+1, "model": name, "importances": imp})
#
#    return pd.DataFrame(metrics_list), feature_importances_all
#
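# Illustrative sketch (not part of the original pipeline): the per-fold scaling
# step in run_pipeline_walkforward above flattens (samples, timesteps, features)
# to 2-D, fits the scaler on the training fold only, and applies the same
# statistics to the validation fold. This version uses plain NumPy
# standardization as a stand-in for the scikit-learn scalers; the name
# `scale_windows_sketch` is an assumption.
import numpy as np

def scale_windows_sketch(X_train, X_val):
    """Standardize 3-D windows per feature using training-fold statistics only."""
    n_features = X_train.shape[-1]
    flat_train = X_train.reshape(-1, n_features)  # (samples * timesteps, features)
    mean = flat_train.mean(axis=0)
    std = flat_train.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled instead of dividing by zero
    # Broadcasting over the last axis applies the per-feature statistics to every
    # timestep; the validation fold never contributes to mean or std.
    return (X_train - mean) / std, (X_val - mean) / std

# Usage: X_train_scaled, X_val_scaled = scale_windows_sketch(X_train_raw, X_val_raw)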