# HR Overtime Prediction and Forecasting
## Predicción y Pronóstico de Horas Extra de RRHH

Este notebook implementa un pipeline automatizado para el pronóstico de horas extra por departamento, utilizando datos históricos almacenados en SQL Server. El flujo incluye:

- **Carga y limpieza de datos:** Importación desde SQL Server, agregación quincenal, interpolación de valores faltantes y detección/corrección de outliers con el filtro de Hampel.
- **Análisis exploratorio y descomposición:** Selección automática del tipo de descomposición (aditiva o multiplicativa) según la varianza de los residuos y análisis de estacionariedad.
- **Entrenamiento y comparación de modelos:** Se entrenan y comparan cuatro modelos de forecasting:
  - ARIMA (selección automática de parámetros)
  - Prophet (con modo de estacionalidad dinámico y silenciamiento de outputs)
  - Holt-Winters (Suavización Exponencial)
  - XGBoost (con variables de rezago y codificación seno/coseno del mes para capturar estacionalidad)
- **Selección automática del mejor modelo:** Basada en métricas de validación cruzada (RMSE, MAPE, etc.), con lógica para evitar modelos de línea plana.
- **Generación de pronósticos:** Predicción de las próximas 12 quincenas (6 meses) con intervalos de confianza, usando solo el mejor modelo por departamento.
- **Visualización interactiva:** Gráficas con Plotly para históricos, descomposición y pronóstico.
- **Trazabilidad y auditoría:** Almacenamiento de modelos, métricas y predicciones en SQL Server para seguimiento y control.
- **Outputs limpios:** El notebook suprime mensajes innecesarios, especialmente de Prophet, y utiliza logging para trazabilidad.

**Objetivo:** Proveer pronósticos robustos y auditables de horas extra, minimizando la intervención manual y maximizando la calidad y trazabilidad del proceso analítico.

## 1. Importar Librerías y Configuración

Importamos las librerías necesarias para análisis, modelado, visualización y conexión con la base de datos. Configuramos el logging para trazabilidad y suprimimos advertencias innecesarias para mantener un output limpio.

In [56]:
import pandas as pd
import numpy as np
import pymssql
import logging
import datetime
from datetime import datetime
import os
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from statsmodels.tsa.stattools import adfuller
from pmdarima import auto_arima
from prophet import Prophet
from sklearn.metrics import mean_squared_error, mean_absolute_error, mean_absolute_percentage_error
from sklearn.model_selection import TimeSeriesSplit
from scipy.stats import median_abs_deviation
import joblib
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import warnings
from xgboost import XGBRegressor
import contextlib
import sys
import io

# Configurar advertencias y logging
warnings.filterwarnings("ignore")
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)

## 2. Conectar a la Base de Datos
Establecemos una conexión segura a SQL Server para cargar datos históricos y almacenar resultados.

In [57]:
def get_db_connection():
    SQL_SERVER = "172.28.192.1:50121"
    SQL_DB = "HR_Analytics"
    SQL_USER = "sa"
    SQL_PASSWORD = "123456"
    try:
        conn = pymssql.connect(
            server=SQL_SERVER,
            database=SQL_DB,
            user=SQL_USER,
            password=SQL_PASSWORD
        )
        logging.info("Conexión a la base de datos exitosa")
        return conn
    except Exception as e:
        logging.error(f"Error de conexión a la base de datos: {e}")
        raise

## 3. Cargar y Procesar Datos Históricos

Cargamos los datos históricos con `work_date < 2025-05-01`, los agregamos mes a mes y manejamos datos faltantes con interpolación. Se aplica un filtrado de Hampel para detectar y corregir outliers. Finalmente, se realiza una descomposición de series de tiempo, seleccionando automáticamente el modelo (aditivo o multiplicativo) que mejor se ajuste a los datos según la varianza de los residuos.

In [58]:
# Switch para eliminar outliers (por defecto False)
REMOVE_OUTLIERS = False

# Switches para selección de modelo
AUTO_MODEL_SELECTION = True  # Si True, selecciona automáticamente el mejor modelo por métricas
USE_XGBOOST = False          # Si AUTO_MODEL_SELECTION es False y este es True, usa XGBoost para todos los departamentos
USE_ARIMA = False            # Si AUTO_MODEL_SELECTION es False y este es True, usa ARIMA para todos los departamentos
USE_PROPHET = False          # Si AUTO_MODEL_SELECTION es False y este es True, usa Prophet para todos los departamentos
USE_ES = False               # Si AUTO_MODEL_SELECTION es False y este es True, usa Exponential Smoothing para todos los departamentos

def load_data():
    try:
        conn = get_db_connection()
        query = "SELECT work_date, department, total_overtime FROM vw_historical_data WHERE work_date < '2025-05-01' ORDER BY work_date"
        df = pd.read_sql(query, conn)
        conn.close()
        logging.info(f"Datos cargados: {len(df)} registros")
        df['work_date'] = pd.to_datetime(df['work_date'])
        
        # Agrupar por mes (MoM) - ahora al inicio del mes
        df.set_index('work_date', inplace=True)
        df = df.groupby('department').resample('MS').sum(numeric_only=True).reset_index()  # 'MS' = Month Start
        df.rename(columns={'work_date': 'ds', 'total_overtime': 'y'}, inplace=True)
        
        return df
    except Exception as e:
        logging.error(f"Error al cargar datos: {e}")
        return pd.DataFrame()

def handle_outliers(df):
    df_cleaned = df.copy()
    for dept in df_cleaned['department'].unique():
        dept_data = df_cleaned[df_cleaned['department'] == dept]['y']
        
        # Filtrado de Hampel para detección de outliers
        median = dept_data.median()
        mad = median_abs_deviation(dept_data)
        threshold = 3 * mad
        
        lower_bound = median - threshold
        upper_bound = median + threshold
        
        outliers = dept_data[(dept_data < lower_bound) | (dept_data > upper_bound)]
        if not outliers.empty:
            logging.warning(f"Outliers detectados en {dept}:\n{df_cleaned[(df_cleaned['department'] == dept) & df_cleaned['y'].isin(outliers)]}")
            df_cleaned.loc[(df_cleaned['department'] == dept) & (df_cleaned['y'].isin(outliers)), 'y'] = np.nan
        
    df_cleaned['y'] = df_cleaned.groupby('department')['y'].transform(lambda x: x.interpolate(method='linear'))
    return df_cleaned

# Cargar datos y aplicar/remover outliers según el switch
df_all_data = load_data()
if REMOVE_OUTLIERS:
    df_all_data = handle_outliers(df_all_data)

logging.info(f"Resumen de datos:\nDepartamentos únicos: {df_all_data['department'].unique()}\nFechas únicas: {df_all_data['ds'].unique()}\nTotal registros: {len(df_all_data)}")


def select_decomposition_type(series):
    # Rellenar valores nulos con la media de la serie para evitar errores
    series_filled = series.fillna(series.mean())

    # Realizar el test de Dickey-Fuller para verificar estacionalidad
    adf_result = adfuller(series_filled)
    is_stationary = adf_result[1] <= 0.05

    # Descomposición aditiva
    result_add = seasonal_decompose(series_filled, model='additive', period=12)
    std_add = np.nanstd(result_add.resid)

    # Verificar si hay ceros o negativos para evitar error en multiplicativa
    if (series_filled <= 0).any():
        logging.warning("La serie contiene ceros o valores negativos. Solo se usará descomposición aditiva.")
        std_mul = np.inf
        decomposition_type = 'additive'
    else:
        # Descomposición multiplicativa solo si todos los valores son positivos
        try:
            result_mul = seasonal_decompose(series_filled, model='multiplicative', period=12)
            std_mul = np.nanstd(result_mul.resid)
            decomposition_type = 'aditive' if std_add < std_mul else 'multiplicative'
        except Exception as e:
            logging.warning(f"No se pudo calcular la descomposición multiplicativa: {e}")
            std_mul = np.inf
            decomposition_type = 'additive'

    # Limpiar prints: solo logging
    logging.info(f"Desviación estándar de los residuales aditivos: {std_add:.2f}")
    logging.info(f"Desviación estándar de los residuales multiplicativos: {std_mul if std_mul != np.inf else 'N/A'}")
    logging.info(f"P-value del test ADF: {adf_result[1]:.2f}")

    if not is_stationary and decomposition_type == 'additive':
        logging.info("La serie no es estacionaria y no tiene una tendencia clara. Se descarta ARIMA.")
        return decomposition_type, False

    logging.info(f"Criterio de selección: La desviación estándar de los residuos {decomposition_type} es menor.")
    return decomposition_type, True

def plot_decomposition(series, decomposition_type, dept):
    # Rellenar valores nulos con la media de la serie para una visualización completa
    series_filled = series.fillna(series.mean())
    result = seasonal_decompose(series_filled, model=decomposition_type, period=12)
    fig = make_subplots(rows=4, cols=1, subplot_titles=['Original', 'Tendencia', 'Estacionalidad', 'Residuos'])
    
    fig.add_trace(go.Scatter(x=series.index, y=series, mode='lines', name='Original'), row=1, col=1)
    fig.add_trace(go.Scatter(x=result.trend.index, y=result.trend, mode='lines', name='Tendencia'), row=2, col=1)
    fig.add_trace(go.Scatter(x=result.seasonal.index, y=result.seasonal, mode='lines', name='Estacionalidad'), row=3, col=1)
    fig.add_trace(go.Scatter(x=result.resid.index, y=result.resid, mode='lines', name='Residuos'), row=4, col=1)
    
    fig.update_layout(height=800, title_text=f"Descomposición de la Serie Temporal ({decomposition_type.capitalize()}) para {dept}")
    fig.show()

# Ejemplo de uso
for dept in df_all_data['department'].unique():
    dept_data = df_all_data[df_all_data['department'] == dept].set_index('ds')['y']
    decomposition_type, _ = select_decomposition_type(dept_data)
    logging.info(f"El departamento {dept} tiene una descomposición {decomposition_type}.")
    plot_decomposition(dept_data, decomposition_type, dept)

2025-08-14 17:13:15,506 - INFO - Conexión a la base de datos exitosa
2025-08-14 17:13:17,044 - INFO - Datos cargados: 1031 registros
2025-08-14 17:13:17,044 - INFO - Datos cargados: 1031 registros
2025-08-14 17:13:17,083 - INFO - Resumen de datos:
Departamentos únicos: ['Finance' 'HR' 'IT' 'Inventory' 'Marketing' 'Sales']
Fechas únicas: <DatetimeArray>
['2022-02-01 00:00:00', '2022-03-01 00:00:00', '2022-04-01 00:00:00',
 '2022-05-01 00:00:00', '2022-06-01 00:00:00', '2022-07-01 00:00:00',
 '2022-08-01 00:00:00', '2022-09-01 00:00:00', '2022-10-01 00:00:00',
 '2022-11-01 00:00:00', '2022-12-01 00:00:00', '2023-01-01 00:00:00',
 '2023-02-01 00:00:00', '2023-03-01 00:00:00', '2023-04-01 00:00:00',
 '2023-05-01 00:00:00', '2023-06-01 00:00:00', '2023-07-01 00:00:00',
 '2023-08-01 00:00:00', '2023-09-01 00:00:00', '2023-10-01 00:00:00',
 '2023-11-01 00:00:00', '2023-12-01 00:00:00', '2024-01-01 00:00:00',
 '2024-02-01 00:00:00', '2024-03-01 00:00:00', '2024-04-01 00:00:00',
 '2024-05-01 00

2025-08-14 17:13:17,354 - INFO - Desviación estándar de los residuales aditivos: 21.38
2025-08-14 17:13:17,356 - INFO - Desviación estándar de los residuales multiplicativos: 0.18835410543659423
2025-08-14 17:13:17,357 - INFO - P-value del test ADF: 0.99
2025-08-14 17:13:17,358 - INFO - Criterio de selección: La desviación estándar de los residuos multiplicative es menor.
2025-08-14 17:13:17,359 - INFO - El departamento HR tiene una descomposición multiplicative.
2025-08-14 17:13:17,356 - INFO - Desviación estándar de los residuales multiplicativos: 0.18835410543659423
2025-08-14 17:13:17,357 - INFO - P-value del test ADF: 0.99
2025-08-14 17:13:17,358 - INFO - Criterio de selección: La desviación estándar de los residuos multiplicative es menor.
2025-08-14 17:13:17,359 - INFO - El departamento HR tiene una descomposición multiplicative.


2025-08-14 17:13:17,481 - INFO - Desviación estándar de los residuales aditivos: 24.20
2025-08-14 17:13:17,483 - INFO - Desviación estándar de los residuales multiplicativos: 0.1301486444030368
2025-08-14 17:13:17,485 - INFO - P-value del test ADF: 0.96
2025-08-14 17:13:17,486 - INFO - Criterio de selección: La desviación estándar de los residuos multiplicative es menor.
2025-08-14 17:13:17,490 - INFO - El departamento IT tiene una descomposición multiplicative.
2025-08-14 17:13:17,483 - INFO - Desviación estándar de los residuales multiplicativos: 0.1301486444030368
2025-08-14 17:13:17,485 - INFO - P-value del test ADF: 0.96
2025-08-14 17:13:17,486 - INFO - Criterio de selección: La desviación estándar de los residuos multiplicative es menor.
2025-08-14 17:13:17,490 - INFO - El departamento IT tiene una descomposición multiplicative.


2025-08-14 17:13:18,091 - INFO - Desviación estándar de los residuales aditivos: 50.32
2025-08-14 17:13:18,093 - INFO - Desviación estándar de los residuales multiplicativos: 0.1478968553656041
2025-08-14 17:13:18,097 - INFO - P-value del test ADF: 0.94
2025-08-14 17:13:18,101 - INFO - Criterio de selección: La desviación estándar de los residuos multiplicative es menor.
2025-08-14 17:13:18,103 - INFO - El departamento Inventory tiene una descomposición multiplicative.
2025-08-14 17:13:18,093 - INFO - Desviación estándar de los residuales multiplicativos: 0.1478968553656041
2025-08-14 17:13:18,097 - INFO - P-value del test ADF: 0.94
2025-08-14 17:13:18,101 - INFO - Criterio de selección: La desviación estándar de los residuos multiplicative es menor.
2025-08-14 17:13:18,103 - INFO - El departamento Inventory tiene una descomposición multiplicative.


2025-08-14 17:13:18,628 - INFO - Desviación estándar de los residuales aditivos: 34.11
2025-08-14 17:13:18,631 - INFO - Desviación estándar de los residuales multiplicativos: 0.17471737636297135
2025-08-14 17:13:18,633 - INFO - P-value del test ADF: 0.73
2025-08-14 17:13:18,639 - INFO - Criterio de selección: La desviación estándar de los residuos multiplicative es menor.
2025-08-14 17:13:18,641 - INFO - El departamento Marketing tiene una descomposición multiplicative.
2025-08-14 17:13:18,631 - INFO - Desviación estándar de los residuales multiplicativos: 0.17471737636297135
2025-08-14 17:13:18,633 - INFO - P-value del test ADF: 0.73
2025-08-14 17:13:18,639 - INFO - Criterio de selección: La desviación estándar de los residuos multiplicative es menor.
2025-08-14 17:13:18,641 - INFO - El departamento Marketing tiene una descomposición multiplicative.


2025-08-14 17:13:19,461 - INFO - Desviación estándar de los residuales aditivos: 350.22
2025-08-14 17:13:19,464 - INFO - Desviación estándar de los residuales multiplicativos: 0.09279681745875171
2025-08-14 17:13:19,467 - INFO - P-value del test ADF: 0.84
2025-08-14 17:13:19,469 - INFO - Criterio de selección: La desviación estándar de los residuos multiplicative es menor.
2025-08-14 17:13:19,464 - INFO - Desviación estándar de los residuales multiplicativos: 0.09279681745875171
2025-08-14 17:13:19,467 - INFO - P-value del test ADF: 0.84
2025-08-14 17:13:19,469 - INFO - Criterio de selección: La desviación estándar de los residuos multiplicative es menor.
2025-08-14 17:13:19,474 - INFO - El departamento Sales tiene una descomposición multiplicative.
2025-08-14 17:13:19,474 - INFO - El departamento Sales tiene una descomposición multiplicative.


## 4. Entrenamiento y Selección de Modelos

Para cada departamento, entrenamos y evaluamos tres modelos de pronóstico:

1.  **ARIMA**: Se utiliza `auto_arima` para encontrar los parámetros óptimos. 
2.  **Prophet**: Actúa como un modelo robusto de respaldo. Su modo de estacionalidad (`additive` o `multiplicative`) se configura dinámicamente según el análisis de descomposición previo.
3.  **Holt (Suavización Exponencial)**: Un modelo clásico que captura tendencias.

Los modelos se evalúan mediante validación cruzada de series de tiempo. El mejor modelo se selecciona en función de su rendimiento en métricas clave (RMSE, MAE, SMAPE, MASE). Se genera una tabla comparativa para visualizar el rendimiento y la selección final.

In [59]:
## 4. Entrenamiento y Selección de Modelos Mejorado

import pandas as pd
import numpy as np
import warnings
from prophet import Prophet
import xgboost as xgb
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from pmdarima import auto_arima
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import TimeSeriesSplit, ParameterGrid
import logging
import contextlib
import sys
import io
from itertools import product
import joblib

warnings.filterwarnings("ignore")

# =====================
# FUNCIONES DE MÉTRICAS
# =====================
def calculate_mape(y_true, y_pred):
    """Calcula MAPE evitando división por cero"""
    mask = y_true != 0
    if mask.sum() == 0:
        return float('inf')
    return np.mean(np.abs((y_true[mask] - y_pred[mask]) / y_true[mask])) * 100

def calculate_smape(y_true, y_pred):
    """Calcula SMAPE simétrico"""
    numerator = 2 * np.abs(y_pred - y_true)
    denominator = np.abs(y_true) + np.abs(y_pred)
    mask = denominator != 0
    if mask.sum() == 0:
        return float('inf')
    return np.mean(numerator[mask] / denominator[mask]) * 100

def calculate_rmse(y_true, y_pred):
    """Calcula RMSE"""
    return np.sqrt(mean_squared_error(y_true, y_pred))

def calculate_mae(y_true, y_pred):
    """Calcula MAE"""
    return mean_absolute_error(y_true, y_pred)

def calculate_mase(y_true, y_pred, y_train):
    """Calcula MASE (Mean Absolute Scaled Error)"""
    try:
        naive_forecast = y_train.shift(1).dropna()
        mae_naive = calculate_mae(y_train[1:], naive_forecast)
        if mae_naive == 0:
            return float('inf')
        return calculate_mae(y_true, y_pred) / mae_naive
    except:
        return float('inf')

def evaluate_model_performance(y_true, y_pred, y_train):
    """Evalúa el desempeño del modelo con múltiples métricas"""
    metrics = {
        'RMSE': calculate_rmse(y_true, y_pred),
        'MAE': calculate_mae(y_true, y_pred),
        'MAPE': calculate_mape(y_true, y_pred),
        'SMAPE': calculate_smape(y_true, y_pred),
        'MASE': calculate_mase(y_true, y_pred, y_train)
    }
    
    # Score compuesto (menor es mejor)
    weights = {'RMSE': 0.2, 'MAE': 0.2, 'MAPE': 0.3, 'SMAPE': 0.2, 'MASE': 0.1}
    normalized_metrics = {}
    
    for metric, value in metrics.items():
        if np.isfinite(value):
            if metric in ['MAPE', 'SMAPE']:
                normalized_metrics[metric] = min(value / 20, 1.0)  # Normalizar a 20%
            elif metric == 'MASE':
                normalized_metrics[metric] = min(value / 2.0, 1.0)  # Normalizar a 2.0
            else:
                # Para RMSE y MAE, normalizar por la desviación estándar
                std_train = y_train.std()
                normalized_metrics[metric] = min(value / std_train, 1.0) if std_train > 0 else 1.0
        else:
            normalized_metrics[metric] = 1.0
    
    composite_score = sum(normalized_metrics[metric] * weights[metric] for metric in weights.keys())
    metrics['Composite_Score'] = composite_score
    
    return metrics

# ==========================
# VALIDACIÓN CRUZADA ROBUSTA
# ==========================
def time_series_cross_validation(series, model_func, model_params, n_splits=3, test_size=6, min_train_size=24):
    """
    Validación cruzada específica para series temporales con grid search
    """
    if len(series) < min_train_size + test_size:
        return None
    
    # Crear splits de validación
    total_length = len(series)
    scores = []
    
    for i in range(n_splits):
        # Calcular índices de entrenamiento y prueba
        test_start = total_length - test_size * (n_splits - i)
        train_end = test_start
        
        if train_end < min_train_size:
            continue
            
        train_data = series.iloc[:train_end]
        test_data = series.iloc[test_start:test_start + test_size]
        
        if len(train_data) < min_train_size or len(test_data) == 0:
            continue
        
        try:
            # Entrenar modelo con parámetros específicos
            predictions = model_func(train_data, len(test_data), **model_params)
            
            if predictions is None or len(predictions) == 0:
                continue
            
            # Ajustar longitud de predicciones
            predictions = np.array(predictions)[:len(test_data)]
            test_values = test_data.values[:len(predictions)]
            
            if len(predictions) == 0 or len(set(predictions)) <= 1:
                continue
            
            # Evaluar métricas
            metrics = evaluate_model_performance(test_values, predictions, train_data)
            scores.append(metrics)
            
        except Exception as e:
            logging.warning(f"Error en validación cruzada: {e}")
            continue
    
    if not scores:
        return None
    
    # Promediar métricas
    avg_metrics = {}
    for metric in ['RMSE', 'MAE', 'MAPE', 'SMAPE', 'MASE', 'Composite_Score']:
        values = [score[metric] for score in scores if np.isfinite(score[metric])]
        avg_metrics[metric] = np.mean(values) if values else float('inf')
    
    return avg_metrics

# =====================================
# MODELOS CON HIPERPARÁMETROS OPTIMIZADOS
# =====================================

class ARIMAOptimizer:
    def __init__(self):
        self.param_grid = [
            {'seasonal': True, 'stepwise': True, 'approximation': False, 'max_p': 3, 'max_q': 3, 'max_P': 2, 'max_Q': 2},
            {'seasonal': True, 'stepwise': False, 'approximation': True, 'max_p': 2, 'max_q': 2, 'max_P': 1, 'max_Q': 1},
            {'seasonal': False, 'stepwise': True, 'approximation': False, 'max_p': 5, 'max_q': 5},
        ]
    
    def fit_predict(self, train_data, horizon, **params):
        try:
            model = auto_arima(
                train_data,
                start_p=0, start_q=0,
                max_p=params.get('max_p', 3),
                max_q=params.get('max_q', 3),
                start_P=0, start_Q=0,
                max_P=params.get('max_P', 2),
                max_Q=params.get('max_Q', 2),
                seasonal=params.get('seasonal', True),
                m=12,
                stepwise=params.get('stepwise', True),
                approximation=params.get('approximation', False),
                suppress_warnings=True,
                error_action='ignore',
                trace=False,
                random_state=42,
                n_fits=30
            )
            
            predictions = model.predict(n_periods=horizon)
            return predictions, model
        except Exception as e:
            logging.warning(f"Error en ARIMA: {e}")
            return None, None
    
    def optimize(self, series, horizon=6):
        best_score = float('inf')
        best_params = None
        best_model = None
        
        for params in self.param_grid:
            try:
                metrics = time_series_cross_validation(
                    series, 
                    lambda train, h, **p: self.fit_predict(train, h, **p)[0],
                    params,
                    n_splits=3,
                    test_size=horizon
                )
                
                if metrics and metrics['Composite_Score'] < best_score:
                    best_score = metrics['Composite_Score']
                    best_params = params
                    
            except Exception as e:
                logging.warning(f"Error optimizando ARIMA con params {params}: {e}")
                continue
        
        if best_params:
            predictions, model = self.fit_predict(series, horizon, **best_params)
            best_model = model
        
        return best_model, best_params, best_score

class ProphetOptimizer:
    def __init__(self):
        self.param_grid = [
            {
                'seasonality_mode': 'additive',
                'changepoint_prior_scale': 0.05,
                'seasonality_prior_scale': 10.0,
                'yearly_seasonality': True,
                'weekly_seasonality': False,
                'daily_seasonality': False
            },
            {
                'seasonality_mode': 'multiplicative',
                'changepoint_prior_scale': 0.1,
                'seasonality_prior_scale': 5.0,
                'yearly_seasonality': True,
                'weekly_seasonality': False,
                'daily_seasonality': False
            },
            {
                'seasonality_mode': 'additive',
                'changepoint_prior_scale': 0.01,
                'seasonality_prior_scale': 15.0,
                'yearly_seasonality': True,
                'weekly_seasonality': False,
                'daily_seasonality': False
            }
        ]
    
    def fit_predict(self, train_data, horizon, **params):
        try:
            df = pd.DataFrame({
                'ds': train_data.index,
                'y': train_data.values
            })
            
            with contextlib.redirect_stdout(io.StringIO()), contextlib.redirect_stderr(io.StringIO()):
                model = Prophet(**params)
                model.fit(df)
                
                future = model.make_future_dataframe(periods=horizon, freq='MS')
                forecast = model.predict(future)
                
                predictions = forecast['yhat'].tail(horizon).values
                
            return predictions, model
        except Exception as e:
            logging.warning(f"Error en Prophet: {e}")
            return None, None
    
    def optimize(self, series, horizon=6):
        best_score = float('inf')
        best_params = None
        best_model = None
        
        for params in self.param_grid:
            try:
                metrics = time_series_cross_validation(
                    series,
                    lambda train, h, **p: self.fit_predict(train, h, **p)[0],
                    params,
                    n_splits=3,
                    test_size=horizon
                )
                
                if metrics and metrics['Composite_Score'] < best_score:
                    best_score = metrics['Composite_Score']
                    best_params = params
                    
            except Exception as e:
                logging.warning(f"Error optimizando Prophet con params {params}: {e}")
                continue
        
        if best_params:
            predictions, model = self.fit_predict(series, horizon, **best_params)
            best_model = model
        
        return best_model, best_params, best_score

class ExponentialSmoothingOptimizer:
    def __init__(self):
        self.param_grid = list(product(
            ['add', 'mul', None],  # trend
            ['add', 'mul', None],  # seasonal
            [True, False]  # damped_trend
        ))
    
    def fit_predict(self, train_data, horizon, **params):
        try:
            trend = params.get('trend')
            seasonal = params.get('seasonal')
            damped_trend = params.get('damped_trend', False)
            
            # Validaciones
            if trend is None and damped_trend:
                return None, None
            if seasonal == 'mul' and (train_data <= 0).any():
                seasonal = 'add'
            
            model = ExponentialSmoothing(
                train_data,
                trend=trend,
                seasonal=seasonal,
                seasonal_periods=12,
                damped_trend=damped_trend
            )
            
            fitted_model = model.fit(optimized=True, remove_bias=True)
            predictions = fitted_model.forecast(horizon)
            
            return predictions.values if hasattr(predictions, 'values') else predictions, fitted_model
        except Exception as e:
            logging.warning(f"Error en Exponential Smoothing: {e}")
            return None, None
    
    def optimize(self, series, horizon=6):
        best_score = float('inf')
        best_params = None
        best_model = None
        
        for trend, seasonal, damped in self.param_grid:
            params = {'trend': trend, 'seasonal': seasonal, 'damped_trend': damped}
            
            try:
                metrics = time_series_cross_validation(
                    series,
                    lambda train, h, **p: self.fit_predict(train, h, **p)[0],
                    params,
                    n_splits=3,
                    test_size=horizon
                )
                
                if metrics and metrics['Composite_Score'] < best_score:
                    best_score = metrics['Composite_Score']
                    best_params = params
                    
            except Exception as e:
                continue
        
        if best_params:
            predictions, model = self.fit_predict(series, horizon, **best_params)
            best_model = model
        
        return best_model, best_params, best_score

class XGBoostOptimizer:
    def __init__(self):
        self.param_grid = [
            {
                'n_estimators': 200,
                'max_depth': 4,
                'learning_rate': 0.05,
                'subsample': 0.8,
                'colsample_bytree': 0.8,
                'reg_alpha': 0.1,
                'reg_lambda': 1.0,
                'n_lags': 6
            },
            {
                'n_estimators': 300,
                'max_depth': 6,
                'learning_rate': 0.03,
                'subsample': 0.9,
                'colsample_bytree': 0.9,
                'reg_alpha': 0.05,
                'reg_lambda': 0.5,
                'n_lags': 12
            },
            {
                'n_estimators': 500,
                'max_depth': 3,
                'learning_rate': 0.01,
                'subsample': 0.85,
                'colsample_bytree': 0.85,
                'reg_alpha': 0.2,
                'reg_lambda': 2.0,
                'n_lags': 9
            }
        ]
    
    def create_features(self, series, n_lags=12):
        """Crear características para XGBoost"""
        df = pd.DataFrame({'y': series.values}, index=series.index)
        
        # Características de rezago
        for lag in range(1, n_lags + 1):
            df[f'lag_{lag}'] = df['y'].shift(lag)
        
        # Características temporales
        df['month'] = df.index.month
        df['quarter'] = df.index.quarter
        df['year'] = df.index.year
        
        # Características cíclicas
        df['month_sin'] = np.sin(2 * np.pi * df['month'] / 12)
        df['month_cos'] = np.cos(2 * np.pi * df['month'] / 12)
        
        # Estadísticas móviles
        for window in [3, 6, 12]:
            if len(series) > window:
                df[f'ma_{window}'] = df['y'].rolling(window=window, min_periods=1).mean().shift(1)
                df[f'std_{window}'] = df['y'].rolling(window=window, min_periods=1).std().shift(1).fillna(0)
        
        return df.dropna()
    
    def fit_predict(self, train_data, horizon, **params):
        try:
            n_lags = params.pop('n_lags', 12)
            df = self.create_features(train_data, n_lags)
            
            if len(df) < 12:
                return None, None
            
            X = df.drop('y', axis=1)
            y = df['y']
            
            model = xgb.XGBRegressor(
                random_state=42,
                n_jobs=-1,
                **params
            )
            
            model.fit(X, y)
            
            # Predicciones recursivas
            predictions = []
            last_features = X.iloc[-1].values.copy()
            feature_names = X.columns.tolist()
            
            for step in range(horizon):
                pred = model.predict(last_features.reshape(1, -1))[0]
                predictions.append(pred)
                
                # Actualizar características
                n_lag_features = sum(1 for col in feature_names if col.startswith('lag_'))
                if n_lag_features > 0:
                    last_features[1:n_lag_features] = last_features[0:n_lag_features-1]
                    last_features[0] = pred
                
                # Actualizar características temporales
                current_date = train_data.index[-1] + pd.DateOffset(months=step+1)
                
                if 'month' in feature_names:
                    last_features[feature_names.index('month')] = current_date.month
                if 'quarter' in feature_names:
                    last_features[feature_names.index('quarter')] = current_date.quarter
                if 'year' in feature_names:
                    last_features[feature_names.index('year')] = current_date.year
                if 'month_sin' in feature_names:
                    last_features[feature_names.index('month_sin')] = np.sin(2 * np.pi * current_date.month / 12)
                if 'month_cos' in feature_names:
                    last_features[feature_names.index('month_cos')] = np.cos(2 * np.pi * current_date.month / 12)
            
            return predictions, model
        except Exception as e:
            logging.warning(f"Error en XGBoost: {e}")
            return None, None
    
    def optimize(self, series, horizon=6):
        best_score = float('inf')
        best_params = None
        best_model = None
        
        for params in self.param_grid:
            try:
                metrics = time_series_cross_validation(
                    series,
                    lambda train, h, **p: self.fit_predict(train, h, **p)[0],
                    params,
                    n_splits=3,
                    test_size=horizon
                )
                
                if metrics and metrics['Composite_Score'] < best_score:
                    best_score = metrics['Composite_Score']
                    best_params = params
                    
            except Exception as e:
                logging.warning(f"Error optimizando XGBoost con params {params}: {e}")
                continue
        
        if best_params:
            predictions, model = self.fit_predict(series, horizon, **best_params)
            best_model = model
        
        return best_model, best_params, best_score

# ==============================
# ENTRENAMIENTO Y SELECCIÓN FINAL
# ==============================
def train_and_select_best_model_improved(series, horizon=12, switches=None):
    """
    Entrena y selecciona el mejor modelo basado en switches y métricas
    """
    if switches is None:
        switches = {
            'AUTO_MODEL_SELECTION': True,
            'USE_XGBOOST': False,
            'USE_ARIMA': False,
            'USE_PROPHET': False,
            'USE_ES': False
        }
    
    models_to_train = []
    
    # Determinar qué modelos entrenar según switches
    if switches.get('AUTO_MODEL_SELECTION', True):
        models_to_train = ['ARIMA', 'Prophet', 'Exponential_Smoothing', 'XGBoost']
    else:
        if switches.get('USE_ARIMA', False):
            models_to_train.append('ARIMA')
        if switches.get('USE_PROPHET', False):
            models_to_train.append('Prophet')
        if switches.get('USE_ES', False):
            models_to_train.append('Exponential_Smoothing')
        if switches.get('USE_XGBOOST', False):
            models_to_train.append('XGBoost')
    
    if not models_to_train:
        logging.error("No se especificaron modelos para entrenar")
        return None, None, None
    
    results = []
    trained_models = {}
    
    # Entrenar cada modelo
    for model_name in models_to_train:
        logging.info(f"Optimizando {model_name}...")
        
        try:
            if model_name == 'ARIMA':
                optimizer = ARIMAOptimizer()
            elif model_name == 'Prophet':
                optimizer = ProphetOptimizer()
            elif model_name == 'Exponential_Smoothing':
                optimizer = ExponentialSmoothingOptimizer()
            elif model_name == 'XGBoost':
                optimizer = XGBoostOptimizer()
            
            model, best_params, best_score = optimizer.optimize(series, horizon)
            
            if model is not None:
                # Evaluar el modelo final
                final_metrics = time_series_cross_validation(
                    series,
                    lambda train, h, **p: optimizer.fit_predict(train, h, **best_params)[0],
                    best_params or {},
                    n_splits=3,
                    test_size=6
                )
                
                if final_metrics:
                    trained_models[model_name] = {
                        'model': model,
                        'params': best_params,
                        'optimizer': optimizer
                    }
                    
                    results.append({
                        'Model': model_name,
                        'RMSE': round(final_metrics['RMSE'], 4),
                        'MAE': round(final_metrics['MAE'], 4),
                        'MAPE': round(final_metrics['MAPE'], 4),
                        'SMAPE': round(final_metrics['SMAPE'], 4),
                        'MASE': round(final_metrics['MASE'], 4),
                        'Composite_Score': round(final_metrics['Composite_Score'], 4),
                        'Meets_Target': final_metrics['MAPE'] <= 10 and final_metrics['SMAPE'] <= 10,
                        'Best_Params': str(best_params)
                    })
                    
        except Exception as e:
            logging.error(f"Error entrenando {model_name}: {e}")
            continue
    
    if not results:
        logging.error("No se pudieron entrenar modelos exitosamente")
        return None, None, None
    
    # Crear DataFrame de resultados
    results_df = pd.DataFrame(results).sort_values('Composite_Score')
    
    # Seleccionar mejor modelo
    best_model_name = results_df.iloc[0]['Model']
    best_model_info = trained_models[best_model_name]
    
    logging.info(f"Mejor modelo seleccionado: {best_model_name}")
    logging.info(f"Score compuesto: {results_df.iloc[0]['Composite_Score']:.4f}")
    
    return best_model_name, results_df, best_model_info

# =================
# EJECUCIÓN PRINCIPAL
# =================
def run_improved_model_training():
    """Ejecuta el entrenamiento mejorado para todos los departamentos"""
    
    # Usar switches definidos en el módulo 3
    switches = {
        'AUTO_MODEL_SELECTION': AUTO_MODEL_SELECTION,
        'USE_XGBOOST': USE_XGBOOST,
        'USE_ARIMA': USE_ARIMA,
        'USE_PROPHET': USE_PROPHET,
        'USE_ES': USE_ES
    }
    
    all_metrics = {}
    all_models = {}
    
    print("="*60)
    print("ENTRENAMIENTO Y SELECCIÓN DE MODELOS MEJORADO")
    print("="*60)
    
    for dept in df_all_data['department'].unique():
        print(f"\n{'='*20} DEPARTAMENTO: {dept} {'='*20}")
        
        # Preparar datos del departamento
        dept_series = (
            df_all_data[df_all_data['department'] == dept]
            .set_index('ds')['y']
            .asfreq('MS')
            .interpolate(method='linear')
        )
        
        if len(dept_series) < 24:
            logging.warning(f"Datos insuficientes para {dept}: {len(dept_series)} meses")
            continue
        
        # Entrenar y seleccionar mejor modelo
        best_model_name, metrics_df, best_model_info = train_and_select_best_model_improved(
            dept_series, 
            horizon=12,
            switches=switches
        )
        
        if best_model_name is None:
            logging.error(f"No se pudo entrenar ningún modelo para {dept}")
            continue
        
        # Almacenar resultados
        all_metrics[dept] = metrics_df
        all_models[dept] = best_model_info
        
        # Mostrar resultados
        print(f"\nResultados para {dept}:")
        print(metrics_df[['Model', 'RMSE', 'MAE', 'MAPE', 'SMAPE', 'Composite_Score', 'Meets_Target']].to_string(index=False))
        print(f"\n🏆 MEJOR MODELO: {best_model_name}")
        
        best_row = metrics_df.iloc[0]
        print(f"   MAPE: {best_row['MAPE']:.2f}%")
        print(f"   SMAPE: {best_row['SMAPE']:.2f}%")
        print(f"   RMSE: {best_row['RMSE']:.4f}")
        print(f"   Score Compuesto: {best_row['Composite_Score']:.4f}")
        print(f"   Cumple objetivos: {'✅' if best_row['Meets_Target'] else '❌'}")
        print(f"   Parámetros: {best_row['Best_Params']}")
    
    return all_metrics, all_models

# Ejecutar entrenamiento mejorado
all_metrics, all_models = run_improved_model_training()

2025-08-14 17:13:32,751 - INFO - Optimizando ARIMA...


ENTRENAMIENTO Y SELECCIÓN DE MODELOS MEJORADO



2025-08-14 17:13:35,258 - INFO - Optimizando Prophet...
2025-08-14 17:13:35,265 - DEBUG - cmd: where.exe tbb.dll
cwd: None
2025-08-14 17:13:35,685 - DEBUG - TBB already found in load path
2025-08-14 17:13:35,709 - INFO - n_changepoints greater than number of observations. Using 20.
2025-08-14 17:13:35,715 - DEBUG - input tempfile: C:\Users\joey_\AppData\Local\Temp\tmpzib1ovx4\2s8nqucv.json
2025-08-14 17:13:35,725 - DEBUG - input tempfile: C:\Users\joey_\AppData\Local\Temp\tmpzib1ovx4\7nwgo2xz.json
2025-08-14 17:13:35,729 - DEBUG - idx 0
2025-08-14 17:13:35,731 - DEBUG - running CmdStan, num_threads: None
2025-08-14 17:13:35,733 - DEBUG - CmdStan args: ['C:\\Users\\joey_\\AppData\\Local\\Programs\\Python\\Maestro_Yoda\\Lib\\site-packages\\prophet\\stan_model\\prophet_model.bin', 'random', 'seed=45338', 'data', 'file=C:\\Users\\joey_\\AppData\\Local\\Temp\\tmpzib1ovx4\\2s8nqucv.json', 'init=C:\\Users\\joey_\\AppData\\Local\\Temp\\tmpzib1ovx4\\7nwgo2xz.json', 'output', 'file=C:\\Users\\jo


Resultados para Finance:
                Model     RMSE     MAE    MAPE   SMAPE  Composite_Score  Meets_Target
Exponential_Smoothing  76.4656 48.9163 31.0879 31.0071           0.9096         False
              Prophet  92.7835 69.4585 57.5364 44.0052           0.9136         False
              XGBoost 135.6990 89.0772 60.9835 54.7763           0.9978         False

🏆 MEJOR MODELO: Exponential_Smoothing
   MAPE: 31.09%
   SMAPE: 31.01%
   RMSE: 76.4656
   Score Compuesto: 0.9096
   Cumple objetivos: ❌
   Parámetros: {'trend': None, 'seasonal': 'add', 'damped_trend': False}



2025-08-14 17:15:15,103 - INFO - Optimizando Prophet...
2025-08-14 17:15:15,107 - DEBUG - cmd: where.exe tbb.dll
cwd: None
2025-08-14 17:15:15,268 - DEBUG - TBB already found in load path
2025-08-14 17:15:15,293 - INFO - n_changepoints greater than number of observations. Using 21.
2025-08-14 17:15:15,297 - DEBUG - input tempfile: C:\Users\joey_\AppData\Local\Temp\tmpzib1ovx4\y0qgygwo.json
2025-08-14 17:15:15,304 - DEBUG - input tempfile: C:\Users\joey_\AppData\Local\Temp\tmpzib1ovx4\9i1nk5fc.json
2025-08-14 17:15:15,307 - DEBUG - idx 0
2025-08-14 17:15:15,308 - DEBUG - running CmdStan, num_threads: None
2025-08-14 17:15:15,309 - DEBUG - CmdStan args: ['C:\\Users\\joey_\\AppData\\Local\\Programs\\Python\\Maestro_Yoda\\Lib\\site-packages\\prophet\\stan_model\\prophet_model.bin', 'random', 'seed=24220', 'data', 'file=C:\\Users\\joey_\\AppData\\Local\\Temp\\tmpzib1ovx4\\y0qgygwo.json', 'init=C:\\Users\\joey_\\AppData\\Local\\Temp\\tmpzib1ovx4\\9i1nk5fc.json', 'output', 'file=C:\\Users\\jo


Resultados para HR:
                Model     RMSE     MAE    MAPE   SMAPE  Composite_Score  Meets_Target
Exponential_Smoothing  45.4617 34.0564 25.8290 25.5570           0.8305         False
              Prophet  91.3998 70.3228 52.4688 61.9843           0.9844         False
              XGBoost 134.9734 90.9395 55.2872 57.8229           1.0000         False

🏆 MEJOR MODELO: Exponential_Smoothing
   MAPE: 25.83%
   SMAPE: 25.56%
   RMSE: 45.4617
   Score Compuesto: 0.8305
   Cumple objetivos: ❌
   Parámetros: {'trend': 'add', 'seasonal': 'mul', 'damped_trend': False}



2025-08-14 17:15:26,377 - INFO - Optimizando Prophet...
2025-08-14 17:15:26,381 - DEBUG - cmd: where.exe tbb.dll
cwd: None
2025-08-14 17:15:26,569 - DEBUG - TBB already found in load path
2025-08-14 17:15:26,611 - INFO - n_changepoints greater than number of observations. Using 21.
2025-08-14 17:15:26,621 - DEBUG - input tempfile: C:\Users\joey_\AppData\Local\Temp\tmpzib1ovx4\c32_54_c.json
2025-08-14 17:15:26,644 - DEBUG - input tempfile: C:\Users\joey_\AppData\Local\Temp\tmpzib1ovx4\x92f7afc.json
2025-08-14 17:15:26,662 - DEBUG - idx 0
2025-08-14 17:15:26,665 - DEBUG - running CmdStan, num_threads: None
2025-08-14 17:15:26,667 - DEBUG - CmdStan args: ['C:\\Users\\joey_\\AppData\\Local\\Programs\\Python\\Maestro_Yoda\\Lib\\site-packages\\prophet\\stan_model\\prophet_model.bin', 'random', 'seed=45417', 'data', 'file=C:\\Users\\joey_\\AppData\\Local\\Temp\\tmpzib1ovx4\\c32_54_c.json', 'init=C:\\Users\\joey_\\AppData\\Local\\Temp\\tmpzib1ovx4\\x92f7afc.json', 'output', 'file=C:\\Users\\jo


Resultados para IT:
                Model     RMSE      MAE    MAPE   SMAPE  Composite_Score  Meets_Target
Exponential_Smoothing  42.4840  32.4379 12.2890 12.7351           0.4810         False
              Prophet 127.2692  99.2876 37.6471 45.1057           0.9497         False
              XGBoost 183.9575 127.5920 45.3179 42.2464           0.9861         False

🏆 MEJOR MODELO: Exponential_Smoothing
   MAPE: 12.29%
   SMAPE: 12.74%
   RMSE: 42.4840
   Score Compuesto: 0.4810
   Cumple objetivos: ❌
   Parámetros: {'trend': 'add', 'seasonal': 'mul', 'damped_trend': True}



2025-08-14 17:15:40,668 - INFO - Optimizando Prophet...
2025-08-14 17:15:40,674 - DEBUG - cmd: where.exe tbb.dll
cwd: None
2025-08-14 17:15:40,928 - DEBUG - TBB already found in load path
2025-08-14 17:15:40,954 - INFO - n_changepoints greater than number of observations. Using 21.
2025-08-14 17:15:40,961 - DEBUG - input tempfile: C:\Users\joey_\AppData\Local\Temp\tmpzib1ovx4\fe1blrza.json
2025-08-14 17:15:40,967 - DEBUG - input tempfile: C:\Users\joey_\AppData\Local\Temp\tmpzib1ovx4\83gwwy5b.json
2025-08-14 17:15:40,971 - DEBUG - idx 0
2025-08-14 17:15:40,973 - DEBUG - running CmdStan, num_threads: None
2025-08-14 17:15:40,974 - DEBUG - CmdStan args: ['C:\\Users\\joey_\\AppData\\Local\\Programs\\Python\\Maestro_Yoda\\Lib\\site-packages\\prophet\\stan_model\\prophet_model.bin', 'random', 'seed=34952', 'data', 'file=C:\\Users\\joey_\\AppData\\Local\\Temp\\tmpzib1ovx4\\fe1blrza.json', 'init=C:\\Users\\joey_\\AppData\\Local\\Temp\\tmpzib1ovx4\\83gwwy5b.json', 'output', 'file=C:\\Users\\jo


Resultados para Inventory:
                Model     RMSE      MAE    MAPE   SMAPE  Composite_Score  Meets_Target
              Prophet 110.2352  83.2764 27.5081 36.4747           0.6863         False
Exponential_Smoothing 179.1677 112.9101 25.0624 28.7752           0.8057         False
              XGBoost 401.3134 236.4751 51.5301 50.8229           0.9529         False

🏆 MEJOR MODELO: Prophet
   MAPE: 27.51%
   SMAPE: 36.47%
   RMSE: 110.2352
   Score Compuesto: 0.6863
   Cumple objetivos: ❌
   Parámetros: {'seasonality_mode': 'multiplicative', 'changepoint_prior_scale': 0.1, 'seasonality_prior_scale': 5.0, 'yearly_seasonality': True, 'weekly_seasonality': False, 'daily_seasonality': False}



2025-08-14 17:15:57,441 - INFO - Optimizando Prophet...
2025-08-14 17:15:57,444 - DEBUG - cmd: where.exe tbb.dll
cwd: None
2025-08-14 17:15:57,652 - DEBUG - TBB already found in load path
2025-08-14 17:15:57,681 - INFO - n_changepoints greater than number of observations. Using 21.
2025-08-14 17:15:57,689 - DEBUG - input tempfile: C:\Users\joey_\AppData\Local\Temp\tmpzib1ovx4\jprmhms_.json
2025-08-14 17:15:57,712 - DEBUG - input tempfile: C:\Users\joey_\AppData\Local\Temp\tmpzib1ovx4\13gdrwos.json
2025-08-14 17:15:57,723 - DEBUG - idx 0
2025-08-14 17:15:57,727 - DEBUG - running CmdStan, num_threads: None
2025-08-14 17:15:57,732 - DEBUG - CmdStan args: ['C:\\Users\\joey_\\AppData\\Local\\Programs\\Python\\Maestro_Yoda\\Lib\\site-packages\\prophet\\stan_model\\prophet_model.bin', 'random', 'seed=94457', 'data', 'file=C:\\Users\\joey_\\AppData\\Local\\Temp\\tmpzib1ovx4\\jprmhms_.json', 'init=C:\\Users\\joey_\\AppData\\Local\\Temp\\tmpzib1ovx4\\13gdrwos.json', 'output', 'file=C:\\Users\\jo


Resultados para Marketing:
                Model     RMSE     MAE    MAPE   SMAPE  Composite_Score  Meets_Target
Exponential_Smoothing  51.6125 42.6365 34.4347 28.9126           0.8164         False
              Prophet  93.4692 81.7209 75.0029 59.3089           0.8815         False
              XGBoost 136.2335 88.5466 51.2209 49.6340           0.9851         False

🏆 MEJOR MODELO: Exponential_Smoothing
   MAPE: 34.43%
   SMAPE: 28.91%
   RMSE: 51.6125
   Score Compuesto: 0.8164
   Cumple objetivos: ❌
   Parámetros: {'trend': 'add', 'seasonal': 'mul', 'damped_trend': True}



2025-08-14 17:16:22,455 - INFO - Optimizando Prophet...
2025-08-14 17:16:22,462 - DEBUG - cmd: where.exe tbb.dll
cwd: None
2025-08-14 17:16:22,768 - DEBUG - TBB already found in load path
2025-08-14 17:16:22,799 - INFO - n_changepoints greater than number of observations. Using 21.
2025-08-14 17:16:22,811 - DEBUG - input tempfile: C:\Users\joey_\AppData\Local\Temp\tmpzib1ovx4\vsknhukz.json
2025-08-14 17:16:22,831 - DEBUG - input tempfile: C:\Users\joey_\AppData\Local\Temp\tmpzib1ovx4\dkcjezm2.json
2025-08-14 17:16:22,852 - DEBUG - idx 0
2025-08-14 17:16:22,860 - DEBUG - running CmdStan, num_threads: None
2025-08-14 17:16:22,865 - DEBUG - CmdStan args: ['C:\\Users\\joey_\\AppData\\Local\\Programs\\Python\\Maestro_Yoda\\Lib\\site-packages\\prophet\\stan_model\\prophet_model.bin', 'random', 'seed=44355', 'data', 'file=C:\\Users\\joey_\\AppData\\Local\\Temp\\tmpzib1ovx4\\vsknhukz.json', 'init=C:\\Users\\joey_\\AppData\\Local\\Temp\\tmpzib1ovx4\\dkcjezm2.json', 'output', 'file=C:\\Users\\jo


Resultados para Sales:
                Model      RMSE       MAE    MAPE   SMAPE  Composite_Score  Meets_Target
Exponential_Smoothing  343.3263  292.8247 18.9139 16.6437           0.5424         False
              Prophet  817.5866  636.1250 37.9879 29.2323           0.7702         False
              XGBoost 2436.0958 1472.5340 55.3761 47.2319           0.9534         False
                ARIMA 2012.1424 1579.4018 74.5594 60.0741           0.9887         False

🏆 MEJOR MODELO: Exponential_Smoothing
   MAPE: 18.91%
   SMAPE: 16.64%
   RMSE: 343.3263
   Score Compuesto: 0.5424
   Cumple objetivos: ❌
   Parámetros: {'trend': 'add', 'seasonal': 'mul', 'damped_trend': True}


## 5. Generar Predicciones

Generamos predicciones para las próximas 12 quincenas con intervalos de confianza, utilizando el mejor modelo seleccionado para cada departamento.

In [60]:
## 5. Generación de Predicciones Homologadas

import pandas as pd
import numpy as np
import logging
from datetime import datetime
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import warnings

warnings.filterwarnings("ignore")

# =====================
# CLASE UNIFICADA PARA PREDICCIONES
# =====================
class UnifiedPredictor:
    """Clase unificada para generar predicciones homologadas de todos los modelos"""
    
    def __init__(self, model_info, model_name, series):
        self.model_info = model_info
        self.model_name = model_name
        self.model = model_info['model']
        self.params = model_info.get('params', {})
        self.optimizer = model_info.get('optimizer')
        self.series = series
        
    def generate_confidence_intervals(self, predictions, method='adaptive', confidence_level=0.95):
        """
        Genera intervalos de confianza adaptativos basados en la variabilidad histórica
        """
        try:
            predictions = np.array(predictions)
            alpha = 1 - confidence_level
            z_score = 1.96  # Para 95% de confianza
            
            # Calcular variabilidad histórica
            series_std = self.series.std()
            series_mean = self.series.mean()
            cv = series_std / series_mean if series_mean > 0 else 0.1
            
            if method == 'adaptive':
                # Método adaptativo que considera la tendencia y estacionalidad
                base_uncertainty = series_std * 0.15  # 15% de la desviación estándar base
                
                # Incrementar incertidumbre con el horizonte
                horizon_factor = np.sqrt(np.arange(1, len(predictions) + 1))
                
                # Factor de variabilidad relativa
                variability_factor = 1 + cv * 0.5
                
                # Calcular intervalos
                uncertainty = base_uncertainty * horizon_factor * variability_factor
                lower = predictions - z_score * uncertainty
                upper = predictions + z_score * uncertainty
                
            elif method == 'percentage':
                # Método basado en porcentaje de las predicciones
                uncertainty_pct = max(0.1, cv * 0.5)  # Mínimo 10% de incertidumbre
                uncertainty = predictions * uncertainty_pct
                lower = predictions - z_score * uncertainty
                upper = predictions + z_score * uncertainty
                
            else:
                # Método simple por defecto
                uncertainty = series_std * 0.2 * np.sqrt(np.arange(1, len(predictions) + 1))
                lower = predictions - z_score * uncertainty
                upper = predictions + z_score * uncertainty
            
            # Asegurar que los límites sean no negativos y lógicos
            lower = np.maximum(lower, 0)
            upper = np.maximum(upper, predictions)
            
            return lower, upper
            
        except Exception as e:
            logging.warning(f"Error generando intervalos de confianza: {e}")
            # Fallback simple
            return predictions * 0.8, predictions * 1.2
    
    def generate_historical_predictions(self):
        """
        Genera predicciones históricas homologadas para todos los tipos de modelo
        """
        try:
            if self.model_name == 'ARIMA':
                return self._generate_arima_historical()
            elif self.model_name == 'Prophet':
                return self._generate_prophet_historical()
            elif self.model_name == 'Exponential_Smoothing':
                return self._generate_es_historical()
            elif self.model_name == 'XGBoost':
                return self._generate_xgboost_historical()
            else:
                return self._generate_fallback_historical()
                
        except Exception as e:
            logging.error(f"Error generando predicciones históricas para {self.model_name}: {e}")
            return self._generate_fallback_historical()
    
    def _generate_arima_historical(self):
        """Predicciones históricas para ARIMA"""
        try:
            fitted_values = self.model.fittedvalues()
            
            # Alinear con el índice original
            if len(fitted_values) != len(self.series):
                fitted_values = fitted_values[-len(self.series):]
            
            # Generar intervalos de confianza
            lower, upper = self.generate_confidence_intervals(fitted_values, method='adaptive')
            
            historical_pred = pd.DataFrame({
                'yhat': fitted_values,
                'yhat_lower': lower,
                'yhat_upper': upper
            }, index=self.series.index[-len(fitted_values):])
            
            return historical_pred
            
        except Exception as e:
            logging.warning(f"Error en predicciones históricas ARIMA: {e}")
            return self._generate_fallback_historical()
    
    def _generate_prophet_historical(self):
        """Predicciones históricas para Prophet"""
        try:
            df_hist = pd.DataFrame({'ds': self.series.index, 'y': self.series.values})
            forecast_hist = self.model.predict(df_hist)
            
            historical_pred = pd.DataFrame({
                'yhat': forecast_hist['yhat'].values,
                'yhat_lower': forecast_hist['yhat_lower'].values,
                'yhat_upper': forecast_hist['yhat_upper'].values
            }, index=self.series.index)
            
            return historical_pred
            
        except Exception as e:
            logging.warning(f"Error en predicciones históricas Prophet: {e}")
            return self._generate_fallback_historical()
    
    def _generate_es_historical(self):
        """Predicciones históricas para Exponential Smoothing"""
        try:
            fitted_values = self.model.fittedvalues
            
            if len(fitted_values) != len(self.series):
                fitted_values = fitted_values[-len(self.series):]
            
            # Generar intervalos de confianza
            lower, upper = self.generate_confidence_intervals(fitted_values, method='adaptive')
            
            historical_pred = pd.DataFrame({
                'yhat': fitted_values,
                'yhat_lower': lower,
                'yhat_upper': upper
            }, index=self.series.index[-len(fitted_values):])
            
            return historical_pred
            
        except Exception as e:
            logging.warning(f"Error en predicciones históricas ES: {e}")
            return self._generate_fallback_historical()
    
    def _generate_xgboost_historical(self):
        """Predicciones históricas para XGBoost"""
        try:
            # Recrear características
            df_features = self.optimizer.create_features(self.series)
            X_hist = df_features.drop('y', axis=1)
            
            predictions_hist = self.model.predict(X_hist)
            
            # Alinear con el índice
            aligned_index = self.series.index[-len(predictions_hist):]
            
            # Generar intervalos de confianza
            lower, upper = self.generate_confidence_intervals(predictions_hist, method='adaptive')
            
            historical_pred = pd.DataFrame({
                'yhat': predictions_hist,
                'yhat_lower': lower,
                'yhat_upper': upper
            }, index=aligned_index)
            
            return historical_pred
            
        except Exception as e:
            logging.warning(f"Error en predicciones históricas XGBoost: {e}")
            return self._generate_fallback_historical()
    
    def _generate_fallback_historical(self):
        """Predicciones históricas de respaldo"""
        lower, upper = self.generate_confidence_intervals(self.series.values, method='percentage')
        
        return pd.DataFrame({
            'yhat': self.series.values,
            'yhat_lower': lower,
            'yhat_upper': upper
        }, index=self.series.index)
    
    def generate_future_predictions(self, horizon=12):
        """
        Genera predicciones futuras homologadas para todos los tipos de modelo
        """
        try:
            if self.model_name == 'ARIMA':
                return self._generate_arima_future(horizon)
            elif self.model_name == 'Prophet':
                return self._generate_prophet_future(horizon)
            elif self.model_name == 'Exponential_Smoothing':
                return self._generate_es_future(horizon)
            elif self.model_name == 'XGBoost':
                return self._generate_xgboost_future(horizon)
            else:
                return self._generate_fallback_future(horizon)
                
        except Exception as e:
            logging.error(f"Error generando predicciones futuras para {self.model_name}: {e}")
            return self._generate_fallback_future(horizon)
    
    def _generate_arima_future(self, horizon):
        """Predicciones futuras para ARIMA"""
        try:
            forecast_result = self.model.predict(n_periods=horizon, return_conf_int=True)
            predictions = forecast_result[0]
            conf_int = forecast_result[1]
            
            confidence_lower = conf_int[:, 0]
            confidence_upper = conf_int[:, 1]
            
            return self._create_forecast_dataframe(predictions, confidence_lower, confidence_upper, horizon)
            
        except Exception as e:
            logging.warning(f"Error en predicciones futuras ARIMA: {e}")
            return self._generate_fallback_future(horizon)
    
    def _generate_prophet_future(self, horizon):
        """Predicciones futuras para Prophet"""
        try:
            future = self.model.make_future_dataframe(periods=horizon, freq='MS')
            forecast = self.model.predict(future)
            
            # Tomar solo las predicciones futuras
            predictions = forecast['yhat'].tail(horizon).values
            confidence_lower = forecast['yhat_lower'].tail(horizon).values
            confidence_upper = forecast['yhat_upper'].tail(horizon).values
            
            return self._create_forecast_dataframe(predictions, confidence_lower, confidence_upper, horizon)
            
        except Exception as e:
            logging.warning(f"Error en predicciones futuras Prophet: {e}")
            return self._generate_fallback_future(horizon)
    
    def _generate_es_future(self, horizon):
        """Predicciones futuras para Exponential Smoothing"""
        try:
            predictions = self.model.forecast(horizon)
            if hasattr(predictions, 'values'):
                predictions = predictions.values
            
            # Generar intervalos de confianza estimados
            confidence_lower, confidence_upper = self.generate_confidence_intervals(
                predictions, method='adaptive'
            )
            
            return self._create_forecast_dataframe(predictions, confidence_lower, confidence_upper, horizon)
            
        except Exception as e:
            logging.warning(f"Error en predicciones futuras ES: {e}")
            return self._generate_fallback_future(horizon)
    
    def _generate_xgboost_future(self, horizon):
        """Predicciones futuras para XGBoost"""
        try:
            # Usar el optimizador para generar predicciones
            predictions, _ = self.optimizer.fit_predict(self.series, horizon, **self.params)
            
            if predictions is None:
                return self._generate_fallback_future(horizon)
            
            predictions = np.array(predictions)
            confidence_lower, confidence_upper = self.generate_confidence_intervals(
                predictions, method='adaptive'
            )
            
            return self._create_forecast_dataframe(predictions, confidence_lower, confidence_upper, horizon)
            
        except Exception as e:
            logging.warning(f"Error en predicciones futuras XGBoost: {e}")
            return self._generate_fallback_future(horizon)
    
    def _generate_fallback_future(self, horizon):
        """Predicciones futuras de respaldo usando tendencia simple"""
        try:
            # Calcular tendencia simple
            recent_values = self.series.tail(12).values
            trend = np.polyfit(range(len(recent_values)), recent_values, 1)[0]
            last_value = self.series.iloc[-1]
            
            # Generar predicciones con tendencia
            predictions = [last_value + trend * (i + 1) for i in range(horizon)]
            predictions = np.maximum(predictions, 0)  # No negativos
            
            confidence_lower, confidence_upper = self.generate_confidence_intervals(
                predictions, method='percentage'
            )
            
            return self._create_forecast_dataframe(predictions, confidence_lower, confidence_upper, horizon)
            
        except Exception as e:
            logging.error(f"Error en predicciones de respaldo: {e}")
            return None
    
    def _create_forecast_dataframe(self, predictions, confidence_lower, confidence_upper, horizon):
        """Crea un DataFrame estructurado con las predicciones"""
        try:
            # Asegurar que las predicciones sean no negativas
            predictions = np.maximum(predictions, 0)
            confidence_lower = np.maximum(confidence_lower, 0)
            confidence_upper = np.maximum(confidence_upper, predictions)
            
            # Crear fechas futuras
            start_date = self.series.index[-1] + pd.DateOffset(months=1)
            future_dates = pd.date_range(start=start_date, periods=horizon, freq='MS')
            
            forecast_df = pd.DataFrame({
                'yhat': predictions,
                'yhat_lower': confidence_lower,
                'yhat_upper': confidence_upper
            }, index=future_dates)
            
            return forecast_df
            
        except Exception as e:
            logging.error(f"Error creando DataFrame de predicciones: {e}")
            return None

# =====================
# VISUALIZACIÓN MEJORADA Y CORREGIDA
# =====================
def create_comprehensive_forecast_plot(historical_data, historical_pred, future_forecast, dept, model_name, metrics_row):
    """
    Crea una visualización completa y profesional del pronóstico
    """
    try:
        # Crear figura con subplots reorganizados para evitar problemas con tablas
        fig = make_subplots(
            rows=2, cols=2,
            subplot_titles=[
                f'Pronóstico de Horas Extra - {dept}',
                'Métricas de Calidad del Modelo',
                'Análisis de Residuos (Últimos 12 meses)',
                'Análisis de Tendencia'
            ],
            specs=[
                [{"colspan": 2}, None],
                [{}, {}]
            ],
            row_heights=[0.6, 0.4],
            vertical_spacing=0.1,
            horizontal_spacing=0.1
        )
        
        # GRÁFICO PRINCIPAL DE PRONÓSTICO (Fila 1, completa)
        # Datos históricos reales
        fig.add_trace(
            go.Scatter(
                x=historical_data.index,
                y=historical_data.values,
                mode='lines+markers',
                name='Histórico Real',
                line=dict(color='#1f77b4', width=2),
                marker=dict(size=4),
                hovertemplate='<b>Fecha:</b> %{x}<br><b>Valor Real:</b> %{y:.2f}<extra></extra>'
            ),
            row=1, col=1
        )
        
        # Predicciones históricas (ajuste del modelo)
        if historical_pred is not None and len(historical_pred) > 0:
            fig.add_trace(
                go.Scatter(
                    x=historical_pred.index,
                    y=historical_pred['yhat'],
                    mode='lines',
                    name='Ajuste del Modelo',
                    line=dict(color='#ff7f0e', width=2, dash='dash'),
                    opacity=0.8,
                    hovertemplate='<b>Fecha:</b> %{x}<br><b>Predicción:</b> %{y:.2f}<extra></extra>'
                ),
                row=1, col=1
            )
        
        # Predicciones futuras
        if future_forecast is not None and len(future_forecast) > 0:
            fig.add_trace(
                go.Scatter(
                    x=future_forecast.index,
                    y=future_forecast['yhat'],
                    mode='lines+markers',
                    name='Pronóstico Futuro',
                    line=dict(color='#d62728', width=3),
                    marker=dict(size=6, symbol='diamond'),
                    hovertemplate='<b>Fecha:</b> %{x}<br><b>Pronóstico:</b> %{y:.2f}<extra></extra>'
                ),
                row=1, col=1
            )
            
            # Intervalo de confianza
            fig.add_trace(
                go.Scatter(
                    x=list(future_forecast.index) + list(future_forecast.index[::-1]),
                    y=list(future_forecast['yhat_upper']) + list(future_forecast['yhat_lower'][::-1]),
                    fill='toself',
                    fillcolor='rgba(214, 39, 40, 0.2)',
                    line=dict(color='rgba(255,255,255,0)'),
                    hoverinfo="skip",
                    showlegend=True,
                    name='Intervalo de Confianza 95%'
                ),
                row=1, col=1
            )
        
        # Línea vertical separando histórico de pronóstico
        fig.add_vline(
            x=historical_data.index[-1],
            line_dash="dot",
            line_color="gray",
            row=1, col=1
        )
        
        # GRÁFICO DE MÉTRICAS (Fila 2, Columna 1) - Como barras en lugar de tabla
        metrics_names = ['MAPE (%)', 'SMAPE (%)', 'RMSE', 'MAE']
        metrics_values = [
            metrics_row['MAPE'],
            metrics_row['SMAPE'],
            metrics_row['RMSE'] * 100,  # Escalar RMSE para visualización
            metrics_row['MAE'] * 100   # Escalar MAE para visualización
        ]
        metrics_colors = [
            'green' if metrics_row['MAPE'] <= 10 else 'orange' if metrics_row['MAPE'] <= 20 else 'red',
            'green' if metrics_row['SMAPE'] <= 10 else 'orange' if metrics_row['SMAPE'] <= 20 else 'red',
            'green',
            'green'
        ]
        
        fig.add_trace(
            go.Bar(
                x=metrics_names,
                y=metrics_values,
                marker_color=metrics_colors,
                name='Métricas',
                showlegend=False,
                hovertemplate='<b>%{x}:</b> %{y:.2f}<extra></extra>'
            ),
            row=2, col=1
        )
        
        # ANÁLISIS DE RESIDUOS (Fila 2, Columna 2)
        if historical_pred is not None and len(historical_data) >= 12 and len(historical_pred) >= 12:
            recent_actual = historical_data.tail(12)
            recent_pred = historical_pred.tail(12)['yhat']
            
            # Alinear índices
            common_idx = recent_actual.index.intersection(recent_pred.index)
            if len(common_idx) > 0:
                residuals = recent_actual.loc[common_idx] - recent_pred.loc[common_idx]
                
                fig.add_trace(
                    go.Scatter(
                        x=common_idx,
                        y=residuals,
                        mode='markers+lines',
                        name='Residuos',
                        line=dict(color='purple'),
                        marker=dict(size=6),
                        showlegend=False,
                        hovertemplate='<b>Fecha:</b> %{x}<br><b>Residuo:</b> %{y:.2f}<extra></extra>'
                    ),
                    row=2, col=2
                )
                
                # Línea de referencia en cero
                fig.add_hline(y=0, line_dash="dash", line_color="gray", row=2, col=2)
        
        # Configurar layout
        fig.update_layout(
            height=800,
            title=dict(
                text=f"Análisis Completo de Pronóstico - {dept} (Modelo: {model_name})",
                x=0.5,
                font=dict(size=16)
            ),
            showlegend=True,
            template="plotly_white",
            font=dict(size=10)
        )
        
        # Configurar ejes
        fig.update_xaxes(title_text="Fecha", row=1, col=1)
        fig.update_yaxes(title_text="Horas Extra", row=1, col=1)
        fig.update_xaxes(title_text="Métricas", row=2, col=1)
        fig.update_yaxes(title_text="Valor", row=2, col=1)
        fig.update_xaxes(title_text="Fecha", row=2, col=2)
        fig.update_yaxes(title_text="Residuos", row=2, col=2)
        
        # Agregar anotaciones con información clave
        # Calcular tendencia si hay predicciones futuras
        if future_forecast is not None and len(future_forecast) > 1:
            trend_values = future_forecast['yhat'].values
            trend_change = ((trend_values[-1] - trend_values[0]) / trend_values[0]) * 100 if trend_values[0] > 0 else 0
            trend_direction = "📈 Ascendente" if trend_change > 2 else "📉 Descendente" if trend_change < -2 else "➡️ Estable"
            
            fig.add_annotation(
                x=0.02, y=0.98,
                xref="paper", yref="paper",
                text=f"<b>Resumen:</b><br>Modelo: {model_name}<br>Calidad: {'🟢 Excelente' if metrics_row['Meets_Target'] else '🟡 Aceptable'}<br>Tendencia: {trend_direction}<br>Cambio: {trend_change:+.1f}%",
                showarrow=False,
                align="left",
                bgcolor="rgba(255,255,255,0.8)",
                bordercolor="gray",
                borderwidth=1,
                font=dict(size=10)
            )
        
        fig.show()
        return fig
        
    except Exception as e:
        logging.error(f"Error creando visualización para {dept}: {e}")
        # Crear gráfico simple de respaldo
        return create_simple_forecast_plot(historical_data, future_forecast, dept, model_name)

def create_simple_forecast_plot(historical_data, future_forecast, dept, model_name):
    """
    Crea una visualización simple como respaldo
    """
    try:
        fig = go.Figure()
        
        # Datos históricos
        fig.add_trace(
            go.Scatter(
                x=historical_data.index,
                y=historical_data.values,
                mode='lines+markers',
                name='Histórico Real',
                line=dict(color='blue', width=2)
            )
        )
        
        # Predicciones futuras
        if future_forecast is not None and len(future_forecast) > 0:
            fig.add_trace(
                go.Scatter(
                    x=future_forecast.index,
                    y=future_forecast['yhat'],
                    mode='lines+markers',
                    name='Pronóstico Futuro',
                    line=dict(color='red', width=2)
                )
            )
            
            # Intervalo de confianza
            fig.add_trace(
                go.Scatter(
                    x=list(future_forecast.index) + list(future_forecast.index[::-1]),
                    y=list(future_forecast['yhat_upper']) + list(future_forecast['yhat_lower'][::-1]),
                    fill='toself',
                    fillcolor='rgba(255, 0, 0, 0.2)',
                    line=dict(color='rgba(255,255,255,0)'),
                    name='Intervalo de Confianza'
                )
            )
        
        fig.update_layout(
            title=f"Pronóstico de Horas Extra - {dept} (Modelo: {model_name})",
            xaxis_title="Fecha",
            yaxis_title="Horas Extra",
            template="plotly_white"
        )
        
        fig.show()
        return fig
        
    except Exception as e:
        logging.error(f"Error creando gráfico simple para {dept}: {e}")
        return None

# =====================
# RESUMEN EJECUTIVO MEJORADO
# =====================
def generate_executive_summary(all_models, all_metrics, all_forecasts):
    """
    Genera un resumen ejecutivo completo con análisis detallado
    """
    try:
        summary_data = []
        
        for dept in all_models.keys():
            if dept not in all_metrics or dept not in all_forecasts:
                continue
                
            # Obtener información del mejor modelo
            metrics_df = all_metrics[dept]
            best_model_row = metrics_df.iloc[0]
            
            forecast_info = all_forecasts[dept]
            future_forecast = forecast_info['future']
            
            if future_forecast is None or len(future_forecast) == 0:
                continue
            
            predictions = future_forecast['yhat'].values
            
            # Calcular estadísticas de las predicciones
            pred_mean = np.mean(predictions)
            pred_trend = "📈" if predictions[-1] > predictions[0] else "📉" if predictions[-1] < predictions[0] else "➡️"
            trend_change = ((predictions[-1] - predictions[0]) / predictions[0]) * 100 if predictions[0] > 0 else 0
            
            # Calcular variabilidad
            ci_width = (future_forecast['yhat_upper'] - future_forecast['yhat_lower']).mean()
            uncertainty_pct = (ci_width / pred_mean) * 100 if pred_mean > 0 else 0
            
            summary_data.append({
                'Departamento': dept,
                'Mejor_Modelo': best_model_row['Model'],
                'MAPE (%)': f"{best_model_row['MAPE']:.2f}",
                'SMAPE (%)': f"{best_model_row['SMAPE']:.2f}",
                'Score_Compuesto': f"{best_model_row['Composite_Score']:.4f}",
                'Calidad': '🟢 Excelente' if best_model_row['Meets_Target'] else '🟡 Aceptable',
                'Predicción_Promedio': f"{pred_mean:.1f}",
                'Tendencia': f"{pred_trend} {trend_change:+.1f}%",
                'Primer_Mes': f"{predictions[0]:.1f}",
                'Último_Mes': f"{predictions[-1]:.1f}",
                'Incertidumbre (%)': f"{uncertainty_pct:.1f}"
            })
        
        summary_df = pd.DataFrame(summary_data)
        
        print("\n" + "="*100)
        print("📊 RESUMEN EJECUTIVO DE PREDICCIONES DE HORAS EXTRA")
        print("="*100)
        print(summary_df.to_string(index=False))
        
        # Estadísticas generales
        total_depts = len(summary_df)
        excellent_models = len([x for x in summary_data if '🟢' in x['Calidad']])
        avg_mape = np.mean([float(x['MAPE (%)']) for x in summary_data])
        avg_smape = np.mean([float(x['SMAPE (%)']) for x in summary_data])
        
        print(f"\n📈 ESTADÍSTICAS GENERALES:")
        print(f"   • Total de departamentos analizados: {total_depts}")
        print(f"   • Modelos con calidad excelente: {excellent_models}/{total_depts} ({(excellent_models/total_depts)*100:.1f}%)")
        print(f"   • MAPE promedio: {avg_mape:.2f}%")
        print(f"   • SMAPE promedio: {avg_smape:.2f}%")
        
        # Distribución de modelos
        model_distribution = summary_df['Mejor_Modelo'].value_counts()
        print(f"\n🏆 DISTRIBUCIÓN DE MEJORES MODELOS:")
        for model, count in model_distribution.items():
            print(f"   • {model}: {count} departamentos ({(count/total_depts)*100:.1f}%)")
        
        return summary_df
        
    except Exception as e:
        logging.error(f"Error generando resumen ejecutivo: {e}")
        return pd.DataFrame()

# =====================
# EJECUCIÓN PRINCIPAL
# =====================
def run_unified_prediction_generation(all_models, all_metrics, horizon=12):
    """
    Ejecuta la generación unificada de predicciones para todos los departamentos
    """
    
    print("\n" + "="*80)
    print("🔮 GENERACIÓN UNIFICADA DE PREDICCIONES")
    print("="*80)
    
    all_forecasts = {}
    
    for dept in all_models.keys():
        print(f"\n🔍 Procesando {dept}...")
        
        try:
            # Obtener datos del departamento
            dept_series = (
                df_all_data[df_all_data['department'] == dept]
                .set_index('ds')['y']
                .asfreq('MS')
                .interpolate(method='linear')
            )
            
            if len(dept_series) < 24:
                logging.warning(f"Datos insuficientes para {dept}: {len(dept_series)} meses")
                continue
            
            # Obtener modelo y métricas
            model_info = all_models[dept]
            metrics_df = all_metrics[dept]
            best_model_name = metrics_df.iloc[0]['Model']
            best_metrics_row = metrics_df.iloc[0]
            
            # Crear predictor unificado
            predictor = UnifiedPredictor(model_info, best_model_name, dept_series)
            
            # Generar predicciones históricas
            historical_pred = predictor.generate_historical_predictions()
            
            # Generar predicciones futuras
            future_forecast = predictor.generate_future_predictions(horizon)
            
            if future_forecast is None:
                logging.error(f"No se pudieron generar predicciones para {dept}")
                continue
            
            # Almacenar resultados
            all_forecasts[dept] = {
                'historical_pred': historical_pred,
                'future': future_forecast,
                'model_name': best_model_name,
                'metrics': best_metrics_row,
                'predictor': predictor
            }
            
            # Crear visualización
            fig = create_comprehensive_forecast_plot(
                dept_series, historical_pred, future_forecast, 
                dept, best_model_name, best_metrics_row
            )
            
            print(f"   ✅ Predicciones generadas exitosamente")
            print(f"   📊 Modelo seleccionado: {best_model_name}")
            print(f"   📈 Próximos 3 meses: {future_forecast['yhat'].head(3).round(2).tolist()}")
            print(f"   🎯 MAPE: {best_metrics_row['MAPE']:.2f}%, SMAPE: {best_metrics_row['SMAPE']:.2f}%")
            
        except Exception as e:
            logging.error(f"Error procesando {dept}: {e}")
            continue
    
    # Generar resumen ejecutivo
    summary_df = generate_executive_summary(all_models, all_metrics, all_forecasts)
    
    return all_forecasts, summary_df

# =====================
# FUNCIÓN AUXILIAR PARA MOSTRAR RESULTADOS
# =====================
def display_forecast_summary(all_forecasts):
    """
    Muestra un resumen tabular de todas las predicciones
    """
    try:
        summary_data = []
        
        for dept, forecast_info in all_forecasts.items():
            future_forecast = forecast_info['future']
            model_name = forecast_info['model_name']
            metrics = forecast_info['metrics']
            
            if future_forecast is not None and len(future_forecast) >= 6:
                # Tomar los primeros 6 meses
                first_6_months = future_forecast.head(6)
                
                summary_data.append({
                    'Departamento': dept,
                    'Modelo': model_name,
                    'MAPE': f"{metrics['MAPE']:.2f}%",
                    'Mes_1': f"{first_6_months.iloc[0]['yhat']:.1f}",
                    'Mes_2': f"{first_6_months.iloc[1]['yhat']:.1f}",
                    'Mes_3': f"{first_6_months.iloc[2]['yhat']:.1f}",
                    'Mes_4': f"{first_6_months.iloc[3]['yhat']:.1f}",
                    'Mes_5': f"{first_6_months.iloc[4]['yhat']:.1f}",
                    'Mes_6': f"{first_6_months.iloc[5]['yhat']:.1f}",
                    'Promedio_6M': f"{first_6_months['yhat'].mean():.1f}",
                    'Tendencia': '📈' if first_6_months.iloc[-1]['yhat'] > first_6_months.iloc[0]['yhat'] else '📉'
                })
        
        if summary_data:
            summary_df = pd.DataFrame(summary_data)
            print("\n" + "="*120)
            print("📋 RESUMEN DE PREDICCIONES POR DEPARTAMENTO (Próximos 6 meses)")
            print("="*120)
            print(summary_df.to_string(index=False))
            print("\n📅 Nota: Los valores corresponden a horas extra predichas por mes")
            
            return summary_df
        else:
            print("⚠️ No hay datos de predicciones para mostrar")
            return pd.DataFrame()
            
    except Exception as e:
        logging.error(f"Error mostrando resumen de predicciones: {e}")
        return pd.DataFrame()

# =====================
# FUNCIÓN PARA CREAR GRÁFICO CONSOLIDADO
# =====================
def create_consolidated_forecast_plot(all_forecasts, top_n=6):
    """
    Crea un gráfico consolidado con los departamentos más relevantes
    """
    try:
        # Seleccionar los top departamentos por volumen promedio predicho
        dept_volumes = {}
        for dept, forecast_info in all_forecasts.items():
            future_forecast = forecast_info['future']
            if future_forecast is not None and len(future_forecast) > 0:
                dept_volumes[dept] = future_forecast['yhat'].mean()
        
        # Ordenar y tomar los top N
        top_depts = sorted(dept_volumes.items(), key=lambda x: x[1], reverse=True)[:top_n]
        
        if not top_depts:
            print("⚠️ No hay datos suficientes para crear gráfico consolidado")
            return None
        
        # Crear subplots
        fig = make_subplots(
            rows=2, cols=3,
            subplot_titles=[dept[0] for dept in top_depts],
            vertical_spacing=0.12,
            horizontal_spacing=0.08
        )
        
        colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b']
        
        for idx, (dept, _) in enumerate(top_depts):
            row = (idx // 3) + 1
            col = (idx % 3) + 1
            
            forecast_info = all_forecasts[dept]
            future_forecast = forecast_info['future']
            
            if future_forecast is not None:
                # Predicciones futuras
                fig.add_trace(
                    go.Scatter(
                        x=future_forecast.index,
                        y=future_forecast['yhat'],
                        mode='lines+markers',
                        name=f'{dept}',
                        line=dict(color=colors[idx % len(colors)], width=2),
                        showlegend=False,
                        hovertemplate=f'<b>{dept}</b><br>Fecha: %{{x}}<br>Predicción: %{{y:.1f}}<extra></extra>'
                    ),
                    row=row, col=col
                )
                
                # Intervalo de confianza
                fig.add_trace(
                    go.Scatter(
                        x=list(future_forecast.index) + list(future_forecast.index[::-1]),
                        y=list(future_forecast['yhat_upper']) + list(future_forecast['yhat_lower'][::-1]),
                        fill='toself',
                        fillcolor=f'rgba{colors[idx % len(colors)][3:-1]}, 0.2)',
                        line=dict(color='rgba(255,255,255,0)'),
                        hoverinfo="skip",
                        showlegend=False
                    ),
                    row=row, col=col
                )
        
        fig.update_layout(
            title=dict(
                text="🔮 Pronósticos Consolidados - Top Departamentos por Volumen",
                x=0.5,
                font=dict(size=16)
            ),
            height=600,
            template="plotly_white"
        )
        
        # Actualizar ejes
        for i in range(1, 3):
            for j in range(1, 4):
                fig.update_xaxes(title_text="Fecha", row=i, col=j)
                fig.update_yaxes(title_text="Horas Extra", row=i, col=j)
        
        fig.show()
        return fig
        
    except Exception as e:
        logging.error(f"Error creando gráfico consolidado: {e}")
        return None

# Ejecutar generación unificada de predicciones
print("🚀 Iniciando generación de predicciones...")
all_forecasts, predictions_summary = run_unified_prediction_generation(all_models, all_metrics, horizon=12)

# Mostrar resumen tabular
forecast_summary_df = display_forecast_summary(all_forecasts)

# Crear gráfico consolidado
print("\n🎨 Creando visualización consolidada...")
consolidated_fig = create_consolidated_forecast_plot(all_forecasts, top_n=6)

print(f"\n✅ Proceso completado exitosamente!")
print(f"   📊 Departamentos procesados: {len(all_forecasts)}")
print(f"   🎯 Predicciones generadas para los próximos 12 meses")
print(f"   📈 Visualizaciones creadas para análisis detallado")

🚀 Iniciando generación de predicciones...

🔮 GENERACIÓN UNIFICADA DE PREDICCIONES

🔍 Procesando Finance...


   ✅ Predicciones generadas exitosamente
   📊 Modelo seleccionado: Exponential_Smoothing
   📈 Próximos 3 meses: [122.94, 241.99, 128.14]
   🎯 MAPE: 31.09%, SMAPE: 31.01%

🔍 Procesando HR...


   ✅ Predicciones generadas exitosamente
   📊 Modelo seleccionado: Exponential_Smoothing
   📈 Próximos 3 meses: [139.73, 497.66, 117.63]
   🎯 MAPE: 25.83%, SMAPE: 25.56%

🔍 Procesando IT...


   ✅ Predicciones generadas exitosamente
   📊 Modelo seleccionado: Exponential_Smoothing
   📈 Próximos 3 meses: [243.12, 648.97, 253.7]
   🎯 MAPE: 12.29%, SMAPE: 12.74%

🔍 Procesando Inventory...


   ✅ Predicciones generadas exitosamente
   📊 Modelo seleccionado: Prophet
   📈 Próximos 3 meses: [282.36, 922.02, 273.9]
   🎯 MAPE: 27.51%, SMAPE: 36.47%

🔍 Procesando Marketing...


   ✅ Predicciones generadas exitosamente
   📊 Modelo seleccionado: Exponential_Smoothing
   📈 Próximos 3 meses: [126.48, 448.82, 124.19]
   🎯 MAPE: 34.43%, SMAPE: 28.91%

🔍 Procesando Sales...


   ✅ Predicciones generadas exitosamente
   📊 Modelo seleccionado: Exponential_Smoothing
   📈 Próximos 3 meses: [1755.66, 5762.96, 1812.5]
   🎯 MAPE: 18.91%, SMAPE: 16.64%

📊 RESUMEN EJECUTIVO DE PREDICCIONES DE HORAS EXTRA
Departamento          Mejor_Modelo MAPE (%) SMAPE (%) Score_Compuesto     Calidad Predicción_Promedio Tendencia Primer_Mes Último_Mes Incertidumbre (%)
     Finance Exponential_Smoothing    31.09     31.01          0.9096 🟡 Aceptable               141.1  📉 -14.1%      122.9      105.5             129.4
          HR Exponential_Smoothing    25.83     25.56          0.8305 🟡 Aceptable               200.5   📉 -4.7%      139.7      133.1              87.0
          IT Exponential_Smoothing    12.29     12.74          0.4810 🟡 Aceptable               343.4  📈 +29.8%      243.1      315.6              73.9
   Inventory               Prophet    27.51     36.47          0.6863 🟡 Aceptable               513.8  📈 +27.4%      282.4      359.6               7.7
   Marketing Exp

2025-08-14 17:17:55,894 - ERROR - Error creando gráfico consolidado: 
    Invalid value of type 'builtins.str' received for the 'fillcolor' property of scatter
        Received value: 'rgba77b, 0.2)'

    The 'fillcolor' property is a color and may be specified as:
      - A hex string (e.g. '#ff0000')
      - An rgb/rgba string (e.g. 'rgb(255,0,0)')
      - An hsl/hsla string (e.g. 'hsl(0,100%,50%)')
      - An hsv/hsva string (e.g. 'hsv(0,100%,100%)')
      - A named CSS color: see https://plotly.com/python/css-colors/ for a list



✅ Proceso completado exitosamente!
   📊 Departamentos procesados: 6
   🎯 Predicciones generadas para los próximos 12 meses
   📈 Visualizaciones creadas para análisis detallado


## 6. Guardar Resultados

Guardamos los modelos, predicciones y métricas en SQL Server. La tabla Overtime_Predictions incluye datos históricos y predicciones con una columna data_type para distinguir entre "Histórico" y "Forecast".


In [62]:
## 6. Guardado de Predicciones y Métricas en SQL Server (CORREGIDO)

import pandas as pd
import numpy as np
import logging
from datetime import datetime
import joblib
import os
import warnings
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from pmdarima import auto_arima
from xgboost import XGBRegressor

warnings.filterwarnings("ignore")

# =====================
# FUNCIÓN MEJORADA PARA GUARDAR PREDICCIONES CONSISTENTES
# =====================
def save_consistent_predictions_and_metrics():
    """
    Guarda las predicciones y métricas ya calculadas en el módulo 5
    manteniendo consistencia total con lo mostrado en las visualizaciones
    """
    
    print("\n" + "="*80)
    print("💾 GUARDANDO PREDICCIONES Y MÉTRICAS CONSISTENTES EN SQL SERVER")
    print("="*80)
    
    # Verificar que tenemos los datos necesarios
    if not all_forecasts:
        logging.error("No hay predicciones disponibles para guardar")
        print("❌ Error: No hay predicciones disponibles")
        return None, None
    
    if not all_metrics:
        logging.error("No hay métricas disponibles para guardar")
        print("❌ Error: No hay métricas disponibles")
        return None, None
    
    # Crear directorio para modelos
    os.makedirs('Modelos Entrenados', exist_ok=True)
    timestamp = datetime.now()
    
    predictions_summary = []
    metrics_summary = []
    models_saved = []
    errors = []
    
    print(f"🕐 Timestamp de ejecución: {timestamp}")
    print(f"📊 Procesando {len(all_forecasts)} departamentos...")
    
    # =====================
    # PROCESAMIENTO POR DEPARTAMENTO
    # =====================
    for dept in all_forecasts.keys():
        try:
            print(f"\n🔍 Procesando {dept}...")
            
            # Obtener datos del departamento
            forecast_info = all_forecasts[dept]
            metrics_info = all_metrics[dept]
            
            # Obtener el mejor modelo y sus métricas (ya calculadas en módulo 5)
            best_model_row = metrics_info.iloc[0]  # El primer registro es el mejor
            best_model_name = best_model_row['Model']
            
            print(f"   🏆 Mejor modelo: {best_model_name}")
            print(f"   📈 MAPE: {best_model_row['MAPE']:.2f}%, SMAPE: {best_model_row['SMAPE']:.2f}%")
            
            # =====================
            # GUARDAR MODELO FÍSICO
            # =====================
            try:
                model_saved = save_physical_model(dept, best_model_name)
                if model_saved:
                    models_saved.append(f"{dept}_{best_model_name}")
                    print(f"   💾 Modelo guardado exitosamente")
                else:
                    print(f"   ⚠️ Modelo no guardado (Prophet o error)")
            except Exception as e:
                logging.warning(f"Error guardando modelo físico para {dept}: {e}")
                print(f"   ⚠️ Error guardando modelo: {e}")
            
            # =====================
            # PREPARAR MÉTRICAS (usando las ya calculadas)
            # =====================
            metrics_row = {
                'timestamp': timestamp,
                'department': dept,
                'rmse': best_model_row['RMSE'] if pd.notna(best_model_row['RMSE']) else None,
                'mae': best_model_row['MAE'] if pd.notna(best_model_row['MAE']) else None,
                'smape': best_model_row['SMAPE'] if pd.notna(best_model_row['SMAPE']) else None,
                'mape': best_model_row['MAPE'] if pd.notna(best_model_row['MAPE']) else None,
                'mase': best_model_row['MASE'] if pd.notna(best_model_row['MASE']) else None,
                'model_quality': 'Excelente' if best_model_row['Meets_Target'] else 'Aceptable',
                'model_type': best_model_name,
                'composite_score': best_model_row['Composite_Score'] if pd.notna(best_model_row['Composite_Score']) else None
            }
            
            metrics_summary.append(metrics_row)
            print(f"   ✅ Métricas preparadas")
            
            # =====================
            # PREPARAR PREDICCIONES (usando las ya calculadas)
            # =====================
            future_forecast = forecast_info['future']
            
            if future_forecast is None or len(future_forecast) == 0:
                logging.warning(f"No hay predicciones futuras para {dept}")
                print(f"   ⚠️ No hay predicciones futuras disponibles")
                continue
            
            # Validar y limpiar índice de fechas
            future_forecast_clean = future_forecast.copy()
            
            # Asegurar que el índice sea datetime
            if not pd.api.types.is_datetime64_any_dtype(future_forecast_clean.index):
                logging.warning(f"Índice no es datetime para {dept}, reconstruyendo...")
                # Obtener datos originales para calcular próxima fecha
                dept_data = df_all_data[df_all_data['department'] == dept].set_index('ds')['y']
                last_date = dept_data.index[-1]
                new_index = pd.date_range(
                    start=last_date + pd.DateOffset(months=1), 
                    periods=len(future_forecast_clean), 
                    freq='MS'
                )
                future_forecast_clean.index = new_index
            
            # Procesar cada predicción
            predictions_count = 0
            for date_idx, row in future_forecast_clean.iterrows():
                try:
                    prediction_date = date_idx.date() if hasattr(date_idx, 'date') else date_idx
                    
                    prediction_row = {
                        'timestamp': timestamp,
                        'department': dept,
                        'prediction_date': prediction_date,
                        'predicted_value': float(row['yhat']) if pd.notna(row['yhat']) else None,
                        'confidence_lower': float(row['yhat_lower']) if pd.notna(row['yhat_lower']) else None,
                        'confidence_upper': float(row['yhat_upper']) if pd.notna(row['yhat_upper']) else None,
                        'model_used': best_model_name
                    }
                    
                    # Validar que los valores sean finitos
                    for key in ['predicted_value', 'confidence_lower', 'confidence_upper']:
                        if prediction_row[key] is not None and not np.isfinite(prediction_row[key]):
                            prediction_row[key] = None
                    
                    predictions_summary.append(prediction_row)
                    predictions_count += 1
                    
                except Exception as e:
                    logging.error(f"Error procesando predicción para {dept} en fecha {date_idx}: {e}")
                    continue
            
            print(f"   ✅ {predictions_count} predicciones preparadas")
            
        except Exception as e:
            logging.error(f"Error procesando departamento {dept}: {e}")
            errors.append(dept)
            print(f"   ❌ Error procesando {dept}: {e}")
            continue
    
    # =====================
    # CREAR DATAFRAMES
    # =====================
    print(f"\n📋 Creando DataFrames...")
    predictions_df = pd.DataFrame(predictions_summary)
    metrics_df = pd.DataFrame(metrics_summary)
    
    print(f"   📊 Predicciones: {len(predictions_df)} registros")
    print(f"   📈 Métricas: {len(metrics_df)} registros")
    print(f"   💾 Modelos guardados: {len(models_saved)}")
    
    if errors:
        print(f"   ⚠️ Errores en: {', '.join(errors)}")
    
    # =====================
    # GUARDAR EN BASE DE DATOS
    # =====================
    try:
        print(f"\n🔌 Conectando a SQL Server...")
        conn = get_db_connection()
        cursor = conn.cursor()
        
        # Crear/actualizar tablas
        create_enhanced_tables(cursor)
        
        # Insertar datos
        inserted_predictions, inserted_metrics = insert_data_to_db(
            cursor, predictions_df, metrics_df
        )
        
        conn.commit()
        print(f"✅ Datos guardados exitosamente en SQL Server")
        print(f"   📊 Predicciones insertadas: {inserted_predictions}")
        print(f"   📈 Métricas insertadas: {inserted_metrics}")
        
        # Mostrar resumen
        display_save_summary(predictions_df, metrics_df)
        
        return predictions_df, metrics_df
        
    except Exception as e:
        conn.rollback()
        logging.error(f"Error guardando en base de datos: {e}")
        print(f"❌ Error guardando en SQL Server: {e}")
        raise
    finally:
        conn.close()

def save_physical_model(dept, model_name):
    """
    Guarda el modelo físico entrenado
    """
    try:
        # Obtener datos del departamento
        dept_data = df_all_data[df_all_data['department'] == dept].set_index('ds')['y']
        dept_data_clean = dept_data.fillna(dept_data.median()).interpolate(method='linear')
        
        model_path = f'Modelos Entrenados/overtime_forecast_model_{dept}_{model_name}.pkl'
        
        if model_name == 'ARIMA':
            # Re-entrenar ARIMA para guardar
            model = auto_arima(
                dept_data_clean, 
                seasonal=False, 
                stepwise=True, 
                suppress_warnings=True, 
                error_action='ignore'
            )
            joblib.dump(model, model_path)
            return True
            
        elif model_name == 'Exponential_Smoothing':
            # Re-entrenar Exponential Smoothing
            model = ExponentialSmoothing(
                dept_data_clean, 
                trend='add', 
                seasonal='add', 
                seasonal_periods=12
            ).fit()
            joblib.dump(model, model_path)
            return True
            
        elif model_name == 'XGBoost':
            # Re-entrenar XGBoost
            def create_lag_features_simple(series, n_lags=6):
                df = pd.DataFrame({'y': series})
                for i in range(1, n_lags + 1):
                    df[f'lag_{i}'] = df['y'].shift(i)
                df['month'] = series.index.month
                df['quarter'] = series.index.quarter
                return df.dropna()
            
            df_features = create_lag_features_simple(dept_data_clean)
            X = df_features.drop('y', axis=1).values
            y = df_features['y'].values
            
            model = XGBRegressor(n_estimators=100, random_state=42)
            model.fit(X, y)
            joblib.dump(model, model_path)
            return True
            
        elif model_name == 'Prophet':
            # Prophet no se guarda por problemas de serialización
            logging.info(f"Prophet no se guarda físicamente para {dept}")
            return False
            
        else:
            logging.warning(f"Tipo de modelo desconocido: {model_name}")
            return False
            
    except Exception as e:
        logging.error(f"Error guardando modelo físico {model_name} para {dept}: {e}")
        return False

def create_enhanced_tables(cursor):
    """
    Crea o actualiza las tablas en SQL Server con estructura mejorada
    """
    try:
        print("   🏗️ Creando/actualizando tablas...")
        
        # Tabla de predicciones mejorada
        cursor.execute("""
            IF NOT EXISTS (SELECT * FROM sys.tables WHERE name = 'Overtime_Predictions')
            BEGIN
                CREATE TABLE Overtime_Predictions (
                    id INT IDENTITY(1,1) PRIMARY KEY,
                    timestamp DATETIME NOT NULL,
                    department VARCHAR(100) NOT NULL,
                    prediction_date DATE NOT NULL,
                    predicted_value FLOAT,
                    confidence_lower FLOAT,
                    confidence_upper FLOAT,
                    model_used VARCHAR(50),
                    created_at DATETIME DEFAULT GETDATE()
                )
                
                CREATE INDEX IX_Overtime_Predictions_Dept_Date 
                ON Overtime_Predictions (department, prediction_date)
            END
        """)
        
        # Tabla de métricas mejorada
        cursor.execute("""
            IF NOT EXISTS (SELECT * FROM sys.tables WHERE name = 'ML_Model_Metrics_Overtime_Predictions')
            BEGIN
                CREATE TABLE ML_Model_Metrics_Overtime_Predictions (
                    id INT IDENTITY(1,1) PRIMARY KEY,
                    timestamp DATETIME NOT NULL,
                    department VARCHAR(100) NOT NULL,
                    rmse FLOAT,
                    mae FLOAT,
                    smape FLOAT,
                    mape FLOAT,
                    mase FLOAT,
                    model_quality VARCHAR(50),
                    model_type VARCHAR(50),
                    composite_score FLOAT,
                    created_at DATETIME DEFAULT GETDATE()
                )
                
                CREATE INDEX IX_ML_Metrics_Dept 
                ON ML_Model_Metrics_Overtime_Predictions (department)
            END
        """)
        
        # Verificar y agregar columnas faltantes
        cursor.execute("""
            IF EXISTS (SELECT * FROM sys.tables WHERE name = 'Overtime_Predictions')
            BEGIN
                IF NOT EXISTS (SELECT * FROM sys.columns 
                               WHERE object_id = OBJECT_ID('Overtime_Predictions') 
                               AND name = 'model_used')
                BEGIN
                    ALTER TABLE Overtime_Predictions
                    ADD model_used VARCHAR(50)
                END
                
                IF NOT EXISTS (SELECT * FROM sys.columns 
                               WHERE object_id = OBJECT_ID('Overtime_Predictions') 
                               AND name = 'created_at')
                BEGIN
                    ALTER TABLE Overtime_Predictions
                    ADD created_at DATETIME DEFAULT GETDATE()
                END
            END
        """)
        
        cursor.execute("""
            IF EXISTS (SELECT * FROM sys.tables WHERE name = 'ML_Model_Metrics_Overtime_Predictions')
            BEGIN
                IF NOT EXISTS (SELECT * FROM sys.columns 
                               WHERE object_id = OBJECT_ID('ML_Model_Metrics_Overtime_Predictions') 
                               AND name = 'composite_score')
                BEGIN
                    ALTER TABLE ML_Model_Metrics_Overtime_Predictions
                    ADD composite_score FLOAT
                END
                
                IF NOT EXISTS (SELECT * FROM sys.columns 
                               WHERE object_id = OBJECT_ID('ML_Model_Metrics_Overtime_Predictions') 
                               AND name = 'created_at')
                BEGIN
                    ALTER TABLE ML_Model_Metrics_Overtime_Predictions
                    ADD created_at DATETIME DEFAULT GETDATE()
                END
                
                IF NOT EXISTS (SELECT * FROM sys.columns 
                               WHERE object_id = OBJECT_ID('ML_Model_Metrics_Overtime_Predictions') 
                               AND name = 'mape')
                BEGIN
                    ALTER TABLE ML_Model_Metrics_Overtime_Predictions
                    ADD mape FLOAT
                END
                
                IF NOT EXISTS (SELECT * FROM sys.columns 
                               WHERE object_id = OBJECT_ID('ML_Model_Metrics_Overtime_Predictions') 
                               AND name = 'model_type')
                BEGIN
                    ALTER TABLE ML_Model_Metrics_Overtime_Predictions
                    ADD model_type VARCHAR(50)
                END
            END
        """)
        
        print("   ✅ Tablas creadas/actualizadas exitosamente")
        
    except Exception as e:
        logging.error(f"Error creando tablas: {e}")
        print(f"   ❌ Error creando tablas: {e}")
        raise

def insert_data_to_db(cursor, predictions_df, metrics_df):
    """
    Inserta los datos en las tablas de SQL Server
    """
    inserted_predictions = 0
    inserted_metrics = 0
    
    try:
        # Insertar predicciones
        if not predictions_df.empty:
            print(f"   📊 Insertando {len(predictions_df)} predicciones...")
            for _, row in predictions_df.iterrows():
                try:
                    cursor.execute("""
                        INSERT INTO Overtime_Predictions 
                        (timestamp, department, prediction_date, predicted_value, 
                         confidence_lower, confidence_upper, model_used)
                        VALUES (%s, %s, %s, %s, %s, %s, %s)
                    """, (
                        row['timestamp'],
                        row['department'],
                        row['prediction_date'],
                        row['predicted_value'],
                        row['confidence_lower'],
                        row['confidence_upper'],
                        row['model_used']
                    ))
                    inserted_predictions += 1
                except Exception as e:
                    logging.error(f"Error insertando predicción para {row['department']}: {e}")
                    continue
        
        # Insertar métricas
        if not metrics_df.empty:
            print(f"   📈 Insertando {len(metrics_df)} métricas...")
            for _, row in metrics_df.iterrows():
                try:
                    cursor.execute("""
                        INSERT INTO ML_Model_Metrics_Overtime_Predictions 
                        (timestamp, department, rmse, mae, smape, mape, mase, 
                         model_quality, model_type, composite_score)
                        VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
                    """, (
                        row['timestamp'],
                        row['department'],
                        row['rmse'],
                        row['mae'],
                        row['smape'],
                        row['mape'],
                        row['mase'],
                        row['model_quality'],
                        row['model_type'],
                        row['composite_score']
                    ))
                    inserted_metrics += 1
                except Exception as e:
                    logging.error(f"Error insertando métricas para {row['department']}: {e}")
                    continue
        
        return inserted_predictions, inserted_metrics
        
    except Exception as e:
        logging.error(f"Error insertando datos: {e}")
        raise

def display_save_summary(predictions_df, metrics_df):
    """
    Muestra un resumen de los datos guardados
    """
    print(f"\n" + "="*80)
    print(f"📋 RESUMEN DE DATOS GUARDADOS")
    print(f"="*80)
    
    if not metrics_df.empty:
        print(f"\n🏆 MÉTRICAS DE MODELOS SELECCIONADOS:")
        display_metrics = metrics_df[[
            'department', 'model_type', 'mape', 'smape', 
            'rmse', 'mae', 'model_quality', 'composite_score'
        ]].copy()
        display_metrics.columns = [
            'Departamento', 'Modelo', 'MAPE(%)', 'SMAPE(%)', 
            'RMSE', 'MAE', 'Calidad', 'Score'
        ]
        print(display_metrics.to_string(index=False, float_format='%.3f'))
    
    if not predictions_df.empty:
        print(f"\n📊 MUESTRA DE PREDICCIONES (Primeros 3 meses por departamento):")
        sample_predictions = []
        for dept in predictions_df['department'].unique()[:5]:  # Mostrar solo 5 departamentos
            dept_preds = predictions_df[predictions_df['department'] == dept].head(3)
            for _, row in dept_preds.iterrows():
                sample_predictions.append({
                    'Departamento': row['department'],
                    'Fecha': row['prediction_date'],
                    'Predicción': f"{row['predicted_value']:.1f}" if pd.notna(row['predicted_value']) else 'N/A',
                    'IC_Inferior': f"{row['confidence_lower']:.1f}" if pd.notna(row['confidence_lower']) else 'N/A',
                    'IC_Superior': f"{row['confidence_upper']:.1f}" if pd.notna(row['confidence_upper']) else 'N/A',
                    'Modelo': row['model_used']
                })
        
        if sample_predictions:
            sample_df = pd.DataFrame(sample_predictions)
            print(sample_df.to_string(index=False))
    
    # Estadísticas generales
    total_depts = len(predictions_df['department'].unique()) if not predictions_df.empty else 0
    total_predictions = len(predictions_df) if not predictions_df.empty else 0
    avg_mape = metrics_df['mape'].mean() if not metrics_df.empty and 'mape' in metrics_df.columns else 0
    
    print(f"\n📈 ESTADÍSTICAS GENERALES:")
    print(f"   • Departamentos procesados: {total_depts}")
    print(f"   • Total de predicciones: {total_predictions}")
    print(f"   • MAPE promedio: {avg_mape:.2f}%")
    
    if not metrics_df.empty:
        excellent_count = (metrics_df['model_quality'] == 'Excelente').sum()
        print(f"   • Modelos con calidad excelente: {excellent_count}/{len(metrics_df)}")

def verify_saved_data():
    """
    Verifica que los datos se hayan guardado correctamente
    """
    try:
        print(f"\n🔍 VERIFICANDO DATOS GUARDADOS EN SQL SERVER...")
        
        conn = get_db_connection()
        cursor = conn.cursor()
        
        # Verificar predicciones
        cursor.execute("SELECT COUNT(*) FROM Overtime_Predictions")
        pred_count = cursor.fetchone()[0]
        
        cursor.execute("""
            SELECT department, COUNT(*) as predictions_count 
            FROM Overtime_Predictions 
            GROUP BY department 
            ORDER BY predictions_count DESC
        """)
        dept_counts = cursor.fetchall()
        
        # Verificar métricas
        cursor.execute("SELECT COUNT(*) FROM ML_Model_Metrics_Overtime_Predictions")
        metrics_count = cursor.fetchone()[0]
        
        cursor.execute("""
            SELECT department, model_type, mape, model_quality
            FROM ML_Model_Metrics_Overtime_Predictions 
            ORDER BY department
        """)
        metrics_data = cursor.fetchall()
        
        print(f"✅ Predicciones en BD: {pred_count}")
        print(f"✅ Métricas en BD: {metrics_count}")
        
        if dept_counts:
            print(f"\n📊 Predicciones por departamento:")
            for dept, count in dept_counts[:5]:  # Top 5
                print(f"   • {dept}: {count} predicciones")
        
        if metrics_data:
            print(f"\n🏆 Métricas guardadas:")
            for dept, model, mape, quality in metrics_data[:5]:  # Top 5
                mape_str = f"{mape:.2f}%" if mape else "N/A"
                print(f"   • {dept}: {model} (MAPE: {mape_str}, Calidad: {quality})")
        
        conn.close()
        
    except Exception as e:
        logging.error(f"Error verificando datos guardados: {e}")
        print(f"❌ Error verificando datos: {e}")

# =====================
# EJECUCIÓN PRINCIPAL
# =====================
print("🚀 Iniciando guardado consistente de predicciones...")

try:
    # Guardar predicciones y métricas consistentes
    predictions_saved_df, metrics_saved_df = save_consistent_predictions_and_metrics()
    
    # Verificar que se guardaron correctamente
    verify_saved_data()
    
    print(f"\n🎉 ¡PROCESO COMPLETADO EXITOSAMENTE!")
    print(f"✅ Predicciones y métricas guardadas de forma consistente")
    print(f"✅ Los datos en SQL Server coinciden con las visualizaciones del módulo 5")
    
except Exception as e:
    print(f"\n❌ ERROR EN EL PROCESO DE GUARDADO:")
    print(f"   {e}")
    logging.error(f"Error en proceso principal de guardado: {e}")

🚀 Iniciando guardado consistente de predicciones...

💾 GUARDANDO PREDICCIONES Y MÉTRICAS CONSISTENTES EN SQL SERVER
🕐 Timestamp de ejecución: 2025-08-14 17:22:09.434578
📊 Procesando 6 departamentos...

🔍 Procesando Finance...
   🏆 Mejor modelo: Exponential_Smoothing
   📈 MAPE: 31.09%, SMAPE: 31.01%
   💾 Modelo guardado exitosamente
   ✅ Métricas preparadas
   ✅ 12 predicciones preparadas

🔍 Procesando HR...
   🏆 Mejor modelo: Exponential_Smoothing
   📈 MAPE: 25.83%, SMAPE: 25.56%
   💾 Modelo guardado exitosamente
   ✅ Métricas preparadas
   ✅ 12 predicciones preparadas

🔍 Procesando IT...
   🏆 Mejor modelo: Exponential_Smoothing
   📈 MAPE: 12.29%, SMAPE: 12.74%


2025-08-14 17:22:11,754 - INFO - Prophet no se guarda físicamente para Inventory


   💾 Modelo guardado exitosamente
   ✅ Métricas preparadas
   ✅ 12 predicciones preparadas

🔍 Procesando Inventory...
   🏆 Mejor modelo: Prophet
   📈 MAPE: 27.51%, SMAPE: 36.47%
   ⚠️ Modelo no guardado (Prophet o error)
   ✅ Métricas preparadas
   ✅ 12 predicciones preparadas

🔍 Procesando Marketing...
   🏆 Mejor modelo: Exponential_Smoothing
   📈 MAPE: 34.43%, SMAPE: 28.91%
   💾 Modelo guardado exitosamente
   ✅ Métricas preparadas
   ✅ 12 predicciones preparadas

🔍 Procesando Sales...
   🏆 Mejor modelo: Exponential_Smoothing
   📈 MAPE: 18.91%, SMAPE: 16.64%


2025-08-14 17:22:12,629 - INFO - Conexión a la base de datos exitosa


   💾 Modelo guardado exitosamente
   ✅ Métricas preparadas
   ✅ 12 predicciones preparadas

📋 Creando DataFrames...
   📊 Predicciones: 72 registros
   📈 Métricas: 6 registros
   💾 Modelos guardados: 5

🔌 Conectando a SQL Server...
   🏗️ Creando/actualizando tablas...
   ✅ Tablas creadas/actualizadas exitosamente
   📊 Insertando 72 predicciones...


2025-08-14 17:22:15,424 - INFO - Conexión a la base de datos exitosa


   📈 Insertando 6 métricas...
✅ Datos guardados exitosamente en SQL Server
   📊 Predicciones insertadas: 72
   📈 Métricas insertadas: 6

📋 RESUMEN DE DATOS GUARDADOS

🏆 MÉTRICAS DE MODELOS SELECCIONADOS:
Departamento                Modelo  MAPE(%)  SMAPE(%)    RMSE     MAE   Calidad  Score
     Finance Exponential_Smoothing   31.088    31.007  76.466  48.916 Aceptable  0.910
          HR Exponential_Smoothing   25.829    25.557  45.462  34.056 Aceptable  0.831
          IT Exponential_Smoothing   12.289    12.735  42.484  32.438 Aceptable  0.481
   Inventory               Prophet   27.508    36.475 110.235  83.276 Aceptable  0.686
   Marketing Exponential_Smoothing   34.435    28.913  51.612  42.636 Aceptable  0.816
       Sales Exponential_Smoothing   18.914    16.644 343.326 292.825 Aceptable  0.542

📊 MUESTRA DE PREDICCIONES (Primeros 3 meses por departamento):
Departamento      Fecha Predicción IC_Inferior IC_Superior                Modelo
     Finance 2025-05-01      122.9        