# HR Overtime Prediction and Forecasting

The main components are:

* Loading historical data from SQL Server, with missing values handled by interpolation.
* Exploratory analysis with a comparison rule that automatically decides whether each series is additive or multiplicative.
* Preprocessing to ensure stationarity, plus outlier detection with Hampel filtering.
* Training of three models: ARIMA (with automatic parameter selection), Prophet (as a fallback, with a dynamic seasonality mode), and exponential smoothing (Holt-Winters), also as a fallback.
* Automatic selection of the best model based on a set of metrics (RMSE, MAE, SMAPE, MASE), with priority given to ARIMA(0,0,0) when it proves to be a valid model.
* Forecasts for the next 6 months (12 biweekly periods) with confidence intervals, using only the best model selected.
* Interactive Plotly visualizations of the historical data, the series decomposition, and the forecasts.
* Storage of models, detailed predictions, and performance metrics in SQL Server for traceability and auditing.

**Objective:** Produce accurate, robust overtime forecasts per department for the next 6 months, minimizing manual intervention and ensuring model quality through competitive model selection.

## 1. Import Libraries and Configuration

We import the libraries needed for analysis, modeling, visualization, and database access. We configure logging for traceability and suppress unnecessary warnings to keep the output clean.

In [1]:
import pandas as pd
import numpy as np
import pymssql
import logging
from datetime import datetime
import os
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from statsmodels.tsa.stattools import adfuller
from pmdarima import auto_arima
from prophet import Prophet
from sklearn.metrics import mean_squared_error, mean_absolute_error, mean_absolute_percentage_error
from sklearn.model_selection import TimeSeriesSplit
from scipy.stats import median_abs_deviation
import joblib
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import warnings

# Configure warnings and logging
warnings.filterwarnings("ignore")
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.StreamHandler(),
        logging.FileHandler('overtime_forecast.log')
    ]
)



## 2. Connect to the Database
We open a connection to SQL Server to load the historical data and store the results.
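
The cell below hardcodes the server address, user, and password. As a hedged alternative, the sketch here reads the same four settings from environment variables. The variable names (`HR_SQL_SERVER`, `HR_SQL_DB`, `HR_SQL_USER`, `HR_SQL_PASSWORD`) and the helper `load_db_settings` are hypothetical and not part of the original notebook.

```python
import os

# Hypothetical variable names; adapt them to your deployment.
REQUIRED_VARS = {
    "server": "HR_SQL_SERVER",
    "database": "HR_SQL_DB",
    "user": "HR_SQL_USER",
    "password": "HR_SQL_PASSWORD",
}


def load_db_settings(env=None):
    """Return connection keyword arguments read from environment variables."""
    env = os.environ if env is None else env
    # Collect every missing or empty variable so the error names them all at once.
    missing = [name for name in REQUIRED_VARS.values() if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {key: env[name] for key, name in REQUIRED_VARS.items()}
```

The returned dictionary uses the same keyword names as the `pymssql.connect` call in `get_db_connection`, so it could be passed as `pymssql.connect(**load_db_settings())`.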

In [2]:
def get_db_connection():
    SQL_SERVER = "172.28.192.1:50121"
    SQL_DB = "HR_Analytics"
    SQL_USER = "sa"
    SQL_PASSWORD = "123456"
    try:
        conn = pymssql.connect(
            server=SQL_SERVER,
            database=SQL_DB,
            user=SQL_USER,
            password=SQL_PASSWORD
        )
        logging.info("Conexión a la base de datos exitosa")
        return conn
    except Exception as e:
        logging.error(f"Error de conexión a la base de datos: {e}")
        raise

## 3. Load and Process Historical Data

We load the historical data with `work_date < 2025-05-04`, aggregate it into two-week bins, and handle missing values by interpolation. A Hampel filter flags outliers, which are then replaced by interpolated values. Finally, we decompose each time series and automatically choose the decomposition model (additive or multiplicative) whose residuals show the smaller spread.
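
`handle_outliers` below compares every point against a single band of median ± 3·MAD computed over the department's whole history. The Hampel filter as usually described applies the same test inside a sliding window and scales the MAD by 1.4826, so that it estimates a standard deviation for normally distributed data. A minimal plain-Python sketch of that windowed variant follows. The function name `hampel_flags` and the default window of three points on each side are illustrative assumptions, not part of the notebook.

```python
from statistics import median

MAD_TO_SIGMA = 1.4826  # makes the MAD comparable to a standard deviation for normal data


def hampel_flags(values, half_window=3, n_sigmas=3.0):
    """Return one boolean per point: True where the point is a local outlier."""
    flags = []
    n = len(values)
    for i in range(n):
        # The window is truncated at the edges of the series.
        window = values[max(0, i - half_window):min(n, i + half_window + 1)]
        local_median = median(window)
        local_mad = median(abs(v - local_median) for v in window)
        threshold = n_sigmas * MAD_TO_SIGMA * local_mad
        flags.append(abs(values[i] - local_median) > threshold)
    return flags


series = [100, 102, 98, 101, 250, 99, 103, 97, 100]
print(hampel_flags(series))
```

Flagged points could then be set to `NaN` and interpolated, as the notebook already does. Because the median and MAD are local, a seasonal peak surrounded by similarly high values is less likely to be flagged than under a global band.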

In [3]:
def load_data():
    try:
        conn = get_db_connection()
        query = "SELECT work_date, department, total_overtime FROM vw_historical_data WHERE work_date < '2025-05-04' ORDER BY work_date"
        df = pd.read_sql(query, conn)
        conn.close()
        logging.info(f"Datos cargados: {len(df)} registros")
        df['work_date'] = pd.to_datetime(df['work_date'])
        
        # Aggregate into two-week bins
        df.set_index('work_date', inplace=True)
        df = df.groupby('department').resample('2W').sum(numeric_only=True).reset_index()
        df.rename(columns={'work_date': 'ds', 'total_overtime': 'y'}, inplace=True)
        
        return df
    except Exception as e:
        logging.error(f"Error al cargar datos: {e}")
        return pd.DataFrame()

def handle_outliers(df):
    df_cleaned = df.copy()
    for dept in df_cleaned['department'].unique():
        dept_data = df_cleaned[df_cleaned['department'] == dept]['y']
        
        # Hampel-style outlier detection (global median +/- 3 * MAD per department)
        median = dept_data.median()
        mad = median_abs_deviation(dept_data)
        threshold = 3 * mad
        
        lower_bound = median - threshold
        upper_bound = median + threshold
        
        outliers = dept_data[(dept_data < lower_bound) | (dept_data > upper_bound)]
        if not outliers.empty:
            logging.warning(f"Outliers detectados en {dept}:\n{df_cleaned[(df_cleaned['department'] == dept) & df_cleaned['y'].isin(outliers)]}")
            df_cleaned.loc[(df_cleaned['department'] == dept) & (df_cleaned['y'].isin(outliers)), 'y'] = np.nan
        
    df_cleaned['y'] = df_cleaned.groupby('department')['y'].transform(lambda x: x.interpolate(method='linear'))
    return df_cleaned

df_all_data = load_data()
df_all_data = handle_outliers(df_all_data)

logging.info(f"Resumen de datos:\nDepartamentos únicos: {df_all_data['department'].unique()}\nFechas únicas: {df_all_data['ds'].unique()}\nTotal registros: {len(df_all_data)}")


def select_decomposition_type(series):
    # Fill missing values with the series mean to avoid errors
    series_filled = series.fillna(series.mean())

    # Run the augmented Dickey-Fuller test to check stationarity
    adf_result = adfuller(series_filled)
    is_stationary = adf_result[1] <= 0.05

    # Additive decomposition
    result_add = seasonal_decompose(series_filled, model='additive', period=12)
    std_add = np.nanstd(result_add.resid)
    
    # Multiplicative decomposition
    result_mul = seasonal_decompose(series_filled, model='multiplicative', period=12)
    std_mul = np.nanstd(result_mul.resid)

    decomposition_type = 'additive' if std_add < std_mul else 'multiplicative'

    # Print the standard deviations and the p-value for reference
    print(f"Desviación estándar de los residuales aditivos: {std_add:.2f}")
    print(f"Desviación estándar de los residuales multiplicativos: {std_mul:.2f}")
    print(f"P-value del test ADF: {adf_result[1]:.2f}")
    
    # If the series is non-stationary and the additive decomposition was selected, rule out ARIMA
    if not is_stationary and decomposition_type == 'additive':
        logging.info("La serie no es estacionaria y no tiene una tendencia clara. Se descarta ARIMA.")
        return decomposition_type, False
    
    logging.info(f"Criterio de selección: La desviación estándar de los residuos {decomposition_type} es menor.")
    return decomposition_type, True

def plot_decomposition(series, decomposition_type, dept):
    # Fill missing values with the series mean so the plot has no gaps
    series_filled = series.fillna(series.mean())
    result = seasonal_decompose(series_filled, model=decomposition_type, period=12)
    fig = make_subplots(rows=4, cols=1, subplot_titles=['Original', 'Tendencia', 'Estacionalidad', 'Residuos'])
    
    fig.add_trace(go.Scatter(x=series.index, y=series, mode='lines', name='Original'), row=1, col=1)
    fig.add_trace(go.Scatter(x=result.trend.index, y=result.trend, mode='lines', name='Tendencia'), row=2, col=1)
    fig.add_trace(go.Scatter(x=result.seasonal.index, y=result.seasonal, mode='lines', name='Estacionalidad'), row=3, col=1)
    fig.add_trace(go.Scatter(x=result.resid.index, y=result.resid, mode='lines', name='Residuos'), row=4, col=1)
    
    fig.update_layout(height=800, title_text=f"Descomposición de la Serie Temporal ({decomposition_type.capitalize()}) para {dept}")
    fig.show()

# Example usage
for dept in df_all_data['department'].unique():
    dept_data = df_all_data[df_all_data['department'] == dept].set_index('ds')['y']
    decomposition_type, _ = select_decomposition_type(dept_data)
    logging.info(f"El departamento {dept} tiene una descomposición {decomposition_type}.")
    plot_decomposition(dept_data, decomposition_type, dept)

2025-08-09 15:48:20,181 - INFO - Conexión a la base de datos exitosa
2025-08-09 15:48:20,785 - INFO - Datos cargados: 420 registros
   department         ds       y
0     Finance 2023-12-31  115.84
12    Finance 2024-06-16  382.81
25    Finance 2024-12-15  416.33
35    Finance 2025-05-04  128.96
   department         ds       y
36         HR 2023-12-31  159.77
71         HR 2025-05-04  194.28
    department         ds       y
72          IT 2023-12-31  144.26
73          IT 2024-01-14  330.33
97          IT 2024-12-15  540.77
107         IT 2025-05-04  245.44
    department         ds       y
108  Inventory 2023-12-31  261.12
129  Inventory 2024-10-20  616.69
132  Inventory 2024-12-01  661.05
133  Inventory 2024-12-15  847.48
143  Inventory 2025-05-04  262.63
    department         ds       y
144  Marketing 2023-12-31  200.85
169  Marketing 2024-12-15  653.42
179  Marketing 2025-05-04  238.23
    department         ds       y
180      Sales 2023-12-31  199.68
192      Sales 2024-06-16 

Desviación estándar de los residuales aditivos: 15.56
Desviación estándar de los residuales multiplicativos: 0.05
P-value del test ADF: 0.00


2025-08-09 15:48:29,879 - INFO - Criterio de selección: La desviación estándar de los residuos multiplicative es menor.
2025-08-09 15:48:29,891 - INFO - El departamento HR tiene una descomposición multiplicative.


Desviación estándar de los residuales aditivos: 24.99
Desviación estándar de los residuales multiplicativos: 0.06
P-value del test ADF: 0.00


2025-08-09 15:48:30,012 - INFO - Criterio de selección: La desviación estándar de los residuos multiplicative es menor.
2025-08-09 15:48:30,013 - INFO - El departamento IT tiene una descomposición multiplicative.


Desviación estándar de los residuales aditivos: 15.15
Desviación estándar de los residuales multiplicativos: 0.04
P-value del test ADF: 0.00


2025-08-09 15:48:30,126 - INFO - Criterio de selección: La desviación estándar de los residuos multiplicative es menor.
2025-08-09 15:48:30,128 - INFO - El departamento Inventory tiene una descomposición multiplicative.


Desviación estándar de los residuales aditivos: 17.28
Desviación estándar de los residuales multiplicativos: 0.03
P-value del test ADF: 0.00


2025-08-09 15:48:30,204 - INFO - Criterio de selección: La desviación estándar de los residuos multiplicative es menor.
2025-08-09 15:48:30,205 - INFO - El departamento Marketing tiene una descomposición multiplicative.


Desviación estándar de los residuales aditivos: 31.14
Desviación estándar de los residuales multiplicativos: 0.07
P-value del test ADF: 0.00


2025-08-09 15:48:30,279 - INFO - Criterio de selección: La desviación estándar de los residuos multiplicative es menor.
2025-08-09 15:48:30,280 - INFO - El departamento Sales tiene una descomposición multiplicative.


Desviación estándar de los residuales aditivos: 23.60
Desviación estándar de los residuales multiplicativos: 0.05
P-value del test ADF: 0.00
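
In every run above the multiplicative residuals have the smaller standard deviation (for example 0.05 against 15.56). The two numbers are not on the same scale: additive residuals are `observed - trend - seasonal`, measured in hours, while multiplicative residuals are `observed / (trend * seasonal)`, a ratio close to 1. For series whose level is in the hundreds, comparing them directly will almost always favour the multiplicative model. One possible remedy, sketched here, is to rebuild both fits in the original units before comparing. The helper name `residual_std_in_units` is hypothetical; its inputs correspond to the `observed`, `trend`, and `seasonal` attributes of the object returned by `seasonal_decompose`.

```python
import math
from statistics import pstdev


def residual_std_in_units(observed, trend, seasonal, model):
    """Population std of (observed - fitted), in the units of `observed`.

    Points where the trend is NaN (the edges of a moving-average trend) are skipped.
    """
    errors = []
    for y, t, s in zip(observed, trend, seasonal):
        if math.isnan(t):
            continue
        fitted = t + s if model == "additive" else t * s
        errors.append(y - fitted)
    return pstdev(errors)


observed = [100.0, 120.0, 100.0, 120.0]
std_add = residual_std_in_units(observed, [110.0] * 4, [-10, 10, -10, 10], "additive")
std_mul = residual_std_in_units(observed, [110.0] * 4, [0.9, 1.1, 0.9, 1.1], "multiplicative")
print(std_add, std_mul)  # both values are now expressed in the same units
```

With both values in hours, the rule "pick the smaller one" compares like with like.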


## 4. Model Training and Selection

For each department, we train and evaluate three forecasting models:

1.  **ARIMA**: `auto_arima` searches for the best parameters.
2.  **Prophet**: Acts as a robust fallback model. Its seasonality mode (`additive` or `multiplicative`) is set dynamically from the preceding decomposition analysis.
3.  **Holt-Winters (exponential smoothing)**: A classic model, fitted here with an additive trend and a seasonal component.

The models are evaluated with time-series cross-validation. The cell below computes RMSE and MAPE on each fold and selects the model with the lowest mean RMSE, then prints a comparison table showing each model's performance and the final choice.

In [4]:
def train_and_compare_models_with_cv(series, decomposition_type, consider_arima):
    series_filled = series.fillna(series.mean()).interpolate(method='linear')
    
    min_train_size = 24
    if len(series_filled) < min_train_size:
        raise ValueError(f"La serie tiene solo {len(series_filled)} puntos, se necesitan al menos {min_train_size} para el entrenamiento estacional.")
    
    n_splits = len(series_filled) // 12 - 1 
    if n_splits < 1:
        n_splits = 1
        
    tscv = TimeSeriesSplit(n_splits=n_splits)
    
    all_arima_rmses, all_arima_mapes = [], []
    all_es_rmses, all_es_mapes = [], []
    all_prophet_rmses, all_prophet_mapes = [], []

    for train_index, val_index in tscv.split(series_filled):
        train_data = series_filled.iloc[train_index]
        val_data = series_filled.iloc[val_index]
        
        if len(train_data) < min_train_size:
            continue

        # Model 1: ARIMA (trained only when consider_arima is True)
        if consider_arima:
            try:
                arima_model = auto_arima(train_data, seasonal=False, stepwise=True, suppress_warnings=True, error_action='ignore')
                arima_preds = arima_model.predict(n_periods=len(val_data))
                all_arima_rmses.append(np.sqrt(mean_squared_error(val_data, arima_preds)))
                all_arima_mapes.append(mean_absolute_percentage_error(val_data, arima_preds))
            except Exception as e:
                logging.error(f"Error al entrenar ARIMA: {e}")
                all_arima_rmses.append(np.inf)
                all_arima_mapes.append(np.inf)
        else:
            all_arima_rmses.append(np.inf)
            all_arima_mapes.append(np.inf)

        # Model 2: Exponential Smoothing
        try:
            es_model = ExponentialSmoothing(train_data, trend='add', seasonal=decomposition_type, seasonal_periods=12).fit()
            es_preds = es_model.predict(start=val_data.index[0], end=val_data.index[-1])
            all_es_rmses.append(np.sqrt(mean_squared_error(val_data, es_preds)))
            all_es_mapes.append(mean_absolute_percentage_error(val_data, es_preds))
        except Exception as e:
            logging.error(f"Error al entrenar Exponential Smoothing: {e}")
            all_es_rmses.append(np.inf)
            all_es_mapes.append(np.inf)

        # Model 3: Prophet
        try:
            prophet_df_train = train_data.reset_index().rename(columns={'index': 'ds', 'y': 'y'})
            prophet_model = Prophet(seasonality_mode=decomposition_type)
            # Custom seasonality. Prophet reads `period` in days, so period=26 is a 26-day cycle, not 26 biweekly observations
            prophet_model.add_seasonality(name='biweekly', period=26, fourier_order=5)
            prophet_model.fit(prophet_df_train)
            
            future = prophet_model.make_future_dataframe(periods=len(val_data), freq='2W')
            prophet_preds_df = prophet_model.predict(future)
            prophet_preds = prophet_preds_df['yhat'].iloc[-len(val_data):]
            all_prophet_rmses.append(np.sqrt(mean_squared_error(val_data, prophet_preds)))
            all_prophet_mapes.append(mean_absolute_percentage_error(val_data, prophet_preds))
        except Exception as e:
            logging.error(f"Error al entrenar Prophet: {e}")
            all_prophet_rmses.append(np.inf)
            all_prophet_mapes.append(np.inf)
    
    avg_arima_rmse = np.mean(all_arima_rmses) if all_arima_rmses else np.inf
    avg_arima_mape = np.mean(all_arima_mapes) if all_arima_mapes else np.inf
    avg_es_rmse = np.mean(all_es_rmses) if all_es_rmses else np.inf
    avg_es_mape = np.mean(all_es_mapes) if all_es_mapes else np.inf
    avg_prophet_rmse = np.mean(all_prophet_rmses) if all_prophet_rmses else np.inf
    avg_prophet_mape = np.mean(all_prophet_mapes) if all_prophet_mapes else np.inf

    metrics = pd.DataFrame({
        'Modelo': ['ARIMA', 'Exponential Smoothing', 'Prophet'],
        'RMSE': [avg_arima_rmse, avg_es_rmse, avg_prophet_rmse],
        'MAPE': [avg_arima_mape * 100, avg_es_mape * 100, avg_prophet_mape * 100]
    }).sort_values('RMSE')

    best_model_name = metrics.iloc[0]['Modelo']
    best_model = None

    series_filled_final = series.fillna(series.mean()).interpolate(method='linear')
    if best_model_name == 'ARIMA':
        best_model = auto_arima(series_filled_final, seasonal=False, stepwise=True, suppress_warnings=True, error_action='ignore')
    elif best_model_name == 'Exponential Smoothing':
        best_model = ExponentialSmoothing(series_filled_final, trend='add', seasonal=decomposition_type, seasonal_periods=12).fit()
    elif best_model_name == 'Prophet':
        prophet_df_all = series_filled_final.reset_index().rename(columns={'index': 'ds', 'y': 'y'})
        best_model = Prophet(seasonality_mode=decomposition_type)
        best_model.add_seasonality(name='biweekly', period=26, fourier_order=5)
        best_model.fit(prophet_df_all)
    
    return best_model, best_model_name, metrics



# Main script logic
all_metrics = {}
all_forecasts = {}
n_periods_forecast = 12 

for dept in df_all_data['department'].unique():
    logging.info(f"Iniciando análisis para el departamento: {dept}")
    dept_data = df_all_data[df_all_data['department'] == dept].set_index('ds')['y']
    

    decomposition_type, consider_arima = select_decomposition_type(dept_data)
    
    best_model, best_model_name, metrics = train_and_compare_models_with_cv(dept_data, decomposition_type, consider_arima)
    
    # ---- FLAT-LINE CHECK ----
    series_filled_final = dept_data.fillna(dept_data.mean()).interpolate(method='linear')
    
    historical_pred_check = None
    if best_model_name == 'ARIMA':
        historical_pred_check = best_model.predict_in_sample()
    elif best_model_name == 'Exponential Smoothing':
        historical_pred_check = pd.Series(best_model.fittedvalues, index=dept_data.index)
    elif best_model_name == 'Prophet':
        prophet_df_all = series_filled_final.reset_index().rename(columns={'index': 'ds', 'y': 'y'})
        # Predict on the historical dates themselves. Forecasting future dates only and
        # reindexing them onto the history yields all-NaN values, so the check never fires.
        forecast_df = best_model.predict(prophet_df_all[['ds']])
        historical_pred_check = forecast_df.set_index('ds')['yhat']

    if historical_pred_check is not None and np.std(historical_pred_check) < 0.1:
        logging.warning(f"El mejor modelo actual ({best_model_name}) produce una predicción histórica de línea plana. Se seleccionará el siguiente mejor modelo.")
        
        second_best_model_name = metrics.iloc[1]['Modelo']
        metrics = metrics.iloc[1:].reset_index(drop=True)
        best_model_name = second_best_model_name
        
        if best_model_name == 'ARIMA':
            best_model = auto_arima(series_filled_final, seasonal=False, stepwise=True, suppress_warnings=True, error_action='ignore')
        elif best_model_name == 'Exponential Smoothing':
            best_model = ExponentialSmoothing(series_filled_final, trend='add', seasonal=decomposition_type, seasonal_periods=12).fit()
        elif best_model_name == 'Prophet':
            prophet_df_all = series_filled_final.reset_index().rename(columns={'index': 'ds', 'y': 'y'})
            best_model = Prophet(seasonality_mode=decomposition_type)
            best_model.add_seasonality(name='biweekly', period=26, fourier_order=5)
            best_model.fit(prophet_df_all)
            
    # ---- END OF FLAT-LINE CHECK ----

    all_metrics[dept] = metrics
    
    logging.info(f"El mejor modelo FINAL para {dept} es: {best_model_name} con RMSE de {metrics.iloc[0]['RMSE']:.2f}")
    print(f"Métricas de los modelos para {dept}:\n{metrics}\n")
    
    # Generate predictions for the historical period plus the forecast horizon
    if best_model_name == 'ARIMA':
        forecast_historical = best_model.predict_in_sample()
        forecast_future = best_model.predict(n_periods=n_periods_forecast)
        
        forecast_combined = pd.concat([forecast_historical, forecast_future])
        forecast_combined.name = 'yhat'

        future_index = pd.date_range(start=dept_data.index[-1] + pd.Timedelta(days=1), periods=n_periods_forecast, freq='2W')
        forecast_df_final = pd.DataFrame(forecast_combined.iloc[-n_periods_forecast:].values, index=future_index, columns=['yhat'])
        forecast_df_final['yhat_lower'] = forecast_df_final['yhat'] - (forecast_df_final['yhat'] * 0.1)
        forecast_df_final['yhat_upper'] = forecast_df_final['yhat'] + (forecast_df_final['yhat'] * 0.1)

        historical_pred_df = pd.DataFrame(forecast_combined.iloc[:len(dept_data)].values, index=dept_data.index, columns=['yhat'])
        historical_pred_df['yhat_lower'] = historical_pred_df['yhat'] - (historical_pred_df['yhat'] * 0.1)
        historical_pred_df['yhat_upper'] = historical_pred_df['yhat'] + (historical_pred_df['yhat'] * 0.1)

    elif best_model_name == 'Exponential Smoothing':
        forecast_historical = pd.Series(best_model.fittedvalues, index=dept_data.index)
        forecast_future = best_model.forecast(steps=n_periods_forecast)
        
        # Make sure the future dates are stored in the 'ds' column
        future_index = pd.date_range(start=dept_data.index[-1] + pd.Timedelta(days=1), periods=n_periods_forecast, freq='2W')
        forecast_df_final = pd.DataFrame({
            'ds': future_index,
            'yhat': forecast_future.values
        })
        forecast_df_final['yhat_lower'] = forecast_df_final['yhat'] - (forecast_df_final['yhat'] * 0.1)
        forecast_df_final['yhat_upper'] = forecast_df_final['yhat'] + (forecast_df_final['yhat'] * 0.1)
        
        historical_pred_df = pd.DataFrame(forecast_historical, columns=['yhat'])
        historical_pred_df['yhat_lower'] = historical_pred_df['yhat'] - (historical_pred_df['yhat'] * 0.1)
        historical_pred_df['yhat_upper'] = historical_pred_df['yhat'] + (historical_pred_df['yhat'] * 0.1)

    elif best_model_name == 'Prophet':
        prophet_df_all = dept_data.reset_index().rename(columns={'index': 'ds', 'y': 'y'})
        # Build only the future dates, starting the day after the last observation
        last_date = dept_data.index[-1]
        future_dates = pd.date_range(start=last_date + pd.Timedelta(days=1), periods=n_periods_forecast, freq='2W')
        future_df = pd.DataFrame({'ds': future_dates})
        forecast_df = best_model.predict(future_df)
        forecast_df_final = forecast_df.set_index('ds')[['yhat', 'yhat_lower', 'yhat_upper']]
        
        # Generate in-sample (historical) predictions
        future_all = best_model.make_future_dataframe(periods=len(dept_data) + n_periods_forecast, freq='2W', include_history=True)
        forecast_all = best_model.predict(future_all)
        historical_pred_df = forecast_all.iloc[:len(dept_data)].set_index('ds')[['yhat', 'yhat_lower', 'yhat_upper']]

    all_forecasts[dept] = {'future': forecast_df_final, 'historical_pred': historical_pred_df}

2025-08-09 15:48:30,433 - INFO - Iniciando análisis para el departamento: Finance
2025-08-09 15:48:30,447 - INFO - Criterio de selección: La desviación estándar de los residuos multiplicative es menor.


Desviación estándar de los residuales aditivos: 15.56
Desviación estándar de los residuales multiplicativos: 0.05
P-value del test ADF: 0.00


2025-08-09 15:48:35,463 - DEBUG - cmd: where.exe tbb.dll
cwd: None
2025-08-09 15:48:36,053 - DEBUG - Adding TBB (c:\Users\joey_\AppData\Local\Programs\Python\Maestro_Yoda\lib\site-packages\prophet\stan_model\cmdstan-2.33.1\stan\lib\stan_math\lib\tbb) to PATH
2025-08-09 15:48:36,083 - INFO - Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
2025-08-09 15:48:36,085 - INFO - Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.
2025-08-09 15:48:36,087 - INFO - Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
2025-08-09 15:48:36,243 - INFO - n_changepoints greater than number of observations. Using 18.
2025-08-09 15:48:36,250 - DEBUG - input tempfile: C:\Users\joey_\AppData\Local\Temp\tmp9m1vbk43\uizm3h_g.json
2025-08-09 15:48:36,259 - DEBUG - input tempfile: C:\Users\joey_\AppData\Local\Temp\tmp9m1vbk43\_b_nby4m.json
2025-08-09 15:48:36,295 - DEBUG - idx 0
2025-08-09 15:48:36,29

Métricas de los modelos para Finance:
                  Modelo       RMSE      MAPE
1  Exponential Smoothing  21.304284  5.264780
0                  ARIMA  24.475237  6.670518
2                Prophet  27.261435  6.988199

Desviación estándar de los residuales aditivos: 24.99
Desviación estándar de los residuales multiplicativos: 0.06
P-value del test ADF: 0.00


2025-08-09 15:48:53,921 - DEBUG - cmd: where.exe tbb.dll
cwd: None
2025-08-09 15:48:54,485 - DEBUG - TBB already found in load path
2025-08-09 15:48:54,498 - INFO - Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
2025-08-09 15:48:54,501 - INFO - Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.
2025-08-09 15:48:54,503 - INFO - Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
2025-08-09 15:48:54,535 - INFO - n_changepoints greater than number of observations. Using 18.
2025-08-09 15:48:54,543 - DEBUG - input tempfile: C:\Users\joey_\AppData\Local\Temp\tmp9m1vbk43\zckgs1rd.json
2025-08-09 15:48:54,551 - DEBUG - input tempfile: C:\Users\joey_\AppData\Local\Temp\tmp9m1vbk43\_vobeybj.json
2025-08-09 15:48:54,566 - DEBUG - idx 0
2025-08-09 15:48:54,569 - DEBUG - running CmdStan, num_threads: None
2025-08-09 15:48:54,572 - DEBUG - CmdStan args: ['C:\\Users\\joey_\\AppData\\Loc

Métricas de los modelos para HR:
                  Modelo       RMSE      MAPE
2                Prophet  32.490940  7.393169
0                  ARIMA  34.172555  7.003698
1  Exponential Smoothing  42.879806  8.428653



2025-08-09 15:48:57,037 - INFO - Iniciando análisis para el departamento: IT
2025-08-09 15:48:57,059 - INFO - Criterio de selección: La desviación estándar de los residuos multiplicative es menor.


Desviación estándar de los residuales aditivos: 15.15
Desviación estándar de los residuales multiplicativos: 0.04
P-value del test ADF: 0.00


2025-08-09 15:49:02,494 - DEBUG - cmd: where.exe tbb.dll
cwd: None
2025-08-09 15:49:03,277 - DEBUG - TBB already found in load path
2025-08-09 15:49:03,291 - INFO - Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
2025-08-09 15:49:03,294 - INFO - Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.
2025-08-09 15:49:03,296 - INFO - Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
2025-08-09 15:49:03,326 - INFO - n_changepoints greater than number of observations. Using 18.
2025-08-09 15:49:03,334 - DEBUG - input tempfile: C:\Users\joey_\AppData\Local\Temp\tmp9m1vbk43\qh7mq84i.json
2025-08-09 15:49:03,352 - DEBUG - input tempfile: C:\Users\joey_\AppData\Local\Temp\tmp9m1vbk43\b8zb272c.json
2025-08-09 15:49:03,357 - DEBUG - idx 0
2025-08-09 15:49:03,360 - DEBUG - running CmdStan, num_threads: None
2025-08-09 15:49:03,362 - DEBUG - CmdStan args: ['C:\\Users\\joey_\\AppData\\Loc

Métricas de los modelos para IT:
                  Modelo       RMSE      MAPE
0  Exponential Smoothing  37.594536  8.293211
1                Prophet  45.665765  9.338043

Desviación estándar de los residuales aditivos: 17.28
Desviación estándar de los residuales multiplicativos: 0.03
P-value del test ADF: 0.00


2025-08-09 15:49:12,460 - DEBUG - cmd: where.exe tbb.dll
cwd: None
2025-08-09 15:49:13,013 - DEBUG - TBB already found in load path
2025-08-09 15:49:13,027 - INFO - Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
2025-08-09 15:49:13,029 - INFO - Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.
2025-08-09 15:49:13,033 - INFO - Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
2025-08-09 15:49:13,077 - INFO - n_changepoints greater than number of observations. Using 18.
2025-08-09 15:49:13,159 - DEBUG - input tempfile: C:\Users\joey_\AppData\Local\Temp\tmp9m1vbk43\knnosjx6.json
2025-08-09 15:49:13,182 - DEBUG - input tempfile: C:\Users\joey_\AppData\Local\Temp\tmp9m1vbk43\gdyhe5oi.json
2025-08-09 15:49:13,189 - DEBUG - idx 0
2025-08-09 15:49:13,193 - DEBUG - running CmdStan, num_threads: None
2025-08-09 15:49:13,203 - DEBUG - CmdStan args: ['C:\\Users\\joey_\\AppData\\Loc

Métricas de los modelos para Inventory:
                  Modelo       RMSE      MAPE
0                  ARIMA  32.481733  4.658158
1  Exponential Smoothing  33.678559  5.401582
2                Prophet  46.783069  7.431419

Desviación estándar de los residuales aditivos: 31.14
Desviación estándar de los residuales multiplicativos: 0.07
P-value del test ADF: 0.00


2025-08-09 15:49:30,077 - DEBUG - cmd: where.exe tbb.dll
cwd: None
2025-08-09 15:49:30,430 - DEBUG - TBB already found in load path
2025-08-09 15:49:30,445 - INFO - Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
2025-08-09 15:49:30,447 - INFO - Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.
2025-08-09 15:49:30,450 - INFO - Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
2025-08-09 15:49:30,480 - INFO - n_changepoints greater than number of observations. Using 18.
2025-08-09 15:49:30,490 - DEBUG - input tempfile: C:\Users\joey_\AppData\Local\Temp\tmp9m1vbk43\xb2cu56e.json
2025-08-09 15:49:30,497 - DEBUG - input tempfile: C:\Users\joey_\AppData\Local\Temp\tmp9m1vbk43\c54zbq55.json
2025-08-09 15:49:30,503 - DEBUG - idx 0
2025-08-09 15:49:30,509 - DEBUG - running CmdStan, num_threads: None
2025-08-09 15:49:30,511 - DEBUG - CmdStan args: ['C:\\Users\\joey_\\AppData\\Loc

Métricas de los modelos para Marketing:
                  Modelo       RMSE       MAPE
1  Exponential Smoothing  39.495186   6.665473
2                Prophet  56.455266   9.980317
0                  ARIMA  57.698094  10.268615

Desviación estándar de los residuales aditivos: 23.60
Desviación estándar de los residuales multiplicativos: 0.05
P-value del test ADF: 0.00


2025-08-09 15:49:35,233 - DEBUG - cmd: where.exe tbb.dll
cwd: None
2025-08-09 15:49:35,610 - DEBUG - TBB already found in load path
2025-08-09 15:49:35,620 - INFO - Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
2025-08-09 15:49:35,623 - INFO - Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.
2025-08-09 15:49:35,625 - INFO - Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
2025-08-09 15:49:35,659 - INFO - n_changepoints greater than number of observations. Using 18.
2025-08-09 15:49:35,672 - DEBUG - input tempfile: C:\Users\joey_\AppData\Local\Temp\tmp9m1vbk43\ss95pbjq.json
2025-08-09 15:49:35,683 - DEBUG - input tempfile: C:\Users\joey_\AppData\Local\Temp\tmp9m1vbk43\86uff2c8.json
2025-08-09 15:49:35,694 - DEBUG - idx 0
2025-08-09 15:49:35,697 - DEBUG - running CmdStan, num_threads: None
2025-08-09 15:49:35,700 - DEBUG - CmdStan args: ['C:\\Users\\joey_\\AppData\\Loc

Métricas de los modelos para Sales:
                  Modelo       RMSE       MAPE
0  Exponential Smoothing  51.742158   9.909054
1                Prophet  52.606740  10.037457
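
The overview names SMAPE and MASE among the selection metrics, but the tables above report only RMSE and MAPE. Should those two be added, a plain-Python sketch under the usual definitions could look like this. The function names and the seasonal lag parameter `m` are assumptions for illustration; SMAPE is given here in its 0 to 200% form.

```python
def smape(actual, predicted):
    """Symmetric MAPE in percent: mean of |a - p| / ((|a| + |p|) / 2)."""
    terms = []
    for a, p in zip(actual, predicted):
        denominator = (abs(a) + abs(p)) / 2
        # A point where both values are zero contributes no error.
        terms.append(0.0 if denominator == 0 else abs(a - p) / denominator)
    return 100 * sum(terms) / len(terms)


def mase(actual, predicted, train, m=1):
    """MAE of the forecast divided by the in-sample MAE of a lag-m naive forecast."""
    naive_errors = [abs(train[i] - train[i - m]) for i in range(m, len(train))]
    scale = sum(naive_errors) / len(naive_errors)
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
    return mae / scale


print(smape([100, 200], [110, 180]))
print(mase([14, 15], [13, 17], train=[10, 12, 11, 13]))
```

A MASE below 1 means the model beats the naive forecast on average, which makes the metric comparable across departments with different overtime levels.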



## 5. Generate Forecasts

We generate forecasts for the next 12 two-week periods with confidence intervals, using the best model selected for each department.
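
For ARIMA and exponential smoothing, the section 4 cell fills `yhat_lower` and `yhat_upper` with a fixed band of 10% around `yhat`, which is a placeholder rather than a statistical interval. One simple approximation, sketched below, derives a constant-width band from the spread of the in-sample residuals under a normality assumption. The helper `residual_interval` is hypothetical, and the band does not widen with the forecast horizon as a model-based interval would.

```python
from statistics import NormalDist, stdev


def residual_interval(actual, fitted, forecast, level=0.95):
    """Return (lower, upper) pairs: forecast +/- z * std of in-sample residuals."""
    residuals = [a - f for a, f in zip(actual, fitted)]
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided normal quantile
    half_width = z * stdev(residuals)
    return [(y - half_width, y + half_width) for y in forecast]


bands = residual_interval(
    actual=[10, 12, 11, 13],
    fitted=[11, 11, 12, 12],
    forecast=[20.0, 21.0],
)
for lower, upper in bands:
    print(round(lower, 2), round(upper, 2))
```

Unlike the percentage rule, this band reflects how well the model actually fit the history, so a noisier department gets a wider interval.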

In [5]:
def plot_forecast_combined(historical_data, historical_pred, future_forecast, dept, best_model_name):
    fig = go.Figure()

    # Sort both series chronologically so the traces are drawn in date order
    historical_data = historical_data.sort_index()
    historical_pred = historical_pred.sort_index()

    # Normalize future_forecast to a DataFrame with 'ds' and 'yhat' columns
    if isinstance(future_forecast, pd.Series):
        future_data = pd.DataFrame({
            'ds': future_forecast.index,
            'yhat': future_forecast.values
        })
    elif isinstance(future_forecast, pd.DataFrame):
        future_data = future_forecast.copy()
        if 'ds' not in future_data.columns:
            future_data['ds'] = pd.to_datetime(future_data.index)
        if 'yhat' not in future_data.columns:
            raise ValueError("future_forecast debe contener la columna 'yhat'")
    else:
        raise ValueError("future_forecast debe ser una Serie o DataFrame")

    # If the model supplied no bounds, fall back to a placeholder band of +/-10%
    # around the point forecast (this is not a statistical confidence interval)
    if 'yhat_lower' not in future_data.columns or 'yhat_upper' not in future_data.columns:
        future_data['yhat_lower'] = future_data['yhat'] * 0.9
        future_data['yhat_upper'] = future_data['yhat'] * 1.1

    # Make sure 'ds' is datetime
    future_data['ds'] = pd.to_datetime(future_data['ds'])

    # Debug: log the date range covered by future_data
    logging.info(f"Rango de fechas en future_data para {dept}: {future_data['ds'].min()} a {future_data['ds'].max()}")

    # Actual historical values: solid blue line
    fig.add_trace(go.Scatter(x=historical_data.index, y=historical_data, mode='lines', name='Histórico Real', line=dict(color='blue')))

    # In-sample (historical) predictions: orange dashed line
    fig.add_trace(go.Scatter(x=historical_pred.index, y=historical_pred['yhat'], mode='lines', name='Predicción Histórica', line=dict(color='orange', dash='dash')))

    # Future forecast: red dotted line, restricted to dates after the last observation
    future_mask = future_data['ds'] > historical_data.index[-1]
    if not future_mask.any():
        logging.warning(f"No hay datos futuros para {dept}. Verifica future_periods en calculate_metrics.")
    fig.add_trace(go.Scatter(x=future_data.loc[future_mask, 'ds'], y=future_data.loc[future_mask, 'yhat'], mode='lines', name='Pronóstico Futuro', line=dict(color='red', dash='dot')))

    # Shaded band between yhat_lower and yhat_upper for the future forecast
    fig.add_trace(go.Scatter(
        x=future_data.loc[future_mask, 'ds'].tolist() + future_data.loc[future_mask, 'ds'].tolist()[::-1],
        y=future_data.loc[future_mask, 'yhat_upper'].tolist() + future_data.loc[future_mask, 'yhat_lower'].tolist()[::-1],
        fill='toself',
        fillcolor='rgba(255,0,0,0.1)',
        line=dict(color='rgba(255,255,255,0)'),
        hoverinfo="skip",
        showlegend=False,
        name='Intervalo de Confianza'
    ))

    fig.update_layout(
        title=f"Pronóstico y Ajuste de Horas Extra para {dept} con Modelo {best_model_name}",
        xaxis_title="Fecha",
        yaxis_title="Horas Extra",
        legend=dict(
            orientation="h",
            yanchor="bottom",
            y=1.02,
            xanchor="center",
            x=0.5
        ),
        template="plotly_white"
    )
    fig.show()

import pandas as pd
import logging

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

# Plot the history, in-sample fit, and forecast for each department
for dept in all_forecasts:
    # Log the columns available in df_all_data
    logging.info(f"Columnas disponibles en df_all_data para {dept}: {df_all_data.columns.tolist()}")

    # Use 'total_overtime' if present, otherwise fall back to 'y'
    overtime_column = next((c for c in ('total_overtime', 'y') if c in df_all_data.columns), None)
    if overtime_column is None:
        logging.error(f"No se encontró una columna de horas extra (e.g., 'total_overtime' o 'y') en df_all_data para {dept}")
        raise KeyError(f"No se encontró columna de horas extra en df_all_data para {dept}")

    dept_data = df_all_data[df_all_data['department'] == dept].set_index('ds')[overtime_column]
    future_forecast = all_forecasts[dept]['future']
    historical_pred = all_forecasts[dept]['historical_pred']

    # Convert a Series forecast to a DataFrame and attach the placeholder +/-10% band
    if isinstance(future_forecast, pd.Series):
        future_forecast = pd.DataFrame({'ds': future_forecast.index, 'yhat': future_forecast.values})
        future_forecast['yhat_lower'] = future_forecast['yhat'] * 0.9
        future_forecast['yhat_upper'] = future_forecast['yhat'] * 1.1

    # Keep only rows dated after the last observation (compare on parsed datetimes)
    if isinstance(future_forecast, pd.DataFrame) and 'ds' in future_forecast.columns:
        future_dates = pd.to_datetime(future_forecast['ds'])
        if future_dates.min() < dept_data.index[-1]:
            future_forecast = future_forecast[future_dates > dept_data.index[-1]].reset_index(drop=True)
            logging.info(f"Ajustado future_forecast para {dept} a partir de {dept_data.index[-1]}")
        elif future_dates.min() > dept_data.index[-1] + pd.Timedelta(days=1):
            logging.warning(f"future_forecast para {dept} comienza en {future_dates.min()} cuando debería ser después de {dept_data.index[-1]}")

    # all_metrics[dept] is read as ranked, with the best model in its first row
    plot_forecast_combined(dept_data, historical_pred, future_forecast, dept, all_metrics[dept].iloc[0]['Modelo'])

2025-08-09 15:49:38,949 - INFO - Columnas disponibles en df_all_data para Finance: ['department', 'ds', 'y']
2025-08-09 15:49:38,974 - INFO - Rango de fechas en future_data para Finance: 2025-05-11 00:00:00 a 2025-10-12 00:00:00


2025-08-09 15:49:44,221 - INFO - Columnas disponibles en df_all_data para HR: ['department', 'ds', 'y']
2025-08-09 15:49:44,237 - INFO - Rango de fechas en future_data para HR: 2025-05-11 00:00:00 a 2025-10-12 00:00:00


2025-08-09 15:49:44,518 - INFO - Columnas disponibles en df_all_data para IT: ['department', 'ds', 'y']
2025-08-09 15:49:44,533 - INFO - Rango de fechas en future_data para IT: 2025-05-11 00:00:00 a 2025-10-12 00:00:00


2025-08-09 15:49:44,656 - INFO - Columnas disponibles en df_all_data para Inventory: ['department', 'ds', 'y']
2025-08-09 15:49:44,670 - INFO - Rango de fechas en future_data para Inventory: 2025-05-11 00:00:00 a 2025-10-12 00:00:00


2025-08-09 15:49:44,859 - INFO - Columnas disponibles en df_all_data para Marketing: ['department', 'ds', 'y']
2025-08-09 15:49:44,875 - INFO - Rango de fechas en future_data para Marketing: 2025-05-11 00:00:00 a 2025-10-12 00:00:00


2025-08-09 15:49:44,981 - INFO - Columnas disponibles en df_all_data para Sales: ['department', 'ds', 'y']
2025-08-09 15:49:45,005 - INFO - Rango de fechas en future_data para Sales: 2025-05-11 00:00:00 a 2025-10-12 00:00:00


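## 6. Save Results

We save the models, predictions, and metrics. ARIMA and Exponential Smoothing models are pickled to the `Modelos Entrenados` folder; Prophet models are not written to disk because of serialization issues. Forecast rows with their lower and upper bounds go to the `Overtime_Predictions` table in SQL Server, and one row of metrics per department goes to `ML_Model_Metrics_Overtime_Predictions`. As created by the code below, `Overtime_Predictions` holds only forecast rows and has no `data_type` column, so historical values are not stored in it.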
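
The write pattern used in this step (create the table if needed, insert with bound parameters, commit once, roll back on error) can be tried without a SQL Server instance. The sketch below uses the standard-library `sqlite3` module as a stand-in for `pymssql`, which is an assumption made only to keep the example self-contained: the column types are SQLite's, and the placeholder is `?` rather than `%s`. The two rows are copied from the Finance forecast shown in the summary output.

```python
import sqlite3
from datetime import datetime

run_timestamp = datetime(2025, 8, 9, 15, 49, 45).isoformat()

# (timestamp, department, prediction_date, predicted_value, lower, upper)
rows = [
    (run_timestamp, "Finance", "2025-05-11", 326.246547, 293.621892, 358.871202),
    (run_timestamp, "Finance", "2025-05-25", 305.026651, 274.523986, 335.529317),
]

conn = sqlite3.connect(":memory:")  # stand-in for get_db_connection()
try:
    conn.execute("""
        CREATE TABLE IF NOT EXISTS Overtime_Predictions (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            timestamp TEXT,
            department TEXT,
            prediction_date TEXT,
            predicted_value REAL,
            confidence_lower REAL,
            confidence_upper REAL
        )
    """)
    # Bound parameters keep values out of the SQL text; executemany sends the batch.
    conn.executemany(
        """
        INSERT INTO Overtime_Predictions
        (timestamp, department, prediction_date, predicted_value, confidence_lower, confidence_upper)
        VALUES (?, ?, ?, ?, ?, ?)
        """,
        rows,
    )
    conn.commit()  # a single commit makes the batch all-or-nothing
    saved = conn.execute("SELECT COUNT(*) FROM Overtime_Predictions").fetchone()[0]
except Exception:
    conn.rollback()
    raise
finally:
    conn.close()

print(saved)
```

Leaving `id` out of the column list lets the database assign it, which mirrors the `IDENTITY(1,1)` column in the SQL Server schema.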


In [6]:

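def calculate_metrics(dept_data, historical_pred, model_type, train_test_split=True, test_size=0.2):
    """Compute holdout error metrics (RMSE, MAE, SMAPE, MAPE, MASE) and a quality label."""
    # Defined before the try block so the except handler can always reference it
    department = 'Unknown'
    try:
        # Note: dept_data.name is the Series name (the column 'y' when called from
        # save_predictions), so the log lines are labeled 'y' rather than the department
        department = dept_data.name if dept_data.name else 'Unknown'
        # Replace NaN in the actuals with the median, then align predictions to the same index
        dept_data_clean = dept_data.fillna(dept_data.median())
        historical_pred_clean = historical_pred['yhat'].reindex(dept_data_clean.index).fillna(dept_data_clean.median()).clip(lower=0)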

        # Split data into train and test
        if train_test_split:
            train_size = int(len(dept_data_clean) * (1 - test_size))
            actual = dept_data_clean.iloc[train_size:]
            predictions = historical_pred_clean.iloc[train_size:]
        else:
            actual = dept_data_clean
            predictions = historical_pred_clean

        # Ensure indices align
        common_index = actual.index.intersection(predictions.index)
        if len(common_index) == 0:
            logging.error(f"No hay índices comunes para {department}. Índice real: {actual.index[:3]}..., Índice predicciones: {predictions.index[:3]}...")
            raise ValueError(f"No hay índices comunes para {department}")
        actual = actual.loc[common_index]
        predictions = predictions.loc[common_index]

        # Log data statistics
        logging.info(f"{department} - Actual data: len={len(actual)}, mean={actual.mean():.2f}, std={actual.std():.2f}")
        logging.info(f"{department} - Predictions: len={len(predictions)}, mean={predictions.mean():.2f}, std={predictions.std():.2f}")

        # Calculate metrics
        rmse = np.sqrt(mean_squared_error(actual, predictions))
        mae = mean_absolute_error(actual, predictions)
        smape = np.mean(2 * np.abs(predictions - actual) / (np.abs(predictions) + np.abs(actual) + 1e-10)) * 100
        mape = mean_absolute_percentage_error(actual, predictions) * 100
        # MASE: MAE scaled by the one-step naive forecast error on the same evaluation window
        naive_forecast = actual.shift(1).fillna(actual.mean())
        mase = mae / mean_absolute_error(actual[1:], naive_forecast[1:]) if naive_forecast.std() > 0 else float('inf')

        # Calculate department-specific thresholds based on mean overtime
        mean_overtime = dept_data_clean.mean()
        mae_threshold_good = 0.05 * mean_overtime  # 5% of mean
        mae_threshold_acceptable = 0.10 * mean_overtime  # 10% of mean
        rmse_threshold_good = 0.10 * mean_overtime  # 10% of mean
        rmse_threshold_acceptable = 0.20 * mean_overtime  # 20% of mean

        # Determine model quality
        if (mae < mae_threshold_good and smape < 10 and mape < 10 and mase < 0.8 and rmse < rmse_threshold_good):
            quality = "Bueno"
        elif (mae < mae_threshold_acceptable and smape < 20 and mape < 20 and mase < 1.2 and rmse < rmse_threshold_acceptable):
            quality = "Aceptable"
        else:
            quality = "Pobre"

        # Ensure finite values for SQL
        rmse = rmse if np.isfinite(rmse) else None
        mae = mae if np.isfinite(mae) else None
        smape = smape if np.isfinite(smape) else None
        mape = mape if np.isfinite(mape) else None
        mase = mase if np.isfinite(mase) else None

        return {
            'rmse': rmse,
            'mae': mae,
            'smape': smape,
            'mape': mape,
            'mase': mase,
            'quality': quality
        }
    except Exception as e:
        logging.error(f"Error calculando métricas para {department} ({model_type}): {e}")
        return {
            'rmse': None,
            'mae': None,
            'smape': None,
            'mape': None,
            'mase': None,
            'quality': 'Pobre'
        }

def save_predictions():
    """Pickle the best model per department, then write forecasts and metrics to SQL Server."""
    os.makedirs('Modelos Entrenados', exist_ok=True)
    timestamp = datetime.now()
    logging.info(f"Timestamp establecido: {timestamp}")
    predictions_summary = []
    metrics_summary = []
    insert_errors = []

    # Save models and collect predictions and metrics
    for dept in all_forecasts:
        try:
            # Get best model and metrics
            best_model = None  # Will be retrieved if saving is needed
            best_model_name = all_metrics[dept].iloc[0]['Modelo']
            model_path = f'Modelos Entrenados/overtime_forecast_model_{dept}_{best_model_name}.pkl'

            # Retrieve the best model for saving (re-train to ensure consistency)
            dept_data = df_all_data[df_all_data['department'] == dept].set_index('ds')['y']
            decomposition_type, consider_arima = select_decomposition_type(dept_data)
            # Note: fillna() runs first, so the interpolate() call has no gaps left to fill
            series_filled_final = dept_data.fillna(dept_data.mean()).interpolate(method='linear')
            if best_model_name == 'ARIMA' and consider_arima:
                best_model = auto_arima(series_filled_final, seasonal=False, stepwise=True, suppress_warnings=True, error_action='ignore')
                joblib.dump(best_model, model_path)
                logging.info(f"Modelo ARIMA guardado para {dept} en: {model_path}")
            elif best_model_name == 'Exponential Smoothing':
                best_model = ExponentialSmoothing(series_filled_final, trend='add', seasonal=decomposition_type, seasonal_periods=12).fit()
                joblib.dump(best_model, model_path)
                logging.info(f"Modelo Exponential Smoothing guardado para {dept} en: {model_path}")
            elif best_model_name == 'Prophet':
                logging.warning(f"Prophet no se guarda en disco para {dept} debido a problemas de serialización.")
                # Prophet model is not saved due to serialization issues; re-trained if needed later

            # Calculate metrics using historical predictions
            historical_pred = all_forecasts[dept]['historical_pred']
            metrics = calculate_metrics(dept_data, historical_pred, best_model_name)

            # Collect metrics for summary
            metrics_summary.append({
                'timestamp': timestamp,
                'department': dept,
                'rmse': metrics['rmse'],
                'mae': metrics['mae'],
                'smape': metrics['smape'],
                'mape': metrics['mape'],
                'mase': metrics['mase'],
                'model_quality': metrics['quality'],
                'model_type': best_model_name
            })

            # Collect future predictions with robust index handling
            future_forecast = all_forecasts[dept]['future']
            n_periods_forecast = len(future_forecast)
            last_date = dept_data.index[-1]

            # Define expected_index before index handling
            expected_index = pd.date_range(start=last_date + pd.Timedelta(days=1), periods=n_periods_forecast, freq='2W')

            # Ensure future_forecast index is datetime
            if not pd.api.types.is_datetime64_any_dtype(future_forecast.index):
                logging.warning(f"Índice de future_forecast para {dept} no es datetime. Reconstruyendo índice.")
                future_forecast = future_forecast.copy()  # Avoid modifying original
                future_forecast.index = expected_index
            else:
                # Verify index alignment
                if not all(future_forecast.index == expected_index):
                    logging.warning(f"Índice de future_forecast para {dept} no está alineado. Reconstruyendo índice.")
                    future_forecast = future_forecast.copy()
                    future_forecast.index = expected_index

            # Log index details
            logging.info(f"Índice de future_forecast para {dept}: {future_forecast.index[:3]}... (len={len(future_forecast)})")
            logging.info(f"Primeras fechas esperadas: {expected_index[:3]}...")

            for _, row in future_forecast.iterrows():
                predicted_value = row['yhat'] if pd.notna(row['yhat']) else None
                confidence_lower = row['yhat_lower'] if pd.notna(row['yhat_lower']) else None
                confidence_upper = row['yhat_upper'] if pd.notna(row['yhat_upper']) else None
                try:
                    prediction_date = row.name.date() if pd.notna(row.name) and hasattr(row.name, 'date') else None
                    if prediction_date is None:
                        raise ValueError(f"Fecha inválida para {dept}, índice: {row.name}")
                    predictions_summary.append({
                        'timestamp': timestamp,
                        'department': dept,
                        'prediction_date': prediction_date,
                        'predicted_value': predicted_value,
                        'confidence_lower': confidence_lower,
                        'confidence_upper': confidence_upper
                    })
                except Exception as e:
                    logging.error(f"Error procesando fecha de predicción para {dept}: {e}")
                    # Fallback: Assign date from expected_index
                    row_index = future_forecast.index.get_loc(row.name)
                    prediction_date = expected_index[row_index].date()
                    logging.info(f"Asignando fecha de respaldo para {dept}: {prediction_date}")
                    predictions_summary.append({
                        'timestamp': timestamp,
                        'department': dept,
                        'prediction_date': prediction_date,
                        'predicted_value': predicted_value,
                        'confidence_lower': confidence_lower,
                        'confidence_upper': confidence_upper
                    })

        except Exception as e:
            logging.error(f"Error procesando datos para {dept}: {e}")
            insert_errors.append(dept)

    # Create DataFrames
    predictions_summary_df = pd.DataFrame(predictions_summary)
    metrics_summary_df = pd.DataFrame(metrics_summary)

    # Add a local 'id' column to the DataFrames only; it is not sent to SQL Server,
    # which generates its own id through the IDENTITY column
    if not predictions_summary_df.empty:
        predictions_summary_df.insert(0, 'id', range(1, len(predictions_summary_df) + 1))
    if not metrics_summary_df.empty:
        metrics_summary_df.insert(0, 'id', range(1, len(metrics_summary_df) + 1))

    # Save to database
    conn = get_db_connection()
    cursor = conn.cursor()

    try:
        logging.info("Conexión a la base de datos exitosa")
        # Create tables if they don’t exist
        cursor.execute("""
            IF NOT EXISTS (SELECT * FROM sys.tables WHERE name = 'Overtime_Predictions')
            CREATE TABLE Overtime_Predictions (
                id INT IDENTITY(1,1) PRIMARY KEY,
                timestamp DATETIME,
                department VARCHAR(100),
                prediction_date DATE,
                predicted_value FLOAT,
                confidence_lower FLOAT,
                confidence_upper FLOAT
            )
        """)

        # Create or alter ML_Model_Metrics_Overtime_Predictions table to include mape and model_type
        cursor.execute("""
            IF NOT EXISTS (SELECT * FROM sys.tables WHERE name = 'ML_Model_Metrics_Overtime_Predictions')
            BEGIN
                CREATE TABLE ML_Model_Metrics_Overtime_Predictions (
                    id INT IDENTITY(1,1) PRIMARY KEY,
                    timestamp DATETIME,
                    department VARCHAR(100),
                    rmse FLOAT,
                    mae FLOAT,
                    smape FLOAT,
                    mape FLOAT,
                    mase FLOAT,
                    model_quality VARCHAR(50),
                    model_type VARCHAR(50)
                )
            END
            ELSE
            BEGIN
                IF NOT EXISTS (SELECT * FROM sys.columns 
                               WHERE object_id = OBJECT_ID('ML_Model_Metrics_Overtime_Predictions') 
                               AND name = 'mape')
                BEGIN
                    ALTER TABLE ML_Model_Metrics_Overtime_Predictions
                    ADD mape FLOAT
                END
                IF NOT EXISTS (SELECT * FROM sys.columns 
                               WHERE object_id = OBJECT_ID('ML_Model_Metrics_Overtime_Predictions') 
                               AND name = 'model_type')
                BEGIN
                    ALTER TABLE ML_Model_Metrics_Overtime_Predictions
                    ADD model_type VARCHAR(50)
                END
            END
        """)

        # Insert predictions
        if not predictions_summary_df.empty:
            for _, row in predictions_summary_df.iterrows():
                cursor.execute("""
                    INSERT INTO Overtime_Predictions 
                    (timestamp, department, prediction_date, predicted_value, confidence_lower, confidence_upper)
                    VALUES (%s, %s, %s, %s, %s, %s)
                """, (
                    row['timestamp'],
                    row['department'],
                    row['prediction_date'],
                    row['predicted_value'],
                    row['confidence_lower'],
                    row['confidence_upper']
                ))
        else:
            logging.warning("predictions_summary_df está vacío, no se insertaron predicciones.")

        # Insert metrics
        if not metrics_summary_df.empty:
            for _, row in metrics_summary_df.iterrows():
                cursor.execute("""
                    INSERT INTO ML_Model_Metrics_Overtime_Predictions 
                    (timestamp, department, rmse, mae, smape, mape, mase, model_quality, model_type)
                    VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)
                """, (
                    row['timestamp'],
                    row['department'],
                    row['rmse'],
                    row['mae'],
                    row['smape'],
                    row['mape'],
                    row['mase'],
                    row['model_quality'],
                    row['model_type']
                ))
        else:
            logging.warning("metrics_summary_df está vacío, no se insertaron métricas.")

        if insert_errors and len(insert_errors) == len(all_forecasts):
            raise ValueError("No se pudieron insertar datos para ningún departamento")

        conn.commit()
        logging.info("Predicciones y métricas guardadas exitosamente")

        # Display summary
        print("\n=== Resumen de Modelos y Predicciones ===")
        print("\nModelos y Métricas por Departamento:")
        display(metrics_summary_df.drop(columns=['id', 'timestamp']))
        print("\nValores Predichos con Intervalos de Confianza:")
        display(predictions_summary_df.drop(columns=['id', 'timestamp']))

        return predictions_summary_df, metrics_summary_df

    except Exception as e:
        conn.rollback()
        logging.error(f"Error al guardar los datos: {e}")
        raise
    finally:
        conn.close()

def verify_tables():
    """Log and print the row counts of both result tables."""
    conn = get_db_connection()
    cursor = conn.cursor()
    try:
        cursor.execute("SELECT COUNT(*) FROM Overtime_Predictions")
        pred_count = cursor.fetchone()[0]
        
        cursor.execute("SELECT COUNT(*) FROM ML_Model_Metrics_Overtime_Predictions")
        metrics_count = cursor.fetchone()[0]
        
        logging.info(f"Registros en Overtime_Predictions: {pred_count}")
        logging.info(f"Registros en ML_Model_Metrics_Overtime_Predictions: {metrics_count}")
        print(f"\nRegistros en Overtime_Predictions: {pred_count}")
        print(f"Registros en ML_Model_Metrics: {metrics_count}")
    except Exception as e:
        logging.error(f"Error verificando tablas: {e}")
        print(f"Error verificando tablas: {e}")
    finally:
        conn.close()

# Execute saving and display summary
try:
    predictions_summary_df, metrics_summary_df = save_predictions()
    print("\nDatos guardados exitosamente en la base de datos.")
except Exception as e:
    print(f"\nError al guardar los datos: {str(e)}")

# Verify tables
verify_tables()


2025-08-09 15:49:45,419 - INFO - Timestamp establecido: 2025-08-09 15:49:45.419611
2025-08-09 15:49:45,463 - INFO - Criterio de selección: La desviación estándar de los residuos multiplicative es menor.


Desviación estándar de los residuales aditivos: 15.56
Desviación estándar de los residuales multiplicativos: 0.05
P-value del test ADF: 0.00


2025-08-09 15:49:46,998 - INFO - Modelo Exponential Smoothing guardado para Finance en: Modelos Entrenados/overtime_forecast_model_Finance_Exponential Smoothing.pkl
2025-08-09 15:49:47,010 - INFO - y - Actual data: len=8, mean=296.86, std=23.75
2025-08-09 15:49:47,013 - INFO - y - Predictions: len=8, mean=300.43, std=22.64
2025-08-09 15:49:47,075 - INFO - Índice de future_forecast para Finance: DatetimeIndex(['2025-05-11', '2025-05-25', '2025-06-08'], dtype='datetime64[ns]', freq='2W-SUN')... (len=12)
2025-08-09 15:49:47,078 - INFO - Primeras fechas esperadas: DatetimeIndex(['2025-05-11', '2025-05-25', '2025-06-08'], dtype='datetime64[ns]', freq='2W-SUN')...
2025-08-09 15:49:47,178 - INFO - Criterio de selección: La desviación estándar de los residuos multiplicative es menor.
2025-08-09 15:49:47,192 - INFO - y - Actual data: len=8, mean=377.43, std=33.57
2025-08-09 15:49:47,198 - INFO - y - Predictions: len=8, mean=384.31, std=26.30
2025-08-09 15:49:47,230 - INFO - Índice de future_for

Desviación estándar de los residuales aditivos: 24.99
Desviación estándar de los residuales multiplicativos: 0.06
P-value del test ADF: 0.00


2025-08-09 15:49:47,527 - INFO - Criterio de selección: La desviación estándar de los residuos multiplicative es menor.


Desviación estándar de los residuales aditivos: 15.15
Desviación estándar de los residuales multiplicativos: 0.04
P-value del test ADF: 0.00


2025-08-09 15:49:48,520 - INFO - Modelo Exponential Smoothing guardado para IT en: Modelos Entrenados/overtime_forecast_model_IT_Exponential Smoothing.pkl
2025-08-09 15:49:48,527 - INFO - y - Actual data: len=8, mean=419.58, std=33.43
2025-08-09 15:49:48,530 - INFO - y - Predictions: len=8, mean=423.09, std=21.38
2025-08-09 15:49:48,560 - INFO - Índice de future_forecast para IT: DatetimeIndex(['2025-05-11', '2025-05-25', '2025-06-08'], dtype='datetime64[ns]', freq='2W-SUN')... (len=12)
2025-08-09 15:49:48,563 - INFO - Primeras fechas esperadas: DatetimeIndex(['2025-05-11', '2025-05-25', '2025-06-08'], dtype='datetime64[ns]', freq='2W-SUN')...
2025-08-09 15:49:48,643 - INFO - Criterio de selección: La desviación estándar de los residuos multiplicative es menor.


Desviación estándar de los residuales aditivos: 17.28
Desviación estándar de los residuales multiplicativos: 0.03
P-value del test ADF: 0.00


2025-08-09 15:49:57,928 - INFO - Modelo ARIMA guardado para Inventory en: Modelos Entrenados/overtime_forecast_model_Inventory_ARIMA.pkl
2025-08-09 15:49:57,943 - INFO - y - Actual data: len=8, mean=523.25, std=34.70
2025-08-09 15:49:57,946 - INFO - y - Predictions: len=8, mean=528.65, std=25.76
2025-08-09 15:49:57,972 - INFO - Índice de future_forecast para Inventory: DatetimeIndex(['2025-05-11', '2025-05-25', '2025-06-08'], dtype='datetime64[ns]', freq='2W-SUN')... (len=12)
2025-08-09 15:49:57,978 - INFO - Primeras fechas esperadas: DatetimeIndex(['2025-05-11', '2025-05-25', '2025-06-08'], dtype='datetime64[ns]', freq='2W-SUN')...
2025-08-09 15:49:58,020 - INFO - Criterio de selección: La desviación estándar de los residuos multiplicative es menor.


Desviación estándar de los residuales aditivos: 31.14
Desviación estándar de los residuales multiplicativos: 0.07
P-value del test ADF: 0.00


2025-08-09 15:49:58,708 - INFO - Modelo Exponential Smoothing guardado para Marketing en: Modelos Entrenados/overtime_forecast_model_Marketing_Exponential Smoothing.pkl
2025-08-09 15:49:58,716 - INFO - y - Actual data: len=8, mean=465.68, std=54.50
2025-08-09 15:49:58,718 - INFO - y - Predictions: len=8, mean=476.63, std=32.50
2025-08-09 15:49:58,749 - INFO - Índice de future_forecast para Marketing: DatetimeIndex(['2025-05-11', '2025-05-25', '2025-06-08'], dtype='datetime64[ns]', freq='2W-SUN')... (len=12)
2025-08-09 15:49:58,758 - INFO - Primeras fechas esperadas: DatetimeIndex(['2025-05-11', '2025-05-25', '2025-06-08'], dtype='datetime64[ns]', freq='2W-SUN')...
2025-08-09 15:49:58,865 - INFO - Criterio de selección: La desviación estándar de los residuos multiplicative es menor.


Desviación estándar de los residuales aditivos: 23.60
Desviación estándar de los residuales multiplicativos: 0.05
P-value del test ADF: 0.00


2025-08-09 15:49:59,651 - INFO - Modelo Exponential Smoothing guardado para Sales en: Modelos Entrenados/overtime_forecast_model_Sales_Exponential Smoothing.pkl
2025-08-09 15:49:59,661 - INFO - y - Actual data: len=8, mean=469.63, std=24.31
2025-08-09 15:49:59,665 - INFO - y - Predictions: len=8, mean=477.12, std=13.33
2025-08-09 15:49:59,718 - INFO - Índice de future_forecast para Sales: DatetimeIndex(['2025-05-11', '2025-05-25', '2025-06-08'], dtype='datetime64[ns]', freq='2W-SUN')... (len=12)
2025-08-09 15:49:59,726 - INFO - Primeras fechas esperadas: DatetimeIndex(['2025-05-11', '2025-05-25', '2025-06-08'], dtype='datetime64[ns]', freq='2W-SUN')...
2025-08-09 15:49:59,856 - INFO - Conexión a la base de datos exitosa
2025-08-09 15:49:59,858 - INFO - Conexión a la base de datos exitosa
2025-08-09 15:50:02,010 - INFO - Predicciones y métricas guardadas exitosamente



=== Resumen de Modelos y Predicciones ===

Modelos y Métricas por Departamento:


|   | department | rmse | mae | smape | mape | mase | model_quality | model_type |
|---|---|---|---|---|---|---|---|---|
| 0 | Finance | 8.555561 | 6.741974 | 2.208585 | 2.233221 | 0.4269 | Bueno | Exponential Smoothing |
| 1 | HR | 22.696869 | 20.222837 | 5.351875 | 5.444007 | 0.835704 | Aceptable | Prophet |
| 2 | IT | 19.043617 | 15.957146 | 3.857117 | 3.893526 | 0.477167 | Bueno | Exponential Smoothing |
| 3 | Inventory | 26.791514 | 16.408519 | 3.150084 | 3.253257 | 0.407361 | Bueno | ARIMA |
| 4 | Marketing | 27.1156 | 19.351712 | 4.480681 | 4.687551 | 0.400018 | Bueno | Exponential Smoothing |
| 5 | Sales | 21.023305 | 19.505241 | 4.119082 | 4.158221 | 0.60535 | Bueno | Exponential Smoothing |



Valores Predichos con Intervalos de Confianza:


|   | department | prediction_date | predicted_value | confidence_lower | confidence_upper |
|---|---|---|---|---|---|
| 0 | Finance | 2025-05-11 | 326.246547 | 293.621892 | 358.871202 |
| 1 | Finance | 2025-05-25 | 305.026651 | 274.523986 | 335.529317 |
| 2 | Finance | 2025-06-08 | 309.356433 | 278.420789 | 340.292076 |
| 3 | Finance | 2025-06-22 | 314.942507 | 283.448257 | 346.436758 |
| 4 | Finance | 2025-07-06 | 306.513712 | 275.862341 | 337.165083 |
| ... | ... | ... | ... | ... | ... |
| 67 | Sales | 2025-08-17 | 478.132831 | 430.319548 | 525.946114 |
| 68 | Sales | 2025-08-31 | 488.583616 | 439.725254 | 537.441977 |
| 69 | Sales | 2025-09-14 | 483.929179 | 435.536261 | 532.322097 |
| 70 | Sales | 2025-09-28 | 485.997606 | 437.397845 | 534.597366 |



Datos guardados exitosamente en la base de datos.


2025-08-09 15:50:02,859 - INFO - Conexión a la base de datos exitosa
2025-08-09 15:50:02,957 - INFO - Registros en Overtime_Predictions: 72
2025-08-09 15:50:02,959 - INFO - Registros en ML_Model_Metrics_Overtime_Predictions: 30



Registros en Overtime_Predictions: 72
Registros en ML_Model_Metrics: 30
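
The error metrics stored by `calculate_metrics` can be checked by hand on a small series. The sketch below recomputes MAE, RMSE, MAPE, SMAPE, and MASE with NumPy only, following the same definitions as the notebook code (SMAPE with the sum of absolute values in the denominator, MASE scaled by the one-step naive error on the evaluation window). The function name `holdout_metrics` and the four-point series are made up for illustration.

```python
import numpy as np

def holdout_metrics(actual, predicted):
    """Recompute the notebook's error metrics on a pair of aligned sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = predicted - actual

    mae = np.mean(np.abs(errors))
    rmse = np.sqrt(np.mean(errors ** 2))
    mape = np.mean(np.abs(errors) / np.abs(actual)) * 100
    smape = np.mean(2 * np.abs(errors) / (np.abs(predicted) + np.abs(actual))) * 100

    # Naive benchmark: predict each value with the previous observation.
    naive_mae = np.mean(np.abs(actual[1:] - actual[:-1]))
    mase = mae / naive_mae if naive_mae > 0 else float("inf")

    return {"mae": mae, "rmse": rmse, "mape": mape, "smape": smape, "mase": mase}

# Illustrative values only: errors are +2, -2, +3, -3 and the naive step error is 10.
m = holdout_metrics([100, 110, 120, 130], [102, 108, 123, 127])
print({name: round(value, 4) for name, value in m.items()})
```

A MASE below 1 means the model beats the previous-value benchmark on that window, which is the comparison behind the 0.8 and 1.2 cut-offs in the quality rule.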
