# HR Overtime Prediction and Forecasting

This notebook forecasts overtime hours with SARIMA models (orders selected by `pmdarima.auto_arima`, final fit with statsmodels `SARIMAX`) and visualizes historical data, predictions, and confidence intervals with Plotly. The main components are:
- Loading historical overtime data from SQL Server
- Training a SARIMA model for each department
- Generating predictions with confidence intervals
- Interactive visualization with Plotly
- Saving the fitted models to disk and storing predictions and metrics in MS SQL Server

Objective: predict the accumulated overtime hours per department, week by week, for the next 4 weeks.

## 1. Import Libraries and Setup


In [14]:
import pandas as pd
import numpy as np
import pymssql
import logging
import datetime
import warnings
import os

# Data visualization
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Models and tests
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import adfuller
from statsmodels.stats.diagnostic import acorr_ljungbox
from pmdarima import auto_arima

# Metrics
from sklearn.metrics import mean_squared_error, mean_absolute_error
from sklearn.model_selection import train_test_split

# Model persistence
import joblib

# Configure warnings and logging
warnings.filterwarnings("ignore")
logging.basicConfig(level=logging.INFO)

## 2. Connect to the Database

SQL Server setup

In [15]:
# SQL Server Setup
SQL_SERVER = "172.28.192.1:50121"
SQL_DB = "HR_Analytics"
SQL_USER = "sa"
SQL_PASSWORD = "123456"

# Connect to SQL Server
def get_db_connection():
    server_name = SQL_SERVER
    try:
        conn = pymssql.connect(
            server=server_name,
            database=SQL_DB,
            user=SQL_USER,
            password=SQL_PASSWORD
        )
        return conn
    except Exception as e:
        logging.error(f"Error de conexión a la base de datos: {e}")
        raise
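
The connection settings above are hardcoded in the notebook. One common alternative is to read them from environment variables so that credentials stay out of the file. The following is a minimal sketch under that assumption: the helper `load_db_settings` and the variable names (`HR_SQL_SERVER`, `HR_SQL_DB`, `HR_SQL_USER`, `HR_SQL_PASSWORD`) are hypothetical and do not appear in the original notebook.

```python
import os


def load_db_settings(env=None):
    """Read SQL Server settings from environment variables.

    The variable names used here are hypothetical, chosen for this sketch.
    """
    env = os.environ if env is None else env

    # Fail early, with a clear message, when the credentials are not set
    missing = [key for key in ("HR_SQL_USER", "HR_SQL_PASSWORD") if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {missing}")

    return {
        "server": env.get("HR_SQL_SERVER", "localhost"),
        "database": env.get("HR_SQL_DB", "HR_Analytics"),
        "user": env["HR_SQL_USER"],
        "password": env["HR_SQL_PASSWORD"],
    }
```

The returned keys match the keyword arguments that `get_db_connection` already passes to `pymssql.connect`, so the dictionary could be unpacked into that call.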

## 3. Load and Process Historical Overtime Data

Query and preparation

In [16]:
def load_historical_data():
    conn = get_db_connection()
    query = """
    SELECT k.work_date, w.department, SUM(k.overtime_hours) as total_overtime
    FROM Kronos_TimeEntries k
    JOIN Workday_Employees w ON k.employee_id = w.employee_id
    GROUP BY k.work_date, w.department
    """
    
    try:
        df = pd.read_sql(query, conn)
        # Drop any time-of-day component so entries group by calendar date
        df['work_date'] = pd.to_datetime(df['work_date']).dt.normalize()
        
        # Aggregate by week
        df = df.groupby([pd.Grouper(key='work_date', freq='W'), 'department'])['total_overtime'].sum().reset_index()
        
        logging.info(f"Loaded data: {len(df)} records")
        return df
    
    finally:
        conn.close()

# Load and display data
historical_data = load_historical_data()
print("Muestra de datos históricos:")
display(historical_data.head(10))

INFO:root:Loaded data: 324 records


Muestra de datos históricos:


| | work_date | department | total_overtime |
|---|---|---|---|
| 0 | 2024-05-19 | Finance | 0.0 |
| 1 | 2024-05-19 | HR | 0.0 |
| 2 | 2024-05-19 | IT | 1.29 |
| 3 | 2024-05-19 | Inventory | 0.0 |
| 4 | 2024-05-19 | Marketing | 3.04 |
| 5 | 2024-05-19 | Sales | 2.44 |
| 6 | 2024-05-26 | Finance | 1.69 |
| 7 | 2024-05-26 | HR | 8.84 |
| 8 | 2024-05-26 | IT | 3.85 |
| 9 | 2024-05-26 | Inventory | 7.03 |
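
The weekly aggregation in `load_historical_data` relies on `pd.Grouper(freq='W')`, which labels each week with its closing Sunday. That is why every `work_date` in the sample is a Sunday. The sketch below illustrates the behavior on a few made-up daily rows; the values are illustrative and are not taken from the database.

```python
import pandas as pd

# Illustrative daily totals (made-up values, not data from the notebook)
daily = pd.DataFrame({
    "work_date": pd.to_datetime(["2024-05-13", "2024-05-17", "2024-05-19", "2024-05-20"]),
    "department": ["IT", "IT", "IT", "IT"],
    "total_overtime": [1.0, 2.0, 3.0, 4.0],
})

# Same grouping as in load_historical_data: weekly bins per department
weekly = (
    daily.groupby([pd.Grouper(key="work_date", freq="W"), "department"])["total_overtime"]
    .sum()
    .reset_index()
)

print(weekly)
```

Monday 2024-05-13 through Sunday 2024-05-19 fall into the bin labeled 2024-05-19, while Monday 2024-05-20 starts the next bin, labeled 2024-05-26.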


### 3.1. Stationarity Assessment (Augmented Dickey-Fuller Test)

In [17]:
def Prueba_Dickey_Fuller(series, department, column_name):
    """
    Run the augmented Dickey-Fuller test to assess whether the series is stationary.
    """
    print(f'\nAnálisis de Estacionalidad para {department}')
    print(f'Resultados de la prueba de Dickey-Fuller para columna: {column_name}')
    
    # Run the Dickey-Fuller test
    dftest = adfuller(series, autolag='AIC')
    
    # Collect the main results in a pandas Series
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic', 'p-value', 'No Lags Used', 
                                            'Número de observaciones utilizadas'])
    
    # Add the critical values to the result
    for key, value in dftest[4].items():
        dfoutput[f'Critical Value ({key})'] = value
    
    # Show the results
    print(dfoutput)
    
    # Interpret the results (null hypothesis: the series has a unit root)
    if dftest[1] <= 0.05:
        conclusion = "Los datos son estacionarios, no requieren diferenciación"
        decision = "Rechazar la hipótesis nula"
    else:
        conclusion = "Los datos no son estacionarios, requieren diferenciación"
        decision = "No se puede rechazar la hipótesis nula"
    
    print("\nConclusión:====>")
    print(decision)
    print(conclusion)
    
    return {
        'department': department,
        'test_statistic': dftest[0],
        'p_value': dftest[1],
        # NOTE: despite its name, this flag records stationarity (ADF p-value <= 0.05), not seasonality
        'is_seasonal': dftest[1] <= 0.05
    }

# Run the stationarity test for each department
seasonality_results = []
for department in historical_data['department'].unique():
    dept_data = historical_data[historical_data['department'] == department]
    
    if len(dept_data) < 10:
        print(f"\nAdvertencia: Datos insuficientes para {department}")
        continue
        
    result = Prueba_Dickey_Fuller(
        dept_data["total_overtime"],
        department,
        "total_overtime"
    )
    seasonality_results.append(result)

# Build a DataFrame with the results
seasonality_df = pd.DataFrame(seasonality_results)
print("\nResumen de Estacionalidad por Departamento:")
display(seasonality_df)

# Plot weekly overtime hours over time for each department
fig = go.Figure()
for department in historical_data['department'].unique():
    dept_data = historical_data[historical_data['department'] == department]
    fig.add_trace(
        go.Scatter(
            x=dept_data['work_date'],
            y=dept_data['total_overtime'],
            name=department,
            mode='lines+markers'
        )
    )

fig.update_layout(
    title='Distribución de Horas Extra por Departamento',
    xaxis_title='Fecha',
    yaxis_title='Horas Extra',
    template='plotly_white',
    showlegend=True
)
fig.show()


Análisis de Estacionalidad para Finance
Resultados de la prueba de Dickey-Fuller para columna: total_overtime
Test Statistic                       -5.967518e+00
p-value                               1.971607e-07
No Lags Used                          0.000000e+00
Número de observaciones utilizadas    5.300000e+01
Critical Value (1%)                  -3.560242e+00
Critical Value (5%)                  -2.917850e+00
Critical Value (10%)                 -2.596796e+00
dtype: float64

Conclusión:====>
Rechazar la hipótesis nula
Los datos son estacionarios, no requieren diferenciación

Análisis de Estacionalidad para HR
Resultados de la prueba de Dickey-Fuller para columna: total_overtime
Test Statistic                        -4.663216
p-value                                0.000099
No Lags Used                           2.000000
Número de observaciones utilizadas    51.000000
Critical Value (1%)                   -3.565624
Critical Value (5%)                   -2.920142
Critical Value (10%) 

| | department | test_statistic | p_value | is_seasonal |
|---|---|---|---|---|
| 0 | Finance | -5.967518 | 1.971607e-07 | True |
| 1 | HR | -4.663216 | 9.861228e-05 | True |
| 2 | IT | -6.286076 | 3.695386e-08 | True |
| 3 | Inventory | -5.828026 | 4.036166e-07 | True |
| 4 | Marketing | -5.916445 | 2.566149e-07 | True |
| 5 | Sales | -4.766825 | 6.291614e-05 | True |


### 3.2. Assess Seasonality
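
The notebook leaves this step empty. One possible check, offered only as a sketch and not as the author's method, is to look at the sample autocorrelation at the candidate seasonal lag. The lag of 13 weeks used below is the seasonal period `m=13` that the training cell passes to `auto_arima`. The helper `autocorr_at_lag` and the synthetic series are assumptions made for illustration.

```python
import numpy as np


def autocorr_at_lag(values, lag):
    """Sample autocorrelation of a 1-D series at a single positive lag."""
    x = np.asarray(values, dtype=float)
    x = x - x.mean()
    denom = np.sum(x * x)
    if lag <= 0 or lag >= len(x) or denom == 0:
        return float("nan")
    return float(np.sum(x[:-lag] * x[lag:]) / denom)


# Synthetic series with an exact 13-week cycle (made-up data for illustration)
t = np.arange(104)
cyclic = np.sin(2 * np.pi * t / 13)

print(autocorr_at_lag(cyclic, 13))  # strongly positive at the cycle length
print(autocorr_at_lag(cyclic, 6))   # negative near half a cycle
```

On a department series, a value at lag 13 that stands well above the neighboring lags would hint at a quarterly pattern; with only 54 weekly observations per department, such evidence would be weak.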

## 4. Train the SARIMA Model

Create and train a SARIMA model for each department.
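
The training cell scores each model with RMSE, MAE, SMAPE, and MASE. The two scale-free metrics can be checked by hand on a tiny example. The sketch below re-implements them with NumPy under simplifying assumptions: unlike `calculate_smape` in the notebook, it does not cap SMAPE at 100, and it returns 0.0 when every denominator is zero. The function names `smape` and `mase` and the sample values are illustrative.

```python
import numpy as np


def smape(y_true, y_pred):
    """Symmetric MAPE in percent, skipping points where both values are zero."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2
    mask = denom != 0
    if not mask.any():
        return 0.0
    return float(np.mean(np.abs(y_true[mask] - y_pred[mask]) / denom[mask]) * 100)


def mase(y_true, y_pred, train, period=1):
    """Test MAE scaled by the in-sample MAE of a naive forecast lagged by `period`."""
    train = np.asarray(train, dtype=float)
    naive_mae = np.mean(np.abs(train[period:] - train[:-period]))
    mae = np.mean(np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)))
    return float(mae / naive_mae) if naive_mae != 0 else float("inf")


# Made-up values: the first point is skipped by SMAPE because both values are zero
print(smape([0, 2, 4], [0, 1, 6]))               # mean(1/1.5, 2/5) * 100
print(mase([0, 2, 4], [0, 1, 6], [1, 3, 2, 5]))  # MAE 1.0 / naive MAE 2.0
```

A MASE below 1 means the model's test error is smaller than the average one-step error of the naive forecast on the training data.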

In [22]:


def calculate_smape(y_true, y_pred):
    """Compute SMAPE over points where actual and predicted are not both zero; the result is capped at 100."""
    denominator = (np.abs(y_true) + np.abs(y_pred)) / 2
    non_zero_mask = denominator != 0
    if np.any(non_zero_mask):
        smape = np.mean(np.abs(y_true[non_zero_mask] - y_pred[non_zero_mask]) / 
                        denominator[non_zero_mask]) * 100
        return min(smape, 100)
    return np.mean(np.abs(y_true - y_pred))

def calculate_mase(y_true, y_pred, train_data, period=1):
    """Compute MASE against a naive forecast (the training series shifted by `period`)."""
    naive_forecast = train_data.shift(period).dropna()
    naive_error = np.abs(naive_forecast - train_data[period:])
    mean_naive_error = np.mean(naive_error) if len(naive_error) > 0 else np.inf
    mae = np.mean(np.abs(y_true - y_pred))
    return mae / mean_naive_error if mean_naive_error != 0 else np.inf

def evaluate_model_quality(metrics, residuals, test_size):
    """
    Evaluate model quality from the metrics and the residuals.
    Returns a classification: 'Bueno', 'Aceptable' or 'Pobre'.
    """
    rmse = metrics['rmse']
    mae = metrics['mae']
    smape = metrics['smape']
    mase = metrics['mase']
    
    lb_test = acorr_ljungbox(residuals, lags=[min(4, len(residuals)-1)], return_df=True)
    lb_pvalue = lb_test['lb_pvalue'].iloc[0] if len(lb_test) > 0 else 0
    residual_mean = np.mean(residuals)
    
    is_good = (
        rmse < 5.0 and
        mae < 4.0 and
        smape < 60 and
        mase < 1.0 and
        lb_pvalue > 0.05 and
        abs(residual_mean) < 0.5
    )
    
    is_acceptable = (
        rmse < 7.0 and
        mae < 6.0 and
        smape < 85 and
        mase < 1.5 and
        lb_pvalue > 0.01 and
        abs(residual_mean) < 1.0
    )
    
    if is_good:
        return "Bueno"
    elif is_acceptable:
        return "Aceptable"
    else:
        return "Pobre"

def train_sarimax_model(dept_data, exog_cols=None):
    """
    Train a SARIMAX model for overtime data, with orders chosen by auto_arima.
    exog_cols: list of exogenous column names, if any.
    """
    # Prepare the data
    df = dept_data[['work_date', 'total_overtime'] + (exog_cols if exog_cols else [])].copy()
    df.set_index('work_date', inplace=True)
    
    # Check for missing values in the exogenous variables
    if exog_cols and df[exog_cols].isna().any().any():
        logging.error(f"Datos faltantes en variables exógenas para {dept_data['department'].iloc[0]}")
        return None, {'error': 'Datos faltantes en variables exógenas'}
    
    # 80/20 split that preserves temporal order
    train_size = int(len(df) * 0.8)
    train = df[:train_size]
    test = df[train_size:]
    
    if len(test) < 2:
        logging.error(f"Datos de prueba insuficientes para {dept_data['department'].iloc[0]}")
        return None, {'error': 'Datos de prueba insuficientes'}
    
    # Check that the training data has variance
    if train['total_overtime'].std() == 0:
        logging.error(f"Datos de entrenamiento sin varianza para {dept_data['department'].iloc[0]}")
        return None, {'error': 'Datos de entrenamiento sin varianza'}
    
    # Prepare the exogenous variables
    exog_train = train[exog_cols] if exog_cols else None
    exog_test = test[exog_cols] if exog_cols else None
    
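    # Find the best parameters with auto_arima.
    # NOTE: seasonal=False below switches off the seasonal search, so m, P, D and Q
    # have no effect; every fitted seasonal order in the results is (0, 0, 0, 0).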
    try:
        modelo_auto = auto_arima(
            train['total_overtime'],
            exogenous=exog_train,
            start_p=0, d=0, start_q=0,
            max_p=4, max_d=2, max_q=4, 
            start_P=0, D=1, start_Q=0,
            max_P=2, max_D=1, max_Q=2,
            m=13, seasonal=False,
            error_action='warn',
            trace=False,
            suppress_warnings=True,
            stepwise=True,
            random_state=20,
            n_fits=50
        )
        
        # Get the selected model orders
        order = modelo_auto.order
        seasonal_order = modelo_auto.seasonal_order
        
        # Fit the final SARIMAX model on the full series
        final_model = SARIMAX(
            df['total_overtime'],
            exog=df[exog_cols] if exog_cols else None,
            order=order,
            seasonal_order=seasonal_order
        )
        model_fit = final_model.fit(disp=False)
        
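        # Evaluate on the test period.
        # NOTE: final_model was fitted on the full series (train + test), so these are
        # in-sample predictions and the metrics below are likely optimistic.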
        predictions = model_fit.predict(
            start=test.index[0], 
            end=test.index[-1],
            exog=exog_test
        )
        y_true = test['total_overtime'].values
        y_pred = predictions.values
        
        # Check that the predictions are valid
        if np.any(np.isnan(y_pred)):
            logging.error(f"Predicciones inválidas (NaN) para {dept_data['department'].iloc[0]}")
            return None, {'error': 'Predicciones inválidas (NaN)'}
        
        # Compute metrics
        metrics = {
            'rmse': np.sqrt(mean_squared_error(y_true, y_pred)),
            'mae': mean_absolute_error(y_true, y_pred),
            'smape': calculate_smape(y_true, y_pred),
            'mase': calculate_mase(y_true, y_pred, train['total_overtime'], period=1),
            'order': order,
            'seasonal_order': seasonal_order,
            'train_size': len(train),
            'test_size': len(test)
        }
        
        # Compute residuals on the test data
        residuals = y_true - y_pred
        
        # Evaluate model quality
        metrics['quality'] = evaluate_model_quality(metrics, residuals, len(test))
        
        return model_fit, metrics
    
    except Exception as e:
        logging.error(f"Error entrenando modelo SARIMAX para {dept_data['department'].iloc[0]}: {str(e)}")
        return None, {'error': f"Excepción: {str(e)}"}

def add_exogenous_variables(df):
    """
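    Add example exogenous variables (is_weekend, month).
    Adjust to the real variables available.
    Note: after the weekly aggregation, work_date holds week-ending Sunday labels,
    so is_weekend is 1 for every row.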
    """
    df = df.copy()
    df['is_weekend'] = df['work_date'].dt.dayofweek.isin([5, 6]).astype(int)
    df['month'] = df['work_date'].dt.month
    return df

# Reset the storage variables
department_models = {}
metrics = {}
department_forecasts = {}

# Add exogenous variables to the DataFrame
historical_data = add_exogenous_variables(historical_data)

# Define the exogenous columns
exog_cols = ['is_weekend', 'month']  # Adjust to the real variables available

print("Iniciando entrenamiento de modelos SARIMAX...")

# Iterate over each department
for department in historical_data['department'].unique():
    print(f"\nProcesando departamento: {department}")
    dept_data = historical_data[historical_data['department'] == department]
    
    if len(dept_data) < 10:
        print(f"Advertencia: Datos insuficientes para {department}")
        continue
    
    # Train the SARIMAX model
    model, model_metrics = train_sarimax_model(dept_data, exog_cols=exog_cols)
    
    if model is None:
        print(f"Error: No se pudo entrenar el modelo para {department}: {model_metrics.get('error', 'Error desconocido')}")
        continue
        
    # Store the results
    department_models[department] = model
    metrics[department] = model_metrics
    
    # Show the results
    print(f"\nResultados para {department}:")
    print(f"Orden SARIMA: {model_metrics['order']}")
    print(f"Orden Seasonal: {model_metrics['seasonal_order']}")
    print(f"RMSE: {model_metrics['rmse']:.2f}")
    print(f"MAE: {model_metrics['mae']:.2f}")
    print(f"SMAPE: {model_metrics['smape']:.2f}%")
    print(f"MASE: {model_metrics['mase']:.2f}")
    print(f"Calidad del modelo: {model_metrics['quality']}")

Iniciando entrenamiento de modelos SARIMAX...

Procesando departamento: Finance

Resultados para Finance:
Orden SARIMA: (4, 0, 0)
Orden Seasonal: (0, 0, 0, 0)
RMSE: 3.46
MAE: 2.71
SMAPE: 50.14%
MASE: 0.96
Calidad del modelo: Bueno

Procesando departamento: HR

Resultados para HR:
Orden SARIMA: (4, 0, 1)
Orden Seasonal: (0, 0, 0, 0)
RMSE: 3.99
MAE: 3.22
SMAPE: 55.81%
MASE: 0.63
Calidad del modelo: Bueno

Procesando departamento: IT

Resultados para IT:
Orden SARIMA: (0, 0, 1)
Orden Seasonal: (0, 0, 0, 0)
RMSE: 6.24
MAE: 4.56
SMAPE: 60.28%
MASE: 1.29
Calidad del modelo: Pobre

Procesando departamento: Inventory

Resultados para Inventory:
Orden SARIMA: (0, 0, 1)
Orden Seasonal: (0, 0, 0, 0)
RMSE: 4.34
MAE: 3.23
SMAPE: 59.39%
MASE: 0.91
Calidad del modelo: Bueno

Procesando departamento: Marketing

Resultados para Marketing:
Orden SARIMA: (0, 0, 0)
Orden Seasonal: (0, 0, 0, 0)
RMSE: 4.31
MAE: 3.06
SMAPE: 61.40%
MASE: 0.73
Calidad del modelo: Aceptable

Procesando departamento: Sales

Resu
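
The quality labels in this output can be partly reproduced from the printed metrics. The sketch below applies only the four metric thresholds from `evaluate_model_quality`; it leaves out the Ljung-Box p-value and the mean-residual checks, because those values are not printed. The helper name `metric_tier` is hypothetical.

```python
def metric_tier(rmse, mae, smape, mase):
    """Classify using only the four metric thresholds from evaluate_model_quality."""
    if rmse < 5.0 and mae < 4.0 and smape < 60 and mase < 1.0:
        return "Bueno"
    if rmse < 7.0 and mae < 6.0 and smape < 85 and mase < 1.5:
        return "Aceptable"
    return "Pobre"


# (RMSE, MAE, SMAPE, MASE) as printed in the training output
reported = {
    "Finance": (3.46, 2.71, 50.14, 0.96),
    "Marketing": (4.31, 3.06, 61.40, 0.73),
    "IT": (6.24, 4.56, 60.28, 1.29),
}

tiers = {dept: metric_tier(*values) for dept, values in reported.items()}
print(tiers)
```

Marketing drops to "Aceptable" because its SMAPE of 61.40 is above the 60 threshold. IT passes every "Aceptable" metric threshold yet is reported as "Pobre", so at least one of the residual checks (Ljung-Box p-value or mean residual) must have failed for that department.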

## 5. Generate Predictions

Overtime forecast for the next 4 weeks
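
The forecast horizon is built by extending the weekly index past the last observed date. The sketch below shows that step in isolation, using the same `pd.date_range(...)[1:]` pattern as `generate_predictions_sarimax`. The value of `last_date` is an assumption inferred from the first forecast date in the output (2025-06-01), not a value read from the database.

```python
import pandas as pd

# Assumed last weekly label: the Sunday before the first forecast date in the output
last_date = pd.Timestamp("2025-05-25")
periods = 4

# Generate periods + 1 weekly dates starting at last_date, then drop last_date itself
future_dates = pd.date_range(start=last_date, periods=periods + 1, freq="W")[1:]

# Rebuild the two exogenous columns for the forecast horizon
future_exog = pd.DataFrame(index=future_dates)
future_exog["is_weekend"] = future_exog.index.dayofweek.isin([5, 6]).astype(int)
future_exog["month"] = future_exog.index.month

print(future_exog)
```

Because every weekly label is a Sunday, `is_weekend` equals 1 on all rows, both here and in the training data. It therefore carries no week-to-week information and acts as a constant regressor.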

In [23]:


def generate_predictions_sarimax(model, last_date, dept_data, exog_cols=None, periods=4, freq='W'):
    """Forecast `periods` steps ahead. Returns (predictions DataFrame, error message or None)."""
    # Validate inputs
    if model is None:
        return pd.DataFrame(), "Modelo no válido para el departamento"
    
    if not isinstance(last_date, pd.Timestamp):
        try:
            last_date = pd.to_datetime(last_date)
        except (ValueError, TypeError):
            return pd.DataFrame(), f"Fecha inválida: {last_date}"
    
    if periods < 1:
        return pd.DataFrame(), f"Número de períodos inválido: {periods}"
    
    # Generate future dates
    future_dates = pd.date_range(start=last_date, periods=periods+1, freq=freq)[1:]
    
    # Prepare the future exogenous variables
    if exog_cols:
        # Build a DataFrame for the future exogenous variables
        future_exog = pd.DataFrame(index=future_dates)
        future_exog['is_weekend'] = future_exog.index.dayofweek.isin([5, 6]).astype(int)
        future_exog['month'] = future_exog.index.month
        
        # Check that all exogenous columns are present
        missing_cols = [col for col in exog_cols if col not in future_exog.columns]
        if missing_cols:
            return pd.DataFrame(), f"Columnas exógenas faltantes: {missing_cols}"
    else:
        future_exog = None
    
    # Get the predictions and their confidence intervals
    try:
        forecast = model.get_forecast(steps=periods, exog=future_exog)
        mean_forecast = forecast.predicted_mean
        confidence_int = forecast.conf_int()
        
        # Clip negative predictions and interval bounds to zero
        mean_forecast = np.maximum(mean_forecast, 0)
        confidence_int.iloc[:, 0] = np.maximum(confidence_int.iloc[:, 0], 0)  # yhat_lower
        confidence_int.iloc[:, 1] = np.maximum(confidence_int.iloc[:, 1], 0)  # yhat_upper
        
        # Build a DataFrame with the predictions
        predictions_df = pd.DataFrame({
            'ds': future_dates,
            'yhat': mean_forecast,
            'yhat_lower': confidence_int.iloc[:, 0],
            'yhat_upper': confidence_int.iloc[:, 1]
        })
        
        return predictions_df, None
    
    except Exception as e:
        return pd.DataFrame(), f"Error generando predicciones: {str(e)}"

# Generate predictions for each department
department_forecasts = {}
print("\nGenerando predicciones para las próximas 4 semanas...")

for department, model in department_models.items():
    print(f"\nProcesando departamento: {department}")
    
    # Get the department's historical data
    dept_data = historical_data[historical_data['department'] == department]
    if dept_data.empty:
        print(f"Error: No hay datos históricos para {department}")
        continue
    
    # Get the last date in the historical data
    last_date = dept_data['work_date'].max()
    
    # Generate predictions
    forecast, error = generate_predictions_sarimax(
        model, 
        last_date, 
        dept_data, 
        exog_cols=['is_weekend', 'month'],  # Adjust to the exogenous variables in use
        periods=4,
        freq='W'
    )
    
    if error:
        print(f"Error generando predicciones para {department}: {error}")
        continue
    
    # Store the predictions
    department_forecasts[department] = forecast
    
    # Show the predictions
    print(f"\nPredicciones para {department}:")
    print("\nFecha\t\tPredicción\tIntervalo de Confianza")
    print("-" * 60)
    for _, row in forecast.iterrows():
        print(f"{row['ds'].strftime('%Y-%m-%d')}\t{row['yhat']:.2f}\t\t({row['yhat_lower']:.2f}, {row['yhat_upper']:.2f})")

print("\nPredicciones generadas para departamentos:", list(department_forecasts.keys()))


Generando predicciones para las próximas 4 semanas...

Procesando departamento: Finance

Predicciones para Finance:

Fecha		Predicción	Intervalo de Confianza
------------------------------------------------------------
2025-06-01	5.05		(0.00, 10.90)
2025-06-08	6.75		(0.81, 12.69)
2025-06-15	3.49		(0.00, 9.44)
2025-06-22	5.55		(0.00, 11.72)

Procesando departamento: HR

Predicciones para HR:

Fecha		Predicción	Intervalo de Confianza
------------------------------------------------------------
2025-06-01	8.76		(0.89, 16.63)
2025-06-08	8.27		(0.39, 16.16)
2025-06-15	7.12		(0.00, 15.10)
2025-06-22	4.19		(0.00, 12.17)

Procesando departamento: IT

Predicciones para IT:

Fecha		Predicción	Intervalo de Confianza
------------------------------------------------------------
2025-06-01	5.95		(0.00, 13.90)
2025-06-08	7.02		(0.00, 15.10)
2025-06-15	7.02		(0.00, 15.10)
2025-06-22	7.02		(0.00, 15.10)

Procesando departamento: Inventory

Predicciones para Inventory:

Fecha		Predicción	Intervalo de C

## 6. Visualizing the Results

The last 24 weeks of historical data, plus forecasts and confidence intervals for the next 4 weeks.
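
The shaded confidence band in `plot_forecast` is drawn as a single closed polygon: the upper bound is traced left to right, then the lower bound right to left, and Plotly fills the enclosed area (`fill='toself'`). The sketch below isolates that construction. The helper name `band_polygon` is hypothetical, and the sample values are the first two Finance forecast rows from the output above.

```python
def band_polygon(x, lower, upper):
    """Return (xs, ys) tracing the upper bound forward, then the lower bound backward."""
    x, lower, upper = list(x), list(lower), list(upper)
    if not (len(x) == len(lower) == len(upper)):
        raise ValueError("x, lower and upper must have the same length")
    return x + x[::-1], upper + lower[::-1]


# First two Finance forecast rows: dates with (lower, upper) interval bounds
xs, ys = band_polygon(
    ["2025-06-01", "2025-06-08"],
    [0.00, 0.81],
    [10.90, 12.69],
)
print(xs)
print(ys)
```

The resulting lists can be passed as `x` and `y` to a `go.Scatter` trace with `fill='toself'`, as the plotting cell does.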

In [26]:
def generate_predictions_sarima(model, last_date, periods=4):
    """
    Generate predictions using a SARIMA model
    
    Args:
        model: Fitted SARIMA model
        last_date: Last date in the training data
        periods: Number of periods to forecast (default=4)
    
    Returns:
        DataFrame with predictions and confidence intervals
    """
    # Generate forecast
    forecast = model.get_forecast(steps=periods)
    
    # Get predicted mean and confidence intervals
    mean = forecast.predicted_mean
    conf_int = forecast.conf_int()
    
    # Create dates for forecast period
    dates = pd.date_range(start=last_date + pd.Timedelta(days=7), 
                         periods=periods, 
                         freq='W-SUN')
    
    # Create forecast DataFrame
    forecast_df = pd.DataFrame({
        'ds': dates,
        'yhat': mean,
        'yhat_lower': conf_int.iloc[:, 0],
        'yhat_upper': conf_int.iloc[:, 1]
    })
    
    return forecast_df

def plot_forecast(department, historical_data, forecast):
    """Create interactive plot for a department's forecast"""
    
    # Filter historical data for department
    hist_dept = historical_data[historical_data['department'] == department]
    last_24_weeks = hist_dept.tail(24).copy()
    
    # Create figure
    fig = go.Figure()
    
    # Get predictions for historical and future data
    model = department_models[department]
    historical_predictions = model.get_prediction(start=last_24_weeks['work_date'].min())
    historical_mean = historical_predictions.predicted_mean.clip(lower=0)
    historical_ci = historical_predictions.conf_int().clip(lower=0)
    
    # Add historical data with hover
    fig.add_trace(
        go.Scatter(
            x=last_24_weeks['work_date'],
            y=last_24_weeks['total_overtime'],
            name='Datos Históricos',
            mode='markers+lines',
            line=dict(color='blue'),
            hovertemplate="<b>Fecha:</b> %{x}<br>" +
                         "<b>Valor Real:</b> %{y:.1f}<br>" +
                         "<b>Predicción:</b> %{customdata[0]:.1f}<br>" +
                         "<b>IC Inferior:</b> %{customdata[1]:.1f}<br>" +
                         "<b>IC Superior:</b> %{customdata[2]:.1f}<extra></extra>",
            customdata=np.column_stack((
                historical_mean,
                historical_ci.iloc[:, 0],
                historical_ci.iloc[:, 1]
            ))
        )
    )
    
    # Combine historical and future dates/predictions for continuous line
    all_dates = pd.concat([pd.Series(last_24_weeks['work_date']), forecast['ds']])
    all_predictions = pd.concat([pd.Series(historical_mean), forecast['yhat'].clip(lower=0)])
    all_ci_lower = pd.concat([pd.Series(historical_ci.iloc[:, 0]), forecast['yhat_lower'].clip(lower=0)])
    all_ci_upper = pd.concat([pd.Series(historical_ci.iloc[:, 1]), forecast['yhat_upper'].clip(lower=0)])
    
    # Add continuous prediction line
    fig.add_trace(
        go.Scatter(
            x=all_dates,
            y=all_predictions,
            name='Predicción',
            mode='lines',
            line=dict(color='red'),
            hovertemplate="<b>Fecha:</b> %{x}<br>" +
                         "<b>Predicción:</b> %{y:.1f}<br>" +
                         "<b>IC Inferior:</b> %{customdata[0]:.1f}<br>" +
                         "<b>IC Superior:</b> %{customdata[1]:.1f}<extra></extra>",
            customdata=np.column_stack((all_ci_lower, all_ci_upper))
        )
    )
    
    # Add continuous confidence interval with light gray color
    fig.add_trace(
        go.Scatter(
            x=all_dates.tolist() + all_dates.tolist()[::-1],
            y=all_ci_upper.tolist() + all_ci_lower.tolist()[::-1],
            fill='toself',
            fillcolor='rgba(211,211,211,0.3)',  # Light gray with 0.3 opacity
            line=dict(color='rgba(255,255,255,0)'),
            name='Intervalo de Confianza 95%',
            showlegend=True,
            hoverinfo='skip'
        )
    )
    
    # Update layout
    fig.update_layout(
        title=f'Pronóstico de Horas Extra - {department}',
        xaxis_title='Fecha',
        yaxis_title='Horas Extra',
        hovermode='x unified',
        showlegend=True,
        template='plotly_white',
        legend=dict(
            orientation="h",
            yanchor="bottom",
            y=-0.4,
            xanchor="center",
            x=0.5
        )
    )
    
    return fig

# Generate the visualization for each department
print("\nGenerando visualizaciones...")

for department in department_models.keys():
    try:
        # Get last date and generate predictions if needed
        dept_data = historical_data[historical_data['department'] == department]
        last_date = dept_data['work_date'].max()
        
        if department not in department_forecasts:
            forecast = generate_predictions_sarima(department_models[department], last_date)
            department_forecasts[department] = forecast
        
        # Create and display plot
        fig = plot_forecast(
            department,
            historical_data,
            department_forecasts[department]
        )
        fig.show()
        print(f"Visualización generada para {department}")
        
    except Exception as e:
        print(f"Error generando visualización para {department}: {e}")
        continue

print("\nProceso de visualización completado.")


Generando visualizaciones...


Visualización generada para Finance


Visualización generada para HR


Visualización generada para IT


Visualización generada para Inventory


Visualización generada para Marketing


Visualización generada para Sales

Proceso de visualización completado.


## 7. Save the Model and Predictions

Save the fitted models to disk and store the results in MS SQL Server
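
The save step inserts forecast rows one at a time. A possible variation, shown only as a sketch, is to flatten all forecasts into a list of tuples first and hand them to the cursor in one batch. The helper `forecast_rows` is hypothetical. Converting values to plain Python `float` and `date` objects is a precaution, on the assumption that a database driver may not accept NumPy or pandas scalar types. The sample row is the first Finance forecast from the output; the timestamp is arbitrary.

```python
import datetime

import pandas as pd


def forecast_rows(timestamp, forecasts):
    """Flatten {department: forecast DataFrame} into tuples ready for a batch INSERT."""
    rows = []
    for department, forecast in forecasts.items():
        for row in forecast.itertuples(index=False):
            rows.append((
                timestamp,
                department,
                pd.Timestamp(row.ds).date(),  # plain datetime.date
                float(row.yhat),              # plain Python floats
                float(row.yhat_lower),
                float(row.yhat_upper),
            ))
    return rows


ts = datetime.datetime(2025, 5, 26, 9, 0)  # arbitrary timestamp for the example
demo = {
    "Finance": pd.DataFrame({
        "ds": pd.to_datetime(["2025-06-01"]),
        "yhat": [5.05],
        "yhat_lower": [0.0],
        "yhat_upper": [10.9],
    })
}
rows = forecast_rows(ts, demo)
print(rows)
```

The tuple order matches the column order of the `INSERT INTO Overtime_Predictions` statement in `save_predictions`, so the list could be passed to the DB-API method `cursor.executemany` with that same statement.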

In [25]:
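def save_predictions():
    """Save the fitted models to disk and the forecasts and metrics to SQL Server."""
    # Create the output directory if it does not exist
    os.makedirs('Modelos Entrenados', exist_ok=True)

    # Save all fitted models (one per department) as a single dict; the global
    # `model` only holds the last department trained in the loop.
    model_path = 'Modelos Entrenados/overtime_forecast_model.pkl'
    joblib.dump(department_models, model_path)
    print(f"Modelo guardado en: {model_path}")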

    # Save the predictions and metrics to the database
    conn = get_db_connection()
    cursor = conn.cursor()
    timestamp = datetime.datetime.now()

    try:
        # 1. Create the Overtime_Predictions table if it does not exist
        cursor.execute("""
            IF NOT EXISTS (SELECT * FROM sys.tables WHERE name = 'Overtime_Predictions')
            CREATE TABLE Overtime_Predictions (
                id INT IDENTITY(1,1) PRIMARY KEY,
                timestamp DATETIME,
                department VARCHAR(100),
                prediction_date DATE,
                predicted_value FLOAT,
                confidence_lower FLOAT,
                confidence_upper FLOAT
            )
        """)
        
        # 2. Create the ML_Model_Metrics_Overtime_Predictions table if it does not exist
        cursor.execute("""
            IF NOT EXISTS (SELECT * FROM sys.tables WHERE name = 'ML_Model_Metrics_Overtime_Predictions')
            CREATE TABLE ML_Model_Metrics_Overtime_Predictions (
                id INT IDENTITY(1,1) PRIMARY KEY,
                timestamp DATETIME,
                department VARCHAR(100),
                rmse FLOAT,
                mae FLOAT,
                smape FLOAT,
                mase FLOAT,
                model_quality VARCHAR(50)
            )
        """)
        
        # Save the predictions for each department
        for department, forecast in department_forecasts.items():
            # Insert the predictions
            for _, row in forecast.iterrows():
                cursor.execute("""
                    INSERT INTO Overtime_Predictions 
                    (timestamp, department, prediction_date, predicted_value, 
                     confidence_lower, confidence_upper)
                    VALUES (%s, %s, %s, %s, %s, %s)
                """, (
                    timestamp,
                    department, 
                    row['ds'],
                    row['yhat'],
                    row['yhat_lower'],
                    row['yhat_upper']
                ))
            
            # Insert the model metrics
            metric = metrics[department]
            cursor.execute("""
                INSERT INTO ML_Model_Metrics_Overtime_Predictions 
                (timestamp, department, rmse, mae, smape, mase, model_quality)
                VALUES (%s, %s, %s, %s, %s, %s, %s)
            """, (
                timestamp,
                department,
                metric['rmse'],
                metric['mae'],
                metric['smape'],
                metric['mase'],
                metric['quality']
            ))
        
        conn.commit()
        logging.info("Predicciones y métricas guardadas exitosamente")
        
        # Print a summary of what was saved
        print("\nResumen de predicciones guardadas:")
        print("-" * 70)
        for department in department_forecasts.keys():
            print(f"\nDepartamento: {department}")
            print("Fecha Predicha\t\tPredicción\tIntervalo de Confianza")
            print("-" * 60)
            
            forecast = department_forecasts[department]
            for _, row in forecast.iterrows():
                print(f"{row['ds'].strftime('%Y-%m-%d')}\t{row['yhat']:.2f}\t\t({row['yhat_lower']:.2f}, {row['yhat_upper']:.2f})")
            
            # Show the metrics
            metric = metrics[department]
            print(f"\nMétricas del modelo:")
            print(f"RMSE: {metric['rmse']:.2f}")
            print(f"MAE: {metric['mae']:.2f}")
            print(f"SMAPE: {metric['smape']:.2f}%")
            print(f"MASE: {metric['mase']:.2f}")
            print(f"Calidad del modelo: {metric['quality']}")
            print("-" * 70)
        
    except Exception as e:
        conn.rollback()
        logging.error(f"Error guardando datos: {e}")
        raise
    finally:
        conn.close()

# Run the save step
try:
    save_predictions()
    print("\nDatos guardados exitosamente en la base de datos.")
except Exception as e:
    print(f"\nError al guardar los datos: {str(e)}")


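def verify_tables():
    """Print the row count of each results table."""
    conn = get_db_connection()
    cursor = conn.cursor()
    try:
        # Run one query per table: a fetch only reads the current result set,
        # so a two-statement batch would expose just the first count.
        cursor.execute("SELECT COUNT(*) FROM Overtime_Predictions")
        predictions_count = cursor.fetchone()[0]
        cursor.execute("SELECT COUNT(*) FROM ML_Model_Metrics_Overtime_Predictions")
        metrics_count = cursor.fetchone()[0]
        print(f"\nRegistros en Overtime_Predictions: {predictions_count}")
        print(f"Registros en ML_Model_Metrics: {metrics_count}")
    except Exception as e:
        print(f"Error verificando tablas: {e}")
    finally:
        conn.close()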

# Verify the tables
verify_tables()    

Modelo guardado en: Modelos Entrenados/overtime_forecast_model.pkl


INFO:root:Predicciones y métricas guardadas exitosamente



Resumen de predicciones guardadas:
----------------------------------------------------------------------

Departamento: Finance
Fecha Predicha		Predicción	Intervalo de Confianza
------------------------------------------------------------
2025-06-01	5.05		(0.00, 10.90)
2025-06-08	6.75		(0.81, 12.69)
2025-06-15	3.49		(0.00, 9.44)
2025-06-22	5.55		(0.00, 11.72)

Métricas del modelo:
RMSE: 3.46
MAE: 2.71
SMAPE: 50.14%
MASE: 0.96
Calidad del modelo: Bueno
----------------------------------------------------------------------

Departamento: HR
Fecha Predicha		Predicción	Intervalo de Confianza
------------------------------------------------------------
2025-06-01	8.76		(0.89, 16.63)
2025-06-08	8.27		(0.39, 16.16)
2025-06-15	7.12		(0.00, 15.10)
2025-06-22	4.19		(0.00, 12.17)

Métricas del modelo:
RMSE: 3.99
MAE: 3.22
SMAPE: 55.81%
MASE: 0.63
Calidad del modelo: Bueno
----------------------------------------------------------------------

Departamento: IT
Fecha Predicha		Predicción	Interval