# 📊 Time Series Model Validation - CVC Stores

## 🎯 Executive Objective
This notebook performs **robust validation (backtesting)** of multiple sales-forecasting algorithms for the CVC stores. The process replays real past scenarios to check that the chosen model performs consistently over time, not just on a single test period.

## 🛠️ Methodology: Walk-Forward Validation (Strict Mode)
Unlike a traditional single train/test split, we use a **walk-forward** strategy (sliding window):
1.  The model trains on data up to a cutoff date (e.g., Dec/2024).
2.  It forecasts the following month (e.g., Jan/2025).
3.  The window advances one month; the model retrains with the actual data for Jan/2025 and forecasts Feb/2025.
4.  This repeats for 12 months (folds), producing error metrics (RMSE, SMAPE) for each month.
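
The error metrics named in step 4 can be sketched as follows. This is a minimal, standard-library illustration of one common definition of RMSE and SMAPE, not the pipeline's own implementation; the function names and the zero-denominator convention are assumptions made for the example. It also shows why SMAPE saturates at 200% whenever the actual value is zero and the forecast is not. Plain MAPE divides by the actual value, so it is undefined (or infinite) on zero-sales days.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error, expressed in the same unit as the target."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def smape(actual, predicted):
    """Symmetric MAPE in percent, bounded to the range [0, 200]."""
    terms = []
    for a, p in zip(actual, predicted):
        denom = abs(a) + abs(p)
        # Assumed convention: a point where both values are zero adds no error.
        terms.append(0.0 if denom == 0 else 2.0 * abs(a - p) / denom)
    return 100.0 * sum(terms) / len(terms)


# Toy values (not real store data): two zero-sales days and one exact hit.
actual = [0.0, 100.0, 0.0]
predicted = [50.0, 100.0, 10.0]
print("RMSE :", rmse(actual, predicted))
print("SMAPE:", smape(actual, predicted))
```

A fold in which every actual value is zero and every forecast is nonzero therefore scores exactly 200% SMAPE, regardless of how large the forecasts are; RMSE is the metric that reveals the magnitude of the miss.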

> **Note:** "Strict" mode ensures that **no future data** (data leakage) is accessible to the model during training, faithfully simulating production.
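
The walk-forward loop and its strict cutoff can be sketched as follows. This is a hypothetical, standard-library illustration: `walk_forward`, `month_start`, and the placeholder model `last_week_mean` are names invented for the example, and the toy series stands in for real store sales. The actual trainer is not reproduced here.

```python
from datetime import date, timedelta


def month_start(year, month, offset):
    """First day of the month that lies `offset` months after (year, month)."""
    index = year * 12 + (month - 1) + offset
    return date(index // 12, index % 12 + 1, 1)


def walk_forward(series, first_test_month, n_folds, fit_predict):
    """Run monthly folds over `series` (dict: date -> value).

    Returns one (test_month_start, actuals, predictions) tuple per fold.
    """
    year, month = first_test_month
    results = []
    for fold in range(n_folds):
        start = month_start(year, month, fold)
        end = month_start(year, month, fold + 1)
        # Strict cutoff: only observations dated before the test month are visible.
        history = [v for d, v in sorted(series.items()) if d < start]
        test_days = [d for d in sorted(series) if start <= d < end]
        predictions = fit_predict(history, len(test_days))
        actuals = [series[d] for d in test_days]
        results.append((start, actuals, predictions))
    return results


def last_week_mean(history, horizon):
    """Placeholder model: repeat the mean of the last 7 observations."""
    window = history[-7:]
    return [sum(window) / len(window)] * horizon


# Toy daily series from 2024-10-01 to 2025-03-31 (values are weekday indices).
toy = {}
day = date(2024, 10, 1)
while day <= date(2025, 3, 31):
    toy[day] = float(day.weekday())
    day += timedelta(days=1)

for start, actuals, predictions in walk_forward(toy, (2025, 1), 3, last_week_mean):
    print(start, len(actuals), len(predictions))
```

Each fold sees only observations dated before its test month, so the visible history grows by one month per fold while the test month never leaks into training.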

---

## 🤖 Model Strategy
The pipeline automatically evaluates two classes of algorithms through the **Darts** library:

### 1. Classical Machine Learning (Regressors)
* **Linear Regression:** Simple baseline that captures linear trends.
* **Random Forest:** Captures non-linearities and complex interactions.
* **LightGBM / XGBoost / CatBoost:** *Gradient boosting* models, state of the art for tabular data and for time series with covariates.
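
Regressors like these need the series reshaped into a table of lagged values. The sketch below is a simplified, standard-library illustration of that idea; `make_lagged_table` is a hypothetical helper written for this example and is not the tabularization code that Darts runs internally.

```python
def make_lagged_table(target, lags, horizon):
    """Turn a 1-D series into (features, labels) rows for a regression model.

    Each row uses the `lags` most recent values as features and the next
    `horizon` values as labels.
    """
    rows, labels = [], []
    for t in range(lags, len(target) - horizon + 1):
        rows.append(target[t - lags:t])
        labels.append(target[t:t + horizon])
    return rows, labels


# Toy series: 6 points, 3 lags, 2-step horizon -> 6 - 3 - 2 + 1 = 2 rows.
X, y = make_lagged_table([1, 2, 3, 4, 5, 6], lags=3, horizon=2)
for features, label in zip(X, y):
    print(features, "->", label)
```

A series shorter than `lags + horizon` yields no training rows at all, which is why very short series have to be filtered out before training.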

### 2. Deep Learning (SOTA - State of the Art)
* **TFT (Temporal Fusion Transformer):** Attention model that learns the importance of each variable over time.
* **N-BEATS:** Neural network built from trend and seasonality blocks.
* **Transformer:** Classic *attention* architecture adapted to time series.
* **BlockRNN (LSTM):** Recurrent networks that capture long-term dependencies.
* **TCN (Temporal Convolutional Network):** Causal convolutions that capture local and global patterns.
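
To make "causal convolution" concrete, here is a small standard-library sketch. It assumes one convolution per layer with dilation `dilation_base ** i` at layer `i`; real TCN implementations, including the one in Darts, may stack more convolutions per block, so the receptive-field figure is illustrative only. Both function names are invented for this example.

```python
def causal_conv1d(x, weights, dilation=1):
    """Compute y[t] = sum_k weights[k] * x[t - k * dilation].

    Positions before the start of the series are treated as zeros, so
    y[t] never depends on any value later than x[t].
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilation_base, num_layers):
    """Time steps visible to the last layer (one convolution per layer assumed)."""
    dilations = [dilation_base ** i for i in range(num_layers)]
    return 1 + (kernel_size - 1) * sum(dilations)


signal = [1.0, 2.0, 3.0, 4.0]
print(causal_conv1d(signal, [1.0, 1.0], dilation=1))
print(causal_conv1d(signal, [1.0, 1.0], dilation=2))
print(receptive_field(kernel_size=3, dilation_base=2, num_layers=4))
```

Small dilations capture local patterns and larger ones reach further back, which is how a stack of such layers covers both short and long ranges without looking into the future.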

---

## 🏛️ Architecture and Governance (Databricks Unity Catalog)
This notebook implements a hybrid architecture for Unity Catalog compliance:

| Component | Storage Location | Purpose |
| :--- | :--- | :--- |
| **Experiments** | `Workspace/Users/...` | Stores metrics, plots, and run logs (avoids the UC path error). |
| **Model Registry** | **Unity Catalog** (`ds_dev.cvc_val`) | The final model (`.pkl`) is versioned and officially governed in the catalog. |
| **Signature** | **Enforced** | Every model has an input/output contract (`long` -> `double`) that is validated to prevent type errors at serving time. |

## 📥 Input Data
* **Target:** `bip_vhistorico_targuet_loja` (historical sales).
* **Future Covariates:** `bip_vhistorico_feriados_loja` (national/regional holiday calendar).
* **Global Covariates:** `bip_vhistorico_suporte_canal_loja` (macroeconomic indicators and campaigns).
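
Future covariates must be known for every day being forecast, so their last date has to reach the end of the forecast horizon. The sketch below checks that condition with the standard library. The helper names are hypothetical, the last observed day is an illustrative assumption, and any extra look-ahead required by future-covariate lags is ignored.

```python
from datetime import date, timedelta


def required_future_covariate_end(last_target_day, horizon_days):
    """Last day the future covariates must cover for a daily forecast."""
    return last_target_day + timedelta(days=horizon_days)


def covers_horizon(covariate_end, last_target_day, horizon_days):
    """True when the covariates extend at least to the end of the horizon."""
    return covariate_end >= required_future_covariate_end(last_target_day, horizon_days)


# Illustrative values: target observed through 2025-04-29, 35-day horizon,
# covariates available only through 2025-05-31.
last_observed = date(2025, 4, 29)
needed = required_future_covariate_end(last_observed, 35)
print("covariates needed through:", needed)
print("covered:", covers_horizon(date(2025, 5, 31), last_observed, 35))
```

Running a check like this before the last fold makes a short covariate range visible up front instead of surfacing as a prediction error partway through the run.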

In [0]:
# --- GLOBAL OPTIMIZATION SETTINGS (BEST PRACTICES) ---
# Enable optimized writes and automatic compaction for Delta tables
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "true")
spark.conf.set("spark.databricks.delta.autoCompact.enabled", "true")


In [0]:
# --- IMPORTS (REFACTORED) ---
%load_ext autoreload
%autoreload 2

import sys
import os
sys.path.append(os.getcwd())

from src.validation.config import Config
# from src.validation.data import DataIngestion 
from src.validation.pipeline import ProjectPipeline
from src.validation.trainer import ModelTrainer

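# Standard libraries
import pickle  # used further down to persist the fitted pipeline
import pandas as pd
import numpy as np
import mlflow

# Ingestion imports
from databricks.feature_engineering import FeatureEngineeringClient, FeatureLookup
import pyspark.sql.functions as F
from darts import TimeSeries
from darts.utils.timeseries_generation import datetime_attribute_timeseries

# Model imports (needed by the model configuration in the execution cell)
from darts.models import (
    LinearRegressionModel, RandomForest, LightGBMModel, XGBModel, CatBoostModel,
    TFTModel, NBEATSModel, TransformerModel, BlockRNNModel, TCNModel,
)
from pytorch_lightning.callbacks import EarlyStopping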

# --- OPTIMIZED INGESTION CLASS ---
class DataIngestion:
    def __init__(self, spark_session, config):
        self.spark = spark_session
        self.config = config
        self.fe = FeatureEngineeringClient()

    def create_training_set(self):
        """
        Build the full dataset through the Feature Store and run the ETL natively in Spark.
        Returns: PySpark DataFrame (lazy evaluation)
        """
        print("🛒 Construindo Training Set via Feature Store (Spark Native)...")

        # 1. Define the 'spine' (target)
        target_table = f"{self.config.CATALOG}.{self.config.SCHEMA}.bip_vhistorico_targuet_loja"
        
        df_spine = (self.spark.table(target_table)
                    .filter(F.col("DATA").between(self.config.DATA_START, self.config.INGESTION_END))
                    .select("CODIGO_LOJA", "DATA", "VALOR")
                    .withColumnRenamed("VALOR", "TARGET_VENDAS")
                    .withColumn("CODIGO_LOJA", F.col("CODIGO_LOJA").cast("string"))
                   )

        # 2. Configure the lookups
        feature_lookups = [
            FeatureLookup(
                table_name=f"{self.config.CATALOG}.{self.config.SCHEMA}.cmc_alojas",
                lookup_key=["CODIGO_LOJA"],
                feature_names=["CLUSTER_LOJA", "SIGLA_UF", "TIPO_LOJA", "MODELO_LOJA"]
            ),
            FeatureLookup(
                table_name=f"{self.config.CATALOG}.{self.config.SCHEMA}.bip_vhistorico_feriados_loja",
                lookup_key=["CODIGO_LOJA"],
                timestamp_lookup_key="DATA",
                feature_names=["VALOR"], 
                output_name="IS_FERIADO"
            )
        ]

        # 3. Create the training set (returns a training set object, not a DataFrame)
        training_set = self.fe.create_training_set(
            df=df_spine,
            feature_lookups=feature_lookups,
            label="TARGET_VENDAS",
            exclude_columns=[]
        )

        # 4. Load as a Spark DataFrame (NO toPandas here)
        df_spark = training_set.load_df()

        # --- NATIVE SPARK ETL (distributed) ---
        print("   ⚡ Executando limpeza e tratamento no Spark Cluster...")
        
        # A. Null handling (the left joins produce nulls)
        # Spark syntax: fill(value, subset=[columns]) or fill({col: val})
        df_spark = df_spark.na.fill({
            "IS_FERIADO": 0.0, 
            "TARGET_VENDAS": 0.0,
            "CLUSTER_LOJA": "DESCONHECIDO",
            "SIGLA_UF": "DESCONHECIDO",
            "TIPO_LOJA": "DESCONHECIDO",
            "MODELO_LOJA": "DESCONHECIDO"
        })

        # B. Type enforcement (casting)
        df_spark = df_spark.withColumn("DATA", F.to_timestamp("DATA"))
        
        return df_spark

    def get_global_support(self):
        """
        Load the global support data, keeping the processing in Spark until the end.
        """
        table_name = "bip_vhistorico_suporte_canal_loja"
        print("🌍 Carregando suporte global (Spark Aggregation)...")
        
        # The whole aggregation runs on the cluster
        df_spark = (self.spark.table(f"{self.config.CATALOG}.{self.config.SCHEMA}.{table_name}")
            .filter(F.col("DATA").between(self.config.DATA_START, self.config.INGESTION_END))
            .groupBy("DATA")
            .pivot("METRICAS")
            .agg(F.sum("VALOR"))
            .na.fill(0.0) # Fill the nulls produced by the pivot, in Spark
        )
        
        # Only the final (small) result is converted to pandas
        pdf = df_spark.toPandas()
        pdf['DATA'] = pd.to_datetime(pdf['DATA'])
        return pdf.set_index('DATA').asfreq('D').fillna(0.0)

    def build_darts_objects(self, df_spark_wide, df_global_support):
        """
        Takes a Spark DataFrame -> converts it to pandas -> builds the Darts objects.
        """
        print("⚙️ Materializando dados do Spark para Pandas (Driver)...")

        # The cluster -> driver data transfer happens HERE.
        # Because filtering and cleaning already ran in Spark, the data arrives smaller and cleaner.
        df_wide = df_spark_wide.toPandas()
        
        # Ensure pandas dtypes that are compatible with Darts
        df_wide['DATA'] = pd.to_datetime(df_wide['DATA'])
        
        # Dynamically identify the static columns
        possible_static = ["CLUSTER_LOJA", "SIGLA_UF", "TIPO_LOJA", "MODELO_LOJA"]
        static_cols = [c for c in possible_static if c in df_wide.columns]

        print("   Build: Criando Target Series...")
        target_series_list = TimeSeries.from_group_dataframe(
            df_wide,
            group_cols="CODIGO_LOJA",
            time_col="DATA",
            value_cols="TARGET_VENDAS",
            static_cols=static_cols,
            freq='D',
            fill_missing_dates=True,
            fillna_value=0.0
        )
        
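        # Key each series by its store code (same lookup as feriado_dict below);
        # static_covariates.index holds component labels, not the store code.
        target_dict = {str(ts.static_covariates["CODIGO_LOJA"].iloc[0]): ts for ts in target_series_list}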
        valid_stores = list(target_dict.keys())

        print("   Build: Criando Covariáveis Locais...")
        feriado_series_list = TimeSeries.from_group_dataframe(
            df_wide,
            group_cols="CODIGO_LOJA",
            time_col="DATA",
            value_cols=["IS_FERIADO"], 
            freq='D',
            fill_missing_dates=True,
            fillna_value=0.0
        )
        feriado_dict = {str(ts.static_covariates["CODIGO_LOJA"].iloc[0]): ts for ts in feriado_series_list}

        # Global covariates (already prepared by get_global_support)
        ts_support = TimeSeries.from_dataframe(
            df_global_support, 
            fill_missing_dates=True, 
            freq='D',
            fillna_value=0.0
        )
        
        ts_time = datetime_attribute_timeseries(df_global_support.index, attribute="dayofweek", cyclic=True)
        ts_time = ts_time.stack(datetime_attribute_timeseries(df_global_support.index, attribute="quarter", one_hot=True))
        ts_time = ts_time.stack(datetime_attribute_timeseries(df_global_support.index, attribute="week", cyclic=True))
        
        global_covariates = ts_support.stack(ts_time)

        final_target_list = []
        full_covariates_list = []

        print("   Build: Stacking Final...")
        for loja in valid_stores:
            ts_target = target_dict[loja]
            final_target_list.append(ts_target)
            
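            ts_local = feriado_dict.get(loja)
            if ts_local is not None:
                ts_local = ts_local.slice_intersect(ts_target)
            else:
                # No holiday series for this store: fall back to an all-zero covariate
                ts_local = TimeSeries.from_times_and_values(
                    ts_target.time_index, np.zeros(len(ts_target)), freq='D', columns=["IS_FERIADO"]
                )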
            
            ts_global = global_covariates.slice_intersect(ts_target)
            full_covariates_list.append(ts_global.stack(ts_local))

        print(f"✅ Objetos Darts Prontos: {len(final_target_list)} lojas.")
        return final_target_list, full_covariates_list


In [0]:
if spark is None:
    raise RuntimeError("Spark Session not available.")

config = Config()
config.spark_session = spark

In [0]:
# --- PIPELINE EXECUTION (OPTIMIZED WITH FEATURE STORE) ---
print(f"🚀 Iniciando Pipeline v{config.VERSION} (Walk-Forward Strict Mode)")

ingestion = DataIngestion(spark, config)

# In the execution block:
# 1. Unified fetch (Feature Store + Spark ETL)
df_spark_wide = ingestion.create_training_set() # Returns a Spark DataFrame
df_support_global = ingestion.get_global_support() # Returns pandas (the result is small)

# 2. Build the Darts objects (toPandas happens here)
raw_series, raw_covs = ingestion.build_darts_objects(df_spark_wide, df_support_global)

# --- From here on, the original training code is unchanged ---
# 3. TRAIN SPLIT
train_cutoff_date = pd.Timestamp(config.TRAIN_END_DATE) - pd.Timedelta(days=1)
print(f"✂️ Data corte para treino estático: {train_cutoff_date.date()}")

print("🛠️ Ajustando Pipeline (Scalers)...")
project_pipeline = ProjectPipeline()

train_for_fit = [s.drop_after(train_cutoff_date) for s in raw_series]
cov_for_fit = [s.drop_after(train_cutoff_date) for s in raw_covs]
project_pipeline.fit(train_for_fit, cov_for_fit)

print("🔄 Transformando TODAS as séries...")
series_scaled_full, cov_scaled_full = project_pipeline.transform(raw_series, raw_covs)

# Save the pipeline
pipeline_path = f"{config.PATH_SCALERS}/project_pipeline_v{config.VERSION}.pkl"
with open(pipeline_path, 'wb') as f:
    pickle.dump(project_pipeline, f)
print(f"💾 Pipeline salvo: {pipeline_path}")

# Filter out short series and build the final sets
print("🔍 Filtrando séries curtas...")
min_len = config.LAGS + config.FORECAST_HORIZON + 1
valid_indices = [i for i, ts in enumerate(train_for_fit) if len(ts) >= min_len]

train_series_static = [series_scaled_full[i].drop_after(train_cutoff_date) for i in valid_indices]
train_cov_static = [cov_scaled_full[i].drop_after(train_cutoff_date) for i in valid_indices]
full_series_valid = [series_scaled_full[i] for i in valid_indices]
full_cov_valid = [cov_scaled_full[i] for i in valid_indices]

print("🔄 Preparando targets originais para validação...")
val_series_original = project_pipeline.inverse_transform(full_series_valid, partial=True)

# 4. MODEL CONFIGURATION
lag = config.LAGS
lag_covariantes = config.LAGS_FUTURE
forecast = config.FORECAST_HORIZON
lag_2 = lag + config.FORECAST_HORIZON # Extended lag for deep learning
dynamic_kernel = 3 # Safe kernel size
EARLY_STOPPER = EarlyStopping(monitor="train_loss", patience=5, min_delta=0.001, mode='min')

models_dict = {
    # --- STATISTICAL / CLASSICAL ML MODELS ---
    "LinearRegression": LinearRegressionModel(
        lags=lag,
        lags_future_covariates=lag_covariantes,
        output_chunk_length=forecast,
        multi_models=True
    ),
    "RandomForest": RandomForest(
        lags=lag,
        lags_future_covariates=lag_covariantes,
        output_chunk_length=forecast,
        multi_models=False, # sklearn RF limitation
        random_state=42
    ),
    "LightGBM": LightGBMModel(
        lags=lag,
        lags_future_covariates=lag_covariantes,
        output_chunk_length=forecast,
        multi_models=True,
        random_state=42
    ),
    "XGBoost": XGBModel(
        lags=lag,
        lags_future_covariates=lag_covariantes,
        output_chunk_length=forecast,
        multi_models=True,
        random_state=42
    ),
    "CatBoost": CatBoostModel(
        lags=lag,
        lags_future_covariates=lag_covariantes,
        output_chunk_length=forecast,
        multi_models=True,
        random_state=42
    )
}

# --- DEEP LEARNING MODELS (added only if N_EPOCHS > 0) ---
if config.N_EPOCHS > 0:
    pl_trainer_kwargs = {"accelerator": "cpu", "callbacks": [EARLY_STOPPER]}
    models_dict.update({
        "TFT": TFTModel(
            input_chunk_length=lag_2,
            output_chunk_length=forecast,
            hidden_size=128,
            lstm_layers=2,
            num_attention_heads=4,
            dropout=0.2,
            batch_size=4,
            n_epochs=config.N_EPOCHS,
            add_relative_index=True,
            random_state=42,
            pl_trainer_kwargs=pl_trainer_kwargs
        ),
        "NBEATS": NBEATSModel(
            input_chunk_length=lag_2,
            output_chunk_length=forecast,
            generic_architecture=True,
            num_stacks=3,
            num_blocks=3,
            num_layers=4,
            layer_widths=256,
            batch_size=4,
            n_epochs=config.N_EPOCHS,
            random_state=42,
            pl_trainer_kwargs=pl_trainer_kwargs
        ),
        "Transformer": TransformerModel(
            input_chunk_length=lag_2,
            output_chunk_length=forecast,
            d_model=128,
            nhead=4,
            num_encoder_layers=3,
            num_decoder_layers=3,
            dim_feedforward=256,
            dropout=0.2,
            batch_size=4,
            n_epochs=config.N_EPOCHS,
            random_state=42,
            pl_trainer_kwargs=pl_trainer_kwargs
        ),
        "BlockRNN": BlockRNNModel(
            model='LSTM',
            input_chunk_length=lag_2,
            output_chunk_length=forecast,
            hidden_dim=128,
            n_rnn_layers=2,
            dropout=0.2,
            batch_size=4,
            n_epochs=config.N_EPOCHS,
            random_state=42,
            pl_trainer_kwargs=pl_trainer_kwargs
        ),
        "TCN": TCNModel(
            input_chunk_length=lag_2,
            output_chunk_length=forecast,
            kernel_size=dynamic_kernel,
            num_filters=lag_2,
            num_layers=None,
            dilation_base=2,
            dropout=0.2,
            batch_size=4,
            n_epochs=config.N_EPOCHS,
            random_state=42,
            pl_trainer_kwargs=pl_trainer_kwargs
        )
    })

trainer = ModelTrainer(config, models_dict)
trainer.train_evaluate_walkforward(
    train_series_static=train_series_static,
    train_covs_static=train_cov_static,
    full_series_scaled=full_series_valid,
    full_covariates_scaled=full_cov_valid,
    val_series_original=val_series_original,
    target_pipeline=project_pipeline
)
print("✅ Processo Finalizado.")

🚀 Iniciando Pipeline v2026_01_15_09_42 (Walk-Forward Strict Mode)
🛒 Construindo Training Set via Feature Store (Spark Native)...




   ⚡ Executando limpeza e tratamento no Spark Cluster...
🌍 Carregando suporte global (Spark Aggregation)...
⚙️ Materializando dados do Spark para Pandas (Driver)...
   Build: Criando Target Series...
   Build: Criando Covariáveis Locais...
   Build: Stacking Final...
✅ Objetos Darts Prontos: 1 lojas.
✂️ Data corte para treino estático: 2024-12-31
🛠️ Ajustando Pipeline (Scalers)...
🔄 Transformando TODAS as séries...
💾 Pipeline salvo: /Volumes/ds_dev/cvc/experiments/artefacts/loja/scalar/validation/project_pipeline_v2026_01_15_09_42.pkl
🔍 Filtrando séries curtas...
🔄 Preparando targets originais para validação...

🚀 [Model: LinearRegression] Iniciando Processo...
   📝 Registrando metadados do experimento...
   🏋️ Treinando com dados até 2025-01-01...


2026/01/15 12:45:51 INFO mlflow.pyfunc: Validating input example against model signature


   💾 Registrando modelo como: ds_dev.cvc.loja_LinearRegression


Registered model 'ds_dev.cvc.loja_LinearRegression' already exists. Creating a new version of this model...


Created version '22' of model 'ds_dev.cvc.loja_linearregression'.


   🔮 Iniciando Inferência Walk-Forward (5 folds)...
     📅 2025-01: SMAPE=200.00%, RMSE=1188891008021282.50
     📅 2025-02: SMAPE=200.00%, RMSE=363331800284236.06
     📅 2025-03: SMAPE=200.00%, RMSE=4394207113599525.50
     📅 2025-04: SMAPE=200.00%, RMSE=9098643977247738.00
     📅 2025-05: SMAPE=200.00%, RMSE=6052933627652242.00
   📊 GLOBAL: MAPE=inf%, RMSE=5090182958597040.00

🚀 [Model: RandomForest] Iniciando Processo...
   📝 Registrando metadados do experimento...
   🏋️ Treinando com dados até 2025-01-01...


2026/01/15 12:45:58 INFO mlflow.pyfunc: Validating input example against model signature


   💾 Registrando modelo como: ds_dev.cvc.loja_RandomForest


Registered model 'ds_dev.cvc.loja_RandomForest' already exists. Creating a new version of this model...


Created version '14' of model 'ds_dev.cvc.loja_randomforest'.


   🔮 Iniciando Inferência Walk-Forward (5 folds)...
     📅 2025-01: SMAPE=116.56%, RMSE=1798.33
     📅 2025-02: SMAPE=133.26%, RMSE=3223.38
     📅 2025-03: SMAPE=96.35%, RMSE=2777.61
     📅 2025-04: SMAPE=94.24%, RMSE=7384.77
     📅 2025-05: SMAPE=111.89%, RMSE=2372.42
   📊 GLOBAL: MAPE=inf%, RMSE=4355.29

🚀 [Model: LightGBM] Iniciando Processo...
   📝 Registrando metadados do experimento...
   🏋️ Treinando com dados até 2025-01-01...
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000530 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 338
[LightGBM] [Info] Number of data points in the train set: 52, number of used features: 77
[LightGBM] [Info] Start training from score 0.081850
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000023 seconds.
You can set `force_row_wise=true` to remove the overhea

2026/01/15 12:46:06 INFO mlflow.pyfunc: Validating input example against model signature


   💾 Registrando modelo como: ds_dev.cvc.loja_LightGBM


Registered model 'ds_dev.cvc.loja_LightGBM' already exists. Creating a new version of this model...


Created version '14' of model 'ds_dev.cvc.loja_lightgbm'.


   🔮 Iniciando Inferência Walk-Forward (5 folds)...
     📅 2025-01: SMAPE=101.51%, RMSE=1638.61
     📅 2025-02: SMAPE=97.22%, RMSE=1630.30
     📅 2025-03: SMAPE=52.19%, RMSE=1264.94
     📅 2025-04: SMAPE=116.66%, RMSE=7850.24
     📅 2025-05: SMAPE=84.25%, RMSE=1551.38
   📊 GLOBAL: MAPE=inf%, RMSE=4140.33

🚀 [Model: XGBoost] Iniciando Processo...
   📝 Registrando metadados do experimento...
   🏋️ Treinando com dados até 2025-01-01...


2026/01/15 12:46:16 INFO mlflow.pyfunc: Validating input example against model signature


   💾 Registrando modelo como: ds_dev.cvc.loja_XGBoost


Registered model 'ds_dev.cvc.loja_XGBoost' already exists. Creating a new version of this model...


Created version '17' of model 'ds_dev.cvc.loja_xgboost'.


   🔮 Iniciando Inferência Walk-Forward (5 folds)...
     📅 2025-01: SMAPE=104.57%, RMSE=3101.59
     📅 2025-02: SMAPE=111.95%, RMSE=3150.04
     📅 2025-03: SMAPE=72.56%, RMSE=2827.61
     📅 2025-04: SMAPE=113.52%, RMSE=8080.54
     📅 2025-05: SMAPE=103.10%, RMSE=2897.66
   📊 GLOBAL: MAPE=inf%, RMSE=4817.50

🚀 [Model: CatBoost] Iniciando Processo...
   📝 Registrando metadados do experimento...
   🏋️ Treinando com dados até 2025-01-01...


2026/01/15 12:48:01 INFO mlflow.pyfunc: Validating input example against model signature


   💾 Registrando modelo como: ds_dev.cvc.loja_CatBoost


Registered model 'ds_dev.cvc.loja_CatBoost' already exists. Creating a new version of this model...


Created version '9' of model 'ds_dev.cvc.loja_catboost'.


   🔮 Iniciando Inferência Walk-Forward (5 folds)...
     📅 2025-01: SMAPE=107.15%, RMSE=2533.92
     📅 2025-02: SMAPE=104.32%, RMSE=2131.13
     📅 2025-03: SMAPE=52.62%, RMSE=1209.97
     📅 2025-04: SMAPE=111.42%, RMSE=7889.40
     📅 2025-05: SMAPE=82.69%, RMSE=1396.37
   📊 GLOBAL: MAPE=inf%, RMSE=4320.57

🚀 [Model: TFT] Iniciando Processo...
   📝 Registrando metadados do experimento...
   🏋️ Treinando com dados até 2025-01-01...


GPU available: False, used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs

   | Name                              | Type                             | Params | Mode 
------------------------------------------------------------------------------------------------
0  | train_metrics                     | MetricCollection                 | 0      | train
1  | val_metrics                       | MetricCollection                 | 0      | train
2  | input_embeddings                  | _MultiEmbedding                  | 0      | train
3  | static_covariates_vsn             | _VariableSelectionNetwork        | 15.8 K | train
4  | encoder_vsn                       | _VariableSelectionNetwork        | 583 K  | train
5  | decoder_vsn                       | _VariableSelectionNetwork        | 247 K  | train
6  | static_context_grn                | _GatedResidualNetwork            | 66.3 K | train
7  | static_context_hidden_encoder_grn | _GatedResidualNetwork 

2026/01/15 12:48:56 INFO mlflow.pyfunc: Validating input example against model signature


   💾 Registrando modelo como: ds_dev.cvc.loja_TFT


  "dataframe_split": {
    "columns": [
      "n"
    ],
    "data": [
      [
        35
      ]
    ]
  }
}. Alternatively, you can avoid passing input example and pass model signature instead when logging the model. To ensure the input example is valid prior to serving, please try calling `mlflow.models.validate_serving_input` on the model uri and serving input example. A serving input example can be generated from model input example using `mlflow.models.convert_input_example_to_serving_input` function.
Got error: A load persistent id instruction was encountered, but no persistent_load function was specified.


Registered model 'ds_dev.cvc.loja_TFT' already exists. Creating a new version of this model...


Created version '7' of model 'ds_dev.cvc.loja_tft'.


   🔮 Iniciando Inferência Walk-Forward (5 folds)...



     📅 2025-01: SMAPE=127.28%, RMSE=3519.47



     📅 2025-02: SMAPE=162.67%, RMSE=3930.39



     📅 2025-03: SMAPE=108.92%, RMSE=2763.98


ERROR:darts.utils.data.inference_dataset:ValueError: For the given forecasting horizon `n=35`, the provided future covariates at dataset index `0` do not extend far enough into the future. As `n <= output_chunk_length` the future covariates must end at time step `2025-06-03 00:00:00`, whereas now they end at time step `2025-05-31 00:00:00`.


     📅 2025-04: SMAPE=123.23%, RMSE=7542.82
❌ Error training TFT: For the given forecasting horizon `n=35`, the provided future covariates at dataset index `0` do not extend far enough into the future. As `n <= output_chunk_length` the future covariates must end at time step `2025-06-03 00:00:00`, whereas now they end at time step `2025-05-31 00:00:00`.

🚀 [Model: NBEATS] Iniciando Processo...


Traceback (most recent call last):
  File "/root/.ipykernel/9536/command-710650326555544-3165449090", line 183, in train_evaluate_walkforward
    preds_scaled = model.predict(**predict_kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.12/site-packages/darts/utils/torch.py", line 103, in decorator
    return decorated(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.12/site-packages/darts/models/forecasting/torch_forecasting_model.py", line 1465, in predict
    predictions = self.predict_from_dataset(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.12/site-packages/darts/utils/torch.py", line 103, in decorator
    return decorated(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local_disk0/.ephemeral_nfs/cluster_librarie

   📝 Registrando metadados do experimento...



  | Name            | Type             | Params | Mode 
-------------------------------------------------------------
0 | criterion       | MSELoss          | 0      | train
1 | train_criterion | MSELoss          | 0      | train
2 | val_criterion   | MSELoss          | 0      | train
3 | train_metrics   | MetricCollection | 0      | train
4 | val_metrics     | MetricCollection | 0      | train
5 | stacks          | ModuleList       | 7.9 M  | train
-------------------------------------------------------------
7.8 M     Trainable params
16.4 K    Non-trainable params
7.9 M     Total params
31.452    Total estimated model params size (MB)
111       Modules in train mode
0         Modules in eval mode


   🏋️ Treinando com dados até 2025-01-01...


2026/01/15 12:49:18 INFO mlflow.pyfunc: Validating input example against model signature


   💾 Registrando modelo como: ds_dev.cvc.loja_NBEATS


  "dataframe_split": {
    "columns": [
      "n"
    ],
    "data": [
      [
        35
      ]
    ]
  }
}. Alternatively, you can avoid passing input example and pass model signature instead when logging the model. To ensure the input example is valid prior to serving, please try calling `mlflow.models.validate_serving_input` on the model uri and serving input example. A serving input example can be generated from model input example using `mlflow.models.convert_input_example_to_serving_input` function.
Got error: A load persistent id instruction was encountered, but no persistent_load function was specified.


Registered model 'ds_dev.cvc.loja_NBEATS' already exists. Creating a new version of this model...


Created version '7' of model 'ds_dev.cvc.loja_nbeats'.


   🔮 Iniciando Inferência Walk-Forward (5 folds)...



     📅 2025-01: SMAPE=126.92%, RMSE=2948.03


     📅 2025-02: SMAPE=122.74%, RMSE=3004.73



     📅 2025-03: SMAPE=127.73%, RMSE=3826.65



     📅 2025-04: SMAPE=145.47%, RMSE=10060.44


     📅 2025-05: SMAPE=114.09%, RMSE=3050.67
   📊 GLOBAL: MAPE=inf%, RMSE=5778.73

🚀 [Model: Transformer] Iniciando Processo...
   📝 Registrando metadados do experimento...



  | Name                | Type                | Params | Mode 
--------------------------------------------------------------------
0 | criterion           | MSELoss             | 0      | train
1 | train_criterion     | MSELoss             | 0      | train
2 | val_criterion       | MSELoss             | 0      | train
3 | train_metrics       | MetricCollection    | 0      | train
4 | val_metrics         | MetricCollection    | 0      | train
5 | encoder             | Linear              | 8.2 K  | train
6 | positional_encoding | _PositionalEncoding | 0      | train
7 | transformer         | Transformer         | 994 K  | train
8 | decoder             | Linear              | 4.5 K  | train
--------------------------------------------------------------------
1.0 M     Trainable params
0         Non-trainable params
1.0 M     Total params
4.028     Total estimated model params size (MB)

   🏋️ Training on data up to 2025-01-01...

2026/01/15 12:49:26 INFO mlflow.pyfunc: Validating input example against model signature


   💾 Registering model as: ds_dev.cvc.loja_Transformer

  "dataframe_split": {
    "columns": [
      "n"
    ],
    "data": [
      [
        35
      ]
    ]
  }
}. Alternatively, you can avoid passing input example and pass model signature instead when logging the model. To ensure the input example is valid prior to serving, please try calling `mlflow.models.validate_serving_input` on the model uri and serving input example. A serving input example can be generated from model input example using `mlflow.models.convert_input_example_to_serving_input` function.
Got error: A load persistent id instruction was encountered, but no persistent_load function was specified.


Registered model 'ds_dev.cvc.loja_Transformer' already exists. Creating a new version of this model...


Created version '7' of model 'ds_dev.cvc.loja_transformer'.


   🔮 Starting walk-forward inference (5 folds)...

     📅 2025-01: SMAPE=133.75%, RMSE=2717.40


     📅 2025-02: SMAPE=139.96%, RMSE=2805.05


     📅 2025-03: SMAPE=123.52%, RMSE=2819.38


     📅 2025-04: SMAPE=134.98%, RMSE=8394.78


     📅 2025-05: SMAPE=121.38%, RMSE=2361.16
   📊 GLOBAL: MAPE=inf%, RMSE=4839.26

🚀 [Model: BlockRNN] Starting process...
   📝 Logging experiment metadata...

  | Name            | Type             | Params | Mode 
-------------------------------------------------------------
0 | criterion       | MSELoss          | 0      | train
1 | train_criterion | MSELoss          | 0      | train
2 | val_criterion   | MSELoss          | 0      | train
3 | train_metrics   | MetricCollection | 0      | train
4 | val_metrics     | MetricCollection | 0      | train
5 | rnn             | LSTM             | 230 K  | train
6 | fc              | Sequential       | 4.5 K  | train
-------------------------------------------------------------
235 K     Trainable params
0         Non-trainable params
235 K     Total params
0.942     Total estimated model params size (MB)
8         Modules in train mode
0         Modules in eval mode


   🏋️ Training on data up to 2025-01-01...

2026/01/15 12:49:33 INFO mlflow.pyfunc: Validating input example against model signature


   💾 Registering model as: ds_dev.cvc.loja_BlockRNN

  "dataframe_split": {
    "columns": [
      "n"
    ],
    "data": [
      [
        35
      ]
    ]
  }
}. Alternatively, you can avoid passing input example and pass model signature instead when logging the model. To ensure the input example is valid prior to serving, please try calling `mlflow.models.validate_serving_input` on the model uri and serving input example. A serving input example can be generated from model input example using `mlflow.models.convert_input_example_to_serving_input` function.
Got error: A load persistent id instruction was encountered, but no persistent_load function was specified.


Registered model 'ds_dev.cvc.loja_BlockRNN' already exists. Creating a new version of this model...


Created version '7' of model 'ds_dev.cvc.loja_blockrnn'.


   🔮 Starting walk-forward inference (5 folds)...

     📅 2025-01: SMAPE=70.23%, RMSE=826.44


     📅 2025-02: SMAPE=85.77%, RMSE=1022.56


     📅 2025-03: SMAPE=70.35%, RMSE=1192.15


     📅 2025-04: SMAPE=110.37%, RMSE=7586.78


     📅 2025-05: SMAPE=88.44%, RMSE=1311.85
   📊 GLOBAL: MAPE=inf%, RMSE=3895.80

🚀 [Model: TCN] Starting process...
   📝 Logging experiment metadata...

  | Name            | Type             | Params | Mode 
-------------------------------------------------------------
0 | criterion       | MSELoss          | 0      | train
1 | train_criterion | MSELoss          | 0      | train
2 | val_criterion   | MSELoss          | 0      | train
3 | train_metrics   | MetricCollection | 0      | train
4 | val_metrics     | MetricCollection | 0      | train
5 | res_blocks      | ModuleList       | 39.4 K | train
-------------------------------------------------------------
39.4 K    Trainable params
0         Non-trainable params
39.4 K    Total params
0.157     Total estimated model params size (MB)
28        Modules in train mode
0         Modules in eval mode


   🏋️ Training on data up to 2025-01-01...

2026/01/15 12:49:38 INFO mlflow.pyfunc: Validating input example against model signature


   💾 Registering model as: ds_dev.cvc.loja_TCN

  "dataframe_split": {
    "columns": [
      "n"
    ],
    "data": [
      [
        35
      ]
    ]
  }
}. Alternatively, you can avoid passing input example and pass model signature instead when logging the model. To ensure the input example is valid prior to serving, please try calling `mlflow.models.validate_serving_input` on the model uri and serving input example. A serving input example can be generated from model input example using `mlflow.models.convert_input_example_to_serving_input` function.
Got error: A load persistent id instruction was encountered, but no persistent_load function was specified.


Registered model 'ds_dev.cvc.loja_TCN' already exists. Creating a new version of this model...


Created version '7' of model 'ds_dev.cvc.loja_tcn'.


   🔮 Starting walk-forward inference (5 folds)...

     📅 2025-01: SMAPE=142.11%, RMSE=3422.63


     📅 2025-02: SMAPE=172.69%, RMSE=1576.20


     📅 2025-03: SMAPE=200.00%, RMSE=9240.84


     📅 2025-04: SMAPE=200.00%, RMSE=12944.18


     📅 2025-05: SMAPE=166.75%, RMSE=7710.85
   📊 GLOBAL: MAPE=inf%, RMSE=8172.27
✅ Process finished.
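
> **Note on the metrics in this log:** every `GLOBAL` line reports `MAPE=inf%`, and two TCN folds report `SMAPE=200.00%`. The metric code of this pipeline is not shown in the output, so the sketch below uses the conventional textbook definitions as an assumption. Under those definitions, MAPE becomes infinite as soon as one actual value is zero while its forecast is not, and SMAPE reaches its upper bound of 200% when every forecast has the opposite sign of its actual value or one of the two is zero. The series values are hypothetical and only illustrate the behavior; they are not taken from the CVC data.

```python
import math


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (conventional definition)."""
    terms = []
    for a, f in zip(actual, forecast):
        if a == 0:
            # Division by zero: the term is unbounded unless the forecast is also 0.
            # Treating 0/0 as a zero error is a convention chosen for this sketch.
            terms.append(0.0 if f == 0 else math.inf)
        else:
            terms.append(abs(a - f) / abs(a))
    return 100.0 * sum(terms) / len(terms)


def smape(actual, forecast):
    """Symmetric MAPE, in percent, bounded between 0 and 200."""
    terms = []
    for a, f in zip(actual, forecast):
        denom = abs(a) + abs(f)
        terms.append(0.0 if denom == 0 else 2.0 * abs(a - f) / denom)
    return 100.0 * sum(terms) / len(terms)


# Hypothetical monthly sales with one zero-sales month.
actual = [0.0, 120.0, 80.0]
forecast = [50.0, 100.0, 100.0]

print(mape(actual, forecast))    # inf, because of the zero actual
print(smape(actual, forecast))   # finite, strictly between 0 and 200

# Every point is either sign-flipped or forecast as zero, so SMAPE hits its cap.
print(smape([10.0, 20.0], [-5.0, 0.0]))  # 200.0
```

If zero-sales months are expected in the store series, a scale-based metric such as RMSE or a bounded one such as SMAPE is usually easier to compare across models than MAPE.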
