# 🚀 Final Deployment of Time Series Models - CVC Lojas

## 🎯 Objective
Train the **FINAL** version of the sales forecasting model using the best validated hyperparameters, and register an "All-in-One" artifact in Unity Catalog, ready for production inference.

## 📅 Temporal Scope (Concept Drift Prevention)
**CRITICAL BUSINESS RULE:** To exclude obsolete data (pre-pandemic/2019) that does not reflect current consumer behavior, this training run uses strictly:
*   **Start:** `2021-01-01`
*   **End:** `2025-12-31`

Any data outside this interval is filtered out in Spark before it is fed to the pipeline.
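
As a minimal sketch of the inclusive date window described above, the following plain-Python example (not Spark) keeps only rows whose date falls between the two boundaries. The helper `in_training_window` and the sample rows are hypothetical, used only for illustration; the notebook itself applies the equivalent filter on the cluster with `F.col("DATA").between(...)`, which is also inclusive at both ends.

```python
from datetime import date

# Boundaries taken from the business rule above.
TRAIN_START = date(2021, 1, 1)
TRAIN_END = date(2025, 12, 31)


def in_training_window(day: date) -> bool:
    """Return True when `day` falls inside the inclusive training window."""
    return TRAIN_START <= day <= TRAIN_END


# Hypothetical rows: (date, sales in R$)
rows = [
    (date(2019, 6, 1), 900.0),     # pre-pandemic, dropped
    (date(2021, 1, 1), 1000.0),    # first day of the window, kept
    (date(2025, 12, 31), 1200.0),  # last day of the window, kept
    (date(2026, 1, 1), 1300.0),    # after the window, dropped
]

kept = [row for row in rows if in_training_window(row[0])]
print(kept)
```

Both boundary dates are retained, so a row dated exactly `2021-01-01` or `2025-12-31` is part of the training set.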

## 📦 UnifiedForecaster (Model Wrapper)
The model is saved as a custom MLflow PyFunc (`UnifiedForecaster`) that encapsulates:
1.  **Preprocessing Pipeline** (scaler, missing values) - Ensures raw input is treated exactly as it was during training.
2.  **Trained Darts Model** (LGBM, TFT, etc.).
3.  **Post-processing Logic** (inverse transform) - Delivers the forecast in the real scale (R$).
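
The three stages listed above can be sketched in plain Python as follows. This is not the actual `UnifiedForecaster` from `src.deploy.wrapper`, whose implementation is not shown in this notebook. The classes `MinMaxScalerSketch` and `ForecasterSketch` and the function `last_value_model` are hypothetical stand-ins, meant only to show the order of operations: scale the raw input, predict in scaled space, then invert the scaling.

```python
from typing import Callable, List


class MinMaxScalerSketch:
    """Toy scaler standing in for the real preprocessing pipeline (assumption)."""

    def fit(self, values: List[float]) -> "MinMaxScalerSketch":
        self.low, self.high = min(values), max(values)
        return self

    def transform(self, values: List[float]) -> List[float]:
        span = (self.high - self.low) or 1.0  # avoid division by zero
        return [(v - self.low) / span for v in values]

    def inverse_transform(self, values: List[float]) -> List[float]:
        span = (self.high - self.low) or 1.0
        return [v * span + self.low for v in values]


def last_value_model(scaled_history: List[float]) -> List[float]:
    """Naive stand-in for the trained model: repeat the last observed value."""
    return [scaled_history[-1]]


class ForecasterSketch:
    """Hypothetical wrapper chaining the three stages in a single predict call."""

    def __init__(self, scaler: MinMaxScalerSketch,
                 model: Callable[[List[float]], List[float]]):
        self.scaler = scaler
        self.model = model

    def predict(self, raw_history: List[float]) -> List[float]:
        scaled = self.scaler.transform(raw_history)    # 1. preprocessing
        scaled_forecast = self.model(scaled)           # 2. model prediction
        return self.scaler.inverse_transform(scaled_forecast)  # 3. back to R$


history = [1000.0, 1500.0, 2000.0]
scaler = MinMaxScalerSketch().fit(history)
forecaster = ForecasterSketch(scaler, last_value_model)
forecast = forecaster.predict(history)
print(forecast)
```

Bundling the fitted scaler with the model means the caller passes raw values and receives values in the original unit, without needing to know how the training data was scaled.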


In [0]:
# --- ENVIRONMENT SETUP ---
%load_ext autoreload
%autoreload 2

import sys
import os
# Ensure that src is importable
sys.path.append(os.getcwd())

# Imports from our modular library
from src.validation.config import Config
from src.validation.data import DataIngestion
from src.validation.pipeline import ProjectPipeline
from src.deploy.wrapper import UnifiedForecaster

# External libraries
import mlflow
import pandas as pd
import pickle
import pyspark.sql.functions as F
from mlflow.models import ModelSignature
from mlflow.tracking import MlflowClient
from mlflow.types.schema import Schema, ColSpec
from darts.models import LightGBMModel, TFTModel  # Example: import the winning model here
from datetime import datetime

# --- GLOBAL CONFIGS ---
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "true")
spark.conf.set("spark.databricks.delta.autoCompact.enabled", "true")

In [0]:
# --- 1. TEMPORAL SCOPE DEFINITION (HARDCODED) ---
# Override the widget configs to enforce the deployment business rule
config = Config(spark)
config.DATA_START = "2021-01-01"  # Matches the business rule: training starts on 2021-01-01
config.TRAIN_END_DATE = "2025-12-31"
config.INGESTION_END = config.TRAIN_END_DATE  # For deployment, ingest everything up to the training end date
config.SCHEMA = "cvc_pred"
config.EXPERIMENT_NAME = "Model_Deploy_CVC_Loja"

print(f"⏱️ TRAINING PERIOD SET: {config.DATA_START} to {config.TRAIN_END_DATE}")
print("⚠️ Data before 2021 will be ignored (concept drift strategy).")

In [0]:
# --- 2. INGESTION AND FILTERING (SPARK SIDE) ---
ingestion = DataIngestion(spark, config)

# a) Build the raw dataset via the Feature Store
df_spark_raw = ingestion.create_training_set()

# b) STRICT FILTERING IN SPARK (efficient)
print(f"   ♻️ Applying temporal filter on the Spark cluster ({config.DATA_START} - {config.TRAIN_END_DATE})...")
df_spark_filtered = df_spark_raw.filter(
    F.col("DATA").between(config.DATA_START, config.TRAIN_END_DATE)
)

# c) Load the global support data (already filtered by DataIngestion if the config is correct, but we enforce it again)
df_global_support = ingestion.get_global_support()
# Label-based row slice; assumes the DataFrame has a DatetimeIndex
df_global_support = df_global_support.loc[config.DATA_START : config.TRAIN_END_DATE]

# d) Convert to Darts objects (pandas on the driver)
target_series_list, full_covariates_list = ingestion.build_darts_objects(df_spark_filtered, df_global_support)


In [0]:
# --- 3. PIPELINE AND FINAL TRAINING ---

# Initialize the unified pipeline
pipeline = ProjectPipeline()

print("⚙️ Fit & transform of the scalers...")
# Fit the scalers on the filtered data (2021-2025)
pipeline.fit(target_series_list, full_covariates_list)
scaled_series, scaled_covariates = pipeline.transform(target_series_list, full_covariates_list)

# Initialize the winning model (example: LightGBM with tuned params)
print("Starting model training (LightGBM)...")
model = LightGBMModel(
    lags=12, 
    lags_future_covariates=[0,1,2,3],
    output_chunk_length=1,
    random_state=42
)

# Full training on all series
model.fit(
    scaled_series,
    future_covariates=scaled_covariates
)
print("✅ Model trained successfully!")

In [0]:
# Override the experiment name with its full workspace path
config.EXPERIMENT_NAME = "/Workspace/Shared/data_science/projetos/cvc_curva_de_vendas_por_canal/experiments/Model_Deploy_CVC_Loja"

In [0]:
from mlflow.models import infer_signature
import pandas as pd
import pickle
import os

# --- 4. REGISTRATION IN UNITY CATALOG (WITH COLUMN-ORDER METADATA) ---
mlflow.set_experiment(config.EXPERIMENT_NAME)
catalog_model_name = f"{config.CATALOG}.{config.SCHEMA}.cvc_lojas_forecast_production"

print(f"🚀 Starting model registration: {catalog_model_name}")

with mlflow.start_run(run_name=f"Deploy_Production_{config.VERSION}") as run:
    # ---------------------------------------------------------
    # 1. CAPTURE METADATA (COLUMN ORDER)
    # ---------------------------------------------------------
    sample_ts = target_series_list[0]
    sample_cov = full_covariates_list[0]

    training_metadata = {
        # IMPORTANT FILTER: drop 'CODIGO_LOJA' because it is a group key, not a static input feature
        "static_cols_order": [c for c in sample_ts.static_covariates.columns.tolist() if c != "CODIGO_LOJA"],
        
        "covariate_cols_order": sample_cov.components.tolist()
    }
    print(f"🔒 Static column order locked: {training_metadata['static_cols_order']}")
    print(f"🔒 Covariate column order locked: {training_metadata['covariate_cols_order']}")

    # ---------------------------------------------------------
    # 2. SAVE ARTIFACTS
    # ---------------------------------------------------------
    pipeline_path = "pipeline.pkl"
    model_path = "lgbm_model.pkl"
    covariates_path = "future_covariates.pkl"
    metadata_path = "model_metadata.pkl"
    
    with open(pipeline_path, "wb") as f: pickle.dump(pipeline, f)
    with open(model_path, "wb") as f: pickle.dump(model, f)
    with open(covariates_path, "wb") as f: pickle.dump(scaled_covariates, f)
    with open(metadata_path, "wb") as f: pickle.dump(training_metadata, f)
    artifacts = {
        "pipeline": pipeline_path,
        "darts_model": model_path,
        "future_covariates": covariates_path,
        "metadata": metadata_path  # <--- The wrapper looks for this!
    }
    # ---------------------------------------------------------
    # 3. DYNAMIC SIGNATURE CREATION
    # ---------------------------------------------------------
    base_example = {
        "DATA": ["2025-01-01"],
        "CODIGO_LOJA": ["1"],
        "TARGET_VENDAS": [1000.0],
        "IS_FERIADO": [0.0],
        "CLUSTER_LOJA": ["A"],
        "SIGLA_UF": ["SP"],
        "TIPO_LOJA": ["SHOPPING"],
        "MODELO_LOJA": ["PADRAO"],
        "n": [35]
    }
    # Add market columns automatically if they exist in the global support data
    market_cols = [col for col in df_global_support.columns if col not in base_example]
    market_example = {col: [0.0] for col in market_cols}
    
    full_input_dict = {**base_example, **market_example}
    input_example = pd.DataFrame(full_input_dict)
    
    output_example = pd.DataFrame({
        "DATA_PREVISAO": ["2025-01-02"], 
        "PREVISAO_VENDA": [1050.0], 
        "CODIGO_LOJA": ["1"]
    })
    signature = infer_signature(input_example, output_example)

    # ---------------------------------------------------------
    # 4. LOG THE MODEL
    # ---------------------------------------------------------
    model_info = mlflow.pyfunc.log_model(
        artifact_path="model",
        python_model=UnifiedForecaster(), 
        artifacts=artifacts,
        input_example=input_example,
        signature=signature,
        registered_model_name=catalog_model_name
    )
    # --- ASSIGN THE CHAMPION ALIAS ---
    client = MlflowClient()
    model_version = model_info.registered_model_version
    
    print(f"🏅 Assigning alias 'Champion' to version {model_version}...")
    client.set_registered_model_alias(
        name=catalog_model_name,
        alias="Champion",
        version=model_version
    )
    print(f"✨ Success! Model registered and promoted to Champion: {catalog_model_name} (v{model_version})")
# Cleanup: remove the local temporary files
for p in [pipeline_path, model_path, covariates_path, metadata_path]:
    if os.path.exists(p): os.remove(p)