# Notebook 03: Final Model Inference & 2024 Forecast
## Objective
This notebook generates the final, production-ready demand forecast for 2024.

## Methodology
Based on the critical findings from Notebook 02, we concluded:

1. The complex "Hybrid Model" (GBR + Risk) was financially unviable.

2. The "GBR Base Model" was the clear winner, delivering +$610M in net benefit over the client's current model.

Therefore, our inference pipeline is now simple and robust:

1. Load the full historical dataset (2012-2023).

2. Load the validated hyperparameters from our gbr_model.joblib artifact (from Notebook 01).

3. Re-train the GBR model on the entire dataset to maximize its accuracy.

4. Generate the 2024 forecast.

5. Save this forecast as the final, processed output.

## 0. Setup and Imports

In [7]:
import pandas as pd
import numpy as np
import joblib
from sklearn.ensemble import GradientBoostingRegressor

print("Libraries imported.")

Libraries imported.


## 1. Define Feature Engineering Function
This function must be identical to the one used in 01-model_core.ipynb to ensure consistency between training and inference.

In [8]:
def create_features(data):
    """
    Creates time-series features from the input dataframe.
    
    Args:
        data (pd.DataFrame): DataFrame with 'fecha', 'prod_id', 'ventas', 'precio_promedio'.
        
    Returns:
        pd.DataFrame: DataFrame with new features.
    """
    df_feat = data.sort_values(['prod_id', 'fecha']).copy()
    
    # --- Calendar Features ---
    df_feat['year'] = df_feat['fecha'].dt.year
    df_feat['month'] = df_feat['fecha'].dt.month
    
    # --- Lag Features (Annual) ---
    df_feat['lag_12'] = df_feat.groupby('prod_id')['ventas'].shift(12)
    df_feat['lag_13'] = df_feat.groupby('prod_id')['ventas'].shift(13)
    df_feat['lag_24'] = df_feat.groupby('prod_id')['ventas'].shift(24)
    
    # --- Rolling Window Features ---
    df_feat['rolling_mean_3_lag12'] = df_feat.groupby('prod_id')['lag_12'].transform(lambda x: x.rolling(3).mean())
    
    # --- Price Features ---
    df_feat['precio_lag_12'] = df_feat.groupby('prod_id')['precio_promedio'].shift(12)
    
    return df_feat
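
As a quick illustration of why the lags are computed with groupby('prod_id'), the sketch below applies the same shift(12) pattern to a toy frame. The two products and their sales values are made up for this example; only the column names (fecha, prod_id, ventas) come from the function above.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): two products, 14 monthly observations each.
dates = pd.date_range("2022-01-01", periods=14, freq="MS")
toy = pd.DataFrame({
    "fecha": np.tile(dates, 2),
    "prod_id": np.repeat([0, 1], len(dates)),
    "ventas": np.concatenate([np.arange(100, 114), np.arange(500, 514)]),
})

toy = toy.sort_values(["prod_id", "fecha"])

# Same pattern as create_features: the shift is applied within each product,
# so product 1 never inherits product 0's sales as its lag.
toy["lag_12"] = toy.groupby("prod_id")["ventas"].shift(12)

# The first 12 rows of every product have no value 12 rows back.
missing_per_product = toy["lag_12"].isna().groupby(toy["prod_id"]).sum()
print(missing_per_product.to_dict())

# January 2023 of product 1 looks back to January 2022 of product 1.
jan_2023 = toy[(toy["prod_id"] == 1) & (toy["fecha"] == "2023-01-01")]
print(jan_2023["lag_12"].iloc[0])
```

Note that shift(12) counts rows, not calendar months, so it only equals a 12-month lag when each product has exactly one row per month with no gaps.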

## 2. Load Data and Model Parameters
We load the full history to train on, and the parameters from the model we validated in Notebook 01.

In [9]:
# --- 1. Load ALL historical data ---
RAW_DATA_PATH = '../data/raw/demanding_forecast.csv'
df_history = pd.read_csv(RAW_DATA_PATH)
df_history['fecha'] = pd.to_datetime(df_history['fecha'])
print(f"Full historical data loaded: {len(df_history)} rows.")

# --- 2. Load validated model parameters ---
MODEL_PATH = '../models/gbr_model.joblib'
gbr_validated = joblib.load(MODEL_PATH)
model_params = gbr_validated.get_params()

print(f"Model parameters loaded from: {MODEL_PATH}")
print(f"Using parameters: LR={model_params['learning_rate']}, N_Estimators={model_params['n_estimators']}, Max_Depth={model_params['max_depth']}")

Full historical data loaded: 80748 rows.
Model parameters loaded from: ../models/gbr_model.joblib
Using parameters: LR=0.05, N_Estimators=500, Max_Depth=7


## 3. Re-train the Final Model
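This is the key production step. We re-train a new GBR model with the validated hyperparameters on all usable historical data (2012-2023). Rows whose lag features are undefined, such as each product's first 24 records (which have no lag_24), are removed by dropna(), so the model is fitted on 57,420 of the 80,748 historical rows. Using every available year gives the model the most recent information before it forecasts 2024.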

In [10]:
# --- 1. Create features on the full historical dataset ---
df_model = create_features(df_history)
df_model = df_model.dropna()

# --- 2. Define the FINAL training set (all data <= 2023) ---
TARGET_VARIABLE = 'ventas'
FEATURES = [
    'month', 'year', 'prod_id', 
    'lag_12', 'lag_13', 'lag_24', 
    'rolling_mean_3_lag12', 'precio_lag_12'
]

train_full = df_model[df_model['year'] <= 2023]
print(f"Re-training model on {len(train_full)} rows (full 2012-2023 history).")

# --- 3. Initialize and train the final model ---
gbr_final = GradientBoostingRegressor(**model_params)

# IMPORTANT: Disable early stopping for the final fit
# We set n_iter_no_change=None so it trains on the full n_estimators
gbr_final.set_params(n_iter_no_change=None)

# Fit the final model
gbr_final.fit(train_full[FEATURES], train_full[TARGET_VARIABLE])

print("Final model re-training complete.")

Re-training model on 57420 rows (full 2012-2023 history).
Final model re-training complete.
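
The effect of disabling early stopping can be checked on a small synthetic problem. The sketch below assumes the tuned model was fitted with n_iter_no_change set (this notebook does not confirm that for Notebook 01); the data, hyperparameter values, and variable names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic regression data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = X[:, 0] * 2.0 + rng.normal(scale=0.1, size=300)

# Stand-in for a validated model that used early stopping during tuning.
gbr_tuned = GradientBoostingRegressor(
    n_estimators=200, learning_rate=0.1, max_depth=3,
    n_iter_no_change=5, validation_fraction=0.2, random_state=0,
).fit(X, y)

# Rebuild an unfitted estimator from the same hyperparameters,
# then switch early stopping off before the final fit.
params = gbr_tuned.get_params()
gbr_refit = GradientBoostingRegressor(**params)
gbr_refit.set_params(n_iter_no_change=None)
gbr_refit.fit(X, y)

# n_estimators_ reports how many boosting stages were actually fitted.
print("with early stopping:", gbr_tuned.n_estimators_)
print("without early stopping:", gbr_refit.n_estimators_)
```

With early stopping off, no validation fraction is held out, so every training row contributes to the fit and all n_estimators stages are built.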


## 4. Create 2024 Forecast Scaffold
We now build the "scaffold" (the empty rows for 2024), append it to the history so the lag features can be computed, and then predict.

In [11]:
# --- 1. Create future dataframe for 2024 ---
future_dates = pd.date_range(start='2024-01-01', end='2024-12-01', freq='MS')
prod_ids = df_history['prod_id'].unique()

df_future_rows = []
for pid in prod_ids:
    for date in future_dates:
        df_future_rows.append({
            'fecha': date, 'prod_id': pid,
            'ventas': np.nan, 'precio_promedio': np.nan
        })

df_future = pd.DataFrame(df_future_rows)

# --- 2. Concatenate history + future to create features ---
df_full_history = pd.concat([df_history, df_future], axis=0) 
df_full_features = create_features(df_full_history) 

# --- 3. Filter to get just the 2024 rows (now with features) ---
df_2024 = df_full_features[df_full_features['fecha'].dt.year == 2024].copy()

# --- 4. Predict ---
# Fill any remaining NaNs in the features with 0 (e.g., products with <2 years of history).
# Note: the model reads a 0 here as zero sales or a zero price, not as "missing".
df_2024[FEATURES] = df_2024[FEATURES].fillna(0)
df_2024['prediccion_ventas'] = gbr_final.predict(df_2024[FEATURES])

print("2024 forecast generated.")

2024 forecast generated.
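
The nested loop that builds the scaffold could also be written as a Cartesian product of products and months. This is an alternative sketch, not the code the notebook ran; the three product ids are hypothetical stand-ins for df_history['prod_id'].unique().

```python
import numpy as np
import pandas as pd

# Hypothetical product ids; in the notebook these come from df_history.
prod_ids = np.array([0, 1, 2])
future_dates = pd.date_range(start="2024-01-01", end="2024-12-01", freq="MS")

# Cartesian product of products x months, built without Python loops.
grid = pd.MultiIndex.from_product(
    [prod_ids, future_dates], names=["prod_id", "fecha"]
)
df_future = grid.to_frame(index=False)

# Placeholder columns so the frame matches the historical schema.
df_future["ventas"] = np.nan
df_future["precio_promedio"] = np.nan

print(df_future.shape)
print(df_future.head(3))
```

The column order differs from the loop version, which does not matter for pd.concat because it aligns columns by name.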


## 5. Save Processed Output
This is the final forecast file, which will be used in the reporting notebook. In a production environment, this file would be loaded into the client's ERP or planning system.

In [12]:
# --- 5. Save Processed Data ---
PROCESSED_DATA_PATH = '../data/processed/demand_forecasts_2024.csv'
output_cols = ['fecha', 'prod_id', 'prediccion_ventas']

df_2024[output_cols].to_csv(PROCESSED_DATA_PATH, index=False)

print(f"2024 forecast saved to: {PROCESSED_DATA_PATH}")
print(df_2024[output_cols].head())

2024 forecast saved to: ../data/processed/demand_forecasts_2024.csv
       fecha  prod_id  prediccion_ventas
0 2024-01-01        0        2059.984012
1 2024-02-01        0        1472.463920
2 2024-03-01        0        1106.306216
3 2024-04-01        0         880.477605
4 2024-05-01        0         827.766047
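
A downstream notebook that reads this file has to parse the dates again, because CSV stores them as text. The round trip below is a minimal sketch that writes to a temporary directory rather than the real ../data/processed path; the two rows reuse values from the output above.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Two rows in the same layout as the saved forecast file.
forecast = pd.DataFrame({
    "fecha": pd.to_datetime(["2024-01-01", "2024-02-01"]),
    "prod_id": [0, 0],
    "prediccion_ventas": [2059.984012, 1472.463920],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demand_forecasts_2024.csv"
    forecast.to_csv(path, index=False)
    # Without parse_dates, 'fecha' would come back as plain strings.
    reloaded = pd.read_csv(path, parse_dates=["fecha"])

print(reloaded.dtypes)
```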


## 6. Project Conclusion (Next Steps)

This notebook completes the core inference pipeline.

**Business Action Plan**:

- The file demand_forecasts_2024.csv (saved in Section 5) contains the final recommended inventory order quantities, once the predictions are rounded.

- We do NOT add a statistical safety stock. Notebook 02 showed that a safety stock was financially unviable because of extreme outliers in the data.

- The $610M in value comes from replacing the client's model_actual with this more precise GBR forecast.
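
The rounding step mentioned in the action plan is not implemented in this notebook. One possible sketch is shown below, under two assumptions that would need to be agreed with the business: negative predictions are treated as "order nothing", and fractional units are rounded to the nearest whole unit. The column name cantidad_pedido and the negative value are hypothetical.

```python
import numpy as np
import pandas as pd

# Forecast rows in the same layout as the saved CSV; the last value is a
# hypothetical negative prediction added to exercise the clipping step.
forecast = pd.DataFrame({
    "fecha": pd.to_datetime(["2024-01-01", "2024-02-01", "2024-03-01"]),
    "prod_id": [0, 0, 0],
    "prediccion_ventas": [2059.984012, 1472.463920, -3.2],
})

# Assumption: never order a negative quantity.
clipped = forecast["prediccion_ventas"].clip(lower=0)

# Assumption: round to the nearest whole unit (not always up).
forecast["cantidad_pedido"] = np.rint(clipped).astype(int)

print(forecast[["fecha", "prod_id", "cantidad_pedido"]])
```

Rounding up instead (np.ceil) would bias orders slightly toward overstock, so the choice depends on the relative cost of stock-outs and excess inventory.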

Next Notebook: 04-generate_final_report.ipynb, where we will load this forecast and the historical data to create the final visualizations for the business.