AutoGluon - Sales forecast (tn) per product for February 2020

In [2]:
# 📦 1. Import libraries
import pandas as pd

In [1]:
# 💬 Install AutoGluon if needed
%pip install autogluon.timeseries

from autogluon.timeseries import TimeSeriesPredictor, TimeSeriesDataFrame

Note: you may need to restart the kernel to use updated packages.
Collecting autogluon.timeseries
  Downloading autogluon.timeseries-1.4.0-py3-none-any.whl.metadata (12 kB)
Collecting lightning<2.8,>=2.2 (from autogluon.timeseries)
  Downloading lightning-2.5.2-py3-none-any.whl.metadata (38 kB)
Collecting pytorch-lightning (from autogluon.timeseries)
  Downloading pytorch_lightning-2.5.2-py3-none-any.whl.metadata (21 kB)
Collecting accelerate<2.0,>=0.34.0 (from autogluon.timeseries)
  Downloading accelerate-1.9.0-py3-none-any.whl.metadata (19 kB)
Collecting gluonts<0.17,>=0.15.0 (from autogluon.timeseries)
  Downloading gluonts-0.16.2-py3-none-any.whl.metadata (9.8 kB)
Collecting mlforecast<0.15.0,>=0.14.0 (from autogluon.timeseries)
  Downloading mlforecast-0.14.0-py3-none-any.whl.metadata (12 kB)
Collecting utilsforecast<0.2.12,>=0.2.3 (from autogluon.timeseries)
  Downloading utilsforecast-0.2.11-py3-none-any.whl.metadata (7.7 kB)
Collecting orjson~=3.9 (from autogluon.timeseries)
 

  You can safely remove it manually.


In [4]:
# 📄 2. Load datasets
df_sellin = pd.read_csv("datasets/sell-in.txt", sep="\t", dtype={"periodo": str})
df_productos = pd.read_csv("datasets/tb_productos.txt", sep="\t")

In [6]:
# 📄 Read the list of products to forecast
with open("datasets/product_id_apredecir201912.TXT", "r") as f:
    product_ids = [int(line.strip()) for line in f if line.strip().isdigit()]
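
The `isdigit()` filter in the cell above drops every line that is not purely numeric, so a header row or blank lines in the file would be skipped silently. A minimal sketch of that behaviour on an in-memory sample; the sample contents and the helper name `read_product_ids` are made up for illustration and are not taken from the real file:

```python
import io

# Hypothetical sample mimicking a product list file:
# a header line, some IDs, trailing whitespace and a blank line.
sample = io.StringIO("product_id\n20001\n20002 \n\n20003\n")


def read_product_ids(handle):
    """Return the integer IDs found in a text handle, skipping non-numeric lines."""
    return [int(line.strip()) for line in handle if line.strip().isdigit()]


sample_ids = read_product_ids(sample)
print(sample_ids)
```

Comparing `len(product_ids)` with the expected number of products is a cheap way to confirm that no valid line was dropped by the filter.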

In [7]:
# 🧹 3. Preprocessing
# Convert periodo to datetime
df_sellin['timestamp'] = pd.to_datetime(df_sellin['periodo'], format='%Y%m')
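
Parsing with `format='%Y%m'` places every period on the first day of its month, which is why the later filter `<= '2019-12-01'` still includes December 2019 and why the `'MS'` (month start) frequency fits the data. A small sketch with made-up period strings:

```python
import pandas as pd

# Made-up period strings in the same YYYYMM layout as the 'periodo' column.
periods = pd.Series(["201701", "201912"])

# With no day in the format, pandas assigns the first day of the month.
timestamps = pd.to_datetime(periods, format="%Y%m")
print(timestamps)

# December 2019 passes a "<= 2019-12-01" filter because it maps to that exact date.
december_included = bool((timestamps <= "2019-12-01").all())
```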

In [8]:
# Filter up to Dec 2019 and keep only the required products
df_filtered = df_sellin[
    (df_sellin['timestamp'] <= '2019-12-01') &
    (df_sellin['product_id'].isin(product_ids))
]

In [9]:
# Aggregate tn by period, customer, and product
df_grouped = df_filtered.groupby(['timestamp', 'customer_id', 'product_id'], as_index=False)['tn'].sum()

In [10]:
# Aggregate total tn by period and product
df_monthly_product = df_grouped.groupby(['timestamp', 'product_id'], as_index=False)['tn'].sum()
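
Because both steps use `sum`, aggregating first by customer and then by product should give the same totals as grouping by period and product directly. The sketch below checks this on toy data (the values are invented); the customer-level frame is only needed if it is reused elsewhere.

```python
import pandas as pd

# Toy data (made up): two customers buy the same product in November 2019.
toy = pd.DataFrame({
    "timestamp": pd.to_datetime(["2019-11-01", "2019-11-01", "2019-12-01"]),
    "customer_id": [1, 2, 1],
    "product_id": [20001, 20001, 20001],
    "tn": [1.5, 2.5, 4.0],
})

# Two-step aggregation, as in the cells above.
two_step = (
    toy.groupby(["timestamp", "customer_id", "product_id"], as_index=False)["tn"].sum()
       .groupby(["timestamp", "product_id"], as_index=False)["tn"].sum()
)

# Direct aggregation by period and product.
one_step = toy.groupby(["timestamp", "product_id"], as_index=False)["tn"].sum()

print(two_step.equals(one_step))
```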

In [11]:
# Add 'item_id' column for AutoGluon
df_monthly_product['item_id'] = df_monthly_product['product_id']

In [12]:
# ⏰ 4. Create TimeSeriesDataFrame
ts_data = TimeSeriesDataFrame.from_data_frame(
    df_monthly_product,
    id_column='item_id',
    timestamp_column='timestamp'
)

In [13]:
# Fill missing values
ts_data = ts_data.fill_missing_values()
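
The training log reports that the data had frequency 'IRREG' before being resampled to 'MS', which suggests that some products have months with no rows at all. Whether such a month means zero sales or an unknown value is a modelling decision. The sketch below uses plain pandas on a made-up series (it is not the AutoGluon API) to show how a gap becomes explicit on a regular monthly grid and two ways it could be filled:

```python
import pandas as pd

# Toy monthly series for one product (made up); November 2019 has no row.
s = pd.Series(
    [10.0, 12.0, 9.0],
    index=pd.to_datetime(["2019-09-01", "2019-10-01", "2019-12-01"]),
)

regular = s.asfreq("MS")        # regular month-start grid; the gap becomes NaN
as_zero = regular.fillna(0.0)   # interpretation: no sales in the missing month
as_ffill = regular.ffill()      # interpretation: carry the last observed value forward

print(pd.DataFrame({"raw": regular, "zero": as_zero, "ffill": as_ffill}))
```

For sell-in data, treating a missing month as zero is often the more natural reading, but that is an assumption to verify against how the source data was generated.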

In [14]:
# ⚙️ 5. Define and train predictor
predictor = TimeSeriesPredictor(
    prediction_length=2,
    target='tn',
    freq='MS'  # Monthly frequency (month start)
)

predictor.fit(ts_data, num_val_windows=2, time_limit=60*60)

Beginning AutoGluon training... Time limit = 3600s
AutoGluon will save models to 'c:\22-Labo3\AutogluonModels\ag-20250805_191751'
AutoGluon Version:  1.4.0
Python Version:     3.12.4
Operating System:   Windows
Platform Machine:   AMD64
Platform Version:   10.0.26100
CPU Count:          20
GPU Count:          1
Memory Avail:       3.40 GB / 15.64 GB (21.8%)
Disk Space Avail:   545.58 GB / 926.44 GB (58.9%)

Fitting with arguments:
{'enable_ensemble': True,
 'eval_metric': WQL,
 'freq': 'MS',
 'hyperparameters': 'default',
 'known_covariates_names': [],
 'num_val_windows': 2,
 'prediction_length': 2,
 'quantile_levels': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
 'random_seed': 123,
 'refit_every_n_windows': 1,
 'refit_full': False,
 'skip_model_selection': False,
 'target': 'tn',
 'time_limit': 3600,
 'verbosity': 2}

train_data with frequency 'IRREG' has been resampled to frequency 'MS'.
Provided train_data has 22375 rows (NaN fraction=0.1%), 780 time series. Median time series le

	-0.1905       = Validation score (-WQL)
	20.27   s     = Training runtime
	1.86    s     = Validation (prediction) runtime
Training timeseries model ChronosFineTuned[bolt_small]. Training for up to 578.0s of the 3468.1s of remaining time.
	Skipping covariate_regressor since the dataset contains no known_covariates or static_features.


To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development


	Saving fine-tuned model to c:\22-Labo3\AutogluonModels\ag-20250805_191751\models\ChronosFineTuned[bolt_small]\W0\fine-tuned-ckpt
	Skipping covariate_regressor since the dataset contains no known_covariates or static_features.
	Saving fine-tuned model to c:\22-Labo3\AutogluonModels\ag-20250805_191751\models\ChronosFineTuned[bolt_small]\W1\fine-tuned-ckpt
	-0.1823       = Validation score (-WQL)
	119.03  s     = Training runtime
	0.05    s     = Validation (prediction) runtime
Training timeseries model TemporalFusionTransformer. Training for up to 687.3s of the 3349.0s of remaining time.
	-0.1904       = Validation score (-WQL)
	159.25  s     = Training runtime
	0.21    s     = Validation (prediction) runtime
Training timeseries model DeepAR. Training for up to 863.2s of the 3189.5s of remaining time.
	-0.2039       = Validation score (-WQL)
	85.29   s     = Training runtime
	0.52    s     = Validation (prediction) runtime
Training timeseries model PatchTST. Training for up to 1251.9s o

<autogluon.timeseries.predictor.TimeSeriesPredictor at 0x1e3e7db79b0>
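
The validation scores in the log are reported as `-WQL`: the weighted quantile loss with its sign flipped so that higher is better. The sketch below shows one common definition of the metric in pure Python. The function names are hypothetical, and the scaling (a factor of 2 in the pinball loss, averaging over quantile levels) is an assumption; AutoGluon's exact normalization may differ, so treat this as an illustration of the idea rather than a reimplementation.

```python
def pinball_loss(y, f, q):
    """Quantile (pinball) loss, scaled by 2 so that q=0.5 equals the absolute error."""
    return 2 * q * (y - f) if y >= f else 2 * (1 - q) * (f - y)


def weighted_quantile_loss(actuals, quantile_forecasts):
    """Sum of pinball losses divided by the sum of |actuals| and the number of quantiles.

    actuals: list of floats.
    quantile_forecasts: dict mapping a quantile level to a list of forecasts.
    """
    total = sum(
        pinball_loss(y, f, q)
        for q, forecasts in quantile_forecasts.items()
        for y, f in zip(actuals, forecasts)
    )
    scale = sum(abs(y) for y in actuals) * len(quantile_forecasts)
    return total / scale


# Made-up actuals and median forecasts for two months.
example = weighted_quantile_loss([10.0, 20.0], {0.5: [8.0, 24.0]})
print(example)
```

Under this definition a score of `-0.19` would mean the quantile errors amount to roughly 19% of the total absolute sales volume in the validation windows.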

In [16]:
# 🔮 6. Generate forecast
forecast = predictor.predict(ts_data)

data with frequency 'IRREG' has been resampled to frequency 'MS'.
Model not specified in predict, will default to the model with the best validation score: WeightedEnsemble


In [17]:
# Extract the mean forecast and check its columns
forecast_mean = forecast['mean'].reset_index()
print(forecast_mean.columns)

Index(['item_id', 'timestamp', 'mean'], dtype='object')


In [18]:
# Take the mean forecast and keep only February 2020
resultado = forecast['mean'].reset_index()
resultado = resultado[resultado['timestamp'] == '2020-02-01']

# Keep only item_id and the mean forecast, renamed to the output columns
resultado = resultado[['item_id', 'mean']]
resultado.columns = ['product_id', 'tn']
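
The filter on `'2020-02-01'` works because the training data ends in December 2019 and the predictor was built with `prediction_length=2`, so each product gets a forecast for January and for February 2020, and February is the second step. A short sketch of how the horizon timestamps follow from those two values (the variable names are illustrative):

```python
import pandas as pd

last_observed = pd.Timestamp("2019-12-01")  # last month kept by the filter on the training data
prediction_length = 2                       # same value passed to the predictor

# Month-start timestamps of the forecast horizon, excluding the last observed month.
horizon = pd.date_range(last_observed, periods=prediction_length + 1, freq="MS")[1:]
horizon_labels = list(horizon.strftime("%Y-%m"))
print(horizon_labels)
```

This also explains the odd row labels (1, 3, 5, ...) in the saved result: every second row of the forecast belongs to February.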


In [20]:
# 💾 7. Save file
resultado.to_csv("data/pred_autogluon_01.csv", index=False)
resultado.head()

   product_id           tn
1       20001  1322.782727
3       20002  1076.158858
5       20003   696.172870
7       20004   519.217793
9       20005   501.010270
