The goal of this project is to build a reliable time-series regression model capable of forecasting daily sales for each store and product family within the Corporación Favorita dataset.

Accurate sales forecasting is central to retail decision-making: it supports inventory management, promotional planning, and supply chain efficiency. The problem is treated as a supervised learning task in which historical sales, together with external variables (store metadata, oil prices, holidays and events, and transaction counts), are used to estimate future demand.
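
One common way to turn a sales history into a supervised learning table is to add lagged sales as input features. The sketch below is illustrative only: the toy values are made up, `add_lag_feature` is a hypothetical helper rather than part of this notebook, and it assumes each store and family has one row per consecutive day, because `shift` moves by rows and not by calendar days.

```python
import pandas as pd

# Toy frame mimicking a few train.csv columns (values are made up for illustration).
toy = pd.DataFrame({
    "date": pd.to_datetime(["2017-01-01", "2017-01-02", "2017-01-03"] * 2),
    "store_nbr": [1, 1, 1, 2, 2, 2],
    "family": ["BEVERAGES"] * 6,
    "sales": [10.0, 12.0, 9.0, 5.0, 7.0, 6.0],
})

def add_lag_feature(df, lag=1):
    """Hypothetical helper: add the sales value from `lag` rows earlier within each series."""
    out = df.sort_values(["store_nbr", "family", "date"]).copy()
    # Shift within each (store, family) series so values never leak across series.
    out[f"sales_lag_{lag}"] = out.groupby(["store_nbr", "family"])["sales"].shift(lag)
    return out

supervised = add_lag_feature(toy, lag=1)
print(supervised)
```

The first row of each series has no earlier value, so its lag is `NaN`; such rows would need to be dropped or imputed before fitting a model.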

The model’s performance is assessed using Root Mean Squared Logarithmic Error (RMSLE). RMSLE compares log(1 + prediction) with log(1 + actual), so it penalizes relative rather than absolute errors and keeps high-volume stores and product families from dominating the score.
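
To make the metric concrete, here is a minimal NumPy sketch of RMSLE. The function name `rmsle` and the example numbers are assumptions for illustration. Inputs must be non-negative, and the result should agree with the square root of scikit-learn's `mean_squared_log_error`.

```python
import numpy as np

def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error for non-negative inputs."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # log1p(x) = log(1 + x), which is defined at zero sales.
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))

# Both predictions are off by a factor of two, but the absolute errors differ 100-fold.
small = rmsle([10.0], [20.0])      # absolute error of 10
large = rmsle([1000.0], [2000.0])  # absolute error of 1000
print(small, large)
```

The two scores come out close to each other even though the absolute errors differ by a factor of 100, which is the scale-insensitivity described above.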

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_log_error
from lightgbm import LGBMRegressor


In [2]:
# Directory holding the competition files on Kaggle
DATA_DIR = "/kaggle/input/store-sales-time-series-forecasting"

# Core competition files
train_df = pd.read_csv(f"{DATA_DIR}/train.csv")
test_df = pd.read_csv(f"{DATA_DIR}/test.csv")
sample_df = pd.read_csv(f"{DATA_DIR}/sample_submission.csv")

# Supplementary tables: store metadata, oil prices, holidays/events, daily transactions
stores_df = pd.read_csv(f"{DATA_DIR}/stores.csv")
oil_df = pd.read_csv(f"{DATA_DIR}/oil.csv")
holidays_df = pd.read_csv(f"{DATA_DIR}/holidays_events.csv")
transactions_df = pd.read_csv(f"{DATA_DIR}/transactions.csv")

In [3]:
train_df.head()

,id,date,store_nbr,family,sales,onpromotion
0,0,2013-01-01,1,AUTOMOTIVE,0.0,0
1,1,2013-01-01,1,BABY CARE,0.0,0
2,2,2013-01-01,1,BEAUTY,0.0,0
3,3,2013-01-01,1,BEVERAGES,0.0,0
4,4,2013-01-01,1,BOOKS,0.0,0


In [4]:
train_df.describe()

,id,store_nbr,sales,onpromotion
count,3000888.0,3000888.0,3000888.0,3000888.0
mean,1500444.0,27.5,357.7757,2.60277
std,866281.9,15.58579,1101.998,12.21888
min,0.0,1.0,0.0,0.0
25%,750221.8,14.0,0.0,0.0
50%,1500444.0,27.5,11.0,0.0
75%,2250665.0,41.0,195.8473,0.0
max,3000887.0,54.0,124717.0,741.0
