---

This notebook contains the Feature Engineering for the first module of the **"Intelligent System for Supply Chain Management"** project. 

---

Import Libraries

In [1]:
import pandas as pd
import numpy as np
import os
import json
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
from plotly.subplots import make_subplots

# Import custom project functions
from smart_supply_chain_ai.utils.time_series_functions import TimeSeriesFeatureGenerator

# Configure display and graphs
pd.set_option('display.max_columns', None)
pio.templates.default = "plotly_white"

import warnings
warnings.filterwarnings('ignore')

Load Data

In [2]:
# Define data paths
data_path = os.path.join('../data', 'processed')
docs_path = os.path.join('../docs/')

# Load the processed dataset from a pickle file
read_data = pd.read_pickle(os.path.join(data_path, 'grocery.pkl'))

# Load column descriptions from JSON file into a dictionary for reference or documentation
with open(docs_path + 'column_descriptions.json') as f:
    column_descriptions = json.load(f)

In [3]:
# Create a copy of the original DataFrame to preserve the raw data
df = read_data.copy()

In [4]:
# Preview the first rows of the data
df.head()

Unnamed: 0,received_date,lpo,in_season,product,product_id,category,sub_category,shelf_life_days,maximum_days_on_sale,unit_of_measurement,supplier_rating,supplier,supplier_id,distance_km,moq,storage_recommendation,temperature_classification,precipitation_classification,wind_classification,weather_severity,day_classification,is_holiday,is_weekend,sales_demand,sales_volume,lead_time,min_stock,max_stock,stock_quantity,delivery_lag,expiration_status,inventory_turnover_rate,doi_inventory_turnover
0,2022-12-09,2022-12-07,False,Dijon Mustard,1070686|P,Pantry,Condiments,365,90,unit,3,Condiment Masters,1184993|S,75,60,Room Temperature,Warm,No precipitation,Gentle to Fresh Breeze,Moderate,Weekdays,False,False,Normal,50,4,240,300,317,2,Safe,0.839816,1212
1,2022-12-09,2022-12-07,True,Orange,1362741|P,Fresh Foods,Fruits,14,7,lb,3,OrchardBest Fruits,1677419|S,200,120,Room Temperature,Warm,No precipitation,Gentle to Fresh Breeze,Moderate,Weekdays,False,False,Normal,590,6,3678,4291,4608,2,Safe,0.378488,2689
2,2022-12-09,2022-12-06,False,Asparagus,1308864|P,Fresh Foods,Vegetables,5,2,lb,5,Asparagus Experts,1197859|S,115,18,Refrigerated,Warm,No precipitation,Gentle to Fresh Breeze,Moderate,Weekdays,False,False,Normal,532,4,4652,5815,6765,3,Nearing,0.710163,1433
3,2022-12-09,2022-11-30,False,Cashews,1393487|P,Pantry,Nuts & Seeds,180,60,lb,5,Nut & Seed Co.,1535281|S,125,55,Room Temperature,Warm,No precipitation,Gentle to Fresh Breeze,Moderate,Weekdays,False,False,Normal,15,4,72,90,80,9,Safe,0.725117,1403
4,2022-12-10,2022-12-04,False,Frozen Shrimp,1345059|P,Frozen Foods,Seafood,180,60,lb,5,FrozenFoods Express,1579962|S,65,90,Frozen,Warm,No precipitation,Gentle to Fresh Breeze,Moderate,Saturday,False,True,High,14,5,40,90,52,6,Safe,0.693274,1468


In [5]:
# Calculate the difference between expected and actual delivery time; positive means early, negative means delayed
df['delivery_time_variation'] = df['lead_time'] - df['delivery_lag']

# Add a description for the 'delivery_time_variation' column to clarify its meaning and interpretation
column_descriptions.update({
    'delivery_time_variation': 'Number of days between the expected delivery date and the actual delivery date. A positive value indicates early delivery, while a negative value indicates a delay.'
})


In [6]:
# Set 'received_date' column as the DataFrame index to enable time-based operations
df.set_index('received_date', inplace=True)


In [7]:
# Extract calendar components from the 'received_date' index
df['Year'] = df.index.get_level_values('received_date').year
df['Month'] = df.index.get_level_values('received_date').month
df['Day'] = df.index.get_level_values('received_date').day
df['DayOfYear'] = df.index.get_level_values('received_date').dayofyear
df['Weekday'] = df.index.get_level_values('received_date').weekday
df['QuarterOfYear'] = df.index.get_level_values('received_date').quarter
df['WeekOfYear'] = df.index.get_level_values('received_date').isocalendar().week.values
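As a quick sanity check on what each accessor returns, the sketch below applies the same pandas attributes to a single date, 2022-12-09, taken from the first row of the preview above. The results should match the `Year` through `WeekOfYear` values in the numeric preview further down. The variable names are illustrative only.

```python
import pandas as pd

# Receipt date of the first row in the df.head() preview above
idx = pd.DatetimeIndex(["2022-12-09"], name="received_date")

# Same accessors as the notebook cell, converted to plain ints for readability
features = {
    "Year": int(idx.year[0]),
    "Month": int(idx.month[0]),
    "Day": int(idx.day[0]),
    "DayOfYear": int(idx.dayofyear[0]),
    "Weekday": int(idx.weekday[0]),                       # Monday=0 ... Sunday=6
    "QuarterOfYear": int(idx.quarter[0]),
    "WeekOfYear": int(idx.isocalendar().week.values[0]),  # ISO 8601 week number
}
print(features)
```

`Weekday` counts from Monday = 0, so 4 is a Friday. `WeekOfYear` is the ISO week, which can place early-January dates in week 52 or 53 of the previous year.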

In [8]:
# Move 'received_date' from the index back to a regular column
df = df.reset_index()

In [9]:
def calculate_std_demand_lead_time(df, product_col='product', sales_col='sales_volume', lead_col='delivery_lag'):
    """
    Calculate the standard deviation of demand during lead time for each product.

    Parameters:
    - df (DataFrame): Input DataFrame containing sales and delivery data.
    - product_col (str): Column name representing the product identifier.
    - sales_col (str): Column name representing daily sales volume.
    - lead_col (str): Column name representing delivery lead time (in days).

    Returns:
    - DataFrame: Original DataFrame with additional columns:
        - 'avg_daily_sales': Average daily sales per product.
        - 'std_daily_sales': Standard deviation of daily sales per product.
        - 'avg_daily_lag': Average delivery lead time per product.
        - 'std_daily_lag': Standard deviation of delivery lead time per product.
        - 'std_demand_lead_time': Standard deviation of demand during lead time per product.
    """

    # Calculate the average daily sales for each product
    df['avg_daily_sales'] = df.groupby(product_col)[sales_col].transform('mean')

    # Calculate the standard deviation of daily sales for each product
    df['std_daily_sales'] = df.groupby(product_col)[sales_col].transform('std')

    # Calculate the average delivery lead time for each product
    df['avg_daily_lag'] = df.groupby(product_col)[lead_col].transform('mean')

    # Calculate the standard deviation of delivery lead time for each product
    df['std_daily_lag'] = df.groupby(product_col)[lead_col].transform('std')

    # Compute the standard deviation of demand during lead time using the formula:
    # sqrt(LeadTimeAvg * SalesStd^2 + SalesAvg^2 * LeadTimeStd^2)
    df['std_demand_lead_time'] = np.sqrt(
        (df['avg_daily_lag'] * (df['std_daily_sales'] ** 2)) +
        ((df['avg_daily_sales'] ** 2) * (df['std_daily_lag'] ** 2))
    )

    return df


In [10]:
# Apply the function to compute demand variability during lead time and update the DataFrame with new metrics
df = calculate_std_demand_lead_time(df=df)



---

### **Service Level Factor (Z) Based on the Standard Normal Distribution**

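The Service Level Factor (Z) is the quantile of the standard normal distribution at the target service level. The safety stock therefore covers Z standard deviations of demand during lead time. Common values from a standard normal table: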

- **80% Service Level**: Z = 0.84  
- **85% Service Level**: Z = 1.04  
- **90% Service Level**: Z = 1.28  
- **95% Service Level**: Z = 1.64  
- **98% Service Level**: Z = 2.05  
- **99% Service Level**: Z = 2.33
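Instead of looking these values up in a table, they can be computed with the inverse CDF of the standard normal distribution. The sketch below uses Python's standard-library `statistics.NormalDist`. The helper name `service_level_z` is hypothetical and is not part of the project code.

```python
from statistics import NormalDist

def service_level_z(service_level: float) -> float:
    """Return the one-sided Z factor for a target service level (0 < p < 1)."""
    if not 0.0 < service_level < 1.0:
        raise ValueError("service_level must be strictly between 0 and 1")
    # Inverse CDF (quantile function) of the standard normal N(0, 1)
    return NormalDist(mu=0.0, sigma=1.0).inv_cdf(service_level)

for level in (0.80, 0.85, 0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%} -> Z = {service_level_z(level):.2f}")
```

The exact value at 95% is about 1.6449. The table above truncates it to 1.64, and some references round it to 1.65.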

---


In [11]:
# Set the Z factor for a 95% service level
Z = 1.64

# Calculate the Safety Stock
df['safety_stock'] = (Z * df['std_demand_lead_time'])

# Add description for the 'safety_stock' column to clarify its role in inventory management
column_descriptions.update({'safety_stock': 'Extra inventory buffer maintained to protect against uncertainties in demand and delivery lead time.'})



In [12]:
# Preview the numeric columns (including the new metrics) for the first two rows
df.select_dtypes(np.number).head(2)

Unnamed: 0,shelf_life_days,maximum_days_on_sale,moq,sales_volume,lead_time,min_stock,max_stock,stock_quantity,delivery_lag,inventory_turnover_rate,doi_inventory_turnover,delivery_time_variation,Year,Month,Day,DayOfYear,Weekday,QuarterOfYear,WeekOfYear,avg_daily_sales,std_daily_sales,avg_daily_lag,std_daily_lag,std_demand_lead_time,safety_stock
0,365,90,60,50,4,240,300,317,2,0.839816,1212,2,2022,12,9,343,4,4,49,60.066667,27.248766,5.4,3.042555,193.414805,317.20028
1,14,7,120,590,6,3678,4291,4608,2,0.378488,2689,4,2022,12,9,343,4,4,49,613.888889,360.955484,5.555556,2.74368,1886.992869,3094.668305
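To check the formula used in `calculate_std_demand_lead_time`, the sketch below recomputes row 0 (Dijon Mustard) from the per-product statistics shown in the preview. It reproduces the `std_demand_lead_time` and `safety_stock` values up to the rounding of the displayed inputs. The scalar helper is a hypothetical stand-alone version of the vectorized column logic.

```python
import math

def std_demand_during_lead_time(avg_sales, std_sales, avg_lag, std_lag):
    """sigma_DLT = sqrt(L_avg * sigma_D^2 + D_avg^2 * sigma_L^2)."""
    return math.sqrt(avg_lag * std_sales**2 + avg_sales**2 * std_lag**2)

# Per-product statistics copied from row 0 of the numeric preview above
sigma_dlt = std_demand_during_lead_time(
    avg_sales=60.066667, std_sales=27.248766, avg_lag=5.4, std_lag=3.042555
)

Z = 1.64  # 95% service level, as set in the notebook
safety_stock = Z * sigma_dlt
print(round(sigma_dlt, 2), round(safety_stock, 2))
```

In this row the lead-time variability term (`D_avg^2 * sigma_L^2`) contributes most of the variance. That suggests unstable delivery lags, not unstable sales, drive this product's safety stock.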


In [13]:
# Reorder Point (ROP) per row: average daily sales * delivery lag + safety stock
df['rop'] = (df['avg_daily_sales'] * df['delivery_lag']) + df['safety_stock']

# Define description for 'rop' (Reorder Point) to indicate when replenishment should be triggered
column_descriptions.update({'rop': 'Reorder Point (ROP): the inventory level at which a new order should be placed to avoid stockouts.'})


In [14]:
# Reorder point coverage = ROP / average daily sales (in days of demand)
df['reorder_point_coverage'] = df['rop'] / df['avg_daily_sales']

# Add description for 'reorder_point_coverage' to indicate how many days of demand the ROP supports
column_descriptions.update({'reorder_point_coverage': 'Number of days of demand that the reorder point (ROP) can cover'})
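The two formulas above can be checked by hand for row 0 (Dijon Mustard), using the values from the numeric preview. The sketch also shows that the coverage equals the delivery lag plus the safety stock expressed in days of average demand.

```python
# Row 0 (Dijon Mustard) values from the numeric preview above
avg_daily_sales = 60.066667
delivery_lag = 2          # days
safety_stock = 317.20028

# Reorder point: expected demand over the delivery lag plus the safety buffer
rop = avg_daily_sales * delivery_lag + safety_stock

# Coverage in days: how long the ROP lasts at the average daily sales rate
reorder_point_coverage = rop / avg_daily_sales

# Equivalent decomposition: delivery lag + safety stock measured in days of demand
safety_days = safety_stock / avg_daily_sales
print(round(rop, 2), round(reorder_point_coverage, 2), round(delivery_lag + safety_days, 2))
```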


In [15]:
# Calculate the reorder quantity as the average sales per product multiplied by the lead time
df['reorder_quantity'] = (df.groupby('product')['sales_volume'].transform('mean') * df['lead_time']).astype(int)

# Add description for 'reorder_quantity': the quantity to order when replenishing stock
column_descriptions.update({'reorder_quantity': 'Number of units to order once inventory reaches the reorder point, estimated as the expected demand during the lead time.'})
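The sketch below shows how `groupby(...).transform('mean')` broadcasts each product's mean back to every row before it is multiplied by the row's lead time. It uses a small hypothetical DataFrame, not project data.

```python
import pandas as pd

# Hypothetical toy data: two products with a few observed sales volumes
toy = pd.DataFrame({
    "product": ["A", "A", "A", "B", "B"],
    "sales_volume": [10, 20, 30, 5, 8],
    "lead_time": [4, 4, 5, 3, 3],
})

# Mean sales per product, repeated on every row of that product (A -> 20, B -> 6.5)
mean_sales = toy.groupby("product")["sales_volume"].transform("mean")

# Expected demand during lead time; astype(int) truncates (19.5 becomes 19)
toy["reorder_quantity"] = (mean_sales * toy["lead_time"]).astype(int)
print(toy)
```

Because `astype(int)` truncates, fractional demand is rounded down. Wrapping the product in `np.ceil` before the cast would round up instead, which is an alternative if under-ordering is the bigger risk.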


In [16]:
# Combine reorder point, reorder quantity and inventory turnover rate: (ROP - reorder quantity) / turnover rate
df['reorder_point_quantity_turnover'] = (df['rop'] - df['reorder_quantity']) / df['inventory_turnover_rate']

# Add description for 'reorder_point_quantity_turnover' to explain how quickly the buffer stock is depleted
column_descriptions.update({'reorder_point_quantity_turnover': 'Rate at which the buffer between the reorder level and reorder quantity is consumed'})


In [17]:
# Convert distance_km from string to integer so it can be used in model training
df['distance_km'] = df['distance_km'].astype(int)

# Encode object, boolean and category columns as integer codes
# Object (string) columns: convert to category first, then take the codes
df['product_code'] = df['product'].astype('category').cat.codes
df['supplier_code'] = df['supplier'].astype('category').cat.codes

# Columns already stored with the pandas 'category' dtype
df['category_code'] = df['category'].cat.codes
df['sub_category_code'] = df['sub_category'].cat.codes
df['supplier_rating_code'] = df['supplier_rating'].cat.codes
df['weather_severity_code'] = df['weather_severity'].cat.codes
df['day_classification_code'] = df['day_classification'].cat.codes
df['sales_demand_code'] = df['sales_demand'].cat.codes
df['expiration_status_code'] = df['expiration_status'].cat.codes

# Boolean columns (False -> 0, True -> 1)
df['in_season_code'] = df['in_season'].astype('category').cat.codes
df['is_holiday_code'] = df['is_holiday'].astype('category').cat.codes
df['is_weekend_code'] = df['is_weekend'].astype('category').cat.codes
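The following sketch on small hypothetical Series shows how `.cat.codes` assigns integers. When a column is converted with `astype('category')`, categories are sorted, so codes follow alphabetical (or False/True) order. Missing values receive the code -1.

```python
import pandas as pd

# Object column -> category: categories are sorted, codes follow that order
products = pd.Series(["Orange", "Asparagus", "Cashews", "Orange"])
product_codes = products.astype("category").cat.codes
print(product_codes.tolist())  # Asparagus=0, Cashews=1, Orange=2

# Booleans: categories are [False, True], so False -> 0 and True -> 1
flag_codes = pd.Series([True, False, True]).astype("category").cat.codes

# Missing values are encoded as -1
status_codes = pd.Series(["Safe", None, "Nearing"]).astype("category").cat.codes
print(flag_codes.tolist(), status_codes.tolist())
```

For the columns that were already categorical (for example `supplier_rating`), codes follow the category order defined when that dtype was created, which may not be alphabetical. Codes are also only stable across datasets if the same category list is used.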

# Save the Complete DataFrame with Engineered Features

In [18]:
# Save the processed DataFrame as a pickle file for efficient loading in future steps
df.to_pickle(data_path + '/feature_eng_complete.pkl')

# Sort the column descriptions alphabetically by column name
column_descriptions_feat_eng = dict(sorted(column_descriptions.items()))

# Save the dictionary as a JSON file
with open(docs_path + 'column_descriptions_feat_eng.json', 'w') as f:
    json.dump(column_descriptions_feat_eng, f, indent=4)

In [19]:
# Release the in-memory DataFrames before reloading the saved results
del df, read_data

# Load Completed Dataframe

In [None]:
# Define data paths
data_path = os.path.join('../data/', 'processed/')
docs_path = os.path.join('../docs/')

# Load Pickle file
df_load = pd.read_pickle(data_path + 'feature_eng_complete.pkl')

# Load column descriptions from JSON file into a dictionary for reference or documentation
with open(docs_path + 'column_descriptions_feat_eng.json') as f:
    column_descriptions = json.load(f)

In [None]:
# Select only datetime and numeric columns from the loaded DataFrame for time series analysis
df_ts = df_load.select_dtypes(['datetime', np.number])

In [None]:
# Drop non-essential or redundant columns from the time series DataFrame to streamline analysis
columns_dropped = [
    'lpo', 'Year', 'Month', 'DayOfYear', 'Weekday', 'WeekOfYear',
    'min_stock', 'max_stock', 'maximum_days_on_sale', 'delivery_lag',
    'avg_daily_sales', 'std_daily_sales', 'avg_daily_lag', 'std_daily_lag',
    'std_demand_lead_time', 'safety_stock', 'reorder_quantity',
    'reorder_point_quantity_turnover', 'reorder_point_coverage',
    'doi_inventory_turnover', 'is_weekend_code',
]

df_ts.drop(columns=columns_dropped, inplace=True)


In [None]:
# Calculate the correlation matrix for all numeric columns
correlation_matrix = df_ts.corr()

In [None]:
# Keep correlations >= 0.7 and blank out the diagonal (self-correlation of 1.0)
corr1 = correlation_matrix[correlation_matrix >= 0.7].replace(1.0, np.nan).dropna(how='all', axis=1)
corr1.dropna(how='all').replace(np.nan, '')
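The filtered matrix above lists each strong pair twice and only checks positive values. An alternative is to list each pair once, using the upper triangle, and compare absolute values so that strong negative correlations are also caught. The helper `high_correlation_pairs` below is a hypothetical sketch of that variation, demonstrated on toy data.

```python
import numpy as np
import pandas as pd

def high_correlation_pairs(corr: pd.DataFrame, threshold: float = 0.7) -> pd.DataFrame:
    """List each variable pair once (upper triangle) with |corr| >= threshold."""
    cols = corr.columns
    i, j = np.triu_indices(len(cols), k=1)  # k=1 excludes the diagonal
    pairs = pd.DataFrame({
        "var_1": cols[i],
        "var_2": cols[j],
        "corr": corr.to_numpy()[i, j],
    })
    strong = pairs[pairs["corr"].abs() >= threshold]  # NaN never passes the test
    return (strong.sort_values("corr", key=lambda s: s.abs(), ascending=False, kind="stable")
                  .reset_index(drop=True))

# Toy data: b is a multiple of a, c is a reversed, d is only loosely related
demo = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [5, 4, 3, 2, 1],
    "d": [1, 3, 2, 5, 4],
})
result = high_correlation_pairs(demo.corr(), threshold=0.9)
print(result)
```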

In [None]:
# Keep only the lower triangle (below the diagonal) so each pair is shown once
mask = np.tril(np.ones(correlation_matrix.shape), k=-1)
masked_corr = correlation_matrix.where(mask == 1)

# Create a heatmap visualization of the correlation matrix
fig_corr = px.imshow(masked_corr,
                    title='Correlation Matrix - Numeric Variables',
                    color_continuous_scale='RdBu_r',  # Red-Blue reversed color scale
                    aspect="auto",                   # Automatic aspect ratio
                    text_auto=True,                  # Display correlation values on cells
                    zmin=-1, zmax=1)                 # Fix color scale from -1 to +1

# Adjust the figure dimensions
fig_corr.update_layout(width=1200, height=1200)

# Display the interactive heatmap
fig_corr.show()

In [None]:
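# Custom project class: time-series features for 'sales_volume' by 'received_date' with lags of 7, 14 and 21 days
times_gen = TimeSeriesFeatureGenerator(value_column='sales_volume', date_column='received_date', lags=[7, 14, 21], min_periods=1, replace_NaN=0)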

In [None]:
df_a = times_gen.create_grouped_windows(df_ts)

In [None]:
# Keep only the date and the raw sales volume
keep_cols = ['received_date', 'sales_volume']
df_a = df_a.drop(columns=[col for col in df_a.columns if col not in keep_cols])


In [None]:
df_a

In [None]:
# Total sales volume across all products for each receipt date
df_a = df_a.groupby('received_date').sum()

In [None]:
df = times_gen.create_grouped_windows(df_a).reset_index(drop=False)
df

In [None]:
# Assumes df already exists with a datetime 'received_date' column
# df['received_date'] = pd.to_datetime(df['received_date'])

# Plot daily sales volume against its 7-day moving average
fig = px.line(
    df,
    x='received_date',
    y=['sales_volume', 'sale_volume_mean_7d'],
    labels={'value': 'Sales Volume', 'variable': 'Type'},
    title='Daily Sales and Moving Averages'
)

fig.show()

I need to group by date to see the overall averages, or  
group by product to see the averages per product.
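The sketch below shows both options from the note above on hypothetical long-format data, one row per product and date. Option 1 sums all products per day and then smooths the total. Option 2 computes a rolling mean within each product and aligns it back to the original rows. The column name `product_mean_7d` is illustrative.

```python
import pandas as pd

# Hypothetical long-format data: one row per product and receipt date
sales = pd.DataFrame({
    "received_date": pd.to_datetime(["2022-12-09", "2022-12-09", "2022-12-10",
                                     "2022-12-10", "2022-12-11"]),
    "product": ["Orange", "Cashews", "Orange", "Cashews", "Orange"],
    "sales_volume": [590, 15, 600, 20, 580],
})

# Option 1: total across products per day, then a 7-row rolling mean of the total
daily_total = sales.groupby("received_date")["sales_volume"].sum()
total_mean_7d = daily_total.rolling(window=7, min_periods=1).mean()

# Option 2: rolling mean computed separately within each product
sales = sales.sort_values(["product", "received_date"])
sales["product_mean_7d"] = (
    sales.groupby("product")["sales_volume"]
         .transform(lambda s: s.rolling(window=7, min_periods=1).mean())
)
print(total_mean_7d, sales, sep="\n\n")
```

`rolling(window=7)` counts rows, not days. If some dates are missing, a time-based window such as `rolling("7D")` on a DatetimeIndex keeps the window at exactly seven calendar days.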

In [None]:
df.resample('W').agg(
    {
        'Stock_Quantity': 'sum',
        'Sales_Volume': 'sum'    
    })

In [None]:
# Build a complete daily date range covering the data
full_index = pd.date_range(start=df.index.min(), end=df.index.max(), freq='D')

In [None]:
# Pivot to wide format: one column per product
df_pivoted_stock = pd.pivot_table(df, index=df.index, columns=['Product_Name'], values=['Stock_Quantity'], aggfunc='sum')
df_pivoted_sales = pd.pivot_table(df, index=df.index, columns=['Product_Name'], values=['Sales_Volume'], aggfunc='sum')

# **ORGANIZATION**

> Pivoted DataFrames, rolling windows: I need to organize and understand them; too many rows are being created.   

> Work out how to plot the windows and the data.   

> The Gemini conversation on time-series windowing shows how to join the DataFrames.  
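The notes above mention unexpected row counts and joining DataFrames. The sketch below uses hypothetical data to show the difference between the two windowing styles used in this notebook. `rolling()` keeps one row per date, so a long table ends up with one value per date and product. `resample()` collapses each period into a single row. The rolling result is then stacked back to long format and merged onto the original rows by date and product. The column name `stock_qty_mean_7d` is illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical daily stock levels for two products over 14 days (long format)
dates = pd.date_range("2022-12-01", periods=14, freq="D")
long_df = pd.DataFrame({
    "received_date": np.tile(dates, 2),
    "product": np.repeat(["Orange", "Cashews"], len(dates)),
    "stock_quantity": np.arange(28, dtype=float),
})

# Pivot to wide: one row per date, one column per product
wide = long_df.pivot_table(index="received_date", columns="product",
                           values="stock_quantity", aggfunc="sum")

# rolling() keeps one row per date; resample() keeps one row per weekly period
rolling_7d = wide.rolling(window=7, min_periods=1).mean()
weekly = wide.resample("W").mean()
print(wide.shape, rolling_7d.shape, weekly.shape)

# Back to long format (one row per date and product), then join onto the original rows
rolling_long = rolling_7d.stack().rename("stock_qty_mean_7d").reset_index()
merged = long_df.merge(rolling_long, on=["received_date", "product"], how="left")
print(merged.shape)
```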

In [None]:
df_pivoted_stock

In [None]:
# Rolling windows (7-day moving average)
df_pivoted_stock.rolling(window=7, min_periods=1).mean()

In [None]:
# 7- and 21-day rolling moving averages

# For Stock Quantity
StockMediaM_7D = df_pivoted_stock.rolling(window=7, min_periods=1).mean()
StockMediaM_21D = df_pivoted_stock.rolling(window=21, min_periods=1).mean()

# For Sales Volume
SalesMediaM_7D = df_pivoted_sales.rolling(window=7, min_periods=1).mean()
SalesMediaM_21D = df_pivoted_sales.rolling(window=21, min_periods=1).mean()

In [None]:
StockMediaM_7D.stack(level=1).reset_index()

In [None]:
df_pivoted_stock.stack(level=1).reset_index()

In [None]:
# Plot each product's stock quantity as its own line (wide-form input, index on the x-axis)
fig_mm = px.line(df_pivoted_stock['Stock_Quantity'])

fig_mm.show()

In [None]:
# Windowing Stock Quantity by Week
df_pivoted_stock_W = df_pivoted_stock.resample('W').agg('mean').copy()
# # Windowing Stock Quantity by 14 days
# df_pivoted_stock_14D = df_pivoted_stock.resample('14D').agg('mean').copy()
# # Windowing Stock Quantity by 21 days
# df_pivoted_stock_21D = df_pivoted_stock.resample('21D').agg('mean').copy()

In [None]:
df_pivoted_stock_W.shape

In [None]:
# Unpivot stock quantity back to long format (one row per week and product)
df_pivoted_stock_W = df_pivoted_stock_W.stack(level='Product_Name').reset_index()
df_pivoted_stock_W.rename(columns={'Stock_Quantity': 'Stock_Qty_7D_mean'}, inplace=True)
df_pivoted_stock_W.head()

In [None]:
# Windowing Sales Volume by Week
df_pivoted_sales_W = df_pivoted_sales.resample('W').agg(['mean'])
# Windowing Sales Volume by 14 days
df_pivoted_sales_14D = df_pivoted_sales.resample('14D').agg(['mean'])
# Windowing Sales Volume by 21 days
df_pivoted_sales_21D = df_pivoted_sales.resample('21D').agg(['mean'])

In [None]:
# Create a blank subplot figure with 2 vertical panels and titles
fig = make_subplots(rows=2, cols=1, subplot_titles=['Sales Volume', 'Stock Quantity'])

# Create individual line plots for each target variable
fig1 = px.line(df_target, y=['Sales_Volume'], color_discrete_sequence=['blue'])
fig2 = px.line(df_target, y=['Stock_Quantity'], color_discrete_sequence=['green'])

# Add the first plot (Sales Volume) to the top panel
fig.add_trace(fig1.data[0], row=1, col=1)
# Add the second plot (Stock Quantity) to the bottom panel
fig.add_trace(fig2.data[0], row=2, col=1)

# Adjust the overall figure layout (hide legend, set height)
fig.update_layout(showlegend=False, height=600)
# Display the interactive plot
fig.show()

In [None]:
# Compute the correlation matrix
correlation = df.corr()

corr_79 = correlation[(correlation >= 0.80) & (correlation != 1)].fillna(0)


In [None]:
corr_ = df_stat_exog.select_dtypes(np.number).corr()
corr_ = corr_[(corr_ < -0.2) | (corr_ > 0.2)]
corr_

In [None]:
# Plot Matrix Correlation
fig_corr = px.imshow(correlation, 
                    title='Correlation Feature Engineering', 
                    color_continuous_scale='RdBu_r', # Red-Blue reversed color scale
                    aspect="auto",                   # Automatic aspect ratio
                    text_auto=True,                  # Display correlation values on cells
                    zmin=-1, zmax=1)                 # Fix color scale from -1 to +1

fig_corr.update_layout(height=900, width=900)
fig_corr.show()

**Columns removed because of correlation above 0.80**

- DOI_Inv_Turnover
- DayOfYear
- QuarterOfYear
- WeekOfYear
- Sales_Inv_Turnover
- Month

In [None]:
# Columns to remove
drop_correlated = ['DOI_Inv_Turnover', 'DayOfYear', 'QuarterOfYear', 'WeekOfYear', 'Sales_Inv_Turnover', 'Month']

# Drop them from both axes so the correlation matrix stays square
correlation.drop(index=drop_correlated, columns=drop_correlated, inplace=True)

In [None]:
# Plot Matrix Correlation
fig_corr2 = px.imshow(correlation, title='Correlation Feature Engineering', contrast_rescaling=False, text_auto=False)

fig_corr2.update_layout(height=900, width=1200)
fig_corr2.show()


In [None]:
# Remove the correlated columns from the main DataFrame
df.drop(columns=drop_correlated, inplace=True)
df

In [None]:
# Aggregate target columns by date, summing values for each day
data = read_data.groupby('Last_Order_Date').sum().reset_index()

# Set the date column as the DataFrame index for time series operations
df_target = data.set_index('Last_Order_Date')

# Resample to daily frequency, ensuring all dates are represented and filling missing days with 0
df_target = df_target.resample('D').sum().fillna(0)
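The sketch below, on a hypothetical Series with one missing day, shows that `resample('D').sum()` already creates the missing calendar days. An empty bin sums to 0, so the trailing `fillna(0)` above is a harmless safeguard rather than a required step. An explicit `reindex` onto a full `pd.date_range` gives the same result.

```python
import pandas as pd

# Hypothetical daily totals with a missing day (2022-12-11)
daily = pd.Series(
    [605, 620, 580],
    index=pd.to_datetime(["2022-12-09", "2022-12-10", "2022-12-12"]),
    name="sales_volume",
)

# resample("D").sum() inserts the missing day; an empty bin sums to 0
filled = daily.resample("D").sum()
print(filled)

# Equivalent explicit approach: reindex onto a complete daily range
full_range = pd.date_range(daily.index.min(), daily.index.max(), freq="D")
reindexed = daily.reindex(full_range, fill_value=0)
```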

In [None]:
# Complete daily calendar between the first and last receipt dates
date_range = pd.date_range(
    start=df_ts['received_date'].min(),
    end=df_ts['received_date'].max(),
    freq='D'
)
len(date_range)

In [None]:
df_ts = df_ts.groupby(['received_date', 'Product_Name', 'Supplier_Name']).agg(
    Stock_Quantity=('Stock_Quantity', 'sum'),
    Sales_Volume=('Sales_Volume', 'sum'),
    Reorder_Level=('Reorder_Level', 'sum'),
    Reorder_Quantity=('Reorder_Quantity', 'mean'),
    Unit_Price=('Unit_Price', 'mean'),
    Inventory_Turnover_Rate=('Inventory_Turnover_Rate', 'sum'),
    Days_For_Expiration=('Days_For_Expiration', 'sum'),
    DOI_Inv_Turnover=('DOI_Inv_Turnover', 'sum'),
    delivery_lag=('delivery_lag', 'sum'),
    Status_Code=('Status_Code', 'sum'),
    Expiration_Status_Code=('Expiration_Status_Code', 'sum'),
    Category_Code=('Category_Code', 'sum'),
    Product_Name_Code=('Product_Name_Code', 'sum'),
    Supplier_Name_Code=('Supplier_Name_Code', 'sum')
)

## Option A: Aggregation by Date

In [None]:
_cols = df_ts.select_dtypes(np.number).columns.to_list()
categorical_cols = ['Status_Code', 'Expiration_Status_Code', 'Category_Code', 'Product_Name_Code', 'Supplier_Name_Code']

numeric_cols = list(set(_cols) - set(categorical_cols))

In [None]:
# Aggregate by date (sum/mean across all products)
# daily_aggregated = 
df_ts.groupby('received_date')[categorical_cols].sum()

# Aggregate by category and date
# category_daily = df_ts.groupby(['received_date', 'Product_Cat'])[numeric_cols].mean()


## Option B: Series for Specific Products

In [None]:
df_ts

In [None]:
# Select a specific product for analysis
# specific_product = df_ts.xs(('Arabica Coffee', 'Chatterpoint'), level=['Product_Name', 'Supplier_Name'])

# Or for an entire category
# coffee_products = 
df_ts.xs('Spinach', level='Product_Name')

In [None]:
df_ts.reset_index()

## Option C: Wide Format

In [None]:
# Pivot so that each product becomes a column
# df_ts_wide = df_ts[['Sales_Volume', 'Stock_Quantity']].unstack(['Product_Name','Supplier_Name'], fill_value=0).reindex(index=date_range)
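The commented line above combines `unstack` with `reindex`, and the two fill missing values in different places. The sketch below uses a hypothetical MultiIndex shaped like `df_ts` to show the distinction. `fill_value` in `unstack` only covers product/supplier combinations that are absent on a date that exists. Dates missing from the data entirely come back from `reindex` as NaN unless `reindex` gets its own `fill_value`.

```python
import pandas as pd

# Hypothetical MultiIndex data shaped like df_ts: (received_date, Product_Name, Supplier_Name)
idx = pd.MultiIndex.from_tuples(
    [
        (pd.Timestamp("2022-12-09"), "Orange", "OrchardBest Fruits"),
        (pd.Timestamp("2022-12-09"), "Cashews", "Nut & Seed Co."),
        (pd.Timestamp("2022-12-11"), "Orange", "OrchardBest Fruits"),
    ],
    names=["received_date", "Product_Name", "Supplier_Name"],
)
sales = pd.DataFrame({"Sales_Volume": [590, 15, 600]}, index=idx)

# One column per (variable, product, supplier); fill_value covers missing combinations
wide = sales.unstack(["Product_Name", "Supplier_Name"], fill_value=0)

# 2022-12-10 is absent from the data, so reindex needs its own fill value
date_range = pd.date_range("2022-12-09", "2022-12-11", freq="D")
wide = wide.reindex(index=date_range, fill_value=0)
print(wide)
```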

# Temporal Feature Engineering

In [None]:
# Add temporal features
df_complete = df_complete.reset_index()
df_complete['day_of_week'] = df_complete['received_date'].dt.dayofweek
df_complete['month'] = df_complete['received_date'].dt.month
df_complete['quarter'] = df_complete['received_date'].dt.quarter
df_complete['year'] = df_complete['received_date'].dt.year

# Restore the MultiIndex
df_complete = df_complete.set_index(['received_date', 'Product_Cat', 'Product_Supplier'])

In [None]:
# Transform Categorical Columns
df_ts['Status_Code'] = df_ts['Status'].cat.codes
df_ts['Expiration_Status_Code'] = df_ts['Expiration_Status'].cat.codes
df_ts['Category_Code'] = df_ts['Category'].cat.codes
df_ts['Product_Name_Code'] = df_ts['Product_Name'].astype('category').cat.codes
df_ts['Supplier_Name_Code'] = df_ts['Supplier_Name'].astype('category').cat.codes

df_ts = df_ts.drop(columns=['Status', 'Expiration_Status', 'Category'])

In [None]:
fig = px.imshow(corr_, text_auto=True, aspect='auto')
fig.show()

Analysis of Exogenous Variables in the Time Series

1. Correlation with the target

Numerical exogenous variables show low correlation with the target and among themselves.

Categorical variables, after one-hot encoding, display weak internal correlations and very low correlation with the target.

2. Expected impact on forecasting

Given the low level of association, these variables are unlikely to provide immediate performance improvements.

Including them at this stage could increase the risk of overfitting, especially considering the limited data available.

3. Strategic approach

Exogenous variables will be retained but not actively used in the current modeling process.

This keeps their potential value available, since longer historical series may reveal patterns that are not visible today.

4. Next steps

Reassess the contribution of exogenous variables as more data becomes available.

Evaluate selection techniques tailored to time series (e.g., Granger causality, feature importance from tree-based models, multivariate approaches such as VAR/VARMAX).

Compare model performance with and without exogenous variables to support future decisions.
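Before running a formal Granger causality test (for example, `grangercausalitytests` in statsmodels), a cheap first screen is to correlate the target with lagged copies of each exogenous variable. This can reveal effects that same-day correlation misses. The helper `lagged_correlations` below is a hypothetical sketch of that simpler technique, not a Granger test. It is checked on synthetic data where the target repeats the exogenous signal two days later.

```python
import numpy as np
import pandas as pd

def lagged_correlations(target: pd.Series, exog: pd.Series, max_lag: int = 14) -> pd.Series:
    """Pearson correlation between target(t) and exog(t - lag) for lag = 0..max_lag."""
    # Series.corr aligns on the index and ignores the NaN rows created by shift()
    return pd.Series(
        {lag: target.corr(exog.shift(lag)) for lag in range(max_lag + 1)},
        name="corr",
    ).rename_axis("lag")

# Synthetic check: the target follows the exogenous signal with a 2-step delay plus noise
rng = np.random.default_rng(seed=0)
exog = pd.Series(rng.normal(size=200))
target = exog.shift(2) + 0.1 * rng.normal(size=200)

corrs = lagged_correlations(target, exog, max_lag=5)
best_lag = int(corrs.abs().idxmax())
print(corrs.round(2))
```

A high lagged correlation only flags a candidate. Whether the variable actually improves forecasts still needs the with/without comparison described above.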

In [None]:
df_exo = df[variables_exog].sort_values(by='received_date').reset_index(drop=True)

In [None]:
df_exo