# Feature Engineering (Building the Training Dataset)

The goal is to create new columns (features) that help the model capture:
- recent trends
- momentum
- delayed effects
- volatility
- seasonality
- relationships between benzina, gasolio, and Brent

These features give the model information that a naïve baseline does not use.

In [2]:
import pandas as pd

weekly = pd.read_csv('../data/processed/weekly_dataset.csv', parse_dates=['date'])
weekly = weekly.sort_values('date').reset_index(drop=True)
weekly.set_index('date', inplace=True)

## Creating LAG features
Fuel prices depend strongly on previous weeks' values, and Brent effects also appear with a delay (1–3 weeks). Shifting each of the 3 series by 1 to 4 weeks adds 12 new columns.

In [3]:
lags = [1, 2, 3, 4]

for col in ['benzina', 'gasolio', 'brent_eur']:
    for lag in lags:
        weekly[f'{col}_lag{lag}'] = weekly[col].shift(lag)
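To make the effect of `shift` concrete, here is a minimal sketch on a toy series. The values and the `toy` / `price` names are made up for illustration and are not part of the real dataset.

```python
import pandas as pd

# Toy weekly series (made-up values, not from the real dataset)
toy = pd.DataFrame(
    {"price": [1.00, 1.10, 1.20, 1.30, 1.40]},
    index=pd.date_range("2005-01-03", periods=5, freq="W-MON"),
)

# shift(k) moves values k rows down, so row t holds the value from week t-k
for lag in [1, 2]:
    toy[f"price_lag{lag}"] = toy["price"].shift(lag)

# The first k rows of each lag column have no history, so they are NaN
nan_counts = toy.isna().sum()
print(toy)
print(nan_counts)
```

Each `lagK` column starts with K missing values, which is why the non-null counts in the `info()` output further down decrease by one per lag.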

## Rolling averages (smooth trends)
Weekly fuel prices are noisy around a slower-moving trend. Rolling means over 4 and 8 weeks expose that trend to the model directly, adding 6 new columns.

In [4]:
windows = [4, 8]

for col in ['benzina', 'gasolio', 'brent_eur']:
    for W in windows:
        weekly[f'{col}_roll{W}'] = weekly[col].rolling(W).mean()
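One thing worth checking: `rolling(W).mean()` includes the current week's value in the window. If the model is meant to predict the current week's price, that would leak the target into the features. The sketch below, on made-up numbers, compares the same-week window with a past-only variant that shifts first. It also shows a rolling standard deviation as one possible volatility proxy; the goals list mentions volatility, but no cell in this notebook computes it, so treat that line as an assumption about what might be useful rather than part of the original pipeline.

```python
import pandas as pd

# Toy series (made-up values)
toy = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], name="price")

# Same-week rolling mean: the window ending at row t includes row t itself
roll_incl = toy.rolling(4).mean()

# Past-only variant: shift first, so the window at row t covers rows t-4 .. t-1
roll_past = toy.shift(1).rolling(4).mean()

# Hypothetical volatility feature: rolling sample standard deviation of past values
roll_std_past = toy.shift(1).rolling(4).std()

print(pd.DataFrame({"price": toy, "incl": roll_incl, "past": roll_past, "std_past": roll_std_past}))
```

Whether the past-only form is needed depends on how the target is defined in the modelling notebook; if the target is already a future week, the same-week window is fine.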

## Percentage changes
Week-over-week percentage changes capture momentum and make it easier for the model to detect when a trend accelerates or reverses.

In [5]:
weekly['brent_pct_change'] = weekly['brent_eur'].pct_change()
weekly['benzina_pct_change'] = weekly['benzina'].pct_change()
weekly['gasolio_pct_change'] = weekly['gasolio'].pct_change()
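For reference, `pct_change()` computes `(x_t - x_{t-1}) / x_{t-1}`. The sketch below verifies that on made-up numbers and derives two optional signals from it: the change in the weekly return (a rough "acceleration") and a flag for sign reversals. The `accel` and `reversal` names are hypothetical and do not appear in the notebook.

```python
import pandas as pd

# Toy price series (made-up values)
toy = pd.Series([100.0, 110.0, 121.0, 115.0, 120.0])

pct = toy.pct_change()             # (x_t - x_{t-1}) / x_{t-1}
manual = toy / toy.shift(1) - 1    # same quantity, written out by hand

# Rough "acceleration": how much the weekly return changed versus the previous week
accel = pct.diff()

# Trend reversal: this week's return has the opposite sign of last week's
reversal = (pct * pct.shift(1)) < 0

print(pd.DataFrame({"price": toy, "pct": pct, "accel": accel, "reversal": reversal}))
```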

## Price spreads
The difference between benzina and gasolio prices reflects refinery margins and may help predict future movements.

In [6]:
weekly['spread_bg'] = weekly['benzina'] - weekly['gasolio']
weekly['spread_bg_lag1'] = weekly['spread_bg'].shift(1)


## Calendar features

In [None]:
weekly['month'] = weekly.index.month
weekly['weekofyear'] = weekly.index.isocalendar().week.astype(int)
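Integer month and week numbers treat December (12) and January (1) as far apart, even though they are adjacent in the calendar. Tree-based models usually cope with this, but for linear or distance-based models a cyclical encoding is a common alternative. The sketch below is one way to do it; the `month_sin` / `month_cos` columns are hypothetical additions, not part of the saved feature set.

```python
import numpy as np
import pandas as pd

# One year of Monday dates, mirroring the weekly index used in this notebook
idx = pd.date_range("2005-01-03", periods=52, freq="W-MON")
cal = pd.DataFrame(index=idx)
cal["month"] = cal.index.month

# Map months 1..12 onto a circle so that December sits next to January
angle = 2 * np.pi * (cal["month"] - 1) / 12
cal["month_sin"] = np.sin(angle)
cal["month_cos"] = np.cos(angle)

print(cal.head())
```

Note also that ISO week numbers can reach 53 in some years, and dates in early January can belong to the last ISO week of the previous year.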

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1048 entries, 2005-01-03 to 2025-11-03
Data columns (total 28 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   benzina             1048 non-null   float64
 1   gasolio             1048 non-null   float64
 2   brent_eur           1048 non-null   float64
 3   benzina_lag1        1047 non-null   float64
 4   benzina_lag2        1046 non-null   float64
 5   benzina_lag3        1045 non-null   float64
 6   benzina_lag4        1044 non-null   float64
 7   gasolio_lag1        1047 non-null   float64
 8   gasolio_lag2        1046 non-null   float64
 9   gasolio_lag3        1045 non-null   float64
 10  gasolio_lag4        1044 non-null   float64
 11  brent_eur_lag1      1047 non-null   float64
 12  brent_eur_lag2      1046 non-null   float64
 13  brent_eur_lag3      1045 non-null   float64
 14  brent_eur_lag4      1044 non-null   float64
 15  benzina_roll4       1045 non-null   f

date,benzina,gasolio,brent_eur,benzina_lag1,benzina_lag2,benzina_lag3,benzina_lag4,gasolio_lag1,gasolio_lag2,gasolio_lag3,...,gasolio_roll8,brent_eur_roll4,brent_eur_roll8,brent_pct_change,benzina_pct_change,gasolio_pct_change,spread_bg,spread_bg_lag1,month,weekofyear
2005-01-03,1.11575,1.01828,33.07588,,,,,,,,...,,,,,,,0.09747,,1,1
2005-01-10,1.088,1.00439,34.44945,1.11575,,,,1.01828,,,...,,,,0.041528,-0.024871,-0.013641,0.08361,0.09747,1,2
2005-01-17,1.08814,1.00431,35.047472,1.088,1.11575,,,1.00439,1.01828,,...,,,,0.017359,0.000129,-8e-05,0.08383,0.08361,1,3
2005-01-24,1.09001,1.00431,34.463165,1.08814,1.088,1.11575,,1.00431,1.00439,1.01828,...,,34.258992,,-0.016672,0.001719,0.0,0.0857,0.08383,1,4
2005-01-31,1.13211,1.0226,34.093752,1.09001,1.08814,1.088,1.11575,1.00431,1.00431,1.00439,...,,34.51346,,-0.010719,0.038623,0.018212,0.10951,0.0857,1,5


In [8]:
features = weekly.dropna().reset_index()
features.info()
features.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1041 entries, 0 to 1040
Data columns (total 29 columns):
 #   Column              Non-Null Count  Dtype         
---  ------              --------------  -----         
 0   date                1041 non-null   datetime64[ns]
 1   benzina             1041 non-null   float64       
 2   gasolio             1041 non-null   float64       
 3   brent_eur           1041 non-null   float64       
 4   benzina_lag1        1041 non-null   float64       
 5   benzina_lag2        1041 non-null   float64       
 6   benzina_lag3        1041 non-null   float64       
 7   benzina_lag4        1041 non-null   float64       
 8   gasolio_lag1        1041 non-null   float64       
 9   gasolio_lag2        1041 non-null   float64       
 10  gasolio_lag3        1041 non-null   float64       
 11  gasolio_lag4        1041 non-null   float64       
 12  brent_eur_lag1      1041 non-null   float64       
 13  brent_eur_lag2      1041 non-null   float64     

,date,benzina,gasolio,brent_eur,benzina_lag1,benzina_lag2,benzina_lag3,benzina_lag4,gasolio_lag1,gasolio_lag2,...,gasolio_roll8,brent_eur_roll4,brent_eur_roll8,brent_pct_change,benzina_pct_change,gasolio_pct_change,spread_bg,spread_bg_lag1,month,weekofyear
0,2005-02-21,1.13604,1.02279,37.460511,1.13588,1.13587,1.13211,1.09001,1.02246,1.02267,...,1.015226,35.459129,34.85906,0.056573,0.000141,0.000323,0.11325,0.11342,2,8
1,2005-02-28,1.13604,1.02285,39.11936,1.13604,1.13588,1.13587,1.13211,1.02279,1.02246,...,1.015798,36.715531,35.614496,0.044283,0.0,5.9e-05,0.11319,0.11325,2,9
2,2005-03-07,1.15878,1.04759,39.46923,1.13604,1.13604,1.13588,1.13587,1.02285,1.02279,...,1.021197,37.875959,36.241968,0.008944,0.020017,0.024187,0.11119,0.11319,3,10
3,2005-03-14,1.1859,1.07272,41.74809,1.15878,1.13604,1.13604,1.13588,1.04759,1.02285,...,1.029749,39.449298,37.079545,0.057738,0.023404,0.023988,0.11318,0.11119,3,11
4,2005-03-21,1.18598,1.07286,41.612388,1.1859,1.15878,1.13604,1.13604,1.07272,1.04759,...,1.038318,40.487267,37.973198,-0.00325,6.7e-05,0.000131,0.11312,0.11318,3,12
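The drop from 1048 to 1041 rows can be checked by hand. Each feature needs some history before its first valid value: a lag of k needs k prior rows, a rolling window of W needs W - 1, and `pct_change` and `spread_bg_lag1` need 1. `dropna()` removes every row where any feature is missing, so the longest warm-up decides how many rows are lost. The sketch assumes the raw series themselves contain no missing values, which matches the `info()` output above.

```python
lags = [1, 2, 3, 4]
windows = [4, 8]
n_rows = 1048  # row count reported by weekly.info()

# Longest warm-up across all features:
# lag k -> k rows, rolling W -> W - 1 rows, pct_change / spread lag -> 1 row
warmup = max(max(lags), max(w - 1 for w in windows), 1)

print(warmup, n_rows - warmup)  # 7 1041
```

The 8-week rolling mean is the binding constraint, which is why the first surviving row is 2005-02-21, the eighth week in the data.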


## Save the feature dataset

In [9]:
features.to_csv('../data/processed/features_dataset.csv', index=False)
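Because CSV does not store dtypes, the `date` column comes back as plain strings unless it is parsed again on load. The sketch below shows a round trip on a two-row stand-in frame written to a temporary directory; `features_toy` is a hypothetical placeholder for the real `features` DataFrame.

```python
import os
import tempfile

import pandas as pd

# Two-row stand-in for the real features DataFrame
features_toy = pd.DataFrame(
    {
        "date": pd.to_datetime(["2005-02-21", "2005-02-28"]),
        "benzina": [1.13604, 1.13604],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "features_dataset.csv")
    features_toy.to_csv(path, index=False)

    # parse_dates restores the datetime dtype that CSV cannot store
    reloaded = pd.read_csv(path, parse_dates=["date"])

print(reloaded.dtypes)
```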