# Final Training

Based on the previous experiments, these are the features that actually matter:

For BENZINA (gradient boosting regressor, GBR)
- benzina_lag1
- benzina_lag2
- benzina_lag3
- brent_eur_lag1
- brent_eur_lag2
- benzina_roll4
- brent_pct_change
- month

For GASOLIO (Ridge)
- gasolio_lag1
- gasolio_lag2
- brent_eur_lag1
- brent_eur_lag2
- gasolio_roll4
- brent_pct_change
- month

These represent:
- recent memory
- delayed oil effect
- local trend
- seasonality

In the previous experiments the remaining features added noise without improving the results, so they are left out. This keeps the model simple, stable, and predictable.
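
The notebook loads these columns ready-made from `features_dataset_v2.csv`, so how they were built is not shown here. As a rough sketch only, features with these names could be derived as below. The raw column name `brent_eur`, the use of a 4-period rolling mean for `*_roll4`, and the weekly toy data are all assumptions made for illustration, not the actual preprocessing.

```python
import pandas as pd


def build_features(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical reconstruction of the feature columns listed above."""
    out = raw.sort_values("date").reset_index(drop=True).copy()

    # Recent memory: past prices of each fuel
    for k in (1, 2, 3):
        out[f"benzina_lag{k}"] = out["benzina"].shift(k)
    for k in (1, 2):
        out[f"gasolio_lag{k}"] = out["gasolio"].shift(k)
        # Delayed oil effect: past Brent prices (assumed column `brent_eur`)
        out[f"brent_eur_lag{k}"] = out["brent_eur"].shift(k)

    # Local trend: assumed to be a 4-period rolling mean
    out["benzina_roll4"] = out["benzina"].rolling(4).mean()
    out["gasolio_roll4"] = out["gasolio"].rolling(4).mean()

    # Period-over-period relative change of Brent
    out["brent_pct_change"] = out["brent_eur"].pct_change()

    # Seasonality: calendar month as an integer 1..12
    out["month"] = out["date"].dt.month
    return out


# Toy data, invented purely for the example
toy = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=6, freq="W-MON"),
    "benzina": [1.80, 1.82, 1.81, 1.85, 1.86, 1.84],
    "gasolio": [1.70, 1.71, 1.73, 1.72, 1.74, 1.75],
    "brent_eur": [70.0, 72.0, 71.0, 74.0, 75.0, 73.0],
})
feats = build_features(toy)
```

The first rows of `feats` contain `NaN` in the lag and rolling columns because there is not enough history yet, so such rows have to be dropped before training.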

In [7]:
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.ensemble import GradientBoostingRegressor
import joblib

df = pd.read_csv('../data/processed/features_dataset_v2.csv', parse_dates=['date'])
df = df.sort_values('date').reset_index(drop=True)


In [8]:
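# Targets are the next period's prices (one-step-ahead forecast)
df['benzina_next'] = df['benzina'].shift(-1)
df['gasolio_next'] = df['gasolio'].shift(-1)
# Drop rows with missing values: the last row has no target,
# and any row with an incomplete feature cannot be used for training
df = df.dropna()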
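
To make the target alignment concrete, here is a tiny illustration of what `shift(-1)` followed by `dropna()` does. The prices are made up for the example.

```python
import pandas as pd

# Invented prices, one row per period
toy = pd.DataFrame({"benzina": [1.80, 1.82, 1.81, 1.85]})

# Each row's target is the price observed in the following row
toy["benzina_next"] = toy["benzina"].shift(-1)

# The last row has no "next" price, so its target is NaN and it is dropped
aligned = toy.dropna()
```

After this step every remaining row pairs the information available at one period with the price of the period that follows it.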

In [9]:
features_b = ['benzina_lag1','benzina_lag2','benzina_lag3',
              'brent_eur_lag1','brent_eur_lag2',
              'benzina_roll4','brent_pct_change','month']

features_g = ['gasolio_lag1','gasolio_lag2',
              'brent_eur_lag1','brent_eur_lag2',
              'gasolio_roll4','brent_pct_change','month']


In [10]:
# Final fit on the full dataset; random_state makes the saved GBR reproducible
gbr_b = GradientBoostingRegressor(random_state=42).fit(df[features_b], df['benzina_next'])
ridge_g = Ridge(alpha=1.0).fit(df[features_g], df['gasolio_next'])


In [11]:
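# Persist each model together with its feature list,
# so inference can select the same columns in the same order
joblib.dump(gbr_b, '../models/gbr_benzina.pkl')
joblib.dump(ridge_g, '../models/ridge_gasolio.pkl')
joblib.dump(features_b, '../models/features_b.pkl')
joblib.dump(features_g, '../models/features_g.pkl')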


['../models/features_g.pkl']
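
A saved model and its feature list could later be reloaded along these lines. This is a minimal sketch: `predict_next` is a hypothetical helper, and the toy model, toy columns, and temporary file names exist only so the example runs on its own. In practice the paths would be the `.pkl` files written above.

```python
import tempfile
from pathlib import Path

import joblib
import pandas as pd
from sklearn.linear_model import Ridge


def predict_next(model_path, features_path, frame: pd.DataFrame) -> float:
    """Hypothetical helper: predict from the most recent row of `frame`."""
    model = joblib.load(model_path)
    features = joblib.load(features_path)
    # Select the stored columns, in the stored order, for the latest row
    latest = frame[features].tail(1)
    return float(model.predict(latest)[0])


# Toy stand-in for the real dataset and model (values are invented)
toy = pd.DataFrame({
    "gasolio_lag1": [1.70, 1.71, 1.73, 1.72, 1.74],
    "month": [1, 1, 1, 1, 2],
    "target": [1.71, 1.73, 1.72, 1.74, 1.75],
})
toy_features = ["gasolio_lag1", "month"]

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "toy_model.pkl"
    feats_path = Path(tmp) / "toy_features.pkl"
    joblib.dump(Ridge(alpha=1.0).fit(toy[toy_features], toy["target"]), model_path)
    joblib.dump(toy_features, feats_path)
    prediction = predict_next(model_path, feats_path, toy)
```

Storing the feature list next to the model means the inference code does not need to hard-code column names, which reduces the risk of a mismatch between training and prediction.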