# 09 — EUR/USD Final Model (Train / Future Test)

**Goal**
- Train a single final model on **all available historical data**
- Freeze the model
- Prepare it to be evaluated later on **unseen future data**
- Asset: **EUR/USD only**

This notebook represents a **realistic production-like setup**:
- no cross-validation
- no peeking into future test data
- clean separation between *training phase* and *future evaluation*


In [1]:
import sys
from pathlib import Path
import numpy as np
import pandas as pd

ROOT = Path.cwd().parent
SRC = ROOT / "src"
if str(SRC) not in sys.path:
    sys.path.insert(0, str(SRC))

from utils import get_logger
logger = get_logger("eurusd_final", log_file=str(ROOT / "logs" / "eurusd_final.log"))

logger.info("ROOT=%s", ROOT)


2025-12-19 09:47:19,431 | INFO | eurusd_final | ROOT=c:\Users\fayca\Downloads\hackathon_gold_project\hackathon_gold_project


## 1) Load EUR/USD data

In [2]:
from data import load_ohlc_from_xlsx

XLSX = ROOT / "dataset_train.xlsx"
SHEET = "EURUSD"  # adjust if sheet name differs

df = load_ohlc_from_xlsx(str(XLSX), sheet_name=SHEET)
df["Date"] = pd.to_datetime(df["Date"])
df = df.sort_values("Date").reset_index(drop=True)

logger.info("Loaded %d rows for %s", len(df), SHEET)
df.tail()


2025-12-19 09:47:19,450 | INFO | data | Loading sheet=EURUSD from c:\Users\fayca\Downloads\hackathon_gold_project\hackathon_gold_project\dataset_train.xlsx
2025-12-19 09:47:21,120 | INFO | data | Loaded 11288 rows, columns=['Date', 'Open', 'High', 'Low', 'Close', 'smavg_50', 'smavg_100', 'smavg_240']
2025-12-19 09:47:21,133 | INFO | eurusd_final | Loaded 11288 rows for EURUSD


|  | Date | Open | High | Low | Close | smavg_50 | smavg_100 | smavg_240 |
|---|---|---|---|---|---|---|---|---|
| 11283 | 2018-12-25 | 1.1406 | 1.1432 | 1.1348 | 1.1392 | 1.1375 | 1.148 | 1.1793 |
| 11284 | 2018-12-26 | 1.1392 | 1.142 | 1.1343 | 1.1353 | 1.1372 | 1.1478 | 1.1789 |
| 11285 | 2018-12-27 | 1.1353 | 1.1454 | 1.1352 | 1.143 | 1.1372 | 1.1477 | 1.1785 |
| 11286 | 2018-12-28 | 1.143 | 1.1473 | 1.1428 | 1.1444 | 1.137 | 1.1477 | 1.1781 |
| 11287 | 2018-12-31 | 1.1447 | 1.1468 | 1.1422 | 1.1467 | 1.1371 | 1.1478 | 1.1777 |


## 2) Feature engineering + target (20 days)

In [3]:
from features import build_features
from labels import add_target_20d_score, fit_score_scaler

df_feat = build_features(df)
df_feat = add_target_20d_score(df_feat, horizon=20)
df_feat = df_feat.dropna().reset_index(drop=True)

logger.info("After features + target: %d rows", len(df_feat))
df_feat.tail()


2025-12-19 09:47:21,172 | INFO | features | Building features...
2025-12-19 09:47:21,236 | INFO | features | Features built. Total columns=34
2025-12-19 09:47:21,259 | INFO | eurusd_final | After features + target: 11029 rows


|  | Date | Open | High | Low | Close | smavg_50 | smavg_100 | smavg_240 | log_ret_1 | log_ret_lag_1 | ... | mom_60 | close_to_smavg_50 | close_to_smavg_100 | close_to_smavg_240 | rsi_14 | bollinger_pctb_20 | atr_rel_14 | hl_range_rel | oc_change_rel | fut_ret_20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 11024 | 2018-11-27 | 1.1328 | 1.1344 | 1.1278 | 1.1289 | 1.1473 | 1.1539 | 1.1854 | -0.003449 | -0.000794 | ... | -0.025623 | -0.016038 | -0.021666 | -0.047663 | 0.398519 | 0.201666 | 0.007162 | 0.005846 | -0.003443 | 0.009124 |
| 11025 | 2018-11-28 | 1.1289 | 1.1388 | 1.1267 | 1.1366 | 1.1467 | 1.1536 | 1.1852 | 0.006798 | -0.003449 | ... | -0.022962 | -0.008808 | -0.014736 | -0.041006 | 0.502177 | 0.515695 | 0.007277 | 0.010646 | 0.006821 | -0.001144 |
| 11026 | 2018-11-29 | 1.1366 | 1.1402 | 1.1349 | 1.1393 | 1.1459 | 1.1534 | 1.1849 | 0.002373 | 0.006798 | ... | -0.019987 | -0.00576 | -0.012225 | -0.038484 | 0.541364 | 0.636255 | 0.00726 | 0.004652 | 0.002376 | 0.003248 |
| 11027 | 2018-11-30 | 1.1393 | 1.14 | 1.1306 | 1.1317 | 1.145 | 1.153 | 1.1846 | -0.006693 | 0.002373 | ... | -0.020639 | -0.011616 | -0.018474 | -0.044656 | 0.576507 | 0.323321 | 0.007145 | 0.008306 | -0.006671 | 0.011222 |
| 11028 | 2018-12-03 | 1.133 | 1.138 | 1.1319 | 1.1354 | 1.1442 | 1.1526 | 1.1844 | 0.003264 | -0.006693 | ... | -0.020918 | -0.007691 | -0.014923 | -0.041371 | 0.552288 | 0.493852 | 0.007033 | 0.005373 | 0.002118 | 0.009952 |
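
The `fut_ret_20` values in this output are consistent with a simple (not log) return on `Close` taken 20 rows ahead: for 2018-11-27, 1.1392 / 1.1289 - 1 ≈ 0.009124, where 1.1392 is the close on 2018-12-25. The sketch below reproduces that calculation on a toy series. `add_forward_return` is a hypothetical stand-in; the real `add_target_20d_score` in `labels` may differ in its details.

```python
import pandas as pd


def add_forward_return(df: pd.DataFrame, horizon: int = 20, price_col: str = "Close") -> pd.DataFrame:
    """Hypothetical stand-in for the target step: simple return `horizon` rows ahead."""
    out = df.copy()
    future_price = out[price_col].shift(-horizon)  # price `horizon` rows later
    out[f"fut_ret_{horizon}"] = future_price / out[price_col] - 1.0
    return out


# Toy prices growing 10% per row, so every 2-row-ahead return is 21%.
demo_prices = pd.DataFrame({"Close": [1.00, 1.10, 1.21, 1.331]})
demo_target = add_forward_return(demo_prices, horizon=2)
print(demo_target)
```

The last `horizon` rows have no future price, so their target is NaN and `dropna()` removes them. This is why the last labelled row above is 2018-12-03 even though the raw data runs to 2018-12-31.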


## 3) Prepare training matrices

In [4]:
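# Columns that must not be model inputs: the date, raw OHLC prices, and target columns.
# The smavg_* columns are not excluded, so they remain features.
exclude = {"Date", "Open", "High", "Low", "Close", "fut_ret_20", "y_score"}
feature_cols = [c for c in df_feat.columns if c not in exclude]

X = df_feat[feature_cols].values
y_raw = df_feat["fut_ret_20"].values

# Divide the raw 20-day return by scale (std_mult * std), then clip to a score in [-1, 1].
scale = fit_score_scaler(pd.Series(y_raw), std_mult=2.0)
y = np.clip(y_raw / scale, -1.0, 1.0)

logger.info("Features=%d | Samples=%d | Scale=%.6f", X.shape[1], X.shape[0], scale)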


2025-12-19 09:47:21,301 | INFO | labels | Fitted score scale=0.057319 (std_mult=2.00, std=0.028659)
2025-12-19 09:47:21,303 | INFO | eurusd_final | Features=29 | Samples=11029 | Scale=0.057319
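
The `labels` log line suggests the scale is simply `std_mult` times the standard deviation of the raw returns (2.00 × 0.028659 ≈ 0.057319). A minimal sketch of that idea follows. `fit_scale` and `to_score` are hypothetical names, and the use of pandas' default sample standard deviation (`ddof=1`) is an assumption; the actual `fit_score_scaler` may compute it differently.

```python
import numpy as np
import pandas as pd


def fit_scale(returns: pd.Series, std_mult: float = 2.0) -> float:
    """Hypothetical stand-in for fit_score_scaler: scale = std_mult * std(returns)."""
    return float(std_mult * returns.std())  # ddof=1 (pandas default) is an assumption


def to_score(returns: np.ndarray, scale: float) -> np.ndarray:
    """Map raw returns to a bounded score in [-1, 1]."""
    return np.clip(returns / scale, -1.0, 1.0)


# Made-up returns: nine small moves and one outlier that ends up clipped.
demo_returns = pd.Series([0.01, -0.01, 0.01, -0.01, 0.01, -0.01, 0.01, -0.01, 0.01, 0.20])
demo_scale = fit_scale(demo_returns, std_mult=2.0)
demo_scores = to_score(demo_returns.to_numpy(), demo_scale)
print(f"scale={demo_scale:.6f}")
print(demo_scores)
```

With the notebook's fitted scale of 0.057319, any 20-day return of about ±5.7% or more maps to a score of exactly ±1.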


## 4) Train final model (GBRT)

In [5]:
from sklearn.ensemble import GradientBoostingRegressor

final_model = GradientBoostingRegressor(
    n_estimators=300,
    learning_rate=0.05,
    max_depth=3,
    random_state=42,
)

final_model.fit(X, y)

logger.info("Final EUR/USD model trained on full history")


2025-12-19 09:47:59,673 | INFO | eurusd_final | Final EUR/USD model trained on full history


## 5) Save model artifacts (READY FOR FUTURE TEST)

In [6]:
import joblib

out_dir = ROOT / "outputs"
out_dir.mkdir(exist_ok=True)

joblib.dump(final_model, out_dir / "eurusd_final_model.joblib")
joblib.dump(scale, out_dir / "eurusd_target_scale.joblib")
joblib.dump(feature_cols, out_dir / "eurusd_feature_columns.joblib")

logger.info("Artifacts saved in outputs/")


2025-12-19 09:48:00,141 | INFO | eurusd_final | Artifacts saved in outputs/
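
Because the three artifacts live in separate files, it can be worth checking after loading that they still agree with each other. The sketch below is one way to do that. It assumes the loaded model exposes scikit-learn's `n_features_in_` attribute, and `check_artifacts` is a hypothetical helper that is not part of the project.

```python
from types import SimpleNamespace


def check_artifacts(model, feature_cols, scale):
    """Hypothetical sanity check for a loaded (model, feature_cols, scale) triple."""
    if len(set(feature_cols)) != len(feature_cols):
        raise ValueError("feature_cols contains duplicate names")
    if not scale > 0:  # also rejects NaN
        raise ValueError(f"scale must be positive, got {scale!r}")
    # Fitted scikit-learn estimators record the training feature count here.
    n_expected = getattr(model, "n_features_in_", None)
    if n_expected is not None and n_expected != len(feature_cols):
        raise ValueError(
            f"model expects {n_expected} features, feature list has {len(feature_cols)}"
        )
    return True


# Stand-in object for illustration; in the notebook this would be the loaded model.
fake_model = SimpleNamespace(n_features_in_=3)
artifacts_ok = check_artifacts(fake_model, ["f1", "f2", "f3"], 0.057319)
print(artifacts_ok)
```

For this notebook the equivalent call would pass the loaded model, the 29-entry feature list, and the saved scale.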


## 6) How this model will be used later

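When **new unseen EUR/USD data** becomes available:

1. Load the saved model, target scale, and feature list
2. Recompute features on the new data with the same `build_features` code
3. Apply the *same* target scale (do not refit it on the new data)
4. Generate predictions, selecting columns in the saved `feature_cols` order
5. Pass the predictions to the trading bot logic

Because the model, scale, and feature list are frozen before the new data arrives, this workflow:
- keeps future data out of training (no leakage)
- gives a realistic forward evaluation
- mirrors a production workflow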
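
The saved scale matters most at evaluation time: once 20 further trading days have passed, realized returns can be converted to scores with the frozen `scale` and compared with the predictions. The sketch below assumes that reading of step 3. `evaluate_scores` and its two metrics (mean absolute error and sign agreement) are illustrative choices, not the project's evaluation code.

```python
import numpy as np


def evaluate_scores(pred_scores, realized_returns, scale):
    """Compare predicted scores with realized returns scaled by the frozen training scale."""
    pred = np.asarray(pred_scores, dtype=float)
    # Reuse the training-time scale; refitting it on new data would change what a score means.
    true = np.clip(np.asarray(realized_returns, dtype=float) / scale, -1.0, 1.0)
    mae = float(np.mean(np.abs(pred - true)))
    sign_agreement = float(np.mean(np.sign(pred) == np.sign(true)))
    return {"mae": mae, "sign_agreement": sign_agreement}


# Made-up numbers for illustration only.
demo_metrics = evaluate_scores(
    pred_scores=[0.2, -0.1, 0.5, -0.4],
    realized_returns=[0.01, 0.005, 0.08, -0.02],
    scale=0.05,
)
print(demo_metrics)
```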


In [7]:
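# Example (DO NOT RUN NOW - FUTURE DATA ONLY)

# Load from ROOT / "outputs", where the artifacts were saved above
# model = joblib.load(ROOT / "outputs" / "eurusd_final_model.joblib")
# scale = joblib.load(ROOT / "outputs" / "eurusd_target_scale.joblib")
# feature_cols = joblib.load(ROOT / "outputs" / "eurusd_feature_columns.joblib")

# Drop rows whose features are still NaN (rolling-window warm-up);
# GradientBoostingRegressor does not accept NaN inputs
# df_new = build_features(new_eurusd_df).dropna(subset=feature_cols)
# X_new = df_new[feature_cols].values
# y_pred = np.clip(model.predict(X_new), -1, 1)  # score in [-1, 1], same units as training y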
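
The commented example can be wrapped in a small helper that guards against two easy mistakes: missing feature columns and rows with incomplete features. This is a sketch, assuming `model` is any object with a scikit-learn style `predict` method. `predict_scores` and the dummy `MeanModel` are hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd


def predict_scores(model, feature_cols, df_features: pd.DataFrame) -> pd.Series:
    """Return clipped score predictions, indexed by the rows that had complete features."""
    missing = [c for c in feature_cols if c not in df_features.columns]
    if missing:
        raise KeyError(f"missing feature columns: {missing}")
    # Keep the training column order and drop rows with incomplete features.
    complete = df_features.dropna(subset=list(feature_cols))
    preds = model.predict(complete[list(feature_cols)].to_numpy())
    return pd.Series(np.clip(preds, -1.0, 1.0), index=complete.index, name="score_pred")


class MeanModel:
    """Dummy stand-in for the trained regressor: predicts the mean of each row."""

    def predict(self, X):
        return np.asarray(X).mean(axis=1)


demo_df = pd.DataFrame({"a": [np.nan, 0.5, 3.0], "b": [1.0, 0.1, 1.0], "unused": [9, 9, 9]})
demo_pred = predict_scores(MeanModel(), ["a", "b"], demo_df)
print(demo_pred)
```

Multiplying a predicted score by the saved `scale` gives a rough implied 20-day return. That conversion only holds for scores strictly inside (-1, 1), since clipping discards any magnitude beyond the bounds.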
