# 10 — EUR/USD Future Test (Unseen Data Evaluation)

**Goal**
- Load the frozen EUR/USD model trained in notebook **09**
- Load **new unseen EUR/USD data** (future period)
- Recompute the same features
- Align columns
- Produce predictions `y_pred ∈ [-1, 1]`
- Export predictions to `outputs/eurusd_future_predictions.csv`
- (Optional) Create trade signals with a threshold

⚠️ Important:
- This notebook must be run only when you have **new EUR/USD data not used in training**.
- Do **not** refit the model here.
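
The "unseen data" requirement can be enforced rather than assumed. The following is a minimal sketch of such a guard, assuming the last training date is known; the helper name `assert_strictly_after`, the `train_end` value, and the sample rows are all hypothetical and not taken from notebook 09.

```python
import pandas as pd


def assert_strictly_after(df_new: pd.DataFrame, train_end, date_col: str = "Date") -> None:
    """Raise if any row of df_new is dated on or before the last training date."""
    dates = pd.to_datetime(df_new[date_col])
    cutoff = pd.Timestamp(train_end)
    n_overlap = int((dates <= cutoff).sum())
    if n_overlap:
        raise ValueError(
            f"{n_overlap} rows are dated on or before train_end={cutoff.date()}; "
            "they may have been seen during training."
        )


# Hypothetical sample rows and cutoff date, for illustration only.
demo = pd.DataFrame({"Date": ["2024-01-02", "2024-01-03"], "Close": [1.09, 1.10]})
assert_strictly_after(demo, train_end="2023-12-29")  # passes silently
```

If notebook 09 records its own training end date, that value would be the natural choice for `train_end`.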


In [None]:
import sys
from pathlib import Path
import numpy as np
import pandas as pd

ROOT = Path.cwd().parent
SRC = ROOT / "src"
if str(SRC) not in sys.path:
    sys.path.insert(0, str(SRC))

from utils import get_logger
logger = get_logger("eurusd_future_test", log_file=str(ROOT/"logs"/"eurusd_future_test.log"))

logger.info("ROOT=%s", ROOT)


## 1) Load frozen artifacts from notebook 09

In [None]:
import joblib

out_dir = ROOT / "outputs"
model_path = out_dir / "eurusd_final_model.joblib"
scale_path = out_dir / "eurusd_target_scale.joblib"
cols_path  = out_dir / "eurusd_feature_columns.joblib"

assert model_path.exists(), f"Missing model: {model_path} (run notebook 09)"
assert scale_path.exists(), f"Missing scale: {scale_path} (run notebook 09)"
assert cols_path.exists(),  f"Missing feature cols: {cols_path} (run notebook 09)"

model = joblib.load(model_path)
scale = joblib.load(scale_path)
feature_cols = joblib.load(cols_path)

logger.info("Loaded model=%s", model_path)
logger.info("Loaded scale=%.6f", float(scale))
logger.info("Loaded %d feature columns", len(feature_cols))


## 2) Load NEW unseen EUR/USD data

Choose ONE option below:
- **Option A:** Excel file with the same structure (recommended)
- **Option B:** CSV file with columns: Date, Open, High, Low, Close


In [None]:
# -------- Option A: Excel (recommended) --------
from data import load_ohlc_from_xlsx

# Put your new file here (example path)
TEST_XLSX = ROOT / "data" / "eurusd_test.xlsx"
TEST_SHEET = "EURUSD"  # adjust if needed

# If you use Option A, ensure the file exists:
# assert TEST_XLSX.exists(), f"Test Excel not found: {TEST_XLSX}"

# df_new = load_ohlc_from_xlsx(str(TEST_XLSX), sheet_name=TEST_SHEET)

# -------- Option B: CSV --------
# TEST_CSV = ROOT / "data" / "eurusd_test.csv"
# df_new = pd.read_csv(TEST_CSV, parse_dates=["Date"])

# ---- Safety: show what you're using ----
# logger.info("Loaded new data rows=%d cols=%s", len(df_new), list(df_new.columns))
# df_new.head()


## 3) Standardize / sort / sanity checks

In [None]:
# Uncomment after loading df_new above:
# Check the required columns first, so a missing "Date" fails with a clear message
# required_cols = {"Date","Open","High","Low","Close"}
# missing = required_cols - set(df_new.columns)
# assert not missing, f"Missing required columns in new data: {missing}"

# df_new["Date"] = pd.to_datetime(df_new["Date"])
# df_new = df_new.sort_values("Date").reset_index(drop=True)

# logger.info("New data date range: %s -> %s", df_new["Date"].min(), df_new["Date"].max())
# df_new.tail()
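
Beyond column presence, a few structural checks can catch corrupted bars before features are computed. The sketch below shows one possible set of checks; it is not something the original pipeline is known to perform, and the helper name `ohlc_issues` and the sample rows are hypothetical.

```python
import pandas as pd

PRICE_COLS = ["Open", "High", "Low", "Close"]


def ohlc_issues(df: pd.DataFrame) -> dict:
    """Count rows that violate basic OHLC consistency rules."""
    body_max = df[["Open", "Close"]].max(axis=1)
    body_min = df[["Open", "Close"]].min(axis=1)
    return {
        "duplicate_dates": int(df["Date"].duplicated().sum()),
        "nan_prices": int(df[PRICE_COLS].isna().any(axis=1).sum()),
        "non_positive": int((df[PRICE_COLS] <= 0).any(axis=1).sum()),
        "high_below_body": int((df["High"] < body_max).sum()),  # High under Open/Close
        "low_above_body": int((df["Low"] > body_min).sum()),    # Low over Open/Close
    }


# Hypothetical rows: the second bar has High below Open, and one date is repeated.
demo = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-03"]),
    "Open": [1.10, 1.11, 1.12],
    "High": [1.12, 1.10, 1.13],
    "Low": [1.09, 1.09, 1.11],
    "Close": [1.11, 1.105, 1.125],
})
issues = ohlc_issues(demo)
print(issues)
```

Any non-zero count is worth inspecting before moving on, since rolling features would otherwise propagate the bad bars.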


## 4) Recompute features on new data and align with training columns

In [None]:
from features import build_features

# Uncomment after df_new is loaded:
# df_feat = build_features(df_new).copy()

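# Align: every training column must exist in the new features.
# Filling a missing column with NaN would make the NaN filter below drop every row,
# so fail loudly instead.
# missing_feats = [c for c in feature_cols if c not in df_feat.columns]
# assert not missing_feats, f"Features missing from new data: {missing_feats}"

# Keep the same column order as in training
# X_new = df_feat[feature_cols].to_numpy(dtype=float)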

# Drop rows with NaNs in features (due to rolling windows)
# valid_mask = ~np.isnan(X_new).any(axis=1)
# df_out = df_feat.loc[valid_mask, ["Date","Open","High","Low","Close"]].copy()
# X_new = X_new[valid_mask]

# logger.info("After feature alignment: X_new=%s (kept %d rows)", X_new.shape, len(df_out))
# df_out.head()


## 5) Predict scores on unseen data

In [None]:
# Uncomment after X_new is built:
# y_pred = model.predict(X_new)
# y_pred = np.clip(y_pred, -1.0, 1.0)

# df_out["y_pred"] = y_pred

# logger.info("Predictions done. y_pred stats: mean=%.4f std=%.4f min=%.4f max=%.4f",
#             float(np.mean(y_pred)), float(np.std(y_pred)), float(np.min(y_pred)), float(np.max(y_pred)))

# df_out.tail()


## 6) Create trade signals with a threshold (optional)

Interpretation:
- `y_pred` is **signal intensity**, not a probability.
- Use thresholds to trade only strong signals.

In [None]:
# Uncomment after df_out["y_pred"] exists:
# threshold = 0.7
# df_out["signal"] = 0
# df_out.loc[df_out["y_pred"] > threshold, "signal"] = 1
# df_out.loc[df_out["y_pred"] < -threshold, "signal"] = -1

# coverage = (df_out["signal"] != 0).mean()
# logger.info("Threshold=%.2f | coverage=%.2f%%", threshold, 100*coverage)

# df_out.tail()
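
Because the right threshold is rarely obvious in advance, it can help to see how coverage changes across several candidate values. A minimal sketch follows, using the same rule as the cell above (long above `threshold`, short below `-threshold`); the helper names and the sample scores are hypothetical.

```python
import numpy as np
import pandas as pd


def signals_from_scores(y_pred, threshold: float) -> np.ndarray:
    """Map scores to +1 / -1 / 0 with a symmetric threshold."""
    y = np.asarray(y_pred, dtype=float)
    return np.where(y > threshold, 1, np.where(y < -threshold, -1, 0))


def coverage_table(y_pred, thresholds) -> pd.DataFrame:
    """Count long and short signals and the traded fraction per threshold."""
    rows = []
    for t in thresholds:
        sig = signals_from_scores(y_pred, t)
        rows.append({
            "threshold": t,
            "long": int((sig == 1).sum()),
            "short": int((sig == -1).sum()),
            "coverage": float((sig != 0).mean()),
        })
    return pd.DataFrame(rows)


# Hypothetical scores, for illustration only.
demo_scores = [-0.9, -0.4, 0.1, 0.6, 0.8]
table = coverage_table(demo_scores, [0.3, 0.5, 0.7])
print(table)
```

With real predictions the call would be `coverage_table(df_out["y_pred"], [...])`. Coverage alone says nothing about signal quality, so this table only describes how often the model would trade.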


## 7) Export predictions

In [None]:
# Uncomment after df_out is ready:
# export_path = ROOT / "outputs" / "eurusd_future_predictions.csv"
# df_out.to_csv(export_path, index=False)
# logger.info("Exported: %s", export_path)

# export_path


## 8) Notes for future evaluation

If the future test data extends far enough to compute `fut_ret_20`,
a separate evaluation notebook can compute IC, directional accuracy (DirAcc), and a backtest.
This notebook covers **pure forward inference** only (production-like).
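
As a starting point for that evaluation, the sketch below computes a forward return, a rank IC, and directional accuracy. It rests on assumptions that should be checked against notebook 09: `fut_ret_20` is taken to be the simple 20-bar forward return of `Close`, IC is taken to be the Spearman rank correlation between score and forward return, and rows where either value is exactly zero are excluded from DirAcc. The function names and sample data are hypothetical.

```python
import numpy as np
import pandas as pd


def forward_return(close: pd.Series, horizon: int = 20) -> pd.Series:
    """Simple forward return over `horizon` bars (NaN for the last `horizon` rows)."""
    return close.shift(-horizon) / close - 1.0


def evaluate_scores(y_pred: pd.Series, fut_ret: pd.Series) -> dict:
    """Rank IC and directional accuracy on rows where both values are known."""
    both = pd.concat({"pred": y_pred, "ret": fut_ret}, axis=1).dropna()
    # Pearson correlation of ranks equals the Spearman rank correlation.
    ic = both["pred"].rank().corr(both["ret"].rank())
    nonzero = both[(both["pred"] != 0) & (both["ret"] != 0)]
    dir_acc = (np.sign(nonzero["pred"]) == np.sign(nonzero["ret"])).mean()
    return {"n": len(both), "ic": float(ic), "dir_acc": float(dir_acc)}


# Hypothetical prices and scores; horizon=2 keeps the example small.
close = pd.Series([1.00, 1.01, 1.03, 1.02, 1.00, 1.04])
scores = pd.Series([0.5, 0.2, -0.6, 0.3, 0.1, -0.2])
result = evaluate_scores(scores, forward_return(close, horizon=2))
print(result)
```

The last `horizon` rows have no forward return and drop out of the evaluation, which is why the test file needs to extend beyond the final prediction date.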