# AutoGluon Training (Colab / Local)

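- **Input**: `merged_for_autogluon_0900.csv` (produced by `merge_and_train.py`; contains `target_return` and the compressed features).
- **Flow**: read the merged table → drop `date` (and `datetime`, if present), then `dropna` → split chronologically into train/val/test → train a TabularPredictor regressor → save the model.
- **Colab**: mount Google Drive first, then set `DATA_ROOT` below to the directory that contains `output_0900/merged_for_autogluon_0900/` (or set `MERGED_CSV_PATH` directly).
- **Local**: set `DATA_ROOT` to the project's `data/` path, or specify `MERGED_CSV_PATH` directly.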

## 1. Mount Google Drive (required on Colab; skip locally)

In [None]:
try:
    from google.colab import drive
    drive.mount("/content/drive")
    IN_COLAB = True
except Exception:
    IN_COLAB = False
print("Colab:", IN_COLAB)
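
The cell above treats any exception as "not on Colab", so a failed mount on Colab would also print `False`. A minimal sketch of separating detection from mounting follows. It assumes the Colab runtime has already loaded the `google.colab` package into the kernel, which is an assumption about the environment rather than something this notebook states. `running_in_colab` is a hypothetical helper name.

```python
import sys


def running_in_colab() -> bool:
    """Return True when the google.colab package is already loaded in this kernel."""
    # Assumption: Colab kernels import google.colab at startup.
    return "google.colab" in sys.modules


print("Colab:", running_in_colab())
```

With this check in place, a mount failure on Colab would surface as an error instead of silently switching to the local path.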

## 2. Paths and parameters

In [None]:
from pathlib import Path

# Colab: point this at the project's data directory on Drive, e.g. "/content/drive/MyDrive/Thesis-AutoGluon-TXF-Research/data"
# Local: point this at the project's data directory, or leave it and set MERGED_CSV_PATH instead
DATA_ROOT = Path("/content/drive/MyDrive/Thesis-AutoGluon-TXF-Research/data") if IN_COLAB else Path.cwd().resolve().parent.parent / "data"

# Path to the merged table (an explicit value takes priority; otherwise it is derived from DATA_ROOT)
MERGED_CSV_PATH = None  # e.g. Path("/content/drive/.../merged_for_autogluon_0900.csv")
if MERGED_CSV_PATH is None:
    MERGED_CSV_PATH = DATA_ROOT / "output_0900" / "merged_for_autogluon_0900" / "merged_for_autogluon_0900.csv"

# Model output directory (can be changed to a Drive path to keep the model)
MODEL_SAVE_DIR = DATA_ROOT / "output_0900" / "models" / "autogluon_merged"

LABEL = "target_return"
TIME_LIMIT = 600  # seconds
TRAIN_RATIO, VAL_RATIO = 0.6, 0.2  # test = 1 - 0.6 - 0.2 = 0.2

print("MERGED_CSV_PATH:", MERGED_CSV_PATH)
print("MODEL_SAVE_DIR:", MODEL_SAVE_DIR)
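
The path logic above gives an explicit `MERGED_CSV_PATH` priority and otherwise derives the path from `DATA_ROOT`. As a minimal sketch, the same rule can be wrapped in a function that also fails early when the file is missing. `resolve_merged_csv` is a hypothetical helper, not part of the original notebook. The directory layout is the one used in the cell above.

```python
from pathlib import Path
from typing import Optional


def resolve_merged_csv(data_root: Path, merged_csv_path: Optional[Path] = None) -> Path:
    """Return the explicit path if given, else derive it from data_root.

    Raises FileNotFoundError when the resolved file does not exist.
    """
    if merged_csv_path is None:
        merged_csv_path = (
            data_root
            / "output_0900"
            / "merged_for_autogluon_0900"
            / "merged_for_autogluon_0900.csv"
        )
    merged_csv_path = Path(merged_csv_path)
    if not merged_csv_path.is_file():
        raise FileNotFoundError(f"Merged CSV not found: {merged_csv_path}")
    return merged_csv_path
```

Calling `resolve_merged_csv(DATA_ROOT)` before the install and training steps would surface a wrong `DATA_ROOT` right away, rather than at `pd.read_csv`.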

## 3. Install AutoGluon (usually needs to be run once on Colab)

In [None]:
!pip install autogluon.tabular --quiet

## 4. Load data, drop date, dropna, split

In [None]:
import pandas as pd

df = pd.read_csv(MERGED_CSV_PATH)
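# The split below is positional, so it is chronological only if the CSV rows are already in time order.
# Drop the timestamp columns, then drop every row that has at least one missing value.
df = df.drop(columns=["date", "datetime"], errors="ignore").dropna()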
print("Shape after drop date + dropna:", df.shape)
if LABEL not in df.columns:
    raise ValueError(f"No column '{LABEL}' in CSV.")

n = len(df)
train_end = int(n * TRAIN_RATIO)
val_end = int(n * (TRAIN_RATIO + VAL_RATIO))
train_data = df.iloc[:train_end]
val_data = df.iloc[train_end:val_end]
test_data = df.iloc[val_end:]
print(f"Train: {len(train_data)}, Val: {len(val_data)}, Test: {len(test_data)}")
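
The split above slices rows by position after `date` has been dropped, so it relies on the CSV already being in time order. A minimal sketch of a split that sorts by date first is shown below on toy data. It assumes the `date` column can be parsed by `pd.to_datetime`. `chronological_split` is a hypothetical helper name, and the toy values are illustrative only.

```python
import pandas as pd


def chronological_split(df, date_col, train_ratio, val_ratio):
    """Sort by date_col, drop it, and slice into train/val/test by position."""
    ordered = (
        df.assign(**{date_col: pd.to_datetime(df[date_col])})
        .sort_values(date_col, kind="stable")
        .reset_index(drop=True)
    )
    n = len(ordered)
    train_end = int(n * train_ratio)
    val_end = int(n * (train_ratio + val_ratio))
    features = ordered.drop(columns=[date_col])
    return (
        features.iloc[:train_end],
        features.iloc[train_end:val_end],
        features.iloc[val_end:],
    )


# Toy rows deliberately out of order (not real data).
toy = pd.DataFrame(
    {
        "date": ["2024-01-05", "2024-01-02", "2024-01-04", "2024-01-01", "2024-01-03"],
        "feature": [5.0, 2.0, 4.0, 1.0, 3.0],
        "target_return": [0.05, 0.02, 0.04, 0.01, 0.03],
    }
)
train_part, val_part, test_part = chronological_split(toy, "date", 0.6, 0.2)
print(len(train_part), len(val_part), len(test_part))
```

Sorting before slicing keeps the test rows strictly later than the training rows, which is what a time-based evaluation needs.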

## 5. Train and save

In [None]:
from autogluon.tabular import TabularPredictor

MODEL_SAVE_DIR = Path(MODEL_SAVE_DIR)
MODEL_SAVE_DIR.mkdir(parents=True, exist_ok=True)

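predictor = TabularPredictor(
    label=LABEL,
    problem_type="regression",
    eval_metric="rmse",
    path=str(MODEL_SAVE_DIR),
).fit(
    train_data,
    time_limit=TIME_LIMIT,
    tuning_data=val_data,
    # best_quality enables bagging; in bagged mode AutoGluon accepts tuning_data
    # only when use_bag_holdout=True, and otherwise raises an error.
    use_bag_holdout=True,
    presets="best_quality",
)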
print("Training done. Model saved to:", MODEL_SAVE_DIR)

## 6. Evaluate on validation / test sets (optional)

In [None]:
print("=== Leaderboard (validation) ===")
print(predictor.leaderboard(val_data, silent=True))
print("\n=== Leaderboard (test) ===")
print(predictor.leaderboard(test_data, silent=True))
print("\n=== Evaluate on test ===")
print(predictor.evaluate(test_data))
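
The test RMSE is easier to judge next to a naive baseline. The sketch below computes the RMSE obtained by predicting the training-set mean for every test row. `rmse` and `mean_baseline_rmse` are hypothetical helper names, and the numbers are toy values, not results from this notebook.

```python
import math
from statistics import fmean


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return math.sqrt(fmean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def mean_baseline_rmse(train_targets, test_targets):
    """RMSE on the test targets when every prediction is the training-set mean."""
    baseline = fmean(train_targets)
    return rmse(test_targets, [baseline] * len(test_targets))


# Toy numbers for illustration only.
toy_train = [0.01, -0.02, 0.03, 0.00]
toy_test = [0.02, -0.01]
print("Baseline RMSE:", mean_baseline_rmse(toy_train, toy_test))
```

With the notebook's data this could be called as `mean_baseline_rmse(train_data[LABEL].tolist(), test_data[LABEL].tolist())`. AutoGluon reports error metrics in higher-is-better form, so its RMSE usually appears as a negative number. Compare magnitudes when setting it against this baseline.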