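# AutoGluon Training (Colab / Local)

- **Input**: `merged_for_autogluon_0900.csv` (produced by `merge_and_train.py`; contains `target_return` and the compressed features).
- **Pipeline**: load the merged table → drop `date`, then `dropna` → split chronologically into train/val/test → train a `TabularPredictor` regressor → save the model.
- **Colab**: mount Google Drive first, then set `DATA_ROOT` below to the directory containing `output_0900/merged_for_autogluon_0900/` (or set `MERGED_CSV_PATH` directly).
- **Local**: set `DATA_ROOT` to the project's `data/` path, or specify `MERGED_CSV_PATH` directly.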
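
The path rule described above (an explicit `MERGED_CSV_PATH` wins, otherwise the location is derived from `DATA_ROOT`) can be sketched as a small helper. The function name `resolve_merged_csv` is hypothetical and not part of the notebook; the directory layout is the one named in the list above.

```python
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]


def resolve_merged_csv(data_root: PathLike,
                       merged_csv_path: Optional[PathLike] = None) -> Path:
    """Return the merged CSV path (hypothetical helper, for illustration only).

    An explicit merged_csv_path takes priority; otherwise the path is
    built from data_root using the output_0900 layout.
    """
    if merged_csv_path:
        return Path(merged_csv_path)
    return (
        Path(data_root)
        / "output_0900"
        / "merged_for_autogluon_0900"
        / "merged_for_autogluon_0900.csv"
    )
```

Passing `None` (or an empty string) as the second argument falls back to the `DATA_ROOT` layout, which makes the override explicit rather than relying on editing the assignment in place.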

## 1. Mount Google Drive (required on Colab; optional locally)

In [1]:
try:
    from google.colab import drive
    drive.mount("/content/drive")
    IN_COLAB = True
except Exception:
    IN_COLAB = False
print("Colab:", IN_COLAB)

Mounted at /content/drive
Colab: True


## 2. Paths and Parameters

In [3]:
from pathlib import Path

# Colab: set to the project's data directory on Drive, e.g. "/content/drive/MyDrive/Thesis-AutoGluon-TXF-Research/data"
# Local: set to the project's data directory, or leave as is and set MERGED_CSV_PATH instead
DATA_ROOT = Path("/content/drive/MyDrive/Thesis-AutoGluon-TXF-Research/data") if IN_COLAB else Path.cwd().resolve().parent.parent / "data"

# Path to the merged table (used as is when set; set to None to derive it from DATA_ROOT)
MERGED_CSV_PATH = "/content/drive/MyDrive/2026/論文/Thesis-AutoGluon-TXF-Research/data/merged_for_autogluon_0900/merged_for_autogluon_0900.csv"  # e.g. Path("/content/drive/.../merged_for_autogluon_0900.csv")
if MERGED_CSV_PATH is None:
    MERGED_CSV_PATH = DATA_ROOT / "output_0900" / "merged_for_autogluon_0900" / "merged_for_autogluon_0900.csv"

# Model save directory (hard-coded to data/models; on Colab this is the project's data/models on Drive)
MODEL_SAVE_DIR = DATA_ROOT / "models"

LABEL = "target_return"
TIME_LIMIT = 600  # seconds
TRAIN_RATIO, VAL_RATIO = 0.6, 0.2  # test = 1 - 0.6 - 0.2 = 0.2

# Create the model output directory up front if it does not exist
MODEL_SAVE_DIR = Path(MODEL_SAVE_DIR)
MODEL_SAVE_DIR.mkdir(parents=True, exist_ok=True)

print("MERGED_CSV_PATH:", MERGED_CSV_PATH)
print("MODEL_SAVE_DIR:", MODEL_SAVE_DIR)
print("Model directory exists:", MODEL_SAVE_DIR.exists())

MERGED_CSV_PATH: /content/drive/MyDrive/2026/論文/Thesis-AutoGluon-TXF-Research/data/merged_for_autogluon_0900/merged_for_autogluon_0900.csv
MODEL_SAVE_DIR: /content/drive/MyDrive/Thesis-AutoGluon-TXF-Research/data/output_0900/models/autogluon_merged


## 3. Install AutoGluon (usually needs to be run once on Colab)

In [4]:
!pip install autogluon.tabular --quiet


## 4. Load the Data, Drop `date`, `dropna`, and Split

In [5]:
import pandas as pd

df = pd.read_csv(MERGED_CSV_PATH)
df = df.drop(columns=["date", "datetime"], errors="ignore").dropna()
print("Shape after drop date + dropna:", df.shape)
if LABEL not in df.columns:
    raise ValueError(f"No column '{LABEL}' in CSV.")

n = len(df)
train_end = int(n * TRAIN_RATIO)
val_end = int(n * (TRAIN_RATIO + VAL_RATIO))
train_data = df.iloc[:train_end]
val_data = df.iloc[train_end:val_end]
test_data = df.iloc[val_end:]
# With presets="best_quality" (bagging), train and val are passed together as train_data; no separate tuning_data is given
train_data_for_fit = pd.concat([train_data, val_data], ignore_index=True)
print(f"Train: {len(train_data)}, Val: {len(val_data)}, Test: {len(test_data)}")
print(f"Train for fit (train+val): {len(train_data_for_fit)}")

Shape after drop date + dropna: (2271, 50)
Train: 1362, Val: 454, Test: 455
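
Because the split above is positional (`iloc`), it is only a chronological split if the rows are already in date order. A minimal sketch of a check that could run before `date` is dropped follows. The helper name `is_chronological` is hypothetical, and it assumes the `date` values are ISO 8601 strings (for example `2024-01-02`), for which string order matches time order; the actual date format in the CSV is not shown in this notebook.

```python
from typing import Iterable


def is_chronological(dates: Iterable[str]) -> bool:
    """Return True if the dates are in non-decreasing order.

    Assumes ISO 8601 strings (YYYY-MM-DD or YYYY-MM-DD HH:MM:SS), where
    lexicographic comparison agrees with chronological comparison.
    """
    values = list(dates)
    # Compare each value with its successor; ties (same day) are allowed.
    return all(a <= b for a, b in zip(values, values[1:]))
```

One possible use is `assert is_chronological(raw_df["date"].astype(str))` on the freshly loaded frame (here `raw_df` is a placeholder for the result of `pd.read_csv`), failing early if the merged table was written out of order.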


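**If you see "Learner is already fit"**: the directory that `path` points to already contains a previously trained model. AutoGluon loads that model, so `.fit()` cannot be called again.  
**Fix**: set `USE_TIMESTAMPED_DIR = True` (the default) in the parameters above. Each training run is then saved to a new subdirectory (e.g. `autogluon_merged/20250124_123456`), so the old model is never loaded.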
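
A timestamped run directory of the kind described above could be built as follows. This is a sketch only: `USE_TIMESTAMPED_DIR` is the flag named in the note, while the helper `make_run_dir` and its `now` parameter are hypothetical names introduced here for illustration.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional, Union

USE_TIMESTAMPED_DIR = True  # flag named in the note above


def make_run_dir(base_dir: Union[str, Path],
                 use_timestamp: bool = True,
                 now: Optional[datetime] = None) -> Path:
    """Return base_dir, or base_dir/YYYYMMDD_HHMMSS when use_timestamp is set.

    `now` can be injected for reproducible results; it defaults to the
    current local time. The directory itself is not created here.
    """
    base = Path(base_dir)
    if not use_timestamp:
        return base
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return base / stamp
```

The result would then be passed as `path=str(make_run_dir(MODEL_SAVE_DIR, USE_TIMESTAMPED_DIR))` when constructing the predictor, so that each run writes to a fresh location instead of clearing or reusing the previous one.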

## 5. Train and Save

In [7]:
from autogluon.tabular import TabularPredictor
import shutil

MODEL_SAVE_DIR = Path(MODEL_SAVE_DIR)
# Empty the directory before training so that an old model under path is not loaded, which would trigger AssertionError: Learner is already fit
if MODEL_SAVE_DIR.exists():
    shutil.rmtree(MODEL_SAVE_DIR)
MODEL_SAVE_DIR.mkdir(parents=True, exist_ok=True)

predictor = TabularPredictor(
    label=LABEL,
    problem_type="regression",
    eval_metric="rmse",
    path=str(MODEL_SAVE_DIR),
).fit(
    train_data_for_fit,
    time_limit=TIME_LIMIT,
    presets="best_quality",
)
print("Training done. Model saved to:", MODEL_SAVE_DIR)

Verbosity: 2 (Standard Logging)
AutoGluon Version:  1.5.0
Python Version:     3.12.12
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Thu Oct  2 10:42:05 UTC 2025
CPU Count:          2
Pytorch Version:    2.9.0+cpu
CUDA Version:       CUDA is not available
Memory Avail:       11.35 GB / 12.67 GB (89.6%)
Disk Space Avail:   82.12 GB / 107.72 GB (76.2%)
Presets specified: ['best_quality']
Using hyperparameters preset: hyperparameters='zeroshot'
Setting dynamic_stacking from 'auto' to True. Reason: Enable dynamic_stacking when use_bag_holdout is disabled. (use_bag_holdout=False)
Stack configuration (auto_stack=True): num_stack_levels=1, num_bag_folds=8, num_bag_sets=1
DyStack is enabled (dynamic_stacking=True). AutoGluon will try to determine whether the input data is affected by stacked overfitting and enable or disable stacking as a consequence.
	This is used to identify the optimal `num_stack_levels` value. Copies of AutoGluon will be fit on subsets of t

AssertionError: Learner is already fit.

## 6. Evaluate on the Validation / Test Sets (optional)

In [None]:
# Note: val_data was merged into train_data_for_fit, so these scores are in-sample
print("=== Leaderboard (validation) ===")
print(predictor.leaderboard(val_data, silent=True))
print("\n=== Leaderboard (test) ===")
print(predictor.leaderboard(test_data, silent=True))
print("\n=== Evaluate on test ===")
print(predictor.evaluate(test_data))
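
To put the test RMSE in context, it can help to compare it with a trivial predictor that always outputs a return of zero. The sketch below computes RMSE by hand with the standard library; the helper names `rmse` and `zero_baseline_rmse` are hypothetical and not part of AutoGluon. Note that AutoGluon reports error metrics in a higher-is-better form, so the RMSE in the leaderboard and in `evaluate` appears with a negative sign.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def zero_baseline_rmse(y_true: Sequence[float]) -> float:
    """RMSE of a naive predictor that always predicts a return of 0."""
    return rmse(y_true, [0.0] * len(y_true))
```

For example, `rmse(test_data[LABEL].tolist(), predictor.predict(test_data).tolist())` could be compared against `zero_baseline_rmse(test_data[LABEL].tolist())`; a model RMSE that is not clearly below the baseline suggests the model adds little over predicting no change.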